Abstract
Apache Cocoon [1] and Forrest [2] are exciting applications of XML and XSLT in an internet environment. Apache Cocoon is a framework which binds URIs to SAX [3]-driven content-generating XML pipelines, while Forrest is a blueprint Cocoon application specifically aimed at software project documentation, making extensive use of XSLT for the creation of skinnable websites. This paper explains some basic usages of each project, focusing especially on the way Cocoon and Forrest make practical use of XML and XML-related standards.
Many web application frameworks nowadays hold quite divergent views on where one should make use of XML and XSLT in the web tier: while admittedly some of them offer access to an XSLT transformation process or provide the web developer with a framework for generating or accessing XML documents, the use of XML as an inter-component communication language or as a means to easily target multiple devices often comes as an afterthought. Hence the heavy burden placed on users of such frameworks when they are required to create a multi-device, multiple-audience and multilingual website. Also, many web application frameworks customarily expose implementation details to the end user by means of incomprehensible or, even worse, attack-prone URIs. This is the case with many classic server-side scripting technologies like JSP and ASP, and is not necessarily cured by the newer ones.
Cocoon is radically different, because it separates URI namespace management from response generation. There is no tight relationship between the webserver's filesystem organization and the visible URI namespace, which means URIs can form a binding interface contract to the end user and, given proper care and attention, can survive rearrangement or refactoring of the application.
On the response-generating aspect, Cocoon sees three other concerns which should be separated:
Content, basically the information which is displayed on a web page, possibly stored in a collection of XML documents
Style, the look & feel of a website, often implemented using XSLT transformations towards XHTML or other delivery formats
Logic, which is used to generate or alter the Content being presented to the user.
In many other application frameworks, these three concerns are mixed into one physical (file) entity, e.g. a JSP containing server-side logic, static content and layout-oriented HTML markup. Cocoon makes it possible to manage these concerns separately. Still, they need to work in tandem to create a web application. Bringing them together, we can easily show how they interrelate:
While the main runtime environment of Cocoon is a Java Servlet container, Cocoon has been designed to be independent of its context, so that it also operates off-line from the command line, offering an easy way to generate a static rendition of a website. Also, some people are using Cocoon in a middleware context as a sophisticated XML processing framework.
Cocoon has been implemented using the Apache Avalon server component framework, which offers sophisticated configuration and component lifecycle management. Central to its design is the use of XML pipelines capable of generating and manipulating XML documents.
The Cocoon project, which is currently chaired by Stefano Mazzocchi [4], is backed by a thriving development community with a diverse group of about 30 active committers, and many outspoken (and demanding) users, and has recently been promoted into a top-level Apache project, putting it on par with other well-known Apache projects like the HTTP Server, Ant, Jakarta (home of Tomcat amongst many others) and PHP.
As stated previously, Cocoon binds URIs to a set of content-generating pipelines, using a centralized configuration mechanism called the sitemap. The Cocoon sitemap is a declarative XML document describing a set of pipelines, which will be invoked upon a URI pattern match. A pipeline always consists of three main components, which are chained together by passing SAX events across the pipeline:
A Generator, which produces SAX events, often by reading an XML document from disk, but many other Generators are also available, extracting XML from various data sources.
One or more Transformers, which operate on the SAX event stream, transforming it into some other grammar. The XSLT Transformer is of course the most popular Transformer inside Cocoon, but plenty of others exist, e.g. for doing database lookups, XInclude operations, and many other transformation tasks.
A Serializer, which sinks the SAX event stream into an output stream suitable for sending back to the client browser (in a webserving context) or storing the pipeline result on disk.
These components are glued together using SAX ContentHandlers, providing a comprehensible and standardized inter-component interface. Many other components exist inside the Cocoon sitemap, but let's focus on the basic pattern matching stuff first.
A sitemap has a striking resemblance to an XSLT stylesheet, which basically consists of a set of Templates waiting to be invoked by a stylesheet engine. Templates are then composed of output elements and XSLT instructions. Similarly, a Cocoon sitemap consists of a set of Matchers, which refer to the aforementioned components. You can also compare the Cocoon sitemap with a telephone switchboard operator, forwarding requests to the correct correspondent (i.e. pipeline):
Let's look at a simple example:
<map:match pattern="*.html">
  <map:generate src="docs/{1}.xml" />
  <map:transform src="style/xml2html.xsl" />
  <map:serialize />
</map:match>
Basically, when a request hits Cocoon and matches the pattern defined in the pattern attribute of the match element, the pipeline contained inside that match element is triggered. Let's see what happens if a resource is requested with the URI string "introduction.html":
The URI matches successfully against our (only) pipeline, since it follows the pattern of "*.html". The * serves as a wildcard character and matches any run of characters in our URI string up to the next slash. The introduction part of the URI string is stored for later re-use. Now, the components inside our matcher will prepare an answer for this URI request.
A Generator (let's assume it's a file generator) will read an XML document named introduction.xml, stored in the docs directory, and insert the SAX events representing this document into the pipeline. See how the {1} parameter is substituted with the part of the URI string which has been matched by the wildcard in our matcher pattern?
These SAX events are now fed into the next component, an XSLT transformer, which will apply an XSLT transformation upon the document being passed across the pipeline. The name of the stylesheet which will be used is xml2html.xsl, stored in the style directory.
The result of this XSLT transformation process, which presumably will be an (X)HTML document, is now passed on to the final component in our pipeline, the serializer. The serializer will translate the sequence of SAX events into an OutputStream, which is then passed back to the browser.
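The stylesheet itself is not shown in this paper. As a rough sketch only, and assuming that the documents in the docs directory use a simple doc/newstitle/para grammar (an assumption made purely for illustration), style/xml2html.xsl might look something like this:

```xml
<?xml version="1.0"?>
<!-- Hypothetical style/xml2html.xsl. The doc, newstitle and para element
     names are assumptions about the source documents, not a fixed grammar. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Wrap the whole document in an HTML page skeleton -->
  <xsl:template match="/doc">
    <html>
      <head>
        <title><xsl:value-of select="newstitle"/></title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- The document title becomes the main heading -->
  <xsl:template match="newstitle">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <!-- Paragraphs map one-to-one onto HTML paragraphs -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Nothing in this stylesheet knows about URIs or about Cocoon: the binding between a request and the stylesheet lives entirely in the sitemap.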
Of course, you can have many matchers in one sitemap, each of them binding a specific URI pattern to its corresponding response-generating pipeline. Let's see how easy it is to set up a pipeline producing PDF documents instead:
<map:match pattern="*.pdf">
  <map:generate src="docs/{1}.xml" />
  <map:transform src="style/xml2fo.xsl" />
  <map:serialize type="fo2pdf" />
</map:match>
This matcher will match URIs following the *.pdf pattern, using the same XML documents stored as docs/*.xml, but now will pipe these through an XSLT transformation creating an XSL-FO representation of that document, which is then fed into an FO2PDF serializer, based on another Apache project named FOP [5]. FOP is capable of generating PDF documents from XSL-FO input. With these two simple pipelines and two XSLT stylesheets, you can already set up a website producing both HTML and PDF renditions of your XML document collection: that's all there is to it!
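To give an idea of what the XSL-FO side involves, here is a minimal sketch of a style/xml2fo.xsl stylesheet. It again assumes a hypothetical doc/newstitle/para source grammar, and the page dimensions and font sizes are arbitrary illustrative choices:

```xml
<?xml version="1.0"?>
<!-- Hypothetical style/xml2fo.xsl. Source element names (doc, newstitle,
     para) and all layout values are assumptions for illustration. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="/doc">
    <fo:root>
      <!-- One A4 page layout with uniform margins -->
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
            page-height="29.7cm" page-width="21cm"
            margin-top="2cm" margin-bottom="2cm"
            margin-left="2cm" margin-right="2cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <!-- Select elements explicitly so that no stray whitespace
               text ends up directly inside fo:flow -->
          <xsl:apply-templates select="newstitle | para"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- Title rendered as a bold block -->
  <xsl:template match="newstitle">
    <fo:block font-size="18pt" font-weight="bold" space-after="12pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- Each paragraph becomes its own block -->
  <xsl:template match="para">
    <fo:block space-after="6pt"><xsl:apply-templates/></fo:block>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet only produces XSL-FO; turning that into PDF bytes is left to the fo2pdf serializer at the end of the pipeline.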
To offer even more flexibility in defining your URI namespace, matchers can also be nested:
<map:match pattern="news/**">
  <map:match pattern="news/1999/**">
    <map:generate src="oldcontent/news/{1}.html" type="html" />
    <map:transform src="styles/old2new.xsl" />
  </map:match>
  <map:match pattern="news/20*/**">
    <map:generate src="docs/news/20{1}/{2}.xml" />
  </map:match>
  <map:transform src="news2html.xsl" />
  <map:serialize />
</map:match>
This example introduces a new generator, the HTML generator, which makes use of JTidy [6] to read a possibly malformed HTML document and convert it to XHTML, so that further XSLT transformations can be applied upon it. Also, we are making use of another wildcard symbol in the matching pattern: **. A double asterisk matches everything, path separators (slashes) included, while the single asterisk matches up until the next /.
Depending on the incoming URI request, this sitemap fragment goes off and fetches old HTML-stored content which is cleaned up and XHTML-ified by JTidy to be translated to the new news format using the old2new stylesheet, or directly reads XML documents according to the new news format. In either case, the final XSLT transformer (news2html) is confronted with that new news format which it happily transforms into HTML for delivery. The exercise to add some more nested matching to differentiate between HTML and PDF rendition is left to the attentive reader.
As we have seen above, the use of XSLT and XML pipelining is often central to a Cocoon-based application. The Cocoon internals are highly optimized to be very efficient in this regard. Since the communication between components is based on SAX rather than DOM, and components can be designed to stream SAX events to the next component before the entire document has been processed, Cocoon performs and scales well, even with larger documents. The major caveat, however, is the XSLT transformer component itself, which is Xalan [7] by default, and which will construct a DTM [8] tree of the full document before it starts the XSLT transformation. So the possible gain of using event-based pipelines can be considerably reduced because of the XSLT engine buffering these events into a tree-like table structure before starting the transformation. Also, some XSLT engines perform better for specific transformation cases than others: Xalan and Saxon are competing head-to-head on this. Luckily, Cocoon makes use of the Transformation API for XML (TRAX) [9] for invoking an XSLT stylesheet engine, which means XSLT engines are pluggable and one can make use of Saxon or any other (Java) XSLT engine which implements the TRAX API.
It is also possible to make use of XSLTC inside Cocoon by setting the right configuration switches in the sitemap. XSLTC precompiles XSLT stylesheets into Translets, which are specialized transformation classes that can very efficiently do one type of XSLT transformation. Translets can be regarded as XSLT executables: precompiled binaries which offer greater speed at the cost of the initial byte-compilation time. In comparison to general-purpose XSLT engines like Xalan and Saxon, you can experience considerable performance gains, yet the XSLTC implementation is known for not being fully standards-compliant.
Even more interestingly, since Joost [10], a Java implementation of Streaming Transformations for XML (STX) [11], also provides the TRAX API, one can easily make use of STX when willing to give up portability and the use of standards in exchange for the large performance benefits promised by these streaming XML transformations (since they don't need to build an entire document tree before transformation).
As a historical note, it is nice to know that the previous version of Cocoon, which has not been actively developed for quite some time, depended on the W3C xml-stylesheet Recommendation to bind XSLT stylesheets to XML documents. One of the big frustrations, and a major motivation for the sitemap design, was the problem this caused when people wanted to change this association for an entire document collection: using the <?xml-stylesheet?> processing instruction, they were forced to edit each file individually. With the sitemap, changing a stylesheet for an entire collection of URIs is done in one central location. Of course, when people want to off-load processing cycles to the web client, it is still possible to deliver XML instead of HTML, and inject the appropriate xml-stylesheet PI.
Related to XSLT, Cocoon also offers some interesting alternatives for a common XSLT usage pattern in web application development: aggregating content from different sources. While the XSLT document() function allows for this, its use inside Cocoon can possibly be confusing to stylesheet editors, as it pulls content inside the pipeline which has neither been generated nor inserted using a transformation: finding out where the injected content comes from can be tedious within a sequence of complicated stylesheets. Also, the document() function doesn't make use of the sophisticated caching available in Cocoon (yes, Cocoon is able to cache pipeline results, even partial or intermediate steps, depending on the expiry time or cacheability of (sequences of) individual components). Let's look at some possible ways of doing Cocoon-based XML aggregation.
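As a hedged illustration of what such a configuration switch can look like, the sitemap fragment below declares a second XSLT transformer bound to the XSLTC TRAX factory and selects it by name in a pipeline. The transformer-factory element and the transformer name xsltc should be treated as assumptions: the exact configuration mechanism differs between Cocoon versions (some configure the XSLT processor in cocoon.xconf instead), so check the documentation of the version you run.

```xml
<!-- Sketch only: element names under map:transformer are assumptions
     to verify against your Cocoon version. -->
<map:components>
  <map:transformers default="xslt">
    <!-- The default XSLT transformer, using whatever TRAX engine
         Cocoon is configured with (Xalan by default) -->
    <map:transformer name="xslt"
        src="org.apache.cocoon.transformation.TraxTransformer" />

    <!-- The same Cocoon component, bound to the XSLTC TRAX factory.
         For Saxon 6.x the factory class would be
         com.icl.saxon.TransformerFactoryImpl instead. -->
    <map:transformer name="xsltc"
        src="org.apache.cocoon.transformation.TraxTransformer">
      <transformer-factory>org.apache.xalan.xsltc.trax.TransformerFactoryImpl</transformer-factory>
    </map:transformer>
  </map:transformers>
</map:components>

<!-- Elsewhere in the sitemap, inside a map:pipeline element:
     pick the alternative engine per pipeline step by name -->
<map:match pattern="*.html">
  <map:generate src="docs/{1}.xml" />
  <map:transform type="xsltc" src="style/xml2html.xsl" />
  <map:serialize />
</map:match>
```

Because the engine is chosen per transform step, a site can keep the default engine for most stylesheets and switch only the performance-critical ones.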
A Cocoon Aggregator can be compared with a Generator, merging multiple sources or pipelines into one. Its use is very simple:
<map:match pattern="newsline/**">
  <map:aggregate element="page">
    <map:part src="cocoon:/header" element="head" />
    <map:part src="cocoon:/news/{1}" element="body" />
    <map:part src="cocoon:/footer" element="foot" />
  </map:aggregate>
  <map:transform src="style/newsline.xsl" />
  <map:serialize />
</map:match>
The cocoon:/ pseudo URI protocol is used to call other pipelines, resembling the concept of functions in other programming languages. If the respective header and footer pipelines generate something like:
header: <title>The Foobar website</title>
footer: <para>(c) 2003 Foobar Inc.</para>
and the news pipeline produces the following XML snippet:
<doc>
  <newstitle>The latest news</newstitle>
  <para>No news today.</para>
</doc>
the aggregator combines and wraps the individual fragments as follows:
<page>
  <head><title>The Foobar website</title></head>
  <body>
    <doc>
      <newstitle>The latest news</newstitle>
      <para>No news today.</para>
    </doc>
  </body>
  <foot><para>(c) 2003 Foobar Inc.</para></foot>
</page>
The newsline.xsl stylesheet is then applied upon this resulting, aggregated XML document, creating any HTML markup you want.
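A minimal sketch of such a newsline.xsl is given below. The element names it matches are those of the aggregated document shown above; the HTML it produces (a heading, paragraphs and a rule above the footer) is an arbitrary layout chosen for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical style/newsline.xsl operating on the aggregated
     page/head/body/foot document; the output layout is illustrative. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/page">
    <html>
      <head>
        <!-- Site title delivered by the header pipeline -->
        <title><xsl:value-of select="head/title"/></title>
      </head>
      <body>
        <!-- News content delivered by the news pipeline -->
        <h1><xsl:value-of select="body/doc/newstitle"/></h1>
        <xsl:apply-templates select="body/doc/para"/>
        <hr/>
        <!-- Copyright line delivered by the footer pipeline -->
        <xsl:apply-templates select="foot/para"/>
      </body>
    </html>
  </xsl:template>

  <!-- Paragraphs from any part map onto HTML paragraphs -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Note that the stylesheet addresses each part through the wrapper element named in the corresponding map:part, which is what makes the three sources distinguishable after aggregation.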
Of course, this approach enables only sequential concatenation of XML fragments, whereas the document() function allows you to inject content anywhere in the document tree. Consequently, when working on the typical HTML table-based layout, you'll need some fairly substantial reshuffling afterwards. Since over-use of XSLT for this kind of operation is a typical abuse of Cocoon, there exists another, smarter approach involving one of the oldest XInclude(-like) implementations to date.
While Cocoon doesn't yet fully implement the XInclude Recommendation, it offers a choice of very similar implementations which come in handy if you want to reduce the complexity of your XSLT transformations. The basic idea is to have several pipelines generating XHTML fragments, which are then injected in precise tree locations of the resulting HTML page. This keeps the individual XSLT transformations simple and caters for modular stylesheets.
The Cocoon Include transformers act upon special placeholder elements in the SAX stream passed through them. These elements point to other pipelines or static documents, which are then used to replace the placeholder elements.
<map:match pattern="newsline/**.html">
  <map:generate src="pagegrid.xhtml" />
  <map:transform src="stylesheets/injectdocumenturi.xsl">
    <map:parameter name="document-uri" value="{1}" />
  </map:transform>
  <map:transform type="cinclude"/>
  <map:serialize />
</map:match>
<map:match pattern="news/**">
  <map:generate src="newsfiles/{1}.xml" />
  <map:transform src="stylesheets/news2xhtmlfragment.xsl" />
  <map:serialize />
</map:match>
The newsline/**.html matcher reads an XHTML document from disk (XHTML being an XML grammar, it is able to do so without any JTidy-cleaning). pagegrid.xhtml could look something like this:
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:cinclude="http://apache.org/cocoon/include/1.0">
  <head>
    <title>The Foobar website</title>
    <link rel="stylesheet" href="some_stylesheet.css" />
  </head>
  <body>
    <cinclude:include />
  </body>
</html>
The injectdocumenturi.xsl stylesheet simply adds an attribute to the cinclude:include element, and identity-transforms the other nodes. The document-uri XSLT parameter is initialized from within the sitemap with the news document URI to be injected:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:cinclude="http://apache.org/cocoon/include/1.0">
  <xsl:param name="document-uri" />
  <xsl:template match="cinclude:include">
    <xsl:copy>
      <xsl:attribute name="src">
        <xsl:value-of select="concat('cocoon:/news/', $document-uri)" />
      </xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="node() | @*">
    <!-- identity transformation template -->
    <xsl:copy>
      <xsl:apply-templates select="@*" />
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
After this transformation, the document looks like this (document-uri stands for the actual value passed in from the sitemap):
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:cinclude="http://apache.org/cocoon/include/1.0">
  <head>
    <title>The Foobar website</title>
    <link rel="stylesheet" href="some_stylesheet.css" />
  </head>
  <body>
    <cinclude:include src="cocoon:/news/document-uri" />
  </body>
</html>
When the Include Transformer encounters the special cinclude:include element, it will replace this element with the result of the news/** pipeline, which is a simple XHTML fragment created using the news2xhtmlfragment.xsl stylesheet. The result will look something like this:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>The Foobar website</title>
    <link rel="stylesheet" href="some_stylesheet.css" />
  </head>
  <body>
    <h1>Something happened</h1>
    <p>We don't know what!</p>
  </body>
</html>
This way, the content and layout of that page can be neatly separated and the content can be injected into any place in the document tree. The problem with this approach, however, is the extra XSLT transformation used to inject the document URI into the SAX stream. The less XSLT processing happens in the pipeline, the better the overall performance will be. Below, we will see how to fix that while introducing some other Cocoon components.
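The news2xhtmlfragment.xsl stylesheet used by the news/** pipeline is not listed in this paper. A minimal sketch, assuming the news documents use the doc/newstitle/para grammar shown earlier, could be the following. The wrapping div is an addition of this sketch: a pipeline result needs a single root element, whereas the simplified result listing shows the h1 and p elements directly inside body.

```xml
<?xml version="1.0"?>
<!-- Hypothetical stylesheets/news2xhtmlfragment.xsl. The default
     namespace should match the one used by the page grid document. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- Produce a fragment, not a full page: no html, head or body -->
  <xsl:template match="/doc">
    <div class="news">
      <xsl:apply-templates select="newstitle | para"/>
    </div>
  </xsl:template>

  <xsl:template match="newstitle">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Since the page skeleton lives in pagegrid.xhtml, this stylesheet stays small and can be reused for any page that includes news content.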
Another application development method offered by Cocoon is eXtensible Server Pages (XSPs). Explaining all about XSPs would go beyond the scope of this paper; suffice it to say that XSPs can be considered more or less an XML reformulation of JSPs with taglibs, which are called logicsheets in Cocoon-talk. The aim of XSPs is to separate code (stored in logicsheets) from content (stored in XML documents), while offering a number of simple XML elements that can be used in pages and which are interpreted and executed by the XSP engine. An XSP translates to a Generator, just like JSPs are translated to Servlets.
The main idea behind our injectdocumenturi stylesheet was to add a src attribute to the cinclude:include element, with the value of the news document-uri captured by the sitemap (the trailing part of the URI). Let's look at our sitemap when using XSPs:
<map:match pattern="newsline/**.html">
  <map:generate src="news.xsp" type="serverpages">
    <map:parameter name="document-uri" value="{1}" />
  </map:generate>
  <map:transform type="cinclude"/>
  <map:serialize />
</map:match>
<map:match pattern="news/**">
  <map:generate src="newsfiles/{1}.xml" />
  <map:transform src="stylesheets/news2xhtmlfragment.xsl" />
  <map:serialize />
</map:match>
We replaced our pagegrid.xhtml document by news.xsp, which is very similar except for the use of some XSP elements:
<xsp:page xmlns:xsp="http://apache.org/xsp"
    xmlns:cinclude="http://apache.org/cocoon/include/1.0"
    xmlns:xsp-util="http://apache.org/xsp/util/2.0">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <title>The Foobar website</title>
      <link rel="stylesheet" href="some_stylesheet.css" />
    </head>
    <body>
      <cinclude:include>
        <xsp:attribute name="src">cocoon:/news/<xsp-util:get-sitemap-parameter name="document-uri" /></xsp:attribute>
      </cinclude:include>
    </body>
  </html>
</xsp:page>
This XSP produces exactly the same document as the injectdocumenturi stylesheet does, with the benefit of getting rid of one XSLT transformation. Furthermore, this effectively caters for generating a collection of page snippets which are then collated or aggregated into one delivered page. The other steps are identical to the previous setup.
We've barely scratched the surface of Cocoon, and many other components still exist to be discovered. Until now, we mainly focused on active Cocoon components, which directly act upon content being passed across the pipeline. To configure the pipeline, one can use so-called Actions, which are basically Java components configuring the sitemap behaviour or working on the request context. These Actions can configure what components get called, and are often used to build web applications using Cocoon.
On a more general level, some broad categories of technologies supported by Cocoon are:
Database Access, through Transformers, Actions and XSP logicsheets, which can be used to easily create a web front-end on top of relational databases. Cocoon also offers access to XML native databases such as Xindice.
Authentication: an authentication system which integrates well with the sitemap semantics is available
Portals: there exists an entire portal framework inside Cocoon, which can be used to easily create a user-configurable portal, with various datasources pushing content into one browser window
Web Services integration component, configuring Cocoon as a SOAP client or server
Integration with the Apache Lucene full-text search engine
There even exist a number of full-featured applications built on top of Cocoon, most notably:
database reporting using xReporter[12]
an IMAP webmail client (available from Cocoon scratchpad)
blogging software (CocoBlog)
a content management and publication framework called Wyona
... and many more
Furthermore, there is an XForms-like form-based application development framework in the upcoming 2.1 release, along with a truly innovative flow control layer.
Apart from the official project documentation available from http://cocoon.apache.org/, a thriving Wiki exists at http://wiki.cocoondev.org/.
Forrest is a set of tools for project-related documentation, making use of Ant and Cocoon. More specifically, Forrest consists of:
a set of XML grammars (DTDs) to create generic documents, but also how-to's and FAQs - commonly referred to as the document-v11 DTDs.
conversion stylesheets and tools to generate document-v11 compliant markup from some other source formats. Currently, there's support for DocBook XML documents, but also for Wiki markup which is up-translated to XML through the use of Chaperon, a grammar parser generator available as a Cocoon component.
several ways to describe the hierarchy of a project website and the navigation structure
a set of stylesheets used to generate a project website out of these documents, along with printable PDF documents
a specific Cocoon configuration (basically a sitemap) glueing all of this together
Thanks to Cocoon and Ant, it is possible to generate a deployable web application archive for installation inside a Servlet container, but also a filesystem rendition of the entire website which can be uploaded to a static webserver.
After installation, invoking Forrest is very easy. When starting a project and its documentation, you just use the forrest seed command to auto-generate a number of placeholder and sample files, which you can modify to your liking:
C:\my_project
|   forrest-targets.ent
|   forrest.properties
|   status.xml
|
\---src
    \---documentation
        |   README.txt
        |   skinconf.xml
        |
        +---content
        |   |   hello.pdf
        |   |   test1.html
        |   |   test2.html
        |   |
        |   \---xdocs
        |       |   book-sample.xml
        |       |   ehtml-sample.ehtml
        |       |   faq.xml
        |       |   ihtml-sample.ihtml
        |       |   index.xml
        |       |   sample.xml
        |       |   sample2.xml
        |       |   site.xml
        |       |   tabs.xml
        |       |
        |       \---subdir
        |               book-sample.xml
        |               index.xml
        |
        \---resources
            \---images
                    group-logo.gif
                    icon.png
                    project-logo.gif
A simple configuration file makes it possible to parametrize the default skin, and upon invoking forrest site, you are provided with a default rendition of the project site:
Better yet, one can run the forrest run command, which starts a dynamic instance of your project site running inside Jetty, a small-footprint Servlet container, making it possible to roundtrip between document editing and previewing.
In this short paper, I tried to outline how easy it is to start using Cocoon and Forrest. Since the new version of Cocoon is now 2 years old, with active development continuing, the project has reached a certain level of maturity and commercial interest is growing. Also, three books on Cocoon have been published lately, which is an important sign of project maturity. Being part of the community which drives this project, I can only say: "Nice job, guys - now let's carry on with the good work!".
Thanks to Bruno Dumon and Marc Portier, my Outerthought colleagues, for reviewing this paper.
[Cocoon: Building XML Applications] Langham and Ziegeler, New Riders, ISBN: 0735712352 - 480 pages (July 2002)