Building a document delivery system from off-the-shelf standards-conformant parts
Track: Publishing, Core Technologies, Integration
Audience Level: High Level/Technical View
Time: Tuesday, November 16 at 14:00
Keywords: Electronic Publishing, Full-Text, Markup, Query Language, CSS, XSLT, XML, Cocoon
Abstract:
OK. So you have your documents in XML. How do you deliver them to readers? You've heard great things about separation of form and content, and would like different kinds of readers to see the documents styled in different ways. And in order to make the collection of documents more useful, you would like to have full-text search. The quality assurance people would like some help with tools for checking documents and finding errors and inconsistencies in existing ones. Oh, and by the way, we just took a budget cut, so can you do it without breaking the bank?
Yes, you can. This paper shows one way to do it, using off-the-shelf tools that conform to open standards, so that they have well-defined interfaces and you can swap components into and out of the system. The focus here is on open source and low-cost tools, but many of the basic ideas also apply for those with access to commercial tools.
The system demonstrated here uses Cocoon, the Apache Web publishing tool, to manage delivery of documents. Documents are stored and edited as XML and delivered in XML or in HTML, depending on the user's preference. Document processing takes the form of XML-to-XML transformations, either in batches or at retrieval time. For the most part, XSLT stylesheets are used for structural transformations and for translation into XHTML. CSS stylesheets are used to control the display of the HTML in standard browsers; they are also used to display XML in browsers that don't support XSLT. Existing tools are used to provide full-text search and XPath-based search for elements matching a particular description; at retrieval time, matching elements can be styled appropriately with links to their context. For quality assurance purposes, the results of a search can be shown in XML form; together with the XPath-based search language, this makes it easy to find all the places in the collection where a particular combination of attribute values was used, or where the data entry process failed to provide specific required information.
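The quality-assurance searches described above can be sketched in a few lines. This is only an illustration, not the search tool used in the system: it uses the limited XPath subset in Python's standard `xml.etree.ElementTree` module as a stand-in, and the collection, the file names, the `name` element, and the `type` and `cert` attributes are all invented for the example.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical two-document collection; all element and attribute
# names here are assumptions made for the sake of the example.
COLLECTION = {
    "letter1.xml": """<doc>
  <title>First letter</title>
  <p><name type="person" cert="low">J. Smith</name> wrote to
     <name type="place">Boston</name>.</p>
</doc>""",
    "letter2.xml": """<doc>
  <p><name type="person" cert="low">A. Jones</name> replied;
     <name>unknown</name> was copied.</p>
</doc>""",
}


def find_matches(collection, path):
    """Return (document name, element) pairs for elements matching a path."""
    hits = []
    for doc_name, text in collection.items():
        root = ET.fromstring(text)
        for elem in root.findall(path):
            hits.append((doc_name, elem))
    return hits


def find_missing_attribute(collection, tag, attribute):
    """Return (document name, element) pairs where a required attribute is absent.

    ElementTree's XPath subset has no not(), so the test is done in Python.
    """
    hits = []
    for doc_name, text in collection.items():
        root = ET.fromstring(text)
        for elem in root.iter(tag):
            if attribute not in elem.attrib:
                hits.append((doc_name, elem))
    return hits


def to_xml(elem):
    """Serialize one element as XML, without the text that follows it."""
    clone = copy.copy(elem)
    clone.tail = None
    return ET.tostring(clone, encoding="unicode")


# A particular combination of attribute values
low_cert_people = find_matches(COLLECTION, ".//name[@type='person'][@cert='low']")
# Places where data entry failed to supply required information
untyped_names = find_missing_attribute(COLLECTION, "name", "type")

for doc_name, elem in low_cert_people + untyped_names:
    # Show each hit in XML form, labelled with its source document
    print(doc_name, to_xml(elem))
```

Returning the matching elements themselves, rather than only a count or a document list, is what lets the results be shown in XML form or passed on to a stylesheet for display with links back to their context.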
Delivery of XML doesn't need to be as boring as static HTML pages, nor as fraught with difficulties as ad hoc coding of dynamic HTML. It can be simple, straightforward, and powerful, thanks to standards-compliant tools.