Abstract
You've made the corporate decision to move your valuable technical information into XML format, for any number of good reasons. Now it's time to figure out how to get that information into all of the forms in which your business needs it delivered.
These forms might include:
HTML
Published paper
Other forms of XML
It seems that this would be the easy part, right? After all, you've gone to all the trouble of defining your data model and tagging your data; now you just want to get it back out. Yet there are still many avenues to explore, and many options that must be weighed.
This presentation will address many of the different ways to publish XML data, along with some of the pros and cons of each method:
Proprietary software/stylesheets
XSL-FO
XSLT
FOSI
We will discuss the various standards and their development methodology, along with some tips and real-life examples of using each method.
The need for a standard formatting language was recognized in the early days of SGML. When SGML was first created, people began using it but relied on proprietary formatting engines to produce final printed output; this was the only option. They had achieved vendor independence as far as their data was concerned (they could switch SGML authoring software at will), but whenever they changed formatting engines, new style sheets had to be written, which could be quite a lengthy process.
The first attempt at vendor-neutral formatting was developed as part of the suite of Continuous Acquisition and Lifecycle Support (CALS) standards, a set of standards intended for use within the US DoD. The CALS standards were based on International Organization for Standardization (ISO) standards but were developed specifically for the military. They were put together by sub-committees of the DoD Electronic Publishing Committee, each focused on a particular requirement (there were sub-committees for each of the graphic standards used in DoD technical manuals, a DTD sub-committee, and so on). The suite included standards for graphic files, a set of approved SGML DTDs, and the Output Specification Standard, which defined the grammar for FOSIs. FOSI stands for "Formatting Output Specification Instance": a FOSI is an SGML-tagged document that conforms to the Output Specification DTD.
This approach to formatting was loosely based on TeX (pronounced "tech"), an early formatting language that required text objects to be prefixed with formatting instructions. TeX was (and still is) used in the academic and scientific communities, and many scientific journal publishers currently use TeX or LaTeX to produce printed journals.
The US DoD community developed several standard FOSIs for the set of approved DTDs under the CALS umbrella. The Army, Navy and Air Force each developed their own DTDs and FOSIs to support their technical manual paper publishing processes. Other large industries, such as the airline industry through the Air Transport Association (ATA), also developed FOSIs for their publishing requirements. However, FOSIs did not gain wide acceptance outside these communities, mainly because they were complicated, difficult to create, and offered no capability for modularity.
As a result, vendor implementations of the standard were not widespread and were limited to two vendors, Arbortext and Datalogics. Today, these two vendors continue to support FOSIs, because they have an established client base that still relies on FOSIs for its publishing products. Arbortext has gone a step further by including FOSI-to-XSL-FO conversion software in its latest release, which lets the user choose between publishing with FOSI or XSL-FO (or even both!).
XSL (Extensible Stylesheet Language) has been developed as part of the W3C Style Sheets Activity: "W3C continues to work with its Members, evolving the Cascading Style Sheets (CSS) language to provide even richer stylistic control, and to ensure consistency of implementations. W3C is also developing the Extensible Stylesheet Language (XSL), which has document manipulation capabilities beyond styling." The W3C Style Sheets Activity is itself part of the W3C User Interface Domain.
The W3C XSL specification was split into two separate documents. The first, XSL Transformations (XSLT), deals with the syntax and semantics for applying 'style sheets' to transform one document into another. XSLT is designed for use as part of XSL, which is a stylesheet language for XML. In addition to XSLT, XSL includes an XML vocabulary for specifying formatting: XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary. The second part is concerned with the XSL formatting objects, their attributes, and how they can be combined. The formatting objects used in XSL are based on prior work on CSS and the Document Style Semantics and Specification Language (DSSSL). XSL is designed to be easier to use than DSSSL, which was really usable only by expert programmers. Nonetheless, in practice it is expected that people will use tools to simplify the task of creating XSL style sheets.
A separate, related specification is published as the XML Path Language (XPath) Version 1.0. XPath is a language for addressing parts of an XML document, essential whenever you need to say exactly which parts of a document are to be transformed by XSL. XPath allows you to say, for example, 'select all paragraphs belonging to the chapter element,' or 'select the elements called special notes.' XPath is the result of an effort to provide a common syntax and semantics for functionality shared between XSL Transformations and XPointer; it is used by both, and has now been adopted by XQuery as well.
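To make those two plain-English selections concrete, here is a minimal XSLT 1.0 sketch of how they might be written as XPath expressions. The element names chapter, para and specialnote are hypothetical, chosen only for illustration; the stylesheet writes a plain-text report.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- 'select all paragraphs belonging to the chapter element':
         every (hypothetical) para that is a direct child of a chapter -->
    <xsl:for-each select="//chapter/para">
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <!-- 'select the elements called special notes': every (hypothetical)
         specialnote element, wherever it occurs in the document -->
    <xsl:text>Special notes found: </xsl:text>
    <xsl:value-of select="count(//specialnote)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Changing //chapter/para to //chapter//para would also pick up paragraphs nested more deeply inside a chapter, for example within its sections.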
According to the Activity description, XSL is a language quite different from CSS, and caters for different needs. The model used by XSL for rendering documents on the screen builds upon many years of work on a complex ISO-standard style language called DSSSL. Aimed, by and large, at complex documentation projects, XSL has many uses associated with the automatic generation of tables of contents, indexes, reports and other more complex publishing tasks.
Why two Style Sheet languages? The fact that W3C has started developing XSL in addition to CSS has caused some confusion. Why develop a second style sheet language when implementors haven't even finished the first one? . . . The unique features are that CSS can be used to style HTML documents. XSL, on the other hand, is able to transform documents. For example, XSL can be used to transform XML data into HTML/CSS documents on the Web server. This way, the two languages complement each other and can be used together. Both languages can be used to style XML documents. CSS and XSL will use the same underlying formatting model and designers will therefore have access to the same formatting features in both languages. W3C will work hard to ensure that interoperable implementations of the formatting model is available.
At about the same time that MIL-M-28001, containing the FOSI specification, was published in 1988, ISO formed a committee to develop a style specification for SGML: the DSSSL committee. It was not until April 1996 that DSSSL became an ISO standard, ISO/IEC 10179. By 1996, organizations that had adopted SGML were already using either FOSIs or proprietary composition engines to publish their SGML information.
Many of the concepts used in XSL and XSLT were derived from the DSSSL standard. DSSSL introduced the concepts of a style language, flow objects and a transformation language, all of which were carried forward into XSL and XSLT. Developing DSSSL stylesheets required a very knowledgeable person, usually an SGML/XML consultant or engineer highly versed in SGML/XML and DSSSL, as well as in other programming languages.
Commercial software that supports DSSSL is practically nonexistent. Several open source DSSSL engines are available today, however, and some organizations use them with SGML and XML as an alternative to XSL.
Besides standards-based publishing, software vendors offer systems for complex page composition. These are usually high-end systems, used by publishing companies to support complex publishing and pagination requirements that XSL-FO cannot support. Examples include:
3B2
Miles33
Xyvision
XSL-FO is the "formatting object" vocabulary (the FO part) that expresses the semantics of print layout as an XML vocabulary. The naming is a bit confusing, because it is XSLT that performs the transformation: XSLT is used to change a source XML document into a document tagged as blocks of formatting objects, containing printing instructions.
In this section we will look at how the different standards handle the actual styling of XML documents, along with some real-life examples.
FOSIs provide a rich, robust formatting methodology. They are very useful in complex situations where multi-pass formatting is a requirement. These situations might include:
Footnotes where the reference and footnote must appear on the same page (versus footnotes appearing at the end of chapters, for example)
Automatic generation of multi-level indexes
Vendor support for FOSIs is not strong, however. If you decide to use FOSIs for your formatting requirements, you will be limited to either Arbortext's EPIC Publisher or Datalogics' Pager.
XSLT is primarily a "transformation" language, used for converting an XML document tagged to one DTD/Schema into a document tagged to another. It is used most frequently for converting XML to HTML for web display, but it can produce any kind of tagged output desired. It can also be used to "strip" the tags from a tagged file, producing a simple text stream.
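As a minimal sketch of the tag-stripping case: XSLT's built-in template rules already copy text and recurse through elements, so a stylesheet that only switches the output method to text produces the bare text stream. The single template, which adds a line break after each paragraph, assumes a hypothetical para element.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- method="text": emit character data only, no tags -->
  <xsl:output method="text" encoding="utf-8"/>
  <!-- Optional: end each (hypothetical) para with a newline so the
       text stream keeps paragraph boundaries -->
  <xsl:template match="para">
    <xsl:apply-templates/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
  <!-- Everything else falls through to the built-in rules, which recurse
       into elements and copy text nodes; attribute values are not output
       because the built-in element rule never selects attributes -->
</xsl:stylesheet>
```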
A good understanding of XPath is essential when undertaking either XSLT or XSL-FO coding, as it is at the heart of both. Once you understand XPath expressions, you will have a good basis for forming not only element matches but also string expressions, which are useful for transforming the text along with the tagging. For instance, you might wish to change the text content of all titles to uppercase in your output; this can be done with an XPath expression.
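XPath 1.0 has no upper-case() function (that arrived with XPath 2.0), so the usual XSLT 1.0 approach is translate(), which maps each lowercase letter to its uppercase counterpart. A minimal sketch, assuming a hypothetical title element and handling only the unaccented letters a to z:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <!-- Identity transform: copy every node and attribute unchanged... -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- ...except text inside (hypothetical) title elements, which is
       upper-cased character by character with translate().
       Default priority 0.5 beats the identity rule's -0.5. -->
  <xsl:template match="title//text()">
    <xsl:value-of select="translate(., $lower, $upper)"/>
  </xsl:template>
</xsl:stylesheet>
```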
XSL as a language is still being enhanced, and there are some things that can't be done entirely with standard XSLT or XSL-FO. Different vendors have enhanced their toolsets with "extensions" to the language to accomplish things like placing footnotes on the page where they are referenced and breaking output into multiple files. These extensions can be very useful in putting together a robust application, but remember that using them makes your stylesheet dependent on the particular vendor's extension. Nor will another vendor's processor necessarily just ignore an extension it doesn't support: under XSLT 1.0, instantiating an unsupported extension element is an error unless the element contains an xsl:fallback child, so portable stylesheets should provide a fallback or test for the extension with element-available().
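A minimal sketch of guarding a vendor extension, using the XT xt:document element that also appears in the XSLT example later in this section: a processor that implements xt:document writes a separate file, while any other processor instantiates the xsl:fallback content instead of signaling an error. The section element, its id attribute and the file naming are hypothetical, and the sketch assumes XT evaluates href as an attribute value template.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xt="http://www.jclark.com/xt"
    extension-element-prefixes="xt">
  <xsl:template match="section">
    <!-- XT writes this content to its own file, named from the
         (hypothetical) id attribute of the section -->
    <xt:document href="{@id}.htm">
      <html>
        <body><xsl:apply-templates/></body>
      </html>
      <!-- Processors without xt:document run this instead; a processor
           that implements the element instantiates xsl:fallback as a no-op -->
      <xsl:fallback>
        <div class="section"><xsl:apply-templates/></div>
      </xsl:fallback>
    </xt:document>
  </xsl:template>
</xsl:stylesheet>
```

An alternative is to branch explicitly with xsl:choose and test="element-available('xt:document')", which keeps both code paths visible side by side.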
Some of the most popular XSLT processors are:
Instant SAXON: This is the SAXON product for the Windows platform. It includes a scaled-down version of the full SAXON package, and the XSLT processor can be executed directly on Windows 9x and 2000 platforms. http://users.breathe.com/mhkay/saxon/saxon5-5-1/instant.html
Napa: This XSL processor is not complete, but it is fast and is still being worked on. http://www.tfi-technology.com/xml/napa.html
Oracle XML Parser for Java: Oracle includes a good XSLT processor in their v2 XML parser. http://technet.oracle.com/tech/xml/parser_java2/
SAXON: SAXON is actually a group of programs to help you manage your XSL documents. It includes an XSLT processor, a Java library, and an XML parser. http://users.breathe.com/mhkay/saxon/saxon5-5-1/instant.html
Unicorn XSLT Processor: This is a C++ processor that comes in three flavors (standard, database, and professional) and is run from the command line. http://www.unicorn-enterprises.com/
Xalan-Java version 1.2.2: Xalan is a Java-based XSLT processor that conforms to the W3C recommendations for XSLT and XPath. http://xml.apache.org/xalan-j/index.html
Xalan-Java version 2.0.D07: Xalan-Java 2.0 is similar to Xalan 1.2, except that it implements the Transformation API for XML (TrAX) interfaces. It builds on SAX 2 and DOM Level 2. http://xml.apache.org/xalan-j/index.html
XSL:P: XSL:P is a free, open-source Java XSL processor that implements the XSLT 1.0 recommendation.
XT: XT is the most commonly used XSLT processor. You can either run the version that requires a SAX parser or download the Win32 executable. http://www.blnz.com/xt/index.html
Let's now take a look at the components of a typical XSLT implementation.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xt="http://www.jclark.com/xt"
    extension-element-prefixes="xt">
  <!-- Default Templates -->
  <xsl:template match="service_manual | owners_manual">
    <!-- xt:document (an XT extension) writes its content to a separate file -->
    <xt:document href="toc.htm">
      <html>
        <head>
          <link rel="stylesheet" type="text/css"
              href="System/Processhtml.css"/>
          <title>Service Literature</title>
        </head>
        <body bgcolor="#F3F3F3">
          <a name="pagetop"/>
          <div class="clsTOCHead">Contents</div>
          <div class="toc">
            <!-- The mode="toc" templates are not shown in this excerpt -->
            <xsl:apply-templates select="//notices" mode="toc"/>
          </div>
          <div class="toc">
            <xsl:apply-templates select="//section" mode="toc"/>
          </div>
        </body>
      </html>
    </xt:document>
    <xt:document href="frameset.htm">
      <html>
        <head>
          <link rel="stylesheet" type="text/css"
              href="System/Processhtml.css"/>
          <title>Service Literature</title>
        </head>
        <!-- Banner row on top; below it, the TOC frame on the left
             and the content frame on the right -->
        <frameset rows="20, 80">
          <frame src="system/banner.htm" noresize="noresize"/>
          <frameset cols="20, 80">
            <frame src="toc.htm" name="tocframe" noresize="noresize"/>
            <frame src="notices.htm" name="content" noresize="noresize"/>
          </frameset>
        </frameset>
      </html>
    </xt:document>
  </xsl:template>
  <xsl:template match="remark[@type='note']">
    <!-- The first para carries the label; later paras are plain italic paragraphs -->
    <p><i>
      <strong><xsl:text>*NOTE: </xsl:text></strong>
      <xsl:for-each select="para[position()=1]">
        <xsl:apply-templates/>
      </xsl:for-each>
    </i></p>
    <xsl:for-each select="para[position()>1]">
      <p><i><xsl:apply-templates/></i></p>
    </xsl:for-each>
  </xsl:template>
  <xsl:template match="remark[@type='important']">
    <p><i>
      <strong><xsl:text>*IMPORTANT: </xsl:text></strong>
      <xsl:for-each select="para[position()=1]">
        <xsl:apply-templates/>
      </xsl:for-each>
    </i></p>
    <xsl:for-each select="para[position()>1]">
      <p><i><xsl:apply-templates/></i></p>
    </xsl:for-each>
  </xsl:template>
  <xsl:template match="remark[@type='notice']">
    <p><i>
      <strong><xsl:text>*NOTICE: </xsl:text></strong>
      <xsl:for-each select="para[position()=1]">
        <xsl:apply-templates/>
      </xsl:for-each>
    </i></p>
    <xsl:for-each select="para[position()>1]">
      <p><i><xsl:apply-templates/></i></p>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
In this case, the root element is either service_manual or owners_manual, and the xt:document extension is used to break the output into multiple files in order to generate a frameset with a navigation frame on the left-hand side. Moving down to the template rules, you will see the remark element being matched, along with the resulting HTML output. The xsl:template element and its match attribute are the key to any XSLT implementation: each template simply contains instructions for what the output will look like when its pattern matches.
This stylesheet demonstrates the use of XPath to match the remark element based on the value of its type attribute and to produce different output accordingly. It also distinguishes between the first paragraph and any subsequent paragraphs.
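The three remark templates differ only in their label, so one possible design alternative (not the author's) is to derive the label from the type attribute itself. A minimal sketch, assuming the label is always the upper-cased type value and belongs on the first para:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- One rule for all three remark types; other types are not matched -->
  <xsl:template
      match="remark[@type='note' or @type='important' or @type='notice']">
    <!-- note -> NOTE, important -> IMPORTANT, notice -> NOTICE -->
    <xsl:variable name="label"
        select="translate(@type, 'abcdefghijklmnopqrstuvwxyz',
                                 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
    <!-- The first para carries the label -->
    <p><i>
      <strong><xsl:value-of select="concat('*', $label, ': ')"/></strong>
      <xsl:apply-templates select="para[1]/node()"/>
    </i></p>
    <!-- Any following paras become plain italic paragraphs -->
    <xsl:for-each select="para[position() > 1]">
      <p><i><xsl:apply-templates/></i></p>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The trade-off: a label that is not simply the upper-cased type value (a localized one, say) would need a lookup, whereas separate templates keep each case explicit.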
XSL-FO stems from DSSSL, SGML's LISP-like style language. XSL-FO is an entire language that describes page geometries such as page sizes, printable areas, headers, footers, margins, gutters, and columns. It lets you describe the layout of such page elements in such minute detail that you can specify the page areas designated for text and "flow" text into them, like water flowing into a pitcher. Having such strictly defined objects allows a page to be described very precisely.
The XSL-FO language uses tags and attributes for just about any tool you use to put ink on a page. You can control columns and gutters, font styles and sizes, kerning, borders, colors, lines with end caps, page breaks, images, text block alignment, and justification.
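To illustrate the page-geometry vocabulary, here is a minimal hand-written XSL-FO document: a US Letter page with a header region and a two-column body into which two blocks are flowed. The page size, margins, column settings and text are illustrative assumptions; the properties themselves (column-count, column-gap, text-align and so on) are standard XSL 1.0 properties.

```xml
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- Page geometry: page size and printable area (page margins) -->
    <fo:simple-page-master master-name="twoColumn"
        page-height="11in" page-width="8.5in"
        margin-top="0.5in" margin-bottom="0.5in"
        margin-left="0.75in" margin-right="0.75in">
      <!-- Body: two columns with a 0.25in gutter; margin-top leaves
           room for the header region -->
      <fo:region-body margin-top="0.75in"
          column-count="2" column-gap="0.25in"/>
      <!-- Header area, 0.5in tall -->
      <fo:region-before extent="0.5in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="twoColumn">
    <!-- Repeated on every page: the header text -->
    <fo:static-content flow-name="xsl-region-before">
      <fo:block font-family="Helvetica" font-size="9pt"
          text-align="end">Service Literature</fo:block>
    </fo:static-content>
    <!-- The text "flows" into the body columns -->
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-family="Helvetica" font-size="14pt"
          font-weight="bold" space-after="6pt">Contents</fo:block>
      <fo:block font-family="Times" font-size="10pt"
          text-align="justify">Body text flows down the first column
        and continues at the top of the second.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```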
XSL-FO can be intimidating at first, because you have to learn some background concepts like "layout masters" and "page sequences." Once you understand the basic model, though, writing XSLT stylesheets that transform your XML documents into XSL-FO (which is itself another XML format) is easy.
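As a sketch of that basic model, here is about the smallest XSLT stylesheet that produces usable XSL-FO: one layout master, one page sequence, and a single template rule. It assumes a hypothetical source vocabulary in which body text lives in para elements, with at least one present; only those are selected into the flow, because bare text is not allowed directly inside fo:flow.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="xml" encoding="utf-8"/>
  <xsl:template match="/">
    <fo:root>
      <!-- Layout master: the page geometry -->
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
            page-height="11in" page-width="8.5in" margin="0.75in">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <!-- Page sequence: pages built from that master, with the
           selected content flowed into the body region -->
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="//para"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
  <!-- Each (hypothetical) para becomes one block-level formatting object -->
  <xsl:template match="para">
    <fo:block font-family="Helvetica" font-size="10pt" space-after="6pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
```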
But what can you do with an XSL-FO file? Because the XSL-FO specification defines the XSL-FO tags so strictly, page-rendering software can read XSL-FO files and produce high-quality graphical representations of the documents. Likewise, a conversion program can read an XSL-FO file and convert it to PDF, PostScript, TeX, or any other page description language for printing or display.
Here is a list of some good XSL-FO processing tools:
Antenna House (http://www.antennahouse.com/)
FOP (http://xml.apache.org/fop/index.html)
PassiveTeX (http://www.hcu.ox.ac.uk/TEI/Software/passivetex/)
XEP (http://www.renderx.com/FO2PDF.html)
Now, let's take a look at a brief example of an FO stylesheet:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- FO output needs no DOCTYPE, so doctype-public/doctype-system are omitted -->
  <xsl:output method="xml"
      omit-xml-declaration="no"
      encoding="UTF-8"/>
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <!-- XSL 1.0 requires fo:region-body to come first
             inside fo:simple-page-master -->
        <fo:simple-page-master master-name="oddPage">
          <fo:region-body
              border-style="none"
              border-width="thin"
              margin-top="0.75in"
              margin-left="0.75in"
              margin-right="0.75in"
              margin-bottom="0.75in"/>
          <fo:region-before
              region-name="oddPageHeader"
              extent="0.7in"
              display-align="before"
              border-style="none"
              border-width="thin"/>
          <fo:region-after
              region-name="oddPageFooter"
              display-align="after"
              extent="0.7in"
              border-style="none"
              border-width="thin"/>
        </fo:simple-page-master>
        <fo:simple-page-master master-name="evenPage">
          <fo:region-body
              border-style="none"
              border-width="thin"
              margin-top="0.75in"
              margin-left="0.75in"
              margin-right="0.75in"
              margin-bottom="0.75in"/>
          <fo:region-before
              region-name="evenPageHeader"
              extent="0.7in"
              display-align="before"/>
          <fo:region-after
              region-name="evenPageFooter"
              display-align="after"
              extent="0.7in"/>
        </fo:simple-page-master>
        <!-- "front" is defined but not referenced by the page-sequence below -->
        <fo:page-sequence-master master-name="front">
          <fo:repeatable-page-master-alternatives>
            <fo:conditional-page-master-reference
                master-reference="oddPage"
                odd-or-even="odd"/>
            <fo:conditional-page-master-reference
                master-reference="evenPage"
                odd-or-even="even"/>
          </fo:repeatable-page-master-alternatives>
        </fo:page-sequence-master>
        <fo:page-sequence-master master-name="main">
          <fo:repeatable-page-master-alternatives>
            <fo:conditional-page-master-reference
                master-reference="oddPage"
                odd-or-even="odd"/>
            <fo:conditional-page-master-reference
                master-reference="evenPage"
                odd-or-even="even"/>
          </fo:repeatable-page-master-alternatives>
        </fo:page-sequence-master>
      </fo:layout-master-set>
      <!-- front pages -->
      <fo:page-sequence initial-page-number="auto-odd"
          format="i" master-reference="main">
        <fo:static-content flow-name="oddPageHeader">
          <fo:block
              border-bottom="red solid thin"
              font-family="Helvetica"
              font-size="10pt"
              font-weight="bold"
              text-align="right"
              space-before="0.5in"
              space-after="0.5in"
              margin-right="0.75in"
              margin-left="0.75in"
              margin-top="0.3in">
          </fo:block>
        </fo:static-content>
        <fo:static-content flow-name="oddPageFooter">
          <fo:block margin-left="0.75in">
            <fo:leader leader-pattern="rule"
                leader-length="17.2cm" color="red"/>
          </fo:block>
          <!-- The empty fo:leader plus text-align-last="justify" pushes
               the page number to the right-hand margin -->
          <fo:block margin-left="0.75in"
              margin-bottom="0.3in"
              margin-right="0.75in"
              font-family="Helvetica"
              text-align-last="justify">
            <fo:inline>
              <xsl:value-of select="//ninetydash"/>
              <xsl:text>&#160;&#160;</xsl:text>
            </fo:inline>
            <fo:inline>
              <xsl:apply-templates select="//service_manual/@date"/>
            </fo:inline>
            <fo:leader/>
            <fo:inline>
              Page <fo:page-number/>
            </fo:inline>
          </fo:block>
        </fo:static-content>
        <fo:static-content flow-name="evenPageHeader">
          <fo:block
              border-bottom="red solid thin"
              font-family="Helvetica"
              font-size="10pt"
              text-align="left"
              font-weight="bold"
              space-before="0.5in"
              space-after="0.5in"
              margin-left="0.75in"
              margin-right="0.75in"
              margin-top="0.3in">
          </fo:block>
        </fo:static-content>
        <fo:static-content flow-name="evenPageFooter">
          <fo:block margin-left="0.75in">
            <fo:leader leader-pattern="rule"
                leader-length="17.2cm" color="red"/>
          </fo:block>
          <!-- Mirror image of the odd-page footer: page number on the left -->
          <fo:block margin-right="0.75in"
              margin-left="0.75in"
              margin-bottom="0.3in"
              font-family="Helvetica"
              text-align-last="justify">
            <fo:inline>
              Page <fo:page-number/>
            </fo:inline>
            <fo:leader/>
            <fo:inline>
              <xsl:value-of select="//ninetydash"/>
              <xsl:text>&#160;&#160;</xsl:text>
            </fo:inline>
            <fo:inline>
              <xsl:apply-templates select="//service_manual/@date"/>
            </fo:inline>
          </fo:block>
        </fo:static-content>
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="//front"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
  <!-- start matching templates -->
  <xsl:template match="graphic[@source]">
    <fo:block margin-left="1.25in"
        space-before=".25in" space-after=".25in">
      <!-- XSL 1.0 expects a uri-specification of the form url(...) -->
      <fo:external-graphic src="url('{@source}')"/>
    </fo:block>
  </xsl:template>
  <xsl:template match="address">
    <fo:wrapper>
      <xsl:apply-templates/>
    </fo:wrapper>
  </xsl:template>
  <xsl:template match="admon">
    <fo:block margin-left="1.25in"
        space-before.maximum="8pt" color="#000000"
        space-before.conditionality="discard"
        score-spaces="false" margin-right="2pc"
        space-before.optimum="7pt"
        space-before.minimum="6pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
  <!-- Default priority 0.5 (pattern with a predicate) beats 0 for
       match="admon", so danger admonitions use this rule -->
  <xsl:template match="admon[@type='danger']">
    <!-- Red banner with icon, kept on the same page as the text below it -->
    <fo:block margin-left="1.25in" margin-top="0.1in"
        text-align="center"
        border="thin solid black"
        background-color="red"
        font-weight="bold"
        color="white"
        keep-with-next="always">
      <fo:external-graphic src="url('art/alerticon.jpg')"/>
      DANGER
    </fo:block>
    <fo:block margin-left="1.25in" margin-bottom="0.1in"
        text-align="left"
        border="thin solid black"
        font-weight="bold">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
  <xsl:template match="english">
    <fo:wrapper>
      <xsl:apply-templates/>
    </fo:wrapper>
  </xsl:template>
</xsl:stylesheet>
You will notice that the first part of the stylesheet sets up the page geometry, and that the header and footer differ between odd and even pages. This allows you to do things like alternate the location of the page numbers. The fo:flow element further down the stylesheet is where you actually "call" the element in the document that will start the "flow" into the area that has been defined. Below all of the page layout information is simply a list of template rules that match the source elements and output them as various formatting objects, which tell the formatting tool how to format them. The fo:block and fo:wrapper elements are the most commonly used, and they have many attributes for controlling the appearance of the output, as demonstrated here.