
XSL-FO: state of the union, state of the art

Lease, Karen, Senior Software Developer, SPX Valley Forge, Rueil-Malmaison, France

Biography

Karen Lease has been developing software for SGML and XML applications since 1983. She has always been particularly interested in formatting aspects, starting with Texet's WYSIWYG structured editor and continuing on with DSSSL and XSL. She is currently one of the principal developers on the Apache FOP project.

Ms. Lease has previously spoken at SGML/XML Europe in 1998 and 2000, Documation Paris 2001 and AUGI 2001.

Abstract

One of the main ideas of structured languages (SGML and XML) is to separate structure and content from presentation. XSL is the latest in a long sequence of both standardized and proprietary solutions to this problem. Like DSSSL, and unlike FOSI and CSS, XSL is based on a two-step process model: transformation of the original document and formatting of the transformed document. Unlike DSSSL, the transformation aspect (separately approved as XSLT, a W3C Recommendation, in November 1999) has given rise to a significant number of implementations. The much longer gestation of the formatting object (FO) aspect of XSL has often led to questions of whether it was even necessary. Now that the XSL-FO specification has attained Proposed Recommendation (PR) status, and with several public and commercial FO processors being developed, it seems that FO will at least see the light of day. This talk will first give a brief overview of the essential aspects of the standard, and then discuss how XSL-FO tools will fit into the XML landscape.



Introduction

In this section, I'll describe XSL-FO and situate it with respect to the more widely known XSLT specification. I'll give a very brief historical overview of the development of the specification.

XSL, XSL-FO, XSL-T

XSL is the official name of the W3C specification (in the Proposed Recommendation state at the time of writing) which describes a way of attaching formatting information to XML documents [XSL]. As described in its abstract, the specification consists of two distinct parts:

1. a language for transforming XML documents, and

2. an XML vocabulary for specifying formatting semantics.

Part 1, the language for transforming XML documents, was broken out of the XSL specification and was approved under the name of XSL Transformations or XSLT in November 1999 [XSLT]. XSLT is very widely used in applications which transform an XML document into text, a different XML document, or perhaps most commonly into an HTML document for delivery to a client browser.

To avoid confusion with XSLT, the actual XSL specification which describes formatting semantics is thus frequently referred to as XSL-FO, where FO is shorthand for Formatting Objects. This specification covers the following aspects of formatting:

In this paper, I will use the term XSL to refer to the W3C specification and the terms XSL or XSL-FO to refer to the vocabulary for formatting semantics described by the specification. I will use the abbreviation FO to mean formatting object.

A short history of stylesheet languages

One of the main ideas of using a structured language such as SGML or XML is to separate structure and content from presentation. In the 15 years since SGML was first approved as an ISO standard, there have been several major attempts to create a companion standard which would allow users to associate formatting information with structured documents. One of the first standardization efforts resulted in the FOSI, developed by the U.S. Department of Defense, and adopted by Arbortext and Datalogics to define stylesheets for SGML documents. The FOSI defines an SGML vocabulary for attaching formatting characteristics to elements in context. This fundamental concept is also the basic idea behind all other formatting languages, although as we will see, it functions in a slightly different way in XSL.

The next major attempt to resolve the problem was the ISO standard DSSSL [DSSSL]. DSSSL includes both a transformation and a formatting part. However, by the time DSSSL was approved, the Web was becoming very important, and the SGML community reacted by developing XML as a response to HTML. HTML has built-in formatting semantics. Since XML, like SGML, doesn't imply any formatting semantics, it can't be used for presentation without a way of attaching formatting semantics to the elements. This quickly led to the birth of the XSL working group at the W3C.

Initially, the XSL working group was composed of many of the same people who had created DSSSL, and initial versions of the specification were heavily influenced by DSSSL ideas, but using an XML syntax instead of Scheme syntax. Like DSSSL, and unlike FOSI and CSS, the XSL proposal was based on a two-step process model: transformation of the original document and formatting of the transformed document. This is a crucial point for the history of XSL. It led rather quickly to the split-off of the transformation aspect (more easily defined and agreed upon) into the XSLT specification. A number of critics considered at the time (and probably still do) that this was an adequate answer to the problem. However, work on the main XSL specification continued, and the formatting property set was brought closer to the CSS vocabulary [CSS].

State of the union

By "state of the union" I mean the actual content of Version 1.0 of the XSL specification. This will help to answer the question: "What can I do with XSL?"

Fundamental XSL concepts

If you have experience with other formatting systems, XML-based or not, various aspects of XSL may surprise you. This section describes some of the main concepts underlying the specification.

Formatting objects

The first fundamental idea is that XSL does not provide a way of directly attaching formatting properties to elements in an XML document. Instead it attaches formatting properties to formatting objects. These objects are defined by the XSL specification. For example, instead of specifying typographic properties for a title element in a subsec2 in the GCA XML2001 paper namespace, one transforms that into a block element in the FO namespace with particular typographic properties.

Formatting object processors therefore do not deal directly with elements of various XML vocabularies, but only with elements of the formatting object vocabulary. The principal elements in this vocabulary will be described in the section "The formatting vocabulary".

Transformation

This notion follows from the previous one: the formatting object structure is (at least conceptually) the result of transforming an XML document in a domain-specific vocabulary into the formatting object vocabulary. This extremely powerful idea is a heritage from DSSSL, although there are some subtle differences between the way DSSSL and XSL are designed to function. The transformation may be quite straightforward, in which case it is similar to simply attaching formatting properties to elements in the original document, or it may involve extensive generation of content, rearrangement or suppression of existing content.

Note

It isn't necessary to use the XSLT part of XSL to create an XML document using the formatting vocabulary. Any other tool, including a standard text editor, may be used to create such a document. However, in many cases, the process will involve the use of XSLT. When it does so, the formatting object document usually does not exist as a persistent representation (for example, a file), but only as an in-memory representation such as a DOM tree or a stream of SAX events.
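As a minimal sketch of the transformation step, the following XSLT template maps a title inside a subsec2 element to a block formatting object. The element names come from the example given under "Formatting objects"; the absence of a source namespace and the particular property values are assumptions made purely for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Assumption: the source elements are in no namespace. -->
  <xsl:template match="subsec2/title">
    <!-- The typographic properties are attached to the FO,
         not to the original title element. -->
    <fo:block font-size="12pt" font-weight="bold"
              space-before="12pt" space-after="6pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

A complete stylesheet would also need a template that generates the root, layout-master-set and page-sequence structure around such blocks.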

Areas

Formatting objects are not themselves what one sees on a printed page or a browser screen. Rather they are processed (by an FO processor) to create areas. Areas are boxes which actually contain laid-out content. A formatting object is said to generate one or more areas. The size and position of the generated areas depend on the constraints imposed by the formatting properties and the page masters. For example, a block FO will generate more than one area if it contains too much content to fit on a single page.

Writing-mode independence

Another fundamental XSL concept is the notion of writing-mode independence, which allows style sheets to be developed for non-Western writing systems such as Hebrew, Arabic or Japanese. Again this idea comes from DSSSL. However, XSL also retains the "absolute" model (top, bottom, left, right) of specifying formatting properties, mainly to maintain compatibility with the CSS vocabulary.

Writing modes are specified using terms such as lr-tb or tb-rl. These codes specify the inline-progression-direction and the block-progression-direction. For example, lr-tb is interpreted as "lines of text are written from left to right, and stacked from top to bottom". See the figure "Writing mode" for an illustration of these terms.

Writing mode

The figure on the left shows a Western writing mode; the figure on the right shows the writing mode for Chinese.

Writing-mode independence gives rise to a vocabulary for describing formatting properties which uses different terms from those to which we (in Western typography) are accustomed. For example, instead of specifying left-indent and right-indent, XSL has properties called start-indent and end-indent, where the terms start and end are interpreted with respect to the writing direction for a line of text. When writing a line of English text for example, we start at the left and end at the right.

Similarly, XSL uses the terms space-before and space-after rather than space-above and space-below when referring to space around blocks of text. Here, before and after are interpreted with respect to the direction in which lines of text (blocks) are stacked. When writing English, we proceed from top to bottom, so before is above and after is below.
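The writing-mode relative property names can be illustrated with a small fragment. This is only a sketch: the lengths are arbitrary, and the rl-tb block-container is an assumed way of showing the same properties under a different writing mode.

```xml
<flow xmlns="http://www.w3.org/1999/XSL/Format" flow-name="xsl-region-body">
  <!-- Default lr-tb writing mode: start is left, end is right,
       before is above, after is below. -->
  <block start-indent="1cm" end-indent="2cm"
         space-before="6pt" space-after="12pt">
    Indented 1cm on the left and 2cm on the right.
  </block>

  <!-- With rl-tb, lines run from right to left, so the start
       side is now the right-hand side. -->
  <block-container writing-mode="rl-tb">
    <block start-indent="1cm" end-indent="2cm">
      Indented 1cm on the right and 2cm on the left.
    </block>
  </block-container>
</flow>
```

The property names are identical in both blocks; only their interpretation changes with the writing mode.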

Inheritance

The XSL formatting model, like previous stylesheet languages, makes extensive use of the notion of inheritance. A formatting property which can be inherited may have its value set on a high-level formatting object and the same value will apply to all objects contained within that object. For example, font-related properties such as family, size, weight and so on, are all inherited.

Note

The containment relationships are defined by the hierarchy of the formatting object "document" resulting from the transformation process, which is not necessarily the same as that of the original XML document.

The formatting vocabulary

Bearing these concepts in mind, let's look at "Hello, world" in XSL. The following example creates a single A4 page containing the greeting, "Hello, world". Note that since no typographical formatting properties are specified, the default values will be used.

The FO Hello World example
<root xmlns="http://www.w3.org/1999/XSL/Format">
  <layout-master-set>
    <simple-page-master master-name="simple" page-height="29.7cm"
        page-width="21.0cm" margin-bottom="3cm" margin-left="2cm"
        margin-right="2cm" margin-top="3cm" >
      <region-body/>
    </simple-page-master>
  </layout-master-set>
  <page-sequence master-reference="simple">
    <flow flow-name="xsl-region-body">
      <block>Hello, world</block>
    </flow>
  </page-sequence>
</root>

This example illustrates the basic structure of an FO document. It consists of a root element which contains a single layout-master-set and one or more page-sequence elements. The layout-master-set defines all page masters and sequences of page masters. It must contain at least one simple-page-master element. A page-sequence contains the actual content of the document which will be placed on pages, enclosed in a flow. The flow contains one or more block-level elements, of which the most common is simply block.

In the rest of this section, I'll look at this structure in more detail, expanding on the example as I go along.

Page and page-sequence masters

The formatting vocabulary allows the user to define a number of page masters, each of which defines the size, orientation, margins, and body and side region placement. In the example above, the master named simple defines an A4 page with 2 cm. side margins and 3 cm. top and bottom margins and a single column of body text. Different masters may define pages of different sizes, with different margins and different numbers of columns.

Each page-sequence object references either a single simple-page-master object or a page-sequence-master object, using the master-reference attribute.

Note:

The name of this property was changed from master-name to master-reference between the Candidate and Proposed Recommendation versions.

The page-sequence-master object serves to group together references to different simple-page-master objects and to define the order and rules governing their use. This allows typical recto/verso pagination styles as well as more complex sequences to be designed. If a page-sequence directly references a simple-page-master, that master is used to generate as many pages as necessary to hold all of the content of the flow. If the page-sequence references a page-sequence-master, page masters from that sequence are used to generate the pages which hold the content.
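A recto/verso arrangement of the kind just described might look like the following sketch. The master names and the mirrored margin values are assumptions; the page-sequence references the page-sequence-master rather than a simple-page-master.

```xml
<root xmlns="http://www.w3.org/1999/XSL/Format">
  <layout-master-set>
    <!-- Right-hand (odd) pages: wider inner margin on the left. -->
    <simple-page-master master-name="recto" page-height="29.7cm"
        page-width="21.0cm" margin-top="3cm" margin-bottom="3cm"
        margin-left="3cm" margin-right="2cm">
      <region-body/>
    </simple-page-master>
    <!-- Left-hand (even) pages: margins mirrored. -->
    <simple-page-master master-name="verso" page-height="29.7cm"
        page-width="21.0cm" margin-top="3cm" margin-bottom="3cm"
        margin-left="2cm" margin-right="3cm">
      <region-body/>
    </simple-page-master>
    <!-- Alternate between the two masters for as many pages as needed. -->
    <page-sequence-master master-name="recto-verso">
      <repeatable-page-master-alternatives>
        <conditional-page-master-reference master-reference="recto"
            odd-or-even="odd"/>
        <conditional-page-master-reference master-reference="verso"
            odd-or-even="even"/>
      </repeatable-page-master-alternatives>
    </page-sequence-master>
  </layout-master-set>
  <page-sequence master-reference="recto-verso">
    <flow flow-name="xsl-region-body">
      <block>Hello, world</block>
    </flow>
  </page-sequence>
</root>
```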

Each simple-page-master defines a region-body and up to four optional regions which are placed on each side of the region-body. The region-body is generally used to hold the content of the flow object while the side regions are used to hold the content of any static-content objects present in the page-sequence. This static content is repeated on each page generated by a master containing the corresponding region. For example, if we want to extend the Hello world example to include a running head with the text "FO examples" and a footer with the page number, we would use the following document.

The FO Hello World example - 2
<root xmlns="http://www.w3.org/1999/XSL/Format">
  <layout-master-set>
    <simple-page-master master-name="simple" page-height="29.7cm"
        page-width="21.0cm" margin-bottom="2cm" margin-left="2cm"
        margin-right="2cm" margin-top="2cm" >
      <region-body margin-top="1cm" margin-bottom="1cm"/>
      <region-before extent="1cm"/>
      <region-after extent="1cm"/>
    </simple-page-master>
  </layout-master-set>
  <page-sequence master-reference="simple">
    <static-content flow-name="xsl-region-before">
      <block>FO examples</block>
    </static-content>
    <static-content flow-name="xsl-region-after">
      <block><page-number/></block>
    </static-content>
    <flow flow-name="xsl-region-body">
      <block>Hello, world</block>
    </flow>
  </page-sequence>
</root>

There are several points to observe in this example:

  • The region-before and region-after zones are inside the area defined by the page margins. To keep them from overlapping the region-body, I have reduced the top and bottom page margins by 1 cm. each and added a 1 cm. top and bottom margin to the region-body itself. This corresponds exactly to the extent property on the region-before and region-after. The figure "Page areas" shows this graphically.

  • Each static-content element contains one or more block-level formatting objects, just as does the flow.

  • The number of the current page is obtained by using a special inline-type FO called page-number. The formatting object processor will replace it with the page number during the layout process.

Page areas

Flow content

Let's look in more detail at the content of a flow. I've used the term block-level FO several times without precisely defining it. A block-level FO is one which generates areas which are stacked in the same way as lines of text; this is called the block-progression-direction. For Western languages it is top to bottom. So far, our examples have shown only one kind of block-level FO, which is called block. It is by far the most common FO; it can be used to format titles, paragraphs or long quotes.

The other block-level formatting objects are essentially containers which arrange their child block-level FOs in particular ways. The following list briefly describes each of these FOs; table and list-block will be examined in more detail.

  • list-block: groups one or more list-item FOs

  • table: the root of the table FO hierarchy

  • table-and-caption: groups together a table FO and its caption; the caption may be on any side of the table

  • block-container: generates a reference area which may be absolutely positioned on the page and may change the writing-mode; contains one or more block-level FOs

Inline-level formatting objects

As you've noticed, the block FO can directly contain text. In fact, it is the only block-level FO which can contain text. We can think of the block as the boundary between block-level and inline-level formatting objects. Each character in a string of text is implicitly a kind of inline formatting object which generates an area containing the glyph: the visual representation corresponding to that character code. The size of this inline area is determined by the current font-related characteristics. The figure "Inline areas generated by text" illustrates this.

Inline areas generated by text

In addition to characters, the block FO can also contain any other kind of inline-level FO. The most common of these are:

  • character: a single character which has specific characteristics

  • external-graphic: generates an area containing the referenced graphic (acceptable formats are determined by the FO processing engine)

  • instream-foreign-object: generates an area containing the rendered content of the element, for example SVG

  • inline: may be used to change a non-inherited property such as background or border for some inline content; may contain other inline FOs

  • inline-container: can include block-level objects which are composed inline; also allows changing writing-mode for contained objects

  • leader: creates a fixed or expandable leader or rule

  • page-number: contains the formatting-processor generated page number

  • page-number-citation: contains the page number of the referenced FO

The figure "Text and an inline graphic" shows a mixture of text and an inline area generated by the external-graphic FO. Note that the size of the line area is adjusted to account for the size of the graphic.

Text and an inline graphic
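Several of the inline-level FOs listed above can be combined in a short fragment. This is a sketch only: the graphic file name logo.png and the id value intro are hypothetical, and which graphic formats are accepted depends on the FO processor.

```xml
<flow xmlns="http://www.w3.org/1999/XSL/Format" flow-name="xsl-region-body">
  <!-- Text, an inline with a non-inherited property, and a graphic
       all generate inline areas within the same block. -->
  <block id="intro">
    Plain text, <inline background-color="yellow">highlighted text</inline>
    and a graphic <external-graphic src="url(logo.png)"/> on the same line.
  </block>

  <!-- A table-of-contents style line: an expandable dotted leader sits
       between the title and the page number of the block above. -->
  <block text-align-last="justify">
    Introduction<leader leader-pattern="dots"/><page-number-citation ref-id="intro"/>
  </block>
</flow>
```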

Tables and lists

As mentioned previously, there are specific block-level FOs for formatting tables and lists. The top-level table FO contains substructure which closely follows typical SGML and XML table content models. It contains provisions for defining columns using fixed, proportional or automatically calculated widths, defining header and footer rows and specifying one or more table bodies. Row and column spanning cells and complete control over border attributes are also provided. The figure "Table FO structure" illustrates the areas generated by the table-specific formatting objects.

One extremely important point to make concerning tables is that using the transformation language, table-style formatting may easily be applied to an input XML document which is not modeled as a table. This can be a liberating experience for DTD and Schema designers!

Table FO structure
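A small table using fixed column widths, a header row and one body might be written as in the sketch below. The column widths, border values and cell content are illustrative assumptions.

```xml
<table xmlns="http://www.w3.org/1999/XSL/Format" table-layout="fixed">
  <!-- Column definitions come first. -->
  <table-column column-width="4cm"/>
  <table-column column-width="6cm"/>
  <table-header>
    <table-row font-weight="bold">
      <table-cell border="0.5pt solid black" padding="2pt">
        <block>Tool</block>
      </table-cell>
      <table-cell border="0.5pt solid black" padding="2pt">
        <block>Licence</block>
      </table-cell>
    </table-row>
  </table-header>
  <table-body>
    <table-row>
      <table-cell border="0.5pt solid black" padding="2pt">
        <block>FOP</block>
      </table-cell>
      <table-cell border="0.5pt solid black" padding="2pt">
        <block>open source</block>
      </table-cell>
    </table-row>
  </table-body>
</table>
```

As with the list-item parts, the content of each table-cell is one or more block-level FOs, so even a single word is wrapped in a block.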

The list model consists of the list-block which simply acts as a container for list-item FOs. Each list-item has list-item-label and list-item-body children, as shown in the figure "List FO structure". The content of both the list-item-label and the list-item-body is one or more block-level formatting objects. This means that even if your list-item-label contains only a number or a bullet, you must enclose it in a block FO. The relative positioning of the label and the body areas is controlled by the indent values specified on each one. Like all indents, these are interpreted with respect to the reference area and not to the areas created by the list-item or list-block formatting objects.

List FO structure
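The list structure can be sketched as follows. The distances are arbitrary; label-end() and body-start() are functions of the XSL expression language that compute the label and body indents from the two provisional properties set on the list-block, and are used here as one convenient way of setting the indents.

```xml
<list-block xmlns="http://www.w3.org/1999/XSL/Format"
    provisional-distance-between-starts="1cm"
    provisional-label-separation="0.2cm">
  <list-item>
    <!-- Even a bare number must be enclosed in a block. -->
    <list-item-label end-indent="label-end()">
      <block>1.</block>
    </list-item-label>
    <list-item-body start-indent="body-start()">
      <block>First item</block>
    </list-item-body>
  </list-item>
  <list-item>
    <list-item-label end-indent="label-end()">
      <block>2.</block>
    </list-item-label>
    <list-item-body start-indent="body-start()">
      <block>Second item</block>
    </list-item-body>
  </list-item>
</list-block>
```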

Out-of-line formatting objects

The XSL formatting vocabulary also contains two distinct formatting objects, footnote and float, with out-of-line or floating behavior. The float FO can be used to produce both before and side floats, depending on the value of its float attribute. If the value is before, the float is placed in the "before side" (corresponding to the top in Western writing-mode) of the region-body containing the anchor. If the value is start, end, left, or right, the float is called a side-float and is placed on either the start or end side of the region-body, depending on the value specified. Depending on the values specified for indents on the float and normal flow content, side floats may intrude on normal content in the region-body, forcing it to "run around" the floating area or be positioned in the margin like a sidenote.
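A minimal sketch of both out-of-line objects follows. The text content, font sizes and border values are assumptions. A footnote consists of an inline, which stays in the text as the citation, and a footnote-body holding the out-of-line content.

```xml
<flow xmlns="http://www.w3.org/1999/XSL/Format" flow-name="xsl-region-body">
  <block>
    <!-- A before-float: placed at the top of the region-body
         in a Western writing-mode. -->
    <float float="before">
      <block border-style="solid" border-width="0.5pt" padding="3pt">
        Floating material.
      </block>
    </float>
    Body text that anchors the float and carries a footnote<footnote>
      <inline font-size="8pt" baseline-shift="super">1</inline>
      <footnote-body>
        <block font-size="8pt">1. Text of the footnote.</block>
      </footnote-body>
    </footnote>.
  </block>
</flow>
```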

Formatting properties

The XSL formatting vocabulary includes a large number of properties. Some properties are specific to certain formatting objects while others are applicable to a large number of objects. In addition to standard typographical and pagination-related properties, XSL includes many properties related to audio presentation, and some properties useful for electronic media.

Print-oriented formatting properties can be broken into several main groups:

  • font and character related, including hyphenation properties,

  • indents and vertical spacing,

  • borders, padding and background,

  • area size, position and alignment,

  • table and list related,

  • keeps and breaks.

Properties determine the size and placement of the areas which are generated by formatting objects. They also determine how they will be rendered. In order to use XSL effectively, you need to understand how the FO processor uses properties. It is particularly important to see how start-indent and end-indent determine the placement of the text in a block FO. The figure "Diagram of major layout properties" shows the relation of the indent properties to border and padding properties. Note that indents are measured with respect to the containing reference area. In a normal flow, each column of text in the region-body is a reference area, but table-cells, block-containers and inline-containers also define reference areas. A reference area allows the writing-mode to be changed, so that it is possible to mix different writing modes on the same page.

Diagram of major layout properties

As mentioned earlier (see "Inheritance"), some formatting properties are inherited while others are not. The value of an inheritable property for a given formatting object is found by looking at that object and each of its ancestors in turn until a value is found. The value of a non-inheritable property must be specified directly on the formatting object where it is needed; if no value is specified, a default value is used. All properties have reasonable defaults defined in the specification.

Common inheritable properties are: font-family, font-size, font-weight, start-indent and end-indent, text-align, line-height, and color (text color). Common non-inheritable properties are space related (space-before, space-after), keeps, breaks, borders, padding and background, and positioning properties.

The following example shows the use of some common properties.

Example showing use of common properties
<flow flow-name="xsl-region-body" font-family="sans-serif">
  <block>
    Hello, world
  </block>
  <block font-size="14pt" font-weight="bold"
         text-align="center" space-after="12pt" color="blue">
    This centered blue block in 14pt bold is the title
  </block>
  <block font-size="10pt" start-indent="1cm" background-color="yellow"
         padding="6pt" border="1pt" border-style="solid">
    This is a block of body text showing the use of indent, borders,
    padding and background color. There is a 1pt black (the default
    color) border separated from the text area by a padding of 6pt on
    all sides on a yellow background. The text of the paragraph is
    indented by 1cm on the start-side (left here) from the nearest
    reference area, which is the column in this case.
  </block>
</flow>
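Keeps and breaks, listed among the property groups above, do not appear in that example. A minimal sketch, with assumed content, of how they might be specified:

```xml
<flow xmlns="http://www.w3.org/1999/XSL/Format" flow-name="xsl-region-body">
  <!-- Start the title on a new page and keep it on the same page
       as the block that follows it. -->
  <block break-before="page" keep-with-next.within-page="always"
         font-weight="bold">
    Chapter title
  </block>
  <!-- Ask the processor not to split this paragraph across pages. -->
  <block keep-together.within-page="always">
    A paragraph that should stay on a single page.
  </block>
</flow>
```

These properties are not inherited here from any ancestor; each is set directly on the block it concerns.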

State of the art

Assuming that your appetite has been whetted by this glimpse at the possibilities XSL offers, this section will provide a brief overview of some of the tools available.

XSL-FO and print

The major interest in XSL-FO is using it to generate printed output. This is frequently desirable in on-line applications as well as in more traditional print-oriented applications. In fact, one of the major uses of FO processors is in servlets which generate PDF and return it to browsers for printing with the Acrobat plug-in.

There are both commercial and open-source XSL-FO processors available. Given that the specification has only recently been finalized, it is understandable that none of the available tools offer complete implementations. Short descriptions of the available tools follow (in alphabetical order).

Antenna House XSL Formatter

This is a commercial product with both standalone and server-based pricing models. The current version is 1.1E, released on April 9, 2001. Antenna House is based in Japan, and their formatter is the only tool which supports the tb-rl writing mode for CJK text. The product runs only on Windows and requires version 3 of the Microsoft MSXML processor. It can be used both from the command line and via a COM interface. See the Antenna House website for more information.

Apache FOP

FOP is an Apache open source project, based on original code by James Tauber. It is written in Java and implements a complete formatting processor. The version as of October 2001 is 0.20.2. FOP has fairly complete support for the expression language and partially supports most of the major formatting objects, but is rather weak in overall page layout logic involving keep conditions and floats. It supports PDF, PostScript, PCL, MIF and plain text rendering, as well as a Java AWT preview function. However, the renderers offer varying degrees of functionality, with PDF and PCL being the most complete. FOP uses Apache Batik for its SVG processor. FOP can be used either directly with XML files in the FO namespace or with input XML and XSL files. It uses the javax.xml.transform pluggable interface so that it may be used with any compatible XSLT processor when doing transformations. It may be used either as a command line application or embedded in an application or a servlet. FOP is integrated into several other projects, such as Cocoon and X-Smiles. For more information, visit the FOP website.

PassiveTeX

PassiveTeX is a library of TeX macros developed and maintained by Sebastian Rahtz. The input to the macro package is an XML document in the XSL-FO namespace, which can be generated by any other tool. Rahtz describes PassiveTeX as a rapid development environment for experimenting with XSL FO, using a reliable pre-existing formatter. In addition to a modern TeX installation, PassiveTeX depends on the xmltex package written by David Carlisle. PassiveTeX includes native support for the MathML vocabulary. For complete information on what parts of the XSL specification are supported, visit the PassiveTeX website.

RenderX XEP

XEP is a commercial formatting object processor available from RenderX, Inc. The version as of October 2001 is 2.5. It is written in Java and implements a complete formatting processor which can produce PDF and PostScript output. It offers support for most of the print-oriented formatting objects defined in the XSL Proposed Recommendation, with limitations in certain cases. It has limited support for the XSL expression language and very limited support for SVG when used in the instream-foreign-object FO. See the RenderX website for complete information concerning the capabilities, evaluation version and commercial version of XEP.

XSL-FO and browsers

There seems so far to be little interest in using XSL-FO directly in browsers, despite the fact that certain aspects of the specification are aimed at creating style sheets for online use. One might evoke a number of reasons for this: processing power, browser politics, and the pervasive use of XSLT to transform XML into HTML on either the server or client side. This section briefly imagines why one would want to have an XSL-capable browser and what would be involved in building one.

Motivation

One motivation is to be able to get original XML data closer to the destination and thus to be able to process it in intelligent ways without going back to the server to retransform it into new HTML. In some cases, simply applying CSS formatting directly to the XML may offer sufficient functionality. In other cases, the transformation capability of XSL is necessary to achieve the desired result, particularly if one wants to produce a tabular representation of data.

An XSL-enabled browser would also make it possible to achieve correct printed results on the client side based on the original XML and a print-oriented stylesheet. The current alternative is to produce PDF on the server using an XSL-FO engine and then send that PDF to the client. The PDF file is almost always larger than the input XML and XSL files, which is problematic on slow networks.

Some technical issues

XSL-based layout is currently quite an expensive operation, involving parsing of the original XML document, transformation and then formatting of the resulting document structure. When using a streaming architecture the memory requirements can be quite large. When using XSL in the browser view, it would have to be possible to dynamically recompose the entire formatting object hierarchy whenever the window size changed. On the plus side, it is considerably easier to lay out a single conceptually infinite column (or galley) of text rather than dealing with page and column-breaking issues, floats and so on. For example, in a browser view, out-of-line objects could be shown in pop-up windows.

In order to enable dynamic scripting access to the DOM of the original document, the areas created by the formatting process must maintain references to nodes in that tree. Although the transformation process clearly makes this more complicated than, for example, simply applying CSS formatting to XML objects, it does not make it technically impossible. This depends on coordination between the various XML processing tools involved; current public interfaces do not provide sufficient information to implement such coordination.

A first attempt: X-Smiles

The open source project X-Smiles is creating an XML browser in Java. The X-Smiles project has ambitious goals. It aims to support XSLT and XSL-FO processing as well as SVG and SMIL, while continuing to maintain Java 1.1 compatibility and a small footprint! This project currently (version 0.3) uses a slightly modified version of the FOP AWT rendering backend to support the XSL formatting specification in its browser. X-Smiles also provides EcmaScript capabilities in its browser, but the current architecture makes that impossible for the moment for the XSL-FO part.

Looking forwards

As we've seen, XSL-FO is still largely uncharted territory. Assuming that the specification will be blessed as a W3C Recommendation in the near future, final barriers to the development and use of XSL processors for print media should fall. There is already high interest in the user community as evidenced by the number of "newbie" questions in fora such as the FOP mailing list. The fact that users are willing to struggle with incomplete products and changing specifications for months at a time would seem to indicate a pressing need for producing high-quality print output from XML.

A final thought: those users are located all over the world. Even in today's "globalized economy", the printed word is still a primary vector of communication. The XSL specification gives users everywhere the tools to write stylesheets for their XML documents. But in order to realize XSL's promise of writing-mode neutral formatting, XSL processors need to address those features of the specification more fully than they do today. And tomorrow, perhaps the browsers will follow suit!


Bibliography

[CSS] Cascading Style Sheets, level 2. W3C Recommendation, 12 May 1998.
[DSSSL] Document Style Semantics and Specification Language (DSSSL). International Organization for Standardization, International Electrotechnical Commission. ISO/IEC 10179:1996.
[XSL] Extensible Stylesheet Language (XSL), Version 1.0. W3C Proposed Recommendation, 28 August 2001.
[XSLT] XSL Transformations (XSLT), Version 1.0. W3C Recommendation, 16 November 1999.