Abstract
The abstract was not available at the time the proceedings were created. Please check an updated version of the paper abstracts at the conference proceedings web site.
Although it was developed specifically to support the kinds of transformation needed to produce output-oriented formats such as XSL-FO or HTML[1], XSLT has become an extremely popular technology for all kinds of XML processing. This is largely because XSLT transforms, or “stylesheets”, are easy to create and maintain. (Its low cost is another reason: XSLT costs only the time it takes to learn and apply it.) XSLT is a declarative, side-effect-free (functional) language[2], and it benefits from a template-driven processing model that operates over a parsed tree representation of an XML document. This makes it straightforward to write transformations, as long as they fall into the category of “down-conversions”. That is to say, as long as all the information a transformation needs is explicit in the document tagging, XSLT is probably well suited to the job. “Up-conversion” is generally outside XSLT's scope.[3] This is the category of transformation that requires a program to infer or interpolate information not explicit in the tagging, such as converting arbitrary streams of plain text (or, often, text in proprietary encoding formats) into XML.
But down-conversions are not limited to translating a document's codes into one output format or another. All kinds of useful transformations can be designed to work with the document tagging in its current state. XSLT is also a very suitable tool for building transforms that aid editorial work by reflecting or reporting on a document's tagging, so that a human editor or editorial assistant can intervene to make changes. This is the idea behind an entire class of stylesheets, which this paper explores.
Necessarily, the boundaries around this kind of application are blurry. It overlaps with document validation, at least insofar as “validation” is not confined to its technical sense. In that narrow sense, validation means the structural validation, or lexical datatyping, provided by schema technologies such as DTDs or W3C XML Schema. Here it is taken more broadly to include any automated check of whether a document conforms to a set of formally expressed constraints.[4] Likewise, many or most production workflows include passes to normalize data, for example so that dates or names are available in canonical forms. These passes can sometimes be provided, at least up to a point, by automated transformations. The kind of transformation described here falls between validation and normalization. It also falls between the front end (display stylesheets for authoring, or forms for data entry) and the back-end production of final products. What distinguishes these transformations is that they are not aimed at automating a job entirely. More modestly, they put automated processing to work easing the labor of the human beings who must step in to complete editorial adjustments that are beyond the machine's unaided capabilities.
Three types of quality-checking transform are described below:
- false color proofs;
- filters that present document fragments known to be of special interest or concern;
- filters containing enough heuristic analysis to detect probable problems. I call this category “soft validation”, since it verges on formal constraint checking.
These categories also blend into one another, yet one example of each is enough to demonstrate the range of XSLT's capabilities. Given these examples, the reader can undoubtedly come up with more ideas along similar lines, or others.
Any of these transformations can be designed and deployed in several ways, since XSLT supports several forms of output. HTML output is very useful for presenting material on screen, and its hypertext capabilities can sometimes be put to good use as well. In some workflows, output to a print-ready format such as PDF (by way of XSL-FO) is very welcome, because it lets human editors work with the convenience and flexibility of paper and pencil. Plain text output can be useful for “quick and dirty” work. Finally, XSLT transforms can generate XML, which in some cases can even support “round-tripping” into as well as out of the main flow of document processing.
Additionally, the techniques described here work over XML input in several forms. Sometimes (as in the false color proof example) a stylesheet applies to a single document. At other times it may be more convenient to apply a transformation to an aggregated set of documents at once[5]. XSLT is also neutral as to whether documents are stored in a file system, an XML database, or a Content Management System (CMS).
The term “false color proof” is borrowed by analogy from cartographic and other image-oriented applications. In those applications it is sometimes useful to present an image in “false colors” to distinguish features that are ordinarily difficult to make out. A false color proof is basically a rendering of the document that is optimized for checking its tagging. A stylesheet that creates such a rendering is a close neighbor of the specialized stylesheets used with dedicated XML editors such as Corel XMetaL or Arbortext Adept. Those stylesheets aim to provide not “What you see is what you get” (WYSIWYG) but “What you see is what you need to see” (WYSIWYNTS). Since false color proofs can be designed for and rendered on paper as well as on screen, they are very suitable for a close, hands-on editorial review of document tagging. They also support the usual checks a copy editor makes.
A screenshot of a false color proof, produced by applying a stylesheet to a draft of this document, appears in Figure 1. False color proofs are particularly useful for data sets containing content-oriented, descriptive markup. When markup is presentation-oriented, the final output may be all the false-color proof you need.

A useful false color proof may, like this one, actually present glosses or flags that identify the types of tags appearing in the document. Here, the proof could alert a copy editor to several problems. One is a missing bio; another is possibly erroneous tagging, such as the <code> element containing the string “raw XML”. If the presentation is well designed, inspecting such a proof is easier than inspecting raw code.
Figure 1. A false color proof
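The following Python sketch (standard library only) illustrates the idea outside an XSLT processor. It renders a parsed fragment as HTML, giving a few chosen elements a false background color or a dotted outline, and labels each flagged element with its tag name. This is broadly similar to what the appendix stylesheet does. The color assignments and element lists are illustrative assumptions. In this sketch, unflagged elements simply pass their content through, whereas the XSLT version hands them to an imported display stylesheet.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical color choices, echoing a few used in the appendix stylesheet.
FALSE_COLORS = {"acronym": "lightsalmon", "expansion": "plum", "code": "silver"}
# Hypothetical set of elements to outline rather than color.
OUTLINED = {"city", "cntry", "email"}


def render(elem):
    """Return HTML for elem, flagging selected elements with their tag names."""
    # Rebuild content in document order: text, then each child and its tail.
    inner = html.escape(elem.text or "")
    for child in elem:
        inner += render(child) + html.escape(child.tail or "")
    label = f'<sup style="font-family: sans-serif">{elem.tag}</sup> '
    if elem.tag in FALSE_COLORS:
        color = FALSE_COLORS[elem.tag]
        return f'<span style="background-color:{color}">{label}{inner}</span>'
    if elem.tag in OUTLINED:
        return f'<span style="border: 1px dotted black">{label}{inner}</span>'
    # Unflagged elements pass their content through; a real display
    # stylesheet (the imported one, in the XSLT version) would format them.
    return inner


doc = ET.fromstring(
    "<p>Send <acronym>XML</acronym> to <email>ed@example.org</email>.</p>"
)
out = render(doc)
print(out)
```

Each flagged element is wrapped in a labeled span, so an editor can see at a glance what the tagging claims each string is.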
Particularly when document tagging is descriptive of data elements, rather than presentational, it is very important to get the tagging right. This can also be a difficult thing to check, particularly in cases where several kinds of data elements get similar kinds of rendering, or no rendering at all. For example, tag sets to describe bibliographic records commonly provide for elements to mark up cities, states, postal codes, and so forth. A false color proof is a big help, but sometimes one needs to go further. XSLT makes it quite simple to create filters that extract all the elements of a given type, for example listing all the things tagged as cities, all the things tagged as countries, e-mail addresses and so forth. Such a list can then be inspected, to reveal where Philadelphia is described as a country.
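At its core, a filter of this kind groups element content by tag name. The Python sketch below is an illustrative stand-in for such an XSLT filter, not code from this paper. The element names city and cntry come from the tag set used here; the sample records are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def values_by_tag(root, tags):
    """Map each tag in tags to the normalized text of its elements, in order."""
    found = defaultdict(list)
    for elem in root.iter():
        if elem.tag in tags:
            # itertext() includes descendant text, like XSLT's string(); the
            # split/join collapses white space, like normalize-space().
            found[elem.tag].append(" ".join("".join(elem.itertext()).split()))
    return dict(found)


doc = ET.fromstring(
    "<refs>"
    "<ref><city>Montréal</city><cntry>Canada</cntry></ref>"
    "<ref><city>Boston</city><cntry>Philadelphia</cntry></ref>"
    "</refs>"
)
listing = values_by_tag(doc, {"city", "cntry"})
for tag in sorted(listing):
    print(tag, sorted(set(listing[tag])))
```

Sorting and de-duplicating each list makes anomalies such as “Philadelphia” among the countries stand out.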
The example of such a filter provided here goes one step further than the simplest case. (See Figure 2.) It runs over a repository of documents (as it happens, papers for IDEAlliance conferences) and lists all the authors whose bio elements have no data content. (It may come as a shock to readers that the editorial staff of a conference proceedings commonly have to check whether all authors have submitted their required bios.) Since the output is HTML, a mailto: link can be generated wherever an author gives an email address. An editor can then go straight from this display to an email client to write to the delinquent party. After the list of authors with no bios, the bios that are present are listed for closer scrutiny of their length and content.

The XML input to the transform that generates this HTML page is just a listing of documents to be checked; the XSLT document() function can then retrieve the documents in question for checking.
In this demonstration, the documents polled included an early draft of this paper, plus several papers delivered at Extreme Markup Languages 2003. That conference, held in Montréal in August 2003, is a sister conference of XML 2003. Its papers are marked up in a tag set closely related to the one used for this conference, so the same filter can process both kinds of paper.
Figure 2. A filter running over a set of documents
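The bio filter's logic can also be sketched in Python. This is a rough analogue of the appendix stylesheet, not a replacement for it. The hypothetical load callable takes the place of XSLT's document() function, and the sample papers are invented. As in the stylesheet, an author whose bio is missing or contains only white space is reported, and both lists are sorted by surname.

```python
import xml.etree.ElementTree as ET


def check_bios(listing_xml, load):
    """Split authors into (no bio, has bio) lists, each sorted by surname.

    listing_xml -- a <dir><file>...</file></dir> listing, as in the appendix
    load        -- hypothetical callable returning a document's XML text for a
                   file name; it stands in for XSLT's document() function
    """
    names = [f.text.strip() for f in ET.fromstring(listing_xml).iter("file")]
    fixers, okays = [], []
    for name in names:
        for author in ET.fromstring(load(name)).iter("author"):
            bio = author.find("bio")
            # Like normalize-space(bio): empty if bio is missing or blank.
            text = "".join(bio.itertext()).strip() if bio is not None else ""
            (okays if text else fixers).append(author)

    def by_surname(author):
        return author.findtext("surname", default="")

    return sorted(fixers, key=by_surname), sorted(okays, key=by_surname)


papers = {
    "a.xml": "<paper><author><fname>Ann</fname><surname>Zed</surname>"
             "<address><email>ann@example.org</email></address>"
             "<bio> </bio></author></paper>",
    "b.xml": "<paper><author><fname>Bob</fname><surname>Ames</surname>"
             "<bio><para>Bob writes XSLT.</para></bio></author></paper>",
}
listing = "<dir><file>a.xml</file><file>b.xml</file></dir>"
fixers, okays = check_bios(listing, papers.__getitem__)
for author in fixers:
    print("No bio:", author.findtext("surname"),
          "mailto:" + author.findtext("address/email", default=""))
```

Passing a loader instead of reading files directly keeps the sketch testable. In practice, a function that opens the named file would serve.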
Other valuable filters might list all acronyms and report which are (or are not) provided with expansions. Such a filter could also check whether an expansion appears the first time a given acronym is used. Other filters might check metadata or “semi-structured” information such as bibliographies, or any tagging that is especially prone to abuse. Filters thus fall between the two other categories described in this paper. Like false color proofs, they simply present data in a form optimized for inspection. However, they also provide a framework in which logic for testing and inspection can be embedded, pulling them towards “soft validation”. The bio filter is an example: it separates empty bios from the rest. What recommends filters in practice is largely their combination of power with ease of development and use.
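As an illustration of the acronym filter, the Python sketch below reports acronyms whose first use carries no expansion. It assumes a markup convention purely for the example: an acronym counts as expanded when its parent element (here a hypothetical acronym.grp) also contains an expansion element. A real tag set may pair the two differently, and the test would need adjusting accordingly.

```python
import xml.etree.ElementTree as ET


def unexpanded_first_uses(root):
    """Return acronyms whose first use carries no expansion.

    Assumes (hypothetically) that an acronym is expanded when its parent
    element also contains an <expansion> child, e.g.
    <acronym.grp><acronym>CMS</acronym><expansion>...</expansion></acronym.grp>
    """
    # ElementTree has no parent axis, so build a child -> parent map first.
    parent = {child: p for p in root.iter() for child in p}
    seen, problems = set(), []
    for elem in root.iter("acronym"):  # document order
        name = " ".join("".join(elem.itertext()).split())
        if name in seen:
            continue  # only the first use of each acronym matters
        seen.add(name)
        p = parent.get(elem)
        if p is None or p.find("expansion") is None:
            problems.append(name)
    return problems


doc = ET.fromstring(
    "<article><p><acronym.grp><acronym>XML</acronym>"
    "<expansion>Extensible Markup Language</expansion></acronym.grp>"
    " and <acronym>CMS</acronym>, later <acronym>XML</acronym> again.</p>"
    "</article>"
)
print(unexpanded_first_uses(doc))  # ['CMS']
```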
I describe as “soft validation” a sub-species of filters that goes beyond simple presentation, to provide a kind of heuristic analysis of data elements, reporting any elements that fail to conform to certain constraints. Such a stylesheet might, for example, provide for “authority control” (in the librarian's sense), checking whether values that appear in the data also appear in a controlled list. For example, XML 2003 stipulates a list of values that are permitted to appear in the keyword elements in the metadata; a stylesheet could check whether keywords actually used in papers received were among the controlled set.
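A minimal sketch of such an authority check in Python follows. The controlled vocabulary shown is hypothetical; it is not the actual XML 2003 keyword list. The case-insensitive comparison is a design choice that an editorial team might well decide against.

```python
import xml.etree.ElementTree as ET


def uncontrolled_keywords(root, allowed):
    """Return keyword values (in document order) missing from allowed."""
    # Compare case-insensitively; whether that is appropriate is an
    # editorial decision, not something the paper specifies.
    allowed_norm = {" ".join(k.split()).lower() for k in allowed}
    report = []
    for kw in root.iter("keyword"):
        value = " ".join("".join(kw.itertext()).split())
        if value.lower() not in allowed_norm:
            report.append(value)
    return report


vocab = ["XSLT", "Validation", "Editing"]  # hypothetical controlled list
doc = ET.fromstring(
    "<front><keyword>XSLT</keyword><keyword>proofing</keyword>"
    "<keyword> validation </keyword></front>"
)
print(uncontrolled_keywords(doc, vocab))  # ['proofing']
```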
The example provided here checks code.block elements in XML 2003 papers for line length. Since code blocks are presented with white space preserved, line length is important; the guidelines for tagging papers for XML 2003 state that a maximum line length of 70 characters is safe. A stylesheet can be used to identify and report back with any code blocks that contain longer lines.
Ironically, this paper must itself conform to these rules, so the output of this transformation cannot be included here: any code block the stylesheet returns would break the 70-character rule for XML 2003 papers. The code that generates the output does, however, appear in Section 5.3, “Example 3: A reporter of code.block elements with long lines”. One feature of this stylesheet is that it returns the offending code.block elements in a form that can be edited and pasted back into the document over the erroneous versions.
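The appendix implements this check in XSLT 1.0 by recursion, but the same test is short in a language with richer string handling. The Python sketch below is an illustrative equivalent. It assumes code blocks are tagged code.block, as in XML 2003 papers. Like the stylesheet, it reports each offending block once, together with the first over-long line it finds.

```python
import xml.etree.ElementTree as ET

MAX_LENGTH = 70  # the XML 2003 guideline; adjustable, like $maxlength


def long_code_blocks(root, max_length=MAX_LENGTH):
    """Yield (code_block, first_long_line) for each offending code.block."""
    for block in root.iter("code.block"):
        text = "".join(block.itertext())
        for line in text.split("\n"):
            if len(line) > max_length:
                yield block, line
                break  # one report per block, as in the XSLT version


doc = ET.fromstring(
    "<paper><code.block>short\nlines</code.block>"
    "<code.block>ok\n" + "x" * 71 + "\nok</code.block></paper>"
)
hits = list(long_code_blocks(doc))
for block, line in hits:
    print(f"code.block with a {len(line)}-character line")
```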
Similarly, soft validation can be used to find things like:
Elements with no content (p elements used for whitespace!)
Misplaced elements such as footnotes in bibliographic entries (maybe your DTD isn't quite tight enough)
List items that do not begin with a capital letter
Date values that do not fall within a reasonable range. (Or: do death dates always follow birth dates?)
Whether cross-references appear and/or resolve properly. (Are all figures referred to with a cross-reference in the text?)
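Several of these checks can be prototyped in a few lines. The Python sketch below is illustrative only. Its element and attribute names are hypothetical stand-ins for whatever a real tag set uses: p, li, person (with birth and death attributes), xref (with rid), and id. The date comparison assumes ISO 8601 dates with four-digit years.

```python
import xml.etree.ElementTree as ET


def soft_checks(root):
    """Return warnings for a few heuristic (soft) constraints."""
    warnings = []
    for p in root.iter("p"):
        # No text and no child elements: probably a p used for spacing.
        if len(p) == 0 and not "".join(p.itertext()).strip():
            warnings.append("empty p element")
    for item in root.iter("li"):
        text = "".join(item.itertext()).lstrip()
        if text and not text[0].isupper():
            warnings.append(f"list item starts lowercase: {text[:20]!r}")
    for person in root.iter("person"):
        born, died = person.get("birth"), person.get("death")
        # ISO 8601 dates with four-digit years sort correctly as strings.
        if born and died and died < born:
            warnings.append(f"death {died} precedes birth {born}")
    ids = {e.get("id") for e in root.iter() if e.get("id")}
    for xref in root.iter("xref"):
        if xref.get("rid") not in ids:
            warnings.append(f"unresolved cross-reference: {xref.get('rid')}")
    return warnings


doc = ET.fromstring(
    "<doc><p> </p><ul><li>Fine item</li><li>bad item</li></ul>"
    '<person birth="1900-05-01" death="1890-01-01"/>'
    '<figure id="fig1"/><xref rid="fig1"/><xref rid="fig9"/></doc>'
)
ws = soft_checks(doc)
for w in ws:
    print(w)
```

Each check only flags a probable problem; deciding whether it is a real error is left to the editor, which is what makes this validation “soft”.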
This stylesheet is engineered as a “wrapper” stylesheet that may be used in conjunction with any display stylesheet for this data that creates HTML. Only elements to be marked with false colors or outlines require templates here; all other formatting is handled by the imported stylesheet.
The HTML output produced by this stylesheet is depicted in Figure 1.
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- The imported display stylesheet handles all ordinary formatting. -->
  <xsl:import href="../DTD/gca2html.xslt"/>
  <!-- Elements flagged with a false background color. -->
  <xsl:template match="acronym">
    <xsl:call-template name="false-color">
      <xsl:with-param name="bgcolor" select="'lightsalmon'"/>
    </xsl:call-template>
  </xsl:template>
  <xsl:template match="expansion">
    <xsl:call-template name="false-color">
      <xsl:with-param name="bgcolor" select="'plum'"/>
    </xsl:call-template>
  </xsl:template>
  <xsl:template match="big">
    <xsl:call-template name="false-color">
      <xsl:with-param name="bgcolor" select="'gold'"/>
    </xsl:call-template>
  </xsl:template>
  <xsl:template match="i">
    <xsl:call-template name="false-color">
      <xsl:with-param name="bgcolor" select="'khaki'"/>
    </xsl:call-template>
  </xsl:template>
  <xsl:template match="b">
    <xsl:call-template name="false-color">
      <xsl:with-param name="bgcolor" select="'rosybrown'"/>
    </xsl:call-template>
  </xsl:template>
  <xsl:template match="code">
    <xsl:call-template name="false-color">
      <xsl:with-param name="bgcolor" select="'silver'"/>
    </xsl:call-template>
  </xsl:template>
  <!-- Data elements flagged with a dotted outline instead. -->
  <xsl:template match="affil | subaffil | aline | city | cntry |
                       province | fname | surname | suffix |
                       jobtitle | email | web">
    <xsl:call-template name="outlined"/>
  </xsl:template>
  <!-- Labels the element with its name, then lets the imported stylesheet
       render it. In XSLT 1.0, xsl:apply-imports works here because
       xsl:call-template does not change the current template rule. -->
  <xsl:template name="false-color">
    <xsl:param name="bgcolor" select="'lightgrey'"/>
    <span style="background-color:{$bgcolor}">
      <xsl:text> </xsl:text>
      <span style="font-family: sans-serif;
                   font-size: smaller;
                   font-weight: bold;
                   font-style: normal;
                   color: midnightblue;
                   vertical-align: super">
        <xsl:value-of select="local-name()"/>
      </span>
      <xsl:text> </xsl:text>
      <xsl:apply-imports/>
    </span>
  </xsl:template>
  <!-- Same labeling, with a dotted border instead of a background. -->
  <xsl:template name="outlined">
    <span style="border: 1px dotted black; padding: 0em">
      <xsl:text> </xsl:text>
      <span style="font-family: sans-serif;
                   font-size: smaller;
                   font-weight: bold;
                   font-style: normal;
                   color: midnightblue;
                   vertical-align: super">
        <xsl:value-of select="local-name()"/>
      </span>
      <xsl:text> </xsl:text>
      <xsl:apply-imports/>
    </span>
  </xsl:template>
</xsl:stylesheet>
Since the bio element is required, a valid XML 2003 paper will always have one. This does not, however, guarantee that the bio element provided in a document actually contains a biography.
The stylesheet below operates on input that takes the form:
<?xml-stylesheet type="text/xsl" href="authorbios.xsl"?>
<dir>
  <file>EML2003Sperberg-McQueen02.xml</file>
  <file>EML2003StLaurent01.xml</file>
  <file>EML2003Tennison01.xml</file>
  <file>examples.gca.xml</file>
</dir>
Inspecting the stylesheet shows that most of it is given over to mundane HTML formatting of the output. The file aggregation and filtering are achieved by variable bindings and by the simple selection, for processing, of the node-sets bound to those variables.
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Load every document named by a file element in the input listing. -->
  <xsl:variable name="fileset" select="document(//file)"/>
  <!-- Authors whose bio is missing or contains only white space. -->
  <xsl:variable name="fixers"
      select="$fileset//author[not(normalize-space(bio))]"/>
  <!-- Authors whose bio has some content. -->
  <xsl:variable name="okays"
      select="$fileset//author[normalize-space(bio)]"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Bio listing</title>
      </head>
      <body>
        <h2>Documents polled:</h2>
        <ul>
          <xsl:apply-templates select="//file"/>
        </ul>
        <xsl:if test="$fixers">
          <div>
            <h1>No bio present for:</h1>
            <ul>
              <xsl:apply-templates select="$fixers" mode="fixme">
                <xsl:sort select="surname"/>
              </xsl:apply-templates>
            </ul>
          </div>
        </xsl:if>
        <xsl:if test="$okays">
          <div>
            <h1>Biographies collected for:</h1>
            <xsl:apply-templates select="$okays" mode="fine">
              <xsl:sort select="surname"/>
            </xsl:apply-templates>
          </div>
        </xsl:if>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="file">
    <li>
      <xsl:apply-templates/>
    </li>
  </xsl:template>
  <!-- One list item per author who still owes a bio. -->
  <xsl:template match="author" mode="fixme">
    <li>
      <xsl:apply-templates select="fname|surname"/>
      <xsl:apply-templates select="address/email"/>
    </li>
  </xsl:template>
  <!-- Heading plus the bio itself, for authors who supplied one. -->
  <xsl:template match="author" mode="fine">
    <div>
      <h3>
        <xsl:apply-templates select="fname|surname"/>
        <xsl:apply-templates select="address/email"/>
      </h3>
      <xsl:apply-templates select="bio"/>
    </div>
  </xsl:template>
  <!-- Separate a surname from a preceding first name with a space. -->
  <xsl:template match="fname|surname">
    <xsl:if test="preceding-sibling::fname">
      <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="para">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>
  <!-- Make each email address a mailto: link. -->
  <xsl:template match="email">
    <xsl:text> [</xsl:text>
    <a href="mailto:{.}">
      <xsl:apply-templates/>
    </a>
    <xsl:text>]</xsl:text>
  </xsl:template>
</xsl:stylesheet>
The HTML output produced by this stylesheet is depicted in Figure 2.
This stylesheet simply echoes back, in XML, any code.block elements in the source document that contain a line longer than 70 characters (the limit is set by the maxlength parameter). Since XSLT 1.0's string handling is only rudimentary, the template works “low to the ground”. It calls itself recursively to “chop” the string value of the code block down, one line at a time, at line breaks. If it finds a line longer than the limit, the template copies the code block to the result, where it can be corrected and pasted back into the source.
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" cdata-section-elements="code.block"/>
  <!-- Longest permitted line, in characters. -->
  <xsl:param name="maxlength" select="70"/>
  <xsl:template match="/">
    <xsl:apply-templates select="//code.block" mode="checkup"/>
  </xsl:template>
  <!-- Checks the first line of $string, then recurses on the remainder.
       The newline is written as &#xA; because a literal line break inside
       an attribute value is normalized to a space by the XML parser. -->
  <xsl:template match="code.block" mode="checkup" name="lengthcheck">
    <xsl:param name="string" select="string(.)"/>
    <xsl:variable name="thisline">
      <xsl:choose>
        <xsl:when test="contains($string, '&#xA;')">
          <xsl:value-of select="substring-before($string, '&#xA;')"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$string"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <xsl:variable name="found"
        select="string-length($thisline) > $maxlength"/>
    <!-- Too long: copy the whole code block to the output for repair. -->
    <xsl:if test="$found">
      <xsl:text>&#xA;&#xA;</xsl:text>
      <xsl:copy-of select="."/>
    </xsl:if>
    <!-- Otherwise, if more lines remain, check the rest of the string. -->
    <xsl:if test="contains($string, '&#xA;') and not($found)">
      <xsl:call-template name="lengthcheck">
        <xsl:with-param name="string"
            select="substring-after($string, '&#xA;')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
Thanks to the gang at Mulberry Technologies for helping me develop and refine these ideas, and in particular to Tommie Usdin, some of whose refinements I have simply borrowed.
[XSLT 1999] Clark, James, ed. XSL Transformations (XSLT). Version 1.0. W3C Recommendation 16 November 1999. On line at http://www.w3.org/TR/xslt
[Schematron 1999] Jelliffe, Rick. The Schematron: An XML Structure Validation Language using Patterns in Trees. On line at http://www.ascc.net/xml/resource/schematron/schematron.html
[Examplotron 2003] Van der Vlist, Eric, ed. Examplotron. 3rd February 2003. On line at http://examplotron.org/
[XPath 2003] Malhotra, Ashok, Jim Melton and Norman Walsh, eds. XQuery 1.0 and XPath 2.0 Functions and Operators. W3C Working Draft 02 May 2003. On line at http://www.w3.org/TR/xpath-functions/
[1] Having made clear that XSL includes “an XML vocabulary for specifying formatting” (XSL-FO), the XSLT Recommendation makes explicit that “XSLT is not intended as a completely general-purpose XML transformation language. Rather it is designed primarily for the kinds of transformations that are needed when XSLT is used as part of XSL”. [XSLT 1999], Abstract
[2] See [Kay 2001], pp. 14-16 and 36-41.
[3] The next version of the language, currently in draft, will have considerably stronger capabilities for up-conversion than the current standard. See especially [XPath 2003].
[4] For example, XPath-based validation technologies such as Schematron [Schematron 1999] or Examplotron [Examplotron 2003] can be put to work to provide some of the same kinds of operations that these stylesheets demonstrate.
[5] Using the XSLT document() function, as demonstrated in the example.