Consistent Electronic Publishing from Inconsistent Sources

Keywords: Application architecture, Content Management, Content Repurposing, Conversion, DocBook, Electronic Publishing, Integration, Java, Knowledge Management, Ontology, PDF, Publishing, SVG, XML, XSL-FO, XSLT

Dr. Philip Mansfield
President
SchemaSoft
Vancouver
British Columbia
Canada
philipm@schemasoft.com

Biography

After receiving his Ph.D. in Mathematical Physics from Yale University in 1989, Philip spent a year as Assistant Professor of Physics at Knox College, followed by four years as Assistant Professor of Mathematics at the University of Toronto. His background in Differential Geometry and in computer modelling of physical phenomena served as unorthodox preparation for his subsequent move into industry as a Software Engineer with an emphasis on Computer Graphics. By 1997 Philip was in charge of a software research team creating early Web technologies based on HTML, XML, CSS and Java. Philip now lives and works in Vancouver, Canada, where he is President of SchemaSoft (http://www.schemasoft.com/), a software development consulting company he co-founded in 1999. He is an Advisory Committee Representative of the World Wide Web Consortium (http://www.w3.org/), and has been a member of the W3C Scalable Vector Graphics Working Group (http://www.w3.org/Graphics/SVG/) since its inception in 1998. Philip is Chair of the BC Advanced Systems Institute International Scientific Advisory Board (http://www.asi.bc.ca/). He is also a Director of the Vancouver XML Developers Association (http://www.vanx.org/), an organization that he co-founded in 2000. He regularly writes and lectures on topics related to software engineering, XML and SVG.

Dr. Yuri Khramov
Director of Development
SchemaSoft
Vancouver
British Columbia
Canada
yurik@schemasoft.com

Biography

Yuri Khramov has more than 20 years of experience in the software industry, and has been involved in XML and other Web technologies for more than 5 years. He is one of the founding partners of SchemaSoft. Prior to that, he worked at Paradigm Development Corp. in Vancouver, Canada, at Graphica Corp. in Tokyo, and at several industrial and academic institutions in Moscow. He holds a Ph.D. in Computer Science from Moscow Management Institute. Yuri is a co-director of the Vancouver XML Developers Association.

Ahmet Gurcan
Senior Developer
SchemaSoft
Vancouver
British Columbia
Canada
ahmetg@schemasoft.com

Biography

Ahmet Gurcan completed his B.Sc. in Electrical and Computer Engineering at Istanbul Technical University in 1997, and his M.A.Sc. in the same field at the University of British Columbia in 2000. While pursuing his Master's degree, he also worked as a research and teaching assistant. His main interest at that time was real-time operating systems, and he built a dual-processor real-time system mainly used to control machines and robots. Since graduating, he has worked for SchemaSoft, a Vancouver-based software company and a leader in file converters and XML technologies. He has worked on several projects involving manipulation of the PDF file format, its integration with XML, and its conversion to SVG.


Abstract


Widespread, consistent use of XML to encode documents throughout an organization has well-understood advantages: content can be re-purposed, re-styled, searched, combined, transformed, rendered or otherwise processed with ease, and pre-existing software can be highly leveraged when providing a solution.

However, in the real world people do not consistently adhere to imposed standards on the use of software, and consistent software is not installed throughout large organizations. It is unrealistic to expect that an organization's documents will be widely accessible as XML. To make matters worse, many standard tools for conversion to XML operate at inconsistent semantic levels, or encode an inappropriate semantic level. To illustrate this point, one could easily convert all electronic documents to XML with a common schema by opening the documents in their originating applications, taking screen shots, and encoding these bitmaps using a long list of <pixel> elements. However, this encoding would be useless for a content management system, or for anything other than re-rendering the same view of the same documents on the same device, for that matter.

We address the problem of large-scale conversion of heterogeneous content to XML, with an emphasis on the conversions needed to produce useful and compatible semantic representations from source files in various formats. We then discuss the applicability of these ideas to content management systems and electronic publication workflows.

An improved architecture for content management systems arises from the discussion. A wide variety of content management applications can be achieved simply by building appropriate pipelines of converters to, from and between XML languages. This is a lightweight, flexible approach that does not depend on a proprietary content management server because it leverages XML-processing functionality already included in operating systems and Web servers. To make this approach practical, one needs a large and varied toolkit of converters as well as a standard pipeline architecture with which to connect them. We will provide a demonstration of such a rich toolkit that takes advantage of Apache Cocoon as the pipeline architecture. As an example, we will show how our toolkit was applied to the paper publication workflows of this conference.

Cocoon pipeline definitions are created in XML "sitemap" files, allowing new processes to be configured without writing code. In order to determine what converters are needed to make useful pipelines, it is helpful to regard each standard document-processing function as a conversion. In particular,

• retrieval of content from a standard desktop application file is a binary to XML conversion

• reconstruction of semantics implied by styling is possible with a profile-driven XML to XML conversion

• data-driven graphs, charts and maps are made possible by XML to SVG conversions

• documents can be prepared for print via XML to XSL-FO to PDF conversions

Similarly, database query, Web services, content aggregation, search, classification, Web publishing and e-book publishing can all be regarded as arising from specific converters in the content management toolkit. When effectively combined, such converters can produce powerful and sophisticated solutions to real-world problems.


Table of Contents


1. XML in Content Management and Publishing
2. Single-format Fantasy
3. Up-Conversion
4. Multi-format Solutions
     4.1 Requirements
     4.2 Use Cases
          4.2.1 Data-Driven Graphics
          4.2.2 Online Newspapers
          4.2.3 Conference Proceedings
     4.3 Architecture
          4.3.1 XML Pipelines
          4.3.2 Conversion Components
          4.3.3 Processing Phases
               4.3.3.1 Extract Patterns
               4.3.3.2 Synthesize Patterns
               4.3.3.3 Publish Patterns
5. Results
     5.1 Data-Driven Graphics
     5.2 Online Newspapers
     5.3 Conference Proceedings
Appendix 1. Indexing XML Content
Acknowledgements
Bibliography
Footnotes

1. XML in Content Management and Publishing

There are many reasons to use XML in content management and publishing, among them the following:

Advantages of XML

Since XML is basically just a set of conventions for encoding information, you might wonder how it can offer so many advantages. This has much more to do with the large community of users than with the specific choice of syntax with which to encode information. Along with a large community of users comes a large amount of software that follows the XML information-encoding conventions, and this is the basis for many of the listed advantages of XML.

Two of the traits that have allowed XML to achieve a large community of users are:

Ironically, these same two traits determine limits on the utility of XML-processing software:

The current paper addresses the problem of overcoming these limitations in XML-based content management and publishing systems.

2. Single-format Fantasy

A naïve information systems manager, enamoured with the advantages of XML, might think:

If I can just get my whole organization using the same XML-based authoring tools in the same way, then I will be able to build the ultimate enterprise-wide content management solution.

However, it is difficult to find any examples of success with this approach. The problem is, you can overhaul software installations but you cannot overhaul people.

Every division of a large corporation tends to act autonomously, making its own technological choices on its own schedule. Even within a division, there is the problem of getting employees to act in unison by following rigid authoring guidelines. People do not always read instructions, let alone follow them. And if software is really supposed to increase their productivity, then it should make things easier for them, not harder.

Furthermore, native authoring tool formats continue to be binary, not XML. This is because tool vendors are motivated to preserve market share by making it difficult to migrate, and proprietary binary formats tie users to the tool that knows how to read and write them.

Finally, there is the problem of rapidly advancing software technology. By the time an enterprise has implemented a technology overhaul, there is already something better to replace it. Forward-thinking solutions should anticipate this, in part by accommodating the future inclusion of as-yet-unknown file formats.

3. Up-Conversion

In addition to the problem of dealing with many source formats, there are often difficulties with the information being represented by these source formats. Popular document formats do not always encode structure at a useful semantic level. For example, an HR department wants name, address, past jobs, degrees, publications, etc. from résumés. Yet these categories of text are indistinguishable in the submitted word-processing documents. Likewise, spreadsheet document formats may encode row and column information, but not categories like cost, revenue, assets, date and company name for a quarterly financial report; or categories like transportation, lodging and per-diem for a travel expense report. PDF documents contain instructions for drawing absolutely-positioned text and figures, but do not specify what collection of text and figures makes up a single article in a magazine; what is a title, subtitle, author, side note, glossary term, vignette or reference; or what collections constitute an advertisement, editorial or table of contents.

Up-conversion is needed. This is independent of binary to XML conversion. Binary to XML conversion is typically just a change of syntax, in which binary-encoded objects become XML elements, attributes and text. However, up-conversion is a re-construction of semantic structure, in which content is tagged with higher-level semantic categories than were available in the original markup. Automatic up-conversion is often feasible within a given collection of documents (such as résumés, financial reports, expense reports or magazines in the above examples), but since there are so many different kinds of document collections, it is important to have a general way of profiling a given collection of documents, or encoding the rules for up-converting that collection.

4. Multi-format Solutions

4.1 Requirements

To solve the problems discussed in the foregoing sections, it will be necessary to meet these high-level requirements:

Requirements of an effective content management solution
  1. It should not assume anything about source formats.
  2. It should be highly flexible, adaptable and re-configurable over time.
  3. It should have strong up-conversion capabilities.

4.2 Use Cases

Here are some use cases to bear in mind while coming up with a general architecture for content management. All are projects recently completed by SchemaSoft.

4.2.1 Data-Driven Graphics

A business reporting system requires graphs and charts to be drawn on the fly from current data. Data is variously available as Microsoft Excel files, database tables and XML. The structure or schema of the data is also variable within each of those formats. It has to be possible to quickly define and hook up a new data source without additional programming or modifications to the source code.

4.2.2 Online Newspapers

Articles from newspapers around the world are to be automatically published as HTML pages. Newspapers are available in PDF format. Sections, articles, article continuations across pages, titles, bylines, figure captions, advertisements, etc. are to be identified in the source PDF. Each PDF file is to give rise to many HTML files, one for each article. The PDF file is to be augmented with hyperlinks to the HTML page corresponding to each article. The articles are to be indexed by title, author, section, etc.

4.2.3 Conference Proceedings

The XML 2004 conference papers are to be published as HTML, PDF and SVG. Source documents are Microsoft Word files and DocBook [DB] XML files with links to SVG, PNG, GIF, JPEG and BMP files. HTML index pages are to be constructed listing the papers by author, city, country, keyword, organization, time, title and track. Author biography pages, paper abstract pages and other conference information pages are to be derived from data. Cross-reference hyperlinks are to be constructed wherever applicable — for example, from index entries to paper abstract pages; from paper abstract pages to papers, author biographies and companies; from author biographies to companies and abstracts of papers written by the author; etc.

4.3 Architecture

Our approach is to assemble content management solutions from a toolkit of useful components, rather than by configuring a monolithic application. As long as component APIs permit virtual plug-and-play, this approach is inherently more flexible, and better able to accommodate frequent change in data sources, formats, schema, content management functions, publishing targets, and publication styling.

A pipeline architecture is used to manage data flow and order of execution of components. Pipeline definitions determine how the output stream(s) of one component feed the input stream(s) of other components. XML is normally passed between components, although the schema of each XML stream is dependent on the nature of the components that pass it.

The primary function of a component is to convert data from one form to another. For example, a binary document might be parsed and mapped to corresponding XML, an XML data set might be sorted without changing the schema, or an XML document might be up-converted to a schema with more specialized structure.
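The data flow described above can be sketched in a few lines. The following is only an illustration in Python, not the toolkit used in this paper: the function names (`read_records`, `sort_records`, `write_xml`, `run_pipeline`) and the comma-separated input are assumptions made for the example. It shows one component mapping a non-XML source to XML, one sorting the data set without changing its schema, and one writing the result out.

```python
import xml.etree.ElementTree as ET

def read_records(text):
    """Map a non-XML source ('name, amount' lines; a made-up format) to XML."""
    root = ET.Element("records")
    for line in text.strip().splitlines():
        name, amount = line.split(",")
        ET.SubElement(root, "record", name=name.strip(), amount=amount.strip())
    return root

def sort_records(root):
    """Sort the data set by name; the schema of the XML is unchanged."""
    root[:] = sorted(root, key=lambda record: record.get("name"))
    return root

def write_xml(root):
    """Write the XML tree out as a string."""
    return ET.tostring(root, encoding="unicode")

def run_pipeline(source, reader, transformers, writer):
    """Feed the output of each component into the input of the next."""
    tree = reader(source)
    for transform in transformers:
        tree = transform(tree)
    return writer(tree)

output = run_pipeline("pears, 3\napples, 7", read_records, [sort_records], write_xml)
print(output)
```

Because every stage consumes and produces an XML tree, stages can be added, removed or reordered by editing the list passed to `run_pipeline`, which is the property the pipeline architecture relies on.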

4.3.1 XML Pipelines

The Apache Cocoon project provides one possible framework for pipelining conversion components. Cocoon is specially tailored for Web publishing, since the pipeline implementation is integrated with the Web server. Specifically, Cocoon is implemented as a Java servlet, and is typically deployed in a servlet container such as Apache Tomcat.

Pipelines are defined in sitemap files, which are written in an XML grammar. A useful feature of Cocoon is that pipelines can be triggered by URL wildcards. For example, one can make a rule that if the URL requested by a client Web browser ends in .doc, then that document is sent through a pipeline that first converts it from Microsoft Word to XML, then styles it using XSLT to HTML+CSS. Much more complex pipelines are also possible, such as ones that depend on user profile, session information, or Web service calls.
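The idea of selecting a pipeline from the requested URL can be illustrated outside Cocoon. The sketch below is a Python analogy only: the `PIPELINES` table and the step names are hypothetical, and the shell-style matching of the standard `fnmatch` module is not claimed to reproduce the semantics of Cocoon's own matchers or the syntax of a sitemap file.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: (URL pattern, ordered names of conversion steps).
PIPELINES = [
    ("*.doc", ["word-to-xml", "xslt-to-html"]),
    ("*.xml", ["xslt-to-html"]),
]

def select_pipeline(url_path):
    """Return the steps of the first pipeline whose pattern matches, else None."""
    for pattern, steps in PIPELINES:
        if fnmatchcase(url_path, pattern):
            return steps
    return None

print(select_pipeline("papers/report.doc"))
print(select_pipeline("logo.png"))
```

The first matching entry wins, so more specific patterns would be listed before more general ones in such a table.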

Components are classified into those that can generate, transform or serialize XML. In data conversion terminology, this means any-to-XML, XML-to-XML, and XML-to-any conversions, respectively. In general, Cocoon components are defined in Java classes, but they may take parameters that utilize other languages. Of particular interest is the XSLT transformer, since it can take an XSLT stylesheet as a parameter. When designing a pipeline, one's strategy is usually to convert incoming documents and data to XML at the first possible opportunity, and to do XML to XML transformations thereafter. The XML to XML transformations are normally done with the XSLT transformer.

NOTE: A Cocoon-based content management system called Lenya is also available from the Apache Software Foundation. Although we used Cocoon for the pipelines in one of the examples of this paper, we did not use Lenya.

Various other pipeline technologies are possible, each appropriate for a different kind of application. A lightweight but platform-specific approach is to use batch files. A developer-centric approach is to use Ant build files, which are XML files that encode build instructions for each target. Since pipeline definitions can themselves be produced by running part of the pipeline, it is useful to choose an XML syntax.

4.3.2 Conversion Components

To meet the objective of supporting a range of formats and schemas, it is good to start with a collection of ready-made readers, writers and translators of popular file formats. To handle a range of intermediate processing tasks, one needs a collection of utility XSLT programs as well. An example of such a utility XSLT program is given in Appendix 1. However, the real power comes from being able to rapidly produce new components that handle new formats or new intermediate processing tasks, in order to deal with typical scenarios in which technology and industry requirements change often. This requires rapid application development kits tuned to the problems of format translation and XSLT development. Such RAD kits have been presented in previous papers [Trans] and [Cat], and have been used to implement the content management solutions discussed herein.

XSL (XSLT + XSL-FO) is an effective language for specifying page layout in print or e-book publishing solutions. This is the standard use of XSLT, and can be implemented by connecting an XSLT translator component to a serializer component that does the formatting, with an appropriate target serialization such as Adobe PDF or IBM AFP.

However, XSLT translators are capable of much more: they can generate arbitrary data visualizations. In a previous paper [GS], we introduced the notion of Graphical Stylesheets: XSLT programs that draw data as SVG. More recently [SVG-XAML] we discussed strategies to target multiple vector graphic output formats, including Microsoft's XAML, from the same Graphical Stylesheets. Starting with XML formats as varied as MathML (Mathematical Markup Language), XBRL (eXtensible Business Reporting Language), GML (Geography Markup Language) or X3D (eXtensible 3D), we have used Graphical Stylesheets to render diagrams of the data. Specific examples are elaborated in [DWGraphs], [SVGMaps] and [3D-SVG].
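To make the data-to-SVG idea concrete, here is a minimal sketch written in Python rather than XSLT, so it is an analogy to a Graphical Stylesheet and not one of the stylesheets from the cited papers. The input vocabulary (`<data>` containing `<value>` elements), the function name `bar_chart` and the layout parameters are assumptions for illustration, and the input is assumed to contain at least one positive value.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def bar_chart(data_xml, bar_width=40, gap=10, height=100):
    """Draw one <rect> per <value>, scaled so the largest value fills the height."""
    values = [float(v.text) for v in ET.fromstring(data_xml).iter("value")]
    peak = max(values)  # assumes a non-empty list of positive values
    width = len(values) * (bar_width + gap)
    svg = ET.Element("svg", xmlns=SVG_NS, width=str(width), height=str(height))
    for i, value in enumerate(values):
        bar = height * value / peak
        ET.SubElement(
            svg, "rect",
            x=str(i * (bar_width + gap)),
            y=str(height - bar),  # SVG's y axis points down, so bars hang from the top
            width=str(bar_width),
            height=str(bar),
        )
    return ET.tostring(svg, encoding="unicode")

print(bar_chart("<data><value>10</value><value>20</value></data>"))
```

The drawing logic depends only on the shape of the input XML, so the same function can be re-run whenever the data changes, which is the sense in which the graphics are data-driven.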

4.3.3 Processing Phases

In a typical content management solution, data flow pipelines can be roughly divided into the following phases of processing:

• Extract: import all documents and data to XML

• Synthesize: up-convert and transform to a more useful form of XML

• Publish: export for the Web or print

The next three sections will show the most common patterns that occur in data flow diagrams at each of these phases. For this purpose, we will use the symbology shown in Table 1. The full data flow diagram for any particular content management solution will normally contain many of these patterns.

Legend for Data Flow Diagrams

Data Flow
  RunFlow.png: run-time data flow
  DesignFlow.png: design-time data flow
  Compilation.png: design-time compilation

Content Sources
  BinaryDoc.png: binary format document
  XMLDoc.png: XML format document
  Database.png: database

Translation Components
  Generator.png: XML generator
  Transformer.png: XML transformer
  Serializer.png: XML serializer
  BatchProcess.png: batch process

Table 1

4.3.3.1 Extract Patterns

Information is often available in binary format documents or database tables, and must be converted to XML first in order to participate in the XML conversion pipeline. Extract Pattern #1 and Extract Pattern #2 are the simple patterns in which a generator component extracts the information.

ExtractPattern1.png

Binary format document is converted to XML by a generator component.

Figure 1: Extract Pattern #1

ExtractPattern2.png

XML is generated from a database by a generator that takes a query, such as an XQuery expression, as a parameter.

Figure 2: Extract Pattern #2
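A generator of the kind shown in Extract Pattern #2 might look roughly like the following Python sketch. It is an assumption-laden illustration, not the database component used in our solutions: it takes an SQL query rather than XQuery, uses an in-memory SQLite database, and the names `generate_from_query`, `rows` and `row` are hypothetical. Column names are assumed to be valid XML element names.

```python
import sqlite3
import xml.etree.ElementTree as ET

def generate_from_query(connection, query, row_tag="row"):
    """Run the query and emit one element per row, one child per column."""
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    root = ET.Element("rows")
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for column, value in zip(columns, record):
            # Column names are assumed to be usable as element names.
            ET.SubElement(row, column).text = "" if value is None else str(value)
    return root

# Small in-memory table standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("West", 95), ("East", 120)])

xml_rows = generate_from_query(conn, "SELECT region, revenue FROM sales ORDER BY region")
print(ET.tostring(xml_rows, encoding="unicode"))
```

Since the query is a parameter, hooking up a new table or view requires a new query string and no change to the generator itself.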

4.3.3.2 Synthesize Patterns

As discussed in Section 3, up-conversion is a crucial step in recovering the valuable information needed to drive content processing pipelines. Automating this process requires programmatic reconstruction of semantics from styling. This is possible for content created from a common style template, as discussed in [PDF2XML]. The rules that associate style with semantics are encoded in an XML file called a profile, and semantic reconstruction is done by a transformer that reads in both the input XML stream and the profile. This is Synthesize Pattern #1. Another application of semantic reconstruction is reported in [TestSpec].

SynthesizePattern1.png

XML is up-converted using a transformer that takes a profile as parameter.

Figure 3: Synthesize Pattern #1
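The following Python sketch illustrates the principle of Synthesize Pattern #1 under simplifying assumptions. The profile format shown here (a list of `<rule>` elements whose `tag` attribute names the semantic element and whose other attributes are styling conditions) is invented for this example and is not the profile schema of [PDF2XML]; likewise the styled input of `<span>` elements and the function name `up_convert` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile: styling conditions on the left, semantic tag on the right.
PROFILE = """<profile>
  <rule tag="title" font-size="24" font-weight="bold"/>
  <rule tag="byline" font-size="10" font-style="italic"/>
</profile>"""

def up_convert(styled_xml, profile_xml):
    """Rename each styled <span> to the tag of the first profile rule it satisfies."""
    rules = []
    for rule in ET.fromstring(profile_xml).iter("rule"):
        conditions = {k: v for k, v in rule.attrib.items() if k != "tag"}
        rules.append((rule.get("tag"), conditions))
    root = ET.fromstring(styled_xml)
    for span in list(root.iter("span")):
        for tag, conditions in rules:
            if all(span.get(k) == v for k, v in conditions.items()):
                span.tag = tag          # semantic category replaces generic markup
                span.attrib.clear()     # styling is no longer needed once tagged
                break                   # spans matching no rule are left untouched
    return root

styled = (
    '<page>'
    '<span font-size="24" font-weight="bold">Storm Hits Coast</span>'
    '<span font-size="10" font-style="italic">By A. Writer</span>'
    '<span font-size="10">Body text</span>'
    '</page>'
)
print(ET.tostring(up_convert(styled, PROFILE), encoding="unicode"))
```

Keeping the rules in a separate profile means a new publication with a different style template needs a new profile, not a new transformer.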

Since XSLT is frequently needed for translation components, it is useful to have a GUI application for rapidly specifying and generating the XSLT. An example of such an application is Catwalk, as described in [Cat]. Catwalk has been deployed successfully to generate Graphical Stylesheet transformations, B2B transformations, and HTML reports. Synthesize Pattern #2 is the pattern for a generated XSLT transformer.

SynthesizePattern2.png

A generator component generates XSLT used by a transformer component. The generator reads in sample input XML files used at design time to specify the mapping.

Figure 4: Synthesize Pattern #2

XSLT is neither suitable nor efficient for transforming the DOM generated by typical binary format readers. Nonetheless, it is possible to rapidly develop C++ transformers by compiling a translation specification as described in [Trans]. The translation specification is XML adhering to the schema translation.xsd of that paper. Synthesize Pattern #3 shows an XML translation specification compiled into a transformer.

SynthesizePattern3.png

Compile an XML translation specification into a transformer.

Figure 5: Synthesize Pattern #3

Often the pipeline itself can be determined from data. An example is a batch process in which the URLs of the files to be processed are available as data. Synthesize Pattern #4 consists of a transformer generating a pipeline definition that converts a given collection of XML documents to a binary format.

SynthesizePattern4.png

Transformer output is a batch process description.

Figure 6: Synthesize Pattern #4
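As a rough illustration of Synthesize Pattern #4, the Python sketch below turns a list of source URLs into a batch process description. Both vocabularies are hypothetical: the `<files>`/`<file url="...">` input and the `<batch>`/`<convert src="..." dest="...">` output were made up for this example and do not correspond to Ant, Cocoon or any other real pipeline syntax. Every URL is assumed to end in a file extension.

```python
import xml.etree.ElementTree as ET

def make_batch(file_list_xml, target_ext=".pdf"):
    """Emit one <convert> step per listed file, swapping the extension for the target."""
    batch = ET.Element("batch")
    for entry in ET.fromstring(file_list_xml).iter("file"):
        src = entry.get("url")
        stem = src.rsplit(".", 1)[0]  # assumes the URL ends in a file extension
        ET.SubElement(batch, "convert", src=src, dest=stem + target_ext)
    return batch

file_list = '<files><file url="papers/a.xml"/><file url="papers/b.xml"/></files>'
print(ET.tostring(make_batch(file_list), encoding="unicode"))
```

Because the batch description is itself XML, it can be produced by one stage of a pipeline and consumed by another.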

4.3.3.3 Publish Patterns

Once a document has been assembled and styled, it can be published in print form or for the Web. Publish Pattern #1 shows the output of an XSLT stylesheet being passed to an XSL formatter to produce PDF for print, and Publish Pattern #2 shows the output of another XSLT stylesheet being serialized as an XHTML or SVG file for the Web.

PublishPattern1.png

XSLT transformer generates XSL-FO which is formatted to PDF by a serializer.

Figure 7: Publish Pattern #1

PublishPattern2.png

XSLT transformer generates XHTML or SVG which is serialized to file.

Figure 8: Publish Pattern #2

5. Results

Below are Web references to the results of each of the three use cases introduced in Section 4.2.

5.1 Data-Driven Graphics

Our Catwalk application was used to generate the XSLT transformers from XML data to SVG graphs. Using Cocoon, we were able to extract this XML data from our Microsoft Excel generator component and a database query component. Extract Pattern #1, Extract Pattern #2, Synthesize Pattern #2 and Publish Pattern #2 were utilized.

5.2 Online Newspapers

The online newspaper publishing system is available in modified form from NewspaperDirect. The key challenge of this project was to perform profile-driven up-conversion on newspaper documents available as PDF. Thus, the solution makes critical use of Synthesize Pattern #1.

5.3 Conference Proceedings

The initial phases of content conversion for the XML 2004 conference are done by authoring tools made available to conference paper authors. For example, SchemaSoft provides a free Word to DocBook Converter Web service that extracts content from the Microsoft Word .doc binary format and synthesizes DocBook XML for conference submission. There are many other synthesize and publish steps leading to the indexed proceedings, such as the HTML paper publishing step achieved with our DocBook Styler XSLT.

The final result can be viewed at the IDEAlliance XML 2004 Proceedings site, including this paper in XML, XHTML, PDF and SVG forms.

Appendix 1. Indexing XML Content

Suppose you are constructing a typical index for a book. You would assign index terms to items such as pages or sections, and then list the terms at the end, in alphabetical order. Each term would be followed by a list of references to the places in which it occurred, such as page numbers. In the more general problem of indexing, the items can be anything (plant inventory, train trips, mayors, etc.) and the terms can be any properties of those items (available colours, departure times, hobbies, etc.). In the use case of Section 4.2.3, the items were papers and there were eight indices, with the terms author, city, country, keyword, organization, time, title and track.

Example 1 is a minimal DTD illustrating this idea. The XML contains an items element with item children, and an index element with term children. Each item element has a unique id attribute as well as any number of child elements that refer to the id attributes of its associated terms. Likewise, each term element has a unique id attribute as well as any number of child elements that refer back to the id attributes of its associated items.

<!ELEMENT crossref (items, index)>
<!-- items with reference to index terms -->
<!ELEMENT items (item*)>
<!ELEMENT item (termref*, content)>
<!ATTLIST item
  id ID #REQUIRED
>
<!ELEMENT termref  EMPTY>
<!ATTLIST termref
  ref IDREF #REQUIRED
>
<!ELEMENT content (#PCDATA)>
<!-- index terms cross-referenced to items -->
<!ELEMENT index (term*)>
<!ELEMENT term (itemref*)>
<!ATTLIST term
  id ID #REQUIRED
  name CDATA #REQUIRED
>
<!ELEMENT itemref EMPTY>
<!ATTLIST itemref
  ref IDREF #REQUIRED
>
      

Example 1: Cross-reference DTD

We will discuss the problem of writing XSLT to construct such a cross-referenced index from raw data. For our actual solutions, we have generalized the software so that it can handle multiple indices on input with arbitrary DTD, by parameterizing the XPaths used to fetch things like items and terms. However, for illustration purposes we will assume one index and the fixed DTD given.

The steps are to read in the raw data, assign IDs to items and every instance of a term in an item, separate the terms into an index table, construct the references and cross-references, sort the terms (which brings duplicate terms next to each other), and finally remove duplicate terms. These are multiple steps in a pipeline, and to keep things simple, we will restrict our attention to the last step only.

When eliminating duplicate terms, each termref IDREF has to be fixed up to point to the single term element that remains, and the itemref children of all duplicate terms have to be combined as children of the single term element that remains. Example 2 is the XSLT that eliminates duplicate terms according to this prescription.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- This stylesheet removes duplicate terms in the index, combines the
       item references of duplicate terms, and resolves each IDREF to the
       ID of the corresponding retained term -->
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Start by copying the existing content -->
  <xsl:template match="/">
    <xsl:apply-templates mode="copy"/>
  </xsl:template>
  <xsl:template match="node()|@*" mode="copy">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" mode="copy"/>
    </xsl:copy>
  </xsl:template>
  <!-- Replace each IDREF with the unique retained target term ID -->
  <xsl:template match="item/termref" mode="copy">
    <xsl:copy>
      <xsl:attribute name="ref">
        <xsl:apply-templates select="id(@ref)" mode="lookup"/>
      </xsl:attribute>
    </xsl:copy>
  </xsl:template>
  <!-- Look up the ID of the first term among duplicates -->
  <xsl:template match="term" mode="lookup">
    <xsl:if test="not(@name=preceding-sibling::term[1]/@name)">
      <xsl:value-of select="@id"/>
    </xsl:if>
    <xsl:apply-templates mode="lookup"
      select="preceding-sibling::term[@name=current()/@name][last()]"/>
  </xsl:template>
  <!-- Retain only the first term among duplicates -->
  <xsl:template match="term" mode="copy">
    <xsl:if test="not(@name=preceding-sibling::term[1]/@name)">
      <xsl:copy>
        <xsl:apply-templates select="node()|@*" mode="copy"/>
        <xsl:apply-templates mode="combine"
          select="following-sibling::term[@name=current()/@name]"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
  <!-- Combine the content of duplicates, including item back references -->
  <xsl:template match="term" mode="combine">
    <xsl:apply-templates mode="copy"
      select="itemref[not(@ref=../preceding-sibling::*[1]/itemref/@ref)]"/>
  </xsl:template>
</xsl:stylesheet>
      

Example 2: XSLT to Make Index Terms Unique
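For readers who want to check the expected result of Example 2 without an XSLT processor, the same prescription can be sketched in Python. This is a cross-check under stated assumptions, not a port of the stylesheet: it uses a dictionary keyed on the term name, so unlike the XSLT it does not rely on duplicate terms being adjacent, and it drops every repeated item reference rather than only those repeated in the immediately preceding term. The function name `merge_duplicate_terms` is hypothetical, and the input is assumed to be valid against the DTD of Example 1.

```python
import xml.etree.ElementTree as ET

def merge_duplicate_terms(crossref):
    """Keep the first <term> of each name, merge itemrefs, and fix up termrefs."""
    index = crossref.find("index")
    kept = {}      # term name -> retained <term> element
    redirect = {}  # every original term id -> id of the retained term
    for term in list(index):
        first = kept.setdefault(term.get("name"), term)
        redirect[term.get("id")] = first.get("id")
        if first is not term:
            seen = {ref.get("ref") for ref in first}
            for itemref in term:
                if itemref.get("ref") not in seen:
                    first.append(itemref)
                    seen.add(itemref.get("ref"))
            index.remove(term)
    # Point every item's termref at the single retained term.
    for termref in crossref.iter("termref"):
        termref.set("ref", redirect[termref.get("ref")])
    return crossref

sample = ET.fromstring(
    '<crossref><items>'
    '<item id="i1"><termref ref="t1"/><content>A</content></item>'
    '<item id="i2"><termref ref="t2"/><content>B</content></item>'
    '</items><index>'
    '<term id="t1" name="XML"><itemref ref="i1"/></term>'
    '<term id="t2" name="XML"><itemref ref="i2"/></term>'
    '</index></crossref>'
)
print(ET.tostring(merge_duplicate_terms(sample), encoding="unicode"))
```

In the sample, the two terms named "XML" collapse into the term with ID t1, which ends up with item references to both i1 and i2, and both items refer to t1.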

Acknowledgements

The authors would like to thank conference organizer Lauren Wood for giving them the opportunity to apply the methods outlined in this paper to the problem of creating the XML 2004 conference proceedings, and for assisting in this procedure by pre-processing the data used as input.

Bibliography

[3D-SVG]
Adding Another Dimension to Scalable Vector Graphics, P. A. Mansfield, C. B. Otkunc, XML 2003 Conference Paper, 9 December 2003. Available at http://www.idealliance.org/papers/dx_xml03/papers/03-02-04/03-02-04.html.
[Cat]
Catwalk, a RAD Tool for Dynamic SVG-Generating Web Applications, P. A. Mansfield, XML 2001 Conference Presentation, 13 December 2001. Available at http://www.idealliance.org/papers/xml2001papers/slides/Mansfield/Catwalk.zip.
[DB]
DocBook: The Definitive Guide, 2nd ed., N. Walsh, L. Muellner, O'Reilly & Associates, Inc., 2004 (in progress). Available at http://www.docbook.org.
[DWGraphs]
Programmatic Rendering of Directed, Weighted Graphs, P. A. Mansfield, M. Ambachtsheer, SVG Open 2003 Conference Paper, 16 July 2003. Available at http://www.svgopen.org/2003/papers/RenderingGraphs/.
[GS]
Graphical Stylesheets: Using XSLT to Generate SVG, P. A. Mansfield, D. W. Fuller, XML 2001 Conference Paper, 13 December 2001. Available at http://www.idealliance.org/papers/xml2001/papers/html/05-05-02.html.
[PDF2XML]
Converting PDF to XML with Publication-Specific Profiles, A. Gurcan, Y. Khramov, A. Kroogman, P. A. Mansfield, XML 2003 Conference Paper, 11 December 2003. Available at http://www.idealliance.org/papers/dx_xml03/papers/05-03-03/05-03-03.html.
[SVGMaps]
Cleopatra: Publishing GML Data as Interactive SVG Maps, A. Meynert, SVG Open 2003 Conference Paper, 18 July 2003. Available at http://www.svgopen.org/2003/papers/cleopatra/.
[SVG-XAML]
Targeting SVG and XAML in a Single Application, P. A. Mansfield, SVG Open 2004 Conference Presentation, 9 September 2004. Available at http://www.svgopen.org/2004/papers/TargetingSVGandXAML/TargetingSVGandXaml.zip.
[TestSpec]
Building Formal Requirements and Test Cases from Loosely Formatted Text Documents, Y. Khramov, P. A. Mansfield, XML 2004 Conference Paper, 18 November 2004. Available at http://www.idealliance.org/proceedings/xml04/abstracts/paper235.html.
[Trans]
How to Make a File Format Translator Using XML, P. A. Mansfield, XML 2002 Conference Paper, 12 December 2002. Available at http://www.idealliance.org/papers/xml02/dx_xml02/papers/05-04-04/05-04-04.html.

Footnotes

  1. Text formats are defined as character streams that utilize pre-existing conventions for character encoding.

  2. Binary formats are defined as non-text formats.

XHTML rendition made possible by SchemaSoft's Document Interpreter™ technology.