
How to Make a File Format Translator Using XML

Presented at XML 2002 by Philip Mansfield

Installation Requirement

In order to view this presentation, you must have version 3.0 or later of the Adobe SVG Viewer Plug-In installed in your Web Browser, or equivalent SVG viewing software. You can download the Plug-In from http://www.adobe.com/svg/.

Start the Presentation

The presentation begins at Slides/Slide1.svg and can be navigated by clicking the arrows at the lower right of each slide. There are also a number of demo slides: wherever a red dot appears in the lower right corner of a presentation slide, click it to open the demo. When you are done with a demo, use the browser's "back" button to return to the main slide sequence.

Read the Paper

An accompanying conference paper goes into greater detail than the slide presentation. It is included here at Paper/05-04-04.html, styled with SchemaSoft's XSLT and CSS, and is also available in other forms in the conference's online paper archives.

Abstract

The era of the personal computer brought with it a proliferation of proprietary binary file formats. With the rise of the Internet, there has been a need to break down the barriers between machines and between applications, resulting in the current trend toward open, Web-accessible publication formats, including XML grammars. As a result, there is a pervasive need for software that converts binary file formats to XML.

Just as XML is at the root of this conversion problem, so is it a basis for a solution. By taking advantage of the XML meta-language at every stage of the data conversion process, one can maximize code re-use.

The author will present research aimed at facilitating and partially automating the process of creating binary-to-XML file format translators. The process consists of the following stages: (i) file format analysis, (ii) creation of a parser, (iii) mapping analysis, (iv) creation of a mapper, (v) serialization of the target format. Here is how XML processing software is leveraged throughout:

First, we define an XML grammar for binary file format schemas. Then we write special-purpose parser-generator software that reads in instances of these schemas. Next, we write a rapid file format analysis tool that allows the user to discover a file format's schema through an iterative process: at each step the user suggests a single schema modification, a new parser is generated, a series of test files is parsed, and the results are dumped as XML for inspection.
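The idea of a parser generated from an XML schema instance can be sketched roughly as follows. This is a minimal illustration and not the grammar or tooling from the presentation: the `format` and `field` elements, their attributes, the type names, and the `make_parser` function are all hypothetical, and only flat, fixed-size fields are handled.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical schema instance: element and attribute names are invented
# for this sketch and are not the grammar described in the presentation.
SCHEMA_XML = """
<format name="demo" byteorder="little">
  <field name="magic" type="bytes" length="4"/>
  <field name="width" type="uint16"/>
  <field name="height" type="uint16"/>
  <field name="count" type="uint32"/>
</format>
"""

# struct format codes for the fixed-size types this sketch supports.
TYPE_CODES = {"uint8": "B", "uint16": "H", "uint32": "I", "int32": "i"}


def make_parser(schema_xml):
    """Build a parse function from a schema instance (the 'generator' step)."""
    schema = ET.fromstring(schema_xml)
    order = "<" if schema.get("byteorder") == "little" else ">"
    fields = []
    for f in schema.findall("field"):
        if f.get("type") == "bytes":
            code = f.get("length") + "s"
        else:
            code = TYPE_CODES[f.get("type")]
        fields.append((f.get("name"), code))
    layout = struct.Struct(order + "".join(code for _, code in fields))

    def parse(data):
        """Parse binary data into an element tree, one child per field."""
        root = ET.Element(schema.get("name"))
        for (name, _), value in zip(fields, layout.unpack_from(data)):
            child = ET.SubElement(root, name)
            if isinstance(value, bytes):
                child.text = value.decode("ascii", "replace")
            else:
                child.text = str(value)
        return root

    return parse


# One iteration of the analysis loop: generate a parser, parse a test
# file (here an in-memory sample), and dump the result as XML.
parse_demo = make_parser(SCHEMA_XML)
sample = b"DEMO" + struct.pack("<HHI", 640, 480, 3)
tree = parse_demo(sample)
xml_dump = ET.tostring(tree, encoding="unicode")
print(xml_dump)
```

In an iterative analysis session, the user would edit the schema instance, regenerate the parser, and inspect the new XML dump, without touching the parsing code itself.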

The outcome of applying the rapid file format analysis tool to any particular binary format is a parser that handles that format. The outcome of applying this parser to any particular test file is an in-memory tree representation of that file, serializable as XML. Therefore, the mapping stage reduces to an XML-to-XML transformation. In simple cases this mapping can be done efficiently enough with XSLT; in more realistic cases the XSLT needs to be cross-compiled with more traditional languages (e.g. with Java, using XSLTC).
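Because the parser output is already a tree, the mapper only has to walk one tree and build another. The sketch below shows that shape using Python's standard ElementTree as a stand-in for XSLT; it is not the mapper from the presentation, and the source vocabulary (`drawing`, `line`, `x1`, ...) and target vocabulary (`shapes`, `segment`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical parser output, serialized as XML. The element names are
# invented for this sketch.
SOURCE_XML = """
<drawing>
  <line><x1>0</x1><y1>0</y1><x2>100</x2><y2>50</y2></line>
  <line><x1>100</x1><y1>50</y1><x2>0</x2><y2>100</y2></line>
</drawing>
"""


def map_drawing(source):
    """Map the source tree to a (hypothetical) target vocabulary.

    Each <line> with four child elements becomes a <segment> with
    'from' and 'to' attributes, much as a simple XSLT template would do.
    """
    target = ET.Element("shapes")
    for line in source.findall("line"):
        segment = ET.SubElement(target, "segment")
        segment.set("from", f"{line.findtext('x1')},{line.findtext('y1')}")
        segment.set("to", f"{line.findtext('x2')},{line.findtext('y2')}")
    return target


source_tree = ET.fromstring(SOURCE_XML)
target_tree = map_drawing(source_tree)
print(ET.tostring(target_tree, encoding="unicode"))
```

The mapping is written against element names only, so it does not depend on how the source tree was produced from the binary file.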

Because the source and target of the mapping are of a common meta-language, it is possible to create a rapid mapping analysis tool that generates the mapper software. Again, the tool is used in an iterative process that culminates in the final mapping software.

As long as the target format is an open standard, one can take advantage of existing software to style, view, search, augment, edit or otherwise process that format. An important special case is the use of SVG (Scalable Vector Graphics) as a target. By translating from a binary format to SVG, one effectively has a viewer for that binary format without having to write any rendering software. The rendering stage is accomplished by standard browser or browser plug-in functionality. With the help of CSS, the view can be customized for user, purpose or device. With the help of SMIL Animation, the view can be dynamic. With the help of JavaScript and the DOM, the view can be interactive, with rich navigation, search or redlining functionality.
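As a rough illustration of SVG as a target, the sketch below serializes a few line segments as an SVG document and attaches a CSS rule to control their appearance. The segment data, the `edge` class name, the two stylesheets, and the `to_svg` function are assumptions made for this example; the same geometry is emitted twice with different CSS to suggest how a view might be customized per purpose or device.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", SVG_NS)

# Hypothetical stylesheets: same geometry, different presentation.
SCREEN_CSS = "line.edge { stroke: steelblue; stroke-width: 2 }"
PRINT_CSS = "line.edge { stroke: black; stroke-width: 1 }"


def to_svg(segments, css):
    """Serialize (x1, y1, x2, y2) tuples as an SVG document styled by css."""
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        {"width": "200", "height": "200", "viewBox": "0 0 200 200"},
    )
    style = ET.SubElement(svg, f"{{{SVG_NS}}}style", {"type": "text/css"})
    style.text = css
    for x1, y1, x2, y2 in segments:
        ET.SubElement(
            svg,
            f"{{{SVG_NS}}}line",
            {
                "x1": str(x1), "y1": str(y1),
                "x2": str(x2), "y2": str(y2),
                "class": "edge",
            },
        )
    return ET.tostring(svg, encoding="unicode")


segments = [(0, 0, 100, 50), (100, 50, 0, 100)]
screen_svg = to_svg(segments, SCREEN_CSS)
print_svg = to_svg(segments, PRINT_CSS)
print(screen_svg)
```

Saving either string to a `.svg` file and opening it in an SVG-capable viewer should display the lines; no rendering code is written beyond the serialization itself.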

Speaker Bio

After receiving his Ph.D. in Mathematical Physics from Yale University in 1989, Philip spent a year as Assistant Professor of Physics at Knox College, followed by four years as Assistant Professor of Mathematics at the University of Toronto. His background in Differential Geometry and in computer modelling of physical phenomena served as unorthodox preparation for his subsequent move into industry as a Software Engineer with an emphasis on Computer Graphics. By 1997 Philip was in charge of a software research team creating early Web technologies based on HTML, XML, CSS and Java. Philip now lives and works in Vancouver, Canada, where he is President of SchemaSoft (http://www.schemasoft.com/), a software development consulting company he co-founded in 1999. He is an Advisory Committee Representative of the World Wide Web Consortium (http://www.w3.org/), and has been a member of the W3C Scalable Vector Graphics Working Group (http://www.w3.org/Graphics/SVG/) since its inception in 1998. Philip is Chair of the BC Advanced Systems Institute International Scientific Advisory Board (http://www.asi.bc.ca/). He is also a Director of the Vancouver XML Developers Association (http://www.vanx.org/), an organization that he co-founded in 2000. He regularly writes and lectures on topics related to software engineering, XML and SVG.

More Information

For more information, visit http://www.schemasoft.com or write to info@schemasoft.com.