Normal Form Conventions for XML Representations of Structured Data
ABSTRACT
Starting with the work Andrew Layman presented to the W3C Query Language workshop in 1998, the issue of establishing conventions for the XML representation of structured data has generated considerable interest. In this paper we present an explicit definition not only of what might be called Layman normal form, but also of three other normal forms. A virtually complete draft of the paper can be found at http://www.ltg.ed.ac.uk/~ht/normalForms.html.
The initial motivation for the work presented here is to serve as the starting point for a declarative approach to XML data binding, i.e. to the use of XML as a transfer medium for structured data. Such an approach must accommodate both marshalling application data into XML documents and unmarshalling XML documents into application data. With hindsight, however, it has become apparent that this work is also a useful point of departure for considering the general question of the semantics of markup vocabularies in the wild, so to speak. That is, when we look at the DTDs and other schemas which have been written for markup applications, what generalisations can we make about how markup is used to convey or record domain properties?
It is our contention that, in practice, existing DTDs can often be understood as employing one or another of the normal forms set out in this paper as their encoding strategy. Understanding these normal forms can thus contribute to the analysis of the patterns of markup use found in DTDs.
This paper presents a sample dataset, and then defines four distinct normal forms (Alternating, Relation, Individual and the original Layman), illustrating all four with the sample dataset. In our view the paper is particularly timely for two reasons. First, interest in XML data binding is growing in the context of SOAP/XML Protocols. Second, although the paper does not depend on any particular schema formalism for defining XML document structure, it suggests requirements for any such formalism.
Table of Contents
1. Complete paper unavailable
This presenter's paper was not received in time to be included in the proceedings.

