XML 2002

Lattices and Documents: A Schema Language Independent Model of Types for XML and Information Interchange

Abstract

The continuing existence of multiple schema languages, the ongoing need to map XML schemas to multiple data definition languages, and the appearance of XML programming models beyond SAX, DOM, and XSL together point to the need for a general model of document type that is mathematically rigorous yet stands above any particular schema language and can incorporate them all without prejudice. This paper proposes a lattice model to meet that need. Any particular type in any XML schema language describes a set of documents: the description is the intensional definition of the type, and the (potentially infinite) set of documents is its extensional definition. The proposed lattice is built over the powerset of all possible documents, ordered by inclusion and through a special operation called “thinning”. It therefore contains the extension of every possible document type defined by any possible schema language or type system, whether currently in use or yet to be invented, as well as types that are demonstrably undefinable. For example, the set of XML documents of papers accepted to this conference has its place somewhere in the lattice, but it is unlikely that any schema language could specify that set in advance of the selection. Different schema languages can find common ground in the lattice, because type descriptions written in different languages can identify the same set of documents. Operations on types (such as XML Schema’s restriction and extension) can be extended across schema languages to the extent that they can be seen as operating on extensions rather than intensions. This paper introduces the lattice and shows how such a construct can help with the issues mentioned above.
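
As a rough illustration of the extensional view, the following Python sketch builds the powerset of a tiny, finite universe of documents and treats each subset as a type. Everything in it is an assumption made for illustration: the three toy documents, the predicate functions standing in for schema descriptions, and all names are hypothetical. The real universe of documents is infinite, and the sketch models only the inclusion ordering, not the “thinning” operation.

```python
from itertools import combinations

# Hypothetical toy universe: each string stands in for a whole XML document.
UNIVERSE = frozenset({"<a/>", "<b/>", "<a><b/></a>"})

def powerset(items):
    """Return every subset of items as a frozenset (the lattice elements)."""
    elems = list(items)
    return {frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

def extension(predicate):
    """Extension of a type: the set of documents its description accepts."""
    return frozenset(d for d in UNIVERSE if predicate(d))

# Two intensional descriptions, imagined as written in different schema
# languages (both are illustrative stand-ins, not real schemas).
def root_is_a(doc):
    return doc.startswith("<a")

def root_is_not_b(doc):
    return not doc.startswith("<b")

def mentions_b(doc):
    return "b" in doc

LATTICE = powerset(UNIVERSE)
type_one = extension(root_is_a)
type_two = extension(root_is_not_b)
type_three = extension(mentions_b)

# Different descriptions, same extension: the two types coincide in the lattice.
same_type = type_one == type_two

# Under inclusion, meet is set intersection and join is set union.
meet = type_one & type_three
join = type_one | type_three
```

In this sketch, `same_type` is true because both descriptions accept exactly the same documents, which is the sense in which different schema languages can meet in the lattice. Every result of `meet` or `join` is again a member of `LATTICE`, whether or not any description exists for it.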