Accommodating XML 1.1 in XML Schema 1.0

Keywords: XML, Schema, datatypes, interoperability, W3C XML Schema

Henry Thompson
Reader
University of Edinburgh
Edinburgh
Scotland
United Kingdom
ht@cogsci.ed.ac.uk

Biography

Henry S. Thompson is Reader in Artificial Intelligence and Cognitive Science in the Division of Informatics at the University of Edinburgh, based in the Language Technology Group of the Human Communication Research Centre, and Managing Director of Markup Technology Ltd.

He received his Ph.D. in Linguistics from the University of California at Berkeley in 1980. His university education was divided between Linguistics and Computer Science, in which he holds an M.Sc. While still at Berkeley he was affiliated with the Natural Language Research Group at the Xerox Palo Alto Research Center, where he participated in the GUS and KRL projects. His research interests have ranged widely, including natural language parsing, speech recognition, machine translation evaluation, modelling human lexical access mechanisms, the fine structure of human-human dialogue, language resource creation and architectures for linguistic annotation. His current research is focussed on articulating and extending the architectures of XML.

He was a member of the SGML Working Group of the World Wide Web Consortium which designed XML, is the author of XED, the first free XML instance editor, and co-author of the LT XML toolkit, and is currently a member of the XSL and XML Schema Working Groups of the W3C. He currently holds a World Wide Web Consortium Fellowship, and is lead editor of the Structures part of the XML Schema W3C Recommendation, for which he co-wrote the first publicly available implementation, XSV. He has presented many papers and tutorials on SGML, DSSSL, XML, XSL and XML Schemas in both industrial and public settings over the last five years.


Abstract


As published, the W3C XML Schema specification references XML 1.0 explicitly and incorporates by reference certain key definitions, in particular those of the 'Char', 'Name' and 'S' character classes. XML 1.1 changes the contents of these classes. So although nothing in the existing XML Schema specification specifically bars infosets produced by XML 1.1 conformant parsers, such infosets, if they exploit any of the relevant changes in XML 1.1, will not be accepted as valid by conformant XML Schema 1.0 processors.

This is a bad state of affairs -- users should be able to process XML 1.1 documents using schema processors. It appears this issue will not be officially addressed until a new version of the W3C XML Schema specification is approved. As this may take some time, this paper addresses the question of what should be done in the interim to best serve the XML community. It suggests a strategy for implementors to adopt, starting from the assumption that the XML declaration of a document is the definitive factor in determining how it should be validated, and going on to specific changes which processors implementing the XML Schema specification can make to enable sensible and interoperable support for XML 1.1.

Since the relevant aspects of W3C XML Schema are all encompassed by Part 2 (Datatypes) of the Recommendation, this issue and the proposed solution are relevant to other XML Schema languages, such as Relax NG, which incorporate the datatypes defined therein.

An implementation of XML Schema employing the proposed approach is, strictly speaking, non-conformant to the current version of the W3C XML Schema specification. I contend that interoperability will nonetheless best be served by such non-conformant processors until a subsequent version of W3C XML Schema that addresses this issue normatively is approved.

Consider the following four cases:

1. C1 vs. C0 control characters in content

2. Old vs. new name chars in element names

3. Old vs. new name chars in ID-typed content

4. LF vs. NEL in length-specified list-typed content

In each of the above cases, the first alternative is OK and has the same behaviour with respect to Schema validation in both XML 1.0 and XML 1.1, whereas the second alternative either is not Schema-valid under the strict XML 1.0 interpretation (1-3) or might be expected to have different behaviour between XML 1.0 and XML 1.1 (4).

The paper describes how to make the patterns associated with the relevant built-in types, that is xs:string, xs:Name and xs:NCName, depend on the version of the document being validated. It explains the details, and argues against using the version of the _schema_ document(s) involved, if any. Examples are provided and processed with two processors which already implement the proposed strategy.
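The idea of letting the document's XML declaration select the character class used for a built-in type can be sketched as follows. This is only an illustration and not the implementation discussed in the talk: the names `document_version` and `is_valid_xs_string` are hypothetical, only xs:string is covered, and the two character classes are taken from the 'Char' productions of XML 1.0 and XML 1.1 respectively. The version detection is deliberately simplistic and assumes the document text has already been decoded.

```python
import re

# 'Char' production of XML 1.0: tab, LF, CR, and everything from U+0020
# upwards except the surrogate block and U+FFFE/U+FFFF.
CHAR_10 = re.compile(
    r"[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*"
)

# 'Char' production of XML 1.1: additionally admits the C0 controls
# U+0001-U+001F (everything except NUL).
CHAR_11 = re.compile(r"[\u0001-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*")

# Simplified match for the version pseudo-attribute of an XML declaration.
XML_DECL = re.compile(r"""^<\?xml\s+version\s*=\s*(["'])(1\.[0-9]+)\1""")


def document_version(text):
    """Return the version named in the XML declaration, defaulting to 1.0."""
    match = XML_DECL.match(text)
    return match.group(2) if match else "1.0"


def is_valid_xs_string(value, version):
    """Check a candidate xs:string value against the version's Char class."""
    pattern = CHAR_11 if version == "1.1" else CHAR_10
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    instance = '<?xml version="1.1"?><doc>a&#x1;b</doc>'
    version = document_version(instance)
    # The value as it would appear in the infoset, after the character
    # reference has been expanded by the parser.
    print(version, is_valid_xs_string("a\u0001b", version))
```

Under this sketch the same value is rejected when the instance document declares version 1.0 (or has no declaration) and accepted when it declares version 1.1. The version of any schema document plays no part in the decision.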


Table of Contents


1. Late-breaking Talk

1. Late-breaking Talk

Since this was a late-breaking talk, the author did not have time to complete the paper for the proceedings.
