Accommodating XML 1.1 in XML Schema 1.0
Track: Late Breaking News, Core Technologies, Web Services
Audience Level: Technical View
Time: Thursday, November 18 at 16:00
Keywords: XML, Schema, Datatypes, Interoperability, W3C XML Schema
Abstract:
As published, the W3C XML Schema specification references XML 1.0 explicitly and incorporates by reference certain key definitions, in particular those of the 'Char', 'Name' and 'S' character classes. XML 1.1 changes the contents of these classes. So although nothing in the existing XML Schema specification specifically bars infosets produced by XML 1.1-conformant parsers, such infosets will not be accepted as valid by conformant XML Schema 1.0 processors if they exploit any of the relevant changes in XML 1.1.
This is a bad state of affairs -- users should be able to process XML 1.1 documents using schema processors. It appears this issue will not be officially addressed until a new version of the W3C XML Schema specification is approved. As this may take some time, this paper addresses the question of what should be done in the interim to best serve the XML community. It suggests a strategy for implementors to adopt, starting from the assumption that the XML declaration of a document is the definitive factor in determining how it should be validated. It then sets out specific changes which processors implementing the XML Schema specification can make to enable sensible and interoperable support for XML 1.1.
Since the relevant aspects of W3C XML Schema are all contained in Part 2 (Datatypes) of the Recommendation, this issue and the proposed solution are also relevant to other XML schema languages, such as RELAX NG, which incorporate the datatypes defined therein.
An implementation of XML Schema employing the proposed approach is, strictly speaking, non-conformant to the current version of the W3C XML Schema specification. I contend that interoperability will nonetheless best be served by such non-conformant processors until a subsequent version of W3C XML Schema addressing this issue normatively is approved.
Consider the following four cases:
1. C1 vs. C0 control characters in content
2. Old vs. new name characters in element names
3. Old vs. new name characters in ID-typed content
4. LF vs. NEL in length-specified list-typed content
In each of the above cases, the first alternative is OK and has the same behaviour with respect to Schema validation in both XML 1.0 and XML 1.1, whereas the second alternative either is not Schema-valid under the strict XML 1.0 interpretation (1-3) or might be expected to have different behaviour between XML 1.0 and XML 1.1 (4).
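As an illustration of case 1, the following Python sketch tests a code point against the Char production of each XML version. The names CHAR_RANGES and is_char are assumptions made for this example and are not taken from either processor discussed in the paper. The check applies at the infoset level, that is, after the parser has resolved any character references.

```python
# Code point ranges of the Char production in XML 1.0 and XML 1.1.
# CHAR_RANGES and is_char are hypothetical names used only in this sketch.
CHAR_RANGES = {
    "1.0": [(0x9, 0xA), (0xD, 0xD), (0x20, 0xD7FF),
            (0xE000, 0xFFFD), (0x10000, 0x10FFFF)],
    "1.1": [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)],
}


def is_char(code_point, version):
    """Return True if code_point matches Char for the given XML version."""
    return any(lo <= code_point <= hi for lo, hi in CHAR_RANGES[version])
```

Under these tables a C1 control such as U+0085 is a Char in both versions, whereas a C0 control such as U+0001 is a Char only under XML 1.1, which is the asymmetry that case 1 describes.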
The paper describes how to make the patterns associated with the relevant built-in types, namely xs:string, xs:Name and xs:NCName, depend on the XML version of the document being validated. It explains the details, and argues against using the version of the _schema_ document(s) involved, if any. Examples are provided and run through two processors which already implement the proposed strategy.
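A minimal Python sketch of this idea for xs:string follows. It reads the version from the XML declaration of the instance document and selects the matching character pattern. The helper names document_version and string_is_valid are hypothetical, and the fallback to the XML 1.0 pattern for any other version number is an assumption of this sketch, not a rule taken from the paper. A real processor would take the version from the parser rather than from a regular expression over the raw text.

```python
import re

# Character-class bodies from the Char productions of XML 1.0 and XML 1.1.
CHAR_CLASS = {
    "1.0": r"\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF",
    "1.1": r"\x01-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF",
}

# One compiled xs:string pattern per XML version.
STRING_PATTERN = {
    version: re.compile("[" + body + "]*")
    for version, body in CHAR_CLASS.items()
}

# Matches the version pseudo-attribute of a leading XML declaration.
XML_DECL = re.compile(r"""<\?xml\s+version\s*=\s*(["'])(1\.[0-9]+)\1""")


def document_version(document_text):
    """Return the declared XML version, or "1.0" if there is no declaration."""
    match = XML_DECL.match(document_text)
    return match.group(2) if match else "1.0"


def string_is_valid(value, document_text):
    """Check value against the xs:string pattern for the document's version."""
    version = document_version(document_text)
    # Assumption: unknown version numbers fall back to the XML 1.0 pattern.
    pattern = STRING_PATTERN.get(version, STRING_PATTERN["1.0"])
    return pattern.fullmatch(value) is not None
```

The schema document does not appear anywhere in this sketch: only the instance document's declaration selects the pattern, in line with the position argued above.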