XML Europe 2004

Versioning made easy with W3C XML Schema and Pipelines

Abstract

There is a great deal of interest at the moment in managing evolution and versioning of XML document types, for Web Services and other application areas. Eduardo Gutentag and Arofan Gregory, in their work on versioning for UBL, have developed a very powerful methodology for managing evolution using W3C XML Schema [1].

David Orchard, in his draft TAG finding on versioning and extensibility [2], has concentrated on usage scenarios for which the UBL approach is inappropriate, because the UBL approach requires all document consumers to use up-to-date schemas. That is, the UBL approach describes how an application designed to handle version 1 documents can handle version 2 documents, _but_ it requires the application to use the version 2 schema to validate the version 2 documents. Orchard's scenarios, on the other hand, assume that a version 1 application either cannot or will not use anything other than a version 1 schema. He would also prefer a 'passive' approach to versioning, that is, one in which the version 1 schema does not contain any explicit provision for extensibility such as wildcards. What he would like is an approach which allows version 2 documents which differ from version 1 documents only in that they contain _additional_ content (perhaps only at the end of content models) to be successfully processed by version 1 applications nonetheless. This kind of scenario does indeed seem likely to occur often as Web Services are deployed and begin to evolve.

Taken together, Orchard's two requirements seem to render the problem unsolvable without requiring special-purpose processing of the outcome of validation -- processing which would have to interrogate the PSVI (Post Schema-Validation Infoset) from version 2 documents (that is, in practice, any document which failed validation with a version 1 schema) in detail to detect whether the failure was a real version 1 failure, or whether there was simply extraneous material which could safely be ignored. Such special processing would have to recapitulate virtually all of schema content model validation, which seems a particularly wasteful duplication of effort.

In this paper I present a solution to this problem which requires no special processing, and demonstrate an implementation using Markup Technology's implementation of the Sun XML Pipeline language [3]. This approach consists of a validation step, a step which strips out all elements whose declarations were not found during schema validation, and a further validation step. Because the pipeline is compiled and run as a whole by the pipeline engine, the double validation is very efficient.
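
The three steps described above can be sketched in miniature. The following Python fragment is an illustration only, not the Markup Technology pipeline implementation and not real W3C XML Schema validation: it assumes a toy "schema" (the hypothetical V1_SCHEMA dictionary) in which every element is globally declared and each declaration lists the exact sequence of child element names it requires. The function names validate and process are likewise invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "version 1 schema" (an assumption of this sketch):
# each declared element name maps to the exact sequence of children it requires.
V1_SCHEMA = {
    "order": ["id", "item"],
    "id": [],
    "item": [],
}


def validate(root, schema):
    """Toy validation pass.

    Returns (valid, undeclared), where undeclared lists (parent, child)
    pairs for every child element that has no declaration in the schema.
    """
    valid = root.tag in schema
    undeclared = []
    stack = [root]
    while stack:
        elem = stack.pop()
        children = list(elem)
        # Content-model check: children must match the declared sequence.
        if elem.tag in schema and [c.tag for c in children] != schema[elem.tag]:
            valid = False
        for child in children:
            if child.tag not in schema:
                valid = False
                undeclared.append((elem, child))
            else:
                stack.append(child)
    return valid, undeclared


def process(xml_text, schema):
    """Validate, strip elements with no declaration, then validate again."""
    root = ET.fromstring(xml_text)
    first_valid, undeclared = validate(root, schema)
    for parent, child in undeclared:
        parent.remove(child)  # the "intermediate surgery" step
    second_valid, _ = validate(root, schema)
    return first_valid, second_valid, root


if __name__ == "__main__":
    v2_doc = "<order><id>7</id><item>pen</item><note>gift</note></order>"
    print(process(v2_doc, V1_SCHEMA)[:2])
    v1_broken = "<order><item>pen</item></order>"
    print(process(v1_broken, V1_SCHEMA)[:2])
```

In this sketch a version 2 document that merely adds an undeclared element fails the first pass but passes the second, whereas a document that violates the version 1 content model itself fails both passes, which is the distinction the approach is meant to draw.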

Not all schemas are suitable for use in this way -- I discuss the design recommendations schema authors should follow to ensure this will work properly, namely avoiding local element declarations, and adding material only at the end (these trade off to a certain extent, in fact).
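
As a rough illustration of the second recommendation, the check below tests whether a version 2 content model differs from its version 1 counterpart only by material added at the end. It is a minimal sketch that assumes content models can be represented as flat lists of child element names; real W3C XML Schema content models are richer than this, and the function name is hypothetical.

```python
def is_end_only_extension(v1_children, v2_children):
    """Return True if v2's child sequence is v1's sequence with zero or
    more names appended at the end (a toy model of 'add only at the end')."""
    return v2_children[:len(v1_children)] == v1_children


if __name__ == "__main__":
    v1 = ["id", "item"]
    print(is_end_only_extension(v1, ["id", "item", "note"]))  # addition at the end
    print(is_end_only_extension(v1, ["id", "note", "item"]))  # addition in the middle
```

A check of this kind says nothing about the first recommendation (avoiding local element declarations); it only captures the positional constraint on new material.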

Finally, I discuss the relevance of this approach to possible changes to the interpretation of local element declarations in version 1.1 of the W3C XML Schema specification -- making a change to interpreting local element declarations more as declarations at the level of their containing type definition, which can (indeed must) then be referenced in the same way global declarations are referenced, would make the validate-twice-with-intermediate-surgery approach cover a much wider range of schemas.

[1] http://www.idealliance.org/papers/dx_xml03/papers/04-04-04/04-04-04.html

[2] http://www.w3.org/2001/tag/doc/versioning

[3] http://www.markup.co.uk/XML2003.html

The full paper was not available at the time the proceedings were created. Please check the conference web site, http://www.xmleurope.com, to find an updated version of this paper.

Biography

Henry S. Thompson is Reader in Artificial Intelligence and Cognitive Science in the Division of Informatics at the University of Edinburgh, based in the Language Technology Group of the Human Communication Research Centre, and Managing Director of Markup Technology Ltd.

He received his Ph.D. in Linguistics from the University of California at Berkeley in 1980. His university education was divided between Linguistics and Computer Science, in which he holds an M.Sc. While still at Berkeley he was affiliated with the Natural Language Research Group at the Xerox Palo Alto Research Center, where he participated in the GUS and KRL projects. His research interests have ranged widely, including natural language parsing, speech recognition, machine translation evaluation, modelling human lexical access mechanisms, the fine structure of human-human dialogue, language resource creation and architectures for linguistic annotation. His current research is focussed on articulating and extending the architectures of XML.

He was a member of the SGML Working Group of the World Wide Web Consortium which designed XML, is the author of XED, the first free XML instance editor, and co-author of the LT XML toolkit, and is currently a member of the XSL and XML Schema Working Groups of the W3C. He currently holds a World Wide Web Consortium Fellowship, and is lead editor of the Structures part of the XML Schema W3C Recommendation, for which he co-wrote the first publicly available implementation, XSV. He has presented many papers and tutorials on SGML, DSSSL, XML, XSL and XML Schemas in both industrial and public settings over the last five years.