Abstract
Still a work in progress, the ISO DSDL Interoperability Framework is a specification defining the flow of processes involved in the validation of XML documents and document fragments using one or several schema languages.
This talk will present the latest version of the specification and show, through simple examples, how the interoperability framework can be used both inside and outside XML schemas to apply transformations to document fragments both before and during validation.
Keywords
This presentation will focus on the ISO DSDL Interoperability Framework.
This talk is an update of the presentation given at XML 2002 under the same title. After a reminder of the context and history of the ISO DSDL framework, it will present the latest proposal and show, through simple examples, how the interoperability framework can be used to validate a document using different schema and transformation technologies.
What's DSDL?
Why an interoperability framework?
At the beginning: two complementary proposals
The latest proposal
Conclusion
What's DSDL?
DSDL stands for "Document Schema Definition Languages" (mind "Document" and the "s").
Why DSDL?
Because other XML schema languages (read W3C XML Schema) do not meet the needs of "document heads".
Because document validation requires more than a schema language.
What's the plan?
DSDL is proposing a set of specifications which will include a framework, several schema languages (including Relax NG and Schematron), a datatype system and other pieces needed for document validation.
Who is behind DSDL?
DSDL is a project of ISO/IEC JTC 1/SC 34 (chair: James Mason) WG 1 (chair: Charles Goldfarb). DSDL is chaired by Martin Bryan and its editors include James Clark, Murata Makoto, Rick Jelliffe, Martin Bryan, Diederik Gerth van Wijk, Ken Holman and myself.
Why does DSDL need an Interoperability Framework?
The Interoperability Framework is the glue between all the pieces of DSDL.
The design principle of DSDL is to split the issue of describing and validating documents into simpler issues (grammar based validation, rule based validation, content selection, datatypes, ...).
Different tools exist which need to be integrated.
Different types of validations and transformations (defined inside or outside the DSDL project) often need to be associated and a framework is needed to perform the integration.
Examples of such mixing include localization of numeric or date formats, pre-validation canonicalization to simplify the expression of a schema, independent content split into different documents validated independently, aggregation of complex content into a single text node or split of structured simple content into a set of elements, ...
The DSDL interoperability framework is a work in progress, and its first wave gave birth to two different proposals, or strawmen, based on two different and complementary approaches:
Rick Jelliffe's Schemachine
My own XVIF
We can qualify Rick Jelliffe's Schemachine as being "traditional" (no offense meant) in the sense that this proposal is in the continuation of XPipe or the W3C "XML-Pipeline" Note and describes pipes of transformations and validations applied to full documents.
Rick Jelliffe gives the following description of his proposal:
based on XML Pipeline structures ( http://www.w3.org/TR/xml-pipeline/ ), but with rearrangement and renaming,
embedded in Schematron-like superstructure with titles and phases,
a minimal implementation is possible, where all validators and translators are command-line executable programs, and the framework document is translated into BAT files or Bourne shell scripts (i.e., validators etc. are treated as black boxes),
the purpose is validation rather than declarative description per se. (In particular, the further down a transformation chain that data gets, the more difficult it will be to tie the effect of a schema to the original document. )
this framework supports both validation of explicit structure and validation of complex data values. It leaves issues of simple datatyping to particular validators,
validation is a tree of processes,
supports inband signalling (@exclude) and out-of-band signalling (@haltOnFail).
A couple of short examples will be better than a long explanation...
<schemachine xmlns="....">
  <title>Example Schema</title>
  <pass>
    <validate engine="schemachine:xsd" />
    <validate engine="schemachine:schematron">
      <param name="schema" href="a Schematron schema"/>
    </validate>
  </pass>
</schemachine>
This first example passes a document through a W3C XML Schema validation followed by a Schematron validation.
<schemachine xmlns="....">
  <title>Another Example Schema</title>
  <ns prefix="html" url="..." />
  <pass>
    <select engine="schemachine:namespace_selector">
      <param name="pattern">html:body</param>
      <output name="htmlbody" />
    </select>
    <validate engine="schemachine:relax_ng">
      <param name="schema" href="...."/>
      <param name="feasible">true</param>
      <input name="htmlbody"/>
    </validate>
  </pass>
</schemachine>
Here, the document is passed through a "selector" which will select the html:body element and the output of the selection is used as the input of a Relax NG validation.
Rick Jelliffe has used all his experience to carefully craft a proposal with all the features needed to validate the more complex documents.
Some concepts such as phases are inherited from Schematron, and Schemachine has all the bells and whistles needed to fly:
Phases let users define different validation phases.
Selectors are filters which retain only the part of a document on which a partial schema will be applied.
Validators are containers to invoke schema validation.
Tokenizers split a text node into a set of elements.
Titles let you define info for the validation report.
While Rick Jelliffe has come up with a solid proposal that is obviously easy to implement, I wanted to explore more adventurous fields and felt that a proof of concept was needed to check their dangers and potential.
XVIF (for XML Validation Interoperability Framework) is thus both a strawman and a prototype written in Python and published under an MPL open source license at http://downloads.xmlschemata.org/python/xvif/.
XVIF is both very similar to and very different from the approach taken by the Schemachine:
Designed to be used within a "host language" which could be a schema language (Relax NG, W3C XML Schema, Schematron, ...), a transformation language (XSLT, Regular Fragmentations, STX, ...) or a "pipelining" language (XVIF could be embedded within the structure of the Schemachine, Ant, XPipe, ...).
Note: the current version of the prototype implements only XVIF within Relax NG.
Defines "micro-pipes" of transformations and validations applied locally on the "current" node.
High integration with the hosting languages.
For Relax NG, XVIF pipes are patterns, for XSLT they would be extension elements, ...
Fallback mechanisms are provided to ensure that a schema or transformation can be read by non-XVIF processors.
The current version is "minimal": bells and whistles will be added if it flies.
Takes advantage of the structures of the host language for complex features.
Focus on defining the basic bricks right.
Shortcuts will be added later on where needed and verbosity isn't an issue at this stage.
A first example of XVIF is:
<?xml version="1.0" encoding="utf-8"?>
<element xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
  <if:pipe>
    <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/,/"/>
    <if:validate type="http://relaxng.org/ns/structure/1.0">
      <if:apply>
        <oneOrMore>
          <choice>
            <value>foo</value>
            <value>bar</value>
          </choice>
        </oneOrMore>
      </if:apply>
    </if:validate>
  </if:pipe>
</element>
This defines a Relax NG schema where the implicit "start" pattern is an element named "foo" whose content is validated by an "if:pipe" pattern, i.e. a micro-pipe of transformations and validations applied to all the elements, text nodes and attributes found in the "foo" element.
The pipe itself is a transformation splitting text nodes using the regular expression "," and a Relax NG validation applied on the result of this transformation.
A text node will thus be interpreted as a comma separated list of values and the list validates against a Relax NG schema expecting one or more values equal to "foo" or "bar".
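The effect of this micro-pipe on a single text node can be sketched in Python. This is only an illustration of the split-then-validate idea, not code from the XVIF prototype: the function name `validate_csv` and the use of `re.split` are assumptions made for the example, and the whitespace normalization that Relax NG applies to token values is deliberately left out.

```python
import re

ALLOWED = {"foo", "bar"}  # the two <value> alternatives of the schema


def validate_csv(text):
    """Split a text node on "," and check every resulting token."""
    tokens = re.split(",", text)  # counterpart of apply="split/,/"
    # re.split always returns at least one token, so an empty text node
    # yields [""] and is rejected, which matches <oneOrMore>.
    return all(token in ALLOWED for token in tokens)
```

With this sketch, `validate_csv("foo,bar,foo")` accepts the list while `validate_csv("foo,baz")` rejects it.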
The most basic building block of XVIF is "if:transform" to define a transformation:
<!-- The context nodeset "x" is defined by the host language here -->
<if:transform type="URI identifying the nature of T">
  <if:apply>
    Implementation of T
  </if:apply>
</if:transform>
<!-- The result of the transformation "y=T(x)" is the context nodeset here -->
Notes:
The implementation of T may be held in an "apply" element or attribute.
The implementation of T may be located in an external resource (if:apply/@href).
A validation is a transformation which returns either its input or an error:
<!-- The context nodeset (x) is defined by the host language here -->
<if:validate type="URI identifying the nature of V">
  <if:apply>
    Implementation of V
  </if:apply>
</if:validate>
<!-- The pipe is aborted if the result is false,
     otherwise, the context nodeset is left unchanged -->
Transformations and validations can be chained in pipes:
<if:pipe>
  <!-- The context nodeset (x) is defined by the host language here -->
  <if:transform type="URI identifying the nature of T2">
    <if:apply>
      Implementation of T2
    </if:apply>
  </if:transform>
  <!-- The result T2(x) is the context nodeset here -->
  <if:validate type="URI identifying the nature of V1">
    <if:apply>
      Implementation of V1
    </if:apply>
  </if:validate>
  <!-- The pipe is aborted with an exception if the validation fails.
       The context node is unchanged otherwise. -->
  <if:transform type="URI identifying the nature of T1">
    <if:apply>
      Implementation of T1
    </if:apply>
  </if:transform>
  <!-- The result y=T1(T2(x)) is the context nodeset here -->
  <if:validate type="URI identifying the nature of V">
    <if:apply>
      Implementation of V
    </if:apply>
  </if:validate>
  <!-- The result of the validation of y by V is the result of the pipe. -->
</if:pipe>
That's all!
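The chaining rules described above (a transformation replaces the context, a validation returns it unchanged or aborts the pipe) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the XVIF prototype: the names `transform`, `validate`, `pipe` and `ValidationError`, and the four lambda steps standing in for T2, V1, T1 and V, are all hypothetical.

```python
class ValidationError(Exception):
    """Raised when a validation step rejects the current context."""


def transform(func):
    """Wrap T: the step replaces the context x with T(x)."""
    return func


def validate(predicate):
    """Wrap V: the step returns its input unchanged or aborts the pipe."""
    def step(context):
        if not predicate(context):
            raise ValidationError(f"validation failed for {context!r}")
        return context
    return step


def pipe(*steps):
    """Chain steps: the context produced by one step feeds the next."""
    def run(context):
        for step in steps:
            context = step(context)
        return context
    return run


# Hypothetical steps standing in for T2, V1, T1 and V.
check = pipe(
    transform(lambda text: text.split(",")),                          # T2
    validate(lambda items: len(items) > 0),                           # V1
    transform(lambda items: [item.strip() for item in items]),        # T1
    validate(lambda items: all(i in {"foo", "bar"} for i in items)),  # V
)
```

In this sketch a pipe returns the final context when every validation passes and raises `ValidationError` as soon as one of them fails, so `check("foo, bar")` yields the cleaned list while `check("foo,baz")` aborts.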
The examples shown up to now were simple and do not differ much from a document-level approach such as Schemachine. There are a couple of reasons why micro-pipes are interesting:
Modularity: these pipes can be used in named patterns and reused in lieu of native Relax NG patterns:
<define name="csv">
  <if:pipe>
    <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/,/"/>
    <if:validate type="http://relaxng.org/ns/structure/1.0">
      <if:apply>
        <oneOrMore>
          <choice>
            <value>foo</value>
            <value>bar</value>
          </choice>
        </oneOrMore>
      </if:apply>
    </if:validate>
  </if:pipe>
</define>
.../...
<element name="foo">
  <ref name="csv"/>
</element>
Plays nicely with the schema language: for instance, if we want to validate the list as a comma-separated list when a type attribute is "csv" and as a whitespace-separated list when the type attribute is "list", we can write:
<?xml version="1.0" encoding="utf-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe">
  <start>
    <element name="foo">
      <choice>
        <group>
          <attribute name="type">
            <value>csv</value>
          </attribute>
          <ref name="csv"/>
        </group>
        <group>
          <attribute name="type">
            <value>list</value>
          </attribute>
          <list>
            <ref name="check-values"/>
          </list>
        </group>
      </choice>
    </element>
  </start>
  <define name="check-values">
    <oneOrMore>
      <choice>
        <value>foo</value>
        <value>bar</value>
      </choice>
    </oneOrMore>
  </define>
  <define name="csv">
    <if:pipe>
      <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" apply="split/,/"/>
      <if:validate type="http://relaxng.org/ns/structure/1.0">
        <if:apply>
          <ref name="check-values"/>
        </if:apply>
      </if:validate>
    </if:pipe>
  </define>
</grammar>
In complicated cases, micro-pipes keep the transformations and validations close to the locations where they are needed. I think that this is important to ensure that the structure of the document is coded with a schema language instead of being a combination of selectors and bits of schemas.
Of course, these are only guesses and I don't think anyone has enough experience to have the final word in this debate!
An interoperability framework can't be an isolated technology and XVIF is linked to many other developments. These links to other technologies include:
W3C XML Schema: I see no reason why XVIF couldn't be associated with W3C XML Schema as it is with Relax NG.
Schema annotation: we have seen how close transformation and validation are... validations could be extended to add annotations to instance documents.
XPath 2.0/XPath NG axes: these annotations could be used by the proposal from Jeni Tennison to add extension axes to XPath 2.0 and/or a possible future "XPath NG".
Schemachine: some features of Schemachine could be added to XVIF, XVIF could be used within the Schemachine framework, or a standalone version could be developed.
XSLT: XVIF could be used as an XSLT extension element.
Finally, I am considering adding some features to XVIF:
If/then/else statements.
Sub pipes.
Variables.
The two initial proposals (Schemachine and XVIF) were presented to the ISO DSDL working group in Baltimore and, although they were considered valuable input, both were rejected for different reasons:
Schemachine was considered "too procedural": its focus is on defining pipes, i.e. defining the algorithm used to validate a document, while it would be more appropriate to focus on defining the rules a document must meet to be considered valid.
XVIF has been considered too intrusive: to fully support XVIF, the semantics of the different schema languages must be extended and the schema validators need to be upgraded. An interoperability framework should work with existing schema languages and processors without requiring any update.
To take these two requirements into account, a new proposal has been made which builds upon ideas from Schemachine and XVIF but also from XSLT and Schematron. This proposal has been named "XVIF/Outie" after a joke from Rick Jelliffe. The description of XVIF/Outie can be found at http://downloads.xmlschemata.org/python/xvif/outie/about.xhtml and a prototype implementation is available.
The basic ideas behind outie are pretty simple:
Outie is all about defining assertions.
These assertions are schema validations applied on instance documents.
These instance documents can be the instance document presented for validation, other documents or results of transformations.
Assertions about the same instance can be grouped into rules.
The basic building blocks of an outie framework are thus the rules:
Each rule is about checking one and only one instance document.
By default this instance document is the instance document presented for validation.
Other instance documents may be selected:
Inline, by specifying a transformation to apply to an existing instance.
By reference, through a URL or a reference to a variable.
Global variables may be defined to store the result of a transformation.
Rules may belong to a "mode" and rules for a mode are explicitly applied.
Outie is purely declarative and side effect free:
Rules and variable definitions may appear in any order.
The order in which rules and assertions are processed is not guaranteed.
Variables which are not used may never be evaluated.
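The declarative properties listed above (definitions in any order, unused variables possibly never evaluated) amount to lazy, order-independent evaluation. A minimal Python sketch, assuming a hypothetical `Framework` class whose names and behaviour are invented for illustration and are not taken from the outie prototype:

```python
class Framework:
    """Toy container for lazily evaluated, order-independent variables."""

    def __init__(self):
        self._definitions = {}  # name -> zero-argument function
        self._cache = {}        # name -> value, filled on first use
        self.evaluated = []     # names in the order they were computed

    def define(self, name, compute):
        """Register a variable without evaluating it."""
        self._definitions[name] = compute

    def get(self, name):
        """Evaluate a variable on first use and remember the result."""
        if name not in self._cache:
            self._cache[name] = self._definitions[name]()
            self.evaluated.append(name)
        return self._cache[name]


fw = Framework()
fw.define("unused", lambda: "never computed")
# "instance2" refers to "source", which is only defined afterwards.
fw.define("instance2", lambda: fw.get("source").upper())
fw.define("source", lambda: "<doc/>")

result = fw.get("instance2")
```

Because nothing is computed until `get` is called, the definition order does not matter and the `unused` variable is never evaluated.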
An example showing most of the features could be to define that a document is valid if and only if it is valid after transformation by "normalize1.xsl" per the schemas "schema1.sch" and "schema1.rng" or if it is valid after transformation by "normalize2.xsl" per the schemas "schema2.sch" and "schema2.rng".
A framework to express this using a variable to store the result of the transformation by "normalize2.xsl" could be:
<?xml version="1.0" encoding="utf-8"?>
<framework>
  <rule>
    <assert>
      <choice>
        <apply-rules mode="mode1"/>
        <apply-rules mode="mode2"/>
      </choice>
    </assert>
  </rule>
  <rule mode="mode1">
    <instance>
      <transform transformation="normalize1.xsl"/>
    </instance>
    <assert>
      <isValid schema="schema1.sch"/>
      <isValid schema="schema1.rng"/>
    </assert>
  </rule>
  <rule mode="mode2" instance="$instance2">
    <assert>
      <isValid schema="schema2.sch"/>
      <isValid schema="schema2.rng"/>
    </assert>
  </rule>
  <variable name="instance2">
    <transform transformation="normalize2.xsl"/>
  </variable>
</framework>
We have seen most of the features of outie in this first example; let's just emphasize the most "hidden" of them:
Outie is purely declarative and side effect free:
Rules and variable definitions may appear in any order.
The order in which rules and assertions are processed is not guaranteed.
Variables which are not used may never be evaluated.
The tool to apply for a transformation or schema validation is implicit.
The choice of the tool is a function of the document type.
The document type is assumed from the extension of the document.
Implementations need to provide a way to define the match between extensions and tools.
Schemas can also be the result of transformations.
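One possible shape for the match between extensions and tools is a plain lookup table. The Python sketch below is only an assumption about how an implementation might do this: the function `tool_for` and the tool names in the table are hypothetical placeholders, not part of outie.

```python
import os

# Hypothetical registry: the tool names are placeholders, not real commands.
TOOLS_BY_EXTENSION = {
    ".rng": "relaxng-validator",
    ".sch": "schematron-validator",
    ".xsd": "w3c-xml-schema-validator",
    ".xsl": "xslt-processor",
}


def tool_for(resource):
    """Infer the document type from the extension and return the matching tool."""
    extension = os.path.splitext(resource)[1].lower()
    try:
        return TOOLS_BY_EXTENSION[extension]
    except KeyError:
        raise LookupError(f"no tool configured for {extension!r}") from None
```

Under these assumptions `tool_for("schema1.rng")` selects the Relax NG placeholder and `tool_for("normalize1.xsl")` the XSLT one, while an unknown extension raises `LookupError`.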
To illustrate this last point, that schemas can also be the result of transformations, we can take the example of a schema created by the "getStage" transformation proposed by Bob DuCharme:
<?xml version="1.0" encoding="utf-8"?>
<framework>
  <rule>
    <assert>
      <isValid>
        <schema>
          <transform extension=".xsd" instance="schema.xsd" transformation="getStage.xsl">
            <with-param name="stageName" select="'final'"/>
          </transform>
        </schema>
      </isValid>
    </assert>
  </rule>
</framework>
Or, using a variable:
<?xml version="1.0" encoding="utf-8"?>
<framework>
  <variable name="final">
    <transform extension=".xsd" instance="schema.xsd" transformation="getStage.xsl">
      <with-param name="stageName" select="'final'"/>
    </transform>
  </variable>
  <rule>
    <assert>
      <isValid schema="$final"/>
    </assert>
  </rule>
</framework>
The goal for outie is, of course, to get the approval of the ISO DSDL working group and, ultimately, to become an ISO DIS.
Before this, some issues need to be fixed:
Some transformations may split an instance document into several pieces. How do we address the different pieces in this case?
Should the format of the validation report be specified?
Should the format of the configuration file matching extensions and tools be specified?
More issues will probably be raised during the ISO meetings in London during this conference.