Abstract
Still a work in progress, the ISO DSDL Interoperability Framework is a specification defining the flow of processes involved in the validation of XML documents and document fragments using one or several schema languages.
This talk will present the latest version of the specification and show, through simple examples, how the interoperability framework can be used both inside and outside XML schemas to apply transformations to document fragments both before and during validation.
The examples will include demonstrations of interoperability between different transformation and validation techniques including XSLT, Regular Fragmentations and Relax NG.
What's DSDL?
DSDL stands for "Document Schema Definition Languages" (mind "Document" and the "s").
Why DSDL?
Because other XML schema languages (read W3C XML Schema) do not meet the needs of "document heads".
Because document validation requires more than a schema language.
What's the plan?
DSDL is proposing a set of specifications which will include a framework, several schema languages, a datatype system and other pieces needed for document validation.
Who is behind DSDL?
DSDL is a project of ISO/IEC JTC 1/SC 34 (chair: James Mason) WG 1 (chair: Charles Goldfarb). DSDL is chaired by Ken Holman and its editors include James Clark, Murata Makoto, Rick Jelliffe, Martin Bryan, Diederik Gerth van Wijk, Ken Holman and myself.
Why does DSDL need an Interoperability Framework?
The Interoperability Framework is the glue between all the pieces of DSDL.
The design principle of DSDL is to split the issue of describing and validating documents into simpler issues (grammar based validation, rule based validation, content selection, datatypes, ...).
Different tools exist which need to be integrated.
Different types of validations and transformations (defined inside or outside the DSDL project) often need to be associated and a framework is needed to perform the integration.
Examples of such mixing include localization of numeric or date formats, pre-validation canonicalization to simplify the expression of a schema, independent content split into different documents validated independently, aggregation of complex content into a single text node or split of structured simple content into a set of elements, ...
The DSDL interoperability framework is work in progress which has given birth to two different proposals or strawmen based on two different and complementary approaches:
Rick Jelliffe's Schemachine
My own XVIF
We can qualify Rick Jelliffe's Schemachine as being "traditional" (no offense meant) in the sense that this proposal is in the continuation of XPipe or the W3C "XML-Pipeline" Note and describes pipes of transformations and validations applied to full documents.
Schemachine basics
Schemachine example
Schemachine features
Rick Jelliffe gives the following description of his proposal:
based on XML Pipeline structures ( http://www.w3.org/TR/xml-pipeline/ ), but with rearrangement and renaming,
embedded in Schematron-like superstructure with titles and phases,
a minimal implementation is possible, where all validators and translators are command-line executable programs, and the framework document is translated into BAT files or Bourne shell scripts (i.e., validators etc. are treated as black boxes) ,
the purpose is validation rather than declarative description per se. (In particular, the further down a transformation chain that data gets, the more difficult it will be to tie the effect of a schema to the original document. )
this framework supports both validation of explicit structure and validation of complex data values. It leaves issues of simple datatyping to particular validators,
validation is a tree of processes,
supports inband signalling (@exclude) and out-of-band signalling (@haltOnFail).
A couple of short examples will be better than a long explanation...
<schemachine xmlns="....">
<title>Example Schema</title>
<pass>
<validate engine="schemachine:xsd" />
<validate engine="schemachine:schematron">
<param name="schema" href="a Schematron schema"/>
</validate>
</pass>
</schemachine>
This first example passes a document through a W3C XML Schema validation followed by a Schematron validation.
<schemachine xmlns="....">
<title>Another Example Schema</title>
<ns prefix="html" url="..." />
<pass>
<select engine="schemachine:namespace_selector">
<param name="pattern">html:body</param>
<output name="htmlbody" />
</select>
<validate engine="schemachine:relax_ng">
<param name="schema" href="...."/>
<param name="feasible">true</param>
<input name="htmlbody"/>
</validate>
</pass>
</schemachine>
Here, the document is passed through a "selector" which will select the html:body element and the output of the selection is used as the input of a Relax NG validation.
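To make the selector idea concrete, here is a minimal Python sketch of this second example, using only the standard library. It is not part of Schemachine: the function names, the `urn:example:html` namespace (the example elides the real URL) and the toy validator standing in for a Relax NG engine are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace: the Schemachine example elides the real URL.
HTML_NS = "urn:example:html"


def select(root, namespace, local_name):
    """Selector: retain only the elements with the given qualified name."""
    return list(root.iter("{%s}%s" % (namespace, local_name)))


def has_paragraph(body):
    """Toy stand-in for a Relax NG validation of the selected fragment."""
    return body.find("{%s}p" % HTML_NS) is not None


def run_pass(document, validator):
    """Feed every selected html:body element to the validator."""
    root = ET.fromstring(document)
    bodies = select(root, HTML_NS, "body")
    # With nothing selected, all() is trivially True.
    return all(validator(body) for body in bodies)
```

The validator only ever sees the selected fragments, which is the point of the selector: the partial schema does not need to know anything about the rest of the document.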
Rick Jelliffe has used all his experience to carefully craft a proposal with all the features needed to validate the most complex documents.
Some concepts such as phases are inherited from Schematron and Schemachine has all the bells and whistles needed to fly:
Phases let users define different validation phases.
Selectors are filters which retain only the part of a document on which a partial schema will be applied.
Validators are containers to invoke schema validation.
Tokenizers split a text node into a set of elements.
Titles let you define info for the validation report.
While Rick Jelliffe has come up with a solid proposal that is obviously easy to implement, I wanted to explore more adventurous fields and felt that a proof of concept was needed to check their dangers and potential.
XVIF (for XML Validation Interoperability Framework) is thus both a strawman and a prototype written in Python and published under an MPL open source license at http://downloads.xmlschemata.org/python/xvif/.
XVIF basics
XVIF example
XVIF features
Another XVIF example
Why micro-pipes
Fall back
Perspectives
More examples (if time permits)
XVIF is both very similar and very different from the approach taken by the Schemachine:
Designed to be used within a "host language" which could be a schema language (Relax NG, W3C XML Schema, Schematron, ...), a transformation language (XSLT, Regular Fragmentations, STX, ...) or a "pipelining" language (XVIF could be embedded within the structure of the Schemachine, Ant, XPipe, ...).
Note: the current version of the prototype implements only XVIF within Relax NG.
Defines "micro-pipes" of transformations and validations applied locally on the "current" node.
High integration with the hosting languages.
For Relax NG, XVIF pipes are patterns, for XSLT they would be extension elements, ...
Fall back mechanisms are provided to ensure that a schema or transformation can be read by non XVIF processors.
The current version is "minimal": bells and whistles will be added if it flies.
Takes advantage of the structures of the host language for complex features.
Focus on defining the basic bricks right.
Shortcuts will be added later on where needed and verbosity isn't an issue at this stage.
A first example of XVIF is:
<?xml version="1.0" encoding="utf-8"?>
<element xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
<if:pipe>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="split/,/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<oneOrMore>
<choice>
<value>foo</value>
<value>bar</value>
</choice>
</oneOrMore>
</if:apply>
</if:validate>
</if:pipe>
</element>
This defines a Relax NG schema where the implicit "start" pattern is an element named "foo" whose content is validated by the pattern "if:pipe", i.e. a micro-pipe of transformations and validations applied to all the elements, text nodes and attributes found in the "foo" element.
The pipe itself is a transformation splitting text nodes using the regular expression "," and a Relax NG validation applied on the result of this transformation.
A text node will thus be interpreted as a comma separated list of values and the list validates against a Relax NG schema expecting one or more values equal to "foo" or "bar".
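The behaviour of this micro-pipe can be mimicked in a few lines of plain Python. This is only an illustrative sketch, not code from the XVIF prototype: the function names are invented for the example, and it assumes that the split does not trim whitespace around the tokens.

```python
import re

ALLOWED = {"foo", "bar"}


def split_on_comma(text):
    """Transformation step: split a text node on the regular expression ','."""
    return re.split(",", text)


def validate_values(tokens):
    """Validation step: one or more tokens, each equal to 'foo' or 'bar'."""
    return len(tokens) >= 1 and all(token in ALLOWED for token in tokens)


def check_foo_content(text):
    """Micro-pipe: transform, then validate the result of the transformation."""
    return validate_values(split_on_comma(text))
```

With these definitions, `check_foo_content("foo,bar,foo")` accepts the text node while `check_foo_content("foo,baz")` rejects it.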
The most basic building block of XVIF is "if:transform" to define a transformation:
<!-- The context node-set "x" is defined by the host language here -->
<if:transform type="URI identifying the nature of T">
<if:apply>
Implementation of T
</if:apply>
</if:transform>
<!-- The result of the transformation "y=T(x)" is the context node-set here -->
Notes:
The implementation of T may be held in an "apply" element or attribute.
The implementation of T may be located in an external resource (if:apply/@href).
A validation is a transformation which returns either its input or an error:
<!-- The context node-set (x) is defined by the host language here -->
<if:validate type="URI identifying the nature of V">
<if:apply>
Implementation of V
</if:apply>
</if:validate>
<!-- The pipe is aborted if the result is false,
otherwise, the context node-set is left unchanged -->
Transformations and validations can be chained in pipes:
<if:pipe>
<!-- The context node-set (x) is defined by the host language here -->
<if:transform type="URI identifying the nature of T2">
<if:apply>
Implementation of T2
</if:apply>
</if:transform>
<!-- The result T2(x) is the context node-set here -->
<if:validate type="URI identifying the nature of V1">
<if:apply>
Implementation of V1
</if:apply>
</if:validate>
<!-- The pipe is aborted with an exception if the validation fails.
The context node is unchanged otherwise. -->
<if:transform type="URI identifying the nature of T1">
<if:apply>
Implementation of T1
</if:apply>
</if:transform>
<!-- The result y=T1(T2(x)) is the context node-set here -->
<if:validate type="URI identifying the nature of V">
<if:apply>
Implementation of V
</if:apply>
</if:validate>
<!-- The result of the validation of y by V is the result of the pipe.-->
</if:pipe>
That's all!
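These semantics (a transformation returns T(x), a validation returns its input or aborts, a pipe chains the steps) can be sketched with ordinary function composition. The Python below is a hypothetical model, not the prototype's API: `transform`, `validate`, `pipe` and `ValidationError` are names chosen for the illustration, and the four toy steps merely stand in for T2, V1, T1 and V.

```python
class ValidationError(Exception):
    """Raised when a validation step rejects the current value."""


def transform(func):
    """A transformation step simply returns T(x)."""
    return func


def validate(predicate):
    """A validation step returns its input unchanged or aborts the pipe."""
    def step(x):
        if not predicate(x):
            raise ValidationError("validation failed for %r" % (x,))
        return x
    return step


def pipe(*steps):
    """Chain steps: each one receives the result of the previous one."""
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run


# Mirrors the T2, V1, T1, V sequence with toy steps.
example = pipe(
    transform(str.strip),               # T2: normalize the text
    validate(lambda s: s.isdigit()),    # V1: abort unless it is all digits
    transform(int),                     # T1: convert to an integer
    validate(lambda n: 1 <= n <= 12),   # V: final check on y = T1(T2(x))
)
```

Modelling a validation as "identity or exception" is what lets both kinds of step share one interface and be freely interleaved.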
Another, more complex example is a schema that validates dates split into three elements as if they were W3C XML Schema dates:
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<if:pipe>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<element name="date">
<element name="year"><text/></element>
<element name="month"><text/></element>
<element name="day"><text/></element>
</element>
</if:apply>
</if:validate>
<if:transform type="http://www.w3.org/1999/XSL/Transform">
<if:apply>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/date">
<xsl:copy>
<xsl:value-of select="format-number(year, '0000')"/>
<xsl:text>-</xsl:text>
<xsl:value-of select="format-number(month, '00')"/>
<xsl:text>-</xsl:text>
<xsl:value-of select="format-number(day, '00')"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
</if:apply>
</if:transform>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<element name="date">
<data type="date">
<param name="minInclusive">2002-09-01</param>
<param name="maxInclusive">2002-12-31</param>
</data>
</element>
</if:apply>
</if:validate>
</if:pipe>
</start>
</grammar>
Note how we have chained a first validation making sure that the initial document looks as expected, a transformation, and a second validation to check that the result of the transformation is a date.
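The same validate/transform/validate chain can be written out in Python to show what each step contributes. This is a sketch under stated assumptions rather than the prototype's behaviour: the document is modelled as a plain dictionary, the function names are hypothetical, and `datetime.date` stands in for the W3C XML Schema `date` datatype.

```python
from datetime import date


def validate_structure(doc):
    """First validation: the input has exactly year, month and day parts."""
    if set(doc) != {"year", "month", "day"}:
        raise ValueError("expected year, month and day")
    return doc


def to_iso(doc):
    """Transformation: zero-pad the parts as format-number() does in the XSLT."""
    return "%04d-%02d-%02d" % (int(doc["year"]), int(doc["month"]), int(doc["day"]))


def validate_range(text):
    """Second validation: a real date between 2002-09-01 and 2002-12-31."""
    parsed = date.fromisoformat(text)  # raises ValueError for impossible dates
    if not date(2002, 9, 1) <= parsed <= date(2002, 12, 31):
        raise ValueError("%s is out of range" % text)
    return text


def check_date(doc):
    """Pipe: structure check, transformation, then datatype and range check."""
    return validate_range(to_iso(validate_structure(doc)))
```

The first validation protects the transformation from unexpected input, and the second one is where the datatype facets are actually applied.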
The examples shown up to now were simple and do not differ much from a document-level approach such as Schemachine. There are a couple of reasons why micro-pipes are interesting:
Modularity: these pipes can be used in named patterns and reused in lieu of native Relax NG patterns:
<define name="csv">
<if:pipe>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="split/,/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<oneOrMore>
<choice>
<value>foo</value>
<value>bar</value>
</choice>
</oneOrMore>
</if:apply>
</if:validate>
</if:pipe>
</define>
.../...
<element name="foo">
<ref name="csv"/>
</element>
Plays nicely with the schema language. For instance if we want to validate the list as a comma separated list if a type attribute is "csv" and as a whitespace separated list if the type attribute is "list", we can write:
<?xml version="1.0" encoding="utf-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe">
<start>
<element name="foo">
<choice>
<group>
<attribute name="type">
<value>csv</value>
</attribute>
<ref name="csv"/>
</group>
<group>
<attribute name="type">
<value>list</value>
</attribute>
<list>
<ref name="check-values"/>
</list>
</group>
</choice>
</element>
</start>
<define name="check-values">
<oneOrMore>
<choice>
<value>foo</value>
<value>bar</value>
</choice>
</oneOrMore>
</define>
<define name="csv">
<if:pipe>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="split/,/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<ref name="check-values"/>
</if:apply>
</if:validate>
</if:pipe>
</define>
</grammar>
In complicated cases, micro-pipes keep the transformations and validations close to the locations where they are needed. I think that this is important to ensure that the structure of the document is coded with a schema language instead of being a combination of selectors and bits of schemas.
Of course, these are only guesses and I don't think anyone has enough experience to have the final word in this debate!
One of the first concerns of the OASIS Relax NG TC about XVIF was that it could lead to schemas which wouldn't be valid per the Relax NG specification.
This specification says that any element or attribute from a "foreign" namespace is removed during the simplification of the schema.
This means that our schema:
<?xml version="1.0" encoding="utf-8"?>
<element xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
<if:pipe>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="split/,/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<oneOrMore>
<choice>
<value>foo</value>
<value>bar</value>
</choice>
</oneOrMore>
</if:apply>
</if:validate>
</if:pipe>
</element>
is read by a non XVIF Relax NG processor as:
<?xml version="1.0" encoding="utf-8"?>
<element xmlns="http://relaxng.org/ns/structure/1.0" name="foo">
</element>
which is not valid. The fall back mechanism which I have implemented consists of adding an alternative for non XVIF processors which will be ignored by XVIF processors, for instance:
<?xml version="1.0" encoding="utf-8"?>
<element xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe" name="foo">
<if:pipe>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="split/,/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<oneOrMore>
<choice>
<value>foo</value>
<value>bar</value>
</choice>
</oneOrMore>
</if:apply>
</if:validate>
</if:pipe>
<text if:ignore="1"/>
</element>
Note the if:ignore attribute which will be ignored (as a foreign attribute) by non XVIF processors and will instruct XVIF processors to ignore the pattern.
Authors who want to write portable schemas are invited to provide this kind of alternative to their if:pipe(s) for non XVIF processors.
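What a non XVIF processor sees can be simulated with the Python standard library by stripping everything foreign from the schema. The sketch below is an approximation written for this discussion, not the Relax NG simplification algorithm itself: `strip_foreign` and `simplify` are hypothetical names, and the pipe is abbreviated to a single transformation.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
XVIF = "http://namespaces.xmlschemata.org/xvif/iframe"

# Abbreviated pipe: enough to show what happens to foreign elements.
PIPE = ('<if:pipe><if:transform '
        'type="http://namespaces.xmlschemata.org/xvif/regexp" '
        'apply="split/,/"/></if:pipe>')
WITHOUT_FALLBACK = ('<element xmlns="%s" xmlns:if="%s" name="foo">%s</element>'
                    % (RNG, XVIF, PIPE))
WITH_FALLBACK = ('<element xmlns="%s" xmlns:if="%s" name="foo">%s'
                 '<text if:ignore="1"/></element>' % (RNG, XVIF, PIPE))


def strip_foreign(elem):
    """Remove foreign child elements (with their content) and foreign attributes."""
    for child in list(elem):
        if child.tag.startswith("{%s}" % RNG):
            strip_foreign(child)
        else:
            elem.remove(child)
    for name in list(elem.attrib):
        # Namespaced attributes outside the Relax NG namespace are foreign.
        if name.startswith("{") and not name.startswith("{%s}" % RNG):
            del elem.attrib[name]
    return elem


def simplify(schema_text):
    """Parse a schema and return it as a non XVIF processor would see it."""
    return strip_foreign(ET.fromstring(schema_text))
```

Applied to the first schema, `simplify` leaves an `element` pattern with no content, which is why it is rejected; applied to the second, it leaves the plain `text` pattern with the if:ignore attribute gone.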
An interoperability framework can't be an isolated technology and XVIF is linked to many other developments. These links to other technologies include:
W3C XML Schema: I see no reason why XVIF couldn't be associated with W3C XML Schema as it is with Relax NG.
Schema annotation: we have seen how close transformation and validation are; validations could be extended to add annotations to instance documents.
XPath 2.0/XPath NG axis: these annotations could be used by the proposal from Jeni Tennison to add extension axes to XPath 2.0 and/or a possible future "XPath NG".
Schemachine: some features of Schemachine could be added to XVIF, XVIF could be used within the Schemachine framework, or a standalone version could be developed.
XSLT: XVIF could be used as an XSLT extension element.
Finally, I am considering adding some features to XVIF:
If/then/else statements.
Sub pipes.
Variables.
French date into ISO 8601 (with regular expressions)
French date into ISO 8601 (with XSLT)
ISO 8601 split (with Regular Fragmentations)
To transform a date such as "16 octobre 2002" into "2002-10-16" using regular expressions:
<element xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
name="date" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<if:pipe>
<if:validate type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="m/[0-9]+ .+ [0-9]+/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/^[ \t\n]*([0-9] .*)$/0\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) janvier ([0-9]+)/\2-01-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) février ([0-9]+)/\2-02-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) mars ([0-9]+)/\2-03-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) avril ([0-9]+)/\2-04-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) mai ([0-9]+)/\2-05-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) juin ([0-9]+)/\2-06-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) juillet ([0-9]+)/\2-07-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) août ([0-9]+)/\2-08-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) septembre ([0-9]+)/\2-09-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) octobre ([0-9]+)/\2-10-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) novembre ([0-9]+)/\2-11-\1/"/>
<if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
apply="s/([0-9]+) décembre ([0-9]+)/\2-12-\1/"/>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<data type="date">
<param name="minInclusive">2002-09-01</param>
<param name="maxInclusive">2002-12-31</param>
</data>
</if:apply>
</if:validate>
</if:pipe>
</element>
Note the usual pattern: pre validation/transformation/post validation.
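The chain of substitutions can be tried outside XVIF with Python's `re` module, which makes it easy to check the regular expressions themselves. This is a sketch of the same pre validation/transformation/post validation pattern, not the prototype's regexp engine: the function name is hypothetical and `datetime.date` is used as a stand-in for the `date` datatype and its facets.

```python
import re
from datetime import date

MONTHS = ["janvier", "février", "mars", "avril", "mai", "juin",
          "juillet", "août", "septembre", "octobre", "novembre", "décembre"]


def french_to_iso(text):
    # Pre validation: something that looks like "day month year".
    if not re.search(r"[0-9]+ .+ [0-9]+", text):
        raise ValueError("not a French date: %r" % text)
    # Pad a single-digit day with a leading zero.
    text = re.sub(r"^[ \t\n]*([0-9] .*)$", r"0\1", text)
    # One substitution per month, as in the pipe.
    for number, name in enumerate(MONTHS, start=1):
        pattern = "([0-9]+) %s ([0-9]+)" % name
        text = re.sub(pattern, r"\2-%02d-\1" % number, text)
    # Post validation: a real date within the expected range.
    parsed = date.fromisoformat(text)  # raises ValueError if nothing matched
    if not date(2002, 9, 1) <= parsed <= date(2002, 12, 31):
        raise ValueError("%s is out of range" % text)
    return text
```

An unknown month name is left untouched by every substitution, so the error only surfaces in the post validation, which is one reason to keep that last step.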
Same exercise with XSLT:
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<if:pipe>
<if:transform type="http://www.w3.org/1999/XSL/Transform">
<if:apply>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:vdv="http://eric.van-der-vlist.com/tmpns" version="1.0">
<vdv:dates>
<month name="janvier"/>
<month name="février"/>
<month name="mars"/>
<month name="avril"/>
<month name="mai"/>
<month name="juin"/>
<month name="juillet"/>
<month name="août"/>
<month name="septembre"/>
<month name="octobre"/>
<month name="novembre"/>
<month name="décembre"/>
</vdv:dates>
<xsl:template match="*|@*|text()">
<xsl:copy>
<xsl:apply-templates select="@*|*|text()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/date/text()">
<xsl:variable name="n" select="normalize-space(.)"/>
<xsl:if test="contains(., $n)">
<xsl:variable name="d" select="substring-before($n, ' ')"/>
<xsl:variable name="m"
select="substring-before(substring-after($n, ' '), ' ')"/>
<xsl:variable name="y"
select="substring-after(substring-after($n, ' '), ' ')"/>
<xsl:value-of select="format-number($y, '0000')"/>
<xsl:text>-</xsl:text>
<xsl:apply-templates
select="document('')/xsl:stylesheet/vdv:dates/month[@name=$m]"
mode="month"/>
<xsl:text>-</xsl:text>
<xsl:value-of select="format-number($d, '00')"/>
</xsl:if>
</xsl:template>
<xsl:template match="month" mode="month">
<xsl:value-of
select="format-number(count(preceding-sibling::month)+1, '00')"/>
</xsl:template>
</xsl:stylesheet>
</if:apply>
</if:transform>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<element name="date">
<data type="date">
<param name="minInclusive">2002-09-01</param>
<param name="maxInclusive">2002-12-31</param>
</data>
</element>
</if:apply>
</if:validate>
</if:pipe>
</start>
</grammar>
In this case we have no pre validation and have taken care to copy any node to the result tree in our transformation.
Regular Fragmentations may be used, for instance, to split ISO 8601 dates into their parts:
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<if:pipe>
<if:transform type="http://simonstl.com/ns/fragments/">
<if:apply>
<fragmentRules xmlns="http://simonstl.com/ns/fragments/">
<fragmentRule pattern="^[ \t\n]*([0-9]{4})-([0-9]{2})-([0-9]{2})[ \t\n]*$">
<applyTo>
<element localName="date"/>
</applyTo>
<produce>
<element localName="year"/>
<element localName="month"/>
<element localName="day"/>
</produce>
</fragmentRule>
</fragmentRules>
</if:apply>
</if:transform>
<if:validate type="http://relaxng.org/ns/structure/1.0">
<if:apply>
<element name="date">
<element name="year">
<data type="unsignedInt">
<param name="minInclusive">2000</param>
</data>
</element>
<element name="month">
<data type="unsignedByte">
<param name="maxInclusive">12</param>
</data>
</element>
<element name="day">
<data type="unsignedByte">
<param name="maxInclusive">31</param>
</data>
</element>
</element>
</if:apply>
</if:validate>
</if:pipe>
</start>
</grammar>
Stay tuned:
Either "micro" or "macro", the ISO DSDL Interoperability Framework is on its way.
In any case, Python now has a Relax NG implementation (80% complete), and I am committed to implementing what's missing.
ISO standard or not, I am committed to developing XVIF and its micro-pipes.