Abstract
The W3C's XML Schema has now become the accepted standard for validation of XML documents. However, schema validation only goes so far. Real world users of XML have found a need for validation beyond the structure and datatype validation provided by XML Schema, such as cross-field validation and real-time access to external data sources.
Various mechanisms exist for performing additional XML validation. For example, Schematron, built on the W3C's XPath standard, provides one possible solution to additional validation requirements, but offers limited integration with XML Schema.
Up until now XML Schema has been used as a simple validation mechanism. Introduction of the Post Schema Validation Infoset (PSVI) makes validation-time type information available to XML applications. The XPath spec, in its version 2 revision, is being extended to make use of the PSVI. Although the introduction of type information into the XPath standard is, like the XML Schema standard itself, not without its critics, we believe that these enhancements open the door for a new level of XML validation building upon the provisions of XML Schema.
Our proposal is a system for attaching XQuery rules and methods to XML Schema constructs. The specification builds upon the object orientated principles encompassed in the XML Schema typing mechanism to create a system for writing in-context business rules. It also borrows many features from traditional OO languages such as inheritance, polymorphism and encapsulation. Type functionality in XPath 2 and XQuery allows business rules definitions to be closely integrated with XML Schema types.
DecisionSoft have extensive experience of providing live XML validation services, and have worked closely with the UK Inland Revenue's PAYE Internet Filing project. DecisionSoft are also active in the development of Open Source XML tools and have made significant contributions to the Xerces-C project. Building on the successful Pathan project, DecisionSoft are currently developing Pathan 2, a leading Open Source implementation of the XPath 2 spec for the Xerces-C parser.
One of the great advantages of XML as a format for data interchange is the standardisation of validation technology. XML Schema (now often referred to as WXS) provides a standard format for defining an XML message, and tools to validate messages against schema definitions are readily available. This has huge benefits to all parties looking to exchange data using XML. For the receiver, the availability of schema validating parsers allows developers to build applications based on the assumption that the data already meets specified structural and data type restrictions. This dramatically decreases the amount of data checking that needs to be hand-coded. It is estimated that up to 90% of program code is for exception handling, error-processing and house-keeping [MS 1993], so there is substantial gain to be made here. For those generating XML messages, it enables developers to test their applications "off-line". Anyone who has ever put together an XML document by hand will know that the first cut is rarely schema-valid, and that several edit-validate cycles are required to get a valid instance document. Without the ability to schema validate, it is much more likely that an application will produce invalid XML messages, closer to those you create by hand the first time round.
XML Schema can be used to validate several aspects of an XML document:
Structural constraints
Data type constraints
Key/ref and uniqueness constraints
Structural constraints include what elements and attributes can appear where, and how many times they can appear.
Data type constraints control what values are valid for a given element or attribute, and include constraints such as length, pattern, and minimum and maximum values. XML Schema provides a very powerful mechanism for specifying data types, and for deriving new types from existing ones.
Key/ref constraints allow relationships to be expressed that do not follow the natural hierarchical relationship of XML documents. They allow one element to refer to another in a standard manner that can be validated.
The introduction to the XML Schema specification (Part 1) [TB 2001] acknowledges that there are limitations to what can be validated using XML Schema, and that this is by design. Types of validation that cannot be defined using XML Schema include:
Cross-field validation rules
Validation with respect to external data
Complex data type validation
Cross-field validation rules are the most significant type of constraint that cannot be handled by XML Schema. This is the ability to check the value or existence of one part of an XML document against the value or existence of another part of a document. An example of such a constraint would be the requirement that a tax field on an invoice document be a certain percentage of the total, or if a particular field exceeds a certain value then another section of the document must be present.
Schema validation cannot reference any external data sources. This prevents checking values against databases, or against the current time, for example.
There are certain types of single-field validation that cannot be done with XML Schema. It is always dangerous to say that something cannot be validated using regular expressions, as there is always somebody who will delight in proving you wrong with an incomprehensible stream of punctuation, but it is true that anything that requires stateful parsing cannot be validated using regular expressions. Surprisingly, email addresses, as defined by RFC 822, fall into this category, as they allow arbitrarily nested comments. Other values, such as dates or items containing checksums, can only be validated using particularly tortuous regular expressions and could be checked much more effectively using a few lines of program code.
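As a rough illustration of the kind of stateful check meant here, the following Python sketch tests whether parenthesised comments in an address are properly nested. It is an assumption-laden simplification: the function name `comments_balanced` is hypothetical, only backslash escapes are honoured, and quoted strings and the rest of the RFC 822 grammar are ignored.

```python
def comments_balanced(text: str) -> bool:
    """Return True if parenthesised comments in text are properly nested.

    Simplified sketch: a backslash escapes the next character; quoted
    strings and all other RFC 822 syntax are deliberately ignored.
    """
    depth = 0          # current comment nesting level (the "state")
    escaped = False    # True when the previous character was a backslash
    for ch in text:
        if escaped:
            escaped = False
            continue
        if ch == "\\":
            escaped = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a close with no matching open
                return False
    return depth == 0          # every open comment must be closed


print(comments_balanced("john (home (old account)) @example.org"))
print(comments_balanced("john (home (old account) @example.org"))
```

The counter `depth` is the state that a single pattern over the field value cannot track to arbitrary depth, which is why a few lines of code are the more natural tool.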
The ability to add co-constraints to XML Schema is listed as a requirement (RQ-38) for XML Schema 1.1. [CH 2003]
The line between what can and can't be validated using XML Schema is entirely logical to XML experts. The design of XML Schema is very clean, and academically sound. However, to a business analyst designing a message definition that is to have an XML representation, the line appears entirely arbitrary. Typically, they will have a list of Business Rules that apply to the message, and the fact that XML Schema can check that a field is greater than a particular value, but cannot check that a field is greater than another field, is an uninteresting frustration. The fact is that real world uses of XML require validation beyond what is possible in XML Schema and, at present, there is no standard way to implement this.
One popular XML structure validation language is Schematron [RA 2001]. The fundamental difference between XML Schema and Schematron is that Schematron is not based on grammars, but is based on finding tree patterns in the parsed document. A Schematron schema consists of a set of rules. For each rule there is an XPath expression describing the node or nodes to be tested, and an XPath expression that is to be asserted with respect to that node. This is a simple yet very powerful method for validating many business rules, including cross-field rules. Undoubtedly, rules based validation is very effective for many business rules, but it is not usually the most effective way to describe the underlying structure of an XML message. For this, a grammar-based schema language is more suitable, and many advocates of Schematron concede that it is only part of the solution, and is best used in conjunction with a grammar-based schema language such as XML Schema or RELAX NG[1] [CM 2001].
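The rule model described above (a context expression selecting nodes, plus an assertion evaluated against each selected node) can be sketched in a few lines of Python. This is not Schematron itself: the standard library's ElementTree supports only a subset of XPath, so the assertion is written as a Python callable, and the `order` document, the 17.5% rate and all names are hypothetical, chosen to mirror the tax-percentage example given earlier.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical instance: two invoices, the second with an incorrect tax field.
ORDER = """
<order>
  <invoice id="A"><total>200.00</total><tax>35.00</tax></invoice>
  <invoice id="B"><total>100.00</total><tax>20.00</tax></invoice>
</order>
"""


def tax_is_percentage(node):
    """Cross-field assertion: tax must be 17.5% of total (assumed rate)."""
    total = Decimal(node.findtext("total"))
    tax = Decimal(node.findtext("tax"))
    return tax == total * Decimal("17.5") / 100


# Each rule: (context path, assertion, message reported on failure).
RULES = [(".//invoice", tax_is_percentage, "tax must be 17.5% of total")]


def validate(root, rules):
    """Apply every assertion to every node matched by its context path."""
    failures = []
    for context, assertion, message in rules:
        for node in root.findall(context):
            if not assertion(node):
                failures.append((node.get("id"), message))
    return failures


print(validate(ET.fromstring(ORDER), RULES))
```

Note that the rule says nothing about the overall structure of the document; it only finds a pattern and tests it, which is why a grammar-based schema is still needed alongside.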
Schematron is a simple and powerful mechanism for performing cross field validation, but it does have limitations:
Assuming that Schematron is used in addition to a grammar-based schema, the implementation of business validation rules will be split across two definitions. This immediately creates maintenance issues as the match expressions in the Schematron schema must be kept in line with the structure defined in the schema grammar.
Schematron assertions are limited to a single XPath expression. Certain business rules are very difficult to express in just a single expression, and can be coded much more clearly, and therefore maintainably, using a more complete functional or procedural language.
Schematron rules are tied to instance documents. The pattern match expressions navigate nodes in the instance document, rather than the types and structures in the schema. This makes it impossible for validation constraints to be inherited through the XML Schema type hierarchy, and for constraints to be re-used anywhere a type, or a derivation from a type, is used. As we shall see, this limitation could be partially overcome if XPath2 [BB 2002] expressions were used in Schematron.
One way in which Schematron is used is by embedding rules into appInfo annotations within an XML Schema definition. This is described in "Combining Schematron with other XML Schema languages" [RE 2002]. At first glance, this appears to solve both the maintenance problem and the fact that Schematron relates to instance documents. In fact it does neither. Even when rules are embedded within XML Schema definitions, it is necessary to include the full context for items matched by rules. This context relates to the instance document, and these contexts must be maintained independently. For example, take the following XML instance:
<message>
<senderAddress>
<postcode>W5 3QA</postcode>
<country>United Kingdom</country>
</senderAddress>
<recipientAddress>
<zipcode>12345-2345</zipcode>
<country>US</country>
</recipientAddress>
</message>
Defined by the following schema:
<xsd:schema
elementFormDefault='qualified' attributeFormDefault='unqualified'
xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
<xsd:element name='message'>
<xsd:complexType>
<xsd:sequence>
<xsd:element name='senderAddress' type="addressType" />
<xsd:element name='recipientAddress' type="addressType" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="addressType" >
<xsd:sequence>
<xsd:choice minOccurs="0">
<xsd:element name="postcode" type="xsd:string" />
<xsd:element name="zipcode" type="xsd:string" />
</xsd:choice>
<xsd:element name="country" type="xsd:string" />
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Now, suppose we want a pair of additional constraints on each of the two addresses in the message.
If country is UK, then postcode must be present.
If country is US, then zipcode must be present.
The ideal place to define these rules would be on the addressType definition. With Schematron it is certainly possible to place the rules here, but it does not actually bind the rule to the type. It is still necessary to specify the context for the rule using an XPath expression. Unfortunately, in XPath 1.0 there is no way to say "all elements of type addressType". Let us consider how we might embed Schematron into the above schema in order to implement these rules:
<xsd:schema
elementFormDefault='qualified' attributeFormDefault='unqualified'
xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
<xsd:element name='message'>
<xsd:complexType>
<xsd:sequence>
<xsd:element name='senderAddress' type="addressType" />
<xsd:element name='recipientAddress' type="addressType" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="addressType" >
<xsd:annotation>
<xsd:appinfo>
<sch:pattern name="addressType_rules" xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:rule context="//senderAddress | //recipientAddress">
<sch:assert test="country != 'UK' or count(postcode) = 1">postcode must exist if country is UK</sch:assert>
<sch:assert test="country != 'US' or count(zipcode) = 1">zipcode must exist if country is US</sch:assert>
</sch:rule>
</sch:pattern>
</xsd:appinfo>
</xsd:annotation>
<xsd:sequence>
<xsd:choice minOccurs="0">
<xsd:element name="postcode" type="xsd:string" />
<xsd:element name="zipcode" type="xsd:string" />
</xsd:choice>
<xsd:element name="country" type="xsd:string" />
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Note the context expression for the rules:
//senderAddress | //recipientAddress
This has a number of problems. Firstly, such an expression would have to be maintained to include every place in the schema in which the type was used. Secondly, using the "//" notation has drawbacks; if another element called senderAddress appeared within the document the rule would be applied to it, irrespective of whether it was of type addressType. The "//" searches are also inefficient to execute. We could write the context expression differently:
/message/senderAddress | /message/recipientAddress
This isn't much of an improvement. For example, if we were to rename or relocate the message element, we would have to maintain the path in the rule definition as well. Consider what would happen if we wanted to wrap the instance in an envelope of some kind. The expressions, which relate to the root of the enclosed instance, would become invalid.
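The envelope problem is easy to reproduce. The Python sketch below resolves a simple absolute path against a bare message and against the same message wrapped in a hypothetical `envelope` element. The helper `select_absolute` is an assumption made for illustration (ElementTree paths are relative to an element, so the first step is compared with the root tag by hand), and the instance is abbreviated.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the message instance used in this section.
MESSAGE = (
    "<message>"
    "<senderAddress><country>UK</country></senderAddress>"
    "<recipientAddress><country>US</country></recipientAddress>"
    "</message>"
)


def select_absolute(root, path):
    """Resolve a simple absolute path such as /message/senderAddress."""
    steps = path.strip("/").split("/")
    if root.tag != steps[0]:       # first step must name the document element
        return []
    if len(steps) == 1:
        return [root]
    return root.findall("/".join(steps[1:]))


bare = ET.fromstring(MESSAGE)
wrapped = ET.fromstring("<envelope>" + MESSAGE + "</envelope>")

print(len(select_absolute(bare, "/message/senderAddress")))
print(len(select_absolute(wrapped, "/message/senderAddress")))
```

Once the message is wrapped, the context expression selects nothing, so the rules attached to it are silently skipped rather than reported as failures.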
The fundamental issue here is that Schematron rule definitions are bound to nodes in an instance document. They are not bound to definitions in an XML Schema. Embedding Schematron in XML Schema is certainly convenient, but it does not actually get any closer to binding rules to XML Schema definitions.
It is of course possible to develop validation code using a traditional programming language, such as Java. Performing initial validation against XML Schema provides an excellent basis for this, as it allows the developer to make assumptions about the structure and data types within the message to be validated. Custom code to perform such validation will typically involve a large amount of tree navigation and will result in a large body of code that cannot be maintained easily should the structure of the message change.
DecisionSoft have developed a system where the complete definition for a message is stored in a metadata repository. This definition contains all information required to generate an XML Schema, as well as cross field validation rules, typically written in Java, which can then be automatically assembled into validation code. This has a number of advantages:
The tree navigation code is automatically generated from the same source from which the schema is generated. This immediately reduces the maintenance overhead.
The validation rules can be written "in context". That is, the validation code is passed a pointer to the node being validated. This simplifies the task of writing validation rules. The rule will be called once for each matching node in the instance document.
This method couples the benefits of rules-based validation with the power of a full procedural programming language. It also makes it possible to tie business rules to type definitions, rather than to nodes in an instance document (although in this case the types are in the metadata repository and the effect is simulated at the point of code generation).
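The general shape of such a system can be sketched as follows. This is not DecisionSoft's implementation, which the text describes as Java rules assembled from a metadata repository; it is a minimal Python stand-in in which a hypothetical `TYPES` dictionary plays the role of the repository, the walker plays the role of the generated navigation code, and each rule receives only the node it applies to.

```python
import xml.etree.ElementTree as ET


def postcode_if_uk(node):
    """In-context rule: node is the address element being validated."""
    return node.findtext("country") != "UK" or len(node.findall("postcode")) == 1


def zipcode_if_us(node):
    return node.findtext("country") != "US" or len(node.findall("zipcode")) == 1


# Hypothetical metadata: each type lists its typed children and its rules.
TYPES = {
    "messageType": {
        "children": {"senderAddress": "addressType",
                     "recipientAddress": "addressType"},
        "rules": [],
    },
    "addressType": {"children": {}, "rules": [postcode_if_uk, zipcode_if_us]},
}


def validate(node, type_name, path=""):
    """Walk the instance guided by the metadata, calling rules in context."""
    here = f"{path}/{node.tag}"
    definition = TYPES[type_name]
    failures = [(here, rule.__name__)
                for rule in definition["rules"] if not rule(node)]
    for child in node:
        child_type = definition["children"].get(child.tag)
        if child_type is not None:
            failures += validate(child, child_type, here)
    return failures


doc = ET.fromstring(
    "<message>"
    "<senderAddress><postcode>W5 3QA</postcode><country>UK</country></senderAddress>"
    "<recipientAddress><country>US</country></recipientAddress>"
    "</message>"
)
print(validate(doc, "messageType"))
```

The rules themselves contain no paths to the nodes they validate; adding a third element of addressType only requires a change to the metadata.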
Whilst the method of embedding Java code into a metadata definition has proven to be particularly effective in creating real-world validation engines, it is undoubtedly not the cleanest solution. We have done significant research into how this system could be improved in order to make better use of XML standards.
As suggested earlier, a key to our research into the problem involves new features in XPath 2.0 [BB 2002], and in particular, the availability of type information in the XQuery 1.0 and XPath 2.0 Data Model [FM 2002]. At the most simplistic level, it makes it possible to write Schematron context expressions that bind rules to particular types, rather than to elements or attributes in the instance document. For example,
//*[ . instance of element of type addressType ]
would match all elements of type addressType, as we were looking to do in our previous Schematron example. This has the effect of binding rules to XML Schema types, but it is far from efficient to execute and the rule is still defined separately from the rest of the type. A mechanism involving storing rule definitions within annotations in an XML Schema and generating a Schematron schema from this is certainly conceivable, but we believe that validation rules should be more closely tied to XML Schema definitions.
XML Schema borrows many Object Orientated concepts. Data types can be derived from other data types, data types can be declared as abstract, and basic polymorphic behaviour can be achieved using substitution groups. More recent developments take this further, for example, XPath 2.0 introduces the instance of, treat as and cast as operators, enabling further Object Orientated functionality.
One significant item of Object Orientated behaviour that is not yet available is the ability to attach code to data definitions. This is exactly the functionality we are looking for in order to implement beyond-schema validation. For the rest of this paper we will investigate how XML Schema and XPath would need to be extended to allow this sort of validation, and what benefits this would offer for validation of XML messages.
It is our experience that the language used to implement validation rules can be largely independent of the mechanism for associating them with document nodes. This is also the position of Schematron: although using XPath expressions for both the match expression and the assertion expression is by far the most common way to use Schematron, it is not actually a requirement for a Schematron schema.
For the purposes of presenting an optimum solution to beyond-schema validation, we will consider what would be the preferred language for writing validation rules, based on our experience of coding "real world" business rules. In order to maximise re-use of existing W3C standards, we looked to languages defined by the W3C standards process. There are three obvious alternatives: XPath, XSLT and XQuery.
Schematron has shown that validation rules can be written as single XPath expressions, but this does not necessarily result in the most readable or maintainable rules. For example, consider one of the rules described earlier, "postcode must exist if country is 'UK'". This would be written as:
country != 'UK' or count(postcode) = 1
Whilst this is concise, it takes a certain mental shift to rewrite rules written in imperative English into a single boolean expression. XPath 2.0 introduces an "if ... then ... else ... " construct which would allow rules to be written more naturally:
if(country = 'UK') then count(postcode) = 1 else true()
As the rule complexity increases, the requirement that the rule be written as a single expression becomes onerous. In the early days of DecisionSoft's validation engine, we implemented real world business rules as single DSLPath [2] expressions. Whilst successful, the coding of certain business rules was nothing short of heroic. In order to consider XPath as a viable language for coding rules, we would need a mechanism for breaking down business rules into multiple expressions by providing the ability to define and call functions. After significant research, we concluded that an entirely functional language of this type would be sufficient, but we had reservations about using a functional language as our experience shows that users are more comfortable coding in a procedural or declarative language.
XSLT has been used successfully as a validation language in DecisionSoft's validation engine. Like the XPath proposal above, XSLT is a functional language, which we consider a drawback for this application. XSLT is generally acknowledged to have a steep learning curve as a result of its functional nature. XSLT is expressed as XML; this has some advantages but we believe that XML is not the ideal syntax for a programming language.
XQuery was designed as a language for performing queries on XML documents. It is a declarative language, and has support for user defined functions. Unlike XSLT it allows variables to be modified after their initial assignment. These features make it easy to break down complex validation rules into smaller components. XQuery is very closely related to XPath, and shares the same data model. [FM 2002]
Our conclusion is that XQuery is the most appropriate language for coding validation rules.
The proposed mechanism for incorporating additional validation rules into XML Schema would require the following changes:
Changes to XML Schema to define a syntax for the inclusion of validation rules in schemas
Changes to XML Schema to specify the execution of validation rules
Extension of XQuery to allow calling of methods defined on schema types
We propose two mechanisms for including validation code in XML Schema: rules and methods. Both are attached to type definitions. Rules are called in order to validate a node that is derived from a type containing that rule. Methods can be called from rules or other methods, and are used to encapsulate utility code that relates to a particular data type within the type definition.
A possible syntax for a Rule Definition would be:
<rule name="exampleRule">
<expression>if (Y = 'foo') then X = 0 else true()</expression>
</rule>
A Rule Definition can appear as part of a Simple Type Definition or a Complex Type Definition. One area of this proposal that requires further development is the behaviour of Rule Definitions when included as part of a type derivation step. The current specification is that Rule Definitions may only be included where a type is derived by either restriction or extension. However, in either case, the act of adding rules serves to further restrict the type, as a valid instance of the type must conform to all rules defined on any base type.
When validating an XML document against an XML Schema containing Rule Definitions, for each element or attribute, any rules defined on the type on which they are based are evaluated by finding the Effective Boolean Value of the XQuery expression, with the current context set to the node being validated. The document is considered to be valid only if all validation rules evaluate to true.
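The evaluation model, including the requirement that an instance conform to all rules defined on any base type, can be sketched in Python as below. The sketch makes several assumptions: the derived type `ukAddressType` and its rule are hypothetical, the type of each node is passed in explicitly rather than taken from the PSVI, rules are Python callables rather than XQuery expressions, and Python truthiness stands in for the Effective Boolean Value.

```python
import xml.etree.ElementTree as ET


def postcode_if_uk(node):
    return node.findtext("country") != "UK" or len(node.findall("postcode")) == 1


def zipcode_if_us(node):
    return node.findtext("country") != "US" or len(node.findall("zipcode")) == 1


def country_is_uk(node):
    return node.findtext("country") == "UK"


# Hypothetical type hierarchy: ukAddressType is derived from addressType.
BASE_TYPE = {"addressType": None, "ukAddressType": "addressType"}
RULES = {
    "addressType": [postcode_if_uk, zipcode_if_us],
    "ukAddressType": [country_is_uk],
}


def rules_for(type_name):
    """Collect rules from the root of the derivation chain down to type_name."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = BASE_TYPE[type_name]
    return [rule for name in reversed(chain) for rule in RULES.get(name, [])]


def is_valid(node, type_name):
    """A node is valid only if every applicable rule evaluates to true."""
    return all(bool(rule(node)) for rule in rules_for(type_name))


uk_ok = ET.fromstring("<a><postcode>W5 3QA</postcode><country>UK</country></a>")
us_ok = ET.fromstring("<a><zipcode>12345-2345</zipcode><country>US</country></a>")
uk_bad = ET.fromstring("<a><country>UK</country></a>")

print(is_valid(uk_ok, "ukAddressType"))
print(is_valid(us_ok, "addressType"), is_valid(us_ok, "ukAddressType"))
print(is_valid(uk_bad, "ukAddressType"))
```

The last case fails because of a rule inherited from addressType, not one declared on ukAddressType, which is the sense in which adding rules during derivation can only restrict a type further.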
Methods are blocks of code defined as part of data type definitions that are not called directly as part of the validation process, but instead can be called by rules or other methods. The purpose of methods is to allow encapsulation of code into data type definitions. Two types of method are proposed: instance methods and static methods. These are analogous to instance and static methods in Object Orientated languages such as Java. Instance methods may only be called with the context of an instance of the data type (or a derived data type) upon which they are defined. Static methods may be called without reference to an instance of the data type.
A possible syntax for a Method Definition would be:
<method name="method1" returnType="xs:string" static="true">
<parameter name="param1" type="xs:float" />
<parameter name="param2" type="xs:integer" />
<expression>if ($param1 = $param2) then "EQUAL" else "UNEQUAL"</expression>
</method>
Methods may be added to types at any point in the type derivation hierarchy. How derivation occurs needs further consideration. One possibility is that method definitions override definitions on base types with the same method name, essentially giving virtual method functionality.
The parameter list specifies what parameters the method accepts. The return type specifies the type of value returned by the method. These are all specified as SequenceTypes, as defined by the XQuery specification [BC 2002].
To allow calling of methods, the XQuery syntax needs to be extended. The XQuery language allows for calling functions, but in order to call methods it is necessary to specify an instance of the type (or just the type in the case of static methods) on which the method is to be called. The proposed extension is to ValueExpr, as defined in the XQuery specification.
ValueExpr ::= ValidateExpr | CastExpr | TreatExpr | Constructor |
PathExpr | SchemaMethodCall
SchemaMethodCall ::= "execute" ( FunctionCall |
"static" FunctionCall SchemaGlobalContext ("/" SchemaContextStep)* )
An example of a dynamic method call would be:
/foo/bar/( execute method1(1.0,1) )
For dynamic method calls, the behaviour for SchemaMethodCall is defined as follows: for each item in the previous step of the current navigation that is an element or attribute node and whose type definition contains a method definition matching the name specified by the SchemaMethodCall and not defined as static, the method is executed with the supplied parameters and with the current context set to that node, and the resulting sequence of items is returned.
For static method calls, the behaviour for SchemaMethodCall is defined as follows: for the specified SchemaContext, if there exists a method definition conforming to the name specified by the SchemaMethodCall that is defined as static, the method is executed with the supplied parameters and the resulting sequence of items is returned.
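The two dispatch behaviours just defined can be modelled in Python as follows. This is a sketch under stated assumptions rather than an implementation of the proposed XQuery extension: `NODE_TYPES` stands in for the type annotation a schema-aware processor would supply, the types `barType` and `bazType` and the method `shout` are hypothetical, and raising `LookupError` when no static method is found is our own choice, since the text above does not say what should happen in that case.

```python
import xml.etree.ElementTree as ET

# Stand-in for PSVI type annotations: element name -> type name (hypothetical).
NODE_TYPES = {"bar": "barType", "baz": "bazType"}

# type name -> method name -> (is_static, callable)
METHODS = {
    "barType": {
        # Instance method: receives the context node as its first argument.
        "shout": (False, lambda node, suffix: node.text.upper() + suffix),
        # Static method mirroring the method1 definition shown earlier.
        "method1": (True, lambda p1, p2: "EQUAL" if p1 == p2 else "UNEQUAL"),
    },
}


def call_dynamic(nodes, name, *args):
    """Run a non-static method on every node whose type defines it."""
    results = []
    for node in nodes:
        entry = METHODS.get(NODE_TYPES.get(node.tag), {}).get(name)
        if entry is not None and not entry[0]:
            results.append(entry[1](node, *args))
    return results


def call_static(type_name, name, *args):
    """Run a static method on a type, with no instance in context."""
    entry = METHODS.get(type_name, {}).get(name)
    if entry is None or not entry[0]:
        raise LookupError(f"no static method {name} on {type_name}")
    return entry[1](*args)


foo = ET.fromstring("<foo><bar>a</bar><bar>b</bar><baz>c</baz></foo>")
print(call_dynamic(list(foo), "shout", "!"))
print(call_static("barType", "method1", 1.0, 1))
```

Nodes whose type lacks the named method simply contribute nothing to the result of a dynamic call, so the `baz` element is skipped here without error.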
We believe that this is the cleanest way to incorporate method call functionality into the existing XQuery working draft; however, the syntax is somewhat unwieldy to use. We have considered defining, for dynamic method calls, an equivalent to the abbreviated syntax for axes. For example:
/foo/bar=>method1(1.0,1)
This syntax was used to dereference IDREF links in an early version of the XPath 2.0 working draft [BF 2002], but is not used in the current working draft. It could be included as an extension to the StepExpr production:
StepExpr ::= ( ForwardStep | ReverseStep | SchemaMethodCallStep
| PrimaryExpr ) Predicates SchemaMethodCallExpr?
SchemaMethodCallExpr ::= "=>" FunctionCall
An example of a static method call is as follows:
execute static method1(1.0,1) type addressType/postcode
The key point to note in this call is that the method is called with a SchemaContext, as defined in the XQuery specification [BC 2002]. This refers to a type or element definition within the schema, rather than to a particular instance of the type or the element. This is entirely analogous to static methods in object orientated languages such as Java.
We will now consider a few use cases for this system. Let us start with the Schematron example used earlier to check for the existence of post codes or zip codes based on the country code. This would be written as:
<xsd:schema
elementFormDefault='qualified' attributeFormDefault='unqualified'
xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
<xsd:element name='message'>
<xsd:complexType>
<xsd:sequence>
<xsd:element name='senderAddress' type="addressType" />
<xsd:element name='recipientAddress' type="addressType" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="addressType" >
<xsd:rule name="country_UK">
<xsd:expression>
if(country = 'UK') then count(postcode) = 1 else true()
</xsd:expression>
</xsd:rule>
<xsd:rule name="country_US">
<xsd:expression>
if(country = 'US') then count(zipcode) = 1 else true()
</xsd:expression>
</xsd:rule>
<xsd:sequence>
<xsd:choice minOccurs="0">
<xsd:element name="postcode" type="xsd:string" />
<xsd:element name="zipcode" type="xsd:string" />
</xsd:choice>
<xsd:element name="country" type="xsd:string" />
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Apart from making use of the if ... then ... else construct available in XQuery 1.0, what is different from the embedded Schematron example shown previously? The key feature lies in what is not included in the above example: there is no context expression. The rules are tied to the addressType definition, and will be applied anywhere that an element of this type is instantiated, irrespective of where it appears in the instance document, or what the element is called. It would also be possible to derive new types to which the rule would still apply.
Consider the following XML instance:
<invoice xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:noNamespaceSchemaLocation="invoice.xsd">
<invoiceLine>
<quantity>3</quantity>
<itemCost>30.00</itemCost>
<taxRate>17.5</taxRate>
</invoiceLine>
<invoiceLine>
<quantity>1</quantity>
<itemCost>15.00</itemCost>
<taxRate>5.00</taxRate>
</invoiceLine>
<subTotal>105.00</subTotal>
<taxTotal>6.00</taxTotal>
<invoiceTotal>110.00</invoiceTotal>
</invoice>
We wish to enforce the following validation rules:
The subTotal element must be equal to the sum of quantity times itemCost for each invoice line.
The taxTotal must be equal to the sum of quantity times itemCost times the taxRate (as a percentage) for each line.
The invoiceTotal must equal the sum of subTotal and taxTotal.
It is not possible to write the first two rules as an XPath 1.0 [CD 1999] expression. It is certainly possible to code each of them as a single XPath 2.0 / XQuery 1.0 expression, for example:
subTotal = sum(for $i in invoiceLine return $i/quantity * $i/itemCost)
This can be improved by the use of methods. For example:
<xsd:schema
elementFormDefault='qualified' attributeFormDefault='unqualified'
xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
<xsd:element name='invoice'>
<xsd:complexType>
<xsd:rule name="subTotal">
<xsd:expression>subTotal = sum(invoiceLine=>lineTotal())</xsd:expression>
</xsd:rule>
<xsd:rule name="taxTotal">
<xsd:expression>taxTotal = sum(invoiceLine=>taxTotal())</xsd:expression>
</xsd:rule>
<xsd:rule name="invoiceTotal">
<xsd:expression>invoiceTotal = subTotal + taxTotal</xsd:expression>
</xsd:rule>
<xsd:sequence>
<xsd:element name='invoiceLine' type="lineType" maxOccurs="unbounded" />
<xsd:element name='subTotal' type="xsd:decimal" />
<xsd:element name='taxTotal' type="xsd:decimal" />
<xsd:element name='invoiceTotal' type="xsd:decimal" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="lineType" >
<xsd:method name="lineTotal" returnType="xsd:decimal" static="false">
<xsd:expression>quantity * itemCost</xsd:expression>
</xsd:method>
<xsd:method name="taxTotal" returnType="xsd:decimal" static="false">
<xsd:expression>.=>lineTotal() * taxRate / 100</xsd:expression>
</xsd:method>
<xsd:sequence>
<xsd:element name="quantity" type="xsd:decimal" />
<xsd:element name="itemCost" type="xsd:decimal" />
<xsd:element name="taxRate" type="xsd:decimal" />
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Although in this simple example this does not appear to be an obvious improvement in terms of conciseness, we have achieved several things:
Encapsulation. The code to calculate a line total and a tax total for each invoice line is now encapsulated entirely within the definition of lineType.
Code re-use. The code used to calculate a line total is re-used in the calculation of the tax total for the line.
Readability. We have broken down the three rules into five readable expressions, written in the context of the types to which they relate.
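To make the decomposition concrete, the same three rules and two methods can be mirrored in Python and run against the invoice instance given above. This is only an analogue of the proposed schema syntax: the function names are our own, the methods become plain functions taking the invoice line as their context, and `Decimal` is used so that the comparison is exact.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# The invoice instance from this section, without the schema location hint.
INVOICE = """
<invoice>
  <invoiceLine><quantity>3</quantity><itemCost>30.00</itemCost><taxRate>17.5</taxRate></invoiceLine>
  <invoiceLine><quantity>1</quantity><itemCost>15.00</itemCost><taxRate>5.00</taxRate></invoiceLine>
  <subTotal>105.00</subTotal>
  <taxTotal>6.00</taxTotal>
  <invoiceTotal>110.00</invoiceTotal>
</invoice>
"""


def dec(node, name):
    """Read a child element as an exact decimal."""
    return Decimal(node.findtext(name))


def line_total(line):
    """Counterpart of the lineTotal method on lineType."""
    return dec(line, "quantity") * dec(line, "itemCost")


def tax_total(line):
    """Counterpart of the taxTotal method; re-uses line_total."""
    return line_total(line) * dec(line, "taxRate") / 100


def check_invoice(invoice):
    """Evaluate the three rules with the invoice element as context."""
    lines = invoice.findall("invoiceLine")
    return {
        "subTotal": dec(invoice, "subTotal") == sum(line_total(l) for l in lines),
        "taxTotal": dec(invoice, "taxTotal") == sum(tax_total(l) for l in lines),
        "invoiceTotal": dec(invoice, "invoiceTotal")
        == dec(invoice, "subTotal") + dec(invoice, "taxTotal"),
    }


print(check_invoice(ET.fromstring(INVOICE)))
```

Run against the instance as printed, this sketch reports that only the subTotal rule holds: the line tax totals sum to 16.50 rather than 6.00, and subTotal plus the stated taxTotal is 111.00 rather than 110.00.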
Readers familiar with Schematron will notice that one feature apparently missing from this proposal is a provision for specifying error messages to be reported with failed validation rules. We believe that this is best served by attaching appInfo annotations to the business rules, which can then be reported via a post-processor to a schema validating parser. This allows schema designers to specify their own format for error messages which may include application specific data such as error codes, or multi-language error messages.
We have seen that there is a real need for XML validation beyond that provided by the W3C's XML Schema standard, and that existing tools to perform such validation offer a very low level of integration. As a result of XML Schema's Object Orientated approach to defining types, and of the propagation of type information into recent XML standards such as XPath 2.0 and XQuery 1.0, we have seen that only limited extensions to existing standards would be required in order to provide a clean, standard and tightly integrated approach to XML validation.
[BB 2002] Berglund, A., Boag, S., Chamberlin, D., Fernandez, M., Kay, M., Robie, J., Simeon, J., editors. XML Path Language (XPath) 2.0 W3C, November 2002, http://www.w3.org/TR/xpath20/
[BC 2002] Boag, S., Chamberlin, D., Fernandez, M., Florescu, D., Robie, J., Simeon, J., editors. XQuery 1.0: An XML Query Language W3C, 2002, http://www.w3.org/TR/xquery/
[BF 2002] Berglund, A., Boag, S., Chamberlin, D., Fernandez, M., Kay, M., Robie, J., Simeon, J., editors. XML Path Language (XPath) 2.0 W3C, April 2002, http://www.w3.org/TR/2002/WD-xpath20-20020430/
[CD 1999] Clark, J., DeRose, S., editors. XML Path Language (XPath) Version 1.0, W3C, 1999, http://www.w3.org/TR/xpath
[CH 2003] Campbell, C., Malhotra, A., Walmsley, P., editors. Requirements for XML Schema 1.1, W3C, 2003, http://www.w3.org/TR/2003/WD-xmlschema-11-req-20030121/
[CJ 1999] Clark, James, editor. XSL Transformations (XSLT) Version 1.0, W3C, 1999, http://www.w3.org/TR/xslt
[CM 2001] Clark, James and Murata, Makoto, editors. RELAX NG Specification, OASIS, 2001, http://www.oasis-open.org/committees/relax-ng/spec.html
[FM 2002] Fernandez, M., Malhotra, A., Marsh, J., Nagy, M., Walsh, N., editors. XQuery 1.0 and XPath 2.0 Data Model, W3C, 2002, http://www.w3.org/TR/query-datamodel/
[RA 2001] Jelliffe, Rick and Academia Sinica Computing Centre. The Schematron Assertion Language 1.5, http://www.ascc.net/xml/resource/schematron/Schematron2000.html
[RE 2002] Robertson, E. Combining Schematron with other XML Schema languages, http://www.topologi.com/public/Schtrn_XSD/Paper.html
[TB 2001] Thompson, H., Beech, D., Maloney, M., Mendelsohn, N., editors. XML Schema Part 1: Structures, W3C, 2001, http://www.w3.org/TR/xmlschema-1/
[1] RELAX NG is a simple, grammar based alternative to XML Schema. For the purposes of this paper, it does not offer significant validation capabilities beyond XML Schema, so we will not discuss it further.
[2] DSLPath was an in-house node selection language developed by DecisionSoft before the widespread adoption of XPath. It had a great deal in common with XPath, and for the purposes of this discussion may be treated as being equivalent.