XML Europe 2003

Better Conformance Testing Through Automation:

a software-based approach to creating conformance tests for W3C XML Schema

Abstract

The W3C is placing increasing emphasis on the provision of comprehensive, openly available conformance test suites to accompany its Recommendations. Conformance testing is seen as a powerful aid to developers in implementing standards-conformant software, and thus in encouraging and speeding adoption of Recommendations.

Unfortunately, traditional methods of conformance test development, painstaking and labor intensive as they are, cannot provide the necessary broad, evolving coverage of the feature sets of current and developing Recommendations: they are neither efficient nor agile enough. As a result, test developers are looking into methods of automating test development.

This paper describes our experiences in developing a highly configurable, extensible, component-based tool for the creation of conformance tests for XML Schema. It discusses our goals in building the tool, the needs it was designed to fill, its architecture, and finally its capabilities and limitations.

The tests produced consist of a set of schemas, each with a corresponding set of instance documents. Alongside the test files, the tool produces a set of metadata that enables fully automated processing of the test collection and presentation of its results. The tool achieves this by combining information obtained from the normative Schema for schemas and a local XML control document, using a Java(TM) class library to generate the required test values, and wrapping these values appropriately.

To date, the tool is capable of producing conformance tests for all Schema built-in simple datatypes, including list and union datatypes. More than 6,000 tests produced by the tool are currently included in the W3C's test suite for XML Schema. We are also experimenting with incorporating the tool in the automation of test production for XML Query.

Our aim as testers is to develop a tool which is flexible, extensible, responsive, easily configurable and modifiable, and which enables us to provide broad coverage of the Recommendation while at the same time minimizing our involvement in individual test production and simplifying the testing procedure for product developers. We believe this tool is a good first step towards that end.

Table of Contents

1. Introduction
2. Test suite requirements
3. System capabilities
3.1. Tests
3.2. Documentation
4. Initial approach to automation
5. System Architecture
5.1. The Test Specification
5.2. Controller module
5.3. Java(TM) class library
5.4. Print architecture
6. Advantages of automation
7. Outstanding issues and future work
7.1. Scope of the tests
7.2. Quality control
7.3. Complexity of the codebase and code reuse
Bibliography
Biography

1. Introduction

This paper describes the design and use of a tool for the automated generation of conformance tests for the World Wide Web Consortium's (W3C) XML Schema Recommendation [1], [2], [3]. It discusses our goals in building the tool, the needs it was designed to fill, its capabilities and limitations, its architecture, and the design decisions we made along with the reasoning behind them.

The W3C XML Schema Recommendation provides a collection of fundamental datatypes for use in XML documents, together with a set of rules which describe how other datatypes may be created from these built-in types to suit the data description requirements of XML-based applications. It is beyond the scope of this document to give an introduction to W3C XML Schema: see [7] for a collection of tutorials, example schemas, parsers, commentaries, and the Recommendation itself.

The National Institute of Standards and Technology (NIST) has a successful track record of providing conformance tests for various W3C technologies, including XML, DOM, XSLT and XSL/FO [4]. We participate in a number of W3C Working Groups, where our focus is on the design, development, and management of conformance test suites.

At NIST, our emphasis is shifting increasingly towards the automation of conformance testing for XML technologies. Our primary motivation is the widening gap between the conformance testing requirements of the developer community on the one hand, and our ability to supply the necessary tests using traditional methods on the other. We need to supply tests of sufficient number, quality, and scope, in a timely fashion. Further, we need to provide tests that track the evolution of multiple Recommendations through all stages of their development, even as the number of Recommendations grows, and to track issues regarding the tests we provide. In doing this, we help the Working Groups with which we are involved to meet the requirements of the W3C Quality Assurance Activity [5], which calls for the parallel development of specifications and their respective test suites [6].

Traditional methods of test suite development tend to be time consuming, laborious and error-prone. Skilled test designers must pore over a specification, deriving semantic requirements from the document text and any productions it may contain, transforming these into test assertions and finally writing tests based on these assertions. This approach tends to produce small numbers of tests which are difficult to adapt to changing specifications and to correct for errors. We are currently researching several ways of introducing automation into the process, and these have already demonstrated significant improvements in each of these key areas.

2. Test suite requirements

We had two principal requirements for any test suite produced, which were based upon our previous experiences with conformance test suites for XML-based Recommendations.

First, the tests should be atomic in nature, meaning that each test should examine a single, simple operation. Thus, each test schema document is confined to the derivation of a new datatype from a built-in datatype through restriction of a single facet. Similarly, a document conforming to this schema contains a single element, of the type defined in the schema, with a single value.

The rationale behind such atomic testing is to constrain possible interactions between independent aspects of the test to a minimum, and as a result enable implementors to more easily identify the source of any error. Thus, the purview of the test suite, by definition, is to measure the accuracy with which datatypes and the datatype derivation procedure are represented by schema-enabled processors.

Second, any test suite produced should be fully documented in XML. Such documentation enables straightforward automated traversal and filtering of the test suite by processors and provides users with references back to the relevant section of the Recommendation for each test. Report generation for human consumption is also greatly simplified.

A subsidiary goal, at least initially, was that we should limit ourselves to testing the simple datatypes defined in Part 2 of the Recommendation [3], and the mechanisms of derivation by restriction, list, and union.

3. System capabilities

The system creates a test suite along two parallel axes: the tests themselves and documentation of the tests. A given test suite consists of a set of schemas, each of which has a set of associated instance files. The documentation provides traceability back to the Recommendation for each test and a route to the tests themselves for automated processing of the test suite as a whole. The test schemas, test instances and test documentation are all well-formed XML documents.

3.1. Tests

At the time of writing, the system can build conformance tests and associated documentation for all the built-in simple datatypes of W3C XML Schema [3] with the exception of ENTITY, IDREF, and NOTATION (and their corresponding list and union types). The reason for these omissions is that these types are intended for use only in XML attributes, which our model for test generation does not accommodate at present.

For each supported simple datatype, the system can test derivation by restriction (by all applicable facets), by list, and by union. Derivation by list and union is itself tested by deriving types from list or union types by restriction. Thus, a list type is first derived from a built-in simple type, and then various test types are derived by restriction of the list type by applicable facets.

The system allows the production of schema-valid and schema-invalid instance documents, with particular emphasis on the boundary conditions which separate valid from invalid instance values. As such, the values included in schema-invalid test instance documents are not arbitrary, but rather are chosen from that part of the value space of the base type which lies outside the restricted value space of the derived type defined by a test schema.

As an example, suppose a test schema defines a type, myType, by restriction from the built-in type integer with the maxInclusive facet set to 10. Valid instance values for myType would be 10, 7, -124, and so on. An invalid instance value might be "L". However, the inclusion of such a value in an instance document would tell us nothing about a schema processor's ability to distinguish myType from integer, as this value is also invalid for integer. Thus, the values selected for inclusion in invalid instance documents come from within the value space of integer and outside that of myType, e.g. 11, 17, 472, and so on.
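The choice of values around a boundary can be illustrated with a minimal Java sketch. The class and method names here (MaxInclusiveSampler, validValues, invalidValues) are hypothetical and are not taken from the tool itself; the sketch handles only the maxInclusive facet on integer and simply steps away from the bound, whereas the tool may select values differently.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: picks instance values around a maxInclusive bound on integer. */
final class MaxInclusiveSampler {

    private MaxInclusiveSampler() {}

    /** Returns n valid values, starting at the bound itself and stepping downwards. */
    static List<BigInteger> validValues(BigInteger maxInclusive, int n) {
        List<BigInteger> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(maxInclusive.subtract(BigInteger.valueOf(i)));
        }
        return values;
    }

    /** Returns n invalid values that are still integers: the first ones above the bound. */
    static List<BigInteger> invalidValues(BigInteger maxInclusive, int n) {
        List<BigInteger> values = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            values.add(maxInclusive.add(BigInteger.valueOf(i)));
        }
        return values;
    }
}
```

With a bound of 10 and n set to 3, validValues yields 10, 9, 8 and invalidValues yields 11, 12, 13. Every invalid value is valid for integer but not for the derived type, which is the property the paragraph above asks for.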

3.2. Documentation

The documentation for the test suite is in the form of a series of linked test descriptions, which are easily parsed for report generation and automated test suite traversal. Each test schema and test instance is described and linked to, and each test description contains one or more links to the relevant part(s) of the Recommendation for easy verification of the test. This approach to documenting the tests lends itself easily to transformation-based manipulation, for example into HTML for readability and to enable the testing of browser-based processors.

4. Initial approach to automation

Initially we had hoped that the creation of a test suite for XML Schema could be accomplished largely through the manipulation of a fairly straightforward set of XML documents: primarily a document describing the set of tests to be generated along with the normative schema for schemas itself [8]. However, we soon discovered that the schema for schemas, while it can easily be used to discover the names of the built-in datatypes, the attributes associated with them, and their relationship to the other datatypes, does not provide the kind of information necessary for the construction of such things as permissible value ranges for types or the syntactic structure of values associated with a given type. A good example is the following definition of dateTime from the Schema for Datatype Definitions [8]:

      <xs:simpleType name="dateTime" id="dateTime">
        <xs:annotation>
          <xs:appinfo>
            <hfp:hasFacet name="pattern"/>
            <hfp:hasFacet name="enumeration"/>
            <hfp:hasFacet name="whiteSpace"/>
            <hfp:hasFacet name="maxInclusive"/>
            <hfp:hasFacet name="maxExclusive"/>
            <hfp:hasFacet name="minInclusive"/>
            <hfp:hasFacet name="minExclusive"/>
            <hfp:hasProperty name="ordered" value="partial"/>
            <hfp:hasProperty name="bounded" value="false"/>
            <hfp:hasProperty name="cardinality" value="countably infinite"/>
            <hfp:hasProperty name="numeric" value="false"/>
          </xs:appinfo>
          <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#dateTime"/>
        </xs:annotation>
        <xs:restriction base="xs:anySimpleType">
          <xs:whiteSpace value="collapse" fixed="true" id="dateTime.whiteSpace"/>
        </xs:restriction>
      </xs:simpleType>
      

From this definition it can be seen that dateTime is a primitive (as opposed to derived) type, derived directly from the abstract supertype of all the datatypes, anySimpleType. Additionally, it can be seen that the facets pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, and minExclusive can be applied to this datatype, that the datatype has certain properties, and that the value of the whiteSpace facet is fixed at collapse. However, various crucial properties of the datatype cannot be determined from its schema definition alone. Examples include the fact that values of this datatype have the syntax CCYY-MM-DDThh:mm:ss (e.g. 1997-03-29T09:40:37), and the fact that the year 0000 is prohibited.
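To make concrete the kind of knowledge that has to be encoded by hand, the following Java sketch checks the two properties just mentioned. The class name DateTimeLexicalSketch is hypothetical. The optional sign, fractional seconds and time zone in the pattern follow the lexical form given in Part 2 of the Recommendation [3]; the sketch deliberately does not check field ranges such as month or day, so it is an illustration rather than a validator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified model of the dateTime lexical form; not a full validator. */
final class DateTimeLexicalSketch {

    // Optional sign, a year of four or more digits, then -MM-DDThh:mm:ss,
    // optional fractional seconds and an optional time zone.
    private static final Pattern LEXICAL = Pattern.compile(
            "-?(\\d{4,})-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d+)?(Z|[+-]\\d{2}:\\d{2})?");

    private DateTimeLexicalSketch() {}

    /** Checks the overall shape and the ban on year 0000; field ranges are not checked. */
    static boolean looksLikeDateTime(String literal) {
        Matcher m = LEXICAL.matcher(literal);
        if (!m.matches()) {
            return false;
        }
        // The year 0000 is prohibited, a rule the schema definition does not express.
        return !m.group(1).equals("0000");
    }
}
```

Neither rule can be read off the schema for schemas, which is why a code-based representation of each datatype is needed alongside it.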

Accordingly, we decided to use a hybrid model consisting of (1) manually constructed, code-based representations of the form and behavior of the built-in datatypes, as specified by the normative text descriptions and productions in the body of the Recommendation, and (2) a data-driven mechanism to direct the production of tests.

5. System Architecture

Figure 1 shows the three principal components of the system:

  • a controller module, which reads configuration information and directs execution appropriately in order to create the required tests and accompanying documentation,

  • an object class library, in which each supported datatype is modeled by an object class which contains the necessary apparatus to report on key behaviors associated with the datatype and to produce datatype-specific test values for schema and instance documents for all applicable facets,

  • a print architecture, which combines test values with test-specific wrappers to produce the test documents and creates the documentation for each test.

Figure 1. System architecture

5.1. The Test Specification

The test specification document is the key to the flexibility of the test generation tool. Since it is an XML document, it can be of arbitrary complexity (within the scope of its document definition), allowing the tester to specify the generation of any number of sets of tests at any level of granularity.

As of this writing, the level of granularity specified can range from all-encompassing (the generation of valid and invalid test scenarios for all atomic datatype/facet combinations and all list datatype/facet combinations - this is the default condition) to the highly specific (e.g. the generation of invalid test scenarios for the decimal datatype as restricted by the minInclusive facet). Note, however, that because of the combinatorial implications of creating tests for all possible union datatypes, tests for union datatypes must be explicitly specified.

The number of test schema documents per datatype/facet combination and the number of test instance documents per test schema are also independently specified here.

The combination of levels of granularity with the ability to specify the number of tests enables the tool to generate anything from a handful to literally thousands of tests in a run.

5.2. Controller module

The controller uses an XML parser to parse the test specification document. In the default case, where no datatypes and/or no facets are specified locally, the controller reads the schema for schemas [8] and uses it to build a list of all datatype and facet names. This latter capability is particularly useful for tracking the evolution of the Recommendation.
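As an illustration of the default case, the sketch below uses the standard Java DOM API to list each named simple type in a schema document together with the facets declared applicable to it. The class name SchemaFacetReader and the method facetsByType are hypothetical. The sketch assumes the element names used in the published schema (xs:simpleType and hfp:hasFacet) and takes the schema as a string; it is not the controller's actual code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader that lists each named simple type with its applicable facets. */
final class SchemaFacetReader {

    private static final String XS = "http://www.w3.org/2001/XMLSchema";
    private static final String HFP = "http://www.w3.org/2001/XMLSchema-hasFacetAndProperty";

    private SchemaFacetReader() {}

    static Map<String, List<String>> facetsByType(String schemaXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the namespace-based lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)));

            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList types = doc.getElementsByTagNameNS(XS, "simpleType");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                if (!type.hasAttribute("name")) {
                    continue; // skip anonymous, nested type definitions
                }
                List<String> facets = new ArrayList<>();
                NodeList hasFacet = type.getElementsByTagNameNS(HFP, "hasFacet");
                for (int j = 0; j < hasFacet.getLength(); j++) {
                    facets.add(((Element) hasFacet.item(j)).getAttribute("name"));
                }
                result.put(type.getAttribute("name"), facets);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read schema", e);
        }
    }
}
```

Because the names are read from the schema rather than hard-coded, a revised schema for schemas would change the generated list without any change to this code.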

Once the datatype/facet combinations under test have been ascertained, the system loops through the list of datatype names. For each datatype name, the system checks that a library class is available to model the named datatype. If the library class exists, the controller issues a series of requests along the following lines (see Figure 2):

  • Is a specified facet applicable to the datatype? If yes:

  • What are the values of the absolute and constrained bounds on the axis or dimension of the datatype's value space represented by the facet?

  • Provide a set of n restriction values for incorporation into test schema documents.

  • For each restriction value, provide a set of p values for incorporation into test instance documents.

Once the schema restriction values and the associated instance value sets have been assembled, they are passed off to the print architecture for packaging and documentation.
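The request sequence above can be sketched in Java as follows. All names (DatatypeModel, TestCaseData, ControllerSketch and their methods) are hypothetical stand-ins for the tool's own classes, values are carried as strings for simplicity, and the query for absolute and constrained bounds is omitted to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical view of a datatype library class as seen by the controller. */
interface DatatypeModel {
    boolean isApplicable(String facet);
    List<String> restrictionValues(String facet, int n);
    List<String> instanceValues(String facet, String restriction, int p);
}

/** One schema restriction value together with its instance values. */
final class TestCaseData {
    final String datatype;
    final String facet;
    final String restriction;
    final List<String> instances;

    TestCaseData(String datatype, String facet, String restriction, List<String> instances) {
        this.datatype = datatype;
        this.facet = facet;
        this.restriction = restriction;
        this.instances = instances;
    }
}

final class ControllerSketch {

    private ControllerSketch() {}

    /** Collects n restriction values per datatype/facet pair, each with p instance values. */
    static List<TestCaseData> collect(Map<String, DatatypeModel> library,
                                      List<String> datatypes, List<String> facets,
                                      int n, int p) {
        List<TestCaseData> out = new ArrayList<>();
        for (String datatype : datatypes) {
            DatatypeModel model = library.get(datatype);
            if (model == null) {
                continue; // no library class models this datatype
            }
            for (String facet : facets) {
                if (!model.isApplicable(facet)) {
                    continue; // facet cannot be applied to this datatype
                }
                for (String restriction : model.restrictionValues(facet, n)) {
                    out.add(new TestCaseData(datatype, facet, restriction,
                            model.instanceValues(facet, restriction, p)));
                }
            }
        }
        return out;
    }
}
```

The list returned by collect corresponds to the assembled value sets that are handed to the print architecture.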

Figure 2. Interactions between Controller Module and Datatype Library Classes

5.3. Java(TM) class library

The datatypes are modeled as Java(TM) classes, one class for each atomic datatype, plus a class to model list datatypes and a class to model union datatypes. These classes are not general purpose representations of the datatypes: rather they model only those aspects of the datatypes necessary for the creation of test values. Typically those aspects are as follows:

  • Modeling of the value space of a datatype with regard to all its facets. The class is required to handle any value from within the defined value space of the datatype it represents and report any bounds on that value space. The class is also required to generate a specified number of arbitrary values from within the value space of the datatype for use as restriction values in test schemas.

  • Modeling of datatype derivation by restriction. The class should be able to accept a constraint on the value space of the datatype it represents and produce a set of arbitrary values in accordance with this constraint for use in test instance documents. Note: because of the atomic nature of the tests, it is not strictly necessary for our purposes for a library class to be able to model more than one simultaneous constraint, but see below.

Thus, the principal methods of the library classes are used to report numerical bounds (absolute and constrained), return a set of restriction values appropriate to a type/facet combination, set the value of a constraining facet, and return a set of valid and/or invalid values for the datatype under some constraint.

It should be noted that the system uses Java(TM) class inheritance to mirror the derivation of W3C XML Schema's built-in derived datatypes. Because of the desirability of representing datatype derivation in this way, the library classes were built in such a way as to enable restriction of the value space through multiple facets simultaneously.
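The following Java sketch shows how class inheritance might mirror datatype derivation, and why that requires support for more than one simultaneous facet. The classes IntegerModel and ShortModel are hypothetical; in the Recommendation short is derived from integer by way of long and int, and those intermediate steps are skipped here for brevity.

```java
import java.math.BigInteger;

/** Hypothetical model of integer that can hold a lower and an upper bound at once. */
class IntegerModel {
    private BigInteger minInclusive; // null means unbounded below
    private BigInteger maxInclusive; // null means unbounded above

    /** Tightens the lower bound; this sketch silently ignores attempts to loosen it. */
    final void restrictMinInclusive(BigInteger bound) {
        if (minInclusive == null || bound.compareTo(minInclusive) > 0) {
            minInclusive = bound;
        }
    }

    /** Tightens the upper bound; this sketch silently ignores attempts to loosen it. */
    final void restrictMaxInclusive(BigInteger bound) {
        if (maxInclusive == null || bound.compareTo(maxInclusive) < 0) {
            maxInclusive = bound;
        }
    }

    /** Reports whether a value lies inside the current (possibly restricted) value space. */
    final boolean accepts(BigInteger value) {
        return (minInclusive == null || value.compareTo(minInclusive) >= 0)
            && (maxInclusive == null || value.compareTo(maxInclusive) <= 0);
    }
}

/** short modeled as an integer restricted by two facets simultaneously. */
class ShortModel extends IntegerModel {
    ShortModel() {
        restrictMinInclusive(BigInteger.valueOf(Short.MIN_VALUE));
        restrictMaxInclusive(BigInteger.valueOf(Short.MAX_VALUE));
    }
}
```

A test type derived from short by setting maxInclusive to 10 would then be modeled by calling restrictMaxInclusive on a ShortModel, leaving the inherited lower bound of -32768 in force.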

5.4. Print architecture

The print architecture is responsible for the creation of well-formed XML documents around the value sets produced by the controller module. These documents constitute the tests themselves (schema and instance documents) and the test suite documentation. The tasks performed can be summarized as follows:

  • Systematic naming and location of test schema and instance files.

  • Formatting and content of test files, including namespace declarations, systematic naming of elements and text descriptions of the test.

  • Augmentation of test file content under particular circumstances (for example, to resolve namespace issues with respect to QName values containing a namespace qualifier).

  • Production of the XML documentation for the test suite.
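As a rough illustration of the wrapping step, the Java sketch below places a restriction value in a test schema and an instance value in a matching instance document. The class TestDocumentPrinter, the type name myType, the element name test and the namespace prefix t are all hypothetical, and the tool's real naming scheme and file layout are not shown. The sketch does no escaping, so it is only suitable for values that contain no XML markup characters.

```java
/** Hypothetical wrappers that turn test values into a schema and an instance document. */
final class TestDocumentPrinter {

    private static final String XS = "http://www.w3.org/2001/XMLSchema";
    private static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    private TestDocumentPrinter() {}

    /** Schema deriving one type by restriction of a single facet, plus one element of that type. */
    static String schema(String baseType, String facet, String value, String targetNs) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<xs:schema xmlns:xs=\"" + XS + "\"\n"
             + "           xmlns:t=\"" + targetNs + "\"\n"
             + "           targetNamespace=\"" + targetNs + "\">\n"
             + "  <xs:simpleType name=\"myType\">\n"
             + "    <xs:restriction base=\"xs:" + baseType + "\">\n"
             + "      <xs:" + facet + " value=\"" + value + "\"/>\n"
             + "    </xs:restriction>\n"
             + "  </xs:simpleType>\n"
             + "  <xs:element name=\"test\" type=\"t:myType\"/>\n"
             + "</xs:schema>\n";
    }

    /** Instance document holding a single element with a single value. */
    static String instance(String value, String targetNs, String schemaFile) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<t:test xmlns:t=\"" + targetNs + "\"\n"
             + "        xmlns:xsi=\"" + XSI + "\"\n"
             + "        xsi:schemaLocation=\"" + targetNs + " " + schemaFile + "\">"
             + value + "</t:test>\n";
    }
}
```

Calling schema("integer", "maxInclusive", "10", ns) and instance("11", ns, file) would give one atomic test with an instance that is expected to be invalid.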

The precise format of the documentation is in flux, but it presently contains the following information for each test:

  • The location(s) of the schema file(s)

    • currently a URI

  • The location(s) of the instance files (if any)

    • currently a list of URIs

  • The expected result of the test

    • the expected validity of the schema (Boolean)

    • if the schema is expected to be valid, the expected validity of each instance document (Boolean)

  • A reference to the part of the Recommendation under test

    • currently a URI pointing into the Recommendation itself

  • A (human readable) description of the test

    • currently a text description

6. Advantages of automation

The system described has several advantages over traditional methods of conformance test production, which have greatly increased our productivity in test production and have enabled us to provide a far more significant contribution to the W3C's XML Schema conformance testing initiative than would otherwise have been possible to date.

The system is highly configurable, and is capable of producing a tailored, documented, referenced test collection very quickly and with minimal tester effort. The number and scope of the tests produced is easily configurable, allowing the creation of everything from a broad-based test suite, such as NIST's contribution to the W3C Schema test suite [9], to a tightly focused set of tests aimed at exploring a particular aspect of a processor's behavior.

Programmatic encoding of the test logic enhances the consistency of the test suite. This means that systematic upgrades can easily be applied and revised tests quickly published, for example in response to errata or the ongoing evolution of the Recommendation. Further, errors discovered in the test suite itself tend to be systematic and consistent: they are typically traceable to a small subset of the codebase, which can be patched to produce corrected tests across the board. This compares favorably with traditional methods of test production, in which errors tend to be unevenly distributed and difficult to track, and where each new test, upgrade, or correction must be painstakingly researched, corrected, and documented by hand.

Our initial investment in building a system to produce only schema-valid tests for the built-in atomic datatypes has paid off several times in allowing us to rapidly extend the test generator to produce invalid tests, tests for the "semi-structured" list and union datatypes, and test value sets for use in NIST's XML Query conformance testing effort. We believe that further expansion of the software will be equally straightforward.

Finally, programmatic encoding of the documentation logic greatly enhances the value of the test generator: a small set of rules, applied repeatedly and consistently, creates a set of documentation which facilitates automated processing and report generation by processors, provides traceability of each test back to the relevant part of the Recommendation, and enables custom transformations of the test suite into multiple views.

7. Outstanding issues and future work

The test generator is a work in progress and there are various issues we plan to address in order both to improve the quality and comprehensiveness of the generated test sets and to render the software itself more usable.

7.1. Scope of the tests

During the evolution of the Recommendation and the early stages of processor development, our focus was on the breadth of the test suite we could provide, in terms of the proportion of simple datatypes for which we could produce tests. Now that the Recommendation has been stable for some time and Schema-aware processors are reaching maturity, our focus has shifted to the depth of the test suite, and we are looking at ways to better explore processors' capabilities with regard to the individual datatypes.

Our first goal in this regard has been to add the capability to generate invalid tests. To date, we have been successful in adding the capability to generate schema-invalid instance documents: documents containing elements whose values lie outside the value space defined by the referenced schema. This effort will grow to encompass errors explicitly identified in the Recommendation (e.g. that it is an error to set both the length and minLength facets in a single derivation step) as well as implicitly identified errors (e.g. that the value of the length facet must be a nonNegativeInteger). Our aim is to research and test for as many error conditions as possible. Beyond this, we intend to look at production of tests of complex types.

7.2. Quality control

The ability to create a test suite containing literally thousands of files, with accompanying documentation and references, poses a particular problem for quality control. Specifically, it is impractical to inspect and validate each test, each description and each reference manually. To date, we have used three complementary approaches to quality control: visual inspection of samples of the generated data, processing the tests with a variety of established Schema-aware processors, and visual inspection of the codebase. Nevertheless, each of these methods has various drawbacks, and even in combination should only be relied on as a high-level sanity check.

We are in the process of establishing a far more thorough mechanism for quality control, in collaboration with the W3C and the developer community. This mechanism has as its foundation a feedback model between developers and testers, centered around a discussion group where issues can be raised, a hierarchical dispute resolution protocol, and a Bugzilla-style issue tracking mechanism which will list issues raised along with actions undertaken and issue status. This process is modeled after similar processes associated with the W3C's test suites for XML [10] and DOM [11], which have proven to be very effective.

7.3. Complexity of the codebase and code reuse

The codebase of the test generator is by no means unwieldy; however, we are concerned about minimizing its growth as we add new capabilities. As is all too often the case, a good design resulting in modular code is open to degradation over time with each new tweak to add a feature or to special-case a certain condition. We had hoped that extensive code reuse within the system would at least partially mitigate any descent into entropy, and indeed test generation of list and union types uses the library classes modeling the underlying item- and memberTypes, in addition to other less extensive examples. However, we remain vigilant for ways to optimize the code and to minimize the entropic impact of modifications!

While we intend to continue with the established codebase, we are also examining other methods which might be used to minimize the impact of software error and attendant errors in the tests. One approach which shows promise is to make the system more data-centered, such that all expected conditions and outcomes can be specified upfront. The process of test generation then becomes an exercise in transforming a priori data into a set of tests in accordance with some specification, using a tool such as an XSLT processor. The codebase might then be called piece by piece by the transformation processor, whenever complex, datatype-specific operations are required.

Bibliography

[1] The World Wide Web Consortium, XML Schema Part 0: Primer: http://www.w3.org/TR/xmlschema-0/

[2] The World Wide Web Consortium, XML Schema Part 1: Structures: http://www.w3.org/TR/xmlschema-1/

[3] The World Wide Web Consortium, XML Schema Part 2: Datatypes: http://www.w3.org/TR/xmlschema-2/

[4] The National Institute of Standards and Technology, Web Technologies project: http://www.nist.gov/xml

[5] The World Wide Web Consortium, Quality Assurance Activity: http://www.w3.org/QA/

[6] The World Wide Web Consortium, QA Framework: Operational Guidelines: http://www.w3.org/TR/qaframe-ops/

[7] The World Wide Web Consortium, XML Schema home page: http://www.w3.org/XML/Schema

[8] The World Wide Web Consortium, XML Schema for Datatype Definitions (normative): http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#schema

[9] The World Wide Web Consortium, W3C XML Schema Test Collection: http://www.w3.org/2001/05/xmlschema-test-collection.html

[10] The World Wide Web Consortium, Extensible Markup Language (XML) Conformance Test Suites: http://www.w3.org/XML/Test/

[11] The World Wide Web Consortium, Document Object Model (DOM) Conformance Test Suites: http://www.w3.org/DOM/Test/

Biography

John is a computer scientist whose current focus is on improving conformance testing methodology for W3C Recommendations. He is a member of the W3C XML Schema Working Group, where he is taking the lead in the conformance testing initiative.