XML 2003

Converting DTDs (and DTD Developers) to RELAX NG Schemas

Abstract

Schemas wouldn't exist if they didn't offer advantages over DTDs. Publishing operations that have used XML for years are attracted by these advantages, but the prospect of converting while keeping existing products in production still intimidates many of them.

Tools like trang make the actual conversion easy. The human factor is the difficult part: you can't throw a switch and tell a staff that has been developing and maintaining DTDs for years to start doing it with an entirely new syntax. Converted versions of DTDs they wrote themselves can look strange and foreign, and a training class the week before the switch plus a few new books on the shelf won't be enough. The transition must be gradual.

One trang feature combined with a feature of the RELAX NG (RNG) schema language lets a DTD staff begin to take advantage of RNG's extra power while keeping their DTDs in production. They can write small RNG schemas that define only the data constraints that DTDs can't, and problems with these new schemas won't be disruptive because their existing system will remain in place.

When a DTD is modularized using parameter entities, a converter could simply expand the entities and then convert the result, but trang instead converts each parameter entity to an RNG named pattern and references that pattern from the content models that used the entity. For example, if a DTD declares a quantity.type parameter entity with the value "#PCDATA" and uses it in the content model of a quantity element, trang converts quantity.type to an RNG named pattern with the same name, which the quantity element's pattern then references.
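The difference between expanding parameter entities and keeping them as named patterns can be sketched with a toy converter. This is not trang, and trang's actual output is more elaborate; it is a hypothetical illustration that handles only content models consisting of #PCDATA or a single parameter-entity reference. The names ENTITIES, ELEMENTS, pattern_for, and convert are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy input modeled on the example in the text:
#   <!ENTITY % quantity.type "#PCDATA">
#   <!ELEMENT quantity (%quantity.type;)>
# Content models are stored without their surrounding parentheses.
ENTITIES = {"quantity.type": "#PCDATA"}
ELEMENTS = {"quantity": "%quantity.type;"}

PE_REF = re.compile(r"^%([\w.-]+);$")  # a model that is one entity reference


def pattern_for(model, keep_entities):
    """Translate a trivial DTD content model into one RNG pattern element."""
    match = PE_REF.match(model)
    if match:
        name = match.group(1)
        if keep_entities:
            # Named-pattern strategy: point at a define with the entity's name.
            return ET.Element("ref", name=name)
        # Substitution strategy: expand the entity's value in place.
        return pattern_for(ENTITIES[name], keep_entities)
    if model == "#PCDATA":
        return ET.Element("text")
    raise ValueError(f"unsupported content model: {model!r}")


def convert(keep_entities=True):
    """Build a tiny RNG grammar (XML syntax) from ENTITIES and ELEMENTS."""
    grammar = ET.Element("grammar", xmlns="http://relaxng.org/ns/structure/1.0")
    start = ET.SubElement(grammar, "start")
    ET.SubElement(start, "ref", name=next(iter(ELEMENTS)))
    for elem_name, model in ELEMENTS.items():
        define = ET.SubElement(grammar, "define", name=elem_name)
        element = ET.SubElement(define, "element", name=elem_name)
        element.append(pattern_for(model, keep_entities))
    if keep_entities:
        # Each parameter entity survives as its own, overridable define.
        for ent_name, value in ENTITIES.items():
            define = ET.SubElement(grammar, "define", name=ent_name)
            define.append(pattern_for(value, keep_entities))
    return grammar


if __name__ == "__main__":
    print(ET.tostring(convert(keep_entities=True), encoding="unicode"))
    print(ET.tostring(convert(keep_entities=False), encoding="unicode"))
```

With keep_entities=True, quantity.type remains a separate define that another schema can later override; with keep_entities=False its value is inlined into the quantity element, leaving nothing to redefine.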

Because RNG lets you incorporate one schema into another and then override definitions from the included one, you can check that a quantity element's contents are always integers with the following steps:

1. Convert a DTD with the quantity and quantity.type declarations to an RNG schema, e.g. bigbook.rng. No one ever needs to look at this schema, as long as they regenerate it each time the DTD is modified.

2. Create a small secondary RNG schema that includes bigbook.rng and redefines the quantity.type pattern as the W3C XML Schema datatype xs:integer.

3. After the usual validation against the original DTD, validate the document a second time against the secondary RNG schema with a tool such as James Clark's jing or Sun's msv. This second pass flags any quantity element whose content is not an integer.
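Step 2 can be sketched in Python with only the standard library. The sketch assumes that bigbook.rng is the trang output from step 1 and that it defines a pattern named quantity.type; the function name secondary_schema is hypothetical. It writes the secondary schema in RNG's XML syntax: a define placed inside the include element replaces the included definition of the same name, and the data element takes its integer type from the W3C XML Schema datatype library.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
XSD_DATATYPES = "http://www.w3.org/2001/XMLSchema-datatypes"

# Serialize RNG elements in the default namespace (no "ns0:" prefixes).
ET.register_namespace("", RNG_NS)


def secondary_schema(base_href, overrides):
    """Build an RNG grammar that includes base_href and replaces each named
    pattern in overrides with the given XML Schema datatype."""
    grammar = ET.Element(f"{{{RNG_NS}}}grammar", datatypeLibrary=XSD_DATATYPES)
    include = ET.SubElement(grammar, f"{{{RNG_NS}}}include", href=base_href)
    for name, xsd_type in overrides.items():
        # A define inside <include> overrides the included define of that name.
        define = ET.SubElement(include, f"{{{RNG_NS}}}define", name=name)
        ET.SubElement(define, f"{{{RNG_NS}}}data", type=xsd_type)
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(secondary_schema("bigbook.rng", {"quantity.type": "integer"}))
```

Saved as, say, secondary.rng, the result would be used for the second pass in step 3, e.g. `java -jar jing.jar secondary.rng mybook.xml` (file names hypothetical).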

Additional constraints for the original DTD can easily be added to the secondary schema. The creation and use of these additional constraints let DTD developers get accustomed to the power and syntax of RNG gradually while the original production system is still in place. Once the staff is comfortable with the RNG syntax, they can abandon the original DTDs and continue with the completely RNG-based schemas. They can even do this one DTD at a time, with no shock to anyone's system.
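As the secondary schema grows, one RNG rule is worth keeping in mind: every define inside an include must replace a define of the same name in the included grammar, so an override that names a pattern the converted DTD never declared (or misspells one) is a schema error. The following hypothetical helper lists such names before the schema reaches a validator. The base grammar, the pubdate and price.type names, and the function name missing_overrides are invented for the illustration.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Hypothetical stand-in for a trang-generated bigbook.rng; the pubdate
# element and its pubdate.type pattern are invented for this example.
BASE = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="book"/></start>
  <define name="book">
    <element name="book">
      <oneOrMore><ref name="quantity"/></oneOrMore>
      <ref name="pubdate"/>
    </element>
  </define>
  <define name="quantity">
    <element name="quantity"><ref name="quantity.type"/></element>
  </define>
  <define name="quantity.type"><text/></define>
  <define name="pubdate">
    <element name="pubdate"><ref name="pubdate.type"/></element>
  </define>
  <define name="pubdate.type"><text/></define>
</grammar>"""


def missing_overrides(base_xml, overrides):
    """Return override names that have no define in the base grammar.

    RELAX NG requires each define inside <include> to replace a define of
    the same name in the included grammar, so these would be schema errors.
    """
    root = ET.fromstring(base_xml)
    defined = {d.get("name") for d in root.iter(RNG + "define")}
    return sorted(name for name in overrides if name not in defined)


# Constraints added to the secondary schema over time (pattern -> XSD type).
OVERRIDES = {
    "quantity.type": "integer",
    "pubdate.type": "date",
    "price.type": "decimal",  # deliberately wrong: BASE has no such pattern
}

if __name__ == "__main__":
    print(missing_overrides(BASE, OVERRIDES))
```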

The same principles of conversion, inclusion, and redefinition can be applied in a system based on W3C Schemas, but limitations in the conversion and redefinition capabilities available for W3C Schemas make this a less generalizable technique for moving from DTDs to W3C Schemas. The presentation will close by reviewing these limitations.

Keywords