XML 2001

Standards: Master or Servant?

Sam Wilmott

ABSTRACT

Standards were made for man, not man for standards. But with all the new standards coming out, it sometimes seems that things are the other way around. How can we use standards to our benefit, and not to our detriment?

These days, it seems that everyone is writing standards for the Internet, for data communication and for information management; every month there's something new. It can often seem that more effort is spent on writing new standards than is saved by using them. What chance do we have of keeping au courant? And what are those of us who just want to get things done supposed to do?

There are a number of things we can do. First, we have to determine what standards can do for us: no standards for standards' sake, thank you. Standards have a role in our lives when they make communication and interaction easier, between us and our fellows, and between our programs and work tasks. Standards introduce a degree of predictability; there are fewer surprises. Beyond enabling communication, standards serve to encapsulate the past experience of those performing the same or similar tasks.

Second, we have to pick and choose which standards are going to help us along. In the areas of communication and interaction, standards are typically chosen for us long before we arrive on the scene; these standards tell us what we have to do. But a lot of standards go beyond the needs of communication and interaction to address processing methodologies, and choice is much harder when it comes to how we do things.

Third, we have to decide when we've got what we want and need from the standards, and then get on with the job. This is often the hardest choice of all.


1. Introduction

There are a great many standards out there, and it's hard to determine which ones we should be using and supporting.

Standards are supposed to help us improve our productivity, and often they do, but using too many standards, or using them inappropriately, can quickly become counter-productive. So how do we avoid the downside of standards? This paper attempts to clarify the roles that the XML family and related standards play in helping us do our work.

To help think about the relationship between standards and the work they help with, this paper develops an analogy from a more mundane area: moving house, something we're all painfully familiar with. What is done, and what is needed, when moving house is a good model for understanding the different roles of the technologies and standards used in moving, storing and making use of textual data.

2. Moving House

What's being moved is your furniture and other household belongings, the contents. The contents are put in boxes and wrappers, the packaging. The contents, together with the packaging, are what's moved. The packaging is there:

An important role for packaging (although it is most commonly thought of as something extra that can be done with the packaging, rather than as a reason for having it) is that things can be written on the boxes and so become part of the packaging:

In general, this can all be called labeling the contents.

Mixing up the different roles of labeling can only result in confusion. It needs to be clear which labels give the destination, which give the location, which give handling instructions, and which describe post-move usage. The different kinds of labeling have different target audiences: human and, in these days of bar codes, machine.

Packaging is part of moving the household, not part of the household:

Occasionally, you'll have goods that you'll want to leave in their packaging:

You need experts (a.k.a. moving men) to help you move house. The experts you hire to move your household goods are experts in packaging and in moving packaged goods from place to place, in particular in the use of transport vehicles. On the other hand, the experts may not, and need not, know anything about the contents. With any luck, the experts are very little in evidence after the day of the move. (You can get away without the experts if the distance isn't great, if the amount of goods isn't large, as when you're in your 20s, if there's nothing of particular value that might be broken, or if you've got a lot of spare time.)

Although the particular packaging, transport vehicles, contents and usage don't make much difference in general, in specific cases they often do, and there are good reasons for these things to conform to standards and to use generic materials, tools and processes:

Standards and generic materials (packaging) and processes (transport vehicles) are easiest to devise, use and generalize for what goes on on moving day. They are much harder to devise, use and generalize for what is later done with what is moved: the contents, which at best tend to be used in an ad hoc fashion after moving day.

3. Moving from Household Goods to Computer Data

Moving data from one place to another involves contents, packaging, transport vehicles and experts, just as moving household goods does, and just about all the same considerations apply in both cases, even though the particular contents, packaging, transport vehicles and experts are very different.

Like moving-day packaging, data packaging:

Data differs from household goods in a few important ways:

The similarities between moving house and moving data go a long way toward explaining why standard and generic tools and methodologies are useful both in moving data and in making use of it. The differences between them go a long way toward explaining why standard and generic tools are a bigger part of moving and using data than of moving and using household goods.

4. Moving Away from the Analogy — What do Standards in the Data Movement and Usage Industries Do For Us?

Standards facilitate communication:

Standards improve convenience, safety and economy by reusing existing technology, by reusing existing expertise and, in general, by not having to reinvent the wheel.

5. So What About XML?

What does the XML family of standards define, at its core?

To start with, XML is a packaging mechanism. XML start and end tags serve as boxes. The attributes of start tags, comments and processing instructions provide different kinds of labeling, and character references (identifying characters by number or by name, instead of just putting in the characters) serve as protective wrappers.
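As a rough illustration of the packaging picture, the following Python sketch uses the standard library's `xml.etree.ElementTree` to put some contents in a box (an element), write a label on it (an attribute), and unpack it again. The element name `box`, the attribute `destination` and the functions `pack` and `unpack` are hypothetical names chosen for this example. Characters that could be mistaken for markup are wrapped as references on the way out and unwrapped on the way in.

```python
import xml.etree.ElementTree as ET


def pack(contents, destination):
    """Put text contents in a hypothetical <box> element labeled with a destination."""
    box = ET.Element("box", {"destination": destination})  # the label
    box.text = contents  # the contents
    # Serialization escapes '<' and '&' so they can't be mistaken for markup.
    return ET.tostring(box, encoding="unicode")


def unpack(packed):
    """Parse the box and return (contents, destination), with references resolved."""
    box = ET.fromstring(packed)
    return box.text, box.get("destination")


packed = pack("glasses < 5 & plates", "kitchen")
print(packed)          # <box destination="kitchen">glasses &lt; 5 &amp; plates</box>
print(unpack(packed))  # ('glasses < 5 & plates', 'kitchen')

# A numeric character reference unwraps to the same character.
print(unpack('<box destination="kitchen">&#60;</box>'))  # ('<', 'kitchen')
```

The label travels with the box, and the protective wrapping is removed by the parser, so the program at the other end sees the original contents.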

XML itself, and the XML family more broadly, provide a way of specifying those things that are commonly understood and agreed on (the things "everybody knows" in human activities such as moving house, but which have to be made explicit when talking to computers): DTDs for the form of the packaged data in outline, and XML Schemas for its form in more detail.

XML family standards provide ways of describing the usage of packaged data:

XML describes the form of data (what it looks like), not its "meaning". DTDs and XML Schemas likewise describe limits on the form of data: they describe what XML data should look like, or constitute a promise about what XML data does look like. Data types in XML Schemas may identify some data as a "date", but they describe only what a date looks like, not what it means.
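To make the form-versus-meaning point concrete, the Python sketch below checks whether a string has roughly the lexical form of an XML Schema `date` (for example `2001-12-10`). The regular expression is a simplification written for this example, not the one from the XML Schema specification, and the names `DATE_FORM`, `looks_like_date` and the record fields are hypothetical. The check can say that a value looks like a date; it cannot say what the date is for.

```python
import re

# Simplified lexical form of an XML Schema xs:date value: an optional
# leading '-', a year of at least four digits, a two-digit month and day,
# and an optional timezone ('Z' or +hh:mm / -hh:mm).
# Assumption: this is a shape check only. Unlike a full schema validator,
# it does not reject impossible dates such as 2001-02-30.
DATE_FORM = re.compile(
    r"-?\d{4,}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])(Z|[+-]\d{2}:\d{2})?"
)


def looks_like_date(text):
    """Return True if text has the (simplified) form of an xs:date."""
    return DATE_FORM.fullmatch(text) is not None


# The same check applies whatever the date stands for: the form says
# nothing about whether this is when a parcel shipped or when someone was born.
record = {"shipped": "2001-12-10", "born": "2001-12-10"}
print({field: looks_like_date(value) for field, value in record.items()})
# {'shipped': True, 'born': True}
print(looks_like_date("10/12/2001"))  # False: wrong form, whatever it means
```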

XML-encoded data is a serialization: it flattens your data into a one-dimensional stream. For example, a two-dimensional table is serialized into a sequence of rows or columns, each containing a sequence of the other. Likewise, a nested tree-like structure is serialized into a particular "normalized walk": left-to-right, depth-first. So the structure of an XML document is not necessarily the structure of the data it packages, but rather a flattening of the "true" data structure, from which the original structure can be reproduced.
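The round trip from table to stream and back can be sketched in Python with `xml.etree.ElementTree`. The element names `table`, `row` and `cell`, and the two functions, are hypothetical choices for this example; a column-major layout would work just as well, provided both sides agree on it.

```python
import xml.etree.ElementTree as ET


def table_to_xml(table):
    """Flatten a two-dimensional table into XML, row by row (row-major).

    Assumes every cell holds a non-empty string; empty strings may not
    round-trip exactly.
    """
    root = ET.Element("table")
    for row in table:
        row_el = ET.SubElement(root, "row")
        for value in row:
            ET.SubElement(row_el, "cell").text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_table(text):
    """Rebuild the table from its serialized form."""
    root = ET.fromstring(text)
    return [[cell.text for cell in row.findall("cell")]
            for row in root.findall("row")]


table = [["a", "b"], ["c", "d"]]
flat = table_to_xml(table)
print(flat)
# <table><row><cell>a</cell><cell>b</cell></row><row><cell>c</cell><cell>d</cell></row></table>
print(xml_to_table(flat) == table)  # True

# Document order is the depth-first, left-to-right walk of the element tree.
print([el.tag for el in ET.fromstring(flat).iter()][:4])
# ['table', 'row', 'cell', 'cell']
```

The XML stream itself is only a sequence of tags and text; the table reappears because the reading side knows the convention used to flatten it.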

Why serialization? Because of the physical characteristics of our data movement media (the transport vehicles of our data) and the nature of our storage devices, and because serialized data is economical to transmit and store.

6. Beyond XML

What else is there that should be considered, outside of the XML family of standards?

The transport vehicles for XML-packaged data (the various Internet, database and networking standards and usages employed for both XML-packaged and other data) generally fall outside the XML family.

Industry and other conventions for describing and placing constraints on the form of packaged data also fall outside of the XML family, whether these conventions and constraints are expressed using DTDs or XML Schema, or using other means. (DTDs and XML Schema can never say everything — nor should they attempt to do so.)

Industry and other conventions for describing usage of data, be it XML packaged or otherwise, mostly fall outside of the XML family.

Perhaps most importantly, the contents, which at the end of the day are up to you and for which you are the only standard, are outside the XML family.

7. Where do DTDs and XML Schemas fit in?

Like basic XML, DTDs and XML Schemas describe form.

There are basically two kinds of form:

DTDs and XML Schemas also provide default values, mostly for attributes, for use where the context makes it unambiguous that a required value would otherwise be missing.
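As an illustration of attribute defaults, the Python sketch below uses the standard library's `xml.parsers.expat` parser, which applies default attribute values declared in a document's internal DTD subset. The `order`/`item` vocabulary, the `currency` attribute and its default `CAD` are hypothetical. The first `item` omits `currency`, so the parser supplies the declared default; the second gives its own value.

```python
import xml.parsers.expat

# Hypothetical document whose internal DTD subset declares a default value
# for the "currency" attribute of <item>.
DOC = """<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item currency CDATA "CAD">
]>
<order><item>10.00</item><item currency="USD">4.50</item></order>"""


def item_attributes(doc):
    """Return the attributes the parser reports for each <item>."""
    parser = xml.parsers.expat.ParserCreate()
    # specified_attributes is False by default, so attributes filled in
    # from DTD defaults are reported along with those in the document.
    found = []

    def start(name, attrs):
        if name == "item":
            found.append(dict(attrs))

    parser.StartElementHandler = start
    parser.Parse(doc, True)
    return found


print(item_attributes(DOC))  # [{'currency': 'CAD'}, {'currency': 'USD'}]
```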

DTDs and XML Schemas don't describe all form issues. Some issues are the business of the underlying standard (e.g. the XML standard describes what tags look like, and that's that). Some issues are conventional for the class of applications targeted. And some issues are strictly the business of the usage that will be made of the data (for example, if an element contains a person's name, then whether it's a legitimate name is up to the processing application).

XML Schemas allow you to describe a richer set of properties than DTDs do, in a superficially simpler descriptive format. XML Schemas describe more detailed lexical properties (that is, data types) than DTDs do, and there are more of them. On the other hand, the syntactic properties described by XML Schemas and DTDs are similar in richness, with XML Schemas allowing a bit more flexibility than DTDs.

DTDs and XML Schemas serve a number of important roles:

8. What about Usage?

Computer programs use data, whether for their own nefarious purposes or for presentation to human beings. There is a great variety of computer programs, with different capabilities and operating contexts, but they are basically all computer programs, written in computer programming languages. When moving a household, it's relatively easy to standardize the packaging and transport vehicles, but much harder to categorize the contents and their usage, because they can be anything.

Usage falls into two broad categories:

9. When are Standards Standards and When are They Not Standards?

Standards are standards when they are stable and serve well-defined roles — as described for our industries in this presentation.

Standards are not standards when they are a moving target, where the roles they serve are not clear, and where overlapping roles (for example, DTDs and XML Schemas) cause confusion.

10. So What Should Each Of Us Do?

Use common and popular standards for data packaging, transport and client-side technologies, and don't be much concerned with how much you like them, so long as your data gets to its destination well labeled and undamaged.

Be clear about what you are getting from standards: ubiquity for packaging, labeling, transport and client-side technologies; expertise for server-side technologies.

Take care to avoid overlapping use of standards: having both a DTD and an XML Schema for a class of XML documents typically causes confusion and does little, if any, good.

Biography

Sam Wilmott
OmniMark Technologies
Ottawa
Ontario
Canada
Web: www.omnimark.com

Sam Wilmott is the lead researcher at OmniMark Technologies and the architect of the OmniMark programming language. He has also worked on document markup standards since the late '70s, and has served as Canadian representative on the ISO SGML committee.