Wilmott, Sam
OmniMark Technologies, Ottawa, Ontario, Canada
Web site: www.omnimark.com
Sam Wilmott is the lead researcher at OmniMark Technologies, and architect of the OmniMark programming language. He has also worked on document markup standards since the late 1970s, and has served as Canadian representative on the ISO SGML committee.
Standards were made for man, not man for standards. But with all the new standards coming out, it would seem sometimes that things are the other way around. How can we use standards to our benefit, and not our detriment?
These days, it seems that everyone is writing standards for the internet, for data communication and information management -- every month there's something new. It can often seem that more effort is spent on writing new standards than is saved by using them. What chance do we have of keeping au courant? And what do those of us who just want to get things done, do?
There are a number of things we can do. First, we have to determine what standards can do for us -- no standards for standards' sake, thank you. Standards have a role in our lives when they make communication and interaction easier, between us and our fellows, and between our programs and work tasks. Standards introduce a degree of predictability -- there are fewer surprises. Beyond enabling communication, standards serve to encapsulate the past experience of those performing the same or similar tasks.
Second, we have to pick and choose what standards are going to help us along. In the communication and interaction areas standards are typically chosen for us long before we arrive on the scene -- these standards tell us what we have to do. But a lot of standards go beyond the needs of communication and interaction -- they address processing methodologies -- and choice is much harder when it comes to how we do things.
Third, we have to decide when we've got what we want and what we need from the standards, and get on with the job. Often the hardest choice of all.
There are a whole lot of standards out there. It's hard to determine which ones one should be using and supporting.
Standards are supposed to help us improve our productivity -- and often they do -- but using too many standards, or using them inappropriately, can quickly become counter-productive. So how to avoid the down-side of standards? This paper attempts to help clarify the roles of the XML family and related standards in helping us do our work.
To help think about the relationship between standards and the work they help with, this paper develops an analogy from a more mundane area: moving house -- something we're all painfully familiar with. What's done and what's needed when moving house is a good guide to the different roles of the different technologies and standards used in moving, storing and making use of textual data.
What's being moved is your furniture and other household belongings, the contents. The contents are put in boxes and wrappers, the packaging. The contents, together with the packaging, are what's moved. The packaging is there:
to make the contents fit into what the contents and packaging will be travelling in -- the dollies, the corridors, the elevators, the moving trucks, and the moving men's arms -- a.k.a. the transport vehicles;
to make the contents more convenient to handle (of which convenience making the contents fit is one part);
to protect the contents from damage so that they arrive at their destination without loss of value;
even, at times, to protect the transport vehicles from damage by the contents (for example, sharp knives); and
to provide somewhere where identifying notes and labels can be placed.
An important role for packaging -- although it is most commonly looked at as something that can be done with it, rather than why the packaging is there -- is that things can be written on the boxes, and become part of the packaging:
destination information -- the street address or addresses to which the contents are being moved;
location information as to where each item of contents is to be placed at the destination;
handling instructions for individual items of contents -- especially fragile things -- during the move; and
usage notes about what is to be done with the contents once delivered -- these notes, unlike the others, are usually for use after the move, not during the move.
In general, this can all be called labeling the contents.
Only confusion can result from confusing the different roles of labeling -- it needs to be clear what is the destination, what is the location, what is handling, and what is post-move usage -- the different labeling has different target audiences, both human, and in these days of bar codes, machine.
Packaging is part of moving the household, not part of the household:
The packaging is not what you're paying to be moved, even though it is a necessary part of the move.
Typically the packaging is removed from the premises at the end of the move, although a do-it-yourself move can leave empty boxes piled in a corner.
Post-move usage notes need to be kept even when there is no other remaining use for the packaging.
Contents may be repackaged in transit -- the packaging used for a particular phase of a move typically has more to do with the transport vehicles in use than it has to do with the contents.
It doesn't really matter what you use for packaging -- so long as it does the job. The transport vehicles used are even less important -- again so long as they do the job.
Occasionally, you'll have goods that you'll want to leave in their packaging:
You might hold some goods for a later unpacking or even for a later move. (I've got some still-unopened boxes in my basement from two house moves back, and my last house move was over 12 years ago.)
Sometimes the packaging in which something arrived is the most convenient for its use. (If you haven't sufficient bookshelf space for your books it may be best to leave them in the boxes.)
You need experts (a.k.a. moving men) to help you move house. The experts you hire to move your household goods are experts in packaging, and in moving packaged goods from place to place -- in particular in the use of transport vehicles. On the other hand, the experts may not, and need not, know anything about the contents. With any luck, the experts are very little in evidence after the day of the move. (You can get away without the experts if the distance isn't great, if the amount of goods isn't large -- i.e. when you're in your 20's -- if there's nothing of particular value that might be broken, or if you've got a lot of spare time.)
Even though what's used for packaging, transport vehicles, contents and usage doesn't make much difference in general, in particular it often does, and there are good reasons for these things to conform to standards and to use generic materials, tools and processes:
Packaging can be of standard material, strength, and dimensions for economy of manufacture, for predictability in use -- to make it easier to plan moves, and for ease of reuse, and the attendant economy.
Transport vehicles can be of standard dimensions, capacities and usability for ease of reusability of the vehicles, for familiarity to the experts who use them, to fit well with the standards chosen for packaging and usage, and to conform to other standards requirements that themselves have nothing to do with household moving (for example, motor vehicle licensing).
Contents can fit within standard limits of weight and dimensions, and can be of standard form to fit well with their use -- sofas should be comfortable. Moving contents is generally only a minor consideration in choosing and using contents -- but in exceptional cases, you might decide you shouldn't buy something that is too heavy or too large to move, or which doesn't fit your house or your needs.
Usage can use standard descriptions and methodologies, but for reasons rarely having anything to do with moving the household. Generally, standards for usage correspond to post-move usage: making sofas comfortable for a wide range of sitters, making electronics work with the signals that come into your house, and making kitchens and bathrooms safe for everyone to use.
Standards and generic materials (packaging) and processes (transport vehicles) are easiest to devise, use and generalize for what goes on on moving day. But they are very much harder to devise, use and generalize for what is later done with what is moved -- the contents -- and, at best, tend to get used in an "ad hoc" fashion after moving day.
Moving data from one place to another involves contents, packaging, transport vehicles and experts just as does moving household goods, and just about all the same considerations apply to each, even though the particular contents, packaging, transport vehicles and experts are very different in each case.
Like moving-day packaging, data packaging:
protects the transport vehicles from damage by the contents;
provides a medium for labeling the contents with destination, location, handling and usage information -- which needs to be unambiguous in its audience and role;
is not the primary thing being moved, but is there to facilitate the move and its later usage;
is typically wrapped around the contents at the start of a move and is removed at the end of the move;
occasionally has a role post-move, and is left wrapping the contents post-move;
can carry post-move usage notes that need to be kept around even after the packaging has been or could have been discarded for other reasons;
doesn't really matter as to what is used, so long as it works; but
is subject to standards, for reasons of economy, predictability and ease of reuse; and
is typically provided and handled by experts, even though simple, low-volume, low-value moves can be done on a do-it-yourself basis.
Data differs from household goods in a few but important ways:
Data is a lot more movable than household goods -- in fact its movability and reusability are often its greatest value -- and consequently data is moved a lot more often than household goods, and to many more kinds of destinations and locations.
Keeping data in its packaging makes sense more often: for later use, for holding so that it can be moved again, and because it is most usefully used in its packaged form.
Instructions for both moving and using data have to be much more precise than for moving and using household goods, even though the variety of contents and the variety of usage is generally no more complex than on moving day. This is mostly because there are many commonly understood invariants that don't need stating in human activities, that need explicating when using machines.
Most importantly, data generally needs to be serialized in order to transmit it between machines and to store it -- flattened out into one dimension. Serialization can greatly change the "shape" of data, especially when it is not a priori of serial form -- something that can't practically be done with household goods -- at least without a fight with the insurance company.
The similarities between moving house and moving data go a long way to explaining why standard and generic tools and methodologies are of use in moving data, and why standard and generic tools and methodologies are of use in making use of data. The differences between them go a long way to explaining why standard and generic tools are a bigger part of moving and using data than of moving and using household goods.
Standards facilitate communication:
by facilitating transmitting and receiving data with a minimum of fuss by having common agreement on as much as possible;
by increasing understanding amongst sharers of information to promote implementation of technical work; and
by developing a market of knowledgeable personnel with common understanding of the technology.
Standards improve convenience, safety and economy by reusing existing technology, by reusing existing expertise, and in general, by not having to reinvent the wheel.
What does the XML family of standards define, at its core?
To start with, XML is a packaging mechanism. XML start and end tags serve as boxes. The attributes of start tags, comments and processing instructions provide for different kinds of labeling, and character references (identifying characters by number or by name, instead of just putting in the characters) serve as protective wrappers.
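The packaging roles just described can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree. This is only an illustration: the element name "box" and the attribute names "destination" and "handling" are hypothetical, chosen to echo the moving-day analogy, and serializing to ASCII is simply one way of making the protective wrappers visible.

```python
import xml.etree.ElementTree as ET


def pack(contents, destination, handling):
    """Wrap text in a <box> element (hypothetical name) labeled by attributes."""
    box = ET.Element("box", {"destination": destination, "handling": handling})
    box.text = contents
    # Serializing as ASCII forces non-ASCII characters into numeric character
    # references, while markup characters such as & and < are escaped by name.
    return ET.tostring(box, encoding="us-ascii")


def unpack(packaged):
    """Remove the packaging: return the contents and the labels separately."""
    box = ET.fromstring(packaged)
    return box.text, dict(box.attrib)


packaged = pack("café knives & <forks>", "kitchen", "fragile")
contents, labels = unpack(packaged)
```

The packaged form is safe to carry over an ASCII-only transport, and unpacking gives back the original contents unchanged, with the labels available separately.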
XML itself, and the XML family, provide a way of specifying those things that are commonly understood and agreed on (the things "everybody knows" in human activities, such as household moving, but which have to be explicated when talking to computers): DTDs for the form of the packaged data in outline, and XML Schemas for the form of the packaged data in more detail.
XML family standards provide ways of describing the usage of packaged data:
ID/IDREF, XLink, XPointer and others -- a variety of inter-document and intra-document linking conventions,
XPath -- a way of identifying data within a document packaged in XML, and
XSL, XSLT and others -- a variety of more general ways of describing usage, of XML packaging and its contents.
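As a small illustration of identifying and linking data inside a packaged document, the following Python sketch uses the limited XPath subset supported by the standard library's xml.etree.ElementTree. The document, its element names, and the "about" attribute are all hypothetical; in particular, "about" is treated as an IDREF-style pointer by convention only, since no DTD declares it as one here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: boxes carry an id label, a note points at a box by id.
DOC = """<move>
  <box id="b1" room="kitchen"><item>plates</item><item>cups</item></box>
  <box id="b2" room="study"><item>books</item></box>
  <note about="b2">shelve alphabetically</note>
</move>"""

root = ET.fromstring(DOC)


def items_for_room(room):
    """Identify data by path: every item in boxes labeled for the given room."""
    return [item.text for item in root.findall(f".//box[@room='{room}']/item")]


def box_for_note(note):
    """Follow an intra-document link from a note to the box it refers to."""
    return root.find(f".//box[@id='{note.get('about')}']")


kitchen_items = items_for_room("kitchen")
linked_box = box_for_note(root.find("note"))
```

Neither function knows anything about plates or books; both work purely from the packaging and its labels.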
XML describes the form of data -- what it looks like -- not its "meaning". DTDs and XML Schemas likewise describe limits on the form of data -- they describe what XML data should look like -- or constitute a promise about what XML data does look like. Data types in XML Schemas may identify some data as a "date", but they just describe what a date looks like, not what it means.
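The distinction between form and meaning can be made concrete with a deliberately simplified sketch in Python. The pattern below is a hypothetical stand-in for a "date" data type, not the XML Schema definition, whose rules are more detailed; the point is only that a check on form says what a value looks like and nothing more.

```python
import re

# Simplified, hypothetical "date" form: four digits, two digits, two digits.
DATE_FORM = re.compile(r"\d{4}-\d{2}-\d{2}")


def looks_like_date(text):
    """Check form only: does the text have the shape of a date?"""
    return DATE_FORM.fullmatch(text) is not None


has_date_form = looks_like_date("2001-04-01")      # True: right shape
prose_date = looks_like_date("April 1, 2001")      # False: a date to a person, wrong shape
nonsense_date = looks_like_date("2001-99-99")      # True: right shape, no sensible meaning
```

Whether "2001-04-01" is a moving day, a birthday or a delivery deadline is not something any check of this kind can decide; that is up to the application using the data.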
XML encoded data is a serialization -- it flattens out your data into a one-dimensional stream. For example, a two-dimensional table is serialized into a sequence of rows or columns, each containing a sequence of the other. Likewise a nested tree-like structure is serialized into a particular "normalized walk" -- left-to-right, depth-first. So the structure of an XML document is not necessarily the structure of the data it packages, but rather a flattening of the "true" data structure, from which flattening the original structure can be reproduced.
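The flattening described above can be sketched in Python with the standard library's xml.etree.ElementTree. The element names "table", "row" and "cell" are hypothetical, and the sketch assumes every cell holds non-empty text. A two-dimensional table is written out as a sequence of rows, each a sequence of cells, and the original structure is rebuilt from the one-dimensional result.

```python
import xml.etree.ElementTree as ET


def table_to_xml(rows):
    """Serialize a two-dimensional table as a sequence of rows of cells."""
    table = ET.Element("table")
    for row in rows:
        row_el = ET.SubElement(table, "row")
        for cell in row:
            ET.SubElement(row_el, "cell").text = cell
    return ET.tostring(table, encoding="unicode")


def xml_to_table(text):
    """Reproduce the original two-dimensional structure from the flat string."""
    return [[cell.text for cell in row] for row in ET.fromstring(text)]


def walk(text):
    """List element names in the left-to-right, depth-first order of the start tags."""
    return [element.tag for element in ET.fromstring(text).iter()]


rows = [["a", "b"], ["c", "d"]]
flat = table_to_xml(rows)
```

Here flat is a single string, and xml_to_table(flat) gives back the original rows. Choosing columns of rows instead would package the same data in a differently shaped document.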
Why serialization? Because of the physical characteristics of our data movement media -- the transport vehicles of our data -- and because of the nature of our storage devices. And because serialized data is economical to transmit and store.
What else is there that should be considered, outside of the XML family of standards?
The transport vehicles for XML packaged data -- the various Internet, data base and networking standards and usages used both for XML packaged and other data -- generally fall outside of the XML family.
Industry and other conventions for describing and placing constraints on the form of packaged data also fall outside of the XML family, whether these conventions and constraints are expressed using DTDs or XML Schema, or using other means. (DTDs and XML Schema can never say everything -- nor should they attempt to do so.)
Industry and other conventions for describing usage of data, be it XML packaged or otherwise, mostly fall outside of the XML family.
Perhaps most importantly, the contents, which at the end of the day are up to you, and for which you are the only standard, are outside of the XML family.
Like basic XML, DTDs and XML Schemas describe form.
There are basically two kinds of form:
lexical form (what the individual pieces -- the tags, entity references and characters -- look like), and
syntactic form (how the individual pieces fit together -- into nested element structures).
DTDs and XML Schemas also provide default values -- mostly of attributes -- where the context makes it unambiguous that a required value would otherwise be missing.
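Default values can be seen at work in a short Python sketch using the standard library's xml.parsers.expat module. The DTD fragment and its names are hypothetical. The sketch assumes the declarations sit in the document's internal subset, which a non-validating parser such as expat reads; declarations held in an external DTD may not be picked up the same way.

```python
import xml.parsers.expat

# Hypothetical DTD: every <box> has a "handling" label, "normal" unless stated.
DTD = """<!DOCTYPE box [
  <!ELEMENT box (#PCDATA)>
  <!ATTLIST box handling CDATA "normal">
]>
"""


def labels_on_box(instance):
    """Parse the DTD plus one <box> instance and return the attributes reported."""
    reported = {}
    parser = xml.parsers.expat.ParserCreate()
    # The handler receives specified attributes and, by default, defaulted ones.
    parser.StartElementHandler = lambda name, attrs: reported.update(attrs)
    parser.Parse(DTD + instance, True)
    return reported


silent = labels_on_box("<box>books</box>")
labeled = labels_on_box('<box handling="fragile">glasses</box>')
```

The box that says nothing about handling is delivered to the application labeled "normal", while an explicit label takes precedence over the default.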
DTDs and XML Schemas don't describe all form issues. Some issues are the business of the underlying standard (e.g. the XML standard describes what tags look like, and that's that). Some issues are conventional for the class of applications targeted. And some issues are strictly the business of the usage that will be made of the data (for example, if an element contains a person's name, then whether it's a legitimate name is up to the processing application).
XML Schemas allow you to describe a richer set of properties than do DTDs, and in a superficially simpler descriptive format. XML Schemas describe more detailed lexical properties -- i.e. data types -- than do DTDs, and there are more of them. On the other hand, the syntactic properties described by XML Schemas and DTDs are similar in their richness -- XML Schemas allow for a bit more flexibility than DTDs.
DTDs and XML Schemas serve a number of important roles:
DTDs and XML Schemas help in document creation. DTDs and XML Schemas can be used by authoring software as an aid in creating XML documents. DTDs and XML Schemas can be used by data transformation software as an aid in converting data into XML documents. DTDs and XML Schemas can be used by people as a source of information about what they should create, either by hand or in computer programs they are writing to create XML documents.
DTDs and XML Schemas serve as a promise. When delivering XML documents, a DTD or XML Schema, provided explicitly or implicitly, can constitute a promise about the data that accompanies it. Depending on the extent to which a provider of data is a "trusted" source, the promise can be sufficient knowledge for use of the data. "Well-formed" XML documents are the best choice for trusted source documents.
DTDs and XML Schemas enable validation. When receiving XML documents from untrusted sources, a promise is not sufficient and you have to check for yourself.
DTDs and XML Schemas aid usage. A DTD or XML Schema can be used to help generic tools -- such as XML parsers -- deliver data from XML documents to data processing programs in appropriate ways -- data fields, for example.
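The difference between trusting a promise and checking for yourself can be sketched as follows. Python's standard library parser checks well-formedness but does not validate against a DTD or XML Schema, so the structural rules here are hand-written stand-ins for what a validating parser would derive from a DTD or XML Schema; the element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_delivery(text):
    """Return a list of problems; an empty list means the promise was kept."""
    try:
        root = ET.fromstring(text)  # well-formedness is checked by the parser
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    # Hand-written stand-ins for rules a DTD or XML Schema would express.
    if root.tag != "move":
        problems.append(f"expected <move>, got <{root.tag}>")
    for box in root.iter("box"):
        if "destination" not in box.attrib:
            problems.append("box without a destination label")
    return problems


kept = check_delivery('<move><box destination="kitchen"/></move>')
broken = check_delivery("<move><box/></move>")
```

For a trusted source the check could be skipped and the promise taken at face value; for an untrusted one, something like check_delivery runs before the data is used.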
Computer programs use data, whether for their own nefarious purposes or for presentation to human beings. There are a great variety of kinds of computer programs, with different capabilities and contexts in which they operate, but they are basically all computer programs -- written in computer programming languages. When moving a household, it's relatively easy to standardize the packaging and transport vehicles, but much harder to categorize the contents and their usage -- because they can be anything.
Usage falls into two broad categories:
Client-side usage is primarily a matter of receiving, presenting and capturing data -- browsers and their kin. There are strong reasons for wanting standards for client-side technologies: they need to be ubiquitous -- it needs to be known that they are always there; they need to be stable -- opportunities for keeping them up-to-date don't generally exist; they need to be inexpensive and trouble-free; and the mechanics of the technology need to be largely invisible to its users -- who are typically not experts, and shouldn't be expected to be.
Client-side usage is typically limited in its scope -- browsers for example.
Client-side usage standards include style sheets, scripting languages, and "secure" programming languages such as Java and C# -- "behind the scenes" technologies.
Server-side usage is primarily a matter of delivering, transforming and otherwise processing data -- data bases and the programs that work on them. Server-side technologies need standards for quite different reasons than client-side technologies. Server-side technologies typically represent a major investment, and this investment needs to be preserved. Server-side technologies are used by experts -- and expertise depends on a stable technology base. Server-side usage is typically unlimited in its scope.
Ubiquity is not a requirement for server-side technology -- functionality, the ability to do the job, is -- so standards are not as overwhelmingly important as for client-side technologies.
Server-side usage standards include data-base standards, programming language standards (C++, Visual Basic etc.), and standard programming tools (Tk and other graphics packages).
Standards are standards when they are stable and serve well-defined roles -- as described for our industries in this presentation.
Standards are not standards when they are a moving target, where the roles they serve are not clear, and where overlapping roles (for example, DTDs and XML Schemas) cause confusion.
Use common and popular standards for data packaging, transport, and client-side technologies -- and don't be much concerned with how much you like them so long as your data gets to its destination, well labeled and undamaged.
Be clear as to what you are getting from standards. Ubiquity for packaging, labeling, transport and client-side technologies. Expertise for server-side technologies.
Take care to avoid overlapping use of standards -- having both a DTD and an XML Schema for a class of XML documents typically causes confusion, and does little, if any, good.