Abstract
The abstract was not available at the time the proceedings were created. Please check an updated version of the paper abstracts at the conference proceedings web site.
Table of Contents
literally thousands of programs, commercial and Open Source, now use XML in some way;
the store shelves are lined with books about XML, with new ones appearing all the time; and
XML is one of the most talked-about technologies everywhere, from venture capital firms to university research labs to governments to Fortune 50 corporations: some large software companies are basing their future strategies around it.
A parting of the ways.
A thousand failed experiments.
Thinking small.
Specifications like XML are designed to standardise what people do in common, so that people can concentrate on what they do differently. Like its predecessor SGML, XML captures the common ground among human-readable documents and machine-readable data: both benefit from some kind of recursive tagging syntax. Even at that low level, there is tension: mixed content adds complexity that data people would be happy to do without, while proper element nesting makes it difficult for document people to tag overlapping areas.
Specifications trying to capture further common ground between the document and data camps will run into seriously diminishing returns, since any benefit they give to one side will be matched by an unacceptable cost to the other. XML is the fork in the road where the document and data people must shake hands, part, and follow their own paths, just as, in networking, the TCP and UDP people part ways after IP.
Let the data people concentrate on finding better ways to exchange data using XML, and let the document people concentrate on better ways to exchange documents using XML: each side will make much faster progress once it is freed from having to worry about the other's conflicting requirements. If, in the future, they do discover more common ground, they can always go back to the XML level and standardise it together.
Very Big Companies have been interested in XML almost from the start, and their interest has been as much of a liability as an asset, simply because people have an unfortunate tendency to listen to what the big companies say. Consider how few of the big, noisy, technological master plans of companies like IBM, Microsoft, Oracle, and Sun actually came to anything at all during the 1990s: while there were exceptions, most real innovation bubbled up from the bottom, rather than sinking down from the top. If central planning by big organisations actually worked better than the free market, it would be the United States hoping to join the Warsaw Pact right now rather than Russia hoping to join NATO: we know that top-down planning does not work with governments, so why do we believe that it will work with large companies?
That is not to imply that big organisations are any worse than the little ones; rather, we always hear about what the big organisations plan, no matter how ludicrous, but we don't hear from the little people until they actually prove themselves. We will not succeed in making anything out of XML by sitting down in committees and writing master plans: instead, we all need to go home, sit in our studies or rec rooms, and come up with ways that XML can work for end users. It is guaranteed that we will almost all fail, many of us in sad and humiliating ways, but from among a thousand failed experiments we may extract one success that helps XML to transform the world. We already have one small success, RSS, to guide us.
Big technological successes seem to start with small ideas: the Web, for example, was originally just a slightly-improved Gopher that allowed links to be embedded in text rather than separated out into menus. In that spirit, I would like to propose the first of the thousand failed experiments, kindly reducing the probability of failure for the rest of you by about a tenth of one percent.
Since HTML already does a pretty good job of letting people publish human-readable documents online, it makes sense to concentrate on publishing machine-readable data instead.
XML is capable of representing data in both the relational and hierarchical idioms.
<data>
  <airport id="id11111">
    <icao-code>CYOW</icao-code>
    <name>Macdonald-Cartier International</name>
    <location-ref ref="#id33333"/>
  </airport>
  <municipality id="id33333">
    <name>Ottawa</name>
    <region-ref ref="#id44444"/>
  </municipality>
  <region id="id44444">
    <name>Ontario</name>
    <country-ref ref="#id55555"/>
  </region>
  <country id="id55555">
    <name>Canada</name>
  </country>
</data>
Figure 1. Relational data in XML
<data>
  <airport>
    <icao-code>CYOW</icao-code>
    <name>Macdonald-Cartier International</name>
    <location>
      <municipality>Ottawa</municipality>
      <region>Ontario</region>
      <country>Canada</country>
    </location>
  </airport>
</data>
Figure 2. Hierarchical data in XML
Because the relational model has a flat representation, we can exchange relational data using simpler markup methods like comma- or tab-delimited text or even spreadsheet files -- XML has no natural advantage.
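As a rough illustration of that point, the sketch below flattens airport records in the style of Figure 1 into comma-delimited text using Python's standard library. The function names table_rows and to_csv, and the convention that a child carrying a ref attribute acts as a foreign key, are assumptions made for this example rather than part of any format described here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Relational data in the style of Figure 1. The ref targets here are
# chosen so that each one matches an existing id.
RELATIONAL_XML = """
<data>
  <airport id="id11111">
    <icao-code>CYOW</icao-code>
    <name>Macdonald-Cartier International</name>
    <location-ref ref="#id33333"/>
  </airport>
  <municipality id="id33333">
    <name>Ottawa</name>
    <region-ref ref="#id44444"/>
  </municipality>
</data>
"""


def table_rows(root, element_name):
    """Return one dict per element of the given type: a row in a table."""
    rows = []
    for record in root.findall(element_name):
        row = {"id": record.get("id")}
        for field in record:
            # Assumption: a child with a ref attribute is a foreign key;
            # any other child is a plain text column.
            row[field.tag] = field.get("ref", "").lstrip("#") or (field.text or "")
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialise a non-empty list of row dicts as comma-delimited text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


root = ET.fromstring(RELATIONAL_XML)
airports_csv = to_csv(table_rows(root, "airport"))
print(airports_csv)
```

Each element type becomes one table and each record one line, so nothing is lost by leaving XML out of the exchange altogether.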
On the other hand, legacy markup methods do a very poor job with hierarchical and repeatable data, which plays to XML's natural strengths. The hierarchical idiom is also easier to learn and teach, and it follows a similar structure to many computers' filesystems. We can allow identifiers for any branch or leaf node (using attributes), and we can also develop a simple path language, possibly a subset of XPath. The path language can follow the fragment identifier '#' in URLs to provide a globally unique way of addressing any part of an online data document.
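One way such a path language might be resolved is sketched below in Python. The syntax is an assumption made for illustration: steps are separated by '/', each step is an element name optionally followed by an identifier in square brackets, and the identifier is matched against an id attribute. The resolve function and the id value "cyow" are hypothetical and not part of any published specification.

```python
import re
import xml.etree.ElementTree as ET

# One path step: an element name, optionally followed by [identifier].
STEP = re.compile(r"^([A-Za-z_][\w.-]*)(?:\[([^\]]+)\])?$")


def resolve(root, path):
    """Walk a path such as 'airport[cyow]/location/region' below root.

    Returns the first matching element, or None if any step finds nothing.
    """
    node = root
    for step in path.split("/"):
        match = STEP.match(step)
        if match is None:
            raise ValueError(f"bad path step: {step!r}")
        name, identifier = match.groups()
        candidates = [
            child for child in node
            if child.tag == name
            and (identifier is None or child.get("id") == identifier)
        ]
        if not candidates:
            return None
        node = candidates[0]
    return node


# Hierarchical data in the style of Figure 2, with a hypothetical id
# attribute added to the airport so that it can be addressed directly.
DOC = """
<data>
  <airport id="cyow">
    <icao-code>CYOW</icao-code>
    <name>Macdonald-Cartier International</name>
    <location>
      <municipality>Ottawa</municipality>
      <region>Ontario</region>
      <country>Canada</country>
    </location>
  </airport>
</data>
"""

root = ET.fromstring(DOC)
region = resolve(root, "airport[cyow]/location/region")
print(region.text)
```

The walker only ever descends from parent to child, which keeps the language close to a filesystem path and far smaller than full XPath.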
Most people use spreadsheets to work with data on their personal computers. Once we have a path language that can point to any part of an online data document, spreadsheet integration becomes trivially easy: we simply create a new function to pull data into a cell from an online XML document.
=xmlref("http://www.funds.com/prices.xml#fund[growth01]/current-price")
Figure 3. Spreadsheet integration
The spreadsheet can now automatically update the current share price of my mutual fund from data published online in a simple XML format. This is a small application, but it's a real one that people can use and appreciate, and it involves minimal disruption: no new infrastructures, no new hardware or software, and minimal new training.
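A spreadsheet function of this kind could be backed by something like the following Python sketch. It splits the URL at the '#', retrieves the document, and rewrites each bracketed identifier into the attribute predicate that ElementTree's limited XPath support understands. The layout of the price document, the price value, and the helper names fetch and to_elementtree_path are all assumptions for illustration; the fetcher is passed in as a parameter so that the example runs without network access.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag
from urllib.request import urlopen


def fetch(url):
    """Download a document over the network and return its bytes."""
    with urlopen(url) as response:
        return response.read()


def to_elementtree_path(fragment):
    """Rewrite name[identifier] steps as name[@id='identifier'].

    ElementTree understands a limited subset of XPath, so the rewritten
    path can be handed straight to Element.findtext().
    """
    return re.sub(r"\[([^\]']+)\]", r"[@id='\1']", fragment)


def xmlref(url, fetcher=fetch):
    """Return the text addressed by url#path, or None if nothing matches."""
    document_url, fragment = urldefrag(url)
    root = ET.fromstring(fetcher(document_url))
    return root.findtext(to_elementtree_path(fragment))


# A stand-in for the published price list; the element names and the
# price are invented for this illustration.
SAMPLE = b"""<prices>
  <fund id="growth01"><current-price>12.34</current-price></fund>
  <fund id="income02"><current-price>56.78</current-price></fund>
</prices>"""

price = xmlref(
    "http://www.funds.com/prices.xml#fund[growth01]/current-price",
    fetcher=lambda url: SAMPLE,
)
print(price)
```

A real spreadsheet binding would also need caching, refresh intervals, and error reporting for unreachable documents, none of which are attempted here.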
Some other features, like linking, will make the simple data format even more useful. As we experiment with this (or perhaps with the experiment that doesn't fail), we can learn what it does and does not make sense to do with XML, and how we can change people's lives in some meaningful, if small, way.
More details on the simple format introduced here, together with Java-based software, will be available through http://www.megginson.com/ close to the conference date.