XML Europe 2003

Making XML into a Really Useful Engine

Abstract

The abstract was not available at the time the proceedings were created. Please check an updated version of the paper abstracts at the conference proceedings web site.


Table of Contents

1. A parting of the ways
2. A thousand failed experiments
3. Thinking small
Biography
After five years' hard work, we can start by taking a moment to celebrate XML's successes:
The fact that XML is in so many people's minds is an astounding success, and much of it is due to the groundwork laid down by early, tireless evangelists like Tim Bray and Jon Bosak.
But what about results?
After all, we didn't design XML to become famous; we designed it to do work. Five years ago, many people hoped that XML would produce an even bigger information revolution than HTML did: after all, HTML is still intended for human consumption, while XML is designed to be read directly by computers. Smart e-commerce agents would crawl the Web, looking for the best prices on blue jeans or fibre-optic bandwidth (before they had to start giving it away), companies would exchange data seamlessly, and search engines would always find what they were looking for.
By the mid-1990s, when HTML was five years old, it was already easy to point out how it and the Web had transformed the lives of ordinary people. How has XML transformed ordinary people's lives? Has XML made the world a different place now than it was in 1998?
The answer, unfortunately, is no. XML remains tomorrow's technology instead of today's. E-mail and the Web changed the world, but XML has hardly touched anyone yet. It is not that we have not had time. Change has happened quickly elsewhere over the past five years: the peer-to-peer revolution started, Napster rose and fell, and the entertainment industry was changed forever; broadband made its way into North American homes, while Europeans and Asians went wireless; Java-in-the-browser faded and DHTML rose up to take its spot; the dot.com and telecoms bubbles inflated and then burst. While XML was poised to change the world, the world shrugged, went on, and changed without it.
I'm not here today, however, to complain about XML, because I do not believe that there's anything seriously wrong with it. No amount of tweaking, expanding, subsetting, debating, or rewriting is going to make XML any more relevant than it already is. XML is, after all, just plumbing: it does not have to be beautiful, as long as it works. The problem with plumbing is that while it's essential, it's not useful until you attach something to it, and that is where we are all stumbling.
What can we do with XML? What do we want to do with XML?
People started to get interested in the Web when they could look at it through the window of early graphical browsers like Chimera and Mosaic. There was a certain amount of excitement in our own community early on, when Internet Explorer and Mozilla gained the ability to browse XML documents directly using stylesheets. Unfortunately, the rest of the world could not see much to get excited about: XML pages looked just like HTML pages, but there were very few authoring tools and the pages wouldn't work in as many browsers.
A year or two ago, there was a lot of excitement once again, this time about Web services using SOAP. Again, the non-technical world is simply puzzled by the whole thing, and constantly needs reminding why anyone would want to subscribe to commercial services over the network instead of using programs already installed on a personal computer.
If XML is going to transform the world in any meaningful way, it will have to let people do something new and useful, not just provide a more technically-correct way of doing what people can already do. The path to success has (predictably) three steps:
  1. A parting of the ways.

  2. A thousand failed experiments.

  3. Thinking small.

1. A parting of the ways

Specifications like XML are designed to standardise what people do in common, so that people can concentrate on what they do differently. Like its predecessor SGML, XML captures the common ground among human-readable documents and machine-readable data: both benefit from some kind of recursive tagging syntax. Even at that low level, there is tension: mixed content adds complexity that data people would be happy to do without, while proper element nesting makes it difficult for document people to tag overlapping areas.

Specifications trying to capture further common ground between the document and data camps will run into seriously diminishing returns, since any benefit they give to one side will be matched by an unacceptable cost to the other. XML is the fork in the road where the document and data people must shake hands, part, and follow their own paths, just as, in networking, the TCP and UDP people part ways after IP.

Let the data people concentrate on finding better ways to exchange data using XML, and let the document people concentrate on better ways to exchange documents using XML: each side will make much faster progress once it is freed from having to worry about the other's conflicting requirements. If, in the future, they do discover more common ground, they can always go back to the XML level and standardise it together.

2. A thousand failed experiments

Very Big Companies have been interested in XML almost from the start, and their interest has been as much of a liability as an asset, simply because people have an unfortunate tendency to listen to what the big companies say. Consider how few of the big, noisy, technological master plans of companies like IBM, Microsoft, Oracle, and Sun actually came to anything at all during the 1990s: while there were exceptions, most real innovation bubbled up from the bottom, rather than sinking down from the top. If central planning by big organisations actually worked better than the free market, it would be the United States considering joining the Warsaw Pact right now rather than Russia hoping to join NATO: we know that top-down planning does not work with governments, so why do we believe that it will work with large companies?

That is not to imply that big organisations are any worse than the little ones; rather, we always hear about what the big organisations plan, no matter how ludicrous, but we don't hear from the little people until they actually prove themselves. We will not succeed in making anything out of XML by sitting down in committees and writing master plans: instead, we all need to go home, sit in our studies or rec rooms, and come up with ways that XML can work for end users. It is guaranteed that we will almost all fail, many of us in sad and humiliating ways, but from among a thousand failed experiments we may extract one success that helps XML to transform the world. We already have one small success, RSS, to guide us.

3. Thinking small

Big technological successes seem to start with small ideas: the Web, for example, was originally just a slightly-improved Gopher that allowed links to be embedded in text rather than separated out into menus. In that spirit, I would like to propose the first of the thousand failed experiments, kindly reducing the probability of failure for the rest of you by about a tenth of one percent.

Since HTML already does a pretty good job of letting people publish human-readable documents online, it makes sense to concentrate on publishing machine-readable data instead.

XML is capable of representing data in both the relational and hierarchical idioms.

<data>

 <airport id="id11111">
  <icao-code>CYOW</icao-code>
  <name>Macdonald-Cartier International</name>
  <location-ref ref="#id33333"/>
 </airport>

 <municipality id="id33333">
  <name>Ottawa</name>
  <region-ref ref="#id44444"/>
 </municipality>

 <region id="id44444">
  <name>Ontario</name>
  <country-ref ref="#id55555"/>
 </region>

 <country id="id55555">
  <name>Canada</name>
 </country>

</data>

Figure 1. Relational data in XML
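
To make the relational idiom concrete, here is a minimal Python sketch of how a consumer might follow the reference links from one record to the next. It is only an illustration: the function name follow_refs is hypothetical, the airport's location-ref is taken to point at the municipality record (id33333), and the rule "follow the first child whose name ends in -ref" is an assumption rather than part of any defined format.

```python
import xml.etree.ElementTree as ET

# Records in the style of Figure 1. Assumption: the airport's location-ref
# is taken to point at the municipality record (id33333).
RELATIONAL_XML = """
<data>
 <airport id="id11111">
  <icao-code>CYOW</icao-code>
  <name>Macdonald-Cartier International</name>
  <location-ref ref="#id33333"/>
 </airport>
 <municipality id="id33333">
  <name>Ottawa</name>
  <region-ref ref="#id44444"/>
 </municipality>
 <region id="id44444">
  <name>Ontario</name>
  <country-ref ref="#id55555"/>
 </region>
 <country id="id55555">
  <name>Canada</name>
 </country>
</data>
"""


def follow_refs(root, start_id):
    """Return the <name> of each record reached by following *-ref links."""
    # Index the top-level records by their id attribute.
    by_id = {el.get("id"): el for el in root if el.get("id")}
    names = []
    seen = set()  # guards against reference cycles
    current = by_id.get(start_id)
    while current is not None and current.get("id") not in seen:
        seen.add(current.get("id"))
        names.append(current.findtext("name"))
        # Hypothetical convention: follow the first child named "*-ref".
        link = next((c for c in current if c.tag.endswith("-ref")), None)
        if link is None:
            break
        current = by_id.get(link.get("ref", "").lstrip("#"))
    return names


root = ET.fromstring(RELATIONAL_XML)
chain = follow_refs(root, "id11111")
print(" > ".join(chain))
```

Every lookup goes through the id index, which is the extra step the relational idiom asks of each consumer.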

<data>

 <airport>
  <icao-code>CYOW</icao-code>
  <name>Macdonald-Cartier International</name>
  <location>
   <municipality>Ottawa</municipality>
   <region>Ontario</region>
   <country>Canada</country>
  </location>
 </airport>

</data>

Figure 2. Hierarchical data in XML

Because the relational model has a flat representation, we can exchange relational data using simpler markup methods like comma- or tab-delimited text or even spreadsheet files -- XML has no natural advantage.
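
As a rough illustration of that point, the following Python sketch writes two of the record types from Figure 1 as comma-delimited text. The column names and the foreign-key columns (location_id, region_id) are hypothetical stand-ins for the reference elements, not part of any published format.

```python
import csv
import io

# Hypothetical flat rows equivalent to two of the record types in Figure 1;
# the last column of each row stands in for the *-ref element.
airports = [("id11111", "CYOW", "Macdonald-Cartier International", "id33333")]
municipalities = [("id33333", "Ottawa", "id44444")]


def to_csv(header, rows):
    """Serialise one table as comma-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


airport_csv = to_csv(("id", "icao_code", "name", "location_id"), airports)
municipality_csv = to_csv(("id", "name", "region_id"), municipalities)
print(airport_csv, end="")
print(municipality_csv, end="")
```

Each record type becomes one flat table, and nothing in the data requires nesting.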

On the other hand, legacy markup methods do a very poor job with hierarchical and repeatable data, which plays to XML's natural strengths. The hierarchical idiom is also easier to learn and teach, and it follows a similar structure to many computers' filesystems. We can allow identifiers for any branch or leaf node (using attributes), and we can also develop a simple path language, possibly a subset of XPath. The path language can follow the fragment identifier '#' in URLs to provide a globally unique way of addressing any part of an online data document.
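
One way to picture such a path language is to borrow the limited XPath subset that already ships with Python's standard xml.etree.ElementTree module. The sketch below is only that: the URL is hypothetical, the id attribute on the airport is an addition to the Figure 2 data, and treating the fragment as a path relative to the document element is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Figure 2's document, plus a hypothetical id attribute on the airport.
HIERARCHICAL_XML = """
<data>
 <airport id="cyow">
  <icao-code>CYOW</icao-code>
  <name>Macdonald-Cartier International</name>
  <location>
   <municipality>Ottawa</municipality>
   <region>Ontario</region>
   <country>Canada</country>
  </location>
 </airport>
</data>
"""


def resolve(root, url):
    """Return the element addressed by the fragment of url, or None."""
    _document_url, fragment = urldefrag(url)
    # find() evaluates the path relative to the document element and
    # understands only a small subset of XPath.
    return root.find(fragment)


root = ET.fromstring(HIERARCHICAL_XML)
base = "http://example.org/airports.xml"  # hypothetical URL
country = resolve(root, base + "#airport/location/country")
code = resolve(root, base + "#airport[@id='cyow']/icao-code")
print(country.text, code.text)
```

The first address walks the hierarchy by element name alone; the second selects a branch by its identifier attribute before descending.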

Most people use spreadsheets to work with data on their personal computers. Once we have a path language that can point to any part of an online data document, spreadsheet integration becomes trivially easy: we simply create a new function to pull data into a cell from an online XML document.

=xmlref("http://www.funds.com/prices.xml#fund[growth01]/current-price")

Figure 3. Spreadsheet integration

The spreadsheet can now automatically update the current share price of my mutual fund from data published online in a simple XML format. This is a small application, but it's a real one that people can use and appreciate, and it involves minimal disruption: no new infrastructures, no new hardware or software, and minimal new training.
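
The lookup behind such a function could be sketched in Python as follows. This is not the format or software described in this paper: the reading of fund[growth01] as "the fund element whose id attribute is growth01", the sample document, and the names select, fetch_bytes and xmlref are all assumptions made for illustration. Registering the function with a particular spreadsheet is left out.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag
from urllib.request import urlopen

# One path step: an element name, optionally followed by [identifier].
STEP = re.compile(r"^([\w.-]+)(?:\[([^\]]+)\])?$")


def select(root, path):
    """Walk path from the document element; return the element or None."""
    node = root
    for step in path.split("/"):
        match = STEP.match(step)
        if match is None:
            raise ValueError(f"unsupported path step: {step!r}")
        name, wanted_id = match.groups()
        candidates = [child for child in node if child.tag == name]
        if wanted_id is not None:
            # Assumption: the bracketed value names an id attribute.
            candidates = [c for c in candidates if c.get("id") == wanted_id]
        if not candidates:
            return None
        node = candidates[0]
    return node


def fetch_bytes(url):
    """Download a document; replaced below so the sketch runs offline."""
    with urlopen(url) as response:
        return response.read()


def xmlref(url, fetcher=fetch_bytes):
    """Return the text of the node addressed by url, or None if absent."""
    document_url, fragment = urldefrag(url)
    root = ET.fromstring(fetcher(document_url))
    node = select(root, fragment)
    return None if node is None else (node.text or "").strip()


# Hypothetical price document standing in for the online data.
SAMPLE = b"""<funds>
 <fund id="growth01"><current-price>12.34</current-price></fund>
 <fund id="income02"><current-price>9.87</current-price></fund>
</funds>"""

price = xmlref(
    "http://www.funds.com/prices.xml#fund[growth01]/current-price",
    fetcher=lambda _url: SAMPLE,
)
print(price)
```

Passing the fetcher in as a parameter keeps the path handling separate from the network access, so the same lookup can be tried against a local document before any data is published online.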

Some other features, like linking, will make the simple data format even more useful. As we experiment with this (or perhaps with the experiment that doesn't fail), we can learn what it does and does not make sense to do with XML, and how we can change people's lives in some meaningful, if small, way.

More details on the simple format introduced here, together with Java-based software, will be available through http://www.megginson.com/ close to the conference date.

Biography

David Megginson, principal of Megginson Technologies, has been active within the SGML and, later, XML communities since 1991. He led the original initiative that created SAX, the Simple API for XML, which is now the most widely used streaming API for XML and has been implemented in products by IBM, Oracle, Apache, and Sun, along with many others.

David's work includes many Open Source software packages, together with the book Structuring XML Documents, published by Prentice-Hall.

David formerly chaired the XML Information Set Working Group at the World Wide Web Consortium (W3C) and served as a member of the W3C's XML Working Group and XML Co-ordination Group.

In Spring 2000, David was proud to receive the Java Technology Achievement Award For Outstanding Individual Contribution to the Java Community from Sun Microsystems and JavaPro magazine.