How Much Pain for XML's Gain?

Keywords: XML, Binary encoding, SOAP, Mobile, Interoperability, Web services, Middleware, Performance

Michael Champion
Software AG
Reston
Virginia
United States of America
michaelc.champion@gmail.com

Biography

Michael Champion has worked as a software developer since 1980, and has specialized in SGML/XML since 1996. Employed by Software AG from 1999 through 2004, he represented the company on several W3C working groups, and co-chaired the Web Services Architecture group.


Abstract


XML's core value proposition is that it is a text-based, platform-neutral markup approach to describing a wide range of document and data formats that provides a good compromise between the conflicting needs of humans and machines. It is becoming apparent, however, that these advantages come at a significant cost in terms of the overhead required to store and process data. This has been the subject of considerable investigation that has been presented at a W3C-sponsored workshop in September 2003 and in a Workshop on High Performance XML Processing held in conjunction with the WWW 2004 Conference. This paper attempts to pull together what is actually known on this subject and to provide an evenhanded analysis of options that have been put forth to address the performance problems. A firm conclusion on what is "best" is impossible in principle, given the range of requirements and use cases for XML. The objective here is to support an informed discussion of the situation and options to address it, not to advocate for or against a specific solution.


Table of Contents


1. Introduction
2. Measuring the Pain
     2.1 XML compared with existing alternatives
     2.2 XML compared to proposed alternative formats
     2.3 Assessment
3. Diagnosing the Pain
4. Some Proposed Analgesics
     4.1 Code optimization
     4.2 Simplification
     4.3 Standardized Alternative Infoset Serialization(s)
     4.4 Hardware Acceleration
     4.5 Hybrid Approaches
5. Lots of Second Opinions
     5.1 Premature optimization is the root of all evil
     5.2 It's the interop, stupid!
     5.3 One or two optimized serializations are not enough
     5.4 "View Source Principle"
     5.5 Alternative serializations don't support all of XML
     5.6 If XML doesn't work for you, avoid it, don't pollute it
6. Conclusion
     6.1 Facts which don't appear to be in serious dispute
     6.2 Personal Reflections
     6.3 What should application developers do?
          6.3.1 Standards developers
Bibliography

1. Introduction

By now, most people in the software industry are generally familiar with XML's core value propositions -- it is a text-based, platform-neutral markup approach to describing a wide range of document and data formats that provides a good compromise between the conflicting needs of humans and machines. It is becoming apparent, however, that these advantages come at a significant cost in terms of the overhead required to store and process data.

This presentation attempts to pull together what is actually known on this subject and to provide an evenhanded analysis of options that have been put forth to address the performance problems. Specific points covered include:

2. Measuring the Pain

There seems to be little dispute that XML is considerably more computationally intensive to parse than application-specific binary formats. Even Microsoft, which has collectively opposed the idea of standardizing a binary representation of XML, acknowledges [PAL]:

An important measure is the parsing cost of the binary form compared to the text representation. It can be up to one order of magnitude faster or even better, which saves a great deal of power on small devices

2.1 XML compared with existing alternatives

There have been a number of more detailed reports of benchmarks indicating that XML parsing is a serious bottleneck in some real-world systems. Many of these are found in position papers prepared for the W3C Workshop on Binary Interchange of XML Information Item Sets [W3C-Workshop]. Leventhal et al. [LEVENTHAL] categorize and tabulate the various proposals. This section briefly summarizes some from the workshop and elsewhere that present actual data.

In the discussion at the W3C workshop [W3C-Workshop] it became clear that Sun has a significant number of customers who use RMI, wish to migrate to a more open mechanism, and have experimented with SOAP, but find the performance unacceptable. The Sandoz et al. [SANDOZ] paper investigated the performance of SOAP remote procedure calls relative to Java RMI. The investigators found that using the relatively verbose rpc/encoded format, the latency was approximately 10 times that of RMI in the same environment; using the less verbose rpc/literal encoding, the difference was approximately a factor of 5. An alternative binary serialization "FAST Infoset" that leverages the ASN.1 [ASN] standards improved performance by a factor of 1.6. Another binary serialization that exploited the known schema of the SOAP messages increased performance to be comparable to RMI.

Nicola et al. [NICOLA] report on several real world XML usage situations in which XML parsing was shown to be a significant bottleneck. In the most detailed case presented, they examine the relative time to load a single file containing more than 10 million rows in a delimited field format, compared against the time to load approximately 3GB of XML representing the same data. The XML load was approximately 26 times slower, and over 99% of the overhead was XML parsing. No XML validation was performed. Other examples, involving a variety of XML parsers, DBMS vendors, and types of data show a pattern of parsing overhead being a critical bottleneck preventing XML-based applications from achieving anything near the performance of the systems they were designed to replace.

They also present a more detailed analysis of the number of CPU instructions needed to parse and validate a variety of XML documents using the XML4C SAX parser. Their findings indicate that XML parsing can easily double or triple the instruction count of a database transaction. DTD and especially schema validation can double or triple this again.

Kohlhoff and Steele [KOHLHOFF] compare the performance and overhead of SOAP with those of two financial industry formats, one textual (FIX) and one binary (CDR). They found that SOAP messages were 2-4x bigger and 8-10x slower to process than equivalent CDR messages. This is not due simply to the overhead of text-based formats, since SOAP messages were also found to be 3-4x bigger and 9x slower than FIX.

Another team [TATARINOV], in an article about XML and relational databases, allude somewhat tangentially to hard data on XML parsing overhead: "In our experiments, we found that parsing overhead was often an order of magnitude more expensive than XPath processing."

A study by the Mitre Corporation [COKUS] focused only on the issue of XML using far more bandwidth than is desirable in military applications. (Current military communications formats are highly optimized to preserve bandwidth, but naturally pose significant interoperability problems that XML might address). They compare a variety of schema-based and conventional redundancy-eliminating compression schemes on a range of XML documents of varying sizes.

Generally, we note that there exists a point at which redundancy based compression overtakes the efficiency of schema-based encoding and that this point is relatively low for the data set tested, roughly estimated to be in the neighborhood of 10,000 bytes. Schema-based techniques appear to be optimal for smaller XML files due to the efficiencies of compacted localized field and structural encoding. Application of redundancy compression for smaller XML files provides no benefit or proves to be a disadvantage.

Hybrid techniques, combining schema-based and redundancy compression, produced the best results overall with the exception of the smallest. We expect schema-based encodings to be of significant advantage versus the other techniques for XML messages of this type for sizes 10,000 bytes or less.
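
The weakness of redundancy-based compression on small messages is easy to reproduce. The following is a minimal sketch, not the Mitre methodology: the quote record is synthetic data invented for illustration, and gzip from the Python standard library stands in for "conventional redundancy-eliminating compression."

```python
import gzip

# Synthetic record, invented for illustration only.
RECORD = b'<quote symbol="IBM"><price>84.12</price></quote>'


def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by original size; above 1.0 means gzip made it bigger."""
    return len(gzip.compress(payload)) / len(payload)


# A single short message, as is typical of web services traffic.
small_message = RECORD

# A large, highly repetitive document built from the same record.
large_document = b"<quotes>" + RECORD * 1000 + b"</quotes>"

small_ratio = compression_ratio(small_message)
large_ratio = compression_ratio(large_document)

print(f"small message:  {len(small_message):6d} bytes, ratio {small_ratio:.2f}")
print(f"large document: {len(large_document):6d} bytes, ratio {large_ratio:.2f}")
```

The fixed header and trailer that gzip adds outweigh whatever redundancy a message of a few dozen bytes contains, while the large repetitive document shrinks dramatically. The exact crossover point depends on the data and is not something this sketch tries to estimate.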

2.2 XML compared to proposed alternative formats

The Cubewerx paper [CUBEWERX] evaluates a general-purpose infoset encoding they call BXML. This was designed to address the challenges encountered in processing Geography Markup Language (GML) data in its native XML format. Most of the information in a GML document is floating-point numeric data, which proved to be quite inefficient to process. The authors conclude: "The binary encoding used in BXML is greatly faster than XML encoding for processing general data and is enormously faster for dense numeric data. The uncompressed files are also significantly smaller, especially for numeric data, and are more efficient to compress and uncompress."

Dennis Sosnoski [SOSNOSKI] presents an encoding format for XML documents that is intended for use in transmitting XML documents between application components efficiently. He reports findings that this XBIS format offers several times the read performance of SAX2 and provides even more of a performance advantage during the serialization phase. XBIS also provides a considerable size reduction as compared to XML text, at least for certain types of content. Compared to alternatives such as gzip, XBIS delivers considerably less compression but costs much less in terms of CPU performance.

Mike Conner of IBM [CONNER] presents results learned from experiments with a compact binary XML representation.

The basic idea of CBXML is to provide an encoding of XML information that retains all the platform neutrality and self-descriptive benefits of the standard character-oriented encoding of XML while greatly reducing document size and allowing them to be processed at greater speed with simpler software.

Measurements indicated that these objectives were achieved, with an up to 6x decrease in message size and a 2x to 7.5x decrease in parsing time. As Sosnoski also found [SOSNOSKI], the compression achieved was less than ZIP for large messages, but compression actually sped up processing.

2.3 Assessment

It is quite clear from surveying the research in this area that XML really does impose a significant overhead on a significant set of real-world applications, especially those in enterprise-class transaction processing environments and those involving wireless communication. In both scenarios it is clear that developers, vendors, and customers desire the benefits of standards-based portability and interoperability, but are unable to use XML in its current form.

Furthermore, currently deployed technological fixes do not alleviate this pain for these two classes of users. As for reducing size, conventional text compression algorithms do not work at all on the short messages with little redundant text that are common in web services applications and preferred by wireless developers. Likewise, the studies noted above generally show that the processing cost of these algorithms often negates any perceived performance benefit from reducing the amount of bandwidth needed to send a message. Furthermore, "throwing hardware at the problem" is not a viable option for battery-powered mobile devices with intrinsically limited resources, where every extra CPU cycle drains the battery all the sooner.

3. Diagnosing the Pain

To sum up the previous section, there is now some fairly solid evidence that the size and processing overhead of XML causes some significant pain for potential users in at least two areas, enterprise-scale transaction processing (especially when there is a web service interface), and in the civilian and military wireless infrastructure. Wireless industry members are particularly explicit about the problem: XML's basic value proposition is very appealing -- they want to exploit the expertise and (Infoset-based) tools that XML's popularity offers, but unless the "XML tax" is reduced, XML itself will not become ubiquitous in the wireless world.

What causes this problem? van Lunteren et al. [VANLUNTEREN] state the basic problem succinctly:

There seems to be a fundamental problem with XML processing in software that will prevent the processing rate to be improved beyond a best processing rate of tens of clock cycles per character, and that for many XML applications can result in processing rates on the order of hundreds of clock cycles per character.

This paper cannot investigate them in detail, but a few hypotheses leap out from an immersion in the papers and discussions on this topic:

4. Some Proposed Analgesics

4.1 Code optimization

Several of the materials reviewed here note that XML parsing is still relatively immature technology. Much of it is implemented in open source projects in which modularity and readability of code are more important to success than brute speed. For example, modularity considerations may require separate scans through an instance character by character to transcode from the external encoding to an internal Unicode character representation, locate tokens in the XML grammar, determine the length of strings, copy text inputs to SAX events or tree structures, validate the element/attribute structure and make type annotations, and so forth. Memory management is a well-known bottleneck in tree-based APIs, but some of the analyses reviewed here noted that as a performance issue in SAX parsers as well. The optimization challenges are magnified by the problems of matching the impedance of XML processing with the lower level network and database interfaces and the higher level applications on whose behalf the parsing is done. All this is fertile ground for some rocket-science of the sort that was applied to problems of program language compilation in previous decades. Some attendees at the W3C Workshop appeared to believe that this would make up most of the performance shortfall that was noted. We shall see, of course.

4.2 Simplification

This is mentioned only in passing by most discussions of XML processing bottlenecks, but the complete XML 1.0 specification has a number of features that are very difficult to support when performance is critical. For example, the main reason that SOAP 1.2 specifies that an "XML infoset of a SOAP message MUST NOT contain a document type declaration information item" is performance: "Doing general entity substitution beyond that mandated by XML 1.0 (e.g. &lt;) implies a degree of buffer management, often data copying, etc. which can be a noticeable burden when going for truly high performance. This performance effect has been reported by workgroup members who are building high performance SOAP implementations." [FALLSIDE]

This suggests that a sine qua non of an alternative Infoset serialization that is optimized for parsing efficiency is that general entity references should either be forbidden altogether or resolved in some sort of a canonicalization step before the binary format is created. This would require a significant refactoring or profiling of the W3C corpus, but could facilitate performance improvement with little pain to mainstream XML users (who tend to shy away from the more exotic and burdensome features of XML in the first place).
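
As an illustration of what such an entity-resolving step might look like, the sketch below uses Python's standard xml.etree.ElementTree module, whose expat-based parser expands entities declared in the internal DTD subset. The sample document, the entity name `co`, and the function name `resolve_entities` are invented for illustration; this is not a standardized canonicalization and it does not handle external entities.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a document type declaration with one internal general entity.
SOURCE = (
    '<!DOCTYPE memo [<!ENTITY co "Software AG">]>'
    "<memo><from>&co;</from><note>1 &lt; 2</note></memo>"
)


def resolve_entities(xml_text: str) -> str:
    """Parse once, letting the parser expand internal entities, then re-serialize.

    The result carries no document type declaration and no general entity
    references beyond the predefined ones (such as &lt;) that XML 1.0 mandates.
    """
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


resolved = resolve_entities(SOURCE)
print(resolved)
```

A downstream encoder that only ever sees the output of a step like this never has to do the buffer management that entity substitution implies.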

4.3 Standardized Alternative Infoset Serialization(s)

Although it is a handy label, "Binary XML" is a misnomer for many reasons, not the least of which is that many consider it an oxymoron. The more ponderous term "binary serialization of the XML infoset" covers a number of ideas. The W3C Workshop report [W3C-Workshop] and the Leventhal et al. writeup [LEVENTHAL] summarize them at a high level.

There appear to be two main classes of XML serializations that are studied, those which assume shared knowledge of a schema governing the information exchanged, and those which carry along descriptive metadata, along the lines of XML itself. It is worth noting that shared-schema encoding schemes can be considerably smaller than the schemaless ones, but there is a smaller set of use cases for them.
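
The size difference between the two classes can be seen in a toy example. The sketch below is purely illustrative: the byte layouts, the function names, and the sample message are assumptions made up for this paper and do not correspond to any of the proposed formats. The shared-schema encoder assumes both endpoints already know the field order and types; the self-describing encoder carries a table of element names and assumes a flat message with distinct child names and short values.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical sample message, invented for illustration.
XML_TEXT = "<quote><symbol>IBM</symbol><price>84.12</price><volume>1200</volume></quote>"

# Shared-schema layout: 4-byte symbol, 8-byte double, 4-byte unsigned integer.
SCHEMA = struct.Struct("!4sdI")


def encode_with_schema(root):
    """Only values travel; names and types live in the schema both sides share."""
    return SCHEMA.pack(
        root.findtext("symbol").encode("ascii"),
        float(root.findtext("price")),
        int(root.findtext("volume")),
    )


def decode_with_schema(data):
    symbol, price, volume = SCHEMA.unpack(data)
    return symbol.rstrip(b"\x00").decode("ascii"), price, volume


def encode_self_describing(root):
    """Name table first, then (name index, value length, value) per child."""
    names = [root.tag] + [child.tag for child in root]
    out = bytearray([len(names)])
    for name in names:
        raw = name.encode("utf-8")
        out += bytes([len(raw)]) + raw
    for index, child in enumerate(root, start=1):
        value = (child.text or "").encode("utf-8")
        out += bytes([index, len(value)]) + value
    return bytes(out)


def decode_self_describing(data):
    """Rebuild the element tree from the bytes alone, with no outside knowledge."""
    count, pos = data[0], 1
    names = []
    for _ in range(count):
        length = data[pos]
        names.append(data[pos + 1:pos + 1 + length].decode("utf-8"))
        pos += 1 + length
    root = ET.Element(names[0])
    while pos < len(data):
        index, length = data[pos], data[pos + 1]
        child = ET.SubElement(root, names[index])
        child.text = data[pos + 2:pos + 2 + length].decode("utf-8")
        pos += 2 + length
    return root


message = ET.fromstring(XML_TEXT)
print(len(XML_TEXT), len(encode_self_describing(message)), len(encode_with_schema(message)))
```

The shared-schema form is the smallest, but its bytes are meaningless to any intermediary that lacks the schema, whereas the self-describing form can be turned back into an Infoset by anyone who knows the generic layout.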

4.4 Hardware Acceleration

There are a number of vendors currently selling hardware products that address the types of processing bottlenecks discussed here. The details are beyond the scope of this paper, but see . Some of these vendors have been involved with the W3C efforts to assess the potential needs and use cases for optimized XML serializations. I would expect these to be complementary rather than competitive approaches.

In the longer run, there is fundamental research going on, e.g. that described in the van Lunteren et al. paper [VANLUNTEREN], to develop formal automata optimized for XML processing that could be implemented in real hardware.

4.5 Hybrid Approaches

It may be possible to develop hybrid approaches that combine the strengths and cancel the weaknesses of specific alternatives. For example, hardware accelerated parsing offers fewer advantages if all the XML processing has to be done on a specific appliance, but could be more generally useful if the hardware emits an efficient format that XPath, XSLT, DOM, etc - based applications can efficiently consume. XML accelerator vendors do this with proprietary XML serializations now, and at least some are participating in the W3C efforts to investigate the feasibility of standards.

Likewise, it may be possible in principle to combine the performance advantages of a binary format with the interoperability and visibility advantages of a text format by associating a binary index that describes the parsed structure of a document with the text itself. For example, a binary "attachment" to an XML document or message might describe the element hierarchy, with offsets/lengths into the text format used to pick up actual values. (Of course, this buys speed at the cost of additional size). Applications that understand the binary index can build some Infoset (or application-specific) representation quickly by using the pre-indexed attachment, yet applications that don't can simply rebuild the Infoset structure by parsing the text. This is essentially applying a design pattern used in XML (and other text) database systems to document instances, and could work in an interoperable fashion if the binary structure component were standardized. For example, the Ximpleware position paper [ZHANG] proposes:

While the processing performance of XML is a very important issue that will directly impact the its adoption as a platform-independent, interoperable and open data/document encoding format, a good solution should not compromise XML's human-readability that lies in the core of its value proposition. By pioneering a hardware-accelerated XML processing model that has some interesting properties, XimpleWare hopes to help alleviating XML's performances issue and does so without violating the "view-source principle."

This has now been implemented in an open source project [VTD-XML].
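
To make the general idea of a structure index concrete, here is a minimal sketch. It is not the XimpleWare or VTD-XML record format: the record layout, the names `build_index` and `read_index`, and the sample document are all hypothetical. It also assumes a drastically restricted subset of XML (elements without attributes, text only in leaf elements, no entities, comments, or CDATA sections), which is what lets a simple regular expression stand in for a real tokenizer.

```python
import re
import struct

# Hypothetical record: depth, name offset, name length, text offset, text length.
RECORD = struct.Struct("!BIHIH")

# Matches a start tag, an end tag, or a run of character data (restricted subset only).
TOKEN = re.compile(rb"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")

DOC = b"<quote><symbol>IBM</symbol><price>84.12</price></quote>"


def build_index(data: bytes) -> bytes:
    """Scan the text once and emit one fixed-size record per element."""
    records = []      # each entry: [depth, name_off, name_len, text_off, text_len]
    open_stack = []   # records of the elements that are currently open
    for match in TOKEN.finditer(data):
        closing, name, text = match.group(1), match.group(2), match.group(3)
        if text is not None:
            # Character data: remember where it sits inside the enclosing element.
            if open_stack and text.strip():
                record = open_stack[-1]
                record[3], record[4] = match.start(3), len(text)
        elif closing:
            open_stack.pop()
        else:
            record = [len(open_stack), match.start(2), len(name), 0, 0]
            records.append(record)
            open_stack.append(record)
    return b"".join(RECORD.pack(*record) for record in records)


def read_index(data: bytes, index: bytes):
    """Walk the index and slice names and values out of the text without re-parsing."""
    for depth, n_off, n_len, t_off, t_len in RECORD.iter_unpack(index):
        name = data[n_off:n_off + n_len].decode("utf-8")
        value = data[t_off:t_off + t_len].decode("utf-8")
        yield depth, name, value


index = build_index(DOC)
for depth, name, value in read_index(DOC, index):
    print("  " * depth + name, value)
```

The index here costs 13 bytes per element on top of the unchanged text, which is the size-for-speed trade mentioned above; a consumer that does not understand the index simply ignores it and parses `DOC` as ordinary XML.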

5. Lots of Second Opinions

Some of the proposed alternatives simply involve processing standard XML text more intelligently, or adding optional additional information or processing steps. Mailing lists, webpage discussion sections, and probably a few drinking establishments at conference hotels have been filled with pushback on the general sentiments described above. Some of the most commonly heard concerns include:

5.1 Premature optimization is the root of all evil

There is little doubt that the sound bite "XML is bloated and inefficient" one sees thrown around by XML opponents misses this point completely. It also seemed obvious to many that the Web couldn't succeed because HTTP is bloated and inefficient, or that Java wouldn't succeed because of the virtual machine overhead, and these assertions look absurd in retrospect. The truth of C.A.R. Hoare's famous dictum that "premature optimization is the root of all evil in programming" has been demonstrated repeatedly, and many argue that XML will eventually illustrate it once again. Jon Udell [UDELL] puts this point across well (although in a somewhat different context). 'For any technology, the statement "X doesn't scale" is a myth. The reality is that there are ways X can be made to scale and ways to screw up trying.'

In fact, many of the suggestions in the Nicola et al. paper [NICOLA] are to investigate how to tweak the basic architecture of XML applications and libraries so as to optimize parsing, e.g. by exploiting opportunities for parallelism, to ensure that all the necessary work gets done in a single pass through the XML instance, to exploit opportunities for 'laziness', and so forth. These can be seen as similar to the kinds of optimizations to processors and tweaks to the specifications that took place behind the scenes to make internet and web technologies scale.

5.2 It's the interop, stupid!

There is an ancient one-liner: "the operation was a success, but the patient died." That might summarize the most common objection to proposals that eliminate the pain of XML but also eliminate its value as a universal, text-based format -- the whole point of XML is interoperability, and anything that threatens it undermines XML's very reason for existence.

Omri Gazitt [GAZITT] states what seems to be a strong consensus among Microsoft webloggers: "The XML activity is about interop, and we have great interop using a text format - in fact that's one of the key reasons that's consistently cited for XML's success." Likewise, Joshua Allen [ALLEN] argues: "Having a solid, reliable "obvious choice" like XML 1.0 means freedom from pain for millions of developers. Let's please don't mess with that too hastily."

5.3 One or two optimized serializations are not enough

On a closely related point, some argue that there are so many use cases [W3C-XBC] that it is likely that a large number of alternatives will be necessary to cover them. This point is made quite forcefully in Microsoft's position paper [PAL]:

For different classes of applications, the criterion (minimize footprint or minimize parse/generate time) for the binary representation is different and often conflicting. There is no single criterion that optimizes all applications. Consequently, a binary standard could result in a suite of allowable representations that clients and servers must be prepared to receive and process. This is a retrograde step from the portability goals of XML 1.0.

5.4 "View Source Principle"

XML is a text format that humans can read, and many argue that this will generate greater actual business benefits for auditing and debugging than it costs to throw extremely cheap and commoditized hardware at any bottlenecks.

5.5 Alternative serializations don't support all of XML

Some have noted that most, perhaps all, of the efficient serialization or external indexing proposals can handle only a subset of the XML 1.x Recommendation. Few support DTDs, mainly because they introduce the issue of entity declarations and references, which have proven extremely difficult to optimize.

5.6 If XML doesn't work for you, avoid it, don't pollute it

Some of the most fervent opposition to all this talk about the bottlenecks that XML creates for web services in enterprise-class applications comes from people who are content with XML's capabilities for the relatively simple, text-oriented applications it was originally designed to address. Kendall Clark [CLARK] argues that binary XML efforts raise "a kind of social concern for the rest of us, that is, those who think XML, as it is, is good enough; or, at least, good enough often enough that the binary variants are likely to be a waste of time, at best."

6. Conclusion

6.1 Facts which don't appear to be in serious dispute

5-10x processing overhead of XML over application-specific binary formats. There does appear to be some dispute, however, as to how much that could come down with intensive research and competition to improve parsing algorithms and implementations.

The wireless industry needs an optimized XML-like standard. It is very clear that XML itself won't get much traction in that domain without optimization of the specs. XML text is a non-starter for things like images and maps given foreseeable technological realities. The wireless industry, however, wants to exploit internet and web standards as much as possible. As Nokia's position paper [NOKIA] puts it, formats and protocols "ought be wireless-aware but not wireless-specific."

Conventional compression is not sufficient. GZIP and similar compression approaches are already incorporated into much of the HTTP and communications (e.g. MNP-5) infrastructure. They are useful for relatively large documents/messages, but not for small ones. They also require non-negligible processing power, especially on the side doing the compression. Overall, they appear useful in some application situations, but cannot be relied on to be a general solution to XML's data size issues.

Bandwidth per se is not the issue. At least in the consumer market, an important goal is to minimize the time the user perceives between requesting a service and the display of useful information. Efficient bandwidth utilization is only one aspect of that, and the only one addressed by compression; working around the intrinsic latencies in wireless networks and minimizing the amount of processing by the highly constrained processors/batteries in mobile devices is generally more effective at improving the user experience.

"Moore's Law does not apply to batteries." Some variant of this phrase appears in almost every presentation by a wireless industry representative on the subject of XML. Battery life is a key quality of service measure to mobile device users, and exercising the CPU drains the battery, so it is seldom a good tradeoff to use computation-intensive compression methods to conserve bandwidth.

Massive experimentation in optimized infoset serializations is already happening. Leventhal et al. [LEVENTHAL] count 18 different existing formats and proposals in the submissions to the W3C XML workshop. At least one, ASN.1 [ASN], is an international standard. Binary representations of parsed XML are widely used inside proprietary architectures and products, including those from companies expressing the most skepticism at the W3C Workshop. The issue is whether any should be standardized by W3C and treated as part of the "XML" corpus that generic XML tools might be expected to handle gracefully.

6.2 Personal Reflections

Is all this handwringing about the pain of XML just "premature optimization?" Looking at the XML application space as a whole, one would have to answer "yes" -- in the vast majority of cases where XML is applied, parsing overhead and network bandwidth are not critical bottlenecks. Sean McGrath [MCGRATH] makes this point exceedingly well:

Anybody making XML system design decisions based on XML parser benchmarks is kicking the tires (i.e., focusing on the tires when buying a car). The tires are an obvious and accessible part of the system, their function is well understood and measurable. But frankly, there are way more important things going on under the hood from a performance perspective....Time for another quote. This time from Jon Bentley: "Make it work before you make it work fast".

There is one big difference between what Sean is talking about and the conclusion of the "Measuring the Pain" section above, however: there is a limited set of real-world scenarios in which XML parsing is demonstrably a bottleneck. Like Sean, I've learned and re-learned the lessons about the pitfalls of premature optimization. In this case, however, the XML industry as a whole has done the Right Thing: explored the utility of XML text, discovered a wide range of applications for which it is well suited, and found some where it appears to be suited except for well-documented bottlenecks. People are now exploring exactly how to break through or route around those bottlenecks.

The argument that the XML community as a whole need not worry about these bottlenecks in a few scenarios because interoperability trumps efficiency seems somewhat disingenuous to me: the interoperability of XML text applications is problematic! The most obvious issue is that XML is open to additional encodings beyond the mandatory UTF-8 and UTF-16, making the ability to mix-n-match parsers and data sources highly problematic at best. Furthermore, "in the wild" instances claiming to be of the most popular XML formats such as XHTML and the numerous variants of RSS can seldom be assumed to be schema-valid, and often require massaging even to be well-formed. Finally, even applications that really can assume well-formed and schema-valid XML inputs must cope with schema evolution over time as well as semantic mismatches (the same markup for different concepts and different markup for the same concept). One can sympathize with those who don't want to make this any worse, but it's hard to get all misty-eyed about the impending collapse of interoperability if binary encodings are introduced into the XML standards.

I do agree strongly with many of the other points mentioned in the "pushback" section above. For example, View Source was pretty much how I learned HTML and Javascript, how I debug XML produced by WYSIWYG tools, etc. Still, as the Cubewerx submission [CUBEWERX] argues, most tools that support View Source are actually serializing their internal data structures. That won't apply to auditing raw messages on the wire, of course, and real tradeoffs between making life easier for humans or faster for computers will have to be made.

The point that none (as far as I know) of the optimized serializations allow full round-tripping of XML syntax nor even support the full Infoset is undeniable. These can only be used in scenarios, such as SOAP-based web services, where they add no additional constraints.

6.3 What should application developers do?

First, of course, benchmark, profile, and optimize real bottlenecks. The chances are good that they will have nothing to do with XML parsing. If and only if XML parsing is the real bottleneck, worry about the things discussed here.

Throw hardware at the problem? This might entail simply buying more general purpose hardware to handle the increased load imposed by the increasing popularity of XML, or may entail purchasing special-purpose XML accelerators from vendors such as Conformative, Datapower, Sarvega, and Tarari.

Minimize the size of XML instances in an application. Judicious use of a DBMS and appropriate message granularity can eliminate many of the concerns here, at least from the perspective of the end users.

Use the right tool for the job. Investigate pull parsers and other 'lazy' approaches, especially if only a portion of a document is used. For example, it is well known that building and re-serializing a DOM tree in order to filter out a few elements is very inefficient. Just don't DO that!

Consider hybrid approaches that associate structure indexes with text so as to allow fast parsing "downstream" while preserving the textual virtues of XML. The open source VTD-XML project [VTD-XML] seems like a promising place to look for ideas along these lines. Of course, there are tradeoffs -- for example, one might have to support only a subset of XML 1.0 that can be represented in such hybrid formats.
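
The advice above about pull parsers and 'lazy' approaches can be illustrated with a short sketch. It uses the incremental `iterparse` function from Python's standard xml.etree.ElementTree module as one example of a streaming, pull-style API; the sample document and the function name `stream_values` are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
DOC = (
    b"<quotes>"
    b"<quote><symbol>IBM</symbol><price>84.12</price></quote>"
    b"<quote><symbol>SAG</symbol><price>31.50</price></quote>"
    b"</quotes>"
)


def stream_values(source, wanted):
    """Yield the text of each `wanted` element as soon as its end tag is seen."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == wanted:
            yield elem.text
        # Release this element's text and children once we are done with it;
        # the emptied element itself stays attached to its parent.
        elem.clear()


prices = list(stream_values(io.BytesIO(DOC), "price"))
print(prices)
```

Only the values of interest are retained; nothing is re-serialized, and the contents of elements that have already been seen are discarded as the scan proceeds rather than accumulated in a full tree.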

6.3.1 Standards developers

I believe that the W3C is approaching the questions addressed here in the right way: Enumerating use cases in which XML is too inefficient to be productively used, identifying the specific properties that would need to be optimized in order for an XML-like format to be useful, and planning to evaluate proposed alternatives by measuring those properties. I do agree with the position most clearly espoused by Microsoft -- at the present time, experimentation rather than standardization is most important.

I am quite skeptical about the utility of schema-specific / application-specific optimizations which cannot be understood by routers, firewalls, etc. These can produce some good benchmark results, but overall this approach really does defeat the whole purpose of XML; if the endpoints of a distributed application are essentially shipping around opaque binary blobs, they might as well avoid the overhead of XML altogether.

Although this would create something of a perfect storm -- considering how controversial both "binary XML" and "XML subsetting" are in the XML community -- there really is something to be said for defining a performance-optimized profile of XML 1.x. The subset of XML implicitly defined by the SOAP Recommendation may be a good starting place.

Finally, Kendall Clark [CLARK] argues against W3C investing effort in this area because the "idea of a binary variant seems like a fairly radical proposal at what is a relatively late point in the game." This really gets to the nub of the issue here: Is it late in the game, so that changes to current XML practice will threaten interoperability? Or is it early in the game, so that the failure to address real-world problems created by XML will affect its viability?

I take the latter position: XML has been phenomenally successful because it effectively addressed a host of real problems by leveraging the ideas and infrastructure that became prevalent in the 1990s. XML, however, has raised expectations, created new realities, and introduced a new set of problems. Something or other will address the pains described in the work summarized here. That "something" is quite certain to leverage the XML infrastructure -- especially the infoset-oriented tools such as XSLT/XPath/XQuery, schemas, and the numerous SOAP-based web services technologies -- but will inevitably leave some non-interoperable debris in its wake. The open question for me is whether that something is a lineal descendant of XML that comes from the W3C, or whether it is an offshoot that is fostered elsewhere and comes back to claim XML's inheritance. I prefer the former, but predict the latter.

Bibliography

[ALLEN]
Binary XML Joshua Allen weblog entry. http://www.netcrucible.com/blog/2003/09/27.html#a360 2003
[ASN]
What ASN.1 can offer to XML http://asn1.elibel.tm.fr/xml/
[CHIU]
Investigating the Limits of SOAP Performance for Scientific Computing Kenneth Chiu, Madhusudhan Govindaraju, Randall Bramley. http://www.extreme.indiana.edu/xgws/papers/soap-hpdc2002/soap-hpdc2002.pdf 2003
[CLARK]
Binary Killed the XML Star? Kendall Grant Clark. http://www.xml.com/pub/a/2003/11/19/deviant.html 2003
[COKUS]
Binary XML Position Paper: The Need for Standard Schema-based and Hybrid Compression Mike Cokus, Scott Renner, Dan Winkowski. Mitre Corporation. Workshop on Binary Interchange of XML Information Item Sets, September 2003 http://www.w3.org/2003/08/binary-interchange-workshop/25-MITRE-USAF-Binary-XML.htm 2003
[CONNER]
CBXML: Experience with Binary XML Mike Conner, IBM Corporation. Workshop on Binary Interchange of XML Information Item Sets, September 2003 http://www.w3.org/2003/08/binary-interchange-workshop/19-IBM-CBXML-W3C-Submission-updated.zip 2003
[CUBEWERX]
Cubewerx Position Paper for Binary Interchange of XML W3C Workshop on Binary Interchange of XML Information Item Sets, September 2003. http://www.w3.org/2003/08/binary-interchange-workshop/05-cubewerx-position-w3c-bxml.pdf 2003
[FALLSIDE]
XMLP WG Response on "SOAP and the Internal Subset" David Fallside http://lists.w3.org/Archives/Public/www-tag/2002Dec/0119.html 2003
[GAZITT]
Binary XML Standard Considered Harmful Omri Gazitt weblog entry. http://www.gazitt.com/OhmBlog/PermaLink.aspx/13178f4f-b851-4568-972b-2886048490a2 2003
[KOHLHOFF]
Evaluating SOAP for High Performance Business Applications Christopher Kohlhoff and Robert Steele.
[LEVENTHAL]
Binary Showdown Michael Leventhal, Eric Lemoine, Stephen Williams. XML Journal, November 2003. http://www.sys-con.com/xml/articleprint.cfm?id=745 September 2003
[MCGRATH]
XML Is Too Slow ... Not! Sean McGrath, ITworld.com. http://www.itworld.com/nl/xml_prac/10112001/ 2001
[NICOLA]
XML Parsing: A Threat to Database Performance Matthias Nicola, Jasmi John. CIKM'03, November 3-8, 2003, New Orleans Louisiana 2003
[NOKIA]
Nokia Position Paper: W3C Workshop on Binary Interchange of XML Information Item Sets http://www.w3.org/2003/08/binary-interchange-workshop/02-Nokia-Position-Paper_02.htm September 2003
[PAL]
A Case against Standardizing Binary Representation of XML Shankar Pal, Jonathan Marsh, Andrew Layman, Microsoft Corporation. W3C Workshop on Binary Interchange of XML Information Item Sets. http://www.w3.org/2003/08/binary-interchange-workshop/29-MicrosoftPosition.htm September 2003
[SOSNOSKI]
XBIS XML Infoset Encoding Dennis Sosnoski. W3C Workshop on Binary Interchange of XML Information Item Sets, September 2003. http://www.w3.org/2003/08/binary-interchange-workshop/09-Sosnoski-position-paper.pdf September 2003
[SANDOZ]
Fast Web Services. Paul Sandoz, Santiago Pericas-Geertsen, Kohsuke Kawaguchi, Marc Hadley. W3C Workshop on Binary Interchange of XML Information Item Sets http://www.w3.org/2003/08/binary-interchange-workshop/01-FWS_Sun.pdf see also http://java.sun.com/developer/technicalArticles/xml/fastinfoset/ http://java.sun.com/developer/technicalArticles/WebServices/fastWS/ September 2003
[TATARINOV]
Storing and querying ordered XML using a relational database system Tatarinov, I., Viglas, S., Beyer, K., Shanmugasundaram, J., Shekita, E. J., and Zhang, C. SIGMOD Conference 2002: 204-215. http://www.cs.cornell.edu/people/jai/papers/OrderedXML.pdf 2002
[UDELL]
IT Myth #6: IT Doesn't Scale Jon Udell. http://www.infoworld.com/article/04/08/13/33FEmyth6_1.html 2004
[VTD-XML]
Project Homepage of VTD-XML http://vtd-xml.sourceforge.net/ September 2003
[W3C-XBC]
XML Binary Characterization Use Cases; W3C Working Draft 28 July 2004. W3C XML Binary Characterization Working Group http://www.w3.org/TR/xbc-use-cases/ September 2003
[W3C-Workshop]
Report From the W3C Workshop on Binary Interchange of XML Information Item Sets http://www.w3.org/2003/08/binary-interchange-workshop/ September 2003
[ZHANG]
XimpleWare W3C Position Paper J. Zhang, K. Lovette W3C Workshop on Binary Interchange of XML Information Item Sets http://www.w3.org/2003/08/binary-interchange-workshop/20-ximpleware-positionpaper-updated.htm September 2003
[VANLUNTEREN]
XML Accelerator Engine Jan van Lunteren, Ton Engbersen, Joe Bostian, Bill Carey, Chris Larsson. First International Workshop on High Performance XML Processing, http://wam.inrialpes.fr/www-workshop2004/ZuXA_final_paper.pdf 2004
