Keywords: parsing, acceleration, offloading
Biography
Biswadeep Nag has a Ph.D. in Computer Science from the University of Wisconsin-Madison. He presents regularly at conferences on databases, web services, JavaOne and OracleWorld. He presented two papers and a tutorial at XMLConference 2003. Dr. Nag is currently working at Sun Microsystems on areas related to database performance, XML processing and web services performance. Inside Sun he is a frequent presenter on database and XML performance issues.
This paper presents the architecture of an XML Offload Engine (XOE) that can accelerate the processing of XML documents and messages. The XOE plugs into the I/O bus of the host system and is designed to support almost all current and future generations of XML processing technologies. Careful attention has been paid to the design of the communication between the XOE and the host so that most of the existing software stack and user applications can remain unchanged. By attacking the largest common source of XML processing cost, this XOE aims to be effective for the broadest class of XML processing applications.
1. Introduction
2. The XML Processing Stack
3. XML Acceleration Architecture
4. Design of an Offload Engine
5. The Tokenized XML Format
6. Conclusion
The processing of XML documents has entered the mainstream of software application development. System and software configuration files, UI specifications and many office documents (such as spreadsheets, presentations, etc.) are being specified as XML. This is in addition to the exchange of SOAP-XML messages in web-service oriented applications that are used for enterprise application integration and B2B transactions.
The problem though is that the emphasis on human readability of XML documents and the use of an ASCII or UTF representation of data has made XML processing relatively CPU intensive. XML processing typically involves the use of an off-the-shelf parser program that has to perform lexical analysis, tokenization and character integrity checks on the XML stream. In addition it has to convert UTF characters to Unicode, verify well-formednes of the document and optionally validate against an XML Schema or DTD. Finally there is the stage that may construct a DOM tree, perform transformations, bind to Java or C# objects and call in to the application program. All this consumes a fair amount of CPU cycles such that most of the time taken in a service-oriented application is spent in parsing XML.
The general idea we espouse in this paper is that general purpose computing processors may not be well-suited to the specialized task of processing XML. It might be far more efficient to offload much of the XML processing to an appliance or I/O coprocessor that handles most of the XML crunching as it comes off the network or is read from a file, much in the manner of a TOE. In addition to the use of special-purpose hardware that can shave off many CPU instructions, there is the added benefit of proximity to the network interface, greater cache locality because of the repeated execution of a limited amount of code, and freeing up cycles on the host CPU that can concentrate on running the user application.
There is a large diversity of applications that are interested in using XML, and so there is an equally large number of XML processing technologies. The W3C has been instrumental in standardizing a large number of XML processing APIs such as SAX, DOM, XSLT, StaX and so on. In the Java ecosystem, the JSR process has played a major role in enveloping the W3C APIs, into industry-standard Java APIs that can be uniformly used in all Java applications. There are several implementations of these APIs and they in turn can use most of the available parsers that support the W3C APIs. In the Microsoft world, there is a similar software stack that implements some of the W3C APIs and has equivalent implementations that support similar functionality in other areas such as binding XML elements to C#. For the purpose of this paper we will illustrate our ideas with examples from the Java stack, but the reader should be able to readily appreciate how the basic concept would apply to any XML processing stack.
Figure 1 shows the standard Java stack for processing XML. Note that there are many implementations possible for each of the Java Standard API, XML Parsing API and XML Parser layers. Applications could even bypass the Java Standard APIs and directly code to the Parsing APIs, but that is not recommended for portability reasons. Using the JAX standards allows the choice of the XML parser to be decided at deployment time. We will now briefly describe the Java Standard APIs for processing XML.
Our proposed XML acceleration architecture leverages the fact that most of the Java XML processing technologies can make use of a common XML parsing infrastructure as long as they can access it using their desired XML Parsing API such as SAX, DOM, StaX etc. Typically they would not even care about what exact XML parser is used underneath. However in our proposal, it would be possible to create accelerated versions of most currently available XML parsers, so much of the software stack remains unchanged.
Figure 2 shows a blowup of a typical XML parser architecture. Though each parser implementation will differ in their exact class diagram, they should each implement the functionality shown. In Figure 2, the XML tokens flow from the bottom to the top. Let us now discuss each of the modules in more detail.
There will of course be other assisting modules in a functional parser such as an entity resolver, a grammar cache pool and so on that have not been depicted in Figure 2. Though such modules are important we consider them peripheral to the issue of XML acceleration. The above architecture has been loosely based on the open-source Xerces Java parser, though we believe that any usable XML parser, regardless of its structure has to implement very similar functionality.
Offload engines have become very popular in accelerating repetitive and computationally expensive tasks. Examples include the TCP/IP Offload Engine found in many network interface cards, and SSL or cryptographic algorithm accelerators. We believe the exchange of XML messages is going to become equally commonplace and will also burn up a large number of CPU cycles. The use of an XML offload engine is designed to take care of two issues:
There have been several different proposals for special purpose XML accelerators from Datapower, Tarari and Convergys. These have typically concentrated on narrow problems such as XPath and XSLT processing. It is difficult to justify the high cost of development, and procurement of these acclerators given the limited scope of deployment. Instead, we chose to focus on the most common subset of the XML processing problem: that of generic XML parsing. Figure 3 shows our proposal for which parts of the XML processing stack are targeted for offloading.
It is clear from Figures 1 & 3 that is possible to build all the common XML applications in use today on top of a common parsing infrastructure. By drawing the boundary of the XML Offload Engine (XOE) between the parser and the validator module, we are ensuring that all XML applications can take advantage of the XOE. At the same time, parts of the stack that are optional (such as validation) and those that will differ from one application to the other (e.g. XML Parsing APIs) can be implemented in software on the host CPU. The goal is to also keep most of the software running on the host be unchanged. For example, the implementation of the JAXB and JAX-RPC APIs will not have to change if the underlying software provides the SAX & StaX APIs. In fact existing XML parsers, such as Xerces and XPP3 can take advantage of the XOE and eliminate much of their code path.
Similarly, the expectation is that the design of the XOE, which is a combination of special-purpose hardware and firmware, should be independent of the current generation of XML APIs and even independent of particular parser implementations. It should concentrate on solving problems that are of general applicability in all XML parsing situations. The goal is to keep the optional and variable elements of XML processing still running on the host CPU in software. This led us to the XOE design illustrated in Figure 4.
Figure 4 shows our XOE prototype in the form of hardware card that can be plugged into a PCI or PCI-X slot in a host system. The specialized hardware in the XOE is used to implement the UTF to Unicode decoder and also the scanner that tokenizes the XML stream to segment it into element and attribute names as well as element contents and attribute values. The hardware can do the decoding in multi-byte chunks and at the same time do much of the regular expression matching involved in tokenization in parallel. The parser/well-formedness checker that runs in the XOE firmware makes use of two data structures:
The choice of what XOE features are implemented in hardware versus what would are implemented in firmware is a tradeoff that weighs the benefits against the real-estate costs required for the hardware solutions. It also make sense to combine the XOE with the smart network interfaces available today that implement a TOE (TCP Offload Engine) and a crypto accelerator (for SSL sessions). Though this would be a natural fit for processing XML that is coming over a wire, it could also be used to process an XML document stored in the local filesystem by transmitting the document over the I/O bus to the XOE. Instead of creating an offline XML appliance, we chose to design an inline network card that can also perform in coprocessor mode. The crucial aspect of the XOE design is to make sure that the additional latencies and CPU overhead associated with communicating with the XOE do not offset the benefit of offloading. This led us to carefully consider the data exchange format between the XOE and host CPU that is described in the next section.
There are three major design goals of the communication infrastructure between the host CPU and the XOE.
The tokenized XML format represents the output of the XOE that is consumed by the software running on the host and tailored for the XML application being accelerated. SAX events, DOM trees and StaX nodes can all be generated from it. While the tokenized XML encodes the complete XML infoset, the ability to make certain assumptions allows a more efficient encoding than the original document. The format is easier to explain if we use an example. Suppose the following XML fragment is processed by the XOE.
<customer id="1234">
<name>
<firstname>John</firstname>
<lastname>Bull</lastname>
</name>
</customer>
<customer id="4321">
<name>
<firstname>Jane</firstname>
<lastname>Doe</lastname>
</name>
</customer>
|
This will be converted into a message that has two parts. The first part is a symbol table (that has also been internally used in the XOE) containing all the element and attribute names found in the document.
| Key | Value |
|---|---|
| 1 | customer |
| 2 | id |
| 3 | name |
| 4 | firstname |
| 5 | lastname |
Table 1
The second part of the message will contain all the parsed XML tokens in document order. This will make references to the symbol table of the first part. It will also make use of a common code to differentiate between element, attribute and namespace names etc.
| Code | Value |
|---|---|
| 1 | Begin Element Name |
| 2 | Attribute Name |
| 3 | Namespace Name |
| 4 | End Element |
| 5 | Element Content |
| 6 | Attribute Value |
Table 2
Now comes the actual tokenized XML corresponding to the above document fragment.
| Code | Length | Value |
|---|---|---|
| 1 | - | 1 |
| 2 | - | 2 |
| 6 | 4 | 1234 |
| 1 | - | 3 |
| 1 | - | 4 |
| 5 | 4 | John |
| 4 | - | - |
| 1 | - | 5 |
| 5 | 4 | Bull |
| 4 | - | - |
| 4 | - | - |
| 4 | - | - |
| 1 | - | 1 |
| 2 | - | 2 |
| 6 | 4 | 1234 |
| 1 | - | 3 |
| 1 | - | 4 |
| 5 | 4 | Jane |
| 4 | - | - |
| 1 | - | 5 |
| 5 | 4 | Doe |
| 4 | - | - |
| 4 | - | - |
| 4 | - | - |
Table 3
There are a number of observations to be made here:
Let us now see how this tokenized XML format satisfied the requirements outlined at the beginning of this section.
In this paper we presented an architecture for accelerating XML document processing by using an XML Offload Engine (XOE). The XOE is a combination of hardware and firmware that plugs into the I/O bus of a system and does some of the fundamental XML parsing that is common to all XML processing and web-service applications. As such it as wide applicability, has the ability to support all current and future XML processing APIs and at the same time is not tied to any particular technology. It is optimized for latency, bandwidth and for reducing CPU consumption on the host system. It is a natural fit with network protocol and crypto acceleration technologies. We expect such general purpose XML Offload Engines to become widely available in the near future.
XHTML rendition made possible by SchemaSoft's Document Interpreter™ technology.