Random Access XML Programming Assisted with XML Hardware

Keywords: random access, XML hardware, hardware, acceleration, high performance, random access XML, RAX, XPath, SAX, DOM, programming, benchmark, content-based routing, publish-subscribe

Michael Leventhal
Senior Director, XML Products
Tarari, Inc.
San Diego
California
United States of America
michael.leventhal@tarari.com

Biography

Michael Leventhal is Senior Director of XML Products at Tarari, Inc. and contributed to the design of Tarari's random access XML implementation.


Abstract


Tarari's Random Access XML Content Processor (RAX-CP) is a completely new approach to programming XML applications. Tarari's XML-in-silicon, chip-level technology gives a power boost that enables random access to data in the XML document as an alternative to DOM or SAX or any other approach to programming XML. RAX applications achieve 40 times the throughput compared to the fastest XML software available and have been clocked at performance as high as 200 times that of some of the most commonly used XML software components. At the same time, most of the work done in RAX applications in expressed declaratively, so the programming effort is significantly less than required for SAX and DOM approaches. Tarari's RAX implementation is being used in some of the most demanding XML applications in the world - in XML security appliances, on Wall Street, and in network devices. Many of these applications were unachievable before RAX due to the inherent performance limitations of XML processing.

Of course, a great deal of the extraordinary performance of the Tarari implementation of RAX comes from the hardware. However, from the programmer's point of view, the hardware-assist is transparent - RAX is simply an API, offered in C and Java. Part of the significance of RAX does lie in the fact that it is the first API designed to take advantage of the capabilities of purpose-built silicon for accelerated XML processing. While XML-in-silicon can accelerate DOM, SAX and XSLT, to some extent a much more powerful approach to XML processing becomes possible if one assumes certain capabilities that a chip can provide. RAX is not, on the other hand, necessarily limited to Tarari's implementation or even a hardware-based implementation. Random access to XML is described by the W3C's XML Binary Characterization Working Group as one of the properties of high-performance XML. RAX may be implemented in software, possibly in conjunction with a more optimized XML format.

At the heart of RAX is the use of XPaths to index into the document and/or to make assertions against it. The XPath statements are declared before the document is processed and grouped together. The XPath statements are evaluated at the same time the document is parsed and the results of the XPaths may be used as indices to have direct access to the data of interest in the document. Some applications may need to operate only on the business data delivered by the XPaths and then move directly to the business logic without any further XML programming. In applications such as content-based routing, the XPaths act as assertions and the XPath results are sufficient to generate a result without any manipulation of the document at all. For applications where traversal of the document remains necessary, the XPaths serve as entry points to areas of the document which are of interest, effectively dealing with the document as a set of small fragments. RAX provides operations for traversing the document, for transformation, and for streaming out document fragments.

The Tarari RAX implementation uses the XML chip to process the XPaths. Since processing in silicon is very fast and since the processing is offloaded from the host CPU to the XML chip, this processing effectively takes place in near-zero CPU time. Since many applications can do almost all their work in the XPaths themselves, this technology can reduce some applications which were formerly unimplementable due to performance limitations down to the level of network noise.

The presenter contributed to the design of RAX-CP. The design and use of RAX will be explored in the presentation.


Table of Contents


1. Current Approaches to XML Processing Do Not Meet Demand
2. ...And it gets harder with SOAP
3. ...And it gets beyond hard with Web Services
4. XML is a Network Technology
5. Ubiquitous XML
6. HOW TO MAKE XML REALLY REALLY FAST
7. Use Amdahl's Law
8. A Solution We Already Have XPath!
     8.1 Three key processing operations describable with XPaths
9. Simple XPaths Yield Highest Application Performance
     9.1 Complex XPaths cannot be highly optimized.
     9.2 Simple XPaths can be very highly optimized.
10. Simple XPaths and Optimization
     10.1 What is a simple XPath?
11. Building Complex Processing on Simple XPath
     11.1 Compilation
     11.2 Post-Processing
12. Random Access XML Programming (RAX)
13. Assertion-based Problems and RAX
     13.1 Examples of Assertion-Based Problems Solved by RAX Approach
14. Comparison of XML Programming Methods
15. Ultimate Performance
16. Tarari Content Processing Platform
     16.1 Content Processing Engines
     16.2 Content Processing Controller
17. RAX-CP - Random Access XML Content Processor
18. A Look at the Random Access in RAX
     18.1 Best Performance Case for Data Access: Random Access
     18.2 Flaws?
19. Random Access XML in an Ideal Universe
20. Ultra-Efficient Processing with Hardware "Magic" Assumed
21. RAX-CP Performance
22. RAX Assertion Application Example: Router
23. Traditional XML Application Turned Into Assertion Problem Example: Rate Table Billing
24. RAX Assertion-based Solution: Rate Table Billing
25. Random Access XML: Key to Efficient PubSub
26. Publish/Subscribe Concepts
27. Tarari PubSub-CP
28. Homeland Security Publish/Subscribe Usage Models
29. RAX-CP Deployment
30. Tarari XML Solutions in the Enterprise

1. Current Approaches to XML Processing Do Not Meet Demand

It's well known and established that XML suffers from performance issues. In November 2003 Matthias Nicola and Jasmi John of IBM researched current approaches to processing XML for database processing. Their conclusion:

XML parsing is generally known to have poor performance characteristics relative to transactional database processing. Yet, its potentially fatal impact on overall database performance is being underestimated the desired response times and transaction rates over XML data can not be achieved without major improvements in XML parsing technology.

— Matthias Nicola, Jasmi John: "XML Parsing: A Threat to Database Performance", 12th Intl. Conference on Information and Knowledge Management, CIKM'2003, New Orleans, November 2003.

1.png

Figure 1: Performance is a challenge with only basic XML processing

2. ...And it gets harder with SOAP

Many researchers have compared SOAP performance to other message encodings. The conclusions are universally consistent:

SOAP is also orders of magnitude slower than JavaRMI and CORBA

Davis and Parashar, Latency Performance of SOAP Implementations, IEEE Cluster Computing and the Grid 2002

Serialization and deserialization speeds for SOAP-based (Apache SOAP and nanoSOAP) implementations are approximately 100 times slower and their throughputs are also 100 times lower.

Govindaraju, et. al. , Requirements for and Evaluation of RMI Protocols for Scientific Computing, IEEE, 2000

When compared to FIX, SOAP again exhibits poorer performance, SOAP messages are 3.5-4.5 larger than FIX, latency is 2-3 times worse, and encoding/decoding costs are increased by up to nearly 9 times

Kohlhoff and Steele , Evaluating SOAP for High Performance Business Applications: Real-Time Trading Systems

2.png

Figure 2: ...And it gets harder with SOAP

3. ...And it gets beyond hard with Web Services

This quote below from Don MacVittie of Network Computing goes to show the extent of the problem:

I'd thought for sure customers had been demanding and getting up to 3,000 TPS from Web services, just as they get up to 3,000 connections per second from Web servers. But in fact, the only way anyone could get millions of hits a day on a Web service is by paying millions of dollars for the hardware to host the service and the staff to support the hardware

Don MacVittie, Network Computing, 5.27.2004

3.png

Figure 3: ...And it gets beyond hard with Web Services

4. XML is a Network Technology

5. Ubiquitous XML

Use Case Where Current XML Methods Fail

Impacts Future of This Computing Domain

Electronic Documents

Documents

XML Documents in Persistent Store

Databases

Web Services for Small Devices

Wireless Networks

XMPP Instant Messaging Compression

Instant Messages

Metadata in Broadcast Systems

Television and Radio

XML Content-based Routing and Publish-Subscribe

Network Infrastructure

Table 1

6. HOW TO MAKE XML REALLY REALLY FAST

If you look at the problem carefully you come to a simple conclusion: It's Hard. Amdahl's Law tells us that to get a high acceleration value you have to accelerate a very high proportion of the overall work:

4.png

Figure 4: Amdahl's Law

Now assume that you are able to accelerate 10% of the application by an extremely high value - say 50X. You can see in Example 1 below that this only delivers a 1.11X overall performance boost - pretty small.

Example 1: Accelerate 10% of application by 50x

5.png

Figure 5: Example 1

Even accelerating a gigantic 90% of the application by an equally gigantic 50x you get an overall acceleration of 8.47 - 847%. Pretty good - but given that we need this kind of acceleration or more to meet the demand of XML processing in the network are we going to be able to acceleraton 90% or DOM or SAX by 50X?

Example 2: Accelerate 90% of application by 50x

6.png

Figure 6: Example 2

7. Use Amdahl's Law

Using the lesson of Amdahl's Law we recognize that we need to rethink the XML programming model so that as much work as possible is united in one core unit of logic, then optimize that core unit of logic.

XML programming models such as DOM and SAX are very difficult to optimize because:

8. A Solution We Already Have XPath!

It turns out that XPath is a very good fit for expressing the greatest part of the processing done on XML documents for a very large set of applications.

8.1 Three key processing operations describable with XPaths

Processing Operation

Description

Classify

XPaths can act as assertions against content. Assertion testing replaces conditional logic to precisely classify the message content.

Extract

Extraction of known data for binding to business logic is purpose of many, if not most, XML programs.

Locate

Used for positioning updates, transformation and for handling of fragments.

Table 2

9. Simple XPaths Yield Highest Application Performance

9.1 Complex XPaths cannot be highly optimized.

Additional complexity may mean that 100% of the processing work can be done in XPaths, but at a performance level which renders this work useless.

9.2 Simple XPaths can be very highly optimized.

They might, for some applications, express only 90% of the processing work, but if that work can be accelerated 50X the application will be 8 times faster than the one using complex XPaths (including doing the remaining 10% as post-processing).

10. Simple XPaths and Optimization

10.1 What is a simple XPath?

Answer: Whatever can be optimized by 50X

11. Building Complex Processing on Simple XPath

11.1 Compilation

Compilation can be used to turn some complex features into equivalent simple ones. As long as the XPath engine implements only the core simple XPath engine, this is an acceptable strategy to obtain more feature coverage - although ultimately a dangerous one.

11.2 Post-Processing

Additional processing can be done as post-processing following XPath evaluation. Moving complex processing to post-processing is the best strategy because simple XPath processing can be used to filter down results, reducing the number of operations involving complex processing. This has been confirmed in published research on XPath optimization.

12. Random Access XML Programming (RAX)

Random Access XML programming is more than a specific API or tool for processing XML - like DOM and SAX it involves a model for XML data representation and a way of thinking about the problem of processing XML. But RAX is based on a lot of algorithmic research in computer science on how to obtain optimal performance. Like DOM and SAX, RAX requires that you think about XML processing in a certain way. With RAX it is important to divide the problem up into static and dynamic parts, into a compiliation stage and a postprocessing stage. RAX enables us to filter the problem down and reduce the number of messages to which we apply complex processing.

There are six steps in using RAX to process XML:

Design

Design processing in simple XPaths

Compile

Compile XPath expressions as a group

Evaluate

Evaluate XPaths produce random access table

Classify

Classify document

Extract

Extract data for business logic or binding to application

Locate

Locate data using transformation or other post-processing

Table 3

13. Assertion-based Problems and RAX

Now let's look at some more examples and how RAX addresses them:

13.1 Examples of Assertion-Based Problems Solved by RAX Approach

14. Comparison of XML Programming Methods

Table 4 shows the comparison of RAX and four other programming methods. RAX is designed to be the fastest by far and achieves this goal. Compared to other methods it uses a low amount of memory, just like SAX. Though RAX is a mix of Declarative and Procedural programming it's more oriented to Declarative, and we have labeled it so. In terms of flexibility we rate RAX at the Medium level, because it is strong in assertions but it takes relatively more work to think in a RAX-like way. When we review the strengths of each programming method we see that each has strengths in different areas. RAX provides extraordinary strength for Assertion-based problems, whereas DOM is a good choice for update-intensive tasks. SAX is the choice for streaming applications and Beans is best for schema-based solutions. Lastly, XSLT supports transforms par excellence.

RAX

DOM

SAX

Beans

XSLT

Speed

Very Fast

Slow

Fast

Very Slow

Very Slow

Memory

Low

High

Low

High

Very High

Paradigm

Declarative

Procedural

Procedural

Declarative

Declarative

Flexibility

Medium

High

Low

High

High

Problem-Specific Strengths

Assertion-based

Update

Stream

Schema-based

Transforms

Table 4

15. Ultimate Performance

Having isolated the bulk of XML processing to core simple XPath evaluation, we can now apply the ultimate optimization: A purpose-built micro-parallel processing engine for simultaneous XPath evaluation.

Tarari has done this. Tarari's Random Access XML Content Processor RAX-CP is the combination of our purpose-build XML chip and an API to support RAX programming.

16. Tarari Content Processing Platform

16.1 Content Processing Engines

16.2 Content Processing Controller

7.png

Figure 7: Tarari Content Processing Platform

17. RAX-CP - Random Access XML Content Processor

Tarari recently produced an architecture and product, Random Access XML (RAX), with a surprising new approach to XML acceleration.

The Standish Group

8.png

Figure 8: This is how Tarari's implementation of RAX works.

In the Tarari implementation of RAX, multiple XML documents are processed in parallel. A compiled set of XPaths is evaluated against each file. A parsed XML file in an internal representation and a Random Access Table are created. Tarari uses what we call "Simultaneous XPaths" - basically really fast XPath processing - to produce this result. With just a small amount of post-processing, data can be isolated and delivered easily.

Not DOM! - Parallel XML parsing with simultaneous XPath processing

Not SAX! - Random Access to XML data with near-zero host CPU cost

18. A Look at the Random Access in RAX

18.1 Best Performance Case for Data Access: Random Access

18.2 Flaws?

19. Random Access XML in an Ideal Universe

9.png

Figure 9: Random Access XML in an Ideal Universe

Taccess.gif for conventional parsing is much higher than (Tlu.gif + Tseek.gif) for random access, but the time to create the random access table - Tcat.gif - is very high. Tcat.gif is, however, amortized over a sufficiently large number of seeks.

We assert that:

  1. Only hardware acceleration can reduce Tcat.gif to a very low value
  2. Only hardware can beat conventional parsing for low values of Nseeks.gif
  3. Therefore, only hardware can delivery random access XML for transactional uses of XML

20. Ultra-Efficient Processing with Hardware "Magic" Assumed

21. RAX-CP Performance

10.png

Figure 10: Message throughput of software versus the Tarari RAX Content Processor

Figure 10 shows the results when the rubber hits the road. Tarari created a very realistic, content-based routing application involving inspection, classification and routing which maps very well to RAX methodology. It is nearly a 100% assertion problem. In this application, we input 3,500 messages per second (actually the system can handle more, but this was the input rate) and the Tarari hardware handled all of them. Compared with the software, which only handled 60-80 messages per second, the Tarari hardware handled over 40 times the throughput. The software used for these tests was Saxon, which is actually regarded as being pretty fast. We also benchmarked Xalan, but it only runs at 9 messages per second and so we didn't want to appear to be biased by comparing Xalan's low results against the Tarari hardware. We also did not intentionally avoid any of the “reputed to be faster than X” conventional XPath processors - we tried all that we could get our hands on. Supposing it is true that “reputed to be faster” is 20 times faster than Xalan they would be running at all of 180 messages per second.

22. RAX Assertion Application Example: Router

11.png

Figure 11: RAX Assertion Application Example: Router

In figure 11 we show the block diagram of the system we used in the benchmarks. First of all, the messages are classified and the truth table is used to correlate the files into different routing categories. There is a unique column for each truth-value. Then we get the results from the random access table and process the node sets. A lot of the rules are just binary - if they match, 1 is returned; if they don't match, 0 is returned. Some items need additional, more-complex processing, but 95% are processed by RAX directly and only 5% need additional processing; thus, we accelerate close to 100% of the system and gain the huge performance boost.

23. Traditional XML Application Turned Into Assertion Problem Example: Rate Table Billing

12.png

Figure 12: Traditional XML Application Turned into Assertion: Problem Example: Rate Table Billing

This example is based on Voice over IP (VoIP). It depicts the conventional way in which the problem was being solved, and it involves three different DOM processes. First, a traditional rate table was set up with all the permutations. Next, the rate is extracted using some sort of XQuery, and finally the result is stored in a database.

24. RAX Assertion-based Solution: Rate Table Billing

13.png

Figure 13: RAX Assertion-based Solution: Rate Table Billing

In Figure 13, the example from Figure 12 has been re-written to use a RAX Assertion-based solution. The Rate table is transformed into a series of XPath statements - one statement for each possible unique combination of rate conditions. Thus, each rate ticket will match one and only one XPath statement. The index number of the matching XPath statement is used to directly access the corresponding rate in an array. Thus, a seemingly conventional database look-up problem has been turned into a 100% pure assertion problem. Surprising?! This is a good example of how the tools we use and their limitations limit they way we look at problems. The key here is that the hardware can handle efficiently a very large number of XPaths - as many XPaths as there are rate conditions.

25. Random Access XML: Key to Efficient PubSub

14.png

Figure 14: Random Access XML: Key to Efficient PubSub

Figure 14 depicts another problem which shows the amazing amount of isomorphism the RAX approach has to Publish/Subscribe (PubSub). Publish-subscribe is used heavily in the financial sector and in messaging, information architectures, and distributed computing. Our basic solution to PubSub is nearly identical to routing. Instead of routing categories, we define subscribers and our post-processing facilities are more elaborate. Tarari has a PubSub product built on top of RAX-CP which exposes RAX functionality through the PubSub paradigm.

26. Publish/Subscribe Concepts

15.png

Figure 15: Publish/Subscribe Concepts

The basis of a good Publish/Subscribe system depends on:

And, subscribers can have multiple Subscriptions. Indeed, several models of operation are possible: Publisher-centric, Subscriber-centric, and Broker-centric.

27. Tarari PubSub-CP

16.png

Figure 16: Tarari PubSub-CP

28. Homeland Security Publish/Subscribe Usage Models

Many different solutions can take advantage of a hardware-accelerated XML solution. Figure 17 depicts some of the Homeland Security Publish/Subscribe usage models.

17.png

Figure 17: Homeland Security Publish/Subscribe Usage Models

29. RAX-CP Deployment

Wirespeed acceleration of XML can be integrated into your application through simple C and Java APIs. Use of the acceleration hardware is completely transparent.

18.png

Figure 18: RAX-CP Deployment

30. Tarari XML Solutions in the Enterprise

19.png

Figure 19: Tarari XML Solutions in the Enterprise

Figure 19 shows the many places in which Tarari-accelerated hardware can help reduce the XML bottleneck.

XHTML rendition made possible by SchemaSoft's Document Interpreter™ technology.