XML Europe 2004

High-Performance XML Data Retrieval

Abstract

As XML becomes the industry standard for exchanging business data, retrieving selected data from large XML documents remains a continuing performance challenge. XPath, the XML navigation language, lies at the foundation of standards-based data retrieval solutions. However, XPath-based solutions such as XSLT have so far required building and traversing a Document Object Model (DOM) of the input XML. In practice, a DOM can require up to 10 times the memory of the original document, and its traversal APIs are difficult to optimize. Other XML processing methods such as SAX and StAX are event-based and require less memory, but they lack the XPath support needed for retrieving XML data.

This paper examines typical requirements for data retrieval from XML documents, then provides an in-depth analysis of existing XML data retrieval strategies and why they are not optimal. It then discusses the design of a stream-based framework that provides XPath-based XML data retrieval, analyzing how a lightweight data extraction engine can efficiently match XPaths while streaming through the XML and how a publish-subscribe framework efficiently disseminates the XML data to the receiving end.

Through real-world examples, the paper discusses how this streaming XML data retrieval technology can efficiently disseminate XML data for content management applications. It also presents how the framework can be used as a high-performance data router for Web services, or integrated within XSLT and XQuery engines as a data acquisition processor.
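The idea of matching paths during an event-based parse and publishing the matches to subscribers can be illustrated with a small sketch. It is not the framework this paper describes: it assumes only simple absolute child paths such as /orders/order/id (no predicates, wildcards, attributes, or namespaces), it delivers only the direct text of a matched element, and the names PathDispatcher, subscribe, and extract are hypothetical, chosen for illustration. It uses Python's standard xml.sax module as the event source.

```python
import xml.sax
from collections import defaultdict


class PathDispatcher(xml.sax.ContentHandler):
    """Illustrative only: matches simple absolute paths during a SAX parse
    and publishes each matched element's direct text to subscribers."""

    def __init__(self):
        super().__init__()
        self._subscribers = defaultdict(list)  # path -> list of callbacks
        self._stack = []    # names of the currently open elements
        self._buffers = []  # per open element: text chunks, or None if unwatched

    def subscribe(self, path, callback):
        """Register callback(path, text) for a path such as /orders/order/id."""
        self._subscribers[path].append(callback)

    def _current_path(self):
        return "/" + "/".join(self._stack)

    def startElement(self, name, attrs):
        self._stack.append(name)
        # Buffer text only for subscribed paths, so memory use depends on
        # the matched content rather than on the size of the document.
        watched = self._current_path() in self._subscribers
        self._buffers.append([] if watched else None)

    def characters(self, content):
        if self._buffers and self._buffers[-1] is not None:
            self._buffers[-1].append(content)

    def endElement(self, name):
        path = self._current_path()
        chunks = self._buffers.pop()
        self._stack.pop()
        if chunks is not None:
            text = "".join(chunks).strip()
            for callback in self._subscribers[path]:
                callback(path, text)


def extract(xml_text, paths):
    """Parse xml_text once and return (path, text) pairs in document order."""
    results = []
    dispatcher = PathDispatcher()
    for path in paths:
        dispatcher.subscribe(path, lambda p, t: results.append((p, t)))
    xml.sax.parseString(xml_text.encode("utf-8"), dispatcher)
    return results
```

Calling extract on a document with two order elements and the single path /orders/order/id yields one pair per id element, in document order, without ever building a tree. The one design choice worth noting is that only the stack of open element names is kept as state, which is what lets the memory footprint stay independent of document size for this restricted path syntax; full XPath requires considerably more machinery.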

Keywords

DOM, SAX, StAX, XPath, XQuery, XSLT.