High performance XML processing using VTD-XML

Keywords: non-extractive parsing, processing mode, incremental update, ASIC implementation

Jimmy Zhang
Founder
XimpleWare
Los Altos
California
United States of America
jzhang@ximpleware.com

Biography

Jimmy Zhang is the founder of XimpleWare, a provider of high-performance XML processing solutions. Prior to founding XimpleWare, he worked for several Silicon Valley technology companies in fields ranging from EDA (electronic design automation) to VoIP (voice over IP). He holds a BS in EECS and an MSEE from UC Berkeley.


Abstract


The first step of most existing XML parsing algorithms usually creates many string objects by extracting tokens from the input XML document. We describe a "non-extractive" way of tokenizing XML without taking the document apart. Using a binary encoding specification called Virtual Token Descriptor (VTD), we represent tokens exclusively by their starting offset and length. A VTD record is a 64-bit integer that encodes the starting offset, length, type and nesting depth of a token in an XML document. A processing model based on VTD also requires that the original XML document be kept intact in memory. Because VTD records can be stored in chunk-based buffers, one can potentially achieve both high performance and efficient memory usage when processing XML. Also, because VTD is based entirely on offset and length, it is inherently persistent. Our internal benchmark indicates that VTD ... As XML becomes more ubiquitous, the new processing model can potentially offer a worthy alternative to DOM and SAX.
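To make the idea of a non-extractive token concrete, the following is a minimal sketch in Python of packing a token's starting offset, length, nesting depth and type into one 64-bit integer. The field widths, the field order, the token type codes and the helper names (`pack`, `unpack`, `token_bytes`) are assumptions chosen for illustration; they are not taken from the VTD specification or from the VTD-XML API.

```python
# Illustrative field widths (assumed for this sketch, not the VTD spec).
OFFSET_BITS = 30   # starting offset of the token in the document
LENGTH_BITS = 20   # token length in bytes
DEPTH_BITS = 8     # nesting depth of the token
TYPE_BITS = 4      # token type code

# Hypothetical token type codes.
TOKEN_START_TAG = 0
TOKEN_ATTR_NAME = 1
TOKEN_ATTR_VAL = 2
TOKEN_TEXT = 3

LENGTH_SHIFT = OFFSET_BITS
DEPTH_SHIFT = LENGTH_SHIFT + LENGTH_BITS
TYPE_SHIFT = DEPTH_SHIFT + DEPTH_BITS


def pack(offset, length, depth, token_type):
    """Pack the four fields into a single integer that fits in 64 bits."""
    fields = (
        (offset, OFFSET_BITS),
        (length, LENGTH_BITS),
        (depth, DEPTH_BITS),
        (token_type, TYPE_BITS),
    )
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
    return (
        (token_type << TYPE_SHIFT)
        | (depth << DEPTH_SHIFT)
        | (length << LENGTH_SHIFT)
        | offset
    )


def unpack(record):
    """Return (offset, length, depth, token_type) from a packed record."""
    offset = record & ((1 << OFFSET_BITS) - 1)
    length = (record >> LENGTH_SHIFT) & ((1 << LENGTH_BITS) - 1)
    depth = (record >> DEPTH_SHIFT) & ((1 << DEPTH_BITS) - 1)
    token_type = (record >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)
    return offset, length, depth, token_type


def token_bytes(doc, record):
    """Look up a token's bytes in the intact document; nothing is copied
    out at tokenization time."""
    offset, length, _, _ = unpack(record)
    return doc[offset:offset + length]


# The document stays intact in memory; the records only point into it.
doc = b'<a><b id="7">hi</b></a>'
records = [
    pack(1, 1, 0, TOKEN_START_TAG),   # a
    pack(4, 1, 1, TOKEN_START_TAG),   # b
    pack(6, 2, 1, TOKEN_ATTR_NAME),   # id
    pack(10, 1, 1, TOKEN_ATTR_VAL),   # 7
    pack(13, 2, 1, TOKEN_TEXT),       # hi
]
```

Because each record is a fixed-size integer holding only positions, a list of records can be written out and reloaded alongside the unchanged document, which is the sense in which an offset-and-length representation is persistent.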


Table of Contents


1. Waitlisted Paper

1. Waitlisted Paper

Since this talk was waitlisted, no paper was prepared for the proceedings.
