High performance XML processing using VTD-XML
Track: Core Technologies, Web Services, Late Breaking News
Audience Level: Technical View
Time: unscheduled
Keywords: Non-Extractive Parsing, Processing Mode, Incremental Update, ASIC Implementation
Abstract:
Most existing XML parsing algorithms begin by extracting tokens from the input XML document, which usually creates many string objects. We describe a "non-extractive" way of tokenizing XML that does not take the document apart. Using a binary encoding specification called Virtual Token Descriptor (VTD), we represent tokens exclusively by their starting offset and length. A VTD record is a 64-bit integer that encodes the starting offset, length, type, and nesting depth of a token in an XML document. A processing model based on VTD also requires that the original XML document be kept intact in memory. Because VTD records can be stored in chunk-based buffers, one can potentially achieve both high performance and efficient memory usage when processing XML. Also, because VTD is based entirely on offset and length, it is inherently persistent. Our internal benchmark indicates that VTD [...] As XML becomes more ubiquitous, the new processing model can potentially offer a worthy alternative to DOM and SAX.
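The idea of packing a token's offset, length, type, and depth into one 64-bit integer can be illustrated with a minimal sketch. The bit widths, field order, token type codes, and function names below are assumptions chosen for illustration, not the layout defined by the VTD specification. The sketch only shows that a token can be described and later read back without copying it into a separate string object.

```python
# Hypothetical field layout for illustration only; NOT the real VTD layout.
OFFSET_BITS = 32  # starting byte offset of the token in the document
LENGTH_BITS = 20  # token length in bytes
DEPTH_BITS = 8    # nesting depth of the token
TYPE_BITS = 4     # token type code

LENGTH_SHIFT = OFFSET_BITS
DEPTH_SHIFT = LENGTH_SHIFT + LENGTH_BITS
TYPE_SHIFT = DEPTH_SHIFT + DEPTH_BITS  # 32 + 20 + 8 + 4 = 64 bits in total

# Hypothetical token type codes.
TYPE_START_TAG = 0
TYPE_TEXT = 1


def _mask(bits):
    """Return an integer with the lowest `bits` bits set."""
    return (1 << bits) - 1


def pack(offset, length, depth, token_type):
    """Pack the four fields into a single 64-bit record."""
    fields = (
        ("offset", offset, OFFSET_BITS),
        ("length", length, LENGTH_BITS),
        ("depth", depth, DEPTH_BITS),
        ("token_type", token_type, TYPE_BITS),
    )
    for name, value, bits in fields:
        if not 0 <= value <= _mask(bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (
        offset
        | (length << LENGTH_SHIFT)
        | (depth << DEPTH_SHIFT)
        | (token_type << TYPE_SHIFT)
    )


def unpack(record):
    """Return (offset, length, depth, token_type) from a packed record."""
    offset = record & _mask(OFFSET_BITS)
    length = (record >> LENGTH_SHIFT) & _mask(LENGTH_BITS)
    depth = (record >> DEPTH_SHIFT) & _mask(DEPTH_BITS)
    token_type = (record >> TYPE_SHIFT) & _mask(TYPE_BITS)
    return offset, length, depth, token_type


def token_view(document, record):
    """Return a zero-copy view of the token inside the intact document."""
    offset, length, _, _ = unpack(record)
    return memoryview(document)[offset:offset + length]


if __name__ == "__main__":
    doc = b"<root><item>42</item></root>"
    # Records for the element name "item" and its text content "42".
    records = [pack(7, 4, 1, TYPE_START_TAG), pack(12, 2, 1, TYPE_TEXT)]
    for rec in records:
        print(unpack(rec), bytes(token_view(doc, rec)))
```

Because each record is a fixed-size integer, a list of records could be stored in flat integer buffers, and the document bytes are only touched when a token's content is actually needed.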