VTD-XML is a new, open-source, non-validating, non-extractive XML processing API written in Java. Unlike current XML processing technologies, VTD-XML is designed to be random-access capable without incurring excessive resource overhead. One key optimization of VTD-XML is non-extractive tokenization: internally, VTD-XML retains the XML message in memory intact and undecoded, and represents tokens exclusively by their starting offset and length.
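To make the idea of non-extractive tokenization concrete, here is a minimal sketch in plain Java. It does not use the VTD-XML API; the class name TokenViewSketch and its methods are hypothetical, and the comparison helper assumes ASCII-only text. A token is just an (offset, length) pair pointing into the original, undecoded byte array, and a String is created only when the caller asks for one.

```java
// Illustrative sketch only: NOT the VTD-XML API. All names here are hypothetical.
final class TokenViewSketch {

    // Compares the token at [offset, offset + length) against expected text
    // without creating a String. Assumes 'expected' contains only ASCII characters.
    static boolean matchesAscii(byte[] doc, int offset, int length, String expected) {
        if (expected.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (doc[offset + i] != (byte) expected.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Decodes the token on demand; this is the only place a String object is allocated.
    static String decode(byte[] doc, int offset, int length) {
        return new String(doc, offset, length, java.nio.charset.StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] doc = "<a>hi</a>".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        // The text node "hi" is described by offset 3 and length 2 in the original bytes.
        System.out.println(matchesAscii(doc, 3, 2, "hi"));
        System.out.println(decode(doc, 1, 1));
    }
}
```

Because the document bytes are never copied or modified, many tokens can be inspected or compared without allocating any per-token objects.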
VTD-XML's tokenization is based on the Virtual Token Descriptor (VTD) core binary encoding specification. A VTD record is a 64-bit integer that encodes the length, starting offset, type, and nesting depth of a token in the XML document.
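The following sketch shows how four such fields can be packed into, and read back from, a single 64-bit long. The field widths and bit positions used here are assumptions chosen for illustration; the VTD specification defines its own layout, and the class VtdRecordSketch is hypothetical rather than part of VTD-XML.

```java
// Illustrative sketch only. The bit layout below is an ASSUMPTION for
// demonstration and is not taken from the VTD specification.
final class VtdRecordSketch {
    // Assumed field widths: 32 + 20 + 4 + 8 = 64 bits.
    static final int OFFSET_BITS = 32;
    static final int LENGTH_BITS = 20;
    static final int TYPE_BITS = 4;
    static final int DEPTH_BITS = 8;

    static final int LENGTH_SHIFT = OFFSET_BITS;
    static final int TYPE_SHIFT = LENGTH_SHIFT + LENGTH_BITS;
    static final int DEPTH_SHIFT = TYPE_SHIFT + TYPE_BITS;

    // Returns a mask with the lowest 'bits' bits set.
    static long mask(int bits) {
        return (1L << bits) - 1;
    }

    private static void requireFits(long value, int bits, String name) {
        if (value < 0 || value > mask(bits)) {
            throw new IllegalArgumentException(name + " out of range: " + value);
        }
    }

    // Packs the four fields into one 64-bit record.
    static long pack(long offset, int length, int type, int depth) {
        requireFits(offset, OFFSET_BITS, "offset");
        requireFits(length, LENGTH_BITS, "length");
        requireFits(type, TYPE_BITS, "type");
        requireFits(depth, DEPTH_BITS, "depth");
        return offset
                | (((long) length) << LENGTH_SHIFT)
                | (((long) type) << TYPE_SHIFT)
                | (((long) depth) << DEPTH_SHIFT);
    }

    static long offset(long record) {
        return record & mask(OFFSET_BITS);
    }

    static int length(long record) {
        return (int) ((record >>> LENGTH_SHIFT) & mask(LENGTH_BITS));
    }

    static int type(long record) {
        return (int) ((record >>> TYPE_SHIFT) & mask(TYPE_BITS));
    }

    static int depth(long record) {
        return (int) ((record >>> DEPTH_SHIFT) & mask(DEPTH_BITS));
    }

    public static void main(String[] args) {
        long record = pack(1024L, 7, 3, 2);
        System.out.println(offset(record) + " " + length(record) + " "
                + type(record) + " " + depth(record));
    }
}
```

Whatever the exact layout, the point is that a record is a primitive value rather than an object, so reading a field costs a shift and a mask.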
Because VTD records are constant in length, one can bulk-allocate memory buffers to store them, and thereby avoid creating the large numbers of string and node objects typically associated with other XML processing technologies. As a result, VTD-XML reduces both memory usage and object creation cost, which leads to significantly higher processing performance. On a 1.5 GHz Athlon machine, VTD-XML delivers random access at 25 to 35 MB/sec, outperforming most SAX parsers with null content handlers. An in-memory VTD-XML document typically consumes only 1.3 to 1.5 times as much memory as the XML document itself.
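The bulk-allocation argument can be sketched as a growable buffer backed by a primitive long array. This is a simplified illustration, not VTD-XML's own buffer implementation; the class name LongBufferSketch and the capacity-doubling growth policy are assumptions. Storing N records costs one array (plus occasional regrowth) rather than N separate objects.

```java
// Illustrative sketch only: a growable buffer of fixed-size 64-bit records.
// The class name and the doubling growth policy are assumptions.
final class LongBufferSketch {
    private long[] records;
    private int size;

    LongBufferSketch(int initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        // One bulk allocation covers many records; no per-record objects are created.
        records = new long[initialCapacity];
    }

    // Appends a record, doubling the backing array when it is full.
    void append(long record) {
        if (size == records.length) {
            long[] bigger = new long[records.length * 2];
            System.arraycopy(records, 0, bigger, 0, size);
            records = bigger;
        }
        records[size++] = record;
    }

    // Random access by index is a plain array read.
    long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return records[index];
    }

    int size() {
        return size;
    }

    int capacity() {
        return records.length;
    }

    public static void main(String[] args) {
        LongBufferSketch buffer = new LongBufferSketch(2);
        for (int i = 0; i < 5; i++) {
            buffer.append(i * 10L);
        }
        System.out.println(buffer.size() + " records, capacity " + buffer.capacity());
    }
}
```

Since every record has the same size, the buffer needs no per-entry headers or pointers, which is consistent with the modest memory overhead described above.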
For software developers, VTD-XML provides several benefits. When you start a project involving XML, you have to pick a processing model. You've probably been told that DOM is slow and consumes too much memory, particularly for large documents. But you may also find SAX difficult to use, especially for XML documents with complex structures. VTD-XML provides a new option that doesn't force you to trade processing performance for usability. In fact, VTD-XML's random-access capability is critical to delivering the best possible performance. Although SAX is fast, its forward-only nature means that raw SAX performance is usually not indicative of real-world performance. In some situations, you end up doing lots of buffering to extract the data you need; in others, you have to parse the same document with SAX multiple times. Either way, SAX programming usually results in ugly, hard-to-maintain code, while its performance benefit over DOM isn't always significant. With VTD-XML, you should be able to achieve ease of use and high performance at the same time, and its performance benefit over DOM is substantial.