For the past several years, XPath has been steadily gaining popularity as an effective tool for developing XML applications. XPath was originally viewed as an adjunct to the W3C's XSLT and XPointer specifications, but developers found its simplicity appealing. With XPath, instead of manually navigating the hierarchical data structure, you can use compact, "file-system"-like expressions to address any node or set of nodes in XML documents. However, most existing XPath engines work with DOM trees or similar object models, which are slow to build and modify, and which consume excessive amounts of memory. This presents a dilemma for anyone looking to take advantage of XPath for SOA applications that are either performance sensitive or routinely deal with large XML documents.
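To make the "file-system"-like addressing concrete, here is a minimal sketch that uses the JDK's built-in `javax.xml.xpath` package over a DOM tree, which is the conventional approach described above rather than VTD-XML. The class name `XPathOverDom`, the helper `select`, and the sample document are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathOverDom {
    // Hypothetical helper: returns the text content of every node the
    // expression selects, in document order.
    static String[] select(String xml, String expression) {
        try {
            // The whole DOM tree is built before any XPath can be evaluated;
            // this is the step that costs time and memory on large documents.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, doc, XPathConstants.NODESET);
            String[] result = new String[nodes.getLength()];
            for (int i = 0; i < nodes.getLength(); i++) {
                result[i] = nodes.item(i).getTextContent();
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse or evaluate", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<orders><order id=\"1\"><item>pen</item></order>"
                   + "<order id=\"2\"><item>ink</item></order></orders>";
        // One path expression replaces two nested child-walking loops.
        for (String text : select(xml, "/orders/order/item")) {
            System.out.println(text); // prints "pen", then "ink"
        }
    }
}
```

The single expression `/orders/order/item` stands in for the manual traversal code, but note that the full object tree still has to exist before the expression can run.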
My last two articles with DevX (see the Related Resources) introduced VTD-XML as a next-generation XML processing model that goes beyond DOM and SAX in performance, memory usage, and ease of use. VTD-XML is simultaneously:
- Memory-efficient: VTD-XML typically requires only 1.3 to 1.5 times the size of the XML document itself, which makes it far more memory-efficient than DOM, and it works very well with large XML documents.
- High-performance: VTD-XML typically outperforms DOM parsers by 5 to 10 times, and it typically outperforms SAX parsers with null content handlers by about 100 percent.
- Easy to use: Applications written in VTD-XML are more compact and readable than those written in DOM or SAX.
What is VTD-XML's not-so-secret sauce? Unlike traditional XML parsers, which create string-based tokens as the first step of parsing, VTD-XML uses linear buffers internally to store 64-bit integers containing the starting offsets and lengths of XML tokens, while keeping the un-decoded XML document intact in memory. All of VTD-XML's benefits are the result, one way or another, of this "non-extractive" tokenization. At the API level, VTD-XML consists of the following core classes:
- VTDGen encapsulates the main parsing, index writing, and index loading functions.
- VTDNav exports a cursor-based API that contains the methods that navigate the XML hierarchy.
- AutoPilot supports document-order element traversal, similar to Xerces' NodeIterator.
VTD-XML's XPath Implementation
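Before turning to the XPath API itself, it may help to make the "non-extractive" tokenization described earlier concrete. The following is a minimal sketch, not VTD-XML's actual record format: it assumes a simplified layout in which the high 32 bits of a `long` hold a token's starting offset and the low 32 bits hold its length. The class `NonExtractiveTokens` and its methods are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

class NonExtractiveTokens {
    // Assumed layout for illustration only: offset in the high 32 bits,
    // length in the low 32 bits. VTD-XML's real records are laid out differently.
    static long pack(int offset, int length) {
        return ((long) offset << 32) | (length & 0xFFFFFFFFL);
    }

    static int offsetOf(long record) {
        return (int) (record >>> 32);
    }

    static int lengthOf(long record) {
        return (int) record;
    }

    // A string is created only when the caller asks for one; until then the
    // token exists purely as two numbers pointing into the original bytes.
    static String text(byte[] doc, long record) {
        return new String(doc, offsetOf(record), lengthOf(record),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The document stays in memory, untouched.
        byte[] doc = "<order><item>pen</item></order>"
                .getBytes(StandardCharsets.UTF_8);
        // A linear buffer of fixed-size records instead of a tree of objects.
        long[] records = {
            pack(1, 5),  // element name "order"
            pack(8, 4),  // element name "item"
            pack(13, 3)  // character data "pen"
        };
        for (long record : records) {
            System.out.println(text(doc, record)); // order, item, pen
        }
    }
}
```

Because each token costs a single `long` in a flat array rather than a heap-allocated string or node object, a design along these lines keeps both allocation overhead and memory use low.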
VTD-XML's XPath implementation, introduced with version 1.0, supports the full W3C XPath 1.0 spec. It builds upon VTDNav's concept of cursor-based navigation. The AutoPilot class exports all the XPath-related methods. As described in one of the earlier articles, to manually navigate an XML document's hierarchical structure, you obtain a VTDNav instance and repeatedly call the toElement() method to move the cursor to various parts of the document. Using XPath, you can either move the cursor manually or tell AutoPilot to move it to qualified nodes in the document automatically.
Table 1 shows AutoPilot's XPath-related methods.
Table 1. AutoPilot's XPath-related Methods: The table lists AutoPilot's XPath-related methods along with a short description of each.
||Binds a namespace prefix (used in the XPath expression) to a URL.
||Compiles an XPath expression into an internal representation.
||Moves the cursor to a qualified node in the node set.
|evalXPathToBoolean(...), evalXPathToNumber(...), evalXPathToString(...)
||These three methods evaluate an XPath expression to a Boolean, a double, and a string, respectively.
||Resets the internal state so the XPath can be re-used.
||Call this method to verify the correctness of the compiled expression.
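The three result types in the table come from the XPath 1.0 data model, so they can be illustrated independently of any one engine. The sketch below uses the JDK's `javax.xml.xpath` package as a stand-in, not AutoPilot itself; the class `XPathResultTypes`, the helper `eval`, and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathResultTypes {
    // Hypothetical helper: evaluates one expression against an XML string and
    // returns a Boolean, Double, or String depending on the requested type.
    static Object eval(String xml, String expression, QName type) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(xml)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(
                    "cannot evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cart><item price=\"2.5\"/><item price=\"4\"/></cart>";

        // Boolean result: is there more than one item?
        System.out.println(
                eval(xml, "count(/cart/item) > 1", XPathConstants.BOOLEAN)); // true

        // Numeric result (a double): total of all prices.
        System.out.println(
                eval(xml, "sum(/cart/item/@price)", XPathConstants.NUMBER)); // 6.5

        // String result: string-value of the first price attribute.
        System.out.println(
                eval(xml, "string(/cart/item[1]/@price)", XPathConstants.STRING)); // 2.5
    }
}
```

Expressions like these produce a single value rather than a node set, which is why they are evaluated through separate methods instead of by stepping through matching nodes.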
VTD-XML's XPath implementation also introduces two exception classes:
- XPathParseException: Thrown when there is a syntax error in the XPath expression.
- XPathEvalException: Thrown when an exception condition occurs during XPath evaluation.