Improve XPath Efficiency with VTD-XML

Even though XML and XPath have been around for several years, there's still room for performance improvements—and VTD-XML and its XPath implementation provide them.


For the past several years, XPath has been steadily gaining popularity as an effective tool for developing XML applications. XPath was originally viewed as an adjunct to the W3C's XSLT and XPointer specifications, but developers found its simplicity appealing. With XPath, instead of manually navigating the hierarchical data structure, you can use compact, "file-system"-like expressions to address any node or set of nodes in XML documents. However, most existing XPath engines work with DOM trees or similar object models, which are slow to build and modify and consume excessive amounts of memory. This presents a dilemma for anyone looking to take advantage of XPath for SOA applications that are either performance sensitive or routinely deal with large XML documents.

My last two articles with DevX (see the Related Resources) introduced VTD-XML as a next-generation XML processing model that goes beyond DOM and SAX in performance, memory usage, and ease of use. VTD-XML is simultaneously:

  • Memory-efficient: VTD-XML typically requires only 1.3 to 1.5 times the size of the XML document itself, which makes it far more memory-efficient than DOM and well suited to large XML documents.
  • High-performance: VTD-XML typically outperforms DOM parsers by 5 to 10 times, and it typically outperforms SAX parsers with null content handlers by about 100 percent.
  • Easy to use: Applications written in VTD-XML are more compact and readable than those written in DOM or SAX.
What is VTD-XML's not-so-secret sauce? Unlike traditional XML parsers, which create string-based tokens as the first step of parsing, VTD-XML uses linear buffers internally to store 64-bit integers containing the starting offsets and lengths of XML tokens, while keeping the undecoded XML document intact in memory. All of VTD-XML's benefits result, one way or another, from this "non-extractive" tokenization.

At the API level, VTD-XML consists of the following core classes:
  • VTDGen encapsulates the main parsing, index writing, and index loading functions.
  • VTDNav exports a cursor-based API that contains the methods that navigate the XML hierarchy.
  • AutoPilot supports document-order element traversal—similar to Xerces' NodeIterator.
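
The idea of "non-extractive" tokenization described above can be illustrated with a small, library-free sketch. It is not VTD-XML's code: the class name TokenRecordSketch and the even 32/32 bit split between offset and length are assumptions made for illustration, and the records are written by hand rather than produced by a parser. What it shows is that a token can live as a single 64-bit integer pointing into the original bytes, with no string created until the application asks for one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: one 64-bit record per token, holding an offset and a
// length into the original, untouched document bytes. The 32/32 bit split is
// an assumption for illustration and is not VTD-XML's actual record layout.
final class TokenRecordSketch {

    /** Packs a start offset (upper 32 bits) and a length (lower 32 bits). */
    static long pack(int offset, int length) {
        return ((long) offset << 32) | (length & 0xFFFFFFFFL);
    }

    /** Recovers the start offset from a packed record. */
    static int offsetOf(long record) {
        return (int) (record >>> 32);
    }

    /** Recovers the token length from a packed record. */
    static int lengthOf(long record) {
        return (int) record;
    }

    /** Decodes a token's text lazily, only when it is actually needed. */
    static String text(byte[] doc, long record) {
        return new String(doc, offsetOf(record), lengthOf(record),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] doc = "<a><b>hi</b></a>".getBytes(StandardCharsets.UTF_8);
        // Hand-written records standing in for what a parser would emit.
        long[] records = {
            pack(1, 1), // element name at offset 1
            pack(4, 1), // element name at offset 4
            pack(6, 2), // text content at offset 6
        };
        for (long r : records) {
            System.out.println(offsetOf(r) + "+" + lengthOf(r) + " -> " + text(doc, r));
        }
    }
}
```

Because the records are plain longs in a flat array, the per-token cost is fixed and no objects are allocated per node, which is consistent with the memory and speed claims made earlier.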
VTD-XML's XPath Implementation
VTD-XML's XPath implementation, introduced with version 1.0, supports the full W3C XPath 1.0 spec. It builds upon VTDNav's concept of cursor-based navigation. The AutoPilot class exports all the XPath-related methods. As described in one of the earlier articles, to manually navigate an XML document's hierarchical structure, you obtain a VTDNav instance, and repeatedly call the toElement() method to move the cursor to various parts of the document. Using XPath you can either move the cursor manually or tell AutoPilot to move it to qualified nodes in the document automatically.

Table 1 shows AutoPilot's XPath-related methods.

Table 1. AutoPilot's XPath-related Methods: The table lists AutoPilot's XPath-related methods along with a short description of each.
Method | Description
declareXPathNameSpace(...) | Binds a namespace prefix (used in the XPath expression) to a URL.
selectXPath(...) | Compiles an XPath expression into an internal representation.
evalXPath(...) | Moves the cursor to a qualified node in the node set.
evalXPathToBoolean(...), evalXPathToNumber(...), evalXPathToString(...) | Evaluate an XPath expression to a Boolean, a double, and a string, respectively.
resetXPath() | Resets the internal state so the XPath can be re-used.
getExprString() | Returns the string form of the compiled expression, which you can use to verify that it was compiled correctly.

VTD-XML's XPath implementation also introduces two exception classes:


  • XPathParseException—Thrown when there is a syntax error in the XPath expression.
  • XPathEvalException—Thrown when an exception condition occurs during XPath evaluation.
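
As a minimal sketch of how the methods in Table 1 and the two exception classes fit together, the following assumes the VTD-XML jar (package com.ximpleware) is on the classpath. The class name VtdXPathSketch, the catalog/book/title document shape, and the choice to rethrow as unchecked exceptions are all made up for illustration; only the VTDGen, VTDNav, and AutoPilot calls come from the library.

```java
import com.ximpleware.AutoPilot;
import com.ximpleware.NavException;
import com.ximpleware.ParseException;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;
import com.ximpleware.XPathEvalException;
import com.ximpleware.XPathParseException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; requires the VTD-XML library on the classpath.
final class VtdXPathSketch {

    /** Collects the text of every /catalog/book/title node in document order. */
    static List<String> titles(String xml) throws ParseException, NavException {
        VTDGen vg = new VTDGen();
        vg.setDoc(xml.getBytes(StandardCharsets.UTF_8));
        vg.parse(true);                     // true = namespace-aware parsing
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);   // binds the AutoPilot to the cursor

        List<String> result = new ArrayList<>();
        try {
            ap.selectXPath("/catalog/book/title/text()"); // compile the expression
            int i;
            // evalXPath() moves the cursor to the next matching node and
            // returns its token index, or -1 when the node set is exhausted.
            while ((i = ap.evalXPath()) != -1) {
                result.add(vn.toString(i));
            }
        } catch (XPathParseException e) {
            throw new IllegalArgumentException("XPath syntax error", e);
        } catch (XPathEvalException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
        return result;
    }

    /** Evaluates a numeric expression directly, without iterating a node set. */
    static double bookCount(String xml) throws ParseException {
        VTDGen vg = new VTDGen();
        vg.setDoc(xml.getBytes(StandardCharsets.UTF_8));
        vg.parse(true);
        AutoPilot ap = new AutoPilot(vg.getNav());
        try {
            ap.selectXPath("count(/catalog/book)");
        } catch (XPathParseException e) {
            throw new IllegalArgumentException("XPath syntax error", e);
        }
        return ap.evalXPathToNumber();
    }
}
```

Note that the loop never builds node objects: evalXPath() hands back an integer index, and the string is created only at the vn.toString(i) call.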

