Improve XPath Efficiency with VTD-XML

Improve XPath Efficiency with VTD-XML

or the past several years, XPath has been steadily gaining popularity as an effective tool when developing XML applications. XPath was originally viewed as an adjunct element for the W3C’s XSLT and XPointer specifications, but developers found its simplicity appealing. With XPath, instead of manually navigating the hierarchical data structure, you can use compact, “file-system”-like expressions to address any node or set of nodes in XML documents. However, most existing XPath engines work with DOM trees or similar object models, which are slow to build and modify?and consume excessive amounts of memory. This presents a dilemma for anyone looking to take advantage of XPath for SOA applications that are either performance sensitive or routinely deal with large XML documents.

My last two articles with DevX (see the Related Resources) introduced VTD-XML as a next-generation XML processing model that goes beyond DOM and SAX in performance, memory usage, and ease of use. VTD-XML is simultaneously:

  • Memory-efficient: VTD-XML typically requires only somewhere between 1.3 ~ 1.5 times the size of the XML document itself?meaning it’s far more memory-efficient than DOM?and works very well with large XML documents.
  • High-performance: VTD-XML typically outperforms DOM parsers by 5 ~ 10 times, and it typically outperforms SAX parsers with null content handlers by about 100 percent
  • Easy to use: Applications written in VTD-XML are more compact and readable than those written in DOM or SAX.

What is VTD-XML’s not-so-secret sauce? Unlike traditional XML parsers which create string-based tokens as the first step of parsing, VTD-XML uses linear buffers internally to store 64-bit integers containing the starting offsets and lengths of XML tokens, while keeping the un-decoded XML document intact in memory. All VTD-XML’s benefits are the result?one way or the other?of this “non-extractive” tokenization. At the API level, VTD-XML consists of the following core classes:

  • VTDGen encapsulates the main parsing, index writing, and index loading functions.
  • VTDNav exports a cursor-based API that contains the methods that navigate the XML hierarchy.
  • AutoPilot supports document-order element traversal?similar to Xerces’ NodeIterator.

VTD-XML’s XPath Implementation
VTD-XML’s XPath implementation, introduced with version 1.0, supports the full W3C XPath 1.0 spec. It builds upon VTDNav’s concept of cursor-based navigation. The AutoPilot class exports all the XPath-related methods. As described in one of the earlier articles, to manually navigate an XML document’s hierarchical structure, you obtain a VTDNav instance, and repeatedly call the toElement() method to move the cursor to various parts of the document. Using XPath you can either move the cursor manually or tell AutoPilot to move it to qualified nodes in the document automatically.

Table 1 shows AutoPilot’s XPath-related methods.

Table 1. AutoPilot’s XPath-related Methods: The table lists AutoPilot’s XPath-related methods along with a short description of each.
Method Description
declareXPathNameSpace(…) Binds a namespace prefix (used in the XPath expression) to a URL.
selectXPath(…) Compiles an XPath expression into an internal representation.
evalXPath(…) Moves the cursor to a qualified node in the node set.
evalXPathToBoolean(…), evalXPathToNumber(…), evalXpathToString(…) These three methods evalute an XPath expression to a Boolean, a double and a string, respectively.
resetXPath() Resets the internal state so the XPath can be re-used.
getExprString() Call this method to verify the correctness of the compiled expression.

VTD-XML’s XPath implementation also introduces two exception classes:

  • XPathParseException?Thrown when there is a syntax error in the XPath expression.
  • XPathEvalException?Thrown when an exception condition occurs during XPath evaluation.

Non-Blocking Node Set Evaluation
The most important innovation in VTD-XML’s XPath implementation is its stateless, non-blocking evaluation of the node set. Unlike other XPath engines (such as JAXEN, XALAN, etc) that return the entire node set all at once, VTD-XML returns a single qualified node as soon as the XPath engine finds a match. After receiving the first match, you repeatedly call the key method in the AutoPilot class called evalXPath(), which moves the cursor to the next node (as long as at least one matching node remains in the node set) every time it is called. To illustrate the differences between those two styles of XPath evaluation, compare the two short snippets written in JAXEN and VTD-XML respectively that execute the same application logic on the catalog.xml in Listing 1.

Here’s the JAXEN-based code. Note that JAXEN returns the entire matching node set for the provided XPath query:

   import javax.xml.parsers.*;   import java.io.*;   import org.w3c.dom.*;   import org.xml.sax.InputSource;   import org.jaxen.*;   import org.jaxen.dom.*;   import java.util.*;    public class jaxenXPath   {       public static void main(String args[]) throws Exception      {         DocumentBuilderFactory factory =            DocumentBuilderFactory.newInstance();         DocumentBuilder parser = factory.newDocumentBuilder();         Document d = parser.parse("catalog.xml");         XPath expression = new            org.jaxen.dom.DOMXPath("/CATALOG/CD/TITLE/text()");            // Return the entire node set all at once         List result = expression.selectNodes(d);         int sz = result.size();         for (int i=0;i "+              ((Node)result.get(i)).getNodeValue());         }      }   } 

In contrast, here’s the same application written for VTD-XML:

   import com.ximpleware.*;    public class vtdXPath   {      public static void main(String args[]) throws Exception      {         VTDGen vg =new VTDGen();         int i;         AutoPilot ap = new AutoPilot();          ap.selectXPath("/CATALOG/CD/TITLE/text()");         if (vg.parseFile("catalog.xml", false))          {            VTDNav vn = vg.getNav();            ap.bind(vn);            //XPath eval returns one node at a time            while((i=ap.evalXPath())!=-1)            {              System.out.println(" text ==> "+                 vn.toString(i));             }             ap.resetXPath();         }      }    }

In the second example above, VTD-XML’s XPath implementation returns the first matching node immediately, and then returns additional matching nodes in the while loop, letting the application begin handling matching nodes immediately, rather than waiting until the XPath implementation finds all the matching nodes.

The code sample shown above deals with only one XPath expression; to deal with multiple XPath expressions, you must instantiate multiple AutoPilot objects, each of which corresponds to a different XPath expression. When you call selectXPath(…) multiple times, the AutoPilot object remembers only the last XPath expression. After an AutoPilot object selects an XPath expression, you call evalXPath() which moves the embedded VTDNav instance cursor to the next available qualified node, and returns its VTD record index. When no more matching nodes remain in the node set, evalXPath() returns -1.

Querying Multiple Documents
Another common scenario consists of executing the same XPath query against multiple XML documents. In this case, you can reuse the XPath query for each document, through AutoPilot’s delayed binding feature. Here’s the procedure: After executing a query on one XML document, you bind (by calling bind(…)) the same AutoPilot object to a new VTDNav object that corresponds to a different XML document. You also need to call resetXPath(), which resets the internal XPath state so it is ready for reuse.

Listing 2 shows an example that applies two XPath expressions to two different XML documents.

The Benefits
When comparing with other XPath engines, VTD-XML’s non-blocking XPath evaluation approach has a number of benefits. Without sacrificing any inherent benefits of XPath, VTD-XML offers unsurpassed flexibility and puts more control in your hands. Consider the use case in which you want to use XPath to process a large XML document and the node set contains thousands of nodes. With other XPath engines, your application logic will have to wait for the entire node set to be returned before beginning to process matches. But with VTD-XML, your application logic can begin processing matches as soon as the XPath implementation finds and returns the first node.

If your application processes only a small number of nodes in the node set, or you have prior knowledge of the XML document (e.g. you know there is only one occurrence of the matching node at the beginning of a large XML file), this saves the time you’d have to wait while the XPath implementation finishes searching the entire document. In this case, other types of XPath evaluation unnecessarily block the program flow and seem more like the wrong approach!

As another example, you can use VTD-XML’s non-blocking XPath to walk the nodes, which simply isn’t possible with other types of XPath engines.

When combined with the other benefits and features of VTD-XML, the comparison with DOM and SAX gets lopsided in a hurry. Regardless of the file sizes, whether it is parsing, updating or XPath evaluation (shown later), VTD-XML delivers the kind of across-the-board performance improvements never before thought possible by many. For transactional SOA applications dealing with small-to-medium-size XML files, VTD-XML easily blows away DOM?and the benefit increases for large XML files. When compared with SAX, you literally move from long, ugly, and difficult-to-maintain SAX code to clean, short VTD-XML code, while enjoying the full power of VTD-XML’s random access and improving performance between 10 and 50 times along the way.

If you regularly deal with large XML documents or XPath expressions returning huge node sets, VTD-XML is a must-have!

Common Usage Patterns
There are a few common patterns to follow in order to base your application on VTD-XML.

Embed evalXPath(..) in while Statements
The most common pattern is to embed the call to evalXPath() in the control statement of a while loop. So far the examples you’ve seen code all use this pattern to navigate to the qualified nodes in the document. In this pattern, the while loop checks the return value of evalXPath(); when it equals -1, there are no more matching nodes in the node set. Otherwise, the return value equals the VTD record index of the matching node, to which the cursor of the VTDNav object (bound to the AutoPilot object) also points. The following two short code fragments illustrate this?although syntactically slightly different, they produce equivalent results:

   int i;    ap.bind(vn);    ap.selectXPath(...);    while( (i=ap.evalXPath())!=-1){         }       int i;    ap.bind(vn);    ap.selectXPath(...);    while(ap.evalXPath()!>=0){       i = vn.getCurrentIndex();    } 

Combining XPath with Manual Navigation
But sometimes you want to navigate the VTDNav cursor a bit further within the while loop. To do that, the rule is that the code segment nested in the loop must not alter the node position, meaning that you can either navigate where you like, and then manually move the cursor back to the original location before the end of the loop, or you can use the VTDNav object’s push() and pop() methods, which push and pop cursor locations onto a stack, to comply with the rule. I’ll show you both patterns.

Using the catalog.xml file in Listing 1, this first example navigates manually within the while loop to the first child node of each matching element, extracts the text, and then moves the cursor back to the starting position before the next iteration:

   import com.ximpleware.*;    public class vtdXPath2   {      public static void main(String args[]) throws Exception      {         VTDGen vg =new VTDGen();         int i;         AutoPilot ap = new AutoPilot();          ap.selectXPath("/CATALOG/CD[PRICE < 10]");         if (vg.parseFile("catalog.xml", false))         {            VTDNav vn = vg.getNav();            ap.bind(vn);               //XPath eval returns one node at a time            while((i=ap.evalXPath())!=-1)            {                // get to the first child               if (vn.toElement(VTDNav.FIRST_CHILD, "TITLE"))               {                   int j = vn.getText();                   if (j!=-1)                      System.out.println(" text node ==>"+vn.toString(j));                   vn.toElement(VTDNav.PARENT); // move the cursor back                }             }             ap.resetXPath();         }      }   } 

This second example (which is slightly less efficient) uses push() and pop() to reset the cursor position:

   import com.ximpleware.*;    public class vtdXPath3    {      public static void main(String args[]) throws Exception      {         VTDGen vg =new VTDGen();         int i;         AutoPilot ap = new AutoPilot();          ap.selectXPath("/CATALOG/CD[PRICE < 10]");         if (vg.parseFile("catalog.xml", false))         {            VTDNav vn = vg.getNav();            ap.bind(vn);               //XPath eval returns one node at a time            while((i=ap.evalXPath())!=-1)            {                // push the current cursor position               vn.push();                // get to the first child               if (vn.toElement(VTDNav.FIRST_CHILD, "TITLE"))               {                   int j = vn.getText();                   if (j!=-1)                      System.out.println(" text node ==>"+vn.toString(j));                }                // restore the cursor position               vn.pop();             }             ap.resetXPath();         }      }   } 

Nested XPath Support
You aren’t limited to manual navigation within the while loop. For more complicated navigation, you can nest XPath queries. The following example is equivalent to the examples in the preceding section, but has been rewritten to use nested XPath queries. Note that the rule that ensures cursor position consistency still applies:

   import com.ximpleware.*;    public class vtdXPath4   {      public static void main(String args[]) throws Exception      {         VTDGen vg =new VTDGen();         int i;         AutoPilot ap = new AutoPilot();          ap.selectXPath("/CATALOG/CD[PRICE < 10]");          AutoPilot ap2 = new AutoPilot();          ap2.selectXPath("TITLE/text()");          if (vg.parseFile("catalog.xml", false))         {            VTDNav vn = vg.getNav();            ap.bind(vn);             ap2.bind(vn);               //XPath eval returns one node at a time            while((i=ap.evalXPath())!=-1)            {                vn.push();                // get to the first child                int j;               while ((j=ap2.evalXPath())!=-1)               {                   System.out.println(" text node ==>" + vn.toString(j));                }                // don't forget this next statement; the next iteration will reuse               // ap2's XPath!!!                ap2.resetXPath();                vn.pop();             }             ap.resetXPath();         }      }   } 

Ad-Hoc Tree-walker Mode
As mentioned earlier, one way to use evalXPath() is just to move the cursor to the desired destination(s) in the XML document:

   import com.ximpleware.*;    public class vtdXPath5   {      public static void main(String args[]) throws Exception      {         VTDGen vg =new VTDGen();         int i;         AutoPilot ap = new AutoPilot();          ap.selectXPath("/CATALOG/CD[PRICE < 10]");          if (vg.parseFile("catalog.xml", false))         {            VTDNav vn = vg.getNav();            ap.bind(vn);            //XPath eval returns one node at a time            if (ap.evalXPath()>=0)            {                // do whatever you like here with vn 
 
Figure 1. XPath Evaluation Performance Comparison: The charts in this figure show the average latencies of several different query executions for JAXEN, Xalan, and VTD-XML against three documents of similar structure, but differing in size.
} } } }

XPath Evaluation Performance Benchmarks
Finally, to round out the performance claims, the figures in this section briefly show the results of comparing VTD-XML’s XPath performance with that of Xalan and JAXEN, both of which work with the Xerces DOM. The benchmark programs pre-compile a set of XPath expressions and then repetitively execute the queries over the same set of documents. The charts in Figure 1 show the average latencies of several different query executions for JAXEN, Xalan, and VTD-XML against three documents of similar structure, but differing in size.

You can read a complete description and discussion of the test set up here, but the figures shown here should give you a very good idea of the overall results.

In this article, you’ve seen an introduction and overview of VTD-XML’s XPath implementation. VTD-XML empowers you to write clean, readable code and achieve peerless performance, efficiency, and flexibility. In fact, VTD-XML is changing some of the most deeply entrenched perceptions and assumptions about XML programming. Consequently, the infrastructure built on top of XML, such as Web services and SOA, is on the verge of positive changes as well.

devx-admin

devx-admin

Share the Post:
USA Companies

Top Software Development Companies in USA

Navigating the tech landscape to find the right partner is crucial yet challenging. This article offers a comparative glimpse into the top software development companies

Software Development

Top Software Development Companies

Looking for the best in software development? Our list of Top Software Development Companies is your gateway to finding the right tech partner. Dive in

India Web Development

Top Web Development Companies in India

In the digital race, the right web development partner is your winning edge. Dive into our curated list of top web development companies in India,

USA Web Development

Top Web Development Companies in USA

Looking for the best web development companies in the USA? We’ve got you covered! Check out our top 10 picks to find the right partner

Clean Energy Adoption

Inside Michigan’s Clean Energy Revolution

Democratic state legislators in Michigan continue to discuss and debate clean energy legislation in the hopes of establishing a comprehensive clean energy strategy for the

Chips Act Revolution

European Chips Act: What is it?

In response to the intensifying worldwide technology competition, Europe has unveiled the long-awaited European Chips Act. This daring legislative proposal aims to fortify Europe’s semiconductor

USA Companies

Top Software Development Companies in USA

Navigating the tech landscape to find the right partner is crucial yet challenging. This article offers a comparative glimpse into the top software development companies in the USA. Through a

Software Development

Top Software Development Companies

Looking for the best in software development? Our list of Top Software Development Companies is your gateway to finding the right tech partner. Dive in and explore the leaders in

India Web Development

Top Web Development Companies in India

In the digital race, the right web development partner is your winning edge. Dive into our curated list of top web development companies in India, and kickstart your journey to

USA Web Development

Top Web Development Companies in USA

Looking for the best web development companies in the USA? We’ve got you covered! Check out our top 10 picks to find the right partner for your online project. Your

Clean Energy Adoption

Inside Michigan’s Clean Energy Revolution

Democratic state legislators in Michigan continue to discuss and debate clean energy legislation in the hopes of establishing a comprehensive clean energy strategy for the state. A Senate committee meeting

Chips Act Revolution

European Chips Act: What is it?

In response to the intensifying worldwide technology competition, Europe has unveiled the long-awaited European Chips Act. This daring legislative proposal aims to fortify Europe’s semiconductor supply chain and enhance its

Revolutionized Low-Code

You Should Use Low-Code Platforms for Apps

As the demand for rapid software development increases, low-code platforms have emerged as a popular choice among developers for their ability to build applications with minimal coding. These platforms not

Cybersecurity Strategy

Five Powerful Strategies to Bolster Your Cybersecurity

In today’s increasingly digital landscape, businesses of all sizes must prioritize cyber security measures to defend against potential dangers. Cyber security professionals suggest five simple technological strategies to help companies

Global Layoffs

Tech Layoffs Are Getting Worse Globally

Since the start of 2023, the global technology sector has experienced a significant rise in layoffs, with over 236,000 workers being let go by 1,019 tech firms, as per data

Huawei Electric Dazzle

Huawei Dazzles with Electric Vehicles and Wireless Earbuds

During a prominent unveiling event, Huawei, the Chinese telecommunications powerhouse, kept quiet about its enigmatic new 5G phone and alleged cutting-edge chip development. Instead, Huawei astounded the audience by presenting

Cybersecurity Banking Revolution

Digital Banking Needs Cybersecurity

The banking, financial, and insurance (BFSI) sectors are pioneers in digital transformation, using web applications and application programming interfaces (APIs) to provide seamless services to customers around the world. Rising

FinTech Leadership

Terry Clune’s Fintech Empire

Over the past 30 years, Terry Clune has built a remarkable business empire, with CluneTech at the helm. The CEO and Founder has successfully created eight fintech firms, attracting renowned

The Role Of AI Within A Web Design Agency?

In the digital age, the role of Artificial Intelligence (AI) in web design is rapidly evolving, transitioning from a futuristic concept to practical tools used in design, coding, content writing

Generative AI Revolution

Is Generative AI the Next Internet?

The increasing demand for Generative AI models has led to a surge in its adoption across diverse sectors, with healthcare, automotive, and financial services being among the top beneficiaries. These

Microsoft Laptop

The New Surface Laptop Studio 2 Is Nuts

The Surface Laptop Studio 2 is a dynamic and robust all-in-one laptop designed for creators and professionals alike. It features a 14.4″ touchscreen and a cutting-edge design that is over

5G Innovations

GPU-Accelerated 5G in Japan

NTT DOCOMO, a global telecommunications giant, is set to break new ground in the industry as it prepares to launch a GPU-accelerated 5G network in Japan. This innovative approach will

AI Ethics

AI Journalism: Balancing Integrity and Innovation

An op-ed, produced using Microsoft’s Bing Chat AI software, recently appeared in the St. Louis Post-Dispatch, discussing the potential concerns surrounding the employment of artificial intelligence (AI) in journalism. These

Savings Extravaganza

Big Deal Days Extravaganza

The highly awaited Big Deal Days event for October 2023 is nearly here, scheduled for the 10th and 11th. Similar to the previous year, this autumn sale has already created

Cisco Splunk Deal

Cisco Splunk Deal Sparks Tech Acquisition Frenzy

Cisco’s recent massive purchase of Splunk, an AI-powered cybersecurity firm, for $28 billion signals a potential boost in tech deals after a year of subdued mergers and acquisitions in the

Iran Drone Expansion

Iran’s Jet-Propelled Drone Reshapes Power Balance

Iran has recently unveiled a jet-propelled variant of its Shahed series drone, marking a significant advancement in the nation’s drone technology. The new drone is poised to reshape the regional

Solar Geoengineering

Did the Overshoot Commission Shoot Down Geoengineering?

The Overshoot Commission has recently released a comprehensive report that discusses the controversial topic of Solar Geoengineering, also known as Solar Radiation Modification (SRM). The Commission’s primary objective is to

Remote Learning

Revolutionizing Remote Learning for Success

School districts are preparing to reveal a substantial technological upgrade designed to significantly improve remote learning experiences for both educators and students amid the ongoing pandemic. This major investment, which

Revolutionary SABERS Transforming

SABERS Batteries Transforming Industries

Scientists John Connell and Yi Lin from NASA’s Solid-state Architecture Batteries for Enhanced Rechargeability and Safety (SABERS) project are working on experimental solid-state battery packs that could dramatically change the

Build a Website

How Much Does It Cost to Build a Website?

Are you wondering how much it costs to build a website? The approximated cost is based on several factors, including which add-ons and platforms you choose. For example, a self-hosted