

Better, Faster XML Processing with VTD-XML : Page 2

VTD-XML is a new open source XML processing API that provides a great alternative to SAX and DOM that doesn't force you to trade processing performance for usability. Find out why this Java-based, non-validating parser is faster than DOM and better than SAX.

Understanding VTD-XML
You can use two criteria to determine whether VTD-XML can help with your next XML project. First, the current version of VTD-XML doesn't support entity declarations in DTDs. It recognizes only the five built-in entities (&amp;, &apos;, &lt;, &gt;, and &quot;). Formats such as SOAP, RDF, FIXML, and RSS do not normally rely on custom entity declarations, so if you're dealing with one of them, VTD-XML should handle the job very well. Second, VTD-XML's internal parsed representation of XML is slightly larger than the XML itself, so you have to make sure that you have sufficient RAM. Keep in mind that to provide true random access to the entire document, keeping that document entirely in memory is pretty much unavoidable. If both criteria are met, you'll find VTD-XML to be the most efficient XML processing API.

At the top level, the Java API of VTD-XML consists of three essential components:

  • VTDGen (VTD generator) encapsulates the parsing routine that produces the internal parsed representation of XML.
  • VTDNav (VTD navigator) is a cursor-based API that allows for DOM-like random access to the hierarchical structure of XML.
  • AutoPilot is the class that allows for document-order element traversal similar to Xerces' NodeIterator.

To use VTD-XML to process an XML document, whether from disk or via HTTP, you first find out its length, allocate a chunk of memory big enough to hold the document, and read the entire document into memory. Next, you create an instance of VTDGen and assign the byte array to it using setDoc(). Finally, you call parse(boolean ns) to generate the parsed XML representation. When ns is set to true, subsequent document navigation is namespace aware. If parsing succeeds, you can retrieve an instance of VTDNav by calling getNav().
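
The loading step described above can be sketched with the standard Java library alone. This is a minimal sketch, not code from VTD-XML: the class name DocumentLoader and the method readFully are hypothetical names chosen for illustration, and the sample document is made up. The VTD-XML calls named in the text (setDoc(), parse(), getNav()) appear only in a comment, because they need the VTD-XML library on the classpath.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: not part of VTD-XML.
class DocumentLoader {

    /** Finds the file's length, allocates a buffer of that size, and fills it. */
    static byte[] readFully(File file) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE - 8) {
            throw new IOException("Document too large for a single byte array: " + length);
        }
        byte[] buffer = new byte[(int) length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            // readFully blocks until the whole buffer is filled or throws EOFException.
            in.readFully(buffer);
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        // Write a small made-up document to a temporary file so the sketch runs anywhere.
        Path tmp = Files.createTempFile("sample", ".xml");
        Files.write(tmp, "<root><item id=\"1\">hello</item></root>"
                .getBytes(StandardCharsets.UTF_8));

        byte[] doc = readFully(tmp.toFile());
        System.out.println(doc.length + " bytes loaded");

        // Next steps, per the text: hand the array to a VTDGen instance with setDoc(doc),
        // call parse(true) for namespace-aware parsing, then getNav() to obtain a VTDNav.

        Files.delete(tmp);
    }
}
```

For a document fetched over HTTP the same idea applies, but the length usually has to come from the response (or the bytes have to be accumulated) before the final array can be sized.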

Navigating the Document Hierarchy
At the start of navigation, the VTDNav instance's cursor points at the root element (equivalent to the VTD record of the starting tag) of the XML document. To move the cursor manually to different positions in the hierarchy, you use one of the overloaded versions of toElement(). The simplest form is toElement(int direction), which takes an integer indicating the direction in which the cursor moves. The six possible values of this integer are defined as class variables of VTDNav: ROOT, PARENT, FIRST_CHILD, LAST_CHILD, NEXT_SIBLING, and PREV_SIBLING. Each has a corresponding abbreviation: R, P, FC, LC, NS, and PS. The method toElement() returns a boolean value indicating the status of the operation: true when the cursor moved successfully. If you try to move the cursor to a non-existent location (e.g., the first child of a childless element), the cursor does not move and toElement() returns false.
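
To make the cursor semantics concrete, here is a toy model written against a small in-memory tree. It is not VTD-XML and does not use VTD records: the classes CursorDemo, Node, and Cursor are hypothetical, and only the method name toElement and the six direction names are borrowed from the text. What it illustrates is the contract described above, namely that a successful move returns true and a move to a non-existent location leaves the cursor in place and returns false.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a cursor-based API; all classes here are illustrative, not VTD-XML's.
class CursorDemo {
    static final int ROOT = 0, PARENT = 1, FIRST_CHILD = 2,
                     LAST_CHILD = 3, NEXT_SIBLING = 4, PREV_SIBLING = 5;

    static final class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) parent.children.add(this);
        }
    }

    static final class Cursor {
        private final Node root;
        private Node current;

        Cursor(Node root) {
            this.root = root;
            this.current = root; // navigation starts at the root element
        }

        String name() { return current.name; }

        /** Moves the cursor if the target exists; otherwise stays put and returns false. */
        boolean toElement(int direction) {
            Node target;
            List<Node> kids = current.children;
            switch (direction) {
                case ROOT:         target = root; break;
                case PARENT:       target = current.parent; break;
                case FIRST_CHILD:  target = kids.isEmpty() ? null : kids.get(0); break;
                case LAST_CHILD:   target = kids.isEmpty() ? null : kids.get(kids.size() - 1); break;
                case NEXT_SIBLING: target = sibling(+1); break;
                case PREV_SIBLING: target = sibling(-1); break;
                default: throw new IllegalArgumentException("Unknown direction: " + direction);
            }
            if (target == null) return false;
            current = target;
            return true;
        }

        private Node sibling(int offset) {
            if (current.parent == null) return null;
            List<Node> siblings = current.parent.children;
            int i = siblings.indexOf(current) + offset;
            return (i >= 0 && i < siblings.size()) ? siblings.get(i) : null;
        }
    }

    public static void main(String[] args) {
        // orders -> [ order -> [ item ], note ]
        Node orders = new Node("orders", null);
        Node order = new Node("order", orders);
        new Node("item", order);
        new Node("note", orders);

        Cursor c = new Cursor(orders);
        System.out.println(c.toElement(FIRST_CHILD) + " " + c.name());  // moved to "order"
        System.out.println(c.toElement(NEXT_SIBLING) + " " + c.name()); // moved to "note"
        System.out.println(c.toElement(FIRST_CHILD) + " " + c.name());  // "note" is childless: no move
    }
}
```

Because a failed move leaves the cursor where it was, calling code can probe in a direction inside an if or while condition without saving and restoring a position first.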

The method getAttrVal(String attrName) retrieves the value of the named attribute of the element at the cursor position. Likewise, getText() retrieves the text content of the cursor element. If namespace awareness was turned on during parsing, you can also use toElementNS() and getAttrValNS() to navigate the document hierarchy in a namespace-aware fashion.

The other mode of navigation is AutoPilot. An AutoPilot instance acts like a magic hand that automatically moves the cursor through the node hierarchy in document order. To use it, you first call the constructor, which accepts an instance of VTDNav as its input. Next, you call selectElement() or selectElementNS() to specify the descendant elements to be sifted out. Afterwards, each call to iterate() moves the cursor to the next matching element.
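
The select-then-iterate pattern can be illustrated with another toy model over an in-memory tree. Again, this is a sketch under stated assumptions rather than VTD-XML's implementation: DocumentOrderDemo, Node, Walker, and collect are hypothetical names, only selectElement and iterate are borrowed from the text, and this toy version also tests the starting element itself, which is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy document-order walker; illustrative only, not VTD-XML's implementation.
class DocumentOrderDemo {

    static final class Node {
        final String name;
        final String text;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name, String text, Node parent) {
            this.name = name;
            this.text = text;
            this.parent = parent;
            if (parent != null) parent.children.add(this);
        }
    }

    static final class Walker {
        private final Node start;   // traversal never leaves this subtree
        private Node current;       // the cursor
        private String wanted;
        private boolean started;

        Walker(Node start) {
            this.start = start;
            this.current = start;
        }

        void selectElement(String name) { this.wanted = name; }

        Node current() { return current; }

        /** Moves the cursor to the next matching element in document order, if any. */
        boolean iterate() {
            Node candidate = started ? next(current) : start;
            started = true;
            while (candidate != null) {
                if (candidate.name.equals(wanted)) {
                    current = candidate;
                    return true;
                }
                candidate = next(candidate);
            }
            return false; // no further match; the cursor stays where it was
        }

        /** Pre-order successor of n within the start subtree, or null at the end. */
        private Node next(Node n) {
            if (!n.children.isEmpty()) return n.children.get(0);
            while (n != start) {
                List<Node> siblings = n.parent.children;
                int i = siblings.indexOf(n);
                if (i + 1 < siblings.size()) return siblings.get(i + 1);
                n = n.parent; // climb until some ancestor has a next sibling
            }
            return null;
        }
    }

    /** Collects the text of every element with the given name, in document order. */
    static List<String> collect(Node root, String name) {
        Walker walker = new Walker(root);
        walker.selectElement(name);
        List<String> texts = new ArrayList<>();
        while (walker.iterate()) {
            texts.add(walker.current().text);
        }
        return texts;
    }

    public static void main(String[] args) {
        // catalog -> [ book -> [ title "A" ], note, book -> [ title "B" ] ]
        Node catalog = new Node("catalog", "", null);
        Node first = new Node("book", "", catalog);
        new Node("title", "A", first);
        new Node("note", "", catalog);
        Node second = new Node("book", "", catalog);
        new Node("title", "B", second);

        System.out.println(collect(catalog, "title")); // prints [A, B]
    }
}
```

The calling code reduces to a single while loop around iterate(), which is the convenience this style of traversal offers over moving the cursor by hand.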
