Manipulate XML Content the Ximple Way

For many common use cases, you can improve your XML-processing performance by taking advantage of VTD-XML's document-centric processing model.


The latest Java version of the Virtual Token Descriptor for XML (VTD-XML) can function as a slicer, an editor, and an incremental modifier to intelligently manipulate XML document content. This article will show you how to use it, introduce you to the concept of "document-centric" XML processing, and discuss its implications for service-oriented architecture (SOA) and the future of enterprise IT.

Previous articles on DevX (see Related Resources) presented VTD-XML as a general-purpose, ultra high-performance XML parser well-suited for processing large XML documents using XPath. In parsing mode, VTD-XML derives its memory efficiency and high performance from non-extractive parsing. Internally, VTD-XML retains the XML document intact in memory and un-decoded, using offsets and lengths to describe tokens in the XML document. By resorting entirely to primitive data types (such as 64-bit integers), VTD-XML achieves unrivaled performance and memory efficiency by eliminating unnecessary object creation and garbage collection costs (which are largely responsible for the poor performance of DOM and SAX parsing).

Nevertheless, memory usage and CPU efficiency may be only a small part of the inherent benefits that non-extractive parsing offers. An arguably more significant implication—one that sets it apart from other XML parsing techniques—lies in its unique ability to manipulate XML document content at the byte level. Below are three distinct, yet related, sets of capabilities available in version 2.2 of VTD-XML.

  • XML slicer—You can use a pair of integers (offset and length) to address a segment of XML content so your application can slice the segment from the original document and move it to another location in the same or a different document. The VTDNav class exposes two methods that allow you to address an element fragment: getElementFragment(), which returns a 64-bit integer representing the offset and length value of the current element, and getElementFragmentNs() (in the latest version), which returns an ElementFragmentNs object representing a "namespace-compensated" element fragment (more detail on this later).
  • Incremental XML modifier—You can modify an XML document incrementally through the XMLModifier, which defines three types of "modify" operations: inserting new content into any location (at any offset) in the document, deleting content (by specifying the offset and length), and replacing old content with new content—which effectively is a deletion and insertion at the same location. To compose a new document containing all the changes, you need to call the XMLModifier's output(...) method.
  • XML editor— You can directly edit the in-memory copy of the XML text using VTDNav's overWrite(...) method, provided that the original tokens you're overwriting are wide enough to hold the new byte content.
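The slicer idea in the list above can be illustrated without the library. The following is a minimal sketch using only the Java standard library, not VTD-XML's own implementation: the class FragmentSlicer and its methods are hypothetical names, and the bit layout (length in the upper 32 bits, offset in the lower 32) is an assumption chosen for illustration. It shows how a single 64-bit integer can address a fragment of an undecoded byte buffer, and how that fragment can be copied into another document.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; this class is not part of VTD-XML.
public class FragmentSlicer {
    // Pack a descriptor: length in the upper 32 bits, offset in the lower 32
    // (an assumed layout for this sketch).
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    static int offsetOf(long fragment) { return (int) fragment; }

    static int lengthOf(long fragment) { return (int) (fragment >>> 32); }

    // Copy the addressed bytes out of the document without decoding them.
    static byte[] slice(byte[] doc, long fragment) {
        int off = offsetOf(fragment);
        return Arrays.copyOfRange(doc, off, off + lengthOf(fragment));
    }

    // Build a new document with the fragment spliced in at byte position insertAt.
    static byte[] insert(byte[] target, int insertAt, byte[] fragment) {
        byte[] out = new byte[target.length + fragment.length];
        System.arraycopy(target, 0, out, 0, insertAt);
        System.arraycopy(fragment, 0, out, insertAt, fragment.length);
        System.arraycopy(target, insertAt, out, insertAt + fragment.length,
                target.length - insertAt);
        return out;
    }

    public static void main(String[] args) {
        byte[] src = "<a><b>hi</b></a>".getBytes(StandardCharsets.UTF_8);
        byte[] dst = "<c></c>".getBytes(StandardCharsets.UTF_8);
        long frag = pack(3, 9);              // addresses "<b>hi</b>" inside src
        byte[] piece = slice(src, frag);
        byte[] merged = insert(dst, 3, piece);
        System.out.println(new String(merged, StandardCharsets.UTF_8)); // <c><b>hi</b></c>
    }
}
```

Because the fragment is moved as raw bytes, nothing is re-serialized; the cost is a few array copies regardless of how complex the element is.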

Editor vs. Incremental Modifier
While non-extractive parsing enables both the editing mode and the incremental modifier mode of VTD-XML, there are subtle differences between the two. Using VTD-XML as an incremental modifier (by calling various XMLModifier methods) doesn't modify the in-memory copy of the XML document; instead, you compose a new document based on the original document and the operations you specify. To generate the new document, you must call the XMLModifier's output(...) method.
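To make the "compose a new document" idea concrete, here is a minimal standard-library sketch of an incremental modifier. It is not how XMLModifier is implemented; the class IncrementalModifierSketch and its methods are hypothetical, and the sketch assumes the recorded operations do not overlap. Operations are only recorded against the original bytes, and nothing changes until output() assembles a new byte array.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; this class is not part of VTD-XML.
public class IncrementalModifierSketch {
    // One pending change: drop removeLen bytes at offset, then write insert there.
    private static final class Op {
        final int offset;
        final int removeLen;
        final byte[] insert;
        Op(int offset, int removeLen, byte[] insert) {
            this.offset = offset;
            this.removeLen = removeLen;
            this.insert = insert;
        }
    }

    private final byte[] original;              // never modified
    private final List<Op> ops = new ArrayList<>();

    IncrementalModifierSketch(byte[] original) { this.original = original; }

    void insert(int offset, String s) { replace(offset, 0, s); }

    void delete(int offset, int len) { replace(offset, len, ""); }

    // A replacement is a deletion and an insertion at the same location.
    void replace(int offset, int len, String s) {
        ops.add(new Op(offset, len, s.getBytes(StandardCharsets.UTF_8)));
    }

    // Compose the new document: copy untouched spans, apply each operation in offset order.
    byte[] output() {
        List<Op> sorted = new ArrayList<>(ops);
        sorted.sort(Comparator.comparingInt(op -> op.offset));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int cursor = 0;
        for (Op op : sorted) {
            out.write(original, cursor, op.offset - cursor);
            out.write(op.insert, 0, op.insert.length);
            cursor = op.offset + op.removeLen;
        }
        out.write(original, cursor, original.length - cursor);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] doc = "<root attr=\"old value 123\"/>".getBytes(StandardCharsets.UTF_8);
        IncrementalModifierSketch m = new IncrementalModifierSketch(doc);
        m.replace(12, 13, "new value");         // the attribute value starts at byte 12
        System.out.println(new String(m.output(), StandardCharsets.UTF_8)); // <root attr="new value"/>
        System.out.println(new String(doc, StandardCharsets.UTF_8));        // <root attr="old value 123"/>
    }
}
```

The second line of output shows the point of this mode: the original buffer is unchanged after output() runs.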

In contrast, when using VTD-XML as an editor, you directly modify the in-memory XML text. In other words, if the modification is successful, your application logic can immediately access the new data—there's no need to reparse.

Consider the following XML document named test.xml:

<root attr="old value 123"/>

To change the value of the attribute "attr" to "new value", you can use XMLModifier in the following Java code:

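import com.ximpleware.*;

public class changeAttrVal {
    public static void main(String args[]) throws Exception {
        VTDGen vg = new VTDGen();
        XMLModifier xm = new XMLModifier();
        // Parse test.xml without namespace awareness
        if (vg.parseFile("test.xml", false)) {
            VTDNav vn = vg.getNav();
            xm.bind(vn);
            // Index of the attribute value token, or -1 if "attr" is absent
            int i = vn.getAttrVal("attr");
            if (i != -1)
                xm.updateToken(i, "new value");
            // Compose the modified document and write it to a new file
            xm.output("new_test.xml");
        }
    }
}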

The call to output() in the preceding code writes the modified XML document, with the changed attribute value, to the file new_test.xml, as shown below:



<root attr="new value"/>

You could achieve a similar result with VTD-XML's editing mode, using this Java code:

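import com.ximpleware.*;
import java.io.*;

public class changeAttrVal2 {
    public static void main(String args[]) throws Exception {
        VTDGen vg = new VTDGen();
        // Parse test.xml without namespace awareness
        if (vg.parseFile("test.xml", false)) {
            VTDNav vn = vg.getNav();
            // Index of the attribute value token, or -1 if "attr" is absent
            int i = vn.getAttrVal("attr");
            if (i != -1) {
                // Overwrite the token in place; the new bytes must fit in the old window
                vn.overWrite(i, "new value".getBytes());
                // The new value is readable immediately, with no reparse
                System.out.println("print the new attr value ===> " + vn.toString(i));
            }
            // Dump the edited in-memory document to a file
            FileOutputStream fos = new FileOutputStream("new_test2.xml");
            fos.write(vn.getXML().getBytes());
            fos.close();
        }
    }
}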

Unlike the output from XMLModifier, this version keeps some trailing white space inside the attribute value. VTDNav's overWrite() method first writes the new byte content into the "window" (the space occupied by the original token), then fills the rest of the window with white space, which guarantees that the new token has the same length as the old token. Note, however, that the example can print the new attribute value immediately after calling overWrite(), without generating a new copy of the document. The file new_test2.xml contains:

<root attr="new value    "/>
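The fill-then-pad behavior described above can be sketched with the standard library alone. This is an illustration of the technique under stated assumptions, not VTD-XML's code: OverwriteSketch and its overwrite() method are hypothetical, and the window is passed in explicitly as an offset and length rather than looked up from a token index. In the example, the old value occupies 13 bytes and the new one 9, so four spaces of padding remain.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; this class is not part of VTD-XML.
public class OverwriteSketch {
    // Write replacement into doc[offset, offset + length) and pad the rest with spaces.
    // Returns false, leaving doc untouched, if the replacement does not fit the window.
    static boolean overwrite(byte[] doc, int offset, int length, byte[] replacement) {
        if (replacement.length > length) {
            return false;
        }
        System.arraycopy(replacement, 0, doc, offset, replacement.length);
        Arrays.fill(doc, offset + replacement.length, offset + length, (byte) ' ');
        return true;
    }

    public static void main(String[] args) {
        byte[] doc = "<root attr=\"old value 123\"/>".getBytes(StandardCharsets.UTF_8);
        int before = doc.length;
        // The attribute value "old value 123" starts at byte 12 and is 13 bytes wide.
        boolean ok = overwrite(doc, 12, 13, "new value".getBytes(StandardCharsets.UTF_8));
        System.out.println(ok);                                      // true
        System.out.println(new String(doc, StandardCharsets.UTF_8)); // <root attr="new value    "/>
        System.out.println(doc.length == before);                    // true
    }
}
```

Because the document length never changes, every other offset in the buffer stays valid, which is why the edited data can be read back without reparsing.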


