

Manipulate XML Content the Ximple Way

For many common use cases, you can improve your XML-processing performance by taking advantage of VTD-XML's document-centric processing model.





Using Namespace-Compensated Element Fragments
For documents that don't use namespaces, you call VTDNav's getElementFragment() to retrieve the offset and length of an element fragment, which is itself a valid XML document. For XML files that use namespaces, however, offset and length alone are usually insufficient, because the fragment may depend on namespace declarations made in its ancestor nodes. VTD-XML 2.2 lets you obtain an ElementFragmentNs instance by calling VTDNav's new getElementFragmentNs() method. You can think of an ElementFragmentNs object as a namespace-aware, well-formed element fragment: the fragment itself plus any namespace declarations it needs from its ancestor nodes. Consider the following XML document:

<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header>
    <m:reservation xmlns:m="http://travelcompany.example.org/reservation"
        env:role="http://www.w3.org/2003/05/soap-envelope/role/next"
        env:mustUnderstand="true">
      <m:reference>uuid:093a2da1-q345-739r-ba5d-pqff98fe8j7d</m:reference>
      <m:dateAndTime>2001-11-29T13:20:00.000-05:00</m:dateAndTime>
    </m:reservation>
  </env:Header>
</env:Envelope>

Using only the offset and length, you get a "naked" fragment for the m:reservation element, shown below. Notice that it is not well-formed with respect to namespaces: the env prefix used by env:role and env:mustUnderstand is never declared.

<m:reservation xmlns:m="http://travelcompany.example.org/reservation"
    env:role="http://www.w3.org/2003/05/soap-envelope/role/next"
    env:mustUnderstand="true">
  <m:reference>uuid:093a2da1-q345-739r-ba5d-pqff98fe8j7d</m:reference>
  <m:dateAndTime>2001-11-29T13:20:00.000-05:00</m:dateAndTime>
</m:reservation>
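A naked fragment is nothing more than a byte range of the original document. VTD-XML's Javadoc describes getElementFragment() as returning both numbers packed into a single long, with the offset in the lower 32 bits and the length in the upper 32 bits. The sketch below assumes that convention and uses plain Java only (no VTD-XML classes); the helper names pack(), offsetOf(), lengthOf() and slice() are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class NakedFragment {
    // Assumed encoding (check your VTD-XML version's Javadoc):
    // lower 32 bits = offset, upper 32 bits = length.
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    static int offsetOf(long fragment) {
        return (int) fragment;            // lower 32 bits
    }

    static int lengthOf(long fragment) {
        return (int) (fragment >>> 32);   // upper 32 bits
    }

    // Copies the fragment's bytes out of the document; no parsing is involved.
    static String slice(byte[] doc, long fragment) {
        return new String(doc, offsetOf(fragment), lengthOf(fragment), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = "<a xmlns:p=\"urn:p\"><p:b>1</p:b></a>";
        byte[] doc = xml.getBytes(StandardCharsets.UTF_8);
        // ASCII-only input, so char indexes equal byte offsets here.
        int offset = xml.indexOf("<p:b>");
        int length = xml.indexOf("</p:b>") + "</p:b>".length() - offset;
        long fragment = pack(offset, length);
        // Prints <p:b>1</p:b>, a naked fragment: the p prefix is undeclared.
        System.out.println(slice(doc, fragment));
    }
}
```

Because the result is just two integers, locating a fragment costs no object allocation; a String is created only when slice() is actually called.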

In comparison, a namespace-compensated fragment for the m:reservation element contains an additional declaration for the env prefix, copied from the root element:

<m:reservation xmlns:env="http://www.w3.org/2003/05/soap-envelope"
    xmlns:m="http://travelcompany.example.org/reservation"
    env:role="http://www.w3.org/2003/05/soap-envelope/role/next"
    env:mustUnderstand="true">
  <m:reference>uuid:093a2da1-q345-739r-ba5d-pqff98fe8j7d</m:reference>
  <m:dateAndTime>2001-11-29T13:20:00.000-05:00</m:dateAndTime>
</m:reservation>
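The compensation step can be sketched in plain Java as follows. This is not VTD-XML's implementation: compensate() is a hypothetical helper that assumes the caller already knows the namespace declarations in scope from the ancestors (prefix to URI, nearest ancestor winning), and that the fragment's start tag contains no '>' inside an attribute value. Prefixes that the start tag already declares are left alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NsCompensation {
    // Hypothetical helper: adds any in-scope ancestor declarations that the
    // fragment's start tag does not declare itself. The empty prefix ""
    // stands for the default namespace (xmlns="...").
    static String compensate(String fragment, Map<String, String> inScope) {
        int tagEnd = fragment.indexOf('>');
        if (tagEnd < 0) {
            throw new IllegalArgumentException("no start tag found");
        }
        String startTag = fragment.substring(0, tagEnd);

        // Find the end of the element name, e.g. "<m:reservation".
        int nameEnd = 1;
        while (nameEnd < startTag.length()
                && !Character.isWhitespace(startTag.charAt(nameEnd))
                && startTag.charAt(nameEnd) != '/') {
            nameEnd++;
        }

        StringBuilder decls = new StringBuilder();
        for (Map.Entry<String, String> e : inScope.entrySet()) {
            String attr = e.getKey().isEmpty() ? "xmlns" : "xmlns:" + e.getKey();
            // Simplified check: assumes no whitespace around '='.
            if (!startTag.contains(attr + "=")) {
                decls.append(' ').append(attr).append("=\"").append(e.getValue()).append('"');
            }
        }
        return fragment.substring(0, nameEnd) + decls + fragment.substring(nameEnd);
    }

    public static void main(String[] args) {
        String naked = "<m:reservation xmlns:m=\"http://travelcompany.example.org/reservation\""
                + " env:mustUnderstand=\"true\"><m:reference>uuid:093a2da1-q345-739r-ba5d-pqff98fe8j7d"
                + "</m:reference></m:reservation>";
        Map<String, String> inScope = new LinkedHashMap<>();
        inScope.put("env", "http://www.w3.org/2003/05/soap-envelope");
        // Already declared on the fragment itself, so it is skipped.
        inScope.put("m", "http://travelcompany.example.org/reservation");
        System.out.println(compensate(naked, inScope));
    }
}
```

Inserting the missing declarations directly into the start tag, rather than wrapping the fragment in a synthetic parent, keeps the result a single well-formed element, which matches the shape of the compensated fragment above.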

An interesting property is that the ElementFragmentNs instance of a root element is precisely the document itself. Version 2.2 also added two overloaded methods to the XMLModifier class that let you insert an ElementFragmentNs object into the document: insertAfterElement(ElementFragmentNs efn) and insertBeforeElement(ElementFragmentNs efn).
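Conceptually, inserting a fragment after an element is a byte-level splice rather than a tree operation. The minimal sketch below uses a hypothetical insertAfter() helper that is independent of XMLModifier's actual internals. It copies the untouched regions of the document verbatim around the inserted bytes; an insert-before variant would cut at the element's offset instead of at offset plus length.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class FragmentSplice {
    // Hypothetical helper: original bytes up to the end of the target element,
    // then the fragment, then the rest of the original. Nothing is re-serialized.
    static byte[] insertAfter(byte[] doc, int elemOffset, int elemLength, byte[] fragment) {
        int cut = elemOffset + elemLength;
        ByteArrayOutputStream out = new ByteArrayOutputStream(doc.length + fragment.length);
        out.write(doc, 0, cut);
        out.write(fragment, 0, fragment.length);
        out.write(doc, cut, doc.length - cut);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<r><a>1</a><c>3</c></r>";
        byte[] doc = xml.getBytes(StandardCharsets.UTF_8);
        // ASCII-only input, so char indexes equal byte offsets here.
        int offset = xml.indexOf("<a>");
        int length = "<a>1</a>".length();
        byte[] result = insertAfter(doc, offset, length, "<b>2</b>".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(result, StandardCharsets.UTF_8)); // <r><a>1</a><b>2</b><c>3</c></r>
    }
}
```

If the inserted bytes come from a namespace-compensated fragment, they remain well-formed wherever they land, which is what makes this kind of splice safe for namespaced documents.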

"Document-Centric" XML Processing
Figure 1: XML Processing: Object-oriented processing forces object creation, while document-centric XML processing does not.
Traditional XML processing models (such as DOM, SAX and JAXB) were designed around the notion of objects. The XML text, treated as a mere form of object serialization, was relegated to second-class status. You base your applications on DOM nodes, strings, and various business objects, but rarely on the physical documents. However, this object-oriented approach to XML processing often makes little sense, because it causes performance hits from virtually all directions. Not only are object creation and garbage collection inherently memory- and CPU-intensive, but applications also incur the cost of re-serialization for even the smallest changes to the original text (see Figure 1).

In contrast, VTD-XML's non-extractive parsing starts from the XML itself, the persistent data format. Whether you're parsing, performing XPath queries, modifying content, or slicing element fragments, you no longer work with objects by default. Instead, you create and work with objects only when it makes sense to do so. More often than not, you can treat documents purely as syntax, and think in bytes, byte arrays, integers, offsets, lengths, fragments, and namespace-compensated fragments. The first-class citizen in this paradigm is the XML text itself; object-centric notions of XML processing, such as serialization and deserialization (or marshalling and unmarshalling), are often displaced, if not replaced, by the document-centric notions of parsing and composition (see Figure 1). When you approach XML programming this way, you'll find that it gets simpler. And, not surprisingly, the simpler, more intuitive way to think about XML processing is also the most efficient and powerful.
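To make "thinking in offsets and lengths" concrete, here is a toy non-extractive scanner. It is a hypothetical illustration, not VTD-XML's parser: it records where each start-tag name lives as (offset, length) int pairs and creates String objects only when a caller asks for one. VTD-XML itself stores its tokens as compact 64-bit records; the int pairs here are a simplification, and the scanner assumes well-formed input without comments or CDATA sections.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class NonExtractiveScan {
    static boolean endsName(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r' || b == '>' || b == '/';
    }

    // Returns [offset0, length0, offset1, length1, ...] for every start-tag name.
    // No Strings are created while scanning.
    static int[] startTagNames(byte[] doc) {
        int[] pairs = new int[16];
        int count = 0;
        for (int i = 0; i < doc.length - 1; i++) {
            byte next = doc[i + 1];
            // Skip end tags, the XML declaration/PIs and <!DOCTYPE ...>.
            if (doc[i] == '<' && next != '/' && next != '?' && next != '!') {
                int start = i + 1;
                int end = start;
                while (end < doc.length && !endsName(doc[end])) {
                    end++;
                }
                if (count + 2 > pairs.length) {
                    pairs = Arrays.copyOf(pairs, pairs.length * 2);
                }
                pairs[count++] = start;
                pairs[count++] = end - start;
                i = end - 1; // resume scanning right after the name
            }
        }
        return Arrays.copyOf(pairs, count);
    }

    // Materializes the k-th name as a String, only on demand.
    static String nameAt(byte[] doc, int[] pairs, int k) {
        return new String(doc, pairs[2 * k], pairs[2 * k + 1], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] doc = "<r><a x=\"1\"/><b>t</b></r>".getBytes(StandardCharsets.UTF_8);
        int[] pairs = startTagNames(doc);
        for (int k = 0; k < pairs.length / 2; k++) {
            System.out.println(nameAt(doc, pairs, k)); // r, then a, then b
        }
    }
}
```

The document bytes stay untouched throughout, so any later modification or fragment extraction can work directly against the same offsets.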
