Using XML in Java Gets Easier with DOM4J

If you have worked with XML in Java applications during the past few years, you know the pain of parsing and extracting XML data inside the application. The process required writing lots of cumbersome code to retrieve each element from JAXB objects. More importantly, how the application parsed incoming XML was entirely a mystery; many times, my application simply crashed while parsing large XML documents.

I always hoped for an application to make parsing and retrieving XML data simpler and easier, and with the 2004 releases of DOM4J and JDOM, my day finally had come. While both solutions were developed for the same purpose, DOM4J provides more features, such as a provision for parsing large XML documents, memory-efficient parsing, and a variety of utility classes for XML-based enterprise application development.

DOM4J is the product I’d been waiting for. All enterprise Java developers are sure to love it. DOM4J is built on the latest trend of universal tool platforms: an open, extensible tool built for anything and everything.

The DOM4J XML framework’s main features include:

  • A plug-in for any parser (SAX or DOM)
  • Navigation of XML documents with the Java2 collections framework
  • Parsing large XML documents with little memory overhead

Besides these, it provides support for the following:

  • XPath integration
  • XSLT integration
  • Pretty printing XML
  • Functionality for comparing nodes

Since DOM4J supports the Java2 collections framework, it provides developers the flexibility to use a variety of utility classes to cater to the performance requirements of their applications. For instance, some may use a LinkedList rather than an ArrayList because its usage characteristics perform better in their scenarios. Others may use a Vector because it is synchronized.

This article demonstrates flexible, high-performance, and memory-efficient implementations of DOM4J for XML parsing and data navigation. It also provides detailed examples of important DOM4J features that address the main hardships of XML-based enterprise application development.

First Things First: Load and Parse an XML Document

By default, DOM4J comes configured with its own SAX parser, but you can reconfigure it to use your own SAX parser. For most applications, the SAX parser that DOM4J provides should be enough.

After loading an XML document, you can retrieve all the element tag names and values with just a few lines of code. The file LoadAndParse.java contains the code to load the document; once it is loaded, the following loop prints the name and value of each child element of the root:

Element root = document.getRootElement();
for ( Iterator i = root.elementIterator(); i.hasNext(); ) {
    Element element = (Element) i.next();
    System.out.println("Element Name:" + element.getQualifiedName());
    System.out.println("Element Value:" + element.getText());
}
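For readers who do not have LoadAndParse.java at hand, here is a minimal, self-contained sketch of the load-and-iterate step. It assumes DOM4J is on the classpath and parses an in-memory string with SAXReader instead of reading test.xml; the class name, the childEntries() helper, and the sample element names are hypothetical and are not taken from the article's download.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class LoadAndIterateSketch {

    /** Parses the XML text and returns "name=value" for each child of the root element. */
    public static List<String> childEntries(String xml) {
        Document document;
        try {
            // SAXReader uses DOM4J's default SAX parser configuration.
            document = new SAXReader().read(new StringReader(xml));
        } catch (DocumentException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }

        Element root = document.getRootElement();
        List<String> entries = new ArrayList<>();
        // Iterator<?> plus a cast compiles against both raw and generified DOM4J APIs.
        for (Iterator<?> i = root.elementIterator(); i.hasNext();) {
            Element element = (Element) i.next();
            entries.add(element.getQualifiedName() + "=" + element.getText());
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not the article's test.xml.
        String xml = "<person><name>Ann</name><birthdayMonth>5</birthdayMonth></person>";
        for (String entry : childEntries(xml)) {
            System.out.println(entry);
        }
    }
}
```

The loop never names an individual element, so the same code handles a document with three child tags or three hundred.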

JAXB, by contrast, generates a class for each complex type, with getter and setter accessors for each element tag. For the test.xml example document, you would have to call the methods getBirthdayMonth(), getBirthdayDay(), and getBirthdayYear() to retrieve the element values, and if your XML has 100 element tags, your code has to invoke all 100 methods. You can still parse the incoming XML with JAXB classes for XSD validation and then build your DOM4J document for navigation. This XML data-iteration feature alone is reason enough to use DOM4J in your next application.

Plug in Any Parser

DOM4J’s plug-and-play feature allows you to use any parser you like. The LoadWithDOM.java example file loads the same test.xml document from the previous section using javax DOMParser. After parsing the XML, you can convert the org.w3c.dom.Document tree into DOM4J’s org.dom4j.Document tree using the DOM4J DOMReader class:

// doc is the org.w3c.dom.Document produced by the DOM parser
DOMReader reader = new DOMReader();
org.dom4j.Document document = reader.read(doc);

That’s it! Now you can use DOM4J’s powerful navigation features to navigate the DOM tree.
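The conversion can be sketched end to end as follows. This is an illustration under stated assumptions rather than the contents of LoadWithDOM.java: it uses the JDK's DocumentBuilderFactory as the DOM parser, parses an in-memory string, and the class name and convert() helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.dom4j.io.DOMReader;

public class DomToDom4jSketch {

    /** Parses XML with the JDK DOM parser, then converts the tree to a DOM4J document. */
    public static org.dom4j.Document convert(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness lets the DOM nodes report local names and namespace URIs.
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            org.w3c.dom.Document w3cDoc =
                    builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // DOMReader walks the W3C tree and builds the equivalent DOM4J tree.
            return new DOMReader().read(w3cDoc);
        } catch (Exception e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not the article's test.xml.
        org.dom4j.Document document = convert("<person><name>Ann</name></person>");
        System.out.println(document.getRootElement().elementText("name"));
    }
}
```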

Parse Extremely Large XML Documents

The quality of any XML parser is measured by its capability to parse large XML documents using minimal system resources. DOM4J is designed to achieve this ideal, and it provides a way to programmatically parse large XML documents.

DOM4J’s event-based model allows developers to prune the XML tree when parts of the document have been successfully processed, which eliminates the need to keep the entire document in memory.

DOM4J provides features for registering an event handler for one or more path expressions. DOM4J calls these handlers at the start and end of each path registered against a particular handler. When it finds the start tag of a path, it calls the onStart() method of the handler registered to the path. When DOM4J finds the end tag of a path, it calls the onEnd() method of the handler registered to that path. The DOM4J Element class provides the detach() method, which detaches the current element, thus pruning it from memory.

The file ParseLargeXML.java provides an example for processing a 14MB XML file. I was amazed to find that my application needed only 9MB of memory to parse the entire file. And it was extremely fast all along.
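The handler mechanism can be sketched on a tiny in-memory document. This is not the code from ParseLargeXML.java: the /people/person path, the element names, the Result holder, and the parse() helper are all hypothetical, chosen only to show onEnd() processing an element and then detaching it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

public class PruneWhileParsingSketch {

    /** Holds what was collected during parsing and what is left in the tree afterwards. */
    public static final class Result {
        public final List<String> names;
        public final int remainingPersons;

        Result(List<String> names, int remainingPersons) {
            this.names = names;
            this.remainingPersons = remainingPersons;
        }
    }

    public static Result parse(String xml) {
        final List<String> names = new ArrayList<>();
        SAXReader reader = new SAXReader();

        // The handler fires for every element matching the registered path.
        reader.addHandler("/people/person", new ElementHandler() {
            @Override
            public void onStart(ElementPath path) {
                // Nothing to do at the start tag in this sketch.
            }

            @Override
            public void onEnd(ElementPath path) {
                Element person = path.getCurrent();
                names.add(person.elementText("name"));
                // Prune the processed element so it does not accumulate in memory.
                person.detach();
            }
        });

        try {
            Document document = reader.read(new StringReader(xml));
            int remaining = document.getRootElement().elements("person").size();
            return new Result(names, remaining);
        } catch (DocumentException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        Result result = parse("<people><person><name>Ann</name></person>"
                + "<person><name>Bob</name></person></people>");
        System.out.println(result.names + ", persons left in tree: " + result.remainingPersons);
    }
}
```

Because each person element is detached as soon as its end tag is seen, the tree that remains after parsing holds only the root, which is what keeps memory use low on large inputs.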

The DOM4J util Package

DOM4J’s util package provides many utility classes for comparing nodes, reporting parsing errors, creating a Singleton Document object, etc. This section highlights some of the most useful.

Comparing multiple XML documents is a very common task in any service-oriented architecture (SOA) application. Done by hand, it requires a lot of if statements to compare each element's data. DOM4J addresses this need with the NodeComparator utility class, which compares two nodes (attributes, elements, documents, etc.) for equality.

Compare2Docs.java loads the test.xml and test1.xml documents, which contain the same XML structure and data, parses them into DOM4J documents, and compares them using NodeComparator. In this case, since both XML documents are the same, it prints the equality message:

NodeComparator comparator = new NodeComparator();
if ( comparator.compare( d1, d2 ) == 0 ) {
    System.out.println("Both documents are same.");
}
else {
    System.out.println("Both documents are different.");
}

Should you modify the data in one of the XML documents, you will see the inequality message. Had DOM4J come a bit earlier, I would have saved myself many late nights spent writing the cumbersome code to compare two XML documents.
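A self-contained version of this comparison might look like the sketch below. It is an illustration, not Compare2Docs.java: it parses two in-memory strings rather than test.xml and test1.xml, and the class name and sameContent() helper are hypothetical.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.util.NodeComparator;

public class CompareDocsSketch {

    /** Returns true when NodeComparator reports the two parsed documents as equal. */
    public static boolean sameContent(String xml1, String xml2) {
        try {
            Document d1 = DocumentHelper.parseText(xml1);
            Document d2 = DocumentHelper.parseText(xml2);
            // compare() returns 0 for equal nodes, like any Comparator.
            return new NodeComparator().compare(d1, d2) == 0;
        } catch (DocumentException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        String original = "<person><name>Ann</name></person>";
        String modified = "<person><name>Bob</name></person>";
        System.out.println("identical input: " + sameContent(original, original));
        System.out.println("modified input:  " + sameContent(original, modified));
    }
}
```

Note that the comparison includes text nodes, so documents that differ only in whitespace between tags may not compare as equal.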

The DOM4J XMLErrorHandler utility class provides an XML representation of the errors that can occur during XML parsing. This is a very elegant way of reporting invalid XML documents. To retrieve the SAX parsing errors, you need to set the error handler on the SAXReader:

reader.setErrorHandler(errorHandler);

When an error occurs, you can retrieve the collected errors with the following call:

Element root = ((XMLErrorHandler)reader.getErrorHandler()).getErrors();

ErrorDemo.java contains code to demonstrate this DOM4J feature.
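The two calls above fit together roughly as in the following sketch. It is not the code from ErrorDemo.java: it assumes a validating SAXReader and a hypothetical document with an inline DTD, so that validation problems are reported to the handler; the class name and countErrors() helper are also hypothetical.

```java
import java.io.StringReader;

import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;
import org.dom4j.util.XMLErrorHandler;

public class XmlErrorsSketch {

    // Hypothetical DTD: a <note> must contain exactly one <to> child.
    static final String DTD =
            "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";

    /** Parses with validation on and returns how many problems the handler collected. */
    public static int countErrors(String xml) {
        SAXReader reader = new SAXReader(true); // true turns on DTD validation
        XMLErrorHandler errorHandler = new XMLErrorHandler();
        reader.setErrorHandler(errorHandler);

        try {
            reader.read(new StringReader(xml));
        } catch (DocumentException e) {
            // Input that is not well-formed aborts the parse; whatever the
            // handler recorded before that point is still available below.
        }

        // Each reported problem becomes a child of the <errors> element.
        Element errors = errorHandler.getErrors();
        return errors.elements().size();
    }

    public static void main(String[] args) {
        String valid = DTD + "<note><to>Ann</to></note>";
        String invalid = DTD + "<note><from>Bob</from></note>";
        System.out.println("valid document problems:   " + countErrors(valid));
        System.out.println("invalid document problems: " + countErrors(invalid));
    }
}
```

Calling asXML() on the element returned by getErrors() yields the report as XML text, which is convenient for logging or for returning to the caller of a service.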

DOM4J’s SimpleSingleton utility class provides common factory access for the same object instance. This implementation creates a new instance from the class specified (Document) and does not create a new one unless it is reset. This is a very useful feature for building a single Document object across different application modules.

DOM4J to the Rescue

XML technology is perfect for developing integrated applications. However, parsing and retrieving XML data in your Java application requires thousands of lines of simple but cumbersome code. Enter DOM4J, and not a moment too soon. You can look forward to numerous lightweight, high-performance enterprise Java applications when you use DOM4J.
