First Things First: Load and Parse an XML Document
By default, DOM4J comes configured with its own SAX parser, but you can reconfigure it to use a different SAX parser. For most applications, the SAX parser that DOM4J provides should be enough.
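As a rough sketch of what swapping in your own SAX parser can look like, the code below asks JAXP for an XMLReader and hands it to DOM4J's SAXReader. The class name CustomParserSketch, the method name, and the sample XML are made up for illustration and are not taken from the article's example files; the sketch assumes the DOM4J jar is on the classpath.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.dom4j.Document;
import org.dom4j.io.SAXReader;
import org.xml.sax.XMLReader;

class CustomParserSketch {

    // Parses the given XML text with a caller-supplied SAX parser
    // instead of the one DOM4J would pick by default.
    static Document parseWithJaxpReader(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xmlReader = factory.newSAXParser().getXMLReader();

        // SAXReader accepts any org.xml.sax.XMLReader implementation.
        SAXReader reader = new SAXReader(xmlReader);
        return reader.read(new StringReader(xml));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input.
        Document document = parseWithJaxpReader("<person><name>Ann</name></person>");
        System.out.println(document.getRootElement().getName());
    }
}
```

Any parser that exposes an org.xml.sax.XMLReader should be usable the same way; only the three lines that create xmlReader would change.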
After loading an XML document, you can retrieve all the element tag names and values with just a few lines of code. The file LoadAndParse.java contains the code to load the document; the iteration over the root's child elements looks like this:
Element root = document.getRootElement();
// Visit each child element of the root and print its tag name
for ( Iterator i = root.elementIterator(); i.hasNext(); ) {
    Element element = (Element) i.next();
    System.out.println("Element Name:"+element.getQualifiedName() );
}
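For a more complete picture, here is a minimal, self-contained sketch of the load-and-iterate idea. It is not the article's LoadAndParse.java: the class name LoadAndParseSketch and the sample XML are assumptions made for illustration, since the real test.xml may be structured differently. It reads the XML from a string so that it runs without any files, and it assumes the DOM4J jar is on the classpath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

class LoadAndParseSketch {

    // Hypothetical sample data; the real test.xml may look different.
    static final String SAMPLE_XML =
        "<person><name>Ann</name><birthdayYear>1980</birthdayYear></person>";

    // Returns one "tagName=value" entry per child element of the root.
    static List<String> namesAndValues(String xml) throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = reader.read(new StringReader(xml));
        Element root = document.getRootElement();

        List<String> result = new ArrayList<String>();
        // The cast keeps the loop compatible with DOM4J versions whose
        // elementIterator() returns a raw Iterator.
        for (Iterator<?> i = root.elementIterator(); i.hasNext(); ) {
            Element element = (Element) i.next();
            result.add(element.getQualifiedName() + "=" + element.getText());
        }
        return result;
    }

    public static void main(String[] args) throws DocumentException {
        for (String line : namesAndValues(SAMPLE_XML)) {
            System.out.println(line);
        }
    }
}
```

The loop never mentions a tag name, so the same code works no matter how many child elements the root has.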
JAXB generates classes for each complex type, with getter and setter accessors for each element tag. In the test.xml example document, you have to call methods such as getBirthdayYear() to retrieve the element values. You can parse the incoming XML with JAXB classes for XSD validation and then build your DOM4J document. When using JAXB objects, if your XML has 100 element tags, your code has to invoke all 100 getter methods, whereas a single DOM4J iterator loop visits every element regardless of how many there are. This XML data-iteration feature alone is reason enough to use DOM4J in your next application.
Plug in Any Parser
DOM4J's plug-and-play feature allows you to use any parser you like. The LoadWithDOM.java example file loads the same test.xml document from the previous section using javax DOMParser. After parsing the XML, you can convert the org.w3c.dom.Document tree into DOM4J's org.dom4j.Document tree using the DOM4J DOMReader class:
DOMReader reader = new DOMReader();
// doc is the org.w3c.dom.Document produced by the DOM parser
org.dom4j.Document document = reader.read(doc);
That's it! Now you can use DOM4J's powerful navigation features to navigate the DOM tree.
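To make the conversion step concrete, the following sketch builds a W3C DOM tree with the JDK's JAXP DocumentBuilder and then converts it with DOMReader. It is not the article's LoadWithDOM.java: the class name LoadWithDomSketch, the choice of DocumentBuilder as the DOM parser, and the sample XML are assumptions made for illustration. The DOM4J jar is assumed to be on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.dom4j.io.DOMReader;

class LoadWithDomSketch {

    // Parses XML text into a W3C DOM tree, then converts it to a DOM4J tree.
    static org.dom4j.Document toDom4j(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Step 1: any DOM parser works here, as long as it yields
        // an org.w3c.dom.Document.
        org.w3c.dom.Document w3cDocument = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Step 2: convert the W3C tree into a DOM4J tree.
        DOMReader reader = new DOMReader();
        return reader.read(w3cDocument);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input.
        org.dom4j.Document document =
            toDom4j("<person><name>Ann</name></person>");
        System.out.println(document.getRootElement().elementText("name"));
    }
}
```

The two Document types share a simple name, so the sketch spells out the package for both rather than importing either one.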
Parse Extremely Large XML Documents
One practical measure of an XML parser's quality is how well it parses large XML documents while using minimal system resources. DOM4J is designed with this goal in mind, and it provides a way to programmatically parse large XML documents.
DOM4J's event-based model allows developers to prune the XML tree when parts of the document have been successfully processed, which eliminates the need to keep the entire document in memory.
DOM4J provides features for registering an event handler for one or more path expressions. DOM4J calls these handlers at the start and end of each path registered against a particular handler. When it finds the start tag of a path, it calls the onStart() method of the handler registered to the path. When DOM4J finds the end tag of a path, it calls the onEnd() method of the handler registered to that path. The DOM4J Element class provides the detach() method, which detaches the current element, thus pruning it from memory.
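The handler mechanism described above can be sketched roughly as follows. This is not the article's ParseLargeXML.java: the class name PruningSketch, the path /records/record, and the tiny sample document are assumptions made for illustration, and a real large file would be read from disk rather than from a string. The DOM4J jar is assumed to be on the classpath.

```java
import java.io.StringReader;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

class PruningSketch {

    // Returns { records processed, child elements left under the root }.
    static int[] processAndPrune(String xml) throws DocumentException {
        final int[] processed = {0};

        SAXReader reader = new SAXReader();
        // Hypothetical path: call the handler for every <record> under <records>.
        reader.addHandler("/records/record", new ElementHandler() {
            public void onStart(ElementPath path) {
                // Nothing to do when the start tag is seen.
            }

            public void onEnd(ElementPath path) {
                // The element is complete here; process it, then prune it.
                Element record = path.getCurrent();
                processed[0]++;
                record.detach();
            }
        });

        Document document = reader.read(new StringReader(xml));
        int remaining = document.getRootElement().elements().size();
        return new int[] { processed[0], remaining };
    }

    public static void main(String[] args) throws DocumentException {
        // Hypothetical sample input standing in for a much larger file.
        String xml = "<records><record id=\"1\"/><record id=\"2\"/>"
                   + "<record id=\"3\"/></records>";
        int[] result = processAndPrune(xml);
        System.out.println("processed=" + result[0] + ", remaining=" + result[1]);
    }
}
```

Because each record is detached as soon as its end tag is reached, the tree held in memory should never contain more than the record currently being built, which is what keeps memory use low on large inputs.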
The file ParseLargeXML.java provides an example that processes a 14MB XML file. I was amazed to find that my application needed only 9MB of memory to parse the entire file, and it stayed extremely fast throughout.