RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


XML Parsers: DOM and SAX Put to the Test  : Page 2

Before making the important decision to purchase an XML parser, look at the results of Steve Franklin's test of a selection of both DOM- and SAX-based parsers.


At present, two major API specifications define how XML parsers work: SAX and DOM. The DOM specification defines a tree-based approach to navigating an XML document. In other words, a DOM parser processes XML data and creates an object-oriented hierarchical representation of the document that you can navigate at run-time.

The SAX specification defines an event-based approach whereby parsers scan through XML data, calling handler functions whenever certain parts of the document (e.g., text nodes or processing instructions) are found.

How do the tree-based and event-based APIs differ? The tree-based W3C DOM parser creates an internal tree based on the hierarchical structure of the XML data. You can navigate and manipulate this tree from your software, and it stays in memory until you release it. DOM uses functions that return parent and child nodes, giving you full access to the XML data and providing the ability to interrogate and manipulate these nodes. DOM manipulation is straightforward and the API does not take long to understand, particularly if you have some JavaScript DOM experience.

In SAX's event-based system, the parser doesn't create any internal representation of the document. Instead, the parser calls handler functions when certain events (defined by the SAX specification) take place. These events include the start and end of the document, finding a text node, finding child elements, and hitting a malformed element.

SAX development is more challenging, because the API requires development of callback functions that handle the events. The design itself also can sometimes be less intuitive and modular. Using a SAX parser may require you to store information in your own internal document representation if you need to rescan or analyze the information—SAX provides no container for the document like the DOM tree structure.

Is having two completely different ways to parse XML data a problem? Not really, both parsers have very different approaches for processing the information. The W3C DOM specification provides a very rich and intuitive structure for housing the XML data, but can be quite resource-intensive given that the entire XML document is typically stored in memory. You can manipulate the DOM at run-time and stream the updated data as XML, or transform it to your own format if you require.

The strength of the SAX specification is that it can scan and parse gigabytes worth of XML documents without hitting resource limits, because it does not try to create the DOM representation in memory. Instead, it raises events that you can handle as you see fit. Because of this design, the SAX implementation is generally faster and requires fewer resources. On the other hand, SAX code is frequently complex, and the lack of a document representation leaves you with the challenge of manipulating, serializing, and traversing the XML document.

Close Icon
Thanks for your registration, follow us on our social networks to keep up-to-date