SAX vs. DOM
At present, two major API specifications define how XML parsers work: SAX and DOM. The DOM specification defines a tree-based approach to navigating an XML document. In other words, a DOM parser processes XML data and creates an object-oriented hierarchical representation of the document that you can navigate at run-time.
The SAX specification defines an event-based approach whereby parsers scan through XML data, calling handler functions whenever certain parts of the document (e.g., text nodes or processing instructions) are found.
In SAX's event-based system, the parser doesn't create any internal representation of the document. Instead, the parser calls handler functions when certain events (defined by the SAX specification) take place. These events include the start and end of the document, finding a text node, finding child elements, and hitting a malformed element.
SAX development is more challenging, because the API requires development of callback functions that handle the events. The design itself also can sometimes be less intuitive and modular. Using a SAX parser may require you to store information in your own internal document representation if you need to rescan or analyze the informationSAX provides no container for the document like the DOM tree structure.
Is having two completely different ways to parse XML data a problem? Not really, both parsers have very different approaches for processing the information. The W3C DOM specification provides a very rich and intuitive structure for housing the XML data, but can be quite resource-intensive given that the entire XML document is typically stored in memory. You can manipulate the DOM at run-time and stream the updated data as XML, or transform it to your own format if you require.
The strength of the SAX specification is that it can scan and parse gigabytes worth of XML documents without hitting resource limits, because it does not try to create the DOM representation in memory. Instead, it raises events that you can handle as you see fit. Because of this design, the SAX implementation is generally faster and requires fewer resources. On the other hand, SAX code is frequently complex, and the lack of a document representation leaves you with the challenge of manipulating, serializing, and traversing the XML document.