XML Parsers: DOM and SAX Put to the Test: Page 4

Before making the important decision to purchase an XML parser, look at the results of Steve Franklin's test of a selection of both DOM- and SAX-based parsers.





What Have We Learned?
From these results, one can draw some initial conclusions. First, the results vary considerably across parsers even though the test code was nearly identical; only minimal changes were made to comply with the specific interface of each parser. XP clearly accomplishes one of its goals: high performance. However, part of its speed may be explained by missing features such as DTD validation, which adds overhead in the other parsers.

SAX clearly beats DOM for run-time parsing, although its lack of an internal document representation will cause difficulties for developers in certain situations. The performance gap is most apparent when the document gets very large. Although these tests do not show it, SAX parsers typically pull further ahead on very large documents, where the DOM tree spills into virtual memory or consumes all available memory.
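To make the contrast concrete, here is a minimal sketch that counts elements both ways using the standard JAXP entry points. The class name `SaxVsDomSketch`, its method names, and the sample document are illustrative assumptions, not code from the tests described in this article.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only.
class SaxVsDomSketch {

    // SAX: the parser pushes events to a handler as it reads.
    // Nothing is retained unless the handler chooses to keep it.
    static int countWithSax(String xml, String elementName) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(toStream(xml), handler);
        return count[0];
    }

    // DOM: the whole document is built in memory first, then queried.
    static int countWithDom(String xml, String elementName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(toStream(xml));
        return doc.getElementsByTagName(elementName).getLength();
    }

    static InputStream toStream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/><note>rush</note></orders>";
        System.out.println(countWithSax(xml, "order")); // prints 2
        System.out.println(countWithDom(xml, "order")); // prints 2
    }
}
```

Both methods return the same answer; the difference is that the DOM version holds the entire tree in memory while it answers, which is where the cost shows up on very large inputs.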

These tests also seem to indicate that Sun's parser was much more efficient during document construction than during the read-and-parse stage. Although Sun excelled in Tests #4 and #5, it came in last in Tests #1 and #2.

I have no doubt that some tweaking of each parser's default behavior could improve its results. That would be the next phase in your evaluation: test the parsers against project-specific requirements such as XSL transformations and document sizes. Even the attribute types and the complexity and nesting of elements can affect the parsers differently. Some parsers are more efficient with heavy whitespace, while attribute-rich elements bog down others.
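As one example of what such tweaking might look like, the sketch below toggles two standard `DocumentBuilderFactory` settings and reports how many child nodes the root element ends up with. The class name `ParserTuningSketch`, the method name, and the sample document with its inline DTD are assumptions made for illustration. Dropping element-content whitespace only takes effect when the parser is validating, because the parser needs the DTD to know which whitespace is ignorable.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class for illustration only.
class ParserTuningSketch {

    // A small, valid document with an internal DTD and indentation whitespace.
    static final String XML =
        "<!DOCTYPE a [<!ELEMENT a (b*)><!ELEMENT b EMPTY>]>\n"
        + "<a>\n  <b/>\n  <b/>\n</a>";

    // Parses the document with the given settings and returns the number
    // of direct child nodes (elements and text) under the root element.
    static int rootChildCount(String xml, boolean validating,
                              boolean dropWhitespace) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        factory.setIgnoringElementContentWhitespace(dropWhitespace);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("defaults: " + rootChildCount(XML, false, false));
        System.out.println("tuned:    " + rootChildCount(XML, true, true));
    }
}
```

With the defaults, the whitespace between elements is kept as text nodes; with validation on and whitespace dropping enabled, only the element children remain. Whether the smaller tree outweighs the cost of validation is exactly the kind of question to measure against your own documents.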

Know Your Needs
If you need to parse and process huge XML documents, SAX implementations offer clear benefits over DOM-based ones. Also ask yourself whether an improved design would remove the need for such large XML documents; perhaps pre-filtering in a database that can stream XML would suit your needs. By going with SAX, you may restrict your options for document manipulation and XSLT, and you may require your team to write code to manage, store, and rewrite the document internally. SAX is best suited to sequential-scan applications, in which you want to go through the XML document quickly from start to finish. However, sometimes you won't need the overhead of a full-blown DOM, and a SAX parser will be sufficient for creating a lightweight and compact internal data structure.
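A minimal sketch of that last idea follows: a SAX handler that keeps only the data it needs, in a plain map, instead of a full document tree. The `item` element, its `sku` and `qty` attributes, and the class name `SaxToMapSketch` are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only.
class SaxToMapSketch {

    // Scans the document once and totals the qty attribute per sku.
    // Only the map is retained; no document tree is built.
    static Map<String, Integer> quantitiesBySku(String xml) throws Exception {
        final Map<String, Integer> totals = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("item".equals(qName)) {
                    String sku = attrs.getValue("sku");
                    int qty = Integer.parseInt(attrs.getValue("qty"));
                    totals.merge(sku, qty, Integer::sum);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return totals;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<cart>"
                + "<item sku=\"A1\" qty=\"2\"/>"
                + "<item sku=\"B7\" qty=\"1\"/>"
                + "<item sku=\"A1\" qty=\"3\"/>"
                + "</cart>";
        System.out.println(quantitiesBySku(xml)); // prints {A1=5, B7=1}
    }
}
```

The trade-off is the one described above: the handler is fast and compact, but any later need to navigate or rewrite the document has to be coded by hand.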

At the same time, DOM has great advantages, including its simplicity, powerful access to the document, popularity, and well-defined specification. It also pairs nicely with XSLT and other document-transformation solutions you may require. DOM implementations are currently biased toward in-memory storage of the document, but this may change as persistent DOM (PDOM) implementations become more popular. Programming DOM code becomes even easier with a JDOM wrapper for Java, which encapsulates SAX/DOM manipulation behind a much simpler interface.

A large number of parser options are available. Picking the right one can be tricky, but a few tests will help to point you in the right direction. The JAXP pluggable XML parser framework could make it much easier for you to swap and evaluate XML parsers without significantly breaking your code. Also, using newsgroups to gauge other developers' feedback can save you some time. I can't recommend a specific parser as the right tool because I don't know your situation. The one for you depends on the needs of your application.
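The sketch below shows one way JAXP can keep the parser choice out of your application code while you run your own timings. The class name `JaxpSwapSketch` and its methods are assumptions for illustration, and this is not the harness used for the tests in this article. Passing `null` uses whatever implementation JAXP finds by default; passing a factory class name (for a parser whose jar is on your classpath) selects that implementation instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only.
class JaxpSwapSketch {

    // Returns the default JAXP factory, or the named vendor factory if
    // a class name is supplied. The calling code never changes.
    static SAXParserFactory factoryFor(String factoryClassName) {
        return factoryClassName == null
                ? SAXParserFactory.newInstance()
                : SAXParserFactory.newInstance(factoryClassName, null);
    }

    // Parses the document `runs` times and returns
    // { total elements seen, elapsed nanoseconds }.
    static long[] parseAndTime(String factoryClassName, String xml, int runs)
            throws Exception {
        SAXParserFactory factory = factoryFor(factoryClassName);
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        final long[] elements = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                elements[0]++;
            }
        };
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            factory.newSAXParser().parse(new ByteArrayInputStream(bytes), handler);
        }
        long elapsed = System.nanoTime() - start;
        return new long[] { elements[0], elapsed };
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><a/><b/></root>";
        // Optional first argument: a vendor SAXParserFactory class name.
        String factoryClassName = args.length > 0 ? args[0] : null;
        long[] result = parseAndTime(factoryClassName, xml, 1000);
        System.out.println("elements: " + result[0] + ", nanos: " + result[1]);
    }
}
```

The same selection can also be made without touching code by setting the `javax.xml.parsers.SAXParserFactory` system property on the command line. For meaningful numbers, use documents representative of your project and discard the first few runs, since JVM warm-up can distort short timings.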

Steve Franklin handles the architecture and project engineering responsibilities at a major software firm dealing with J2EE, client/server, command and control, and other distributed architectures. Steve Franklin's primary "off-hours" hobby can be found at Lookoff.com, a repository for Internet and research resources.