

XML Parsers: DOM and SAX Put to the Test: Page 3

Before making the important decision to purchase an XML parser, look at the results of Steve Franklin's test of a selection of both DOM- and SAX-based parsers.





Putting the Parser to the Test
To determine the right parser for you, rank functionality, speed, memory requirements, and class footprint size by importance. A few types of tests can help you evaluate each parser, although the results of some depend on the specific nature and design of your software. These tests include parsing large and small XML documents, traversing and navigating the processed DOM, constructing a DOM from scratch, and evaluating the resource requirements of the parser.

You can tell quite a bit about a parser by using one or two simple XML documents. If your software will have to deal with many small files, check whether the parser has initialization overhead that slows down repeated parsing. For very large files, confirm that the parser can process the file in acceptable time with reasonable resource requirements. Very large XML documents may require a SAX parser, which does not store the document in memory. You might also consider reading in parts of the document (using an appropriate DTD that allows for a partial document) and manipulating the document fragments in memory, one at a time.
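
As a rough illustration of why SAX suits very large files, the sketch below counts elements as they stream past and keeps nothing else in memory. It uses the standard JAXP `SAXParserFactory` rather than any of the specific parsers tested here, and the class name `SaxElementCounter` and the sample XML are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the article's XmlTest application.
class SaxElementCounter {

    /** Counts start tags as they are reported; no document tree is built. */
    static class CountingHandler extends DefaultHandler {
        long elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            elements++;
        }
    }

    /** Parses the stream with whatever SAX parser JAXP selects and returns the element count. */
    static long countElements(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(in, handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in document; a real test would stream a large file instead.
        String xml = "<orders><order id=\"1\"><item/><item/></order></orders>";
        long n = countElements(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("elements seen: " + n);
    }
}
```

Because the handler holds only a counter, memory use stays roughly flat no matter how large the input is. Anything the application needs to remember must be captured explicitly in the callbacks.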

In addition, new DOM parsing solutions may handle massive XML documents more effectively. Remember that the DOM API specifies only how to interact with the document, not how it must be stored. Persistent DOM (PDOM) implementations with index-based searches and retrieval are in the works, but I have not yet tested any of these.

You should also evaluate how well the parser traverses an in-memory DOM after XML data has been parsed. If you require the ability to search or scan through a post-parsed DOM using the API, you can rule out SAX—unless you are willing to create your own document model from your callback functions. For W3C DOM-compliant parsers, test the speed of scanning through the constructed DOM to see how expensive traversal of the tree can be.
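
A traversal test of this kind might look like the following sketch. It walks a parsed tree using only W3C DOM calls (`getFirstChild`, `getNextSibling`) and times the walk. The class name `DomWalk` and the sample document are assumptions for illustration, and the parser is whichever one JAXP's `DocumentBuilderFactory` picks, not necessarily one of those benchmarked here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class DomWalk {

    /** Visits start and every node beneath it; returns how many are element nodes. */
    static int countElements(Node start) {
        int count = (start.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
        for (Node child = start.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    /** Parses an XML string into a DOM with the default JAXP parser. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<catalog><book><title>A</title></book><book/></catalog>");
        long start = System.nanoTime();
        int elements = countElements(doc.getDocumentElement());
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(elements + " elements visited in " + elapsedMicros + " us");
    }
}
```

The recursion depth equals the depth of the document, so an unusually deep tree would call for an explicit stack instead.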

Some XML parsers come with a serialization feature and are able to convert a document tree to XML data. This capability is not in all parsers, but the performance of parsers that support this ability is often proportional to the time required to navigate a given document tree using the API. Again, because SAX does not support an internal representation of the document, you would have to provide your own document and serialization functionality.
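
The article does not name the serialization APIs of the parsers it discusses. As a stand-in, the sketch below writes a DOM tree back out as XML using the identity transform from the standard `javax.xml.transform` package. The class name `DomSerialize` is invented for the example, and this is not claimed to be the mechanism any of the tested parsers used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class DomSerialize {

    /** Serializes the tree by running it through an identity (no stylesheet) transform. */
    static String toXml(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Parses an XML string into a DOM with the default JAXP parser. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<note><to>Ann</to></note>");
        System.out.println(toXml(doc));
    }
}
```

Serializing has to visit every node, which is consistent with the observation above that its cost tends to track the cost of navigating the tree.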

Parsing Benchmarks
The available XML parsers vary in performance. Raw speed is not a definitive measure of a parser, and these benchmarks barely scratch the surface of parser capabilities. I used the XmlTest application to test a selection of Java-based XML parsers:

  • Sun's Project X parser, included with the JAXP release
  • Oracle's v2 XML parser
  • the Xerces-J parser, shared by both IBM and Apache
  • XP
All of the parsers have both SAX and DOM support except for the XP parser, which is SAX-based.

Figure 1: Test Framework Design

Figure 1 shows the architecture of my test framework. The XmlTest application took an argument that specified which parser to instantiate and test. This ensured that each parser started with a clean Java runtime environment (JRE). The following tests were performed:
  1. Read and parse a small DTD-enforced XML file (approximately 25 elements)
  2. Read and parse a large DTD-enforced XML file (approximately 50,000 elements)
  3. Navigate the DOM created by Test #2 from root to all children
  4. Build a large DOM (approximately 60,000 elements) from scratch
  5. Build an unbounded DOM from scratch using createElement(...) and similar function calls, continuing until a java.lang.OutOfMemoryError is raised. Measure the time it takes to hit the "memory wall" within a default (32MB heap) JRE and count how many elements are created before the unrecoverable error is raised.
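
Tests #4 and #5 can be approximated with a loop like the one sketched below. This is not the XmlTest source, which the article does not show. The class name `DomBuildTest`, the element and attribute names, and the single flat list of children are all assumptions. A cap on the element count gives a Test #4 style run. Passing a cap that cannot be reached lets the loop run until the heap is exhausted, as in Test #5.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical reconstruction for illustration; not the article's actual test code.
class DomBuildTest {

    /**
     * Appends child elements to a new document until maxElements is reached
     * or the heap runs out. Returns the number of children created.
     */
    static long buildUntil(long maxElements) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);

        long created = 0;
        long start = System.nanoTime();
        try {
            while (created < maxElements) {
                Element item = doc.createElement("item");
                item.setAttribute("n", Long.toString(created));
                root.appendChild(item);
                created++;
            }
        } catch (OutOfMemoryError e) {
            // Drop the tree so the report below has some heap to work with.
            root = null;
            doc = null;
        } finally {
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.println(created + " elements in " + seconds + " s");
        }
        return created;
    }

    public static void main(String[] args) throws Exception {
        // Default mirrors the size used in Test #4; pass a larger number to push further.
        long cap = (args.length > 0) ? Long.parseLong(args[0]) : 60_000L;
        buildUntil(cap);
    }
}
```

To mimic the 32MB heap mentioned in Test #5, the class could be started with the standard `-Xmx32m` JVM option and a very large cap. Counts on a modern JVM and parser would not be comparable to the figures reported in this article.
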
Table 1 shows the results of the tests. Again, these are not meant to be definitive. The results are grouped by DOM-based and SAX-based parsers. Some tests were not performed on SAX parsers (indicated by the "-" designation). All tests except Test #5 were run as follows: one dry run to remove any caching effects, then five repetitions of the test, with the results averaged to produce the test score. For Test #5, the whole test framework was run five times and the results averaged. The test could not be repeated within one JRE session because a java.lang.OutOfMemoryError is not recoverable, leaving only the finally {...} clause to report the test results and exit.
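
The dry-run-then-average procedure described above can be sketched as a small timing helper. The name `RepeatTimer` and the sample workload are invented for the example, and the real XmlTest harness may have been organized differently.

```java
import java.util.concurrent.Callable;

// Hypothetical timing helper for illustration only.
class RepeatTimer {

    /**
     * Runs the task once untimed (the dry run), then 'repetitions' more times,
     * and returns the mean wall-clock time of the timed runs in seconds.
     */
    static double averageSeconds(Callable<?> task, int repetitions) throws Exception {
        task.call(); // dry run: absorbs class loading and other one-time costs
        long totalNanos = 0;
        for (int i = 0; i < repetitions; i++) {
            long start = System.nanoTime();
            task.call();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e9 / repetitions;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; a real run would parse or build a document here.
        double avg = averageSeconds(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
            return sb.length();
        }, 5);
        System.out.println("average over 5 runs: " + avg + " s");
    }
}
```

Five repetitions is a small sample, so individual runs that coincide with garbage collection can skew the mean noticeably.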

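| Parser | Small Read | Large Read | Large Nav | Build Large | Build Huge | Max Size |
|---|---|---|---|---|---|---|
| SunDOM | 0.022 | 3.732 | 0.21 | 0.496 | 12.33 | 440,358 |
| OracleDOM | 0.014 | 2.976 | 0.06 | 0.926 | 8.23 | 281,308 |
| XercesDOM | 0.042 | 2.482 | 0.078 | 0.81 | 10.11 | 389,044 |
| SunSAX | 0.018 | 0.7 | - | - | - | - |
| OracleSAX | 0.01 | 0.546 | - | - | - | - |
| XercesSAX | 0.036 | 1.3 | - | - | - | - |
| XPSAX | 0.016 | 0.458 | - | - | - | - |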

Table 1: Test Results
