Login | Register   
RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX

By submitting your information, you agree that devx.com may send you DevX offers via email, phone and text message, as well as email offers about other products and services that DevX believes may be of interest to you. DevX will process your information in accordance with the Quinstreet Privacy Policy.


StAX: DOM Ease with SAX Efficiency

StAX (the Streaming API for XML) is a memory-efficient, simple, and convenient way to process XML while retaining control over the parsing and writing process.




Building the Right Environment to Support AI, Machine Learning and Deep Learning

ith so many XML technologies, deciding what to use and when to use it can sometimes be bewildering. Many chose to build on top of existing DOM or SAX implementations rather than StAX (the Streaming API for XML). However, with StAX JSR-173 in the pipeline, this may change. StAX is a parser-independent, streaming pull-based Java API for reading and writing XML data. It is a memory-efficient, simple, and convenient way to process XML while retaining control over the parsing and writing process.

Most parsers fall into two broad categories: tree based (e.g., DOM) or event based (e.g., SAX). Although StAX is more closely aligned with the latter, it bridges the gap between the two. In SAX, data is pushed via events to application code handlers. In StAX, the application "pulls" the data from the XML data stream at its convenience. Application code can filter, skip tags, or stop parsing at any time. The application--not the parser--is in control, which enables a more intuitive way to process data.

This article gives you a look under the hood of this useful Java API and then demonstrates how to read and write XML documents efficiently using StAX.

A Brief Recap on XML Parsing

In tree-based or DOM parsers, the entire XML content is read and assembled into an in-memory, hierarchical object graph. Graphs are convenient when applications need to traverse the document multiple times or manipulate the DOM tree. The downside is that they can be inefficient. The object model can take up more memory than the raw XML itself. This precludes loading large documents into memory. SAX, on the other hand, is memory efficient. It reads the XML and pushes pieces of the document to application handlers using events. The parser takes control of the process, which makes it fast but also a bit awkward to use and debug.

SAX Push vs. StAX Pull

The following code examples demonstrate the respective push and pull approaches of SAX and StAX.

Application code registers a callback, which the SAX parser invokes as it reads the XML:

FileInputStream fis = new FileInputStream(file); XMLReader saxXmlReader = XMLReaderFactory.createXMLReader(); // Create callback handler DefaultHandler handler = new DefaultHandler() { public void startElement(String uri, String localName, String qName, Attributes attributes) { // do something with element } }; // register hander saxXmlReader.setContentHandler(handler); saxXmlReader.setErrorHandler(handler); // control passed to parser... saxXmlReader.parse(new InputSource(fis));

Application code controls parsing directly by iterating over the document using the StAX stream reader:

FileInputStream fis = new FileInputStream(file); XMLInputFactory factory = (XMLInputFactory)XMLInputFactory.newInstance(); XMLStreamReader staxXmlReader = (XMLStreamReader) factory.createXMLStreamReader(fis); for (int event = staxXmlReader.next(); event != XMLStreamConstants.END_DOCUMENT; event = staxXmlReader.next()) { if (event == XMLStreamConstants.START_ELEMENT) { String element = staxXmlReader.getLocalName(); // do something with element } }

Like SAX, StAX employs a streaming approach. It holds only a small part of the document in memory at any one time. Consequently, it is extremely efficient and a good choice for dealing with large documents.

StAX in Detail

The StAX XMLStreamReader is the main class for interacting with StAX. It presents an Iterator- (or Cursor-) style interface. (Other event-based Iterator APIs are available if you require them.) With the XMLStreamReader, an application iterates over the document by invoking next() until it has read all the data. Each call to next() advances the StAX reader to the next item in the XML stream, whether it be an element, namespace, DTD, or start or end document. The next() return code indicates which type of event has been read. The possible event types are defined as constants on the XMLStreamConstants interface.

A common Application StAX idiom is to read events in a loop using the XMLStreamReader and delegate control to other components based on the event type, using a switch or if statement:

for (int event = staxXmlReader.next(); event != XMLStreamConstants.END_DOCUMENT; event = staxXmlReader.next()) { switch (event) { case XMLStreamConstants.START_DOCUMENT: System.out.println("Start document " + staxXmlReader.getLocalName()); break; case XMLStreamConstants.START_ELEMENT: System.out.println("Start element " + staxXmlReader.getLocalName()); System.out.println("Element text " + staxXmlReader.getElementText()); break; case XMLStreamConstants.END_ELEMENT: System.out.println("End element " + staxXmlReader.getLocalName()); break; default: break; } }

On each call, the application code can either chose to process the event or continue. In this fashion, the application can easily skip unwanted elements. However, some reader methods can be used only when the reader is positioned on certain tags. For example, calls to get attribute details such as XMLStreamReader::getAttributeValue() work only when the reader is currently positioned on a start element tag, not on end document tag or end element tag.

Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.



Thanks for your registration, follow us on our social networks to keep up-to-date