advertisement
Premier Club Log In/Registration
  Include Code  Search Tips
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   SKILLBUILDING  |   TIP BANK  |   SOURCEBANK  |   FORUMS  |   NEWSLETTERS
Browse DevX
StAX Source Code
Partners & Affiliates
advertisement
advertisement
Average Rating: 3/5 | Rate this item | 1 user has rated this item.
 Print Print
 
StAX: DOM Ease with SAX Efficiency
StAX (the Streaming API for XML) is a memory-efficient, simple, and convenient way to process XML while retaining control over the parsing and writing process. 

advertisement
ith so many XML technologies, deciding what to use and when to use it can sometimes be bewildering. Many chose to build on top of existing DOM or SAX implementations rather than StAX (the Streaming API for XML). However, with StAX JSR-173 in the pipeline, this may change. StAX is a parser-independent, streaming pull-based Java API for reading and writing XML data. It is a memory-efficient, simple, and convenient way to process XML while retaining control over the parsing and writing process.


Most parsers fall into two broad categories: tree based (e.g., DOM) or event based (e.g., SAX). Although StAX is more closely aligned with the latter, it bridges the gap between the two. In SAX, data is pushed via events to application code handlers. In StAX, the application "pulls" the data from the XML data stream at its convenience. Application code can filter, skip tags, or stop parsing at any time. The application--not the parser--is in control, which enables a more intuitive way to process data.

This article gives you a look under the hood of this useful Java API and then demonstrates how to read and write XML documents efficiently using StAX.

A Brief Recap on XML Parsing

In tree-based or DOM parsers, the entire XML content is read and assembled into an in-memory, hierarchical object graph. Graphs are convenient when applications need to traverse the document multiple times or manipulate the DOM tree. The downside is that they can be inefficient. The object model can take up more memory than the raw XML itself. This precludes loading large documents into memory. SAX, on the other hand, is memory efficient. It reads the XML and pushes pieces of the document to application handlers using events. The parser takes control of the process, which makes it fast but also a bit awkward to use and debug.

SAX Push vs. StAX Pull

The following code examples demonstrate the respective push and pull approaches of SAX and StAX.

SAX
Application code registers a callback, which the SAX parser invokes as it reads the XML:

FileInputStream fis = new FileInputStream(file);

XMLReader saxXmlReader = XMLReaderFactory.createXMLReader();

// Create callback handler
DefaultHandler handler = new DefaultHandler() {
public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // do something with element
      }
};


// register hander
saxXmlReader.setContentHandler(handler);
saxXmlReader.setErrorHandler(handler);

// control passed to parser...
saxXmlReader.parse(new InputSource(fis));

StAX
Application code controls parsing directly by iterating over the document using the StAX stream reader:

FileInputStream fis = new FileInputStream(file);

XMLInputFactory factory = (XMLInputFactory)XMLInputFactory.newInstance();
XMLStreamReader staxXmlReader = (XMLStreamReader) factory.createXMLStreamReader(fis);

for (int event = staxXmlReader.next(); event !=   XMLStreamConstants.END_DOCUMENT;
 event = staxXmlReader.next()) {

  if (event == XMLStreamConstants.START_ELEMENT) {
    String element = staxXmlReader.getLocalName();
    // do something with element
  }

}

Like SAX, StAX employs a streaming approach. It holds only a small part of the document in memory at any one time. Consequently, it is extremely efficient and a good choice for dealing with large documents.

StAX in Detail

The StAX XMLStreamReader is the main class for interacting with StAX. It presents an Iterator- (or Cursor-) style interface. (Other event-based Iterator APIs are available if you require them.) With the XMLStreamReader, an application iterates over the document by invoking next() until it has read all the data. Each call to next() advances the StAX reader to the next item in the XML stream, whether it be an element, namespace, DTD, or start or end document. The next() return code indicates which type of event has been read. The possible event types are defined as constants on the XMLStreamConstants interface.

A common Application StAX idiom is to read events in a loop using the XMLStreamReader and delegate control to other components based on the event type, using a switch or if statement:


for (int event = staxXmlReader.next(); event != XMLStreamConstants.END_DOCUMENT; event = staxXmlReader.next()) {
switch (event) {
  case XMLStreamConstants.START_DOCUMENT:
    System.out.println("Start document " + staxXmlReader.getLocalName());
    break;
  case XMLStreamConstants.START_ELEMENT:
    System.out.println("Start element " + staxXmlReader.getLocalName());
 	System.out.println("Element text " + staxXmlReader.getElementText());
    break;
  case XMLStreamConstants.END_ELEMENT:
    System.out.println("End element " + staxXmlReader.getLocalName());
    break;
  default:
    break;
  }
}

On each call, the application code can either chose to process the event or continue. In this fashion, the application can easily skip unwanted elements. However, some reader methods can be used only when the reader is positioned on certain tags. For example, calls to get attribute details such as XMLStreamReader::getAttributeValue() work only when the reader is currently positioned on a start element tag, not on end document tag or end element tag.

  Next Page: Patterns for Using StAX


Page 1: IntroductionPage 3: StAX Output
Page 2: Patterns for Using StAX 
advertisement
Advertising Info  |   Member Services  |   Permissions  |   Contact Us  |   Help  |   Feedback  |   Site Map  |   Network Map  |   About


JupiterOnlineMedia

internet.comearthweb.comDevx.commediabistro.comGraphics.com

Search:

Jupitermedia Corporation has two divisions: Jupiterimages and JupiterOnlineMedia

Jupitermedia Corporate Info


Legal Notices, Licensing, Reprints, & Permissions, Privacy Policy.

Advertise | Newsletters | Tech Jobs | Shopping | E-mail Offers

Solutions
Whitepapers and eBooks
IBM eBook: Planning a Service Oriented Architecture
IBM eBook: Choosing the Right Architecture--What It Means for You and Your Business
Microsoft Article: Will Hyper-V Make VMware This Decade's Netscape?
Avaya Article: Using Intelligent Presence to Create Smarter Business Applications
Intel Go Parallel Article: Getting Started with TBB on Windows
Microsoft Article: 7.0, Microsoft's Lucky Version?
Avaya Article: How to Feed Data into the Avaya Event Processor
IBM Article: Developing a Software Policy for Your Organization
Microsoft Article: Managing Virtual Machines with Microsoft System Center
Intel Go Parallel Article: Intel Threading Tools and OpenMP
HP eBook: Storage Networking , Part 1
Microsoft Article: Solving Data Center Complexity with Microsoft System Center Configuration Manager 2007
MORE WHITEPAPERS, EBOOKS, AND ARTICLES
Webcasts
HP Video: StorageWorks EVA4400 and Oracle
HP Webcast: Storage Is Changing Fast - Be Ready or Be Left Behind
Microsoft Silverlight Video: Creating Fading Controls with Expression Design and Expression Blend 2
MORE WEBCASTS, PODCASTS, AND VIDEOS
Downloads and eKits
Red Gate Download: SQL Toolbelt and free High-Performance SQL Code eBook
Iron Speed Designer Application Generator
MORE DOWNLOADS, EKITS, AND FREE TRIALS
Tutorials and Demos
Silverlight 2 App and Walkthrough: Leverage Silverlight 2 with SQL Server and XML
IBM Article: Enterprise Search--Do You Know What's Out There?
HP Demo: StorageWorks EVA4400
Microsoft Article: The Progress and Promise of Deep Zoom
Microsoft How-to Article: Get Going with Silverlight and Windows Live
MORE TUTORIALS, DEMOS AND STEP-BY-STEP GUIDES