StAX: DOM Ease with SAX Efficiency

With so many XML technologies, deciding what to use and when to use it can be bewildering. Many developers have chosen to build on top of existing DOM or SAX implementations rather than StAX (the Streaming API for XML). However, with StAX JSR-173 in the pipeline, this may change. StAX is a parser-independent, streaming, pull-based Java API for reading and writing XML data. It is a memory-efficient, simple, and convenient way to process XML while retaining control over the parsing and writing process.

Most parsers fall into two broad categories: tree based (e.g., DOM) or event based (e.g., SAX). Although StAX is more closely aligned with the latter, it bridges the gap between the two. In SAX, data is pushed via events to application code handlers. In StAX, the application “pulls” the data from the XML data stream at its convenience. Application code can filter, skip tags, or stop parsing at any time. The application–not the parser–is in control, which enables a more intuitive way to process data.

This article gives you a look under the hood of this useful Java API and then demonstrates how to read and write XML documents efficiently using StAX.

A Brief Recap on XML Parsing

In tree-based or DOM parsers, the entire XML content is read and assembled into an in-memory, hierarchical object graph. Graphs are convenient when applications need to traverse the document multiple times or manipulate the DOM tree. The downside is that they can be inefficient. The object model can take up more memory than the raw XML itself. This precludes loading large documents into memory. SAX, on the other hand, is memory efficient. It reads the XML and pushes pieces of the document to application handlers using events. The parser takes control of the process, which makes it fast but also a bit awkward to use and debug.
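For contrast with the streaming examples that follow, here is a minimal sketch of the tree-based approach using the JDK's built-in DOM parser. The class name DomRecap and the sample XML are illustrative assumptions, not part of the original article. Note that the whole document is parsed into memory before the first query runs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: builds a full DOM tree, then queries it.
final class DomRecap {
  static int countElements(String xml, String tagName) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    // parse() reads the entire input and assembles the in-memory object graph
    Document doc = builder.parse(
        new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    // Once the tree exists, it can be traversed as often as needed
    return doc.getElementsByTagName(tagName).getLength();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(countElements("<feed><entry/><entry/></feed>", "entry"));
  }
}
```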

SAX Push vs. StAX Pull

The following code examples demonstrate the respective push and pull approaches of SAX and StAX.

SAX

Application code registers a callback, which the SAX parser invokes as it reads the XML:

FileInputStream fis = new FileInputStream(file);
XMLReader saxXmlReader = XMLReaderFactory.createXMLReader();

// Create callback handler
DefaultHandler handler = new DefaultHandler() {
  @Override
  public void startElement(String uri, String localName, String qName, Attributes attributes) {
    // do something with element
  }
};

// Register handler
saxXmlReader.setContentHandler(handler);
saxXmlReader.setErrorHandler(handler);

// Control passes to the parser...
saxXmlReader.parse(new InputSource(fis));

StAX

Application code controls parsing directly by iterating over the document using the StAX stream reader:

FileInputStream fis = new FileInputStream(file);
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader staxXmlReader = factory.createXMLStreamReader(fis);

for (int event = staxXmlReader.next();
     event != XMLStreamConstants.END_DOCUMENT;
     event = staxXmlReader.next()) {
  if (event == XMLStreamConstants.START_ELEMENT) {
    String element = staxXmlReader.getLocalName();
    // do something with element
  }
}

Like SAX, StAX employs a streaming approach. It holds only a small part of the document in memory at any one time. Consequently, it is extremely efficient and a good choice for dealing with large documents.
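Because the application drives the loop, it can also stop pulling as soon as it has what it needs. The following is a minimal sketch of that idea; the class name FirstTextFinder and the sample document are assumptions made for illustration, and the helper assumes the target element contains only text.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: returns the text of the first matching element.
final class FirstTextFinder {
  static String firstText(String xml, String localName) throws XMLStreamException {
    XMLStreamReader reader =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    try {
      while (reader.hasNext()) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT
            && localName.equals(reader.getLocalName())) {
          // Return immediately; no further events are pulled from the stream
          return reader.getElementText();
        }
      }
      return null; // element not present
    } finally {
      reader.close();
    }
  }

  public static void main(String[] args) throws XMLStreamException {
    String xml = "<feed><title>Simple Atom Feed File</title>"
        + "<entry><title>StAX parsing is simple</title></entry></feed>";
    System.out.println(firstText(xml, "title"));
  }
}
```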

StAX in Detail

The StAX XMLStreamReader is the main interface for interacting with StAX. It presents a cursor-style interface that the application steps through much like an iterator. (An event-object iterator API, XMLEventReader, is also available if you require it.) With the XMLStreamReader, an application iterates over the document by invoking next() until it has read all the data. Each call to next() advances the StAX reader to the next item in the XML stream, whether it be an element, namespace, DTD, or start or end document. The value returned by next() indicates which type of event has been read. The possible event types are defined as constants on the XMLStreamConstants interface.

A common StAX idiom in application code is to read events in a loop using the XMLStreamReader and delegate control to other components based on the event type, using a switch or if statement:
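For comparison, the event-object flavor of the API is exposed in the JDK as XMLEventReader, which hands back one XMLEvent object per item instead of an int code. The sketch below is illustrative only; the class name EventReaderSketch and the sample XML are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example: collect element names with the event-object API.
final class EventReaderSketch {
  static List<String> elementNames(String xml) throws XMLStreamException {
    XMLEventReader events =
        XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
    List<String> names = new ArrayList<>();
    try {
      while (events.hasNext()) {
        XMLEvent event = events.nextEvent();
        // Each event is an object that can be inspected or passed around
        if (event.isStartElement()) {
          names.add(event.asStartElement().getName().getLocalPart());
        }
      }
    } finally {
      events.close();
    }
    return names;
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(elementNames("<feed><title>t</title><entry/></feed>"));
  }
}
```

The cursor API avoids allocating an object per event, while the event API is convenient when events need to be stored or handed to other code.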

// The reader starts on START_DOCUMENT, so next() never returns that event.
for (int event = staxXmlReader.next();
     event != XMLStreamConstants.END_DOCUMENT;
     event = staxXmlReader.next()) {
  switch (event) {
    case XMLStreamConstants.START_ELEMENT:
      System.out.println("Start element " + staxXmlReader.getLocalName());
      break;
    case XMLStreamConstants.CHARACTERS:
      // getElementText() fails on elements that contain child elements,
      // so print text from CHARACTERS events instead
      if (!staxXmlReader.isWhiteSpace()) {
        System.out.println("Element text " + staxXmlReader.getText());
      }
      break;
    case XMLStreamConstants.END_ELEMENT:
      System.out.println("End element " + staxXmlReader.getLocalName());
      break;
    default:
      break;
  }
}

On each iteration, the application code can choose either to process the event or to continue. In this fashion, the application can easily skip unwanted elements. However, some reader methods can be used only when the reader is positioned on certain event types. For example, attribute accessors such as XMLStreamReader.getAttributeValue() work only when the reader is positioned on a start element, not on an end element or the end of the document; calling them elsewhere throws an IllegalStateException.
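A minimal sketch of reading attributes safely: check the event type first, then call the accessor. The class name AttributeSketch, the element name link, and the sample XML are assumptions chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: collect the href attribute of every link element.
final class AttributeSketch {
  static List<String> linkHrefs(String xml) throws XMLStreamException {
    XMLStreamReader reader =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    List<String> hrefs = new ArrayList<>();
    try {
      while (reader.hasNext()) {
        int event = reader.next();
        // Attribute accessors are only legal while positioned on START_ELEMENT
        if (event == XMLStreamConstants.START_ELEMENT
            && "link".equals(reader.getLocalName())) {
          // A null namespace URI means the namespace is not checked
          String href = reader.getAttributeValue(null, "href");
          if (href != null) {
            hrefs.add(href);
          }
        }
      }
    } finally {
      reader.close();
    }
    return hrefs;
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(linkHrefs(
        "<feed><link href=\"http://example.org/\"/><entry><link/></entry></feed>"));
  }
}
```

Looking an attribute up by name rather than by index keeps the code working if the document adds or reorders attributes.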

Patterns for Using StAX

If your XML is anything more than trivial, you’ll find that putting all that parsing logic inside one large event loop can quickly become unmanageable and hard to maintain. A better way to do this is to group logically related units of parsing work into discrete components that can be called from within the main event loop.

Take the following ATOM XML feed file as an example:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Simple Atom Feed File</title>
  <subtitle>Using StAX to read feed files</subtitle>
  <link href="http://example.org/"/>
  <updated>2006-01-01T18:30:02Z</updated>
  <author>
    <name>Feed Author</name>
    <email>[email protected]</email>
  </author>
  <entry>
    <title>StAX parsing is simple</title>
    <link href="http://www.devx.com"/>
    <updated>2006-01-01T18:30:02Z</updated>
    <summary>Learn how to use StAX</summary>
  </entry>
</feed>

To make life easy, create a small piece of infrastructure. Start by defining a ComponentParser interface that defines the contract between the main StAX event loop and parsing components:

public interface ComponentParser {
  // Named parse() to match the implementations and the main event loop
  void parse(XMLStreamReader staxXmlReader) throws XMLStreamException;
}

This allows parsing components to be dealt with in a common way through the interface.

Define two concrete parsers: one to parse ATOM author elements and one to parse ATOM entry elements. Ensure that they implement the ComponentParser interface.

The following is the AuthorParser class:

public class AuthorParser implements ComponentParser {
  public void parse(XMLStreamReader staxXmlReader) throws XMLStreamException {
    // read name
    StaxUtil.moveReaderToElement("name", staxXmlReader);
    String name = staxXmlReader.getElementText();

    // read email
    StaxUtil.moveReaderToElement("email", staxXmlReader);
    String email = staxXmlReader.getElementText();

    // Do something with author data...
  }
}

The following is the EntryParser class:

public class EntryParser implements ComponentParser {
  public void parse(XMLStreamReader staxXmlReader) throws XMLStreamException {
    // read title
    StaxUtil.moveReaderToElement("title", staxXmlReader);
    String title = staxXmlReader.getElementText();

    // read link attributes
    StaxUtil.moveReaderToElement("link", staxXmlReader);
    // read href attribute by name rather than by position
    String linkHref = staxXmlReader.getAttributeValue(null, "href");

    // read updated
    StaxUtil.moveReaderToElement("updated", staxXmlReader);
    String updated = staxXmlReader.getElementText();

    // read summary
    StaxUtil.moveReaderToElement("summary", staxXmlReader);
    String summary = staxXmlReader.getElementText();

    // Do something with the data read from StAX..
  }
}

The StaxUtil class is just a helper class for reading from the StAX reader until the code finds the target element. Note that you should take care to (1) read elements in the correct order, (2) not read past the end of the stream, and (3) not read data that belongs to other ComponentParsers.
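The article does not list StaxUtil itself. The following is one possible sketch of moveReaderToElement, written only from the way it is called above; the boolean return value and the hasNext() guard are assumptions added so that callers can detect a missing element instead of reading past the end of the stream.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical reconstruction of the helper; not the original source.
final class StaxUtil {
  private StaxUtil() {}

  /**
   * Advances the reader until it is positioned on a start element with the
   * given local name. Returns false if the document ends first.
   */
  static boolean moveReaderToElement(String target, XMLStreamReader reader)
      throws XMLStreamException {
    while (reader.hasNext()) {
      if (reader.next() == XMLStreamConstants.START_ELEMENT
          && target.equals(reader.getLocalName())) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) throws XMLStreamException {
    XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(
        new StringReader("<author><name>Feed Author</name></author>"));
    if (moveReaderToElement("name", reader)) {
      System.out.println(reader.getElementText());
    }
    reader.close();
  }
}
```

This sketch searches forward without regard to nesting, so it will happily skip into a sibling component's elements. That is why the ordering and ownership cautions above matter.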

In the main event loop, modify the code to farm out parsing work to ComponentParsers based on the XML element name. ComponentParsers can be pre-registered with the main class prior to parsing. The advantage of this pattern is that it keeps the main event loop code simple and devoid of any understanding of the ATOM XML format. ComponentParsers still pull data from StAX, but they are neatly separated and can be reused (e.g., in recurring elements in the XML hierarchy). You can now apply the loop to parse any XML file, provided you registered the appropriate ComponentParsers. The following is the main event loop using a component parser registry:

public class StaxParser implements ComponentParser {
  private Map<String, ComponentParser> delegates;
  …

  public void parse(XMLStreamReader staxXmlReader) throws XMLStreamException {
    for (int event = staxXmlReader.next();
         event != XMLStreamConstants.END_DOCUMENT;
         event = staxXmlReader.next()) {
      if (event == XMLStreamConstants.START_ELEMENT) {
        String element = staxXmlReader.getLocalName();
        // If a ComponentParser is registered that can handle
        // this element, delegate…
        if (delegates.containsKey(element)) {
          ComponentParser parser = delegates.get(element);
          parser.parse(staxXmlReader);
        }
      }
    }
  }
}

Here’s how you would put it all together in a test case:

InputStream in = this.getClass().getResourceAsStream("atom.xml");

XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader staxXmlReader = factory.createXMLStreamReader(in);

StaxParser parser = new StaxParser();
parser.registerParser("author", new AuthorParser());
parser.registerParser("entry", new EntryParser());

parser.parse(staxXmlReader);
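The registration code is elided in the listing above, so here is a compact, self-contained sketch of how the registry pattern might fit together. The names RegistryParser and RegistryDemo, the HashMap-backed registerParser method, and the lambda-based component parsers are all assumptions for illustration rather than the article's own implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

interface ComponentParser {
  void parse(XMLStreamReader reader) throws XMLStreamException;
}

// Hypothetical registry-backed event loop
final class RegistryParser implements ComponentParser {
  private final Map<String, ComponentParser> delegates = new HashMap<>();

  void registerParser(String elementName, ComponentParser parser) {
    delegates.put(elementName, parser);
  }

  @Override
  public void parse(XMLStreamReader reader) throws XMLStreamException {
    while (reader.hasNext()) {
      if (reader.next() == XMLStreamConstants.START_ELEMENT) {
        // Hand the reader to whichever component owns this element, if any
        ComponentParser delegate = delegates.get(reader.getLocalName());
        if (delegate != null) {
          delegate.parse(reader);
        }
      }
    }
  }
}

final class RegistryDemo {
  static List<String> run(String xml) throws XMLStreamException {
    List<String> seen = new ArrayList<>();
    RegistryParser parser = new RegistryParser();
    // Each component consumes only the text of its own element
    parser.registerParser("name", r -> seen.add("name=" + r.getElementText()));
    parser.registerParser("summary", r -> seen.add("summary=" + r.getElementText()));

    XMLStreamReader reader =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    try {
      parser.parse(reader);
    } finally {
      reader.close();
    }
    return seen;
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(run("<feed><author><name>Feed Author</name></author>"
        + "<entry><summary>Learn how to use StAX</summary></entry></feed>"));
  }
}
```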

StAX Output

No discussion of StAX is complete without mentioning StAX output. StAX is bi-directional in that it supports both reading and writing. The StAX XMLStreamWriter interface provides a simple, low-level API for outputting XML data.

The following is an example of using StAX to generate an XML ATOM feed document:

File file = new File("atomoutput.xml");
FileOutputStream out = new FileOutputStream(file);
String now = new SimpleDateFormat().format(new Date());

XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLStreamWriter staxWriter = factory.createXMLStreamWriter(out);
staxWriter.writeStartDocument("UTF-8", "1.0");

// feed
staxWriter.writeStartElement("feed");
staxWriter.writeNamespace("", "http://www.w3.org/2005/Atom");

// title
StaxUtil.writeElement(staxWriter, "title", "Simple Atom Feed File");

// subtitle
StaxUtil.writeElement(staxWriter, "subtitle", "Using StAX to read feed files");

// link
staxWriter.writeStartElement("link");
staxWriter.writeAttribute("href", "http://example.org/");
staxWriter.writeEndElement();

// updated
StaxUtil.writeElement(staxWriter, "updated", now);

// author...
// entry..

staxWriter.writeEndElement(); // end feed
staxWriter.writeEndDocument();
staxWriter.flush();
staxWriter.close();
// XMLStreamWriter.close() does not close the underlying stream
out.close();

Once the author and entry elements are filled in, the resulting XML file has the same structure as the ATOM feed file shown previously in the “Patterns for Using StAX” section. The updated value will differ, because it is generated at run time and uses the default SimpleDateFormat pattern rather than the timestamp format used in the sample.
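The writeElement helper used in the output example is not listed in the article. A plausible sketch, inferred only from its call sites (writer, element name, text), is shown below; the class name StaxWriteSketch and the demo method are assumptions.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical stand-in for the article's StaxUtil.writeElement helper
final class StaxWriteSketch {
  /** Writes a start tag, the text content, and the matching end tag. */
  static void writeElement(XMLStreamWriter writer, String name, String text)
      throws XMLStreamException {
    writer.writeStartElement(name);
    // writeCharacters escapes markup characters such as & and <
    writer.writeCharacters(text);
    writer.writeEndElement();
  }

  static String demo(String title) throws XMLStreamException {
    StringWriter out = new StringWriter();
    XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
    writer.writeStartElement("feed");
    writeElement(writer, "title", title);
    writer.writeEndElement();
    writer.writeEndDocument();
    writer.close();
    return out.toString();
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(demo("Simple Atom Feed File"));
  }
}
```

Writing to a StringWriter here keeps the sketch free of file handling; the same helper works unchanged with a writer created over a FileOutputStream.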

StAX Parsers

Several JSR-173-compliant parsers are available, including the following:

  1. Woodstox
  2. StAX Reference Implementation
  3. Oracle StAX Pull Parser
  4. BEA

You’ll find the entire StAX JSR-173 on the JCP Web site.

Just How Fast Is StAX?

In addition to being easy to use, StAX is also very fast. Sun has released a whitepaper (PDF) that compares its performance with several other parsers.

The Future of StAX

StAX is ideally suited for no-nonsense, efficient XML input and output. The pull paradigm promotes a more intuitive parsing approach whereby application components can aggregate logically related parsing operations and pull what they want from the stream one element after another. Developers must still maintain appropriate state throughout the parsing process, but they retain overall control. StAX, like SAX, works well for large documents and when parts of the document can be dealt with in small chunks independently of other chunks. And best of all, it’s fast.

devx-admin
