With so many XML technologies, deciding what to use and when to use it can be bewildering. Many developers choose to build on existing DOM or SAX implementations rather than StAX (the Streaming API for XML). However, with StAX JSR-173 in the pipeline, this may change. StAX is a parser-independent, pull-based streaming Java API for reading and writing XML data. It is a memory-efficient, simple, and convenient way to process XML while retaining control over the parsing and writing process.
Most parsers fall into two broad categories: tree based (e.g., DOM) or event based (e.g., SAX). Although StAX is more closely aligned with the latter, it bridges the gap between the two. In SAX, data is pushed via events to application code handlers. In StAX, the application "pulls" the data from the XML data stream at its convenience. Application code can filter, skip tags, or stop parsing at any time. The application--not the parser--is in control, which enables a more intuitive way to process data.
This article gives you a look under the hood of this useful Java API and then demonstrates how to read and write XML documents efficiently using StAX.
In SAX, the application registers a handler and then hands control to the parser:
FileInputStream fis = new FileInputStream(file);
XMLReader saxXmlReader = XMLReaderFactory.createXMLReader();
// Create callback handler
DefaultHandler handler = new DefaultHandler() {
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // do something with element
    }
};
// register handler
saxXmlReader.setContentHandler(handler);
saxXmlReader.setErrorHandler(handler);
// control passed to parser...
saxXmlReader.parse(new InputSource(fis));
In StAX, the application drives the loop and pulls each event from the reader:
FileInputStream fis = new FileInputStream(file);
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader staxXmlReader = factory.createXMLStreamReader(fis);
for (int event = staxXmlReader.next(); event != XMLStreamConstants.END_DOCUMENT;
        event = staxXmlReader.next()) {
    if (event == XMLStreamConstants.START_ELEMENT) {
        String element = staxXmlReader.getLocalName();
        // do something with element
    }
}
Like SAX, StAX employs a streaming approach. It holds only a small part of the document in memory at any one time. Consequently, it is extremely efficient and a good choice for dealing with large documents.
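Because the application drives the reader, it can also stop as soon as it has what it needs, and the rest of the document is never parsed. The following is a minimal sketch of that idea, not code from the original article: the class name FirstTitleFinder and the method firstTitle are hypothetical, and the XML is read from a string only to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: returns the text of the first <title> element and stops.
class FirstTitleFinder {

    static String firstTitle(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Stop pulling here; nothing after this element is parsed.
                    return reader.getElementText();
                }
            }
            return null; // no <title> element in the document
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<feed><title>Simple Atom Feed File</title>"
                + "<entry><title>StAX parsing is simple</title></entry></feed>";
        System.out.println(firstTitle(xml)); // prints "Simple Atom Feed File"
    }
}
```

With SAX, the usual way to get the same early exit is to throw an exception from a handler, which is one reason the pull style can feel more natural for lookups like this.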
A common StAX idiom in application code is to read events in a loop using the XMLStreamReader and delegate control to other components based on the event type, using a switch or if statement:
// A new reader is already positioned on START_DOCUMENT, so the first
// call to next() moves past it and the loop never sees that event.
for (int event = staxXmlReader.next(); event != XMLStreamConstants.END_DOCUMENT; event = staxXmlReader.next()) {
    switch (event) {
        case XMLStreamConstants.START_ELEMENT:
            System.out.println("Start element " + staxXmlReader.getLocalName());
            break;
        case XMLStreamConstants.CHARACTERS:
            // getElementText() is not used here: it throws an XMLStreamException
            // when the current element contains child elements (such as feed).
            if (!staxXmlReader.isWhiteSpace()) {
                System.out.println("Element text " + staxXmlReader.getText());
            }
            break;
        case XMLStreamConstants.END_ELEMENT:
            System.out.println("End element " + staxXmlReader.getLocalName());
            break;
        default:
            break;
    }
}
On each call, the application code can either choose to process the event or continue. In this fashion, the application can easily skip unwanted elements. However, some reader methods can be used only when the reader is positioned on certain events. For example, calls that return attribute details, such as XMLStreamReader.getAttributeValue(), work only when the reader is positioned on a start element, not on an end element or the end of the document; in those states they throw an IllegalStateException.
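The state restriction is easy to respect by checking the event type before calling a state-dependent method. The sketch below is illustrative only (LinkCollector and hrefs are hypothetical names, and the input is a string for brevity); it collects the href attribute of every link element and touches the attribute accessors only on START_ELEMENT.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: gathers href values from <link> elements.
class LinkCollector {

    static List<String> hrefs(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                // Attribute accessors are only valid on START_ELEMENT (or ATTRIBUTE);
                // on END_ELEMENT they would throw IllegalStateException.
                if (event == XMLStreamConstants.START_ELEMENT
                        && "link".equals(reader.getLocalName())) {
                    // A null namespace URI means "do not check the namespace".
                    String href = reader.getAttributeValue(null, "href");
                    if (href != null) {
                        result.add(href);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<feed><link href=\"http://example.org/\"/>"
                + "<entry><link href=\"http://www.devx.com\"/></entry></feed>";
        System.out.println(hrefs(xml)); // prints [http://example.org/, http://www.devx.com]
    }
}
```

Looking the attribute up by name rather than by index keeps the code working if the element later gains more attributes.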
Take the following ATOM XML feed file as an example:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Simple Atom Feed File</title>
    <subtitle>Using StAX to read feed files</subtitle>
    <link href="http://example.org/"/>
    <updated>2006-01-01T18:30:02Z</updated>
    <author>
        <name>Feed Author</name>
        <email>doofus@feed.com</email>
    </author>
    <entry>
        <title>StAX parsing is simple</title>
        <link href="http://www.devx.com"/>
        <updated>2006-01-01T18:30:02Z</updated>
        <summary>Learn how to use StAX</summary>
    </entry>
</feed>
To make life easy, create a small piece of infrastructure. Start by defining a ComponentParser interface that defines the contract between the main StAX event loop and parsing components:
public interface ComponentParser {
    // Named parse to match the implementations and the call in the main event loop
    void parse(XMLStreamReader staxXmlReader) throws XMLStreamException;
}
This allows parsing components to be dealt with in a common way through the interface.
Define two concrete parsers: one to parse ATOM author elements and one to parse ATOM entry elements. Ensure that they implement the ComponentParser interface.
The following is the AuthorParser class:
public class AuthorParser implements ComponentParser {
    public void parse(XMLStreamReader staxXmlReader) throws XMLStreamException {
        // read name
        StaxUtil.moveReaderToElement("name", staxXmlReader);
        String name = staxXmlReader.getElementText();
        // read email
        StaxUtil.moveReaderToElement("email", staxXmlReader);
        String email = staxXmlReader.getElementText();
        // Do something with author data...
    }
}
The following is the EntryParser class:
public class EntryParser implements ComponentParser {
    public void parse(XMLStreamReader staxXmlReader) throws XMLStreamException {
        // read title
        StaxUtil.moveReaderToElement("title", staxXmlReader);
        String title = staxXmlReader.getElementText();
        // read link attributes
        StaxUtil.moveReaderToElement("link", staxXmlReader);
        // read href attribute by name (null means any namespace)
        String linkHref = staxXmlReader.getAttributeValue(null, "href");
        // read updated
        StaxUtil.moveReaderToElement("updated", staxXmlReader);
        String updated = staxXmlReader.getElementText();
        // read summary
        StaxUtil.moveReaderToElement("summary", staxXmlReader);
        String summary = staxXmlReader.getElementText();
        // Do something with the data read from StAX...
    }
}
The StaxUtil class is just a helper class for reading from the StAX reader until the code finds the target element. Note that you should take care to (1) read elements in the correct order, (2) not read past the end of the stream, and (3) not read data that belongs to other ComponentParsers.
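The article does not list StaxUtil, so the following is only one plausible shape for moveReaderToElement, written as an assumption to make the parsers above easier to follow. It always advances at least one event before testing, stops at the end of the stream, and returns a boolean (an addition not implied by the original calls) so callers can detect a missing element. It does nothing to stop a search from running into a sibling component's elements, so concern (3) remains the caller's responsibility.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Assumed implementation; the original StaxUtil source is not shown in the article.
class StaxUtil {

    /**
     * Advances the reader until it is positioned on a start element whose
     * local name equals target. Returns false if the stream ends first.
     */
    static boolean moveReaderToElement(String target, XMLStreamReader reader)
            throws XMLStreamException {
        while (reader.hasNext()) { // hasNext() is false at END_DOCUMENT
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && target.equals(reader.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<author><name>Feed Author</name><email>doofus@feed.com</email></author>";
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        moveReaderToElement("name", reader);
        System.out.println(reader.getElementText()); // prints "Feed Author"
        moveReaderToElement("email", reader);
        System.out.println(reader.getElementText()); // prints "doofus@feed.com"
        reader.close();
    }
}
```

A stricter variant could track element depth and give up when the enclosing element closes, which would address the third concern at the cost of a little bookkeeping.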
In the main event loop, modify the code to farm out parsing work to ComponentParsers based on the XML element name. ComponentParsers can be pre-registered with the main class prior to parsing. The advantage of this pattern is that it keeps the main event loop code simple and devoid of any understanding of the ATOM XML format. ComponentParsers still pull data from StAX, but they are neatly separated and can be reused (e.g., for elements that recur in the XML hierarchy). You can now apply the loop to parse any XML file, provided you have registered the appropriate ComponentParsers. The following is the main event loop using a component parser registry:
public class StaxParser implements ComponentParser {
    private Map<String, ComponentParser> delegates;
    …
    public void parse(XMLStreamReader staxXmlReader) throws XMLStreamException {
        // hasNext() guards against a delegate having consumed the rest of the stream
        while (staxXmlReader.hasNext()) {
            int event = staxXmlReader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String element = staxXmlReader.getLocalName();
                // If a ComponentParser is registered that can handle
                // this element, delegate to it
                ComponentParser parser = delegates.get(element);
                if (parser != null) {
                    parser.parse(staxXmlReader);
                }
            }
        }
    }
}
Here's how you would put it all together in a test case:
InputStream in = this.getClass().getResourceAsStream("atom.xml");
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader staxXmlReader = factory.createXMLStreamReader(in);
StaxParser parser = new StaxParser();
parser.registerParser("author", new AuthorParser());
parser.registerParser("entry", new EntryParser());
parser.parse(staxXmlReader);
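The StaxParser listing elides its registration code, so the following self-contained sketch shows one way the pieces might fit together. Everything not shown in the article is an assumption: the body of registerParser, the use of a HashMap, the RegistryDemo class, and the lambda that stands in for AuthorParser so the example needs no helper class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

interface ComponentParser {
    void parse(XMLStreamReader staxXmlReader) throws XMLStreamException;
}

class StaxParser implements ComponentParser {
    // Assumed registry: element local name -> parser responsible for it
    private final Map<String, ComponentParser> delegates = new HashMap<>();

    void registerParser(String elementName, ComponentParser parser) {
        delegates.put(elementName, parser);
    }

    @Override
    public void parse(XMLStreamReader staxXmlReader) throws XMLStreamException {
        while (staxXmlReader.hasNext()) {
            if (staxXmlReader.next() == XMLStreamConstants.START_ELEMENT) {
                ComponentParser delegate = delegates.get(staxXmlReader.getLocalName());
                if (delegate != null) {
                    delegate.parse(staxXmlReader); // delegate pulls its own events
                }
            }
        }
    }
}

class RegistryDemo {
    // Collects the <name> of every <author> element via a registered delegate.
    static List<String> authorNames(String xml) throws XMLStreamException {
        List<String> names = new ArrayList<>();
        StaxParser parser = new StaxParser();
        parser.registerParser("author", r -> {
            // Advance from <author> to its <name> child, then read the text.
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "name".equals(r.getLocalName())) {
                    names.add(r.getElementText());
                    return;
                }
            }
        });
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            parser.parse(reader);
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<feed><title>Simple Atom Feed File</title>"
                + "<author><name>Feed Author</name><email>doofus@feed.com</email></author></feed>";
        System.out.println(authorNames(xml)); // prints [Feed Author]
    }
}
```

The main loop knows nothing about authors or entries; supporting a new element is a matter of registering another delegate.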
The following is an example of using StAX to generate an XML ATOM feed document:
File file = new File("atomoutput.xml");
FileOutputStream out = new FileOutputStream(file);
// ATOM timestamps use the RFC 3339 form, e.g. 2006-01-01T18:30:02Z (UTC)
SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
format.setTimeZone(TimeZone.getTimeZone("UTC"));
String now = format.format(new Date());
XMLOutputFactory factory = XMLOutputFactory.newInstance();
// Pass the encoding so the bytes written match the XML declaration
XMLStreamWriter staxWriter = factory.createXMLStreamWriter(out, "UTF-8");
staxWriter.writeStartDocument("UTF-8", "1.0");
// feed
staxWriter.writeStartElement("feed");
staxWriter.writeDefaultNamespace("http://www.w3.org/2005/Atom");
// title
StaxUtil.writeElement(staxWriter, "title", "Simple Atom Feed File");
// subtitle
StaxUtil.writeElement(staxWriter, "subtitle", "Using StAX to read feed files");
// link
staxWriter.writeStartElement("link");
staxWriter.writeAttribute("href", "http://example.org/");
staxWriter.writeEndElement();
// updated
StaxUtil.writeElement(staxWriter, "updated", now);
// author
...
// entry
..
staxWriter.writeEndElement(); // end feed
staxWriter.writeEndDocument();
staxWriter.flush();
staxWriter.close();
out.close(); // closing the writer does not close the underlying stream
The resultant XML file matches the ATOM feed file previously shown in the "Patterns for Using StAX" section, apart from the updated timestamps and whitespace (XMLStreamWriter does not indent its output).
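StaxUtil.writeElement is not listed in the article. A reasonable guess is that it wraps the start tag, text, and end tag in a single call, as in the sketch below. The class AtomWriterSketch and the method minimalFeed are hypothetical, and the output goes to a StringWriter rather than a file so the example can run on its own.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch of a writeElement helper and a cut-down feed writer.
class AtomWriterSketch {

    // Assumed behavior of StaxUtil.writeElement: <name>text</name>
    static void writeElement(XMLStreamWriter writer, String name, String text)
            throws XMLStreamException {
        writer.writeStartElement(name);
        writer.writeCharacters(text); // escapes &, < and > in the text
        writer.writeEndElement();
    }

    static String minimalFeed(String title, String href) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument("1.0");
        writer.writeStartElement("feed");
        writer.writeDefaultNamespace("http://www.w3.org/2005/Atom");
        writeElement(writer, "title", title);
        writer.writeEmptyElement("link"); // self-closing element, no end call needed
        writer.writeAttribute("href", href);
        writer.writeEndElement(); // end feed
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(minimalFeed("Simple Atom Feed File", "http://example.org/"));
    }
}
```

The exact bytes produced (for example, spacing inside the XML declaration) can vary between StAX implementations, so comparing output is safer after re-parsing than as raw strings.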
You'll find the entire StAX JSR-173 on the JCP Web site.