Stage Two of the Solution: Chunks to Beans
The implementation for the first stage of the process is in place: the XML is chunked into pieces small enough to be handled easily with XML object binding. In this case, each chunk contains one element, but you can create larger chunks for batch processing. For example, 10 Employee elements could be read by adding a counter and delaying the reset of the XML memory buffer until the counter reaches 10. In most applications, the batch size would be a property that can be configured at runtime to help tune the performance of the system.
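The counter-based batching described above might look like the following sketch. It is not one of the article's listings: it assumes the chunks are Employee elements without a namespace prefix, uses a StringWriter as the memory buffer, and the names BatchChunker, chunk, and drain are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper; class and method names are illustrative assumptions.
final class BatchChunker {

    /** Returns XML fragments that each hold up to batchSize Employee elements. */
    static List<String> chunk(String xml, int batchSize) {
        List<String> chunks = new ArrayList<>();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            StringWriter buffer = new StringWriter();
            XMLEventWriter writer =
                    XMLOutputFactory.newInstance().createXMLEventWriter(buffer);

            int depth = 0; // nesting depth inside the current Employee; 0 = outside
            int count = 0; // complete Employee elements currently in the buffer

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();

                if (event.isStartElement() && (depth > 0 || isEmployee(event))) {
                    depth++;
                }
                if (depth > 0) {
                    writer.add(event); // copy only events that belong to an Employee
                }
                if (event.isEndElement() && depth > 0) {
                    depth--;
                    // Delay the buffer reset until the counter reaches the batch size.
                    if (depth == 0 && ++count == batchSize) {
                        count = drain(writer, buffer, chunks);
                    }
                }
            }
            if (count > 0) {
                drain(writer, buffer, chunks); // final, partially filled batch
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not chunk XML", e);
        }
        return chunks;
    }

    private static boolean isEmployee(XMLEvent event) {
        return "Employee".equals(event.asStartElement().getName().getLocalPart());
    }

    /** Flushes the writer, stores the buffered text as a chunk, and empties the buffer. */
    private static int drain(XMLEventWriter writer, StringWriter buffer, List<String> chunks)
            throws XMLStreamException {
        writer.flush();
        chunks.add(buffer.toString());
        buffer.getBuffer().setLength(0);
        return 0; // new counter value
    }
}
```

Passing the batch size as a parameter keeps it easy to wire to a runtime property, so the chunk size can be tuned without code changes.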
Now it is time for the second stage of the process: reading the chunks into XMLBeans. As you read earlier, XML object binding works by generating classes that can serialize and deserialize the XML. XMLBeans provides two different tools for generating the classes required:
- The scomp command line tool takes the XML schema file as input and produces a Java archive (JAR) file that contains the compiled classes.
- The provided Apache Ant task xmlbean easily integrates the class generation into your build process.
With either tool, the output is a JAR file and, optionally, the generated source files (which may be useful when working in an IDE). After running the XMLBeans tools on the schema presented earlier for this project, you will see interfaces in the org.mpilone.companySchema package such as CompanyDocument, CompanyType, EmployeeType, and ListOfEmployees. These interfaces correspond to the complex types in the XML schema, with CompanyDocument referring to the top-level Company element. The implementations of these interfaces are in the impl subpackage; however, you should never use the implementation classes directly.
XMLBeans-generated classes assume that the XML they read and write will contain only the children of the type. For example, the ListOfEmployees type will generate XML that contains Employee tags, but not a ListOfEmployees tag. The main reason is that ListOfEmployees is a type, not an instance of a type, so it has no element instance in the XML when it is out of context. Knowing this, you can use ListOfEmployees to read the chunks that are stored in the XML buffer. To do this, flush the XMLEventWriter to ensure that all the events have been written to the buffer, and then extract the XML in the buffer using the Reset() method. Use the nested Factory in ListOfEmployees to parse the XML into objects, as shown in Listing 5. If your chunk contains more than one Employee instance, you may have to wrap the chunk in <xml-fragment> tags. This is required by the internal XMLBeans parser.
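To illustrate the wrapping step, here is a minimal sketch. The helper names wrapAsFragment and isWellFormed are hypothetical, and the well-formedness check uses the JDK's StAX parser rather than XMLBeans itself; it only shows why a chunk with several sibling Employee elements needs a single enclosing element. The XMLBeans call appears as a comment because it requires the generated classes on the classpath.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; names are illustrative assumptions.
final class FragmentWrapper {

    /** Wraps a chunk of sibling elements in a single <xml-fragment> element. */
    static String wrapAsFragment(String chunk) {
        return "<xml-fragment>" + chunk + "</xml-fragment>";
    }

    /** Returns true if the JDK StAX parser can read the text as one XML document. */
    static boolean isWellFormed(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
            }
            return true;
        } catch (XMLStreamException e) {
            return false; // e.g. more than one top-level element
        }
    }

    public static void main(String[] args) {
        String chunk = "<Employee/><Employee/>"; // sample chunk with two employees
        String fragment = wrapAsFragment(chunk);

        System.out.println(isWellFormed(chunk));    // two roots: not a document
        System.out.println(isWellFormed(fragment)); // single enclosing element

        // With XMLBeans and the generated classes available, the fragment
        // would then be handed to the nested Factory, along the lines of:
        // ListOfEmployees employees = ListOfEmployees.Factory.parse(fragment);
    }
}
```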
You can now use the ListOfEmployees object to gain access to all the employees in the original XML chunk. XMLBeans provides schema-level validation that you can perform on the object to obtain a detailed list of errors in the XML. Typed values in the XML are also parsed and exposed as real Java objects; for example, XML Schema dateTime values become java.util.Calendar objects. These type mappings greatly simplify working with XML data.
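The effect of the dateTime mapping can be approximated with the JDK alone. The sketch below does not use XMLBeans; it uses the standard javax.xml.datatype API as a stand-in to show what turning an XML Schema dateTime lexical value into a java.util.Calendar involves. The class name and the sample timestamp are made up for illustration.

```java
import java.util.Calendar;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

// Stand-in for the mapping XMLBeans performs internally; not XMLBeans code.
final class DateTimeMappingDemo {

    /** Converts an XML Schema dateTime lexical value into a Calendar. */
    static Calendar toCalendar(String lexicalDateTime) {
        try {
            return DatatypeFactory.newInstance()
                    .newXMLGregorianCalendar(lexicalDateTime)
                    .toGregorianCalendar();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("No JAXP datatype factory available", e);
        }
    }

    public static void main(String[] args) {
        // Sample value (an assumption); the trailing Z marks UTC.
        Calendar hired = toCalendar("2006-03-15T09:30:00Z");

        // Fields are read with ordinary Calendar accessors instead of string parsing.
        System.out.println(hired.get(Calendar.YEAR));
        System.out.println(hired.get(Calendar.HOUR_OF_DAY));
    }
}
```

With XMLBeans, the generated getter for a dateTime element returns the Calendar directly, so application code never needs to handle the lexical form.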
A New Day for XML Chunking and Object Binding
This article gave a simplified view of XML chunking and object binding, but you can extend this basic framework in many ways. StAX provides a mechanism for implementing filters that let you manipulate the XML as it is read, before the application parses it into objects. Using the XML event API, an application can count the elements processed or implement simple transformations on an XML stream in real time. Looking at the other end of the process, you could use object/relational mapping technologies like Apache's iBATIS to map the XMLBeans objects into a relational database.
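As one possible illustration of the StAX filter mechanism, the sketch below uses the standard EventFilter interface and XMLInputFactory.createFilteredReader to let only Employee start elements through, and then counts them. The class and method names are hypothetical, and the element name Employee is assumed to appear without a namespace prefix.

```java
import java.io.StringReader;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical example; names are illustrative assumptions.
final class EmployeeCountingFilter {

    /** Counts Employee start elements by reading through a filtered event reader. */
    static int countEmployees(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLEventReader raw = factory.createXMLEventReader(new StringReader(xml));

            // The filter sees every event and decides which ones the caller receives.
            EventFilter employeesOnly = event ->
                    event.isStartElement()
                            && "Employee".equals(
                                    event.asStartElement().getName().getLocalPart());

            XMLEventReader filtered = factory.createFilteredReader(raw, employeesOnly);

            int count = 0;
            while (filtered.hasNext()) {
                filtered.nextEvent();
                count++;
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Company><ListOfEmployees>"
                + "<Employee/><Employee/><Employee/>"
                + "</ListOfEmployees></Company>";
        System.out.println(countEmployees(xml));
    }
}
```

Because the filter wraps an ordinary XMLEventReader, the same approach could sit in front of the chunking loop to drop comments or unwanted elements before they reach the buffer.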
While XML Schema and XML parsing have been around for a number of years, newer APIs such as StAX and XMLBeans give new life and flexibility to how applications deal with data. Before StAX, XML chunking could be tricky, involving technologies like XSLT, raw string matching, or verbose SAX handlers. With StAX and XMLBeans, a couple of hundred lines of code are enough to write a robust and efficient XML parser that can handle huge input files while exposing the data as easy-to-use Java objects.