Streaming Object Marshalling APIs
Leveraging the well-known marshalling APIs defined by frameworks such as Spring-WS, I started with the marshaller and unmarshaller interfaces shown in Listing 1. These interfaces provide a common method for transforming an object into and out of XML while hiding the underlying implementation and providing consistent exception handling. The next step is to extend these interfaces to support objects of any size.
One option is to pass lists to the marshal method, but that doesn't meet the scalability requirement, because all of the objects must be placed in the list before marshalling starts. It also makes a generic API difficult, because the number of lists required varies with the object being serialized. A callback interface avoids both problems by letting the marshaller request objects only as they are needed. This creates a 'stream' of objects that are pulled directly from the source to the marshaller, or pushed from the unmarshaller directly to the sink. Listing 2 shows the simple marshaller source and unmarshaller sink APIs.
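To make the callback idea concrete, here is a minimal sketch of what a pull-style source and a push-style sink could look like. It is not the code from Listing 2; the names ObjectStreamSource, ObjectStreamSink, IteratorSource, and SourceSinkSketch are hypothetical and chosen only for illustration.

```java
import java.util.Iterator;

/** Hypothetical callback the marshaller pulls from, one object at a time. */
interface ObjectStreamSource<T> {
    boolean hasNext();
    T next();
}

/** Hypothetical callback the unmarshaller pushes each decoded object to. */
interface ObjectStreamSink<T> {
    void accept(T object);
}

/** Adapts any Iterator (for example one backed by a database cursor) to a source. */
final class IteratorSource<T> implements ObjectStreamSource<T> {
    private final Iterator<T> delegate;

    IteratorSource(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override public boolean hasNext() { return delegate.hasNext(); }
    @Override public T next() { return delegate.next(); }
}

final class SourceSinkSketch {
    /**
     * Pulls objects from the source and hands each one to the sink.
     * Only one object is referenced by this loop at any time.
     * Returns the number of objects transferred.
     */
    static <T> int drain(ObjectStreamSource<T> source, ObjectStreamSink<T> sink) {
        int count = 0;
        while (source.hasNext()) {
            sink.accept(source.next());
            count++;
        }
        return count;
    }
}
```

Because the source is asked for one object at a time, the caller decides how objects are loaded (lazily from a cursor, a file, or a queue), and nothing in the API forces the full collection into memory.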
To let the marshaller know when to start pulling objects from the source, or which source/sink to use if multiple object streams are used, the streamed object must be described to the marshaller. This is done using a stream definition as shown in Listing 3. The definition simply provides a lookup based on the current element being written, and can return an object stream source. The objects returned from the source compose the children of the original element. The same design works in reverse for unmarshalling as the stream definition is interrogated for object stream sinks.
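The lookup described above can be sketched as a small interface keyed by element name. This is an assumption about the shape of the API, not the code from Listing 3: StreamDefinition, MapStreamDefinition, getSource, and bind are hypothetical names, and java.util.Iterator stands in for the object stream source to keep the example self-contained. A matching getSink method would serve the unmarshalling direction.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.namespace.QName;

/** Hypothetical lookup the marshaller consults for each element it writes. */
interface StreamDefinition {
    /**
     * Returns the objects to stream as children of the given element,
     * or null if the element is not streamed.
     */
    Iterator<?> getSource(QName element);
}

/** Map-backed definition: each element name is bound to a lazily iterated source. */
final class MapStreamDefinition implements StreamDefinition {
    private final Map<QName, Iterable<?>> sources = new HashMap<>();

    /** Registers a source for an element name; returns this for chaining. */
    MapStreamDefinition bind(QName element, Iterable<?> source) {
        sources.put(element, source);
        return this;
    }

    @Override
    public Iterator<?> getSource(QName element) {
        Iterable<?> source = sources.get(element);
        // The iterator is created only when the element is actually reached.
        return source == null ? null : source.iterator();
    }
}
```

Keying the lookup on the element name is what allows several independent object streams in one document: each streamed element simply gets its own binding.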
The final step is to modify the marshal and unmarshal methods to take the stream definition as a parameter (see Listing 4). The new API, using the stream definition to locate the object streams, can be used as a generic OXM API to support objects of any size very efficiently. With the interfaces in place, a working implementation is the next objective.
Implementation with JAXB on StAX
Using the interfaces described in the previous section, I could implement the solution using existing XML marshalling tools. I decided to use JAXB on top of the StAX library because of the stability and flexibility that these two tools provide. At the same time, my general marshaller was already implemented with JAXB, so the new streaming marshaller could leverage some of that implementation. Also, to reduce the number of implementation classes, the JAXB implementation recognizes both the StreamMarshaller and StreamUnmarshaller interfaces. See the full implementation in Listing 5.
The key to the implementation is detecting when an element being written requires children from a stream source. To accomplish this, the implementation creates a dynamic proxy around the StAX XMLEventWriter instance and watches for calls to the add method that pass a start element event. In generic terms, the stream marshaller is looking for the specific XML open element tag that will contain streamed objects. For each start element, the stream definition is consulted to see if there is a stream source available.
If the stream definition returns a stream source, the current marshalling activity is suspended, and a new marshalling loop begins in which the objects from the source are marshalled to the event writer until the source is exhausted. Once it is exhausted, the original marshalling activity resumes. This streaming technique allows the stream source to load the objects to be marshalled on demand and therefore limits memory usage. Due to a reentrancy limitation in the JAXB marshaller, a second marshaller is used in the inner loop, but other implementations may support the use of a single marshaller.
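The proxy-and-inner-loop technique can be sketched with JDK classes alone. This is not the code from Listing 5. To stay runnable without a JAXB dependency, the sketch makes two loud simplifications: a Function from element name to Iterator stands in for the stream definition, and each streamed object is a String written by hand as an item element. In a JAXB-based version the inner loop would instead hand each object to a second Marshaller configured for fragment output. The names StreamingWriterSketch, watch, writeItem, and demo are hypothetical.

```java
import java.io.StringWriter;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class StreamingWriterSketch {
    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    /** Wraps a writer so streamed children are emitted right after a matching start element. */
    static XMLEventWriter watch(XMLEventWriter target,
                                Function<QName, Iterator<String>> definition) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result;
            try {
                result = method.invoke(target, args); // always forward the call first
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the writer's own exception
            }
            boolean startElement = method.getName().equals("add")
                    && args != null && args.length == 1
                    && args[0] instanceof XMLEvent
                    && ((XMLEvent) args[0]).isStartElement();
            if (startElement) {
                QName name = ((XMLEvent) args[0]).asStartElement().getName();
                Iterator<String> source = definition.apply(name);
                // The outer marshalling call is blocked here until the source is drained.
                while (source != null && source.hasNext()) {
                    writeItem(target, source.next());
                }
            }
            return result;
        };
        return (XMLEventWriter) Proxy.newProxyInstance(
                StreamingWriterSketch.class.getClassLoader(),
                new Class<?>[] {XMLEventWriter.class}, handler);
    }

    /** Stand-in for the inner marshaller: writes one object as an item element. */
    private static void writeItem(XMLEventWriter out, String text) throws XMLStreamException {
        out.add(EVENTS.createStartElement("", "", "item"));
        out.add(EVENTS.createCharacters(text));
        out.add(EVENTS.createEndElement("", "", "item"));
    }

    /** Writes an empty orders element through the proxy and returns the resulting XML. */
    static String demo() {
        try {
            StringWriter xml = new StringWriter();
            XMLEventWriter raw = XMLOutputFactory.newInstance().createXMLEventWriter(xml);
            XMLEventWriter writer = watch(raw, name ->
                    name.getLocalPart().equals("orders")
                            ? Arrays.asList("a", "b").iterator() : null);
            writer.add(EVENTS.createStartElement("", "", "orders"));
            writer.add(EVENTS.createEndElement("", "", "orders"));
            writer.flush();
            writer.close();
            return xml.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The outer code only ever writes the start and end of orders; the item children appear because the proxy intercepts the start element and drains the source before the end element is written. Writing the children to the unwrapped target, rather than back through the proxy, is a design choice in this sketch that keeps streamed objects from triggering nested lookups.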
The unmarshalling process works in essentially the same way. When a start element is found, the stream definition is consulted. If a stream sink is returned from the definition, the current unmarshalling activity is suspended, and a new unmarshalling loop begins in which the child objects are unmarshalled and given to the sink until the parent's end element is found. Again, the large list of objects in the XML document is read and processed one object at a time, so the objects are never all loaded into memory together.
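The reverse direction can be sketched the same way with the StAX XMLEventReader. As before, this is an illustration under stated assumptions rather than the article's implementation: a Function from element name to Consumer stands in for the stream definition, each child is assumed to be a text-only element, and reading its text replaces the per-child JAXB unmarshal call. StreamingReaderSketch and read are hypothetical names.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import java.util.function.Function;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class StreamingReaderSketch {
    /**
     * Walks the document and, for every element that has a sink,
     * hands its children to that sink one at a time.
     */
    static void read(String xml, Function<QName, Consumer<String>> definition) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                Consumer<String> sink = definition.apply(event.asStartElement().getName());
                if (sink == null) {
                    continue; // not a streamed element; keep scanning
                }
                // Inner loop: runs until the end element of the streamed parent.
                while (true) {
                    XMLEvent child = reader.nextEvent();
                    if (child.isEndElement()) {
                        break;
                    }
                    if (child.isStartElement()) {
                        // Stand-in for unmarshalling one child; this call also
                        // consumes the child's own end element.
                        sink.accept(reader.getElementText());
                    }
                    // Whitespace and comments between children are skipped.
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read XML", e);
        }
    }
}
```

Each child is passed to the sink as soon as it is decoded, so the sink can persist or process it and let it go before the next child is read.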