Putting it All Together
To put it all together, let's look at a simple example using an audit trail logging system. The schema, and consequently the domain model, for the audit trail is shown in Figure 1. Depending on the user activity, the audit trail can contain thousands of log entries, so it isn't advisable to try to load them all into memory before serialization. Listing 6 presents the implementation of a stream definition for the audit trail in which the ListOfEvents element will trigger the use of a stream source/sink. (The implementation of the data access objects is outside the scope of this article. However, it is safe to assume that they simply load or save objects to a database.)
Figure 1. The Schema for the Audit Trail: The XML schema model for the audit trail example domain objects.
The application can now simply use the streaming marshaller with the stream definition to marshal an entire audit trail to XML without worrying about memory exhaustion (see Listing 7). The only audit-trail-specific code is the simple stream definition and the anonymous source/sink objects. The JAXB marshaller implementation remains generic and can be reused in a threaded environment for any supported domain object.
While this streaming marshaller pattern is sufficient for the majority of cases, you can create a few extensions to make it even more useful. The implementation presented so far supports only a single nested object stream. By splitting the child marshalling loop out into a separate operation that internally creates the child JAXB marshaller, any number of nested object streams could be supported.
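One way to picture that refactoring is the sketch below. It is not the article's marshaller: NestedStreamWriter, StreamSource, the Item wrapper, and the ListOfAlerts element are hypothetical names, and a plain StAX writeCharacters call stands in for the per-child JAXB marshaller so the example runs on a bare JDK. The point is only that once the child loop lives in its own operation, the parent can invoke it once per streamed element.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical pull-style source: returns the next child, or null when exhausted.
interface StreamSource<T> {
    T next();
}

final class NestedStreamWriter {

    // The extracted operation: drains one source into one wrapper element.
    // In the real pattern a child JAXB marshaller would be created here;
    // writeCharacters is only a stand-in for it.
    static void writeChildren(XMLStreamWriter out, String element, StreamSource<?> source)
            throws XMLStreamException {
        out.writeStartElement(element);
        Object child;
        while ((child = source.next()) != null) {
            out.writeStartElement("Item");
            out.writeCharacters(child.toString());
            out.writeEndElement();
        }
        out.writeEndElement();
    }

    // Writes a parent element containing any number of streamed sections.
    static String write(String root, Map<String, StreamSource<?>> sections) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            out.writeStartElement(root);
            for (Map.Entry<String, StreamSource<?>> section : sections.entrySet()) {
                writeChildren(out, section.getKey(), section.getValue());
            }
            out.writeEndElement();
            out.flush();
            out.close();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException(ex);
        }
        return buffer.toString();
    }

    // Adapts in-memory items to a source, for demonstration only.
    static StreamSource<String> of(String... items) {
        Iterator<String> it = Arrays.asList(items).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    static String demo() {
        Map<String, StreamSource<?>> sections = new LinkedHashMap<>();
        sections.put("ListOfEvents", of("login", "logout"));
        sections.put("ListOfAlerts", of("disk full"));
        return write("AuditTrail", sections);
    }
}
```

Each section is drained one child at a time, so memory use stays bounded regardless of how many streamed elements the parent contains.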
Writing a no-operation source or sink would allow the application to ignore sections of a document that were not relevant. For example, an application could process a payroll XML document and skip the expense reports section by using a no-op stream sink. Then it would process only the billable time section. The no-op source or sink implementations can be generic and reusable by any stream definition.
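A minimal sketch of such reusable no-op objects follows. The StreamSink and StreamSource interfaces here are assumptions made for illustration, since the article's actual callback signatures may differ, and PayrollDemo merely simulates the callbacks an unmarshaller would make.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical callback interfaces; the article's real ones may differ.
interface StreamSink<T> {
    void accept(T item);   // called once per unmarshalled child object
}

interface StreamSource<T> {
    T next();              // returns the next child, or null when exhausted
}

// Generic sinks and sources that deliberately do nothing.
final class NoOpStreams {
    private NoOpStreams() { }

    // A sink that discards every object handed to it.
    static <T> StreamSink<T> sink() {
        return item -> { /* intentionally ignored */ };
    }

    // A source that is empty from the start.
    static <T> StreamSource<T> source() {
        return () -> null;
    }
}

final class PayrollDemo {
    // Simulates the callbacks made while reading a payroll document.
    static List<String> process() {
        List<String> billable = new ArrayList<>();
        StreamSink<String> expenseSink = NoOpStreams.sink();  // section to skip
        StreamSink<String> billableSink = billable::add;      // section to keep

        for (String expense : Arrays.asList("taxi", "hotel")) {
            expenseSink.accept(expense);
        }
        for (String entry : Arrays.asList("8h design", "6h review")) {
            billableSink.accept(entry);
        }
        return billable;
    }
}
```

Because the no-op implementations are generic in T, the same two factory methods can serve any stream definition.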
Based on the implementation presented, if the stream definition returns null for a given start element, the marshalling process will continue as normal. By leveraging this functionality, an application can decide to allow the marshaller to read or write objects normally, even if they usually are streamed. For example, based on the previous audit trail example, if the application knows that very few events are in the current system, it could simply load the events into the audit trail and return null in the stream definition. You may notice that by returning null from the stream definition, the streaming marshaller behaves exactly the same as the non-streaming marshaller implementation.
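The null-return fallback might look like the following sketch. The StreamDefinition and StreamSource interfaces, the sourceFor method, and the STREAM_THRESHOLD cut-off are all assumptions made for illustration; only the ListOfEvents element name and the "null means marshal normally" rule come from the text above.

```java
// Hypothetical pull-style source: returns the next child, or null when exhausted.
interface StreamSource<T> {
    T next();
}

// Hypothetical definition: a non-null source means "stream this element";
// null means "let the marshaller handle it normally".
interface StreamDefinition {
    StreamSource<?> sourceFor(String elementName);
}

final class AuditTrailDefinition implements StreamDefinition {
    // Assumed cut-off; tune it to the memory actually available.
    static final int STREAM_THRESHOLD = 1_000;

    private final long eventCount;
    private final StreamSource<?> eventSource;

    AuditTrailDefinition(long eventCount, StreamSource<?> eventSource) {
        this.eventCount = eventCount;
        this.eventSource = eventSource;
    }

    @Override
    public StreamSource<?> sourceFor(String elementName) {
        boolean isEventList = "ListOfEvents".equals(elementName);
        if (isEventList && eventCount > STREAM_THRESHOLD) {
            return eventSource;  // large trail: stream the events
        }
        return null;             // small trail or other element: marshal normally
    }
}
```

When the definition returns null, the application is responsible for having loaded the events into the audit trail object beforehand, since the marshaller will then serialize whatever the object already holds.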
Limitations and Benefits
Of course, any pattern has tradeoffs, and this one is no exception. The callback model used by the stream source and sink objects can sometimes be more difficult to implement in an application because the marshaller controls the reading or writing process. This callback model is reminiscent of the SAX (Simple API for XML) model, a push-style API that is often considered harder to program against than pull-style alternatives.
During unmarshalling, the stream sink will receive callbacks with the child objects before the parent object has been fully unmarshalled, due to the hierarchical nature of XML. This could be problematic in situations where information from the parent object is required before the children can be processed. For example, with what do you associate audit events in the database when the unmarshaller returns them before the actual audit trail parent object? One workaround is to use a no-op sink as described earlier to read the XML document once, extracting only the parent object, and then read the XML document again using the proper sinks. This will not deliver the best performance, but it does give access to the parent object before the children are returned.
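The two-pass workaround can be sketched as follows. This is an illustration under stated assumptions, not the article's unmarshaller: the unmarshal method is a small StAX-based stand-in, a java.util.function.Consumer plays the role of the stream sink, and the Owner element (placed after the events so that the parent data really does arrive last) is a hypothetical addition to the audit trail document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class TwoPassImport {

    // Stand-in for the streaming unmarshaller: hands each Event to the sink
    // as it is read and returns the parent's Owner once the document ends.
    static String unmarshal(String xml, Consumer<String> eventSink) {
        try {
            XMLStreamReader in = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            String owner = null;
            while (in.hasNext()) {
                if (in.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                if ("Event".equals(in.getLocalName())) {
                    eventSink.accept(in.getElementText());
                } else if ("Owner".equals(in.getLocalName())) {
                    owner = in.getElementText();
                }
            }
            in.close();
            return owner;
        } catch (XMLStreamException ex) {
            throw new IllegalStateException(ex);
        }
    }

    static List<String> importTrail(String xml) {
        // Pass 1: a no-op sink discards the events; only the parent data is kept.
        String owner = unmarshal(xml, event -> { });

        // Pass 2: the real sink can now tie each event to its parent.
        List<String> rows = new ArrayList<>();
        unmarshal(xml, event -> rows.add(owner + ":" + event));
        return rows;
    }

    static List<String> demo() {
        String xml = "<AuditTrail><ListOfEvents><Event>login</Event>"
                + "<Event>logout</Event></ListOfEvents><Owner>alice</Owner></AuditTrail>";
        return importTrail(xml);
    }
}
```

The document is parsed twice, which is the performance cost mentioned above, but at no point are all events held in memory by the unmarshaller itself.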
But even given these limitations, the benefits of the generic, reusable implementation, automatic XML type handling, and limited memory usage still make the approach a winner. So the next time you have to serialize large objects to XML, consider this pattern before dropping to the low-level APIs. You'll save yourself a lot of headaches.