During the past 20 years I have had the privilege of working with well over 1,000 different software developers. Some of the best developers in the industry mentored me when I began my career. They taught me software development from the ground up, beginning with how to run and enhance automated unit tests for operating systems. (Yes, people knew how to create and execute automated unit tests well before Kent Beck developed the elegant xUnit framework.)
I gradually started leading teams of junior software developers and tried to teach them the lessons I had learned. I have since mentored many development teams from the United States, India, and China, and I've come to a sobering realization: the biggest lesson of all is often forgotten in today's fast-paced and often outsourced software industry. Getting a software program working is just Step 1 in a series of steps to produce good software.
To keep this vital lesson fresh in your mind and further your software engineering prowess, this article walks you through Step 1 of a challenging and all-too-common programming problem: transforming XML documents between disparate systems.
The Problem: Efficiently Processing Huge XML Transformations
I often face the need to transform XML documents. Sometimes these documents get very large, as when one system extracts perhaps hundreds of thousands of transactions and sends them on to a second system for further processing. The sending system rarely views data in the same way as the receiving system, so an XML transformation is almost always required to integrate the two.
Let's apply this problem to a programming exercise. There are many ways to transform XML, and more than a few software companies make nice livings out of providing such services. But as good software developers, we should always try to use industry standards where possible. XSLT is the industry standard for this problem, and Java has a standard way to invoke XSLT, via JAXP.
The Input.xml document (see Listing 1) in this exercise contains a simplified partial extract of an XML file from the sending system. Notice two important points about how this file is structured. First of all, each transaction relayed between the two systems is delimited by a Record element under the root element named InsurancePolicyData. That way the sending system can place any number of transactions in the same XML export file. Second, the sending system has a pretty obscure way of writing out its data: in label/value pairs rather than in more traditional XML elements and text within an element.
The receiving system wants the XML to be structured as shown in the Output.xml document (see Listing 2) so it can process it more efficiently. XML transformations always require time, processing resources, and some sort of mapping and/or coding effort to complete. So always check first whether the two systems can agree upon a standardized XML schema, which would avoid the transformation altogether. In many cases these days, however, the two systems are provided by third-party vendors you cannot control. And more often than not the systems are written by two different vendors with two very different sets of goals. So begins your quest to write an efficient standards-based XML transformation program.
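To make the moving parts concrete, here is a minimal sketch of invoking XSLT from Java through JAXP. It is not the article's solution. The Field element with its label and value attributes, the sample values, and the class name JaxpTransformSketch are assumptions made for illustration and are not taken from Listing 1 or Listing 2. Only the InsurancePolicyData and Record element names come from the description above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class JaxpTransformSketch {

    // Hypothetical input: label/value pairs inside each Record (assumed shape).
    static final String SAMPLE_XML =
        "<InsurancePolicyData>"
      + "<Record>"
      + "<Field label='PolicyNumber' value='12345'/>"
      + "<Field label='Premium' value='250.00'/>"
      + "</Record>"
      + "</InsurancePolicyData>";

    // Hypothetical stylesheet: turns each label into an element name
    // and each value into that element's text.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/InsurancePolicyData'>"
      + "<InsurancePolicyData><xsl:apply-templates select='Record'/></InsurancePolicyData>"
      + "</xsl:template>"
      + "<xsl:template match='Record'>"
      + "<Record><xsl:apply-templates select='Field'/></Record>"
      + "</xsl:template>"
      + "<xsl:template match='Field'>"
      + "<xsl:element name='{@label}'><xsl:value-of select='@value'/></xsl:element>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies an XSLT stylesheet to an XML document using the standard JAXP API. */
    static String transform(String xml, String xslt) throws TransformerException {
        // The factory locates whichever XSLT processor the platform provides.
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compile the stylesheet into a Transformer.
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        // Run the transformation from the input source to the output result.
        transformer.transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSLT));
    }
}
```

The sketch holds both documents in memory as strings purely to keep it self-contained. A real extract with hundreds of thousands of records would more likely be read from and written to files or streams through the same StreamSource and StreamResult classes.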