Iteration 1: A Standard "Big Bang" JAXP XSLT Transformation
If you are accustomed to developing object-oriented programs in Java, you'll need some time to get used to XSLT. But you can accomplish sophisticated XML transformations with a relatively small amount of XSLT code, and you can modify the transformation easily as the sending and receiving systems enhance their XML schemas over time. Writing the same transformation in Java code takes considerably longer, both to create initially and to modify later.
The WholeFileTransformation.xslt file (see Listing 3) contains a simplified partial XSLT to transform the Input.xml document into the Output.xml document. The only trick in this particular XSLT file is creating an XML element from the ColumnName attribute contained within the input XML file. The XSLT file is itself an XML file and must be well-formed XML. To create a named XML element dynamically, this stylesheet writes the less-than and greater-than symbols surrounding the name of the element as escaped text. The following line will send a less-than (<) symbol to the output document:
So you combine this with the name of the element, end it with the escaped greater-than symbol, and you have a dynamically created XML element start tag.
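The technique described above can be sketched as a small, self-contained Java program. This is an illustration under stated assumptions, not the contents of Listing 3: the Row and Column element names, the Record wrapper, and the DynamicElementDemo class are hypothetical, and the stylesheet assumes that the escaped symbols are written with xsl:text and disable-output-escaping="yes", one common way to do this in XSLT 1.0.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DynamicElementDemo {

    // Hypothetical stylesheet: only the ColumnName attribute comes from the article.
    // Row, Column, and Record are names invented for this sketch.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/Row\">"
      + "<Record><xsl:apply-templates select=\"Column\"/></Record>"
      + "</xsl:template>"
      + "<xsl:template match=\"Column\">"
      // Start tag: escaped '<', the element name, escaped '>'
      + "<xsl:text disable-output-escaping=\"yes\">&lt;</xsl:text>"
      + "<xsl:value-of select=\"@ColumnName\"/>"
      + "<xsl:text disable-output-escaping=\"yes\">&gt;</xsl:text>"
      // Element content
      + "<xsl:value-of select=\".\"/>"
      // End tag: escaped '</', the element name, escaped '>'
      + "<xsl:text disable-output-escaping=\"yes\">&lt;/</xsl:text>"
      + "<xsl:value-of select=\"@ColumnName\"/>"
      + "<xsl:text disable-output-escaping=\"yes\">&gt;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the sketch stylesheet to an XML string and returns the serialized result. */
    static String transform(String inputXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        // disable-output-escaping only has an effect when the result is serialized,
        // which is why the output goes to a stream rather than to a DOM tree.
        transformer.transform(new StreamSource(new StringReader(inputXml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // The ColumnName value ("FirstName") is used as the output element name.
        String input = "<Row><Column ColumnName=\"FirstName\">Ada</Column></Row>";
        System.out.println(transform(input));
    }
}
```

Note that XSLT 1.0 also defines the xsl:element instruction, which accepts a computed name such as name="{@ColumnName}". Because support for disable-output-escaping is optional for XSLT processors, that instruction may be worth considering if the stylesheet has to run on more than one processor.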
Next, you create a Java program to run a standard JAXP XSLT transformation. The SingleThreadXSLT.java file (see Listing 4) is a very simple Java class that is called from the command line. It takes three input parameters: the names of the input XML, XSLT, and output XML files. It then runs the XSLT transformation on the input XML file to produce the output XML file.
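For readers without Listing 4 at hand, a minimal sketch of such a command-line program might look like the following. It is not the SingleThreadXSLT class itself; the class name SimpleJaxpTransform and its transform helper are hypothetical, and only the standard javax.xml.transform API is assumed.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SimpleJaxpTransform {

    /** Runs one whole-file ("big bang") transformation from inputXml to outputXml. */
    static void transform(File inputXml, File xslt, File outputXml)
            throws TransformerException {
        // Look up the JAXP implementation configured for this JVM.
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compile the stylesheet into a Transformer.
        Transformer transformer = factory.newTransformer(new StreamSource(xslt));
        // Read the entire input document and write the entire output document.
        transformer.transform(new StreamSource(inputXml), new StreamResult(outputXml));
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length != 3) {
            System.err.println(
                "Usage: java SimpleJaxpTransform <input.xml> <transform.xslt> <output.xml>");
            System.exit(1);
        }
        transform(new File(args[0]), new File(args[1]), new File(args[2]));
    }
}
```

Assuming the file names used in this article, it could be invoked as: java SimpleJaxpTransform Input.xml WholeFileTransformation.xslt Output.xml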
This transformation program runs very well on a small input XML file and takes less than a second once the Java virtual machine is loaded. It also needs very little memory beyond what the base Java virtual machine consumes: around 12 MB in total.
Unfortunately, as you increase the size of the input XML file, you greatly increase the size of the Java virtual machine required to transform the file. A 275 MB input file nearly runs out of memory in a virtual machine sized to 1 GB. A standard JAXP transformation appears to load the entire input document into a DOM structure in memory before even starting the transformation process. So this approach certainly won't work for multi-gigabyte input files. It also uses only one CPU thread per program run, and the 275 MB input file takes a little over 200 seconds to transform.
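One way to observe this behavior on your own files is to wrap the transformation in simple timing and heap measurements. The following is a rough sketch, not the measurement method used for the figures above: the TransformProbe class and its Measurement holder are hypothetical, and the heap number is only an approximation because it depends on when the garbage collector last ran.

```java
import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformProbe {

    /** Wall-clock time and approximate heap figures for one transformation run. */
    static final class Measurement {
        final long elapsedMillis;
        final long usedHeapBytes;
        final long maxHeapBytes;

        Measurement(long elapsedMillis, long usedHeapBytes, long maxHeapBytes) {
            this.elapsedMillis = elapsedMillis;
            this.usedHeapBytes = usedHeapBytes;
            this.maxHeapBytes = maxHeapBytes;
        }
    }

    /** Runs a single-threaded JAXP transformation and records time and heap use. */
    static Measurement measure(Source xml, Source xslt, Result out)
            throws TransformerException {
        Runtime runtime = Runtime.getRuntime();
        long start = System.nanoTime();
        Transformer transformer = TransformerFactory.newInstance().newTransformer(xslt);
        transformer.transform(xml, out);
        long elapsedMillis = (System.nanoTime() - start) / 1000000L;
        // Heap in use right after the run; approximate, since garbage may remain.
        long usedHeap = runtime.totalMemory() - runtime.freeMemory();
        return new Measurement(elapsedMillis, usedHeap, runtime.maxMemory());
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length != 3) {
            System.err.println(
                "Usage: java TransformProbe <input.xml> <transform.xslt> <output.xml>");
            System.exit(1);
        }
        Measurement m = measure(new StreamSource(new File(args[0])),
                                new StreamSource(new File(args[1])),
                                new StreamResult(new File(args[2])));
        System.out.println("Elapsed: " + m.elapsedMillis + " ms");
        System.out.println("Heap in use after run: " + (m.usedHeapBytes >> 20) + " MB");
        System.out.println("Maximum heap: " + (m.maxHeapBytes >> 20) + " MB");
    }
}
```

The maximum heap can be set with the standard -Xmx option, for example: java -Xmx1g TransformProbe Input.xml WholeFileTransformation.xslt Output.xml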