Iteration 3: Multithreaded JAXP XSLT Transformation
Starting with JDK 5 (1.5), Java includes a very useful utility library for concurrent programming, java.util.concurrent. With a few lines of code you can create a thread pool and give that thread pool discrete tasks to execute in the background. Thread pools are very important for bulk background processing, because the overhead of creating and destroying threads is large enough that doing so on a routine basis can eliminate any positive effect of multithreading your program. You want to minimize thread creation wherever possible, and the concurrency library does this for you automatically. The simplest way to do this is to create a fixed-size thread pool as follows:
ExecutorService executor =
This type of thread pool creates up to the number of threads given as input, and those threads wait around until given a task to run. After a task finishes executing, the thread does not terminate; it waits until it is assigned another task to run. The executor object handles queuing the tasks for you. So all you need to do is create a task, which is simply a Java class implementing either the Runnable or Callable interface, and then tell the executor to execute that task when a thread in the pool is available:
transformer, outputWriter, newDocument));
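As a rough sketch of the pattern just described, the following self-contained class creates a fixed-size pool, hands it a number of trivial Runnable tasks, and waits for them to finish. The class name PoolSketch, the runTasks method, and the counter are illustrative assumptions and are not taken from the article's listings.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSketch {
    // Runs taskCount trivial tasks on a pool of threadCount threads and
    // returns how many of them completed. (Hypothetical helper.)
    public static int runTasks(int threadCount, int taskCount) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        final AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            // The executor queues the task until a pool thread is free.
            executor.execute(new Runnable() {
                public void run() {
                    completed.incrementAndGet();
                }
            });
        }
        // Stop accepting new tasks, then wait for the queued ones to finish.
        executor.shutdown();
        if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(4, 100) + " tasks completed");
    }
}
```

The shutdown and awaitTermination calls matter in a batch program like this one: without them, the main thread has no way of knowing when the last queued task has finished.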
Unfortunately, you still need to put a lot of thought into how you can restructure your program to take advantage of that thread pool. Good candidates for thread pool tasks are sequences of processing that repeat many times during the execution of one particular program feature. The best candidates are processing actions that can be executed in any order. Remember that tasks submitted to a thread pool always run in the background, so you cannot guarantee the order in which they execute.
In this programming problem, you have three actions performed time and time again:
1. Parsing the input file for each record, copying each record into an XML document for individual XSLT transformation
2. Running the XSLT transformation
3. Appending the output of the XSLT transformation to the output file
I don't think the dom4j library would support multithreaded parsing. The library parses the input file sequentially via a SAX parser. So cross #1 off the above list; it will be performed on a single thread in the main program. That leaves #2 and #3, both of which can be done in parallel. But when you do that, do you need to maintain an order in the output file that is equivalent to the order defined in the input file? Not for this exercise, since the output file can be reordered as long as all input records are properly transformed. If you need to maintain order, you still can, but it requires a more complex program.
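This article's program does not preserve order, but to give an idea of what the more complex variant could involve, here is one possible approach (not the one used in the listings): submit each record as a Callable, keep the returned Future objects in submission order, and append the results in that same order. The class name OrderedOutputSketch and the bracket-wrapping stand-in for the transformation are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OrderedOutputSketch {
    // Processes each record on the pool, then appends results in input order.
    public static String processInOrder(List<String> records, int threadCount) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<String>> pending = new ArrayList<Future<String>>();
            for (final String record : records) {
                pending.add(executor.submit(new Callable<String>() {
                    public String call() {
                        // Stand-in for the real XSLT transformation.
                        return "[" + record + "]";
                    }
                }));
            }
            StringBuilder output = new StringBuilder();
            // get() blocks until that task is done, so the output follows
            // submission order even if tasks finish out of order.
            for (Future<String> result : pending) {
                output.append(result.get()).append('\n');
            }
            return output.toString();
        } finally {
            executor.shutdown();
        }
    }
}
```

Note that this sketch holds every pending result in memory until it is written. For a very large input file you would probably want to bound the number of outstanding futures, which is part of the extra complexity.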
You will create a thread pool task to do a combination of #2 and #3. Hopefully the single-threaded #1 action is much faster than the combination of #2 and #3, or you won't get much of a benefit from multi-threading the program. Before going through all the work of restructuring the program, however, verify that assumption! I did.
The MultiThreadSplitXSLT.java file (see Listing 8) shows the updated main program class, which takes an additional input parameter: the number of threads to allocate in the fixed-size thread pool. It creates the thread pool, starts an incremental parse and transformation of the large input XML file, waits for it to complete, and ends the output file and program.
The MultiThreadSplitFileElementHandler.java file (see Listing 9) shows an updated dom4j element handler that creates a thread pool task for as much work as possible. The element handler creates the new partial XML document to transform and passes it to a thread pool task to transform it and write it to the output file. The only tricky part of this file is making sure you create a new transformer for each thread pool task:
Transformer transformer = cachedXSLT.newTransformer();
A Transformer object cannot transform multiple documents at the same time, so you must create a new transformer for each task you create. Any time you pass an object into a task executed in parallel, you must take great care to synchronize access to that object among all tasks that might use it at the same time. If you create a new transformer object for each task, then you do not need to worry about synchronization: two tasks will never reference the same object, so there is no shared access to protect.
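To illustrate the one-transformer-per-task idea in isolation, here is a minimal sketch that assumes cachedXSLT is a javax.xml.transform.Templates object, which is the JAXP type intended to be compiled once and shared. The class name TemplatesSketch, the tiny inline stylesheet, and the use of strings for input and output are assumptions for the example; the article's program works with dom4j documents instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesSketch {
    // Hypothetical stylesheet: wraps the text of a <Record> element in <out>.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/Record\">"
      + "<out><xsl:value-of select=\".\"/></out>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compile the stylesheet once; the Templates object can be shared.
    public static Templates compile() throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTemplates(new StreamSource(new StringReader(XSLT)));
    }

    // Each call creates its own Transformer, so no two tasks share one.
    public static String transform(Templates cachedXSLT, String xml)
            throws TransformerException {
        Transformer transformer = cachedXSLT.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }
}
```

Compiling the stylesheet is the expensive step, so it happens once; creating a Transformer from the compiled Templates is comparatively cheap, which is what makes a fresh transformer per task practical.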
The XSLTRunnable.java file (see Listing 10) is a new class used to execute an XSLT in the thread pool. It implements the Runnable interface, so when a thread from the thread pool picks up the task, it executes the run method. This method performs a JAXP XSLT transformation on the input document, but then it synchronizes on the outputWriter object before it appends the results of the transformation.
Remember the rule above: any object passed into a task executed in parallel must be safe to be accessed in parallel. If you create a new object each time you send it to a task, that object generally should be safe to use without synchronization (to know for certain, you must make sure anything that object accesses in turn is either distinct among any other threads executing or is synchronized properly).
When creating a thread pool task, you pass it three parameters:
Writer outputWriter, Document document)
You created a new transformer object for each task, so you do not need to synchronize access to that object, as discussed above. The document is also derived from a full copy of each individual Record element from the input file. So that object is distinct for each task, and you have no problem with synchronization there either.
Unfortunately, the outputWriter is a Writer object for the one and only output file. You cannot create copies of this object, nor can you simply ignore synchronization issues. If you did, the output file would be scrambled, with partial records interleaved with other partial records. You must write out a completely transformed record in its entirety before allowing any other thread to write out its completely transformed record.
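One way to satisfy that requirement, sketched here under some assumptions, is to transform each record into a private in-memory buffer first and only then take a lock on the shared writer for a single complete write. The class name BufferedXsltTask is hypothetical, the record is passed as an XML string rather than a dom4j Document to keep the example self-contained, and the trailing newline is an illustrative choice; Listing 10 may differ in all of these details.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BufferedXsltTask implements Runnable {
    private final Transformer transformer; // one per task, never shared
    private final Writer outputWriter;     // shared by all tasks
    private final String recordXml;        // one per task, never shared

    public BufferedXsltTask(Transformer transformer, Writer outputWriter,
                            String recordXml) {
        this.transformer = transformer;
        this.outputWriter = outputWriter;
        this.recordXml = recordXml;
    }

    public void run() {
        try {
            // Transform into a private buffer; no lock is held here, so
            // transformations on different threads can overlap freely.
            StringWriter buffer = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(recordXml)),
                                  new StreamResult(buffer));
            // Hold the lock only long enough to append one whole record.
            synchronized (outputWriter) {
                outputWriter.write(buffer.toString());
                outputWriter.write('\n');
            }
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Keeping the transformation outside the synchronized block is the important design choice: if the lock were held during the transformation itself, the tasks would effectively run one at a time and the thread pool would buy you nothing.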
Executing this version of the program on a single-CPU machine reduced the time to process a 275 MB file to 160 seconds, which initially surprised me. That is a better runtime than the Iteration 1 transformation, even though this program is doing quite a bit more work to accomplish the same result. On a single-CPU machine, the parallel processing allows the program to perform in-memory transformations at the same time as I/O operations. In the Iteration 1 approach, the program cannot fully utilize its only CPU while it reads the input file and writes the output file.