Every new enterprise system built today has to integrate with and support existing systems. Since the earliest days of programming, passing data between systems via flat files has been a common operation, and developers get to write the code that converts the data from one format to another. As unexciting as the topic may be, flat file processing is a required feature of any new application developed for the U.S. government.
As XML becomes ubiquitous in modern systems, it’s increasingly common for developers to take the flat files exported by older applications, and use them to construct single or multiple XML documents. The basic task of parsing an incoming record and creating XML nodes from that is simple; the real pain starts at the maintenance level. Any change to the flat file format involves tedious code changes to alter the construction of the XML file(s).
But changes to flat file formats are part and parcel of flat file processing. Ideally you would construct a flat-file-to-XML application that, if designed carefully, is immune to most flat file format changes and can adapt the XML output to match those changes simply by altering a few rules stored outside the application itself.
Here’s an example. Suppose you need to capture patient appointment data for each doctor in a hospital software system. The software system exports patient data in a fixed flat-file format. What you need to do is update patient visits for existing patients and create a new entry for new patients. But the existing flat-file system doesn’t differentiate between new and existing patients. Here’s an example patient record:
// Sample flat file Patient appointment data:
// Note: The following code would be one
// line in the incoming flat file.
P      JAVA DEVELOPER73774777719740310     20874Programming stress disorders
Table 1 shows the field type and size information needed to parse the file and separate the fields.
Table 1: The description and length of each field in incoming patient flat-file records.
Field | Length (number of characters)
Record Type | 1
Patient First Name | 10
Patient Last Name | 10
Patient SSN | 9
Patient DOB | 8 (YYYYMMDD)
Doctor ID Number | 10
Reason for visit | 50
The first automation step is to parse the incoming flat file and construct XML. Because you want to accomplish this generically, making the application adapt to changing flat file formats, you need to create parsing rules stored externally to your program. You can store the rules however you like, but it’s convenient to store them in XML files or in a database table.
Automating XML Construction
The sample code for this article stores the parsing rules in an XML file. The application loads the XML file and then extracts the parsing rules from the XML using JAXB. Storing the rules in a database table can be better if you feel using XML and JAXB for getting parsing rules is overkill for your application.
The parsing rules simply inform the parsing application of the name and position (start and end offsets) of each field in the incoming fixed-width text file.
RecordType 0 1
FirstName 1 11
LastName 11 21
SSN 21 30
... ... ...
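The markup of the rules file did not survive in the listing above, so the following is only a sketch of how such rules might be loaded. It uses the JDK's built-in DOM parser rather than JAXB, and the element and attribute names (`rules`, `field`, `name`, `start`, `end`) are assumptions made for illustration. Only the four rules visible in the listing are included.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ParsingRules {
    /** One rule: a field name plus its start (inclusive) and end (exclusive) offsets. */
    static final class FieldRule {
        final String name;
        final int start;
        final int end;

        FieldRule(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical rules markup; the real file's element names are not known.
    static final String SAMPLE_RULES =
          "<rules>"
        + "<field name=\"RecordType\" start=\"0\" end=\"1\"/>"
        + "<field name=\"FirstName\" start=\"1\" end=\"11\"/>"
        + "<field name=\"LastName\" start=\"11\" end=\"21\"/>"
        + "<field name=\"SSN\" start=\"21\" end=\"30\"/>"
        + "</rules>";

    /** Parses the rules XML and returns the rules in document order. */
    static List<FieldRule> load(String rulesXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(rulesXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("field");
            List<FieldRule> rules = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                rules.add(new FieldRule(
                    e.getAttribute("name"),
                    Integer.parseInt(e.getAttribute("start")),
                    Integer.parseInt(e.getAttribute("end"))));
            }
            return rules;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read parsing rules", e);
        }
    }

    public static void main(String[] args) {
        for (FieldRule r : load(SAMPLE_RULES)) {
            System.out.println(r.name + " [" + r.start + ", " + r.end + ")");
        }
    }
}
```

Because the rules are plain data, the same loader could just as easily read them from a database table.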
Using the parsing rules from the preceding XML file, the application parses the input text file and constructs a flat XML file. A flat XML file is the XML-formatted equivalent of the text file; in other words, it has no hierarchical structure beyond that enforced by XML itself: a root node containing the record structure inherent in the fields of the text file. For example, here’s the same incoming patient record:
// Sample flat file Patient appointment data:
// Note: The following code would be one
// line in the incoming flat file.
P      JAVA DEVELOPER73774777719740310     20874Programming stress disorders
And here’s the flat XML equivalent:
P JAVA DEVELOPER 737747777 19740310 20874 Programming stress disorders
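As a rough sketch of this step, the code below slices one fixed-width record by its rules and wraps each value in an element named after the rule. The first four rules come from the rules listing. The remaining names (`DOB`, `DoctorID`, `ReasonForVisit`), their offsets (derived from the widths in Table 1), and the `Patient` root element are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

class FlatXmlBuilder {
    /** Field name plus start (inclusive) and end (exclusive) offsets. */
    static final class FieldRule {
        final String name;
        final int start;
        final int end;

        FieldRule(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    static final List<FieldRule> RULES = Arrays.asList(
        new FieldRule("RecordType", 0, 1),
        new FieldRule("FirstName", 1, 11),
        new FieldRule("LastName", 11, 21),
        new FieldRule("SSN", 21, 30),
        new FieldRule("DOB", 30, 38),             // assumed name
        new FieldRule("DoctorID", 38, 48),        // assumed name
        new FieldRule("ReasonForVisit", 48, 98)); // assumed name

    /** Escapes the characters that are not allowed in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a flat XML document: one root with one child element per rule. */
    static String toFlatXml(String record, List<FieldRule> rules) {
        StringBuilder xml = new StringBuilder("<Patient>");
        for (FieldRule r : rules) {
            // Clamp offsets so a record with trimmed trailing padding still parses.
            int start = Math.min(r.start, record.length());
            int end = Math.min(r.end, record.length());
            String value = record.substring(start, end).trim();
            xml.append('<').append(r.name).append('>')
               .append(escape(value))
               .append("</").append(r.name).append('>');
        }
        return xml.append("</Patient>").toString();
    }

    public static void main(String[] args) {
        String record = "P" + "      JAVA" + " DEVELOPER" + "737747777"
            + "19740310" + "     20874" + "Programming stress disorders";
        System.out.println(toFlatXml(record, RULES));
    }
}
```

Nothing in `toFlatXml` knows about patients; the record layout lives entirely in the rule list.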
Constructing the flat XML file is always the first step in this generic flat-file parsing application, because the result is well-formed XML that you can then transform into more complex and useful structures using XSLT.
Applying an XSL Transformation
Often, the flat XML file isn’t precisely suited to your application’s needs. For example, there’s little point in searching through the flat XML file to see if a patient exists when you could speed up the operation enormously by extracting only the information required to identify a patient. To create such files, you use XSLT to transform the flat XML file into more appropriate forms. The XSL transformation process consists of three steps:
- Load the XSL document.
- Load the source XML document (in this example it’s the flat XML file).
- Use an XSL processor to transform the document.
There are many different implementations of XML and XSL parsers available for Java; I used the one implemented by Oracle, but you should be able to use any implementation.
Loading an XSL Document
The following Java code parses the XSL document and creates an instance of the XSLStylesheet class.
DOMParser parser = new DOMParser();
// you can also use a standard HTTP URL instead of
// the file protocol shown below
URL xslURL = new URL("file://" + fileName);
parser.parse(xslURL);
XMLDocument xsldoc = parser.getDocument();
// instantiate a stylesheet
XSLStylesheet xsl = new XSLStylesheet(xsldoc, xslURL);
Loading an XML Document
The following code parses the XML string and constructs an XMLDocument object.
ByteArrayInputStream theStream =
    new ByteArrayInputStream(XMLStr.getBytes());
parser.parse(theStream);
XMLDocument xml = parser.getDocument();
Transforming the XML Document
After loading a stylesheet and the XML document, you apply the XSL transformation using an XSLProcessor object.
XSLProcessor processor = new XSLProcessor();
DocumentFragment result = processor.processXSL(xsl, xml);
// create an output document to hold the result
XMLDocument out = new XMLDocument();
// append the transformed fragment to the
// output document
out.appendChild(result);
// serialize the output document to a string
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
out.print(outStream);
String transformedXML = outStream.toString();
Using this generic code, you can construct any number of XML documents from a single source flat XML file just by changing the XSL files. Changing the parsing rules or the XSL file requires no changes to this Java code.
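For readers without the Oracle parser, the same three steps can be sketched with the standard JAXP classes in `javax.xml.transform`, which ship with the JDK. This is a substitute for the Oracle-specific code above, not the article's own implementation. The class and method names are illustrative, and the demo stylesheet is a plain identity transform chosen only to exercise the helper.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class JaxpTransform {
    /** Applies the stylesheet text to the XML text and returns the result as a string. */
    static String transform(String xsl, String xml) {
        try {
            // Step 1: load the XSL document.
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            // Step 2: load the source XML document.
            StreamSource source = new StreamSource(new StringReader(xml));
            // Step 3: run the transformation into an in-memory buffer.
            StringWriter out = new StringWriter();
            transformer.transform(source, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    // Identity stylesheet: copies every node and attribute unchanged.
    static final String IDENTITY_XSL =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static void main(String[] args) {
        String xml = "<Patient><SSN>737747777</SSN></Patient>";
        System.out.println(transform(IDENTITY_XSL, xml));
    }
}
```

As with the Oracle version, the Java code stays the same while the stylesheet passed to it varies.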
Constructing Useful XML Documents with XSL
The examples in the steps below show how you can use different XSL files to map data from a single flat source XML document into different output documents that serve different needs and accommodate changes.
Step 1: This example constructs a PatientKey.xml file used as a fast method for performing look-ups to see if a specified patient exists.
Flat XML File
P JAVA DEVELOPER 737747777 19740310 20874 Programming stress disorders
A PatientKey.xsl stylesheet
OUTPUT: PatientKey.xml
DEVELOPER 19740310
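The stylesheet markup is missing from the listing above, so here is one guess at what a PatientKey.xsl could look like, embedded in a small Java program that applies it with the JDK's JAXP transformer. The element names (`Patient`, `LastName`, `DOB`, `PatientKey`) are assumptions; only the selected values (last name and date of birth) come from the output shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PatientKeyDemo {
    // Hypothetical PatientKey.xsl: keeps only the last name and date of birth.
    static final String PATIENT_KEY_XSL =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/Patient\">"
        + "<PatientKey>"
        + "<LastName><xsl:value-of select=\"LastName\"/></LastName>"
        + "<DOB><xsl:value-of select=\"DOB\"/></DOB>"
        + "</PatientKey>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Flat XML for the sample record, using assumed element names.
    static final String FLAT_XML =
          "<Patient><RecordType>P</RecordType><FirstName>JAVA</FirstName>"
        + "<LastName>DEVELOPER</LastName><SSN>737747777</SSN><DOB>19740310</DOB>"
        + "<DoctorID>20874</DoctorID>"
        + "<ReasonForVisit>Programming stress disorders</ReasonForVisit></Patient>";

    /** Applies a stylesheet to an XML string and returns the serialized result. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(PATIENT_KEY_XSL, FLAT_XML));
    }
}
```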
Step 2: Construct an XML document containing patient appointments from the same flat XML file, but using a different XSL template.
A PatientAppointment.xsl stylesheet
OUTPUT: PatientAppointment.xml
737747777 20874 Programming stress disorders
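The appointment stylesheet is also missing from the listing, so the sketch below shows one possible shape for it. It selects the SSN, doctor ID, and reason for visit, matching the three values in the output above; every element name is an assumption, and the transform is again done with the JDK's JAXP classes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PatientAppointmentDemo {
    // Hypothetical PatientAppointment.xsl: keeps who was seen, by whom, and why.
    static final String APPOINTMENT_XSL =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/Patient\">"
        + "<PatientAppointment>"
        + "<SSN><xsl:value-of select=\"SSN\"/></SSN>"
        + "<DoctorID><xsl:value-of select=\"DoctorID\"/></DoctorID>"
        + "<ReasonForVisit><xsl:value-of select=\"ReasonForVisit\"/></ReasonForVisit>"
        + "</PatientAppointment>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Flat XML for the sample record, using assumed element names.
    static final String FLAT_XML =
          "<Patient><RecordType>P</RecordType><FirstName>JAVA</FirstName>"
        + "<LastName>DEVELOPER</LastName><SSN>737747777</SSN><DOB>19740310</DOB>"
        + "<DoctorID>20874</DoctorID>"
        + "<ReasonForVisit>Programming stress disorders</ReasonForVisit></Patient>";

    /** Applies a stylesheet to an XML string and returns the serialized result. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(APPOINTMENT_XSL, FLAT_XML));
    }
}
```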
In each step shown above, applying a different stylesheet results in a different output XML document. You could use the PatientKey.xml file to perform fast lookups and the PatientAppointment.xml file to find patients with similar reasons for visits, to find which doctors treated which patients, etc.
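To make the lookup idea concrete, here is a minimal sketch of how a PatientKey document could drive the "update or create" decision from the scenario. It reads the key fields with the JDK's XPath API and tracks known keys in an in-memory set. The element names and the in-memory set are assumptions; a real system would more likely query its patient store.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PatientLookup {
    // Keys of patients already seen (stand-in for a real patient store).
    private final Set<String> knownKeys = new HashSet<>();

    /** Extracts "LastName|DOB" from a PatientKey document (assumed element names). */
    static String keyOf(String patientKeyXml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            String lastName = xpath.evaluate("/PatientKey/LastName",
                new InputSource(new StringReader(patientKeyXml)));
            String dob = xpath.evaluate("/PatientKey/DOB",
                new InputSource(new StringReader(patientKeyXml)));
            return lastName + "|" + dob;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not read patient key", e);
        }
    }

    /** Returns true if the patient is new (and remembers the key), false if already known. */
    boolean registerIfNew(String patientKeyXml) {
        return knownKeys.add(keyOf(patientKeyXml));
    }

    public static void main(String[] args) {
        PatientLookup lookup = new PatientLookup();
        String key = "<PatientKey><LastName>DEVELOPER</LastName>"
            + "<DOB>19740310</DOB></PatientKey>";
        System.out.println(lookup.registerIfNew(key) ? "create new entry" : "update visits");
        System.out.println(lookup.registerIfNew(key) ? "create new entry" : "update visits");
    }
}
```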
Here’s the really important point. All this XML and XSLT processing is overkill as long as the format of the incoming text file remains the same and the meaning and combination of the patient record fields remain static. You might as well have baked the flat file parse operation into code, populated object arrays with the information, and avoided using XML altogether. The true advantages of all this XML/XSLT processing work become apparent when the incoming data or the business needs change.
Accommodating Change
Business needs and information often change suddenly and sometimes radically. Suppose the record format of the incoming flat file changes. If the parse operation were hard-coded, you’d have to change the application to accommodate the changed record structure. But using this technique, you need only adjust the parsing rules XML file. That’s a comparatively simple operation.
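A small sketch can illustrate why a format change touches only the rules. Below, the same `parse` method handles the original layout and a changed one in which the last name field is widened from 10 to 20 characters. That change is a made-up example, as are the class and method names; only the first four rules of the original layout come from the rules listing.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RuleDrivenParser {
    /** Field name plus start (inclusive) and end (exclusive) offsets. */
    static final class FieldRule {
        final String name;
        final int start;
        final int end;

        FieldRule(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    // Original layout (first four fields of the article's rules).
    static final List<FieldRule> V1 = Arrays.asList(
        new FieldRule("RecordType", 0, 1),
        new FieldRule("FirstName", 1, 11),
        new FieldRule("LastName", 11, 21),
        new FieldRule("SSN", 21, 30));

    // Hypothetical new layout: LastName widened to 20 characters, SSN shifted.
    static final List<FieldRule> V2 = Arrays.asList(
        new FieldRule("RecordType", 0, 1),
        new FieldRule("FirstName", 1, 11),
        new FieldRule("LastName", 11, 31),
        new FieldRule("SSN", 31, 40));

    /** Slices a record into trimmed field values, in rule order. */
    static Map<String, String> parse(String record, List<FieldRule> rules) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (FieldRule r : rules) {
            int start = Math.min(r.start, record.length());
            int end = Math.min(r.end, record.length());
            fields.put(r.name, record.substring(start, end).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String oldRecord = "P" + "      JAVA" + " DEVELOPER" + "737747777";
        String newRecord = "P" + "      JAVA" + "           DEVELOPER" + "737747777";
        System.out.println(parse(oldRecord, V1));
        System.out.println(parse(newRecord, V2));
    }
}
```

Both calls produce the same field values, so everything downstream of the parser is unaffected by the layout change.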
A more complex type of change occurs when business needs alter the way you use the information. For example, suppose the original company merges with another company, and as a result, the PatientKey class must be altered to use the patient’s Social Security Number (SSN) rather than the patient’s last name and date of birth. Using the techniques you’ve seen, making that change is relatively simple; you just change the way you construct the PatientKey.xml file by replacing the XSL file with a new one.
// New PatientKey.xsl file
// Different PatientKey.xml file result
737747777
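Since the new stylesheet's markup is not shown, the following sketch suggests what an SSN-based PatientKey.xsl might contain and applies it with the JDK's JAXP transformer. The element names are assumptions. The point to notice is that the `transform` method is unchanged from a last-name-and-DOB version; only the stylesheet text differs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SsnKeyDemo {
    // Hypothetical replacement PatientKey.xsl: the key is now the SSN alone.
    static final String SSN_KEY_XSL =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/Patient\">"
        + "<PatientKey>"
        + "<SSN><xsl:value-of select=\"SSN\"/></SSN>"
        + "</PatientKey>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Flat XML for the sample record, using assumed element names.
    static final String FLAT_XML =
          "<Patient><RecordType>P</RecordType><FirstName>JAVA</FirstName>"
        + "<LastName>DEVELOPER</LastName><SSN>737747777</SSN><DOB>19740310</DOB>"
        + "<DoctorID>20874</DoctorID>"
        + "<ReasonForVisit>Programming stress disorders</ReasonForVisit></Patient>";

    /** Applies a stylesheet to an XML string and returns the serialized result. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SSN_KEY_XSL, FLAT_XML));
    }
}
```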
Using this design can make you a hero on the development team because the resulting applications can handle changes that would otherwise require major code changes.