Change-proof Your Flat-file Processing with XML : Page 2

If you build applications that use text files as input, try this technique for processing them into XML—and simultaneously insulate your applications from changes to the content and structure of the incoming files.





Automating XML Construction
The sample code for this article stores the parsing rules in an XML file. The application loads that file and extracts the rules with JAXB. If XML and JAXB seem like overkill for your application, you can store the rules in a database table instead.

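The parsing rules tell the parsing application the name and position of each field in the incoming fixed-width text file. Each rule gives a tag name, a start index, and an end index. The sample values follow Java's substring convention: the start index is inclusive and the end index is exclusive, so the SSN field at 21 to 30 is nine characters wide.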

<?xml version="1.0" encoding="UTF-8"?>
<ParsingRules>
  <TagName>RecordType</TagName>
  <startIndex>0</startIndex>
  <endIndex>1</endIndex>
  <TagName>FirstName</TagName>
  <startIndex>1</startIndex>
  <endIndex>11</endIndex>
  <TagName>LastName</TagName>
  <startIndex>11</startIndex>
  <endIndex>21</endIndex>
  <TagName>SSN</TagName>
  <startIndex>21</startIndex>
  <endIndex>30</endIndex>
  ...
  ...
  ...
</ParsingRules>
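To make the rule format concrete, here is a minimal sketch of how a name/start/end rule could be applied to one fixed-width line. The FixedWidthSplitter and FieldRule classes are hypothetical names chosen for illustration, not part of the article's sample code, and the sketch assumes the end index is exclusive, as in String.substring.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not taken from the article's sample code.
final class FixedWidthSplitter {

    // One parsing rule: a tag name plus the field's position in the line.
    static final class FieldRule {
        final String tagName;
        final int startIndex; // inclusive
        final int endIndex;   // exclusive (assumption based on the sample values)

        FieldRule(String tagName, int startIndex, int endIndex) {
            this.tagName = tagName;
            this.startIndex = startIndex;
            this.endIndex = endIndex;
        }
    }

    // Cuts each field out of the line and trims its padding.
    // Fields come back in rule order because LinkedHashMap keeps insertion order.
    static Map<String, String> split(String line, List<FieldRule> rules) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (FieldRule rule : rules) {
            // Clamp to the line length so a short line yields empty fields
            // instead of throwing StringIndexOutOfBoundsException.
            int start = Math.min(rule.startIndex, line.length());
            int end = Math.min(rule.endIndex, line.length());
            fields.put(rule.tagName, line.substring(start, end).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        List<FieldRule> rules = Arrays.asList(
            new FieldRule("RecordType", 0, 1),
            new FieldRule("FirstName", 1, 11),
            new FieldRule("LastName", 11, 21),
            new FieldRule("SSN", 21, 30));
        // Build a padded line whose column widths match the rules above.
        String line = String.format("%-1s%-10s%-10s%-9s",
            "P", "JAVA", "DEVELOPER", "737747777");
        System.out.println(split(line, rules));
    }
}
```

Because the field positions live in data rather than in code, a change to the file layout only requires editing the rules, which is the point of the technique.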

Using the parsing rules from the preceding XML file, the application parses the input text file, and constructs a flat XML file. A flat XML file is the XML-formatted equivalent of the text file; in other words, it has no hierarchical structure beyond that enforced by XML itself—a root node containing the record structure inherent in the fields of the text file. For example, here's the same incoming patient record:

// Sample flat file patient appointment data:
// Note: The following record is a single
// line in the incoming flat file.
P JAVA DEVELOPER73774777719740310 20874Programming stress disorders

And here's the flat XML equivalent:

<PatientRecord>
  <RecordType>P</RecordType>
  <FirstName>JAVA</FirstName>
  <LastName>DEVELOPER</LastName>
  <SSN>737747777</SSN>
  <DOB>19740310</DOB>
  <DoctorID>20874</DoctorID>
  <VisitReason>Programming stress disorders</VisitReason>
</PatientRecord>

Constructing the flat XML file is always the first step in this generic flat-file parsing application, because the result is well-formed XML that you can then transform into more complex and useful structures using XSLT.
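As a rough illustration of this first step, the sketch below turns a set of already-extracted field values into a flat XML string using the DOM and serializer classes that ship with the JDK. It is not the article's sample code: the FlatXmlBuilder class and its toFlatXml method are hypothetical, and the sketch assumes every field name is a valid XML element name.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not taken from the article's sample code.
final class FlatXmlBuilder {

    // Builds <recordTag><Field>value</Field>...</recordTag> from the field map.
    // Assumes each map key is a valid XML element name.
    static String toFlatXml(String recordTag, Map<String, String> fields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element record = doc.createElement(recordTag);
            doc.appendChild(record);
            for (Map.Entry<String, String> field : fields.entrySet()) {
                Element child = doc.createElement(field.getKey());
                // Text is stored raw; the serializer escapes & and < on output.
                child.setTextContent(field.getValue());
                record.appendChild(child);
            }
            // A Transformer created without a stylesheet copies the DOM
            // unchanged, which makes it usable as a serializer.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build flat XML", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("RecordType", "P");
        fields.put("FirstName", "JAVA");
        fields.put("LastName", "DEVELOPER");
        fields.put("SSN", "737747777");
        System.out.println(toFlatXml("PatientRecord", fields));
    }
}
```

Building the document through DOM rather than string concatenation means reserved characters in the input text cannot break the well-formedness that the later XSLT step depends on.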

Applying an XSL Transformation
Often, the flat XML file isn't an exact fit for your application's needs. For example, there's little point in searching through the entire flat XML file to see whether a patient exists when you could speed up the lookup considerably by extracting only the information required to identify a patient. To create such files, you use XSLT to transform the flat XML file into more appropriate forms. The XSL transformation process consists of three steps:

  1. Load the XSL document.
  2. Load the source XML document (in this example, the flat XML file).
  3. Use an XSL processor to transform the document.

Many XML parser and XSLT processor implementations are available for Java. I used Oracle's, but any implementation should work. Note that the class names used in the following sections (DOMParser, XMLDocument, XSLStylesheet, and XSLProcessor) are specific to Oracle's XML parser.
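For readers who are not using Oracle's parser, the three steps above can be sketched with the standard JAXP API (javax.xml.transform) that ships with the JDK. This is a substitute for the Oracle-specific classes, not a copy of the article's code. The XsltThreeSteps class, the PATIENT_KEY_XSL stylesheet, and the PatientKey element are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical JAXP-based sketch of the three transformation steps.
final class XsltThreeSteps {

    // Illustrative stylesheet: keeps only the fields that identify a patient.
    static final String PATIENT_KEY_XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/PatientRecord\">"
        + "<PatientKey>"
        + "<SSN><xsl:value-of select=\"SSN\"/></SSN>"
        + "<LastName><xsl:value-of select=\"LastName\"/></LastName>"
        + "</PatientKey>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            // Step 1: load the XSL document.
            StreamSource xslSource = new StreamSource(new StringReader(xsl));
            // Step 2: load the source XML document (the flat XML).
            StreamSource xmlSource = new StreamSource(new StringReader(xml));
            // Step 3: use an XSL processor to transform the document.
            Transformer transformer =
                TransformerFactory.newInstance().newTransformer(xslSource);
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(xmlSource, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String flatXml =
            "<PatientRecord>"
            + "<RecordType>P</RecordType>"
            + "<FirstName>JAVA</FirstName>"
            + "<LastName>DEVELOPER</LastName>"
            + "<SSN>737747777</SSN>"
            + "</PatientRecord>";
        System.out.println(transform(PATIENT_KEY_XSL, flatXml));
    }
}
```

The stylesheet here drops every field except the ones needed to look up a patient, which is the kind of reduced document the preceding paragraph describes.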

Loading an XSL Document
The following Java code parses the XSL document and creates an instance of the XSLStylesheet class.

import java.net.URL;
// Oracle XML parser classes
import oracle.xml.parser.v2.DOMParser;
import oracle.xml.parser.v2.XMLDocument;
import oracle.xml.parser.v2.XSLStylesheet;

DOMParser parser = new DOMParser();
// you can also use a standard HTTP URL instead of
// the file protocol shown below
URL xslURL = new URL("file://" + fileName);
parser.parse(xslURL);
XMLDocument xsldoc = parser.getDocument();
// instantiate a stylesheet
XSLStylesheet xsl = new XSLStylesheet(xsldoc, xslURL);

Loading an XML Document
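The following code reuses the same parser to parse the XML string (here, the flat XML held in XMLStr) and construct an XMLDocument object.

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// encode explicitly instead of relying on the platform default charset
ByteArrayInputStream theStream =
    new ByteArrayInputStream(XMLStr.getBytes(StandardCharsets.UTF_8));
parser.parse(theStream);
XMLDocument xml = parser.getDocument();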

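Transforming the XML Document
After loading the stylesheet and the XML document, you apply the XSL transformation using an XSLProcessor object. The result is a DocumentFragment, which the code attaches to a new output document and then serializes to a string.

import java.io.ByteArrayOutputStream;
import org.w3c.dom.DocumentFragment;
import oracle.xml.parser.v2.XSLProcessor;

XSLProcessor processor = new XSLProcessor();
DocumentFragment result = processor.processXSL(xsl, xml);
// create an output document to hold the result
XMLDocument out = new XMLDocument();
// attach the transformed fragment to the output document;
// this works only if the fragment has a single top-level element
out.appendChild(result);
// serialize the output document to a string
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
out.print(outStream);
String transformedXML = outStream.toString();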

Using this generic code, you can construct any number of XML documents from a single source flat XML file just by changing the XSL files. Changing the parsing rules or the XSL file requires no changes to this Java code.
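To illustrate that last point, here is a hedged sketch that produces two different documents from one flat XML source purely by swapping stylesheets. It uses the standard JAXP Templates API rather than the Oracle classes shown earlier, and the MultiViewTransformer class, its stylesheet helper, and the Patient and Visit output elements are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: one flat XML source, many stylesheet-driven views.
final class MultiViewTransformer {

    // Wraps a template body in a minimal stylesheet that matches PatientRecord.
    static String stylesheet(String templateBody) {
        return "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"/PatientRecord\">" + templateBody
            + "</xsl:template></xsl:stylesheet>";
    }

    // Applies every named stylesheet to the same flat XML and returns the
    // results keyed by stylesheet name. The Java code never changes; only
    // the stylesheets do.
    static Map<String, String> transformAll(String flatXml,
                                            Map<String, String> stylesheets) {
        TransformerFactory factory = TransformerFactory.newInstance();
        Map<String, String> results = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, String> sheet : stylesheets.entrySet()) {
                // Templates is a compiled stylesheet that can be reused.
                Templates compiled = factory.newTemplates(
                    new StreamSource(new StringReader(sheet.getValue())));
                Transformer transformer = compiled.newTransformer();
                transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                transformer.transform(
                    new StreamSource(new StringReader(flatXml)), new StreamResult(out));
                results.put(sheet.getKey(), out.toString());
            }
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
        return results;
    }

    public static void main(String[] args) {
        String flatXml = "<PatientRecord><SSN>737747777</SSN>"
            + "<DoctorID>20874</DoctorID>"
            + "<VisitReason>Programming stress disorders</VisitReason>"
            + "</PatientRecord>";
        Map<String, String> sheets = new LinkedHashMap<>();
        sheets.put("identity", stylesheet(
            "<Patient><SSN><xsl:value-of select=\"SSN\"/></SSN></Patient>"));
        sheets.put("visit", stylesheet(
            "<Visit doctor=\"{DoctorID}\"><xsl:value-of select=\"VisitReason\"/></Visit>"));
        transformAll(flatXml, sheets)
            .forEach((name, xml) -> System.out.println(name + ": " + xml));
    }
}
```

In a real application the stylesheets would more likely be loaded from files, so adding a new output format would mean adding a file rather than editing a string.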
