Converting Fixed-Width Text Records to XML



We’re not so far removed from the databases of old. Most databases still apply a specified length to field entries, although the exact relationship between that length and the way the database stores the information is considerably more complex than it was when databases were essentially single long strings of fixed length. The interfaces for accessing this information have changed as well, so you’re probably only peripherally aware of the length relationship. Still, when working with legacy databases you may be handed data as a file of fixed-width records. Each record contains multiple defined fields, and each field is a smaller string of known size. Usually, a carriage return separates each “record” from the next. One of the benefits of XML is that you can richly format it through XSLT, but you have to get the information into XML format in the first place; otherwise, XSLT doesn’t do you a lot of good.



Fortunately, converting fixed-length text files into XML is not a terribly difficult undertaking, though you need to watch for a few “gotchas.” After some simple preprocessing to wrap the data in markup and save it as a well-formed XML document, you can use XSLT to handle most of the real work.

Convert Text File to XML
The aspect of text-to-XML conversion you need to watch most carefully, especially when using DOM processing, is that the number of records can grow large quickly. If the files are comparatively small (up to about 5000 records), you can use recursive techniques to parse the lines. Problems appear with large numbers of records, because most recursive routines will eventually “blow the stack” by exceeding the maximum recursion depth the processor can handle. For that reason, it’s preferable (and in many respects both easier and faster) to preprocess the source files so that each line becomes an element. After that, you can use standard node-set iteration in XSLT to walk through the lines and generate the individual fields.
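The stack-depth problem is easy to see in any language. The following JavaScript sketch (illustrative only, not part of the article's code; both function names are hypothetical) splits text into lines two ways: recursively, consuming one line per call, and with a loop. Both return the same lines, but the recursive version's call depth grows with the record count, while the loop's does not.

```javascript
// Recursive version: peels off one line per call, so the call depth equals
// the number of records. With enough records this exceeds the engine's
// maximum call-stack size ("blowing the stack").
function splitLinesRecursive(text, acc = []) {
  const nl = text.indexOf("\n");
  if (nl === -1) {
    if (text.length > 0) acc.push(text); // last line, no trailing newline
    return acc;
  }
  acc.push(text.slice(0, nl));
  return splitLinesRecursive(text.slice(nl + 1), acc);
}

// Iterative version: same result, but the stack depth stays constant no
// matter how many records the file contains.
function splitLinesIterative(text) {
  const lines = [];
  let start = 0;
  while (start < text.length) {
    let nl = text.indexOf("\n", start);
    if (nl === -1) nl = text.length; // last line, no trailing newline
    lines.push(text.slice(start, nl));
    start = nl + 1;
  }
  return lines;
}
```

In Node.js, calling splitLinesRecursive on input on the order of a hundred thousand lines will typically throw a RangeError (maximum call stack size exceeded); the exact limit depends on the engine and its stack settings.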

For example, a set of fixed-length records might originally be contained in a text file as shown below. Each field is a fixed-length substring that always appears at the same position in the line (unlike a comma- or tab-delimited file, where fields can vary in length). Note that for this to work properly, there should be no carriage return after the last line. Different fields can have different widths, but a given field occupies the same number of characters in every record.

Fixed Field Length Text

31A201Kurt        Cagle       3242.27  Basic
31A202Aleria      Delamare    6250.54  Advanced
31A203Gina        Delgadio    317.12   Advanced
31A204Sera        Anadropolis 4392.15  Basic
31A205Gregor      Hauptmann   1224.88  Special
31A206Alexis      Porter      92.15    Basic
31A207James       Cabal       2215.25  Basic
31A208Micheal     Denning     925.66   Advanced
31A209Amaya       Kiasabe     866.54   Special
31A210Nathan      Lane        936.12   Advanced
... Additional Values ...
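Because every field starts at a fixed offset, extracting a field is just a substring operation. The JavaScript sketch below assumes the layout visible in the sample records (widths of 6, 12, 12, 9 and 11 characters, counted from the sample); the field names are hypothetical, since the article defines its own field names separately.

```javascript
// Field layout inferred from the sample records. The widths are counted
// from the sample; the names are hypothetical placeholders.
const FIELDS = [
  { name: "id", width: 6 },
  { name: "firstName", width: 12 },
  { name: "lastName", width: 12 },
  { name: "balance", width: 9 },
  { name: "accountType", width: 11 },
];

// Each field's start offset is the sum of the widths of the fields before it.
function parseFixedWidth(line, fields) {
  const record = {};
  let offset = 0;
  for (const { name, width } of fields) {
    // substring() clamps at the end of the string, so a line whose trailing
    // padding was stripped still parses; trimEnd() removes the padding.
    record[name] = line.substring(offset, offset + width).trimEnd();
    offset += width;
  }
  return record;
}
```

For example, parseFixedWidth applied to the 31A204 line returns an object whose lastName is "Anadropolis" and whose balance is the string "4392.15".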

To perform the initial processing, I wrote a simple ASP JavaScript program (see Listing 1) that loads the source text document and creates a second document (with an XML extension but written as plain text). Although the sample code is in JavaScript, you could easily port it to Java or another language. The program iterates through each line of the first document, wraps a set of tags around the line, writes the wrapped line to the target file, and then moves on to the next line. I chose this approach over building the whole expression as a string in memory because writing line by line to a file places no practical limit on the size of the text file you’re reading, which is always an important issue to consider.

At the end of the processing, the text file has been converted to an XML document in this form:

        31A201  Kurt      Cagle        3242.27  Basic
        31A202  Aleria    Delamare     6250.54  Advanced
        31A203  Gina      Delgadio     317.12   Advanced
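For readers who want to experiment outside ASP, here is a minimal JavaScript sketch of the same wrapping step. It is not the article's Listing 1: the element names records and record are hypothetical, and the sketch escapes &, < and > so that such characters in the data don't break well-formedness. Output goes through a write callback so that each wrapped line can be sent straight to a file rather than accumulated in memory.

```javascript
// Escape the three characters that would otherwise break XML text content.
// & is replaced first so the entities added afterwards are not re-escaped.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap each source line in a <record> element (hypothetical element names)
// and hand every piece to `write` as soon as it is produced.
function wrapLines(lines, write) {
  write("<records>\n");
  for (const rawLine of lines) {
    const line = rawLine.replace(/\r$/, ""); // tolerate CRLF line endings
    if (line.length === 0) continue; // ignore a trailing blank line
    write("  <record>" + escapeXml(line) + "</record>\n");
  }
  write("</records>\n");
}
```

Skipping empty lines sidesteps the trailing-carriage-return gotcha mentioned earlier. For a quick test, `wrapLines(text.split("\n"), chunk => output.push(chunk))` works, although splitting a string means the whole file is already in memory; a fully streaming version would read lines asynchronously (for example, with Node's readline module and `for await`) and write each chunk to a file stream.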



Transform Field Values with XSLT
Now you can take advantage of the strengths of XSLT to process the fields quickly. For fixed-length data, you need to know three distinct pieces of meta-information:

  • the name of the field
  • the field’s associated data type
  • the number of characters allocated to displaying the field data

It is generally preferable to create a separate XML document that contains the relevant field information rather than attempting to store meta-information with the data. The fields.xml file contains meta-content about the text database.

                                 

The type information comes from the XML Schema namespace. Note that while type information is not, strictly speaking, necessary (especially if a schema already exists for the records to be created), it can be useful for processing field information down the road.
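To show how the three pieces of meta-information work together, the sketch below stands in for fields.xml with a JavaScript array and uses each field's type to convert its extracted value. The field names and type assignments are hypothetical; only the widths come from the sample data, and only xs:string and xs:decimal are handled.

```javascript
// Stand-in for the kind of metadata fields.xml holds: name, XML Schema type,
// and width. Names and types are hypothetical; widths match the sample.
const FIELD_META = [
  { name: "id", type: "xs:string", width: 6 },
  { name: "firstName", type: "xs:string", width: 12 },
  { name: "lastName", type: "xs:string", width: 12 },
  { name: "balance", type: "xs:decimal", width: 9 },
  { name: "accountType", type: "xs:string", width: 11 },
];

// Convert a trimmed field value according to its declared schema type.
function convertValue(raw, type) {
  switch (type) {
    case "xs:string":
      return raw;
    case "xs:decimal":
      // xs:decimal lexical form: optional sign, digits, optional fraction.
      if (!/^[+-]?(\d+(\.\d*)?|\.\d+)$/.test(raw)) {
        throw new Error(`"${raw}" is not a valid xs:decimal`);
      }
      return Number(raw);
    default:
      throw new Error(`unsupported type ${type}`);
  }
}

// Slice each field by width, then convert it using its declared type.
function parseTyped(line, meta) {
  const record = {};
  let offset = 0;
  for (const { name, type, width } of meta) {
    record[name] = convertValue(line.substring(offset, offset + width).trim(), type);
    offset += width;
  }
  return record;
}
```

Number() yields a binary floating-point value, which is fine for display or sorting; code that must do exact currency arithmetic might keep the validated string instead.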

With this information, you can parse each line in the initial records and correctly extract the relevant field data. Iterating through each record in the recordset is simple and occurs often enough that it’s worth building a general named template to perform the task.

                                                                                 

The named template getRecordsFromNodeSet takes a recordset as an argument and calls the getFields template over each record.

The getFields named template is a little more complicated, primarily because it involves a certain amount of recursion (see Listing 2). Essentially, the named template works by retaining an index for each field. The template uses the document() function to load the XML file containing the field meta-information from the URL in the $fieldSource parameter, and the individual field entries are loaded into a node-set. This node-set can in turn act like an array, retrieving the field element at a given index.

The recursive calls increment the field pointer, which is re-initialized for each record, and the pointer is compared against the number of fields in the meta-information file to decide when to stop. Because the recursion here is very shallow (from a handful to a few dozen calls at most), it is in fact preferable, and probably faster, to use XSLT recursion than to process the fields via the DOM.
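The division of labor between the two named templates can be mirrored in JavaScript as follows. This is an analogy, not a translation of Listing 2: the field names are hypothetical, and the sketch carries the running character offset as a parameter alongside the field index (the XSLT may compute positions differently). Note where the recursion sits: over fields, never over records.

```javascript
// Hypothetical field list standing in for the entries that getFields loads
// from fields.xml with document(). Widths match the sample data.
const FIELDS = [
  { name: "id", width: 6 },
  { name: "firstName", width: 12 },
  { name: "lastName", width: 12 },
  { name: "balance", width: 9 },
  { name: "accountType", width: 11 },
];

// Escape characters that are not allowed as literal XML text.
function escapeText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Analog of getFields: emit the element for fields[index], then recurse with
// index + 1. Call depth equals the number of fields, not the number of records.
function getFields(line, fields, index = 0, offset = 0) {
  if (index >= fields.length) return ""; // past the last field: stop
  const { name, width } = fields[index];
  const value = escapeText(line.substring(offset, offset + width).trim());
  return `<${name}>${value}</${name}>` +
    getFields(line, fields, index + 1, offset + width);
}

// Analog of getRecordsFromNodeSet: an ordinary iteration over the records.
// Each getFields call starts at index 0, re-initializing the field pointer.
function getRecords(lines, fields) {
  return lines.map((line) => `<record>${getFields(line, fields)}</record>`);
}
```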

Display the Resulting XML
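The ASP file ProcessTextFile.asp generates the XML recordset, saves it, and then passes the XML file to another transformation (ProcessRecordset.xsl). The saved recordset holds one record per source line, with each field extracted into its own element: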

         31A201  Kurt  Cagle  3242.27  Basic
         31A202  Aleria  Delamare  6250.54  Advanced
         31A203  Gina  Delgadio  317.12  Advanced

The processTextFile.asp page accepts an optional showType query string parameter, which it passes to FixedLengthRoutines.xsl to determine whether to include the XML Schema data type information in the final output. For example, the URL processTextFile.asp?showType=yes tells the application to include the type definitions, while setting showType to “no” removes them.
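The showType switch amounts to a small conditional in the output step. In this JavaScript sketch, the query-string handling uses the standard URLSearchParams class; the attribute name type and the rule that only the exact value yes turns the feature on are assumptions, since the article's output format isn't shown.

```javascript
// True only when the query string contains showType=yes. URLSearchParams
// ignores a leading "?", so both "?showType=yes" and "showType=yes" work.
function wantsTypes(queryString) {
  return new URLSearchParams(queryString).get("showType") === "yes";
}

// Build one field element, adding the schema type as an attribute only when
// requested. The attribute name "type" is a hypothetical choice.
function fieldElement(name, type, value, showType) {
  const typeAttr = showType ? ` type="${type}"` : "";
  return `<${name}${typeAttr}>${value}</${name}>`;
}
```

For instance, `fieldElement("balance", "xs:decimal", "3242.27", wantsTypes("?showType=yes"))` produces `<balance type="xs:decimal">3242.27</balance>`, and the same call with showType=no drops the attribute.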

After you have the information in XML form, you can essentially do anything that you can do with any other XML file. In the sample code, the processTextFile.asp passes the XML to another stylesheet, processRecordset.xsl (see Listing 3), which formats the output in a table and provides a rudimentary way of filtering the output list.

The processRecordset.xsl file displays the records as a table. You can use the $matchTerm parameter as a rudimentary filter on the list. You can also subclass (override) the templates to provide additional functionality, such as currency formatting or color-coding for different types of accounts. To avoid conflicts with other templates, I defined all of these templates in the “format” mode; to override them with your own templates, use the same mode in your code. You can view the output of the ASP page in Figure 1.
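The table-plus-filter behavior of processRecordset.xsl can be approximated in JavaScript as below. The matching rule (a case-insensitive substring test against every field) is an assumption about what a rudimentary $matchTerm filter might do, and the records are plain objects rather than XML nodes.

```javascript
// Escape text before placing it in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Assumed $matchTerm semantics: keep a record when any field contains the
// term, ignoring case. An empty term keeps every record.
function matchesTerm(record, matchTerm) {
  if (!matchTerm) return true;
  const term = matchTerm.toLowerCase();
  return Object.values(record).some((v) => String(v).toLowerCase().includes(term));
}

// Render the filtered records as an HTML table, one column per field,
// with headers taken from the first matching record's keys.
function renderTable(records, matchTerm = "") {
  const rows = records.filter((r) => matchesTerm(r, matchTerm));
  if (rows.length === 0) return "<table></table>";
  const cols = Object.keys(rows[0]);
  const head = "<tr>" + cols.map((c) => `<th>${escapeHtml(c)}</th>`).join("") + "</tr>";
  const body = rows
    .map((r) => "<tr>" + cols.map((c) => `<td>${escapeHtml(r[c])}</td>`).join("") + "</tr>")
    .join("");
  return `<table>${head}${body}</table>`;
}
```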

The successful use of XSLT sometimes comes down to knowing when not to use it. While it is possible to perform the same transformation entirely within XSLT (especially with extensions), there is still a basic need for DOM processing to work around limitations in the XSLT model. Regardless of the processing mechanism, however, converting fixed-length records into even simple XML can bootstrap your development dramatically and give you the robust processing capabilities that XSLT brings.
