

Converting Fixed-Width Text Records to XML






We're not so far removed from the databases of old. Most databases still apply a specified length to field entries, although the relationship between that length and the way the database stores the information is considerably more complex than it was when databases were essentially single long strings of fixed length. Moreover, the interfaces for accessing this information have changed as well, so you're probably only peripherally aware of the length relationship. Still, with legacy databases you may run into situations where you're given the data as a file of fixed-width records. Each record contains multiple defined fields, and each field is a smaller string of known size. Usually, a carriage return separates each "record" from the next. One of the benefits of XML is its ability to richly format data through XSLT, but you have to get the information into XML format in the first place; otherwise, XSLT doesn't do you a lot of good.

Fortunately, converting fixed-field length text files into XML is not a terribly difficult undertaking, though you need to be careful about a few "gotchas". After some simple preliminary processing to wrap the data in markup and save it as a well-formed XML document, you can use XSLT to handle most of the real work.

Convert Text File to XML
The one aspect of conversion between text files and XML that you need to watch most carefully, especially when using DOM processing, is that the number of records involved can grow large quickly. If the files are comparatively small (up to about 5000 records), then you can use recursion techniques to parse lines. The problems appear when you have a large number of records, because most recursive routines will likely end up "blowing the stack", that is, exceeding the maximum recursion depth that the processor can handle. For that reason, it's preferable (and in many respects both easier and faster) to preprocess the source files so that each line becomes an element. After that, you can use standard node-set iterations to walk through each line in the XSLT and generate the individual fields.

For example, a set of fixed-length records might originally be contained in a text file as shown below. Each field is a fixed-length substring that is always found at the same position in the line (unlike a comma- or tab-delimited file, where the fields may be of variable length), so a given field has the same length in every record. Note that in order to make this work properly, there should be no carriage return after the last line.

Fixed Field Length Text

31A201Kurt Cagle 3242.27 Basic
31A202Aleria Delamare 6250.54 Advanced
31A203Gina Delgadio 317.12 Advanced
31A204Sera Anadropolis 4392.15 Basic
31A205Gregor Hauptmann 1224.88 Special
31A206Alexis Porter 92.15 Basic
31A207James Cabal 2215.25 Basic
31A208Micheal Denning 925.66 Advanced
31A209Amaya Kiasabe 866.54 Special
31A210Nathan Lane 936.12 Advanced
... Additional Values ...
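To make the idea of fixed positions concrete, here is a minimal JavaScript sketch that slices one record into named fields by offset. The column widths and the field names (`id`, `name`, `balance`, `plan`) are assumptions made up for illustration; the real layout has to come from the specification of the legacy file.

```javascript
// Hypothetical layout: these offsets, lengths, and field names are
// assumptions for illustration, not the actual layout of the sample data.
const LAYOUT = [
  { name: "id", start: 0, length: 6 },
  { name: "name", start: 6, length: 20 },
  { name: "balance", start: 26, length: 10 },
  { name: "plan", start: 36, length: 10 },
];

// Cut each field out of the line at its fixed position and strip the padding.
function parseRecord(line, layout = LAYOUT) {
  const record = {};
  for (const { name, start, length } of layout) {
    record[name] = line.slice(start, start + length).trim();
  }
  return record;
}

// Build a padded sample line that matches the hypothetical layout above.
const sampleLine =
  "31A201" +
  "Kurt Cagle".padEnd(20) +
  "3242.27".padEnd(10) +
  "Basic".padEnd(10);

console.log(parseRecord(sampleLine));
```

Because every field sits at a known offset, no delimiter scanning is needed; the only per-field work is trimming the padding.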

To perform the initial processing, I wrote a simple ASP JavaScript program (see Listing 1) that loads the source text document and creates a second document (with an XML extension but treated as text). Although the sample code is in JavaScript, you could easily port it to Java or another language. The program iterates through each line of the first document, wraps a set of tags around each line, writes the wrapped line to the target text file, and then moves on to the next line. I chose to do this rather than build the whole expression as a string in memory because writing to a file places no practical limit on the size of the text file you're reading, which is always an important issue to consider.
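The line-wrapping step can be sketched roughly as follows. This is not the ASP listing referred to above but a minimal Node.js illustration of the same approach, reading and writing one line at a time. The function names, the `records` root element name, the escaping of markup characters, and the skipping of blank lines are all assumptions added for the sketch.

```javascript
const fs = require("node:fs");
const readline = require("node:readline");
const { once } = require("node:events");

// Escape characters that would otherwise break XML well-formedness.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap one source line in a <record> element.
function wrapLine(line) {
  return `<record>${escapeXml(line)}</record>`;
}

// Stream the source file line by line into the target file.
// rootName is a hypothetical wrapper element so the result has a single root.
async function convertFile(sourcePath, targetPath, rootName = "records") {
  const lines = readline.createInterface({
    input: fs.createReadStream(sourcePath, { encoding: "utf8" }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  const output = fs.createWriteStream(targetPath, { encoding: "utf8" });

  output.write(`<${rootName}>\n`);
  for await (const line of lines) {
    if (line.length === 0) continue; // ignore blank lines, e.g. a stray final one
    // Wait for the stream to drain if its buffer is full, so memory use stays flat.
    if (!output.write(wrapLine(line) + "\n")) {
      await once(output, "drain");
    }
  }
  output.write(`</${rootName}>\n`);

  // Resolve once everything has been flushed to disk.
  await new Promise((resolve, reject) => {
    output.on("error", reject);
    output.end(resolve);
  });
}
```

A call such as `convertFile("records.txt", "records.xml")` (both file names hypothetical) would then produce one `<record>` element per source line. Because nothing but the current line is held in memory, the size of the source file does not matter.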

At the end of the processing, the text file has been converted to an XML document in this form:

<record> 31A201 Kurt Cagle 3242.27 Basic </record>
<record> 31A202 Aleria Delamare 6250.54 Advanced </record>
<record> 31A203 Gina Delgadio 317.12 Advanced </record>
