

Converting Fixed-Width Text Records to XML: Page 2





Transform Field Values with XSLT
With the records loaded, you can use XSLT to process the individual fields. For fixed-length data, you need three pieces of meta-information about each field:

  • the name of the field
  • the field's associated data type
  • the number of characters allocated to displaying the field data
It is generally preferable to create a separate XML document that contains the relevant field information rather than attempting to store meta-information with the data. The fields.xml file contains meta-content about the text database.

<fields>
  <field id="id" type="xsd:id" length="6" />
  <field id="firstname" type="xsd:string" length="18" />
  <field id="lastname" type="xsd:string" length="24" />
  <field id="amount" type="xsd:currency" length="9" />
  <field id="type" type="xsd:string" length="8" />
</fields>

The type names use the xsd: prefix of the XML Schema namespace, although xsd:id and xsd:currency are not built-in schema types (the nearest built-ins are xsd:ID and xsd:decimal). Type information is not strictly necessary, especially if a schema already exists for the records to be created, but it can be useful for processing field information down the road.
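To make the role of the length attribute concrete, here is a minimal sketch (not part of the original listings) that derives each field's starting column from fields.xml. It assumes the fields are stored in record order with no separators between them, so a field starts one character past the sum of the lengths before it. XPath's substring() counts from 1, so the first field starts at column 1.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Run against fields.xml: prints "name start length" for each field. -->
  <xsl:template match="/fields">
    <xsl:for-each select="field">
      <xsl:value-of select="@id" />
      <xsl:text> </xsl:text>
      <!-- Start column = total width of all earlier fields, plus 1.
           sum() of an empty node-set is 0, so the first field starts at 1. -->
      <xsl:value-of select="sum(preceding-sibling::field/@length) + 1" />
      <xsl:text> </xsl:text>
      <xsl:value-of select="@length" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the fields.xml shown above, this prints id 1 6, firstname 7 18, lastname 25 24, amount 49 9, and type 58 8, one per line, which puts the full record width at 65 characters.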

With this information, it's possible to parse each line in the initial records and correctly extract the relevant field data. Iterating through each record in the recordset is simple, and occurs often enough that it's worth building a general named template to perform the task:

<xsl:template name="getRecordsFromNodeSet">
  <xsl:param name="records" />
  <records>
    <xsl:for-each select="$records">
      <record>
        <xsl:call-template name="getFields">
          <xsl:with-param name="sourceLine" select="string(.)" />
        </xsl:call-template>
      </record>
    </xsl:for-each>
  </records>
</xsl:template>

The named template getRecordsFromNodeSet takes a recordset as an argument and calls the getFields template over each record.

The getFields named template is a little more complicated, primarily because it is recursive (see Listing 2). Essentially, the template works by keeping an index into the list of fields. It uses the document() function to load the XML file containing the field meta-information from the URL in the parameter $fieldSource, and selects the individual field entries into a node-set. This node-set can then act like an array, returning the field element at a given index.
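The recursion described here can be sketched as follows. This is an illustration written for this page, not a reproduction of Listing 2: the parameter names index and start, the use of normalize-space() to trim padding, the default value of $fieldSource, and the lines/line input shape are all assumptions. Only getRecordsFromNodeSet, getFields, sourceLine, and $fieldSource come from the article.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <!-- URL of the field meta-information; the default value is an assumption. -->
  <xsl:param name="fieldSource" select="'fields.xml'" />

  <!-- Hypothetical entry point: assumes input shaped like
       <lines><line>...</line><line>...</line></lines>. -->
  <xsl:template match="/">
    <xsl:call-template name="getRecordsFromNodeSet">
      <xsl:with-param name="records" select="/lines/line" />
    </xsl:call-template>
  </xsl:template>

  <!-- Copied from the article: wraps each source line in a record element. -->
  <xsl:template name="getRecordsFromNodeSet">
    <xsl:param name="records" />
    <records>
      <xsl:for-each select="$records">
        <record>
          <xsl:call-template name="getFields">
            <xsl:with-param name="sourceLine" select="string(.)" />
          </xsl:call-template>
        </record>
      </xsl:for-each>
    </records>
  </xsl:template>

  <!-- Sketch of the recursive step: emit field number $index, then recurse. -->
  <xsl:template name="getFields">
    <xsl:param name="sourceLine" />
    <!-- Both counters default to 1, so they reset for every record. -->
    <xsl:param name="index" select="1" />
    <xsl:param name="start" select="1" />
    <xsl:variable name="fields"
        select="document($fieldSource)/fields/field" />
    <!-- Stop once the index runs past the last field definition. -->
    <xsl:if test="$index &lt;= count($fields)">
      <xsl:variable name="field" select="$fields[$index]" />
      <!-- Name the output element after the field's id attribute. -->
      <xsl:element name="{$field/@id}">
        <xsl:value-of select="normalize-space(
            substring($sourceLine, $start, $field/@length))" />
      </xsl:element>
      <xsl:call-template name="getFields">
        <xsl:with-param name="sourceLine" select="$sourceLine" />
        <xsl:with-param name="index" select="$index + 1" />
        <xsl:with-param name="start" select="$start + $field/@length" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

With fields.xml stored beside the stylesheet, each line element in the input should produce one record element containing id, firstname, lastname, amount, and type children. Passing the running start position along with the index avoids re-summing the earlier field lengths on every call. That is a design choice of this sketch and may differ from the original listing.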

Each recursive call increments the field pointer, which is re-initialized for every record. The pointer is compared with the number of fields in the meta-information file to decide when to stop. Because the recursion here is very shallow, from a handful of calls to a few dozen at most, it is preferable (and probably faster) to use XSLT recursion than to process the fields via the DOM.
