Convert a Text File to XML

Convert a Text File to XML

ere’s an interesting conundrum. You have an application that doesn’t have an object model (or has one that’s so hideously complex that you’d just as soon not pay a programmer three months’ wages just to build a matching automation interface). The details of the program are unimportant, except that it can output a comma or tab-separated text file; it can produce a text file with fields of fixed length; or it can work with an INI file or some similar minimally structured text file.

These sort of text files are the poor cousins of the database world. They often contain sufficiently small numbers of records, which makes them ideal to work with as XML files. With the data in an XML file, you can perform queries, sort and filter the data, and even handle multi-layer formats?all with a common set of tools. This data file could be sent to a different machine without worrying about such coding conventions as whether you’re using a comma and I’m using a tab to separate data. Once in an XML file, you can easily transform the data using XSL (Extensible Stylesheet Language) into tables, combo boxes, or far more sophisticated controls or behaviors. In other words, the record becomes vastly easier to manipulate anywhere.

If this is the case, then it should be easy to create an XSL filter or perhaps a schema to load this data into an XML file. Unfortunately, as useful as this would be, the fundamental problem that any text to XML conversion faces is that without tags, an XSL filter or schema sees such files as undifferentiated blobs. However, here’s one case where you can harness the XML Document Object Model itself to create a few fairly generic functions for dealing with straight text.

Delimited Text to XML
Consider a simple text database that contains information about employees in your company:

name,dateOfBirth,dept,jobTitleJohn,1962-11-24,accounting,senior accountantTina,1962-09-26,administration,managerKaren,1972-01-10, marketing,graphic designerMichael,1978-02-11,research,programmerSandra,1976-10-26,marketing,account manager

This simple text database called employees.txt shows a prime example of a comma delimited file. A routine that converts text to XML should make use of as much information as it can from the records. In general, it’s easier to filter out specific pieces of information using XSL query on the created XML than it is to put extensive logic in the code that creates the XML in the first place. The XML file that’s generated should use the header information provided at the top of the text file.

The File System Object (FSO) has turned out to be one of the more recent useful additions to Visual Basic and scripting. It encapsulates a number of standard text I/O functions into a single object (with ProgID “Scripting.FileSystemObject”) replacing the bewildering collection of legacy Visual Basic commands (such as the highly irregular syntax of the VB Open command). If you are working with text files, I would highly recommend that you take advantage of the features provided with the FSO.

In this particular case, the primary purpose of the FSO is to create a text stream object. A stream is a generic interface for handling data input/output without needing to worry about its source or destination. To read the sample file line by line with the FSO, you would create a text stream object as a read source, iterate through the file line by line until you run out of lines, and then close the text stream:

Dim Fs as FileSystemObject Dim TS as TextStreamSet Fs = CreateObject("Scripting.FileSystemObject")Set Ts = fs.OpenTextFile(textFilePath, ForReading, False,TristateUseDefault)While not Ts.AtEndOfStream	Buffer=Ts.ReadLine	' Process the bufferWendTs.CloseFs.Close

Next, you need to find a way to parse the fields. A general-purpose routine should let you parse on any character, although you most likely will end up using either comma-separated value files (CSV), or tab-separated files (TSV or TAB). In any of these cases, you can take advantage of two useful VB6 (or VBS5) string functions?split() and join()?to parse the strings. Split takes a string and a delimiter and converts that into a variant array, while Join takes a variant array and a delimiter and concatenates each array item as a string using the delimiter as a separator.

	Dim myArray as Variant	' Split a comma separated string, setting the comma as the ' delimiter:	MyArray=split("a,b,c,d,e",",")' Print the fourth (zero-based) array element ("d")	Debug.print MyArray(3) 	' Print the index of the lower and upper bounds of the array	Debug.print lbound(MyArray),uBound(MyArray)	' Join the array elements together with a pipe ("|")	Debug.print join(MyArray,"|")	' This prints a|b|c|d|e to the debug window.

The Text2XML() function takes advantage of both of these functions, along with the XML DOM, to do the mapping. Because an XML structure requires a single root node, and each object should be encapsulated, the function also takes an “objectName” parameter (for example, “user”) and a “collectionName” parameter (for example, “users”). For the sample table, the Text2XML would produce this XML structure:

			John		1962-11-24		accounting		senior accountant				Tina		1962-09-26		administration		manager		

Note the introduction of both an object label (“employee”) and a collection label (“employees”).

The function itself works by creating an XML object template using the fields defined in the first line of the text field (see Listing 1). This template is then repeatedly copied (or cloned), populated with a new line worth of data, and added to the collection node. You could also create the element manually by buffering the XML structure as a string, but by taking this route you lose the advantage of possible error checking. (Note: I haven’t implemented an error handling mechanism here, but if it were critical, you would use the XML DOM parseError object to output to a log or perform other error handling). Thus, to generate the XML file for the sample user list, you’d make the function call:

Dim xmlDocSet xmlDoc=Text2XML("employees.txt","employee","employees",",")

Working With Fixed Length Fields
Older databases, EDI messages, and similar applications often either maintain or generate data in a fixed data text format. In this format, the fields themselves are of specific lengths, and to be able to read the record, you need to parse each record at well-defined points in the string. Here is a sample of such a field set, based upon the record file previously defined:

123456789012345678901234567890123456789012345678901234567890John      1962-11-24accounting      senior accountantTina      1962-09-26administration  managerKaren     1972-01-10marketing       graphic designerMichael   1978-02-11research        programmerSandra    1976-10-26marketing       account manager

This example of a fixed length data field is contained in the file “employees_fl.txt”, with fields starting at position 1, 11, 21, and 37. (The numeric line isn’t included and is shown for reference only.)

The function FixedText2XML looks complicated, but most of the routines primarily handle the assignment of the field names and the specific parsing of the fields themselves (see Listing 2). In either case, the XML portion of this function is nearly identical?create a collection node and a template XML tree, pull the data from each line of text and assign it to a clone of the template, then export the XML file that is created. The one other difference between the Text2XML and FixedText2XML function stems from the need to trim white space in the latter (you can set this option with the TrimWhiteSpace Boolean value, normally defaulted to true).

You can then call FixedText2XML() by passing fieldnames and lengths, along with the object and collections names. The output is the same as before: an XML document with each node corresponding to either containers or field header names:

		John		1962-11-24		accounting		senior accountant	dim xmlDocset xmlDoc=FixedText2XML("employees_fl.txt",_          "employee",_          "employees",_          Array("name,dateOfBirth,dept,jobTitle"),_          Array(1,11,21,37), _          True)

Going the Other Way
It’s also worth examining the reverse process, or how to create a standard text file from an XML file. In many respects, this process is much simpler because the XML DOM contains a mechanism to convert any subset of an XML structure into a linear list?the selectNodes property.

In the XML2Text function, you pass the XML document as a DOM, the XML query to generate the list, and the name of the output file as a string to the function (see Listing 3). SelectNodes walks through the XML tree and copies pointers to respective nodes into a list. The nodes themselves stay in their original positions, so you don’t need to worry about causing your XML structure to self-destruct when using SelectNodes?the list is just a linked list that can be iterated like any other collection.

Again, a significant portion of this code is housekeeping?keeping the function reasonably generic. It uses two query strings: a record query for retrieving the nodes corresponding to the data at the query level (relative to the initial node, not the overall document) and a property query for determining which query to include. The default action is to include all children of the starting node as record nodes and all children of these nodes as property nodes, although in general you really should specify these explicitly.

If you wanted to specify that the function should retrieve a list only of those employers working in accounting, and then only retrieve the name and date of birth to be stored in employeesSub.txt as tab delimited text, then you’d call the function as follows (assuming EmployeeXML contained the XML structure displayed earlier):

XML2Text  EmployeeXML,_        "//employee[dept='Accounting']",_        "name|dateOfBirth",_        employeesSub.txt",_        vbTab

I leave it up to you to convert an XML structure to a fixed length text, with the hint that the code is nearly identical to the previous code, save for passing field length parameters.

So Why Do This?
Being able to convert XML into text and vice versa seems like an interesting exercise in working with the DOM, but what are the advantages for moving the text into and out of an XML format? To understand this, you have to realize that the comma- (or tab-) delimited file format suffers from a number of shortcomings. First, you must manually parse the file and build the storage mechanisms. Also, you must write any search routines on the text itself, making all but the simplest searches an expensive proposition. With a comma-delimited file, you can’t represent hierarchical data in a list (easily). In addition, text files generally don’t support Unicode. Finally, to display a text format in something like HTML, you have to write customer “viewer” programs.

On the other hand, with XML, parsing is performed automatically, as is error handling if the parsing fails. XML supports a rich set of searching capabilities in the form of XPath (or XSL Query, depending upon which marketing department you’re listening to). XML easily represents multiple levels of hierarchy without complex indexing schemes. In addition, XML intrinsically supports Unicode and the full ISO character set standards.You can use either the Document Object Model or XSL to transform the XML into a different format, even non-XML formats such as Comma Separated Values (but that’s for a different article).

Many programs?spreadsheets, databases, tables in word processing documents, accounting packages, and so forth?give you the option of saving out to a delimited text file. Using the routines in this article gives you the advantage of both this generic form of output coupled with the robustness and sophistication of XML.

Other applications could also benefit from a similar approach. For example, initialization files (INI files) are text files that give a first-level pass to producing hierarchical data. Converting keys and sub-keys into XML is a straightforward process. Similarly, many other application standards (such as a POP3 e-mail format) have similar name/value structures, which you can readily convert into and out of an XML. Think of the possibility of an application that could intercept incoming e-mails, convert them into XML before passing the originals on, and then perform basic search and sorting on the entries.

These sorts of applications will be important in the future as document management moves beyond simply providing directories and pointer structures and into building cohesive information structures based upon meaning. This is one of the most significant reasons for looking at XML, and the methods mentioned here help open the door.

devx-admin

devx-admin

Share the Post:
2D Semiconductor Revolution

Disrupting Electronics with 2D Semiconductors

The rapid development in electronic devices has created an increasing demand for advanced semiconductors. While silicon has traditionally been the go-to material for such applications,

Cisco Growth

Cisco Cuts Jobs To Optimize Growth

Tech giant Cisco Systems Inc. recently unveiled plans to reduce its workforce in two Californian cities, with the goal of optimizing the company’s cost structure.

FAA Authorization

FAA Approves Drone Deliveries

In a significant development for the US drone industry, drone delivery company Zipline has gained Federal Aviation Administration (FAA) authorization, permitting them to operate drones

Mortgage Rate Challenges

Prop-Tech Firms Face Mortgage Rate Challenges

The surge in mortgage rates and a subsequent decrease in home buying have presented challenges for prop-tech firms like Divvy Homes, a rent-to-own start-up company.

Lighthouse Updates

Microsoft 365 Lighthouse: Powerful Updates

Microsoft has introduced a new update to Microsoft 365 Lighthouse, which includes support for alerts and notifications. This update is designed to give Managed Service

Website Lock

Mysterious Website Blockage Sparks Concern

Recently, visitors of a well-known resource website encountered a message blocking their access, resulting in disappointment and frustration among its users. While the reason for

2D Semiconductor Revolution

Disrupting Electronics with 2D Semiconductors

The rapid development in electronic devices has created an increasing demand for advanced semiconductors. While silicon has traditionally been the go-to material for such applications, it suffers from certain limitations.

Cisco Growth

Cisco Cuts Jobs To Optimize Growth

Tech giant Cisco Systems Inc. recently unveiled plans to reduce its workforce in two Californian cities, with the goal of optimizing the company’s cost structure. The company has decided to

FAA Authorization

FAA Approves Drone Deliveries

In a significant development for the US drone industry, drone delivery company Zipline has gained Federal Aviation Administration (FAA) authorization, permitting them to operate drones beyond the visual line of

Mortgage Rate Challenges

Prop-Tech Firms Face Mortgage Rate Challenges

The surge in mortgage rates and a subsequent decrease in home buying have presented challenges for prop-tech firms like Divvy Homes, a rent-to-own start-up company. With a previous valuation of

Lighthouse Updates

Microsoft 365 Lighthouse: Powerful Updates

Microsoft has introduced a new update to Microsoft 365 Lighthouse, which includes support for alerts and notifications. This update is designed to give Managed Service Providers (MSPs) increased control and

Website Lock

Mysterious Website Blockage Sparks Concern

Recently, visitors of a well-known resource website encountered a message blocking their access, resulting in disappointment and frustration among its users. While the reason for this limitation remains uncertain, specialists

AI Tool

Unleashing AI Power with Microsoft 365 Copilot

Microsoft has recently unveiled the initial list of Australian clients who will benefit from Microsoft 365 (M365) Copilot through the exclusive invitation-only global Early Access Program. Prominent organizations participating in

Microsoft Egnyte Collaboration

Microsoft and Egnyte Collaboration

Microsoft has revealed a collaboration with Egnyte, a prominent platform for content cooperation and governance, with the goal of improving real-time collaboration features within Microsoft 365 and Microsoft Teams. This

Best Laptops

Top Programming Laptops of 2023

In 2023, many developers prioritize finding the best laptop for programming, whether at home, in the workplace, or on the go. A high-performing, portable, and user-friendly laptop could significantly influence

Renaissance Gaming Magic

AI Unleashes A Gaming Renaissance

In recent times, artificial intelligence has achieved remarkable progress, with resources like ChatGPT becoming more sophisticated and readily available. Pietro Schirano, the design lead at Brex, has explored the capabilities

New Apple Watch

The New Apple Watch Ultra 2 is Awesome

Apple is making waves in the smartwatch market with the introduction of the highly anticipated Apple Watch Ultra 2. This revolutionary device promises exceptional performance, robust design, and a myriad

Truth Unveiling

Unveiling Truths in Bowen’s SMR Controversy

Tony Wood from the Grattan Institute has voiced his concerns over Climate and Energy Minister Chris Bowen’s critique of the Coalition’s support for small modular nuclear reactors (SMRs). Wood points

Avoiding Crisis

Racing to Defy Looming Financial Crisis

Chinese property developer Country Garden is facing a liquidity challenge as it approaches a deadline to pay $15 million in interest associated with an offshore bond. With a 30-day grace

Open-Source Development

Open-Source Software Development is King

The increasingly digital world has led to the emergence of open-source software as a critical factor in modern software development, with more than 70% of the infrastructure, products, and services

Home Savings

Sensational Savings on Smart Home Security

For a limited time only, Amazon is offering massive discounts on a variety of intelligent home devices, including products from its Ring security range. Running until October 2 or while

Apple Unleashed

A Deep Dive into the iPhone 15 Pro Max

Apple recently unveiled its groundbreaking iPhone 15 Pro and iPhone 15 Pro Max models, featuring a revolutionary design, extraordinary display technology, and unrivaled performance. These new models are the first

Renewable Crypto Miners

Crypto Miners Embrace Renewable Energy?

As the cryptocurrency sector deals with the fallout from the FTX and Celsius exchange collapses, Bitcoin miners are increasingly exploring alternative energy sources to reduce expenses and maintain profitability. Specialists

Laptop Savings

The HP Omen 16 is a Gamer’s Dream

Best Buy is currently offering an unbeatable deal on the HP Omen 16 gaming laptop, giving potential buyers the chance to save a significant $720 on their purchase. Originally priced

How to Check for Vulnerabilities in Exchange Server

It is imperative to keep your systems and infrastructure up-to-date to mitigate security issues and loopholes, and to protect them against any known vulnerabilities and security risks. There are many

Data Center Creation

Transforming Corporate Campuses into Thriving Data Centers

Dallas-based developer Compass Datacenters has purchased a 197-acre ex-corporate campus in Hoffman Estates, Illinois for an estimated $194 million. This acquisition occurs as more companies are downsizing and consolidating their

Nano Unbeatable Value

Lenovo ThinkPad X1 Nano: Unbeatable Value

The Lenovo ThinkPad X1 Nano, a first-generation model beloved for its ergonomic keyboards and stylish appearance, is now available at an unbeatable price of $600 on eBay. Though this 13-inch

Assistant Revolution

Home Assistant Green Makes Your Home Futuristic

In recent times, the growing interest in smart home technology has led to the emergence of various devices and platforms that enable homeowners to control and manage their homes in

Ultimate Smart Home

Unlock Your Ultimate Smart Home Adventure

Ever thought about having a home where everything is tailored to your needs and runs smoothly? Home automation may significantly impact your daily life. Transform your space into an intelligent

Gaming Laptops

Lenovo Legion Laptops Revolutionize Gaming

The gaming laptop market has recently been graced with two potent contenders, the Lenovo Legion 9i and the Legion Pro 7i. Both laptops boast powerful 13th-gen Intel Core i9 processors

Fintech Revolution Leap

Figure’s Bold Leap Beyond Banking

Figure, a fast-growing fintech startup, recently retracted its bid to become a bank, sparking speculation that the process was too challenging. However, the true reason behind Figure’s decision was its