few years ago Kurt Cagle wrote an article discussing how to convert fixed-width text records to XML, a more malleable format, making it possible to use XSL and XSLT to further transform those records into almost any format needed. Kurt’s article was educational, an excellent solution, and a fun project to play with, but in the long run, writing tedious and repetitive data transformations manually is a poor business solution. This is where GoXML Transform Designer steps in.
GoXML is a relatively new product on the market that simplifies working with flat files to help step up your data integration projects. The application requires a recent version of the JRE, but the install package checks your system and gives you the option to choose a local JRE if you already have one installed, or install one if you don’t.
|Figure 1. CSV Subscriber List: This sample CSV file is a straightforward list of subscriber records with column names in the first line.
In this article, I’ll show you how to use GoXML to reduce the process of transforming flat-files into a required XML format to a few mouse clicks and keyboard strokes. All the files used and created in this sample are included in the downloadable sample code.
A Transformation Scenario
This example scenario uses a flat-file list of Web site subscribers. Each line includes a subscriber ID, name, e-mail address and a “yes” or “no” field that indicates whether the subscriber opted-in to receive e-mail notifications. The information was collected at a conference, so the data is in Comma Separated Value (CSV) format, and includes column headers in the first line (see Figure 1).
|Figure 2. Template XML Format: After transforming the CSV file shown in Figure 1 into XML, you should have a list of subscribers in the format shown here.
In this scenario, the company’s subscriber import tool requires the data to be in XML format, so you need to create a mapping to transform this CSV data into a specific XML format. The source data file is named subscribers2.csv and the target data, or template, is mailingList.xml (see Figure 2). The target data format includes a root node of
|Author’s note: The names for the sample data in this scenario and sample project were created with a name generator, so any name, similar or actual to anyone real, is purely coincidental.
Creating a Transformation with GoXML
Start by opening the GoXML Transform Designer. The first time you open the application, you may only see the Source pane and the Target pane. If this is the case, open the Configuration pane by selecting View from the menu bar and checking the Configuration Pane option. Your screen should now look similar to Figure 3.
|Figure 3. GoXML Transformation Designer: After opting to view the Configuration pane from the View menu, your screen should show three panes: Configuration, Source, and Target.
In these examples, you won’t need to make any custom configuration changes, but if you prefer, you can update the Preference tab in the Configuration pane.
In the Source pane, select the Configuration tab, click the Data Format dropdown field, and select CSV as the source data type. The delimiter field should default to a comma. In the structure field, you need to enter the path to your CSV file or click on the ellipses to browse to the file (see Figure 4). When you’ve done this, click the Load button. GoXML will read in the file.
|Figure 4. Selecting a Source: When the source file is in CSV format, select CSV from the Data Format dropdown, make sure the Delimiter field is set to a comma, and provide the path to the source file in the Structure field.
Next, select the Mapping tab. Here you will see the headers for your CSV file. If you click on the Structure tab, it will show you the actual data, or “structure” of your document. You’ll use the headers displayed in the Mapping tab to create our relationships with the target data.
In the Target pane, click on the Configuration tab, open the Data Format list box and select XML. You could click on the “New” button in the XML Formatter frame to create a new XML document, but because you already have a template, just enter the path to the target file, or click on the ellipses to select the mailingList.xml file. Now you can check the Load Structure textbox and select the Mapping tab.
Here you will see a tree-view representation of your XML nodes. Now we can create the relationships between our source data and target data.
The first challenge is that the CSV file contains the user’s full name in one field while the target XML file has separate nodes for first and last names. Obviously, you’re going to have to split up the single CSV field with code. To handle situations like this, GoXML lets you map target nodes to a function rather than directly to a source field.
|Figure 5. Completed Mapping: After mapping all the sample CSV fields to their corresponding elements in the XML target format, your mapping surface should look like this.
To do that, right-click on the firstName element and select “Map to Function,” which opens the function editor. In the function pane, expand the STRING_CHARACTER option and then scroll down to the STRTOKEN function. The strtoken function takes the source data, and splits it based on a specified delimiter, creating a list of separated “tokens,” returning the token with the “instance” (index) you declare in the second argument. Drag-and-drop the STRTOKEN function to the Function Instance pane. This displays the parameters required by the function. The first parameter is the source data, so drag the name field from the Source pane to the first parameter of the function instance. Next, double-click the second parameter and enter 1, meaning that the function should return the first token?the first name. Finally, double-click the third parameter and enter a space for the delimiter, declaring that the tokens are space delimited. Click OK, and you have your first name mapping.
At any time, you can hover over the elements and see the info about that element such as an XPath query for the selected element, a description of the mapping, such as a function name in this case, and the node type.
You follow the same process to map the lastName, except that you change the instance from 1 to 2, causing the function to return the second token in the returned string. Here’s a convenient feature that can save you some time when defining repetitive mappings. Right-clicking on the firstName element and selecting Copy Mapping, then right-clicking the lastName element and selecting Paste Mapping copies the function defined for firstName to the lastName element. All you have to do to modify the instance from 1 to 2 is right-click on the lastName element, select “Map to Function,” and modify the copied function.
|Figure 6. Validating Results: After creating a successful transform, you can validate the results by expanding nodes in the Result pane.
To finish the mapping, drag the email element form the source pane to the email element in the target pane, and do the same for the optIn element. At this point, your mapping should look like Figure 5.
To validate the data you have just mapped, click on the Structure tab in the source pane and the Result tab in the target pane. This causes GoXML to transform the data based on your mappings and functions. If everything works properly, you’ll get a “Transform successful” message. At this point, you can expand the nodes in the Result pane (see Figure 6) to validate the data and mappings against the data in the source structure.
When you’re happy with the results, click the Save button on the main toolbar to save the mappings and configuration for future transformations and then click the Save button in the target pane to save the target results as mailingListFinal.xml. At this point in the scenario you have both an XML document of subscribers suitable for the subscriber import tool, and a mapping that will work with future incoming CSV subscriber files.
After saving a mapping, for every future similar CSV subscriber file you need only open the existing project, select the Configuration tab in the source pane and change the source to the new CSV file. GoXML will apply the mappings to the new source data and you can save the results as another version of mailingListFinal.xml suitable for import. In other words, after creating the transformation project, you no longer need developer time to handle the task. For simple tasks, GoXML even makes it possible to let non-developer staff create transformations.
In real-world scenarios, you’re likely to have many of these transformations that have to run on a weekly, daily, or hourly basis. It’s obviously unrealistic to expect your staff to keep up with these transactions manually; instead, you’d want to automate the transformations. For that, Xenos, GoXML’s parent company, offers an enterprise product, called terminalONE Transform Server, which has an API and a runtime engine for automating all transformations. This enterprise version consists of three components: terminalONE transform, terminalONE transport and terminalONE traffic. By creating a few business rules for particular source data formats, terminalONE transform can perform real-time transforms of almost any data format into any other data format you need.
Then, terminalONE transport acts as a gateway enabling secure point-to-point data exchange between disparate systems, whether in your network or across networks in a B2B environment making terminalONE a true enterprise solution.
With the final addition of terminalONE traffic, this enterprise application routes your data transformations to the appropriate backoffice system based on the data type. If the received data is not in format defined in your business rules, terminalONE traffic requests a transformation from terminalONE transform, making it an intelligent application. Although a full examination of GoXML’s enterprise-level product is beyond the scope of this review, if you need this level of automation, see the terminalONE components Web site.
Despite the simplicity of the mapping process that GoXML provides, you can’t eradicate errors, especially in obsolete or non-standard data structures. To test GoXML’s error-handling features, I ran the sample transform with a CSV file that was missing the first column header. When the missing header missing was not part of the transform, the Mapping tab display simply didn’t display that name in the designer. But if the missing column header existed in the target as a node defined in a mapping, GoXML returns a Designer Error, notifying you that there was an error while mapping the source to the target, and to see the log (see Figure 7).
|Figure 7. Sample Error Message: The figure shows a sample GoXML error message generated from a mismatch between fields in the source data and the defined target mapping.
Although the error message is sparse, the log entries tell you what mappings GoXML could not resolve. Each log entry includes the XPath-style route to the node that caused the problem, making it easy to hone in on the specific node. You can view the log at any time at the bottom of GoXML along with the Project and Help tabs, which you can see at the bottom of Figure 3. In contrast to missing mapping columns, GoXML handles missing values in the source data by simply continuing the transform, leaving those node values blank in the target file.
When you know that empty values may occur, you can use the Trim option to tell GoXML to handle them using one of three options, called None, Child and All Nodes. With the None option the transform retains all empty tags. The Child option removes tags from the output if the node’s children also have empty values, and the All Nodes option removes all nodes with empty output.
To further stress the sample transform, I removed the last name of one of the records in my source data, which you may remember was mapped to a function to define the target data. To my gratification, GoXML didn’t throw an error, and the transform ended successfully. But when validating the source structure to the target results, instead of the last name element being empty, both the firstName and LastName elements contained the first name.
Though this didn’t give me the opportunity to look into the missing data, GoXML handled the missing data cleanly and prevented any roadblocks in my transactions.
GoXML can be a very powerful tool in your IT shed when needed. With ambiguous or undefined flat text files, you have the ability to create a flat text schema document that defines your source data. With the Importer Wizard, you can define the actual fields in your document, split the records if they contain unrelated data combined in a single record, similar to normalizing a database, and even define new records. The quick guide included with your install walks you through such transformations step-by-step. Just as in this example, after defining a transform, running it only requires loading the project and specifying the source and output documents.
Not Just for Flat Files and XML
GoXML handles far more formats than CSV and XML, and transformations can work in either direction. Supported data types include flat text, CSV, XML and EDI which GoXML can transform to the likes of PDF, HL7, SQL and COBOL. And that’s only half the data types available. It even handles the complexities of having multiple source files that you need to transform and merge into a single target document.
To transform to PDF, you will need Adobe Acrobat so you can create a target PDF, or a PDF template containing defined fields similar to the XML target that I used in the example. After defining the target PDF fields, GoXML compiles a new PDF file based on the name you supply using the mappings you created.
GoXML definitely fits into the family of what are called “visual” tools, changing data transformation from a code-centric programming task to a much simpler and less error-prone drag-and-drop process.
Still Room for Improvement
A quick final note: the help documentation, though relatively thorough, is a little cumbersome. When using search, clicking on a link in your results displays the requested page then seems to “step back” to your previous page. To work around the problem, clicking on the result link once more finally delivers the page requested.
The GUI can be a little slow if you aren’t used to using Java UI’s in a Windows environment. Where Windows usually provides an immediate response to mouse events, with GoXML you may find a slight delay, more so when accessing the local file system. But overall, the product is worth its weight in gold, especially when you include their online help and product training services. Granted, you lose the fun of coding, but that’s more than offset by the significant reduction in your workload and production times. What’s more, many transformations are easy enough that with a little training, the IT team can pass such tasks to other groups, spreading the work back across your enterprise.