

Processing EDI Documents into XML with Python : Page 4

You don't have to rely on expensive and proprietary EDI conversion software to parse, validate, and translate EDI X12 data to and from XML; you can build your own translator with any modern programming language, such as Python.





Building an EDI Handler
EDI handlers in this framework are similar to SAX content handlers. The four essential EDI handler methods are start_interchange(), end_interchange(), segment(), and error(). The parser calls these methods when it encounters the appropriate events.
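To make the callback style concrete, here is a minimal sketch of a handler that exposes the four methods and a toy driver that calls them. It is not the code from edi_handler.py: the class name RecordingEDIHandler, the feed() function, the method signatures, and the fixed "~" and "*" delimiters are all assumptions made for illustration. (A real X12 parser reads the delimiters from the ISA segment.)

```python
class RecordingEDIHandler:
    """Hypothetical handler exposing the four callbacks named above."""

    def __init__(self):
        self.events = []  # every callback appends a tuple here

    def start_interchange(self):
        self.events.append(("start_interchange",))

    def end_interchange(self):
        self.events.append(("end_interchange",))

    def segment(self, raw_segment):
        self.events.append(("segment", raw_segment))

    def error(self, message):
        self.events.append(("error", message))


def feed(handler, data, segment_terminator="~", element_separator="*"):
    """Toy stand-in for the parser: fire handler events for each segment.

    Assumptions: delimiters are fixed, ISA opens the interchange,
    IEA closes it, and both are also reported as ordinary segments.
    """
    for raw in data.split(segment_terminator):
        raw = raw.strip()
        if not raw:
            continue  # skip the empty piece after the last terminator
        tag = raw.split(element_separator, 1)[0]
        if tag == "ISA":
            handler.start_interchange()
        handler.segment(raw)
        if tag == "IEA":
            handler.end_interchange()


handler = RecordingEDIHandler()
feed(handler, "ISA*00~ST*850*0001~SE*2*0001~IEA*1~")
for event in handler.events:
    print(event)
```

As with a SAX content handler, the driver owns the loop over the input, and the handler only reacts to events.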

You'll find the EDI handler code in a file named edi_handler.py. It contains two relevant classes: Translator and NonEDIHandler (there's also a GenericEDIHandler class that simply prints each segment to standard output).

The Translator EDI handler's primary functions are to:

  • Validate that segments occur in the proper order based on the order specified in a description file.
  • Maintain state for the current position in the looping structure of the EDI file.
  • Provide a do_<X12 segment tag>() method for each X12 tag to facilitate the creation of the output XML.
  • Build a DOM output object and populate it with data encountered in the input file.
  • Maintain a dictionary of nodes added to the XML output DOM object.

Translator contains the four essential methods mentioned above as well as a number of helper methods.

The start_interchange() method creates a DOM object that will contain the output XML. The end_interchange() method displays the resulting XML. Translator calls the error() method if a segment appears out of order, and stops all further processing of the file after that call.
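The life cycle described above might look roughly like the following sketch, which uses the standard library's xml.dom.minidom. The class name DomBuildingHandler, the add_segment() helper, and the "interchange" and "element" tag names are hypothetical; the article does not show how Translator names its nodes.

```python
from xml.dom.minidom import getDOMImplementation


class DomBuildingHandler:
    """Hypothetical handler that grows a DOM and stops after an error."""

    def __init__(self):
        self.doc = None
        self.failed = False

    def start_interchange(self):
        # Create an empty document; the root name is an assumption.
        impl = getDOMImplementation()
        self.doc = impl.createDocument(None, "interchange", None)

    def add_segment(self, tag, elements):
        if self.failed:
            return  # ignore everything once error() has been called
        node = self.doc.createElement(tag)
        for value in elements:
            child = self.doc.createElement("element")
            child.appendChild(self.doc.createTextNode(value))
            node.appendChild(child)
        self.doc.documentElement.appendChild(node)

    def error(self, message):
        self.failed = True

    def end_interchange(self):
        # Serialize the finished DOM and return it to the caller.
        return self.doc.toxml()


handler = DomBuildingHandler()
handler.start_interchange()
handler.add_segment("ST", ["850", "0001"])
print(handler.end_interchange())
```

Returning the serialized string instead of printing it inside end_interchange() is a design choice of this sketch; it keeps the handler easy to test.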

The segment() method splits the segment into elements, calls the validate_seg() helper method to determine if the segment is allowed to appear where it does, and finally calls the appropriate do_<segment_name>() method.
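One common way to implement this kind of per-tag dispatch in Python is a getattr() lookup on the handler itself. The sketch below assumes "*" as the element separator, a placeholder validate_seg() that accepts everything, and a do_default() fallback; none of these details is confirmed by the article.

```python
class DispatchingHandler:
    """Hypothetical handler showing split, validate, dispatch."""

    def __init__(self, element_separator="*"):
        self.element_separator = element_separator
        self.seen = []

    def validate_seg(self, tag):
        # Placeholder: a real implementation checks segment order.
        return True

    def error(self, message):
        raise ValueError(message)

    def segment(self, raw_segment):
        # 1. Split the segment into elements; the first is the tag.
        elements = raw_segment.split(self.element_separator)
        tag = elements[0]
        # 2. Check that the segment may appear at this point.
        if not self.validate_seg(tag):
            self.error("segment %s out of order" % tag)
            return
        # 3. Look up do_<tag>() by name, with a fallback.
        method = getattr(self, "do_" + tag, self.do_default)
        method(elements)

    def do_ST(self, elements):
        self.seen.append(("ST", elements[1]))

    def do_default(self, elements):
        self.seen.append(("unhandled", elements[0]))


handler = DispatchingHandler()
handler.segment("ST*850*0001")
handler.segment("N1*ST*ACME")
print(handler.seen)
```

Supporting a new segment type then only requires adding another do_<tag>() method; segment() itself does not change.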

The validate_seg() method determines whether the X12 segment just encountered appears in its proper sequence. It "remembers" the last X12 segment encountered and is "informed of" the latest one. It then traverses up, down, and/or laterally in a DOM description of the EDI file (created from the X12 schema file x12_schema.xml) to discover whether the latest segment can occur at this point.
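The following sketch shows a deliberately reduced version of this idea. It handles only a flat list of segments (lateral movement, no loops), and the schema layout, attribute names, and OrderValidator class are invented for illustration; they are not the format of x12_schema.xml.

```python
from xml.dom.minidom import parseString

# Hypothetical schema: a flat, ordered list of segments.
SCHEMA = """<transaction>
  <segment tag="ST" required="yes" repeat="no"/>
  <segment tag="BEG" required="yes" repeat="no"/>
  <segment tag="REF" required="no" repeat="yes"/>
  <segment tag="SE" required="yes" repeat="no"/>
</transaction>"""


class OrderValidator:
    """Remembers the last matched schema node and validates the next tag."""

    def __init__(self, schema_xml):
        doc = parseString(schema_xml)
        self.nodes = doc.getElementsByTagName("segment")
        self.position = -1  # index of the last segment matched

    def validate_seg(self, tag):
        # A repeatable segment may follow itself.
        if self.position >= 0:
            current = self.nodes[self.position]
            if (current.getAttribute("tag") == tag
                    and current.getAttribute("repeat") == "yes"):
                return True
        # Otherwise walk forward through the later siblings.
        for index in range(self.position + 1, len(self.nodes)):
            node = self.nodes[index]
            if node.getAttribute("tag") == tag:
                self.position = index
                return True
            if node.getAttribute("required") == "yes":
                return False  # a required segment would be skipped
        return False  # tag unknown, or it belongs earlier in the list


validator = OrderValidator(SCHEMA)
for tag in ["ST", "BEG", "REF", "REF", "SE"]:
    print(tag, validator.validate_seg(tag))
```

Supporting nested loops would mean walking parent and child nodes as well as siblings, which is the up, down, and lateral traversal the article describes.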

Author's Note: I have not implemented a validator to determine whether the maximum occurrences of a particular segment have been exceeded. However, that should be a minor modification.

The X12 schema XML file mentioned above details the proper looping structure of the X12 document it describes, whether a segment is required or optional, and whether it can occur multiple times. It's worth noting that I haven't described the elements for each segment. Creating the element descriptions and the code to validate them should be another fairly minor modification to the XML file and the Python code.

XML Output
As mentioned in the Generic EDI Parsing Layer section, you can call the gen_parser.py script from the command line. It accepts the name of an EDI input file. For example, to call gen_parser.py on the sample input file, you use a command like this:

   jjones@bean:~/svn/home/source/edi$ python gen_parser.py example_edi_stream.txt xml_output

As the code reads the X12 input file, it creates an output DOM object and adds to it as it encounters data. It then converts that DOM object to an XML output file. Listing 3 shows the resulting XML file.

Compare the XML in Listing 3 to the documents shown on the first page of this article, and you'll instantly see why translating EDI to XML makes sense. Traditional EDI data formats such as X12 are simply text files that can be processed with any modern programming language. This Python-based example of an EDI translator, while not exhaustive, is a good first step in the development of more extensive tools for managing EDI.

Jeremy Jones is a Quality Assurance Engineer at The Weather Channel. He's responsible for writing, maintaining, and extending an automated test framework for their TV broadcast software. He has six years of experience in the EDI industry working for Harbinger/Peregrine/Inovis. He considers his daughter Zane, his son Justus, and that his wife, Debra, still stays with him after eight years of marriage his three greatest accomplishments. He's also an open source software author (see http://sourceforge.net/projects/munkware). All views and opinions expressed in this article are those of Jeremy Jones and not of The Weather Channel.