

Processing EDI Documents into XML with Python

You don't have to rely on expensive and proprietary EDI conversion software to parse, validate, and translate EDI X12 data to and from XML; you can build your own translator with any modern programming language, such as Python.


Extending the Parser Class
You can extend the gen_parser class to support other EDI standards, such as TRADACOMS and EDIFACT, by taking the following steps:

  • Write a parser class for the desired type of EDI following the example of the X12 parsing class (described later in this article).
  • Import the module containing the new parser class in the gen_parser module.
  • In the run() method of the gen_parser class, create a reference to a parser object for the new EDI type, as in the following EDIFACT example:
   self.edifact = 
  • Call the add_transitions() method on the parser instance created above (self.edifact in the EDIFACT example).
  • Add a check for the new format's header segment in the searching_header() method. Continuing the EDIFACT example, you would end up with something like this:
   elif poten_tag in ("UNA", "UNB"):
       # UNA (service string advice) is optional; UNB opens the interchange
       return (self.edifact.header_seg, (infile, poten_tag))
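Each new format adds one or more branches to the elif chain in searching_header(). One possible alternative, sketched here rather than taken from gen_parser, is a lookup table keyed by header tag. It assumes, as in the snippet above, that every parser exposes a header_seg() method and that states are returned as (handler, (infile, tag)) tuples. HeaderRegistry, register(), and the choice to return None for an unknown tag are hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# A state result pairs the next handler with its arguments, mirroring the
# (handler, (infile, tag)) tuples in the searching_header() snippet.
StateResult = Optional[Tuple[Callable[..., Any], Tuple[Any, str]]]


class HeaderRegistry:
    """Hypothetical table-driven stand-in for the elif chain."""

    def __init__(self) -> None:
        self._by_tag: Dict[str, Callable[..., Any]] = {}

    def register(self, parser: Any, *tags: str) -> None:
        # Route every listed header tag to the parser's header_seg() method.
        for tag in tags:
            self._by_tag[tag] = parser.header_seg

    def searching_header(self, infile: Any, poten_tag: str) -> StateResult:
        handler = self._by_tag.get(poten_tag)
        if handler is None:
            return None  # unknown tag (assumption): caller keeps scanning
        return (handler, (infile, poten_tag))
```

With this layout, adding EDIFACT support would come down to a single call such as registry.register(edifact_parser, "UNA", "UNB"), where edifact_parser is a hypothetical parser instance.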

In addition to its class definition, gen_parser.py contains a main section that you can run from the command line. It takes two arguments: the name of the EDI input file and a prefix for the output XML files. The parser reads the input file, and the EDI translation handler writes one translated XML file for each X12 interchange it encounters, using the naming convention <XML file prefix>_.xml.

EDI X12 Parser Plugin
You'll find the X12 parser plugin implementation in the file x12_parser.py. The parser consists of a single class named x12_parser. This class primarily does two things:

  • recognizes entire valid X12 interchanges
  • tokenizes segments as it is recognizing each interchange

The x12_parser class contains five methods: __init__(), add_transitions(), header_seg(), body_seg(), and end_seg(). The latter three methods pass each tokenized EDI segment to the segment() method of the EDI handler object (discussed below).
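To make the hand-off from parser to handler concrete, here is a minimal sketch of an object with a segment() method. It assumes each tokenized segment arrives as a list of strings (tag first, then the elements) and appends it to an XML tree built with the standard library's xml.etree.ElementTree. The class name, the token format, and the XML tag names are illustrative assumptions; this is not the article's EDI handler or its output schema.

```python
import xml.etree.ElementTree as ET
from typing import List


class XmlSegmentHandler:
    """Hypothetical handler that turns tokenized segments into XML.

    Tag names (interchange, segment, element, subelement) are illustrative.
    """

    def __init__(self, subelement_separator: str = ":") -> None:
        self.sub_sep = subelement_separator  # must be a non-empty string
        self.root = ET.Element("interchange")

    def segment(self, tokens: List[str]) -> None:
        # tokens[0] is the segment tag, the rest are its elements
        seg = ET.SubElement(self.root, "segment", id=tokens[0])
        for value in tokens[1:]:
            elem = ET.SubElement(seg, "element")
            if self.sub_sep in value:
                # Composite element: one child per sub-element
                for part in value.split(self.sub_sep):
                    ET.SubElement(elem, "subelement").text = part
            else:
                elem.text = value

    def to_string(self) -> str:
        return ET.tostring(self.root, encoding="unicode")
```

Because the parser only ever calls segment(), a different handler (one that writes a file per interchange, say) could be swapped in without touching the parsing code.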

The header_seg() method determines whether the potential ISA segment is valid. An ISA segment is always exactly 106 characters long, and each of its elements has a fixed length. Since we have already read 3 characters ("ISA"), we need to read another 103 characters and check that every delimiter and element is where it should be. If it is, we extract information about the document so far: the characters used as delimiters (element separator, segment terminator, and sub-element separator) and several interchange identifiers (the sender and receiver IDs and their qualifiers, plus the interchange date, time, and control number).
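The following sketch shows the kind of fixed-position check described above. It assumes the caller passes the complete 106-character segment, i.e. the 3 characters already read plus the next 103 (for example "ISA" + infile.read(103)). The function name parse_isa() and its dictionary keys are hypothetical; the field widths are those of the X12 ISA segment.

```python
from typing import Dict

# Widths of ISA01..ISA16 in the fixed-length X12 ISA segment
ISA_FIELD_WIDTHS = [2, 10, 2, 10, 2, 15, 2, 15, 6, 4, 1, 5, 9, 1, 1, 1]
# "ISA" + 16 elements + 16 element separators + 1 segment terminator
ISA_LENGTH = 106


def parse_isa(segment: str) -> Dict[str, str]:
    """Validate a complete ISA segment; return its delimiters and IDs."""
    if len(segment) != ISA_LENGTH or not segment.startswith("ISA"):
        raise ValueError("expected a 106-character segment starting with ISA")
    elem_sep = segment[3]        # character right after the "ISA" tag
    sub_elem_sep = segment[104]  # ISA16 is the last element
    seg_term = segment[105]      # the segment terminator ends the segment
    if len({elem_sep, sub_elem_sep, seg_term}) != 3:
        raise ValueError("delimiters must be three distinct characters")

    fields = segment[:ISA_LENGTH - 1].split(elem_sep)
    if len(fields) != 17:  # the "ISA" tag plus 16 elements
        raise ValueError(f"expected 16 ISA elements, found {len(fields) - 1}")
    for num, (value, width) in enumerate(zip(fields[1:], ISA_FIELD_WIDTHS), 1):
        if len(value) != width:
            raise ValueError(f"ISA{num:02d} must be {width} characters")

    return {
        "element_separator": elem_sep,
        "segment_terminator": seg_term,
        "subelement_separator": sub_elem_sep,
        "sender_qualifier": fields[5],
        "sender_id": fields[6].rstrip(),    # IDs are space-padded to 15
        "receiver_qualifier": fields[7],
        "receiver_id": fields[8].rstrip(),
        "date": fields[9],                  # YYMMDD
        "time": fields[10],                 # HHMM
        "control_number": fields[13],
    }
```

Checking every element width, not just the total length, catches a segment that happens to be 106 characters long but has a separator in the wrong place.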

The body_seg() method reads characters in chunks (100 at a time by default) and looks for a segment terminator. If it reaches EOF before finding one, it stops. When it finds a segment terminator, it passes that segment to the segment() method of the EDI handler and then looks for the next element separator; the characters between the segment terminator and that element separator form the next segment's tag. If that tag is not "IEA", the method keeps looping through the document; if it is "IEA", control passes to the end_seg() method.
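A simplified version of this loop can be written as a generator. The sketch assumes the delimiters have already been taken from the ISA segment and that the stream is positioned just after it. Unlike the design described above, it splits each segment on the element separator after finding the terminator, and it folds the IEA check into the same loop instead of handing off to a separate end_seg() state. iter_segments() and segments_from_text() are hypothetical names.

```python
import io
from typing import Iterator, List, TextIO


def iter_segments(infile: TextIO, seg_term: str, elem_sep: str,
                  chunk_size: int = 100) -> Iterator[List[str]]:
    """Yield each segment as a list of elements, ending after IEA."""
    buffer = ""
    while True:
        end = buffer.find(seg_term)
        if end == -1:
            chunk = infile.read(chunk_size)
            if not chunk:  # EOF before another terminator: stop quietly
                return
            buffer += chunk
            continue
        raw, buffer = buffer[:end], buffer[end + len(seg_term):]
        raw = raw.strip("\r\n")  # tolerate line breaks after terminators
        if not raw:
            continue
        elements = raw.split(elem_sep)  # elements[0] is the segment tag
        yield elements
        if elements[0] == "IEA":  # end of the interchange
            return


def segments_from_text(text: str, seg_term: str = "~", elem_sep: str = "*",
                       chunk_size: int = 100) -> List[List[str]]:
    """Convenience wrapper for trying the generator on a string."""
    return list(iter_segments(io.StringIO(text), seg_term, elem_sep, chunk_size))
```

For example, segments_from_text("GS*PO*S*R~IEA*1*000000905~") returns [['GS', 'PO', 'S', 'R'], ['IEA', '1', '000000905']]. Buffering across chunk boundaries means a terminator that falls in a later chunk is still found, whatever chunk_size is.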

The end_seg() method reads the rest of the IEA segment and checks that everything is where it should be. Unlike the ISA, the IEA is not a fixed-width segment, so validating it takes a little more work.
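The checks below are one plausible set for the IEA trailer, assuming the segment has already been split on the element separator. IEA01 holds the number of functional groups in the interchange and IEA02 repeats the interchange control number from ISA13. validate_iea() and its groups_seen parameter are hypothetical; the article does not spell out exactly which checks end_seg() performs.

```python
from typing import List, Optional


def validate_iea(elements: List[str], isa_control_number: str,
                 groups_seen: Optional[int] = None) -> int:
    """Check an IEA segment split into elements; return the IEA01 count."""
    if len(elements) != 3 or elements[0] != "IEA":
        raise ValueError("IEA must have exactly two elements")
    group_count, control_number = elements[1], elements[2]
    # IEA01: number of functional groups, 1 to 5 digits
    if not (group_count.isdigit() and 1 <= len(group_count) <= 5):
        raise ValueError("IEA01 must be 1 to 5 digits")
    # IEA02: interchange control number, 9 digits, must echo ISA13
    if not (control_number.isdigit() and len(control_number) == 9):
        raise ValueError("IEA02 must be exactly 9 digits")
    if control_number != isa_control_number:
        raise ValueError("IEA02 does not match ISA13")
    # Optionally cross-check the count against the GS segments actually seen
    if groups_seen is not None and int(group_count) != groups_seen:
        raise ValueError("IEA01 does not match the number of functional groups")
    return int(group_count)
```

Because IEA01 can be anywhere from one to five digits long, the checks work on split elements rather than fixed character offsets, which is the extra work the ISA's fixed layout avoids.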
