

Processing EDI Documents into XML with Python

You don't have to rely on expensive and proprietary EDI conversion software to parse, validate, and translate EDI X12 data to and from XML; you can build your own translator with any modern programming language, such as Python.





Many companies devote a sizeable portion of their IT infrastructure to converting traditional EDI data to and from the data formats their back office systems use. Typically, they handle this conversion using software packages purchased from EDI software vendors. But as more and more back-end systems become capable of consuming XML, it's becoming increasingly attractive to avoid all the proprietary formats and simply translate EDI X12 data to and from XML. This article shows you how to create your own tools for parsing, validating, and then translating EDI X12 data to XML. All the code examples for this article are in Python, but you could just as easily use any other programming language.

EDI (Electronic Data Interchange) is a generic term used to describe the exchange of electronic business documents between business partners. Specific incarnations of EDI such as ANSI X12, EDIFACT, TRADACOMS, and TDCC are character-delimited text files that follow a specific format.
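To make "character-delimited" concrete for X12: the interchange declares its own delimiters in the ISA header. The character immediately after `ISA` separates elements, and the segment terminator follows the sixteenth ISA element (the component separator). The following is a minimal sketch under that assumption; the function names and the abbreviated `SAMPLE` interchange are hypothetical and are not taken from the article's downloadable code. It counts separators rather than relying on fixed column offsets so that it tolerates headers whose padding has been collapsed.

```python
# Abbreviated, hypothetical interchange used only for illustration.
SAMPLE = (
    "ISA* * * * *ZZ*SENDER *ZZ*RECEIVER *041201*1200*U*00305*000000101*1*P*^!"
    "GS*PO*SENDER*RECEIVER*041201*1200*101*X*003050!"
    "ST*850*000000101!"
    "SE*2*000000101!"
    "GE*1*101!"
    "IEA*1*000000101!"
)


def detect_delimiters(interchange):
    """Return (element_sep, component_sep, segment_term) read from the ISA header."""
    if not interchange.startswith("ISA") or len(interchange) < 4:
        raise ValueError("not an X12 interchange: missing ISA header")
    element_sep = interchange[3]
    # ISA carries 16 elements. After 16 splits, the remainder begins with
    # ISA16 (the one-character component separator) and then the terminator.
    parts = interchange.split(element_sep, 16)
    if len(parts) < 17 or len(parts[16]) < 2:
        raise ValueError("ISA header is truncated")
    return element_sep, parts[16][0], parts[16][1]


def split_segments(interchange):
    """Split an interchange into segments, each a list of element strings."""
    element_sep, _component_sep, segment_term = detect_delimiters(interchange)
    segments = []
    for raw in interchange.split(segment_term):
        raw = raw.strip("\r\n")  # tolerate line breaks after terminators
        if raw:
            segments.append(raw.split(element_sep))
    return segments


if __name__ == "__main__":
    print(detect_delimiters(SAMPLE))
    for segment in split_segments(SAMPLE):
        print(segment)
```

Each segment comes back as a plain list whose first item is the segment identifier (`ISA`, `GS`, `ST`, and so on), which is a convenient starting point for validation or translation.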

Python is an object-oriented, byte-compiled language with a clean syntax, clear and consistent philosophy, and a strong user community. These attributes (both of the language and the community) make it possible to quickly write working, maintainable code, which in turn makes Python an excellent choice for nearly any programming task. Processing any "flavor" of EDI is no exception.

EDI Translation
Traditional EDI data such as X12 is rarely integrated directly into back office systems. While some ERP systems (and certainly some other types of applications) provide direct support for importing EDI data, it's far more common for developers to convert the EDI data to a format more usable by the back office systems, such as flat file (either fixed length record-based files or some delimited format) or XML. EDI software vendors offer what are basically EDI development environments in which you can create custom data transformation descriptions and push EDI data through the translation tool to complete the conversion.
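As a rough illustration of what such a conversion involves, the sketch below maps each segment to an XML element using Python's standard `xml.etree.ElementTree` module. The XML vocabulary (`interchange`, `segment`, `element`, and the `id` and `pos` attributes) is invented for this example and is not a standard or the format used by the article's framework. The delimiters are assumed to be `*` and `!`.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of a purchase order, used only for illustration.
SAMPLE = "ST*850*000000101!BEG*22*NE*101**041201*123456!SE*3*000000101!"


def segments_to_xml(data, element_sep="*", segment_term="!"):
    """Build a flat XML tree: one <segment> per segment, one <element> per value."""
    root = ET.Element("interchange")
    for raw in data.split(segment_term):
        raw = raw.strip("\r\n")
        if not raw:
            continue
        fields = raw.split(element_sep)
        segment = ET.SubElement(root, "segment", id=fields[0])
        # Positions are 1-based and zero-padded, so BEG03 becomes pos="03".
        for position, value in enumerate(fields[1:], start=1):
            if value:  # empty elements are omitted from the output
                element = ET.SubElement(segment, "element", pos=f"{position:02d}")
                element.text = value
    return root


if __name__ == "__main__":
    print(ET.tostring(segments_to_xml(SAMPLE), encoding="unicode"))
```

A flat mapping like this preserves every value and its position but says nothing about loops or business meaning; a real translator would layer validation and a document-specific structure on top of it.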

The approach taken in this article illustrates how to build your own simple EDI-to-XML transformation framework. The advantages of such an approach are:

  • No costs for EDI transformation software, which can be quite expensive.
  • Absolute flexibility in the conversion of data (because you have all the power of a programming language available).
  • Potentially higher staffing productivity with custom rather than vendor development environments.

This approach is not without potential problems, which warrant the following warnings:

  • While you can find excellent help from the Python community, no specific vendor will be available to help you solve problems.
  • This is an approach for more technically advanced individuals, which you should take into account when considering appropriate staffing.
  • This particular framework is not mature enough for production use and other similar freely available frameworks may not be overly mature, either.

Sample X12 Input
You can find all the files and code described in this article in the downloadable sample code. For example, the sample input file used for this article is an X12 Purchase Order (also referred to as a PO or an 850 transaction set). Here's the sample document:

ISA* * * * *ZZ*SENDER *ZZ*RECEIVER *041201*1200*U*00305*000000101*1*P*^!GS*PO*SENDER*RECEIVER*041201*1200*101*X*003050!ST*850*000000101!BEG*22*NE*101**041201*123456!FOB*DF*ZZ*JMJ!DTM*037*041205!DTM*038*041215!DTM*002*041218!TD1*CNT90*1!TD5****JJ*X!TD3*40!N1*OB**92*7759!N3*111 Buyer St!N4*Conyers*GA*30094*US!N1*SE*Foo Bar Sellers!N4****US!REF*DP*101!PO1*100*1*EA***ZZ*BL47*HD*100!PID*F****Widget!PO4**1*EA!N1*CT**38*CN!N4****CN!CTT*1*100!SE*22*000000101!GE*1*101!IEA*1*000000101!

As you can see, it's difficult to discern the document's structure simply by looking at it. Here's the same document rendered in a "prettified" view (unwrapped at the segment level and indented to show the looping structure):

ISA* * * * *ZZ*SENDER *ZZ*RECEIVER *041201*1200*U*00305*000000101*1*P*^!
  GS*PO*SENDER*RECEIVER*041201*1200*101*X*003050!
    ST*850*000000101!
      BEG*22*NE*101**041201*123456!
      FOB*DF*ZZ*JMJ!
      DTM*037*041205!
      DTM*038*041215!
      DTM*002*041218!
      TD1*CNT90*1!
      TD5****JJ*X!
      TD3*40!
      N1*OB**92*7759!
        N3*111 Buyer St!
        N4*Conyers*GA*30094*US!
      N1*SE*Foo Bar Sellers!
        N4****US!
        REF*DP*101!
      PO1*100*1*EA***ZZ*BL47*HD*100!
        PID*F****Widget!
        PO4**1*BC!
        N1*ST**9!
          N4****US!
      CTT*1*100!
    SE*22*000000101!
  GE*1*101!
IEA*1*000000101!

Unwrapping and formatting the input helps to show the structure, but the content of the document is still not likely to be very clear to most people.
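A view like this can be approximated with a few lines of Python. The sketch below is an assumption-laden illustration, not the article's code: it takes `*` and `!` as the delimiters, puts one segment on each line, and indents only by envelope level (ISA/IEA, GS/GE, ST/SE). Indenting inner loops such as N1 or PO1 would require knowledge of the transaction set's structure, which this sketch does not attempt. The `prettify` function and the abbreviated `SAMPLE` are hypothetical.

```python
# Segments that open and close the three X12 envelope levels.
OPENERS = {"ISA", "GS", "ST"}
CLOSERS = {"IEA", "GE", "SE"}

# Abbreviated, hypothetical interchange used only for illustration.
SAMPLE = (
    "ISA* * * * *ZZ*SENDER *ZZ*RECEIVER *041201*1200*U*00305*000000101*1*P*^!"
    "GS*PO*SENDER*RECEIVER*041201*1200*101*X*003050!"
    "ST*850*000000101!BEG*22*NE*101**041201*123456!SE*3*000000101!"
    "GE*1*101!IEA*1*000000101!"
)


def prettify(data, element_sep="*", segment_term="!", indent="  "):
    """Return the interchange with one segment per line, indented by envelope depth."""
    lines = []
    depth = 0
    for raw in data.split(segment_term):
        raw = raw.strip("\r\n")
        if not raw:
            continue
        segment_id = raw.split(element_sep, 1)[0]
        if segment_id in CLOSERS:
            # A trailer lines up with the header that opened its envelope.
            depth = max(depth - 1, 0)
        lines.append(indent * depth + raw + segment_term)
        if segment_id in OPENERS:
            depth += 1
    return "\n".join(lines)


if __name__ == "__main__":
    print(prettify(SAMPLE))
```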
