A Layered Translation Framework
The framework described in this article has a layered design. The layers are as follows:
General Purpose State Machine
The topmost layer of the framework is a state machine borrowed from chapter four of David Mertz's excellent book, "Text Processing in Python."
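To make the idea of a handler-driven state machine concrete, here is a minimal sketch written in the same spirit. It is not Mertz's code: the class body, the set_start() and run() signatures, and the OUTSIDE, INSIDE, and DONE states are assumptions chosen only for illustration, and they are unrelated to the states used by the EDI parser.

```python
class StateMachine:
    """Minimal handler-dispatch state machine (illustrative sketch only)."""

    def __init__(self):
        self.handlers = {}        # state name -> handler function
        self.start_state = None
        self.end_states = set()

    def add_state(self, name, handler=None, end_state=False):
        self.handlers[name] = handler
        if end_state:
            self.end_states.add(name)

    def set_start(self, name):
        self.start_state = name

    def run(self, cargo):
        if self.start_state is None:
            raise RuntimeError("call set_start() before run()")
        if not self.end_states:
            raise RuntimeError("at least one end state is required")
        state = self.start_state
        # Each handler consumes some input and names the next state.
        while state not in self.end_states:
            state, cargo = self.handlers[state](cargo)
        return cargo


def outside(cargo):
    """Hypothetical state: skip text until '[' opens a bracketed field."""
    while cargo["pos"] < len(cargo["text"]):
        ch = cargo["text"][cargo["pos"]]
        cargo["pos"] += 1
        if ch == "[":
            cargo["fields"].append("")
            return "INSIDE", cargo
    return "DONE", cargo


def inside(cargo):
    """Hypothetical state: collect characters until ']' closes the field."""
    while cargo["pos"] < len(cargo["text"]):
        ch = cargo["text"][cargo["pos"]]
        cargo["pos"] += 1
        if ch == "]":
            return "OUTSIDE", cargo
        cargo["fields"][-1] += ch
    return "DONE", cargo


def extract_fields(text):
    machine = StateMachine()
    machine.add_state("OUTSIDE", outside)
    machine.add_state("INSIDE", inside)
    machine.add_state("DONE", end_state=True)
    machine.set_start("OUTSIDE")
    cargo = machine.run({"text": text, "pos": 0, "fields": []})
    return cargo["fields"]
```

Each handler receives the shared "cargo", does its work, and returns the name of the next state together with the updated cargo. An EDI parser built this way would register one handler per parsing state in the same manner.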
I decided to use Mertz's state machine because it is an excellent piece of concise, understandable, and usable code. It was written with the obvious intention of handling a file in a stateful manner, which is an approach well suited for EDI. While writing this EDI parser, I decomposed the parsing into four states:
Generic EDI Parsing Layer
The next layer of the translation framework is a generic pluggable parser. You can find the code in the file gen_parser.py (see Listing 2). The main purpose of this file is to hold references to the parser and handler plugins (both EDI and non-EDI). The generic parser searches through the input file until it hits something that looks like an EDI document, and then passes it off to the proper parser.
Listing 2 contains a single class named gen_parser. The two most interesting methods in the gen_parser class are run() and searching_header(). The run() method explicitly adds the generic EDI transitional states to the state machine by calling the add_state() method. It also adds all X12-specific transitional states to the state machine by calling the add_transitions() method of the X12 parsing class.
The searching_header() method searches for what may be a header segment by repeatedly reading three characters, checking whether they are "ISA" (the first three characters of an X12 interchange), and, if they are not, backing up two characters and trying again. When the code finds an "ISA" sequence, it calls a method in the X12 parsing code to determine whether the subsequent characters contain a valid ISA segment. This is an inefficient way of searching for a potential header segment, and you may be better off with an algorithm more like:
However, if there is only a small amount of garbage text before valid X12 interchanges, this inefficiency should have minimal impact on performance.
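For comparison, the read-three, back-up-two search and one possible faster alternative might look roughly like the sketch below. Both functions are hypothetical illustrations, not code from gen_parser.py and not necessarily the algorithm the author had in mind. They assume a seekable binary stream, and the names find_isa_naive, find_isa_chunked, and chunk_size are invented here. Either way, the caller still has to check that a valid ISA segment follows the match.

```python
import io


def find_isa_naive(stream):
    """Read three bytes, compare to b"ISA", back up two bytes on a miss."""
    while True:
        chunk = stream.read(3)
        if len(chunk) < 3:
            return -1                      # ran out of input
        if chunk == b"ISA":
            stream.seek(-3, io.SEEK_CUR)   # leave stream at start of match
            return stream.tell()
        stream.seek(-2, io.SEEK_CUR)       # net effect: advance by one byte


def find_isa_chunked(stream, chunk_size=4096):
    """Scan larger chunks with bytes.find(), keeping a two-byte overlap."""
    base = stream.tell()   # stream offset of buf[0]
    buf = b""
    while True:
        data = stream.read(chunk_size)
        if not data:
            return -1                      # ran out of input
        buf += data
        idx = buf.find(b"ISA")
        if idx != -1:
            stream.seek(base + idx)        # leave stream at start of match
            return base + idx
        # Keep the last two bytes in case "ISA" straddles a chunk boundary.
        keep = buf[-2:]
        base += len(buf) - len(keep)
        buf = keep
```

The chunked version issues far fewer read and seek calls on long runs of non-EDI text, which is where the naive loop spends most of its time.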