Drill-down on Three Major New Modules in Python 2.5 Standard Library : Page 5

The freshly minted 2.5 version of Python has lots of goodies, but the three in this article are the cream of the crop. Find out how ctypes, pysqlite, and ElementTree can save you time and aggravation in this extensive article with a ton of great sample code.

Parsing XML
ElementTree is not just an XML builder; it is a parser too. It can take an XML file or string and create an element tree out of it. Parsing is probably the most commonly used part of ElementTree, and yet the interface is pretty clunky IMHO. To parse a file you call the parse() function with a filename, which returns an ElementTree object; to parse an XML string you can use either of two identical functions, XML() or fromstring(), which return the root Element. The naming is inconsistent, and it feels wrong to have more than one way to parse XML strings.
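To make the difference between the three entry points concrete, here is a minimal sketch. The one-enemy document and the variable names are made up for illustration, and an in-memory file object stands in for a real file on disk. The print() calls are written so that the snippet also runs on current Python 3 interpreters.

```python
import io
from xml.etree.ElementTree import parse, XML, fromstring

# Hypothetical one-enemy document, used only for illustration
xml_text = u"<forest><enemy name='Crazy Ogre' life='100' /></forest>"

# parse() takes a filename or a file-like object and returns an ElementTree
tree = parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

# XML() and fromstring() take the string itself and return the root Element
root_from_xml = XML(xml_text)
root_from_string = fromstring(xml_text)

for root in (root_from_file, root_from_xml, root_from_string):
    print(root.tag + ": " + root.find('enemy').get('name'))
```

All three routes end up with an equivalent root element; only parse() wraps it in an ElementTree, which is why getroot() is needed in that case.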

The forest contains only a single crazy ogre. It's not much of a challenge. Let's add a bunch of other enemies. How about Godzilla, Micro Godzilla, King Kong, Prince Kong, a Fearsome Dragon, a Hot Dragoness, a Drunk Dragon, a Killer Bunny, and a Wolf Pack for good measure? Simply passing the XML string that describes all the enemies to the XML() function is enough to create an Element that contains a sub-element for each enemy. The wolf pack is a nested enemy that contains individual wolves.

from xml.etree.ElementTree import XML

# initializing from a string
# (the enclosing <enemies> root tag is an assumption; XML() needs a single root element)
enemies_xml = \
"""
<enemies>
    <enemy name='Godzilla' life='8500' strength='200' special='stomping' />
    <enemy name='King Kong' life='2500' strength='120' special='stomping' />
    <enemy name='Micro Godzilla' life='0.5' strength='0.3' special='stomping' />
    <enemy name='Prince Kong' life='300' strength='50' special='stomping' />
    <enemy name='Fearsome Dragon' life='400' strength='80' special='Fire Breath' />
    <enemy name='Hot Dragoness' life='400' strength='80' special='Fire Breath' />
    <enemy name='Drunk Dragon' life='0.5' strength='0.3' special='Alcohol Breath' />
    <enemy name='Killer Bunny' life='25' strength='15' />
    <enemy name='Wolf pack'>
        <enemy name="Wolf 1" life="10" strength='5' />
        <enemy name="Wolf 2" life="10" strength='5' />
        <enemy name="Wolf 3" life="10" strength='5' />
    </enemy>
</enemies>
"""

# Creating an Element from a string
enemies = XML(enemies_xml)

Dumping the enemies element with pretty_dump produces output that looks just like the input in this case, which is a good validation of the pretty_dump function.

Finding Your Way in the Forest
So, you have a nicely populated forest with lots of enemies and the hero is ready to bravely enter it. The hero is powerful, courageous, and can dance like a ballerina. Unfortunately he is also a stompophobe. Stompophobes, as you very well know, are deathly afraid of being stomped. This is a very rational attitude where the likes of Godzilla and King Kong walk the earth.

The hero naturally has access to our forest XML file, and he wishes to know about all the stompers in the area. ElementTree sports several flavors of finding stompers such as find(), findall(), and findtext(). All these functions accept a parameter that can be either a tag name or a limited XPath expression. ElementTree supports a very basic subset of XPath. You can search for a specific tag in your direct children or on an entire tree or you can start from a specific branch. For example, to find all the compound enemies in the forest the following expression will do:

compound = enemies.findall('./enemy/enemy')
for e in compound:
    print e.get('name')

Wolf 1
Wolf 2
Wolf 3

The XPath expression finds all the wolves in the wolf pack. Note that there is no way to get the parent of a node, so if you want to find the compound itself, you are out of luck. The XPath support in the ElementTree version bundled with Python 2.5 doesn't include attributes either. This means that there is no way to perform searches based on attribute names or values (or the text content of nodes). This poses a serious problem to the hero since the 'stomping' special attack is stored in attributes of the forest creatures. A lesser man or woman would probably buckle and go around the forest. The hero, however, is both heroic and well versed in the art of XML and ElementTree. He decides to transform the forest so that attributes become elements.
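The lookup functions and path flavors mentioned above can be compared side by side. The following is a small sketch on a cut-down forest; the enemies root tag and the note element are assumptions added only so that each call has something to match.

```python
from xml.etree.ElementTree import XML

# Cut-down, hypothetical forest; the <enemies> root and <note> are assumptions
enemies = XML("""
<enemies>
    <enemy name='Godzilla' special='stomping' />
    <enemy name='Wolf pack'>
        <enemy name='Wolf 1' />
        <enemy name='Wolf 2' />
    </enemy>
    <note>Beware of stompers</note>
</enemies>
""")

first = enemies.find('enemy')              # first direct child with that tag
direct = enemies.findall('enemy')          # direct children only
anywhere = enemies.findall('.//enemy')     # matching descendants at any depth
wolves = enemies.findall('./enemy/enemy')  # enemies nested inside an enemy
note = enemies.findtext('note')            # text of the first matching child

print(first.get('name'))
print("%d direct, %d anywhere, %d nested" % (len(direct), len(anywhere), len(wolves)))
print(note)
```

A plain tag name searches direct children only, the './/' prefix searches the whole subtree, and a path such as './enemy/enemy' starts from a specific branch.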

Here is his plan: Recursively scan the element tree. For each element with attributes, create an 'attributes' element, insert into it a sub-element for each attribute (the tag is the attribute name, the text is the value), and clear the original attributes. Note that I created the 'attributes' element using Element and not SubElement. This creates a standalone element that I later insert() explicitly as a sub-element. The reason I didn't use SubElement is two-fold: I wanted to show you another way to add sub-elements, and I also wanted to make sure the 'attributes' sub-element will be the first sub-element. The SubElement() function always appends the new sub-element.

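def attributes2elements(e):
    # transform the children first, then this element
    for child in list(e):
        attributes2elements(child)

    if e.items():
        # make sure that the attributes element is the first one
        attributes = Element('attributes')
        e.insert(0, attributes)
        for (name, value) in e.items():
            a = SubElement(attributes, name)
            a.text = value
        # every attribute now lives in a sub-element, so clear the originals
        e.attrib = {}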

In Listing 3, the hero wastes no time and invokes attributes2elements on the forest parsed from the original XML string. The verbosity tripled instantly, but at least the information is preserved and the XML contains no attributes.
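As a quick illustration of the transformation on a single enemy (a sketch, not a reproduction of Listing 3), the snippet below repeats the function with its recursive call spelled out and applies it to the Killer Bunny. The variable name bunny is chosen here for illustration.

```python
from xml.etree.ElementTree import Element, SubElement, XML

def attributes2elements(e):
    # Transform the children first, then the element itself
    for child in list(e):
        attributes2elements(child)
    if e.items():
        # Standalone element, inserted explicitly so that it comes first
        attributes = Element('attributes')
        e.insert(0, attributes)
        for (name, value) in e.items():
            a = SubElement(attributes, name)
            a.text = value
        e.attrib = {}

bunny = XML("<enemy name='Killer Bunny' life='25' strength='15' />")
attributes2elements(bunny)

# Each former attribute is now a sub-element of <attributes>
for a in bunny.find('attributes'):
    print(a.tag + " = " + a.text)
```

After the call the enemy element carries no attributes of its own, and its first child is the new 'attributes' element with one sub-element per former attribute.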

Detecting Stompers
At this point I can invoke one of the find() functions to locate stompers. However, it is not very simple. Here's why.

stompers = [e for e in enemies.findall('.//special') if e.text == 'stomping']

for s in stompers:
    print pretty_dump(s)

This code indeed locates all the stompers, but only their special element. There is no way to climb back up and find the 'enemy' element. (Note the './/' prefix in the XPath expression, which matches sub-elements at any level under the current node.) In order to find the stomping enemy elements some ingenuity is required. Listing 4 checks every enemy (working at the enemy element level) to see if it has a special attribute with a value of stomping.
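One way to work at the enemy level can be sketched as follows. This is not the article's Listing 4, only a minimal illustration of the idea; the already-transformed forest fragment and its enemies root tag are assumptions.

```python
from xml.etree.ElementTree import XML

# Hypothetical, already-transformed fragment (attributes stored as elements)
enemies = XML("""
<enemies>
    <enemy>
        <attributes><name>Godzilla</name><special>stomping</special></attributes>
    </enemy>
    <enemy>
        <attributes><name>Fearsome Dragon</name><special>Fire Breath</special></attributes>
    </enemy>
    <enemy>
        <attributes><name>Killer Bunny</name></attributes>
    </enemy>
</enemies>
""")

# Start from each enemy and look down, so the enemy element is never lost.
# findtext() returns None when nothing matches (the bunny has no special).
stompers = [e for e in enemies.findall('.//enemy')
            if e.findtext('attributes/special') == 'stomping']

for s in stompers:
    print(s.findtext('attributes/name'))
```

Because the search starts at each enemy and looks downward, there is no need to climb from the special element back to its owner.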

Elementary ElementTree
ElementTree is a fine piece of software that proves that a friendly API can also be performant. ElementTree offers much more than that, including decent namespace support, fine-grained XML tree building, reading and writing to files, etc. For performance buffs the cElementTree is a real boon. The official documentation is here: http://docs.python.org/dev/lib/module-xml.etree.ElementTree.html, but it is very weak. I recommend going to the source: http://effbot.org/zone/element-index.htm. And be sure to keep your eye out for many fine tutorials and articles by third-party developers.
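A common way to pick up cElementTree when it is available is a guarded import. This is a sketch of that idiom; the alias ET and the sample element are illustrative choices, not something the article prescribes.

```python
try:
    # C implementation, available as a separate module from Python 2.5 through 3.8
    import xml.etree.cElementTree as ET
except ImportError:
    # Same API; Python 3.3 and later use the C accelerator automatically
    import xml.etree.ElementTree as ET

bunny = ET.XML("<enemy name='Killer Bunny' life='25' />")
print(bunny.get('name'))
```

Since both modules expose the same API, the rest of the code can use the ET alias without caring which implementation was loaded.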

Gigi Sayfan specializes in cross-platform object-oriented programming in C/C++/C#/Python/Java with an emphasis on large-scale distributed systems. He is currently trying to build brain-inspired intelligent machines at Numenta.