Module No. 3: xml.etree.ElementTree
This module contains pythonic XML processing tools for parsing and constructing XML documents. Python ships with several standard XML modules that support the DOM and SAX APIs. However, the DOM implementation (xml.dom.minidom) is modeled after the W3C DOM API and is quite cumbersome. ElementTree is the brainchild of Fredrik Lundh (http://www.effbot.org). It is a highly pythonic and high-performance XML package. Lundh also contributed cElementTree, a C extension that exposes the same API as the Python package. cElementTree performs remarkably well in both speed and memory footprint.
Many pythonistas reject XML as a data exchange format altogether and prefer to simply use direct Python data structures for data exchange. This can be done either as plain text (to be evaluated on the other side using the eval() function) or pickled. However, no one can escape XML these days. It is especially dominant in the important web services domain. To discuss ElementTree, I will continue with the role-playing game example.
ElementTree is based on the Element data type. An element has a tag and may also have children (sub-elements), attributes (key-value pairs), content (text string), and a tail (text string that follows the element until the next sibling element). ElementTree is optimized for non-mixed data models (where text never contains elements), which will be the focus of this article.
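To make the data model concrete, here is a minimal sketch that parses a tiny fragment and inspects each part of an Element. The <hero> fragment and its rank attribute are made up for illustration, and the fragment deliberately mixes text and a child element so that a tail shows up. The sketch uses Python 3 print() calls.

```python
from xml.etree.ElementTree import fromstring

# Made-up fragment: one attribute, leading text, one empty child,
# and text following the child (the child's tail).
xml_text = '<hero rank="knight">brave<sword/> and bold</hero>'
hero = fromstring(xml_text)
sword = hero[0]          # children are reachable by index

print(hero.tag)          # hero
print(hero.attrib)       # {'rank': 'knight'}
print(hero.text)         # brave
print(len(hero))         # 1
print(sword.text)        # None (the element is empty)
print(repr(sword.tail))  # ' and bold'
```

Note that the tail belongs to the child element, not to the parent. That is why non-mixed documents, where the tail is only whitespace or None, are the easiest to work with.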
The Forbidden Forest
Our well-equipped hero from the sqlite3 section is about to enter an ominous forest. Game areas are created and exchanged using XML in the game because XML is better suited for dealing with hierarchical data structures. The forest contains enemies, treasures and other items. ElementTree lets you express it very concisely:
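from xml.etree.ElementTree import (ElementTree, Element, SubElement, dump)
root = Element('forest')
for tag in ('treasures', 'enemies', 'items'):
    SubElement(root, tag)
print 'The root has', len(root), 'sub-elements'
for e in root:
    print e.tag
dump(root)
The root has 3 sub-elements
treasures
enemies
items
<forest><treasures /><enemies /><items /></forest>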
I created a root Element and gave it a tag. You can also provide attributes as a dictionary, and further attributes as named parameters. After creating the root element, I created a few sub-elements of the root (treasures, enemies, items). You can iterate over sub-elements with a simple for loop over an element, and len() of an element returns the number of its sub-elements. This is the essence of the "pythonicity" of ElementTree: it uses Python idioms to expose its data model.
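As a small sketch of the attribute options just described, the snippet below passes one attribute as a dictionary and another as a named parameter, then uses len(), iteration, and indexing on the element. The attribute names (name, danger) are invented for illustration and are not part of the game's format. It uses Python 3 print() calls.

```python
from xml.etree.ElementTree import Element, SubElement

# One attribute comes from a dictionary, another from a keyword argument;
# both end up in the same attrib mapping.
root = Element('forest', {'name': 'Forbidden Forest'}, danger='high')
for tag in ('treasures', 'enemies', 'items'):
    SubElement(root, tag)

tags = [child.tag for child in root]   # iterating yields the sub-elements

print(root.attrib)    # {'name': 'Forbidden Forest', 'danger': 'high'}
print(len(root))      # 3
print(tags)           # ['treasures', 'enemies', 'items']
print(root[1].tag)    # enemies (elements also support indexing)
```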
The dump() function takes an Element and writes its contents recursively to the screen. It is very handy for interactive development: you create some elements, hook them together, and dump the root to the screen to make sure you got it right.
It's time to populate the forest with some fearsome creatures. The creatures in this game have a name, numeric life and strength attributes, and an optional special attack or power. When life reaches 0, the creature (or the hero) is dead. The strength determines the damage the creature dishes out in each attack, and the special attack is, well, special. It can affect many aspects of a battle. Let's start with your garden-variety crazy ogre. To add a crazy ogre to the enemies in the forest, I use the find() method of the root element to find the enemies sub-element and then append() an 'enemy' element with the various attributes. Note that I passed some of the attributes as a dictionary, but the 'special' attribute as a named parameter. This is fine; all the attributes are equivalent.
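enemies = root.find('enemies')
attributes = dict(name='Crazy Ogre', life='85', strength='18')
enemies.append(Element('enemy', attributes, special='bone crusher'))
Let's dump the forest and see what it looks like:
dump(root)
<forest><treasures /><enemies><enemy life="85" name="Crazy Ogre" special="bone crusher" strength="18" /></enemies><items /></forest>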
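One practical consequence of storing life and strength as XML attributes is that they come back as strings and must be converted before any arithmetic. The following is only a sketch under stated assumptions: the attack() helper, the hero element, and the rule of clamping life at 0 are all hypothetical and not taken from the game's code. It uses Python 3 print() calls.

```python
from xml.etree.ElementTree import Element, SubElement

enemies = Element('enemies')
ogre = SubElement(enemies, 'enemy',
                  {'name': 'Crazy Ogre', 'life': '85', 'strength': '18'},
                  special='bone crusher')
# Hypothetical hero element, invented for this sketch.
hero = Element('hero', {'name': 'Hero', 'life': '30', 'strength': '25'})

def attack(attacker, target):
    """Hypothetical helper: reduce the target's life by the attacker's strength.

    Returns True if the target is dead (life reached 0).
    """
    damage = int(attacker.get('strength'))           # attribute values are strings
    life = max(int(target.get('life')) - damage, 0)  # assumption: life never drops below 0
    target.set('life', str(life))                    # store the result back as a string
    return life == 0

first = attack(ogre, hero)    # 30 - 18 leaves 12
second = attack(ogre, hero)   # 12 - 18 is clamped to 0
print(first, second, hero.get('life'))   # False True 0
```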
That XML is not very readable, and it contains only a single enemy. What would it look like with a few more enemies?
The problem with the dump() function is that when you build your element tree using ElementTree's Element and SubElement classes, no indentation or newlines are added to the XML. You end up with a single long line of verbose XML.
I created a little recursive function called pretty_dump() that takes an Element and returns an XML string with a nice layout of its content, including all sub-elements. Every element starts on a new line, nested elements are indented, and an element with no children fits on a single line. The code processes every element recursively, increasing the indentation by two spaces per level and building the XML string incrementally. The function doesn't print anything to the screen; it just returns the final XML string. I used os.linesep, the line terminator character(s) that Python defines for each platform, so the output works nicely everywhere.
import os
def pretty_dump(e, ind=''):
    # start with indentation
    s = ind
    # put tag (don't close it just yet)
    s += '<' + e.tag
    # add all attributes
    for (name, value) in e.items():
        s += ' ' + name + '=' + "'%s'" % value
    # if there is text, close the start tag, add the text and add an end tag
    if e.text and e.text.strip():
        s += '>' + e.text + '</' + e.tag + '>'
    # if there are children...
    elif len(e) > 0:
        # close start tag
        s += '>'
        # add every child on its own line, indented by two more spaces
        for child in e:
            s += os.linesep + pretty_dump(child, ind + '  ')
        # add closing tag on a new line
        s += os.linesep + ind + '</' + e.tag + '>'
    # no text and no children, just close the starting tag
    else:
        s += ' />'
    return s
Here is pretty_dump() in action on the enemy element:
<enemy life='85' strength='18' name='Crazy Ogre' special='bone crusher' />
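For comparison, newer versions of the standard library can produce a similar layout without a hand-written function. The sketch below assumes Python 3.9 or later, where xml.etree.ElementTree provides indent(), and rebuilds the same small forest so that it is self-contained. It is an alternative to pretty_dump(), not the function above.

```python
from xml.etree.ElementTree import Element, SubElement, indent, tostring

# Rebuild the small forest with a single enemy.
root = Element('forest')
SubElement(root, 'treasures')
enemies = SubElement(root, 'enemies')
SubElement(enemies, 'enemy',
           {'name': 'Crazy Ogre', 'life': '85', 'strength': '18'},
           special='bone crusher')
SubElement(root, 'items')

# indent() modifies the tree in place by inserting whitespace into
# text and tail, here two spaces per nesting level.
indent(root, space='  ')

# encoding='unicode' makes tostring() return a str rather than bytes.
pretty = tostring(root, encoding='unicode')
print(pretty)
```

Unlike pretty_dump(), indent() changes the tree itself, so every later serialization of that tree is indented as well.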