On September 19th, 2006, Python enthusiasts were rewarded with a new version. Python 2.5 continues the honorable tradition of improving and enhancing the language carefully while extending its impressive standard library.
This article is the first part in a three-part series about Python 2.5. I’ll begin with a broad-stroke summary of the changes in Python 2.5 and set the stage for a detailed exploration. The rest of this article discusses the major language enhancements in Python 2.5. Part 2 of this series will present the major new and improved modules in Python 2.5, and Part 3 will discuss a whole bag of smaller improvements and changes that are relevant to specific subsets of the Python community.
Python 2.5 at a Glance
Python 2.5 introduced some significant language changes. The venerable try-except and try-finally blocks have been unified. The functional programming style got a boost via partial functions and other additions to the functools module. The new ‘with’ statement enables safer resource management and brings along the contextlib module. Generators are more interactive and it is possible to insert values into existing generators. Conditional expressions might seem a little weird at first, but there is a good reason for their syntax. I’ll cover these features in the remainder of this article.
In the next article I’ll cover the ctypes, sqlite3, and xml.etree.ElementTree packages that are both super-cool and super-important for hassle-free deployments of Python systems without distributing a slew of external modules and libraries. ctypes allows calling C code in dynamic libraries directly from Python code. sqlite3 is a Python front-end to the excellent sqlite embedded database, and ElementTree is a highly pythonic and efficient XML parser.
Major Language Changes
Python 2.5 brings quite a few language enhancements while staying backward compatible with Python 2.4.x. These include:
- Exception handling syntax
- Functional programming improvements
- New resource management facilities
- Generator enhancements
- Conditional expressions
Python 2.5 streamlines the syntax for exception handling. Prior to 2.5 you couldn’t have both an except clause and a finally clause for the same try block. If you wanted to catch exceptions and also to perform some cleanup code afterwards you had to create a nested try-except block inside a try-finally block. Now, you can have a combined try-except-finally block. I created a little program to demonstrate the differences. Here is the code followed by explanation:
import platform

class Whoops(Exception):
    def __init__(self, message):
        Exception.__init__(self, message)

def_go_24 = """
def go_24():
    # Throw exception here
    try:
        print 'Running Python 2.4 code'
        print '-'*24
        try:
            raise Whoops('Alright, you got me!!!')
        except Whoops, e:
            print e
            x = Whoops
            ancestors = []
            # Walk up the chain of first base classes
            while x.__bases__:
                ancestors += [str(b) for b in x.__bases__]
                x = x.__bases__[0]
            print 'Whoops ancestors:', ','.join(ancestors)
        else:
            print 'I''m never going to get called'
    finally:
        print 'Finally!!!'
    print
    # Don't throw exception here
    try:
        try:
            print 'Everything is cool...'
        except Whoops, e:
            print e
        else:
            print 'else clause is here'
    finally:
        print 'Finally!!!'
"""

def_go_25 = """
def go_25():
    # Throw exception here
    try:
        print 'Running Python 2.5 code'
        print '-'*24
        raise Whoops('Alright, you got me!!!')
    except Whoops, e:
        print e
        x = Whoops
        ancestors = []
        # Walk up the chain of first base classes
        while x.__bases__:
            ancestors += [str(b) for b in x.__bases__]
            x = x.__bases__[0]
        print 'Whoops ancestors:', ','.join(ancestors)
    else:
        print 'I''m never going to get called'
    finally:
        print 'Finally!!!'
    print
    # Don't throw exception here
    try:
        print 'Everything is cool...'
    except Whoops, e:
        print e
    else:
        print 'else clause is here'
    finally:
        print 'Finally!!!'
"""

if __name__=='__main__':
    v = platform.python_version()
    if v.startswith('2.4'):
        exec(def_go_24)
        go_24()
    elif v.startswith('2.5'):
        exec(def_go_25)
        go_25()
The program is designed to run under both Python 2.4.x and Python 2.5. Using the platform.python_version() function, it checks which version of Python it is running under and executes similar exception handling code tailored to that version. Note that I had to create the exception handling code dynamically by defining the functions go_24 and go_25 in strings and using the exec() function to execute the strings and actually define the functions. Then I call the right function. This is a roundabout method, but it fulfills my desire to keep the code for both versions in a single file, and it prevents Python 2.4 from throwing a syntax error when it encounters the try-except-finally block as an actual function definition in the file.
The Whoops class is a simple exception class that subclasses exceptions.Exception. It is shared by the Python 2.4 and 2.5 code, although under the covers it is quite different in each version, as you will soon see.
The def_go_24 string contains a Python 2.4 function definition that has two test cases. Both test cases consist of a try-except block nested inside a try-finally block. In the first test case the Whoops exception is thrown and in the second no exception is thrown. In the first case the exception is caught and the code explores the hierarchy of Whoops’ base classes. In Python 2.4 exceptions.Exception is a plain, old-style class. The else clause is not executed in this case. The finally clause of the external try-finally block is executed of course. Here is the output of the first test case:
Running Python 2.4 code
------------------------
Alright, you got me!!!
Whoops ancestors: exceptions.Exception
Finally!!!
In the second test case no exception is thrown so the else clause is executed and the external finally clause is executed too:
Everything is cool...
else clause is here
Finally!!!
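The order in which the clauses ran in these two test cases can be traced with a small sketch. It is not part of the original program: run_block and its trace list are hypothetical names, it uses the combined try-except-else-finally form (so it needs Python 2.5 or later), and it is written so that it also runs on current Python.

```python
def run_block(should_fail):
    """Return the names of the clauses that ran, in order."""
    trace = []
    try:
        trace.append('try')
        if should_fail:
            raise ValueError('boom')
    except ValueError:
        trace.append('except')    # runs only when the try body raised
    else:
        trace.append('else')      # runs only when the try body did not raise
    finally:
        trace.append('finally')   # always runs, and always runs last
    return trace

print(run_block(True))    # ['try', 'except', 'finally']
print(run_block(False))   # ['try', 'else', 'finally']
```

The else and except clauses are mutually exclusive, while finally runs in both cases, which matches the two outputs above.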
The def_go_25 string contains Python 2.5 code, which is almost identical to the 2.4 code. The difference, of course, is in the exception handling code structure. Here there is only one combined try-except-finally block and there is no nesting. Check out the output:
Running Python 2.5 code
------------------------
Alright, you got me!!!
Whoops ancestors: <type 'exceptions.Exception'>,<type 'exceptions.BaseException'>,<type 'object'>
Finally!!!

Everything is cool...
else clause is here
Finally!!!
This is almost the same except for the ancestor list of Whoops. It looks different and it contains three classes as opposed to the single class of Python 2.4. Python 2.5 reorganized the exception classes hierarchy. exceptions.Exception is not the root exception anymore. Here is the new exception hierarchy:
object
 |-- BaseException
      |-- KeyboardInterrupt
      |-- SystemExit
      |-- Exception
           |-- [all other built-in exceptions]
BaseException is the root of the built-in exception hierarchy and it is a new-style class (it subclasses object). KeyboardInterrupt and SystemExit are now siblings of Exception, not subclasses of it as they were in Python 2.4. The reason for this change is that you usually don’t want to catch these exceptions in your code, but rather let them propagate all the way to the top and end the program. The old arrangement led to cumbersome error handling code where people caught all exceptions but raised KeyboardInterrupt and SystemExit again to make sure they were not silently ignored. You can still raise classic classes as exceptions if you want. This enhancement is completely backward compatible.
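A minimal sketch of what the new hierarchy means in practice follows. The classify() helper is a hypothetical name used only for illustration; the code relies solely on built-in exception classes and runs on Python 2.5 and later.

```python
def classify(exc_class):
    """Report which handler receives an exception of the given class."""
    try:
        raise exc_class()
    except Exception:
        # Ordinary errors end up here.
        return 'caught by except Exception'
    except BaseException:
        # KeyboardInterrupt and SystemExit skip the handler above.
        return 'only caught by except BaseException'

for cls in (ValueError, KeyboardInterrupt, SystemExit):
    print('%s: %s' % (cls.__name__, classify(cls)))
```

A handler written as "except Exception" therefore no longer swallows Ctrl-C or sys.exit(), so the re-raise boilerplate described above becomes unnecessary.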
Python has always supported the functional programming style with the lambda, map, filter, and reduce constructs. Python 2.5 adds partial function application and the built-in functions any() and all(). Guido van Rossum (the inventor of Python) actually thinks that the original lambda, map, filter, and reduce functions should be dropped in favor of nested functions (instead of lambda) and list comprehensions (instead of map and filter). He is not too happy about reduce either, which is why he pushed for the inclusion of any() and all(). Check out this blog for his reasoning: http://www.artima.com/weblogs/viewpost.jsp?thread=98196.
The following sample program demonstrates all(), any(), and partial() at work. The all() function takes an iterable and returns True if all the elements are true, otherwise False. The any() function takes an iterable too and returns True if any of the elements is true, otherwise False. It doesn’t get much simpler, but it’s very useful. You could write them yourself of course, but these functions are written in C for maximal performance.
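To make the semantics concrete, here is a rough pure-Python sketch of what writing them yourself could look like. The names my_all and my_any are hypothetical, and this is only an approximation of the behavior, not the built-in implementation (which is in C).

```python
def my_all(iterable):
    """Return True if every element is true (True for an empty iterable)."""
    for element in iterable:
        if not element:
            return False      # stop at the first false element
    return True

def my_any(iterable):
    """Return True if at least one element is true (False if empty)."""
    for element in iterable:
        if element:
            return True       # stop at the first true element
    return False

print(my_all([1, 2, 3]))   # True
print(my_any([0, '', 5]))  # True
print(my_any([]))          # False
```

Both loops return as soon as the answer is known, so they work well with generator expressions that produce values lazily.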
In the code I defined two functions, all_divided_by() and some_divided_by(), that use all() and any() respectively. all_divided_by() takes a sequence of numbers and a divider x and invokes all() on the generator expression (e % x == 0 for e in seq), which yields True for every number that can be divided by x with no remainder. I’ll leave it to you to figure out what some_divided_by() is doing.
def all_divided_by(seq, x):
    return all(e % x == 0 for e in seq)

def some_divided_by(seq, x):
    return any(e % x == 0 for e in seq)

if __name__=='__main__':
    s = [2, 4, 6]
    x = 2
    print 'All elements of', s, 'can be divided by', x, ':', all_divided_by(s, x)
    x = 3
    print 'All elements of', s, 'can be divided by', x, ':', all_divided_by(s, x)
    print 'Some elements of', s, 'can be divided by', x, ':', some_divided_by(s, x)
    x = 7
    print 'Some elements of', s, 'can be divided by', x, ':', some_divided_by(s, x)
I tested these functions with the sequence 2, 4, 6. All of its elements are divisible by 2, but only one of them (6) is divisible by 3. I tried the functions with the dividers 2, 3, and 7. Here is the result:
All elements of [2, 4, 6] can be divided by 2 : True
All elements of [2, 4, 6] can be divided by 3 : False
Some elements of [2, 4, 6] can be divided by 3 : True
Some elements of [2, 4, 6] can be divided by 7 : False
Partial is a higher-level concept. It allows partial application of functions. This is similar to partial template specialization in C++ except that it’s done at runtime and not at compile time. The idea is that if you have a function that accepts some arguments, you can feed it only some of these arguments and get back a new function that accepts only the unsupplied arguments. When you invoke the new function, it combines the arguments supplied earlier with the new ones and calls the original function with all of them. It sounds more complicated than it is, so let’s see some code. I used partial to specialize all_divided_by(). I passed x=2 and got a new partial function that I called all_evens(). This function accepts just a sequence (where the original all_divided_by() accepted a sequence and a divider) and returns True if all the numbers are even. It does this, of course, by invoking all_divided_by() with the supplied x=2 argument.
import functools

all_evens = functools.partial(all_divided_by, x=2)
print 'All elements of', s, 'are even numbers:', all_evens(s)
s.append(7)
print 'All elements of', s, 'are even numbers:', all_evens(s)
The output is very conclusive. all_evens() behaves like a function that detects even numbers.
All elements of [2, 4, 6] are even numbers: True
All elements of [2, 4, 6, 7] are even numbers: False
all_evens() looks like a function, but it is not a full-fledged function. It doesn’t get the __name__ and __doc__ attributes of the function it wraps. Here is some code that demonstrates that all_divided_by has a __name__, but all_evens doesn’t:
print '%s: %s' % ('all_divided_by.__name__', all_divided_by.__name__)
try:
    print '%s: %s' % ('all_evens.__name__', all_evens.__name__)
except AttributeError:
    print 'all_evens has no __name__ attribute'
Trying to access all_evens.__name__ raises an AttributeError that I catch and report. Here is the output:
all_divided_by.__name__: all_divided_by
all_evens has no __name__ attribute
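Although a partial object has no __name__ of its own, it is not opaque. The sketch below, which is not from the original program, shows the func, args, and keywords attributes that functools.partial objects expose, and assigns a name and docstring by hand. The docstring texts are my own additions, and the code is written so that it also runs on current Python.

```python
import functools

def all_divided_by(seq, x):
    """Return True if every element of seq is divisible by x."""
    return all(e % x == 0 for e in seq)

all_evens = functools.partial(all_divided_by, x=2)

# A partial object remembers the function and arguments it was built from.
print(all_evens.func is all_divided_by)   # True
print(all_evens.args)                     # ()
print(all_evens.keywords)                 # {'x': 2}

# Partial objects accept attribute assignment, so a name and a docstring
# can be attached manually when they are needed (for logging, say).
all_evens.__name__ = 'all_evens'
all_evens.__doc__ = 'Return True if every element of seq is even.'
print(all_evens.__name__)                 # all_evens
```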
Partial can be used in diverse scenarios such as callback functions, hidden cookies, and API adaption. Partial function application is a generalization of currying, which is the hallmark of languages such as Haskell.
Resource Management With ‘with’
People always ask me: ‘Why doesn’t C++ have a finally clause to clean up resources in the face of exceptions? What happens if an exception is thrown before some resource was cleaned up properly?’ Well, the truth is nobody ever asked me that. It’s an utter lie. Still, lots of people wonder about this issue and newsgroups are rife with suggestions to add a finally keyword to C++. It won’t happen. The reason is that C++ has a different way to guarantee resource cleanup. It’s called a destructor.
Every C++ object has a destructor that is called automatically when an object that’s allocated on the stack exits its enclosing scope, even when an exception is thrown. C++ preaches that you should wrap every resource in an object that will take care of the cleanup in its destructor. This idiom is called “Resource Acquisition Is Initialization” (RAII). Is RAII better than an explicit try-finally block? Usually, it’s much better. It allows simpler and more readable code. Rather than write the cleanup code in every place you use the resource, you write it just once, in the resource object’s destructor. You don’t have to introduce a try-finally block with the mandatory nesting and name scoping. It’s impossible to forget to call cleanup code.
Well, guess what? RAII is exactly the model for the new ‘with’ statement in Python. It allows you to use a resource (that supports it) in a special with-block and not worry about cleanup. I’ll develop a mini-example soon, but first I’ll whet your appetite:
import threading

lock = threading.Lock()
with lock:
    # Critical section of code
    ...
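The lock will be released automatically at the end of the critical section. A shorter version follows, although it only illustrates the syntax: a lock created inline is not shared with any other thread, and ‘as’ binds whatever __enter__ returns, which for a lock is not the lock object itself.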
with threading.Lock() as lock:
    # Critical section of code
    ...
The next example is more general. In it I have an ImportantResource class and a ResourceCoordinator class. The ResourceCoordinator is responsible for coordinating the use of a single resource between multiple users. Users are supposed to acquire the resource from the ResourceCoordinator, use it, and relinquish it.
When looking at the code notice that ‘with’ is not yet a full-fledged Python statement. In order to use it you must import it using __future__. This is a standard Python mechanism to introduce new keywords in case you have classes or variables that are named ‘with’. In Python 2.6 ‘with’ will gain first-class status.
from __future__ import with_statement
import random

class ImportantResource(object):
    def __init__(self, rc):
        self._cookie = None
        self._resourceCoordinator = rc

    def use(self):
        print 'ImportantResource - good job!'

    def __enter__(self):
        pass

    def __exit__(self, type, value, tb):
        self._resourceCoordinator.relinquish(self._cookie)

class ResourceCoordinator(object):
    def __init__(self):
        self._resource = ImportantResource(self)
        self._cookie = None

    def acquire(self):
        if self._cookie == None:
            self._cookie = random.random()
            print 'ResourceCoordinator - resource acquired'
            self._resource._cookie = self._cookie
            return (self._cookie, self._resource)
        else:
            return (None, None)

    def relinquish(self, cookie):
        if cookie == self._cookie:
            self._cookie = None
            print 'ResourceCoordinator - resource released'
If the resource is not being held by someone the ResourceCoordinator hands out the resource and a random cookie when acquire() is called. If the cookie is something other than None it means the resource is unavailable (someone already holds it) and a pair of Nones is returned. When the holder calls relinquish() the ResourceCoordinator sets its cookie to None, so the next acquirer will be able to get hold of the resource. The cookie is also used to verify (weakly) that the relinquisher is indeed the current holder. This is a very poor and unsafe implementation of a resource coordination framework meant only to demonstrate the ‘with’ statement.
The ImportantResource instance is initialized by the ResourceCoordinator and gets the current cookie every time it is acquired. The __enter__ and __exit__ methods are what makes ImportantResource a context manager qualified to participate in the ‘with’ game. The __enter__ method is called upon entry to the ‘with’ block and the result is assigned to the ‘as’ variable if one is available (see the lock variable in the threading example). The __exit__ method is called upon exit from the ‘with’ block. If an exception was raised it receives the type, value, and traceback, otherwise all the parameters are None.
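The protocol just described can be observed directly with a small sketch. Traced is a hypothetical class that exists only for this illustration; it records each call so that the normal exit and the exception exit can be compared. On Python 2.5 itself the file would also need the from __future__ import with_statement line at the top.

```python
class Traced(object):
    """Hypothetical context manager that records what happens to it."""
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return 'value from __enter__'    # this is what 'as' receives

    def __exit__(self, exc_type, exc_value, tb):
        # exc_type is None on a normal exit, the exception class otherwise.
        self.events.append(('exit', exc_type))
        return False                     # a false value lets exceptions propagate

ok = Traced()
with ok as bound:
    pass

failing = Traced()
try:
    with failing:
        raise ValueError('oops')
except ValueError:
    pass

print(bound)            # value from __enter__
print(ok.events)
print(failing.events)
```

In both runs __exit__ is called exactly once; only its arguments differ.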
It’s time to experiment a little with the ResourceCoordinator. I created three experiments. In each experiment the resource is acquired from the ResourceCoordinator, used, and relinquished. In experiment_1 the user must remember to relinquish the resource. If an exception is raised the resource will never be relinquished.
def experiment_1(rc):
    """Not so good. If exception is thrown resource is not relinquished"""
    print
    print '-'*5, 'Experiment 1'
    cookie, resource = rc.acquire()
    resource.use()
    rc.relinquish(cookie)
In experiment_2 the code is placed inside a try-finally block. The user must still remember to relinquish the resource properly in the finally clause using the returned cookie, but at least it is guaranteed to happen even in the face of exceptions.
def experiment_2(rc):
    """Better. finally ensures that the resource is relinquished"""
    print
    print '-'*5, 'Experiment 2'
    try:
        cookie, resource = rc.acquire()
        resource.use()
    finally:
        rc.relinquish(cookie)
In experiment_3 I used a with-block and it is much more concise. No need to manage a cookie or explicitly call relinquish.
def experiment_3(rc):
    """Much Better. Using the 'with' statement effectively"""
    print
    print '-'*5, 'Experiment 3'
    # acquire() returns a (cookie, resource) pair; no need to store the cookie now
    resource = rc.acquire()[1]
    with resource:
        resource.use()
I was kind enough not to raise any exceptions in all experiments so the results are identical. Here is the main script that invokes all experiments and the output:
if __name__=='__main__':
    rc = ResourceCoordinator()
    experiment_1(rc)
    experiment_2(rc)
    experiment_3(rc)

----- Experiment 1
ResourceCoordinator - resource acquired
ImportantResource - good job!
ResourceCoordinator - resource released

----- Experiment 2
ResourceCoordinator - resource acquired
ImportantResource - good job!
ResourceCoordinator - resource released

----- Experiment 3
ResourceCoordinator - resource acquired
ImportantResource - good job!
ResourceCoordinator - resource released
The ‘with’ statement supports another idiom, which is similar to the IDisposable interface in C#. An object that has a method named close() (yes, that would definitely include file objects) can be used as a context manager too, without implementing __enter__ and __exit__ itself. Here is how it is done:
from __future__ import with_statement
from contextlib import closing

with closing(open('test.txt', 'w')) as f:
    f.write('Yeah, it works!!!')
print open('test.txt').read()
Note that ‘closing’ must be explicitly imported from contextlib.
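To see why any object with a close() method qualifies, it helps to sketch what such a wrapper has to do. The my_closing and Connection classes below are hypothetical stand-ins written for this illustration; this is a simplified approximation, not the actual contextlib source. On Python 2.5 itself the with_statement future import is needed at the top of the file.

```python
class my_closing(object):
    """Hypothetical stand-in for contextlib.closing."""
    def __init__(self, thing):
        self.thing = thing

    def __enter__(self):
        return self.thing          # bound to the 'as' variable

    def __exit__(self, exc_type, exc_value, tb):
        self.thing.close()         # runs on normal exit and on exceptions

class Connection(object):
    """Toy object that only knows how to close itself."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

with my_closing(Connection()) as conn:
    inside = conn.closed           # still open inside the block

print('inside the block: %s, after the block: %s' % (inside, conn.closed))
```

The wrapper supplies __enter__ and __exit__ so the wrapped object does not have to.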
Generators (introduced in Python 2.2) are a great language feature. In Python 2.3 and 2.4 generators could just generate values. Starting with Python 2.5 generators can interact with your program for their entire lifetime.
First let’s see what generators are all about and then I’ll talk about the Python 2.5 enhancements. Put simply a generator is an iterator that invents the sequence it iterates as it goes. Generators use a special keyword ‘yield’ to return values to the outside world whenever their next() method is called (typically in a for loop). Consider a tokenizer that needs to tokenize a huge string (maybe coming from a file or a url). In this case Python’s split is inappropriate because you don’t want to hog all the memory and waste a lot of time tokenizing a huge text if only the first couple of tokens are needed:
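def tokenizer(text, sep):
    try:
        while True:
            token = ''
            # Skip leading separators
            while text[0] == sep:
                text = text[1:]
            index = 0
            # Accumulate characters up to the next separator
            while text[index] != sep:
                token += text[index]
                index += 1
            yield token
            text = text[index:]
            #print text, index
    except IndexError:
        # Ran off the end of the text: emit the last token, if any
        if token != '':
            yield token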
The tokenizer looks like a regular function. The only difference is that it uses the yield statement in a couple of places. That’s enough to make it a generator that maintains its state and can be resumed multiple times. The tokenizing algorithm is pretty simple:
- skip all the initial separators
- accumulate all non-separator characters in token
- yield the token when you encounter another separator
- truncate the text and go back to the first step
If the code reaches the end of the text and tries to access out of bounds characters an IndexError will be raised and the tokenizer will return the last token if it’s not empty. This is a common Python idiom to let exceptions serve as sentinels. Here is the main script that utilizes the tokenizer:
if __name__=='__main__':
    text1 = '123 456 789'
    g = tokenizer(text1, ' ')
    print g.next()
    print g.next()
    print g.next()
    print
    text2 = ' ABC DEF GHI '
    for t in tokenizer(text2, ' '):
        print t
To iterate over the sequence of tokens generated by the tokenizer you can either call the next() method explicitly or use a for loop that does it implicitly until the sequence is exhausted. You can also use generators anywhere you would use any iterable.
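Because a generator is just an iterable, it plugs into the standard tools that consume iterables. The sketch below repeats the tokenizer so the block is self-contained (with token initialized up front as a small defensive addition of mine) and uses itertools.islice to pull only the first two tokens from a large text; the huge_text value is made up for the example. It is written so that it also runs on current Python.

```python
import itertools

def tokenizer(text, sep):
    """Same algorithm as the tokenizer above."""
    token = ''
    try:
        while True:
            token = ''
            while text[0] == sep:
                text = text[1:]
            index = 0
            while text[index] != sep:
                token += text[index]
                index += 1
            yield token
            text = text[index:]
    except IndexError:
        if token != '':
            yield token

huge_text = 'alpha beta gamma ' * 100000

# Only the first two tokens are ever computed; the rest of the text
# is never scanned because the generator is not resumed again.
first_two = list(itertools.islice(tokenizer(huge_text, ' '), 2))
print(first_two)                                   # ['alpha', 'beta']

# Any function that accepts an iterable will accept a generator.
print('|'.join(tokenizer(' ABC DEF GHI ', ' ')))   # ABC|DEF|GHI
```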
The tokenizer is an amazing piece of programming lore. It can properly handle leading separators, terminating separators, and sequences of separators. However, suppose that while tokenizing you suddenly realize that instead of a space you actually want the tokens separated by commas. Well, prior to Python 2.5 you would have been in deep trouble; you would have had to start the whole process again with a comma as the separator. Generators were a launch-and-forget affair. Not anymore. Starting with Python 2.5 you can communicate with your generator while it’s working.
You communicate by calling the send() method of the generator. You can pass anything in the call to send(). The generator will receive it as the result of the yield expression. It sounds a little confusing, but it’s just some unorthodox syntax. Here is a second version of the tokenizer that can accept a different separator while tokenizing:
def tokenizer2(text, sep):
    try:
        while True:
            token = ''
            # Skip leading separators
            while text[0] == sep:
                text = text[1:]
            index = 0
            # Accumulate characters up to the next separator
            while text[index] != sep:
                token += text[index]
                index += 1
            # A value passed to send() shows up here; None otherwise
            new_sep = (yield token)
            if new_sep != None:
                sep = new_sep
            text = text[index:]
    except IndexError:
        if token != '':
            yield token
The expression new_sep = (yield token) is where all the action is. If send() wasn’t called new_sep will be None and ignored in this case. If it wasn’t None then it really becomes the new separator that will be used to tokenize the rest of the text. Note that (yield token) is enclosed in parentheses. It is not always required, but the rules are pretty arcane. I recommend you stay on the safe side and always use parentheses.
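The mechanics of send() are easier to see in isolation. The following sketch uses a hypothetical echo() generator that is not part of the tokenizer; it shows that a generator must be started before a value can be sent to it, and that send() itself returns the next value the generator yields. It uses the built-in next(g), which on Python 2.5 is spelled g.next().

```python
def echo():
    """Yield back whatever was last sent in (None if nothing was sent)."""
    received = None
    while True:
        received = (yield received)

g = echo()
first = next(g)             # start the generator; it pauses at the first yield
second = g.send('hello')    # the paused yield evaluates to 'hello'
third = next(g)             # a plain next() makes the yield evaluate to None
print('%r %r %r' % (first, second, third))   # None 'hello' None

# Sending a real value to a generator that has not started yet is an error.
fresh = echo()
try:
    fresh.send('too early')
    early_send_failed = False
except TypeError:
    early_send_failed = True
print(early_send_failed)    # True
```

Because send() hands back a yielded value, a caller that ignores its return value silently skips one item of the sequence.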
The following code starts tokenizing with the space separator, but if it encounters the token ‘comma’ it sends ‘,’ to tokenizer2. I ran the same text with original tokenizer and tokenizer2. Here is the code and the results:
if __name__=='__main__':
    print '--- Plain tokenizer ---'
    text = 'Blah Blah comma ,Yeah,it,works!!!'
    for t in tokenizer(text, ' '):
        print t
    print
    print '--- Interactive tokenizer ---'
    g = tokenizer2(text, ' ')
    for t in g:
        print t
        if t == 'comma':
            g.send(',')

--- Plain tokenizer ---
Blah
Blah
comma
,Yeah,it,works!!!

--- Interactive tokenizer ---
Blah
Blah
comma
Yeah
it
works!!!
Generators are very versatile: they can be (and are) used for lazy evaluation, efficient XML parsing, working with infinite sequences, and more. The new interactive send() empowers generators to implement co-routines. Co-routines are resumable functions and can be used to implement cooperative multi-tasking. This is very useful in many domains: simulations, asynchronous network programming, and some algorithms are expressed better as co-routines.
Conditional expressions bring the equivalent of C/C++’s ternary operator to Python. In C/C++ you can write:
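As a small illustration of the co-routine style, here is a sketch of a generator that keeps state between calls and consumes values pushed to it with send(). The running_average() name and the sample numbers are made up for this example; the built-in next() is spelled avg.next() on Python 2.5.

```python
def running_average():
    """Co-routine: send() a number in, get the average so far back."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = (yield average)   # pause until the caller sends a value
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)                          # advance to the first yield
results = [avg.send(v) for v in (10, 20, 60)]
print(results)                     # [10.0, 15.0, 30.0]
```

The caller and the generator take turns running, which is the cooperative hand-off that co-routines are built on.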
float ratio = x > y ? x/y : y/x;
and crash if one of them is positive and the other is zero :-).
In Python you had to write:
if x > y:
    ratio = x/y
else:
    ratio = y/x
and get an unhandled ZeroDivisionError if one of them is positive and the other is zero :-)
You could also use the infamous and/or idiom, which took advantage of boolean short circuiting. It looks like this:
result = condition and true_value or false_value
In this case:
ratio = x > y and x/y or y/x
This idiom works sometimes. It fails when the middle expression evaluates to boolean False (which might happen in this case if x is 0 and y is negative). I never took to this idiom because it didn’t feel right. The C/C++ ternary operator is both concise and symmetric. The and/or idiom just looks funny and it’s brittle.
Python 2.5 introduces conditional expressions. The above example looks like:
ratio = x/y if x > y else y/x
What? Yes, the condition is in the middle:
result = true_value if condition else false_value
The benefit of this syntax is that you can read it just like English, which is a quality I like in a language. The official excuse/explanation is that in the standard library most conditional expressions evaluate 90 percent of the time to True so the true_value should dominate. I (and many other people) find it odd. It may grow on me. I like it better than the dreaded and/or idiom. At least I can read it and understand what’s going on. It’s yet another nice arrow in Python’s quiver.
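The difference between the two forms shows up as soon as the true branch produces a false value. The sketch below wraps each form in a hypothetical helper function (pick_and_or, pick_conditional, and safe_ratio are names invented for this comparison) and runs on Python 2.5 and later.

```python
def pick_and_or(condition, true_value, false_value):
    # The old idiom: breaks when true_value is itself false (0, '', [], None).
    return condition and true_value or false_value

def pick_conditional(condition, true_value, false_value):
    # The Python 2.5 conditional expression.
    return true_value if condition else false_value

print(pick_and_or(True, 0, 'fallback'))        # fallback  (wrong branch!)
print(pick_conditional(True, 0, 'fallback'))   # 0

def safe_ratio(x, y):
    # Only the selected branch is evaluated, so the division never
    # runs when y is zero.
    return x / y if y != 0 else None

print(safe_ratio(1, 0))                        # None
```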