A Parade of New Features Debuts in Python 2.5: Page 5

Python 2.5 still has the smell of fresh paint, but it's the perfect time to drill down into the most important new features of this comprehensive release. Read on for detailed explanations and examples of exception handling, resource management, conditional expressions, and more.


Generator Enhancements
Generators (introduced in Python 2.2) are a great language feature. In Python 2.3 and 2.4, generators could only produce values. Starting with Python 2.5, generators can interact with your program throughout their lifetime.

First, let's see what generators are all about; then I'll cover the Python 2.5 enhancements. Put simply, a generator is an iterator that invents the sequence it iterates over as it goes. Generators use the special keyword yield to return values to the outside world whenever their next() method is called (typically from a for loop). Consider a tokenizer that needs to tokenize a huge string (perhaps coming from a file or a URL). In this case Python's split() is inappropriate, because you don't want to hog all the memory and waste a lot of time tokenizing a huge text if only the first couple of tokens are needed:

def tokenizer(text, sep):
    try:
        while True:
            token = ''
            while text[0] == sep:
                text = text[1:]
            index = 0
            while text[index] != sep:
                token += text[index]
                index += 1
            yield token
            text = text[index:]
            #print text, index
    except IndexError:
        if token != '':
            yield token

The tokenizer looks like a regular function. The only difference is that it uses the yield statement in a couple of places. That's enough to make it a generator that maintains its state and can be resumed multiple times. The tokenizing algorithm is pretty simple:
  1. skip all the initial separators
  2. accumulate all non-separator characters in token
  3. yield the token when you encounter another separator
  4. truncate the text and go back to step 1.
If the code reaches the end of the text and tries to access an out-of-bounds character, an IndexError is raised, and the tokenizer yields the last token if it's not empty. This is a common Python idiom: letting an exception serve as a sentinel.
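To see the idiom in isolation, here is a tiny sketch of my own (not from the article): a generator that walks any sequence by index and lets IndexError, rather than an explicit length check, terminate the iteration:

def by_index(seq):
    i = 0
    try:
        while True:
            yield seq[i]   # raises IndexError once i runs off the end
            i += 1
    except IndexError:
        pass               # the exception is our sentinel: iteration is over

for c in by_index('abc'):
    print c                # prints a, b and c on separate lines

Here is the main script that utilizes the tokenizer: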


if __name__ == '__main__':
    text1 = '123 456 789'
    g = tokenizer(text1, ' ')
    print g.next()
    print g.next()
    print g.next()
    print
    text2 = ' ABC DEF GHI '
    for t in tokenizer(text2, ' '):
        print t

Output:

123
456
789

ABC
DEF
GHI

To iterate over the sequence of tokens generated by the tokenizer, you can either call the next() method explicitly or use a for loop that does it implicitly until the sequence is exhausted. You can also use a generator anywhere you would use any other iterable.
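Because a generator is itself an iterable, you can feed it directly to any function or constructor that consumes iterables. A quick sketch of my own, reusing the tokenizer above:

# Materialize all the tokens at once (giving up the laziness, of course).
print list(tokenizer('123 456 789', ' '))   # ['123', '456', '789']

# Re-join the tokens with a different separator.
print '-'.join(tokenizer('a b c', ' '))     # prints a-b-c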

The tokenizer is an amazing piece of programming lore. It properly handles leading separators, trailing separators, and runs of separators. However, suppose that while tokenizing you suddenly realize you actually want the tokens separated by commas instead of spaces. Prior to Python 2.5 you would have been in deep trouble: you would have had to start the whole process over with comma as the separator, because generators were a launch-and-forget affair. Not anymore. Starting with Python 2.5 you can communicate with your generator while it's working.

You communicate by calling the generator's send() method. You can pass anything in the call to send(); the generator receives it as the result of the yield expression. It sounds a little confusing, but it's just some unorthodox syntax. Here is a second version of the tokenizer that can accept a different separator while tokenizing:

def tokenizer2(text, sep):
    try:
        while True:
            token = ''
            while text[0] == sep:
                text = text[1:]
            index = 0
            while text[index] != sep:
                token += text[index]
                index += 1
            new_sep = (yield token)
            if new_sep != None:
                sep = new_sep
            text = text[index:]
    except IndexError:
        if token != '':
            yield token

The expression new_sep = (yield token) is where all the action is. If send() wasn't called, new_sep will be None and is ignored; if it isn't None, it becomes the new separator used to tokenize the rest of the text. (send() itself returns the next value the generator yields, which the example below simply discards.) Note that (yield token) is enclosed in parentheses. They are not always required, but the rules are pretty arcane, so I recommend you stay on the safe side and always use them.
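For the curious, the exact rule (from PEP 342, which introduced these enhancements) is that the parentheses may be omitted only when the yield expression is the sole, top-level expression on the right-hand side of an assignment. A little sketch of my own:

def demo(value):
    # Legal: yield is the entire right-hand side of the assignment.
    reply = yield value
    # Also legal, and easier to remember.
    reply = (yield value)
    # Inside any larger expression the parentheses are mandatory;
    # without them the next line would be a syntax error.
    pair = [reply, (yield value)]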

The following code starts tokenizing with the space separator, but if it encounters the token 'comma' it sends ',' to tokenizer2. I ran the same text through the original tokenizer and through tokenizer2. Here is the code and the results:

if __name__ == '__main__':
    print '--- Plain tokenizer ---'
    text = 'Blah Blah comma ,Yeah,it,works!!!'
    for t in tokenizer(text, ' '):
        print t
    print
    print '--- Interactive tokenizer ---'
    g = tokenizer2(text, ' ')
    for t in g:
        print t
        if t == 'comma':
            g.send(',')

Output:

--- Plain tokenizer ---
Blah
Blah
comma
,Yeah,it,works!!!

--- Interactive tokenizer ---
Blah
Blah
comma
Yeah
it
works!!!

Generators are very versatile: they can be (and are) used for lazy evaluation, efficient XML parsing, working with infinite sequences, and more. The new send() method empowers generators to implement co-routines. Co-routines are resumable functions that can be used for cooperative multi-tasking, which is useful in many domains; simulations, asynchronous network programming, and some algorithms are expressed more naturally as co-routines.
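To make the co-routine idea concrete, here is a small sketch of my own (not from the article): a consumer co-routine that maintains a running average of the numbers pushed into it with send():

def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        value = (yield average)   # hand back the current average, wait for a number
        total += value
        count += 1
        average = total / count

avg = averager()
avg.next()            # prime the co-routine by running it to the first yield
print avg.send(10)    # 10.0
print avg.send(20)    # 15.0
print avg.send(5)     # 11.6666666667

Note the priming call to next(): a generator must already be suspended at a yield before send() can deliver a value, and Python raises a TypeError if you send a non-None value to a freshly created generator.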

Conditional Expressions
Conditional expressions bring the equivalent of C/C++'s ternary operator to Python. In C/C++ you can write:

float ratio = x > y ? x/y : y/x;

and crash if one of them is positive and the other is zero :-).

In Python you had to write:

if x > y:
    ratio = x/y
else:
    ratio = y/x

and get an unhandled ZeroDivisionError if one of them is positive and the other is zero :-). You could also use the infamous and/or idiom, which takes advantage of boolean short-circuiting. It looks like this:

result = condition and true_value or false_value

In this case:

ratio = x > y and x/y or y/x

This idiom works sometimes, but it fails when the middle expression evaluates to boolean False (which can happen in this case if x is 0 and y is negative). I never took to this idiom because it didn't feel right. The C/C++ ternary operator is both concise and symmetric; the and/or idiom just looks funny, and it's brittle.
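To see the brittleness for yourself, consider this little demonstration (my example, not the article's):

x, y = 0, -5
# x > y is True, so the idiom evaluates x/y, which is 0 (a falsy value).
# Short-circuiting then falls through to the or arm, and y/x raises
# ZeroDivisionError even though x/y was a perfectly good answer.
ratio = x > y and x/y or y/x

Python 2.5 introduces conditional expressions. The ratio example becomes: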

ratio = x/y if x > y else y/x

What? Yes, the condition is in the middle:

result = true_value if condition else false_value

The benefit of this syntax is that you can read it just like English, which is a quality I like in a language. The official excuse/explanation is that most conditional expressions in the standard library evaluate to True about 90 percent of the time, so the true_value should come first. I (and many other people) find the ordering odd. It may grow on me. I like it better than the dreaded and/or idiom; at least I can read it and understand what's going on. It's yet another nice arrow in Python's quiver.
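Conditional expressions also nest nicely inside larger expressions, such as string formatting and list comprehensions. A couple of quick sketches of my own:

name = ''
print 'Hello, %s!' % (name if name else 'stranger')   # Hello, stranger!

# Clip negative values to zero inside a list comprehension.
values = [3, -1, 4, -1, 5]
print [v if v >= 0 else 0 for v in values]            # [3, 0, 4, 0, 5]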



Gigi Sayfan specializes in cross-platform object-oriented programming in C/C++/C#/Python/Java with an emphasis on large-scale distributed systems. He is currently trying to build brain-inspired intelligent machines at Numenta.