A Developer's Guide to Python 3.0: Package Installation and More Language Enhancements

Explore Python 3.0's new support for per-user installations, an official with statement, property decorators, keyword-only arguments, dictionary changes, and C API changes.



PEP-3106: Revamping dict.keys(), .values() and .items()

The dictionary (dict) is arguably the most useful and optimized data structure in Python. In Python 2.5 the .keys(), .values() and .items() methods return lists that contain the keys, values, and key-value pairs respectively. Usually, developers calling these methods intend to iterate over the result: they don't really need a list. The list return value is expensive because the keys/values/items must be read from the dictionary and copied into the list. The Python 2.5 dict also supports .iterkeys(), .itervalues() and .iteritems() methods that just return an iterator:

>>> d.iterkeys()
<dictionary-keyiterator object at 0x74300>
>>> d.itervalues()
<dictionary-valueiterator object at 0x74380>
>>> d.iteritems()
<dictionary-itemiterator object at 0x743a0>



However, many developers end up using the more natural and shorter names exposed by the heavyweight methods (keys(), values() and items()).

Python 3.0 changes keys(), values() and items() so they return lightweight set-like view objects, effectively making them behave like the old iter* methods, and removes the iter* methods themselves. In Python 3.0:

>>> d = {}
>>> d.keys()
<dict_keys object at 0xe9c50>
>>> d.values()
<dict_values object at 0xe9d10>
>>> d.items()
<dict_items object at 0xe9c50>
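Unlike the lists returned in Python 2.5, these view objects are dynamic: they reflect later changes to the dictionary without being recomputed. A quick sketch (the dict contents here are made up for illustration):

```python
d = {'a': 1}
keys = d.keys()        # a live view, not a snapshot

d['b'] = 2             # mutate the dict after creating the view
print(sorted(keys))    # the view sees the new key: ['a', 'b']

del d['a']
print(sorted(keys))    # ...and the deletion: ['b']
```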

It is important to realize that the iteration order of these views is arbitrary (not even necessarily insertion order), but the keys, values and items views are synchronized with each other:

>>> d = dict(a=1, b=2, c=3)
>>> for k in d:
...     print(k)
...
a
c
b
>>> for v in d.values():
...     print(v)
...
1
2
3

If you do need a list, you can simply pass the result of keys(), values() or items() to list():

>>> d = dict(a=1, b=2, c=3)
>>> list(d.keys())
['a', 'c', 'b']
>>> list(d.values())
[1, 3, 2]
>>> list(d.items())
[('a', 1), ('c', 3), ('b', 2)]
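The "set-like" part of the new views is more than an implementation detail: in Python 3.0 the keys() and items() views support set operations directly, with no intermediate lists. A small sketch (the example dicts are made up; sorted() is used only to make the output deterministic):

```python
d1 = dict(a=1, b=2, c=3)
d2 = dict(b=2, c=30, d=4)

# Keys common to both dicts
print(sorted(d1.keys() & d2.keys()))    # ['b', 'c']

# Keys in d1 only
print(sorted(d1.keys() - d2.keys()))    # ['a']

# (key, value) pairs identical in both dicts: only ('b', 2),
# since 'c' maps to different values
print(sorted(d1.items() & d2.items()))  # [('b', 2)]
```

Note that values() views do not support set operations, because values may be unhashable and may contain duplicates.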

Here's a little experiment that explores the difference between keys() and iterkeys(). I populated a dictionary with five million entries, and then wrote a little function that—based on an argument (use_iter)—iterates over all the keys in the dictionary using either keys() or iterkeys(), and measures the elapsed time. I then ran the function five times each for both iterkeys() and keys(), and took the average. Here's the code:

import time

# Populate the test dictionary with five million integer entries
d = dict((i, i) for i in range(5000000))

# Iterate over all the keys using either keys() or iterkeys()
def f(use_iter):
    func = d.iterkeys if use_iter else d.keys
    s = time.time()
    for k in func():
        pass
    return time.time() - s

# Run the test function 5 times with iterkeys()
total = 0
for i in range(5):
    t = f(use_iter=True)
    print t
    total += t
print 'Average time for iterkeys():', total / 5

# Run the test function 5 times with keys()
total = 0
for i in range(5):
    t = f(use_iter=False)
    print t
    total += t
print 'Average time for keys():', total / 5

The results showed that iterkeys() takes about 60% of the time keys() takes:

0.291584968567
0.298330068588
0.310563087463
0.319568157196
0.361966133118
Average time for iterkeys(): 0.316402482986
0.537712812424
0.525843858719
0.535580158234
0.569056034088
0.535542964935
Average time for keys(): 0.54074716568

That may sound like a big difference; however, think about it from a different perspective. If you need to iterate over five million dict entries, all you gain by using iterkeys() rather than keys() is 0.2 seconds. Given that any real-world code would do something with these five million items that probably requires some time, a two-tenths of a second gain doesn't seem all that impressive. To illustrate, I modified the function f so it simply sums up all the keys, which is a very cheap operation:

def f(use_iter):
    func = d.iterkeys if use_iter else d.keys
    s = time.time()
    total = 0  # renamed from 'sum' to avoid shadowing the built-in
    for k in func():
        total += k
    return time.time() - s

This time, the average for iterkeys() was: 2.0865404129 and for keys(): 2.11347484589. So, the iterkeys() iteration now takes 98.7% of the duration of the keys() iteration. If you do some real work inside the loop, such as string manipulation, serious computation, or some IO, there is virtually no difference between the two.

I ran the same tests on Python 3.0 for the keys() method and it performed pretty much like the Python 2.5 iterkeys() method (maybe even a little faster if you throw away the extreme 0.54 result):

0.308724880219
0.272349119186
0.258269071579
0.259425878525
0.540471076965
Average time for keys(): 0.327848005295
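The ad-hoc time.time() loops above are easy to reproduce with the standard timeit module, which handles repetition for you and lets you take the best of several runs. A sketch (the dictionary is scaled down here, and the numbers will of course vary by machine):

```python
import timeit

# Build the test dict once, in the setup statement,
# so its construction is not part of the measurement
setup = "d = dict((i, i) for i in range(100000))"

# Time a bare iteration over the keys: 5 runs of 10 iterations each,
# keeping the best (least noisy) run
best = min(timeit.repeat("for k in d: pass", setup=setup,
                         repeat=5, number=10))
print("best of 5 runs:", best)
```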

Finally, I ran some more tests just on lookup. The idea is that if you want to determine whether a dict contains a certain value, you don't need to create a list of all the values and search it; you can just use the in operator on the values directly, which is efficient. If you compare value lookups on Python 2.5 to value lookups on Python 3.0 you see a huge performance difference, because Python 2.5's values() copies all the values into a new list for every lookup. Here's a test function that performs 100 lookups on the now-familiar five million item dict:

def g():
    s = time.time()
    for i in range(100):
        y = i in d.values()
    return time.time() - s

total = 0
for i in range(5):
    t = g()
    print(t)
    total += t
print('Average time for 100 lookups:', total / 5)

Python 2.5 averaged about 20 seconds, while Python 3.0 averaged less than a millisecond. This sounds fantastic until you try the lookup in Python 2.5 using itervalues()—and you get the same numbers as Python 3.0.
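Keep in mind that a value lookup through the view is still a linear scan; it merely avoids the copy. If what you actually want is a key lookup, in on the dict itself is a hash lookup and far cheaper. A sketch contrasting the two (Python 3, with a smaller made-up dict):

```python
# A dict mapping integers to their string form
d = dict((i, str(i)) for i in range(1000000))

# Key membership: a hash lookup, O(1) on average
print(999999 in d)             # True

# Value membership: walks the values view until a match (or the end),
# so it is O(n) in the worst case even in Python 3
print('999999' in d.values())  # True
```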

The conclusion is that this is a nice syntactic cleanup, but don't expect your code to become blazingly fast just because the common dict methods are now set-like iterators. That said, if you have code that performs many lookups on the keys or values of large dictionaries and you're using keys() and values(), you would be wise to rewrite the lookups using iterkeys() or itervalues().

Platform-Specific Changes (Windows, Mac OS X)

There were several interesting changes for Windows. Python 3.0 requires at minimum Windows 2000 Service Pack 4. Windows 95, 98, ME and NT 4 are no longer supported. In addition, the default compiler is now Visual Studio 2008 (Microsoft provides a free Express version). This is important if you build extension modules on Windows, because extension modules must be built with the same compiler that built the Python interpreter itself. Other interesting functions for writing cross-platform system administration code are os.path.expanduser() and os.path.expandvars(). The former expands the tilde (~) as shorthand for the user's home directory, and the latter expands environment variables, including the Windows %VAR% format:

>>> os.path.expanduser('~')
'c:\\Documents and Settings\\Gigi'
>>> os.path.expandvars('%USERNAME%')
'Gigi'
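The %VAR% form is only understood on Windows; on POSIX platforms os.path.expandvars() expects $VAR or ${VAR}, and Windows accepts those forms as well, so portable code should prefer them. A quick sketch (the variable name and path below are made up for the demo):

```python
import os
import os.path

# Hypothetical variable, set here just so the demo is self-contained
os.environ['MY_DATA_DIR'] = '/srv/data'

# $VAR and ${VAR} expand on POSIX and on Windows alike
print(os.path.expandvars('$MY_DATA_DIR/logs'))    # /srv/data/logs
print(os.path.expandvars('${MY_DATA_DIR}/logs'))  # /srv/data/logs
```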

The Mac OS X port mostly eliminated old modules.

The next and final article in this series covers Python 2.6 and porting code from Python 2.x to 3.0.



Gigi Sayfan specializes in cross-platform object-oriented programming in C/C++/C#/Python/Java with an emphasis on large-scale distributed systems. He is currently trying to build brain-inspired intelligent machines at Numenta.