A Brief Rundown of Changes and Additions in Python 3.1 : Page 3

Changes to the core language, the standard library, and some welcome performance improvements make Python 3.1 a balanced and worthwhile release.


Standard Library Changes

Python 3.1 also includes several changes to the standard library, described below.

PEP-372: Ordered Dictionaries

The major new addition is an ordered dictionary class, which got its own PEP. When you iterate over an ordered dict, you get the keys and values back in the same order in which they were inserted, which is often desirable. As an illustration, here's some code that shows the difference between an ordered dict and a regular dict:

>>> items = [('a', 1), ('b', 2), ('c', 3)]
>>> d = dict(items)
>>> d
{'a': 1, 'c': 3, 'b': 2}
>>> from collections import OrderedDict
>>> od = OrderedDict(items)
>>> od
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> list(d.keys())
['a', 'c', 'b']
>>> list(od.keys())
['a', 'b', 'c']

As you can see, the ordered dict preserves the initial item order, while the standard dict doesn't. However, I was a little surprised to find out that if you populate the dictionary with named arguments rather than key/value pairs, it does not maintain the order. I would even consider that behavior a bug, because using named arguments is a perfectly valid way to initialize a dictionary, and the items have a clear order (left to right) just like the first example with the items list:

>>> d = dict(a=1, b=2, c=3)
>>> d
{'a': 1, 'c': 3, 'b': 2}
>>> od = OrderedDict(a=1, b=2, c=3)
>>> od
OrderedDict([('a', 1), ('c', 3), ('b', 2)])
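
Given the Python 3.1 behavior shown above, building the ordered dict from a sequence of pairs is the dependable way to fix the order up front. The following is a minimal sketch (the variable names are illustrative, not from the original article) of how insertion order behaves when keys are added, updated, or re-inserted:

```python
from collections import OrderedDict

# A sequence of pairs carries an explicit order.
od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])

od['d'] = 4      # a new key is appended at the end
od['a'] = 100    # updating an existing key keeps its original position
after_update = list(od.items())
print(after_update)    # [('a', 100), ('b', 2), ('c', 3), ('d', 4)]

del od['b']
od['b'] = 2      # deleting and re-inserting moves the key to the end
after_reinsert = list(od.keys())
print(after_reinsert)  # ['a', 'c', 'd', 'b']
```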

The Counter Class

The new Counter class in the collections module is a dictionary that keeps track of how many times an object occurs in a collection.

>>> import collections
>>> x = [1, 1, 2, 3, 4, 5, 4, 4, 6, 4]
>>> c = collections.Counter(x)
>>> c
Counter({4: 4, 1: 2, 2: 1, 3: 1, 5: 1, 6: 1})

The class supports the typical set of dict methods, keys(), values() and items(), for accessing its contents; however, its update() method differs from a regular dict update(). It accepts either a sequence or a mapping whose values are integers. If you pass a sequence, it counts the elements and adds their counts to the existing counts. If you pass a mapping, it adds the count of each object in the mapping to the existing count. The following code updates the Counter object created in the preceding example:

>>> c.update([3, 3, 4])
>>> c
Counter({4: 5, 3: 3, 1: 2, 2: 1, 5: 1, 6: 1})
>>> c.update({2:5})
>>> c
Counter({2: 6, 4: 5, 3: 3, 1: 2, 5: 1, 6: 1})
>>> c.update({2:5})
>>> c
Counter({2: 11, 4: 5, 3: 3, 1: 2, 5: 1, 6: 1})
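
To make the contrast with a regular dict concrete, here is a small sketch (the keys and values are made up for illustration) that applies the same update to a plain dict and to a Counter:

```python
from collections import Counter

plain = {'a': 1}
plain.update({'a': 5})      # dict.update() replaces the stored value
print(plain)                # {'a': 5}

counts = Counter({'a': 1})
counts.update({'a': 5})     # Counter.update() adds to the existing count
counts.update('aab')        # an iterable is counted element by element
print(counts['a'], counts['b'])   # 8 1
```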

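The Counter class also has a couple of special methods. The elements() method returns an iterator over the elements of the original collection, with each element repeated as many times as its count. Equal elements are grouped together; the overall order is not guaranteed, although in this example the output happens to come out sorted by value (not frequency):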

>>> list(c.elements())
[1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
 3, 3, 3, 4, 4, 4, 4, 4, 5, 6]
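
Because the order of elements() should not be relied on, it can help to sort the result explicitly when a stable order matters. A short sketch, rebuilding the same counts as the example above:

```python
from collections import Counter

c = Counter({2: 11, 4: 5, 3: 3, 1: 2, 5: 1, 6: 1})
elems = list(c.elements())

# Every element appears exactly as many times as its count.
print(len(elems) == sum(c.values()))   # True

# Sort explicitly instead of depending on the iteration order.
first_five = sorted(elems)[:5]
print(first_five)                      # [1, 1, 2, 2, 2]
```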

The most_common() method returns (object, frequency) pairs, sorted from the most common object to the least common.

>>> c.most_common()
[(2, 11), (4, 5), (3, 3), (1, 2), (5, 1), (6, 1)]

If you pass an integer N to most_common, it returns only the N most common elements. For example, given the Counter object from the preceding examples, the number 2 appears most often:

>>> c.most_common(1)
[(2, 11)]
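
The same method works for any hashable items, not just integers. As a sketch with made-up data (the word list is purely illustrative), here is a tally of strings with the top two entries and the least common one:

```python
from collections import Counter

words = ['red', 'blue', 'red', 'green', 'blue', 'red']
tally = Counter(words)

top_two = tally.most_common(2)
print(top_two)        # [('red', 3), ('blue', 2)]

# With no argument, the full ranking is returned; the last pair is the rarest.
rarest = tally.most_common()[-1]
print(rarest)         # ('green', 1)
```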

Improvements to the itertools Module

The itertools module lets you work with infinite sequences and draws inspiration from Haskell, SML and APL. But it's also useful for working with finite sequences. In Python 3.1 it received two new functions: combinations_with_replacement() and compress().

The combinations() function returns sub-sequences of the input sequence in lexicographic order without repetitions (based on position in the input sequence, not value). The new combinations_with_replacement() function allows repetition of the same element, as the following code sample demonstrates:

from itertools import *
print(list(combinations([1, 2, 3, 4], 3)))
print('-' * 10)
print(list(combinations_with_replacement(['H', 'T'], 5)))

Output:
[(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
----------
[('H', 'H', 'H', 'H', 'H'), ('H', 'H', 'H', 'H', 'T'), 
 ('H', 'H', 'H', 'T', 'T'), ('H', 'H', 'T', 'T', 'T'), 
 ('H', 'T', 'T', 'T', 'T'), ('T', 'T', 'T', 'T', 'T')]

Note that in both functions the elements of each sub-sequence always appear in the same order as in the input sequence.
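
Running both functions on the same small input makes the difference easy to see. In this sketch (the input 'ABC' is just an illustration), the only extra results from combinations_with_replacement() are the tuples that reuse an element:

```python
from itertools import combinations, combinations_with_replacement

plain = list(combinations('ABC', 2))
with_repeats = list(combinations_with_replacement('ABC', 2))

print(plain)
# [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(with_repeats)
# [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]

# The tuples that only appear when repetition is allowed.
extra = [t for t in with_repeats if t not in plain]
print(extra)
# [('A', 'A'), ('B', 'B'), ('C', 'C')]
```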

The compress() function allows you to apply a mask to a sequence to select specific elements from the sequence. The resulting iterator stops when either the sequence or the selectors mask is exhausted. Here's an interesting example that uses compress() together with count(), which generates an infinite stream of integers; map(), which applies a lambda function (+1) to the elements of count(); and chain(), which chains two iterables together. The resulting stream is very similar to the non-negative integers, except that 1 appears twice. I'll let you guess what the compress() function selects out of this input stream:

from itertools import *
selectors = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 
   0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
sequence = chain(iter([0, 1]), map(lambda x: x+1, count()))
print(list(compress(sequence, selectors)))

Output:
[0, 1, 1, 2, 3, 5, 8, 13, 21]
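
For a simpler view of the masking behavior, here is a minimal sketch with illustrative inputs. It also shows that a finite mask is what ends the iteration when the data source is infinite:

```python
from itertools import compress, count

# Keep only the items whose selector is true.
letters = list(compress('ABCDEF', [1, 0, 1, 0, 1, 1]))
print(letters)   # ['A', 'C', 'E', 'F']

# count() never ends, so the five-element mask stops the iteration.
evens = list(compress(count(), [1, 0, 1, 0, 1]))
print(evens)     # [0, 2, 4]
```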


A NullHandler for the logging Module

Python has a powerful and flexible industrial-strength logging module that supports logging messages at different levels to arbitrary target locations that include memory, files, network, and console. Using it requires a certain amount of configuration. Libraries that want to provide logging can either configure themselves by default or require users to configure them. If, as a library developer, you require users to configure logging, you're likely to annoy users who don't care about logging. However, if your library configures itself, what should the configuration settings be?

There are two common options: log to a file or log to the console. Both options cause clutter. Until Python 3.1, best practice required the library developer to include a small do-nothing handler and configure the library's logger to use this handler. Python 3.1 provides such a NullHandler as part of the logging module itself.
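
As a rough sketch of the difference, the do-nothing handler that libraries used to ship themselves looks something like the class below. The class name DoNothingHandler and the logger names are hypothetical and chosen only for illustration; logging.Handler, emit() and logging.NullHandler are the real API.

```python
import logging

class DoNothingHandler(logging.Handler):
    """Hypothetical hand-rolled handler of the kind libraries had to include."""
    def emit(self, record):
        pass  # discard every record

# Before Python 3.1: attach the home-grown handler.
old_style = logging.getLogger('mylib.old')    # hypothetical logger name
old_style.addHandler(DoNothingHandler())

# Python 3.1 and later: the logging module provides the handler.
new_style = logging.getLogger('mylib.new')    # hypothetical logger name
new_style.addHandler(logging.NullHandler())

for log in (old_style, new_style):
    log.propagate = False      # keep the demo independent of the root logger
    log.error('discarded')     # goes nowhere and prints nothing
```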

Here's a logging scenario: Suppose you have the following library code in a module called lib.py. It has an init() function that accepts a logging handler, but defaults to the new NullHandler. It then sets the logger object to use the provided handler (or the default one). A logging handler is an object that determines where the output of the logger should go. The example function a_function_that_uses_logging() calls the global logger object and logs some funny messages:

import logging

logger = None

def init(handler=logging.NullHandler()):
  global logger
  logger = logging.getLogger('Super-logger')
  # Assumed level: the sample output further down omits the debug message
  logger.setLevel(logging.INFO)
  # Attach the supplied handler (or the default NullHandler)
  logger.addHandler(handler)

def a_function_that_uses_logging():
  logger.info('The capital of France is Paris')
  logger.debug('Don\'t forget to fix a few bugs before the release')
  logger.error('Oh, oh! something unpalatable occurred')
  logger.warning('Mind the gap')
  logger.critical('Your code is a mess. You really need to step up.')

The next bit of application code configures a rotating file handler. This is a sophisticated handler for long-running systems that generate large numbers of logged messages. The handler limits the amount of logging info in each file, and also saves a pre-set number of backup files. These restrictions ensure that the log files never exceed a given size, and that the latest logging info (up to the limit) is always preserved.

For example purposes, the code configures the handler to store only 250 bytes in each log file and maintain up to 4 backup files. It then invokes the venerable a_function_that_uses_logging().

import logging
import logging.handlers
from lib import init, a_function_that_uses_logging

log_file = 'log.txt'
handler = logging.handlers.RotatingFileHandler(
   log_file, maxBytes=250, backupCount=4)
# Hand the rotating file handler to the library
init(handler)
for i in range(4):
  a_function_that_uses_logging()

Here's what I found in my current directory after running this code. The handler created a rotating log file (log.txt), with four backups because the example allowed only 250 bytes in each file.

~/Documents/Articles/Python 3.1/ > ls
article.py  log.txt  log.txt.1  log.txt.2  log.txt.3  log.txt.4

To view the contents of those files I simply concatenated them:

~/Documents/docs/Publications/DevX/Python 3.0/Article_6 > cat log.*
Mind the gap
Your code is a mess. You really need to step up.
Your code is a mess. You really need to step up.
The capital of France is Paris
Oh, oh! something unpalatable occurred
Mind the gap
Your code is a mess. You really need to step up.
The capital of France is Paris
Oh, oh! something unpalatable occurred
The capital of France is Paris
Oh, oh! something unpalatable occurred
Mind the gap
Your code is a mess. You really need to step up.
The capital of France is Paris
Oh, oh! something unpalatable occurred
Mind the gap
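
The rollover behavior can also be reproduced in isolation. The following is a self-contained sketch, not the article's code: it writes to a temporary directory, uses a hypothetical logger name and made-up messages, and keeps only two backups so the result is easy to inspect.

```python
import logging
import logging.handlers
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, 'log.txt')
    handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=250, backupCount=2)

    logger = logging.getLogger('rotation-demo')   # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)

    # Write far more than 250 bytes so that several rollovers occur.
    for i in range(50):
        logger.info('message number %d, padded to a realistic length', i)

    logger.removeHandler(handler)
    handler.close()   # release the file before the directory is removed

    files = sorted(os.listdir(tmp))
    sizes = [os.path.getsize(os.path.join(tmp, name)) for name in files]

print(files)                # ['log.txt', 'log.txt.1', 'log.txt.2']
print(max(sizes) <= 250)    # True
```

Older backups beyond backupCount are deleted during rollover, which is why only the most recent messages survive.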

This works well, but sometimes users don't care about the logged messages. They just want to invoke the function without having to configure the logger, and they need it to work in a way that will not cause the disk to run out of space or the screen to be filled with messages. That's where the NullHandler class comes in. The next bit of code does the same thing as the preceding example, but doesn't configure a logging handler and gets no logging artifacts. Note how much ceremony went away; there are no imports for logging and logging.handlers, and no hard decisions about which handler to use or how to configure it.

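from lib import init, a_function_that_uses_logging

# No handler argument: init() falls back to the default NullHandler
init()
for i in range(3):
  a_function_that_uses_logging()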
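
To try the pattern without creating two files, here is a single-file sketch under stated assumptions: the function names init() and do_work(), the logger names, and the use of an in-memory StringIO buffer are all illustrative choices, not the article's code. It shows that the default path writes nothing, while a caller who passes a real handler receives the messages.

```python
import io
import logging

def init(name, handler=None):
    """Return a logger that writes to `handler`, or nowhere by default."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False   # keep the demo independent of the root logger
    logger.addHandler(handler if handler is not None else logging.NullHandler())
    return logger

def do_work(logger):
    logger.debug('filtered out because it is below INFO')
    logger.info('The capital of France is Paris')
    logger.warning('Mind the gap')

# 1. The caller does not care about logging: nothing is written anywhere.
quiet = init('demo.quiet')
do_work(quiet)

# 2. The caller opts in by passing a real handler.
buffer = io.StringIO()
verbose = init('demo.verbose', logging.StreamHandler(buffer))
do_work(verbose)

captured = buffer.getvalue().splitlines()
print(captured)   # ['The capital of France is Paris', 'Mind the gap']
```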
