A Brief Rundown of Changes and Additions in Python 3.1

The previous articles in this series covered the releases of Python 3.0 and Python 2.6. Despite the relative youth of these versions, the Python core developers have already created Python 3.1, which was released on June 27, 2009, less than seven months after the release of Python 3.0. While the 3.1 release has a much smaller scope than Python 3.0, it still brings several interesting features, additions, and (everybody’s favorite) performance improvements!

Core Language Changes

I’ll cover changes to the core language first, and then move on to changes in the standard library and performance improvements.

String Formatting

One welcome feature is the ability to auto-number format fields. Formatting strings is a very common operation in many programs. Python 2.x has the [s]printf-like percent operation:

>>> '%s,  %s!' % ('Hello', 'World')
'Hello,  World!'

Python 3.0 added advanced string formatting capabilities (PEP-3101) modeled after C#’s format syntax:

>>> '{0},  {1}!'.format('Hello', 'World')
'Hello,  World!'

This is better for many reasons (see the Advanced String Formatting topic in this earlier article), but Python 3.1 improves it further. In Python 3.0 you had to specify the index of each positional argument whenever you referred to them in the format string. In Python 3.1 you can simply drop the index and Python will populate the arguments in sequence:

>>> '{},  {}!'.format('Hello', 'World')
'Hello,  World!'
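
As a quick check of the auto-numbering rule, the following sketch compares the two spellings and shows that Python refuses to mix automatic and explicit numbering within a single format string. The variable names are illustrative only.

```python
# Auto-numbered and explicitly numbered fields produce the same result.
auto = '{}, {}!'.format('Hello', 'World')
explicit = '{0}, {1}!'.format('Hello', 'World')

# Mixing the two styles in one format string raises ValueError.
try:
    '{}, {1}!'.format('Hello', 'World')
    mixed_error = None
except ValueError as exc:
    mixed_error = str(exc)

print(auto == explicit)
print(mixed_error)
```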

PEP-378: Format Specifier for Thousands Separator

In financial applications, a thousands separator is the norm. Bankers and accountants don’t write “You owe me $12345678,” but rather “You owe me $12,345,678,” with commas (or another character) as separators. Here’s how you achieve that in Python:

>>> format(12345678, ',')
'12,345,678'

You can combine it with other specifiers. The width specifier (8 in the example below) includes the commas and decimal point:

>>> format(12345.678, '8,.1f')
'12,345.7'

In Python 3.1 the comma is the only separator character the specifier supports; if you want a different separator, substitute the character you prefer using replace():

>>> format(1234, ',').replace(',', '_')
'1_234'

Of course, you can also use the same specifier with the format() string method:

>>> '{0:8,.1f}'.format(123.456)
'   123.5'

This seems like a minor addition to me; it simply adds one more display specifier to the format function and still doesn’t handle the more difficult cases of formatting content for different locales, which remain your responsibility. Still, the addition got its own PEP, which prompted a lively discussion with at least two competing proposals.
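
For locales that swap the roles of the comma and the period, one possible workaround (a sketch, not something the PEP prescribes) is to format with the comma specifier and then exchange the two characters with str.maketrans() and translate(). The helper name format_european is hypothetical.

```python
def format_european(value, decimals=2):
    """Format value as 1.234.567,89 by swapping ',' and '.' (illustrative helper)."""
    us_style = format(value, ',.{0}f'.format(decimals))
    swap = str.maketrans(',.', '.,')
    return us_style.translate(swap)

us_text = format(1234567.891, ',.2f')
eu_text = format_european(1234567.891)
print(us_text)   # 1,234,567.89
print(eu_text)   # 1.234.567,89
```

A simple replace() would not work here, because replacing commas with periods first would make the two kinds of separator indistinguishable.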

The maketrans Function

Together, the maketrans() and translate() functions let you replace a set of characters with a different set. Although I have never used maketrans()/translate() in a real application, I assume that they’re highly efficient. Using the functionality is a little cumbersome, because it requires that you build a translation table using maketrans() that maps input characters to output characters. You then pass the resulting translation table to the translate() function. The string module still has its own maketrans() function, but that has been deprecated in Python 3.1 in favor of separate maketrans() functions that operate on bytes, bytearrays, and str.

Here’s an example that demonstrates how to use maketrans() and translate() with a bytes object. Note that the translation table for bytes has 256 entries (one for each possible byte), and this example maps most bytes to themselves—the exceptions are 1, 2, and 3, which the table maps to 4, 5 and 6 respectively:

>>> tt = bytes.maketrans(b'123', b'456')
>>> len(tt)
256
>>> tt
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0456456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

After you have the translation table you just pass it to the translate() function:

>>> b'123456'.translate(tt)
b'456456'

You may also pass an additional argument that simply deletes characters:

>>> b'123456'.translate(tt, b'5')
b'45646'

It’s interesting to see that the original 5 from 123456 was deleted, but the translated 5 (remember, the table translates 2s to 5s) wasn’t. That implies that translate first deletes the characters from the original string and then applies the translation.

Translating strings is a little different. Rather than a fixed translation table of all possible characters (remember, strings are Unicode now) the string version of maketrans returns a dictionary.

>>> tt = str.maketrans('123', '456')
>>> tt
{49: 52, 50: 53, 51: 54}
>>> '123456'.translate(tt)
'456456'
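
The string version of maketrans() accepts a few other argument forms as well. The sketch below shows two of them: a third string listing characters to delete, and a single dictionary whose values may be longer replacement strings or None. The sample strings are made up for illustration.

```python
# Two equal-length strings plus a third string of characters to delete.
digit_table = str.maketrans('123', '456', '9')
digits = '1239'.translate(digit_table)
print(digits)    # 456

# A dict may map a character to a longer replacement string or to None.
word_table = str.maketrans({'&': 'and', '!': None})
phrase = 'salt & pepper!'.translate(word_table)
print(phrase)    # salt and pepper
```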

Math-Related Changes

The 3.1 release also includes some math-related changes.

Int Gets a bit_length Method

The venerable int gained a bit_length method that returns the number of bits required to represent the int in binary form. For example the number 19 is 10011 in binary form, which requires 5 bits:

>>> int.bit_length(19)
5
>>> bin(19)
'0b10011'

I’m not sure what it’s useful for, but maybe you can figure something out.
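
One plausible use, offered only as a sketch, is working out how many bytes a non-negative integer needs before packing it into a binary format. The helper bytes_needed is hypothetical and not part of the standard library.

```python
def bytes_needed(n):
    """Smallest number of bytes that can hold the non-negative int n (illustrative helper)."""
    return max(1, (n.bit_length() + 7) // 8)

for n in (0, 19, 255, 256, 2 ** 64):
    print(n, n.bit_length(), bytes_needed(n))
```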

Rounding Floats

In Python 3.0 the round() function was a little inconsistent. If you provided no precision digits it always returned an int, but if you provided precision digits it always returned a float, even when the input was an int:

>>> round(1000)
1000
>>> round(1000.0)
1000
>>> round(1000, 2)
1000.0
>>> round(1000.0, 2)
1000.0

In Python 3.1 round() with precision digits returns the type passed in, so an int input produces an int result while a float input still produces a float:

>>> round(1000)
1000
>>> round(1000.0)
1000
>>> round(1000, 2)
1000
>>> round(1000.0, 2)
1000.0

Floating Point Number Representation

Real numbers are represented in most of today’s hardware and operating systems in either 32 bits (single precision) or 64 bits (double precision) according to IEEE-754. However, that means some real numbers can’t be represented precisely. Due to the binary nature of computer storage, some numbers with a concise decimal representation have no concise binary floating point representation (see the Wikipedia Floating Point entry). For example, Python floats are 64-bit doubles, and Python 3.0 displays the number 0.6 as 0.59999999999999998:

>>> 0.6
0.59999999999999998

This is as accurate as possible, given the representation scheme, but isn’t user-friendly. Python 3.1 employs a new algorithm that looks for the most concise representation that keeps the original value intact. So in Python 3.1 the same input results in:

>>> 0.6
0.6

That’s fine until you hit another “gotcha” of floating number representation: arithmetic operations. For example, what is the value of the expression 0.7 + 0.1 in 64-bit floating point representation? If you thought it was 0.79999999999999993 you were spot on. Now, what is the value of the number 0.8? That’s right, 0.80000000000000004. But those results imply that 0.7 + 0.1 is not equal to 0.8, which can lead to some pretty nasty bugs. As an example, this innocent looking while loop will never stop:

>>> x = 0.0
>>> while x != 1.0:
...   print(repr(x))
...   x += 0.1
...

Output:
0.0
0.10000000000000001
0.20000000000000001
0.30000000000000004
0.40000000000000002
0.5
0.59999999999999998
0.69999999999999996
0.79999999999999993
0.89999999999999991
0.99999999999999989
1.0999999999999999
1.2
1.3
1.4000000000000001
1.5000000000000002
1.6000000000000003
...
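
Two common ways to avoid this trap are sketched below: compare against a tolerance instead of testing for exact equality, or drive the loop with an integer counter and derive the float from it. The tolerance of 1e-9 is an arbitrary choice for this illustration, not a recommended constant.

```python
x = 0.0
steps = 0
while abs(x - 1.0) > 1e-9:   # tolerance instead of x != 1.0
    x += 0.1
    steps += 1
print(steps, x == 1.0)

# Alternative: count with integers and compute each float from the counter.
values = [i / 10 for i in range(11)]
print(values[-1] == 1.0)
```

The first loop terminates after ten steps even though x never becomes exactly 1.0; the second form reaches exactly 1.0 because 10 / 10 is computed in one division rather than accumulated.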

In Python 3.0 the repr() function returns the actual representation. In Python 3.1 it returns the concise representation. In both Python 3.0 and Python 3.1 the print() function prints the concise representation:

>>> print(0.1)
0.1
>>> print(0.10000000000000001)
0.1

Author’s Note: For cross-platform compatibility, the text pickle protocol still uses the actual representation.

Python also has a module called decimal for precise real number representation. It’s slower than floating point numbers and uses a different representation scheme, but it can represent real numbers with as many digits as available memory allows, and it doesn’t suffer from binary rounding errors when doing arithmetic on decimal values. In Python 3.0, the Decimal type gained a new method that initialized the value from a string; Python 3.1 adds another new method, from_float(), that accepts a float. Note that from_float() converts the float’s exact binary value, so the result exposes every digit of the 64-bit approximation rather than the short decimal you typed:

>>> from decimal import Decimal
>>> Decimal.from_float(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
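
To make the contrast with binary floats concrete, here is a small sketch that repeats the 0.7 + 0.1 comparison with Decimal values built from strings. The variable names are illustrative.

```python
from decimal import Decimal

float_equal = (0.7 + 0.1 == 0.8)
decimal_equal = (Decimal('0.7') + Decimal('0.1') == Decimal('0.8'))
same_value = (Decimal.from_float(0.1) == Decimal('0.1'))

print(float_equal)     # False
print(decimal_equal)   # True
print(same_value)      # False
```

The last comparison is False because from_float() preserves the float's binary approximation, while Decimal('0.1') is exactly one tenth.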

Improved with Statement

The with statement, which helps guarantee timely release of resources, was introduced in Python 2.5 as a __future__ feature, and became a regular part of the language in Python 2.6 and 3.0. Python 3.1 extends its reach to support multiple resources in the same statement. The most common case is probably opening input and output files and closing both when the processing completes. In Python 3.0 you either had to use nested with statements or explicitly close at least one of the files. Here’s a Python 3.0 example that opens an input file, reads its contents as a string, title-cases the contents (using the string’s title() method), and writes the result to an output file.

The example contains two nested with statements. Note the last line of the nested with block. When the code tries to read from out.txt the result is empty, because the file is buffered and nothing has been written yet. When the with block completes, Python closes the files, so the last line (after the nested with block) asserts that the contents of out.txt is indeed the capitalized text.

open('in.txt', 'w').write('abc def')

with open('in.txt') as in_file:
  with open('out.txt', 'w') as out_file:
    text = in_file.read()
    assert text == 'abc def'
    text = text.title()
    assert text == 'Abc Def'
    out_file.write(text)
    assert open('out.txt').read() == ''
assert open('out.txt').read() == 'Abc Def'

While not bad, the nested with statements are a little annoying. The intention here is to open two files and close them when the processing is done. (If you needed to open three files (e.g. for a three-way merge) you would need three nested with statements.) Python 3.1 lets you open both files using a single with statement:

with open('in.txt') as in_file, open('out.txt', 'w') as out_file:
  text = in_file.read()
  assert text == 'abc def'
  text = text.title()
  assert text == 'Abc Def'
  out_file.write(text)
  assert open('out.txt').read() == ''
assert open('out.txt').read() == 'Abc Def'
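
The same syntax scales to the three-file case mentioned above. The following sketch concatenates two inputs into a third file inside one with statement; the file names and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
paths = [os.path.join(workdir, name) for name in ('a.txt', 'b.txt', 'merged.txt')]

with open(paths[0], 'w') as f:
    f.write('first\n')
with open(paths[1], 'w') as f:
    f.write('second\n')

# One with statement manages all three files; each is closed on exit.
with open(paths[0]) as a, open(paths[1]) as b, open(paths[2], 'w') as out:
    out.write(a.read() + b.read())

with open(paths[2]) as merged_file:
    merged = merged_file.read()
print(merged)
```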

Another Python 3.1 improvement is that the gzip.GzipFile and bz2.BZ2File now support the context manager protocol, and can be used in a with statement. These are compressed file formats. Here’s a code sample that stores 5000 bytes in both a gzip file and a bz2 file and prints the sizes. It takes advantage of a few additional Python 3 features, such as the nice stat result with named attributes (unlike the raw tuple in Python 2.x) and advanced string formatting.

from bz2 import BZ2File
from gzip import GzipFile
import os

with GzipFile('1.gz', 'wb') as g, BZ2File('1.bz2', 'wb') as b:
  g.write(b'X' * 5000)
  b.write(b'X' * 5000)

for ext in ('.gz', '.bz2'):
  filename = '1' + ext
  print('The size of the {0} file is {1.st_size} bytes'.format(ext, os.stat(filename)))

Output:
The size of the .gz file is 43 bytes
The size of the .bz2 file is 45 bytes
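
Reading works the same way. As a rough sketch, the round trip below writes a payload through GzipFile and reads it back, with both steps relying on the context manager to close the file. The path inside a temporary directory is an assumption for the example.

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.gz')   # hypothetical file name
payload = b'X' * 5000

with gzip.GzipFile(path, 'wb') as g:
    g.write(payload)

with gzip.GzipFile(path, 'rb') as g:
    restored = g.read()

compressed_size = os.stat(path).st_size
print(restored == payload, compressed_size < len(payload))
```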

Standard Library Changes

Python 3.1 also includes several changes to the standard library, described below.

PEP-372: Ordered Dictionaries

The major new addition is an ordered dictionary class, which got its own PEP. When you iterate over an ordered dict, you get the keys and values in the same order in which they were inserted, which is often desirable. As an illustration, here’s some code that shows the difference between an ordered dict and a regular dict:

>>> items = [('a', 1), ('b', 2), ('c', 3)]
>>> d = dict(items)
>>> d
{'a': 1, 'c': 3, 'b': 2}
>>> from collections import OrderedDict
>>> od = OrderedDict(items)
>>> od
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> list(d.keys())
['a', 'c', 'b']
>>> list(od.keys())
['a', 'b', 'c']

As you can see, the ordered dict preserves the initial item order, while the standard dict doesn’t. However, I was a little surprised to find out that if you populate the dictionary with named arguments rather than key/value pairs, it does not maintain the order. I would even consider that behavior a bug, because using named arguments is a perfectly valid way to initialize a dictionary, and the items have a clear order (left to right) just like the first example with the items list:

>>> d = dict(a=1, b=2, c=3)
>>> d
{'a': 1, 'c': 3, 'b': 2}
>>> od = OrderedDict(a=1, b=2, c=3)
>>> od
OrderedDict([('a', 1), ('c', 3), ('b', 2)])
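
Two further behaviors are worth knowing, sketched here with made-up keys: updating an existing key keeps its position while deleting and re-inserting moves it to the end, and equality between two ordered dicts is sensitive to order.

```python
from collections import OrderedDict

od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
od['a'] = 10          # updating an existing key keeps its position
del od['b']
od['b'] = 2           # deleting and re-inserting moves the key to the end
print(list(od.items()))   # [('a', 10), ('c', 3), ('b', 2)]

# Equality between two ordered dicts takes order into account.
same_order = OrderedDict([('x', 1), ('y', 2)])
other_order = OrderedDict([('y', 2), ('x', 1)])
print(same_order == other_order)               # False
print(dict(same_order) == dict(other_order))   # True
```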

The Counter Class

The new Counter class in the collections module is a dictionary that keeps track of how many times an object occurs in a collection.

>>> import collections
>>> x = [1, 1, 2, 3, 4, 5, 4, 4, 6, 4]
>>> c = collections.Counter(x)
>>> c
Counter({4: 4, 1: 2, 2: 1, 3: 1, 5: 1, 6: 1})

The class supports the typical set of dict methods: keys(), values() and items() for accessing its contents; however, the update() method differs from a regular dict update(). It accepts either a sequence or a mapping whose values are integers. If you use a sequence, it counts the elements and adds their count to the existing counted items. For a mapping it adds the count of each object in the mapping to the existing count. The following code updates the Counter object created in the preceding example:

>>> c.update([3, 3, 4])
>>> c
Counter({4: 5, 3: 3, 1: 2, 2: 1, 5: 1, 6: 1})
>>> c.update({2:5})
>>> c
Counter({2: 6, 4: 5, 3: 3, 1: 2, 5: 1, 6: 1})
>>> c.update({2:5})
>>> c
Counter({2: 11, 4: 5, 3: 3, 1: 2, 5: 1, 6: 1})

The Counter class also has a couple of special methods. The elements() method returns all the elements in the original collection, with equal elements grouped together. The documentation describes the order of the groups as arbitrary; in this example they happen to come out sorted by value (not frequency):

>>> list(c.elements())
[1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 6]

The most_common() method returns (object, frequency) pairs, sorted from the most common object to the least common.

>>> c.most_common()
[(2, 11), (4, 5), (3, 3), (1, 2), (5, 1), (6, 1)]

If you pass an integer N to most_common, it returns only the N most common elements. For example, given the Counter object from the preceding examples, the number 2 appears most often:

>>> c.most_common(1)
[(2, 11)]
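
A typical use is counting words. The short sketch below (the sample sentence is invented) also shows that looking up a key that was never counted returns zero instead of raising KeyError.

```python
from collections import Counter

words = 'the cat and the hat and the bat'.split()
counts = Counter(words)

print(counts['the'])          # 3
print(counts['dog'])          # 0, missing keys count as zero
print(counts.most_common(2))  # [('the', 3), ('and', 2)]
```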

Improvements to the itertools Module

The itertools module lets you work with infinite sequences and draws inspiration from Haskell, SML and APL. But it’s also useful for working with finite sequences. In Python 3.1 it received two new functions: combinations_with_replacement() and compress().

The combinations() function returns sub-sequences of the input sequence in lexicographic order without repetitions (based on position in the input sequence, not value). The new combinations_with_replacement() function allows repetition of the same element, as the following code sample demonstrates:

from itertools import *

print(list(combinations([1, 2, 3, 4], 3)))
print('-' * 10)
print(list(combinations_with_replacement(['H', 'T'], 5)))

Output:
[(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
----------
[('H', 'H', 'H', 'H', 'H'), ('H', 'H', 'H', 'H', 'T'),
 ('H', 'H', 'H', 'T', 'T'), ('H', 'H', 'T', 'T', 'T'),
 ('H', 'T', 'T', 'T', 'T'), ('T', 'T', 'T', 'T', 'T')]

Note that in both functions each sub-sequence is always ordered.

The compress() function allows you to apply a mask to a sequence to select specific elements from the sequence. The function stops when either the sequence or the selectors mask is exhausted. Here’s an interesting example that uses compress() together with count() to generate an infinite stream of integers, map() to apply a lambda function (+1) to the elements of count(), and chain(), which chains two iterables together. The resulting stream is very similar to the non-negative integers, except that 1 appears twice. I’ll let you guess what the compress() function selects out of this input stream:

from itertools import *

selectors = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0,
             0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
sequence = chain(iter([0, 1]), map(lambda x: x+1, count()))
print(list(compress(sequence, selectors)))

Output:
[0, 1, 1, 2, 3, 5, 8, 13, 21]
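
For a plainer view of how the mask works, here is a minimal sketch with a short string and with an infinite counter; the inputs are arbitrary.

```python
from itertools import compress, count

letters = list(compress('ABCDEF', [1, 0, 1, 0, 1, 1]))
print(letters)          # ['A', 'C', 'E', 'F']

# compress() stops at the shorter input, so an infinite source is safe.
evens = list(compress(count(), [1, 0, 1, 0, 1, 0]))
print(evens)            # [0, 2, 4]
```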


The NullHandler Class

Python has a powerful and flexible industrial-strength logging module that supports logging messages at different levels to arbitrary target locations that include memory, files, network, and console. Using it requires a certain amount of configuration. Libraries that want to provide logging can either configure themselves by default or require users to configure them. If, as a library developer, you require users to configure logging, you’re likely to annoy users who don’t care about logging. However, if your library configures itself, what should the configuration settings be?

There are two common options: log to a file or log to the console. Both options cause clutter. Until Python 3.1 best practice required the library developer to include a small do-nothing handler and configure its logger to use this handler. Python 3.1 provides such a NullHandler as part of the logging module itself.

Here’s a logging scenario: Suppose you have the following library code in a module called lib.py. It has an init() function that accepts a logging handler, but defaults to the new NullHandler. It then sets the logger object to use the provided handler (or the default one). A logging handler is an object that determines where the output of the logger should go. The example function a_function_that_uses_logging() calls the global logger object and logs some funny messages:

import logging

logger = None

def init(handler=logging.NullHandler()):
  global logger
  logger = logging.getLogger('Super-logger')
  logger.setLevel(logging.INFO)
  logger.addHandler(handler)

def a_function_that_uses_logging():
  logger.info('The capital of France is Paris')
  logger.debug("Don't forget to fix a few bugs before the release")
  logger.error('Oh, oh! something unpalatable occurred')
  logger.warning('Mind the gap')
  logger.critical('Your code is a mess. You really need to step up.')

The next bit of application code configures a rotating file handler. This is a sophisticated handler for long-running systems that generate large numbers of logged messages. The handler limits the amount of logging info in each file, and also saves a pre-set number of backup files. These restrictions ensure that the log files never exceed a given size, and that the latest logging info (up to the limit) is always preserved.

For example purposes, the code configures the handler to store only 250 bytes in each log file and maintain up to 4 backup files. It then invokes the venerable a_function_that_uses_logging().

import logging
import logging.handlers
from lib import init, a_function_that_uses_logging

log_file = 'log.txt'
handler = logging.handlers.RotatingFileHandler(
    log_file, maxBytes=250, backupCount=4)
init(handler)
for i in range(4):
  a_function_that_uses_logging()

Here’s what I found in my current directory after running this code. The handler created a rotating log file (log.txt), with four backups because the example allowed only 250 bytes in each file.

~/Documents/Articles/Python 3.1/ > ls
log.txt  log.txt.1  log.txt.2  log.txt.3  log.txt.4

To view the contents of those files I simply concatenated them:

~/Documents/docs/Publications/DevX/Python 3.0/Article_6 > cat log.*
Mind the gap
Your code is a mess. You really need to step up.
Your code is a mess. You really need to step up.
The capital of France is Paris
Oh, oh! something unpalatable occurred
Mind the gap
Your code is a mess. You really need to step up.
The capital of France is Paris
Oh, oh! something unpalatable occurred
The capital of France is Paris
Oh, oh! something unpalatable occurred
Mind the gap
Your code is a mess. You really need to step up.
The capital of France is Paris
Oh, oh! something unpalatable occurred
Mind the gap

This works well, but sometimes users don’t care about the logged messages—they just want to invoke the function without having to configure the logger, and they need it to work in a way that will not cause the disk to run out of space or the screen to be filled with messages. That’s where the NullHandler class comes in. The next bit of code does the same thing as the preceding example, but doesn’t configure a logging handler and gets no logging artifacts. Note how much ceremony went away; there are no imports for logging and logging.handlers, and no hard decisions about which handler to use or how to configure it.

from lib import init, a_function_that_uses_logging

init()
for i in range(3):
  a_function_that_uses_logging()
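
The following self-contained sketch shows the same idea without the lib module: the library side attaches a NullHandler so it stays silent by default, and the application side opts in later by adding its own handler. The logger name example_lib and the ListHandler class are hypothetical, introduced only for this illustration.

```python
import logging

# Library side: attach a NullHandler so the library is silent by default.
lib_logger = logging.getLogger('example_lib')   # hypothetical logger name
lib_logger.addHandler(logging.NullHandler())
lib_logger.setLevel(logging.INFO)
lib_logger.propagate = False   # keep this demo's records away from the root logger

class ListHandler(logging.Handler):
    """Collects log messages in a list (illustrative handler)."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())

lib_logger.warning('Mind the gap')        # discarded by the NullHandler

# Application side: opt in to logging by adding a real handler.
collector = ListHandler()
lib_logger.addHandler(collector)
lib_logger.warning('Mind the gap')
print(collector.messages)                 # ['Mind the gap']
```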

Version Info

Python 3.1 uses the named tuple construct introduced in Python 3.0 to make the version info more readable. In Python 2.5:

>>> import sys
>>> sys.version_info
(2, 5, 4, 'final', 0)

In Python 3.0:

>>> import sys
>>> sys.version_info
(3, 0, 1, 'final', 0)

In Python 3.1:

>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=1, micro=0,
    releaselevel='final', serial=0)
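
Because the result is still a tuple, existing index-based and comparison-based checks keep working alongside the new attribute access. A brief sketch:

```python
import sys

info = sys.version_info
print(info.major, info.minor)

# The named tuple still behaves like a plain tuple.
same_major = (info[0] == info.major)
at_least_3_1 = (info >= (3, 1))
print(same_major, at_least_3_1)
```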

Pickling of Partial Functions

Partial functions are one of my favorite functional features. They allow you to take a function that accepts X arguments, make some of the arguments fixed (static), and get a new function that accepts only the arguments you didn’t specify. A trivial example is an add() function that accepts two arguments, adds them, and returns the result. Now, if you fix one of the arguments to be 5, you end up with a new function that accepts only one argument, adds 5 to it and returns the result:

from functools import partial

def add(a, b):
  return a + b

add5 = partial(add, 5)
assert add5(8) == 13

Partial functions are very useful when working with APIs that require arguments that are always the same in your use case. Consider a web API that requires a username and a password in each method signature. If you create partial functions that fix the username and password your life will be much easier, because you don’t have to pass the arguments around. Arguably, your code will be safer too, because the user name and password will not show up in every call site.
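
As a sketch of the web API scenario, the code below fixes the credentials once and leaves only the varying arguments to each call site. The function call_api, its parameters, and the credential values are all hypothetical stand-ins; a real client would issue a network request where this one returns a dictionary.

```python
from functools import partial

def call_api(username, password, method, **params):
    """Stand-in for a web API call; a real client would send a request here."""
    return {'user': username, 'method': method, 'params': params}

# Fix the credentials once; callers only supply what varies.
api = partial(call_api, 'alice', 's3cret')
response = api('list_orders', limit=10)
print(response)
```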

However, partial functions had an annoying limitation up to Python 3.1. They could not be pickled. Python 3.1 addressed this issue. Here’s an example:

import pickle
from functools import partial

def add(a, b):
  return a + b

s = pickle.dumps(partial(add, 10))
add10 = pickle.loads(s)
assert add10(8) == 18

This code snippet passes under Python 3.1, but fails under Python 3.0 and earlier with the following error:

Traceback (most recent call last):
  File "", line 12, in
    s = pickle.dumps(partial(add, 10))
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/", line 1366, in dumps
    Pickler(file, protocol).dump(obj)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/", line 224, in dump
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/", line 306, in save
    rv = reduce(self.proto)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/", line 69, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle partial objects

Pickled functions (and partial functions) are all the rage if you use the multiprocessing module for parallel programming. This module, which has been part of the standard library since Python 2.6, is the best Python solution for taking advantage of the power of today’s multi-core machines. Under the covers, the multiprocessing module pickles everything it sends between processes, so picklable partial functions increase the expressive power and the tools available to you.

Unit Test Improvements

Python has a standard unittest module that you use to write xUnit-style tests. You can reuse setup/teardown code, organize your tests in suites (collections of tests), and even run your tests. Here’s a unit test for the add5() partial function. The TestAdd5 class is derived from unittest.TestCase and defines a setUp() method called before executing each test method. It ensures that some consistent state is available to every test method. The test methods call unittest’s assertEquals() (an alias of assertEqual()) and assert_() methods. If any call fails, unittest records a failure for the enclosing test method and moves on to the next test.

import unittest
from functools import partial

def add(a, b):
  return a + b

add5 = partial(add, 5)

class TestAdd5(unittest.TestCase):
  def setUp(self):
    self.values = range(1, 10)

  def test_positive(self):
    for v in self.values:
      self.assertEquals(add5(v), v + 5)

  def test_negative(self):
    for v in self.values:
      self.assertEquals(add5(-v), 5 - v)

  def test_zero(self):
    self.assert_(add5(0) == 5)

if __name__ == '__main__':
  unittest.main()

In this case, unittest.main() runs when the module is run, locates all the test classes (just one in this case), runs their test methods and reports the results:

...
-------------------------------------------------
Ran 3 tests in 0.000s

OK

Author’s Note: You can see that each test method is really a separate test case that can pass or fail independently, so it is somewhat of a misnomer to call the test class the unittest.TestCase.

Python 3.1 added the capability to skip tests and to mark tests as “expected to fail.” Skipping tests is useful in many scenarios. For example, you may want to save some time and run only the tests you are actively working on at the moment, or you may want to skip some platform specific tests if you run on a different platform (yes, you can conditionally skip tests). Expected failures are useful when you’re working in test-first mode, but need to check in your code before the new functionality works. In this case, the test for the new functionality is expected to fail until you fix the code or implement the new feature. Suppose you plan a smarter version of add5 that can operate on strings that contain numbers, such as '3'. The new test class contains an added test_string method, which takes each value from the sequence, turns it into a string, and then feeds it to add5:

def test_string(self):
  for v in self.values:
    self.assertEquals(add5(str(v)), v + 5)

Running the test now results in an error as expected:

> python3.1 test_string (__main__.TestAdd5)
---------------------------------------------
Traceback (most recent call last):
  File "", line 26, in test_string
    self.assertEquals(add5(str(v)), v + 5)
  File "", line 4, in add
    return a + b
TypeError: unsupported operand type(s) for +: 'int' and 'str'
----------------------------------------------------------------
Ran 4 tests in 0.001s

Now, let’s skip the zero test and allow the test_string method to fail:

  @unittest.skip("skipping this guy")
  def test_zero(self):
    self.assert_(add5(0) == 5)

  @unittest.expectedFailure
  def test_string(self):
    for v in self.values:
      self.assertEquals(add5(str(v)), v + 5)

Now, the tests run successfully, reporting the skipped and expected failures:

> python3.1
Ran 4 tests in 0.001s

OK (skipped=1, expected failures=1)
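
Conditional skipping uses the skipIf() and skipUnless() decorators. The sketch below runs a tiny test case programmatically so that the counts can be inspected; the HAS_FAST_BACKEND flag and the SkipDemo class are hypothetical names made up for the example.

```python
import io
import unittest

HAS_FAST_BACKEND = False   # hypothetical feature flag

class SkipDemo(unittest.TestCase):
    @unittest.skipUnless(HAS_FAST_BACKEND, 'fast backend not available')
    def test_fast_backend(self):
        self.fail('never runs while the flag is False')

    @unittest.expectedFailure
    def test_string_input(self):
        self.assertEqual(5 + '3', 8)   # raises TypeError until strings are supported

    def test_plain(self):
        self.assertEqual(5 + 3, 8)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(SkipDemo)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.testsRun, len(result.skipped), len(result.expectedFailures))
```

Note that expectedFailure is applied without parentheses, while skip(), skipIf(), and skipUnless() take arguments.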

Another new unittest feature is that you can use assertRaises as a context manager:

with self.assertRaises(ImportError):
  import no_such_module

In Python 3.0 and earlier you had to wrap the code in yet another function and pass it to assertRaises:

def import_no_such_module():
  import no_such_module

self.assertRaises(ImportError, import_no_such_module)

In addition, many new assertXXX() methods have been added. Check the documentation for all the details.

Performance Improvements

Python 3.0 was all about getting all the PEPs right. A big part of the development effort of Python 3.1 was focused on performance and it shows.

I/O Library Implemented in C

Python 3.0 boasted a new I/O library that was implemented in Python. Its performance was, as expected, pretty bad. In Python 3.1 the library was re-implemented in C, and is supposed to be much faster (2 to 20 times). I wrote a little program to write 5,000,000 bytes to a file 10 times, then calculate the average time after throwing away the slowest and fastest run. I ran it under Python 2.5, 2.6, 3.0 and 3.1. (This also functions as an example of how to write code that works on all versions using a version check.) Note the hack with the exec() function so the code can use a bytes literal, which, if written directly in code, will fail in the Python 2.x interpreter even if it’s not executed:

from __future__ import with_statement
import sys
import time

if sys.version_info[0] == 3:
  exec("c = b'X'")
else:
  c = 'X'

def test_write_speed():
  start = time.time()
  with open('1.txt', 'wb') as f:
    for i in range(5000000):
      f.write(c)
  end = time.time() - start
  print (end)
  return end

times = [test_write_speed() for i in range(10)]
times.remove(max(times))
times.remove(min(times))
print('Average:', sum(times) / len(times))

Here are the average times (in seconds):

  • Python 2.5 – 3.0146874487400055
  • Python 2.6 – 4.4676837027072906
  • Python 3.0 – 33.0755852461
  • Python 3.1 – 5.7733258903

The results are both interesting and somewhat disconcerting. For this basic I/O task of writing bytes one by one to a file, there are clear differences between various Python versions. Python 3.0 is understandably much slower, but Python 2.6 was 50% slower than Python 2.5, while Python 3.1 required nearly twice as much time as Python 2.5 to complete the same task.

I then tried the same test, but opened the file as a text file (‘w’ instead of ‘wb’) and wrote the string ‘X’ for Python 3.0/3.1 rather than writing bytes:

...
  with open('1.txt', 'w') as f:
    for i in range(5000000):
      f.write('X')
...

Here are the average times (in seconds):

  • Python 2.5 – 3.1337025165557861
  • Python 2.6 – 2.9250392615795135
  • Python 3.0 – 68.4243619442
  • Python 3.1 – 3.43869066238

What can you learn from that? First of all, Python 3.0 performance on this task is abysmal, and takes twice as long to write characters rather than bytes. Overall it’s about twenty times slower than Python 3.1. Python 2.5, 2.6, and 3.1 all perform roughly the same.

Character Decoding

Unicode processing definitely improved a lot between Python 2.x and Python 3.0. The following program encodes and decodes a buffer that contains 1,000,000 instances of the Hebrew word “shalom” (meaning “peace”) to and from UTF-8 and UTF-16. The total size of the buffer is five million bytes.

from __future__ import with_statement
import sys
import time

def test_encode_decode():
  shalom = ' \u05dd\u05d5\u05dc\u05e9'
  text = shalom * 1000000
  start = time.time()
  text_utf8 = text.encode('utf-8')
  text_utf16 = text.encode('utf-16')
  assert text_utf8.decode() == text
  assert text_utf16.decode('utf-16') == text
  end = time.time() - start
  print (shalom, end)
  return end

test = test_encode_decode

if __name__=='__main__':
  times = [test() for i in range(10)]
  times.remove(max(times))
  times.remove(min(times))
  print('Average:', sum(times) / len(times))

I ran this program as usual under Python 2.5, 2.6, 3.0 and 3.1, with these results:

  • Python 2.5 – 1.6552573442459106
  • Python 2.6 – 1.6100345551967621
  • Python 3.0 – 0.280230671167
  • Python 3.1 – 0.205590486526

Python 2.5 and 2.6 both run this code at about the same speed; however, Python 3.0 is significantly faster (5-6 times faster), while Python 3.1 is about eight times faster than Python 2.X and about 40% faster than Python 3.0.
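
To see what the benchmark is actually encoding, this small Python 3 sketch measures the encoded size of a four-letter Hebrew word in both encodings. The sample word and variable names are chosen for illustration.

```python
word = '\u05e9\u05dc\u05d5\u05dd'     # four Hebrew letters
utf8 = word.encode('utf-8')
utf16 = word.encode('utf-16')

print(len(word), len(utf8), len(utf16))   # 4 8 10
print(utf8.decode('utf-8') == word)       # True
print(utf16.decode('utf-16') == word)     # True
```

Each of these letters takes two bytes in UTF-8; the UTF-16 result is two bytes longer because encode('utf-16') prepends a byte order mark.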

JSON Improvements

The json module acquired a C extension in Python 3.1, which increased its performance dramatically. The following program creates a nested data structure consisting of a list of dictionaries that contain lists of other dictionaries that hold some basic values. The program serializes the entire list to JSON and back. Listing 1 shows the basic data structure (repeated 100 times):

Here’s the program that acts on the data in Listing 1:

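from __future__ import with_statement
import sys
import time
import json

def test_json():
    # Build the nested structure: a dict of lists of small dicts
    x = dict(a=1, b='BBBB', c=4.56)
    x6 = 6 * [x]
    y = dict(z=x6, zz=2 * x6, zzz=3 * x6)
    o = 100 * [y]
    # Time a full round trip: serialize to JSON, then parse it back
    start = time.time()
    j = json.dumps(o)
    assert json.loads(j) == o
    end = time.time() - start
    return end

test = test_json

if __name__=='__main__':
    # Run ten times, drop the slowest and fastest runs, and average the rest
    times = [test() for i in range(10)]
    times.remove(max(times))
    times.remove(min(times))
    print('Average:', sum(times) / len(times))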

Python 2.5 doesn’t have a standard json module, so here are the results for Python 2.6, Python 3.0, and Python 3.1:

  • Python 2.6: 0.58422702550888062
  • Python 3.0: 0.580562502146
  • Python 3.1: 0.0455559492111

These results show that there is virtually no difference between Python 2.6 and Python 3.0 (they use the same module, and the language changes don’t seem to have any impact). Python 3.1 is more than an order of magnitude faster. This is significant because JSON is the lingua franca of web services: if your web service receives or returns large amounts of JSON data, encoding and decoding can take a significant portion of the time required to process each request.

One more change is that the json module works only with str (the Python 3 Unicode string); it no longer works with bytes.
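In practice, the str-only rule means that bytes read from a file or socket should be decoded before parsing, and the output of json.dumps should be encoded before sending. Here is a minimal sketch of that pattern; the `record` data and variable names are made up for illustration. (Later Python 3 releases relaxed the rule for json.loads, but decoding explicitly works everywhere.)

```python
import json

# Hypothetical payload, used only for illustration
record = {'name': 'shalom', 'values': [1, 2.5, None]}

# json.dumps always returns a str (Unicode text), never bytes
encoded = json.dumps(record)
print(type(encoded).__name__)

# To send the data over a byte-oriented channel, encode it explicitly...
raw = encoded.encode('utf-8')

# ...and decode received bytes back to str before parsing
decoded = json.loads(raw.decode('utf-8'))
print(decoded == record)
```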

Pickle Attribute Interning

The pickle module now interns the attribute names of pickled objects. If you pickle many objects of the same class, they all have the same attribute names. With interning, each distinct name is kept as a single shared string instead of a separate copy of the same string for every object. The supposed benefits are lower memory use for the unpickled objects and smaller pickles, which should also mean faster loading (unpickling).
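Here is a minimal sketch of what interning means, assuming CPython 3. The first part uses sys.intern directly. The second part unpickles the same payload twice and checks whether the attribute-name strings of the two resulting objects are one shared object. The `Point` class and all variable names are hypothetical, and the second result depends on the interpreter, so I print it rather than promise a value.

```python
import pickle
import sys


class Point(object):
    """Hypothetical class used only for this illustration."""

    def __init__(self, x_coordinate, y_coordinate):
        self.x_coordinate = x_coordinate
        self.y_coordinate = y_coordinate


# Two equal strings built at run time are normally separate objects...
name_a = ''.join(['x_', 'coordinate'])
name_b = ''.join(['x_', 'coordinate'])

# ...but sys.intern() maps equal strings to a single shared object
names_shared = sys.intern(name_a) is sys.intern(name_b)
print('interned names share one object:', names_shared)

# Unpickle the same payload twice and compare the attribute-name keys
payload = pickle.dumps(Point(1, 2))
first = pickle.loads(payload)
second = pickle.loads(payload)
first_key = sorted(vars(first))[0]
second_key = sorted(vars(second))[0]
print('unpickled attribute names share one object:', first_key is second_key)
```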

The test program in Listing 2 defines a class A with three very long attribute names and then creates a list that contains 100,000 dictionaries. Each dictionary has a long key and an A object as its value. The program pickles the entire list to a file and then unpickles it, keeping track of the time required.
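Listing 2 is not reproduced here. As a rough sketch of a program matching that description (not the original listing), something like the following would work. The attribute names, the dictionary key, and the function name `time_pickle_round_trip` are all my own placeholders, so the pickle size and timings it reports will not match the figures below.

```python
import os
import pickle
import tempfile
import time


class A(object):
    """Stand-in for the class described above: three long attribute names."""

    def __init__(self):
        # Placeholder names; the names in the original listing are not shown
        self.first_very_long_attribute_name_for_testing = 1
        self.second_very_long_attribute_name_for_testing = 2
        self.third_very_long_attribute_name_for_testing = 3


def time_pickle_round_trip(count=100000):
    """Pickle `count` dicts to a file, load them back, return (seconds, size)."""
    long_key = 'a_long_dictionary_key_used_for_every_entry'
    data = [{long_key: A()} for _ in range(count)]

    fd, path = tempfile.mkstemp(suffix='.pickle')
    os.close(fd)
    try:
        start = time.time()
        with open(path, 'wb') as f:
            pickle.dump(data, f)
        with open(path, 'rb') as f:
            loaded = pickle.load(f)
        elapsed = time.time() - start
        size = os.path.getsize(path)
    finally:
        os.remove(path)

    # Sanity check: the round trip preserved the number of entries
    assert len(loaded) == count
    return elapsed, size


if __name__ == '__main__':
    elapsed, size = time_pickle_round_trip()
    print('Pickle size:', size, 'bytes')
    print('Elapsed:', elapsed)
```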

The pickle size was 200359 bytes for both Python 3.0 and 3.1. The times were:

  • Python 3.0 – 1.29865017533
  • Python 3.1 – 0.112466335297

Again, this is an order of magnitude improvement. I tried the same program with short attribute names (just a, b and c) and a short dictionary key (just x)—and I got the same execution times, so I’m not sure how the interning helps.

Miscellaneous Changes

I’ll only mention several other performance improvements, without benchmarks, because it’s difficult to measure their impact.

  • Tuples and dicts containing only untrackable objects are no longer tracked by the garbage collector.
  • There is a new configure option, --with-computed-gotos. It causes the bytecode evaluation loop to use a new dispatch mechanism that may speed it up by 20% (it is not available on all compilers).
  • Integers were stored internally using 15-bit digits in previous versions, but now the digits can be either 15 bits or 30 bits. The 30-bit representation is much faster on 64-bit systems, but on 32-bit systems the results are unclear. So, the default is 30 bits on 64-bit systems and 15 bits on 32-bit systems. You can use another new configure option for Unix, called --enable-big-digits, to override this default.
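Two of these changes can be inspected from Python itself. The sketch below uses gc.is_tracked and sys.int_info, both of which exist in Python 3.1 and later. The tracking results for tuples and dicts depend on the interpreter version and on whether a collection has run yet, so the code prints them instead of assuming a value.

```python
import gc
import sys

# Atomic objects such as ints and strings are never tracked by the collector;
# lists are tracked as soon as they are created
print('int tracked: ', gc.is_tracked(0))
print('str tracked: ', gc.is_tracked('a'))
print('list tracked:', gc.is_tracked([]))

# Containers holding only atomic values may be left untracked.
# The exact results here vary between interpreter versions.
simple_dict = {'a': 1, 'b': 'text'}
simple_tuple = tuple([1, 2, 3])
gc.collect()  # tuples may be untracked lazily, during a collection
print('simple dict tracked: ', gc.is_tracked(simple_dict))
print('simple tuple tracked:', gc.is_tracked(simple_tuple))

# sys.int_info reports how this build stores integers internally
digit_bits = sys.int_info.bits_per_digit
digit_bytes = sys.int_info.sizeof_digit
print('bits per digit: ', digit_bits)
print('bytes per digit:', digit_bytes)
```

On a typical 64-bit build you would expect bits_per_digit to report 30, but that depends on how the interpreter was configured.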

Python 3 Library Migration: State of the Union

As you may have heard, Python 3 was a controversial release due to its lack of backward compatibility with Python 2.x. The Python development team did a great job of making it easy to migrate from Python 2.x to Python 3.x, but they couldn’t port all the third-party libraries out there. Unfortunately, that’s a major issue for many projects, and it creates a chicken-and-egg problem. Library developers won’t be motivated to port their libraries to Python 3 until their users demand it, but users must wait for all the libraries they depend on to be ported to Python 3 before they can port their own projects.

The Python Package Index contains about 5,000 packages overall, and about 50 packages specifically for Python 3. That’s just one percent, but you could argue that many of the 5,000 are dead packages that nobody uses or maintains. There are also some “hub” packages used by many projects, so those are the key packages to port. I think it’s safe to say that Python 3 development has not taken the world by storm just yet. Many of the key “hub” packages, such as numpy, PIL, and twisted, haven’t yet been ported to Python 3 at the time of this writing. That’s not surprising, because important packages are usually big and complex, so porting them requires a serious effort.

Overall, A Production-Worthy Release

As you’ve seen, there are a number of important, convenient, and performance-oriented changes in Python 3.1. This release demonstrates again how solid the Python language is and how dependable its developers and community are. It is a very balanced release, with improvements to both the core language and the standard library. The performance improvements (especially those for IO, json, and pickle) make it a serious candidate for production use, provided the third-party packages you need have already been ported.

