A Brief Rundown of Changes and Additions in Python 3.1 : Page 5

Changes to the core language, the standard library, and some welcome performance improvements make Python 3.1 a balanced and worthwhile release.



Performance Improvements

Python 3.0 was all about getting all the PEPs right. A big part of the development effort of Python 3.1 was focused on performance and it shows.

I/O Library Implemented in C

Python 3.0 boasted a new I/O library that was implemented in Python. Its performance was, as expected, pretty bad. In Python 3.1 the library was re-implemented in C, and is supposed to be much faster (2 to 20 times). I wrote a little program that writes 5,000,000 bytes to a file, one byte at a time, and repeats that 10 times; it then calculates the average time after throwing away the slowest and the fastest run. I ran it under Python 2.5, 2.6, 3.0 and 3.1. (This also functions as an example of how to write code that works on all versions using a version check.) Note the hack with the exec() function so the code can use a bytes literal. If the literal were written directly in the code, the Python 2.5 interpreter would fail to parse the file, even though that line would never be executed:



    from __future__ import with_statement
    import sys
    import time

    if sys.version_info[0] == 3:
        exec("c = b'X'")
    else:
        c = 'X'

    def test_write_speed():
        start = time.time()
        with open('1.txt', 'wb') as f:
            for i in range(5000000):
                f.write(c)
        end = time.time() - start
        print (end)
        return end

    times = [test_write_speed() for i in range(10)]
    times.remove(max(times))
    times.remove(min(times))
    print('Average:', sum(times) / len(times))

Here are the average times (in seconds):

  • Python 2.5 - 3.0146874487400055
  • Python 2.6 - 4.4676837027072906
  • Python 3.0 - 33.0755852461
  • Python 3.1 - 5.7733258903

The results are both interesting and somewhat disconcerting. For this basic I/O task of writing bytes one by one to a file, there are clear differences between various Python versions. Python 3.0 is understandably much slower, but Python 2.6 was 50% slower than Python 2.5, while Python 3.1 required nearly twice as much time as Python 2.5 to complete the same task.
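The exec() hack in the benchmark exists only so that older interpreters never have to parse a b'...' literal. As a hedged alternative (not what the benchmark above uses), the same one-byte value can be produced by encoding an ordinary string literal, which parses on every version discussed here. The helper name one_byte is made up for illustration.

```python
import sys

def one_byte(ch='X'):
    """Return ch as the byte-string type of the running interpreter.

    On Python 3 the result is a bytes object. On Python 2, calling
    encode('ascii') on a plain ASCII str gives back an equal str,
    which is that version's byte-string type.
    """
    return ch.encode('ascii')

c = one_byte()
# Show the major version and the value that would be written to the file.
print(sys.version_info[0], repr(c))
```

The trade-off is one extra method call at start-up in exchange for not hiding code inside a string passed to exec().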

I then tried the same test, but opened the file as a text file ('w' instead of 'wb') and wrote the string 'X', so that Python 3.0 and 3.1 write characters rather than bytes:

    ...
    with open('1.txt', 'w') as f:
        for i in range(5000000):
            f.write('X')
    ...

Here are the average times (in seconds):

  • Python 2.5 - 3.1337025165557861
  • Python 2.6 - 2.9250392615795135
  • Python 3.0 - 68.4243619442
  • Python 3.1 - 3.43869066238

What can you learn from that? First of all, Python 3.0 performance on this task is abysmal, and takes twice as long to write characters rather than bytes. Overall it's about twenty times slower than Python 3.1. Python 2.5, 2.6, and 3.1 all perform roughly the same.
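The measurement procedure used for both tests (repeat the run, drop the slowest and the fastest result, average the rest) can be sketched as a small reusable harness. This is a minimal sketch that assumes a modern Python 3 with time.perf_counter(); the names trimmed_mean and time_writes are hypothetical, and the write count is deliberately much smaller than the 5,000,000 used in the article so that it finishes quickly.

```python
import os
import tempfile
import time

def trimmed_mean(times):
    """Average the timings after dropping one fastest and one slowest run."""
    if len(times) < 3:
        raise ValueError("need at least three timings")
    kept = sorted(times)[1:-1]
    return sum(kept) / len(kept)

def time_writes(mode, chunk, count=100000):
    """Time `count` single-chunk writes to a temporary file opened in `mode`."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        with open(path, mode) as f:
            for _ in range(count):
                f.write(chunk)
        return time.perf_counter() - start
    finally:
        os.remove(path)

if __name__ == '__main__':
    # Binary mode takes bytes, text mode takes str.
    for mode, chunk in (('wb', b'X'), ('w', 'X')):
        runs = [time_writes(mode, chunk) for _ in range(5)]
        print(mode, trimmed_mean(runs))
```

Absolute numbers from such a harness depend heavily on the machine, the file system, and the interpreter build, so they are only meaningful relative to each other.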

Character Decoding

Unicode processing definitely improved a lot between Python 2.x and Python 3.0. The following program encodes and decodes a buffer that contains 1,000,000 instances of the Hebrew word "shalom" (meaning "peace") to and from UTF-8 and UTF-16. Under Python 3 the buffer is five million characters long, because each instance in the code is a space followed by four Hebrew letters.

    from __future__ import with_statement
    import sys
    import time

    def test_encode_decode():
        shalom = ' \u05dd\u05d5\u05dc\u05e9'
        text = shalom * 1000000
        start = time.time()
        text_utf8 = text.encode('utf-8')
        text_utf16 = text.encode('utf-16')
        assert text_utf8.decode() == text
        assert text_utf16.decode('utf-16') == text
        end = time.time() - start
        print (shalom, end)
        return end

    test = test_encode_decode

    if __name__=='__main__':
        times = [test() for i in range(10)]
        times.remove(max(times))
        times.remove(min(times))
        print('Average:', sum(times) / len(times))

I ran this program as usual under Python 2.5, 2.6, 3.0 and 3.1, with these results:

  • Python 2.5 - 1.6552573442459106
  • Python 2.6 - 1.6100345551967621
  • Python 3.0 - 0.280230671167
  • Python 3.1 - 0.205590486526

Python 2.5 and 2.6 both run this code at about the same speed; however, Python 3.0 is significantly faster (nearly six times faster), while Python 3.1 is about eight times faster than Python 2.x and a little over a third faster than Python 3.0.
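To make concrete what the benchmark is encoding, here is a small Python 3 sketch that looks at a single instance of the string rather than a million of them. It reuses the literal from the test program; nothing here is timed.

```python
shalom = ' \u05dd\u05d5\u05dc\u05e9'   # a space plus four Hebrew letters

utf8 = shalom.encode('utf-8')
utf16 = shalom.encode('utf-16')

# Python 3 strings count code points, so this is 5.
print(len(shalom))
# UTF-8 uses 1 byte for the space and 2 bytes for each Hebrew letter: 9.
print(len(utf8))
# UTF-16 uses 2 bytes per character here, plus a 2-byte byte-order mark: 12.
print(len(utf16))

# Both encodings round-trip back to the original text.
assert utf8.decode('utf-8') == shalom
assert utf16.decode('utf-16') == shalom
```

The 'utf-16' codec prepends a byte-order mark to the output, which is why the encoded form is two bytes longer than twice the character count.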

JSON Improvements

The json module acquired a C extension in Python 3.1, which increased its performance dramatically. The following program creates a nested data structure consisting of a list of dictionaries that contain lists of other dictionaries that hold some basic values. The program serializes the entire list to JSON and back. Listing 1 shows the basic data structure (repeated 100 times):

Here's the program that acts on the data in Listing 1:

    from __future__ import with_statement
    import time
    import json

    def test_json():
        # Build the nested structure: dicts holding lists of dicts.
        x = dict(a=1, b='BBBB', c=4.56)
        x6 = 6 * [x]
        y = dict(z=x6, zz=2 * x6, zzz=3 * x6)
        o = 100 * [y]
        # Time one serialize/deserialize round trip.
        start = time.time()
        j = json.dumps(o)
        assert json.loads(j) == o
        end = time.time() - start
        return end

    test = test_json

    if __name__=='__main__':
        times = [test() for i in range(10)]
        times.remove(max(times))
        times.remove(min(times))
        print('Average:', sum(times) / len(times))

Python 2.5 doesn't have a standard json module, so here are the results for Python 2.6, Python 3.0, and Python 3.1:

  • Python 2.6: 0.58422702550888062
  • Python 3.0: 0.580562502146
  • Python 3.1: 0.0455559492111

These results show that there is virtually no difference between Python 2.6 and Python 3.0 (they use the same module and the language changes don't seem to have any impact). Python 3.1 is more than an order of magnitude faster. This is significant, because JSON is the lingua franca of web services, and if your web service happens to receive or return large amounts of JSON data the encoding/decoding can take a significant portion of the time required to process each request.

Yet another change is that the JSON module works only with str (the Python 3 Unicode string); it no longer works with bytes.
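The str-only behavior is easy to see with the same nested structure the benchmark builds. The following is a minimal sketch, assuming Python 3: json.dumps() returns a str, and any conversion to or from bytes is done explicitly by the caller. The variable names encoded and wire are made up for illustration.

```python
import json

# One element of the benchmark's data: dicts that hold lists of dicts.
x = dict(a=1, b='BBBB', c=4.56)
x6 = 6 * [x]
y = dict(z=x6, zz=2 * x6, zzz=3 * x6)

encoded = json.dumps(y)              # a str in Python 3, not bytes
assert isinstance(encoded, str)
assert json.loads(encoded) == y      # the round trip preserves the structure

# For a socket or a binary file, encode and decode explicitly.
wire = encoded.encode('utf-8')
assert json.loads(wire.decode('utf-8')) == y
```

Keeping the encode and decode steps explicit makes it obvious where text becomes bytes, which is the distinction Python 3 enforces everywhere else as well.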

Pickle Attribute Interning

The pickle module now interns attribute names of pickled objects. The idea is that if you pickle many objects of the same class, they all have the same attribute names. Therefore, instead of storing the same strings (attribute names) again for each object, you can keep a single table containing all the attribute names and store an index for each attribute (or store only the dynamic attributes that were added to or removed from the standard set of attribute names per object). The supposed benefit is smaller pickles, which means faster loading (unpickling).

The test program in Listing 2 defines a class A with three very long attribute names and then creates a list that contains 100,000 dictionaries. Each dictionary has a long key and an A object as its value. The program pickles the entire list to a file and then unpickles it, keeping track of the time required.

The pickle size was 200359 bytes for both Python 3.0 and 3.1. The times were:

  • Python 3.0 - 1.29865017533
  • Python 3.1 - 0.112466335297

Again, this is an order of magnitude improvement. I tried the same program with short attribute names (just a, b and c) and a short dictionary key (just x)—and I got the same execution times, so I'm not sure how the interning helps.
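One possible explanation for the identical results with long and short names is that, within a single pickle, a string object that appears many times is written once and afterwards referenced through the pickle memo. The sketch below illustrates that memo behavior with shared dictionary keys; it is not the interning change itself and not the program from Listing 2, and the names pickled_size, long_key and short_key are made up for illustration.

```python
import pickle

def pickled_size(key, count=1000):
    """Pickle `count` one-item dicts that all share the same key object."""
    data = pickle.dumps([{key: i} for i in range(count)])
    # Sanity check: the data survives the round trip.
    assert pickle.loads(data)[-1] == {key: count - 1}
    return len(data)

long_key = 'a_very_long_dictionary_key_used_by_every_item'
short_key = 'x'

long_size = pickled_size(long_key)
short_size = pickled_size(short_key)

# If the key were stored once per dict, the gap would be tens of thousands
# of bytes; with the memo it stays close to the length of one key.
print(long_size, short_size, long_size - short_size)
```

If the same mechanism applies to attribute names held in instance dictionaries, which seems likely but is an assumption here, the length of the names would have very little effect on pickle size or timing.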

Miscellaneous Changes

I'll only mention several other performance improvements briefly, because it's difficult to measure their impact.

  • Tuples and dicts containing only untrackable objects are no longer tracked by the garbage collector.
  • A new configure option, --with-computed-gotos, makes the bytecode evaluation loop use a new dispatch mechanism that may speed it up by as much as 20% (not available with all compilers).
  • Integers were stored internally in 15-bit digits (base 2**15) in previous versions, but now the digits can be either 15 bits or 30 bits wide. The 30-bit representation is much faster on 64-bit systems, but on 32-bit systems the results are unclear. So, the default is 30 bits on 64-bit systems and 15 bits on 32-bit systems. You can use another new configure option for Unix, called --enable-big-digits, to override this default.
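The digit size chosen for a given interpreter build can be inspected at run time through sys.int_info, which was added alongside this change. A short sketch:

```python
import sys

info = sys.int_info

# Number of bits held in each internal digit: 15 or 30, fixed at build time.
print(info.bits_per_digit)
# Size in bytes of the C type used to hold one digit.
print(info.sizeof_digit)
```

This only reports how the running interpreter was built; changing the digit size still requires rebuilding Python with the configure option mentioned above.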

Python 3 Library Migration: State of the Union

As you may have heard, Python 3 was a controversial release due to its lack of backward compatibility with Python 2.x. The Python development team did a great job of making it easy to migrate from Python 2.x to Python 3.x, but they couldn't port all the third-party libraries out there. Unfortunately, that's a major issue for many projects. So, there is a chicken and egg problem here. Library developers won't be motivated to port their libraries to Python 3 until their users demand it. But the users must wait for all the libraries they depend on to be ported to Python 3 before they can port their projects.

The Python Package Index contains about 5,000 packages in total, and about 50 packages specifically for Python 3. That's just one percent, but you could argue that many of the 5,000 are dead packages that nobody uses or maintains. There are also some "hub" packages used by many projects, so those are the key packages to port. I think it's safe to say that Python 3 development has not taken the world by storm just yet. Many of the key "hub" packages, such as numpy, PIL and twisted, haven't yet been ported to Python 3 at the time of this writing. That's not surprising, because important packages are usually big and complex, so porting them requires a serious effort.

Overall, A Production-Worthy Release

As you've seen, there are a number of important, convenient, and performance-oriented changes in Python 3.1. This release demonstrates again how solid the Python language is and how dependable its developers and community are. It is a very balanced release, with improvements to both the core language and the standard library. The performance improvements (especially those for I/O, json, and pickle) make it a serious candidate for production use, provided the third-party packages you need have already been ported.



Gigi Sayfan specializes in cross-platform object-oriented programming in C/C++/C#/Python/Java with an emphasis on large-scale distributed systems. He is currently trying to build brain-inspired intelligent machines at Numenta.