This article is the second part in a three-part series about Python 2.5. The first part discussed the major changes and enhancements to the Python language itself. This part introduces the main modules that were added to the Python standard library. The third part will discuss a whole bag of smaller improvements and changes that are relevant to specific subsets of the Python community.
Python has a vibrant community that produces lots of useful packages and modules. The best ones, the ones that have proven themselves in the field, sometimes get included in the standard Python library. This is important for several reasons:
- High availability: People who deploy large Python-based systems that rely only on standard modules have it easy when it comes to installation, deployment, and upgrades.
- High visibility: Being included in the standard library means that the module will be documented in the official Python documentation as well as in Python books. Example programs and articles are more likely to use standard modules because they don't require special installation (see the first point).
- Blessed status: If there are multiple modules that provide some functionality, then the module picked for inclusion in the standard library has obviously been deemed better.
There are three modules recently included in the standard library that I'll discuss in this article: ctypes, pysqlite, and ElementTree.
- ctypes allows calling C functions in dynamic/shared libraries without writing extensions.
- Pysqlite is a great embedded database package.
- ElementTree is a pythonic and efficient set of XML processing tools.
Arguably, these modules are the most important for the majority of Python users. I'll discuss the hashlib and wsgiref modules, which are also important, in a third article (coming soon).
Module No. 1: ctypes
Python is slow. Most of the time that doesn't matter. You might use it simply to write small scripts that finish before you even blink, or you might use it to glue together some tools. You can even write decent games in Python that perform well using a library like PyGame. However, if you develop core parts of a large-scale system in Python you might find out that Python is too slow. In this case you can always write the critical parts in C or C++ and wrap them with an extension module. But of course, this process is not slick and streamlined like pure Python development.
There are many ways to automate it and make it less painful (e.g. SWIG and Boost.Python). ctypes offers a simpler approach. It allows you to call C functions in dynamic libraries directly.
Dynamic libraries use platform-specific mechanisms. ctypes tries very hard to operate at a higher abstraction level, but in some cases that is just impossible. Some libraries may be available only on a certain platform, and the same library may have a different name on each platform. In this article, I will use libc for all the examples because it is so ubiquitous. I use Mac OS X, but the examples should work on every Linux/Unix OS. I will also refer to Windows from time to time because there are important capabilities that are available on Windows only.
Finding and Loading Libraries
Before you can start calling those great C functions, you need to locate and load the dynamic library that contains them. There are two ways to locate a library:
- You can call the ctypes.util.find_library() function
- You can just know where it is
In both cases you end up with a path to the dynamic library.
Here is how to use find_library to find the path to the libc library:
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes.util import find_library
>>> find_library('c')
'/usr/lib/libc.dylib'
In this code, find_library() is doing its best to shield you from OS-specific details. Note that I didn't have to specify the extension (.dylib on Mac) or the 'lib' prefix.
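As a rough sketch of the lookup step, the snippet below asks find_library() for two libraries by their short names. The names 'c' and 'm' and the found dictionary are illustrative choices, not part of the session above. The code is written so that it also runs on current Python 3, which is an assumption beyond the 2.5 interpreter used in this article.

```python
from ctypes.util import find_library

# Short names only: no 'lib' prefix and no platform-specific extension.
found = {}
for short_name in ('c', 'm'):
    # find_library() signals failure by returning None rather than raising.
    found[short_name] = find_library(short_name)

for short_name in sorted(found):
    print('%s -> %s' % (short_name, found[short_name]))
```

Depending on the platform, the result may be a full path or just a file name, so it is safest to treat it as an opaque string and check it for None before loading.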
Once you locate the dynamic library you can load it. There are different ways to do it, and which one you pick depends on the dynamic library type, the calling convention, the platform, and interaction with the Python C API. On Linux/Mac OS X you should use the CDLL class to load a dynamic library with the C (cdecl) calling convention. On Windows you should use the WinDLL class for dynamic libraries that use the standard (stdcall) calling convention and the OleDLL class for COM libraries, whose stdcall functions return HRESULT codes.
Here is how to load libc on Linux/Mac OS X:
>>> from ctypes import CDLL
>>> from ctypes.util import find_library
>>> libc = CDLL(find_library('c'))
>>> libc
<CDLL '/usr/lib/libc.dylib', handle 100470 at 796f0>
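The loading step can be sketched as a small script too. This assumes a Linux/Unix or Mac system; the call to abs() is only a sanity check chosen for illustration, because it takes and returns a plain C int.

```python
from ctypes import CDLL
from ctypes.util import find_library

# If find_library() returns None, CDLL(None) on POSIX systems falls back to
# the symbols already loaded into the running process, which include libc.
libc = CDLL(find_library('c'))

# C functions are looked up lazily as attributes of the library object.
result = libc.abs(-7)
print(result)
```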
Python has support for random number generation (well, pseudo-random numbers). The random module provides a bunch of functions to generate anything you want. The problem is that the function most people reach for first, random.randint(min, max), is not a simple way to generate a random integer between 0 and X, which is almost always what I want. You have to provide two numbers for min and max, and then you need to know that the random number you get is in the closed range [min, max], which means min <= x <= max. This is not intuitive to me because in computers (and often in math, too) half-open ranges are the norm: [min, max), which means min <= x < max. Even Python's own range function returns the half-open range. (The random module's randrange() function does follow the half-open convention.)
Here is what I have to do to get a random number in the range [0, 4):
>>> import random
>>> random.randint(0, 3)
So, I don't like random.randint(). Luckily, ctypes comes to the rescue by giving me libc's rand() function, which takes no arguments and always returns a pseudo-random integer between 0 and RAND_MAX. Converting it to the range [0, 4) is as simple as this:
>>> libc.rand() % 4
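Wrapped in a function, the same idea might look like the sketch below. The helper name rand_below is hypothetical, and the code assumes a Unix-like system where libc can be loaded as shown earlier. Note that the modulo trick is slightly biased whenever the limit does not evenly divide RAND_MAX + 1, which is usually acceptable for casual use.

```python
from ctypes import CDLL
from ctypes.util import find_library

libc = CDLL(find_library('c'))  # assumes a Unix-like system


def rand_below(limit):
    """Return a pseudo-random integer in the half-open range [0, limit)."""
    # rand() returns a C int in [0, RAND_MAX]; the modulo folds it into range.
    return libc.rand() % limit


samples = [rand_below(4) for _ in range(10)]
print(samples)
```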
OK, let's try some math. What is the square root of 1?
>>> libc.sqrt(1)
1
Cool, that works. Let's try some more:
>>> libc.sqrt(4)
1
Oops. That's not good. As I recall, the square root of 4 should be 2. What happened? CDLL objects assume that all functions return an int unless you tell them otherwise. That means that the return value of sqrt(), which happens to be a double-precision floating point number, is automatically interpreted as a C int and handed back as a Python int (types.IntType). For some reason everything I tried to feed to sqrt() returned 1 (or an overflow error).
The way to fix it is to tell the sqrt function that it should return a double and not an int. ctypes provides a bunch of type factories designed to make it easy to map native C types to Python types. The full list can be found here: http://docs.python.org/dev/lib/node452.html.
Here is how to tell sqrt to return double:
>>> from ctypes import c_double
>>> libc.sqrt.restype = c_double
sqrt() also expects a double parameter. You probably think that you can pass in a Python float just like you passed an int, but you would be wrong. Only the following types are converted automatically to C types:
- None becomes a NULL pointer
- int and long become the default C int type (exact type depends on platform)
- strings and unicode strings become char * or wchar_t * respectively.
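The automatic conversions in this list can be tried out with a few libc functions. This is a sketch that assumes a Unix-like system and current Python 3, where byte strings (b'...') play the role of the plain strings mentioned above and map to char *, while text strings map to wchar_t *. The functions time(), abs(), strlen(), and wcslen() are standard C library calls picked for illustration.

```python
from ctypes import CDLL
from ctypes.util import find_library

libc = CDLL(find_library('c'))  # assumes a Unix-like system

# None is passed as a NULL pointer: time(NULL) returns the current time.
now = libc.time(None)

# A Python int is passed as a C int.
magnitude = libc.abs(-5)

# A byte string is passed as char *, a text string as wchar_t *.
byte_length = libc.strlen(b'hello')
wide_length = libc.wcslen(u'hello')

print(magnitude)
print(byte_length)
print(wide_length)
```

All four calls rely on the default int return type, which is fine for the small values involved here.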
In order to call a C function that accepts a double, you must use one of the ctypes type constructors. It's as simple as calling a function and passing a Python value:
The small error is an artifact of the way floating point numbers are represented in modern computers and is not a bug. Don't be alarmed.
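To make the wrapping step concrete, here is a small sketch of the type constructors in action. The value 3.3 is an arbitrary example. On current Python 3 (an assumption beyond this article's 2.5 session) a c_double round-trips a Python float exactly, so the sketch also uses the narrower c_float to make the representation error visible.

```python
from ctypes import c_double, c_float

# c_double holds a C double, the same precision as a Python float.
as_double = c_double(3.3)

# c_float holds a 32-bit C float, so some precision is lost on the way in.
as_float = c_float(3.3)

# The wrapped Python value is available through the .value attribute.
print(as_double.value)
print(as_float.value)
print(abs(as_float.value - 3.3))
```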
So, let's get the sqrt() function going already:
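A minimal sketch of that call might look like the following. It assumes a Unix-like system and looks for sqrt() in the math library ('m') first, because on many Linux systems it lives there rather than in libc; the variable names are illustrative.

```python
from ctypes import CDLL, c_double
from ctypes.util import find_library

# Try the math library first and fall back to libc.
libm = CDLL(find_library('m') or find_library('c'))

# Tell ctypes to interpret the return value as a C double, not an int.
libm.sqrt.restype = c_double

# Wrap the argument explicitly so it is passed as a C double.
root = libm.sqrt(c_double(4.0))
print(root)
```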
Yay, it works. What happens if you try to pass a raw Python float? Nothing good, that's for sure:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ctypes.ArgumentError: argument 1: <type 'exceptions.TypeError'>: Don't know how to convert parameter 1
ctypes.ArgumentError is the exception ctypes raises if it can't convert the object you passed in.
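The failure and one way around it can be sketched as follows. The code assumes a Unix-like system with sqrt() available from the math library or libc. Setting the argtypes attribute is a standard ctypes feature not covered in the text above; it declares the parameter types once so that ctypes can wrap plain Python numbers on every call.

```python
import ctypes
from ctypes import CDLL, c_double
from ctypes.util import find_library

libm = CDLL(find_library('m') or find_library('c'))
libm.sqrt.restype = c_double

# Without declared argument types, a raw Python float is rejected.
try:
    libm.sqrt(4.0)
    raised = False
except ctypes.ArgumentError:
    raised = True
print(raised)

# Declaring the parameter types lets ctypes do the conversion itself.
libm.sqrt.argtypes = [c_double]
root = libm.sqrt(4.0)
print(root)
```

Declaring argtypes also means that a wrong argument type is caught before the C function is ever called, which is a cheap safety net when wrapping a larger library.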