A Developer's Guide to Python 3.0: Package Installation and More Language Enhancements : Page 4

Explore Python 3.0's new support for per-user installations, an official with statement, property decorators, keyword-only arguments, dictionary changes, and C API changes.


Python C API Changes

Welcome to the land of the Python C API. Standard Python (aka CPython) is implemented in C (albeit very object-oriented C). The implementation language is relevant to a serious Python programmer because Python can be extended with extension modules written in C. There are two main reasons to use an extension module:

  1. You have a C library that already does exactly what you need, and you don't want to rewrite it in Python.
  2. Speed.

Python itself (the CPython implementation) is slow compared with compiled languages such as C. This is partly due to its extreme flexibility, partly due to some design decisions, and partly because there has been little pressure to make the interpreter faster: Python developers have always been able to speed up critical parts of their programs by replacing them with lightning-fast C extension modules.

Python 3.0 made several changes to the C API, which means that Python C extensions compiled against the Python 2.x API don't work in Python 3.0.

Author's Note: I will discuss this topic in depth in the next article in the series. I will not cover all the changes, due to time and space constraints, but will focus on two major changes that received their own PEPs, and explain them at a relatively high level.

PEP-3121: Extension Module Initialization and Finalization

C extension modules are dynamically loaded libraries. In Python 2.x the life cycle of extension modules is a little problematic:

  1. They are generally initialized once and never get unloaded.
  2. The entry point that the Python interpreter calls to initialize an extension module is called init{module name}. This generic name must be exported, and can conflict with other global symbols.
  3. When Py_Finalize() is called, the init function is called a second time. This is surprising and requires careful management on the part of the extension developer to make sure all resources are cleaned up the second time.
  4. The init entry point has no return value. That deviates from common practice and doesn't allow developers to check whether the initialization failed.
  5. If you run multiple interpreters that import the same extension module, it will be shared by all interpreters. This means that one interpreter can corrupt the state of another interpreter.

PEP-3121 solves all these issues. The entry point signature of the init function is now:

PyObject * PyInit_{module name}()

That solves the name-conflict issue and provides a return value. If initialization succeeds, the function returns a pointer to the module object; otherwise, it returns NULL.

This function should return a new module object every time it's called and each Python interpreter should call it once. The module object will be passed to each module function, allowing a separate state for each interpreter.

The module is defined in the following C struct:

struct PyModuleDef{
  PyModuleDef_Base m_base;  /* To be filled out by the interpreter */
  Py_ssize_t m_size; /* Size of per-module data */
  PyMethodDef *m_methods;
  inquiry m_reload;
  traverseproc m_traverse;
  inquiry m_clear;
  freefunc m_free;
};

The m_clear function is called whenever the garbage collector clears the module's memory; set it to NULL if the module keeps no state. The m_free function is called when the module is deallocated; set it to NULL if no cleanup is needed.

Unfortunately, Python 3.0 didn't fix the reload use case. The built-in reload() function from Python 2.x was even removed from the built-ins and moved to imp.reload() (it never worked properly). I'm not sure what the reason is. It seems that with per-module state and the free function it should be possible to implement reload() properly. The problem probably stems from the issue of modules being shared between interpreters. If you reload a module with modified code in one interpreter, what would happen to the module in another interpreter? The correct solutions are either not to share modules between interpreters to begin with (which could become some kind of an import option) or, whenever a module gets reloaded, to load a new copy into the reloading interpreter and let other interpreters continue to run with the old module.

PEP-3118: Revising the Buffer Protocol

Many Python objects and modules share memory under the covers at the C level. Python 2.x provided a buffer protocol to allow shared access via pointer to a contiguous memory buffer or a sequence of segments. C API objects could export their data in raw format or consume other objects' data. The Python 2.x buffer protocol had some deficiencies that this PEP addresses. The main issues are:

  • Consumers can't tell exporter objects when they are done with the shared memory, so exporters never know when it is safe to release that memory.
  • There is no good way to work with discontiguous memory.
  • There is no way to describe the internal structure of the memory with the 2.x buffer protocol. The consumer must "know" or the exporter and consumer must share some scheme to put such metadata in the buffer.

PEP-3118 was motivated by NumPy and PIL. These libraries are not part of the Python standard library, but they are de facto standards in the Python scientific community (NumPy) and image processing community (PIL). Both use discontiguous memory extensively, and their needs informed the PEP. NumPy deals primarily with N-dimensional arrays, and uses a strided memory model to enable efficient slicing. PIL deals primarily with images that are often stored in an array of pointers to contiguous memory buffers.
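The strided model can be glimpsed from pure Python through memoryview, the built-in consumer of the revised buffer protocol. The following is a minimal sketch, assuming Python 3.3 or later (multi-dimensional casts arrived after 3.0). The names flat, grid, and every_other are illustrative only and are not taken from NumPy or PIL.

```python
# Sketch: one flat buffer viewed as a 3x4 grid and as a strided slice, with no copying.
flat = bytearray(range(12))                        # 12 contiguous bytes: 0, 1, ..., 11

# Same memory, described as 3 rows of 4 one-byte items.
grid = memoryview(flat).cast("B", shape=[3, 4])

# Same memory again, but stepping over every second byte.
every_other = memoryview(flat)[::2]

flat[2] = 99                                       # write through the original object

print(grid.shape, grid.strides)                    # shape and stride metadata of the grid
print(every_other.strides, every_other.tolist())   # the slice sees the new value
print(grid.tolist()[0])                            # so does the first row of the grid
```

Both views report the write to flat[2] because neither holds a copy. The stride tells a consumer how many bytes to skip to reach the next item along each dimension, which is what makes slicing cheap.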

The new buffer protocol addresses all these issues:

  • It gets rid of some little-used features, such as char-buffer and multiple segment sequences.
  • It adds a notification function for consumers to call when they are done with the memory.
  • It adds a variable that describes the structure of the memory (à la the struct module's format strings).
  • It also adds shape and stride information (for NumPy).
  • It adds a mechanism for sharing arrays that must be accessed using pointer indirection (for PIL).
  • It provides functions for copying contiguous data in and out of objects supporting the buffer interface.
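Two of the points above, the struct-style format description and the release notification, are observable from Python without writing any C. This is a hedged sketch using only the standard library's array, struct, and memoryview. The variable names (samples, payload, borrowed, blocked) are made up for illustration, and the context-manager form of memoryview assumes Python 3.2 or later.

```python
import struct
from array import array

samples = array("d", [1.0, 2.0, 3.0])     # exporter: three C doubles

with memoryview(samples) as view:         # consumer acquires the buffer
    # The exporter describes its memory: item format, item size, shape, strides.
    layout = (view.format, view.itemsize, view.shape, view.strides)
    # The format string uses struct-module syntax, so struct can interpret it.
    sizes_agree = struct.calcsize(view.format) == view.itemsize
# Leaving the with block releases the buffer.

payload = bytearray(b"abcd")
borrowed = memoryview(payload)            # payload now has an outstanding export
try:
    payload.extend(b"ef")                 # resizing would invalidate the shared memory
    blocked = False
except BufferError:
    blocked = True                        # the exporter refuses while a consumer holds on

borrowed.release()                        # the consumer says it is done
payload.extend(b"ef")                     # the exporter may now resize safely
```

The BufferError is the practical payoff of the notification function: because the exporter knows a consumer still holds its memory, it can refuse operations that would leave that consumer with a dangling pointer.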

The bottom line is that low-level libraries that need to manipulate raw memory can do so very efficiently and can interoperate with other libraries without unnecessary conversions and without putting private metadata inside the buffers. High-level applications that use these libraries should benefit from improved performance with no code changes.
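As a small illustration of interoperating without conversions, the sketch below hands an array straight to a hashing routine and then edits it through a memoryview window. It uses only standard library modules, and pixels is a hypothetical stand-in for a buffer owned by an imaging or numeric library.

```python
import hashlib
from array import array

pixels = array("B", [10, 20, 30, 40])     # stand-in for a library-owned buffer

# A consumer that understands the buffer protocol reads the memory directly.
digest_direct = hashlib.sha256(pixels).hexdigest()
# The same result via an explicit conversion, which copies the data first.
digest_copied = hashlib.sha256(bytes(pixels)).hexdigest()

# A slice of a memoryview is a window onto the same memory, not a copy.
window = memoryview(pixels)[1:3]
window[0] = 99                            # visible through the original array

print(digest_direct == digest_copied, pixels.tolist())
```

The two digests match, so the explicit bytes() conversion buys nothing except an extra copy, and the write through window lands in pixels itself.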
