Improve Python Performance with Cython


Python is my favorite language. It is simple, readable and flexible, supports many programming paradigms, and has an awesome standard library, tons of third-party packages and a great community. But it is also slow and a memory hog. I use Python for many tasks: little scripts that make me more productive and automate repetitive tasks, one-off scripts to migrate data or convert files, and even full-fledged, massive-scale distributed systems. Guess what? Performance matters when you write large-scale distributed systems.

Normally, this is not a problem because performance-critical parts can be delegated to native libraries or extensions, but sometimes your own Python code is the bottleneck. When that is the case, Python has a great interoperability story as far as C/C++ goes. This article looks into the various options.

Python Implementations

There are several Python implementations. I focus here on the primary CPython implementation. Other implementations like PyPy, IronPython and Jython have their own ways to handle performance issues, although some of them are compatible to various degrees with CPython C extensions. Those implementations are out of scope. For the rest of this article, when I write “Python” I mean “CPython”.

Python and C

Python provides a C-based API for extending and embedding. Extending means you can call C/C++ code from Python code. Embedding means you call Python code from a C/C++ program by embedding a Python interpreter in it. This API has existed since the dawn of time and gives you complete control, but it is pretty hairy, and you need a good knowledge of CPython internals to use it properly and, more importantly, to debug issues.


Cython is an optimizing static compiler that allows you to write Python C extensions and call C and C++ code in a syntax that is a superset of Python. It is now the recommended way (by Python core developers) to write Python extensions. Cython takes your Python-like code, optionally annotated with type information, and produces a high-performance extension you can use from your Python code. Let’s look at an example: computing the n-th Fibonacci number. Here is the pure Python code:

def fib_py(n):
    """Return the n-th Fibonacci number."""
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a

The Cython code is identical, except that it lives in a file called fib_cython.pyx and I renamed the function to fib_cython so I can import both without name conflicts:

def fib_cython(n):
    """Return the n-th Fibonacci number."""
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a

Finally, here is another Cython variation with type information, in a file called fib_cython_with_types.pyx. Here the variables a, b and i are declared with Cython’s ‘cdef int’, which tells Cython to treat them as C integers:

def fib_cython_with_types(n):
    """Return the n-th Fibonacci number."""
    cdef int a = 0
    cdef int b = 1
    cdef int i
    for i in range(n):
        a, b = b, a + b
    return a

Before using Cython code, you either need to build it ahead of time with some special build instructions, or take the easy route and use pyximport, as follows:

import pyximport
pyximport.install()

Now you can import .pyx Cython modules just like normal Python modules, and they will be compiled on the fly when you import them. I then imported all three functions from the Python module and the two Cython modules:

from fib_py import fib_py
from fib_cython import fib_cython
from fib_cython_with_types import fib_cython_with_types
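If you prefer to compile ahead of time instead of on import, the usual approach is a small setuptools build script that calls Cython.Build.cythonize. This is a minimal sketch, assuming setuptools and Cython are installed and that the two .pyx files sit next to the script; the project name and the language_level directive are my own choices for illustration, not part of the original example.

```python
# setup.py: a minimal ahead-of-time build script for the two Cython modules.
# Assumes setuptools and Cython are installed and that fib_cython.pyx and
# fib_cython_with_types.pyx live in the same directory as this file.
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="fib",  # hypothetical project name, used only for this example
    ext_modules=cythonize(
        ["fib_cython.pyx", "fib_cython_with_types.pyx"],
        # Compile with Python 3 semantics (an assumption; the original ran on 2.7).
        compiler_directives={"language_level": "3"},
    ),
)
```

Running the classic command python setup.py build_ext --inplace places the compiled extension modules next to the sources, where a plain import statement finds them without pyximport.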

Here is the output when importing fib_cython:

/Users/gigi/.pyxbld/temp.macosx-10.9-x86_64-2.7/pyrex/fib_cython.c:1639:28: warning: unused function '__Pyx_PyObject_AsString' [-Wunused-function]
static CYTHON_INLINE char* __Pyx_PyObject_AsString(PyObject* o) {
                           ^
/Users/gigi/.pyxbld/temp.macosx-10.9-x86_64-2.7/pyrex/fib_cython.c:1636:32: warning: unused function '__Pyx_PyUnicode_FromString' [-Wunused-function]
static CYTHON_INLINE PyObject* __Pyx_PyUnicode_FromString(const char* c_str) {
                               ^
/Users/gigi/.pyxbld/temp.macosx-10.9-x86_64-2.7/pyrex/fib_cython.c:325:29: warning: unused function '__Pyx_Py_UNICODE_strlen' [-Wunused-function]
static CYTHON_INLINE size_t __Pyx_Py_UNICODE_strlen(const Py_UNICODE *u)
                            ^
/Users/gigi/.pyxbld/temp.macosx-10.9-x86_64-2.7/pyrex/fib_cython.c:1701:26: warning: unused function '__Pyx_PyObject_IsTrue' [-Wunused-function]
static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject* x) {
                         ^
/Users/gigi/.pyxbld/temp.macosx-10.9-x86_64-2.7/pyrex/fib_cython.c:1751:33: warning: unused function '__Pyx_PyIndex_AsSsize_t' [-Wunused-function]
static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject* b) {
                                ^
/Users/gigi/.pyxbld/temp.macosx-10.9-x86_64-2.7/pyrex/fib_cython.c:1813:33: warning: unused function '__Pyx_PyInt_FromSize_t' [-Wunused-function]
static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t ival) {
                                ^
/Users/gigi/.pyxbld/temp.macosx-10.9-x86_64-2.7/pyrex/fib_cython.c:1172:32: warning: unused function '__Pyx_PyInt_From_long' [-Wunused-function]
static CYTHON_INLINE PyObject* __Pyx_PyInt_From_long(long value) {
                               ^
/Users/gigi/.pyxbld/temp.macosx-10.9-x86_64-2.7/pyrex/fib_cython.c:1223:27: warning: function '__Pyx_PyInt_As_long' is not needed and will not be emitted [-Wunneeded-internal-declaration]
static CYTHON_INLINE long __Pyx_PyInt_As_long(PyObject *x) {
                          ^
/Users/gigi/.pyxbld/temp.macosx-10.9-x86_64-2.7/pyrex/fib_cython.c:1407:26: warning: function '__Pyx_PyInt_As_int' is not needed and will not be emitted [-Wunneeded-internal-declaration]
static CYTHON_INLINE int __Pyx_PyInt_As_int(PyObject *x) {
                         ^
9 warnings generated.

The warnings are harmless and you can see that Cython has gone ahead and built a Python C extension. To test the performance of the three versions I used IPython’s %timeit magic function to compute the 100th Fibonacci number using each function and here are the results:

In [18]: %timeit fib_cython_with_types(100)
10000000 loops, best of 3: 140 ns per loop

In [19]: %timeit fib_cython(100)
100000 loops, best of 3: 2.75 µs per loop

In [20]: %timeit fib_py(100)
100000 loops, best of 3: 7.85 µs per loop

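The pure Python version took 7.85 microseconds. The Cython version with no type information took 2.75 microseconds (about 3X faster). However, the Cython version with type information took only 140 nanoseconds, which is about 56X faster than the pure Python version. Keep in mind that the typed version is not computing the same thing: a C int (32 bits on common platforms) can hold Fibonacci numbers only up to fib(46), so fib_cython_with_types(100) silently overflows and returns a wrong value, whereas the other two versions use Python’s arbitrary-precision integers. A fair comparison would use a wider type such as long long (which holds values up to fib(92)) or a smaller n.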
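Because cdef int gives the typed version fixed-width C arithmetic, its result for large n diverges from the pure Python one. The following pure Python sketch makes that visible without compiling anything by emulating 32-bit two's-complement wraparound. It assumes a 32-bit int and wrapping behavior; strictly speaking, signed overflow is undefined in C, so the compiled code is not guaranteed to produce exactly the emulated value. The helpers to_int32 and fib_int32 are hypothetical names introduced only for this illustration.

```python
def fib_py(n):
    """Return the n-th Fibonacci number using arbitrary-precision ints."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def to_int32(x):
    """Wrap x into the range of a 32-bit two's-complement signed integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def fib_int32(n):
    """Hypothetical emulation of fib_cython_with_types: additions wrap at 32 bits."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, to_int32(a + b)
    return a


INT32_MAX = 2**31 - 1

# Largest n whose Fibonacci number still fits in a 32-bit signed int.
largest_ok = max(n for n in range(100) if fib_py(n) <= INT32_MAX)
print(largest_ok)                     # 46
print(fib_py(100))                    # 354224848179261915075
print(fib_int32(100) == fib_py(100))  # False
```

Up to n = 46 both functions agree; from n = 47 on, the emulated 32-bit version returns wrapped values.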

This is, of course, just an example, but it hints at the capabilities of Cython to improve the performance of Python programs with very little effort.
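The timings above come from IPython’s %timeit magic. In a plain script, a similar measurement can be taken with the standard library’s timeit module. This is a minimal sketch that times only the pure Python function so it runs without Cython; best_time_per_call is a hypothetical helper, and to time the Cython versions you would import them after pyximport.install() and pass them in instead. Like %timeit, it keeps the best of several repeats, since the minimum is the least noisy estimate.

```python
import timeit


def fib_py(n):
    """Return the n-th Fibonacci number."""
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a


def best_time_per_call(func, arg, number=10000, repeat=3):
    """Return the best average seconds per call of func(arg) over `repeat` runs."""
    # timeit.repeat returns the total time for `number` calls, once per repeat.
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(timings) / number


per_call = best_time_per_call(fib_py, 100)
print(f"fib_py(100): {per_call * 1e6:.2f} µs per call")
```

The lambda wrapper adds a small constant call overhead, which is fine for relative comparisons between the three functions.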

Python and C++

Traditionally, C++ was very difficult to integrate with Python. Solutions like SWIG or Boost.Python were not the most user-friendly. Now there are easier solutions that take advantage of modern C++, such as pybind11.


There are many ways to optimize Python and improve its performance. Cython is very easy to use and is recommended for picking the low-hanging fruit of performance. If you want to interface with C++, there are several options as well, and if you need total control you can always use the Python C API directly. But don’t forget that premature optimization is the root of all evil. Before using any of these techniques, first verify that Python is indeed the bottleneck.
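Verifying the bottleneck usually means profiling. As a minimal sketch using the standard library’s cProfile and pstats modules, you can sort functions by cumulative time and check whether pure Python code dominates before reaching for Cython. The workload function here is a hypothetical stand-in for your real program.

```python
import cProfile
import io
import pstats


def fib_py(n):
    """Return the n-th Fibonacci number."""
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a


def workload():
    """Hypothetical stand-in for the real program being investigated."""
    return sum(fib_py(n) for n in range(500))


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Render the ten most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(10)
report = buffer.getvalue()
print(report)
```

If the report shows most of the time inside your own Python functions (here fib_py), that code is a reasonable candidate for Cython; if the time goes to I/O or native libraries, Cython will not help much.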


About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.


About Our Journalist