Dig Deep into Python Internals

This article is the first in a two-part series that will dig deep to explore the fascinating new-style Python object model, which was introduced in Python 2.2 and improved in 2.3 and 2.4. The object model and type system are very dynamic and allow quite a few interesting tricks. In this article I will describe the object model and type system; explore various entities; explain the life cycle of an object; and introduce some of the countless ways to modify and customize almost everything you thought immutable at runtime.

The Python Object Model
Python’s objects are basically a bunch of attributes. These attributes include the type of the object, fields, methods, and base classes. Attributes are also objects, accessible through their containing objects.

The built-in dir() function is your best friend when it comes to exploring Python objects. It is designed for interactive use and therefore returns a list of attributes that the implementers of the dir function thought would be relevant for interactive exploration. This output, however, is just a subset of all the attributes of the object. The code sample below shows the dir function in action. It turns out that the integer 5 has many attributes that look like mathematical operations on integers.

>>> dir(5)
['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', '__delattr__', '__div__',
 '__divmod__', '__doc__', '__float__', '__floordiv__', '__getattribute__', '__getnewargs__',
 '__hash__', '__hex__', '__init__', '__int__', '__invert__', '__long__', '__lshift__', '__mod__',
 '__mul__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__',
 '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__',
 '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__',
 '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__str__', '__sub__',
 '__truediv__', '__xor__']

The function foo has many attributes too. The most important one is __call__, which means it is a callable object. You do want to call your functions, don’t you?

>>> def foo():
...     pass
...
>>> dir(foo)
['__call__', '__class__', '__delattr__', '__dict__', '__doc__', '__get__', '__getattribute__',
 '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__',
 '__repr__', '__setattr__', '__str__', 'func_closure', 'func_code', 'func_defaults', 'func_dict',
 'func_doc', 'func_globals', 'func_name']

Next I’ll define a class called ‘A’ with two methods, __init__ and dump, and an instance field ‘x’, and then create an instance ‘a’ of this class. The dir function shows that the class’s attributes include the methods, and that the instance has all the class attributes as well as the instance field.

>>> class A(object):
...     def __init__(self):
...         self.x = 3
...     def dump(self):
...         print self.x
...
>>> dir(A)
['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__', '__hash__', '__init__',
 '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__',
 '__weakref__', 'dump']
>>> a = A()
>>> dir(a)
['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__', '__hash__', '__init__',
 '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__',
 '__weakref__', 'dump', 'x']

The Python Type System
Python has many types: many more than you find in most languages, at least explicitly. This means that the interpreter has a lot of information available at runtime, and the programmer can take advantage of it by manipulating types at runtime. Most types are defined in the types module, which is shown in the code immediately below. Types come in various flavors: there are built-in types, new-style classes (derived from object), and old-style classes (pre-Python 2.2). I will not discuss old-style classes since they are frowned upon by everybody and exist only for backward compatibility.

>>> import types
>>> dir(types)
['BooleanType', 'BufferType', 'BuiltinFunctionType', 'BuiltinMethodType', 'ClassType', 'CodeType',
 'ComplexType', 'DictProxyType', 'DictType', 'DictionaryType', 'EllipsisType', 'FileType',
 'FloatType', 'FrameType', 'FunctionType', 'GeneratorType', 'InstanceType', 'IntType', 'LambdaType',
 'ListType', 'LongType', 'MethodType', 'ModuleType', 'NoneType', 'NotImplementedType', 'ObjectType',
 'SliceType', 'StringType', 'StringTypes', 'TracebackType', 'TupleType', 'TypeType',
 'UnboundMethodType', 'UnicodeType', 'XRangeType', '__builtins__', '__doc__', '__file__', '__name__']

Python’s type system is object-oriented. Every type (including built-in types) is derived (directly or indirectly) from object. Another interesting fact is that types, classes, and functions are all first-class citizens and have a type themselves. Before I delve into some juicy demonstrations, let me introduce the built-in function ‘type‘. This function returns the type of any object (and also serves as a type factory). Most of these types are listed in the types module, and some of them have a short name. Below I’ve unleashed the ‘type‘ function on several objects: None, the integer 5, a list, the list type, type itself, object, and even the ‘types’ module. As you can see, the type of all types (the list type, object, and type itself) is ‘type‘, or by its full name, types.TypeType (no kidding, that’s the name of the type).

>>> type(None)
<type 'NoneType'>
>>> type(5)
<type 'int'>
>>> x = [1,2,3]
>>> type(x)
<type 'list'>
>>> type(list)
<type 'type'>
>>> type(type)
<type 'type'>
>>> type(object)
<type 'type'>
>>>
>>> import types
>>> type(types)
<type 'module'>
>>> type == types.TypeType
True

What is the type of classes and instances? Well, classes are types of course, so their type is always ‘type‘ (regardless of inheritance). The type of class instances is their class.

>>> class A(object):
...     pass
...
>>> a = A()
>>> type(A)
<type 'type'>
>>> type(a)
<class '__main__.A'>
>>> a.__class__
<class '__main__.A'>

It’s time for the scary part: a vicious cycle. ‘type‘ is the type of object, but object is the base class of type. Come again? ‘type‘ is the type of object, but object is the base class of type. That’s right: a circular dependency. ‘object’ is a ‘type’ and ‘type’ is an ‘object’.

>>> type(object)
<type 'type'>
>>> type.__bases__
(<type 'object'>,)
>>> object.__bases__
()

How can it be? Well, since the core entities in Python are not themselves implemented in Python (there is PyPy, but that’s another story), this is not really an issue: ‘object’ and ‘type’ are not actually implemented in terms of each other.

The one important thing to take home from this is that types are objects and are therefore subject to all the ramifications thereof. I’ll discuss those ramifications very shortly.

Instances, Classes, Class Factories, and Metaclasses
When I talk about instances I mean object instances of a class derived from object (or of the object class itself). A class is a type, but as you recall it is also an object (of type ‘type’). This allows classes to be created and manipulated at runtime. The code below demonstrates how to create a class at runtime and instantiate it.

def init_method(self, x, y):
    self.x = x
    self.y = y

def dumpSum_method(self):
    print self.x + self.y

D = type('DynamicClass',
         (object,),
         {'__init__': init_method, 'dumpSum': dumpSum_method})

d = D(3, 4)
d.dumpSum()

As you can see I created two functions (init_method and dumpSum_method) and then invoked the ubiquitous ‘type’ function as a class factory to create a class called ‘DynamicClass,’ which is derived from ‘object’ and has two methods (one is the __init__ constructor).

It is pretty simple to create the functions themselves on the fly too. Note that the methods I attached to the class are regular functions that can be called directly, provided their self argument has x and y members (rather like C++ template arguments), as the snippet below shows.
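For instance, reusing dumpSum_method from the sample above, you can call it with any object that has x and y fields. The Holder class here is just a hypothetical stand-in of my own, not part of the original listing:

class Holder(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

# dumpSum_method is a plain function; no instance of D is required
dumpSum_method(Holder(10, 20))   # prints 30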

Functions, Methods and other Callables
Python enjoys a plethora of callable objects. Callable objects are function-like objects that can be invoked by calling their () operator. Callable objects include plain functions (module-level), methods (bound, unbound, static, and class methods) and any other object that has a __call__ function attribute (either in its own dictionary, via one of its ancestors, or through a descriptor).

It’s truly complicated, so the bottom line to remember is that all these flavors of callables eventually boil down to a plain function. For example, in the code below the class A defines a method named ‘foo’ that can be accessed through:

  1. an instance, in which case it is a bound method (bound implicitly to its instance)
  2. the class A itself, in which case it is an unbound method (the instance must be supplied explicitly)
  3. A’s dictionary directly, in which case it is a plain function (but you must still call it with an instance of A).

So, all methods are actually functions, but the runtime assigns different types depending on how you access them.

class A(object):
    def foo(self):
        print 'I am foo'

>>> a = A()
>>> a.foo
<bound method A.foo of <__main__.A object at 0x00A03EB0>>
>>> A.foo
<unbound method A.foo>
>>> A.__dict__['foo']
<function foo at 0x00A0A170>
>>> a.foo()
I am foo
>>> A.foo(a)
I am foo
>>> A.__dict__['foo'](a)
I am foo

Let’s talk about static methods and class methods. Static methods are very simple. They are similar to static methods in Java/C++/C#. They are scoped by their class but they don’t have a special first argument like instance methods or class methods do; they act just like a regular function (you must provide all the arguments since they can’t access any instance fields). Static methods are not so useful in Python because regular module-level functions are already scoped by their module and they are the natural mapping to static methods in Java/C++/C#.

Class methods are a more exotic animal. Their first argument is the class itself (traditionally named cls) and they are used primarily in esoteric scenarios. Static and class methods actually return a wrapper around the original function object. In the code that follows, note that the static method may be accessed either through an instance or through the class. The class method receives the class itself as its first argument, but it is invoked through the class directly (no explicit class argument). This is different from an unbound method, where you have to provide an instance explicitly as the first argument.

class A(object):
    def foo():
        print 'I am foo'
    def foo2(cls):
        print 'I am foo2', cls
    def foo3(self):
        print 'I am foo3', self

    foo = staticmethod(foo)
    foo2 = classmethod(foo2)

>>> a = A()
>>> a.foo()
I am foo
>>> A.foo()
I am foo
>>> A.foo2()
I am foo2 <class '__main__.A'>
>>> a.foo3()
I am foo3 <__main__.A object at 0x00A1AA10>

Note that classes are callable objects by themselves and operate as instance factories. When you “call” a class you get an instance of that class as a result.

A different kind of callable object is an object that has a __call__ method. If you want to pass around a function-like object with its context intact, __call__ can be a good thing. Listing 1 features a simple ‘add’ function that can be replaced with a caching adder class that stores the results of previous calculations. First, notice that the test function expects a function-like object called ‘add’ and just invokes it as a function. The ‘test’ function is called twice: once with a simple function and a second time with the caching adder instance. Continuations in Python can also be implemented using __call__, but that’s another article.
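Listing 1 isn’t reproduced here, but a minimal sketch of the idea might look like this (the class and function names are my own):

def add(x, y):
    return x + y

class CachingAdder(object):
    # a callable object that remembers results of previous additions
    def __init__(self):
        self.cache = {}
    def __call__(self, x, y):
        if (x, y) not in self.cache:
            self.cache[(x, y)] = x + y
        return self.cache[(x, y)]

def test(add):
    # 'add' may be a plain function or any other callable
    print add(2, 3)
    print add(2, 3)

test(add)               # plain function
test(CachingAdder())    # callable instance that carries its cache around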

Metaclasses
A metaclass is a concept that doesn’t exist in today’s mainstream programming languages. A metaclass is a class whose instances are classes. You have already encountered a metaclass in this article: ‘type’. When you invoke ‘type’ with a class name, a tuple of base classes, and an attribute dictionary, it creates a new user-defined class of the specified type. So the __class__ attribute of every class always contains its metaclass (normally ‘type’).

That’s nice, but what can you do with a metaclass? It turns out, you can do plenty. Metaclasses allow you to control everything about the class that will be created: name, base classes, methods, and fields. How is it different from simply defining any class you want or even creating a class dynamically on the fly? Well, it allows you to intercept the creation of classes that are predefined as in aspect-oriented programming. This is a killer feature that I’ll be discussing in a follow-up to this article.

After a class is defined, the interpreter looks for a metaclass. If it finds one, it invokes the metaclass with the class name, base classes, and attribute dictionary (running the metaclass’s __new__ and __init__), and the metaclass gets a stab at modifying the class (or returning a completely different class). The interpreter will use the class object returned from the metaclass to create instances of this class.
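Here is a rough sketch of that interception (my own toy example, not one of the article’s listings): a metaclass whose __init__ injects a describe() method into every class it creates:

class AutoDescribe(type):
    def __init__(cls, name, bases, attrs):
        super(AutoDescribe, cls).__init__(name, bases, attrs)
        def describe(self):
            return 'instance of %s: %s' % (name, self.__dict__)
        cls.describe = describe   # every class created by this metaclass gets describe()

class Point(object):
    __metaclass__ = AutoDescribe
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print p.describe()   # e.g. instance of Point: {'y': 2, 'x': 1}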

So, how do you stick a custom metaclass on a class (new-style classes only)? Either you declare a __metaclass__ field or one of your ancestors has a __metaclass__ field. The inheritance case is intriguing because Python allows multiple inheritance. If you inherit from two classes that have custom metaclasses, you are in for a treat: one of the metaclasses must derive from the other. The actual metaclass of your class will be the most derived metaclass:

class M1(type): pass

class M2(M1): pass

class C2(object):
    __metaclass__ = M2

class C1(object):
    __metaclass__ = M1

class C3(C1, C2): pass

classes = [C1, C2, C3]
for c in classes:
    print c, c.__class__
    print '------------'

Output:

<class '__main__.C1'> <class '__main__.M1'>
------------
<class '__main__.C2'> <class '__main__.M2'>
------------
<class '__main__.C3'> <class '__main__.M2'>
------------

Day In The Life of a Python Object
To get a feel for all the dynamics involved in using Python objects, let’s track a plain object (no tricks) from its class definition, through class instantiation and attribute access, and finally to its demise. Later on I’ll introduce the hooks that allow you to control and modify this workflow.

The best way to go about it is with a monstrous simulation. Listing 2 contains a simulation of a bunch of monsters chasing and eating some poor person. There are three classes involved: a base Monster class, a MurderousHorror class that inherits from the Monster base class, and a Person class that gets to be the victim. I will concentrate on the MurderousHorror class and its instances.

Class Definition
MurderousHorror inherits the ‘frighten‘ and ‘eat‘ methods from Monster and adds a ‘chase‘ method and a ‘speed’ field. The ‘hungry_monsters’ class field stores a list of all the hungry monsters and is always available through the class, base class, or instance (Monster.hungry_monsters, MurderousHorror.hungry_monsters, or m1.hungry_monsters). In the code below you can see (via the handy ‘dir‘ function) the MurderousHorror class and its m1 instance. Note that methods such as ‘eat,’ ‘frighten,’ and ‘chase‘ appear in both, but instance fields such as ‘hungry’ and ‘speed’ appear only in m1. The reason is that instance methods can be accessed through the class as unbound methods, but instance fields can be accessed only through an instance.
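Listing 2 isn’t reproduced here, but based on that description a minimal sketch of the two monster classes might look like this (the method bodies are placeholders of my own):

class Monster(object):
    hungry_monsters = []   # class field, shared by all monsters

    def __init__(self):
        self.hungry = True
        Monster.hungry_monsters.append(self)

    def frighten(self, person):
        print 'Boo!'

    def eat(self, person):
        self.hungry = False
        Monster.hungry_monsters.remove(self)

class MurderousHorror(Monster):
    def __init__(self, speed):
        Monster.__init__(self)
        self.speed = speed

    def chase(self, person):
        print 'Chasing at speed', self.speed

m1 = MurderousHorror(30)
print 'chase' in dir(MurderousHorror)   # True: methods show up on the class
print 'speed' in dir(MurderousHorror)   # False: instance fields do not
print 'speed' in dir(m1)                # True: they appear only on the instance

The NoInit class in the next sample is unrelated to the monsters; it comes into play shortly, in the discussion of instantiation below.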

class NoInit(object):
    def foo(self):
        self.x = 5

    def bar(self):
        print self.x

if __name__ == '__main__':
    ni = NoInit()
    assert(not ni.__dict__.has_key('x'))
    try:
        ni.bar()
    except AttributeError, e:
        print e
    ni.foo()
    assert(ni.__dict__.has_key('x'))
    ni.bar()

Output:

'NoInit' object has no attribute 'x'
5

Object Instantiation and Initialization
Instantiation in Python is a two-phase process. First, __new__ is called with the class as its first argument (followed by the rest of the arguments passed to the class call); it should return an uninitialized instance of the class. Afterward, __init__ is called with the new instance as its first argument. (You can read more about __new__ in the Python reference manual.)
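A minimal sketch of this two-phase protocol (my own example, not part of the monster simulation):

class Two(object):
    def __new__(cls, *args):
        print '__new__ called with', cls, args
        # delegate the actual allocation to object.__new__
        return super(Two, cls).__new__(cls)

    def __init__(self, value):
        print '__init__ called with', value
        self.value = value

t = Two(42)
# __new__ called with <class '__main__.Two'> (42,)
# __init__ called with 42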

When a MurderousHorror is instantiated, its __init__ method is called. __init__ is similar to a constructor in C++/Java/C#. MurderousHorror’s __init__ calls the Monster base class’s __init__ and then initializes its speed field. The difference between Python and C++/Java/C# is that in Python there is no notion of a parameter-less default constructor, which, in other languages, is automatically generated for every class that doesn’t have one. Also, there is no automatic call to the base class’s default __init__ if the derived class doesn’t call it explicitly. This is quite understandable, since no default __init__ is generated.

In C++/Java/C# you declare instance variables in the class body. In Python you create them inside a method by explicitly assigning to ‘self.SomeAttribute’. So, if a class has no __init__ method, its instances start out with no instance fields at all. That’s right: they don’t HAVE any instance fields, not even uninitialized ones.

The previous code sample (above) is a perfect example of this phenomenon. The NoInit class has no __init__ method. The x field is created (put into the instance’s __dict__) only when foo() is called. When the program calls ni.bar() immediately after instantiation, the ‘x’ attribute is not there yet, so I get an AttributeError exception. Because my code is robust, fault tolerant, and self-healing (in carefully staged toy programs), it bravely recovers and continues to the horizon by calling foo(), thus creating the ‘x’ attribute, and ni.bar() can then print 5 successfully.

Note that in Python __init__ is not much more than a regular method. It is indeed called on instantiation, but you are free to call it again after initialization, and you may call other __init__ methods on the same object from the original __init__. This last capability is also available in C#, where it is called constructor chaining. It is useful when you have multiple constructors that share common initialization code that lives in one of them. In that case you don’t need to define another special method that contains the common code and call it from all the constructors/initializers; you can just call the shared constructor/initializer directly from all of them.
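For instance, re-calling __init__ on an existing instance is perfectly legal (a minimal sketch of my own):

class Counter(object):
    def __init__(self, start=0):
        self.count = start

c = Counter(10)
print c.count      # 10
c.__init__(0)      # re-initialize the same instance; __init__ is just a method
print c.count      # 0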

Attribute Access
An attribute is an object that can be accessed from its host using the dot notation. There is no difference at the attribute access level between methods and fields. Methods are first-class citizens in Python. When you invoke a method of an object, the method object is looked up first using the same mechanism as a non-callable field. Then the () operator is applied to the returned object. This example demonstrates this two-step process:

class A(object):
    def foo(self):
        print 3

if __name__ == '__main__':
    a = A()
    f = a.foo
    print f
    print f.im_self
    a.foo()
    f()

Output:

<bound method A.foo of <__main__.A object at 0x00A03EB0>>
<__main__.A object at 0x00A03EB0>
3
3

The code retrieves the a.foo bound method object and assigns it to the local variable ‘f’. ‘f’ is a bound method object, which means its im_self attribute points to the instance to which it is bound. Finally, a.foo is invoked through the instance (a.foo()) and by calling f directly, with identical results. Assigning bound methods to local variables is a well-known optimization technique, due to the high cost of attribute lookup. If you have a piece of Python code that seems to perform under the weather, there is a good chance you can find a tight loop that does a lot of redundant lookups. I will talk later about all the ways you can customize the attribute access process and why it is so costly.
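For example, hoisting a bound method out of a tight loop is a common form of this optimization (my own illustration):

class Accumulator(object):
    def __init__(self):
        self.items = []
    def add(self, value):
        self.items.append(value)

acc = Accumulator()

# naive loop: 'acc.add' is looked up on every iteration
for i in xrange(100000):
    acc.add(i)

# hoisting the bound method out of the loop avoids the repeated lookup
add = acc.add
for i in xrange(100000):
    add(i)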

Destruction
The __del__ method is called when an instance is about to be destroyed (when its reference count reaches 0). It is not guaranteed that the method will ever be called, in situations such as circular references between objects or a reference to the object held by an exception. Also, the implementation of __del__ may create a new reference to its instance, so the instance will not be destroyed after all. Even when everything is simple and __del__ is called, there is no telling when it will actually happen, due to the nature of the garbage collector. The bottom line: if you need to free some scarce resource attached to an object, do it explicitly when you are done using it; don’t wait for __del__.

A try-finally block is a popular choice for this kind of cleanup, since it guarantees the resource will be released even in the face of exceptions. The last reason not to use __del__ is that its interaction with the ‘del’ built-in function may confuse programmers. ‘del’ simply removes a reference (decrementing the reference count by 1); it doesn’t call ‘__del__‘ directly or cause the object to be magically destroyed. In the next code sample I use the sys.getrefcount() function to determine the reference count of an object before and after calling ‘del’. Note that I subtract 1 from the sys.getrefcount() result because it also counts the temporary reference to its own argument.

import sys

class A(object):
    def __del__(self):
        print "That's it for me"

if __name__ == '__main__':
    a = A()
    b = a
    print sys.getrefcount(a)-1
    del b
    print sys.getrefcount(a)-1

Output:

2
1
That's it for me
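For the try-finally cleanup mentioned above, a minimal sketch (the file name and the process() helper are placeholders of my own):

def process(text):
    print len(text)   # placeholder for real work

f = open('data.txt')
try:
    process(f.read())
finally:
    f.close()   # released deterministically, even if process() raises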

Hacking Python
Let the games begin. In this section I will explore different ways to customize attribute access. The topics include the __getattr__, __setattr__, and __getattribute__ hooks; descriptors; and properties.

__getattr__, __setattr__ and __getattribute__

These special methods control attribute access on class instances. The standard algorithm for attribute lookup returns an attribute from the instance dictionary, the class dictionary, or one of the base classes’ dictionaries (descriptors will be described in the next section). These hook methods are supposed to return an attribute object or raise an AttributeError exception. If you define some of them in your class, they will be called during attribute access under the following conditions:

  • __getattr__ and __setattr__ work with old-style and new-style classes.
  • __getattr__ is called to get attributes that cannot be found using the standard attribute lookup algorithm.
  • __setattr__ is called for setting the value of any attribute. This asymmetry is necessary to allow adding new attributes to instances.
  • __getattribute__ works for new-style classes only. It is called to get any attribute (existing or non-existing). __getattribute__ has precedence over __getattr__, so if you define both, __getattr__ will not be called (unless __getattribute__ raises an AttributeError exception).

Listing 3 is an interactive example. It is designed to let you play around with it and comment out various functions to see the effect. It introduces a class A with a single ‘x’ attribute and with __getattr__, __setattr__, and __getattribute__ methods. __getattribute__ and __setattr__ simply forward any attribute access to the default behavior (look up or set the value in the dictionary), while __getattr__ always returns 7. The main program starts by assigning 6 to the non-existing attribute ‘y’ (which happens via __setattr__) and then prints the preexisting ‘x’, the newly created ‘y’, and the still non-existent ‘z’. ‘x’ and ‘y’ exist now, so they are accessible via __getattribute__. ‘z’ doesn’t exist, so __getattribute__ fails and __getattr__ gets called and returns 7. (Author’s Note: This is contrary to the documentation. The documentation claims that if __getattribute__ is defined, __getattr__ will never be called, but this is not the actual behavior.)
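Listing 3 isn’t reproduced here, but a minimal sketch along the lines of that description might look like this (the initial value of ‘x’ is my guess):

class A(object):
    def __init__(self):
        self.x = 3

    def __getattr__(self, name):
        # called only when normal lookup (via __getattribute__) fails
        print '__getattr__', name
        return 7

    def __setattr__(self, name, value):
        # forward to the default behavior: set the value in the dictionary
        print '__setattr__', name, value
        super(A, self).__setattr__(name, value)

    def __getattribute__(self, name):
        # called for every attribute read, existing or not
        print '__getattribute__', name
        return super(A, self).__getattribute__(name)

if __name__ == '__main__':
    a = A()
    a.y = 6
    print a.x, a.y, a.z   # 3, 6, and 7 (the last one comes from __getattr__)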

Descriptors
A descriptor is an object that implements three methods: __get__, __set__, and __delete__. If you put such a descriptor in the __dict__ of a class, then whenever the attribute with the descriptor’s name is accessed, one of the special methods is executed according to the access type (__get__ for read, __set__ for write, and __delete__ for delete). This simple indirection scheme allows total control over attribute access.

The following code sample shows a silly write-only descriptor used to store passwords. Its value may be neither read nor deleted (both attempts raise an AttributeError exception). Of course, the descriptor object itself and the password can still be accessed directly through A.__dict__['password'].

class WriteOnlyDescriptor(object):
    def __init__(self):
        self.store = {}
    def __get__(self, obj, objtype=None):
        raise AttributeError
    def __set__(self, obj, val):
        self.store[obj] = val
    def __delete__(self, obj):
        raise AttributeError

class A(object):
    password = WriteOnlyDescriptor()

if __name__ == '__main__':
    a = A()
    try:
        print a.password
    except AttributeError, e:
        print e.__doc__
    a.password = 'secret'
    print A.__dict__['password'].store[a]

Descriptors that have both __get__ and __set__ methods are called data descriptors. In general, data descriptors take lookup precedence over instance dictionaries, which in turn take precedence over non-data descriptors. If you assign a value to an instance attribute backed by a non-data descriptor, the new value simply lands in the instance dictionary and shadows the descriptor. However, if you assign a value to an attribute backed by a data descriptor, the descriptor’s __set__ method is called.
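A minimal sketch of these precedence rules (my own example):

class NonData(object):
    def __get__(self, obj, objtype=None):
        return 'from non-data descriptor'

class Data(object):
    def __get__(self, obj, objtype=None):
        return 'from data descriptor'
    def __set__(self, obj, val):
        print 'data descriptor __set__ called'

class A(object):
    nd = NonData()
    d = Data()

a = A()
print a.nd    # from non-data descriptor
a.nd = 42     # goes into the instance dictionary and shadows the descriptor
print a.nd    # 42
print a.d     # from data descriptor
a.d = 42      # intercepted by the data descriptor's __set__
print a.d     # still from data descriptor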

Properties
Properties are managed attributes. When you define a property you can provide get, set, and del functions, as well as a doc string. When the attribute is accessed, the corresponding function is called. This sounds a lot like descriptors, and indeed it is mostly syntactic sugar for that common case.

This final code sample is another version of the silly password store, using properties. The __password field is “private.” Class A has a ‘password’ property that, when accessed as in ‘a.password,’ invokes the getPassword or setPassword methods. Because the getPassword method raises an AttributeError exception, the only way to get to the actual value of the __password attribute is by circumventing Python’s fake privacy mechanism: prefixing the attribute name with an underscore and the class name, as in a._A__password. How is this different from descriptors? Properties are less powerful and flexible, but more pleasing to the eye. With descriptors you must define an external descriptor class, which means you can reuse the same descriptor for different classes and can also replace regular attributes with descriptors at runtime.

class A(object):
    def __init__(self):
        self.__password = None
    def getPassword(self):
        raise AttributeError
    def setPassword(self, password):
        self.__password = password
    password = property(getPassword, setPassword)

if __name__ == '__main__':
    a = A()
    try:
        print a.password
    except AttributeError, e:
        print e.__doc__
    a.password = 'secret'
    print a._A__password

Output:

Attribute not found.
secret

Properties are also more cohesive: the get and set functions are usually methods of the same class that contains the property definition. Programmers coming from languages such as C# or Delphi will feel right at home with properties (too bad Java is still sticking to its verbose JavaBeans).

Python’s Richness: a Mixed Blessing
There are many mechanisms for controlling attribute access at runtime, starting with plain dynamic replacement of attributes in the __dict__ at runtime. Other methods include the __getattr__/__setattr__ hooks, descriptors, and finally properties. This richness is a mixed blessing. It gives you a lot of choice, which is good because you can choose whatever is appropriate to your case. But it is also bad because you HAVE to choose, even if you just choose to ignore it. The assumption, for better or worse, is that people who work at this level should be able to handle the mental load.

In my next article, I will pick up where I’ve left off. I’ll begin by contrasting metaclasses with decorators, then explore the Python execution model, and explain how to examine stack frames at runtime. Finally, I’ll demonstrate how to augment the Python language itself using these techniques. I’ll introduce a private access checking feature that can be enforced at runtime.
