Dig Deep into Python Internals, Part 2: Page 2

Advanced techniques such as metaclasses, code injection, and call-stack walking harden Python for the enterprise. One novel use of Python's dynamic nature allows you to add private code access checking. Follow along to learn how.


Hardening Python
Python, being the free spirit that it is, has no real access-checking mechanism (private, public, protected, package, etc.). Any variable and any function can be accessed from any piece of code, as long as you qualify it properly. Python does provide a sort of name-hiding feature for class attributes: attributes whose names start with two underscores (e.g., __blabla) and end with at most one underscore (__blabla_ is OK, __blabla__ is not) are implicitly prefixed with an underscore and the class name. So __blabla becomes _classname__blabla (where classname is the actual name of the class). Code inside the class can access the attribute using the short name (__blabla), but external code has to use the full name (_classname__blabla).
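The mangling rule just described fits in a few lines. The helper below, `mangle`, is a hypothetical function written for illustration (Python does this inside its compiler and exposes no such function). It also encodes one detail of CPython's rule that the paragraph doesn't mention: leading underscores are stripped from the class name before it is used as a prefix.

```python
def mangle(class_name, name):
    """Hypothetical helper: return the name Python stores for `name`
    when it appears inside the body of class `class_name`."""
    # Only names with two leading underscores and at most one trailing
    # underscore are mangled; __dunder__ names are left alone.
    if not name.startswith('__') or name.endswith('__'):
        return name
    # CPython strips leading underscores from the class name first.
    stripped = class_name.lstrip('_')
    if not stripped:  # a class named only with underscores: no mangling
        return name
    return '_' + stripped + name


class Puritan(object):
    __classPrivate = 3


print(mangle('Puritan', '__blabla'))    # _Puritan__blabla
print(mangle('Puritan', '__blabla_'))   # _Puritan__blabla_
print(mangle('Puritan', '__blabla__'))  # __blabla__
print(mangle('Puritan', '__classPrivate') in vars(Puritan))  # True
```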

The Puritan class in the example code below declares two "private" variables and one "non-private" one. Note that the dump() method can access both private variables by their short names, while external code must prefix the attribute names with '_Puritan'.

```python
class Puritan(object):
    __classPrivate = 3         # stored as _Puritan__classPrivate
    __notPrivate__ = 5         # ends with two underscores: not mangled

    def __init__(self):
        self.__instancePrivate = 4   # stored as _Puritan__instancePrivate

    def dump(self):
        # Inside the class body the short names work.
        print(Puritan.__classPrivate)
        print(self.__instancePrivate)


if __name__ == '__main__':
    p = Puritan()
    # Outside the class the short names are not found...
    try:
        print(Puritan.__classPrivate)
    except AttributeError as e:
        print(e)
    try:
        print(p.__instancePrivate)
    except AttributeError as e:
        print(e)
    p.dump()
    print(Puritan.__notPrivate__)
    # ...but the mangled names are.
    print(Puritan._Puritan__classPrivate)
    print(p._Puritan__instancePrivate)
```

Output:

```
type object 'Puritan' has no attribute '__classPrivate'
'Puritan' object has no attribute '__instancePrivate'
3
4
5
3
4
```

Another way to get at "private" attributes is through the object's __dict__, where they are stored under their mangled names. This name-mangling technique hasn't been popular in the Python community. The most common practice is to prefix private attributes with a single leading underscore, as in _private. A leading underscore does have one concrete effect: 'from m import *' will not import names (classes, functions, variables, etc.) that begin with an underscore, unless the module explicitly lists them in __all__. Anyway, none of these semi-formal schemes really enforce access verification, and they are easy to circumvent. The question is: how important is real access verification? The answer is that it's becoming more and more important for large systems.
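Both loopholes are easy to demonstrate. The sketch below reads the mangled keys straight out of the class and instance dictionaries, then simulates a star-import from a throwaway in-memory module (`plugin_m` is a hypothetical name) into a fresh namespace, so that nothing leaks into the script's own globals.

```python
import sys
import types


class Puritan(object):
    __classPrivate = 3

    def __init__(self):
        self.__instancePrivate = 4


p = Puritan()
# Outside a class body nothing is mangled, so spell out the stored keys.
class_value = Puritan.__dict__['_Puritan__classPrivate']
instance_value = p.__dict__['_Puritan__instancePrivate']
print('%d %d' % (class_value, instance_value))   # 3 4

# Build a hypothetical module in memory and register it so it can be imported.
m = types.ModuleType('plugin_m')
m.public = 1
m._private = 2
sys.modules['plugin_m'] = m

# Star-import into a fresh namespace: names starting with '_' are skipped.
namespace = {}
exec('from plugin_m import *', namespace)
print(sorted(k for k in namespace if not k.startswith('__')))   # ['public']
```

The single underscore only affects star-imports: `plugin_m._private` is still one attribute lookup away.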



Python, unlike most other dynamic languages, is being used to develop enterprise-grade systems. Bugs in such systems are notoriously expensive, especially when they are discovered late in the development cycle, so anything that helps reduce their number is welcome. In a large team of developers there will inevitably be someone who likes shortcuts and will call a private method "temporarily," potentially ruining the integrity of the system. Access verification may also matter when your system exposes a Python API and loads plugins written by third parties; in that case you are potentially exposed to both clumsy and malevolent individuals. The trend of scaling Python up to ever larger systems is also evident in the quest for optional static typing in Python by Guido van Rossum, Python's creator, and others.

Let's assume I've convinced you that Python needs code access verification. What can you do about it? It turns out there is plenty you can do. You could rename all the private attributes in your code, and in the libraries you use, to the double-underscore style. Then you could review your code and make sure nobody accesses anything private. When you get tired, you could write a little program to do it for you, and run it periodically to scan your code for violations. This approach may turn out to be too tedious and error-prone. Also, static analysis can't tell you much about code that contains eval(), exec(), and friends, or that uses various dynamic code-modification tricks.
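The "little program" mentioned above might start out like the following sketch, built on the standard `ast` module. `find_private_accesses` is a hypothetical name, and its rule is an assumption: it flags any attribute that starts with an underscore, is not a `__dunder__` name, and is accessed through something other than `self` or `cls`. As the paragraph warns, it can't see through `eval()`, `exec()`, or `getattr(obj, '_name')`.

```python
import ast


def find_private_accesses(source):
    """Return sorted (line, attribute) pairs for attribute accesses that
    look private and are not made through 'self' or 'cls'."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Attribute):
            continue
        name = node.attr
        # Leading underscore, but not a __dunder__ name such as __class__.
        looks_private = name.startswith('_') and not name.endswith('__')
        # Accesses through self/cls are assumed to be the class's own code.
        through_self = (isinstance(node.value, ast.Name)
                        and node.value.id in ('self', 'cls'))
        if looks_private and not through_self:
            hits.append((node.lineno, name))
    return sorted(hits)


SAMPLE = '''
class Puritan(object):
    def dump(self):
        print(self.__instancePrivate)

p = Puritan()
print(p._Puritan__instancePrivate)
print(p._helper)
print(p.__class__)
'''

for line, attr in find_private_accesses(SAMPLE):
    print('line %d: %s' % (line, attr))
# line 7: _Puritan__instancePrivate
# line 8: _helper
```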

The solution I'll present is based on access checks performed at runtime on class attributes. Whenever a private attribute is accessed, some mysterious magic will check the caller, and if the caller doesn't belong to the same class, an exception will be raised.
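To make the idea concrete, here is a deliberately simplified sketch of such a runtime check. It is not the author's implementation. `private`, `AccessError`, and `Account` are hypothetical names, and it uses `sys._getframe()` and frame attributes, which are covered in the next section. It assumes that "belongs to the same class" can be approximated as "the caller's frame has a local named `self` that is an instance of the callee's class."

```python
import functools
import sys


class AccessError(Exception):
    """Hypothetical exception raised when a private method is called from outside."""


def private(method):
    """Hypothetical decorator: allow the call only if the calling frame has a
    local named 'self' that is an instance of the callee's class."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        caller = sys._getframe(1)                  # frame that called wrapper()
        caller_self = caller.f_locals.get('self')  # None at module level
        if not isinstance(caller_self, type(self)):
            raise AccessError('%s() is private to %s'
                              % (method.__name__, type(self).__name__))
        return method(self, *args, **kwargs)
    return wrapper


class Account(object):
    @private
    def _audit(self):
        return 'audited'

    def close(self):
        return self._audit()        # allowed: the caller's self is an Account


acct = Account()
print(acct.close())                 # audited
try:
    acct._audit()                   # called from module level: no 'self' here
except AccessError as e:
    print(e)                        # _audit() is private to Account
```

The reliance on the name `self` is the weak spot. Any code that binds a local called `self` to an `Account` gets through, which is why a sturdier solution needs more than frame peeking alone.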

Functions, Code Objects and Frames
Before you dive into peeking and poking at the call stack, let's dust off some basic concepts. When you write a function 'foo', you type the arguments and the code that operates on them (and possibly on the environment), and you decide whether or not foo() returns a result. When the module containing 'foo' is compiled (the result may be cached in a .pyc file), Python compiles the function's body into a code object that contains a bunch of metadata as well as bytecode that the Python virtual machine can execute. When the module is loaded and the def statement runs, Python creates a function object that holds a different set of metadata plus a reference to the code object, and puts it in the module's global dictionary. At runtime, each time 'foo' executes, a frame object is created and put at the top of the call stack. This frame object has yet another set of metadata and a reference to the same code object referenced by 'foo'.
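The relationship between the three objects can be checked directly. In this small sketch the running frame of `foo` and the function object `foo` point to the very same code object. Note that Python 2 spelled the function's attribute `func_code`; `__code__` works in Python 2.6+ and Python 3.

```python
import sys


def foo(x):
    frame = sys._getframe()            # the frame executing this call of foo
    try:
        return (frame.f_code is foo.__code__,  # frame and function share one code object
                frame.f_code.co_name,          # name recorded at compile time
                frame.f_locals['x'])           # run-time value of the argument
    finally:
        del frame                      # break the frame -> locals -> frame cycle


print(foo(42))                         # (True, 'foo', 42)
print(foo.__code__.co_varnames)        # ('x', 'frame')
```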

Listing 1 is a miniature tour de force of this confusing compile-time/runtime code management. The 'dumpObject' helper function accepts an object and a regular-expression filter. It traverses the object's attributes and prints the name and value (obtained by eval()uating it) of each attribute that matches the filter. This is convenient for exploring the relevant attributes of function, code, and frame objects, since their attributes have distinctive prefixes (func_, co_, and f_). The 'a' function gets the current frame object using sys._getframe() and then calls dumpObject three times (for the frame object, for the code object, and for itself, the 'a' function object), each time with the corresponding regular-expression filter. The output is somewhat censored: I removed the frame's builtins attribute because it was too big, and the code object's co_code attribute (the bytecode) because it was unprintable binary goo. As you can see, the frame object and the function object share the same code object (at 009AF620).
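Listing 1 itself is not reproduced on this page, so the following is only a rough approximation of the idea, with hypothetical names (`dump_object`, `a`). It fetches values with `getattr()` instead of `eval()`, and it returns what it found so the shared code object can be checked. One adjustment for Python 3: the `func_` attributes were renamed to dunder names (`func_code` became `__code__`, `func_name` became `__name__`), so the function filter matches those instead.

```python
import re
import sys


def dump_object(obj, pattern):
    """Print and return the attributes of obj whose names match pattern.
    Uses getattr() rather than eval() to fetch each value."""
    found = {}
    for name in sorted(dir(obj)):
        if re.match(pattern, name):
            found[name] = getattr(obj, name)
            print('%s = %r' % (name, found[name]))
    return found


def a():
    frame = sys._getframe()
    try:
        return (dump_object(frame, r'f_(code|lineno)$'),
                dump_object(frame.f_code, r'co_(name|argcount|filename)$'),
                dump_object(a, r'__(name|code)__$'))
    finally:
        del frame                      # avoid keeping a frame reference cycle


frame_attrs, code_attrs, func_attrs = a()
print(frame_attrs['f_code'] is func_attrs['__code__'])   # True
```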

Working with Stack Frames
Finding the caller is definitely on the agenda if you want to verify that it is allowed to call the current method. It turns out that Python can provide a lot of information about the entire call stack and particularly the direct caller. sys._getframe() is your beachhead to the call stack. When called without arguments, it returns the current frame object. When called with a depth argument, it returns the frame object that many calls below the top of the call stack, so the caller's frame object can be obtained using sys._getframe(1). Frame objects have several useful attributes, such as the context of the frame (f_builtins, f_locals, f_globals), the caller's frame (f_back), and more. I will concentrate on f_code, the code object associated with a frame.
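The two routes to the caller described above, a depth argument to `sys._getframe()` or one hop along `f_back`, give the same answer. Following `f_back` repeatedly walks the whole stack. The function names in this sketch are hypothetical.

```python
import sys


def caller_name():
    """Name of the function that called caller_name()."""
    return sys._getframe(1).f_code.co_name       # depth 1: the direct caller


def caller_name_via_f_back():
    """Same answer, reached by following one f_back link."""
    return sys._getframe().f_back.f_code.co_name


def stack_names():
    """Function names on the stack, from our caller down to the bottom."""
    names = []
    frame = sys._getframe(1)
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names


def outer():
    return caller_name(), caller_name_via_f_back(), stack_names()


direct, via_back, names = outer()
print('%s %s' % (direct, via_back))   # outer outer
print(names)    # starts with 'outer'; the rest depends on how the script is run
```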

Listing 2 focuses on retrieving information about the caller using a helper function, 'getCallerInfo'. Class A defines a method bar(), which is called both from the main() function (a.bar()) and from the a.foo() method. a.bar() calls getCallerInfo() to get the information and then displays it. Note that when getCallerInfo() is called from a.bar(), the actual caller is at depth 2 in the call stack (depth 0 is getCallerInfo() itself and depth 1 is bar()), so getCallerInfo() uses sys._getframe(2) to get its stack frame. It retrieves the caller's module and function name from the code object; then comes the interesting part. There is no "Pythonic" way to get the argument values directly from the frame or code object. Luckily, the 'inspect' module provides a function called getargvalues() that returns a tuple whose first member is a list of the argument names and whose fourth (last) member is the locals dictionary of the input frame. getCallerInfo() assumes that if a first argument exists, it is the 'self' reference, and its type is therefore the class of the caller. It would be possible to retrieve the first line of the caller function, parse it to determine whether it's a regular function or a method, and scan upward to find the actual class. I prefer not to engage in such elaborate acrobatics at the moment (I did something similar for C++ in my article "Method Call Interception" in the April 2005 issue of C/C++ Users Journal).
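Since Listing 2 is not reproduced here, the following is a sketch in the same spirit, with snake_case names that are hypothetical. It looks two frames up, reads the file and function name from the code object, and applies the text's "first argument is self" heuristic via `inspect.getargvalues()`. The heuristic has a known blind spot: a plain function whose first argument is, say, an int would be reported as class 'int'.

```python
import inspect
import os
import sys


def get_caller_info():
    """Return (file, function, class name or None) for the caller of the
    function that invoked get_caller_info().

    Depth 0 is get_caller_info itself, depth 1 is the function asking
    (e.g. A.bar), and depth 2 is whoever called that function."""
    frame = sys._getframe(2)
    try:
        code = frame.f_code
        # ArgInfo(args, varargs, keywords, locals): first and fourth members used.
        args, _varargs, _keywords, local_vars = inspect.getargvalues(frame)
        # Same heuristic as the text: a first argument is taken to be 'self'.
        cls_name = type(local_vars[args[0]]).__name__ if args else None
        return os.path.basename(code.co_filename), code.co_name, cls_name
    finally:
        del frame


class A(object):
    def foo(self):
        return self.bar()

    def bar(self):
        return get_caller_info()


def main():
    a = A()
    return a.bar(), a.foo()


for info in main():
    print(info)   # (file, 'main', None), then (file, 'foo', 'A')
```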