

Developer's Guide to Python 3.0: Python 2.6 and Migrating From 2 to 3

Python 3.0 has been released. Are you ready to migrate your code? Find out what you need to know to make the switch.


More about 2to3

The 2to3 tool is an excellent example of a modern Python application, and Guido van Rossum's code is well worth your time to read. The architecture is interesting too. The main component is a general-purpose Python refactoring engine, coupled with a simple plug-in infrastructure in which each plug-in performs one particular refactoring. The design is extensible, and you may find it useful to extend the application for your own purposes. The only drawback is that 2to3 is largely undocumented.

Here are the basic workings. You'll find the code in the lib2to3 package under the lib subdirectory of the location where you installed Python 2.6 (on a Mac it's in /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib2to3 by default). The root directory of the package holds a set of Python modules and grammar files.


You'll also find three subdirectories: pgen2, fixes, and tests, each of which is described below.


The pgen2 subdirectory contains conv.py, grammar.py, parse.py, driver.py, and several other files that implement the parsing portion of the application. The parser is driven by grammar files rather than being written by hand.


The fixes subdirectory contains the concrete "fixers" that implement the actual 2to3 functionality. A fixer is a plug-in that applies one specific refactoring. You'll get a closer look at one of the fixers later on.
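Fixers are tied to their classes by a naming convention: a module named fix_<name> is expected to define a class named Fix<Name>, as with the fix_getcwdu module and its FixGetcwdu class. The function below is a minimal sketch of that mapping; fixer_class_name() is a hypothetical helper for illustration, not part of lib2to3.

```python
def fixer_class_name(module_name):
    """Map a fixer module name to the class name it is expected to define.

    Hypothetical helper illustrating the fix_<name> -> Fix<Name> convention.
    """
    prefix = "fix_"
    # Accept dotted paths such as "lib2to3.fixes.fix_getcwdu"
    short_name = module_name.rsplit(".", 1)[-1]
    if short_name.startswith(prefix):
        short_name = short_name[len(prefix):]
    # Title-case each underscore-separated part and glue them together
    parts = short_name.split("_")
    return "Fix" + "".join(part.title() for part in parts)


print(fixer_class_name("lib2to3.fixes.fix_getcwdu"))  # FixGetcwdu
print(fixer_class_name("fix_has_key"))                # FixHasKey
```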


The tests subdirectory contains a comprehensive test suite covering the framework itself as well as every fixer.

The remaining files live in the root directory of the package.


The __init__.py file is empty.


The main.py file has a main() function that gets invoked by the 2to3 script itself. This is the real entry point. It defines a class called StdoutRefactoringTool that subclasses the generic refactor.RefactoringTool and prints its output to standard output. The main() function parses the command line options using the optparse.OptionParser class, initializes the refactoring tool, filters the fixers list (from the fixes subdirectory), launches the refactoring tool, and reports the results. This is a fine example of a main program for an entire class of command line programs. It doesn't do much itself; it delegates the heavy lifting to the refactoring tool.
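To make the shape of such a main program concrete, here is a stripped-down sketch in the same style. It is not a copy of main.py: the AVAILABLE_FIXERS list, the select_fixers() helper, and the exact option set are assumptions loosely modeled on 2to3's -f, -l, and -w options, and the refactoring step itself is stubbed out with a print.

```python
import optparse

# Hypothetical stand-in for the fixer names found in the fixes package
AVAILABLE_FIXERS = ["getcwdu", "has_key", "print", "raw_input"]


def build_parser():
    parser = optparse.OptionParser(usage="%prog [options] file|dir ...")
    parser.add_option("-f", "--fix", action="append",
                      help="Each FIX specifies a transformation; default: all")
    parser.add_option("-l", "--list-fixes", action="store_true",
                      help="List available transformations")
    parser.add_option("-w", "--write", action="store_true",
                      help="Write back modified files")
    return parser


def select_fixers(requested, available):
    """Filter the fixer list down to what the user asked for (all by default)."""
    if not requested or "all" in requested:
        return list(available)
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError("unknown fixer(s): " + ", ".join(unknown))
    return [name for name in available if name in requested]


def main(args=None):
    parser = build_parser()
    options, paths = parser.parse_args(args)
    if options.list_fixes:
        for name in sorted(AVAILABLE_FIXERS):
            print(name)
        return 0
    if not paths:
        parser.error("at least one file or directory is required")  # exits
    fixers = select_fixers(options.fix, AVAILABLE_FIXERS)
    # A real tool would now build a refactoring tool with these fixers and
    # run it over the paths; this sketch only reports what it would do.
    mode = "write" if options.write else "dry run"
    print("%s: %s on %s" % (mode, ", ".join(fixers), ", ".join(paths)))
    return 0


main(["-w", "-f", "getcwdu", "example.py"])  # prints: write: getcwdu on example.py
```

As in the description above, main() stays thin: parsing options and choosing fixers is all it does before handing control to the engine.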


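The refactor.py file contains the code for a nice general-purpose refactoring engine. It manages the fixer plug-ins, which are just Python modules in a package (a subdirectory) that follow a naming convention. The main class is RefactoringTool, which exposes several refactoring methods such as refactor_file, refactor_dir, refactor_string, and so forth. The RefactoringTool class traverses the parse tree (in both pre-order and post-order) and applies its fixers to the appropriate nodes via their match() and transform() methods. Note that this is not an abstract syntax tree like the ones produced by Python's ast module: it is a concrete syntax tree that keeps every comment and whitespace character, which is what lets 2to3 write the source back out unchanged except where a fixer modified it.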
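The traversal idea can be sketched with a toy tree. Everything here is hypothetical and simplified: Node is a stand-in for lib2to3's tree classes, and RenameFixer only mimics the match()/transform() protocol and the pre-order/post-order preference described above. As a simplifying assumption, apply_fixers() runs all pre-order fixers in one pass and all post-order fixers in a second pass.

```python
class Node(object):
    """Toy tree node: a type plus either children or a leaf value."""

    def __init__(self, type, children=(), value=None):
        self.type = type
        self.children = list(children)
        self.value = value


def pre_order(node):
    """Yield a node before its children."""
    yield node
    for child in node.children:
        for descendant in pre_order(child):
            yield descendant


def post_order(node):
    """Yield a node after its children."""
    for child in node.children:
        for descendant in post_order(child):
            yield descendant
    yield node


class RenameFixer(object):
    """Hypothetical fixer that renames NAME leaves from `old` to `new`."""

    order = "post"  # which traversal this fixer wants

    def __init__(self, old, new):
        self.old, self.new = old, new

    def match(self, node):
        # A dict of named results on success, False otherwise
        if node.type == "NAME" and node.value == self.old:
            return {"name": node}
        return False

    def transform(self, node, results):
        results["name"].value = self.new  # change in place, return None


def apply_fixers(tree, fixers):
    """Run pre-order fixers in one pass, then post-order fixers in another."""
    for order, walk in (("pre", pre_order), ("post", post_order)):
        active = [fixer for fixer in fixers if fixer.order == order]
        if not active:
            continue
        # Materialize the walk first so transforms cannot disturb the traversal
        for node in list(walk(tree)):
            for fixer in active:
                results = fixer.match(node)
                if results:
                    fixer.transform(node, results)


# os.getcwdu as a tiny tree: power< NAME trailer< DOT NAME > >
tree = Node("power", [
    Node("NAME", value="os"),
    Node("trailer", [Node("DOT", value="."), Node("NAME", value="getcwdu")]),
])
print([n.type for n in pre_order(tree)])   # ['power', 'NAME', 'trailer', 'DOT', 'NAME']
print([n.type for n in post_order(tree)])  # ['NAME', 'DOT', 'NAME', 'trailer', 'power']
apply_fixers(tree, [RenameFixer("getcwdu", "getcwd")])
print([n.value for n in pre_order(tree) if n.type == "NAME"])  # ['os', 'getcwd']
```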

The actual parsing and conversion of Python source code into a parse tree is the most complicated part of the project. I won't go into much detail, because it is pretty hairy code.


The patcomp.py module contains the pattern compiler. 2to3 looks for patterns to transform your code from Python 2 to Python 3. The pattern grammar is fairly simple, and it's defined in PatternGrammar.txt:

# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
# A grammar to describe tree matching patterns.
# Not shown here:
# - 'TOKEN' stands for any token (leaf node)
# - 'any' stands for any node (leaf or interior)
# With 'any' we can still specify the sub-structure.
# The start symbol is 'Matcher'.
Matcher: Alternatives ENDMARKER
Alternatives: Alternative ('|' Alternative)*
Alternative: (Unit | NegatedUnit)+
Unit: [NAME '='] ( STRING [Repeater]
                 | NAME [Details] [Repeater]
                 | '(' Alternatives ')' [Repeater]
                 | '[' Alternatives ']'
                 )
NegatedUnit: 'not' (STRING | NAME [Details] | '(' Alternatives ')')
Repeater: '*' | '+' | '{' NUMBER [',' NUMBER] '}'
Details: '<' Alternatives '>'
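The Repeater rule says how many times a unit may occur, while a unit in square brackets is simply optional. The pattern compiler turns repeaters into minimum and maximum counts on a wildcard pattern, with "no upper limit" represented by a large constant. The following is a minimal sketch of that Repeater-to-bounds mapping, assuming the repeater arrives as a plain string; repeater_bounds() is a hypothetical helper, not a lib2to3 function, and HUGE here merely reuses the value of pytree's HUGE constant.

```python
HUGE = 0x7FFFFFFF  # same value as pytree.HUGE; used here to mean "no upper limit"


def repeater_bounds(repeater):
    """Return (min, max) occurrence counts for '*', '+', '{N}' or '{N,M}'."""
    text = repeater.replace(" ", "")
    if text == "*":
        return 0, HUGE  # zero or more
    if text == "+":
        return 1, HUGE  # one or more
    if text.startswith("{") and text.endswith("}"):
        numbers = text[1:-1].split(",")
        if len(numbers) == 1:
            count = int(numbers[0])
            return count, count  # exactly N
        if len(numbers) == 2:
            return int(numbers[0]), int(numbers[1])  # between N and M
    raise ValueError("not a valid Repeater: %r" % (repeater,))


for r in ["*", "+", "{2}", "{1,3}"]:
    print(r, repeater_bounds(r))
```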


The pytree.py module contains the implementation of the parse tree. It has Node and Leaf classes as well as various pattern classes (LeafPattern, NodePattern, WildcardPattern, and NegatedPattern).
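To see how these pattern classes cooperate, here is a much-simplified, hypothetical re-creation. Leaf, Node, LeafPattern, and NodePattern below borrow the pytree names but not their real constructors or internals, and AnyStar stands in for the one WildcardPattern case the getcwdu fixer needs (any*). The nested patterns spell out the PATTERN of the fix_getcwdu fixer discussed later: power< 'os' trailer< dot='.' name='getcwdu' > any* >.

```python
class Leaf(object):
    def __init__(self, type, value):
        self.type, self.value = type, value


class Node(object):
    def __init__(self, type, children):
        self.type, self.children = type, list(children)


class LeafPattern(object):
    """Match a single leaf, optionally by value; bind it under `name`."""

    def __init__(self, value=None, name=None):
        self.value, self.name = value, name

    def match(self, node, results):
        if not isinstance(node, Leaf):
            return False
        if self.value is not None and node.value != self.value:
            return False
        if self.name:
            results[self.name] = node
        return True


class NodePattern(object):
    """Match an interior node of a given type whose children match `content`."""

    def __init__(self, type, content, name=None):
        self.type, self.content, self.name = type, content, name

    def match(self, node, results):
        if not isinstance(node, Node) or node.type != self.type:
            return False
        captured = {}  # only publish bindings if the whole node matches
        if not match_sequence(self.content, node.children, captured):
            return False
        results.update(captured)
        if self.name:
            results[self.name] = node
        return True


class AnyStar(object):
    """Stand-in for the wildcard any*: zero or more nodes of any kind."""


def match_sequence(patterns, nodes, results):
    """Match patterns to nodes left to right (AnyStar is only supported last)."""
    for i, pattern in enumerate(patterns):
        if isinstance(pattern, AnyStar):
            return True  # swallow all remaining nodes
        if i >= len(nodes) or not pattern.match(nodes[i], results):
            return False
    return len(patterns) == len(nodes)


# power< 'os' trailer< dot='.' name='getcwdu' > any* >
getcwdu_pattern = NodePattern("power", [
    LeafPattern("os"),
    NodePattern("trailer", [LeafPattern(".", name="dot"),
                            LeafPattern("getcwdu", name="name")]),
    AnyStar(),
])

# Tree for os.getcwdu(): the call's parentheses form a second trailer
tree = Node("power", [
    Leaf("NAME", "os"),
    Node("trailer", [Leaf("DOT", "."), Leaf("NAME", "getcwdu")]),
    Node("trailer", [Leaf("LPAR", "("), Leaf("RPAR", ")")]),
])

results = {}
print(getcwdu_pattern.match(tree, results))  # True
print(results["name"].value)                 # getcwdu
```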


The pygram.py module just defines a Symbols class and loads the Python grammar from a file.
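The Symbols class is what makes expressions such as syms.power (used by the Call() helper shown below) work: each grammar symbol name becomes an attribute holding that symbol's number. Here is a minimal sketch, assuming a hand-written symbol table. In lib2to3 the mapping comes from the grammar that pgen2 builds from the grammar file, and the numbers below are made up for illustration (pgen numbers nonterminals from 256 up, above the token numbers).

```python
class Symbols(object):
    """Expose each grammar symbol (nonterminal) as an attribute holding its number."""

    def __init__(self, symbol2number):
        for name, number in symbol2number.items():
            setattr(self, name, number)


# Made-up excerpt of a symbol table; real numbers come from the generated grammar
symbol_table = {"file_input": 256, "power": 257, "trailer": 258}
syms = Symbols(symbol_table)
print(syms.power)  # 257
```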


The fixer_util.py module contains many utility functions: some create nodes, some test nodes, and a bunch defy classification. Here is an example of a function that creates a function call node:

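from .pygram import python_symbols as syms
from .pytree import Node

def Call(func_name, args=None, prefix=None):
    """A function call"""
    # A call is a 'power' node: the callee followed by an ArgList trailer (ArgList is defined in this module)
    node = Node(syms.power, [func_name, ArgList(args)])
    if prefix is not None:
        node.set_prefix(prefix)  # preserve leading whitespace/comments
    return node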


The fixer_base.py module contains the BaseFix and ConditionalFix classes. I'll concentrate on BaseFix here. Fixers subclass it, so studying it helps you understand what fixers do and how they do it. Here's a list of the BaseFix methods with implementation and comments elided:

class BaseFix(object):
    # Method bodies elided; each "pass" only keeps this outline valid Python
    def __init__(self, options, log): pass
    def compile_pattern(self): pass
    def set_filename(self, filename): pass
    def match(self, node): pass
    def transform(self, node, results): pass
    def new_name(self, template="xxx_todo_changeme"): pass
    def log_message(self, message): pass
    def cannot_convert(self, node, reason=None): pass
    def warning(self, node, reason): pass
    def start_tree(self, tree, filename): pass
    def finish_tree(self, tree, filename): pass

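Each fixer generally has a pattern used to match specific nodes. match() returns a dictionary of the named parts of the match on success, or False otherwise. When a node matches, the refactoring tool calls the fixer's transform() method with the node and that dictionary; transform() either modifies the tree in place and returns None, or returns a new node that replaces the matched one. The set_filename() and log_message() methods are used for reporting. The cannot_convert() and warning() methods report potential problems to the user, such as code that cannot be converted automatically. The start_tree() and finish_tree() methods are called at the beginning and end of processing each tree, for fixers that need whole-tree context.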
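The order in which these hooks fire can be sketched without any lib2to3 machinery. TracingFix and run_fixer() below are hypothetical; the tree is reduced to a flat list of leaf strings, and the driver simply calls start_tree(), then match() and transform() for each node, then finish_tree(), replacing a node whenever transform() returns one.

```python
class TracingFix(object):
    """Hypothetical BaseFix-style fixer that records the order of its hook calls.

    The "tree" is simplified to a flat list of leaf strings.
    """

    def __init__(self, old, new):
        self.old, self.new = old, new
        self.calls = []

    def start_tree(self, tree, filename):
        self.calls.append("start_tree")
        self.renamed = 0  # reset per-file state

    def match(self, node):
        # Like BaseFix.match(): a results dict on success, False otherwise
        return {"node": node} if node == self.old else False

    def transform(self, node, results):
        self.calls.append("transform")
        self.renamed += 1
        return self.new  # a returned node replaces the matched one

    def finish_tree(self, tree, filename):
        self.calls.append("finish_tree: %d renamed in %s" % (self.renamed, filename))


def run_fixer(fixer, tree, filename):
    """Call the hooks in order: start_tree, match/transform per node, finish_tree."""
    fixer.start_tree(tree, filename)
    for i, node in enumerate(tree):
        results = fixer.match(node)
        if results:
            new = fixer.transform(node, results)
            if new is not None:
                tree[i] = new
    fixer.finish_tree(tree, filename)
    return tree


fixer = TracingFix("getcwdu", "getcwd")
print(run_fixer(fixer, ["os", ".", "getcwdu", "(", ")"], "example.py"))
# ['os', '.', 'getcwd', '(', ')']
print(fixer.calls)
# ['start_tree', 'transform', 'finish_tree: 1 renamed in example.py']
```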

Here's a closer look at one of the simpler fixers: fix_getcwdu. In Python 2.x the os.getcwdu() function returns the current working directory as a Unicode string, whereas os.getcwd() returns it as a byte string (str). In Python 3, str is always Unicode and os.getcwd() returns str, so os.getcwdu() can simply become os.getcwd(). The fixer lives in fixes/fix_getcwdu.py and contains a class called FixGetcwdu that subclasses BaseFix:

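"""
Fixer that changes os.getcwdu() to os.getcwd().
"""
# Author: Victor Stinner

# Local imports
from .. import fixer_base
from ..fixer_util import Name

class FixGetcwdu(fixer_base.BaseFix):
    # Matches os.getcwdu followed by any further trailers, such as the call's ()
    PATTERN = """
              power< 'os' trailer< dot='.' name='getcwdu' > any* >
              """

    def transform(self, node, results):
        # 'name' is the leaf bound by name='getcwdu' in PATTERN
        name = results["name"]
        # Swap in a new leaf, keeping the original leading whitespace
        name.replace(Name("getcwd", prefix=name.get_prefix()))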

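The implementation is concise because it takes advantage of the 2to3 infrastructure: it defines only a PATTERN and overrides the transform() method of BaseFix. Read against PatternGrammar.txt, the PATTERN matches a power node (an expression made of a name followed by trailers, such as os.getcwdu()) whose first child is the leaf os, whose second child is a trailer node consisting of '.' and getcwdu, and whose remaining children (any*) can be anything, such as the () of the call. The labels dot= and name= bind those leaves in the results dictionary, which is how transform() finds the name to replace. The os. part is required, not optional, so a bare getcwdu() call brought in with from os import getcwdu is not matched.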
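To check this reading of the PATTERN, here is a rough token-level approximation built on the standard tokenize module (Python 3). rename_os_getcwdu() is a hypothetical helper, not how 2to3 works, since 2to3 matches the parse tree rather than a token stream, but it handles the basic cases the same way: os.getcwdu() is rewritten, while getcwdu() imported via from os import getcwdu and attribute chains such as self.os.getcwdu are left alone.

```python
import io
import tokenize


def rename_os_getcwdu(source):
    """Rewrite os.getcwdu to os.getcwd using tokens (a rough approximation of the PATTERN)."""
    readline = io.StringIO(source).readline
    # Drop non-logical newlines and comments so os . getcwdu are adjacent tokens
    tokens = [tok for tok in tokenize.generate_tokens(readline)
              if tok.type not in (tokenize.NL, tokenize.COMMENT)]
    positions = []
    for i in range(2, len(tokens)):
        first, dot, name = tokens[i - 2], tokens[i - 1], tokens[i]
        if not (first.type == tokenize.NAME and first.string == "os"
                and dot.type == tokenize.OP and dot.string == "."
                and name.type == tokenize.NAME and name.string == "getcwdu"):
            continue
        # Like 'os' being the first child of power<...>, os must start the
        # expression: skip attribute chains such as self.os.getcwdu
        if i >= 3 and tokens[i - 3].type == tokenize.OP and tokens[i - 3].string == ".":
            continue
        positions.append(name.start)  # (row, col) of the getcwdu token
    lines = source.splitlines(True)
    # Edit from the end backwards so earlier column offsets stay valid
    for row, col in reversed(positions):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + "getcwd" + line[col + len("getcwdu"):]
    return "".join(lines)


print(rename_os_getcwdu("import os\nprint os.getcwdu()\n"), end="")
```

Because tokenize only splits text into tokens, the Python 2 print statement in the sample input is no obstacle.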
