A Developer's Guide to Python 3.0: Standard Library : Page 3

The changes to the standard library in Python 3.0 truly "clean house." The results are both more usable and less cluttered.


The io Module

The io module formalizes the concept of streams in Python. Python always supported the idiom of file-like objects that have a read() and write() method, but it was pretty fuzzy, and left many important features and subtleties to the implementer. The new io module is specified in PEP-3116. It presents three layers of I/O: raw, buffered, and text. Raw I/O and buffered I/O operate on bytes, while text I/O operates on characters.
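
As a rough illustration of the three layers, the sketch below opens the same scratch file in three ways with the built-in open() and records which io class comes back. The temporary file and the variable names are assumptions made for the example; the class names are those of the current CPython io module.

```python
import io
import os
import tempfile

# Create a small scratch file to open in three different ways.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"hello\n")

# Raw layer: unbuffered binary access, reads return bytes.
with open(path, "rb", buffering=0) as raw:
    raw_type = type(raw)
    raw_data = raw.read()

# Buffered layer: the default for binary mode, reads also return bytes.
with open(path, "rb") as buffered:
    buffered_type = type(buffered)
    buffered_data = buffered.read()

# Text layer: decodes the bytes and returns str.
with open(path, "r", encoding="ascii") as text:
    text_type = type(text)
    text_data = text.read()

os.remove(path)
print(raw_type.__name__, buffered_type.__name__, text_type.__name__)
```

The raw and buffered reads hand back the same bytes; only the text layer turns them into characters.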

Each layer consists of one or more abstract base classes along with concrete classes that implement them. In addition, the module implements the familiar built-in open() function.

The base class for all IO streams is IOBase. It is officially an abstract base class because its metaclass is abc.ABCMeta, but it has no abstract methods or properties (methods and properties that subclasses must define). It defines the following methods and properties: close(), closed, flush(), readable(), readline(), readlines(), seek(), seekable(), tell(), truncate(), writable(), and writelines(). It does not define read() and write() methods, because their signatures differ between stream types, but these are also considered part of the interface. In addition, IOBase supports iteration (it implements __iter__ and __next__) and the with keyword (it implements __enter__ and __exit__). It provides dummy implementations of these methods that represent a file that you can't read from, write to, or seek. Subclasses are expected to override or add key methods.
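
The default behavior described above can be checked directly, since IOBase has no abstract methods and can therefore be instantiated. This is only a small sketch of those defaults as they behave in current CPython; the variable names are made up for the example.

```python
import io

# io.IOBase can be instantiated because it declares no abstract methods.
with io.IOBase() as stream:          # __enter__/__exit__ come from IOBase
    can_read = stream.readable()     # default answer: False
    can_write = stream.writable()    # default answer: False
    can_seek = stream.seekable()     # default answer: False
    try:
        stream.seek(0)               # seeking is not supported by default
        seek_error = None
    except io.UnsupportedOperation as exc:
        seek_error = exc

# Leaving the with block calls close() on the stream.
closed_after_with = stream.closed
```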

Three classes directly subclass IOBase: RawIOBase, BufferedIOBase and TextIOBase. RawIOBase adds the methods: read(), readall(), readinto(), and write(). BufferedIOBase adds the methods: read(), readinto(), and write(). TextIOBase adds/implements the methods: read(), readline(), truncate(), and write(). Each of these classes provides its own flavor. For example, the BufferedIOBase class's read() method raises a BlockingIOError exception if the underlying raw stream is in non-blocking mode and has no data available, while the RawIOBase class's read() returns None in that case.

The RawIOBase is the abstract base class for all raw IO classes. For example, the SocketIO class from the socket module subclasses io.RawIOBase to implement the raw IO interface over sockets. The BufferedIOBase is the abstract base class for all buffered IO classes, such as BufferedRWPair and BytesIO, and TextIOBase is the abstract base class for all the text-aware IO classes, such as StringIO and TextIOWrapper.
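
To make the hierarchy a little more concrete, here is a minimal sketch that uses three of the classes named above and checks which abstract base class each one belongs to. The sample data and variable names are invented for the example.

```python
import io

# BytesIO is an in-memory buffered stream: it stores and returns bytes.
binary = io.BytesIO(b"caf\xc3\xa9\n")
is_buffered = isinstance(binary, io.BufferedIOBase)

# TextIOWrapper layers text decoding on top of a buffered binary stream.
text = io.TextIOWrapper(binary, encoding="utf-8")
is_text = isinstance(text, io.TextIOBase)
line = text.readline()               # decoded to str

# StringIO is an in-memory text stream: it stores and returns str.
memo = io.StringIO("plain text")
memo_is_text = isinstance(memo, io.TextIOBase)
memo_is_buffered = isinstance(memo, io.BufferedIOBase)
```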

My take on the entire IO hierarchy is that it is mostly useful for standard library developers. Even if you wanted to develop your own io-compatible stream class, the library wouldn't help you too much. Here's an example custom stream class that spits out Fibonacci numbers. It is a read-only stream that subclasses io.IOBase and implements the read() and readable() methods. I couldn't subclass RawIOBase, BufferedIOBase or TextIOBase because their read() methods are expected to return bytes or strings, not integers:

   import io

   class FibonacciStream(io.IOBase):
     def __init__(self, limit : int):
       self.a = 0
       self.b = 1
       self.limit = limit
     def read(self) -> int:
       # None signals end of stream
       if self.a >= self.limit:
         return None
       # Advance the pair (a, b) to (b, a + b) and return the old a
       self.b += self.a
       self.a = self.b - self.a
       return self.b - self.a
     def readable(self):
       return True
Here is FibonacciStream in action:

   fs = FibonacciStream(50)
   assert fs.readable()
   assert not fs.writable()
   x = fs.read()
   while x is not None:
     print(x, end=' ')
     x = fs.read()
   0 1 1 2 3 5 8 13 21 34
When implementing this class I discovered a serious design flaw: there's no standard way to signal end of stream (EOF). The IOBase class is silent on the issue. The other classes signal it using special return values from the read() method, or by raising an exception. This is not a trivial problem, especially when you consider buffering, where even the stream itself might not know if more data is coming. The bottom line is that code that has to work with streams must work at a higher level than IOBase. This precludes generic use cases such as counting how many items are in a stream. For FibonacciStream, I chose to return None to signal EOF.
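
The following sketch illustrates the point with two standard in-memory streams, each of which uses a different EOF marker. The count_items() helper is hypothetical, not part of the io module; it only works because the caller tells it which sentinel the stream uses.

```python
import io

# Buffered binary streams signal EOF with an empty bytes object.
binary = io.BytesIO(b"ab")
binary.read()
binary_eof = binary.read()

# Text streams signal EOF with an empty string.
text = io.StringIO("ab")
text.read()
text_eof = text.read()

def count_items(read, sentinel):
    """Hypothetical helper: count values from read() until the sentinel."""
    return sum(1 for _ in iter(read, sentinel))

# The caller has to know each stream's EOF marker in advance.
data = io.BytesIO(b"abcdef")
chunk_count = count_items(lambda: data.read(2), b"")

lines = io.StringIO("one\ntwo\n")
line_count = count_items(lines.readline, "")
```

A stream like FibonacciStream would need yet another sentinel (None), which is exactly why a truly generic counter cannot be written against IOBase alone.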

One nice (but not earth-shattering) benefit I got from IOBase is that it automatically raises io.UnsupportedOperation for operations I didn't implement, such as seek().

The ast Module

This new module is a high-level interface for abstract syntax trees that sits on top of the low-level _ast module introduced in Python 2.5. An abstract syntax tree (AST) is the parsed representation of Python source code that the interpreter uses internally. You can get an AST from a piece of source code either by passing the ast.PyCF_ONLY_AST flag to the built-in compile() function or by calling the ast module's parse() function.

You might want to work at the AST level if you need to analyze some Python code, because that can be easier than trying to parse the source yourself. The ast module provides you with an object model that you can navigate through and drill down into, which is supposedly easier than dealing with a chunk of text. Moreover, you can also modify an AST and compile it into a Python code object for execution.
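
Both routes to an AST, and the modify-then-compile step, fit in a few lines. This is a minimal sketch with an invented one-line source string; the names result and answer are made up for the example.

```python
import ast

source = "result = 6 * 7"

# Two routes to the same tree: ast.parse() and compile() with PyCF_ONLY_AST.
tree_a = ast.parse(source)
tree_b = compile(source, "<string>", "exec", ast.PyCF_ONLY_AST)
same_shape = ast.dump(tree_a) == ast.dump(tree_b)

# Modify the tree: rename the assignment target from 'result' to 'answer'.
tree_a.body[0].targets[0].id = "answer"

# Compile the modified tree to a code object and run it in a private namespace.
namespace = {}
exec(compile(tree_a, "<string>", "exec"), namespace)
answer = namespace["answer"]
```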

As an example, the following code parses a short Python code snippet string using the ast module, and dumps the AST:

   import ast
   s = '''
   def f(start : int, end : int):
     """Print the numbers from start to end"""
     for i in range(start, end):
       print(i)
   '''
   a = ast.parse(s)
   ast.dump(a)
   "Module(body=[FunctionDef(name='f', args=arguments(
   args=[arg(arg='start', annotation=Name(
   id='int', ctx=Load())), arg(arg='end', 
   annotation=Name(id='int', ctx=Load()))], vararg=None, 
   varargannotation=None, kwonlyargs=[], kwarg=None, 
   kwargannotation=None, defaults=[], 
   kw_defaults=[]), body=[Expr(
   value=Str(s='Print the numbers from start to end')), 
   For(target=Name(id='i', ctx=Store()), 
   iter=Call(func=Name(id='range', ctx=Load()), 
   args=[Name(id='start', ctx=Load()), 
   Name(id='end', ctx=Load())], keywords=[], starargs=None, 
   kwargs=None), body=[Expr(value=Call(
   func=Name(id='print', ctx=Load()), 
   args=[Name(id='i', ctx=Load())], keywords=[], 
   starargs=None, kwargs=None))], orelse=[])], 
   decorator_list=[], returns=None)])"
The output is a little dense because the dump() function is intended primarily for debugging and not for pretty printing; however, it reveals all the information used by the interpreter. The root object is Module, which has a body that contains a single FunctionDef object. If the source also contained import statements, more functions, classes or global variables, you would see more top-level objects. The FunctionDef object contains everything you need to know about a function and its arguments. The function body contains the docstring, the For object and Call objects for the range() and print() functions. It's pretty clear that you don't want to try to parse this out yourself; instead, fight code with code, and let Python process the ASTs for you.
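
As a small sketch of letting Python process the tree, the code below pulls the function name, docstring, argument annotations and called function names out of the same kind of snippet by navigating node attributes rather than reading the dump. The variable names are assumptions made for the example.

```python
import ast

source = '''
def f(start: int, end: int):
    """Print the numbers from start to end"""
    for i in range(start, end):
        print(i)
'''

tree = ast.parse(source)
func = tree.body[0]                      # the single FunctionDef node

func_name = func.name
docstring = ast.get_docstring(func)

# Each argument is an ast.arg node; its annotation here is an ast.Name.
annotations = {a.arg: a.annotation.id for a in func.args.args}

# ast.walk() yields every node in the subtree, in no guaranteed order,
# so the names are sorted to get a stable result.
called = sorted(
    node.func.id
    for node in ast.walk(func)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
)
```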

Here's another example. Suppose you have a program that receives snippets of Python source code, each containing a process() function with a certain signature that your program executes. To prevent name clashes, you want to make sure that you rename the process() function to a unique name. You also need to ensure that each snippet defines nothing else except import statements.

The ast module accommodates these requirements very well. Here's an example of a code snippet that doesn't fulfill the requirements, because it defines a global variable called name in addition to the required process() function and the import statements:

   plugin_source = """
   name = 'some_plugin'
   import os, sys
   from pprint import pprint as pp
   def process(start, end, verbose=True):
     if verbose:
       print('start:', start, 'end:', end)
     for i in range(start, end):
       ...
   """
AST-aware code can detect the unwanted name definition and raise an error. The load_snippet() function checks that a snippet uses only valid statements. It verifies that the snippet contains exactly one function called process, which it renames to process_<long random suffix> to prevent name clashes. It compiles and executes compliant snippets, and returns the function object:

Author's Note: The renaming part is a little silly, because the process function definition gets executed inside the load_snippet() namespace, so it can't pollute the global namespace—but it serves for demonstration purposes.

   import ast
   import random

   def load_snippet(s):
     t = ast.parse(s)
     function_defs = []
     # Check that all statements are either imports or function definitions
     for i, x in enumerate(t.body):
       if type(x) not in [ast.Import, ast.ImportFrom, ast.FunctionDef]:
         raise RuntimeError('Detected invalid statement of type:', type(x))
       if isinstance(x, ast.FunctionDef):
         function_defs.append((i, x))
     # Check that there is exactly one function
     if len(function_defs) != 1:
       raise RuntimeError('Found {0} functions'.format(len(function_defs)))
     # Check that the function is called 'process'
     f = function_defs[0][1]
     if f.name != 'process':
       raise RuntimeError('Misnamed function: {0.name} (should be "process")'.format(f))
     # Rename the process function to a unique name
     index = function_defs[0][0]
     unique_name = 'process_{0}'.format(random.getrandbits(100))
     t.body[index].name = unique_name
     # Compile the modified tree and execute it in a private namespace
     code = compile(t, '<string>', 'exec')
     namespace = {}
     exec(code, namespace)
     return namespace[unique_name]
When you run the load_snippet() function on the plugin_source snippet you get an exception, because it detects the invalid assignment to the name variable.

   >>> load_snippet(plugin_source)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "article_3.py", line 366, in load_snippet
       raise RuntimeError('Detected invalid statement of type:', type(x))
   RuntimeError: ('Detected invalid statement of type:', <class '_ast.Assign'>)
If you comment out the name assignment statement, everything works, and you can invoke the returned function object.

   >>> process_func = load_snippet(plugin_source)
   >>> process_func
   <function process_650405713977730110111762012007 at 0xed858>
   >>> process_func(3,6)
   start: 3 end: 6
To summarize, the ast module gives you instant access to the Python interpreter's internal representation, which can be a very powerful tool if you need to manipulate Python source code. The examples here only scratch the surface. If you plan to work with this module, you should look into the NodeVisitor and NodeTransformer classes. The AST even contains line number and column number information!
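
As a starting point, here is a minimal sketch of both classes. FunctionLister and Renamer are hypothetical names, and the source string and the new function name process_renamed are invented for the example; only ast.NodeVisitor, ast.NodeTransformer and the lineno attribute come from the ast module itself.

```python
import ast

source = """import os
def process(start, end):
    return end - start
def helper():
    pass
"""

class FunctionLister(ast.NodeVisitor):
    """Record the name and line number of every function definition."""
    def __init__(self):
        self.found = []
    def visit_FunctionDef(self, node):
        self.found.append((node.name, node.lineno))
        self.generic_visit(node)   # keep walking into nested definitions

class Renamer(ast.NodeTransformer):
    """Rename the function called 'process' (the new name is made up)."""
    def visit_FunctionDef(self, node):
        if node.name == "process":
            node.name = "process_renamed"
        return node

tree = ast.parse(source)
lister = FunctionLister()
lister.visit(tree)

# Transform the tree, then compile and run it in a private namespace.
tree = Renamer().visit(tree)
namespace = {}
exec(compile(tree, "<string>", "exec"), namespace)
span = namespace["process_renamed"](3, 6)
```

The visitor only reads the tree, while the transformer returns (possibly modified) nodes, which makes it a natural fit for the renaming step that load_snippet() performs by hand.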
