
A Developer's Guide to Python 3.0: Standard Library

The previous articles in this series covered the most significant changes to the core language and type system, as well as the basic data types, in Python 3.0. This article covers the changes to the standard library.

PEP 3108 – Standard Library Reorganization

The Python standard library is one of the strongest libraries around and amply supports Python's motto of "batteries included." But, over the years some cruft accumulated in the standard library. Python 3.0 takes full advantage of the fact that it's not backward compatible, using that lack of compatibility as a way to "clean house." PEP 3108 describes in detail the changes to the standard modules.

Removed Modules

For Python 3.0, many older modules were simply removed. These were either already deprecated, rarely used, or superseded by better alternatives in the standard library or in third-party packages. Many platform-specific modules were dropped as well. The removals include:

  • The dl module, which has been superseded by ctypes
  • The dircache module, which was rarely used and easily implemented
  • The ihooks module, because it was undocumented and used only by rexec (turned off in Python 2.3 due to security issues), which has now been removed as well.
  • The popen2 module is gone (use subprocess), as is sets (use the built-in set and frozenset types)
  • The md5 and sha modules are gone (use hashlib, as shown in the sketch after this list)
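
For instance, where the old md5 module made you call md5.new(), you now ask hashlib for an algorithm by name. A minimal sketch:

   import hashlib

   # hashlib.md5() replaces the old md5.new(); hashlib.sha1() replaces the sha module
   print(hashlib.md5(b'some bytes').hexdigest())
   print(hashlib.sha1(b'some bytes').hexdigest())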

Renamed Modules

Other modules were renamed to comply with PEP-8 style (short, all lowercase, and underscores may be used). I’m not particularly fond of this naming convention, which is sometimes referred to as “allwordssmashedtogether” for obvious reasons. For example, SocketServer has been renamed to socketserver. I would prefer a mandatory underscore between words (e.g. socket_server), because I feel that improves readability in a significant way.

Every successful naming scheme I know of has a way to emphasize word boundaries in multi-word names: Lisp uses hyphens (multi-word-name), and other common methods are camel-casing (MultiWordName or multiWordName) and underscores (multi_word_name). The Ada programming language goes a little overboard and promotes both capitalization and underscores (Multi_Word_Name). Anyway, whether you like it or not, Python 3.0 adheres to the new official naming convention consistently across the standard library.

Some modules are not intended for public consumption; they’re used only by other modules from the standard library. These have a single underscore prefix in the name to make their use clearer. For example, markupbase has been renamed to _markupbase.

There were some larger changes where multiple modules have been renamed and moved into a package. For example, Python 3.0’s html, http and tkinter packages contain a number of formerly top-level modules. The http.server module now contains the former BaseHTTPServer, CGIHTTPServer, and SimpleHTTPServer.

The new urllib package contains functionality from the erstwhile urllib, urllib2, urlparse, and robotparser organized into slightly different modules: urllib.request, urllib.error, urllib.parse, and urllib.robotparser.
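
In practice, the reorganization mostly affects your import statements. Here's a small sketch of what a script that fetches a URL might look like under the new package layout (the URL is just an example):

   from urllib.request import urlopen    # formerly urllib2.urlopen / urllib.urlopen
   from urllib.parse import urlencode    # formerly urllib.urlencode
   from urllib.error import URLError     # formerly urllib2.URLError

   query = urlencode({'q': 'python 3.0'})
   try:
     data = urlopen('http://www.python.org/?' + query).read()
     print(len(data))
   except URLError as e:
     print('Request failed:', e)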

New Modules

OK, that’s enough boring stuff. Python 3.0 introduces some exciting new modules too (these are all distributed with Python 2.6 as well).

The multiprocessing Module

Python's threading model (in the CPython implementation) can't utilize multiple cores. This is a serious drawback for a general-purpose programming language in today's world, where multi-core processing is becoming mandatory. The limitation stems from Python's design, specifically the infamous Global Interpreter Lock (GIL), and it is extremely difficult to fix. A lot of digital ink has been spilled about it. The official position is that there are better ways to do parallel programming than multithreading, and I happen to agree with the management this time around. Enter the multiprocessing module. This module presents an interface almost identical to the threading module, which is the standard Python multithreading interface; however, instead of running jobs in different threads in the same process, it launches multiple processes. The multiprocessing module is not a panacea, though; it has some stringent requirements, and often you can't simply replace Thread with Process in multithreaded code.

The multiprocessing module supports several programming models and interprocess communication mechanisms. The main class you work with is, of course, Process, which is the equivalent of the Thread class in the threading module. When you create a Process, you pass it a function and arguments. After you call start(), it launches a new process and executes the function with the provided arguments in that new process. After the function completes, the child process terminates.

The parent process can exchange information with the child processes in multiple ways. Here’s an example that uses a Manager object to pass a shared list to multiple child processes that simply write their process ID (obtained via os.getpid()). The Manager runs in a server process and communicates via Python objects such as lists and dictionaries:

   import os
   import multiprocessing
   from multiprocessing import Process, Manager

   def process(id, results):
     results[id] = os.getpid()

   cpu_count = multiprocessing.cpu_count()
   manager = Manager()
   results = manager.list([0] * cpu_count)
   processes = []
   for i in range(cpu_count):
     p = Process(target=process, args=[i, results])
     p.start()
     processes.append(p)

   for p in processes:
     p.join()

   print(results)

Each child process runs the process() function. The important utility function cpu_count() tells you how many cores are available. It is often useful to create as many processes as there are cores: if you create fewer, you underutilize your hardware, and if you create more, the processes must share the available cores. The code creates a process for each available core and passes it both the results list obtained from the manager and the ID (index) of each process. This allows each child process to know its place and write to a separate entry in the list. Each process is started, and then the parent process waits for them all to finish using the join() method, which blocks until the target process is done.

That's nice code, but it contains a lot of ceremony. The multiprocessing module also has a Pool class that works at a slightly higher level of abstraction. It represents a pool of worker processes and lets you run parallel tasks and collect the results easily via a map() function that operates just like the standard map(), except that it runs on multiple processes. You'll find this functionality particularly suitable for embarrassingly parallel scenarios, where you can divide your data into independent chunks and feed each chunk to a worker process, which processes it and returns the results.

For example, suppose you have a big database of textual works, and you want to find all the palindromes in it. A palindrome is a word (or phrase, but this example deals only with single words) that reads the same forward and backward, such as "dad" or "noon".

First, you need to make some words. I tried to use generators, but I discovered that you can't pass a generator as input to a process; that's a limitation of the multiprocessing module. Generally, when you pass an object to a child process, the object is pickled, and the child process receives the serialized object and unpickles it. Objects that can't be pickled can't be passed.
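
You can see the restriction for yourself without starting any processes; trying to pickle a generator fails immediately (a minimal sketch):

   import pickle

   def gen():
     yield 1

   try:
     pickle.dumps(gen())
   except TypeError as e:
     print('Generators are not picklable:', e)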

The following code shows an example. It takes a generator (the generate_words() function) and converts it into a function that returns a plain list, using functools.partial to bind the generator to the listify() helper. The example demonstrates the type of preprocessing you will have to do to prepare your data to be passed to worker processes:

   import random
   from functools import partial

   def generate_words(word_count, word_length):
     a = ord('a')
     for i in range(word_count):
       word = ''
       for j in range(word_length):
         word += chr(a + random.randint(0, 25))
       yield word

   def listify(f, *args, **kwdargs):
     return list(f(*args, **kwdargs))

   make_words = partial(listify, generate_words)

The goal here is to find all the palindromes in a list of 8,000 4-letter words. The work gets split between two processes, so the 8,000 words are split into two 4,000-word lists. This is clearly an embarrassingly parallel problem:

   input_list = [make_words(4000, 4) for x in range(2)]

The function that finds palindromes in a word list is simple: It basically checks if the first half of each word is equivalent to its reversed second half:

   def find_palindromes(words):
     results = []
     for w in words:
       half = len(w) // 2 # Note the explicit floor division
       if w[:half] == w[-half:][::-1]:
         results += [w]

     return results

Running two such palindrome finders in parallel is as easy as instantiating the Pool and calling map() on the input list. There are two items in the input list (the two 4,000-word lists), so two worker processes handle the chunks under the covers:

   from multiprocessing import Pool

   pool = Pool()
   results = pool.map(find_palindromes, input_list)

   print(results)

Passing large amounts of data to/from your processes (as in this case) erodes the performance boost you get from running in parallel. You will have to experiment and measure to verify that your code doesn’t actually run more slowly due to I/O requirements.

In many real-world problems you need to interact with the worker processes and send/receive information during the computation. A perfect example is finding prime numbers. One of the earliest and most famous algorithms is the Sieve of Eratosthenes, a very elegant algorithm that uses only addition to filter out non-primes. Here’s a little Python implementation that includes a few optimizations, such as starting only with the odd numbers (the prime 2 gets added at the end) and skipping multiples of the current prime.

   def prime_sieve(n):
     # start with the odd numbers >= 3 (we'll add 2 in the end)
     primes = list(range(3, n+1, 2))

     # It's enough to go until the square root of n.
     limit = int(n ** 0.5)
     if limit % 2 == 0:
       limit -= 1
     for i in range(limit // 2):
       cur_prime = primes[i]
       if cur_prime == 0:
         continue
       # index i is our next prime; skip it and start at the next multiple.
       for j in range(cur_prime, len(primes) - i, cur_prime):
         # Zero out multiples of the current prime
         primes[j + i] = 0

     # That's it. All the non-zeroes are primes.
     return [2] + [p for p in primes if p > 0]

   primes = prime_sieve(80)
   print(primes)

Now, the question is how to make this algorithm parallel so it can execute using multiple simultaneous processes. It’s pretty clear that you will need to divide the integer space between all the processes, letting each process work on a separate range. For example, if you want to find all the primes up to 80, and you have two processors, you can decide that one will be in charge of all the primes up to 40, and the second will take charge of all the primes from 41 to 80. Unfortunately, the sieve algorithm works only when you start from the beginning, because it has to filter out all the primes found so far. The process responsible for 41-80 needs to know about the primes below 40 found by the first process. This means that a proper parallel sieve algorithm needs to pass information between processes.

In addition, you need to collect the results somehow. The algorithm below uses a main process to divide the ranges, instantiate the child processes, and establish pipes to communicate with them. The general flow is that whenever a child process finishes its range, it sends all the primes it found back to the main process, which forwards these primes to the downstream processes, and so on until every range has been sieved.

First, here’s the code for the sieve() function that each child process runs:

   def sieve(first, last, conn):
     start = time.time()
     numbers = list(range(first, last))

     while True:
       number, skip = conn.recv()
       if number == -1:
         # start working on its own primes
         for i, n in enumerate(numbers):
           if n == 0 or n * n > last:
             continue
           for j in range(i + n, len(numbers), n):
             numbers[j] = 0

         conn.send([n for n in numbers if n > 0])
         return
       else:
         index = number % first
         for i in range(index, len(numbers), skip):
           numbers[i] = 0

Note the connection object used to receive and send information bidirectionally to the main process. When the process receives -1, that means no more primes are coming from upstream processes, so it can sieve its own remaining numbers, send its primes to the main process, and complete. When it receives an initial number and a skip value (a prime), it zeroes out every multiple of that prime, starting with the initial number.

The logic for the main process is in the collect_primes() function. It takes a list of ranges, creates a child process and a pipe for each range, and launches each process with the proper range and connection. Then it kicks off the computation by sending the known primes 2, 3, 5, and 7 to all the processes. In the main loop, it tells the current process that no more primes are coming, waits to receive that process's primes, and then sends those primes on to the subsequent processes. This procedure repeats until all the child processes have returned their primes, which are collected in the all_primes list. Finally, the main process waits for the child processes to finish using the join() method:

   def collect_primes(ranges):
     """Start multiple processes and calculate all the
     primes in the input ranges

     ranges - a list of pairs that represent a contiguous range of integers
     """
     # Init (prepare pipes, processes, ranges and connections)
     pipes = []
     parent_conn, child_conn = Pipe()
     pipes.append(parent_conn)

     processes = []
     conns = []
     for i, r in enumerate(ranges):
       parent_conn, child_conn = Pipe()
       conns.append(parent_conn)
       p = Process(target=sieve, args=[r[0], r[1], child_conn])
       processes += [p]
       p.start()

     # A list of the first number each process is responsible for
     first_numbers = [r[0] for r in ranges]
     last_numbers = [r[1] for r in ranges]

     # Initial known primes
     all_primes = [2, 3, 5, 7]

     # The primes to send to the current process
     primes = all_primes
     for i in range(len(conns)):
       # Send primes to all downstream processes
       send_primes(primes, conns[i:], zip(first_numbers[i:],
           last_numbers[i:]))

       # Tell the current process no more primes are coming
       conns[i].send((-1, None))

       # Receive the primes of the current process
       primes = conns[i].recv()
       all_primes += primes

     # All processes should be done by now
     for p in processes:
       p.join()

     return all_primes

The main process uses a utility function called send_primes(), shown below for completeness. For each downstream range, it computes the first multiple of each prime that falls inside that range and sends it, along with the prime itself, to the corresponding process:

   def send_primes(primes, conns, limits):
     # zip() returns a one-shot iterator in Python 3.0, but we need to
     # iterate over the limits once per prime, so convert it to a list.
     limits = list(limits)
     for prime in primes:
       for conn, limit in zip(conns, limits):
         n = limit[0]
         last = limit[1]
         if n % prime == 0:
           number = n
         else:
           number = n + (prime - n % prime)

         # Send the first multiple of this prime that falls inside the range
         if number < last:
           conn.send((number, prime))

Here's collect_primes() in action, computing all primes below 50 using two sub-processes:

   >>> collect_primes(ranges=((10, 30), (30, 50)))
   [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

The multiprocessing module is very flexible and powerful, but it requires a solid understanding of parallel programming concepts, and it has some special requirements compared to the threading module. For example, as mentioned earlier, objects that you want to share between processes via a Pipe or Queue must be picklable. This turns out to be a very harsh requirement in many real-world situations.

Author's Note: In a recent project I needed to run some computations in parallel. The Python objects that performed the computation used C++ extensions, which are not picklable. I was in quite a pickle to say the least. I had to implement an additional layer on top of the C++ extensions that extracted the essential information from each unpicklable object into a serialized form, and then implement a factory to instantiate new objects based on that serialized form. This is not only considerable extra work, but it has performance implications, too. In my case, the required objects were very expensive to construct.

The limitations and restrictions are detailed in the programming guidelines section of the documentation. Make sure you read it if you plan to use this module.

The io Module

The io module formalizes the concept of streams in Python. Python has always supported the idiom of file-like objects that have read() and write() methods, but the notion was pretty fuzzy and left many important features and subtleties to the implementer. The new io module is specified in PEP 3116. It presents three layers of I/O: raw, buffered, and text. Raw I/O and buffered I/O operate on bytes, while text I/O operates on characters.

Each layer consists of one or more abstract base classes along with concrete classes that implement them. In addition, the module implements the familiar built-in open() function.

The base class for all IO streams is IOBase. It is officially an abstract base class because its metaclass is abc.ABCMeta, but it has no abstract methods or properties (methods and properties that subclasses must define). It defines the following methods and properties: close(), closed, flush(), readable(), readline(), readlines(), seek(), seekable(), tell(), truncate(), writable(), and writelines(). It does not define read() and write() methods, because their signatures differ between streams, but these are also considered part of the interface. In addition, IOBase supports iteration (it implements __iter__ and __next__) and the with statement (it implements __enter__ and __exit__). It provides dummy implementations for all these methods that represent a file that you can't read from, write to, or seek. Subclasses are expected to override or add the key methods.

Three classes directly subclass IOBase: RawIOBase, BufferedIOBase, and TextIOBase. RawIOBase adds the methods read(), readall(), readinto(), and write(). BufferedIOBase adds the methods read(), readinto(), and write(). TextIOBase adds or implements the methods read(), readline(), truncate(), and write(). Each of these classes provides its own flavor. For example, the BufferedIOBase class's read() method raises a BlockingIOError exception if no data is available in its buffer, while the RawIOBase class does not.

The RawIOBase is the abstract base class for all raw IO classes. For example, the SocketIO class from the socket module subclasses io.RawIOBase to implement the raw IO interface over sockets. The BufferedIOBase is the abstract base class for all buffered IO classes, such as BufferedRWPair and BytesIO, and TextIOBase is the abstract base class for all the text-aware IO classes, such as StringIO and TextIOWrapper.
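To get a feel for the byte/text split, here's a small sketch that uses the concrete BytesIO, StringIO, and TextIOWrapper classes mentioned above:

   import io

   # A buffered binary stream works on bytes
   b = io.BytesIO()
   b.write(b'binary \x00 data')
   b.seek(0)
   print(b.read())        # b'binary \x00 data'

   # A text stream works on str and understands lines
   s = io.StringIO('line one\nline two\n')
   print(s.readline())    # 'line one\n'

   # TextIOWrapper layers text (and an encoding) on top of a binary stream
   t = io.TextIOWrapper(io.BytesIO('hello'.encode('utf-8')), encoding='utf-8')
   print(t.read())        # hello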

My take on the entire IO hierarchy is that it is mostly useful for standard library developers. Even if you want to develop your own io-compatible stream class, the library doesn't help you much. Here's an example custom stream class that spits out Fibonacci numbers. It is a read-only stream that subclasses io.IOBase and implements the read() and readable() methods. I couldn't subclass RawIOBase, BufferedIOBase, or TextIOBase because their read() methods are expected to return bytes or strings, not integers:

   import io

   class FibonacciStream(io.IOBase):
     def __init__(self, limit: int):
       self.a = 0
       self.b = 1
       self.limit = limit

     def read(self) -> int:
       if self.a >= self.limit:
         return None
       self.b += self.a
       self.a = self.b - self.a
       return self.b - self.a

     def readable(self):
       return True

Here is FibonacciStream in action:

   fs = FibonacciStream(50)

   assert fs.readable()
   assert not fs.writable()

   x = fs.read()

   while x is not None:
     print(x, end=' ')
     x = fs.read()
   print()

   Output:

   0 1 1 2 3 5 8 13 21 34

When implementing this class I discovered a serious design flaw: there's no standard way to signal end of stream (EOF). The IOBase class is silent on the issue. The other classes signal it using special return values from the read() method, or by throwing an exception. This is not a trivial problem, especially when you consider buffering, where even the stream itself might not know whether more data is coming. The bottom line is that code that works with streams must operate at a higher level than IOBase, which precludes generic use cases such as counting how many items are in a stream. For FibonacciStream, I chose to return None to signal EOF.

One nice (but not earth-shattering) benefit I got from IOBase is that it automatically raises io.UnsupportedOperation for all the methods I didn't implement.

The ast Module

This new module is a high-level interface for abstract syntax trees that sits on top of the low-level _ast module introduced in Python 2.5. An abstract syntax tree (AST) is the parsed representation of Python source code that the interpreter uses internally. You can get an AST from a piece of source code either by calling the built-in compile() function with a special flag or by calling the ast module's parse() function.
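
Both routes give you back the same kind of tree object. Here's a minimal sketch showing the two ways side by side:

   import ast

   source = "print('hello')"

   # The built-in compile() with the PyCF_ONLY_AST flag stops after parsing
   tree1 = compile(source, '<string>', 'exec', ast.PyCF_ONLY_AST)

   # ast.parse() is a convenience wrapper that does the same thing
   tree2 = ast.parse(source)

   print(type(tree1), type(tree2))   # both are ast.Module objects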

You might want to work at the AST level if you need to analyze Python code; it can be easier than trying to parse the source text yourself. The ast module provides an object model that you can navigate and drill down into, which is generally easier than dealing with a chunk of text. Moreover, you can also modify an AST and generate a Python code object from it for execution.

As an example, the following code parses a short Python code snippet string using the ast module, and dumps the AST:

   import ast

   s = '''
def f(start: int, end: int):
  """Print the numbers from start to end"""
  for i in range(start, end):
    print(i)
'''

   a = ast.parse(s)
   print(ast.dump(a))

   Output:

   Module(body=[FunctionDef(name='f', args=arguments(
   args=[arg(arg='start', annotation=Name(
   id='int', ctx=Load())), arg(arg='end',
   annotation=Name(id='int', ctx=Load()))], vararg=None,
   varargannotation=None, kwonlyargs=[], kwarg=None,
   kwargannotation=None, defaults=[],
   kw_defaults=[]), body=[Expr(
   value=Str(s='Print the numbers from start to end')),
   For(target=Name(id='i', ctx=Store()),
   iter=Call(func=Name(id='range', ctx=Load()),
   args=[Name(id='start', ctx=Load()),
   Name(id='end', ctx=Load())], keywords=[], starargs=None,
   kwargs=None), body=[Expr(value=Call(
   func=Name(id='print', ctx=Load()),
   args=[Name(id='i', ctx=Load())], keywords=[],
   starargs=None, kwargs=None))], orelse=[])],
   decorator_list=[], returns=None)])

The output is a little dense because the dump() function is intended primarily for debugging and not for pretty printing; however, it reveals all the information used by the interpreter. The root object is Module, which has a body that contains a single FunctionDef object. If there were more import statements, functions, classes, or global variables, you would see more top-level objects. The FunctionDef object contains everything you need to know about a function and its arguments. The function body contains the docstring, the For object, and Call objects for the range() and print() functions. It's pretty clear that you don't want to try to parse this out yourself; instead, fight code with code, and let Python process the ASTs for you.

Here's another example. Suppose you have a program that receives snippets of Python source code, each containing a process() function with a certain signature that your program executes. To prevent name clashes, you want to make sure that you rename the process() function to a unique name. You also need to ensure that each snippet defines nothing else, apart from import statements.

The ast module accommodates these requirements very well. Here's an example of a code snippet that doesn't fulfill the requirements, because it defines a global variable called name in addition to the required process() function and the import statements:

   plugin_source = """
name = 'some_plugin'

import os, sys
from pprint import pprint as pp

def process(start, end, verbose=True):
  if verbose:
    print('start:', start, 'end:', end)
  for i in range(start, end):
    pp(i)
"""

AST-aware code can detect the unwanted name definition and issue a warning. The load_snippet() function checks to make sure that a snippet uses only valid statements. It verifies that the snippet contains exactly one function called process, which it renames to process_<long random suffix> to prevent name clashes. It compiles and executes compliant snippets, and returns the function object:

Author's Note: The renaming part is a little silly, because the process function definition gets executed inside the load_snippet() namespace, so it can't pollute the global namespace, but it serves for demonstration purposes.

   def load_snippet(s):
     t = ast.parse(s)
     function_defs = []
     # check that all statements are either imports or function definitions
     for i, x in enumerate(t.body):
       if type(x) not in [ast.Import, ast.ImportFrom, ast.FunctionDef]:
         raise RuntimeError('Detected invalid statement of type:', type(x))
       if isinstance(x, ast.FunctionDef):
         function_defs.append((i, x))

     # check that there is exactly one function
     if len(function_defs) != 1:
       raise RuntimeError('Found {0} functions'.format(len(function_defs)))

     # Check that the function is called 'process'
     f = function_defs[0][1]
     if f.name != 'process':
       raise RuntimeError('Misnamed function: {0.name} (should be "process")'.format(f))

     # Rename the process function to a unique name
     index = function_defs[0][0]
     unique_name = 'process_{0}'.format(random.getrandbits(100))
     t.body[index].name = unique_name

     # Compile and execute the plugin code as a module
     code = compile(t, '<string>', 'exec')
     exec(code)
     func = locals()[unique_name]

     return func

When you run the load_snippet() function on the plugin_source snippet, you get an exception because it detects the invalid assignment to the name variable.

   >>> load_snippet(plugin_source)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "article_3.py", line 366, in load_snippet
       raise RuntimeError('Detected invalid statement of type:', type(x))
   RuntimeError: ('Detected invalid statement of type:', <class '_ast.Assign'>)

If you comment out the name assignment statement, everything works, and you can invoke the returned function object.

   >>> process_func = load_snippet(plugin_source)
   >>> process_func
   <function process_... at 0x...>
   >>> process_func(3, 6)
   start: 3 end: 6
   3
   4
   5

To summarize, the ast module gives you instant access to the Python interpreter's internal representation, which can be a very powerful tool if you need to manipulate Python source code. The examples here only scratch the surface. If you plan to work with this module, you should look into the NodeVisitor and NodeTransformer classes. The AST even contains line number and column number information!
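
As a taste of the visitor approach, here's a small, hypothetical NodeVisitor subclass (the CallCounter name is made up for this example) that counts function calls in a snippet and reports where they occur:

   import ast

   class CallCounter(ast.NodeVisitor):
     """Count every function call in a piece of source code."""
     def __init__(self):
       self.calls = 0

     def visit_Call(self, node):
       self.calls += 1
       # every node carries its position in the source
       print('call at line', node.lineno, 'column', node.col_offset)
       self.generic_visit(node)   # keep walking nested calls

   counter = CallCounter()
   counter.visit(ast.parse("print(len('abc'))"))
   print(counter.calls)   # 2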

The json Module

JSON stands for "JavaScript Object Notation," a popular data exchange format, especially for AJAX applications, because JavaScript can evaluate JSON and get a ready-made data structure to work on. JSON can represent arbitrarily nested lists (arrays in JavaScript) and dictionaries (objects in JavaScript) of numbers, strings, and Booleans. You can find the JSON specification in RFC 4627.

Python 3.0's json module provides a pickle-like interface that uses loads and dumps calls. Here's a simple example:

   >>> import json
   >>> d = {'a': [(1, 4), (2, 5), (3, 6)], 'b': [1, 2, 3, 1, 2, 3]}
   >>> json_text = json.dumps(d)
   >>> print(json_text)
   {"a": [[1, 4], [2, 5], [3, 6]], "b": [1, 2, 3, 1, 2, 3]}

   >>> json.loads(json_text)
   {'a': [[1, 4], [2, 5], [3, 6]], 'b': [1, 2, 3, 1, 2, 3]}

As you can see, JSON is pretty similar to Python lists and dictionaries. Tuples are converted to arrays, and single quotes are converted to double quotes, but overall, it should look pretty familiar.

As a test, I tried calling the Google AJAX search API, which returns data in JSON format. I used the urllib module to get the results for the query "python rocks," which returned some JSON. I then used the json module to decode the results into accessible Python data structures:

   from urllib.request import urlopen
   from urllib.parse import urlencode
   import json

   query = urlencode(dict(q='python rocks'))
   #url_mask = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&{0}&start={1}&rsz=large'
   #url = url_mask.format(query, 0)
   url_mask = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&{0}'
   url = url_mask.format(query)

   # The [2:-1] slicing is to get rid of the "b'" prefix and the "'" suffix
   text = str(urlopen(url).read())[2:-1]
   response = json.loads(text)
   results = response['responseData']['results']
   for r in results:
     print(r['url'])

   Output:

   http://pythonrocks.com/
   http://mail.python.org/pipermail/python-list/2000-September/051415.html
   http://personalpages.tds.net/~kent37/stories/00020.html
   http://personalpages.tds.net/~kent37/blog/

The response data structure contains a lot of information. I drilled down directly to the results (response['responseData']['results']). Each result is a dictionary that uses the following keys: ['GsearchResultClass', 'visibleUrl', 'titleNoFormatting', 'title', 'url', 'cacheUrl', 'unescapedUrl', 'content'].

By default, you get only four results. I added a couple of query parameters (see the commented lines) to get more results, and json broke down. It turns out that the json module can't properly handle some of the Unicode that the Google search returns (even though it's valid JSON). A comparison between various Python implementations of JSON modules reports some bugs in Unicode handling, even though the standard library json module is based on the simplejson module, which actually gets perfect marks in the Unicode part of that comparison.

The ssl Module

The ssl (Secure Sockets Layer) module is a wrapper around the OpenSSL library (if it's installed); OpenSSL should be available on any modern OS. The ssl module lets you create encrypted sockets and authenticate the other side of the connection. It can be used for both client-side (connect to a secure server) and server-side (accept secure connections from clients) applications. The main function is wrap_socket(), which takes a standard network socket and returns an SSLSocket object. You need a certificate to connect. A certificate couples a public key with identifying information and is used, together with the corresponding private key, both for authentication and for encrypting the payload. I don't have access to a certificate (which needs to be issued by a certificate authority), so I couldn't test the ssl module; however, here's some sample code from the ssl module's documentation for client-side operation:

   import socket, ssl, pprint

   s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

   # require a certificate from the server
   ssl_sock = ssl.wrap_socket(s,
     ca_certs="/etc/ca_certs_file",
     cert_reqs=ssl.CERT_REQUIRED)

   ssl_sock.connect(('www.verisign.com', 443))

   print(repr(ssl_sock.getpeername()))
   print(ssl_sock.cipher())
   print(pprint.pformat(ssl_sock.getpeercert()))

   # Set a simple HTTP request -- use httplib in actual code.
   ssl_sock.write("""GET / HTTP/1.0
Host: www.verisign.com

""")

   # Read a chunk of data.  Will not necessarily
   # read all the data returned by the server.
   data = ssl_sock.read()

   # note that closing the SSLSocket will also
   # close the underlying socket
   ssl_sock.close()

Modified Modules

Many of the already-existing modules have been modified in Python 3.0. I've picked the ones that I think are the most important and interesting, but you should be aware that there are others, some of which might be the most important and interesting to you.

The os Module

The os module gained some new functionality, and it now handles open files and symlinks much better on *NIX systems. The fchmod() and fchown() functions change the mode and owner of an open file; they work just like chmod() and chown(), but accept the file descriptor of an open file rather than a path. The lchmod() function changes the mode of a symlink. In addition, the os.walk() function can now follow symlinks if the new followlinks argument is True (it's False by default for backward compatibility).
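
Here's a small sketch of these capabilities on a *NIX system (the file name is made up; fchmod() takes the file descriptor you get from fileno()):

   import os, stat

   # Change the permissions of a file that is already open
   f = open('scratch.txt', 'w')
   os.fchmod(f.fileno(), stat.S_IRUSR | stat.S_IWUSR)   # owner read/write only
   f.close()

   # Walk a directory tree, following symbolic links along the way
   for dirpath, dirnames, filenames in os.walk('.', followlinks=True):
     print(dirpath, len(filenames))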

The os.path module has a few useful modifications. The splitext() function used to split a string on the last dot, which resulted in a strange splitting for *NIX-formatted dot files such as .bashrc:

   >>> os.path.splitext('.bashrc')
   ('', '.bashrc')

In Python 3.0, leading dots are ignored, which results in a more natural splitting:

   >>> os.path.splitext('.bashrc')
   ('.bashrc', '')

A new function, os.path.relpath(), returns the path to a file relative to a given start path. This is very useful when dealing with lots of nested files and directories (which happens often):

   >>> os.path.relpath('/a/b/c/d.txt', '/a/b')
   'c/d.txt'

os.path.expandvars() used to be a *NIX function that expanded shell variables of the form $VAR and ${VAR}. Now, it works on Windows too, and can expand environment variables of the form %VAR% (the Windows shell naming convention):

   >>> os.environ['PROJECTS'] = 'c:/projects'
   >>> os.path.expandvars('%PROJECTS%/cool_project')
   'c:/projects/cool_project'

The os.path.expanduser function also works on Windows now. It uses the *NIX convention of converting the tilde (~) symbol into the user's home directory:

   >>> os.path.expanduser('~')
   'C:\Documents and Settings\gigi'

The turtle Module

Python 3.0 supplies an extensive reimplementation of the turtle module. I had never even heard of the turtle module before, and I was pretty surprised to find out it's a Tkinter-based turtle graphics module similar to Logo. Logo is an educational programming language intended to simplify teaching programming to kids. The idea is that you have a "turtle" on-screen that you control with simple commands.

You can work with the turtle interactively from the command line. First, you need to create a turtle object, which automatically pops up a graphics window, and places the turtle in the center pointing to the right:

   from turtle import Turtle
   t = Turtle()

Subsequently, you can interact with the turtle and tell it what to do. The following commands draw a triangle on the screen by telling the turtle to move forward (fd) 100 pixels, and then turn 120 degrees to the right (rt), repeated three times:

   >>> for i in range(3):
   ...   t.fd(100)
   ...   t.rt(120)

Here's a short turtle script that draws a yellow and purple checkered board. It uses several new commands and capabilities. The pu (pen up) command tells the turtle to lift the pen (stop drawing). The pd (pen down) command tells it to start drawing again. The goto command moves the turtle to an absolute position (the center of the screen is 0,0). The color command sets the pen and fill colors. The begin_fill and end_fill commands wrap a sequence of drawing commands and fill the resulting shape. This time, the code turns the turtle to the left with the lt command:

   from turtle import Turtle
   t = Turtle()

   def draw_square(t, x, y, d, c):
     print(x, y)
     t.pu()
     t.goto(x, y)
     t.color('black', c)
     t.pd()
     t.begin_fill()
     for i in range(4):
       t.fd(d)
       t.lt(90)
     t.end_fill()

   d = 100  # size of each square will be d x d pixels
   n = 4    # will draw a checkered board of size n x n

   # Make sure the whole thing is centered
   offset = -(d * n / 2)

   for x in range(n):
     for y in range(n):
       c = 'purple' if (x + y) % 2 == 0 else 'yellow'
       draw_square(t, offset + x * d, offset + y * d, d, c)
 
Figure 1. Fancy Turtle Graphics: This complex looping shape was generated by a short Logo program translated to Python and uses the turtle module.

Here's another program translated from Logo that draws a fancy loopy shape (see Figure 1):

   from turtle import Turtle
   t = Turtle()

   for i in range(36):
     t.rt(10)
     for j in range(36):
       t.rt(10)
       t.fd(20)

Finally, the reset command simply erases everything already drawn and moves the turtle to the center of the screen, pointing to the right.

The turtle module is a lot of fun, but you can also use it for more serious purposes such as drawing charts and graphs, circles, and even text.

The ctypes Module

The ctypes module is a very slick foreign function interface. It became part of the standard library in Python 2.5 and continues to improve. It allows you to access C DLLs and shared libraries. For example, here's how you can display a standard Windows message box:

   >>> import ctypes
   >>> ctypes.windll.user32.MessageBoxW(0, 'ctypes rocks!', 'Title', 0)

This goes almost directly to the Win32 C API. On Unix/Linux/Mac OSX you access libraries with the cdll object. The following code snippet calls the rand() function in the C runtime library (libc). The code first finds the libc in a cross-platform way (works on Linux and on OSX) using the find_library() function. It then loads the library using the CDLL call and finally calls libc.rand() in a loop.

   import ctypes
   from ctypes.util import find_library

   libc_path = find_library('c')
   libc = ctypes.CDLL(libc_path)

   for i in range(4):
     print(libc.rand() % 10)

The ctypes module now has a c_bool type that corresponds to the C99 _Bool type. I'm not sure how useful it is; perhaps you might need it if you have a C dynamic library that uses _Bool in its API as an input or output argument, or as a member of a struct or union. Here is how you create a c_bool value from a Python int:

   >>> from ctypes import c_bool
   >>> c_bool(7)
   c_bool(True)

You can pass virtually any Python value to construct a c_bool. The rules are simple: Python's False is False (duh!), every zero numeric value (int, float, complex, fractions.Fraction, decimal.Decimal) is False, None is False, and an empty string is False. Everything else (including functions, objects, and classes) is True.

The following code lists all the False values:

   >>> import decimal
   >>> import fractions
   >>> from ctypes import c_bool

   >>> c_bool(False)
   c_bool(False)

   >>> c_bool(None)
   c_bool(False)

   >>> c_bool('')
   c_bool(False)

   >>> c_bool(0)
   c_bool(False)

   >>> c_bool(0.0)
   c_bool(False)

   >>> c_bool(decimal.Decimal('0.0'))
   c_bool(False)

   >>> c_bool(fractions.Fraction(0, 5))
   c_bool(False)

   >>> c_bool(complex(0, 0))
   c_bool(False)

And this code lists a representative sample of True values:

   >>> import decimal
   >>> import fractions
   >>> from ctypes import c_bool

   >>> c_bool(True)
   c_bool(True)

   >>> c_bool('Something')
   c_bool(True)

   >>> c_bool(-7)
   c_bool(True)

   >>> c_bool(0.3)
   c_bool(True)
   >>> c_bool(5.5)
   c_bool(True)
   >>> c_bool(-4.4)
   c_bool(True)

   >>> c_bool(decimal.Decimal('3.0'))
   c_bool(True)
   >>> c_bool(decimal.Decimal('0.7'))
   c_bool(True)
   >>> c_bool(decimal.Decimal('-2.2'))
   c_bool(True)

   >>> c_bool(fractions.Fraction(1, 5))
   c_bool(True)

   >>> c_bool(complex(3, 0))
   c_bool(True)
   >>> c_bool(complex(0, 5))
   c_bool(True)
   >>> c_bool(complex(-11.22, -11.22))
   c_bool(True)

You can use the c_bool type just like a Python bool:

   >>> assert(ctypes.c_bool(True))
   >>> assert(not ctypes.c_bool(False))

The ctypes module always had arrays that you could slice using the array[start:end] syntax. Now, you use the full slice syntax array[start:end:step]. You can define a fixed-size array type by multiplying a ctypes data type by a fixed integer:

   int_6_array_type = c_int * 6

The preceding code just creates a type; you need to instantiate it to get a usable array object. You can initialize ctypes arrays with values in the constructor or fill them with a default value (0 for ints):

   a = int_6_array_type()
   for i in range(6):
     assert a[i] == 0
     a[i] = i

Slicing a ctypes array is just like Python's slicing. You can even use the [::-1] idiom to reverse them:

   >>> a[2:5]
   [2, 3, 4]

   >>> a[2:5:2]
   [2, 4]

   >>> a[::-1]
   [5, 4, 3, 2, 1, 0]

The result of slicing a ctypes array is a Python list and not another ctypes array:

   >>> type(a[1:4])
   <class 'list'>

Another improvement is better handling of errno (on *NIX systems) and GetLastError()/SetLastError() on Windows. In Python 2.5 you couldn't get to the actual value of errno or GetLastError(), because they were reset by other calls. Now, ctypes preserves a thread-local copy of this value if you load the library with the use_errno/use_last_error flag. After a call you may call ctypes.get_errno() or ctypes.get_last_error() to figure out what the error is.
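
Here's a minimal sketch of the *NIX flavor (it assumes libc is available and deliberately triggers an EBADF error):

   import os
   import ctypes
   from ctypes.util import find_library

   # Ask ctypes to maintain a private, thread-local copy of errno
   libc = ctypes.CDLL(find_library('c'), use_errno=True)

   # Closing an invalid file descriptor fails and sets errno
   if libc.close(-1) == -1:
     err = ctypes.get_errno()
     print(err, os.strerror(err))   # e.g. 9 Bad file descriptor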

To find out more about ctypes see my earlier DevX article on Python 2.5.

The collections Module

The collections module (home of deque and defaultdict) has a new factory function called namedtuple. The factory creates a subclass of Python's tuple that provides access to its fields by name in addition to access by index. This is useful when dealing with the nested data structures that are so easy to create in Python, such as a tuple of dictionaries that contain pairs of tuples. Here's a little snippet that defines a named tuple Alien class:

   >>> from collections import namedtuple
   >>> Alien = namedtuple('Alien', 'name color special_power')
   >>> et = Alien('E.T.', 'Gray', 'Healing')
   >>> et
   Alien(name='E.T.', color='Gray', special_power='Healing')
   >>> et.name
   'E.T.'
   >>> et.color
   'Gray'
   >>> et.special_power
   'Healing'

You specify the list of attributes as a space-separated string (see the first line in the preceding code). This is unorthodox, but it works because spaces aren't allowed in attribute names. You can't modify the attributes of named tuples because they are read-only, just like tuples. It would be pretty useful to have a named list too, where you could modify attributes but never add or remove fields.
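
If you do need a "modified" named tuple, the generated class provides a _replace() method that returns a new instance with selected fields changed while leaving the original untouched:

   >>> et._replace(color='Green')
   Alien(name='E.T.', color='Green', special_power='Healing')
   >>> et.color
   'Gray'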

The zipfile Module

The zipfile module now accepts Unicode filenames. It also has two new methods: extract() and extractall() that extract a single file or the entire archive contents to a specified directory (the current directory by default). This is welcome, because I usually just want to extract a zip file's entire contents. Until now, I always had to implement the extractall() functionality myself. The ZipFile class also supports passwords now, but I couldn't get that functionality to work; I was able to extract and read files from a password-protected .zip file without providing the password.

   >>> import os
   >>> from zipfile import ZipFile
   >>> z = ZipFile('test.zip', 'w')
   >>> z.setpassword(b'secret')
   >>> z.writestr('1.txt', '111111111111')
   >>> z.writestr('2.txt', '2222222')
   >>> z.writestr('3.txt', '33')
   >>> z.close()
   >>> z = ZipFile('test.zip', 'r')
   >>> z.extractall() # extracting WITHOUT the password
   >>> os.path.exists('1.txt')
   True
   >>> open('1.txt').read()
   '111111111111'
   >>> open('2.txt').read()
   '2222222'
   >>> open('3.txt').read()
   '33'

The queue Module

Python 2.5 had only a Queue class—a first-in first-out (FIFO) queue implementation. Python 3.0 adds LifoQueue and PriorityQueue classes. The only difference between the queues is the order in which stored items are retrieved. The LifoQueue is last-in first-out (which happens to be the definition of a stack). The PriorityQueue always returns the lowest available value. Here are all the queues in action: the Queue returns the items in the order they were entered, the LifoQueue returns them in reversed order, and the PriorityQueue returns them sorted by value.

   >>> import queue
   >>> from queue import Queue, LifoQueue, PriorityQueue
   >>> q = Queue()
   >>> lq = LifoQueue()
   >>> pq = PriorityQueue()
   >>> for x in q, lq, pq:
   ...   for i in [1, 2, 3, 1, 2, 3]:
   ...     x.put(i)
   ... 
   >>> for x in q, lq, pq:
   ...   while not x.empty():
   ...     print(x.get(), end=' ')
   ...   print()
   ... 
   1 2 3 1 2 3 
   3 2 1 3 2 1 
   1 1 2 2 3 3 

If you want to store objects in a PriorityQueue that are not ordered, or if you want to control the retrieval order, it's standard practice to insert pairs, where the first element is an integer priority value, and the second element is the actual object:

   >>> from queue import PriorityQueue
   >>> pq = PriorityQueue()
   >>> pq.put((1, 'one'))
   >>> pq.put((2, 'two'))
   >>> pq.put((3, 'three'))
   >>> pq.put((7, 'seven'))
   >>> pq.put((6, 'six'))
   >>> pq.put((5, 'five'))
   >>> pq.put((4, 'four'))
   >>> while not pq.empty():
   ...   print(pq.get()[1], end=' ')
   ... 
   one two three four five six seven

The tempfile Module

Usually, temporary files are deleted when closed. The new NamedTemporaryFile class lets you keep the file if you pass delete=False. You can access the resulting file name using the name attribute:

   >>> from tempfile import NamedTemporaryFile
   >>> with NamedTemporaryFile(delete=False) as f:
   ...   f.write(b'12345')
   ... 
   5
   >>> f
   <tempfile._TemporaryFileWrapper object at 0x...>
   >>> f.closed
   True
   >>> f.name
   '/var/folders/Rv/RvYDVs4ZEvmbXoIGcdsBc++++TM/-Tmp-/tmpC1xWXr'
   >>> f = open(f.name)
   >>> f.name
   '/var/folders/Rv/RvYDVs4ZEvmbXoIGcdsBc++++TM/-Tmp-/tmpC1xWXr'
   >>> f.closed
   False
   >>> f.read()
   '12345'

The SpooledTemporaryFile is similar to the standard TemporaryFile, but stores its data in memory until the file size exceeds the max_size parameter or until you call the fileno() method. Spooling data to memory can be useful if you don't want to waste time on file I/O and you have enough memory for your temporary usage.
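
Here's a minimal sketch (the 1KB max_size is an arbitrary choice for illustration):

   from tempfile import SpooledTemporaryFile

   f = SpooledTemporaryFile(max_size=1024)   # stays in memory up to 1KB
   f.write(b'some scratch data')
   f.seek(0)
   print(f.read())
   f.rollover()   # force the data onto a real file on disk
   f.close()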

As you can see, the changes in Python 3.0 are significant. There are many small, less interesting, or less important changes to the standard library that are not mentioned here that you may want to explore on your own.
