A Developer's Guide to Python 3.0: Numbers, Strings, and Data : Page 6

Python 3.0 makes critical—and not-backwardly-compatible—changes to data types. Find out how these changes will affect your code.


PEP 3101: Advanced String Formatting

Python 3.0 brings a powerful new way to format strings that's based on Microsoft's .NET composite formatting (an excellent choice). I have used the string formatting facilities of many programming languages, but the C# formatting (which uses .NET composite formatting) experience was the best by far. It was powerful, flexible, consistent, and well documented.

Author's Note: BTW, the link that references the .NET composite formatting in PEP-3101 is broken. The correct URL (at the time of this writing) is: http://msdn.microsoft.com/en-us/library/txafckwd.aspx.

In Python 2.x you can format strings using the % operator or using string.Template. The % operator is convenient; when you want to format only a single argument, you can pass it as is:

   >>> import time
   >>> time.localtime()
   (2008, 12, 31, 10, 32, 16, 2, 366, 0)
   >>> 'The current year is %d' % time.localtime()[0]
   'The current year is 2008'
To format multiple arguments, you must pack them in a tuple (a list will not work here):

   >>> t = time.localtime()
   >>> 'Day: %d, Month: %d, Year: %d' % (t[2], t[1], t[0])
   'Day: 31, Month: 12, Year: 2008'
With the tuple approach you must specify the arguments in the exact order they will be formatted. Also, if you want the same value to appear multiple times you must pass it multiple times:

   >>> s = 'The solution to the square of %d is: %d * %d = %d'
   >>> s % (5, 5, 5, 5 * 5)
   'The solution to the square of 5 is: 5 * 5 = 25'
Alternatively, you can pass a dictionary and specify the dictionary keys in the format string:

   >>> d = dict(n=5, result=5 * 5)
   >>> s = 'The solution to the square of %(n)d is: %(n)d * %(n)d = %(result)d'
   >>> s % d
   'The solution to the square of 5 is: 5 * 5 = 25'
As you can see, the dictionary approach lets you specify repeating values just once, but at a high price: the format string is more complicated than the positional version, and you must prepare a dict rather than simply passing values.
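
One more wrinkle of the % operator is worth a quick sketch, because it follows directly from the "pass a single argument as is" convenience: if that single argument happens to be a tuple, % unpacks it as a list of arguments. The variable names below are just for illustration, and the code is written for Python 3:

```python
point = (3, 4)

# A bare tuple is unpacked into two arguments, but the format string
# has only one placeholder, so the % operator raises TypeError.
try:
    'Point: %s' % point
    failed = False
except TypeError:
    failed = True

# Wrapping the value in a one-element tuple formats it as a single argument.
wrapped = 'Point: %s' % (point,)
print(failed, wrapped)
```

This kind of special case is one of the things a method call with explicit arguments avoids.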

Finally, there is also the string.Template class. You use this to prepare compiled templates that you can apply multiple times to different values efficiently, because the format string itself must be parsed only once. This is especially important for use cases such as templated web pages or code generation scenarios, where the text results can be large and parsing the format string can be expensive. The format string is a little different: named values are preceded by a $ sign and optionally enclosed in curly braces to distinguish them from the surrounding text:

   >>> import string
   >>> s = 'The solution to the square of ${n} is: ${n} * ${n} = ${result}'
   >>> t = string.Template(s)
   >>> for i in range(1, 7):
   ...   d = dict(n=i, result=i * i)
   ...   print t.substitute(d)
   The solution to the square of 1 is: 1 * 1 = 1
   The solution to the square of 2 is: 2 * 2 = 4
   The solution to the square of 3 is: 3 * 3 = 9
   The solution to the square of 4 is: 4 * 4 = 16
   The solution to the square of 5 is: 5 * 5 = 25
   The solution to the square of 6 is: 6 * 6 = 36
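
For readers already on Python 3, here is a minimal sketch of the same idea with print() as a function. It also shows safe_substitute(), which leaves unknown placeholders untouched instead of raising KeyError. The template text and variable names are my own, chosen only for illustration:

```python
import string

template = string.Template('The square of ${n} is: ${n} * ${n} = ${result}')

# substitute() accepts keyword arguments as well as a mapping.
lines = [template.substitute(n=i, result=i * i) for i in range(1, 4)]
for line in lines:
    print(line)

# safe_substitute() keeps placeholders it cannot fill instead of raising KeyError.
partial = template.safe_substitute(n=5)
print(partial)
```
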
Python 3.0 added a new formatting method called format to the string class. It is intended to replace the % formatting of short format strings and not the string.Template formatting, because it doesn't compile its format string. The format() method understands both positional and keyword arguments within a single format string. You enclose substitution fields in the format string in curly braces. You can reuse the same positional argument multiple times in different fields:

   >>> s = 'Addition is commutative. For example: {0} + {1} = {1} + {0}'
   >>> s.format(5, 7)
   'Addition is commutative. For example: 5 + 7 = 7 + 5'
   >>> s = '{0} multiplied by {1} is {result}'
   >>> s.format(4, 3, result=3 * 4)
   '4 multiplied by 3 is 12'
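
Because named fields are ordinary keyword arguments, you can also unpack a dictionary into the call. The following is a small sketch of that pattern; the dictionary and its keys are made up for the example:

```python
values = {'a': 4, 'b': 3}

# Named fields come from the unpacked dict; 'total' is passed explicitly.
text = '{a} + {b} = {total}'.format(total=values['a'] + values['b'], **values)
print(text)

# Positional and named fields can be mixed, and each can be reused.
mixed = '{0} and {name}, then {name} and {0}'.format('first', name='second')
print(mixed)
```
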
You can escape curly braces by doubling them:

   >>> '{0} "{{", {1} "}}"'.format('open curly:', 'closed curly:')
   'open curly: "{", closed curly: "}"'
The format() method supports both simple fields, whose names are either identifiers (matched against keyword arguments) or base-10 integers (matched against positional arguments), and compound fields. Compound fields are quite useful because they allow you to access object attributes or elements of sequences:

   >>> import fractions
   >>> r = fractions.Fraction(5, 4)
   >>> '{0.numerator} / {0.denominator}'.format(r)
   '5 / 4'
   >>> 'Day: {0[2]}, Month: {0[1]}, Year: {0[0]}'.format(time.localtime())
   'Day: 31, Month: 12, Year: 2008'
The ability to access attributes and array elements simplifies their use because a developer needs to provide only the object or tuple/list/array, not break it up and arrange the parts in the right order. Compare the preceding example to the Python 2.x version presented earlier.

Unlike some templating languages, you may not use arbitrary Python expressions in the format strings. The Python 3.0 format string is limited to objects, attributes, and indexing into tuples/arrays/lists.
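
To make that restriction concrete, here is a minimal sketch. Putting an expression inside the braces fails, because the whole text is treated as a field name; the usual workaround is to compute the value first and pass it as an argument. I catch several exception types rather than promise exactly which one your interpreter raises:

```python
n = 5

# The text inside the braces is a field name, not an expression to evaluate.
try:
    '{0 * 0}'.format(n)
    outcome = 'no error'
except (KeyError, IndexError, ValueError) as exc:
    outcome = type(exc).__name__

# Compute the value in Python code and pass the result in.
squared = '{0} squared is {1}'.format(n, n * n)
print(outcome)
print(squared)
```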

The format() method supports a wide array of format specifiers for fine-tuning the display of formatted fields. You separate format specifiers from the field name with a colon (:) character:

   >>> 'The "{0:10}" is right-padded to 10 characters'.format('field')
   'The "field     " is right-padded to 10 characters'
Objects may define and accept their own format specifiers in the __format__ method (see below), but Python also has a large selection of standard specifiers that the built-in types understand. The general form of a standard format specifier is:

   [[fill]align][sign][#][0][width][.precision][type]
There are many fine details and constraints. Some format specifiers make sense only for numeric types, or only if other specifiers exist. There are many display options for integers and real numbers, for example:

   >>> '{0:@^8.4}'.format(1 / 3)
   '@0.3333@'
OK, what happened here? The at sign (@) is the fill character. The alignment is centered (^). The precision is 4 and the minimum width is 8, so the number was formatted with four significant digits (0.3333), which takes six characters including the leading zero and the decimal point. Two additional @ characters were added as padding, one on each side, to get a centered display of eight characters. All this is similar to Python 2.x's % formatting, but much more flexible and powerful.
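
Here is a short sketch of a few more standard specifiers applied to strings, integers, and floats. The sample values are arbitrary; the specifiers themselves are part of the standard format specification:

```python
samples = [
    ('{0:>10}', 'right'),    # right-align in a field 10 characters wide
    ('{0:*<10}', 'left'),    # left-align and fill with '*'
    ('{0:+d}', 42),          # always show the sign of an integer
    ('{0:08.3f}', 3.14159),  # zero-pad to width 8 with 3 decimal places
    ('{0:#x}', 255),         # hexadecimal with the 0x prefix
    ('{0:b}', 5),            # binary
    ('{0:.1%}', 0.256),      # percentage with 1 decimal place
]

results = [spec.format(value) for spec, value in samples]
for (spec, value), text in zip(samples, results):
    print('{0!r:12} -> {1!r}'.format(spec, text))
```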

The real power of the new string formatting becomes evident for custom formatting, which you define by implementing the __format__() method. The signature is:

   def __format__(self, format_spec):
Suppose you want to have a ColorString class that can format itself to be displayed in different colors. To print colored text (and much more) to the screen in Python you can use ANSI escape codes on Linux and Mac OS X. On 32-bit Windows you need to use the SetConsoleTextAttribute() API.

Author's Note: The code presented here will not work properly on Windows—it will just print junk characters around the original text instead of changing the colors.

So, to print some red text, type:

   print('\033[31mRed Text\033[0m')
The escape sequence starts with ESC+[ (also known as the Control Sequence Introducer). The ESC character is non-printable, and can also be written as chr(27) or \x1b (hex notation). Note that 033 is octal notation for 27. The 31m following the \033[ is the incantation used to change the text color to red. The actual text (Red Text) is next, and finally, another incantation (\033[0m) restores the colors to their default. Although Python 3.0 has switched its octal integer literals from 0{number} to 0o{number}, octal escapes inside string literals still use the \{number} notation, so \033 works unchanged.
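
If you want to convince yourself that the three spellings of ESC are the same character, a quick sketch like the following will do. The constant names are mine:

```python
ESC_OCTAL = '\033'   # octal escape inside a string literal
ESC_HEX = '\x1b'     # hexadecimal escape
ESC_CHR = chr(27)    # built from the decimal code point

same = ESC_OCTAL == ESC_HEX == ESC_CHR

# Build the red-text sequence piece by piece: CSI + "31m", the text, then reset.
red = ESC_CHR + '[31m' + 'Red Text' + ESC_CHR + '[0m'
print(same, repr(red))
```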

You can do a lot with the escape sequences, such as change text and background color, move the cursor around the screen (to print in a specific location), erase parts of the screen, hide/show the cursor, and scroll the screen buffer. The examples here focus on changing colors only.

Here's a little module containing a function called colorize() that accepts three arguments: a string, a text color, and a background color. It then wraps the string with the appropriate ANSI escape sequence. First, it prepares a small global dictionary containing all the colors and background colors mapped from a string to their ANSI escape code. The function itself checks whether a color and/or background color were provided by name such as red or green, finds the corresponding codes in the dictionary, and prepares a proper escape sequence to change the colors to the requested colors. Finally, it resets everything to normal. The code shown here has no error checking, so if you request a color name that doesn't exist you will get a KeyError exception:

   colors = ['black', 'red', 'green', 'orange', 'blue', 'magenta', 'cyan', 'white']
   color_dict = {}
   for i, c in enumerate(colors):
     color_dict[c] = (i + 30, i + 40)
   def colorize(text, color=None, bgcolor=None):
     c = None
     bg = None
     if color is not None:
       c = color_dict[color][0]
     if bgcolor is not None:
       bg = color_dict[bgcolor][1]
     s = ''
     if c is not None:
       s = '\033[%dm' % c
     if bg is not None:
       s += '\033[%dm' % bg
     return '%s%s\033[0m' % (s, text)
You can experiment with this to print various colored strings on colored backgrounds. Here's an example that prints white text on a magenta background:

   print(colorize('White on Magenta', 'white', 'magenta'))
This code and the colorize module work in both Python 2.x and 3.0.
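
If the bare KeyError bothers you, one possible variation is to validate the names up front and raise a more descriptive error. This is only a sketch, not part of the module above: checked_colorize, COLORS, and COLOR_CODES are names I made up, and the color list simply repeats the one used by colorize():

```python
COLORS = ['black', 'red', 'green', 'orange', 'blue', 'magenta', 'cyan', 'white']
# Map each name to its (foreground, background) ANSI code.
COLOR_CODES = dict((name, (i + 30, i + 40)) for i, name in enumerate(COLORS))

def checked_colorize(text, color=None, bgcolor=None):
    """Wrap text in ANSI color codes, rejecting unknown color names."""
    for name in (color, bgcolor):
        if name is not None and name not in COLOR_CODES:
            raise ValueError('unknown color: {0!r}'.format(name))
    prefix = ''
    if color is not None:
        prefix += '\033[{0}m'.format(COLOR_CODES[color][0])
    if bgcolor is not None:
        prefix += '\033[{0}m'.format(COLOR_CODES[bgcolor][1])
    return '{0}{1}\033[0m'.format(prefix, text)

print(repr(checked_colorize('White on Magenta', 'white', 'magenta')))
```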

With the colorize() function under your belt you can create the ColorString class that formats itself in color. The basic idea is to subclass the built-in str class and add a __format__() method that takes the format_spec and passes it as the text color to the colorize() function, which returns the wrapped string:

   class ColorString(str):
     def __format__(self, format_spec):
       s = colorize(self, format_spec)
       return s
This implementation lets you change only the text color and not the background, but it makes the format very simple (you just supply the color name). Here is ColorString in action. First, the example prepares a list of ColorString words by splitting a simple sentence ("Yeah, it works!") and then prints each word in a different color, by specifying a format string with the colors red, green, and blue:

   words = [ColorString(x) for x in 'Yeah, it works!'.split()]
   print('{0:red} {1:green} {2:blue}'.format(*words))
Python 3.0 also has a global format() function used to format single objects. It simply calls the object's __format__() method. Here it is at work with ColorString:

   >>> format(ColorString('Gigi'), 'red')
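
One limitation of this ColorString is that it treats every format_spec as a color name, so a standard specifier such as >5 ends up as a failed dictionary lookup. A possible refinement, sketched below under my own names (FallbackColorString and a trimmed-down COLOR_CODES table), is to fall back to the normal str formatting when the specifier is not a known color:

```python
# Hypothetical, trimmed-down color table (foreground codes only).
COLOR_CODES = {'red': 31, 'green': 32, 'blue': 34}

class FallbackColorString(str):
    def __format__(self, format_spec):
        code = COLOR_CODES.get(format_spec)
        if code is None:
            # Not a color name: defer to the standard str formatting.
            return str.__format__(self, format_spec)
        return '\033[{0}m{1}\033[0m'.format(code, str(self))

colored = format(FallbackColorString('hi'), 'red')
padded = format(FallbackColorString('hi'), '>5')
print(repr(colored), repr(padded))
```
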
This subclassing scheme works fine, but it feels a bit cumbersome to create a special class with a __format__() method whenever you want some custom formatting. In addition, the subclassing scheme requires developers to construct special objects such as ColorString to take advantage of the formatting. Fortunately, you can go even further by implementing your own formatter classes and using them to format any type. For example, it would be convenient to just be able to print text in any color you want. The next example shows a class called ColorFormatter, which subclasses the string.Formatter class and overrides the format_field method. The override colorizes the field if it finds the format_spec in the colors list, or just applies the default formatting by calling Formatter.format_field():

   from string import Formatter
   class ColorFormatter(Formatter):
     def format_field(self, value, format_spec):
       if format_spec in colors:
         return colorize(value, format_spec)
       return Formatter.format_field(self, value, format_spec)
To use a custom formatter you need to instantiate it and then call its format() method to get the formatted string. To make it even more streamlined I assigned the bound format method to a variable named f, so it's easier to use:

   formatter = ColorFormatter()
   f = formatter.format
   print(f('{0:cyan} works very {1:orange}.', 'ColorFormatter', 'well'))
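
The fallback to Formatter.format_field() matters as soon as you mix color names with ordinary specifiers in one format string. Here is a self-contained sketch of that situation; SketchColorFormatter and its small COLOR_CODES table are stand-ins I wrote so the example runs on its own:

```python
from string import Formatter

# Hypothetical, trimmed-down color table (foreground codes only).
COLOR_CODES = {'red': 31, 'cyan': 36}

class SketchColorFormatter(Formatter):
    def format_field(self, value, format_spec):
        code = COLOR_CODES.get(format_spec)
        if code is not None:
            return '\033[{0}m{1}\033[0m'.format(code, value)
        # Anything that is not a color name gets the default treatment.
        return Formatter.format_field(self, value, format_spec)

f = SketchColorFormatter().format
mixed = f('{0:cyan} costs {1:.2f}', 'Widget', 3.14159)
print(repr(mixed))
```
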
If you have a list of field values or dictionary with named fields you can use the vformat() method, which takes a list for positional arguments and a dictionary for keyword arguments:

   formatter = ColorFormatter()
   f = formatter.vformat
   args = ['The', 'vformat()']
   kwargs = dict(m='method', t='too')
   print(f('{0:red} {1:blue} {m:green} works {t:magenta}', args, kwargs))
If you are a Windows developer looking for a little Python 3.0 homework, a good exercise would be to implement the color ANSI escape codes for Windows by replacing the new print function. Your replacement print function should scan the text to print, looking for ANSI escape sequences, parse them, and apply the proper color setting using the SetConsoleTextAttribute() API.
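
As a possible starting point for the scanning half of that exercise, here is a sketch that splits a string into (code, text) segments using a regular expression. It is deliberately limited: it only recognizes single-number sequences of the form ESC[<n>m, and it stops short of calling any console API. The names ANSI_SGR and split_ansi are my own:

```python
import re

# Matches simple sequences such as ESC[31m or ESC[0m (one numeric parameter).
ANSI_SGR = re.compile(r'\x1b\[(\d+)m')

def split_ansi(text):
    """Return (code, chunk) pairs; code is None for text before any sequence."""
    segments = []
    code = None
    pos = 0
    for match in ANSI_SGR.finditer(text):
        if match.start() > pos:
            segments.append((code, text[pos:match.start()]))
        code = int(match.group(1))
        pos = match.end()
    if pos < len(text):
        segments.append((code, text[pos:]))
    return segments

print(split_ansi('\033[31mRed\033[0m plain'))
```

A replacement print function could walk these segments, translate each code into the matching console attribute, and write each chunk in turn.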

This article showed a wide range of examples detailing how the deep changes in Python 3.0 affect data types, math operations, and string formatting. Beyond these, Python 3.0 also made significant changes to the standard library, which you'll explore in the next article in this series.

Gigi Sayfan specializes in cross-platform object-oriented programming in C/C++/C#/Python/Java with an emphasis on large-scale distributed systems. He is currently trying to build brain-inspired intelligent machines at Numenta.