A Developer's Guide to Python 3.0: Numbers, Strings, and Data

Python 3.0 makes critical, backward-incompatible changes to data types. Find out how these changes will affect your code.

The first article in this series covered important changes to the core language and its type system. This article focuses on how Python 3.0 treats the basic data types: numbers, text, and binary data.

PEP 237: Unifying Long Integers and Integers

Python 2.x has two integral types: int and long. The int type is limited to the machine's "native" word size (32 or 64 bits on modern machines). Before Python 2.2, operations on the int type could overflow and raise OverflowError exceptions. In contrast, the long type is limited only by the amount of available memory and can conceptually represent any integer.

The reason for having two integer types is that int is very efficient because it has direct support in hardware and OSs, while the long type is flexible and doesn't require the developer to keep tabs on the size of numbers. But having two types presents several problems when porting compiled Python files or pickled objects across machines with different architectures.

The goal of PEP 237 is to unify these two concepts into a single integer type that switches its internal representation to the more efficient machine integer when possible. The implementation was staged across four versions: 2.2, 2.3, 2.4, and 3.0, where it is now complete.

Python 2.4 and higher already support auto-promotion of int to long without exceptions or warnings. Python 3.0 simply eliminated the long type and long literals at the Python level. If you try to use long in Python 3.0 you will get an error:

   >>> long
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   NameError: name 'long' is not defined

Python 3.0 also removed the L suffix for longs. Now, an integer is an integer is an integer. In Python 2.5 this is fine:

   >>> 5L
   5L

But in Python 3.0 it's a syntax error:

   >>> 5L
     File "<stdin>", line 1
   SyntaxError: invalid syntax

In Python 2.5 the following code generates a long object:

   >>> x = 5 ** 88         
   >>> type(x)
   <type 'long'>
   >>> x

In Python 3.0 it's an int:

   >>> x = 5 ** 88
   >>> type(x)
   <class 'int'>
   >>> x
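
As a rough illustration of the unified type, the following Python 3 sketch checks that small and very large values share the same int class. The integer_types tuple and the is_integer helper are hypothetical names introduced here for code that must run on both Python 2 and Python 3; they are not part of Python itself.

```python
# Python 3: one integer type, whatever the magnitude.
small = 5
big = 5 ** 88  # far larger than a 64-bit machine word can hold

assert type(small) is int
assert type(big) is int
assert big > 2 ** 64

# Hypothetical compatibility shim: the name `long` exists only in Python 2,
# so fall back to int alone when looking it up raises NameError.
try:
    integer_types = (int, long)  # Python 2
except NameError:
    integer_types = (int,)       # Python 3


def is_integer(value):
    """Return True for any integral value, excluding bool."""
    return isinstance(value, integer_types) and not isinstance(value, bool)


print(is_integer(big), is_integer(2.5))
```

The try/except lookup is only one possible way to bridge the two versions; code that targets Python 3 alone can test against int directly.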

PEP 3127: Integer Literal Support and Syntax

Python has always supported a plethora of radices or bases for integers. The int() and long() functions in Python 2.5 accept a second argument, which is the base to convert from. The base can be any integer between 2 and 36 (inclusive):

   >>> int('000111', 2)
   7
   >>> int('000111', 1)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ValueError: int() base must be >= 2 and <= 36
   >>> int('000111', 36)
   1333
   TypeError: long() can't convert non-string with explicit base
   >>> long('555555555555555555555555555555555555555', 6)

Python 3.0 preserves this functionality (although the error message says arg 2 instead of base):

   >>> int('0001111', 2)
   15
   >>> int('5', 36)
   5
   >>> int('5', 37)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ValueError: int() arg 2 must be >= 2 and <= 36
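
The base argument can be exercised with a small Python 3 sketch. The parse_int helper below is a hypothetical wrapper, not a built-in: it returns None instead of raising when either the base or the digits are invalid.

```python
def parse_int(text, base=10):
    """Parse text in the given base; return None if the base or digits are invalid."""
    try:
        return int(text, base)
    except ValueError:
        return None


print(parse_int('000111', 2))  # 7: leading zeros are ignored
print(parse_int('z', 36))      # 35: 'z' is the highest digit in base 36
print(parse_int('5', 37))      # None: the base is out of range
print(parse_int('2', 2))       # None: '2' is not a binary digit
```

Swallowing the ValueError is a design choice made for this illustration; real code may prefer to let the exception propagate so the caller can tell a bad base from bad digits.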

Python 2.5 also supported integer literals in octal and hexadecimal, so whenever an integer was expected you could provide an octal or hexadecimal number instead. Octal numbers required a leading zero, as in 0123; hexadecimal numbers required both a leading zero and the character x or X, as in 0x123. Finally, there are two functions called oct() and hex(), each of which takes an integer and returns its string representation in octal or hexadecimal, for example:

   >>> 010
   8
   >>> 010 + 8
   16
   >>> 0xa
   10
   >>> 0xa + 010 + 2
   20
   >>> oct(20)
   '024'
   >>> hex(20)
   '0x14'

Python 3.0 maintains all this functionality, with one small change: the prefix for octal numbers is now a zero followed by the character o or O, as in 0o123, instead of just 0123. The old form is now a syntax error. The original notation with the single leading zero was borrowed from the C programming language. The change reduces the possibility of confusion for developers unfamiliar with C-like languages or with octal numbers, who expect that leading zeros don't change the value of a number. For example, they might use leading zeros for formatting and alignment purposes and unwittingly end up with the wrong numbers.

In addition, Python 3.0 adds a binary literal. All in all, this break from the C legacy creates a uniform notation for integer literals in bases 2, 8, and 16 (binary, octal, and hexadecimal). The prefixes are 0b, 0o and 0x:

   >>> 0b10
   2
   >>> 0o10
   8
   >>> 0x10
   16
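
When porting, old-style octal literals need the new prefix. The sketch below is one possible approach, assuming the literals are available as strings; modernize_octal and OLD_OCTAL are hypothetical names introduced for illustration, not part of any Python tool.

```python
import re

# The three prefixed notations denote ordinary int values.
assert (0b10, 0o10, 0x10) == (2, 8, 16)

# Matches a Python 2 style octal literal: leading zero(s), then octal digits.
OLD_OCTAL = re.compile(r'^0+([0-7]+)$')


def modernize_octal(literal):
    """Rewrite '0123' as '0o123'; return any other literal unchanged."""
    match = OLD_OCTAL.match(literal)
    if match is None:
        return literal
    return '0o' + match.group(1)


print(modernize_octal('0123'))  # 0o123
print(modernize_octal('0x1f'))  # 0x1f (already valid in Python 3)

# Base 0 tells int() to honor the literal's own prefix.
print(int(modernize_octal('010'), 0))  # 8
```

The helper handles only a single literal string; it does not parse source files.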

There is also a new bin() function that converts integers to a binary string representation (analogous to oct() and hex()):

   >>> bin(5)
   '0b101'
   >>> bin(0x10)
   '0b10000'
   >>> bin(0o10)
   '0b1000'

The oct() function of course uses the new 0o prefix and not the old 0 prefix as in Python 2.5:

   >>> oct(12)
   '0o14'
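
Because bin(), oct(), and hex() all emit the same prefixes that the literal syntax accepts, their output can be parsed back with int() and a base of 0. The round_trips function below is a hypothetical helper written for this sketch:

```python
def round_trips(n):
    """Return True if bin(), oct(), and hex() output all parse back to n."""
    # Base 0 makes int() read the 0b / 0o / 0x prefix from the string itself.
    return all(int(convert(n), 0) == n for convert in (bin, oct, hex))


n = 20
print(bin(n), oct(n), hex(n))  # 0b10100 0o24 0x14
print(round_trips(n))          # True

# The built-in format() produces the same digits without a prefix.
print(format(n, 'b'), format(n, 'o'), format(n, 'x'))  # 10100 24 14
```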

I feel that this change, while pretty minor in the grand scheme of things, is an elegant and clean win-win solution. It removes an obstacle from the path of new users, makes a clean break from the past (octal notation in C), and unifies the notation for radix literals, which is important when adding the new binary literal.
