
The first article in this series covered important changes to the core language and its type system. This article focuses on how Python 3.0 treats the basic data types: numbers, text, and binary data.
PEP 237: Unifying Long Integers and Integers
Python 2.x has two integral types: int and long. The int type is limited to the machine's "native" word size (32 or 64 bits on modern machines). Before Python 2.2, operations on the int type could overflow and raise an OverflowError exception. In contrast, the long type is limited only by the amount of available memory, and can conceptually represent any integer.
The reason for having two integer types is that int is very efficient because it has direct support in hardware and OSs, while the long type is flexible and doesn't require the developer to keep tabs on the size of numbers. But having two types presents several problems when porting compiled Python files or pickled objects across machines with different architectures.
The goal of PEP 237 is to unify these two concepts into a single integer type that changes its representation internally, using the more efficient machine integer when possible. The implementation stretched across four versions: it began in 2.2, continued in 2.3 and 2.4, and is now complete in 3.0.
Python 2.4 and higher already support auto-promotion of int to long without exceptions or warnings. Python 3.0 simply eliminated the long type and long literals at the Python level. If you try to use long in Python 3.0 you will get an error:
>>> long
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'long' is not defined
Python 3.0 also removed the L suffix for longs. Now, an integer is an integer is an integer. In Python 2.5 this is fine:
>>> 5L
5L
But in Python 3.0 it's a syntax error:
>>> 5L
  File "<stdin>", line 1
    5L
     ^
SyntaxError: invalid syntax
In Python 2.5 the following code generates a long object:
>>> x = 5 ** 88
>>> type(x)
<type 'long'>
>>> x
32311742677852643549664402033982923967414535582065582275390625L
In Python 3.0 it's an int:
>>> x = 5 ** 88
>>> type(x)
<class 'int'>
>>> x
32311742677852643549664402033982923967414535582065582275390625
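As a small illustration of the unified type, the sketch below (written for Python 3) starts at a word-sized limit and steps past it. It assumes sys.maxsize is a reasonable stand-in for the machine's native word size, and the variable names are only illustrative:

```python
import sys

# sys.maxsize is the largest value a native word-sized index can hold
# (typically 2**31 - 1 or 2**63 - 1). Using it as "the word size" is an
# assumption made for this illustration.
word_limit = sys.maxsize

# Stepping past the limit neither overflows nor switches to another type.
beyond = word_limit + 1

print(type(word_limit).__name__)   # int
print(type(beyond).__name__)       # int
print(beyond - word_limit)         # 1
```

Both values report the same type, so code no longer needs to care which side of the word boundary a number falls on.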
PEP 3127: Integer Literal Support and Syntax
Python has always supported a plethora of radices or bases for integers. The int() and long() functions in Python 2.5 accept a second argument, which is the base to convert from. The base can be any integer between 2 and 36 (inclusive):
>>> int('000111', 2)
7
>>> int('000111', 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: int() base must be >= 2 and <= 36
>>> int('000111', 36)
1333
TypeError: long() can't convert non-string with explicit base
>>> long('555555555555555555555555555555555555555', 6)
2227915756473955677973140996095L
Python 3.0 preserves this functionality (although the error message says arg 2 instead of base):
>>> int('0001111', 2)
15
>>> int('5', 36)
5
>>> int('5', 37)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: int() arg 2 must be >= 2 and <= 36
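To make the arithmetic behind these conversions concrete, here is a minimal sketch of positional parsing in Python 3. The parse_int helper and the DIGITS table are hypothetical names invented for this illustration. This is not how int() is implemented, and it handles only non-negative input without prefixes or whitespace:

```python
import string

# Digits 0-9 followed by a-z give the 36 symbols usable in bases 2..36.
DIGITS = string.digits + string.ascii_lowercase

def parse_int(text, base):
    """Hypothetical, simplified stand-in for int(text, base)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be >= 2 and <= 36")
    value = 0
    for ch in text.lower():
        digit = DIGITS.index(ch)      # raises ValueError for unknown symbols
        if digit >= base:
            raise ValueError("digit %r is not valid in base %d" % (ch, base))
        value = value * base + digit  # shift one place left, then add the digit
    return value

print(parse_int('000111', 2))    # 7
print(parse_int('000111', 36))   # 1333
```

The base 36 result follows directly from the place values: 1 * 36**2 + 1 * 36 + 1 = 1333.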
Python 2.5 also supported integer literals in octal and hexadecimal, so whenever an integer was expected you could provide an octal or hexadecimal number instead. Octal numbers required a leading zero, as in 0123; hexadecimal numbers required both a leading zero and the character x or X, as in 0x123. Finally, there are two functions called oct() and hex(), each of which takes an integer and returns its string representation in octal or hexadecimal, for example:
>>> 010
8
>>> 010 + 8
16
>>> 0xa
10
>>> 0xa + 010 + 2
20
>>> oct(20)
'024'
>>> hex(20)
'0x14'
Python 3.0 maintains all this functionality, with one small change: the prefix for octal numbers is now a zero followed by the character o or O, as in 0O123, instead of just 0123. The original notation with the single leading zero was borrowed from the C programming language. The change reduces the potential for confusion among developers unfamiliar with C-like languages or with octal numbers, who expect that leading zeros don't change the value of a number. For example, they might use leading zeros for formatting and alignment purposes and unwittingly end up with the wrong numbers. In addition, Python 3.0 adds a binary literal. All in all, this break from the C legacy creates a uniform notation for integer literals in bases 2, 8, and 16 (binary, octal, and hexadecimal). The prefixes are 0b, 0o, and 0x:
>>> 0b10
2
>>> 0o10
8
>>> 0x10
16
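The following Python 3 sketch shows the practical effect of the new octal prefix: the old leading-zero spelling no longer compiles, so it cannot silently produce a surprising value. The evaluate_literal helper is a hypothetical name used only for this illustration:

```python
def evaluate_literal(source):
    """Evaluate a literal; return None if it does not compile (hypothetical helper)."""
    try:
        code = compile(source, "<literal>", "eval")
    except SyntaxError:
        return None
    return eval(code)

print(evaluate_literal("0o123"))   # 83
print(evaluate_literal("0O123"))   # 83
print(evaluate_literal("0123"))    # None: the old-style octal literal is rejected
```

Rejecting 0123 outright, rather than reading it as decimal 123, means code ported from Python 2.x fails loudly instead of changing its meaning.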
There is also a new bin() function that converts integers to a binary string representation (analogous to oct() and hex()):
>>> bin(5)
'0b101'
>>> bin(0x10)
'0b10000'
>>> bin(0o10)
'0b1000'
The oct() function of course uses the new 0o prefix and not the old 0 prefix as in Python 2.5:
>>> oct(12)
'0o14'
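Because the three functions now emit the same prefixes that the literals use, their output can be fed straight back to int() with the matching base. The sketch below assumes Python 3, and round_trip is a hypothetical helper written for this illustration:

```python
def round_trip(n):
    """Convert n to each prefixed string form and back (hypothetical helper)."""
    forms = {2: bin(n), 8: oct(n), 16: hex(n)}
    # int() accepts the matching prefix when the base is given explicitly.
    return {base: (text, int(text, base)) for base, text in forms.items()}

for base, (text, value) in sorted(round_trip(12).items()):
    print(base, text, value)
# 2 0b1100 12
# 8 0o14 12
# 16 0xc 12
```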
I feel that this change, while pretty minor in the grand scheme of things, is an elegant and clean win-win solution. It removes an obstacle from the path of new users, makes a clean break from the past (octal notation in C), and unifies the notation for radix literals, which matters when adding the new binary literal.