My Name Is __missing__, dict.__missing__
The dict __missing__ method is a neat addition to Python's arsenal of useful tools. It addresses the common problem of returning a default value when a dictionary lookup fails.
Suppose your program needs to securely store the code names of British secret agents. You are aware, of course, that these code names all start with double zero and end with a positive integer. After careful analysis of the problem domain you decide to use a 100x100 sparse matrix (a matrix that contains mostly zeros) to store the code names. Your input is a list of tuples. The first and second elements of each tuple are the row and column (a two-dimensional index), and the third element is the integer that follows the mandatory '00'. You can represent such a matrix using a plain (non-sparse) dictionary:
sparse_matrix = {}
for row in range(100):
    for col in range(100):
        sparse_matrix[(row,col)] = 0
for i in (5,4,8), (88, 33, 7), (99,99,9):
    sparse_matrix[i[:2]] = i[2]
print '%d%d%d %s' % (sparse_matrix[(1,1)],
                     sparse_matrix[(14,61)],
                     sparse_matrix[(88,33)],
                     'licensed to kill')
Output:
007 licensed to kill
That works, but it's not very smart or sparse. A huge dictionary of 10,000 entries is required to identify just three agents, and it takes a while to initialize this huge dictionary with zeros. A much better solution is to keep just the non-zero elements. The problem is what to do when someone accesses a zero entry (one that is missing from the dictionary). The dictionary raises a KeyError exception:
Traceback (most recent call last):
  File "/Users/gsayfan/Documents/docs/Publications/DevX/Python 2.5 - Fresh from the Oven/part_3.py", line 57, in <module>
    print '%d%d%d %s' % (sparse_matrix[(1,1)],
KeyError: (1, 1)
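For reference, here is a minimal sketch of the non-zero-only storage that leads to an error like the one above. The helper name build_sparse is hypothetical and used only for illustration; the data is the same three tuples as before.

```python
def build_sparse(entries):
    """Store only the non-zero cells, keyed by (row, col)."""
    matrix = {}
    for row, col, value in entries:
        matrix[(row, col)] = value
    return matrix

sparse_matrix = build_sparse([(5, 4, 8), (88, 33, 7), (99, 99, 9)])

# Only three entries are stored instead of 10,000
assert len(sparse_matrix) == 3

# A cell that was never stored is simply absent, so the lookup fails
try:
    sparse_matrix[(1, 1)]
    lookup_failed = False
except KeyError:
    lookup_failed = True
assert lookup_failed
```

The storage problem is solved, but every read of an empty cell now needs special handling by the caller.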
There were several cumbersome solutions prior to Python 2.5. All of them required the caller to handle the missing value. One way was to wrap every access to the dictionary in a try-except block; another was to use the get() method and pass in a default value to return; and the last was to use the setdefault() method, which is similar to get() but also stores the default value in the dictionary for posterity.
x = {1:1, 2:2, 3:3}
# This is just ugly
try:
    print x[0]
except KeyError:
    print 8
# This just gets the default value without modifying the dict
print x.get(0, 8)
print 'x has %d entries' % len(x)
# This actually adds the entry 0:8 to the dict
print x.setdefault(0, 8)
print 'x has %d entries' % len(x)
Output:
8
8
x has 3 entries
8
x has 4 entries
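Applied to the sparse matrix, the get() approach might look like the sketch below. It is an illustration rather than part of the original listing, and it uses a parenthesized print so that it also runs on later Python versions. Note how each read has to repeat the default value.

```python
sparse_matrix = {(5, 4): 8, (88, 33): 7, (99, 99): 9}

# Every caller has to remember that the default is 0
digits = (sparse_matrix.get((1, 1), 0),
          sparse_matrix.get((14, 61), 0),
          sparse_matrix.get((88, 33), 0))
code_name = '%d%d%d %s' % (digits + ('licensed to kill',))
print(code_name)

# get() never stores the default, so the dictionary stays sparse
assert len(sparse_matrix) == 3
```

This keeps the dictionary small, but knowledge of the default is scattered across every call site.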
In Python 2.5 there is an elegant way to handle this situation. The dict type supports a new hook method called __missing__. When a lookup of the form d[key] fails on an instance of a dict subclass that defines __missing__, the method is called with the key, and whatever it returns becomes the result of the lookup. The plain dict type does not define the hook, so it still raises the infamous KeyError exception, but you can subclass dict and implement __missing__ in your subclass to do whatever you want. Only d[key] lookups trigger the hook; get() and the in operator do not. This is much better because the caller is not responsible for handling default values. Sometimes the returned value should be based on a dynamic calculation, and the caller doesn't even know what the proper default value is. Note that the dict size remains the same even when accessing non-existing elements.
class SparseDict(dict):
    def __missing__(self, key):
        return 0
sparse_matrix = SparseDict()
for i in (5,4,3), (88, 33, 7), (99,99,99):
    sparse_matrix[i[:2]] = i[2]
print '%d%d%d %s' % (sparse_matrix[(1,1)],
                     sparse_matrix[(14,61)],
                     sparse_matrix[(88,33)],
                     'licensed to kill')
print len(sparse_matrix)
print sparse_matrix
Output:
007 licensed to kill
3
{(88, 33): 7, (5, 4): 3, (99, 99): 99}
This solution is elegant and allows full flexibility (you even have the requested key to base your return value on, if you want it). Nonetheless, it feels a little intrusive to write a subclass for every dictionary with a default, especially if you have multiple dictionaries with different defaults. Have no fear. Python 2.5 comes with a defaultdict type, which is almost as flexible as implementing __missing__ yourself.
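To illustrate basing the return value on the requested key, here is a small sketch. The IdentityMatrix class is hypothetical and not part of the original example: it treats unset cells on the diagonal as 1 and all other unset cells as 0.

```python
class IdentityMatrix(dict):
    """Hypothetical sparse matrix whose unset cells default to the identity."""
    def __missing__(self, key):
        row, col = key
        # Compute the default from the key; nothing is stored
        if row == col:
            return 1
        return 0

m = IdentityMatrix()
m[(2, 3)] = 7                  # an explicitly stored cell

assert m[(2, 3)] == 7          # stored value wins
assert m[(4, 4)] == 1          # computed default on the diagonal
assert m[(4, 5)] == 0          # computed default elsewhere
assert len(m) == 1             # lookups did not add entries

# The hook is used only by m[key]; get() and 'in' bypass it
assert m.get((4, 4)) is None
assert (4, 4) not in m
```

A single default value could not express this rule, which is the kind of case where writing the subclass is worth the extra code.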
The defaultdict type lives in the collections module, and it accepts a default_factory callable in its constructor. Whenever a non-existing key is accessed, default_factory is invoked without arguments to produce the proper value. Don't worry, you don't need to start writing factory classes or functions now. Most of Python's built-in types can be called without arguments and so double as factory functions, and in most cases this is exactly what you want. For example, Python's int returns 0 when invoked without arguments. This is exactly what we need for our sparse matrix. Note that accessing non-existing entries sets them in the dictionary, just like calling setdefault().
import collections
sparse_matrix = collections.defaultdict(int)
for i in (5,4,3), (88, 33, 7), (99,99,99):
    sparse_matrix[i[:2]] = i[2]
print '%d%d%d %s' % (sparse_matrix[(1,1)],
                     sparse_matrix[(14,61)],
                     sparse_matrix[(88,33)],
                     'licensed to kill')
print len(sparse_matrix)
print sparse_matrix
Output:
007 licensed to kill
5
defaultdict(<type 'int'>, {(88, 33): 7, (5, 4): 3, (99, 99): 99, (14, 61): 0, (1, 1): 0})
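Other factories work the same way. The sketch below is illustrative only, and the variable names agents_by_row and eights are made up for the example. It uses list as a factory, a lambda for a default that no built-in type produces on its own, and get() as one way to read a defaultdict without storing new entries.

```python
import collections

# list as the factory: each missing key starts out as an empty list
agents_by_row = collections.defaultdict(list)
for row, col, number in [(5, 4, 3), (88, 33, 7), (5, 9, 6)]:
    agents_by_row[row].append(number)
assert agents_by_row[5] == [3, 6]
assert agents_by_row[88] == [7]

# A lambda covers defaults that no built-in type produces on its own
eights = collections.defaultdict(lambda: 8)
assert eights['anything'] == 8
assert len(eights) == 1        # the lookup stored the default

# Reading through get() does not call the factory or store anything
sparse_matrix = collections.defaultdict(int)
sparse_matrix[(88, 33)] = 7
assert sparse_matrix.get((1, 1), 0) == 0
assert len(sparse_matrix) == 1
```

If the matrix must stay sparse under heavy read traffic, reading through get() or falling back to a __missing__ subclass avoids the growth shown in the output above.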