ave you ever found it tedious to mix SQL and other languages, or been reluctant to write the same four lines of code again to do a simple database query? This article can help you eliminate the drudgery involved in database access, and make your programming time more efficient, by wrapping simple transactions in friendly native Python syntax. By making the database emulate regular Python objects, you can remove a source of friction and frustration from your development process. Your time spent programming will be more efficient and productive when you can focus on the task at hand, without being constantly sidetracked by unimportant details like where the cursor object is, or whether you need to escape-protect the data in the next query.
Another benefit of using native syntax is better portability. This approach makes it easy for you to change databases without having to rewrite any of your application code. Simply modify a few lines in the database wrapper, and you can support an entirely new database. Though this article focuses on MySQL, the code should work easily with PostgreSQL or even SQLite with only minor modifications.
The approach described here is useful any time you want to write a quick program to access a database. You may find it convenient if you want to generate a handful of new reports quickly, or perform basic modifications to your data.
To begin, you’ll want to convert the most frequently-used operations. The most basic operation is reading or selecting data, so I will focus on that first.
Applications frequently need to do things such as “retrieve record 87”, or “print results 30-40.” This calls for an array or list structure, which is easy to do in Python. Normally, to show record 87, you would need to execute something such as the following:
c = db.cursor()c.execute("select * from book limit 87,1")row = c.fetchone()display(row)
Four lines of code is needlessly long for what is essentially an array lookup. It barely begins to utilize the full power of SQL, yet this type of thing is quite common. So, what if you could use simpler syntax to do the same thing? The following would be much easier:
That sort of easy syntax is supported in Python, using its special class methods. Any object can emulate an array by implementing special functions. In this case, you want to create a Table class with a method called __getitem__().
class Table: def __init__(self, db, name): self.db = db self.name = name self.dbc = self.db.cursor() def __getitem__(self, item): self.dbc.execute("select * from %s limit %s, 1" % (self.name, item)) return self.dbc.fetchone()
Using this class, you can easily access a database table as if it were a native Python list. It merely provides read-only access to single rows, but it’s a good start. The following short program demonstrates its use:
import MySQLdbdb = MySQLdb.connect(user="user", passwd="passwd", db="library")books = Table(db, "book")for i in xrange(80, 100): print "Book %s: %s" % (i, books[i])
Condensing the database lookup into a single short expression greatly increases its expressive power. When you write code to keep track of a cursor, execute queries, and fetch rows from the result, the operation just doesn’t fit into as many places. Using a word or two for the same concept allows you to incorporate such condensed operations into other ideas more clearly and easily.
Determining Data Length
Something important was missing from the code mentioned so far. Many programs want to know how much data needs to be processed. Most search programs return this with the results, such as “results 80 to 100 of 487.” That 487 is important to many search engine users. Luckily, the length operator in Python is easy to implement. Simply add the following to the Table class:
def __len__(self): self.dbc.execute("select count(*) from %s" % (self.name)) r = int(self.dbc.fetchone()) return r
This will make the function len(books) return the number of items in the table, so you can loop over the entire data set or at least tell the user how much data exists:
for i in xrange(len(books)): print books[i]
Iterating Through Rows
The loop in the previous example was better, but it could have been simpler still. Python supports iterators, which would have been a little easier. It should be possible to process every item in a table with syntax such as:
for book in books: print book
This is simple to do, but first it would be a good idea to refactor a little. The Table class will be doing a lot of database queries, and pumping all those queries through one function will help make debugging easier. First, add a self.debug = 1 line to the Table’s __init__() function, and then add a method for queries:
def _query(self, q): if self.debug: print "Query: %s" % (q) self.dbc.execute(q)
To add iteration, simply add two methods to the Table class: __iter__() and next(). The class does not need to be derived from a built-in iterator type, it merely needs to implement the same policies. This “policy over mechanism” theme is common in Python. It provides most of the same benefits as more strict languages, without losing flexibility. Adding the following code turns the Table class into an iterable object:
def __iter__(self): "creates a data set, and returns an iterator (self)" q = "select * from %s" % (self.name) self._query(q) return self # an Iterator is an object # with a next() methoddef next(self): "returns the next item in the data set, or tells Python to stop" r = self.dbc.fetchone() if not r: raise StopIteration return r
By adding this code, the Table class now acts like an iterator, allowing you to use the easy syntax at the beginning of this section.
Putting the concepts of an idea, such as database access, into simpler terms enables still more and better innovations. For example, a car engine is a very complicated device with countless parts and design considerations. It takes a lot of work to fully describe an engine, and to attach it to wheels to make a car. But once you fully understand the concept of an engine, you no longer need to describe it in detail. It becomes sufficient to say “an engine with wheels.” Then, after building such a car, you can easily create new concepts by reusing the same old engine ideas. Perhaps after the car, you decide to attach a blade and build a lawn mower. And then, attach wings and a propeller to make a plane. Creating these new things would be vastly more difficult if you had to re-create the concept of an engine each time. Or, in this case, if you had to keep writing the same code over and over, each time you needed to perform an operation you’ve done before.
Using Data “Slices”
Databases tend to be large, and accessing an entire data set can be very slow. If you have 50,000 records in a table, and only need twenty of them right now, grabbing the whole set would be very inefficient. However, taking only one record at a time is also inefficient. To solve this, use slices. Python provides slicing for list-like objects, which can select a range of records quickly and easily.
The previous examples executed one database query for each row accessed in the table. Pulling tens or hundreds of records at a time would be more efficient though. The following demonstrates the syntax for taking a slice of a list, or a list-like object:
# create a list of all items to go on the third pagepage3 = books[20:30]# or, pull ten records into a list, and loop over themfor book in books[20:30]: print book
To implement slice support in the Table class, modify the __getitem__() method. Both individual lookups and slice lookups are implemented with this method, and determining which type of result to return is simply a matter of checking whether the parameter is an integer or a slice. The built-in isinstance() function, along with the types module can help you decide what to do.
def __getitem__(self, item): q = "select * from %s" % (self.name) # isinstance is recommended over direct type(a) == type(b) comparisons, # to accomodate derived classes if isinstance(item, types.SliceType): q = q + " limit %s, %s" % (item.start, item.stop - item.start) self._query(q) return self.dbc.fetchall() elif isinstance(item, types.IntType): q = q + " limit %s, 1" % (item) self._query(q) return self.dbc.fetchone() else: raise IndexError, "unsupported index type"
The Slice object contains three attributes: start, stop, and step. The step part will probably not be useful, because SQL databases don’t generally support step sizes other than one. However, start and stop are useful, and translate easily into arguments to MySQL’s LIMIT parameter. The start attribute is used directly as the first parameter. Subtracting start from stop produces the second parameter?length.
Different databases use different syntax for selecting data slices, which can become confusing if you work with more than one. It can be difficult to remember if the LIMIT function takes first, last as arguments, or if it uses first, length, or if it expects something else. This tends to cause silent errors in programs because the syntax is still correct?accidentally getting too many records, or too few, or even none at all. Hopefully, using the easy Slice syntax will help you avoid that type of error.
Searching and Sorting
So far, this article has ignored two of the most important aspects of databases. Searching and sorting, or finding and arranging data, let you narrow and organize results automatically. If you were writing an address book application, the user would often ask questions such as “Who do I know named ‘Bob’?” and expect to get a sorted list. The following code is a simple and feasible way to implement such a query:
contacts.sort("lastname")contacts.search("firstname='bob'")for c in contacts: print "%s %s: %s" % (c[FNAME], c[LNAME], c[PHONE])
The output would print a list of “Bob” names and phone numbers, sorted by last name:
Bob Barker: 123-4567 Bob Dobbs: 234-5678 Bob Monroe: 345-6789 Bob Zemeckis: 456-7890
Python does not provide operators for searching or sorting, so a class method must suffice. These methods are very simple, because SQL implements them as parameters to SELECT instead of allocating separate commands for them. The only work the search and sort methods need to do is save data for later use:
def search(self, method): self._search = "" if method: self._search = "where %s" % (method)def sort(self, method): self._sort = "" if method: self._sort = "order by %s" % (method)
Making these functions actually do something requires modifying the other class methods. The self._search and self._sort attributes must be defined, and inserted into queries where appropriate. The following methods should be modified:
- Table.__init__(): Set self._search = “” and self._sort = “” to avoid accessing them before they are defined.
- Table.__len__(): Change the query to:self._query(“select count(*) from %s %s” % (self.name, self._search))
- Table.__iter__(): Change the query to:q = “select * from %s %s %s” % (self.name, self._search, self._sort)
- Table.__getitem__(): Change the first line to:q = “select * from %s %s %s” % (self.name, self._search, self._sort)
Now, at this point, you may have noticed that some SQL slipped into the interface for the Table class. The parameters for Table.search() and Table.sort() are exactly what you would write in a WHERE or ORDER BY clause in a normal query. The table’s name is also placed directly into the query. This affords some interesting opportunities, or tricks.
Some simple tricks include sorting by several columns, or searching with more than one condition. A more complicated trick is to turn the Table into something like a view, by connecting it to more than one database table. The following code demonstrates all three techniques:
books = Table(db, "book, series")books.search("book.series=series.id AND book.publish_year>1995")books.sort("series.name, book.publish_year")SERIES_NAME, BOOK_TITLE, BOOK_YEAR = 6, 1, 2for b in books: print "%s: %s (%s)" % (b[SERIES_NAME], b[BOOK_TITLE], b[BOOK_YEAR])
The preceding code tells Python to get rows from two tables, book and series. The search method then adds two additional instructions?join the tables by using the series ID, and ignore anything from 1995 or earlier. Then sort the results first by the name of the series, and then by the year each book was published. This has the effect of grouping related items together and showing them in chronological order. The code then chooses three interesting columns from the data, and prints the information in a human-readable form.
Reading data is useful, especially if you need to build reports or otherwise sort through data. But you probably will want to write to the database too. The simplest way to write is to insert new rows, which you can achieve with the following syntax:
contacts.insert('', "Stanley", "Spudowski", "1965-01-01")
This is similar to a direct SQL query, but even simpler. And, because of the way the Python database spec works, you won’t even have to worry about escaping data to make it safe. The spec requires automatic escaping for anything in the second parameter of the cursor’s execute() function. If you ever plan to insert data supplied by a user, it is a good idea to use that protection, because it will prevent possible exploits in your program. Doing this requires a minor modification to the Table._query() method:
def _query(self, q, data=None): if self.debug: print "Query: %s (%s)" % (q, data) self.dbc.execute(q, data)
With that modification in place, you can now write an insert function. All it needs to do is pass the data to your database in a format compatible with the DB API spec. However, because it needs to support tables of any size and shape, the format string should be created dynamically. The following creates a format string such as “%s,%s,%s,%s” according to the length of the row it is given, and then executes the query.
def insert(self, *row): fmt = ("%s," * len(row))[:-1] q = "insert into %s values (%s)" % (self.name, fmt) self._query(q, row)
As an additional usability feature, it avoids the need to pass a tuple object, by collecting all the function’s arguments into a tuple automatically. The *row syntax does this. Its effect, basically, is to remove one set of parenthesis, allowing you to run insert(a,b,c) instead of insert((a,b,c)).
Removing rows is also likely to be useful. And, in most SQL databases, it is not as simple as just telling the DB to remove row number 87. The difficulty is a matter of telling the database to identify which row to remove. One approach is to identify the row by specifying every value in its cells. For example, DELETE FROM movies WHERE title=’Harvey’ AND year=’1950′ AND genre=37;. The problem with this approach is that you might have two rows with identical data. In that case, you would accidentally delete both rows instead of just one.
A less error-prone approach is to use row IDs. Row IDs are supported by most databases, though the implementation differs. In MySQL, the query would look more like this: DELETE FROM movies WHERE _rowid=58;. That query will remove one and only one row, regardless of the content of the table. Even if every row is identical, each one will have a unique row ID. As an added bonus, the query is smaller, simpler, and quite similar to the array-like syntax Python uses to delete list items:
To make your Table class respond to the del command, add the following method to it:
def __delitem__(self, item): q = "select %s from %s %s %s limit %s, 1" % ("_rowid", self.name, self._search, self._sort, item) self._query(q) rid = self.dbc.fetchone() q = "delete from %s where %s=%s" % (self.name, "_rowid", rid) self._query(q)
This function performs two queries. First, it is necessary to obtain the row ID. Then it deletes the row identified by that ID. You can eliminate the need for the first query by always including the row ID in your query results, but for the sake of brevity, that is not demonstrated here. The provided example code includes row IDs, though, along with several other enhancements.
The data-access approach described above should help you use your time more efficiently and help you write data-processing scripts more quickly. You could extend and refine the approach into a much more sophisticated library, and make it more appropriate for larger applications. For example, you could modify the code to use dict-style cursors, instead of tuples. Or, you could add a Row class, with methods to modify individual cells. Another useful extension would be to formally support joined tables instead of using the somewhat tricky method described earlier.
This “bottom-up” approach should help you build up the Python language to better deal with the tasks you commonly perform. By adding a few special functions, you can turn Python into a richer, more appropriate language for whatever you need. And hopefully, by letting the interpreter do more of the work for you, you can build faster, cleaner solutions.