ave you ever found it tedious to mix SQL and other languages, or been reluctant to write the same four lines of code again to do a simple database query? This article can help you eliminate the drudgery involved in database access, and make your programming time more efficient, by wrapping simple transactions in friendly native Python syntax. By making the database emulate regular Python objects, you can remove a source of friction and frustration from your development process. Your time spent programming will be more efficient and productive when you can focus on the task at hand, without being constantly sidetracked by unimportant details like where the cursor object is, or whether you need to escape-protect the data in the next query.
c = db.cursor()
c.execute("select * from book limit 87,1")
row = c.fetchone()
display(row)
Four lines of code is needlessly long for what is essentially an array lookup. It barely begins to utilize the full power of SQL, yet this type of thing is quite common. So, what if you could use simpler syntax to do the same thing? The following would be much easier:
display(books[87])
That sort of easy syntax is supported in Python, using its special class methods. Any object can emulate an array by implementing special functions. In this case, you want to create a Table class with a method called __getitem__().
class Table:
def __init__(self, db, name):
self.db = db
self.name = name
self.dbc = self.db.cursor()
def __getitem__(self, item):
self.dbc.execute("select * from %s limit %s, 1" %
(self.name, item))
return self.dbc.fetchone()
Using this class, you can easily access a database table as if it were a native Python list. It merely provides read-only access to single rows, but it's a good start. The following short program demonstrates its use:
import MySQLdb
db = MySQLdb.connect(user="user", passwd="passwd",
db="library")
books = Table(db, "book")
for i in xrange(80, 100):
print "Book %s: %s" % (i, books[i])
Condensing the database lookup into a single short expression greatly increases its expressive power. When you write code to keep track of a cursor, execute queries, and fetch rows from the result, the operation just doesn't fit into as many places. Using a word or two for the same concept allows you to incorporate such condensed operations into other ideas more clearly and easily.
Determining Data Length
Something important was missing from the code mentioned so far. Many programs want to know how much data needs to be processed. Most search programs return this with the results, such as "results 80 to 100 of 487." That 487 is important to many search engine users. Luckily, the length operator in Python is easy to implement. Simply add the following to the Table class:
def __len__(self):
self.dbc.execute("select count(*) from %s" % (self.name))
r = int(self.dbc.fetchone()[0])
return r
This will make the function len(books) return the number of items in the table, so you can loop over the entire data set or at least tell the user how much data exists:
for i in xrange(len(books)):
print books[i]
Iterating Through Rowsfor book in books:
print book
This is simple to do, but first it would be a good idea to refactor a little. The Table class will be doing a lot of database queries, and pumping all those queries through one function will help make debugging easier. First, add a self.debug = 1 line to the Table's __init__() function, and then add a method for queries:
def _query(self, q):
if self.debug: print "Query: %s" % (q)
self.dbc.execute(q)
To add iteration, simply add two methods to the Table class: __iter__() and next(). The class does not need to be derived from a built-in iterator type, it merely needs to implement the same policies. This "policy over mechanism" theme is common in Python. It provides most of the same benefits as more strict languages, without losing flexibility. Adding the following code turns the Table class into an iterable object:
def __iter__(self):
"creates a data set, and returns an iterator (self)"
q = "select * from %s" % (self.name)
self._query(q)
return self # an Iterator is an object
# with a next() method
def next(self):
"returns the next item in the data set,
or tells Python to stop"
r = self.dbc.fetchone()
if not r:
raise StopIteration
return r
By adding this code, the Table class now acts like an iterator, allowing you to use the easy syntax at the beginning of this section.
Using Data "Slices"
Databases tend to be large, and accessing an entire data set can be very slow. If you have 50,000 records in a table, and only need twenty of them right now, grabbing the whole set would be very inefficient. However, taking only one record at a time is also inefficient. To solve this, use slices. Python provides slicing for list-like objects, which can select a range of records quickly and easily.
The previous examples executed one database query for each row accessed in the table. Pulling tens or hundreds of records at a time would be more efficient though. The following demonstrates the syntax for taking a slice of a list, or a list-like object:
# create a list of all items to go on the third page
page3 = books[20:30]
# or, pull ten records into a list, and loop over them
for book in books[20:30]:
print book
To implement slice support in the Table class, modify the __getitem__() method. Both individual lookups and slice lookups are implemented with this method, and determining which type of result to return is simply a matter of checking whether the parameter is an integer or a slice. The built-in isinstance() function, along with the types module can help you decide what to do.
def __getitem__(self, item):
q = "select * from %s" % (self.name)
# isinstance is recommended over direct type(a) == type(b) comparisons,
# to accomodate derived classes
if isinstance(item, types.SliceType):
q = q + " limit %s, %s" % (item.start, item.stop - item.start)
self._query(q)
return self.dbc.fetchall()
elif isinstance(item, types.IntType):
q = q + " limit %s, 1" % (item)
self._query(q)
return self.dbc.fetchone()
else:
raise IndexError, "unsupported index type"
The Slice object contains three attributes: start, stop, and step. The step part will probably not be useful, because SQL databases don't generally support step sizes other than one. However, start and stop are useful, and translate easily into arguments to MySQL's LIMIT parameter. The start attribute is used directly as the first parameter. Subtracting start from stop produces the second parameterlength.
Searching and Sorting
So far, this article has ignored two of the most important aspects of databases. Searching and sorting, or finding and arranging data, let you narrow and organize results automatically. If you were writing an address book application, the user would often ask questions such as "Who do I know named 'Bob'?" and expect to get a sorted list. The following code is a simple and feasible way to implement such a query:
contacts.sort("lastname")
contacts.search("firstname='bob'")
for c in contacts:
print "%s %s: %s" % (c[FNAME], c[LNAME], c[PHONE])
The output would print a list of "Bob" names and phone numbers, sorted by last name:
Bob Barker: 123-4567
Bob Dobbs: 234-5678
Bob Monroe: 345-6789
Bob Zemeckis: 456-7890
Python does not provide operators for searching or sorting, so a class method must suffice. These methods are very simple, because SQL implements them as parameters to SELECT instead of allocating separate commands for them. The only work the search and sort methods need to do is save data for later use:
def search(self, method):
self._search = ""
if method: self._search = "where %s" % (method)
def sort(self, method):
self._sort = ""
if method: self._sort = "order by %s" % (method)
Making these functions actually do something requires modifying the other class methods. The self._search and self._sort attributes must be defined, and inserted into queries where appropriate. The following methods should be modified:
books = Table(db, "book, series")
books.search("book.series=series.id AND book.publish_year>1995")
books.sort("series.name, book.publish_year")
SERIES_NAME, BOOK_TITLE, BOOK_YEAR = 6, 1, 2
for b in books:
print "%s: %s (%s)" % (b[SERIES_NAME],
b[BOOK_TITLE], b[BOOK_YEAR])
The preceding code tells Python to get rows from two tables, book and series. The search method then adds two additional instructionsjoin the tables by using the series ID, and ignore anything from 1995 or earlier. Then sort the results first by the name of the series, and then by the year each book was published. This has the effect of grouping related items together and showing them in chronological order. The code then chooses three interesting columns from the data, and prints the information in a human-readable form.
Inserting Rows
Reading data is useful, especially if you need to build reports or otherwise sort through data. But you probably will want to write to the database too. The simplest way to write is to insert new rows, which you can achieve with the following syntax:
contacts.insert('', "Stanley", "Spudowski",
"1965-01-01")
This is similar to a direct SQL query, but even simpler. And, because of the way the Python database spec works, you won't even have to worry about escaping data to make it safe. The spec requires automatic escaping for anything in the second parameter of the cursor's execute() function. If you ever plan to insert data supplied by a user, it is a good idea to use that protection, because it will prevent possible exploits in your program. Doing this requires a minor modification to the Table._query() method:
def _query(self, q, data=None):
if self.debug: print "Query: %s (%s)" % (q, data)
self.dbc.execute(q, data)
With that modification in place, you can now write an insert function. All it needs to do is pass the data to your database in a format compatible with the DB API spec. However, because it needs to support tables of any size and shape, the format string should be created dynamically. The following creates a format string such as "%s,%s,%s,%s" according to the length of the row it is given, and then executes the query.
def insert(self, *row):
fmt = ("%s," * len(row))[:-1]
q = "insert into %s values (%s)" % (self.name, fmt)
self._query(q, row)
As an additional usability feature, it avoids the need to pass a tuple object, by collecting all the function's arguments into a tuple automatically. The *row syntax does this. Its effect, basically, is to remove one set of parenthesis, allowing you to run insert(a,b,c) instead of insert((a,b,c)). del movies[58]
To make your Table class respond to the del command, add the following method to it:
def __delitem__(self, item):
q = "select %s from %s %s %s limit %s, 1" %
("_rowid", self.name, self._search, self._sort,
item)
self._query(q)
rid = self.dbc.fetchone()[0]
q = "delete from %s where %s=%s" % (self.name,
"_rowid", rid)
self._query(q)
This function performs two queries. First, it is necessary to obtain the row ID. Then it deletes the row identified by that ID. You can eliminate the need for the first query by always including the row ID in your query results, but for the sake of brevity, that is not demonstrated here. The provided example code includes row IDs, though, along with several other enhancements.
| DevX is a division of Jupitermedia Corporation © Copyright 2007 Jupitermedia Corporation. All Rights Reserved. Legal Notices |