s an open source database, PostgreSQL has one of the largest libraries of Application Programmable Interfaces (APIs) available, letting you access it easily from various languages. One such language is Python, which happens to be one of my favorite languages; I use it for almost all my hacking. To be honest, I’m not that great a programmer. I’m a database administrator and operating system consultant by trade. But Python ensures that the code that I do write is readable by other, more talented programmers?even six months after I stopped working on it.
Nine times out of 10, I use Python to communicate with a PostgreSQL database. My driver of choice when doing so is called Psycopg2. Like the first release (Psycopg), it’s designed to be lightweight and fast. Recently Psycopg2 has been under heavy development and is currently in Beta 4, which is supposed to be the last beta before the official release.
This article discusses how to connect to PostgreSQL with Psycopg2 and illustrates some of the nice features that the driver offers. For the example code I used Psycopg2, Python2.4, and PostgreSQL 8.1dev.
I do not advocate running PostgreSQL 8.1dev in production, it is currently in alpha. The current version of PostgreSQL is 8.0.3, but 8.0.4 should be current by the time you read this article. |
Psycopg2 is a DB API 2.0-compliant PostgreSQL driver that is under active development. It is designed for multi-threaded applications and manages its own connection pool. One other interesting feature of the driver is that if you use the PostgreSQL array data type, Psycopg2 will automatically convert a result containing that data type to a Python list.
The following discusses specific uses of Psycopg2. It does not try to implement a lot of object-oriented code, but instead tries to provide clear and concise syntactical examples of using the driver with PostgreSQL.
Making the Initial Connection
#!/usr/bin/python2.4 # # Small script to show PostgreSQL and Psycopg2 together # import psycopg2 try: conn = psycopg2.connect("dbname='template1' user='dbuser' host='localhost' password='dbpass'"); except: print "I am unable to connect to the database"
The above code imports the adapter and tries to connect to the database. If the connection fails, the code prints a statement to STDOUT. You could also use the exception to try the connection again with different parameters if you like.
The next step is to define a cursor to work with. It is important to note that Python/Psycopg cursors are not cursors as defined by PostgreSQL. They are completely different beasts.
cur = conn.cursor()
Executing a Query
After making the connection and defining a cursor, you can execute a query. For example:
cur.execute("""SELECT datname from pg_database""")
After executing your query you will need to assign the results to a variable. The results will be in the form of a Python list.
rows = cur.fetchall()
At this point, the query results are stored in the variable named rows. Using this variable, you can start processing the results. For example, to print all the data selected to the screen, you could write:
print "
Show me the databases:
" for row in rows: print " ", row[0]
Psychopg2 and Isolation Level
Everything you’ve seen so far would work with any database that Python can access. Now, here are some finer points. PostgreSQL does not have an autocommit facility, which means that all queries execute within a transaction.
Execution within a transaction is a very good thing; it ensures data integrity and allows for appropriate error handling. However there are queries that can not be run from within a transaction. Consider the following example:
#/usr/bin/python2.4 # # import psycopg2 import sys # Try to connect try: conn=psycopg2.connect("dbname='template1' user='dbuser' password='mypass'") except: print "I am unable to connect to the database, exiting." sys.exit() cur = conn.cursor() try: cur.execute("""DROP DATABASE foo_test""") except: print "I can't drop our test database, check your isolation level." sys.ext()
The preceding code would fail with the printed message of “I can’t drop our test database, check your isolation level” That’s because PostgreSQL cannot drop databases within a transaction; dropping a database is an all-or-nothing command. To drop the database you would need to change the isolation level of the database, which you can do using the following command:
conn.set_isolation_level(0)
You would place the conn.set_isolation_level command immediately above the DROP DATABASE cursor execution command in the example.
Support for PostgreSQL’s Special Data Types
The Psycopg2 driver also has the built-in ability to deal with some of the special data types that PostgreSQL has available. One such example is arrays. In the following PostgreSQL table definition, the notes column in the bar table is of type text[]. Square brackets ([]) have a special meaning in PostgreSQL, meaning that the column type is not just text but an array of text values. Table 1 shows the structure of a public.bar sample table
Table 1. Public.bar Structure: The table shows the structure and indexes of the public.bar table.
Table public.bar | ||
Column | Type | Modifiers |
id | bigint | not null default nextval(‘public.bar_id_seq’::text) |
notes | text[] | |
Indexes | ||
“bar_pkey” PRIMARY KEY, btree (id) |
To insert values into this table you would use a statement like the following.
foo=# insert into bar(notes) values ('{An array of text, Another array of text}');
After running that statement, selecting all columns from the table would produce the following result.
foo=# select * from bar; id | notes ----+---------------------------------------------- 2 | {"An array of text","Another array of text"} (1 row)
Note that the notes column contains an array of text strings. With many languages and database drivers you would need to create a routine to parse the above array output manually, but Psycopg2 automatically converts the array into a Python list, for example:
#/usr/bin/python2.4 # # import psycopg2 # Try to connect try: conn=psycopg2.connect("dbname='foo' user='dbuser' password='mypass'") except: print "I am unable to connect to the database." cur = conn.cursor() try: cur.execute("""SELECT * from bar""") except: print "I can't SELECT from bar" rows = cur.fetchall() print "
Rows:
" for row in rows: print " ", row[1]
If you execute the preceding script you’d see the following output:
[jd@jd ~]$ python test.py Rows: ['An array of text', 'Another array of text']
Again, you see the full embedded list; however, because Psycopg2 and Python give you direct access to the array of text strings, if you change the last few lines, you can access individual items in the array with code similar to the following:
rows = cur.fetchall() for row in rows: print " ", row[1][1]
The above loop outputs the first text string in each array of strings, for example.
Rows: Another array of text
Using Named Columns
Some programmers would prefer to not use numeric column representations such as row[1][1]; instead, it can be easier to use a dictionary. Here’s the same example with a slight modification that uses the column name rather than the column index number:
#/usr/bin/python2.4 # # # load the adapter import psycopg2 # load the psycopg extras module import psycopg2.extras # Try to connect try: conn=psycopg2.connect("dbname='foo' user='dbuser' password='mypass'") except: print "I am unable to connect to the database." # If we are accessing the rows via column name instead of position we # need to add the arguments to conn.cursor. cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor) try: cur.execute("""SELECT * from bar""") except: print "I can't SELECT from bar" # # Note that below we are accessing the row via the column name. rows = cur.fetchall() for row in rows: print " ", row['notes'][1]
Notice that the preceding code did not use row[1] but instead used row[‘notes’], which signifies the notes column within the bar table.
The output from that is identical to the example that used the column number:
Rows: Another array of text
Finally, I’d like to show you how simple it is to insert multiple rows using a dictionary. Suppose you had a dictionary containing the following:
namedict = ({"first_name":"Joshua", "last_name":"Drake"}, {"first_name":"Steven", "last_name":"Foo"}, {"first_name":"David", "last_name":"Bar"})
You could easily insert all three rows within the dictionary by using:
cur = conn.cursor() cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
The cur.executemany statement automatically iterates through the dictionary and executes the INSERT query for each row.
The only downside to using Psycopg2 and PostgreSQL is that the driver is a little behind in terms of server-side support functions such as prepared queries, but the author of Psycopg2 expects to implement those features in the near future.