Optimize Inserts Using SQLAlchemy

Explore some methods to optimize database access when using SQLAlchemy.



SQLAlchemy is a sophisticated database toolkit for Python. It offers several layers with different levels of control and ease of use. SQLAlchemy is designed for excellent performance, but it is not always obvious how to harness all that power. In this article I'll explore what's possible and demonstrate some methods to optimize database access when using SQLAlchemy.

Setup

Let's set the stage for a few experiments. First, you need to install SQLAlchemy, which is as easy as: 'pip install sqlalchemy'.
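
To verify the installation, you can import the package and print its version (the exact version you see depends on when you install it):


import sqlalchemy
print(sqlalchemy.__version__)
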

I'll use a SQLite database with a very simple table called 'person'. A person has an id and a name. That's it. Here we go:




import time
import sqlite3

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

Base = declarative_base()
metadata = Base.metadata
session = scoped_session(sessionmaker())
engine = None


class Person(Base):
    __tablename__ = 'person'
    id = Column(Integer, primary_key=True)
    name = Column(String(512)) 

Now let's define a function to initialize the database. It creates an engine and assigns it to the global variable 'engine', configures the scoped session, and resets the database by dropping and re-creating all tables.


def init_db(db_name='sqlite:///test.db'):
    global engine
    engine = create_engine(db_name, echo=False)
    session.remove()
    session.configure(bind=engine, autoflush=False, expire_on_commit=False)
    metadata.drop_all(engine)
    metadata.create_all(engine) 

Armed with this init_db() function, let's measure how long it takes to insert 100,000 records into the 'person' table using various methods.

SQLAlchemy ORM

The SQLAlchemy ORM is the highest-level interface. It creates a Person object for each row and tracks changes through the session's unit of work (SQLAlchemy follows the data mapper pattern rather than active record).


init_db()
start = time.time()
for i in range(100000):
    p = Person(name='name-{}'.format(i))
    session.add(p)
    if i % 1000 == 0:
        session.flush()
session.commit()
duration = time.time() - start

print('SQLAlchemy ORM - total time: {:.2f} seconds'.format(duration)) 

SQLAlchemy ORM with pre-computed Primary Key

Here I still use the ORM, but instead of letting the database generate the primary key, I provide it myself. This lets SQLAlchemy skip fetching the newly generated id for each row.


init_db()
start = time.time()
for i in range(100000):
    p = Person(id=i + 1, name='name-{}'.format(i))
    session.add(p)
    if i % 1000 == 0:
        session.flush()
session.commit()
duration = time.time() - start

print('SQLAlchemy ORM with PK - total time: {:.2f} seconds'.format(duration)) 

SQLAlchemy ORM with Bulk Insert

The ORM provides a special method for doing bulk inserts: bulk_insert_mappings(). With a bulk insert you don't create a Person object for each row; instead, you provide a list of dictionaries, each containing one row's columns (here, in batches of 10,000).

init_db()
start = time.time()
batch_size = 10000
for batch_start in range(0, 100000, batch_size):
    session.bulk_insert_mappings(
        Person,
        [
            dict(name='name-{}'.format(i))
            for i in range(batch_start, batch_start + batch_size)
        ])
session.commit()
duration = time.time() - start

print('SA ORM Bulk insert - total time: {:.2f} seconds'.format(duration)) 
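
The ORM also offers bulk_save_objects(), a related method that accepts model instances instead of dictionaries. Since the Person objects still have to be constructed, it is generally somewhat slower than bulk_insert_mappings(). Here is a minimal sketch (not part of the measurements in this article):


session.bulk_save_objects(
    [Person(name='name-{}'.format(i)) for i in range(100000)])
session.commit()
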

SQLAlchemy Core

SQLAlchemy Core is the low-level interface to SQLAlchemy's goodies. You operate directly on the engine and the tables and execute commands. There are no sessions. Note that you're still a little higher-level than straight SQL, which means your code stays cross-platform instead of being tied to a specific database engine.

Let's see how long it takes to populate the person table with 100,000 people.


init_db()
start = time.time()
engine.execute(Person.__table__.insert(),
               [dict(name='name-{}'.format(i)) for i in range(100000)])
duration = time.time() - start

print('SQLAlchemy Core - total time: {:.2f} seconds'.format(duration)) 
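
Note that engine.execute() runs in its own implicit transaction here. An equivalent sketch of the same insert uses an explicit transaction via engine.begin(), which commits automatically when the block exits:


with engine.begin() as conn:
    conn.execute(Person.__table__.insert(),
                 [dict(name='name-{}'.format(i)) for i in range(100000)])
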

Straight SQLite3

Finally, to get an overall sense of how much overhead SQLAlchemy imposes for all its abstraction, here is the same operation of inserting 100,000 rows into the 'person' table, done directly with the sqlite3 package (part of Python's standard library).


conn = sqlite3.connect('test.db')
c = conn.cursor()
c.execute('DROP TABLE IF EXISTS person')
c.execute(
    'CREATE TABLE person (id INTEGER NOT NULL, '
    '             name VARCHAR(512),'
    '             PRIMARY KEY (id))')
conn.commit()

start = time.time()
for i in range(100000):
    row = ('name-{}'.format(i),)
    c.execute('INSERT INTO person (name) VALUES (?)', row)
conn.commit()
duration = time.time() - start

print('Direct sqlite3 - total time: {:.2f} seconds'.format(duration))

Execution Time Comparison

Here are the runtimes of the different methods:

SA ORM - total time: 12.31 seconds
SA ORM with PK - total time: 7.57 seconds
SA ORM Bulk insert - total time: 0.78 seconds
SA Core - total time: 0.09 seconds
Direct sqlite3 - total time: 0.46 seconds

I ran the measurements multiple times and they remained very stable. The results are quite surprising. The ORM method is by far the slowest, at around 12 seconds. Providing the primary key yourself shaves off almost 40%. The bulk insert method is clearly well optimized and takes just 0.78 seconds, but using the Core directly takes only 0.09 seconds. That's roughly 136X faster than the plain ORM method!

Another small surprise is that using the Core is faster than using sqlite3 directly. The likely reason is that the Core hands the whole list of rows to the driver in a single executemany() call, while the sqlite3 code above executes a separate INSERT statement for each row inside a Python loop.
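
To test that theory, here is a sketch of the direct sqlite3 insert rewritten as a single executemany() call. I haven't measured this variant, so treat it as an experiment to run yourself:


start = time.time()
rows = [('name-{}'.format(i),) for i in range(100000)]
c.executemany('INSERT INTO person (name) VALUES (?)', rows)
conn.commit()
duration = time.time() - start

print('Direct sqlite3 executemany - total time: {:.2f} seconds'.format(duration))
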

Conclusion

The bottom line is that SQLAlchemy is a mature and well-designed framework for working with databases. It provides a lot of abstraction, but it is also very performance-conscious and makes sure there is a way to squeeze the best performance out of your database. Profile your code, and if you discover slowness related to SQLAlchemy, start reading the documentation. You're probably doing something suboptimal, and SQLAlchemy has a solution waiting for you.



Gigi Sayfan is the chief platform architect of VRVIU, a start-up developing cutting-edge hardware + software technology in the virtual reality space. Gigi has been developing software professionally for 21 years in domains as diverse as instant messaging, morphing, chip fabrication process control, embedded multimedia applications for game consoles, brain-inspired machine learning, custom browser development, web services for a 3D distributed game platform, IoT/sensors, and most recently virtual reality. He has written production code every day in many programming languages such as C, C++, C#, Python, Java, Delphi, JavaScript, and even Cobol and PowerBuilder, for operating systems such as Windows (3.11 through 7), Linux, Mac OS X, Lynx (embedded) and Sony PlayStation. His technical expertise includes databases, low-level networking, distributed systems, unorthodox user interfaces and the general software development life cycle.