

Speed Up Your SQL Inserts: Page 2

Traditionally, database performance has been measured by the speed of data retrieval, but that is not the whole story. In some situations, database performance depends on the speed of data inserts.




Comparing Inserts

There are many ways to insert data into a table. Some are ANSI-compliant, while others are RDBMS-specific, but all of them perform either one-row or many-rows inserts. Needless to say, a many-rows insert is much faster than repetitive one-row inserts, but how much faster? To find out, run the test in Listing 1.
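To make the distinction concrete, here is a minimal sketch of the two styles being compared (the literal OrderID values are illustrative, not taken from Listing 1):

```sql
-- One-row inserts: one statement per row, repeated N times.
INSERT INTO testInserts (OrderID) VALUES (10248);
INSERT INTO testInserts (OrderID) VALUES (10249);

-- Many-rows insert: a single INSERT ... SELECT moves all rows at once.
INSERT INTO testInserts (OrderID)
SELECT OrderID
FROM   Northwind..Orders;
```

The statements are logically equivalent ways of populating the table; the difference, as the test below shows, is in how much per-statement overhead each approach pays.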

Run all the batches from Listing 1 separately. Batch 1 creates the table testInserts and loads data into it. The first insert (before the loop) loads 830 rows, selecting OrderID from the table Northwind..Orders. (If you are using SQL Server 2005 (SS2005) and haven't installed the Northwind database yet, you can download it from the Microsoft Download Center.) Each iteration of the loop then doubles the number of rows in the testInserts table, so after two iterations the final row count is 3,320.
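Listing 1 itself is not reproduced on this page. The following is a minimal sketch, with an assumed loop counter, of the setup it describes (create the table, seed it from Northwind..Orders, then double the row count twice); the actual listing may differ in detail:

```sql
-- Sketch only: assumes a heap table with a single int column.
CREATE TABLE testInserts (OrderID int NOT NULL);

-- Seed: 830 rows from the Northwind sample database.
INSERT INTO testInserts (OrderID)
SELECT OrderID FROM Northwind..Orders;

-- Each iteration doubles the table by inserting it into itself.
DECLARE @i int;
SET @i = 1;
WHILE @i <= 2
BEGIN
    INSERT INTO testInserts (OrderID)
    SELECT OrderID FROM testInserts;
    SET @i = @i + 1;
END;
-- 830 * 2 * 2 = 3,320 rows.
```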

To test one-row inserts, copy the output of Batch 3 into a new window in Query Analyzer or Management Studio and run it. In my tests on a few boxes with different hardware configurations, the execution time of the many-rows insert (Batch 2) was about 46 ms; the execution time of the one-row inserts (produced by Batch 3) was approximately 36 sec. (These numbers relate to SS2000.) Thus, the many-rows insert was roughly 780 times faster than the repetitive one-row inserts.

A number of factors make repetitive one-row inserts slower. For example, the total number of locks, execution plans, and execution statements issued by SQL Server is much higher for repetitive one-row inserts. In addition, each insert (batch) needs to obtain object permissions, begin and commit a transaction, and write data into the transaction log (even under the simple recovery model).

The following are just a few results that I got by using the Profiler and tracing the inserts:

  • BEGIN...COMMIT transaction pairs – 7,265 for one-row inserts versus one pair for many-rows inserts
  • Writes to transaction log – 11,045 and 6,360, respectively
  • Locks – 26,986 and 11,670, respectively

You should also remember that SQL Server has a fairly complicated mechanism for finding space for new rows. For heap tables, such as the one in Listing 1, SQL Server uses IAM (Index Allocation Map) and PFS (Page Free Space) pages to find a data page with free space among the pages already allocated to the table. If all the pages are full, SQL Server, using GAM (Global Allocation Map) and SGAM (Shared Global Allocation Map) pages, tries to find a free page in a mixed extent or to assign a new uniform extent to the table. For the Listing 1 example, which has a heap table and no deletes, SQL Server inserts data at the end of the last page allocated to the table. This may produce a "hot spot" at the end of the table in a multi-user environment, or when a few application servers are talking to one database.

Thus, for repetitive one-row inserts, SQL Server launches the allocation mechanism as many times as you have inserts. For the many-rows insert, space is allocated at once to accommodate all the inserted rows. For tables with indexes, you can additionally expect page splits in clustered indexes and index updates in nonclustered indexes.
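If the table does have a clustered index, page splits can be mitigated by leaving free space on the leaf pages when the index is built. A hedged sketch (the index name and fill factor are illustrative, not part of Listing 1):

```sql
-- Leave ~20% free space on each leaf page so new rows can often be
-- placed without triggering an immediate page split.
CREATE CLUSTERED INDEX IX_testInserts_OrderID
ON testInserts (OrderID)
WITH FILLFACTOR = 80;
```

The trade-off is larger storage and more pages to scan, so a lower fill factor only pays off for tables that take frequent inserts into the middle of the key range.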
