Managing large amounts of data is always a challenge. Several major database vendors claim that their database engines are ready for terabytes of data, which is true to a certain extent. However, when those terabytes are not static data but live information that people constantly update, search on, report from, and run web applications against, and when the number of concurrent users accessing the data peaks significantly as well, providing fast, unblocked access to those terabytes becomes extremely challenging.
It would be simple if improving the hardware solved the problem, but unfortunately, increasing hardware horsepower on its own doesn't buy much for end users in this case. The simple truth is that end-user perception of performance often comes down to a single database record, which several processes may be trying to update while several others read it at the same time. Performance and scalability often require special design tricks on the database side, and very often such tricks are database-engine specific.
From an architectural point of view, there are several ways to improve the performance and scalability of database-driven applications, and you must usually implement them all to achieve acceptable results. One technique is to change the database architecture to improve database performance in certain areas. Such changes include database splits, data normalization and de-normalization, table splits (horizontal and vertical), adding indexes and constraints, and other similar techniques. I won't go into detail on these in this article.
Sometimes, however, one requirement (for example, a need for fast updates) contradicts other requirements, such as a need for fast searches and real-time reporting. The latter requirements mean that you're not free to change the architecture of the database significantly; it must be preserved in a certain stable state that satisfies all the requirements to the greatest possible extent. If, at the same time, you need to run a complex process on the database side that must perform and scale well, you must usually design it as a carefully crafted database stored procedure.
There are several reasons to write stored procedures. First, from the logical point of view, if the back-end logic can be divided into business logic and data logic, the latter should naturally reside on the database side, leading to cleaner design and better maintainability. Secondly, database engines compile stored procedures and save execution plans along with them, which improves performance "for free" (from the developer's point of view). Finally, and most importantly for this discussion, placing data manipulation logic into the stored procedures lets you use several approaches and tricks that can improve their performance and scalability significantly.
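As a minimal sketch of the second point, here's what such a procedure might look like in T-SQL (the table, column, and procedure names are hypothetical, invented for illustration). SQL Server compiles the procedure on first execution and caches its execution plan, so subsequent calls reuse the plan rather than recompiling the statement:

```sql
-- Hypothetical example: data logic encapsulated in a stored procedure.
-- SQL Server caches the compiled execution plan, so repeated calls
-- avoid per-statement compilation overhead.
CREATE PROCEDURE dbo.UpdateOrderStatus
    @OrderID int,
    @Status  tinyint
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE dbo.Orders
    SET Status = @Status
    WHERE OrderID = @OrderID;
END
GO

-- Called from application code as a single round trip:
EXEC dbo.UpdateOrderStatus @OrderID = 42, @Status = 2;
```

Beyond plan caching, the procedure gives the application a stable interface to the data, so the underlying tables can later be tuned or restructured without touching client code.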
Some of these approaches are global, obvious, and easy to follow. Others may be specific to a particular database engine. Several approaches require non-standard, task-specific design and ideas to implement properly, and this is where the true art of database design comes into play. All the design-specific approaches, however, usually end up producing more complex scripts than a naive design would, and that's the major reason why they are often not followed.
In this article, I'll share a few tips that I have personally found to be the most important when developing SQL Server stored procedures. These tricks reflect both the outcome of my mano-a-mano fights with the database to achieve better performance, and a summary of the experience of the extremely talented software developers I've had the privilege to work with. I'll present the tips, and then illustrate them with a stored procedure-based Sudoku solver I have designed for this purpose.