Login | Register   
LinkedIn
Google+
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

Designing High Performance Stored Procedures : Page 3

Stored procedures can be an effective way to handle conflicting needs, but it's not always so obvious how to write them so they both perform well and scale.


advertisement
The Sudoku Solution
I decided to craft this stored-procedure based Sudoku solution example for multiple reasons. First, Sudoku uses a 9x9 table, so thinking of this table as a database table is pretty natural. Second, the database solution has a relatively simple structure, while at the same time—if used properly—nicely represents a high-performance implementation of the solution, using the first several tips discussed in the preceding section. Finally, because it's flexible, a database solution lets you look at the puzzle itself from a generic perspective, showing ways you can extend the puzzle solution to more generic tasks.

The solution consists of two database tables: one that stores the initial puzzle data, and a second that stores calculation results, the script(s) to fill the tables, and the stored procedure that calculates the results. You can download the sample code to experiment with the solution yourself.

The TEMP_GS_V1.sql script contains the table definitions. In the first table, each record represents a Sudoku cell. In the second table, each record represents the entire solution. The reason for structuring the tables in this manner will become clearer as you walk through the solution; but keep in mind that it's important that the structure be both flexible and avoid intermixing rules with assumptions (see Tip 5).

 
Figure 1. Unsolved A1Escargot Puzzle: The figure shows A1Escargot puzzle in its initial unsolved state.
Sudoku's solution rules state that each row, column, and 3x3 square must contain a unique set of digits from 1 to 9. The assumption is that given the initial set of numbers, there is only one resulting solution; otherwise the puzzle is considered invalid. This assumption, if transferred to real-life applications, sounds unreasonable. There is no way that developers can ensure a valid data state; therefore, I wanted my Sudoku solution to reflect that—to work properly whether the puzzle is valid or not—and for invalid puzzles, simply figure out all the possible solutions.

I have not created an interface for data entry (to create puzzles), because that's outside of the solution scope, so you have to enter data into the TEMP_GS table using a script. The Fill_GS.sql script contains the data entry example, which corresponds to a very complex Sudoku puzzle called "AlEscargot," which was once claimed to be the most complex SUDOKU puzzle ever. Figure 1 shows the puzzle in its initial unsolved state .

The CALCULATE_GS stored procedure (see the CALCULATE_GS_V3.sql script) contains the solution algorithm, which contains a set of steps that illustrate the aforementioned SQL Server tips. Note that the CALCULATE_GS procedure accepts a Puzzle Number parameter. This is just a habit to get into when you're creating solutions suitable for multi-user environments: if puzzles have distinct IDs, then multiple users can run the stored procedure at the same time, using the same physical tables.

The CALCULATE_GS stored procedure contains the following steps:

  1. Create a table variable, filling it with numbers from 1 to 9. This table variable illustrates Tip 4, because there are not that many records in the table, and it is heavily used within the stored procedure, sometimes joining to itself three times. In such cases, using a table variable is far more efficient than using a temp table. The procedure fills in the TEMP_GS table with all possible rows that correspond to possible cells using two operations:
  2. First, it calculates singletons—the cells which can contain only one possible value to meet the Sudoku rules. This evaluation gets performed multiple times, because new cells may become singletons as a result of previous calculations. This operation does not use a cursor; instead, it executes aggregate statements until singletons exist, which minimizes the number of possible statements. This approach illustrates Tip 1.
  3. Second, when no more singletons exist, the rest of the cells are filled with multiple possible digits, again in one operation (see tip 1).
  4. Next, the procedure creates and fills in a temporary table that corresponds to possible Sudoku rows. In this case, because the number of records is unpredictable in complex scenarios, it's best to use a temp table. It's worth noting that solving the solution using table variables may be faster in many instances. In real life, such situations may necessitate two separate stored procedures—one optimized for low volumes, and another for high volumes. The creation of the intermediate table over the aggregation steps illustrates Tip 2. This table eliminates a lot of invalid combinations based on a very simple aggregation condition: the sum of cells within each row must be equal to 45. This condition is simple enough (compared to the requirement to have unique digits within each row) to not slow down the aggregate statement, yet powerful enough to eliminate the majority of invalid combinations in a single step.
  5. Create and fill in a temporary table that corresponds to possible Sudoku columns, following the same paradigm as in step 2. This approach illustrates Tip 2: even though logically you need only the table that corresponds to valid rows, joining it to itself at a later point may result internally in billions of combinations after the joins and before filtering, which kills performance. In this and the next step, creating the extra tables to join lets you eliminate the possibility of too many intermediate combinations during joins.
  6. Create and fill in a temporary table that corresponds to possible Sudoku squares, following the same paradigm as in steps 2 and 3.
  7. Delete the records that do not adhere to Sudoku's rules from the three temp tables created in steps 2- 4. The tables were pre-filtered using aggregate conditions when the records were inserted into them, so now deleting the extra rows using precise conditions is fast because it operates on a small subset of data. In addition, deleting records separately based on each violation is faster than using an OR clause (the violations are represented by the rows where any value is repeated in any pair of columns). I combined all the delete statements into one loop using dynamic SQL to save some space. Because the statements are simple and don't require complex execution plans, using dynamic SQL does not affect performance much. However, if the statements were complex, it might be wiser to avoid dynamic SQL. This step illustrates Tip 3, though as explained earlier, it is not terribly important for temp tables—it's far more important for permanent database tables.
  8. After completing step 5, all three temp tables contain only valid Sudoku structures (rows, columns, and squares); therefore, simply joining them must result in valid solutions (note that for some puzzles, multiple solutions are possible). Joining operations are always fast, as long as the joined tables are small or indexed, and as long as the intermediate steps do not result in too many rows. The database engine is usually smart in optimizing the joins; however, specifying the proper join order helps the engine sometimes. Therefore, this step joins the three tables in palette order, starting with a row, then a column, then another row, another column, etc., involving squares at some point, so that each extra join will potentially result in the minimal number of virtual rows. The result of the join is inserted into the TEMP_GS_RESULT table, which contains the solutions, as mentioned earlier.
For puzzles that have only one solution, the FinalSelect.sql script selects from the TEMP_GS_RESULT table and outputs the solution in tabular form. The represented solution is reasonably fast. It solves the "AlEscargot" puzzle from Figure 1 in five minutes and twenty seconds on a 2.4 GHz machine with 1 GB of RAM. The solution (output in text form) is shown below:

1 6 2 8 5 7 4 9 3 5 3 4 1 2 9 6 7 8 7 8 9 6 4 3 5 2 1 4 7 5 3 1 2 9 8 6 9 1 3 5 8 6 7 4 2 6 2 8 7 9 4 1 3 5 3 5 6 4 7 8 2 1 9 2 4 1 9 3 5 8 6 7 8 9 7 2 6 1 3 5 4

Apart from the efficiency however, the solution, being both database-based and properly designed, represents the following additional advantages:

  • It does not assume that puzzles have only a single solution; it finds all the possible solutions if more than one is available.
  • It has a very clean structure that prevents mistakes.
The clean structure provides clear paths to extensibility. For example, because all the steps and joins are pretty obvious, it's very easy to rewrite the solution using dynamic SQL, extending it to higher dimensions. The standard Sudoku puzzle is based on a 32 x 32 square. Using a similar clean stored procedure structure and dynamic SQL, it is very easy to extend the solution to solve N2 x N2 square puzzles—and the resulting solution is guaranteed to perform well. However, for big Ns you should redesign the final result table to be cell-based rather than solution based, otherwise it may easily exceed SQL Server's column number and row length limitations. That's a rather simple enhancement though, and doesn't affect the efficiency of the major algorithm. In other words, the solution is scalable, and extended versions of the solution will be scalable as well, and that clearly differentiates it from the other possible solutions.



Gene Pinski currently works as a Dev Lead at Pearson School Systems where he's responsible for several development areas of Chancery SMS in Burnaby, BC, Canada. His professional interests include developing business and web applications, and developing games. Find more about Gene from his web site.
Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap