Browse DevX
Sign up for e-mail newsletters from DevX


Validate Your Refactored SQL Queries with a New SQL Server 2005 Operator

Prove that your refactored SQL query returns the expected results using the EXCEPT operator, new in SQL Server 2005.




Building the Right Environment to Support AI, Machine Learning and Deep Learning

s a developer, how often have you been asked to refactor a SQL statement to increase performance, accommodate changes to database design, support a new database, or for a whole host of other reasons? Often you need to do the refactoring without affecting the results. When you refactor a query that returns the correct result set, adhering to the Hippocratic principle of "First, do no harm" will ensure you maintain quality. In this case, the principle as applied is "Make sure you return the same results."

The question then arises as to how you prove that the fabulous new SQL you've written returns an identical resultset when your query may return thousands of records. You can count the records, but after that you are into sampling or having QA test the data, which takes time and may have insufficient coverage.

This article shows how you can use the new EXCEPT operator in SQL Server 2005 to prove that your rewritten SQL is returning the same results. It takes you through a simple example, but you can apply the same principle to a query of any complexity.

When a Simple Row Count Is Not Enough
First, let's create a simple scenario that demonstrates the technique: Six rows are inserted into TableA, and the key field on TableA (id) may or may not appear in the refid column on TableB. Here's the SQL:

CREATE TABLE TableA (id INT IDENTITY NOT NULL PRIMARY KEY, textfield char(20)) go CREATE TABLE TableB (refid int NOT NULL FOREIGN KEY REFERENCES TableA(id), moretext char(40)) go INSERT INTO TableA VALUES ('Row one') INSERT INTO TableA VALUES ('Row two') INSERT INTO TableA VALUES ('Row three') INSERT INTO TableA VALUES ('Row four') INSERT INTO TableA VALUES ('Row five') INSERT INTO TableA VALUES ('Row six') INSERT INTO TableB SELECT id, RTRIM(textfield) + ' - Reference' FROM TableA WHERE id in (1, 2, 4, 6)

Four of the rows in TableA are referenced in TableB. The SQL for a query that returns only rows from TableA that are referenced in TableB is as follows:


Here are the results:

id textfield ----------- -------------------- 1 Row one 2 Row two 4 Row four 6 Row six (4 row(s) affected)

Now suppose a new development comes along that requires you to use joins as a discipline instead of WHERE EXISTS. You could change the query to use a left outer join like this:

SELECT TableA.* FROM TableA LEFT OUTER JOIN TableB ON TableA.id = TableB.refid

Here are the results:

id textfield ----------- -------------------- 1 Row one 2 Row two 3 Row three 4 Row four 5 Row five 6 Row six (6 row(s) affected)

This SQL returns six rows, which instantly indicates a problem: The simple row count mechanism does not tell you which extra rows have been returned. In this small sample, you can easily see which data is extra, but if the original and refactored queries returned thousands of rows, it would be much more difficult.

Identifying the extra rows is an important step in determining what is wrong with your refactored query, but a simple row count alone is not enough. The following section demonstrates how to use the new EXCEPT operator in SQL Server 2005 in addition to a row count to check the output. You will still need to use the number of rows returned as a validation mechanism because EXCEPT eliminates all duplicated rows from the results.

Thanks for your registration, follow us on our social networks to keep up-to-date