iltering is an important aspect of writing most complex queries. While there may be instances where you don’t need a filter, such as a simple query that retrieves a list of all U.S. states, in the majority of cases you need to filter the data to a more precise set. For example, you might need to obtain a list of all U.S. states that begin with the letter “I.”
Developers and DBAs have many operators available to perform such filters. Query performance is definitely a very high consideration when deciding which operator to use. There are some experts who claim that certain operators generally perform better than others, and provide rules of thumb to guide your choices. While these claims may be generally true, there are a number of influences on query performance in any individual case that can prove more important than the general rule. Basing the decision to select an operator strictly on a rule of thumb over the study of individual query usage and execution plans could result in a query that has less than optimal performance.
Choosing the Optimal Operator
When approaching the selection of an optimal operator, first consider how the query will be used. If the query is to be used infrequently and is part of a process that executes when there is low activity, such as an overnight SQL Server Integration Service or Data Transformation Service package, a long-running query may be acceptable. Alternatively, if the available time is short, such as a major online shopping outlet that may need to execute a query for each transaction, you’d want to spend the time to get optimal query performance?both to preserve the customer’s experience and to preserve database server stability.
You’ll also want to consider how and where any filtering values are obtained. Is the filtering value a single value or multiple values? Some operators accommodate multiple values better than others. Obtaining multiple values from another table or sub-query rather than passing them specifically may also affect which operator you should select.
It’s also important to consider indexes when approaching operator selection. When an appropriate index is available, the SQL Query Optimizer will usually use that index to perform an index seek. If the index does not exist or is not optimal, a the database must perform a table scan or index scan?typically resulting in reduced performance. However, if the table that is being queried contains relatively few rows, a table scan or index scan may actually be faster than an index seek. You’ll want to take such conditions into account when evaluating execution plans.
Evaluating Execution Plans
Blanket statements such as “the EXISTS logical operator performs better than the IN logical operator,” make the cynic in me clear his throat and tap me on the shoulder. The decision to select one operator over another cannot be determined solely by taking someone’s word for it. Evaluating execution plans to test your queries for the database in which the query is being written is a better approach. Execution plans reveal whether table scans or index seeks are being performed, and illustrate the relative performance of the various parts of the query.
You can view execution plans using the Query Analyzer tool in Enterprise Manager (for SQL Server 2000) or through SQL Server Management Studio (for SQL Server 2005).
Testing a Sample Query
To illustrate the testing process, this article uses the AdventureWorks sample database, testing several variations of a query that retrieves sales details for three specific products. You can assume that the results will be displayed in a GridView object in a web application. The GridView object gets populated when a postback occurs on the web application; therefore, the query must perform quickly and (assuming a busy web site) the frequency of execution will be high as well.
This example uses three constant hard-coded ProductID values to filter the query.
ProductID = '762' ProductID = '754' ProductID = '770'
|What You Need|
|To reproduce the testing environment used in this article, you’ll need:
Although it’s generally best practice to avoid hard-coded values, their use in this example serves to isolate the effect of the operators on the execution plans from any influence by sub-queries that aren’t related to the operators themselves.
The sample uses two tables: SalesOrderDetail and Product, each of which has several defined indexes:
The Product table has four indexes:
- PK_Product_ProductID (Clustered index through a Primary Key)
- AK_Product_rowguid (Unique non-clustered index)
- AK_Product_ProductNumber (Unique non-clustered index)
- AK_Product_Name (Unique non-clustered index)
The SalesOrderDetail table has three indexes:
- PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID (Clustered index through a Primary Key)
- IX_SalesOrderDetail_ProductID (Non-clustered index)
- AK_SalesOrderDetail_rowguid (Unique non-clustered index)
For filtering, there are two types of operators: comparator operators and logical operators.
Comparator operators evaluate the two values being compared to determine if they meet the operator’s criteria. T-SQL’s comparator operators are:
- Equal (=)
- Greater Than (>)
- Less Than (<)
- Greater Than Equal To (>=)
- Less Than Equal To (<=)
- Not Equal To (<>)
Logical operators evaluate a condition and return a Boolean value that lets you know whether the condition was met. T-SQL’s logical operators are:
You can use comparator and logical operators together in complex filtering. When using multiple operators the order of operations may potentially affect the performance of the query. T-SQL treats operators according to both position and type: When the multiple operators being used are the same type (for example, two comparator operators), operations occur from left to right; when the operators are different, comparator operators take precedence over logical operators.
The examples described in the rest of this article evaluate four of the most commonly used operators: the Equal comparator operator (=), and the IN, ANY, and EXISTS logical operators.
The Equal Comparator Operator and the IN Logical Operator
As noted above, you need a query that retrieves sales details for three specific products. You can accomplish the task by using either a WHERE clause that filters with a comparator operator via equality (=) or with a logical operator?using IN.
Comparator operators are designed to be used for only two values: the value being evaluated and the value being sought. To test for equality against three specific ProductIDs, you also have to use the OR operator. Here’s a query that uses the Equal comparator operator:
SELECT SalesOrderID, CarrierTrackingNumber, ProductID, UnitPrice FROM AdventureWorks.Sales.SalesOrderDetail WHERE (ProductID = '762') or (ProductID = '754') or (ProductID = '770')
In contrast, the IN logical operator lets you use a list of arguments without also having to use OR or AND.
SELECT SalesOrderID, CarrierTrackingNumber, ProductID, UnitPrice FROM AdventureWorks.Sales.SalesOrderDetail WHERE ProductID IN ('762','754','770')
|Author’s Note: Although this particular example uses explicit values as arguments to the IN operator, you can use a sub-query to obtain the argument values.|
|Figure 1. Execution Plan for Equal and IN: For these two queries and this particular database, the Equal comparator operator and the IN logical operator result in identical execution plans.|
Despite the distinct differences in the query syntax, when executed against the AdventureWorks database and these specific tables, both sample queries produce identical execution plans (see Figure 1).
The ANY and EXISTS Logical Operators
The ANY logical operator returns a Boolean true when any of the items in its arguments are met, while the EXISTS logical operator returns a Boolean true if the sub-query in its argument returns any rows. Testing the performance of these operators can be tricky because the sub-queries used often contain other operators, which directly affect the results. To minimize such effects, the following ANY and EXISTS query samples both contain a sub-query that uses the Equal comparator operator.
Here’s the query to test the ANY logical operator:
SELECT SalesOrderID, CarrierTrackingNumber, ProductID, UnitPrice FROM AdventureWorks.Sales.SalesOrderDetail WHERE ProductID = ANY( SELECT ProductID FROM AdventureWorks.Production.Product WHERE (ProductID = '762') OR (ProductID = '754') OR (ProductID = '770') )
And here’s the query to test the EXISTS logical operator:
SELECT a.SalesOrderID, a.CarrierTrackingNumber, a.ProductID, a.UnitPrice FROM AdventureWorks.Sales.SalesOrderDetail a WHERE
EXISTS ( SELECT b.ProductID FROM AdventureWorks.Production.Product b WHERE a.ProductID = b.ProductID AND ( (b.ProductID = '762') or (b.ProductID = '754') or (b.ProductID = '770') ) )
? Figure 2. Execution Plan for ANY and EXISTS: The ANY and EXISTS logical operators offer an execution plan that introduces nested loops.
Again, and perhaps unexpectedly, when executed against the AdventureWorks database, both the IN and EXISTS queries above produce identical execution plans (see Figure 2).
Evaluating Execution Plans
At first glance, the comparison of the two unique execution plans reveal that using the Equal comparator operator and the IN logical operator produce a much simpler execution plan than using the ANY and EXISTS logical operators. A closer look reveals that the Equal and IN operators cause an index scan while the ANY and EXISTS operators cause multiple index seeks.
The ANY and EXISTS operators introduce an element in the execution plan called nested loops. A nested loop is where a JOIN, in this case an INNER JOIN, occurs between multiple data sets.
You’ll see a list of information for each icon that’s displayed in the execution plan, revealed when you hover your cursor over the icon. Four specific pieces of information are of interest in evaluating query performance:
- Estimated I/O Cost. This number represents the step’s cost for data input/output activity.
- Estimated CPU Cost. This number represents the step’s cost to the processor(s).
- Estimated Operator Cost. This number represents the step’s cost to the Query Optimizer as well as the step’s percentage of the query’s total cost.
- Estimated Subtree Cost. This number represents the step’s total cost to the Query Optimizer and all previous steps.
These numbers can be affected by many factors, including the server environment in which the database resides and the current activity load on the SQL Server.
Estimated I/O Cost
A higher unit value represents a higher cost in the query’s execution. For this example, the total I/O cost for all steps in the Equal and IN operator execution plan was 0.915718 units; while the total I/O cost for all steps in the ANY and EXISTS operators was 0.0101157 units.
Note that while the ANY and EXISTS logical operators contain Equal comparator operators within their arguments, it is the use of these operators that ultimately affects the performance of the query. In this example, using the Equal operator directly on the SalesOrderDetail table results in a higher I/O cost than using the Equal operator indirectly through the Products table as an argument to the ANY and EXISTS logical operators.
Estimated CPU Cost
The total CPU cost for all steps in the Equal and IN execution plan was 0.133606 units; while the total CPU cost for all steps in the ANY and EXISTS execution plan was 0.0078319 units. Again, a higher unit value represents a higher cost in the query’s execution.
Estimated Operator and Subtree Cost
The estimated operator cost is valuable primarily to identify steps that need attention when tuning the query. This article won’t discuss tuning strategies?only the cost of the step’s execution. The percentage represented in the estimated operator cost is for the entire execution plan. A high percentage does not necessarily represent high cost; however, if query performance is poor, these high percentage items should be the first steps to review because they will have the most impact.
The subtree cost represents the overall cost to the Query Optimizer for the execution of the step in question. Again, the higher the number, the higher the performance cost to execute.
The execution plan for the Equal and IN operators had a total subtree cost of 1.04932 units; while the total subtree cost for all steps using the ANY and EXISTS operators was 1.166191 units.
For the specific queries being evaluated and the environment in which the database resides, the overall performance of the Equal and IN operators was better than the performance of the ANY and EXISTS operators. This performance benefit occurred despite the I/O and CPU numbers stating the opposite, because the subtree cost to the Query Optimizer swung the vote in the other direction.
At this point, for this particular scenario, you’ve eliminated the ANY and EXISTS operators as the best choice; however, the final decision of whether to use the Equal comparator operator or the IN logical operator comes down to personal preference or anticipation of future considerations.
The keys to making an informed decision to determine best query operators involves considering the effect of indexes on query performance, evaluating the execution plans, and taking the database environment into consideration. The results of such evaluations may prove (or disprove) commonly held rules of thumb, but they’re instrumental in determining the operator that will result in optimal performance.