atabase Tuning Advisor (DTA), the SQL Server 2005 replacement for the Index Tuning Wizard (ITW), has two entry points:
- A GUI that practically drives itself
- A command line interface with all its traditional gnarly-ness
The behavior under each interface is a little different. Using the command line, you can load arguments to the DTA utility with an XML file. The GUI does not provide the flexibility of XML to deliver startup information for analysis (but it can be imported and exported), nor does it allow you to ape a production instance from system tables. Instead, you can use the GUI to save a DTA session. In doing so, you create a skeletal XML file, which you can edit to add personal tuning enhancements to the session. This is analogous to using Profiler as a macro recorder; it enables you to watch the command syntax as it is generated, and learn how to do it from T-SQL instead of point-and-clicking.
In addition to using an XML file to deliver arguments to the DTA utility, the command line interface lets DTA ape a production instance’s environment. This is achieved not by moving tables around, but by using a copy of the production instance’s index statistics and metadata. This effectively recreates a target environment without the need to migrate a volume of data onto a development server or laptop. (If you need the real thing, see “Create a Developer’s Extract of a Production Database“). From the database and server, DTA creates a shell database that you can tune by using the TestServer sub-element and the TuningOptions element.
Don’t get me wrong; the GUI is great. I simply needed to discuss its limitations.
As a consequence of XML informing DTA of the user’s assumptions, a simple, powerful (always a great combination) advantage materializes?and it springs from XML’s nature. XML can express a nested relationship, or, viewed another way, it can store multiple values of each instance of something, just like rows in a table. Because XML can express these nests, it has a natural advantage over the command line in that it is the ideal vehicle to deliver alternate scenarios for DTA’s evaluation. Because the structure of the input file supports nested objects, it lets DTA evaluate alternative physical structures (indexes or partitions) in its analysis. It looks like the partitioning feature supported by SQL Server 2005 Enterprise Edition and up, so I cannot test it.
Any way you look at it, DTA is an improvement over ITW. The tuning activities using the GUI or the command line interface are the same, and they mirror the traditional approach. Set up a workload and the tool entertains one assumption at a time. It is up to the tester to introduce and manage variations to the test plan.
Methodology: Picking the Dataset and Indices to Test
I did not have any Profiler traces to feed DTA, so I had to shop around for something interesting to test. I decided to use the AdventureWorks sample database and review system tables to find a data set to evaluate alternate physical implementations suggested by DTA. I was looking for a data set to be narrow and feature a couple of interesting data types: a date and some things to count or add. I prefer columns that are not indexed, so I can create and destroy them, as well as measure their statistics and response times without tinkering with shipped objects.
As an enhancement to the data set, the test can include a volume of rows to make more than one partitioning plausible (say, months in one partition and regions in another). It can also make numerous possible indexes plausible (e.g., a covered query, one involving an equi-join on a clustered index of a single column, then separate standalone indexes). Other comparisons are for a multiple-column clustered index and a multiple-column, non-clustered index, notwithstanding uniqueness. To give the possible test conditions context, Table 1 shows a decision table in rule-name format that you can use as an idealized tuning test plan to exercise join types and index types.
|Join Type||4 (inner, outer,
equi, and theta)
|Table 1. Decision Table for Idealized Tuning Test Plan|
Now, each column from Table 1 needs to be paired with each sequence of four join conditions. The explosion of conditions would be hard to read, but it is conceptually correct as depicted above. The total number of testable conditions is the product of the tables (i.e., sixteen times four, for a total of 64 combinations of conditions to test).
For a fresh look at which columns could be fruitful indexing candidates, I took a slightly automated tack to collect candidates. I looked in the sysindex table for_WA_Sys indexes. Typically, these are created by the optimizer when a column is used in a WHERE clause, but there was no index created for it. In other words, this is the optimizer’s “helper” index, which was created to resolve an ad hoc query. Figure 1shows how I collected the un-indexed columns.
|Figure 1. Finding Un-Indexed Columns to Examine|
Once I knew the table and column names of the un-indexed columns, I could create and destroy indexes on them. This was just a device for assessing columns to be index candidates. DO NOT DO THIS IN PRODUCTION!This is not recommended for anything but test systems.
Alternatively, DTA can work from a SQL Profiler trace file or table as its workload. The benefit here is that it may already be part of your DBA’s routine, which makes analysis an even less intrusive collection mechanism. It also is a more real-world representation of database use.
Filling in the Blanks Using _WA_Sys
To have my way with the index, I needed to commandeer the clustered index already on the table. Since the clustered index is the physical artifact of a primary key declaration, I must deal with the constraint first, and then I can trash the index. The following commands carry out that sequence of events:
sp_helpindex 'production.transactionhistory'alter table production.transactionhistory drop constraint pk_transactionhistory_transactioniddrop index production.transactionhistory.pk_transactionhistory_transactionidclears the desks for us to now create a new one:create unique clustered index CUX_TransactionHistory on production.transactionhistory(transactiondate,
transactionID, ReferenceOrderID, ReferenceOrderLineID, ProductID)
In keeping with the ancient database dictum, “Store it the way you read it,” I have re-designed the old pk_index with a covering index (see Figure 2).
|Figure 2. Using the Information Schema Views to Generate ‘Create Index’ Statements|
I cut and pasted the generated statement into a second query window and executed it. Now, I can examine the statistics generated for the indexes to exercise the DTA, looking for a table with lots of rows and un-indexed columns so I can create a “where” clause to exercise index selections. Table 2 lists the best-looking candidates (i.e., un-indexed date data types with many rows) arrived at by this method.
|Table Name||Column Name|
|Table 2. Candidate Un-Indexed Columns|
Transaction History could be surrounded by SalesOrderHeader and Product. This provides a date, as well as costs and quantities, to aggregate.
Configuration and Enabling Remote Connections
As an aside, under the old model, a user inherited permissions implicitly, whereas under new model requires explicit actions before allowing remote connections and grants. Figure 3displays the set up screen for server configuration, followed by the client set up. Books Online explains: “In Microsoft SQL Server 2005 Express Edition, Developer Edition, and Evaluation Edition, remote connections are disabled by default. In other editions, remote connections are enabled by default.”
|Figure 3. Enabling TCP/IP on the Server|
On the server side of the communication, the Net-Libraries?DLLs that enable communication between the OS and the database?are part of the database engine. On the client side, the Net-Libraries are part of the SQL Native Client. Net-Libraries are not directly configured. Instead, the server and the SQL Native Client are configured to use a network protocol. Then, the database engine and the SQL Native Client automatically use the appropriate Net-Library settings. This state is established using the configuration manager or surface area manager (see Figure 4).
|Figure 4. Enabling TCP/IP on the Client|
Now I have a test plan, a data set, and a configuration ready to go. All I need is a work file against that data set for DTA to evaluate. When evaluating different query formulations, it does not hurt to include the following block at the top of your code:
Set statistics profile on -- gives you a show plan textSet statistics time on -- gives you server and client elapsed time in msDbcc freeproccache -- blows away any compilation in cache for your testsIssue your query -- works the gears?Dbcc show_statistics "your_table_name," "your_index_name"
When you issue
dbcc show_statistics "your_table_name," "your_index_name", you can see the selectivity (or density) of the index recommendations. This measures the index usefulness and the number of boundary rows and tied values. It also gives a great view of the dispersion of values through the data. It is a kind of canned data analysis, but it is all done for free.
I created a work file from the sample data to exercise a singleton select, a range select, and an aggregate. Here are the examples from the work file:
use adventureworksgo--an aggregateselect cast(sum(actualcost) as decimal(9,2)) as sumcost, sum(quantity) as sumqty, b.[name] from production.transactionhistory a join production.product b on a.productid = b.productidjoin sales.salesorderheader c on c.salesorderid = a.referenceorderidwhere transactiondate between '2003-10-01 00:00:00.000' and '2003-10-31 00:00:00.000'group by month(transactiondate), year(transactiondate), b.[name]order by 1 desc,2 desc--a rangeselect TransactionID,a.ProductID,ReferenceOrderID,ReferenceOrderLineID,TransactionDate,TransactionType,Quantity,ActualCost,b.namefrom production.transactionhistory a join production.product b on a.productid = b.productidjoin sales.salesorderheader c on c.salesorderid = a.referenceorderidwhere transactiondate between '2003-10-01 00:00:00.000' and '2004-04-30 00:00:00.000'order by TransactionDate, TransactionID, ReferenceOrderID, ReferenceOrderLineID, ProductID--a singletonselect TransactionID,a.ProductID,ReferenceOrderID,ReferenceOrderLineID,TransactionDate,TransactionType,Quantity,ActualCost,b.namefrom production.transactionhistory a join production.product b on a.productid = b.productidjoin sales.salesorderheader c on c.salesorderid = a.referenceorderidwhere transactionid = 119639order by TransactionDate, TransactionID, ReferenceOrderID, ReferenceOrderLineID, ProductID
Using the workload and entering through the GUI, I got the result shown in Figure 5.
|Figure 5. Feed the File Name for the Work File into the Dropdown|
If you select Export from the File menu, you save the session you created into an XML file. I did, and used that to enhance its definition. MSDN has published a usable cheat sheet for this operation, as well as the template to work from.
To work out an example that will help carry this idea along, invoke DTA from the command line with the bare minimum of arguments and specify an XML input file for “what if?” analysis. After copying this sample into your editing tool, replace the values specified for the Server, Database, Schema, Table, Workload, TuningOptions, and Configuration elements with those for your specific tuning session. (For more information about all of the attributes and child elements you can use with these elements, see the XML Input File Reference (DTA).)
This XML Input File Samplefrom MSDN uses only a subset of available attribute and child element options. The command line equivalent would look something like this:
C:Documents and Settingsdrew.DEVBOX>dta -S devbox -s 20060503DrewB -D Adventureworks
-if DTAWorkload2.sql -ix MyAlternatives.xml
In the Final Analysis: DTA Is Valuable
DTA provides two significant benefits. First, it enables you to know which of several alternate index formulations will perform best, because it runs analysis on a test machine using a production server’s metadata. Secondly, you can entertain lists of related nested attributes to test in the same session, whether that list consists of more than one database or more than one variety of indexes (only non-clustered, all indexes, or all indexes and indexed views).