Indexes on Computed Columns: Speed Up Queries, Add Business Rules

n index on a computed column can boost your SQL Server query’s performance or even implement a complex business rule. (Computed columns do not store their own values; their values are derived from regular columns.) The following examples describe the scenarios in which this technique would be useful.

Use Computed Columns to Speed Up Search

Consider a very simple situation: even with an index on last_name, the following query will run slowly because its search criteria are not index-friendly:

select last_name, first_name, address from partywhere soundex(last_name) = soundex(@last_name) 

If you need to make such searches frequently, need them to respond quickly, or both, add a computed column and create an index on it:

alter table party add soundex_last_name as soundex(last_name)create index soundex_last_name_ind on party(soundex_last_name)

The query won’t run any faster on SQL Server 2000, however, until you explicitly specify the computed column soundex_last_name. That done, the optimizer will use the index that you just created and the query will return much faster:

select last_name, first_name, address from partywhere soundex_last_name = soundex(@last_name) 

Note: you can build an index on a computed column only if the column is deterministic and several database settings are set up properly. For more detail, refer to “Creating Indexes on Computed Columns” and “SET Options That Affect Results” on MSDN.

Another common scenario in which an index on a computed column comes in very handy is when your users submit cookie recipes for a cookie recipe contest. Because many users hit the “Submit” button several times, you need to check whether the submitted recipe is already in the database before storing it. Because a recipe may use up to 6Kb, you cannot use an index on the recipe text. Again, by using an index on a computed column:

alter table recipe add text_checksum as checksum(recipe_text)create index text_checksum_ind on recipe(text_checksum)

You can search for a matching recipe very efficiently:

select submitted_at, recipe_text from recipe where text_checksum = checksum('Just buy some saltines')and recipe_text = 'Just buy some saltines'

Note: the second condition in the where clause is necessary because different recipe texts may have the same checksum.

If you frequently search by specifying how a string ends, such as the following:

last_name like '%stone'

Consider creating a computed column as reverse(last_name) and then creating an index on it. When you do that and rewrite your query as reverse_last_name (like ‘enots%’), your searches are likely to get a performance boost.

Note: in many cases, using an index on a computed column is overkill. For instance, consider this query:

select sale_amount, sale_date, checkout_counter from sales? this expression truncates sale_date to midnight, where dateadd(day, datediff(day,'20070101', sale_date),'20070101') = ? @date is already truncated to [email protected]

If you already have an index on sale_date, do not rush to create a computed column based on the expression used in the query, and do not rush to create an index on the computed column either. A much simpler alternative is to just rewrite the query’s search criteria as follows and then utilize the existing index on sale_date:

where @date <= sale_date and sale_date < @next_date

If you don't already have an index on sale_date, consider creating it?you may reuse it in many situations besides speeding up the query you are currently working on.

Use Computed Columns to Implement Complex Business Rules

Suppose you need to store a list of employees, their phone numbers, their e-mail addresses, etc. Some employees have VPN tokens, others don't. You need to make sure that the VPN token numbers entered are unique, while allowing any number of null VPN token numbers. A unique index on VPN token number will not do it, as it won't let you enter more than one null. Use an approach that Ivan Arjentinski and Steve Kass (who also coined the word nullbuster) simultaneously published on the MS SQL Server newsgroup. Assuming that the table employee has a primary key employee_PK, implementing the following business rule is very easy:

alter table employee add nullbuster as (case when VPN_token is null then employee_PK else null end);create unique index VPN_Unique on employee (VPN_Token, nullbuster);

Similarly, if you have the following table for outstanding tasks:

create table task(task_ID int not null primary key check(Task_ID > 0),agent_ID int not null check(agent_ID > 0),status varchar(10) not null check (status in('Active', 'Pending', 'Completed')));

And you want to implement a business rule "Every agent has at most one active task at any time," use an index on computed columns to implement it:

alter table task add agent_ID_when_active as case when status='Active' then agent_ID else null end;alter table task add task_ID_when_inactive as case when status='Active' then null else task_ID end;create unique index one_active_task  on task(agent_ID_when_active, task_ID_when_inactive)

In this particular case, when both agent_ID and task_ID are positive integers, a more efficient solution is using only one computed column and a smaller index on only that column:

alter table task add agent_or_minus_task_ID as case when status='Active' then agent_ID else (0-task_ID) endcreate unique index one_active_task  on task(agent_or_minus_task_ID)

Whenever you use approaches like this, consider a traditional alternative: implementing the business rule in either a stored procedure or a trigger. However, implementing such a business rule via an index on a computed column has one very clear advantage: there is no way around it. One can create another stored procedure with different logic, but the rule implemented by an index cannot be bypassed, as long as the index exists.

Note: in some cases, you might want to include a computed column in a primary key or in a unique constraint, which is beyond the scope of this article.

Computed Columns in Covering Indexes

If you need to make the following query run as fast as possible:

select ? in this case, all the sales in the table occurred after 1990,? so we won't get negative days-of-weekdatediff(day,convert(datetime,'19860202',112), sale_date)%7 as sale_day_of_week,datediff(hour,convert(datetime,'19860202',112), sale_date)%24 as sale_hour, sum(sale_amount)from salesgroup by datediff(day,convert(datetime,'19860202',112), sale_date)%7,datediff(hour,convert(datetime,'19860202',112), sale_date)%24

You can create computed columns and an index that will cover the query as follows:

alter table sales add sale_day_of_week as   datediff(day,convert(datetime,'19860202',112), sale_date)%7;alter table sales add sale_hour_of_day as   datediff(hour,convert(datetime,'19860202',112), sale_date)%24;create index sale_by_day_hour on sales(sale_day_of_week, sale_hour_of_day, sale_amount);

That done, rewrite the query using the column names sale_day_of_week and sale_hour_of_day. The query will run much faster. However, do not forget that you have a somewhat slower but way simpler alternative. In many cases, an index on (sale_date, sale_amount) would be good enough but not quite as good as the index sale_by_day_hour. In most circumstances, you would want to keep things simple and go for a regular index on (sale_date, sale_amount).

In some cases, creating an indexed view and materializing the query also might make sense. The query against the then indexed view might run much faster, but the indexed view could introduce severe lock contention (refer to Books Online for more detail).

Use in Moderation

You've seen the benefits of using an index on a computed column. But use the technique in moderation?only after thoroughly considering alternatives, which can be simpler.

Share the Post:
Share on facebook
Share on twitter
Share on linkedin

Related Posts