any major product releases have a dizzyingly long list of new features and enhancements?often introduced atop a product that is in many cases so complex and feature-ridden that it’s already difficult to keep up with the breadth of the product. However, the wide feature sets also mean that most developers find small nuggets of functionality that often have a significant impact on their jobs, if only to make life a little simpler. It is in these nuggets that I think many developers first come to appreciate new product releases.
For me, two particularly interesting new features of Microsoft’s upcoming release of SQL Server 2005 are its new PIVOT and UNPIVOT commands. From a manager’s perspective, pivoting provides a considerably simpler view of data, truly transforming it into information. From a developer’s perspective, the features let you rotate row data into columns in the query response to create pivoted or “crosstabbed” views of query results.
Before going on, it’s worth taking a look at two options for viewing historical information; this quick overview will provide a good look at how useful pivoting can be. After that, you’ll see the basics of pivoting data using SQL Server 2005, as well as how to unpivot data that’s already crosstabbed.
|Author’s Note: The information provided in this article is based on the Beta 2 version of SQL Server 2005. I don’t anticipate major changes to the syntax of the pivoting commands described in this article, but be aware that beta products often change before final release. Further, the sample data I chose to use for this article is taken from the AdventureWorks sample database included with SQL Server 2005. To follow along with the article, you should install the AdventureWorks sample database during the SQL Server 2005 installation process.|
|Figure 1: Sample Response: This response from a typical historical sales total query displays the sales total for an individual year on each row.|
Traditional “Flat” Query
The first, and simplest, option for returning historical data is to query the data, grouping the fact information in a way that answers the questions being asked. The query shown below retrieves data from sales tables, and provides total sales figures by product by year. Figure 1 shows a portion of the response from this type of query. Note that it’s difficult to see trends of sales between years. Using a traditional flat query, it would be even more difficult to answer questions about how sales of one product compared to sales of other products.
SELECT ProductID = det.productID, SALES_YEAR = YEAR(hdr.orderdate), TOTAL_SALES = SUM(det.linetotal) FROM sales.salesorderdetail det JOIN sales.salesorderheader hdr ON det.salesorderid = hdr.salesorderid GROUP BY det.productID, YEAR(hdr.orderdate)
ORDER BY det.productID, YEAR(hdr.orderdate)
? Figure 2: Historical Sales Data: The figure shows how the response to a query using a pivoted view of the data can be easier to read.
Note that you can visualize the same data far more easily if you pivot the SALES_YEAR column to create columns of TOTAL_SALES for each year. The resulting table is generally referred to as a crosstab. Figure 2 shows the same sales information pivoted to display the data in a crosstab format.
A Little Background: Common Table Expressions
I tend to obsess about the format of SQL expressions. Without broadly adopted norms for indentation, capitalization, etc., complex SQL expressions tend to be difficult to read. Making code pretty has little to do with the addition of Common Table Expressions (CTEs) to SQL Server 2005’s feature list, except that, like careful formatting, CTEs can help make difficult-to-read queries less confusing.
Simply stated, CTEs possess the behaviors of views, letting you write “views” that exist only for the duration of the query. But they also offer many other capabilities, including the ability to recursively query tables. Most of their functionality is beyond the scope of this article. However, I’ve used them in this article to make my SQL expressions more readable and allow you to focus more specifically on the functionality of the PIVOT expression.
As I stated, one common use of CTEs is to simplify SQL expressions. For example, the code below queries for products that have had at least one million dollars in product sales during 2004. The CTE, named ProductSales, returns a list of product IDs, along with their sales volume by year. The SQL query shown below the CTE uses the ProductSales CTE to query for products whose total sales for 2004 exceeded one million dollars. The expression also joins to the Product table to add the name of the product.
--CTE used to simplify SQL expression WITH ProductSales(ProductID, OrderYear, TotalSales) AS ( SELECT det.productID, YEAR(hdr.orderdate), SUM(det.linetotal) FROM sales.salesorderdetail det JOIN sales.salesorderheader hdr ON det.salesorderid = hdr.salesorderid GROUP BY det.productID, YEAR(hdr.orderdate) ) -- get list of products who had more than $1M -- in sales in 2004 SELECT sal.ProductID, prd.Name, sal.TotalSales FROM ProductSales sal JOIN Production.Product prd ON sal.ProductID = prd.ProductID WHERE TotalSales > 1000000 and OrderYear = 2004
If this example doesn’t help to highlight the simplifications that you can achieve with CTEs, read on?the PIVOT functionality should serve to help make this clear.
Using PIVOT Commands
As stated previously, PIVOT expressions in SQL Server 2005 provide developers with the ability to rotate row data into columns. Because the use of sales data is commonplace during historical analysis, I’ll use this theme as the basis of my samples.
To use the PIVOT feature, first decide which column contains the important “values” for the query. In this example, the important piece of information is sales dollar amounts. Second, because the example analyzes sales over a period of time, it pivots the sales year field data into columns in the response. You also need to select which field values in the pivot column should be pivoted to form columns.
|Figure 3. ProductSales CTE Results: The figure shows the results of running the ProductSales CTE.|
This example uses the AdventureWorks sample database tables named Sales.SalesOrderDetail and Sales.SalesOrderHeader. Joining these two tables using a CTE gives the results displayed in Figure 3. Note that there is one row per Sales.SalesOrderDetail row. Here’s the CTE:
WITH ProductSales(ProductID, OrderYear, OrderTotal) AS ( SELECT det.productID, YEAR(hdr.orderdate), det.linetotal FROM sales.salesorderdetail det JOIN sales.salesorderheader hdr ON det.salesorderid = hdr.salesorderid )
Using the ProductSales CTE, we can construct a PIVOT SQL expression. SQL Server 2005 Books Online defines the syntax of a PIVOT expression as:
::= table_source PIVOT table_alias ::= ( aggregate_function ( vaule_column ) FOR pivot_column IN ( )
Drawing from the ProductSales CTE, the value column will be the OrderTotal column. You’ll be summing the OrderTotal values, so you need to use the SUM() aggregate function. The pivot column will be OrderYear, and you want to create a list of OrderYear columns for the years 2001, 2002, 2003, and 2004. In the select list for the SQL expression, rename the 2001, 2002, 2003, and 2004 values to Total_Sales_2001, Total_Sales_2002, and so on, making the column data more readable by consumers of the query. Here’s the complete SQL expression using PIVOT. You can see the result of the query in Figure 2.
SELECT ProductSalesPivot.productID, Total_Sales_2001 = ISNULL(, 0), Total_Sales_2002 = ISNULL(, 0), Total_Sales_2003 = ISNULL(, 0), Total_Sales_2004 = ISNULL(, 0) FROM ProductSales PIVOT ( SUM(OrderTotal) FOR OrderYear IN (, , , ) ) AS ProductSalesPivot ORDER BY ProductSalesPivot.ProductID
Notice in the preceding code that the SUM aggregate function is a required component of the PIVOT expression. In fact, the CTE could easily have been coded to SUM the linetotal values?when grouped by year, it would seem reasonable that the PIVOT expression could pivot using only the OrderTotal, without the need for the SUM function. However, the PIVOT expression requires an aggregate function.
Also note that the data specified in the IN column list must include all the values that will be pivoted into columns to be named explicitly. I would like to see a more elegant implementation where either all the values could be implicitly added by SQL Server, or where you could specify ranges or lists of values. However, at the moment, you must name each column value explicitly.
When building the PIVOT result set, SQL Server automatically adds a GROUP BY clause that groups by all the columns from the table_source that are not either the value_column or the pivot_column. Due to this grouping, the table_source should contain only columns that are of interest in the final query. You could achieve this using a View containing only the relevant columns, but using a CTE serves the same purpose.
Note that the complete query includes both the CTE and the PIVOT SQL query as shown below.
WITH ProductSales(ProductID, OrderYear, OrderTotal) AS ( SELECT det.productID, YEAR(hdr.orderdate), det.linetotal FROM sales.salesorderdetail det JOIN sales.salesorderheader hdr ON det.salesorderid = hdr.salesorderid ) SELECT ProductSalesPivot.productID, Total_Sales_2001 = ISNULL(, 0), Total_Sales_2002 = ISNULL(, 0), Total_Sales_2003 = ISNULL(, 0), Total_Sales_2004 = ISNULL(, 0) FROM ProductSales PIVOT ( SUM(OrderTotal) FOR OrderYear IN (, , , ) ) AS ProductSalesPivot ORDER BY ProductSalesPivot.ProductID
|Figure 4. Adding a Product Name: When you run the query after modifying it to add a product name, you’ll see these results.|
It is possible to JOIN the PIVOT results to other tables. In the example above, many consumers of the query would find it useful to include a product name in the response to make the query response more readable. When joining to a pivoted table, you must ensure that you join to the pivoted table’s table_alias, and not to the table_source for the PIVOT function.
The query below shows how you’d modify the preceding SQL expression to add a product name to the results. Figure 4 shows the results.
SELECT ProductID = ProductSalesPivot.productID, Product_Name = prd.Name, Total_Sales_2001 = ISNULL(, 0), Total_Sales_2002 = ISNULL(, 0), Total_Sales_2003 = ISNULL(, 0), Total_Sales_2004 = ISNULL(, 0) FROM ProductSales PIVOT ( SUM(OrderTotal) FOR OrderYear IN (, , , ) ) AS ProductSalesPivot JOIN Production.Product prd ON ProductSalesPivot.productID = prd.productID ORDER BY ProductSalesPivot.ProductID
Performing UNPIVOT Operations
For obvious reasons, you’re likely to use the UNPIVOT function much less often than the PIVOT function. However, in cases where you’re given a set of source data in the form of a crosstab, the UNPIVOT function can be a valuable tool. The UNPIVOT function is predictably similar to the PIVOT function. First, identify the value_column and pivot_column, as well as which columns should be unpivoted. Using the data in Figure 2 as an example, the new value_column name for the unpivoted table would be OrderTotal and the pivot_column name would be OrderYear. If you think of the PIVOT and UNPIVOT expressions as “forward” and “reverse” processes, respectively, it’s easy to see how the two work. The following code shows the resulting UNPIVOT SQL expression.
SELECT ProductID, OrderYear, OrderTotal FROM (SELECT ProductID,  = Total_Sales_2001,  = Total_Sales_2002,  = Total_Sales_2003,  = Total_Sales_2004 FROM Pivoted_Order_Total) OrderTotal UNPIVOT ( OrderTotal FOR OrderYear IN (, , , ) ) AS unpvt
|Author’s Note. The preceding UNPIVOT expression would be simpler to read if you abstracted the table_source from the expression using a CTE.|
While they are very small pieces in the overall SQL Server 2005 feature list, the PIVOT and UNPIVOT commands should greatly simplify many of the reporting processes that developers often work with. Among the many developer-focused enhancements added to SQL Server 2005 that get much of the new product’s press, be careful not to overlook the many smaller-impact enhancements that can make your life just a little simpler.