Login | Register   
LinkedIn
Google+
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

Build a Generic Histogram Generator for SQL Server : Page 2

Histograms help people analyze large amounts of data, whether you display them as tables or as charts. This article shows you how to do both.


advertisement

The Histogram API
Generating a histogram in T-SQL requires just one call to a stored procedure, the database equivalent of a subroutine.

The signature for the histogram stored procedure shows all its parameters. Though there are quite a few, not all parameters are used at one time—and many are optional.

   Histogram @SchemaName, @TableName,
       @BucketExpr, @BucketFilter, @IdExpr, @IdFilter, @IdAlias,
       @DateTemplate, @Mode, @LowerBoundDate, @UpperBoundDate, @Buckets,
       @LowerBoundInt, @UpperBoundInt, @Smooth, @Verbose, @Process
   

Here's an overview of the parameters:

  • Specify a table (@TableName) and a field to tabulate (@BucketExpr) for any histogram.
  • To generate a cross tabulation (or detail histogram) rather than a summary histogram, supply a field to enumerate values (@IdExpr) along with an optional alias (@IdAlias) if you provide an expression rather than just a field name for @IdExpr.
  • Focus your data (@IdFilter) by specifying what goes into each bucket (@BucketFilter) or which values to enumerate from your @IdExpr.
  • The bounds of your histogram will be determined automatically from the data unless you choose to explicitly define them. Use @LowerBoundInt and @UpperBoundInt for a numerical histogram or @LowerBoundDate and @UpperBoundDate for a temporal one.
  • Set the number of buckets explicitly for a numerical histogram (@BucketSize) or implicitly for a temporal histogram (@DateTemplate).
  • The @DateTemplate parameter actually specifies two things: how to format the column headers (the bucket labels) in the result set, and (for a temporal histogram) what size buckets to use. You may specify years (yyyy), months (mm), or days (dd) in any arrangement, separated by virgules (/) or hyphens (-); the smallest unit present determines the bucket size.


From the API, here is a more complete, formal list of parameters:

      @SchemaName     ::= schema or owner name
      @TableName      ::= table name
      @BucketExpr     ::= field name or expression
      @BucketFilter   ::= filter predicate for @BucketExpr
      @IdExpr         ::= field name or expression
      @IdFilter       ::= filter predicate for @IdExpr
      @IdAlias        ::= column name alias for @IdExpr
      @Aggregate      ::= aggregate expression (count, min, max)
      @DateTemplate   ::= template calling out year, month, and/or day specifiers
      @Mode           ::= histogram type: 'DateRange', 'DateMonths', or 'NumberRange'
      @LowerBoundDate ::= smallest date to include for date histograms
      @UpperBoundDate ::= largest date to include for date histograms
      @Buckets        ::= number of columns for NumberRange mode
      @LowerBoundInt  ::= smallest value to include for numerical histograms
      @UpperBoundInt  ::= largest value to include for numerical histograms
      @Smooth         ::= for NumberRange mode, adjust bounds to palatable values
      @Verbose        ::= "0"-"3" for no diagnostic output or progressively more
      @Process        ::= "1" to execute; "0" to only show generated statements
Author's Note: For a full description of all the parameters see the official Histogram API documentation.


The easiest way to understand how to use this stored procedure is by working through the examples sprinkled throughout this article. Most of the examples use the Purchasing.PurchaseOrderHeader table from the AdventureWorks database.

This first example addresses the question: How many orders per month were placed between 6/5/2002 and 8/15/2003? To answer this question, you want to count orders based on the OrderDate field of the table and obtain a result that shows a summary (one-row), temporal (date-buckets) histogram. If you want to cover a specific date range, supply the upper and lower bounds; otherwise, the stored procedure will automatically use the lowest and highest dates that exist in the table for the boundaries (see Table 3). Note that you could also omit the @Mode parameter in this instance, because the procedure can automatically determine the mode from the data type of the OrderDate field and from the presence of the upper and lower bounds given as dates. In this example, the buckets are month-sized. The smallest unit present in the @DateTemplate is mm; the months from June 2002 through March 2003 are represented.

   exec histogram
       @SchemaName='Purchasing',
       @TableName='PurchaseOrderHeader',
       @BucketExpr='OrderDate',
       @LowerBoundDate='6/5/2002',
       @UpperBoundDate='8/15/2003',
       @Mode='DateRange',
       @DateTemplate='yyyy-mm'
Table 3. Temporal Histogram: Here's a temporal histogram where each bucket represents a full calendar month.
2002-06 2002-07 2002-08 2002-09 2002-10 2002-11 2002-12 2003-01 2003-02 2003-03
4 44 8 28 20 24 0 0 0 16


The Underlying Query Exposed
As mentioned earlier, the Histogram stored procedure uses dynamic SQL to do its magic. That is, from the parameters you supply in the procedure call, it dynamically generates a full query and then executes it, returning the result. You can see the actual query sent to your database by setting the @Verbose parameter to 2 or higher. The @Verbose parameter specifies a diagnostic output level, where 0 (the default) is defined as none. If you set @Verbose to 1, you get some derived values of interest; to 2, you also get the completed query; and to 3, you also get any subqueries (if used) that determine implicit upper and lower bounds.

For the preceding example, here's the query generated from the procedure call:

   SELECT
   count(CASE WHEN OrderDate >= '06/01/2002' AND 
      OrderDate < '07/01/2002' THEN 1 END)   
   as '2002-06',
   count(CASE WHEN OrderDate >= '07/01/2002' AND 
      OrderDate < '08/01/2002' THEN 1 END) 
   as '2002-07',
   count(CASE WHEN OrderDate >= '08/01/2002' AND 
      OrderDate < '09/01/2002' THEN 1 END) 
   as '2002-08',
   count(CASE WHEN OrderDate >= '09/01/2002' AND 
      OrderDate < '10/01/2002' THEN 1 END) 
   as '2002-09',
   count(CASE WHEN OrderDate >= '10/01/2002' AND 
      OrderDate < '11/01/2002' THEN 1 END) 

   /* This query continues in a similar fashion for each month */
   ... 
   FROM [Purchasing].[PurchaseOrderHeader] (nolock)
 
Figure 1. Date Template Variations: Frames (1) and (2) show the same data in month-sized buckets, differing only in column headings. Frame (3) shows year-size buckets. Frame (4), (5), and (6) show day-sized buckets.

As you can see, the query code is rather repetitive, boring, and simple—all qualities that make it a good candidate for programmatic generation. The power and advantage of the stored procedure lies not in any one example, but rather in the flexibility to generate a wide variety of different queries by merely adjusting parameters.

Figure 1 shows the variety of different histograms that you can generate just by changing the @DateTemplate. Note that (1) and (2) have the same data and differ only in column headings, the first showing the year and the month, the second showing just the month. Recall that the smallest unit in the @DateTemplate determines the bucket size. Both yyyy-mm and mm have mm as the smallest unit, hence the buckets are both month-sized. The @DateTemplate, however, also formats the column names. Similarly, (4), (5), and (6) all show the same data, because their @DateTemplate parameters contain the day selector (dd); they all resolve down to a single day. (I changed the date range to keep the number of columns small.) The year and month selectors for (4), (5), and (6) are optional, and serve only to provide context if you need it.



Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap