Login | Register   
RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


Build a Generic Histogram Generator for SQL Server : Page 4

Histograms help people analyze large amounts of data, whether you display them as tables or as charts. This article shows you how to do both.


The Summary vs. Detail Histogram
The single-row result set typically provides a summary histogram—an analysis of an entire table or entire criterion. A detail histogram, on the other hand, relates two variables—one enumerated in columns, the other enumerated in rows. Each row provides a separate mini-histogram for a single enumerated value of the second variable. This type of histogram is also called a cross tabulation (or crosstab for short).

You may generate a detail histogram for any of the types of histograms specifiable with @Mode. To extend the previous example you need add only one more parameter—@IdExpr—to convert the summary histogram into a detail histogram.

   exec histogram

This query addresses the question: Between March and August in the 4-year span 2001 to 2004, how many orders per month did each employee enter? Table 6 shows the result set.

Table 6. Detail Histogram: A detail histogram introduces a row for each value of the EmployeeID field.
EmployeeID March April May June July August
164 33 31 36 37 39 42
198 16 11 16 19 18 17
223 36 33 41 41 43 46
231 30 31 39 35 40 41
233 30 30 38 37 39 41
238 29 30 39 36 39 41
241 33 30 38 35 40 41
244 32 30 39 36 40 40
261 31 31 36 36 40 40
264 33 29 37 36 41 39
266 36 33 42 40 44 45
274 14 14 16 17 15 19

If you compare Table 6 to Table 4, you can see that each column in Table 6 adds up to the single value in the same column in Table 4. So Table 6 shows a breakdown by employee of the histogram in Table 4. In other words, each row of Table 6 is a summary histogram focused on a single employee.

Filtering in Two Dimensions

Figure 2. Filtering Variations: Both column (@BucketFilter) and row (@IdFilter) filters may be applied to a detail histogram. The unfiltered query (1) sheds columns (2) when you filter on a particular status value; sheds rows (3) when filtered by Employee IDs, and sheds both rows and columns (4) when both filters are applied.

Figure 2 provides a visualization of how filtering works. All the result sets shown in the figure come from this base query:

   exec histogram

In the figure, @BucketFilter and @IdFilter parameters are added to the base query individually or in combination to change the output from the unfiltered result in frame one to the filtered results in frames two through four. An @IdFilter operates on the domain specified by the @IdExpr. Since the purpose of the @IdExpr is to enumerate values into rows, applying the @IdFilter typically reduces the number of rows (see frame three in Figure 2).

Applying a @BucketFilter, on the other hand, may or may not reduce the number of columns. Theoretically speaking, it serves to constrict the "flow" of values to any or all buckets. Taking constriction to the extreme it may actually eliminate an entire column, as happens in frame two. There are no values in the table for 2001 or 2002 with a status of 2, so those columns do not appear.

You might argue that, yes, filtering by rows should reduce the number of rows, but the column count should be inviolate, independent of column filtering. Why? When counting things, zero is still valid! So why are columns eliminated in Figure 2 and what could you do about it? Run the procedure for Figure 2, frame 2, with verbose set to 3, then review the diagnostic and query output (see Figure 3).

Figure 3. Column Constriction vs. Column Elimination: The @BucketFilter clause applied to the boundary determinants causes columns to be eliminated. Applied to the selection query, on the other hand, @BucketFilter causes columns to be constricted.

   exec histogram
       @BucketFilter='Status = 2',
Author's Note: For clarity, Figure 3 omits irrelevant lines from the output, shows the remaining output reformatted , and highlights the application of the @BucketFilter parameter.

As Figure 3 clearly shows, the @BucketFilter is applied in two areas, first to the boundary determinants—which may eliminate columns—and then to the selection query, which only constricts columns. Now that you see how columns can get eliminated, you may also have guessed what you could do about it—avoid letting the histogram procedure determine the boundaries. If the procedure is not calculating boundary determinants, it will not be able to eliminate columns. In this case, all you need do is supply @LowerBoundDate and @UpperBoundDate parameters.

Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.