ith the help of today’s Web integration and GUI tools, it’s now much easier to build reports without any programming. SQL provides GROUP BY and ORDER BY clauses, to collate and sort otherwise unordered data. But sometimes we want more flexibility. Consider the following report requests:
- List my top two clients, according to sales totals
- List the next three clients
- Rank all clients by sales volume, but list them alphabetically
- Rank all clients by sales volume on a per-month basis
- Show each client’s year-to-date and all-time cumulative totals
- A rolling three-month average, from one before to one after, and the same from the previous year
How can I rank my results in DB2 and retrieve only the data I want?
Use DB2’s OLAP functions to refine the data; Use FETCH FIRST to minimize the results.FETCH FIRST for Top-N
DB2’s method of performing a Top-N query is the FETCH FIRST clause. You can append these variations to a regular SELECT query:
- FETCH FIRST ROW ONLY
- FETCH FIRST 1 ROW ONLY
- FETCH FIRST integer ROWS ONLY
- FETCH FIRST 1 ROWS ONLY
- FETCH FIRST 5 ROW ONLY
For demonstration, I’ll create a table called MYSALES, which can be generated by running Listing 1.
Now, we can ask for a single record as follows:
Listing 2. Return a Single Row
We have retrieved one row, but there’s no way to know ahead of time which row it will be.(HINT: This does give us a handy way to remind ourselves what fields are in a table, with a row of sample data as a bonus!)
SELECT * FROM MYSALES FETCH FIRST ROW ONLY CLIENT MONTHEND SALEVOL DEVX 03/31/1998 100
To find our top two clients, use the code in Listing 3.
Listing 3: Show Top Two Clients
Ranking and Numbering
SELECT CLIENT, SUM(SALEVOL) AS TOTALVOL FROM MYSALES GROUP BY CLIENT ORDER BY SUM(SALEVOL) DESC FETCH FIRST 2 ROWS ONLY CLIENT TOTALVOL DEVX 5785 EGGHEAD 5341
DB2 supports ranking and numbering in much the same way that Oracle does (see “Performing Top-N Queries in Oracle“).
The available functions are:
- ROW_NUMBER(), which simply numbers the returned rows sequentially
- RANK(), which ranks the results, but, in the case of a tie, gives the same number to each and leaves a gap to compensate
- DENSE_RANK() operates the same as RANK() but doesn’t leave any gaps.
Listing 4. Ranking and Numbering Results
The results need not be returned in rank order. We may wish to show each client’s rank while listing them alphabetically
select client, MONTHEND, SALEVOL, ROW_NUMBER() over (order by SALEVOL desc) AS RN, RANK() over (order by SALEVOL desc) AS RANK, DENSE_RANK() over (order by SALEVOL desc) AS DENSE from mysales where MONTHEND=DATE(‘1997-11-30’) order by RN CLIENT MONTHEND SALEVOL RN RANK DENSE CYRIX 11/30/1997 120 1 1 1 BIG BLUE 11/30/1997 106 2 2 2 EGGHEAD 11/30/1997 106 3 2 2 DEVX 11/30/1997 80 4 4 3 FIGTREE 11/30/1997 62 5 5 4 ACME 11/30/1997 20 6 6 5
Listing 5. Return Ranks in Any Order
One application of ROW_NUMBER is to select a numbered range of rows from the middle of your results. For example, you could retrieve the next three clients following the top two as shown in Listing 6.
select CLIENT, SALEVOL, RANK() over (order by SALEVOL desc) AS RANK from mysales where MONTHEND=DATE(1997-11-30′) order by CLIENT CLIENT SALEVOL RANK ACME 20 6 BIG BLUE 106 2 CYRIX 120 1 DEVX 80 4 EGGHEAD 106 2 FIGTREE 62 5
Listing 6. Selecting Rows by Number
Partitioning?Ranking within Groups
WITH ALLSALES AS (SELECT CLIENT, SUM(SALEVOL) AS TOTALVOL, ROW_NUMBER() OVER (ORDER BY SUM(SALEVOL) DESC, CLIENT) AS RN FROM MYSALES GROUP BY CLIENT) SELECT CLIENT, TOTALVOL, RN FROM ALLSALES WHERE RN BETWEEN 3 AND 5 ORDER BY RN CLIENT TOTALVOL RN BIG BLUE 4781 3 FIGTREE 3986 4 ACME 3044 5
We have seen ROW_NUMBER and RANK operating over an ORDER clause with one or more columns. The ORDER BY must be specified for RANK() and DENSE_RANK(), but is optional for ROW_NUMBER(). That is, selecting
will number the rows in an arbitrary fashion.
The other clause that may be used with RANK, DENSE_RANK, and ROW_NUMBER is a partitioning clause. This allows you to perform the function over subsets of data.
The query in Listing 7 will obtain the ranking of clients for each month in the past year.
Listing 7. Ranking in partitions
As with the ORDER clause, it is possible to specify multiple PARTITION columns, separated by commas.A Wider RANGE of Functions
SELECT CLIENT, MONTHEND, SALEVOL, RANK() OVER (PARTITION BY MONTHEND ORDER BY SALEVOL DESC) AS MONTHRANK FROM MYSALES WHERE MONTHEND > CURRENT DATE – 1 YEAR ORDER BY MONTHEND
Now we come to a more powerful feature?the ability to perform column functions on a moving range of rows. You can use any column function, and it will be applied to the records:
- Grouped in the PARTITION BY clause, if present
- Limited to a subset of rows using a ROWS or RANGE clause, if specified, where the range operates in the ORDER given.
Listing 8. A Wider Range of Functions
You can use RANGE in place of ROWS, except that RANGE operates only on a single numeric column, and compares the values in the column rather than the number of rows. (See the NUMSAME calculation near the end of Listing 8 above).
SELECT CLIENT, MONTHEND, SALEVOL, — Total sales ever SUM(SALEVOL) OVER () AS TOTALSALES, –Year-to-date cumulative total SUM(SALEVOL) OVER( PARTITION BY CLIENT, YEAR(MONTHEND) ORDER BY MONTHEND ROWS UNBOUNDED PRECEDING) AS YTD, –All-time cumulative total across all –clients prior to this month SUM(SALEVOL) OVER( ORDER BY MONTHEND ROWS UNBOUNDED PRECEDING) AS ALLYTD, –Moving 3-month average AVG(SALEVOL) OVER( PARTITION BY CLIENT ORDER BY MONTHEND ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS MOVAVGDIF, –Difference to last year’s moving avg SALEVOL – AVG(SALEVOL) OVER( PARTITION BY CLIENT ORDER BY MONTHEND ROWS BETWEEN 13 PRECEDING AND 11 PRECEDING) AS MOVPREVDIF, –Greatest volume by any client for the –current month MAX(SALEVOL) OVER( PARTITION BY MONTHEND) AS MONTHMAX, –Percentage of total for month SMALLINT(SALEVOL * 100 / SUM(SALEVOL) OVER (PARTITION BY MONTHEND)) AS THISPCT, –Number of months where this client’s sales — are within +/- 5 units of this month COUNT(*) OVER( PARTITION BY CLIENT ORDER BY SALEVOL RANGE BETWEEN 5 PRECEDING AND 5 FOLLOWING) AS NUMSAME FROM MYSALES ORDER BY CLIENT, MONTHEND
That’s only the beginning of the possibilities. It’s important to remember that OLAP queries can place a heavy load on a database, and sometimes the whole data set must be processed even to return a small number of results. That said, you don’t have to transfer large amounts of data to other programs, or worse, across a network, to get the results.