TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
 Specialized Dev Zones Research Center eBook Library .NET Java C++ Web Dev Architecture Database Security Open Source Enterprise Mobile Special Reports 10-Minute Solutions DevXtra Blogs Slideshow

 Home » Tip Bank » Database Development » SQL » Transact-sql
Language: SQL Server
Expertise: Beginner
Mar 25, 1997

### WEBINAR:On-Demand

Building the Right Environment to Support AI, Machine Learning and Deep Learning

# Comparing Aggregate Results

Question:
[Joe Celko's Promo Puzzle]

Let's assume that you have just gotten a job as the sales manager for a department store. Your database has the following two tables. One is a calendar of the promotional events the store has had, and the other is a list of the sales which have been made during the promotions. You need to write a query which will tell me which clerk had the highest amount of sales for each promotion, so we can pay them a performance bonus.

```    CREATE TABLE Promotions
(promo CHAR (25) NOT NULL PRIMARY KEY,
startdate DATE NOT NULL,
enddate DATE NOT NULL);

Promotions
promo                   startdate      enddate
=====================================================
'Feast of St. Fred'        '1995-02-01'  '1995-02-07'
'National Pickle Pageant'  '1995-11-01'  '1995-11-07'
'Christmas Week'           '1995-12-18'  '1995-12-25'
```
and
```    CREATE TABLE Sales
(clerk CHAR (15) NOT NULL,
saledate DATE NOT NULL,
amount DECIMAL (8,2) NOT NULL);

Sales
clerk      saledate      amount
=================================
'Curly'    '1995-02-03'   250.99
'Curly'    '1995-02-03'   250.99
'Curly'    '1995-02-04'   100.00
'Curly'    '1995-02-05'   400.98
'Curly'    '1995-12-19'   400.98
'Curly'    '1995-12-20'     4.98
'Larry'    '1995-02-03'   257.50
'Larry'    '1995-02-04'   110.00
'Larry'    '1995-02-05'   300.98
'Larry'    '1995-11-01'   150.25
'Larry'    '1995-11-01'   325.00
'Larry'    '1995-11-02'   150.75
'Larry'    '1995-12-23'   257.50
'Larry'    '1995-12-24'    25.50
'Moe'      '1995-11-01'   325.00
'Moe'      '1995-11-01'   999.75
'Moe'      '1995-11-03'   150.00
'Moe'      '1995-12-18'   800.00
'Moe'      '1995-12-19'   100.00
'Moe'      '1995-12-20'   200.00
'Moe'      '1995-01-04'   100.00
```

The trick in this query is that we need to find out what each employee sold during each promotion and finally pick the highest sum from those groups. The first part is a fairly easy join and GROUP BY statement.

The final step of finding the largest total sales in each grouping requires a fairly tricky HAVING clause. Let's look at the answer first, then explain it.

```    SELECT S1.clerk, P1.promo, SUM(S1.amount) AS totsales
FROM Sales AS S1, Promotions AS P1
WHERE S1.saledate BETWEEN P1.startdate AND P1.enddate
GROUP BY S1.clerk, P1.promo
HAVING SUM(amount) >=
ALL (SELECT SUM(amount)
FROM Sales AS S2
WHERE S2.clerk <> S1.clerk
AND S2.saledate
BETWEEN (SELECT startdate
FROM Promotions AS P2
WHERE P2.promo = P1.promo)
AND (SELECT enddate
FROM Promotions AS P3
WHERE P3.promo = P1.promo)
GROUP BY S2.clerk);
```
We want the total sales for the chosen clerk and promotion to be equal or greater than the other total sales of all the other clerks during that promotion. The predicate "S2.clerk <> S1.clerk" excludes the other clerks from the subquery total. The subquery expressions in the BETWEEN predicate make sure that we are using the right dates for the promotion.

The first thought when trying to improve this query is to replace the subquery expressions in the BETWEEN predicate with direct outer references, like this:

```    SELECT S1.clerk, P1.promo, SUM(S1.amount) AS totsales
FROM Sales AS S1, Promotions AS P1
WHERE S1.saledate BETWEEN P1.startdate AND P1.enddate
GROUP BY S1.clerk, P1.promo
HAVING SUM(amount) >=
ALL (SELECT SUM(amount)
FROM Sales AS S2
WHERE S2.clerk <> S1.clerk
AND S2.saledate          -- Error !!
BETWEEN P1.startdate AND P1.enddate
GROUP BY S2.clerk);
```
But this will not work -- and if you know why, then you really know your SQL. Cover the rest of this column and try to figure it out before you read further.

The "GROUP BY S1.clerk, P1.promo" clause has created a grouped table whose rows contain only aggregate functions and two grouping columns. The original working table built in the FROM clause ceased to exist and was replaced by this grouped working table, so the startdate and enddate also ceased to exist at that point.

However, the subquery expressions work because they reference the outer table P1 while it is still available since the query works from the innermost subqueries outward, and not the grouped table. If we were looking for sales performance between two known, constant dates, then the second query would work when we replaced P1.startdate and P1.enddate with those constants.

To give you a better feel for the data, here are the total sales per clerk for each promotion:

```    'Feast of St. Fred'
'Curly'     \$1,002.96 <== Winner
'Larry'     \$  668.48
'Moe'       \$    0.00

'National Pickle Pageant'
'Curly'     \$    0.00
'Larry'     \$  626.00
'Moe'       \$1,474.75 <== Winner

'Christmas Week'
'Curly'     \$  405.96
'Larry'     \$  283.00
'Moe'       \$1,100.00 <== Winner
```

Puzzle provided courtesy of:
Joe Celko
71062.1056@compuserve.com

DevX Pro

 Submit a Tip Browse "Database Development" Tips Browse All Tips
Comment and Contribute

(Maximum characters: 1200). You have 1200 characters left.

Thanks for your registration, follow us on our social networks to keep up-to-date