Login | Register   
LinkedIn
Google+
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


Tip of the Day
Language: SQL Server
Expertise: Beginner
Mar 25, 1997

Comparing Aggregate Results

Question:
[Joe Celko's Promo Puzzle]

Let's assume that you have just gotten a job as the sales manager for a department store. Your database has the following two tables. One is a calendar of the promotional events the store has had, and the other is a list of the sales which have been made during the promotions. You need to write a query which will tell me which clerk had the highest amount of sales for each promotion, so we can pay them a performance bonus.

    CREATE TABLE Promotions
        (promo CHAR (25) NOT NULL PRIMARY KEY,
        startdate DATE NOT NULL,
        enddate DATE NOT NULL);

    Promotions
    promo                   startdate      enddate 
    =====================================================
    'Feast of St. Fred'        '1995-02-01'  '1995-02-07'
    'National Pickle Pageant'  '1995-11-01'  '1995-11-07'
    'Christmas Week'           '1995-12-18'  '1995-12-25'
and
    CREATE TABLE Sales
        (clerk CHAR (15) NOT NULL,
        saledate DATE NOT NULL,
        amount DECIMAL (8,2) NOT NULL);

    Sales
    clerk      saledate      amount 
    =================================
    'Curly'    '1995-02-03'   250.99 
    'Curly'    '1995-02-03'   250.99 
    'Curly'    '1995-02-04'   100.00 
    'Curly'    '1995-02-05'   400.98 
    'Curly'    '1995-12-19'   400.98 
    'Curly'    '1995-12-20'     4.98 
    'Larry'    '1995-02-03'   257.50 
    'Larry'    '1995-02-04'   110.00 
    'Larry'    '1995-02-05'   300.98 
    'Larry'    '1995-11-01'   150.25 
    'Larry'    '1995-11-01'   325.00 
    'Larry'    '1995-11-02'   150.75 
    'Larry'    '1995-12-23'   257.50 
    'Larry'    '1995-12-24'    25.50 
    'Moe'      '1995-11-01'   325.00 
    'Moe'      '1995-11-01'   999.75 
    'Moe'      '1995-11-03'   150.00 
    'Moe'      '1995-12-18'   800.00 
    'Moe'      '1995-12-19'   100.00 
    'Moe'      '1995-12-20'   200.00 
    'Moe'      '1995-01-04'   100.00 

Answer:
The trick in this query is that we need to find out what each employee sold during each promotion and finally pick the highest sum from those groups. The first part is a fairly easy join and GROUP BY statement.

The final step of finding the largest total sales in each grouping requires a fairly tricky HAVING clause. Let's look at the answer first, then explain it.

    SELECT S1.clerk, P1.promo, SUM(S1.amount) AS totsales
        FROM Sales AS S1, Promotions AS P1
        WHERE S1.saledate BETWEEN P1.startdate AND P1.enddate
        GROUP BY S1.clerk, P1.promo
        HAVING SUM(amount) >=
            ALL (SELECT SUM(amount)
                FROM Sales AS S2
                WHERE S2.clerk <> S1.clerk
                    AND S2.saledate
                        BETWEEN (SELECT startdate
                            FROM Promotions AS P2
                            WHERE P2.promo = P1.promo)
                        AND (SELECT enddate
                            FROM Promotions AS P3
                            WHERE P3.promo = P1.promo)
                GROUP BY S2.clerk);
We want the total sales for the chosen clerk and promotion to be equal or greater than the other total sales of all the other clerks during that promotion. The predicate "S2.clerk <> S1.clerk" excludes the other clerks from the subquery total. The subquery expressions in the BETWEEN predicate make sure that we are using the right dates for the promotion.

The first thought when trying to improve this query is to replace the subquery expressions in the BETWEEN predicate with direct outer references, like this:

    SELECT S1.clerk, P1.promo, SUM(S1.amount) AS totsales
        FROM Sales AS S1, Promotions AS P1
        WHERE S1.saledate BETWEEN P1.startdate AND P1.enddate 
        GROUP BY S1.clerk, P1.promo
        HAVING SUM(amount) >=
            ALL (SELECT SUM(amount) 
                FROM Sales AS S2
                WHERE S2.clerk <> S1.clerk
                    AND S2.saledate          -- Error !!
                        BETWEEN P1.startdate AND P1.enddate
                GROUP BY S2.clerk);
But this will not work -- and if you know why, then you really know your SQL. Cover the rest of this column and try to figure it out before you read further.

The "GROUP BY S1.clerk, P1.promo" clause has created a grouped table whose rows contain only aggregate functions and two grouping columns. The original working table built in the FROM clause ceased to exist and was replaced by this grouped working table, so the startdate and enddate also ceased to exist at that point.

However, the subquery expressions work because they reference the outer table P1 while it is still available since the query works from the innermost subqueries outward, and not the grouped table. If we were looking for sales performance between two known, constant dates, then the second query would work when we replaced P1.startdate and P1.enddate with those constants.

To give you a better feel for the data, here are the total sales per clerk for each promotion:

    'Feast of St. Fred'
    'Curly'     $1,002.96 <== Winner
    'Larry'     $  668.48
    'Moe'       $    0.00

    'National Pickle Pageant'
    'Curly'     $    0.00
    'Larry'     $  626.00
    'Moe'       $1,474.75 <== Winner

    'Christmas Week'
    'Curly'     $  405.96
    'Larry'     $  283.00
    'Moe'       $1,100.00 <== Winner

Puzzle provided courtesy of:
Joe Celko
71062.1056@compuserve.com

DevX Pro
 
Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap
Thanks for your registration, follow us on our social networks to keep up-to-date