devxlogo

Comparing Aggregate Results

Comparing Aggregate Results

Question:
[Joe Celko’s Promo Puzzle]

Let’s assume that you have just gotten a job as the sales manager for a department store. Your database has the following two tables. One is a calendar of the promotional events the store has had, and the other is a list of the sales which have been made during the promotions. You need to write a query which will tell me which clerk had the highest amount of sales for each promotion, so we can pay them a performance bonus.

    CREATE TABLE Promotions        (promo CHAR (25) NOT NULL PRIMARY KEY,        startdate DATE NOT NULL,        enddate DATE NOT NULL);    Promotions    promo                   startdate      enddate     =====================================================    ‘Feast of St. Fred’        ‘1995-02-01’  ‘1995-02-07’    ‘National Pickle Pageant’  ‘1995-11-01’  ‘1995-11-07’    ‘Christmas Week’           ‘1995-12-18’  ‘1995-12-25’
and
    CREATE TABLE Sales        (clerk CHAR (15) NOT NULL,        saledate DATE NOT NULL,        amount DECIMAL (8,2) NOT NULL);    Sales    clerk      saledate      amount     =================================    ‘Curly’    ‘1995-02-03’   250.99     ‘Curly’    ‘1995-02-03’   250.99     ‘Curly’    ‘1995-02-04’   100.00     ‘Curly’    ‘1995-02-05’   400.98     ‘Curly’    ‘1995-12-19’   400.98     ‘Curly’    ‘1995-12-20’     4.98     ‘Larry’    ‘1995-02-03’   257.50     ‘Larry’    ‘1995-02-04’   110.00     ‘Larry’    ‘1995-02-05’   300.98     ‘Larry’    ‘1995-11-01’   150.25     ‘Larry’    ‘1995-11-01’   325.00     ‘Larry’    ‘1995-11-02’   150.75     ‘Larry’    ‘1995-12-23’   257.50     ‘Larry’    ‘1995-12-24’    25.50     ‘Moe’      ‘1995-11-01’   325.00     ‘Moe’      ‘1995-11-01’   999.75     ‘Moe’      ‘1995-11-03’   150.00     ‘Moe’      ‘1995-12-18’   800.00     ‘Moe’      ‘1995-12-19’   100.00     ‘Moe’      ‘1995-12-20’   200.00     ‘Moe’      ‘1995-01-04’   100.00 

Answer:
The trick in this query is that we need to find out what each employee sold during each promotion and finally pick the highest sum from those groups. The first part is a fairly easy join and GROUP BY statement.

The final step of finding the largest total sales in each grouping requires a fairly tricky HAVING clause. Let’s look at the answer first, then explain it.

    SELECT S1.clerk, P1.promo, SUM(S1.amount) AS totsales        FROM Sales AS S1, Promotions AS P1        WHERE S1.saledate BETWEEN P1.startdate AND P1.enddate        GROUP BY S1.clerk, P1.promo        HAVING SUM(amount) >=            ALL (SELECT SUM(amount)                FROM Sales AS S2                WHERE S2.clerk  S1.clerk                    AND S2.saledate                        BETWEEN (SELECT startdate                            FROM Promotions AS P2                            WHERE P2.promo = P1.promo)                        AND (SELECT enddate                            FROM Promotions AS P3                            WHERE P3.promo = P1.promo)                GROUP BY S2.clerk);
We want the total sales for the chosen clerk and promotion to be equal or greater than the other total sales of all the other clerks during that promotion. The predicate “S2.clerk S1.clerk” excludes the other clerks from the subquery total. The subquery expressions in the BETWEEN predicate make sure that we are using the right dates for the promotion.

The first thought when trying to improve this query is to replace the subquery expressions in the BETWEEN predicate with direct outer references, like this:

    SELECT S1.clerk, P1.promo, SUM(S1.amount) AS totsales        FROM Sales AS S1, Promotions AS P1        WHERE S1.saledate BETWEEN P1.startdate AND P1.enddate         GROUP BY S1.clerk, P1.promo        HAVING SUM(amount) >=            ALL (SELECT SUM(amount)                 FROM Sales AS S2                WHERE S2.clerk  S1.clerk                    AND S2.saledate          — Error !!                        BETWEEN P1.startdate AND P1.enddate                GROUP BY S2.clerk);
But this will not work — and if you know why, then you really know your SQL. Cover the rest of this column and try to figure it out before you read further.

The “GROUP BY S1.clerk, P1.promo” clause has created a grouped table whose rows contain only aggregate functions and two grouping columns. The original working table built in the FROM clause ceased to exist and was replaced by this grouped working table, so the startdate and enddate also ceased to exist at that point.

However, the subquery expressions work because they reference the outer table P1 while it is still available since the query works from the innermost subqueries outward, and not the grouped table. If we were looking for sales performance between two known, constant dates, then the second query would work when we replaced P1.startdate and P1.enddate with those constants.

To give you a better feel for the data, here are the total sales per clerk for each promotion:

    ‘Feast of St. Fred’    ‘Curly’     $1,002.96 

Puzzle provided courtesy of:
Joe Celko
[email protected]

devxblackblue

About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.

About Our Journalist