Comparing Aggregate Results

Comparing Aggregate Results

[Joe Celko’s Promo Puzzle]

Let’s assume that you have just gotten a job as the sales manager for a department store. Your database has the following two tables. One is a calendar of the promotional events the store has had, and the other is a list of the sales which have been made during the promotions. You need to write a query which will tell me which clerk had the highest amount of sales for each promotion, so we can pay them a performance bonus.

    CREATE TABLE Promotions        (promo CHAR (25) NOT NULL PRIMARY KEY,        startdate DATE NOT NULL,        enddate DATE NOT NULL);    Promotions    promo                   startdate      enddate     =====================================================    ‘Feast of St. Fred’        ‘1995-02-01’  ‘1995-02-07’    ‘National Pickle Pageant’  ‘1995-11-01’  ‘1995-11-07’    ‘Christmas Week’           ‘1995-12-18’  ‘1995-12-25’
    CREATE TABLE Sales        (clerk CHAR (15) NOT NULL,        saledate DATE NOT NULL,        amount DECIMAL (8,2) NOT NULL);    Sales    clerk      saledate      amount     =================================    ‘Curly’    ‘1995-02-03’   250.99     ‘Curly’    ‘1995-02-03’   250.99     ‘Curly’    ‘1995-02-04’   100.00     ‘Curly’    ‘1995-02-05’   400.98     ‘Curly’    ‘1995-12-19’   400.98     ‘Curly’    ‘1995-12-20’     4.98     ‘Larry’    ‘1995-02-03’   257.50     ‘Larry’    ‘1995-02-04’   110.00     ‘Larry’    ‘1995-02-05’   300.98     ‘Larry’    ‘1995-11-01’   150.25     ‘Larry’    ‘1995-11-01’   325.00     ‘Larry’    ‘1995-11-02’   150.75     ‘Larry’    ‘1995-12-23’   257.50     ‘Larry’    ‘1995-12-24’    25.50     ‘Moe’      ‘1995-11-01’   325.00     ‘Moe’      ‘1995-11-01’   999.75     ‘Moe’      ‘1995-11-03’   150.00     ‘Moe’      ‘1995-12-18’   800.00     ‘Moe’      ‘1995-12-19’   100.00     ‘Moe’      ‘1995-12-20’   200.00     ‘Moe’      ‘1995-01-04’   100.00 

The trick in this query is that we need to find out what each employee sold during each promotion and finally pick the highest sum from those groups. The first part is a fairly easy join and GROUP BY statement.

The final step of finding the largest total sales in each grouping requires a fairly tricky HAVING clause. Let’s look at the answer first, then explain it.

    SELECT S1.clerk,, SUM(S1.amount) AS totsales        FROM Sales AS S1, Promotions AS P1        WHERE S1.saledate BETWEEN P1.startdate AND P1.enddate        GROUP BY S1.clerk,        HAVING SUM(amount) >=            ALL (SELECT SUM(amount)                FROM Sales AS S2                WHERE S2.clerk <> S1.clerk                    AND S2.saledate                        BETWEEN (SELECT startdate                            FROM Promotions AS P2                            WHERE =                        AND (SELECT enddate                            FROM Promotions AS P3                            WHERE =                GROUP BY S2.clerk);
We want the total sales for the chosen clerk and promotion to be equal or greater than the other total sales of all the other clerks during that promotion. The predicate “S2.clerk <> S1.clerk” excludes the other clerks from the subquery total. The subquery expressions in the BETWEEN predicate make sure that we are using the right dates for the promotion.

The first thought when trying to improve this query is to replace the subquery expressions in the BETWEEN predicate with direct outer references, like this:

    SELECT S1.clerk,, SUM(S1.amount) AS totsales        FROM Sales AS S1, Promotions AS P1        WHERE S1.saledate BETWEEN P1.startdate AND P1.enddate         GROUP BY S1.clerk,        HAVING SUM(amount) >=            ALL (SELECT SUM(amount)                 FROM Sales AS S2                WHERE S2.clerk <> S1.clerk                    AND S2.saledate          — Error !!                        BETWEEN P1.startdate AND P1.enddate                GROUP BY S2.clerk);
But this will not work — and if you know why, then you really know your SQL. Cover the rest of this column and try to figure it out before you read further.

The “GROUP BY S1.clerk,” clause has created a grouped table whose rows contain only aggregate functions and two grouping columns. The original working table built in the FROM clause ceased to exist and was replaced by this grouped working table, so the startdate and enddate also ceased to exist at that point.

However, the subquery expressions work because they reference the outer table P1 while it is still available since the query works from the innermost subqueries outward, and not the grouped table. If we were looking for sales performance between two known, constant dates, then the second query would work when we replaced P1.startdate and P1.enddate with those constants.

To give you a better feel for the data, here are the total sales per clerk for each promotion:

    ‘Feast of St. Fred’    ‘Curly’     $1,002.96 <== Winner    'Larry'     $  668.48    'Moe'       $    0.00    'National Pickle Pageant'    'Curly'     $    0.00    'Larry'     $  626.00    'Moe'       $1,474.75 <== Winner    'Christmas Week'    'Curly'     $  405.96    'Larry'     $  283.00    'Moe'       $1,100.00 <== Winner

Puzzle provided courtesy of:
Joe Celko
[email protected]

Share the Post:
Heading photo, Metadata.

What is Metadata?

What is metadata? Well, It’s an odd concept to wrap your head around. Metadata is essentially the secondary layer of data that tracks details about the “regular” data. The regular

XDR solutions

The Benefits of Using XDR Solutions

Cybercriminals constantly adapt their strategies, developing newer, more powerful, and intelligent ways to attack your network. Since security professionals must innovate as well, more conventional endpoint detection solutions have evolved

AI is revolutionizing fraud detection

How AI is Revolutionizing Fraud Detection

Artificial intelligence – commonly known as AI – means a form of technology with multiple uses. As a result, it has become extremely valuable to a number of businesses across

AI innovation

Companies Leading AI Innovation in 2023

Artificial intelligence (AI) has been transforming industries and revolutionizing business operations. AI’s potential to enhance efficiency and productivity has become crucial to many businesses. As we move into 2023, several

data fivetran pricing

Fivetran Pricing Explained

One of the biggest trends of the 21st century is the massive surge in analytics. Analytics is the process of utilizing data to drive future decision-making. With so much of

kubernetes logging

Kubernetes Logging: What You Need to Know

Kubernetes from Google is one of the most popular open-source and free container management solutions made to make managing and deploying applications easier. It has a solid architecture that makes