Finding the Top n Within a Group

Finding the Top n Within a Group

Question:
[Joe Celko’s Best Salesmen by Territory Puzzle]

Martin Tillinger posted this problem on the MSAccess forum of CompuServe. How do you get the top three salesmen in each territory, given a table of SalesReport that looks like this?

    CREATE TABLE SalesReport        (name CHAR(20) NOT NULL REFERENCES Salesmen(name),        territory INTEGER NOT NULL,        amount DECIMAL (8,2) NOT NULL);

Answer:
The following query gives the highest SalesReport the rank of 1, the next highest the rank of 2 and so forth. The query can handle ties within a rank. If the territory has less than (n) members, then it will return all the members.

    SELECT S0.territory, S0.name, S0.amount,            (SELECT COUNT(DISTINCT amount)                FROM SalesReport AS S1                WHERE (S1.amount >= S0.amount)                    AND (S1.territory = S0.territory)) AS rank        FROM SalesReport AS S0        WHERE rank <= 3; -- adjustable parameter
The problem was that Mr. Tillinger was working with Microsoft Access, which does not have a COUNT (DISTINCT x) function, so he tried to do the query without it:
    SELECT S0.territory, S0.name, S0.amount,            (SELECT COUNT(amount)                FROM SalesReport AS S1                WHERE (S1.amount >= S0.amount)                    AND (S1.territory = S0.territory)) AS rank        FROM SalesReport AS S0        WHERE rank <= 3; -- adjustable parameter
The problem is that ties get decided wrong. For example given:
    SalesReport    name  territory  amount     =======================    Wilson      1   $990.00    Smith       1   $950.00    Richards    1   $800.00    Quinn       1   $700.00    Parker      1   $345.00    Jones       1   $345.00    Hubbard     1   $345.00    Date        1   $200.00    Codd        1   $200.00    Blake       1   $100.00
The correct answer is:
    DISTINCT Result    name  territory  amount rank    ============================    Wilson      1   $990.00  1    Smith       1   $950.00  2    Richards    1   $800.00  3    Quinn       1   $700.00  4    Parker      1   $345.00  5    Jones       1   $345.00  5    Hubbard     1   $345.00  5    Date        1   $200.00  6    Codd        1   $200.00  6    Blake       1   $100.00  7
The un-DISTINCT answer is
    Un-DISTINCT Result    name  territory  amount rank    ============================    Wilson      1   $990.00  1    Smith       1   $950.00  2    Richards    1   $800.00  3    Quinn       1   $700.00  4    Parker      1   $345.00  6    Jones       1   $345.00  6    Hubbard     1   $345.00  6    Date        1   $200.00  8    Codd        1   $200.00  8    Blake       1   $100.00  9
The ties cause gaps and mess up the ability you had to use the rank column to select (n) items. There are also other problems in certain statistics due to the gaps.

Puzzle provided courtesy of:
Joe Celko
[email protected]

Share the Post:
Heading photo, Metadata.

What is Metadata?

What is metadata? Well, It’s an odd concept to wrap your head around. Metadata is essentially the secondary layer of data that tracks details about the “regular” data. The regular

XDR solutions

The Benefits of Using XDR Solutions

Cybercriminals constantly adapt their strategies, developing newer, more powerful, and intelligent ways to attack your network. Since security professionals must innovate as well, more conventional endpoint detection solutions have evolved

AI is revolutionizing fraud detection

How AI is Revolutionizing Fraud Detection

Artificial intelligence – commonly known as AI – means a form of technology with multiple uses. As a result, it has become extremely valuable to a number of businesses across

AI innovation

Companies Leading AI Innovation in 2023

Artificial intelligence (AI) has been transforming industries and revolutionizing business operations. AI’s potential to enhance efficiency and productivity has become crucial to many businesses. As we move into 2023, several

data fivetran pricing

Fivetran Pricing Explained

One of the biggest trends of the 21st century is the massive surge in analytics. Analytics is the process of utilizing data to drive future decision-making. With so much of

kubernetes logging

Kubernetes Logging: What You Need to Know

Kubernetes from Google is one of the most popular open-source and free container management solutions made to make managing and deploying applications easier. It has a solid architecture that makes