Login | Register   
LinkedIn
Google+
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


Tip of the Day
Language: SQL Server
Expertise: Beginner
Mar 25, 1997

Finding the Top n Within a Group

Question:
[Joe Celko's Best Salesmen by Territory Puzzle]

Martin Tillinger posted this problem on the MSAccess forum of CompuServe. How do you get the top three salesmen in each territory, given a table of SalesReport that looks like this?

    CREATE TABLE SalesReport
        (name CHAR(20) NOT NULL REFERENCES Salesmen(name),
        territory INTEGER NOT NULL,
        amount DECIMAL (8,2) NOT NULL);

Answer:
The following query gives the highest SalesReport the rank of 1, the next highest the rank of 2 and so forth. The query can handle ties within a rank. If the territory has less than (n) members, then it will return all the members.

    SELECT S0.territory, S0.name, S0.amount,
            (SELECT COUNT(DISTINCT amount)
                FROM SalesReport AS S1
                WHERE (S1.amount >= S0.amount)
                    AND (S1.territory = S0.territory)) AS rank
        FROM SalesReport AS S0
        WHERE rank <= 3; -- adjustable parameter
The problem was that Mr. Tillinger was working with Microsoft Access, which does not have a COUNT (DISTINCT x) function, so he tried to do the query without it:
    SELECT S0.territory, S0.name, S0.amount,
            (SELECT COUNT(amount)
                FROM SalesReport AS S1
                WHERE (S1.amount >= S0.amount)
                    AND (S1.territory = S0.territory)) AS rank
        FROM SalesReport AS S0
        WHERE rank <= 3; -- adjustable parameter
The problem is that ties get decided wrong. For example given:
    SalesReport
    name  territory  amount 
    =======================
    Wilson      1   $990.00
    Smith       1   $950.00
    Richards    1   $800.00
    Quinn       1   $700.00
    Parker      1   $345.00
    Jones       1   $345.00
    Hubbard     1   $345.00
    Date        1   $200.00
    Codd        1   $200.00
    Blake       1   $100.00
The correct answer is:
    DISTINCT Result
    name  territory  amount rank
    ============================
    Wilson      1   $990.00  1
    Smith       1   $950.00  2
    Richards    1   $800.00  3
    Quinn       1   $700.00  4
    Parker      1   $345.00  5
    Jones       1   $345.00  5
    Hubbard     1   $345.00  5
    Date        1   $200.00  6
    Codd        1   $200.00  6
    Blake       1   $100.00  7
The un-DISTINCT answer is
    Un-DISTINCT Result
    name  territory  amount rank
    ============================
    Wilson      1   $990.00  1
    Smith       1   $950.00  2
    Richards    1   $800.00  3
    Quinn       1   $700.00  4
    Parker      1   $345.00  6
    Jones       1   $345.00  6
    Hubbard     1   $345.00  6
    Date        1   $200.00  8
    Codd        1   $200.00  8
    Blake       1   $100.00  9
The ties cause gaps and mess up the ability you had to use the rank column to select (n) items. There are also other problems in certain statistics due to the gaps.

Puzzle provided courtesy of:
Joe Celko
71062.1056@compuserve.com

DevX Pro
 
Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap
Thanks for your registration, follow us on our social networks to keep up-to-date