Login | Register   
LinkedIn
Google+
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


Tip of the Day
Language: SQL Server
Expertise: Beginner
Mar 25, 1997

Tax Problem Puzzle

Question:
[Joe Celko's Tax Problem Puzzle]

I got this one from Richard S. Romley, via CompuServe. We were engaged in a debate about Sybase's old T-SQL versus ANSI/ISO SQL-92. I assume the reader can figure out which side I was on.

This is a simplified version of a problem relating to tax calculations. I will define a tax area as being made up of multiple tax authorities. For example, a tax area might be a city, and the tax authorities for that city might be the city, the county that the city is in, and the state that the county and city are in. When you pay tax on a purchase in the city, the tax rate you pay is made up of the city tax, the county tax, and the state tax. Each of these taxing authorities changes their tax rate independent of the others.

We have the following two tables:

    CREATE TABLE 
        (taxauthority CHAR(10) NOT NULL,
        taxarea CHAR(10) NOT NULL,
        PRIMARY KEY (taxauthority, taxarea));
This is a many to many relationship. Each tax area is made up of multiple tax authorities and each tax authority may appear in multiple tax areas.
    TaxAreaAuthority
    taxauthority taxarea
    ======================
    'city1'      'city1'
    'city2'      'city2'
    'city3'      'city3'
    'county1'    'city1'
    'county1'    'city2'
    'county2'    'city3'
    'state1'     'city1'
    'state1'     'city2'
    'state1'     'city3'
This means that city1 and city2 are in county1 of state1; city3 is in county2 of state 1 and so forth. The other table we need is the tax rates.
    CREATE TABLE TaxRates
        (taxauthority CHAR(10) NOT NULL,
        dteffective DATE NOT NULL,
        authtaxrate DECIMAL (8,2) NOT NULL,
        PRIMARY KEY (taxauthority, dteffective));
Populate this table as follows:
    TaxRates
    taxauthority dteffective authtaxrate
    ======================================
    'city1'      '1993-01-01'     1.0
    'city1'      '1994-01-01'     1.5
    'city2'      '1993-09-01'     1.5
    'city2'      '1994-01-01'     2.0
    'city2'      '1995-01-01'     2.5
    'city3'      '1993-01-01'     1.7
    'city3'      '1993-07-01'     1.9
    'county1'    '1993-01-01'     2.3
    'county1'    '1994-Oct01'     2.5
    'county1'    '1995-01-01'     2.7
    'county2'    '1993-01-01'     2.4
    'county2'    '1994-01-01'     2.7
    'county2'    '1995-01-01'     2.8
    'state1'     '1993-01-01'     0.5
    'state1'     '1994-01-01'     0.8
    'state1'     '1994-07-01'     0.9
    'state1'     '1994-10-01'     1.1
The problem: What is the total tax rate for City2 on November 1 1994? Richard's solution with a single SELECT, used no views or subqueries. In T-SQL, it looked like this:
    SELECT totaltaxrate = SUM(MAX(R1.AuthTaxRate)) 
        FROM TaxAreaAuthority A1, TaxRates R1, TaxRates R2
        WHERE A1.taxarea = 'city2'
            AND A1.taxauthority = R1.taxauthority
            AND A1.taxauthority = R2.taxauthority
            AND R1.dteffective <= '1994-11-01'
            AND R2.dteffective <= '1994-11-01'
        GROUP BY R1.taxauth, R1.dteffective
        HAVING R1.dteffective = MAX(R2.dteffective);
The answer for City2 on November 1, 1994 would be
    City2   taxrate = 2.0
    County1 taxrate = 2.5
    State1  taxrate = 1.1
    ---------------------
    TotalTaxRate    = 5.6
The nested aggregate functions will not work in any standard SQL. The SELECT list does not have the GROUP BY columns and the HAVING clause references columns not in the SELECT list, too. In short, this is not standard SQL.

Ignoring the requirement that you do not use subqueries, can you write a single query that will answer this question in SQL-92?

Answer:
The nested aggregate functions are much harder to understand than subqueries. Since aggregate functions apply to the groups, the only way SUM(MAX(R1.AuthTaxRate)) could be interpreted is that it is the sum of the largest value of R1.AuthTaxRate in each group, which is simply the largest value of R1.AuthTaxRate in each group, period. In order to read this as T-SQL intends, you have to force a hierarchy of data, instead of sets of data. In short, you are throwing away the relational model.

In English the question is: "For City2 on 1994 November 1, what was the total tax rate, based on the latest rates in effect at that time from each of the authorities which has control over City2?"

Let's solve this problem in pieces. First we want to find out who the taxing authorities for the city are, so we write a subquery

    (SELECT taxauthority 
        FROM TaxAreaAuthority AS A1
        WHERE A1.taxarea = 'city2')
which will give us the set ('city2', 'county1', 'state1').

Next we want to find what the tax rates were on 1994 November 1, so we write another subquery:

    (SELECT taxauthority, authtaxrate
        FROM TaxRates AS R1
        WHERE R1.dteffective = (SELECT MAX (R2.dteffective)
            FROM TaxRates AS R2
            WHERE R2.dteffective <= '1994-11-01'))
Now, let's combine the two subqueries and put our constants in the SELECT list to make the final answer readable. Actually, I would make these constants into parameters, so the routine would generalize, but for now, let's stick to the original problem:
    SELECT 'city2' AS city, 
            '1994-11-01' AS effective_date, 
            R1.taxauthority, 
            authtaxrate
        FROM TaxRates AS R1
        WHERE R1.dteffective = (SELECT MAX (R2.dteffective)
                FROM TaxRates AS R2
                WHERE R2.dteffective <= '1994-11-01'
                    AND R1.taxauthority = R2.taxauthority)
            AND taxauthority IN (SELECT taxauthority 
                FROM TaxAreaAuthority AS A1
                WHERE A1.taxarea = 'city2');
now let's do the totaling:
    SELECT 'city2' AS city, 
            '1994-11-01' AS effective_date, 
            SUM (authtaxrate) AS total_taxes
        FROM TaxRates AS R1
        WHERE R1.dteffective = (SELECT MAX (R2.dteffective)
                FROM TaxRates AS R2
                WHERE R2.dteffective <= '1994-11-01'
                    AND R1.taxauthority = R2.taxauthority)
            AND taxauthority IN (SELECT taxauthority 
                FROM TaxAreaAuthority AS A1
                WHERE A1.taxarea = 'city2')
        GROUP BY city, effective_date;
But wait! We can do more consolidation and get rid of the AND predicate in favor of more nesting, thus:
    SELECT 'city2' AS city,
            '1994-11-01' AS effect_date,
            SUM (authtaxrate) AS total_taxes
        FROM TaxRates AS R1
        WHERE R1.dteffective = (SELECT MAX (R2.dteffective)
            FROM TaxRates AS R2
            WHERE R2.dteffective <= '1994-11-01'
                AND R1.taxauthority = R2.taxauthority
                AND R2.taxauthority
                    IN (SELECT taxauthority
                        FROM TaxAreaAuthority AS A1
                        WHERE A1.taxarea = 'city2'))
        GROUP BY city, effect_date;
Since the innermost subquery is a non-correlated, constant list, performance should be pretty good. And sure enough, when I look at the execution plan in Watcom SQL where I tested this, I find that the R1 and R2 tables were sequentially scanned, but the A1 table used the primary key index. If I put indexes on the TaxRates table, then I can get a faster execution plan.

Puzzle provided courtesy of:
Joe Celko
71062.1056@compuserve.com

DevX Pro
 
Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap
Thanks for your registration, follow us on our social networks to keep up-to-date