Tax Problem Puzzle

Question:
[Joe Celko’s Tax Problem Puzzle]

I got this one from Richard S. Romley, via CompuServe. We were engaged in a debate about Sybase’s old T-SQL versus ANSI/ISO SQL-92. I assume the reader can figure out which side I was on.

This is a simplified version of a problem relating to tax calculations. I will define a tax area as being made up of multiple tax authorities. For example, a tax area might be a city, and the tax authorities for that city might be the city, the county that the city is in, and the state that the county and city are in. When you pay tax on a purchase in the city, the tax rate you pay is made up of the city tax, the county tax, and the state tax. Each of these taxing authorities changes their tax rate independent of the others.

We have the following two tables:

    CREATE TABLE         (taxauthority CHAR(10) NOT NULL,        taxarea CHAR(10) NOT NULL,        PRIMARY KEY (taxauthority, taxarea));
This is a many to many relationship. Each tax area is made up of multiple tax authorities and each tax authority may appear in multiple tax areas.
    TaxAreaAuthority    taxauthority taxarea    ======================    ‘city1’      ‘city1’    ‘city2’      ‘city2’    ‘city3’      ‘city3’    ‘county1’    ‘city1’    ‘county1’    ‘city2’    ‘county2’    ‘city3’    ‘state1’     ‘city1’    ‘state1’     ‘city2’    ‘state1’     ‘city3’
This means that city1 and city2 are in county1 of state1; city3 is in county2 of state 1 and so forth. The other table we need is the tax rates.
    CREATE TABLE TaxRates        (taxauthority CHAR(10) NOT NULL,        dteffective DATE NOT NULL,        authtaxrate DECIMAL (8,2) NOT NULL,        PRIMARY KEY (taxauthority, dteffective));
Populate this table as follows:
    TaxRates    taxauthority dteffective authtaxrate    ======================================    ‘city1’      ‘1993-01-01’     1.0    ‘city1’      ‘1994-01-01’     1.5    ‘city2’      ‘1993-09-01’     1.5    ‘city2’      ‘1994-01-01’     2.0    ‘city2’      ‘1995-01-01’     2.5    ‘city3’      ‘1993-01-01’     1.7    ‘city3’      ‘1993-07-01’     1.9    ‘county1’    ‘1993-01-01’     2.3    ‘county1’    ‘1994-Oct01’     2.5    ‘county1’    ‘1995-01-01’     2.7    ‘county2’    ‘1993-01-01’     2.4    ‘county2’    ‘1994-01-01’     2.7    ‘county2’    ‘1995-01-01’     2.8    ‘state1’     ‘1993-01-01’     0.5    ‘state1’     ‘1994-01-01’     0.8    ‘state1’     ‘1994-07-01’     0.9    ‘state1’     ‘1994-10-01’     1.1
The problem: What is the total tax rate for City2 on November 1 1994? Richard’s solution with a single SELECT, used no views or subqueries. In T-SQL, it looked like this:
    SELECT totaltaxrate = SUM(MAX(R1.AuthTaxRate))         FROM TaxAreaAuthority A1, TaxRates R1, TaxRates R2        WHERE A1.taxarea = ‘city2’            AND A1.taxauthority = R1.taxauthority            AND A1.taxauthority = R2.taxauthority            AND R1.dteffective <= '1994-11-01'            AND R2.dteffective <= '1994-11-01'        GROUP BY R1.taxauth, R1.dteffective        HAVING R1.dteffective = MAX(R2.dteffective);
The answer for City2 on November 1, 1994 would be
    City2   taxrate = 2.0    County1 taxrate = 2.5    State1  taxrate = 1.1    ———————    TotalTaxRate    = 5.6
The nested aggregate functions will not work in any standard SQL. The SELECT list does not have the GROUP BY columns and the HAVING clause references columns not in the SELECT list, too. In short, this is not standard SQL.

Ignoring the requirement that you do not use subqueries, can you write a single query that will answer this question in SQL-92?

Answer:
The nested aggregate functions are much harder to understand than subqueries. Since aggregate functions apply to the groups, the only way SUM(MAX(R1.AuthTaxRate)) could be interpreted is that it is the sum of the largest value of R1.AuthTaxRate in each group, which is simply the largest value of R1.AuthTaxRate in each group, period. In order to read this as T-SQL intends, you have to force a hierarchy of data, instead of sets of data. In short, you are throwing away the relational model.

In English the question is: “For City2 on 1994 November 1, what was the total tax rate, based on the latest rates in effect at that time from each of the authorities which has control over City2?”

Let’s solve this problem in pieces. First we want to find out who the taxing authorities for the city are, so we write a subquery

    (SELECT taxauthority         FROM TaxAreaAuthority AS A1        WHERE A1.taxarea = ‘city2’)
which will give us the set (‘city2’, ‘county1’, ‘state1’).

Next we want to find what the tax rates were on 1994 November 1, so we write another subquery:

    (SELECT taxauthority, authtaxrate        FROM TaxRates AS R1        WHERE R1.dteffective = (SELECT MAX (R2.dteffective)            FROM TaxRates AS R2            WHERE R2.dteffective <= '1994-11-01'))
Now, let’s combine the two subqueries and put our constants in the SELECT list to make the final answer readable. Actually, I would make these constants into parameters, so the routine would generalize, but for now, let’s stick to the original problem:
    SELECT ‘city2’ AS city,             ‘1994-11-01’ AS effective_date,             R1.taxauthority,             authtaxrate        FROM TaxRates AS R1        WHERE R1.dteffective = (SELECT MAX (R2.dteffective)                FROM TaxRates AS R2                WHERE R2.dteffective <= '1994-11-01'                    AND R1.taxauthority = R2.taxauthority)            AND taxauthority IN (SELECT taxauthority                 FROM TaxAreaAuthority AS A1                WHERE A1.taxarea = 'city2');
now let’s do the totaling:
    SELECT ‘city2’ AS city,             ‘1994-11-01’ AS effective_date,             SUM (authtaxrate) AS total_taxes        FROM TaxRates AS R1        WHERE R1.dteffective = (SELECT MAX (R2.dteffective)                FROM TaxRates AS R2                WHERE R2.dteffective <= '1994-11-01'                    AND R1.taxauthority = R2.taxauthority)            AND taxauthority IN (SELECT taxauthority                 FROM TaxAreaAuthority AS A1                WHERE A1.taxarea = 'city2')        GROUP BY city, effective_date;
But wait! We can do more consolidation and get rid of the AND predicate in favor of more nesting, thus:
    SELECT ‘city2’ AS city,            ‘1994-11-01’ AS effect_date,            SUM (authtaxrate) AS total_taxes        FROM TaxRates AS R1        WHERE R1.dteffective = (SELECT MAX (R2.dteffective)            FROM TaxRates AS R2            WHERE R2.dteffective <= '1994-11-01'                AND R1.taxauthority = R2.taxauthority                AND R2.taxauthority                    IN (SELECT taxauthority                        FROM TaxAreaAuthority AS A1                        WHERE A1.taxarea = 'city2'))        GROUP BY city, effect_date;
Since the innermost subquery is a non-correlated, constant list, performance should be pretty good. And sure enough, when I look at the execution plan in Watcom SQL where I tested this, I find that the R1 and R2 tables were sequentially scanned, but the A1 table used the primary key index. If I put indexes on the TaxRates table, then I can get a faster execution plan.

Puzzle provided courtesy of:
Joe Celko
[email protected]

Share the Post:
Share on facebook
Share on twitter
Share on linkedin

Overview

The Latest

your company's audio

4 Areas of Your Company Where Your Audio Really Matters

Your company probably relies on audio more than you realize. Whether you’re creating a spoken text message to a colleague or giving a speech, you want your audio to shine. Otherwise, you could cause avoidable friction points and potentially hurt your brand reputation. For example, let’s say you create a

chrome os developer mode

How to Turn on Chrome OS Developer Mode

Google’s Chrome OS is a popular operating system that is widely used on Chromebooks and other devices. While it is designed to be simple and user-friendly, there are times when users may want to access additional features and functionality. One way to do this is by turning on Chrome OS

homes in the real estate industry

Exploring the Latest Tech Trends Impacting the Real Estate Industry

The real estate industry is changing thanks to the newest technological advancements. These new developments — from blockchain and AI to virtual reality and 3D printing — are poised to change how we buy and sell homes. Real estate brokers, buyers, sellers, wholesale real estate professionals, fix and flippers, and beyond may