Browse DevX
Sign up for e-mail newsletters from DevX

Tip of the Day
Language: SQL Server
Expertise: Beginner
Mar 25, 1997



Building the Right Environment to Support AI, Machine Learning and Deep Learning

Finding the Greatest Lower Bound

[Joe Celko's Sales Gaps Puzzle]

Given a table "sales" and a column "saledate" and "customer," is there a way to get the average number of days between sales dates for each customer in a single SQL statement? Let's assume nobody makes a sale to the same person on the same day, so we have the very simple table:

        (customer CHAR(5) NOT NULL,
        saledate DATE NOT NULL
        PRIMARY KEY (customer, saledate));

This is a problem where the more you know the more you hurt yourself. To find the gap between sales, the SQL guru does a self join or makes VIEWS. The first task is to get the sales into a table with the current saledate and the date of the last purchase:

    CREATE VIEW LastSales (customer, thissaledate, lastsaledate)
        AS SELECT S1.customer, S1.saledate,
                (SELECT MAX(saledate)
                    FROM Sales AS S2
                    WHERE S2.saledate < S1.saledate 
                        AND S2.customer = S1.customer)
            FROM Sales AS S1, Sales AS S2;
This is a greatest lower bound query -- we want the highest date in the set of dates for this customer which comes before the current date.

Now we construct a VIEW with the gap in days between this sale and their last purchase. You could combine the two views in one statement, but it would be unreadable.

    CREATE VIEW SalesGap (customer, gap)
        AS SELECT customer, DAYS(thissaledate, lastsaledate)
            FROM LastSales;
The DAYS function, or something like it, is a library routine in each vendor's SQL which will give the interval between two dates in days. The final answer is one query:
    SELECT customer, AVG(gap)
        FROM SalesGap
        GROUP BY customer;
You could combine the two nested views into the AVG() call, but it would be unreadable, might blow up and would run like molasses.

Or if you stop and think about the question being asked, you simply write:

    SELECT customer, 
            (MAX(saledate) - MIN(saledate) / (COUNT(*) - 1)) AS gap
        FROM Sales
        GROUP BY customer 
        HAVING COUNT(*) > 1;

Puzzle provided courtesy of:
Joe Celko

DevX Pro
Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.



Thanks for your registration, follow us on our social networks to keep up-to-date