Finding the Most Recent and Consecutive Data

Finding the Most Recent and Consecutive Data

[Joe Celko’s Pension Plan Puzzle]

Luke Tymowski, a Canadian programmer, posted an interesting problem on the MSAccess forum on CompuServe in 1994 November. He was working on a pension fund problem. In SQL-92, the table involved would look like this:

    CREATE TABLE Pensions        (sin CHAR(10) NOT NULL,        pyear INTEGER NOT NULL,        service INTEGER NOT NULL DEFAULT 0             CHECK (year BETWEEN 0 AND 12),        earnings DECIMAL (8,2) NOT NULL DEFAULT 0.00);
The SIN column is the Social Insurance Number, something like the SSN Social Security Number used in the United States to identify taxpayers. The pyear column is the calendar year, the service column is the number of months in that year the person worked, and earnings is their total earnings for year.

The problem is to find the total earnings of each employee for the most recent consecutive 60 months of service. This number is used to compute their pension. The shortest period going back could be five years with 12 months of service each. The longest period could be 60 years with 1 month of service each. Some people might work four years and not the fifth, and thus not qualify for a pension at all.

The reason this is a beast to solve is that “most recent” and “consecutive” are hard to write in SQL. HINT: For each employee in each year, insert a row even in the years they did not work. It not only makes the query easier, but you also have a record to update when you get in new information.

This query will get me the starting and ending years of consecutive periods where (1) the employee worked (i.e. service greater than 0 months) and (2) the service totaled 60 or more consecutive months:

    CREATE VIEW         PenPeriods (sin, startyear, endyear, totearnings)        AS SELECT P0.sin, P0.pyear, P1.pyear,            (SELECT SUM (earnings) — total earnings for period                FROM Pensions AS P2                WHERE P2.sin = P0.sin                    AND P2.pyear BETWEEN P0.pyear AND P1.pyear)            FROM Pensions AS P0, Pensions AS P1            WHERE P1.sin = P0.sin        — self join to make intervals                AND P1.pyear >= (P0.pyear – 4)  — why look sooner?                AND 0 < ALL (SELECT service     -- consecutive service                    FROM Pensions AS P3                    WHERE P3.sin = P0.sin                        AND P3.pyear BETWEEN P0.pyear AND P1.pyear)                AND 60 <= (SELECT SUM (service) -- more than 60 months                    FROM Pensions AS P4                    WHERE P4.sin = P0.sin                        AND P4.pyear BETWEEN P0.pyear AND P1.pyear);
The subquery expression in the SELECT list is a SQL-92 trick, but a number of products already have it.

The gimmick is that this will give you ALL the periods of 60 months or more. What we really want is the most recent endyear. I would handle this with the Pension Period VIEW I just defined and a MAX(endyear) predicate:

    SELECT *         FROM PenPeriod AS P0        WHERE endyear = (SELECT MAX(endyear)            FROM PenPeriod AS P1            WHERE P1.sin = P0.sin);
I could handle that with some ugly HAVING clauses in SQL-92, I could combine both those subquery predicates with an EXISTS clause, etc. The trouble is that these features would not be as portable right now.

As an exercise, you can try to add another predicate to the final subquery which says there does not exists a year between P0.pyear and P1.pyear which is greater than the P4.pyear and still gives a total of 60 or more consecutive months.

Puzzle provided courtesy of:
Joe Celko
[email protected]

Share the Post:
XDR solutions

The Benefits of Using XDR Solutions

Cybercriminals constantly adapt their strategies, developing newer, more powerful, and intelligent ways to attack your network. Since security professionals must innovate as well, more conventional endpoint detection solutions have evolved

AI is revolutionizing fraud detection

How AI is Revolutionizing Fraud Detection

Artificial intelligence – commonly known as AI – means a form of technology with multiple uses. As a result, it has become extremely valuable to a number of businesses across

AI innovation

Companies Leading AI Innovation in 2023

Artificial intelligence (AI) has been transforming industries and revolutionizing business operations. AI’s potential to enhance efficiency and productivity has become crucial to many businesses. As we move into 2023, several

data fivetran pricing

Fivetran Pricing Explained

One of the biggest trends of the 21st century is the massive surge in analytics. Analytics is the process of utilizing data to drive future decision-making. With so much of

kubernetes logging

Kubernetes Logging: What You Need to Know

Kubernetes from Google is one of the most popular open-source and free container management solutions made to make managing and deploying applications easier. It has a solid architecture that makes

ransomware cyber attack

Why Is Ransomware Such a Major Threat?

One of the most significant cyber threats faced by modern organizations is a ransomware attack. Ransomware attacks have grown in both sophistication and frequency over the past few years, forcing

data dictionary

Tools You Need to Make a Data Dictionary

Data dictionaries are crucial for organizations of all sizes that deal with large amounts of data. they are centralized repositories of all the data in organizations, including metadata such as