Login | Register   
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

It All Adds Up

Use SQL Server 2005's Common Table Expressions, UNPIVOT, and the ROW_NUMBER() function to solve an interesting problem.


advertisement
while ago I responded to a newsgroup post in which the poster had a table of inventory changes containing start and end dates, as well as the amount that was added. The poster wanted to be able to determine the inventory in stock at any given date. I gave him a solution but I got to thinking about generic ways to solve problems of this nature with SQL Server 2005.

I'll present a generic solution here that you can apply against a broad range of problem domains. For example, if you knew the times that an aircraft entered and left an air traffic controller's airspace, you could calculate—for any given time—how many aircraft that controller was working. Similarly, if you knew when a party entered and left a restaurant, as well as the number of people in the party, you could then determine how many people were in the restaurant at any given time. Figure 1 shows how this would look, if represented as a Gantt-like chart.

 
Figure 1: Gantt-like representation of deltas.
Each stripe on the chart in Figure 1 corresponds to an entry in the table of values. Unfortunately, it doesn't do a good job of showing you how much of whatever you're monitoring is present at any given time. You want something more along the lines of Figure 2, in which you can see the inventory total changes as items are added and removed from the tally. This reminds me of my favorite subject in university calculus. While you're not using calculus here, the principle is the same. You're integrating stuff. You start with something and add the increments between the start and the end and wind up with the new amount. It's kind of like basic accounting.

 
Figure 2: Timeline of inventory totals.
Let's put together a table of raw data and work from there. Check out this code:

create table dbo.Deltas ( ID int primary key, Start datetime not null, Finish datetime null, Amount int not null check (Amount > 0), unique (Start, Finish) check (Finish >= Start) ) go insert dbo.Deltas values (1, '20060101', '20060108', 5) insert dbo.Deltas values (2, '20060101', '20060106', 2) insert dbo.Deltas values (3, '20060104', '20060110', 3) insert dbo.Deltas values (4, '20060112', null, 4) insert dbo.Deltas values (5, '20060112', '20060114', 1) insert dbo.Deltas values (6, '20060109', '20060110', 7)

Note that the Start column is not null but the Finish column allows nulls. This is because at any given time, an amount may be on the books but hasn't yet been removed. Alternatively, you could put an end date of 9999-12-31 23:59:59, if you have an allergy to nulls.

In this example, no two entries would both start and finish on the same dates. Rather, the Amount would adjust to account for the combined total. The UNIQUE constraint on (Start, Finish) enforces this. Also, I added a CHECK constraint to ensure that the Finish date is on or after the Start date. Yes, you can have something that starts and finishes on the same day. Another CHECK constraint ensures that the Amount is positive.

As a first step, you need to unpivot the start and end dates. (I'll show you how to use SQL Server 2005's cool new UNPIVOT operator to do this, though you can do this using other techniques.) This has the effect of producing one row for the beginning of the entry and another for the end of the same entry. The Start event adds the amount, while the Finish event removes it. Therefore, the Start event should have a positive amount, while the Finish Event should have a negative amount. Indeed, they are mirror images of each other.

You have to account for the fact that the Start and Finish dates are the dates where the Amount was on the books, inclusive. That means that it goes off the books the following day. Therefore, you must add one day to the Finish date if you want things to work out correctly.

It's not beyond the realm of possibility that several entries in the table could either start or stop on the same date. Consequently, after you unpivot, you have to take daily totals for each event. In this example, you have two entries that begin on 2006-01-01. This means that the unpivoted data will be summed, but grouped by event date.

The job's still not done. What you have right now are the daily total increments. Assume that you had nothing on the books at the beginning of the exercise. Thus, all you have left to do is take a running total. That then gives you the total amount on record for any date on which the amount changes.

It's easier to show you the code than to describe everything. The code in Listing 1 is actually a view, since you're going to be re-using it several times in what's to come. It uses a CTE (Common Table Expression). In fact, it uses two of them, one of which depends on the other. CTE x simply adds 1 day to the Finish date, in preparation for the UNPIVOT, which you do in CTE y.

CTE y requires careful study. You're unpivoting the Start and Finish columns. Their contents will go into a new column, named EventDate, as shown in the table below:

EventDate



Amount

2006-01-04 00:00:00.000

10

2006-01-07 00:00:00.000

8


The column names—Start and Finish—now disappear but are kept in another new column, Event. The other columns in CTE x are preserved through the UNPIVOT. After the UNPIVOT, you need to sum the Amount values, grouping on EventDate. As mentioned earlier, a Start event represents an addition, while a Finish event is a subtraction. This means that you need to use a CASE construct inside your sum.

Now that you have CTE y, you take the running total. Be careful here. It involves a self-join. You group on y1.EventDate, but you take the sum of y2.Amount.

At this point you can find out the totals for a range of dates by doing a simple SELECT, as shown below. You'll see that you get one row for every day where there was a change within the given range of dates:

Select * from dbo.Totals where EventDate between '20060104' and '20060107' order by EventDate

It's fairly easy to find the high-water mark—the highest amount on record—between any two dates, together with the date on which you recorded it:

select top 1 with ties * from dbo.Totals where EventDate between '20060104' and '20060110' order by Amount desc

Notice that the ORDER BY Amount DESC returns the value with the highest Amount. The SELECT TOP WITH TIES is there because it's possible that you hit the same high-water level more than once. If you wanted the row for the date when the high-water mark was initially set, then you could leave out the WITH TIES. If you wanted the most recent high-water event, then change the ORDER BY clause to:

order by Amount desc, EventDate desc

Editor's Note: This article was first published in the July/August 2007 issue of CoDe Magazine, and is reprinted here by permission.



Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap