Calculating Baseline Metrics for Business Intelligence : Page 2

At first glance, calculating a baseline metric—an average of some measure over a specific period of time—appears to be a simple task. In reality, baseline metrics are slippery beasts, and not at all as simple as they appear.




Workflow Baselines in a Transaction-processing Setting
Workflow analytics largely involve problems that answer the question "how many": for example, how many transactions were processed in a given unit of time, or how many customer inquiries occurred during a specific period. The historical baseline data (as opposed to targets) are the averages of such counts over a specified "trailing period." There's nothing complex about this in principle. The chart in Figure 1 shows how the baseline for a transaction creation volume measure (Tvol) might be presented:
Figure 1. Scorecard Baseline Presentation: The figure shows baselines in blue alongside the current values for those metrics. Note that these relationships are not always the same as their relative magnitudes.
In this example, suppose the "trailing period" for the baseline calculation were one year. At first glance this seems simple enough. To get the baseline, you could sum the last 365 days of Tvol measures and divide by 365. For comparison to "today's" daily Tvol measure this would suffice. But in a typical analytic application Tvol could be viewed at many different Time granularities in addition to Day, such as Week, Month, Quarter and Year; therefore, the baseline you provide must be relative to any arbitrary point in time. In other words, you must calculate the baseline dynamically. For example, to display a Tvol measure for Month MMYYYY, the baseline should be the sum of the daily measures for the 365 days preceding MMYYYY divided by 12. If at the Quarter level, it should be the sum of the daily measures for the 365 days preceding MMYYYY divided by 4. Each time period has its own proper divisor.
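The level-dependent divisor logic can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the function name and the `daily_tvol` dictionary structure are hypothetical stand-ins for whatever the cube or fact table would provide.

```python
from datetime import date, timedelta

def trailing_baseline(daily_tvol, as_of, level):
    """Average Tvol over the 365 days preceding `as_of`, expressed at the
    granularity of `level`. `daily_tvol` maps date -> daily transaction count."""
    # Each time level has its own proper divisor, as described above.
    divisors = {"Day": 365, "Week": 52, "Month": 12, "Quarter": 4, "Year": 1}
    start = as_of - timedelta(days=365)
    total = sum(n for d, n in daily_tvol.items() if start <= d < as_of)
    return total / divisors[level]

# Example: a constant 10 transactions per day over the trailing year.
history = {date(2003, 1, 1) - timedelta(days=i): 10 for i in range(1, 366)}
print(trailing_baseline(history, date(2003, 1, 1), "Day"))    # daily average
print(trailing_baseline(history, date(2003, 1, 1), "Month"))  # monthly average
```

In a real cube the summation would be a pre-aggregated measure rather than a Python loop; the point here is only that the divisor, not the sum, changes with the display level.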

As a practical matter, having such dynamic calculations would be almost impossible without multidimensional cube technology (imagine doing this with SQL), and is not trivial even with it. Yet what has been discussed so far is only the beginning! In a real application the baseline calculations might additionally need to account for many specific conditions, such as:

  • Transactions Exist for Multiple Days. When transactions can exist for multiple days, counting problems become considerably more difficult. Such long-lived transactions are common in business domains such as call centers and order fulfillment, where the lifecycle of a call or order may span more than one day. If the application is not designed with this in mind, double-counting is almost inevitable when users can query any time period at any level.
  • Calculation on a Business Days Basis. Typically, measures are generated only on business days, not every calendar day. In this case the proper divisor at the Day level is the number of business days in the prior 365 calendar days. Because of weekends, holidays, and leap years, that number is not a constant; it's a function of the day chosen for reporting and of how much history the baseline includes.
  • Calculation with Multiple Calendars. In the sample scenario, the business processes transactions at multiple cities around the world. This means that the data encompasses multiple calendars reflecting different holidays and work norms (for example, six-day work weeks). The baseline divisor thus becomes a function not only of Time but also of Location. Aside from the complication of considering this in the baseline calculation, often there is not even a system of record for this information.
  • Effective "Live" Date Varies by Location. Suppose new locations periodically come "live" as the enterprise expands or as the transaction-processing application gets deployed around the world, such as in a phased Enterprise Resource Planning (ERP) implementation. In this case, the baseline calculation must take into account how long a particular location has been live; otherwise the baseline would be grossly distorted.
For example, suppose Paris went live in October 2002 and you're trying to display Tvol at the Month level for December 2002. The baseline should be the sum of October 2002 and November 2002 activity divided by 2, not 12.

On the other hand, after the Paris location has been live for more than a year, the baseline calculation should behave as described earlier—on a trailing twelve-month basis. Continuing the example, if the display were of October 2003, the baseline should now be the sum of October 2002 through September 2003, divided by 12. More generally, when more than a year's worth of historical data for a measure exists, the baseline divisor for a given Time level is a constant. In this case, the monthly divisor becomes a constant 12 whenever the "live" date is at least one year prior to the reporting date. Table 1 illustrates this behavior for the Month and Quarter levels for Paris, assuming it went live in October 2002:

Table 1. Month and Quarter baseline divisors for the Paris location.

Quarter               Baseline Divisor    Month                  Baseline Divisor
Q4 2002 and earlier   N/A                 Oct 2002 and earlier   N/A
Q1 2003               1                   Nov 2002               1
Q2 2003               2                   Dec 2002               2
Q3 2003               3                   Jan 2003               3
Q4 2003               4                   Feb 2003               4
Q1 2004 and later     4                   Mar 2003               5
                                          Apr 2003               6
                                          May 2003               7
                                          Jun 2003               8
                                          Jul 2003               9
                                          Aug 2003               10
                                          Sep 2003               11
                                          Oct 2003               12
                                          Nov 2003 and later     12
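The pattern in Table 1 reduces to "months (or quarters) of history since go-live, capped at the period count in a year." A small sketch, with hypothetical function names and go-live/reporting periods passed as (year, month) and (year, quarter) tuples:

```python
def month_baseline_divisor(live_ym, report_ym):
    """Complete months of history preceding the reporting month, capped at 12.
    Returns None (N/A) when the location has no complete month of history yet."""
    months = (report_ym[0] * 12 + report_ym[1]) - (live_ym[0] * 12 + live_ym[1])
    return None if months <= 0 else min(months, 12)

def quarter_baseline_divisor(live_yq, report_yq):
    """Same idea at the Quarter level, capped at 4."""
    quarters = (report_yq[0] * 4 + report_yq[1]) - (live_yq[0] * 4 + live_yq[1])
    return None if quarters <= 0 else min(quarters, 4)

# Paris went live in October 2002 (Q4 2002):
print(month_baseline_divisor((2002, 10), (2002, 12)))   # Dec 2002 -> 2
print(month_baseline_divisor((2002, 10), (2003, 11)))   # Nov 2003 -> 12 (capped)
print(quarter_baseline_divisor((2002, 4), (2004, 1)))   # Q1 2004 -> 4 (capped)
```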

Author's Note: Strictly speaking, the Day level divisor for a given City never becomes constant but "vibrates" around a constant after the City has been live for a year. For example, for a City with a 5-day work week and 10 annual holidays, the nominal number of business days in a year is 250 ((52*5) - 10)—the "constant"—but the actual number of business days in the preceding 365 calendar days varies day by day. It could assume values such as 249, 250, and 251, and possibly even 248 or 252. For baselines describing workflow volumes this makes very little difference, but in other applications (for example if interest calculations were involved) it could be important.
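The "vibration" is easy to demonstrate. In this sketch (the function name is illustrative, and holidays are supplied as a plain set of dates rather than from a system of record), 365 calendar days hold 52 full weeks plus one extra day, so with no holidays the count is 260 or 261 depending on which reporting date you pick:

```python
from datetime import date, timedelta

def business_days_in_trailing_year(as_of, holidays=frozenset()):
    """Count weekdays (Mon-Fri, minus holidays) in the 365 calendar days
    ending the day before `as_of`."""
    count = 0
    for i in range(1, 366):
        d = as_of - timedelta(days=i)
        if d.weekday() < 5 and d not in holidays:
            count += 1
    return count

print(business_days_in_trailing_year(date(2003, 6, 2)))   # extra day is a Sunday -> 260
print(business_days_in_trailing_year(date(2003, 10, 1)))  # extra day is a Tuesday -> 261
```

Supplying a per-location holiday set is what turns the divisor into a function of Location as well as Time, as described in the multiple-calendars bullet above.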

It's easy to see that supporting all these requirements in a dynamic manner that works for any time period and also performs responsively is not a trivial task.
