Browse DevX
Sign up for e-mail newsletters from DevX


Calculating Baseline Metrics for Business Intelligence : Page 4

At first glance, calculating a baseline metric—an average of some measure over a specific period of time—appears to be a simple task. In reality, baseline metrics are slippery beasts, and not at all as simple as they appear.




Building the Right Environment to Support AI, Machine Learning and Deep Learning

Some MDX Examples
With a star schema created as described, the next step is to use Analysis Services technology to take advantage of it. Assume that you're translating the star schema into a cube using standard techniques, with the IsCounted facts becoming measures in the cube with "Sum" as their Aggregate Property. After doing that, you can create calculated members (CMs) to complete the solution. Assume the Time dimension is named [Date]. These MDX examples were created to illustrate the ideas, so consider them as templates rather production-ready code.

The first requirement is to create a CM to sum the proper "IsCounted" measure based on the current Date context. Calling it TranCount, the MDX would look like:

iif([Date].CurrentMember.Level.Ordinal = 0, NULL, -- NULL if [Date] dimension is at the "All" level iif([Date].CurrentMember.Level.Name="Day", [Measures].[IsCountedDay], iif([Date].CurrentMember.Level.Name="Month", [Measures].[IsCountedMonth], iif([Date].CurrentMember.Level.Name="Quarter", [Measures].[IsCountedQuarter], [Measures].[IsCountedYear] ))))

TranCount provides Tvol for whatever the current Date constraint is, such as MMYYYY in the earlier example.

The baseline CM is defined in terms of two other CMs: BaselineNumerator / BaselineDenominator.

The BaselineNumerator needs to sum the previously defined TranCount over the one year trailing period relative to the current period. This suggests the MDX functions ParallelPeriod and Sum. A template for BaselineNumerator looks like:

Sum({ParallelPeriod([Date].[Year],1,[Date].CurrentMember): [Date].CurrentMember.PrevMember},[Measures].[TranCount])

Note that what this deceptively simple expression actually does is sum the set of Date members between the member prior to the current Date member and one year back from that prior member. For example, if the current member is Q1 2004, the result sums Q1 2003 : Q4 2003. Also note that you don't need to add any explicit code regarding when a Location went live, because the code adding "actuals." For a given Location, the expression will add either history-to-date (if less than a year) or a trailing year's worth, depending on the value of [Date].CurrentMember. The ParallelPeriod function also automatically does the right thing regardless of the current level in the [Date] hierarchy.

A complete version of BaselineNumerator would require logic to handle hitting the "edge" of the cube (i.e. when there is no PrevMember) to avoid generating an error.

Next, BaselineDenominator must derive the correct divisor for the baseline. This calculation must take into account the business rules regarding Location-based calendars, when the Location went live relative to the Date constraint of the query, and the level within the Date hierarchy of the query. Because this CM is the most complex part of the solution and is very application-specific, I'll sketch only high-level ideas here.

First, you need a way to get the number of business days in the prior year given a Day and a Location. Within the MDX environment, the most practical way to do this is using the MDX LookUp function against another cube (here named BLDays). This other cube and the star schema feeding it would need to be specially designed for this purpose, and would have the following characteristics:

  • Its dimensions are the [Date] and [Location] dimensions of the main cube, which therefore must be shared dimensions in Analysis Services terms.
  • It must be actively maintained so as to contain data for every combination of Date and Location—thus it is not a static object but must be updated as Time goes forward and as new Locations go live.
  • The star schema has one integer fact, BusinessDaysInLastYear. This value is the number of business days in the preceding year as a function of Day and Location, taking into account the calendar for the Location and when the Location went live. BusinessDaysInLastYear is zero for Days prior to the date the Location went live.
  • In effect, the complicated business rules are pre-calculated in this schema. This not only improves query-time performance but is probably the only practical way to keep query complexity manageable.
The lookup code, when Date is at Day level, looks like:

LookupCube("BLDays", "(" + MemberToStr([Date].[ByMonth].CurrentMember) + "," + MemberToStr([Location].CurrentMember)+ ",[Measures].[BusinessDaysLastYear])")

The preceding query gives the required number of business days in the prior year.

Unfortunately it's still not complete. Recall that at higher levels of the [Date] hierarchy, the proper divisor is not the number of business days in the prior year but the number of time periods at the same level in the prior year (taking Location live date into account). The following expression gives an idea on how to achieve this:

LookupCube("BLDays", "Sum({ParallelPeriod([Date].[Year],1," + MemberToStr([Date].CurrentMember) + "):" + MemberToStr([Date].CurrentMember.PrevMember) + "},iif((" + MemberToStr([Location].CurrentMember) + ",[Measures].[BusinessDaysInLastYear])>0,1,0))"))

The (simplified) sub-expression from this is:

iif(([Location].CurrentMember,[Measures]. [BusinessDaysInLastYear])>0,1,0)

When evaluated in the context of the BLDays cube, the preceding expression will yield "1" if [Location].CurrentMember was live within the current Date period, and "0" if not. This will be evaluated for each Date member generated by the ParallelPeriod function (as described for BaselineNumerator). Sum then adds the results. For example, if the current Date level is Month and the current Location were live only during the two months prior to the current month, the result would be "2," the desired divisor. Note that this logic considers a Location live for an entire Date period if it was live in any part of that period.

A complete version of BaselineDenominator must knit together the two lookup expressions above to use the appropriate one based on the current level in the [Date] dimension.

As you can see, calculating realistic baselines can be far more complex than simply taking a simple average. If you neglect to take the appropriate business rules into account, your baseline data won't reflect reality, and thus will also throw off the accuracy of the results.

Mark Frawley is a Technical Business Analyst on the Citigate Hudson Pervasive Business Intelligence team that created an Executive Dashboard for ABN Amro. His focus areas include data modeling, relational and multidimensional database technology, software development, and writing and teaching about the BI industry. Prior to joining Citigate Hudson, Mark worked at a variety of financial institutions and consulting companies on a wide range of applications and technologies, most recently as one of the chief designers and developers of data warehouses and datamarts over a seven year period at Soci�t� Generale.
Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.



Thanks for your registration, follow us on our social networks to keep up-to-date