Browse DevX
Sign up for e-mail newsletters from DevX


An Introduction to OLAP in SQL Server 2005 : Page 2

Get a preview of the upcoming SQL Server 2005's Business Intelligence suite and find out about the major OLAP components of Analysis Services.




Building the Right Environment to Support AI, Machine Learning and Deep Learning

Elements of Multidimensional Models
To fully leverage the SQL Server 2005 BI Workbench platform, one must first understand the basic elements of multidimensional modeling. The basic elements of a multidimensional cube are: measures, dimensions, and schema. Measures
Measures are the key performance indicators that you want to evaluate. To determine which of the numbers in the data might be measures, a rule of thumb is: If a number makes sense when it is aggregated, then it is a measure. For example, it makes sense to aggregate daily volume to month, quarter and year. On the other hand, aggregating zip codes or telephone numbers would not make sense; therefore, zip codes and telephone numbers are not measures. Typical measures include volume, sales, and cost.

Dimensions are the categories of data analysis. The rule of thumb is: When a report is requested "by" something, that something is usually a dimension. For example, in a revenue report by month by sales region, the two dimensions needed are time and sales region. For this reason, OLAP analysts often nickname dimensions the "bys." Typical dimensions include product, time, and region. Dimensions are arranged in hierarchical levels, with unique positions within each level. For example, a time dimension may have four levels, such as Year, Quarter, Month, and Day. Or the dimension might have only three levels, for example, Year, Week, and Day. The values within the levels are called members. For example, the years 2002 and 2003 are members of the level Year in the Time dimension.

Figure 2. A Tree-Structured Multidimensional schema: The figure shows Excel Pivot Table interface consisting of tree view for a multidimensional structure.
We believe as a best practice, a cube should have no more than twelve dimensions. A cube with more than twelve dimensions becomes difficult to understand and browse. Too many dimensions can cause confusion among end users and having too many dimensions and aggregations can also lead to "data explosion." As the number of dimensions and levels increase, the amount of data grows exponentially. As mentioned earlier, an OLAP application is typically used to manipulate large volumes of data. To optimize response time, Analysis Services usually pre-aggregate a multidimensional schema. A dimension can be thought of as a tree structure. Many OLAP tools present it in a tree control (see Figure 2). This familiar software control makes using dimensions easier as it allows dimension members and their relationships to be viewed simultaneously. This simple interface makes using the dimensions extremely user-friendly and allows user to view data of different levels simultaneously.

The dimensions and measures are physically represented by a star schema. The most basic star schema arranges the dimension tables around a central fact table that contains the measures (see Figure 3).

Figure 3. Simple Star Schema: The figure shows a basic star schema with the dimension tables arranged around a central fact table that contains the measures.
A fact table contains a column for each measure as well as a column for each dimension. Each dimension column has a foreign-key relationship to the related dimension table, and the dimension columns taken together are the key to the fact table.

After determining the measures, dimensions, and schema using the BI Workbench, there is one more step—you must decide where the data aggregation is to be stored. Historically, there were three basic storage options: Multidimensional OLAP (MOLAP), Relational OLAP (ROLAP), or Hybrid OLAP (HOLAP). SQL Server 2005's introduction of what Microsoft calls the Unified Dimensional Model, which leverages the best of relational and OLAP cube technologies, allows the designer many more storage options and unlike SQL Server 2000, allows combining them in the same solution. DTS
Microsoft's Data Transformation Services (DTS) is perhaps the most critical tool in an OLAP project. DTS is used to pull data from various sources into the star schema. The data warehouse will, in turn, feed the Analysis Services database. More often that not, you must transform data from the source (for example, you may have to convert currency values, balance calculations, and the like) and remap it. Microsoft has estimated that in most cases, organizations spend eighty percent of their data warehousing on the extract, transform, and load (ETL) phase.

Visual Studio 2005 hosts a new tool, BI Workbench, which is a replacement for DTS Designer. Chief among the improvements found in BI Workbench is its separation of control flow (insertions, looping, sequencing, scripting, etc.) from the data flow (source identification, aggregation, character mapping, and data conversion) tasks (see Figure 4 and Figure 5). This separation makes DTS packages easier to read, develop, and maintain. BI Workbench is reason enough to learn and use Visual Studio 2005.

Figure 4. Control Flow Task Diagram: The figure shows a typical control flow and an associated task statement.
Figure 5. Data Flow Diagram: The figure shows a typical data flow from source to destination.
Because DTS has been completely reworked in SQL Server 2005, current SQL 2000 DTS user will need to brush up on DTS—and learn a few new tricks.

Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.



Thanks for your registration, follow us on our social networks to keep up-to-date