Browse DevX
Sign up for e-mail newsletters from DevX


Enterprise Reporting with Excel : Page 2

Microsoft Office Excel 2007 version lifts several of the limitations in previous versions, making Excel reporting even more powerful.




Building the Right Environment to Support AI, Machine Learning and Deep Learning

The Server-side Solution
Server-side Excel automation combines the power of VSTO with the more enterprise-oriented approach of SQL Reporting Services. You can build templates, use PivotTables, and use a centralized creation and distribution system. Server-side Excel Automation also doesn't require the use of Visual Studio like VSTO does, so business intelligence users can create templates without development efforts, providing you give them enough flexibility to configure where the data needs to come from and where it needs to go. The fact that you are providing the plumbing and need to provide that flexibility is the hardest part of a custom solution like this. How far you need to take that depends on the needs of your organization, and as such is beyond the scope of this article, which is more about the technical issues for this solution.

Doing server-side Excel Automation combines the power of VSTO with the more enterprise-oriented approach of SQL Reporting Services.
There are multiple ways to do server-side Automation with Excel. Most don't actually involve Excel itself but an alternative access method to an Excel file. There are several commercial components offering such services, and you can also use OLE DB to access sheet data. These methods are all thread safe, so using them in a server application is no problem. The main drawback is that none of these methods allow you to manipulate the pivot cache. As long as you stay within the 65,535 row limit, you can base a pivot on sheet data, but once you pass that mark you have no option but to install Excel on the server and automate it there. Running Excel on the server is something you shouldn't turn to lightly, as you can read in the sidebar, Running Excel on a Server. However, in some scenarios there really isn't much choice. In a real-world application I worked on, the following requirements pushed us towards this solution:

  • Reports often require PivotTables
  • Analysis should be possible offline
  • Efficient use of network bandwidth (many users in many countries, with sometimes limited bandwidth)
  • Many different reports required, including reports for a specific country or specific person
Solution Architecture
With the requirements outlined above, my team went to the drawing board (literally) and came up with an architecture that consists of two parts: data acquisition and report generation. Data acquisition meant getting the required data from the different source systems in different parts of the world and processing them to a single table format that works well with OLAP and Excel PivotTables. Depending on the data (and the location of that data) for a specific report, this could take anywhere between a few minutes and several hours. When the processing is complete, a row is added to a table with reports to be generated. After the application has added the new records, the report generation application that monitors that table starts generating reports. Figure 1 shows the basic architecture of this application implemented as a Windows Service. The architecture very much resembles the generation process, which consists of the following steps:

Figure 1: Basic architecture of Excel reporting solution.
  1. Read a new work item from the queue table.
  2. Read the configuration for the work item.
  3. Create a generation request.
  4. Generate report for request.
  5. Publish generated report.
  6. If the configuration consists of multiple reports of the same type (for instance, one for each country), go to step 4 until all requests are done.
You can see the pluggable architecture in Figure 1. You can extend both the generation and publishing module with additional generator and publisher plugins. These plugins have to implement a specific interface. To use them you just have to change the configuration for a report. Such a configuration consists of the Excel template, generator(s), and publisher(s) to use for the report, as well as what data to load into the report, and where to put it (a sheet, a cell, or in the pivot cache).

The application can parameterize the data it needs to load so that one configuration can result in multiple reports with data filtered according to the parameter. The same also applies to, for instance, the file name of the generated report and the publish location (e.g. folder), based on the same parameter. That is what a generation request is all about. It contains the actual queries, file name, publishing target, and so on, to create and publish one report file.

When the generator module is called, it checks the request and loads the needed generator and passes the request on. The loaded generator then reads its configuration and processes the request, resulting in a file. That file is passed to the publisher module along with the configuration, which then loads the needed publisher or publishers if there are multiple publish targets. If needed, the publisher module compresses the file, which is a big bonus over the SQL Reporting Services or VSTO approach.

Loading the Data
The focus of this article is not how to use the Excel Automation API, but about how to use Excel on the server without getting into big trouble. However, you need to know some things about how the application loads the data and its impact. The target for the data determines how the application loads the data. If it is just a cell, the cell value is set directly as shown below:

public void SetCellValue(Worksheet sheet, string cellName, string cellValue) { Range cell = null; try { cell = sheet.get_Range(cellName, Missing.Value); cell.set_Value(XlRangeValueDataType.xlRangeValueDefault, cellValue); } catch (COMException exception) { // Handle exception } finally { ComInterop.ReleaseComObject(cell); } }

The same would work for a sheet, but for a sheet another option is to use a query table, which is a sheet that loads data from a query. In this case, you set the connection string and the query, and then turn over control to Excel and have it load/refresh the data. With a PivotTable you have two options: either point a PivotTable to data in a sheet that you can load with the methods mentioned above or set up the underlying pivot cache. Setting up the pivot cache is similar to setting up a query table, so again you have to turn over control to Excel. The following code example shows you how this works in a nutshell. Both methods shown here assume that you imported the Excel namespace (Microsoft.Office.Interop.Excel). The following code acquires a reference to the pivot cache and sets the properties. Then the workbook, the actual Excel file, is refreshed. Note that all of this, as well as the operations in the SetCellValue method shown earlier, are done in a try-catch block that catches a COMException. Because you're doing interop, this is the only exception you'll get.

public void SetPivotTable(Workbook workbook, Worksheet sheet, PivotConfig config) { PivotTables pivots = null; PivotTable pivot = null; PivotCache cache = null; try { // Get PivotTable pivots = (PivotTables)sheet.PivotTables(Missing.Value); pivot = pivots.Item(config.Name); cache = pivot.PivotCache(); // Set connection and command cache.Connection = config.ConnectionString; cache.CommandText = config.CommandText; // Load the data workbook.RefreshAll(); } catch (COMException exception) { // Handle exception } finally { ComInterop.ReleaseComObject(cache); ComInterop.ReleaseComObject(pivot); ComInterop.ReleaseComObject(pivots); } }

Note that all the objects created in the method are carefully released at the end using the code shown below:

public void ReleaseComObject(object o) { if (o == null) return; try { int referenceCount; do { // COM uses reference count to determine if an object // can be disposed of. referenceCount = System.Runtime.InteropServices. Marshal.ReleaseComObject(o); } while (referenceCount > 0); } catch { //Dummy, releasing COM object should not cause failure. } finally { o = null; } }

If you don't release the COM objects, you run the risk of having a dangling reference because COM uses reference counting to know when it can release an object. As long as the count is greater than zero, the object remains active. When doing Excel interop, all objects count against the Excel instance that was started, and that instance will stay active if you try to shut it down when the reference count is higher than zero.

Thanks for your registration, follow us on our social networks to keep up-to-date