RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


Enterprise Reporting with Excel : Page 3

Microsoft Office Excel 2007 version lifts several of the limitations in previous versions, making Excel reporting even more powerful.

What to Do When Excel Hangs
As mentioned earlier, running Excel on the server is not ideal. One common problem is that Excel becomes unresponsive. The most likely cause for this is a pop-up window coming up in Excel or a dangling reference as explained in the previous paragraph. Since Excel is running on the server, within a context where there is no user interface, responding to the pop-up is not possible. In 99 percent of the cases a pop-up window means you made a configuration error so it is very important to check the configuration at runtime and log any anomalies.

When Excel hangs, you somehow need to abort it. If your application simply hands over control to Excel, that isn't possible. To get around this, you need to start Excel on a different thread or in a different AppDomain. Listing 1 shows you how to start another thread that does the actual Excel interaction. Basically, you start a secondary thread, and the primary thread waits for the secondary thread to finish. You set the primary thread to wait only for a certain amount of time so that it can take action if Excel hangs. When that happens, you need to do two things: terminate Excel and terminate the secondary thread. Just doing the latter will not help because Excel runs out-of-process and the thread that activated it can't be aborted until it is in control again.

To kill Excel you have to kill the process that's running it, but only the one that was started by your application. Because the application is running on the server under a specific user account, you actually know that Excel processes with that user account as owner are associated with your application. Listing 2 shows how you can terminate those processes using that information. With the System.Management namespace you can get access to the processes running on the machine and query for the processes you're looking for. Once Excel has been terminated, control flows back to the secondary thread, at which point that thread can be aborted.

Excel should be started on a different thread, or in a different AppDomain.
With Excel running on a different thread you need to be aware that exceptions on that thread are not caught by the primary thread. So you need to catch any exception on the secondary thread and store it in a place where the primary thread can get to it. The last part of Listing 1 reads that exception and re-throws it on the primary thread, so upper-level logic can take the appropriate action. Note that I've made the distinction between fatal and non-fatal exceptions. The former terminates processing of the work item and then checks for a new work item. The latter just causes the current generation request and goes to the next one if applicable.

Maximizing Throughput
Because this is a server application you need every scrap of performance you can get. Normally that would just mean firing up multiple threads and processing multiple reports at the same time. Unfortunately, running multiple instances of Excel, or having Excel process multiple reports on one instance, is asking for trouble. And if Excel hangs for one of those reports you'd end up terminating all Excel instances. However, the application is sort of a pipeline because after a file is generated, it is published. Because the publishing step in no way interferes with the generation step, you can publish one file while generating another. This means you can keep the main thread occupied with generating reports while publishing files on another thread. On a multi-processor machine this would likely result in Excel running on one processor, while publishing is done on another, which is extremely efficient.

In 2007 Microsoft Office System, Excel will have a new XML-based file format that doesn't have the 65,535 row limit on sheets.
There are multiple ways to achieve processing in parallel. You could queue the published files in a thread-safe queue and start a second thread that monitors that queue and publishes files that are placed in the queue. Or you could asynchronously start the publishing process, so control is handed back to the main thread immediately.

When all files in a configuration have been generated, the main thread can wait for all publishing threads to return, as shown in Listing 3. Generation and publishing is done within the first for loop, which is only somewhat more complex than publishing synchronously. Basically you call the Publish method asynchronously through a delegate with the following signature:

   delegate void AsyncPublisher(
       Request request, string fileToPublish);
The IAsyncResult object returned from BeginInvoke is then stored so the result of the method can be retrieved after all files have been generated. At that point an array of WaitHandle objects is created to instruct the main thread to wait until all publishing threads have done their job, and then it will retrieve those results. Note that I have used the AsyncResult object from the System.Runtime.Remoting.Messaging namespace to get the delegate used to make the initial call, and that the EndInvoke method is performed inside a try-catch block just in case there was an unhandled exception in the asynchronous call.

The advantage of this approach over the queue method is that it is clear where the control is, and you can easily handle exceptions that weren't caught in the publishing process. From the developer's perspective, this approach is only mildly more complicated than synchronous execution. With that said, the queue version is slightly more efficient because the main thread could continue to the next report configuration immediately.

Looking Forward
Automating Microsoft Office as discussed here is far from ideal, but it is your best option in some cases. The new 2007 Microsoft Office System offers some interesting alternatives. For example, it features new file formats that consists of ZIP-compressed XML files. With the fabulous XML support in .NET and several free and commercial ZIP components available, you no longer need to run Excel on the server. Instead, you can manipulate the XML documents in the compressed file, so you can fill a sheet or pivot cache by extracting data and put it in the appropriate XML document inside the compressed Excel file format. This is completely thread safe, so generating multiple reports in parallel wouldn't be such a big deal In addition, Excel 2007 raises the worksheet limit from 65,535 rows to over one million rows. The new file format probably opens the door for SQL Reporting Services extensions that can generate Excel files with PivotTables and such. In addition, Microsoft Office SharePoint Server 2007 ships with Excel services, which aims to solve the same reporting need discussed in this article. But these technologies aren't very useful until they are widespread, which is a problem in large organizations with a slow adoption rate.

Michiel van Otegem is Lead Software Architect for The Vision Web, a Microsoft Gold Partner company in the Netherlands. He specializes in .NET development with Windows Server System and XML and has written Sams Teach Yourself XSLT in 21 Days.Michiel has been active in the community building mailing lists and forums, as a speaker and author, and founding the Dutch .NET User Group dotNED in 2002. Michiel also co-founded and runs the premier Dutch ASP/ASP.NET Web site, ASPNL.com. For his community efforts, Microsoft has awarded Michiel the Microsoft MVP Award for the past four years.
Email AuthorEmail Author
Close Icon
Thanks for your registration, follow us on our social networks to keep up-to-date