As programming languages have evolved over the years, the challenges in optimizing and fine-tuning applications have changed. I began programming 20 years ago and quickly adopted assembler language for its fast performance and small footprint. However, coding in assembler meant that I had to do everything myself. If I wanted to draw to the screen, I wrote a line-drawing routine. If I wanted to use floating-point arithmetic, I wrote the floating-point routines. Magazine articles and advice from other developers helped, but the onus was on me to write every byte of the code.
Although this level of responsibility was demanding, having the code under my complete control had its advantages. If it was slow, I could pinpoint the performance glitch and fix it myself. Nowadays, a developer wouldn’t dream of designing his own fonts or writing his own printer drivers. All this commonly used functionality has been wrapped up in black boxes and given away or sold as components. The challenge no longer lies in low-level programming; it lies in integrating pre-packaged bits of code, largely written by other people, and making sure they perform at their best. The performance of your application will depend largely on third-party components whose implementation is hidden and over which you have no control.
Take, for example, the Microsoft .NET class libraries, an object-oriented layer that Redmond has spent a lot of time developing. This layer sits on top of the operating system to provide .NET functionality, which reminds me of the expression “putting lipstick on a pig.” Microsoft has applied the lipstick in thick layers, but the OS underneath is still piggish. If you forget that parts of that OS were designed over a decade ago and are in no way object-oriented, you can run into problems when you try to optimize your .NET applications.
Granted, all code ultimately runs as non-object-oriented machine code, but Microsoft could have implemented the performance counters that .NET exposes (features that measure CPU utilization, memory usage, disk I/O, and other aspects of system activity) in an object-oriented fashion at the OS level. Developers who see only the .NET wrapper might even assume Redmond did just that. In reality, layers such as .NET put a veneer of object-oriented classes over an underlying system that is not object-oriented, so there can be a big gap between how a developer assumes a call works and how it actually does.
A good way of avoiding these pitfalls is to keep an eye on what your code is actually doing and how long it takes to do it. A number of tools can help. The ones I used for this article are ANTS Profiler from Red Gate Software, the Anakrino .NET decompiler, and Microsoft’s .NET Allocation Profiler.
|Editor’s Note: Guest commentator Neil Davidson is the technical director at Red Gate Software, a vendor of code profiling software tools.|
Is Your App a Memory Hog?
The Windows NT, 2000, and XP operating systems provide performance counters. For some unknown reason, however, Microsoft implemented these performance counters in a bizarre fashion at the underlying operating-system level.
If you accessed the counters natively, without helper libraries, you would need to call the RegQueryValueEx Windows API with the special HKEY_PERFORMANCE_DATA key, interrogate the variable-length data it returns (stored as memory blocks nested several levels deep), and perform lookups on a special “Counter 009” registry entry.
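For a sense of what the name-lookup step involves, here is a Python sketch. The performance data blocks identify objects and counters by numeric index, and the “Counter 009” data maps those indices to English names as a multi-string of alternating index and name entries. The function below is a hypothetical helper that works on an already-decoded string rather than calling the registry, and the sample entries are illustrative, not a real dump.

```python
def parse_counter_names(multi_sz: str) -> dict[int, str]:
    """Turn alternating index/name entries into an {index: name} table."""
    # A multi-string is NUL-separated and ends with an extra NUL; drop empty pieces.
    parts = [part for part in multi_sz.split("\0") if part]
    if len(parts) % 2:
        raise ValueError("expected index/name pairs")
    # Even positions hold numeric indices, odd positions the matching names.
    return {int(index): name for index, name in zip(parts[0::2], parts[1::2])}


# Illustrative entries only; a real table holds thousands of them.
sample = "\0".join(["2", "System", "4", "Memory", "6", "% Processor Time"]) + "\0\0"
names = parse_counter_names(sample)
print(names[6])  # % Processor Time
```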
As if this procedure weren’t odd enough, RegQueryValueEx doesn’t even query a registry key in this case; it loads performance DLLs and calls their entry points to retrieve the data. And once you have the raw data, you still need to convert the values from raw numbers into percentages or time differences. All told, you’ll need to implement about 20 common calculations.
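To give a flavor of those calculations, the Python sketch below implements two of them, following Microsoft’s documented formulas for the PERF_COUNTER_COUNTER (events per second) and PERF_100NSEC_TIMER_INV (inverted percentage, used by % Processor Time) counter types. The function names and sample numbers are hypothetical; a real implementation would first dig the raw values out of the nested memory blocks.

```python
def rate_per_second(n0, n1, ticks0, ticks1, ticks_per_second):
    """PERF_COUNTER_COUNTER style: (N1 - N0) / ((D1 - D0) / F)."""
    elapsed_seconds = (ticks1 - ticks0) / ticks_per_second
    if elapsed_seconds <= 0:
        return 0.0
    return (n1 - n0) / elapsed_seconds


def percent_busy_from_idle(idle0, idle1, time0, time1):
    """PERF_100NSEC_TIMER_INV style: 100 * (1 - (N1 - N0) / (D1 - D0)).

    idle0/idle1 are accumulated idle time and time0/time1 are timestamps,
    all in 100-nanosecond units.
    """
    elapsed = time1 - time0
    if elapsed <= 0:
        return 0.0
    idle_fraction = (idle1 - idle0) / elapsed
    # Clamp, since skew between raw values can push the result out of range.
    return max(0.0, min(100.0, 100.0 * (1.0 - idle_fraction)))


# 500 events over 2 seconds of a 10 MHz timer -> 250 events per second.
print(rate_per_second(1000, 1500, 0, 20_000_000, 10_000_000))  # 250.0
# Idle for 0.25 s out of 1 s (in 100 ns units) -> 75% busy.
print(percent_busy_from_idle(0, 2_500_000, 0, 10_000_000))  # 75.0
```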
All in all, this is a very time-consuming piece of programming to do yourself. Fortunately, with .NET all you need to do is instantiate a PerformanceCounter object and interrogate it. The code sample in Listing 1 took five minutes to write and replaces code that took a week to write before .NET. Visual Studio .NET auto-generated most of it; I had to type only six lines myself. The snippet reads the value of a particular performance counter five times and displays the results.
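Much of what the wrapper saves you is bookkeeping. The following Python sketch is not the .NET implementation; the RateCounter class and its next_value method are hypothetical. It keeps the previous raw sample so that each call can return a calculated rate. As with PerformanceCounter.NextValue() for counters whose value depends on two reads, the first call returns 0.0.

```python
from typing import Callable, Optional, Tuple

# (raw cumulative count, timestamp in seconds)
RawSample = Tuple[int, float]


class RateCounter:
    """Hypothetical wrapper that hides raw-sample bookkeeping behind one call."""

    def __init__(self, read_raw: Callable[[], RawSample]):
        self._read_raw = read_raw
        self._previous: Optional[RawSample] = None

    def next_value(self) -> float:
        """Return events per second since the previous call (0.0 on the first call)."""
        current = self._read_raw()
        previous, self._previous = self._previous, current
        if previous is None:
            # A rate needs two samples, so the first read has nothing to compare with.
            return 0.0
        (n0, t0), (n1, t1) = previous, current
        elapsed = t1 - t0
        return (n1 - n0) / elapsed if elapsed > 0 else 0.0


# Canned raw samples stand in for a real data source.
samples = iter([(0, 0.0), (100, 1.0), (300, 2.0), (300, 3.0), (700, 4.0)])
counter = RateCounter(lambda: next(samples))
values = [counter.next_value() for _ in range(5)]  # read the counter five times
print(values)  # [0.0, 100.0, 200.0, 0.0, 400.0]
```

In real use, read_raw would be a function returning a live cumulative count together with a clock reading such as time.monotonic().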
Still, no matter how clean the code in Listing 1 is, and no matter how well Microsoft has hidden the underlying Windows API, that API still gets called under the covers at run time. The .NET garbage collector is highly efficient, and allocating and reclaiming memory is well optimized, but the six lines I added, which instantiate a PerformanceCounter object and call NextValue() five times, allocate more than 17,000 objects on the heap and claim 1.1MB of memory. That might not be significant for your particular application, but it is certainly worth your attention.
The graph in Figure 1 shows the number of allocations for each object type, while the graph in Figure 2 shows memory allocated for each object.
|Figure 1: Number of Allocations for Each Object Type|
|Figure 2: Memory Allocated for Each Object|
In short, those six lines of code allocate 8,000 strings that take up a total of 500KB of memory. That might or might not represent a performance hit, but I’ve found that it certainly surprises most developers.
With a memory profiler such as Microsoft’s .NET Allocation Profiler, you can determine which objects are allocated on the heap, what creates them, and how long they stay there before being garbage collected. You might think you don’t need to worry about memory allocation in .NET, but performance hits like those described in this section should persuade you otherwise.
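Python ships a much simpler tool of the same kind in its standard library. The sketch below uses tracemalloc to record the allocations made by a stand-in workload (build_labels is a hypothetical function that creates 8,000 small strings, echoing the numbers above) and groups them by the source line that made them. It is an analogue of what a .NET allocation profiler reports, not a .NET tool.

```python
import tracemalloc


def build_labels(count):
    # Stand-in workload: allocates one small string per item.
    return [f"sample-{i}" for i in range(count)]


tracemalloc.start()
labels = build_labels(8000)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group live allocations by the source line that created them, largest first.
top = snapshot.statistics("lineno")
for stat in top[:3]:
    # Each line shows the source location, total size, and number of blocks.
    print(stat)
```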
Quick and Dirty Is Better Than Slow and Proper
Binary serialization in .NET is another area where I found unexpected performance results. I generally prefer custom serialization because it gives you finer control over what goes into the serialization stream and how it is versioned. The issue I raise in this section is a simplified version of a real-life case I encountered during .NET development.
Say you use a class to store data points for a graph and you want to store those points to disk. The “proper” object-oriented, .NET-friendly way of doing this is to mark the class with the [Serializable] attribute and implement the ISerializable interface. You can then create a BinaryFormatter object and simply stream the object’s state to disk. A messier, “dirty” alternative would be to open a file for writing, loop through the data points, and write them to disk.
You might think the first option is better. Doing it “properly” might carry a performance hit, but it can’t be that much, right? Wrong.
The code sample in Listing 2 shows the two approaches. After some common setup code, it presents two functions: RunUsingISerializable (proper) and RunUsingDirectStreams (dirty). When you run the sample, you might be surprised to find that the proper way is 50 times slower than the dirty way: it takes about 10 seconds to save the array to disk and load it back, whereas the dirty way takes 0.2 seconds. That gap can be the difference between a frustrated user and a happy one.
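The same trade-off exists outside .NET. As a hedged illustration in Python, not a port of Listing 2, the sketch below round-trips 100,000 points of a hypothetical Point class through pickle (the general-purpose, object-graph approach) and through struct (writing raw doubles directly). It uses an in-memory buffer instead of a disk file so it has no side effects. Python’s timings will not necessarily follow the .NET ratio, because pickle is implemented in C while the struct loop runs in the interpreter; the point is to measure both rather than assume.

```python
import io
import pickle
import struct
import time


class Point:
    """One data point on a graph (hypothetical stand-in for the article's class)."""

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y


def save_general(points, stream):
    # "Proper": hand the whole object graph to a general-purpose serializer.
    pickle.dump(points, stream)


def load_general(stream):
    return pickle.load(stream)


def save_direct(points, stream):
    # "Dirty": write a count, then each point's coordinates as raw doubles.
    stream.write(struct.pack("<i", len(points)))
    for p in points:
        stream.write(struct.pack("<dd", p.x, p.y))


def load_direct(stream):
    (count,) = struct.unpack("<i", stream.read(4))
    points = []
    for _ in range(count):
        x, y = struct.unpack("<dd", stream.read(16))
        points.append(Point(x, y))
    return points


def time_round_trip(save, load, points):
    """Save and reload through an in-memory buffer; return (seconds, restored points)."""
    buffer = io.BytesIO()
    start = time.perf_counter()
    save(points, buffer)
    buffer.seek(0)
    restored = load(buffer)
    return time.perf_counter() - start, restored


points = [Point(i * 0.5, i * 1.5) for i in range(100_000)]
for name, save, load in (("general (pickle)", save_general, load_general),
                         ("direct (struct)", save_direct, load_direct)):
    elapsed, restored = time_round_trip(save, load, points)
    print(f"{name}: {elapsed:.3f}s for {len(restored)} points")
```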
As the Listing 2 sample runs, .NET calls the function System.Runtime.Serialization.Formatters.Binary.__BinaryParser.get_prs(). I don’t know what that function does, but isn’t it interesting that serializing 100,000 items to disk involves 3.1 million calls?
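That works out to about 31 calls per item (3,100,000 / 100,000). Call counts like this are exactly what a profiler reports. As a small Python analogue, the sketch below uses cProfile to count how often a hypothetical per-item helper runs while serializing 100,000 values.

```python
import cProfile
import pstats


def encode_item(value):
    # Hypothetical per-item helper, called once per element by serialize_all.
    return str(value).encode("ascii")


def serialize_all(items):
    return b",".join(encode_item(v) for v in items)


profiler = cProfile.Profile()
profiler.enable()
serialize_all(range(100_000))
profiler.disable()

# Stats.stats maps (filename, line, function name) to
# (primitive calls, total calls, own time, cumulative time, callers).
stats = pstats.Stats(profiler)
calls = {func: entry[1] for (_, _, func), entry in stats.stats.items()}
print(calls["encode_item"])  # 100000
```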
I encountered this issue while loading a file from disk and displaying it to the end user as HTML. The process involved deserializing a series of classes from disk, creating an XML file, doing an XSLT transform to create the HTML output, creating and saving about 100 images to disk, and then showing the HTML file in a browser. The overall process was slow for the end user, but it wasn’t at all obvious to me that the bottlenecks were loading from disk and the XSLT transform (which, incidentally, also seems to have a very poor implementation in .NET). Without a code profiler to identify the specific bottlenecks, I would have wasted a lot of time optimizing the creation and saving of images, which actually wasn’t slow.
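A full profiler attributes time to functions automatically, but even crude per-stage timing points in the right direction. The Python sketch below wraps each stage of a pipeline like the one just described in a timer and reports the slowest stages first. The stage functions are hypothetical, and their sleeps are placeholders with invented durations chosen to mirror the story, not measurements.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage_name):
    """Record the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage_name] = time.perf_counter() - start


# Hypothetical stages; the sleeps stand in for real work.
def load_from_disk():
    time.sleep(0.12)


def build_xml():
    time.sleep(0.01)


def xslt_transform():
    time.sleep(0.06)


def save_images():
    time.sleep(0.02)


for stage in (load_from_disk, build_xml, xslt_transform, save_images):
    with timed(stage.__name__):
        stage()

# Report the slowest stages first: that is where optimization should start.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<15} {seconds:.3f}s")
```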
Keep Your Eye on the Code
By learning to profile your code, you can identify slow areas and remove bottlenecks. Code profiling tools (see Figure 3 for an example) generally can show you exactly how your application is behaving and can pinpoint where to concentrate your optimization efforts.
|Figure 3: ANTS Profiler Screenshot|
It’s great that Microsoft and other companies provide pre-packaged, easy-to-use, object-oriented components that implement and hide complex bits of functionality. But because these routines are black boxes, understanding what goes on inside them is very hard. That makes predicting how they will behave nearly impossible, and you can very easily end up with slow code.