Effective Windows DNA Applications: Maximizing Performance with Object Pooling

Windows Distributed interNet Application (DNA) is an architecture for building n-tiered applications for the Microsoft Windows platform. DNA is based on the Microsoft Component Object Model (COM), which provides for object-level reuse and allows for a high degree of language neutrality. Developers can generally choose the language in which they are most comfortable for building components, though you should be aware that not all languages are created equal. We will discuss the performance implications of development languages later in this article.

Microsoft continues to enhance the features provided through COM, with a goal of eliminating much of the complexity involved in building scalable server-based applications. Windows NT 4.0 and MTS showed a lot of promise toward delivering on the goal of simplifying the development of systems utilizing distributed components and databases. Unfortunately, neither the infrastructure of the Windows NT OS nor the reference material made available about developing for MTS was sufficient to realize the vision.

While MTS on Windows NT was a step in the right direction, developers were like toddlers learning to walk as they implemented solutions based on this new vision. Even several years after the introduction of MTS (the dinosaur age in Web years) there is still an amazing amount of misinformation being published about MTS, the related infrastructure, interaction with RDBMSs, and effective component design.

With Windows 2000 Microsoft has taken another step toward the goal of simplifying the development of highly scalable server-based applications. The functionality in COM+ Services now offers multiple avenues to resolve performance and scalability issues without having to write your own infrastructure. The ability to manage application resources by object pooling is one of the new features of COM+ Services, and is the primary focus of this article.

Be aware that each of the features of COM+ can be used to reduce the perceived complexity of developing server applications. Unfortunately, the complexity of implementing real-world applications based on some of the new features provided by COM+ Services has been grossly understated, and that understatement will result in a repetition of the MTS cycle.

While in an ideal world you can use semi-skilled labor to build robust, scalable applications, the real world is a little more demanding. To design and build effective Windows DNA applications, developers must have a thorough understanding of the resources involved in a distributed application.

As we transition to an n-tier framework based on Windows DNA, server performance becomes an issue that many desktop application developers are unprepared to address. Coding techniques that were effective for 2-tier designs are unsuitable in an n-tier application. Many developers currently writing n-tier code targeted for the server have yet to understand the impact of object context and distributed transactions, even after several projects. There are major differences between the coding practices that prove effective for desktop vs. server environments. Failing to anticipate the impact of MTS or COM+ applications on network, database, or application server resources will result in a solution that performs poorly and will not scale.

The purpose of this article is to provide developers with an understanding of the performance characteristics of middle-tier transactional components in a Windows DNA application. This article will not attempt to describe the basic concepts of Windows DNA, but rather will focus on the issues encountered in real-world deployment that lead us to conduct our performance analysis. We will also cover the tests conducted and our observations regarding the results.

Examining the Misconceptions About MTS
As some of you may note, I am a frequent contributor to the MTS and COM+ newsgroups. I never cease to be amazed at the questions that appear on a daily basis as a result of the confusing information available for MTS. To quote Don Box: “No technology since the dawn of COM has been more misunderstood than MTS.” If I could convey everything that I have come to understand about MTS in this article, I would. However, it would take several hundred pages and numerous code examples to adequately convey the understanding that I have arrived at.

With that said, let’s move forward with the focus of this article.

Why Discuss the Misconceptions Surrounding MTS?
The experiences we have had with implementing real-world, practical applications based on MTS have exposed shortcomings in the documentation, and also in code samples provided by Microsoft and appearing in a number of third-party books. While the available material demonstrates the fundamental coding techniques necessary to get started, the in-depth information needed to build scalable systems with MTS is not readily available. This is not to say that all of the published material is incorrect, but it takes experience to separate the wheat from the chaff, and developers new to this framework simply do not have that experience.

MTS: Separating Fact from Fiction
Contrary to what the marketing literature would have us believe, MTS is not a cure-all. There are advantages to using MTS in some scenarios, as well as disadvantages. If you approach the development of a new project with an objective view based on an understanding of the technologies involved, then you are more likely to have a successful experience. Failing to understand the issues related to MTS (many of which are shared by COM+) that can have an adverse impact on performance and scalability can lead to frustration, and possibly failure of the project. Some of the most commonly misunderstood facets of developing with MTS are listed below:

MTS Applications Are Inherently More Scalable
Most developers working on n-tier applications are well-versed in coding transactions, either local (managed via a connection object) or via stored procedures. These transactions are normally performed with a transaction isolation level of ‘Read Committed’, which incurs minimal overhead due to contention for database resources. MTS works in conjunction with the Microsoft Distributed Transaction Coordinator (DTC) to manage transactions. In this environment, database connections are automatically enlisted in distributed transactions, which have an isolation level of ‘Serializable’. As any skilled developer can attest, serialization is BAD. A more common term is ‘bottleneck’, but no matter how you phrase it the end result is the same: a limit on system throughput at some level of usage.

MTS Will Cut Up to 40% Off of Development Time by Eliminating the Need to Code Transactions
The automatic enlistment of connections within distributed transactions is a powerful feature. It allows coordination of transaction boundaries across multiple databases, as well as across multiple components; these are powerful arguments for implementing the DNA framework. However, with distributed transactions comes the aforementioned issue of serialization of data access. Many developers have little or no experience with the ‘Serializable’ transaction isolation level. If the database schema has not been designed with this in mind, then you will run into performance issues much sooner than expected. The time saved on initial development can be a drop in the bucket when compared to the effort required to salvage a code base built on a schema that was not designed for this environment.

Object Pooling with MTS
No version of MTS running on Windows NT4 has ever implemented object pooling. Period. I realize that this statement contradicts publications by a number of noted “authorities” on MTS, as well as misinformation propagated in the newsgroups, but it is an indisputable fact.

JIT activation / ASAP Deactivation
I recently read an explanation of JIT activation / ASAP deactivation that was so far removed from the actual implementation that I felt a need to address the topic. The root of the confusion stems from Microsoft documents that list object pooling as a feature of MTS. As I indicated above, this functionality was never implemented. However, many developers still lack a clear understanding of what JITA actually does under the hood. I will use a simple object instantiation from a client to outline the flow of events. The client instantiates an object running within MTS, which activates the proxy/stub, not the object itself. When the client calls a method of the object, the Object Context is established, followed by the actual method invocation. If either SetComplete or SetAbort is called, the object is torn down when the public method goes out of scope, resulting in “ASAP deactivation”. Objects that are deactivated in this fashion are referred to as “stateless”, as all local data members are lost when the object goes out of scope; the proxy/stub remains, so the client “thinks” it is still connected. Failing to call SetComplete or SetAbort leaves the object “stateful”: local storage is preserved because the actual object (not just the proxy/stub) remains in memory.
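The lifecycle above can be approximated in plain C++. This is purely a conceptual sketch: the names Proxy and RealObject are illustrative stand-ins for the MTS proxy/stub and the hosted component, not the actual MTS plumbing.

```cpp
#include <memory>

// Stand-in for the MTS-hosted component. Its local state is
// lost every time it is deactivated.
struct RealObject {
    int localState = 0;
    int DoWork() {
        ++localState;        // method body runs...
        return localState;   // ...then SetComplete triggers teardown
    }
};

// Stand-in for the client-side proxy/stub, which survives
// deactivation so the client "thinks" it is still connected.
class Proxy {
    std::unique_ptr<RealObject> obj_;  // empty between calls = deactivated
public:
    int DoWork() {
        obj_ = std::make_unique<RealObject>();  // JIT activation on the call
        int result = obj_->DoWork();
        obj_.reset();                           // ASAP deactivation
        return result;
    }
};
```

Because the object is recreated on every call, localState never accumulates across calls; that is precisely what "stateless" means in this context.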

Database Connection Pooling
This is also a feature of ODBC (since version 3.0) and the Microsoft Data Access Components (MDAC); components can utilize these services with or without MTS. The connection pooling feature is extremely beneficial in an MTS environment that has a fairly consistent rate of usage. However, in systems that experience intermittent usage there are frequently insufficient connections in the pool to satisfy the immediate demand. These connections must be opened against the database (which is an expensive activity) all at once, only for the majority to be discarded in short order. The connection pooling algorithm is very simplistic, and frequently results in the very connection thrashing it was designed to prevent. (As you shall see later, object pooling under COM+ can eliminate the inefficiencies associated with connection pooling.)
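The thrashing pattern described above can be sketched with a naive grow-on-demand, reap-on-idle pool. The Pool class and its methods here are hypothetical assumptions for illustration, not the actual MDAC implementation:

```cpp
#include <vector>

struct Connection { bool inUse = false; };

// Naive pool: grows when a burst arrives, discards every idle
// connection when the idle timeout fires.
class Pool {
    std::vector<Connection*> conns_;
public:
    int opened = 0;  // count of expensive database logins performed
    Connection* acquire() {
        for (auto* c : conns_)
            if (!c->inUse) { c->inUse = true; return c; }
        // No idle connection available: open a new one (expensive).
        auto* c = new Connection;
        c->inUse = true;
        conns_.push_back(c);
        ++opened;
        return c;
    }
    void release(Connection* c) { c->inUse = false; }
    // Simulates the idle timeout: every unused connection is torn down.
    void reapIdle() {
        std::vector<Connection*> keep;
        for (auto* c : conns_) {
            if (c->inUse) keep.push_back(c);
            else delete c;
        }
        conns_.swap(keep);
    }
    std::size_t size() const { return conns_.size(); }
    ~Pool() { for (auto* c : conns_) delete c; }
};
```

Under intermittent load, every burst pays the full cost of reopening connections that were just discarded, which is the thrashing the pool was meant to prevent.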

MTS Manages Connection Cleanup
If your goal is to be the skipper of the Titanic, then don’t bother to properly close and release the database connections. MTS doesn’t eliminate the need for good programming practices; if anything, it heightens the importance because the components are server-based. Applications running on a server need to be more robust than a desktop application. Rebooting a client workstation is an inconvenience, but the need to frequently reboot a server can be extremely expensive. To minimize resource leaks you should religiously free any resources that you have allocated, especially database connections.

Objects Perform Better When Running Under MTS
This statement qualifies as an urban legend among the Microsoft development community. How this father of all misconceptions came to be accepted as fact is beyond me, as simple logic readily reveals the shortcomings of this train of thought. All objects running within MTS are wrapped within an Object Context, which incurs overhead not only for the initial instantiation but for every method call. The amount of overhead varies, with the most expensive (in terms of time) being calls to a remote Server Package that is participating in an MTS-managed transaction. The principal goal of middleware is not to provide the best performance; rather, it is twofold: a) to efficiently share a limited set of resources with a much larger pool of users, and b) to manage the complexity of deployment and system maintenance. For anyone wishing to examine performance characteristics, I recommend that you start with WinDNAPerf.exe from the Platform SDK.

Why Benchmark COM+ / Object Pooling?
Of all the long-awaited features of Windows 2000, the one at the top of my list has been COM+ object pooling. Despite all of the changes incorporated into Win2K, the behavior of COM+ as compared to MTS has not changed radically. While the rough edges have been knocked off and performance is a little better, the same underlying issues that impacted the scalability of MTS are still lurking.

After the Windows DNA 2000 Readiness Conference in Denver, I returned to the office with an understanding of the impact that object pooling could offer. This did not stem from spending hours in sessions on the advantages of object pooling, nor even the advantages of COM. If anything, what brought me to this conclusion was learning how the most impressive Microsoft benchmarks actually minimize resource thrashing. Lon Fulton presented tuning techniques employed in the Doculabs benchmarks (Web App Server Shoot-Out, PC Week, July 11, 1999): the VC++ implementation utilized an ISAPI extension DLL that pre-allocated all database connections and memory buffers, and database “transactions” utilized implicit commits in order to minimize overhead. Also, the ODBC API was used (instead of OLE DB or ADO, the data access technologies being pushed by Microsoft) as it provided better throughput.

Note that MTS was used in the VB benchmark, though the components were installed in a Library Package in order to maximize performance. Based on my knowledge of the characteristics of VB components within MTS, I question whether these components were using MTS transactions. It is not likely that the MTS objects could have delivered roughly two-thirds the throughput of the multi-threaded ISAPI extension if transactions of any style (much less distributed) were employed.

Another important factor pointed me toward the potential benefits of object pooling, particularly with transactional components. This was the information contained in the Full Disclosure Report (FDR) of the recently announced TPC-C benchmarks. Once again I found pre-allocation of resources and buffering techniques, along with the ODBC API. In the code listings (see Appendix A of the FDR) I did note that transactions were employed within stored procedures.

All information that I could find pointed to nailed-up resources offering a significant advantage. I realize that this is contradictory to information appearing in MSJ over the last year or so, but I performed an initial round of benchmarks that demonstrated the clear advantage of object pooling.

Initial Test Results: Object Pooling vs. JIT Activation
The tests are based on the Sample Bank projects included with the Platform SDK. Be aware that the Account.VC source leaks connections when utilizing connection pooling. I have provided Microsoft Support with the modified source code that corrects this behavior, with the expectation that the Platform SDK sample will be updated.

For the initial tests I configured the OPBank client to run 20 iterations of the MoveMoney transaction, with the default setting of random accounts selected. All of the objects were configured with a minimum pool size of 20 and a maximum of 50. I chose 100-thread increments, as I was anxious to see just how far object pooling could go; I was pleasantly surprised to see that the behavior exceeded my expectations. Table 1 shows the results of the initial round of testing.

 

Thread Count | Pooled Objects w/ Constructed Connections | Pooled Objects w/ Connection Pooling | JIT Activation w/ Connection Pooling
100 | 36.390 |  78.891 |  70.281
200 | 39.953 | 136.109 | 149.484
300 | 58.016 | 159.756 | ** failed **
400 | 76.860 | 195.766 | – not tested –
500 | 92.750 | 269.125 | – not tested –

Table 1: Object Pooling vs. JIT Activation (time in seconds)

Notes About the Initial Test Results: Pooled Objects with Constructed Connections
Clearly this technique provides better throughput than either alternative. I have had a long-standing theory that it is frequently better to delay the creation of extra resources when latency is low. By allowing unchecked growth (of any object type), the OS is overwhelmed and system responsiveness degrades to an unacceptable level.
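A minimal sketch of how a pool of objects with constructed (pre-allocated) connections might behave follows. This mirrors the min/max sizing idea used in the tests, but the ObjectPool class, its policy, and its names are illustrative assumptions, not the COM+ implementation:

```cpp
#include <deque>
#include <memory>
#include <stdexcept>

// Stand-in for a pooled component whose database connection is
// "constructed" once, when the object enters the pool.
struct PooledObject {
    int connectionHandle;  // nailed-up resource, kept across activations
    explicit PooledObject(int h) : connectionHandle(h) {}
};

class ObjectPool {
    std::deque<std::unique_ptr<PooledObject>> idle_;
    std::size_t total_ = 0;
    std::size_t max_;
    int nextHandle_ = 0;
public:
    ObjectPool(std::size_t minSize, std::size_t maxSize) : max_(maxSize) {
        for (std::size_t i = 0; i < minSize; ++i) {  // pre-populate to minimum
            idle_.push_back(std::make_unique<PooledObject>(++nextHandle_));
            ++total_;
        }
    }
    std::unique_ptr<PooledObject> activate() {
        if (!idle_.empty()) {                         // reuse: no new connection
            auto obj = std::move(idle_.front());
            idle_.pop_front();
            return obj;
        }
        if (total_ >= max_)                           // cap growth at the maximum
            throw std::runtime_error("pool exhausted: caller must wait");
        ++total_;
        return std::make_unique<PooledObject>(++nextHandle_);
    }
    void deactivate(std::unique_ptr<PooledObject> obj) {
        idle_.push_back(std::move(obj));              // connection stays open
    }
    std::size_t idleCount() const { return idle_.size(); }
};
```

The key property is that a returned object re-enters the pool with its connection intact, so steady-state traffic never pays for a database login, and the maximum caps the unchecked growth described above.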

Pooled Objects with Connection Pooling
I expected a difference in behavior, but to be honest the actual discrepancy between constructed and pooled connections is far greater than anticipated.

JIT Activation with Connection Pooling
The similarity in performance with Pooled Objects w/ Connection Pooling is the first point of real interest, as it demonstrates that the interaction with the connection pool is a bottleneck. The second important point to note about this last test is the failure under load, which resulted in DTC hanging while attempting to abort all 301 transactions (300 attempting to update account information and 1 recording the next receipt value). These results provide more evidence to support my long-standing comments about the ineffectiveness of the connection pooling algorithm, though it took COM+ and an object pooling test case to demonstrate the impact of JITA. This is the first case where MTS / SQL Server produced a fault that is in line with the issue afflicting Oracle. This behavior is 100% reproducible, and the test case has been furnished to Microsoft Support in order that the DTC issue can be resolved.

Impact of the ‘Use Random Accounts’ Setting
As you have probably noted, there is a glaring 8.6-second discrepancy between the total times at the 100-thread level in the two tests using connection pooling. Each iteration will produce different results, as the contention varies from one test to another based on the account numbers generated.

A Note Regarding MTS vs. COM+
As MTS cannot implement object pooling, the current performance limitations cannot be resolved without building an infrastructure similar to that which Microsoft has delivered with Windows 2000 and COM+ Services. The behavior of MTS is very similar to the characteristics of the JIT Activation with Connection Pooling test, though the results will not be identical.

Without investing a significant amount of resources in building a proprietary infrastructure with a limited life cycle, there is little hope of dramatically improving the scalability of MTS. Because this course of action is impractical, in my opinion MTS on Windows NT 4.0 is a dead end; the future is with COM+.

Comparing Additional Factors
The scalability and performance characteristics of the Sample Bank application are also impacted by differences in the programming language, hardware platform, and the back-end DBMS. This section discusses the response times resulting from changes in each of these three factors.

All tests were driven by the OPBank client, with 20 iterations at the indicated thread count and test type (fixed vs. random accounts). The timing results based on fixed accounts should be weighted more heavily when comparing behavior, as the use of random accounts introduces statistically significant variances. The total number of distributed transactions performed is 1,010 for every 50 threads (for a total of 10,100 at 500 threads).

Note: the random accounts setting more closely simulates a typical real-world environment, but my ultimate goal in testing was to determine the throughput potential when data access is highly serialized. By determining how the DBMS handles contention I can make a better estimate of the behavior that can be expected in large scale DNA applications.

Impact of Language: VB, Delphi, and VC++
The test suite was implemented in the two programming languages that our shop uses most heavily (VB and VC++). In addition, Jamie and I had long been discussing the merits of Delphi in middle-tier development and this was an opportune time to put it to the test.

The choice of data access technology used with each language was due to multiple factors. I selected the ODBC API as the first implementation because it is the technology used by Microsoft in both the recent TPC-C results and the Doculabs benchmark that was discussed at the Windows DNA 2000 Readiness Conference in Denver, CO in early March. The focus then switched to ADO, not because we felt that it would provide superior performance, but rather due to its ease of integration in DNA applications and wide acceptance among developers using VB / VBA.

The specific versions of the languages used are:

  • Visual Basic 6.0 Enterprise Edition, Service Pack 3
  • Visual C++ 6.0 Enterprise Edition, Service Pack 3
  • Delphi 5 Enterprise Edition, March 2000 Update

Thread Count | Fixed Accounts | Random Accounts
 50 |  19.297 | 10.188
100 |  36.750 | 18.796
150 |  54.343 | 28.047
200 |  71.078 | 38.171
250 |  87.703 | 45.672
300 | 106.985 | 54.765
350 | 124.250 | 64.500
400 | 141.641 | 74.797
450 | 159.375 | 79.860
500 | 176.844 | 90.484

Table 2: VC++ w/ ODBC API (time in seconds)

Thread Count | Fixed Accounts | Random Accounts
 50 |  19.797 | 12.625
100 |  38.469 | 23.938
150 |  56.016 | 31.812
200 |  75.781 | 43.563
250 |  92.610 | 56.078
300 | 119.172 | 64.906
350 | 138.516 | 77.688
400 | 171.187 | 88.907
450 | – | –
500 | – | –

Table 3: Delphi w/ ADO (time in seconds)

Thread Count | Fixed Accounts | Random Accounts
 50 |  53.422 |  31.547
100 | 105.750 |  61.156
150 | 160.937 |  97.688
200 | 211.000 | 109.422
250 | 265.859 | 133.828
300 | 320.031 | 164.344
350 | 372.469 | 189.954
400 | 426.297 | 212.406
450 | 468.516 | 238.578
500 | 516.265 | 265.375

Table 4: VB w/ ADO (time in seconds)

Figure 1 – Language Comparison  (time in seconds)

Impact of Hardware: Single vs. Dual CPU
The difference in performance between single and dual CPU machines was so small in initial tests with the Delphi and VC++ objects as to be insignificant. The objects written in Delphi ran to successful completion of 500 threads on the single CPU machine, whereas the same test failed predictably on a dual CPU box.

The pattern of failure pointed to a resource leak, as memory utilization for the instance of dllhost.exe continued to climb while the test was in progress. However, once the test completed, the memory used by the dllhost.exe instance returned to the baseline. The VB objects produced a similar pattern in memory consumption, though at a slower rate (most likely due to the 10-threads-per-CPU limit for STA objects).

Note: I suspect that the problem is related to ADO, as the VC++ objects coded to the ODBC API exhibited no growth in memory usage.

Thread Count | Fixed Accounts, Dual CPU | Fixed Accounts, Single CPU | Random Accounts, Dual CPU | Random Accounts, Single CPU
 50 |  53.422 |  56.341 |  31.547 |  47.157
100 | 105.750 | 113.554 |  61.156 |  88.257
150 | 160.937 | 166.680 |  97.688 | 133.221
200 | 211.000 | 219.686 | 109.422 | 179.288
250 | 265.859 | 279.222 | 133.828 | 221.909
300 | 320.031 | 342.473 | 164.344 | 266.473
350 | 372.469 | 395.649 | 189.954 | 307.813
400 | 426.297 | 464.268 | 212.406 | 347.110
450 | 468.516 | 556.441 | 238.578 | 395.038
500 | 516.265 | 628.644 | 265.375 | 434.865

Table 5: VB w/ ADO (time in seconds)

Impact of DBMS: SQL Server vs. Oracle
It should come as no surprise that performance with SQL Server is much better than with Oracle. The difference in behavior is primarily due to the better support that Microsoft products have for OLE Transactions, a Microsoft specification.

Note that this document is focused on the results of benchmarking distributed transactions. However, I feel that I must point out that Oracle actually performed better in the same test environment than SQL Server when transactions were not managed by DTC. When both were put through a 14 hour endurance test, the Oracle test processed a little over 400,000 more method calls in the same time frame. This equates to 2,000,000 more rows inserted, with a proportional number of additional update, delete and select statements processed.

The performance of Oracle with MTS / COM+ could be better, though at least the stability is improved under COM+. The gradual leak of Oracle sessions that has been an ongoing problem under MTS still occurs, though not as rapidly as with MTS on NT4. A number of other factors work to improve this, such as MDAC 2.5 and patches to DTC.

However, sustained operation at high load levels will still eventually lead to all Oracle sessions being held open but not being reused (marked ‘INACTIVE’ in v$session). This problem appears to be related to interaction between connection pooling and the Microsoft Distributed Transaction Coordinator; the problem is exaggerated by JIT activation.

Thread Count | SQL Server, Fixed Accounts | Oracle, Fixed Accounts
 50 |  53.422 |  183.609
100 | 105.750 |  302.328
150 | 160.937 |  441.141
200 | 211.000 |  584.016
250 | 265.859 |  722.968
300 | 320.031 |  902.438
350 | 372.469 | 1071.921
400 | 426.297 | 1188.797

Table 6: VB w/ ADO (time in seconds)

We’ve learned a few things about the performance and scalability of middle-tier components, confirming some theories while disproving others. We have also debunked some of the popular myths surrounding MTS, along with the doubts of noted authorities about the potential benefits of object pooling. Now it is time to practice what has been learned about optimizing middle-tier performance with pooled objects.

I recommend that you start with the Sample Bank application in the Platform SDK, as it is fairly easy to get up and running. Before implementing real-world solutions however, I also think you should take the time to explore the sample source code included with the Windows DNA Performance toolkit.

Good luck!
