o one likes to wait, especially your customers. If you can’t keep them satisfied, your competitors will. This leads many developers to spend countless hours trying to optimize the performance of their applications. Fortunately for Java developers, many tools and techniques have been developed that facilitate this effort, from graphical integrated development environments with interactive profiling and debugging capabilities to language features, such as multi-threading, that simplify interactive programming.
When using a database management system from Java, however, the landscape isn’t as clearly marked. How do you know if performance is being limited by your application, the database, or your JDBC driver?
Simply put, when using a database, your application must use a JDBC driver in addition to the database system?all communicating over the network. This means that multiple elements besides your code affect your application’s performance. These include:
- The database system,
- Your Structured Query Language (SQL) statements, and
- The underlying network and database hardware.
Often overlooked in this sea of potential pitfalls, yet vitally important, is the JDBC driver itself?because, as you will see, all JDBC drivers are not created equal.
JDBC Drivers?Simple! Or Are They?
The basic tenants of using JDBC to communicate with a database are rather simple: register the driver, create a connection, execute a SQL statement, and process the results. This simplicity often leads programmers to think they know JDBC, when they have in reality barely scratched the surface. Any car can get you between point A and point B, but we all know that not all cars are the same?certain cars make the trip faster (and more enjoyably).
Of course, despite their similarity in functionality, automobile manufacturers distinguish their products by utilizing unique features that improve security, safety, or performance. The same is true for JDBC drivers, where different vendors bring different backgrounds and experiences to the table when interpreting the JDBC specification. Unlike the situation for automobiles, however, the JDBC specification provides a great deal of leeway to driver vendors. In fact, large parts of the specification are optional, leading to the creation of many JDBC drivers that accomplish the basics at the expense of degrees of performance and stability.
Take, For Example, Database Transactions
One of the most important areas where this is evident is with database transactions. By default, JDBC applications utilize autocommit mode, in which every SQL statement is executed in its own transaction. While easier on the vendor writing the JDBC driver, this severely impacts performance, due to the implicit database overheads in creating and completing a transaction. In a multi-user environment, such as an e-commerce site, an additional complication arises due to the visibility of a query.
A given transaction, and therefore a given JDBC application, is always aware of any changes that it makes to the underlying data. But what about modifications made by other transactions? Tackling this issue involves balancing concurrency with reliability and properly utilizing different levels of database locking.
Dirty Reads, Nonrepeatable Reads, and Phantom Reads
Controlling data consistently through database locking generally means deciding whether to allow dirty reads, nonrepeatable reads, or phantom reads. A dirty read occurs when the results in one transaction are modified by another uncommitted transaction, which is later discarded. The data in the first transaction is invalid, or “dirty.” A nonrepeatable read occurs when one transaction repeatedly retrieves data, while a different transaction alters the underlying data. This causes different, or nonrepeatable, results to be read by the first transaction. A phantom read is similar to a nonrepeatable read, except that the number of rows is changed either by insertion or deletion, so that the first transaction contains “phantom” data.
Fortunately, the JDBC specification allows developers to decide how transactions should interact by passing parameters to the appropriate constructor during the creation of Connection and ResultSet objects. Transactions also can now utilize savepoints, which reduce transaction overhead by providing checkpoints for saving or ignoring modifications during a long-running transaction. The specification also provides for batch processing and updateable ResultSet objects, which also affect performance and transaction behavior. Of course, how a JDBC driver implements these features?if it even does?greatly affects the performance of your application.
Handling Network Communication
Another area where the performance of JDBC drivers differs is in handling network communication. Primarily this arises when the JDBC driver transfers the results of a query from the database server to the Java application. Depending on the needs of the application, a developer might only want a few rows transferred, such as in a search engine, or all the rows transferred, such as in a billing service. The JDBC specification allows a developer to specify the order and number of rows that are fetched by the driver via the setFetchDirection and setFetchSize methods in the Statement interface. It is up to the driver vendor, however, in how, or even if, to implement these features.
Choosing a Vendor?Easier Than You’d Think
Using the car analogy, a car’s performance can be improved by a knowledgeable mechanic. In fact, the first step taken by a good mechanic is to obtain as much diagnostic information as possible. The same is true when using a JDBC driver. The beauty of Java specifications, like JDBC, is that different vendors provide competing implementations, which can and generally do work in different ways.
By now, most developers understand that Type 4 JDBC drivers generally offer the highest performance. But how do you choose a JDBC driver? Many developers (incorrectly) assume that a database vendor provides the optimal JDBC driver; after all, they wrote the database so they should be able to write the best JDBC driver for their database. This is not the correct line of reasoning because a JDBC driver is in reality a middleware product, not a database product.
A Diagnostic Tool Helps Too
One of the most recognized anecdotes in the computer industry is the 80-20 rule. When it comes to the performance of JDBC applications, this rule states that 80 percent of performance bottlenecks are due to either the actual JDBC application or how the application interacts with the JDBC driver, while 20 percent is due directly to the JDBC driver. Telling the difference can be difficult, even for the most advanced JDBC developer. What you need is the right diagnostic information, in order to know whether your performance is hindered by your SQL statements or locking convention.
Many driver vendors also provide a variety of diagnostic tools that allows a developer to “spy” on a JDBC driver, even when it is used by other users. In this way they can see what is truly going on “under the hood,” and identify bottlenecks, whether they are in the JDBC code or the driver.