The way a typical three-tier architecture separates the concerns of data-management logic, business logic, and presentation logic makes clustering, the practice of deploying a single application on multiple machines, a laborious and expensive task for Java developers. (See Sidebar 1. The Typical Three-Tier Architecture.) With the separation structured the way it is today, business logic includes the code required for clustering. Without a clustering plug-in at runtime, developers are left to cluster their own business logic, the frameworks they use, and in some cases, the container in which it all runs.
Even in the simplest case of load-balanced Web applications with session state and servlet architectures, the application servers that cluster session state are expensive, and the clustering implementation affects the business logic by violating the language's object orientation. Java applications would be easier to write and cheaper to run if the Java Virtual Machine (JVM) could be extended through a plug-in, allowing multiple machines to cooperate as one. Clustering at the JVM level this way provides the Java developer with benefits that can drastically speed up his or her work. How drastically? Think crossing the United States on an airplane versus in a wagon train (see Sidebar 2. A Case Study: NTT Results from Terracotta Clustered JVM Solution).
Separating explicit knowledge of the deployment footprint or infrastructure logic from business logic (e.g., using JMS or databases to share serialized object state) presents another leap in the simplification of enterprise application development. Consider the challenge of scale-out. Today, it is an explicit deployment footprint characterized by many inexpensive servers running together under a single application, and developers "code" to it as a model. Scale-out should be delivered both as part of the infrastructure and as a runtime service. Allowing the deployment footprint to be determined at production time will lead to faster development, more robust applications, and greater performance.
This article explains the clustered JVM approach and describes its impact on Java from the developer's point of view (with code samples). It also illustrates the resulting performance, based on real world testing.
All clustering tools can be analyzed across the following functional dimensions:
Synchronization (the synchronized keyword), wait()/notify(), and join() should all work across the JVM boundary. Otherwise, the clustering tool offers less than clustered shared memory: two processes cannot cooperate on memory operations and must instead inform each other through a separate signaling mechanism whenever memory changes. When cooperative clustering of memory requires a separate signaling mechanism, the business logic is doubly impacted: the clustering tool requires explicit integration into the code, and the signaling tool then requires still more custom infrastructure logic.
Table 1 illustrates the current state of the art in clustering and how the strengths and weaknesses of each tool lead to a resulting impact on the business logic.
| Tool | Scalable | Available | Not Serialization-Based | Cooperation | Impact on Business Logic |
|---|---|---|---|---|---|
| JMS | | X | | | High |
| Database | | | | | High |
| JGroups | | X | | | High |
| Custom API | X | X | | | High |
| App Server | X | X | | | Medium |
| Clustered JVM | X | X | X | X | Low |

Table 1. State of the Art Clustering Solutions
Java Message Service (JMS) provides a wrapper on top of classic message-queuing services. Developers often send messages between JVMs on different machines, published on a "clustering" topic where all instances listen and learn about objects, transactions, and other state through explicit sharing of the data. Developers must integrate JMS into the business logic to share information across JVMs. This integration is commingled with the application, making the application's intent hard to decipher as more and more lines of code become about clustering instead of remaining purely about the business.
The JMS approach provides high availability by sharing critical information across machines and JVMs. However, it sends all information to all JVMs and will bottleneck on the network long before the business logic taxes the CPU. Hence, JMS delivers availability without scalability, and it has a negative impact (serialization and lack of cooperation) on business logic.
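The commingling described above can be sketched roughly as follows. This is not the JMS API: ClusterTopic is a hypothetical in-process stand-in for a "clustering" topic, and PriceChange and PriceService are invented names used only for illustration. The stand-in also skips real serialization and networking. The point is only that the single line of business logic ends up surrounded by publish and subscribe code, and that every node receives every message.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-process stand-in for a JMS "clustering" topic (NOT the JMS API):
// every subscriber receives every published message.
class ClusterTopic {
    private final List<Consumer<Serializable>> subscribers = new ArrayList<>();

    void subscribe(Consumer<Serializable> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(Serializable message) {
        for (Consumer<Serializable> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }
}

// Hypothetical message type that exists only to carry state between nodes.
class PriceChange implements Serializable {
    private static final long serialVersionUID = 1L;
    final String sku;
    final long cents;

    PriceChange(String sku, long cents) {
        this.sku = sku;
        this.cents = cents;
    }
}

// Hypothetical business class: only the map update is business logic.
class PriceService {
    private final Map<String, Long> prices = new HashMap<>();
    private final ClusterTopic topic;

    PriceService(ClusterTopic topic) {
        this.topic = topic;
        // Clustering concern: listen for changes made on other nodes.
        topic.subscribe(message -> {
            if (message instanceof PriceChange) {
                PriceChange change = (PriceChange) message;
                prices.put(change.sku, change.cents);
            }
        });
    }

    void setPrice(String sku, long cents) {
        prices.put(sku, cents);                      // business logic
        topic.publish(new PriceChange(sku, cents));  // clustering concern
    }

    Long getPrice(String sku) {
        return prices.get(sku); // null if the SKU is unknown
    }
}

class JmsStyleDemo {
    public static void main(String[] args) {
        ClusterTopic topic = new ClusterTopic();
        PriceService nodeA = new PriceService(topic);
        PriceService nodeB = new PriceService(topic);
        nodeA.setPrice("widget", 1999L);
        System.out.println(nodeB.getPrice("widget")); // prints 1999
    }
}
```

Each PriceService instance here plays the role of one JVM. With a real message broker, the publish call would also pay for serialization and a network hop to every listener.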
Databases can store serialized Java objects under a unique ID in the database. That ID is usually a session ID. This scheme can be used to store cached data without an OR mapper. Session is a prime example. Databases act as a central data hub for all objects and ensure transactional updates. This leads to a stable application, but capacity is bound by the database, as is availability. So, it delivers neither predictable capacity nor high availability (without clustering the database, of course).
When the database server is itself highly available, the database approach inherits that availability by storing the data there. However, it sends all information to the database and will bottleneck on the DB server long before the business logic would have taxed the CPU. Just like JMS, databases at best deliver availability without scalability, and they have a negative impact (serialization and lack of cooperation) on business logic.
JGroups, according to its own site, "is a toolkit for reliable multicast communication." (Note that this doesn't necessarily mean IP Multicast; JGroups can also use transports such as TCP).
It can be used to create groups of processes whose members can send messages to each other. The main features include the following:
As is hopefully evident by its description, JGroups would be used much in the same way JMS would be used when clustering applications. Objects would get serialized in any one JVM and sent as a message to all other JVMs. Because of the similarities, you can guess what JGroups' clustering [dis]advantages are.
Custom API solutions are most easily characterized as a "shared bucket" of data or clustered shared memory. They may not use Java native serialization, but they still copy data between the JVM's natural heap and the bucket in order to move data across machines. These solutions impact the application in all the same ways as JMS, databases, or JGroups do. The main difference is that these custom tools are built to be scalable and designed to have no single point of failure (i.e., any one machine loss does not constitute a loss of data). So, while custom solutions impact business logic as much as other solutions, they can deliver good operating characteristics.
Application servers such as BEA's WebLogic leverage the notion of sticky load balancers and share objects between two machines, regardless of the size of the cluster. This is generically referred to as the buddy system. Most vendors are now using custom solutions or JGroups to implement the buddy system architecture and are starting to provide capacity and availability as long as a load balancer can be used. Again, buddy systems are a viable option, but they still impact the business logic.
No matter which solution you use for clustering, you have to change the business logic in order to address the impact of serialization: having multiple copies of objects floating around on many machines. And, while this impact may be acceptable for a single, small application, most businesses have some front-office, back-office, and partner-integration applications, each of which is usually running on Java.
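The "multiple copies" problem can be seen with nothing more than standard Java serialization. The sketch below round-trips a list through a byte array, which is roughly what a message- or database-based tool does when it moves an object between JVMs; the class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

class SerializationCopyDemo {
    // Serializes an object to bytes and reads it back, as a tool that ships
    // object state between JVMs would do on either side of the wire.
    static Object roundTrip(Serializable original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> cart = new ArrayList<>();
        cart.add("book");

        Object copy = roundTrip(cart);
        System.out.println(copy.equals(cart)); // true: same contents
        System.out.println(copy == cart);      // false: a different object

        cart.add("pen"); // later changes do not reach the copy
        System.out.println(((ArrayList<?>) copy).size()); // 1
    }
}
```

Because the copy has its own identity, the application must decide when to resend state and which copy wins, and that decision is the part that leaks into business logic.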
To do so, Terracotta extends the Java thread and memory models such that all threads in the entire cluster signal and share data with each other as if they were all in the same logical virtual machine. Shared objects have a unique, cluster-wide identity. No deserialized copies of shared objects are lying around. If a shared object is present in a particular JVM in the cluster, all references to that object refer to exactly the same object in the heap of that JVM.
In a clustered JVM, synchronization and calls to wait() and notify() apply for all threads cluster-wide. When a thread acquires a cluster-wide lock on a shared object, it is assured that all changes from other threads in the cluster made under that same shared object's monitor are visible locally.
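As an illustration, here is ordinary single-JVM coordination code using synchronized, wait(), and notifyAll(). Mailbox and MailboxDemo are hypothetical names. Under the model described above, if the Mailbox instance were a shared object, the same code would be expected to coordinate threads running in different JVMs. How an object is declared shared is product-specific and is not shown here.

```java
// Hypothetical one-slot mailbox guarded by its own monitor.
class Mailbox {
    private String message; // guarded by "this"

    synchronized void put(String text) {
        message = text;
        notifyAll(); // wake any thread waiting in take()
    }

    synchronized String take() throws InterruptedException {
        while (message == null) {
            wait(); // releases the monitor until put() calls notifyAll()
        }
        String text = message;
        message = null;
        return text;
    }
}

class MailboxDemo {
    // Hands one message from a producer thread to the calling thread.
    static String exchange(String text) {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> box.put(text));
        producer.start();
        try {
            String received = box.take();
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(exchange("hello")); // prints hello
    }
}
```

Nothing in this code mentions messaging, serialization, or remote nodes, which is the property the clustered JVM approach aims to preserve.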
Because a clustered JVM extends built-in JVM facilities cluster-wide, it is semantically equivalent to a single JVM. A clustered JVM allows plain Java applications to be deployed in a clustered environment, free from the distortions caused by tools that try to provide clustering behavior to the application itself. A clustered JVM can deliver clustering underneath the application, keeping infrastructure concerns completely separate from business logic.
Let's look at an example. The following code is an extremely simplistic Java spreadsheet program that accepts only data entry. It cannot do math and makes no assumptions about the cell contents, but it works well as a sample application. You can copy it into an editor and compile it, and it will run:
Sample App: simple JTable Demo
    package demo.jtable;

    import javax.swing.JFrame;
    import javax.swing.JScrollPane;
    import javax.swing.JTable;
    import javax.swing.table.DefaultTableModel;

    class TableDemo extends JFrame {
        private DefaultTableModel model; // Shared object
        private static Object[][] tableData = {
            { " 9:00", "", "", ""}, { "10:00", "", "", ""}, { "11:00", "", "", ""},
            { "12:00", "", "", ""}, { " 1:00", "", "", ""}, { " 2:00", "", "", ""},
            { " 3:00", "", "", ""}, { " 4:00", "", "", ""}, { " 5:00", "", "", ""}
        };

        TableDemo() {
            super("Table Demo");
            setSize(350, 220);
            setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            Object[] header = {"Time", "Room A", "Room B", "Room C"};
            model = new DefaultTableModel(tableData, header);
            JTable schedule = new JTable(model);
            getContentPane().add(new JScrollPane(schedule), java.awt.BorderLayout.CENTER);
        }

        public static void main(String[] args) {
            new TableDemo().setVisible(true);
        }
    }
With a clustered JVM, you need only locate the data/domain model and share it. The cooperation across JVMs and the sharing of object data (state) is abstracted from the spreadsheet's main purpose, which is to be a simple data-entry tool.
For example, a load balancer may send a Web session to two different machines in a 10-machine cluster. Since only those two machines ever access this session, only those two machines will ever see changes for objects in that session. If only one field in the entire object graph of that session changes, only the data for that field is updated on the other machine. The other eight machines are spared the effort of handling object updates for objects they will never see, at least until the load balancer is reconfigured and requests for that session are diverted to other machines.
In the example spreadsheet application, the field named "model" should be clustered. Since all data in the cells of the spreadsheet are hanging off the model object, it is a conceptual "root" for all data that needs to be shared. The clustered JVM shares this root. Why is this sufficient for clustering? It seems almost too simple. Swing's MVC nature allows developers to share the model (the "M" in MVC), and the framework itself then takes care of all keyboard and mouse inputs and calls back to paint methods, etc. to redraw the screen. Since the JVM is clustered in a cooperative fashion, the signals and methods that act upon the model and view are fired cluster-wide, which can lead to cluster-wide screen redraw events, for example.
So in the spreadsheet example, thanks to model-view-controller, clustering the model is both simple and sufficient to turn the single JVM spreadsheet into a clustered one.
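The mechanism that makes sharing the model sufficient can be observed inside a single JVM: every listener registered on one DefaultTableModel is notified when a cell changes. The sketch below uses two plain listeners as stand-ins for two views (SharedModelDemo is a hypothetical name). It does not involve any clustering product; it only shows the single-JVM behavior that, per the argument above, a clustered JVM would extend across machines.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.table.DefaultTableModel;

class SharedModelDemo {
    // Edits one cell and records which "views" (listeners) were told about it.
    static List<String> editAndObserve(DefaultTableModel model) {
        List<String> notifications = new ArrayList<>();

        // Two listeners standing in for two views attached to the same model.
        model.addTableModelListener(event ->
                notifications.add("view 1 saw row " + event.getFirstRow()
                        + ", column " + event.getColumn()));
        model.addTableModelListener(event ->
                notifications.add("view 2 saw row " + event.getFirstRow()
                        + ", column " + event.getColumn()));

        // A single change to the model notifies every registered listener.
        model.setValueAt("Standup", 0, 1);
        return notifications;
    }

    public static void main(String[] args) {
        Object[] header = {"Time", "Room A"};
        Object[][] data = {{" 9:00", ""}, {"10:00", ""}};
        DefaultTableModel model = new DefaultTableModel(data, header);

        List<String> notifications = editAndObserve(model);
        System.out.println(notifications.size());   // 2
        System.out.println(model.getValueAt(0, 1)); // Standup
    }
}
```

In the real spreadsheet, the JTable registers itself as such a listener and repaints when notified, so no extra redraw code is needed in the application.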
As a result, capacity and availability become synonymous with clustering. Capacity can be predicted as follows:
First, measure Tmax, denoting the most transactions a single machine can perform per unit of time.
Next, measure the throughput of a two-machine cluster (T2, or generically Tn for n machines in a cluster) and relate it to that of one machine (Tmax) as follows:
Overhead = 1 - (Tn / n) / Tmax
Finally, apply Overhead by taking the business demand as transactions per unit of time (call it Tr for "required transactions per time") as follows:
Capacity = (Tr / Tmax) x (1 + Overhead)
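A worked example may help. The throughput figures below are invented purely for illustration (they are not measurements from the article), and CapacityPlanner is a hypothetical helper that applies the two formulas above literally. Reading Capacity as a machine count and rounding it up is an interpretation, not something the formulas state.

```java
class CapacityPlanner {
    // Overhead = 1 - (Tn / n) / Tmax
    static double overhead(double tn, int n, double tMax) {
        return 1.0 - (tn / n) / tMax;
    }

    // Capacity = (Tr / Tmax) x (1 + Overhead)
    static double capacity(double tr, double tMax, double overhead) {
        return (tr / tMax) * (1.0 + overhead);
    }

    public static void main(String[] args) {
        double tMax = 1000.0; // assumed: one machine handles 1,000 transactions/s
        double t2 = 1800.0;   // assumed: a two-machine cluster handles 1,800/s
        double tr = 5000.0;   // assumed: the business requires 5,000/s

        double o = overhead(t2, 2, tMax);    // 1 - 900/1000, about 0.1
        double machines = capacity(tr, tMax, o); // 5 x 1.1, about 5.5

        // Round up, since a fraction of a machine cannot be deployed.
        System.out.println("Overhead: " + o);
        System.out.println("Machines needed: " + (long) Math.ceil(machines));
    }
}
```

With these assumed inputs, each machine in the cluster delivers about 90 percent of its standalone throughput, so the 5,000 transactions per second of demand calls for six machines rather than five.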
Availability is "n + 1", meaning that there is no single point of failure and that you achieved such an architecture without the buddy system. (Buddy system redundancy is not "n + 1" but rather Active/Passive redundancy.)
In one sense, all the math is unnecessary because the equations simply indicate that Overhead is a constant with clustered JVMs. This holds as long as the clustering use case does not require all data to be resident in all JVMs simultaneously and entire objects are not changed at all times.
An example of this degenerate case where the clustered JVM performs at the same level as all other clustering tools is a simple application that creates objects only on one JVM and reads them on another. Every object is new, it is read every time, and it needs to be accessed. As a result, all data in this case moves between every node. However, in real-world use, the degenerate case is actually pretty far from reality (see Sidebar 2. A Case Study: NTT Results from Terracotta Clustered JVM Solution).
DevX is a division of Jupitermedia Corporation. © Copyright 2007 Jupitermedia Corporation. All Rights Reserved.