Clustering at the JVM Level to Maintain Business Logic Integrity : Page 3

The typical three-tier architecture keeps the code Java developers need for clustering inside the business logic, making clustering a real chore. Clustering at the JVM level makes Java applications easier to write and cheaper to run.


Solution: The Clustered JVM

The simplest solution to the challenge of delivering predictable capacity, high availability, and scalability—all without impacting business logic—is to move clustering concerns out of the application layer down into the JVM. That is, to cluster the JVM.

To do so, Terracotta extends the Java thread and memory models such that all threads in the entire cluster signal and share data with each other as if they were all in the same logical virtual machine. Shared objects have a unique, cluster-wide identity. No deserialized copies of shared objects are lying around. If a shared object is present in a particular JVM in the cluster, all references to that object refer to exactly the same object in the heap of that JVM.
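
To make the phrase "deserialized copies" concrete, the following sketch shows what happens in ordinary Java when an object is moved by copying. It uses standard Java serialization as a stand-in for copy-based clustering (an assumption for illustration; the article does not name a specific mechanism), and the class name IdentityDemo is hypothetical. The round-tripped object is equal in content but is a different object in the heap, which is the situation a clustered JVM is described as avoiding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

// Hypothetical demo class; plain JDK only, nothing Terracotta-specific.
class IdentityDemo {
    // Serializes an object to bytes and reads it back, as a copy-based tool might.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>();
        original.add("10:00 Room A");
        Object copy = roundTrip(original);
        System.out.println(copy.equals(original)); // true: same contents
        System.out.println(copy == original);      // false: a second object in the heap
    }
}
```

Code that relies on reference identity (for example, synchronizing on the object) behaves differently once such copies exist, which is why preserving identity matters.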

In a clustered JVM, synchronization and calls to wait() and notify() apply for all threads cluster-wide. When a thread acquires a cluster-wide lock on a shared object, it is assured that all changes from other threads in the cluster made under that same shared object's monitor are visible locally.
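
As an illustration of the kind of code this paragraph refers to, here is a minimal sketch of a monitor-based hand-off written in plain Java. The Mailbox and MailboxDemo names are hypothetical, and the code below runs in a single JVM. The article's claim is that, if the Mailbox instance were a shared object in a clustered JVM, the same synchronized, wait(), and notifyAll() calls would coordinate threads on different machines. How an object is declared shared is not shown, because this page does not describe it.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical example class: a standard Java monitor, no clustering API involved.
class Mailbox {
    private final Queue<String> messages = new ArrayDeque<>();

    // Producer side: add a message while holding the mailbox's monitor.
    public synchronized void put(String message) {
        messages.add(message);
        notifyAll(); // wake any thread blocked in take()
    }

    // Consumer side: wait until a message is available.
    public synchronized String take() throws InterruptedException {
        while (messages.isEmpty()) {
            wait(); // releases the monitor while waiting
        }
        return messages.remove();
    }
}

class MailboxDemo {
    // Starts one producer thread and returns what the calling thread receives.
    static String handOff(String message) {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> box.put(message));
        producer.start();
        try {
            String received = box.take(); // waits if the producer has not called put() yet
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(handOff("Room A booked at 10:00"));
    }
}
```

Note that take() re-checks its condition in a loop after wait() returns, which is standard Java practice and does not depend on clustering.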

Because a clustered JVM extends built-in JVM facilities cluster-wide, it is semantically equivalent to a single JVM. A clustered JVM allows plain Java applications to be deployed in a clustered environment, free from the distortions caused by tools that try to provide clustering behavior to the application itself. A clustered JVM can deliver clustering underneath the application, keeping infrastructure concerns completely separate from business logic.

Let's look at an example. The following code is an extremely simplistic Java spreadsheet program that accepts only data entry. It cannot do math and makes no assumptions about the cell contents, but it works well as a sample application. You can copy it into an editor and compile it, and it will run:

Sample App: simple JTable Demo

package demo.jtable;

import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.JTable;
import javax.swing.table.DefaultTableModel;

class TableDemo extends JFrame {
  private DefaultTableModel model; // Shared object

  // Initial cell contents: one row per hour, one empty column per room
  private static Object[][] tableData = {
      { " 9:00", "", "", ""}, { "10:00", "", "", ""}, { "11:00", "", "", ""},
      { "12:00", "", "", ""}, { " 1:00", "", "", ""}, { " 2:00", "", "", ""},
      { " 3:00", "", "", ""}, { " 4:00", "", "", ""}, { " 5:00", "", "", ""}
  };

  TableDemo() {
    super("Table Demo");
    setSize(350, 220);
    Object[] header = {"Time", "Room A", "Room B", "Room C"};
    model = new DefaultTableModel(tableData, header);
    // The table is only a view; all cell data lives in the model
    JTable schedule = new JTable(model);
    getContentPane().add(new JScrollPane(schedule), java.awt.BorderLayout.CENTER);
  }

  public static void main(String[] args) {
    new TableDemo().setVisible(true);
  }
}

With a clustered JVM, you need only locate the data/domain model and share it. The cooperation across JVMs and the sharing of object data (state) is abstracted from the spreadsheet's main purpose, which is to be a simple data-entry tool.

Clustered JVM Performance
Existing clustering solutions follow a "copy to the cluster on change" paradigm, where whole object graphs must be copied around the cluster when data changes. By contrast, clustered JVMs push only the data that changes, and only to those participating JVMs that happen to have the relevant objects in heap. Also, because object identity is preserved cluster-wide, no extra copies of objects are made.

For example, a load balancer may send a Web session to two different machines in a 10-machine cluster. Since only those two machines ever access this session, only those two machines will ever see changes for objects in that session. If only one field in the entire object graph of that session changes, only the data for that field is updated on the other machine. The other eight machines are spared the effort of handling object updates for objects they will never see—until, for example, the load balancer is reconfigured and requests for that session are diverted to other machines.
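
The routing idea in this example can be sketched as a toy model. Everything below is hypothetical (the DeltaRouter class, the node names, the session ID, and the field name are invented for illustration) and it is not Terracotta's implementation or API. It only shows the bookkeeping described above: track which nodes hold an object, and send a single-field change only to the other nodes that hold it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: all names are hypothetical, not a real clustering API.
class DeltaRouter {
    // objectId -> nodes that currently have the object in heap
    private final Map<String, Set<String>> residentNodes = new HashMap<>();

    void markResident(String objectId, String node) {
        residentNodes.computeIfAbsent(objectId, k -> new HashSet<>()).add(node);
    }

    // Returns the update messages that a single-field change would generate.
    List<String> fieldChanged(String objectId, String field, Object value, String origin) {
        List<String> messages = new ArrayList<>();
        Set<String> holders = residentNodes.getOrDefault(objectId, Collections.<String>emptySet());
        for (String node : holders) {
            if (!node.equals(origin)) { // the node that made the change already has it
                messages.add(node + " <- " + objectId + "." + field + "=" + value);
            }
        }
        return messages;
    }
}

class DeltaRouterDemo {
    // A session held by two nodes of a larger cluster; node1 changes one field.
    static List<String> run() {
        DeltaRouter router = new DeltaRouter();
        router.markResident("session-42", "node1");
        router.markResident("session-42", "node2");
        return router.fieldChanged("session-42", "cartTotal", 99, "node1");
    }

    public static void main(String[] args) {
        System.out.println(run()); // [node2 <- session-42.cartTotal=99]
    }
}
```

Nodes that never called markResident for the session receive nothing, which mirrors the eight idle machines in the example.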

Clustered JVM Simplicity
Because clustered JVMs do not change Java semantics, applications can be kept simple—free from deployment concerns. Developers can write standard multi-threaded Java or use a framework that is itself multi-threaded (e.g., application servers), concentrating only on business logic.

In the example spreadsheet application, the field named "model" should be clustered. Because all the data in the spreadsheet's cells hangs off the model object, it is a conceptual "root" for all data that needs to be shared. The clustered JVM shares this root. Why is this sufficient for clustering? It seems almost too simple. Swing's MVC design lets developers share just the model (the "M" in MVC); the framework itself handles keyboard and mouse input and calls back to the paint methods that redraw the screen. Because the JVM is clustered in a cooperative fashion, the signals and methods that act upon the model and view are fired cluster-wide, which can lead, for example, to cluster-wide screen redraw events.

So in the spreadsheet example, thanks to model-view-controller, clustering the model is both simple and sufficient to turn the single JVM spreadsheet into a clustered one.
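
The reason sharing the model is enough can be seen in how Swing's model notifies its listeners. The sketch below runs in a single JVM and uses only standard Swing classes; the ModelEventDemo class name and the cell value are hypothetical. It registers a listener on a DefaultTableModel, changes one cell, and records the event the listener receives. A JTable registers itself with its model in the same way and repaints when such an event arrives. Whether and how these events reach other JVMs depends on the clustered JVM and is not demonstrated here.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.event.TableModelListener;
import javax.swing.table.DefaultTableModel;

// Hypothetical demo class: shows model-to-listener notification in plain Swing.
class ModelEventDemo {
    // Changes one cell and returns a description of each event the listener saw.
    static List<String> editAndRecord() {
        Object[] header = {"Time", "Room A", "Room B", "Room C"};
        Object[][] data = {{" 9:00", "", "", ""}, {"10:00", "", "", ""}};
        DefaultTableModel model = new DefaultTableModel(data, header);

        List<String> seen = new ArrayList<>();
        // Stand-in for a view: reacts to every change made to the model.
        TableModelListener listener = e ->
            seen.add("row " + e.getFirstRow() + ", col " + e.getColumn()
                + " -> " + model.getValueAt(e.getFirstRow(), e.getColumn()));
        model.addTableModelListener(listener);

        model.setValueAt("Design review", 1, 2); // book Room B at 10:00
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(editAndRecord()); // [row 1, col 2 -> Design review]
    }
}
```

The listener never needs to know who changed the model, which is what makes the model a natural unit to share.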

Plug-in Capacity and Availability
With business logic separated from infrastructure logic, developers can build more applications faster. And because clustering behaviors aren't built in with the application, they can be made consistent across applications, thus simplifying the operations of Java applications in production.

As a result, capacity and availability become synonymous with clustering. Capacity can be predicted as follows:

  1. Measure the performance of an application on a single node. For example, suppose the application's maximum performance is Tmax, denoting the most transactions a single machine can perform per unit of time.
  2. Deploy a clustered JVM, and start a second server running the same application. The two servers are now working in concert as one, thanks to the clustered JVM.
  3. To measure the overhead of the clustered JVM, you should be able to take the performance for two machines (T2, or generically Tn for n machines in a cluster) and relate it to that of one machine (Tmax) as follows:
    Overhead = 1 - ((Tn) / n) / Tmax
  4. Now you can derive capacity from Overhead by taking the business demand as transactions per unit of time (call it Tr for "required transactions per time") as follows:
    Capacity = (Tr / Tmax) x (1 + Overhead)
    Here Capacity is the number of machines needed to meet the demand.

Availability is "n + 1", meaning that there is no single point of failure and that you achieve such an architecture without the buddy system. (Buddy system redundancy is not "n + 1" but rather Active/Passive redundancy.)
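
The overhead and capacity formulas from steps 3 and 4 can be worked through with a short calculation. The numbers below are invented for illustration (Tmax = 1000, T2 = 1800, and Tr = 5000 transactions per unit of time) and are not measurements from the article; the CapacityPlanner class is likewise hypothetical. The result is rounded up because machines come in whole units.

```java
import java.util.Locale;

// Hypothetical helper that applies the article's two formulas as written.
class CapacityPlanner {
    // Overhead = 1 - (Tn / n) / Tmax
    static double overhead(double tn, int n, double tmax) {
        return 1.0 - (tn / n) / tmax;
    }

    // Capacity = (Tr / Tmax) x (1 + Overhead), rounded up to whole machines
    static int machinesNeeded(double tr, double tmax, double overhead) {
        return (int) Math.ceil((tr / tmax) * (1.0 + overhead));
    }

    public static void main(String[] args) {
        double tmax = 1000; // assumed single-node maximum
        double t2 = 1800;   // assumed throughput of a two-node cluster
        double tr = 5000;   // assumed business demand

        double o = overhead(t2, 2, tmax);
        int machines = machinesNeeded(tr, tmax, o);
        // Prints: Overhead: 0.10, machines: 6
        System.out.println(String.format(Locale.ROOT, "Overhead: %.2f, machines: %d", o, machines));
    }
}
```

With these assumed inputs, each node in the cluster delivers 900 transactions instead of 1000, so Overhead is about 0.10, and a demand of 5000 works out to 5.5, or 6 machines.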

In one sense, all the math is unnecessary because the equations simply indicate that Overhead is a constant with clustered JVMs. This holds as long as the clustering use case does not require all data to be resident in all JVMs simultaneously and does not change entire objects all the time.

An example of the degenerate case, where the clustered JVM performs at the same level as other clustering tools, is a simple application that creates objects only on one JVM and reads them on another. Every object is new and every object is read on the other JVM, so all data in this case moves between the nodes. In real-world use, however, the degenerate case is pretty far from reality (see Sidebar 2. A Case Study: NTT Results from Terracotta Clustered JVM Solution).

Ari Zilka is founder and CEO of Terracotta, a developer of solutions for Java scalability.