The current enterprise Java stack contains facilities for clustering the application server and application frameworks. To the consternation of enterprise Java developers, and consequently of operators, application state objects lack similar facilities. They don't have a simple, fast, and reliable clustering mechanism.
Older approaches for maintaining application state favored putting state objects into a session or application context. However, past attempts at clustering those contexts have proven unsuccessful, especially for deployment beyond two nodes. So the tide has turned away from clustering state data in plain old Java objects (POJOs) and toward a "stateless" model, where the application state is externalized from the Java heap entirely. Technologies such as Hibernate, JDO, and even raw JDBC enable this model. While this approach may work well for marshalling business data back and forth between the application server and the system where the data lives, it is clumsy and difficult to use. It also scales poorly when clustering data that represents application state.
This article proposes a simple alternative: clustering POJOs at the JVM level. It discusses what JVM-level clustering is, how it works, and how it performs. It also explores how developers can leverage POJO clustering to solve real day-to-day problems with existing frameworks, as well as how POJO clustering might form the core of a POJO container: a new application container that provides enterprise-class operations based on simple user library and JDK library objects.
How POJO Clustering Works
The term POJO has recently acquired a false definition. It now seems any object that (1) is not a Bean, (2) does not have a management context, and (3) can compile using imported libraries and an off-the-shelf Java compiler is considered a POJO. But objects are not POJOs just because an application can invoke a library or framework and those invocations successfully compile into the application bytecode. Somehow, however, the vendor community has managed to usurp the POJO momentum from Aspect Oriented Programming and repurpose it to describe a subjective lack of complexity in the interfaces that an application must implement.
Whether or not an object is a POJO actually refers to the ability of a library or framework to maintain the semantics of the Java language specification. POJOs are plain objects that have primitive fields and references to other plain old objects. You should be able to access their data through regular references and lock on them through the built-in concurrency primitives. Any two references to the same logical object should actually point to the same object in the heap. A library or framework exhibiting any other behavior is not POJO based.
Maintaining the simple, built-in behavior of objects across a cluster has proven an elusive goal. That is, until now. How, then, does POJO clustering work? How do you deliver the same semantics to operations between threads in different virtual machines that you get between threads in the same virtual machine? The answer is by dropping clustering services down from the API level (EJB, JDO, Hibernate, etc.) to the JVM level. Investing the JVM with clustering services at its core bytecode interpretation layer yields simple and seamless clustering behavior. By virtualizing the JVM heap and thread coordination calls, clustering becomes a natural extension of the virtual machine. Once that's done, clustering everything else becomes easier.
The following are the three core components to JVM-level clustering:
- Shared virtual heap
- Object identity maintenance
- Clustered locking
Let's examine each of these components.
The first step to heap virtualization is intercepting access between threads and the local heap and injecting cluster-awareness. This is done by reinterpreting the bytecode instructions used to read from and write to the heap. The physical VM heap data is kept consistent with the virtual clustered heap data across lock boundaries with the same semantics used to keep local thread memory consistent with heap data.
As a thread changes the fields of a clustered object, those changes are sent up to the main memory as the thread crosses synchronization boundaries. When POJO clustering is enabled, those changes are also sent up to the clustered heap.
Conversely, changes made by other threads in other VMs in a particular lock context are guaranteed to be applied prior to crossing the same synchronization boundary. This ensures that the objects in the local heap are always up to date with the clustered heap when a thread operates on them.
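The kind of code these rules apply to can be sketched in plain Java. The example below is a minimal, single-JVM illustration and does not use any clustering product; the class names SharedCounter and HeapConsistencyDemo are hypothetical. It shows field changes being published and observed across synchronized boundaries, which is the same boundary at which, according to the description above, a JVM-level cluster would also flush changes to and from the clustered heap.

```java
// Single-JVM sketch: all names here are illustrative assumptions.
class SharedCounter {
    private int value; // plain field, no clustering API involved

    void increment() {
        synchronized (this) { // changes become visible when the lock is released
            value++;
        }
    }

    int get() {
        synchronized (this) { // acquiring the lock guarantees an up-to-date view
            return value;
        }
    }
}

class HeapConsistencyDemo {
    // Starts several threads that all mutate the same object under its lock.
    static int runWorkers(int workers, int incrementsPerWorker) {
        SharedCounter counter = new SharedCounter();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerWorker; j++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 1000)); // prints 4000
    }
}
```

Nothing in SharedCounter mentions a cluster. The premise of JVM-level clustering is that code like this would stay unchanged while the threads run in different virtual machines.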
A singular advantage of clustering at the JVM level is the preservation of object identity across the cluster. As mentioned previously, true POJO clustering must preserve the semantics of the single virtual machine. Among other semantic features, object identity must be preserved as illustrated in Figure 1.
If Cain is sent across servers in a cluster and Abel is sent later, then both must point back at the same instance of Adam. Otherwise, Cain's Adam is not equal to Abel's Adam on one server, whereas the originating server contained only one Adam. In a true POJO cluster, this statement must always be true:
cain.father == abel.father == adam
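The statement above is shorthand (Java does not allow chaining == this way), but the invariant is easy to test. The following sketch, which uses standard Java serialization as a stand-in for any copy-based transfer between servers, shows how identity is lost when Cain and Abel are shipped separately. The class names Sibling and IdentityDemo are hypothetical, and the "remote" side is simulated inside one JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical domain class used only for this illustration.
class Sibling implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final Sibling father;

    Sibling(String name, Sibling father) {
        this.name = name;
        this.father = father;
    }
}

class IdentityDemo {
    // Simulates sending an object graph to another server by copying it
    // through Java serialization.
    static Sibling copyViaSerialization(Sibling original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Sibling) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Sibling adam = new Sibling("Adam", null);
        Sibling cain = new Sibling("Cain", adam);
        Sibling abel = new Sibling("Abel", adam);

        // On the originating server there is exactly one Adam.
        System.out.println(cain.father == abel.father); // prints true

        // Cain is sent first and Abel later, each in its own stream.
        Sibling remoteCain = copyViaSerialization(cain);
        Sibling remoteAbel = copyViaSerialization(abel);

        // Each transfer carried its own copy of Adam.
        System.out.println(remoteCain.father == remoteAbel.father); // prints false
    }
}
```

Each call to copyViaSerialization drags a fresh copy of Adam along with the child, so the receiving side ends up with two Adams where the sender had one.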
In most clustering frameworks, Adam, Cain, and Abel cannot refer to each other using native Java references/pointers. Instead, Cain and Abel must remember Adam only by ID. If objects reference each other by ID, then the issue of copies and clones of Adam floating around the cluster is avoided. Although this is a violation of the core Java specification, most frameworks in use today do exactly this (including Hibernate, OSCache, EhCache, clustered servlet containers, and proprietary grid solutions).
You may think: if developers are living with it, then what's the issue? There actually are several issues, including the following:
- Performance degrades when Adam is sent along the network with Cain and/or Abel.
- Third-party code cannot be clustered unless it provides explicit support for the given clustering methodology being used.
- Basic data structures can no longer be used.
Figure 2 illustrates the Adam/Cain/Abel problem in real-world terms.
Figure 2. A Real-World Case of the Adam/Cain/Abel Problem
In Figure 2, Account Positions refers to a centralized metadata description of all products. That metadata should only be referenced and not copied. That way, it can be changed centrally and the application can avoid redundant data storage. Serialization makes it impossible for a developer to use Java references and still have the application behave as depicted. Thus, a developer tears the domain model apart and manually maintains an ID-mapping between data structures, just like a database developer would across tables.
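A rough sketch of what that tearing apart looks like in code follows. All class and field names (ProductInfo, ReferencePosition, IdPosition, IdMappingDemo) are hypothetical and are not taken from Figure 2; the catalog map stands in for whatever lookup structure an application would maintain by hand.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical centralized product metadata.
class ProductInfo {
    final String id;
    String description;

    ProductInfo(String id, String description) {
        this.id = id;
        this.description = description;
    }
}

// Reference style: the position points directly at the shared metadata object.
class ReferencePosition {
    final ProductInfo product;
    final int quantity;

    ReferencePosition(ProductInfo product, int quantity) {
        this.product = product;
        this.quantity = quantity;
    }
}

// ID style: the position remembers only a key and must look the metadata up.
class IdPosition {
    final String productId;
    final int quantity;

    IdPosition(String productId, int quantity) {
        this.productId = productId;
        this.quantity = quantity;
    }
}

class IdMappingDemo {
    static String describe(ReferencePosition p) {
        return p.quantity + " x " + p.product.description;
    }

    // The caller must supply and maintain the lookup table, much like a
    // foreign-key join between database tables.
    static String describe(IdPosition p, Map<String, ProductInfo> catalog) {
        ProductInfo product = catalog.get(p.productId);
        if (product == null) {
            throw new IllegalStateException("Unknown product id: " + p.productId);
        }
        return p.quantity + " x " + product.description;
    }

    public static void main(String[] args) {
        Map<String, ProductInfo> catalog = new HashMap<>();
        ProductInfo bond = new ProductInfo("P-1", "10-year bond");
        catalog.put(bond.id, bond);

        System.out.println(describe(new ReferencePosition(bond, 5)));   // prints 5 x 10-year bond
        System.out.println(describe(new IdPosition("P-1", 5), catalog)); // prints 5 x 10-year bond
    }
}
```

Both styles produce the same answer, but the ID style forces every caller to carry the catalog around and to handle a missing key, which is the bookkeeping the reference style gets for free from the language.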
To be truly powerful, the POJO container must be able to cluster data structures without copying the structures and subsequently violating all object relationships across those structures.
Clustered thread coordination is achieved by reinterpreting the thread coordination mechanisms built into the Java language spec and the JVM: the synchronized keyword (the MONITORENTER and MONITOREXIT bytecode instructions) and the methods Object.wait(), Object.notify(), and Object.notifyAll().
MONITORENTER is extended to contend for the cluster-wide lock for a given object in addition to the local VM lock for that object. Likewise, MONITOREXIT releases both the local VM lock and the cluster-wide lock for that object. Object.wait(), Object.notify(), and Object.notifyAll() are similarly extended to have a cluster-wide meaning as well as a local VM meaning.
Through these mechanisms, cluster-wide thread coordination can be provided at the JVM-level, preserving the existing semantics of thread coordination but extending to interactions between all threads in the cluster.
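As an illustration of the coordination primitives named above, here is a minimal single-JVM sketch of a one-slot handoff built only on synchronized, Object.wait(), and Object.notifyAll(). The class names Mailbox and CoordinationDemo are hypothetical and no clustering library is involved; the point is only to show the kind of unmodified code whose semantics a JVM-level cluster is described as extending to threads in other virtual machines.

```java
import java.util.ArrayList;
import java.util.List;

// One-slot mailbox coordinated with the built-in monitor methods.
class Mailbox {
    private String message; // null means the slot is empty

    synchronized void put(String m) throws InterruptedException {
        while (message != null) {
            wait(); // releases the monitor until a taker empties the slot
        }
        message = m;
        notifyAll(); // wakes any thread waiting in take()
    }

    synchronized String take() throws InterruptedException {
        while (message == null) {
            wait(); // releases the monitor until a producer fills the slot
        }
        String m = message;
        message = null;
        notifyAll(); // wakes any thread waiting in put()
        return m;
    }
}

class CoordinationDemo {
    // Passes each message from the calling thread to a consumer thread.
    static List<String> exchange(String... messages) {
        Mailbox box = new Mailbox();
        List<String> received = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < messages.length; i++) {
                    received.add(box.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (String m : messages) {
                box.put(m);
            }
            consumer.join(); // join makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(exchange("a", "b", "c")); // prints [a, b, c]
    }
}
```

In a single JVM the producer and consumer share one monitor on the Mailbox instance. Under the model described in this article, the same monitor would be backed by a cluster-wide lock, so the two threads could live on different servers.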