he current enterprise Java stack contains facilities for clustering the application server and application frameworks. To the consternation of enterprise Java developers, and consequently, to operators, application state objects lack similar facilities. They don’t have a simple, fast, and reliable clustering mechanism.
Older approaches for maintaining application state favored putting state objects into a session or application context. However, past attempts at clustering those contexts have proven unsuccessful, especially for deployment beyond two nodes. So the tide has turned away from clustering state data in plain old Java objects (POJOs) and toward a “stateless” model, where the application state is externalized from the Java heap entirely. Technologies such as Hibernate, JDO, and even raw JDBC enable this model. While this approach may work well for marshalling business data back and forth between the application server and the system where the data lives, it is clumsy and difficult to use. It also scales poorly when clustering data that represents application state.
This article proposes a simple alternative: clustering POJOs at the JVM level. It discusses what JVM-level clustering is, how it works, and how it performs. It also explores how developers can leverage POJO clustering in order to solve real day-to-day problems with existing frameworks, as well as how POJO clustering might form the core of a POJO container?a new application container to provide enterprise-class operations based on simple user library and JDK library objects.
How POJO Clustering Works
POJOs recently have adopted a false definition. It now seems any object that (1) is not a Bean, (2) does not have a management context, and (3) can compile using imported libraries and an off-the-shelf Java compiler is considered a POJO. But objects are not POJOs just because an application can invoke a library or framework and those invocations successfully compile into the application bytecode. Somehow, however, the vendor community has managed to usurp the POJO momentum from Aspect Oriented Programming and repurpose it to describe a subjective lack of complexity to the interfaces that an application must implement.
Whether or not an object is a POJO actually refers to the ability of a library or framework to maintain the semantics of the Java language specification. POJOs are plain objects that have primitive fields and references to other plain old objects. You should be able to access their data through regular references and lock on them through the built-in concurrency primitives. Any two references to the same logical object should actually point to the same object in the heap. A library or framework exhibiting any other behavior is not POJO based.
Maintaining the simple, built-in behavior of objects across a cluster has proven an elusive goal. That is, until now. How, then, does POJO clustering work? How do you deliver the same semantics to operations between threads in different virtual machines that you get between threads in the same virtual machine? The answer is by dropping clustering services down from the API level (EJB, JDO, Hibernate, etc.) to the JVM level. Investing the JVM with clustering services at its core bytecode interpretation layer yields simple and seamless clustering behavior. By virtualizing the JVM heap and thread coordination calls, clustering becomes a natural extension of the virtual machine. Once that’s done, clustering everything else becomes easier.
The following are the three core components to JVM-level clustering:
- Shared virtual heap
- Object identity maintenance
- Clustered locking
Let’s examine each of these components.
Clustered Heap
The first step to heap virtualization is intercepting access between threads and the local heap and injecting cluster-awareness. This is done by reinterpreting the bytecode instructions used to read from and write to the heap. The physical VM heap data is kept consistent with the virtual clustered heap data across lock boundaries with the same semantics used to keep local thread memory consistent with heap data.
As a thread changes the fields of a clustered object, those changes are sent up to the main memory as the thread crosses synchronization boundaries. When POJO clustering is enabled, those changes are also sent up to the clustered heap.
Conversely, changes made by other threads in other VMs in a particular lock context are guaranteed to be applied prior to crossing the same synchronization boundary. This ensures that the objects in the local heap are always up to date with the clustered heap when a thread operates on them.
Object Identity
A singular advantage of clustering at the JVM level is the preservation of object identity across the cluster. As mentioned previously, true POJO clustering must preserve the semantics of the single virtual machine. Among other semantic features, object identity must be preserved as illustrated in Figure 1.
Figure 1. Object Identity Must Be Preserved |
If Cain is sent across servers in a cluster and Abel is sent later, then both must point back at the same instance of Adam. Otherwise, Cain’s Adam is not equal to Abel’s Adam on one server, whereas the originating server contained only one Adam. In a true POJO cluster, this statement must always be true:
cain.father == abel.father == adam
In most clustering frameworks, Adam, Cain, and Abel cannot refer to each other using native Java references/pointers. They must instead refer to each other by having Cain and Abel remember Adam only by ID. If objects reference each other by ID, then the issue of copies and clones of Adam floating around the cluster is avoided. Although this is a violation of the core Java specification, in fact, most frameworks in use today carry it out (including Hibernate, OSCache, EhCache, clustered servlet containers, and proprietary grid solutions).
You may think: if developers are living with it, then what’s the issue? There actually are several issues, including the following:
- Performance degrades when Adam is sent along the network with Cain and/or Abel.
- Third-party code cannot be clustered unless it provides explicit support for the given clustering methodology being used.
- Basic data structures can no longer be used.
Figure 2 illustrates the Adam/Cain/Abel problem in real-world terms.
Figure 2. A Real-World Case of the Adam/Cain/Abel Problem |
In Figure 2, Account Positions refers to a centralized metadata description of all products. That metadata should only be referenced and not copied. That way, it can be changed centrally and the application can avoid redundant data storage. Serialization makes it impossible for a developer to use Java references and still have the application behave as depicted. Thus, a developer tears the domain model apart and manually maintains an ID-mapping between data structures, just like a database developer would across tables.
To be truly powerful, the POJO container must be able to cluster data structures without copying the structures and subsequently violating all object relationships across those structures.
Clustered Locking
Clustered thread coordination is achieved by reinterpreting the thread coordination mechanisms built into the Java language spec and the JVM: the synchronized keyword (the MONITORENTER and MONITOREXIT bytecode instructions) and the methods Object.wait(), Object.notify(), and Object.notifyAll().
MONITORENTER is extended to contend for the cluster-wide lock for a given object in addition to the local VM lock for that object. Likewise, MONITOREXIT releases both the local VM lock and the cluster-wide lock for that object. Object.wait(), Object.notify(), and Object.notifyAll() are similarly extended to have a cluster-wide meaning as well as a local VM meaning.
Through these mechanisms, cluster-wide thread coordination can be provided at the JVM-level, preserving the existing semantics of thread coordination but extending to interactions between all threads in the cluster.
How POJO Clustering Scales
At enterprise scale where application infrastructure will be called upon to move large volumes of data in a cluster of hundreds of application servers, as well as handle disaster recovery and datacenter-level fault tolerance, critical performance and availability features emerge from POJO clustering.
POJO clustering scales by increasing locality of reference, reducing the working set of data upon which any cluster node operates, and enabling on-the-fly lock optimization. All of these features reduce the number of messages that must be passed around the cluster as well as the amount of data those messages must contain.
Locality of Reference
The term locality of reference here means where data is kept and accessed in a system relative to its home in the system of record for that data and relative to the worker operating on that data. In a cluster that can maintain a centralized view of which nodes have references to which objects, object data can be pushed where it’s needed, when it’s needed. This can reduce the latency perceived by each node because the data it needs is already there. If cluster nodes have a mechanism for maintaining a constrained window on the clustered data, the cluster can likewise reduce the amount of data that must move across the cluster, since only the nodes that actually need to see changes for particular objects are required to receive changes for those objects.
Networked Virtual Memory
This data windowing functionality is provided by networked virtual memory. As references to clustered POJOs are traversed by an application running in the container, they are automatically faulted in from the clustered object store by the container as needed. If the referenced object is already in the heap, the reference traversal is a normal GETFIELD bytecode instruction. If that object is not currently in the heap, the container faults the object into the heap automatically. Once the object has been pulled into the container, the GETFIELD instruction is allowed to proceed normally.
Conversely, as in-heap clustered objects become less frequently used, the POJO container is free to flush those objects out of the heap. If they are ever needed again, they may be faulted back in, but flushing unused objects keeps heap usage to a fixed size.
While a clustered object is in the heap, that container instance must subscribe to changes for that object to ensure that it is up to date (this happens automatically as a service of the container). Therefore, keeping the virtual memory window constrained to just those objects that the container needs drastically improves performance since it isn’t processing updates to objects that it doesn’t care about.
Any object access patterns that emerge at runtime from the application will automatically be reflected in the set of clustered objects that is resident in a particular container’s heap. Windowing via dynamic, network-attached virtual memory provides a simple and efficient way to provide access to a working set of data, thereby keeping per-container heap usage under control and maintaining a high degree of locality of reference that can improve over time as the working set emerges.
Scalable Lock Performance
The same centralized view of the cluster that allows locality of reference to be optimized can also be used to optimize lock acquisition. A lock that is not contended for by other nodes can be granted to the node that wants it so that subsequent releases and acquisitions of that lock can be made locally instead of going out on the cluster. If there is no contention on that lock by any other thread, or if the omniscient view can tell that a thread is working only on its own data frame, lock acquisition and release can be made a no-op.
Introducing the POJO Container
POJO clustering provides a good foundation for a new, simpler, scaled-out architecture, but it is not enough by itself. Applications built around POJO clustering should be contained by some well-understood design patterns to ensure consistent and predictable application architecture and performance. What emerges from a system of consistent application design layered on top of POJO clustering services is what Terracotta calls the POJO container––a stack of services and practices based on simple POJOs that exhibit enterprise-scale performance, availability, and control when deployed in a production datacenter.
The UnitofWork Pattern
Inside the POJO container, Terracotta turns to the Unit of Work pattern to provide a simple and consistent application architecture. Think of the Unit of Work pattern inside the POJO container in terms of the classic design pattern of master and workers. The Unit of Work pattern consists of a master, a work queue, a results queue, and workers. The master communicates with the workers simply by introducing units of work into the work queue. Workers then remove those units of work, execute the work, and put any necessary responses onto the response queue for master aggregation and summary.
This design pattern is easy to use and run in a single JVM and Java 1.5 offers some built-in support for it, but deploying Unit of Work applications in a cluster today requires some heavyweight infrastructure (such as a JMS message bus), which inevitably violates the POJO principle. Therefore, the Unit of Work pattern institutionalized inside a cluster-ready container is the core of the POJO container. Figure 3 illustrates the container’s basic design.
Figure 3. Basic Design of the POJO Container |
To deliver simplicity, the POJO container demands that a developer implement only the Unit of Work interface that, in pseudo-code, looks something like this:
Interface UnitOfWork { Sink routeMe(); // Return a unique worker key // so that work can easily be // delivered to the same // worker across invocations };Interface WorkHandler { void handleWork(UnitOfWork work); // business implementation}
The Master is passed an object that implements the UnitOfWork interface. The master then calls the routeMe() method from which the UnitofWork object returns a reference to the Sink that will send the unit of work to a worker. The master then sends that unit of work down the sink to the worker. Workers de-queue only the work they logically own. When they do, they pass the unit of work to the handler attached to it that contains the business logic for the given type of work. The worker then en-queues the return result onto the results queue. This logical structure is a loop in which every time a UnitOfWork is en-queued by any member of the cluster, routeMe() gets called. This means work can pass from worker to worker, not simply between master and worker.
Simple and Scalable
The Unit of Work pattern in the POJO container can be very simple. A lead developer can implement the master. Workers needn’t be implemented at all because they are application agnostic. They implement the following simple loop:
- Call queue.pop();
- Block until queue.pop() returns;
- Get a UnitofWork off the queue;
- Fire UnitofWork.doWork();
- Fire resultsQueue.push( return result ); and
- Loop
The average developer need develop only Units-Of-Work (UoWs) and handlers for different types of UoWs. The only caveat to this is that when UoWs depend on other UoWs they are no longer atomic. In such cases, once amongst the master, the UoW or the worker has to carry with it some notion of state. To maintain simplicity, the UoW is the best place to maintain this state. A worker can fire doWork(), and then re-enqueue the work for itself to pick up at a later time. This way, workers are stateless and completely scalable, restartable, and all those good operational traits that IT likes.
But what of scalability? Scaling the POJO container does not require understanding the blackbox networking infrastructure that is moving the objects amongst the JVMs. In fact, the biggest challenge is around bottlenecking on a single queue since that queue functions as the routing core for communications amongst master and workers. For the average application, this is rarely an issue.
At high data volumes, a single work queue will most certainly be a bottleneck. The solution is to create a queue per worker and put the load-balancing work on the master. With a queue per worker, the mutations to the queue data structure will not be broadcast to the cluster, lowering the container’s tendency to bottleneck on the network as more workers are introduced. Furthermore, multiple load-balancing schemes can be used because the master now has discrete queues per worker. Consider a few examples:
- Round-robin would require the master to en-queue each UnitOfWork into workers’ queues, one after the other.
- Workload-sensitive balancing would require the master to look at each worker’s queue depth and en-queue work to the least utilized worker.
- Data affinity (a.k.a. sticky load balancing) would require the developer to implement a routeMe() method that sends the same class of work to the same worker each time.
The scalability characteristics of applications developed against the Unit of Work pattern and deployed in a clustered POJO container offer significant advantages. Workers can be introduced on demand. Load balancing techniques can be introduced separately from the basic business logic. The average developer does not see threads and concurrency in this event-driven model and so will tend to introduce bugs that are more of a functional scope than an infrastructure or performance scope; such bugs are both easier to reproduce and to automate tests for.
The key to clustered POJO-based application development is to have access to the source behind the Master/Worker/UoW interfaces so that lead-level developers can freely manipulate workers, master, routing/load balancing, and work-state. The container loses its power to simplify when it begins to constrain instead of abstract?or contain?development. A blackbox UnitOfWork engine not based on POJOs will tend toward constraint since, logically, a one-size-fits-all abstraction doesn’t exist.
UnitOfWork Pattern Case Study
Let’s look now at a real-world use case from a Fortune 50 company as an example of how an application might be designed using the UnitOfWork pattern. A dataset lives in three separate systems of record that must all be reconciled as changes arrive. Each system of record is updated asynchronously, so the reconciliation engine must wait until a change event arrives from each of the three systems of record before the reconciliation begins.
Figure 4. Complete Diagram of Example Use Case |
The UnitOfWork pattern can be applied to this use case in a number of ways. For this example,the two UnitOfWork types are Change and ChangeSet. Change represents a change event received by one of the systems of record, and ChangeSet represents the three required events from the different systems of record bundled together. The two WorkHandlers are a coordinator and a reconciler. The coordinator determines when all of the change events for a given logical change have arrived from each system of record. When all the change events have arrived, it passes the set of changes to a reconciler that receives ChangeSets and does the actual reconciliation. (See Figure 4 for a complete diagram of the use case).Here is the sample code for the two WorkHandler implementations:
public class Coordinator implements WorkHandler { // The pending map should be configured to be a // clustered Map so all WorkHandler instances see // the same pending ChangeSets. private Map pending; private ChangeSetFactory modSetFactory; // ... constructor, etc. public void handleWork(UnitOfWork work) { Change modification = (Change) work; ChangeID modID = modification.getModID(); // Since the pending Map is clustered, this // becomes a clustered synchronization synchronized (this.pending) { ChangeSet mods = getPendingSet(modID); mods.add(modification); if (mods.size()< 3) { this.pending.put(modID, mods); } else { Sink completeSink = mods.routeMe(); completeSink.put(mods); pending.remove(mods); } } private ChangeSet getPendingSet(ChangeID modID) { ChangeSet pendingFor = (ChangeSet) pending.get(modID); if (pendingFor == null) { pendingFor = modSetFactory.newModificationSet(); } return pendingFor; }}public class Reconciler implements WorkHandler { public void handleWork(UnitOfWork work) { ChangeSet mods = (ChangeSet) work; reconcile(mods); } private void reconcile(ChangeSet mods) { // do whatever needs to be done to reconcile // the modifications }}
The master will receive change events from the different systems of record and put them on the queue for the appropriate worker based on the ChangeID. Each worker will have a WorkHandler attached to it. Workers with Coordinator handlers attached to them will receive Change units of work, determine whether or not all the changes have arrived, and either make it pending or forward it on to the Reconciler queue. Workers with Reconciler handlers attached to them will receive complete ChangeSets and do the work of reconciling the Changes found in those ChangeSets. If you make the pending Map of Changes in the Coordinator a clustered map, Change events can be routed blindly to any Coordinator worker and, if any Coordinator worker fails, other Coordinators can pick up where it left off.
In the POJO container, any number of workers can be executed on threads in any number of container instances. Each worker’s queue is implemented as a simple library queue (e.g., LinkedBlockingQueue) and clustered using the POJO clustering techniques described earlier. The cluster can be dynamically load balanced such that new container instances can join the cluster on the fly as can the number of Worker threads devoted to each type of work. As a new worker comes online, its work queue is registered in a clustered data structure such that it becomes visible to the Master and ready to receive new work.
Toward the POJO Container
POJO clustering at the JVM level provides a simple, seamless way for applications to achieve scale-out. Clustered heap, object identity, and clustered locking can all be delivered as a service of the virtual machine. A flexible UnitofWork pattern layered on top can deliver consistent and predictable application design that can be tuned for performance without massive refactoring. What emerges is a stack of services and practices based on simple POJOs that exhibit enterprise-scale performance, availability, and control: the POJO container.