All conversational Web applications are stateful, but the question is where to keep the state. For HTTP session clustering, the answer is to keep session in memory, close to the processing context where it is needed. This is because data stored in session has no business value; session is just a temporary place to keep state while an application converses with a browser. The conversation occurs over the course of several network calls that take a significant amount of time (seconds, minutes, or hours).
Still, popular opinion regarding which application design is right for clustering HTTP sessions has swung back and forth between stateful and stateless. Today, the pendulum hovers squarely over stateless application design, but the Java community needs stateful application design. Many critical frameworks that Java developers use to build Web applications rely on session to be durable across requests; in other words, they are stateful. Struts, Spring Web Flow, RIFE, AJAX, and others keep track of per-user information that needs to remain in memory. Without it, the application can survive logically, but the end-user experience is not ideal. Shopping carts get lost; checkout starts again; log-ins are dropped and users have to re-enter passwords; complicated multi-stage Web forms have to be filled in from scratch, and so on.
These frameworks rely on session to be a durable yet short-term bucket for state. And why shouldn't they? If clustering didn't rely on Java serialization, and it could be reasonably scalable without significant tool integration and customization, the various session interfaces (servlet, HTTP, etc.) would be the perfect place to store this information.
Moreover, stateful is easier to code to, more scalable, and more cost effective. And a new class of infrastructure called network-attached memory (NAM) makes stateful possible in a whole new way by enabling developers to avoid Java serialization while storing state in a central place.
The Challenges of Session Clustering
In-memory session replication has long been too hard, because it attempts to deliver the highest availability without compromising scalability. Yet it tends to fail at delivering either:
- For availability, highly scalable clustering solutions copy session to only one backup location, leaving room for failure.
- For scalability, truly available solutions copy session to every possible location and bottleneck on the network at as few as three application server nodes.
Availability and scalability conflict because anything not written to disk is not highly available, yet any information written to disk/database or to messaging services slows the application down. Session clustering architectures further challenge engineers because Web apps usually need to keep track of lots of per-user state (if they are not static content sites). So the clustering workload increases with end-user traffic, and scalability issues appear over and over again in a single production application.
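The trade-off between the two replication strategies can be put into rough numbers. The following is a minimal back-of-the-envelope sketch, not a model of any particular application server: the class name `ReplicationCost`, the update rate, and the session size are all illustrative assumptions. It counts how many copies of a changed session go onto the network under a single-backup scheme versus a copy-to-every-node scheme.

```java
// Hypothetical model: names, rates, and sizes are assumptions for illustration only.
final class ReplicationCost {

    /** Copies sent per session update when each session has exactly one backup node. */
    static long copiesWithSingleBackup(int nodes) {
        return nodes > 1 ? 1 : 0;
    }

    /** Copies sent per session update when every other node receives the change. */
    static long copiesWithFullReplication(int nodes) {
        return Math.max(0, nodes - 1);
    }

    /** Replication bytes placed on the network per second across the cluster. */
    static long bytesPerSecond(long copiesPerUpdate, long updatesPerSecond, long sessionBytes) {
        return copiesPerUpdate * updatesPerSecond * sessionBytes;
    }

    public static void main(String[] args) {
        long updatesPerSecond = 1_000;  // assumed cluster-wide session update rate
        long sessionBytes = 3 * 1024;   // assumed serialized session size

        for (int nodes : new int[] {2, 3, 8}) {
            long single = bytesPerSecond(copiesWithSingleBackup(nodes), updatesPerSecond, sessionBytes);
            long full = bytesPerSecond(copiesWithFullReplication(nodes), updatesPerSecond, sessionBytes);
            System.out.printf("%d nodes: single backup %d B/s, full replication %d B/s%n",
                    nodes, single, full);
        }
    }
}
```

Under these assumptions the single-backup cost stays flat as nodes are added, while the full-replication cost grows with cluster size, and it grows faster still if the update rate itself rises with each added node.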
Some application servers support session clustering. For example, BEA's WebLogic Server solves the scalability/availability trade-off by clustering sessions between only two servers, regardless of cluster size. Tomcat and JBoss provide more options in terms of where to store and how many places to store session information. Still other application servers cannot provide session clustering at all.
In many of the most common use cases, the problem is purely about reliability at small scale. Specifically, if the cluster is small enough (two or three servers) or is failing rapidly, the clustering technology seems to bring that cluster down faster. However, if the cluster is large enough or geographically distributed across multiple datacenters, some of these clustering solutions do not scale.
Other use cases do not involve availability at all. In these cases, developers cannot use session clustering technologies because the application uses software libraries (such as RIFE) that do not store information in session and are not serialization-safe.
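A small sketch of what "not serialization-safe" means in practice follows. The types `LibraryState` and `Cart` are hypothetical stand-ins, not classes from RIFE or any other real library; the point is only that one non-serializable object anywhere in the graph is enough to make standard Java serialization reject the whole session attribute.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationCheck {

    /** Hypothetical stand-in for a third-party type that does not implement Serializable. */
    static final class LibraryState {
        int step = 1;
    }

    /** Application state that looks session-ready but references LibraryState. */
    static final class Cart implements Serializable {
        private static final long serialVersionUID = 1L;
        final LibraryState flow = new LibraryState();
    }

    /** Returns true if Java serialization can write the whole object graph. */
    static boolean isSerializable(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            // Thrown when any reachable object lacks the Serializable marker.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("String serializable: " + isSerializable("hello"));
        System.out.println("Cart serializable:   " + isSerializable(new Cart()));
    }
}
```

A check along these lines could be run in a test against every object an application places in session, so that failures surface before a replicating cluster encounters them.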
Without going into detail for every use case, Table 1 lists the problems with some common session clustering cases.
|Use Case|Problem|
|---|---|
|Frequent add/delete of servers|Sessions need to move across servers and can place inordinate burdens on memory and network along the way.|
|Open source and other third-party libraries|Incompatible with native Java serialization.|
|Wide area networking (WAN) and disaster recovery|Sharing session across the WAN suffers from network latency issues. App servers can crash if network latency increases beyond some app-specific threshold when sharing session updates.|
|I/O bottlenecking and controlling session size|Memory and network footprint size is always changing because developers can put anything into session, and load/performance testing will not necessarily catch the problems before production. As an example, one client reported to the author that his BEA WebLogic-based application needed to maintain session below 3Kb. He was convinced that any larger session would crash his cluster when replicating sessions.|

Table 1. Common Challenges for Session Clustering Use Cases
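
Because developers can put anything into session, one way to keep session size visible is to measure it. The sketch below is an assumption-laden illustration rather than a feature of any application server: the class `SessionSizeCheck` and its methods are hypothetical, a plain `HashMap` stands in for the session's attributes, and the budget of 3,072 bytes reads the "3Kb" figure in Table 1 as kilobytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

final class SessionSizeCheck {

    /** Returns the number of bytes Java serialization produces for the given value. */
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    /** True if the serialized form is within the byte budget. */
    static boolean fitsBudget(Serializable value, int budgetBytes) {
        return serializedSize(value) <= budgetBytes;
    }

    public static void main(String[] args) {
        int budgetBytes = 3 * 1024; // assumed budget

        // Stand-in for session attributes; a real check would read them from the session.
        HashMap<String, Object> attributes = new HashMap<>();
        attributes.put("userId", "u-1001");
        System.out.println("small session fits: " + fitsBudget(attributes, budgetBytes));

        attributes.put("uploadPreview", new byte[4096]); // a careless addition
        System.out.println("large session fits: " + fitsBudget(attributes, budgetBytes));
    }
}
```

A check like this only reports the serialized size at one moment; it does not replace load testing, since session contents change as users move through the application.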