Garbage Collectors: Spoiled for Choice
|"The Java Virtual Machine assumes no particular type of automatic storage management system, and the storage management technique may be chosen according to the implementor's system requirements."
—- Java Virtual Machine Specification, Section 3.5.3 [JVMS2 1999]
In fact, the Java HotSpot VM has three different garbage collectors, each of which is targeted to a different set of applications. The next three sections describe them.
The Serial Collector
The configuration of the Serial Collector is a young generation over an old generation managed by a sliding compacting mark-sweep, also known as a mark-compact, collector. Both minor and major collections take place in a stop-the-world fashion (i.e., the application is stopped while a collection is taking place). Only after the collection has finished is the application restarted (see Figure 6).
The mark-compact collector first identifies which objects are still live in the old generation. It then slides them towards the beginning of the heap, leaving any free space in a single contiguous chunk at the end of the heap. This allows any future allocations into the old generation, which will most likely take place as objects are being promoted from the young generation, to use the fast bump-the-pointer technique. Figure 7 illustrates the operation of such a collector (in sub-figure a, objects marked with a gray X are assumed to be garbage, and the shaded area in sub-figure b denotes free space).
The Serial Collector is the collector of choice for most applications that do not have low pause time requirements and run on client-style machines. It takes advantage of only a single CPU for garbage collection work (hence, its name). Still, on today's hardware, the Serial Collector can efficiently manage a lot of non-trivial applications with a few 100MBs of heap, with relatively short worst-case pauses (around a couple of seconds for major collections).
|Figure 7. Compaction of the Old Generation|
The Parallel Collector: Throughput Matters!
These days, a lot of important Java applications run on (sometimes dedicated) servers with a lot of physical memory and multiple CPUs. Ideally, the garbage collector can take advantage of all available CPUs and not leave most of them idle while only one does garbage collection work.
To decrease garbage collection overhead and hence increase application throughput, on server-style machines, the Java HotSpot VM includes the Parallel Collector, also called the Throughput Collector. Its operation is similar to that of the Serial Collector (i.e., it is a stop-the-world collector with the young generation over a mark-compact old generation). However, the minor collections take place in parallel, using all available CPUs (see Figure 8). The major collections are performed serially, but there are plans to "parallelize" them in the near future.
Applications that can benefit from the Parallel Collector are those that do not have pause time constraints (as infrequent—but potentially long—major collections will still occur), that run on machines with more than one CPU, and that don't have a need for end-to-end throughput. Examples of such applications include batch processing, scientific computing, etc. The Parallel Collector, compared to the Serial Collector, will improve minor collection efficiency and as a result will improve application throughput. Whereas the major collection times will remain largely unchanged (as, in both cases, they are done serially and with the same algorithm).
|Figure 8. The Parallel Collector: Throughput Matters!|
The Mostly-Concurrent Collector: Latency Matters!
For a number of applications, end-to-end throughput is not as important as fast response time. In the stop-the-world model, when a collection is taking place, the application itself is not running and external requests will not be satisfied until the application is restarted. Minor collections do not typically cause very long pauses. However, major collections, even though infrequent, can impose very long pauses, especially when large heaps are involved.
To deal with this, the Java HotSpot VM includes the Mostly-Concurrent Collector, also known as the Concurrent Mark-Sweep (CMS) or the Low Latency Collector. It manages its young generation the same way the Parallel and Serial Collectors do. Its old generation, however, is managed by an algorithm that performs most of its work concurrently, imposing only two short pauses per collection cycle.
Figure 9 illustrates how a collection cycle works in the Mostly-Concurrent Collector. It starts with a short pause, called initial mark, that identifies the set of objects that are immediately reachable from outside the heap. Then, during the concurrent marking phase, it marks all live objects that are transitively reachable from this set. Because the application is running and updating reference fields (hence, modifying the object graph) while the marking phase is taking place, not all live objects are guaranteed to be marked at the end of the concurrent marking phase. To deal with this, the application stops again for a second pause, called remark, which finalizes marking by revisiting any objects that were modified during the concurrent marking phase. The card table data structure is re-used to also keep track of modified objects. Because the remark pause is more substantial than the initial mark, it is "parallelized" to increase its efficiency. At the end of the remark phase, all live objects in the heap are guaranteed to have been marked. Since revisiting objects during the remark phase increases the amount of work the collector has to do, its overhead increases as well. This is a typical trade-off for most collectors that attempt to reduce pause times.
|Figure 9. The Mostly-Concurrent Collector: Latency Matters!|
Having identified all live objects in the old generation, the final phase of the collection cycle is the concurrent sweeping phase, which sweeps over the heap, de-allocating garbage objects in-place without relocating the live ones. Figure 10 illustrates the operation of the sweeping phase: in sub-figure a, objects marked with a gray X are assumed to be garbage, and the shaded areas in sub-figure b denote free space. In sub-figure b, free space is not contiguous (unlike in the previous two collectors, as illustrated in Figure 7), and the collector needs to employ a data structure (free lists, in this case) that records which parts of the heap contain free space. As a result, allocation into the old generation is more expensive, as allocation from free lists is not as efficient as the bump-the-pointer technique. This imposes extra overhead to minor collections, as most allocations in the old generation take place when objects are promoted during minor collections.
Another disadvantage that the Mostly-Concurrent Collector has, which the previous two don't, is that it typically has larger heap requirements. A few reasons explain why. First, a concurrent marking cycle will last much longer than that of a stop-the-world collector. And it is only during the sweeping phase that space is actually reclaimed. Given that the application is allowed to run during the marking phase, it is also allowed to allocate memory, hence the occupancy of the old generation potentially will grow during the marking phase and drop only during the sweeping phase. Additionally, despite the collector's guarantee to identify all live objects during the marking phase, it doesn't actually guarantee that it will identify all objects that are garbage. Some objects that will become garbage during the marking phase may or may not be reclaimed during the cycle. If they are not, then they will be reclaimed during the next cycle. Garbage objects that are wrongly identified as live are usually referred to as floating garbage.
Finally, fragmentation issues due to the lack of compaction might also prevent the collector from using the available space as efficiently as possible. If the old generation is full before the collection cycle in progress has actually reclaimed sufficient space, the Mostly-Concurrent Collector will revert to an expensive stop-the-world compacting phase, similar to that of the Parallel and Serial Collectors.
Compared to the Parallel Collector, the Mostly-Concurrent Collector decreases old-generation pauses—sometimes dramatically—at the expense of slightly longer young generation pauses, some reduction in throughput, and extra heap size requirements. Due to its concurrency, it also takes CPU cycles away from the application during a collection cycle. Applications that can benefit from it are ones that require fast response times (such as data-tracking servers, Web servers, etc.), and it is in fact widely used in this context.