he Java virtual machine (JVM) specification dictates that any JVM implementation must include a garbage collector (GC) to reclaim unused memory (i.e., unreachable objects) [JVMS2 1999]
. However, the behavior and efficiency of a garbage collector can heavily influence the performance and responsiveness of any application that relies on it. This article gives an introduction to the garbage collection techniques that Sun Microsystems Inc. adopted in the Java HotSpot Virtual Machine, its production Java virtual machine. The aim is for the reader to gain a better understanding of how garbage collection in the Java HotSpot VM works and, as a result, be able to take full advantage of it when designing, developing, and deploying their applications.
|"Heap storage for objects is reclaimed by an automatic storage management system (typically a garbage collector); objects are never explicitly deallocated."
— Java Virtual Machine Specification, Section 3.5.3 [JVMS2 1999]
Generational Garbage Collection
The Java HotSpot VM uses a generational garbage collector, a well-known technique that exploits the following two observations:
- Most allocated objects will die young.
- Few references from older to younger objects exist.
|Figure 1. Generational Garbage Collection|
These two observations are collectively known as the weak generational hypothesis, which generally holds true for Java applications. To take advantage of this hypothesis, the Java HotSpot VM splits the heap into two physical areas, which are referred to as generations, one young and the other old:
- Young Generation. Most newly allocated objects are allocated in the young generation (see Figure 1), which is typically small and collected frequently. Since most objects in it are expected to die quickly, the number of objects that survive a young generation collection (also referred to as a minor collection) is expected to be low. In general, minor collections are very efficient because they concentrate on a space that is usually small and is likely to contain a lot of garbage objects.
- Old Generation. Objects that are longer-lived are eventually promoted, or tenured, to the old generation (see Figure 1). This generation is typically larger than the young generation and its occupancy grows more slowly. As a result, old generation collections (also referred to as major collections) are infrequent, but when they do occur they are quite lengthy.
To keep minor collections short, the GC must be able to identify live objects in the young generation without having to scan the entire (and potentially larger) old generation. It can achieve this in several ways. In the Java HotSpot VM, the GC uses a data structure called a card table. The old generation is split into 512-byte chunks called cards. The card table is an array with one byte entry per card in the heap. Every update to a reference field of an object also ensures that the card containing the updated reference field is marked dirty
by setting its entry in the card table to the appropriate value. During a minor collection, only the areas that correspond to dirty cards are scanned to potentially discover old-to-young references (see Figure 2
In cooperation with the bytecode interpreter and the just-in-time (JIT) compiler, the Java HotSpot VM uses a write barrier to maintain the card table. This barrier is a small fragment of code that sets an entry of the card table to the dirty value. The interpreter executes a write barrier every time it executes a bytecode that updates a reference field. Additionally, the JIT compiler emits the write barrier after emitting the code that updates a reference field. Although write barriers do impact performance on the execution a bit, their presence allows for much faster minor collections, which typically improves the end-to-end throughput of an application.
A big advantage of generational garbage collection is that each generation can be managed by the garbage collection algorithm that is most appropriate for its characteristics. A fast GC usually manages the young generation, as minor collections are very frequent. This GC might be a little space wasteful, but since the young generation typically is a small portion of the heap, this is not a big problem. On the other hand, a GC that is space efficient usually manages the old generation, as the old generation takes up most of the heap. This GC might not be quite as fast, but because major collections are infrequent, it doesn't have a big performance impact.
To take full advantage of generational garbage collection, applications should conform to the weak generational hypothesis, as it is what generational garbage collection exploits. For the Java applications that do not do so, a generational garbage collector unfortunately might add more overhead. In practice, such applications are not very common.