Automatic Garbage Collection in Java and C++
Stephen Leibowitz

The purpose of this writing is to outline my views on automatic garbage collection (AGC) in Java and C++. It is well known that AGC is standard in Java, but is rarely available in C++. Garbage collection concerns objects with dynamic extent (allocated with a new statement in these languages). Java uses various techniques to automatically discover when these objects are no longer referred to, and to reclaim them. In contrast, C++ programmers manually specify where an object with dynamic extent is to be reclaimed by coding a delete statement.

In order to implement secure Internet applets, certain language restrictions are necessary. Some of these restrictions would also be helpful in implementing AGC. But even if the security restrictions were in place, I would have reservations about AGC.

The benefits of AGC would not be so great in C++:

STL data structures will eliminate the majority of the needs for using new. Most people should not allocate arrays because STL does an effective job in doing so. I never need to use new in my code, and I pay great attention to efficiency. The code tends to be more efficient than if I were to use new. With the acceptance of STL, new will sort of fade away. STL also solves the problem of deleting because, for example, in the case of a vector, the destructor will destroy it on the exit from the block. You don’t need to worry about releasing the storage as you do when you use new. STL can dramatically minimize the demand for garbage collection. Disciplined use of containers allows you to do whatever you need to do without automatic memory management. The STL constructors and destructors do allocation properly.

Java-style AGC has costs besides the language restrictions that were referred to in the second paragraph:

Reference Counting

Smart pointer templates are commonly used in C++ to implement the memory management technique of reference counting. Unlike ordinary pointers, the smart pointers themselves invoke delete upon the reference count dropping to zero. They guard against memory leaks without resorting to Java-style AGC.

The Boost library has a number of these freely available smart pointer templates. Another freely available smart pointer template is the Loki library’s SmartPtr, which has a more parameterized design. The next C++ standard may incorporate one or more smart pointer templates for reference counting.

Reference counting has a significant limitation. It cannot reclaim a set of objects linked with cyclic (also known as circular) references. If a set loses all references from the outside, it becomes garbage and should be reclaimed. But the reference counts would not be zero, due to the cyclic references. Boost tested a smart pointer implementation that used a hybrid approach so as to correctly reclaim cyclic data structures, but the timing results were poor. Neither the Boost nor the Loki library has a smart pointer for cyclic structures.

In the writing above this section, I discussed the costs of Java-style AGC. I will now contrast that with how reference counting functions in a C++ environment. This assumes that the reference counting is not used for cyclic data:

AGC techniques can introduce significant overhead into a program even if only used for a single object. This is not so for reference counting, which has more of a per-object overhead (with some objects having more active reference counts than others). The per-object nature of reference counted overhead, along with the similarities in behavior noted above between reference counting and manual reclamation, makes it feasible to consider hybrid arrangements. Cyclic data would be the prime candidate to be manually reclaimed, as reference counting does not work correctly with it. But other objects could also be manually reclaimed, for better efficiency. A programmer might use reference counting for objects during the initial phases of development, with the option of later investing time to code delete statements for some or all of these objects.

The Limbo programming language uses a different approach. Except for cyclic data structures (which the programmer marks with the cyclic keyword), the language automatically manages dynamically created objects with reference counting, and considers it a form of AGC. The Limbo literature [Phil Winterbottom and Rob Pike: The design of the Inferno virtual machine] argues that for non-cyclic data, reference counting is superior to an AGC technique known as mark-and-sweep. It also faults Java for not using reference counting.

As explained above, reference counting does not work correctly with cyclic data. The designers of Limbo chose to manage cyclic data with mark-and-sweep AGC, instead of manual reclamation. The Limbo language does not have a destructor. Therefore, mark-and-sweep AGC cannot introduce uncertainties into Limbo programs as to when, or if, a destructor is called. The other characteristics, both good and bad, of the two techniques (reference counting and mark-and-sweep), do surface in Limbo programs.

C++ has powerful template and class facilities not available in Limbo, making smart pointers a logical and attractive approach to implementing reference counting in C++. This is in spite of some advantages to Limbo’s more integrated approach. While I have not benchmarked it, I suspect that Limbo’s automatic reference counting is at least as fast as intrusive_ptr, which in turn is faster than shared_ptr, both C++ smart pointers from Boost. Also, intrusive_ptr requires the programmer to supply definitions for two functions and to place the reference count inside the object being managed. The space for the reference count is wasted when the object is not managed with intrusive_ptr, such as when the object is stored on the stack or is manually reclaimed. Limbo’s reference counting and shared_ptr do not have these disadvantages of intrusive_ptr.

© 1999, 2004 Stephen Leibowitz