Automatic Garbage Collection in Java and C++
The purpose of this writing is to outline my views on automatic garbage collection (AGC) in Java and C++. It is well known that AGC is standard in Java, but is rarely available in C++. Garbage collection concerns objects with dynamic extent (allocated with a
new statement in these languages). Java uses various techniques to automatically discover when these objects are no longer referred to, and to reclaim them. In contrast, C++ programmers manually specify where an object with dynamic extent is to be reclaimed by coding a
In order to implement secure Internet applets, certain language restrictions are necessary. Some of these restrictions would also be helpful in implementing AGC. But even if the security restrictions were in place, I would have reservations about AGC.
The benefits of AGC would not be so great in C++:
STL data structures will eliminate the majority of the needs for using
new. Most people should not allocate arrays because STL does an effective job in doing so. I never need to use
newin my code, and I pay great attention to efficiency. The code tends to be more efficient than if I were to use
new. With the acceptance of STL,
newwill sort of fade away. STL also solves the problem of deleting because, for example, in the case of a
vector, the destructor will destroy it on the exit from the block. You don’t need to worry about releasing the storage as you do when you use
new. STL can dramatically minimize the demand for garbage collection. Disciplined use of containers allows you to do whatever you need to do without automatic memory management. The STL constructors and destructors do allocation properly.
Java-style AGC has costs besides the language restrictions that were referred to in the second paragraph:
finalizefunction. But the facility is weak, compared to the C++ destructor. Java programmers sometimes resort to writing a separate class function for cleanup and calling it explicitly. The Java programmer may forget to call a cleanup function, just as easily as a C++ programmer may forget to use
delete. This can cause non-memory leaks and other problems in Java programs. At least C++ programmers can often avail themselves of utilities to detect leakage. And C++ programmers do not use
deletefor class objects with automatic extent, while a separate cleanup function in Java would always have to be explicitly called.
deletestatements (for heap objects) and stack objects going out of scope at the end of blocks.
Smart pointer templates are commonly used in C++ to implement the memory management technique of reference counting. Unlike ordinary pointers, the smart pointers themselves invoke
delete upon the reference count dropping to zero. They guard against memory leaks without resorting to Java-style AGC.
The Boost library has a number of these freely available smart pointer templates. Another freely available smart pointer template is the Loki library’s SmartPtr, which has a more parameterized design. The next C++ standard may incorporate one or more smart pointer templates for reference counting.
Reference counting has a significant limitation. It cannot reclaim a set of objects linked with cyclic (also known as circular) references. If a set loses all references from the outside, it becomes garbage and should be reclaimed. But the reference counts would not be zero, due to the cyclic references. Boost tested a smart pointer implementation that used a hybrid approach so as to correctly reclaim cyclic data structures, but the timing results were poor. Neither the Boost nor the Loki library has a smart pointer for cyclic structures.
In the writing above this section, I discussed the costs of Java-style AGC. I will now contrast that with how reference counting functions in a C++ environment. This assumes that the reference counting is not used for cyclic data:
AGC techniques can introduce significant overhead into a program even if only used for a single object. This is not so for reference counting, which has more of a per-object overhead (with some objects having more active reference counts than others). The per-object nature of reference counted overhead, along with the similarities in behavior noted above between reference counting and manual reclamation, makes it feasible to consider hybrid arrangements. Cyclic data would be the prime candidate to be manually reclaimed, as reference counting does not work correctly with it. But other objects could also be manually reclaimed, for better efficiency. A programmer might use reference counting for objects during the initial phases of development, with the option of later investing time to code
delete statements for some or all of these objects.
The Limbo programming language uses a different approach. Except for cyclic data structures (which the programmer marks with the
cyclic keyword), the language automatically manages dynamically created objects with reference counting, and considers it a form of AGC. The Limbo literature [Phil Winterbottom and Rob Pike: The design of the Inferno virtual machine] argues that for non-cyclic data, reference counting is superior to an AGC technique known as mark-and-sweep. It also faults Java for not using reference counting.
As explained above, reference counting does not work correctly with cyclic data. The designers of Limbo chose to manage cyclic data with mark-and-sweep AGC, instead of manual reclamation. The Limbo language does not have a destructor. Therefore, mark-and-sweep AGC cannot introduce uncertainties into Limbo programs as to when, or if, a destructor is called. The other characteristics, both good and bad, of the two techniques (reference counting and mark-and-sweep), do surface in Limbo programs.
C++ has powerful template and class facilities not available in Limbo, making smart pointers a logical and attractive approach to implementing reference counting in C++. This is in spite of some advantages to Limbo’s more integrated approach. While I have not benchmarked it, I suspect that Limbo’s automatic reference counting is at least as fast as
intrusive_ptr, which in turn is faster than
shared_ptr, both C++ smart pointers from Boost. Also,
intrusive_ptr requires the programmer to supply definitions for two functions and to place the reference count inside the object being managed. The space for the reference count is wasted when the object is not managed with
intrusive_ptr, such as when the object is stored on the stack or is manually reclaimed. Limbo’s reference counting and
shared_ptr do not have these disadvantages of
© 1999, 2004 Stephen Leibowitz