Garbage collection (computer science)
In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector attempts to reclaim memory that was allocated by the program, but is no longer referenced; such memory is called garbage. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp.[2]
Garbage collection relieves the programmer from doing
Resources other than memory, such as
Overview
This section needs additional citations for verification. (July 2014) |
Many
Although many languages integrate GC into their compiler and runtime system, post-hoc GC systems also exist, such as Automatic Reference Counting (ARC). Some of these post-hoc GC systems do not require recompilation.[5]
Advantages
GC frees the programmer from manually de-allocating memory. This helps avoid some kinds of
- dereferenced. By then the memory may have been reassigned to another use, with unpredictable results.
- Double free bugs, which occur when the program tries to free a region of memory that has already been freed, and perhaps already been allocated again.
- Certain kinds of unreachable, which can lead to memory exhaustion.[7]
Disadvantages
GC uses computing resources to decide which memory to free. Therefore, the penalty for the convenience of not annotating object lifetime manually in the source code is overhead, which can impair program performance.[8] A peer-reviewed paper from 2005 concluded that GC needs five times the memory to compensate for this overhead and to perform as fast as the same program using idealized explicit memory management. The comparison however is made to a program generated by inserting deallocation calls using an oracle, implemented by collecting traces from programs run under a profiler, and the program is only correct for one particular execution of the program.[9] Interaction with memory hierarchy effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing. The impact on performance was given by Apple as a reason for not adopting garbage collection in iOS, despite it being the most desired feature.[10]
The moment when the garbage is actually collected can be unpredictable, resulting in stalls (pauses to shift/free memory) scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments, in transaction processing, or in interactive programs. Incremental, concurrent, and real-time garbage collectors address these problems, with varying trade-offs.
Strategies
Tracing
Tracing garbage collection is the most common type of garbage collection, so much so that "garbage collection" often refers to tracing garbage collection, rather than other methods such as reference counting. The overall strategy consists of determining which objects should be garbage collected by tracing which objects are reachable by a chain of references from certain root objects, and considering the rest as garbage and collecting them. However, there are a large number of algorithms used in implementation, with widely varying complexity and performance characteristics.
Reference counting
Reference counting garbage collection is where each object has a count of the number of references to it. Garbage is identified by having a reference count of zero. An object's reference count is incremented when a reference to it is created and decremented when a reference is destroyed. When the count reaches zero, the object's memory is reclaimed.[11]
As with manual memory management, and unlike tracing garbage collection, reference counting guarantees that objects are destroyed as soon as their last reference is destroyed, and usually only accesses memory which is either in CPU caches, in objects to be freed, or directly pointed to by those, and thus tends to not have significant negative side effects on CPU cache and virtual memory operation.
There are a number of disadvantages to reference counting; this can generally be solved or mitigated by more sophisticated algorithms:
- Cycles
- If two or more objects refer to each other, they can create a cycle whereby neither will be collected as their mutual references never let their reference counts become zero. Some garbage collection systems using reference counting (like the one in CPython) use specific cycle-detecting algorithms to deal with this issue.[12] Another strategy is to use weak references for the "backpointers" which create cycles. Under reference counting, a weak reference is similar to a weak reference under a tracing garbage collector. It is a special reference object whose existence does not increment the reference count of the referent object. Furthermore, a weak reference is safe in that when the referent object becomes garbage, any weak reference to it lapses, rather than being permitted to remain dangling, meaning that it turns into a predictable value, such as a null reference.
- Space overhead (reference count)
- Reference counting requires space to be allocated for each object to store its reference count. The count may be stored adjacent to the object's memory or in a side table somewhere else, but in either case, every single reference-counted object requires additional storage for its reference count. Memory space with the size of an unsigned pointer is commonly used for this task, meaning that 32 or 64 bits of reference count storage must be allocated for each object. On some systems, it may be possible to mitigate this overhead by using a
- Speed overhead (increment/decrement)
- In naive implementations, each assignment of a reference and each reference falling out of scope often require modifications of one or more reference counters. However, in a common case when a reference is copied from an outer scope variable into an inner scope variable, such that the lifetime of the inner variable is bounded by the lifetime of the outer one, the reference incrementing can be eliminated. The outer variable "owns" the reference. In the programming language C++, this technique is readily implemented and demonstrated with the use of
const
references. Reference counting in C++ is usually implemented using "smart pointers"[15] whose constructors, destructors, and assignment operators manage the references. A smart pointer can be passed by reference to a function, which avoids the need to copy-construct a new smart pointer (which would increase the reference count on entry into the function and decrease it on exit). Instead, the function receives a reference to the smart pointer which is produced inexpensively. The Deutsch-Bobrow method of reference counting capitalizes on the fact that most reference count updates are in fact generated by references stored in local variables. It ignores these references, only counting references in the heap, but before an object with reference count zero can be deleted, the system must verify with a scan of the stack and register that no other reference to it still exists. A further substantial decrease in the overhead on counter updates can be obtained by update coalescing introduced by Levanoni and Petrank.[16][17] Consider a pointer that in a given interval of the execution is updated several times. It first points to an objectO1
, then to an objectO2
, and so forth until at the end of the interval it points to some objectOn
. A reference counting algorithm would typically executerc(O1)--
,rc(O2)++
,rc(O2)--
,rc(O3)++
,rc(O3)--
, ...,rc(On)++
. But most of these updates are redundant. In order to have the reference count properly evaluated at the end of the interval it is enough to performrc(O1)--
andrc(On)++
. Levanoni and Petrank measured an elimination of more than 99% of the counter updates in typical Java benchmarks.
- Requires atomicity
- When used in a atomic operations such as compare-and-swap, at least for any objects which are shared, or potentially shared among multiple threads. Atomic operations are expensive on a multiprocessor, and even more expensive if they have to be emulated with software algorithms. It is possible to avoid this issue by adding per-thread or per-CPU reference counts and only accessing the global reference count when the local reference counts become or are no longer zero (or, alternatively, using a binary tree of reference counts, or even giving up deterministic destruction in exchange for not having a global reference count at all), but this adds significant memory overhead and thus tends to be only useful in special cases (it is used, for example, in the reference counting of Linux kernel modules). Update coalescing by Levanoni and Petrank[16][17]can be used to eliminate all atomic operations from the write-barrier. Counters are never updated by the program threads in the course of program execution. They are only modified by the collector which executes as a single additional thread with no synchronization. This method can be used as a stop-the-world mechanism for parallel programs, and also with a concurrent reference counting collector.
- Not real-time
- Naive implementations of reference counting do not generally provide real-time behavior, because any pointer assignment can potentially cause a number of objects bounded only by total allocated memory size to be recursively freed while the thread is unable to perform other work. It is possible to avoid this issue by delegating the freeing of unreferenced objects to other threads, at the cost of extra overhead.
Escape analysis
Availability
Generally speaking, higher-level programming languages are more likely to have garbage collection as a standard feature. In some languages lacking built-in garbage collection, it can be added through a library, as with the Boehm garbage collector for C and C++.
Most
Other dynamic languages, such as
BASIC
BASIC and Logo have often used garbage collection for variable-length data types, such as strings and lists, so as not to burden programmers with memory management details. On the Altair 8800, programs with many string variables and little string space could cause long pauses due to garbage collection.[21] Similarly the Applesoft BASIC interpreter's garbage collection algorithm repeatedly scans the string descriptors for the string having the highest address in order to compact it toward high memory, resulting in performance
Objective-C
While the
Limited environments
Garbage collection is rarely used on
Java
Garbage collectors available in Java JDKs include:
- G1
- Parallel
- Concurrent mark sweep collector (CMS)
- Serial
- C4 (Continuously Concurrent Compacting Collector)[35]
- Shenandoah
- ZGC
Compile-time use
Compile-time garbage collection is a form of static analysis allowing memory to be reused and reclaimed based on invariants known during compilation.
This form of garbage collection has been studied in the Mercury programming language,[36] and it saw greater usage with the introduction of LLVM's automatic reference counter (ARC) into Apple's ecosystem (iOS and OS X) in 2011.[31][32][28]
Real-time systems
Incremental, concurrent, and real-time garbage collectors have been developed, for example by Henry Baker and by Henry Lieberman.[37][38][39]
In Baker's algorithm, the allocation is done in either half of a single region of memory. When it becomes half full, a garbage collection is performed which moves the live objects into the other half and the remaining objects are implicitly deallocated. The running program (the 'mutator') has to check that any object it references is in the correct half, and if not move it across, while a background task is finding all of the objects.[40]
Generational garbage collection schemes are based on the empirical observation that most objects die young. In generational garbage collection, two or more allocation regions (generations) are kept, which are kept separate based on the object's age. New objects are created in the "young" generation that is regularly collected, and when a generation is full, the objects that are still referenced from older regions are copied into the next oldest generation. Occasionally a full scan is performed.
Some high-level language computer architectures include hardware support for real-time garbage collection.
Most implementations of real-time garbage collectors use
See also
- Destructor (computer programming)
- Dynamic dead-code elimination
- Smart pointer
- Virtual memory compression
References
- ^ Abelson, Harold; Sussman, Gerald Jay; Sussman, Julie (2016). Structure and Interpretation of Computer Programs (PDF) (2nd ed.). Cambridge, Massachusetts, US: MIT Press. pp. 734–736.
- S2CID 1489409. Retrieved 2009-05-29.
- ^ "What is garbage collection (GC) in programming?". SearchStorage. Retrieved 2022-10-17.
- ^ "Overview – D Programming Language". dlang.org. Digital Mars. Retrieved 2014-07-29.
- ^ "Garbage Collection - D Programming Language". dlang.org. Retrieved 2022-10-17.
- ^ "Garbage Collection". rebelsky.cs.grinnell.edu. Retrieved 2024-01-13.
- ^ Microsoft. "Fundamentals of garbage collection | Microsoft Learn". Retrieved 2023-03-29.
- S2CID 16182444.
- (PDF) from the original on 2012-04-02. Retrieved 2015-03-15.
- ^ Apple, Inc. 2011-06-24. Archived from the original(PDF) on 2023-09-04. Retrieved 2015-03-27.
- ^ Microsoft. "Reference Counting Garbage Collection". Retrieved 2023-03-29.
- ^ "Reference Counts". Extending and Embedding the Python Interpreter. 2008-02-21. Retrieved 2014-05-22.
- ^ Ash, Mike. "Friday Q&A 2013-09-27: ARM64 and You". mikeash.com. Retrieved 2014-04-27.
- ^ "Hamster Emporium: [objc explain]: Non-pointer isa". Sealiesoftware.com. 2013-09-24. Retrieved 2014-04-27.
- ^ Pibinger, Roland (2005-05-03) [2005-04-17]. "RAII, Dynamic Objects, and Factories in C++".
- ^ .
- ^ S2CID 14777709.
- .
- ^ Chisnall, David (2011-01-12). Influential Programming Languages, Part 4: Lisp.
- ^ "PHP: Performance Considerations". php.net. Retrieved 2015-01-14.
- ^ "Altair 8800 Basic 4.1 Reference Manual" (PDF). The Vintage Technology Digital Archive. April 1977. p. 108. Archived (PDF) from the original on 2021-06-29. Retrieved 2021-06-29.
- ^ "I did some work to speed up string garbage collection under Applesoft..." Hacker News. Retrieved 2021-06-29.
- ISBN 0-89303-564-5. Retrieved 2021-06-29.
- ^ "Fast Garbage Collection". Call-A.P.P.L.E.: 40–45. January 1981.
- ISBN 0-912985-05-4. Archived(PDF) from the original on 2008-12-03. Retrieved 2021-06-29.
- ^ "Objective-C 2.0 Overview". Archived from the original on 2010-07-24.
- ^ Siracusa, John (2011-07-20). "Mac OS X 10.7 Lion: the Ars Technica review".
- ^ a b "Apple says Mac app makers must transition to ARC memory management by May". AppleInsider. 2015-02-20.
- Heise.de. Retrieved 2015-03-30.
- ^ Silva, Precious (2014-11-18). "iOS 8 vs Android 5.0 Lollipop: Apple Kills Google with Memory Efficiency". International Business Times. Archived from the original on 2015-04-03. Retrieved 2015-04-07.
- ^ ISBN 978-1-11844997-4. Retrieved 2015-03-30.
- ^ Dr. Dobbs. Archived from the originalon 2020-05-16. Retrieved 2015-03-30.
- S2CID 8635481.
- ^ ".NET nanoFramework".
- ISBN 978-1-45030263-0. Archived(PDF) from the original on 2017-08-09.
- Katholieke Universiteit Leuven. Archived(PDF) from the original on 2014-04-27.
- (PDF) from the original on 2008-05-13.
- ^ "GC FAQ".
- S2CID 14161480.
- S2CID 17661259. see also description
- ^ McCloskey; Bacon; Cheng; Grove (2008), Staccato: A Parallel and Concurrent Real-time Compacting Garbage Collector for Multiprocessors (PDF), archived (PDF) from the original on 2014-03-11
Further reading
- Jones, Richard; Hosking, Antony; Moss, J. Eliot B. (2011-08-16). The Garbage Collection Handbook: The Art of Automatic Memory Management. CRC Applied Algorithms and Data Structures Series. ISBN 978-1-4200-8279-1. (511 pages)
- Jones, Richard; Lins, Rafael (1996-07-12). Garbage Collection: Algorithms for Automatic Dynamic Memory Management (1 ed.). ISBN 978-0-47194148-4. (404 pages)
- Schorr, Herbert; Waite, William M. (August 1967). "An Efficient Machine-Independent Procedure for Garbage Collection in Various List Structures" (PDF). (PDF) from the original on 2021-01-22.
- Wilson, Paul R. (1992). "Uniprocessor Garbage Collection Techniques". Memory Management. Lecture Notes in Computer Science. Vol. 637. )
- Wilson, Paul R.; Johnstone, Mark S.; Neely, Michael; Boles, David (1995). "Dynamic Storage Allocation: A Survey and Critical Review". Memory Management. Lecture Notes in Computer Science. Vol. 986 (1 ed.). pp. 1–116. )
External links
- The Memory Management Reference
- The Very Basics of Garbage Collection
- Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning
- TinyGC - an independent implementation of the BoehmGC API
- Conservative Garbage Collection Implementation for C Language
- MeixnerGC - an incremental mark and sweep garbage collector for C++ using smart pointers