Skip to content

Move comment about permanent generation to gcmodule.c #17718

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 27, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 0 additions & 36 deletions Include/internal/pycore_pymem.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,42 +16,6 @@ extern "C" {
/* If we change this, we need to change the default value in the
signature of gc.collect. */
#define NUM_GENERATIONS 3

/*
NOTE: about the counting of long-lived objects.

To limit the cost of garbage collection, there are two strategies;
- make each collection faster, e.g. by scanning fewer objects
- do less collections
This heuristic is about the latter strategy.

In addition to the various configurable thresholds, we only trigger a
full collection if the ratio
long_lived_pending / long_lived_total
is above a given value (hardwired to 25%).

The reason is that, while "non-full" collections (i.e., collections of
the young and middle generations) will always examine roughly the same
number of objects -- determined by the aforementioned thresholds --,
the cost of a full collection is proportional to the total number of
long-lived objects, which is virtually unbounded.

Indeed, it has been remarked that doing a full collection every
<constant number> of object creations entails a dramatic performance
degradation in workloads which consist in creating and storing lots of
long-lived objects (e.g. building a large list of GC-tracked objects would
show quadratic performance, instead of linear as expected: see issue #4074).

Using the above ratio, instead, yields amortized linear performance in
the total number of objects (the effect of which can be summarized
thusly: "each full garbage collection is more and more costly as the
number of objects grows, but we do fewer and fewer of them").

This heuristic was suggested by Martin von Löwis on python-dev in
June 2008. His original analysis and proposal can be found at:
http://mail.python.org/pipermail/python-dev/2008-June/080579.html
*/

/*
NOTE: about untracking of mutable objects.

Expand Down
36 changes: 34 additions & 2 deletions Modules/gcmodule.c
Original file line number Diff line number Diff line change
Expand Up @@ -1381,8 +1381,40 @@ collect_generations(PyThreadState *tstate)
for (int i = NUM_GENERATIONS-1; i >= 0; i--) {
if (gcstate->generations[i].count > gcstate->generations[i].threshold) {
/* Avoid quadratic performance degradation in number
of tracked objects. See comments at the beginning
of this file, and issue #4074.
of tracked objects (see also issue #4074):

To limit the cost of garbage collection, there are two strategies;
- make each collection faster, e.g. by scanning fewer objects
- do less collections
This heuristic is about the latter strategy.

In addition to the various configurable thresholds, we only trigger a
full collection if the ratio

long_lived_pending / long_lived_total

is above a given value (hardwired to 25%).

The reason is that, while "non-full" collections (i.e., collections of
the young and middle generations) will always examine roughly the same
number of objects -- determined by the aforementioned thresholds --,
the cost of a full collection is proportional to the total number of
long-lived objects, which is virtually unbounded.

Indeed, it has been remarked that doing a full collection every
<constant number> of object creations entails a dramatic performance
degradation in workloads which consist in creating and storing lots of
long-lived objects (e.g. building a large list of GC-tracked objects would
show quadratic performance, instead of linear as expected: see issue #4074).

Using the above ratio, instead, yields amortized linear performance in
the total number of objects (the effect of which can be summarized
thusly: "each full garbage collection is more and more costly as the
number of objects grows, but we do fewer and fewer of them").

This heuristic was suggested by Martin von Löwis on python-dev in
June 2008. His original analysis and proposal can be found at:
http://mail.python.org/pipermail/python-dev/2008-June/080579.html
*/
if (i == NUM_GENERATIONS - 1
&& gcstate->long_lived_pending < gcstate->long_lived_total / 4)
Expand Down