Commit 9091398

Move comment about permanent generation to gcmodule.c (GH-17718)

A comment in gcmodule.c pointed at the collection rules for the permanent generation "at the beginning of this file", but the comment it referenced was moved into a header file long ago, leaving the reference broken. Moving the comment into the relevant code improves readability and removes the broken reference.

1 parent 91874bb commit 9091398

File tree: 2 files changed, +34 −38 lines


Include/internal/pycore_pymem.h — 0 additions, 36 deletions

@@ -16,42 +16,6 @@ extern "C" {
 /* If we change this, we need to change the default value in the
    signature of gc.collect. */
 #define NUM_GENERATIONS 3
-
-/*
-   NOTE: about the counting of long-lived objects.
-
-   To limit the cost of garbage collection, there are two strategies;
-     - make each collection faster, e.g. by scanning fewer objects
-     - do less collections
-   This heuristic is about the latter strategy.
-
-   In addition to the various configurable thresholds, we only trigger a
-   full collection if the ratio
-    long_lived_pending / long_lived_total
-   is above a given value (hardwired to 25%).
-
-   The reason is that, while "non-full" collections (i.e., collections of
-   the young and middle generations) will always examine roughly the same
-   number of objects -- determined by the aforementioned thresholds --,
-   the cost of a full collection is proportional to the total number of
-   long-lived objects, which is virtually unbounded.
-
-   Indeed, it has been remarked that doing a full collection every
-   <constant number> of object creations entails a dramatic performance
-   degradation in workloads which consist in creating and storing lots of
-   long-lived objects (e.g. building a large list of GC-tracked objects would
-   show quadratic performance, instead of linear as expected: see issue #4074).
-
-   Using the above ratio, instead, yields amortized linear performance in
-   the total number of objects (the effect of which can be summarized
-   thusly: "each full garbage collection is more and more costly as the
-   number of objects grows, but we do fewer and fewer of them").
-
-   This heuristic was suggested by Martin von Löwis on python-dev in
-   June 2008. His original analysis and proposal can be found at:
-   http://mail.python.org/pipermail/python-dev/2008-June/080579.html
-*/
-
 /*
    NOTE: about untracking of mutable objects.
 

Modules/gcmodule.c — 34 additions, 2 deletions

@@ -1381,8 +1381,40 @@ collect_generations(PyThreadState *tstate)
     for (int i = NUM_GENERATIONS-1; i >= 0; i--) {
         if (gcstate->generations[i].count > gcstate->generations[i].threshold) {
             /* Avoid quadratic performance degradation in number
-               of tracked objects. See comments at the beginning
-               of this file, and issue #4074.
+               of tracked objects (see also issue #4074):
+
+               To limit the cost of garbage collection, there are two strategies;
+                 - make each collection faster, e.g. by scanning fewer objects
+                 - do less collections
+               This heuristic is about the latter strategy.
+
+               In addition to the various configurable thresholds, we only trigger a
+               full collection if the ratio
+
+                long_lived_pending / long_lived_total
+
+               is above a given value (hardwired to 25%).
+
+               The reason is that, while "non-full" collections (i.e., collections of
+               the young and middle generations) will always examine roughly the same
+               number of objects -- determined by the aforementioned thresholds --,
+               the cost of a full collection is proportional to the total number of
+               long-lived objects, which is virtually unbounded.
+
+               Indeed, it has been remarked that doing a full collection every
+               <constant number> of object creations entails a dramatic performance
+               degradation in workloads which consist in creating and storing lots of
+               long-lived objects (e.g. building a large list of GC-tracked objects would
+               show quadratic performance, instead of linear as expected: see issue #4074).
+
+               Using the above ratio, instead, yields amortized linear performance in
+               the total number of objects (the effect of which can be summarized
+               thusly: "each full garbage collection is more and more costly as the
+               number of objects grows, but we do fewer and fewer of them").
+
+               This heuristic was suggested by Martin von Löwis on python-dev in
+               June 2008. His original analysis and proposal can be found at:
+               http://mail.python.org/pipermail/python-dev/2008-June/080579.html
             */
             if (i == NUM_GENERATIONS - 1
                 && gcstate->long_lived_pending < gcstate->long_lived_total / 4)
