Commit 575f888

GH-96068: Document object layout (GH-96069)
1 parent 16ebae4 commit 575f888

5 files changed

+157
-0
lines changed

Objects/object_layout.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# Object layout

## Common header

Each Python object starts with two fields:

* ob_refcnt
* ob_type

which form the header common to all Python objects, for all versions,
and hold the reference count and class of the object, respectively.

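The two-field header can be modelled with Python's `ctypes` as a rough illustration. This is a sketch that mirrors only the field order described above, not CPython's own definition: the class name `ObjectHeader` and the helper `header_offsets` are hypothetical, and the field types (a signed size-type count followed by a pointer) are assumptions.

```python
import ctypes


class ObjectHeader(ctypes.Structure):
    """Hypothetical model of the common header (not CPython's own struct)."""

    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # reference count (assumed ssize_t-sized)
        ("ob_type", ctypes.c_void_p),     # pointer to the object's class
    ]


def header_offsets():
    """Return the byte offset of each header field within the model struct."""
    return {name: getattr(ObjectHeader, name).offset
            for name, _ in ObjectHeader._fields_}
```

Calling `header_offsets()` shows `ob_refcnt` at offset zero with `ob_type` immediately after it, which is the sense in which every object "starts with" these two fields.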
## Pre-header

Since the introduction of the cycle GC, there has also been a pre-header.
Before 3.11, this pre-header was two words in size.
It should be considered opaque to all code except the cycle GC.

## 3.11 pre-header

In 3.11 the pre-header was extended to include pointers to the VM-managed ``__dict__``.
The reason for moving the ``__dict__`` to the pre-header is that it allows
faster access, as it is at a fixed offset, and it also allows objects'
dictionaries to be lazily created when the ``__dict__`` attribute is
specifically asked for.

In 3.11 the non-GC part of the pre-header consists of two pointers:

* dict
* values

The ``values`` pointer refers to the ``PyDictValues`` array which holds the
values of the object's attributes.
Should the dictionary be needed, then ``values`` is set to ``NULL``
and the ``dict`` field points to the dictionary.

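The lazy creation of the dictionary can be sketched as a toy Python model. It is an illustration of the behaviour described above, not CPython's implementation: the class `PreHeader311`, its methods, and the `_UNSET` sentinel are all hypothetical, and the `keys` list merely stands in for the keys that the class shares between its instances.

```python
_UNSET = object()  # marks a values slot that has no attribute stored yet


class PreHeader311:
    """Toy model of the non-GC pre-header fields in 3.11 (names hypothetical)."""

    def __init__(self, keys):
        self.keys = list(keys)                   # stand-in for the class's shared keys
        self.values = [_UNSET] * len(self.keys)  # the "values" pointer
        self.dict = None                         # the "dict" pointer, NULL until needed

    def set_attr(self, name, value):
        if self.dict is not None:
            self.dict[name] = value
        else:
            # Toy restriction: the name must be one of the known keys.
            self.values[self.keys.index(name)] = value

    def get_dict(self):
        """Create the dictionary on first request, then drop the values array."""
        if self.dict is None:
            self.dict = {k: v for k, v in zip(self.keys, self.values)
                         if v is not _UNSET}
            self.values = None  # mirrors "values is set to NULL"
        return self.dict
```

As long as `get_dict()` is never called, attributes live only in the compact `values` list; the first call builds the dictionary and later calls return that same dictionary.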
## 3.12 pre-header

In 3.12 the pointer to the list of weak references is added to the
pre-header. In order to make space for it, the ``dict`` and ``values``
pointers are combined into a single tagged pointer:

* weakreflist
* dict_or_values

If the object has no physical dictionary, then ``dict_or_values``
has its low bit set to one, and points to the values array.
If the object has a physical dictionary, then ``dict_or_values``
has its low bit set to zero, and points to the dictionary.

The untagged form is chosen for the dictionary pointer, rather than
the values pointer, to enable the (legacy) C-API function
`_PyObject_GetDictPtr(PyObject *obj)` to work.


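The tagging scheme can be illustrated with plain integers standing in for addresses. This is a sketch only: the names `VALUES_TAG`, `tag_values`, `points_to_values` and `untag` are hypothetical and do not come from CPython, and the example addresses are made up.

```python
VALUES_TAG = 1  # low bit set: the word refers to the values array


def tag_values(values_addr):
    """Return the tagged word for a values-array address."""
    if values_addr & VALUES_TAG:
        raise ValueError("address must have its low bit clear")
    return values_addr | VALUES_TAG


def points_to_values(word):
    """True if the word is a tagged values pointer, False if it is a dict pointer."""
    return bool(word & VALUES_TAG)


def untag(word):
    """Strip the tag bit, recovering the original address."""
    return word & ~VALUES_TAG
```

Because a dictionary address is stored with the low bit clear, it is already a usable pointer without any untagging, which is the property the text above relies on for `_PyObject_GetDictPtr`.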
## Layout of a "normal" Python object in 3.12:

* weakreflist
* dict_or_values
* GC 1
* GC 2
* ob_refcnt
* ob_type

For a "normal" Python object, that is one that doesn't inherit from a builtin
class or have slots, the header and pre-header form the entire object.

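Since the object pointer refers to the `ob_refcnt` field (as the accompanying diagram source draws it), the pre-header fields sit at negative offsets from that pointer. The sketch below computes those offsets from the field order listed above, assuming that every field occupies exactly one pointer-sized word; `WORD`, `LAYOUT_312` and `field_offsets` are hypothetical names used only for this illustration.

```python
import ctypes

WORD = ctypes.sizeof(ctypes.c_void_p)  # assumed size of every field

# Field order as listed above, from lowest to highest address.
LAYOUT_312 = ["weakreflist", "dict_or_values", "GC 1", "GC 2",
              "ob_refcnt", "ob_type"]


def field_offsets(fields=LAYOUT_312, anchor="ob_refcnt"):
    """Byte offset of each field relative to the field the object pointer targets."""
    base = fields.index(anchor)
    return {name: (i - base) * WORD for i, name in enumerate(fields)}
```

Under these assumptions `weakreflist` lands four words before the object pointer and `dict_or_values` three words before it, regardless of the object's class, which is what makes the fixed-offset access described earlier possible.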
![Layout of "normal" object in 3.12](./object_layout_312.png)

There are several advantages to this layout:

* It allows lazy `__dict__`s, as described above.
* The regular layout allows us to create tailored traversal and deallocation
  functions based on layout, rather than inheritance.
* Multiple inheritance works properly,
  as the weakrefs and dict are always at the same offset.

The full object layout, with an opaque part defined by a C extension
and `__slots__`, looks like this:

![Layout of "full" object in 3.12](./object_layout_full_312.png)

Objects/object_layout_312.gv

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
digraph ideal {

    rankdir = "LR"


    object [
        shape = none
        label = <<table border="0" cellspacing="0">
            <tr><td><b>object</b></td></tr>
            <tr><td port="w" border="1">weakrefs</td></tr>
            <tr><td port="dv" border="1">dict or values</td></tr>
            <tr><td border="1" >GC info 0</td></tr>
            <tr><td border="1" >GC info 1</td></tr>
            <tr><td port="r" border="1" >refcount</td></tr>
            <tr><td port="h" border="1" >__class__</td></tr>
        </table>>
    ]

    values [
        shape = none
        label = <<table border="0" cellspacing="0">
            <tr><td><b>values</b></td></tr>
            <tr><td port="0" border="1">values[0]</td></tr>
            <tr><td border="1">values[1]</td></tr>
            <tr><td border="1">...</td></tr>
        </table>>

    ]

    class [
        shape = none
        label = <<table border="0" cellspacing="0">
            <tr><td><b>class</b></td></tr>
            <tr><td port="head" bgcolor="lightgreen" border="1">...</td></tr>
            <tr><td border="1" bgcolor="lightgreen">dict_offset</td></tr>
            <tr><td border="1" bgcolor="lightgreen">...</td></tr>
            <tr><td port="k" border="1" bgcolor="lightgreen">cached_keys</td></tr>
        </table>>
    ]

    keys [label = "dictionary keys"; fillcolor="lightgreen"; style="filled"]
    NULL [ label = " NULL"; shape="plain"]
    object:w -> NULL
    object:h -> class:head
    object:dv -> values:0
    class:k -> keys

    oop [ label = "pointer"; shape="plain"]
    oop -> object:r
}

Objects/object_layout_312.png

30 KB

Objects/object_layout_full_312.gv

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
digraph ideal {

    rankdir = "LR"


    object [
        shape = none
        label = <<table border="0" cellspacing="0">
            <tr><td><b>object</b></td></tr>
            <tr><td port="w" border="1">weakrefs</td></tr>
            <tr><td port="dv" border="1">dict or values</td></tr>
            <tr><td border="1" >GC info 0</td></tr>
            <tr><td border="1" >GC info 1</td></tr>
            <tr><td port="r" border="1" >refcount</td></tr>
            <tr><td port="h" border="1" >__class__</td></tr>
            <tr><td border="1">opaque (extension) data </td></tr>
            <tr><td border="1">...</td></tr>
            <tr><td border="1">__slot__ 0</td></tr>
            <tr><td border="1">...</td></tr>
        </table>>
    ]

    oop [ label = "pointer"; shape="plain"]
    oop -> object:r
}

Objects/object_layout_full_312.png

16.7 KB
