Commit f007762

create a separate chapter on arenas/interning
1 parent 4b34444 commit f007762

5 files changed, with 119 additions and 107 deletions

src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@
4343
- [Debugging and Testing](./incrcomp-debugging.md)
4444
- [Profiling Queries](./queries/profiling.md)
4545
- [Salsa](./salsa.md)
46+
- [Memory Management in Rustc](./memory.md)
4647
- [Lexing and Parsing](./the-parser.md)
4748
- [`#[test]` Implementation](./test-implementation.md)
4849
- [Panic Implementation](./panic-implementation.md)

src/memory.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
1+
# Memory Management in Rustc
2+
3+
Rustc tries to be pretty careful about how it manages memory. The compiler allocates
4+
_a lot_ of data structures throughout compilation, and if we are not careful,
5+
it will take a lot of time and space to do so.
6+
7+
One of the main ways the compiler manages this is by using arenas and interning.
8+
9+
## Arenas and Interning
10+
11+
We create a LOT of data structures during compilation. For performance reasons,
12+
we allocate them from a global memory pool; they are each allocated once from a
13+
long-lived *arena*. This is called _arena allocation_. This system reduces
14+
allocations/deallocations of memory. It also allows for easy comparison of
15+
types for equality: for each interned type `X`, we implemented [`PartialEq for
16+
X`][peqimpl], so we can just compare pointers. The [`CtxtInterners`] type
17+
contains a bunch of maps of interned types and the arena itself.
18+
19+
[peqimpl]: https://github.com/rust-lang/rust/blob/3ee936378662bd2e74be951d6a7011a95a6bd84d/src/librustc/ty/mod.rs#L528-L534
20+
[`CtxtInterners`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.CtxtInterners.html#structfield.arena
21+
22+
### Example: `ty::TyS`
23+
24+
Take the example of [`ty::TyS`], which represents a type in the compiler (you
25+
can read more [here](./ty.md)). Each time we want to construct a type, the
26+
compiler doesn’t naively allocate from the buffer. Instead, we check if that
27+
type was already constructed. If it was, we just get the same pointer we had
28+
before, otherwise we make a fresh pointer. With this scheme, if we want to know
29+
if two types are the same, all we need to do is compare the pointers which is
30+
efficient. `TyS` is carefully set up so you never construct them on the stack.
31+
You always allocate them from this arena and you always intern them so they are
32+
unique.
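
The check-then-allocate step described above can be sketched with a toy interner. This is only an illustration under stated assumptions: `ToyKind`, `Ty`, and `Interner` are hypothetical names, and `Rc` stands in for the arena so that the sketch stays in safe Rust (rustc itself hands out `'tcx` references into an arena instead). What it does share with the description above is that equality and hashing of the handle look only at the pointer.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Toy stand-in for the kind of a type (hypothetical, not rustc's `TyKind`).
#[derive(Debug, PartialEq, Eq, Hash)]
enum ToyKind {
    Bool,
    Int,
    Ref(Ty),
}

/// Thin, pointer-like handle to an interned `ToyKind`.
#[derive(Debug, Clone)]
struct Ty(Rc<ToyKind>);

// Two handles are equal exactly when they point at the same allocation.
impl PartialEq for Ty {
    fn eq(&self, other: &Ty) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}
impl Eq for Ty {}

// Hash the address, so hashing is consistent with pointer equality.
impl Hash for Ty {
    fn hash<H: Hasher>(&self, state: &mut H) {
        (Rc::as_ptr(&self.0) as usize).hash(state)
    }
}

/// Remembers every kind constructed so far.
#[derive(Default)]
struct Interner {
    set: HashSet<Rc<ToyKind>>,
}

impl Interner {
    /// Returns the existing handle if `kind` was interned before,
    /// otherwise allocates it once and remembers it.
    fn intern(&mut self, kind: ToyKind) -> Ty {
        if let Some(existing) = self.set.get(&kind) {
            return Ty(Rc::clone(existing));
        }
        let fresh = Rc::new(kind);
        self.set.insert(Rc::clone(&fresh));
        Ty(fresh)
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern(ToyKind::Int);
    let b = interner.intern(ToyKind::Int);
    // Same kind interned twice: same pointer, so `==` is a pointer comparison.
    assert!(a == b);

    // Nested types reuse the handles of their components.
    let ref_a = interner.intern(ToyKind::Ref(a.clone()));
    let ref_b = interner.intern(ToyKind::Ref(b.clone()));
    assert!(ref_a == ref_b);
    assert!(a != interner.intern(ToyKind::Bool));
}
```

Because nested handles compare by pointer, looking up a compound kind such as `Ref(..)` in the set never has to walk the whole type.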
33+
34+
At the beginning of the compilation we make a buffer and each time we need to allocate a type we use
35+
some of this memory buffer. If we run out of space we get another one. The lifetime of that buffer
36+
is `'tcx`. Our types are tied to that lifetime, so when compilation finishes all the memory related
37+
to that buffer is freed and our `'tcx` references would be invalid.
38+
39+
In addition to types, there are a number of other arena-allocated data structures that you can
40+
allocate, and which are found in this module. Here are a few examples:
41+
42+
- [`Substs`][subst], allocated with `mk_substs` – this will intern a slice of types, often used to
43+
specify the values to be substituted for generics (e.g. `HashMap<i32, u32>` would be represented
44+
as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
45+
- [`TraitRef`], typically passed by value – a **trait reference** consists of a reference to a trait
46+
along with its various type parameters (including `Self`), like `i32: Display` (here, the def-id
47+
would reference the `Display` trait, and the substs would contain `i32`). Note that `def-id` is
48+
defined and discussed in depth in the `AdtDef and DefId` section.
49+
- [`Predicate`] defines something the trait system has to prove (see `traits` module).
50+
51+
[subst]: ./generic_arguments.html#subst
52+
[`TraitRef`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TraitRef.html
53+
[`Predicate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.Predicate.html
54+
55+
[`ty::TyS`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html
56+
57+
## The tcx and how it uses lifetimes
58+
59+
The `tcx` ("typing context") is the central data structure in the compiler. It is the context that
60+
you use to perform all manner of queries. The struct `TyCtxt` defines a reference to this shared
61+
context:
62+
63+
```rust,ignore
64+
tcx: TyCtxt<'tcx>
65+
// ----
66+
// |
67+
// arena lifetime
68+
```
69+
70+
As you can see, the `TyCtxt` type takes a lifetime parameter. When you see a reference with a
71+
lifetime like `'tcx`, you know that it refers to arena-allocated data (or data that lives as long as
72+
the arenas, anyhow).
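
As a rough illustration of that lifetime parameter, here is a minimal sketch of a `Copy` wrapper around a reference to shared context. `ToyCtxt`, `GlobalCtxt`, and their fields are hypothetical names chosen for this example and do not mirror the real definitions; the point is only that data handed out carries the `'tcx` lifetime and not the lifetime of the handle itself.

```rust
/// Toy stand-in for data that lives as long as the arenas (hypothetical).
struct GlobalCtxt {
    bool_ty: String,
    char_ty: String,
}

/// A cheap, copyable handle: just a reference with the `'tcx` lifetime.
#[derive(Clone, Copy)]
struct ToyCtxt<'tcx> {
    gcx: &'tcx GlobalCtxt,
}

impl<'tcx> ToyCtxt<'tcx> {
    /// The result borrows from the shared context, so it is valid for
    /// `'tcx` even though `self` is taken by value.
    fn bool_ty(self) -> &'tcx str {
        &self.gcx.bool_ty
    }

    fn char_ty(self) -> &'tcx str {
        &self.gcx.char_ty
    }
}

/// The handle is passed by value and dropped when this function returns,
/// but the returned reference stays valid for `'tcx`.
fn lookup_bool<'tcx>(tcx: ToyCtxt<'tcx>) -> &'tcx str {
    tcx.bool_ty()
}

fn main() {
    let gcx = GlobalCtxt {
        bool_ty: "bool".to_string(),
        char_ty: "char".to_string(),
    };
    let tcx = ToyCtxt { gcx: &gcx };
    let name = lookup_bool(tcx); // `tcx` is `Copy`, so it can still be used below
    assert_eq!(name, "bool");
    assert_eq!(tcx.char_ty(), "char");
}
```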
73+
74+
### A Note On Lifetimes
75+
76+
The Rust compiler is a fairly large program containing lots of big data
77+
structures (e.g. the AST, HIR, and the type system) and as such, arenas and
78+
references are heavily relied upon to minimize unnecessary memory use. This
79+
manifests itself in the way people can plug into the compiler (i.e. the
80+
[driver](./rustc-driver.md)), preferring a "push"-style API (callbacks) instead
81+
of the more Rust-ic "pull" style (think the `Iterator` trait).
82+
83+
Thread-local storage and interning are used a lot throughout the compiler to reduce
84+
duplication while also preventing a lot of the ergonomic issues due to many
85+
pervasive lifetimes. The [`rustc::ty::tls`][tls] module is used to access these
86+
thread-locals, although you should rarely need to touch it.
87+
88+
[tls]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/tls/index.html
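
To make the "push" style and the thread-local pattern more concrete, here is a small sketch using only the standard library. All names (`Session`, `with_toy_session`, `with_current_crate`, `CURRENT_CRATE`) are hypothetical and are not the compiler's API; the sketch assumes only that the driver owns the data and lends it to a callback for a limited scope.

```rust
use std::cell::RefCell;

/// Toy stand-in for compiler state owned by the driver (hypothetical).
struct Session {
    crate_name: String,
    items: Vec<String>,
}

thread_local! {
    /// Name of the crate being processed on this thread, if any.
    static CURRENT_CRATE: RefCell<Option<String>> = RefCell::new(None);
}

/// Push style: the driver creates the session, lends it to the callback,
/// and tears it down afterwards. The callback cannot keep the reference.
fn with_toy_session<R>(crate_name: &str, callback: impl FnOnce(&Session) -> R) -> R {
    let session = Session {
        crate_name: crate_name.to_string(),
        items: vec!["main".to_string()],
    };
    CURRENT_CRATE.with(|slot| *slot.borrow_mut() = Some(session.crate_name.clone()));
    let result = callback(&session);
    CURRENT_CRATE.with(|slot| *slot.borrow_mut() = None);
    result
}

/// Thread-local access: code deep in a call stack can reach the current
/// state without threading a parameter (and its lifetime) through every call.
fn with_current_crate<R>(f: impl FnOnce(Option<&str>) -> R) -> R {
    CURRENT_CRATE.with(|slot| f(slot.borrow().as_deref()))
}

fn main() {
    let count = with_toy_session("demo", |session| {
        let name = with_current_crate(|name| name.map(str::to_owned));
        assert_eq!(name.as_deref(), Some("demo"));
        session.items.len()
    });
    assert_eq!(count, 1);
    // Outside the callback, the thread-local has been cleared again.
    assert!(with_current_crate(|name| name.is_none()));
}
```

With a "pull" style API the caller would hold on to the session and ask it for values, which would require the session (and every arena behind it) to outlive the caller's borrows; the callback form keeps that scope explicit.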

src/rustc-driver.md

Lines changed: 0 additions & 13 deletions
@@ -32,19 +32,6 @@ replaces this functionality.
3232
> **Warning:** By its very nature, the internal compiler APIs are always going
3333
> to be unstable. That said, we do try not to break things unnecessarily.
3434
35-
## A Note On Lifetimes
36-
37-
The Rust compiler is a fairly large program containing lots of big data
38-
structures (e.g. the AST, HIR, and the type system) and as such, arenas and
39-
references are heavily relied upon to minimize unnecessary memory use. This
40-
manifests itself in the way people can plug into the compiler, preferring a
41-
"push"-style API (callbacks) instead of the more Rust-ic "pull" style (think
42-
the `Iterator` trait).
43-
44-
Thread-local storage and interning are used a lot throughout the compiler to reduce
45-
duplication while also preventing a lot of the ergonomic issues due to many
46-
pervasive lifetimes. The `rustc::ty::tls` module is used to access these
47-
thread-locals, although you should rarely need to touch it.
4835

4936
[cb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/trait.Callbacks.html
5037
[rd_rc]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/fn.run_compiler.html

src/ty.md

Lines changed: 30 additions & 87 deletions
@@ -119,12 +119,41 @@ field of type [`TyKind`][tykind], which represents the key type information. `Ty
119119
which represents different kinds of types (e.g. primitives, references, abstract data types,
120120
generics, lifetimes, etc). `TyS` also has 2 more fields, `flags` and `outer_exclusive_binder`. They
121121
are convenient hacks for efficiency and summarize information about the type that we may want to
122-
know, but they don’t come into the picture as much here.
122+
know, but they don’t come into the picture as much here. Finally, `ty::TyS`s
123+
are [interned](./memory.md), so that the `ty::Ty` can be a thin pointer-like
124+
type. This allows us to do cheap comparisons for equality, along with the other
125+
benefits of interning.
123126

124127
[tys]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html
125128
[kind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html#structfield.kind
126129
[tykind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html
127130

131+
## Allocating and working with types
132+
133+
To allocate a new type, you can use the various `mk_` methods defined on the `tcx`. These have names
134+
that correspond mostly to the various kinds of types. For example:
135+
136+
```rust,ignore
137+
let array_ty = tcx.mk_array(elem_ty, len * 2);
138+
```
139+
140+
These methods all return a `Ty<'tcx>` – note that the lifetime you get back is the lifetime of the
141+
arena that this `tcx` has access to. Types are always canonicalized and interned (so we never
142+
allocate exactly the same type twice).
143+
144+
> NB. Because types are interned, it is possible to compare them for equality efficiently using `==`
145+
> – however, this is almost never what you want to do unless you happen to be hashing and looking
146+
> for duplicates. This is because often in Rust there are multiple ways to represent the same type,
147+
> particularly once inference is involved. If you are going to be testing for type equality, you
148+
> probably need to start looking into the inference code to do it right.
149+
150+
You can also find various common types in the `tcx` itself by accessing `tcx.types.bool`,
151+
`tcx.types.char`, etc (see [`CommonTypes`] for more).
152+
153+
[`CommonTypes`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/context/struct.CommonTypes.html
154+
155+
## `ty::TyKind` Variants
156+
128157
Note: `TyKind` is **NOT** the functional programming concept of *Kind*.
129158

130159
Whenever working with a `Ty` in the compiler, it is common to match on the kind of type:
@@ -147,8 +176,6 @@ types in the compiler.
147176
There are a lot of related types, and we’ll cover them in time (e.g regions/lifetimes,
148177
“substitutions”, etc).
149178

150-
## `ty::TyKind` Variants
151-
152179
There are a bunch of variants on the `TyKind` enum, which you can see by looking at the rustdocs.
153180
Here is a sampling:
154181

@@ -191,90 +218,6 @@ will discuss this more later.
191218
[kinderr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html#variant.Error
192219
[kindvars]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html#variants
193220

194-
## Interning
195-
196-
We create a LOT of types during compilation. For performance reasons, we allocate them from a global
197-
memory pool, they are each allocated once from a long-lived *arena*. This is called _arena
198-
allocation_. This system reduces allocations/deallocations of memory. It also allows for easy
199-
comparison of types for equality: we implemented [`PartialEq for TyS`][peqimpl], so we can just
200-
compare pointers. The [`CtxtInterners`] type contains a bunch of maps of interned types and the
201-
arena itself.
202-
203-
[peqimpl]: https://github.com/rust-lang/rust/blob/3ee936378662bd2e74be951d6a7011a95a6bd84d/src/librustc/ty/mod.rs#L528-L534
204-
[`CtxtInterners`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.CtxtInterners.html#structfield.arena
205-
206-
Each time we want to construct a type, the compiler doesn’t naively allocate from the buffer.
207-
Instead, we check if that type was already constructed. If it was, we just get the same pointer we
208-
had before, otherwise we make a fresh pointer. With this scheme, if we want to know if two types are
209-
the same, all we need to do is compare the pointers which is efficient. `TyS` which represents types
210-
is carefully set up so you never construct them on the stack. You always allocate them from this
211-
arena and you always intern them so they are unique.
212-
213-
At the beginning of the compilation we make a buffer and each time we need to allocate a type we use
214-
some of this memory buffer. If we run out of space we get another one. The lifetime of that buffer
215-
is `'tcx`. Our types are tied to that lifetime, so when compilation finishes all the memory related
216-
to that buffer is freed and our `'tcx` references would be invalid.
217-
218-
219-
## The tcx and how it uses lifetimes
220-
221-
The `tcx` ("typing context") is the central data structure in the compiler. It is the context that
222-
you use to perform all manner of queries. The struct `TyCtxt` defines a reference to this shared
223-
context:
224-
225-
```rust,ignore
226-
tcx: TyCtxt<'tcx>
227-
// ----
228-
// |
229-
// arena lifetime
230-
```
231-
232-
As you can see, the `TyCtxt` type takes a lifetime parameter. When you see a reference with a
233-
lifetime like `'tcx`, you know that it refers to arena-allocated data (or data that lives as long as
234-
the arenas, anyhow).
235-
236-
## Allocating and working with types
237-
238-
To allocate a new type, you can use the various `mk_` methods defined on the `tcx`. These have names
239-
that correspond mostly to the various kinds of types. For example:
240-
241-
```rust,ignore
242-
let array_ty = tcx.mk_array(elem_ty, len * 2);
243-
```
244-
245-
These methods all return a `Ty<'tcx>` – note that the lifetime you get back is the lifetime of the
246-
arena that this `tcx` has access to. Types are always canonicalized and interned (so we never
247-
allocate exactly the same type twice).
248-
249-
> NB. Because types are interned, it is possible to compare them for equality efficiently using `==`
250-
> – however, this is almost never what you want to do unless you happen to be hashing and looking
251-
> for duplicates. This is because often in Rust there are multiple ways to represent the same type,
252-
> particularly once inference is involved. If you are going to be testing for type equality, you
253-
> probably need to start looking into the inference code to do it right.
254-
255-
You can also find various common types in the `tcx` itself by accessing `tcx.types.bool`,
256-
`tcx.types.char`, etc (see [`CommonTypes`] for more).
257-
258-
[`CommonTypes`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/context/struct.CommonTypes.html
259-
260-
## Beyond types: other kinds of arena-allocated data structures
261-
262-
In addition to types, there are a number of other arena-allocated data structures that you can
263-
allocate, and which are found in this module. Here are a few examples:
264-
265-
- [`Substs`][subst], allocated with `mk_substs` – this will intern a slice of types, often used to
266-
specify the values to be substituted for generics (e.g. `HashMap<i32, u32>` would be represented
267-
as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
268-
- [`TraitRef`], typically passed by value – a **trait reference** consists of a reference to a trait
269-
along with its various type parameters (including `Self`), like `i32: Display` (here, the def-id
270-
would reference the `Display` trait, and the substs would contain `i32`). Note that `def-id` is
271-
defined and discussed in depth in the `AdtDef and DefId` section.
272-
- [`Predicate`] defines something the trait system has to prove (see `traits` module).
273-
274-
[subst]: ./generic_arguments.html#subst
275-
[`TraitRef`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TraitRef.html
276-
[`Predicate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.Predicate.html
277-
278221
## Import conventions
279222

280223
Although there is no hard and fast rule, the `ty` module tends to be used like so:

src/type-inference.md

Lines changed: 0 additions & 7 deletions
@@ -43,13 +43,6 @@ tcx.infer_ctxt().enter(|infcx| {
4343
})
4444
```
4545

46-
Each inference context creates a short-lived type arena to store the
47-
fresh types and things that it will create, as described in the
48-
[chapter on the `ty` module][ty-ch]. This arena is created by the `enter`
49-
function and disposed of after it returns.
50-
51-
[ty-ch]: ty.html
52-
5346
Within the closure, `infcx` has the type `InferCtxt<'cx, 'tcx>` for some
5447
fresh `'cx`, while `'tcx` is the same as outside the inference context.
5548
(Again, see the [`ty` chapter][ty-ch] for more details on this setup.)
