Commit 70db841

split maps into submodules, document
1 parent 76eac36 commit 70db841

7 files changed

+1521
-1117
lines changed

src/librustc/README.md

Lines changed: 4 additions & 1 deletion
@@ -224,7 +224,10 @@ pointers for understanding them better.
224224
- MIR -- the **Mid-level IR** that is created after type-checking for use by borrowck and trans.
225225
Defined in the `src/librustc/mir/` module, but much of the code that manipulates it is
226226
found in `src/librustc_mir`.
227-
- obligation -- something that must be proven by the trait system.
227+
- obligation -- something that must be proven by the trait system; see `librustc/traits`.
228+
- local crate -- the crate currently being compiled.
229+
- query -- a sub-computation performed during compilation; see `librustc/maps`.
230+
- provider -- the function that executes a query; see `librustc/maps`.
228231
- sess -- the **compiler session**, which stores global data used throughout compilation
229232
- substs -- the **substitutions** for a given generic type or item
230233
(e.g., the `i32, u32` in `HashMap<i32, u32>`)

src/librustc/ty/maps/README.md

Lines changed: 302 additions & 0 deletions
@@ -0,0 +1,302 @@
1+
# The Rust Compiler Query System
2+
3+
The Compiler Query System is the key to our new demand-driven
4+
organization. The idea is pretty simple. You have various queries
5+
that compute things about the input -- for example, there is a query
6+
called `type_of(def_id)` that, given the def-id of some item, will
7+
compute the type of that item and return it to you.
8+
9+
Query execution is **memoized** -- so the first time you invoke a
10+
query, it will go do the computation, but the next time, the result is
11+
returned from a hashtable. Moreover, query execution fits nicely into
12+
**incremental computation**; the idea is roughly that, when you do a
13+
query, the result **may** be returned to you by loading stored data
14+
from disk (but that's a separate topic we won't discuss further here).
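
As a rough illustration of the memoization described above, the following self-contained sketch caches the result of a toy `type_of` query in a hash table. Everything in it (`MemoCtxt`, the `u32` stand-in for a def-id, the `String` stand-in for a type, and the `computations` counter) is hypothetical and is not the compiler's actual implementation.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical stand-in for a def-id; the real `DefId` carries more information.
type ToyDefId = u32;

/// Toy context holding one cache per query (here, only `type_of`).
struct MemoCtxt {
    type_of_cache: RefCell<HashMap<ToyDefId, String>>,
    /// Counts how often the computation actually ran (for demonstration only).
    computations: Cell<u32>,
}

impl MemoCtxt {
    fn new() -> Self {
        MemoCtxt {
            type_of_cache: RefCell::new(HashMap::new()),
            computations: Cell::new(0),
        }
    }

    /// Returns the cached result if present; otherwise computes and stores it.
    fn type_of(&self, def_id: ToyDefId) -> String {
        if let Some(hit) = self.type_of_cache.borrow().get(&def_id) {
            return hit.clone();
        }
        // Cache miss: run the (pretend) computation once for this key.
        self.computations.set(self.computations.get() + 1);
        let result = format!("Ty#{}", def_id);
        self.type_of_cache.borrow_mut().insert(def_id, result.clone());
        result
    }
}
```

Calling `type_of` twice with the same key runs the computation only once; the second call is answered from the table.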
15+
16+
The overall vision is that, eventually, the entire compiler
17+
control-flow will be query driven. There will effectively be one
18+
top-level query ("compile") that will run compilation on a crate; this
19+
will in turn demand information about that crate, starting from the
20+
*end*. For example:
21+
22+
- This "compile" query might demand to get a list of codegen-units
23+
(i.e., modules that need to be compiled by LLVM).
24+
- But computing the list of codegen-units would invoke some subquery
25+
that returns the list of all modules defined in the Rust source.
26+
- That query in turn would invoke something asking for the HIR.
27+
- This keeps going further and further back until we wind up doing the
28+
actual parsing.
29+
30+
However, that vision is not fully realized. Still, big chunks of the
31+
compiler (for example, generating MIR) work exactly like this.
32+
33+
### Invoking queries
34+
35+
Invoking a query is simple. The tcx ("type context") offers a method
36+
for each defined query. So, for example, to invoke the `type_of`
37+
query, you would just do this:
38+
39+
```rust
let ty = tcx.type_of(some_def_id);
```
42+
43+
### Cycles between queries
44+
45+
Currently, cycles during query execution should always result in a
46+
compilation error. Typically, they arise because of illegal programs
47+
that contain cyclic references they shouldn't (though sometimes they
48+
arise because of compiler bugs, in which case we need to factor our
49+
queries in a more fine-grained fashion to avoid them).
50+
51+
However, it is nonetheless often useful to *recover* from a cycle
52+
(after reporting an error, say) and try to soldier on, so as to give a
53+
better user experience. In order to recover from a cycle, you don't
54+
get to use the nice method-call-style syntax. Instead, you invoke
55+
using the `try_get` method, which looks roughly like this:
56+
57+
```rust
use ty::maps::queries;
...
match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
    Ok(result) => {
        // no cycle occurred! You can use `result`
    }
    Err(err) => {
        // A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
        // meaning essentially an "in-progress", not-yet-reported error message.
        // See below for more details on what to do here.
    }
}
```
71+
72+
So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This means that
73+
you must ensure that a compiler error message is reported. You can do that in two ways:
74+
75+
The simplest is to invoke `err.emit()`. This will emit the cycle error to the user.
76+
77+
However, often cycles happen because of an illegal program, and you
78+
know at that point that an error either already has been reported or
79+
will be reported due to this cycle by some other bit of code. In that
80+
case, you can invoke `err.cancel()` to not emit any error. It is
81+
traditional to then invoke:
82+
83+
```rust
tcx.sess.delay_span_bug(some_span, "some message");
```
86+
87+
`delay_span_bug()` is a helper that says: we expect a compilation
88+
error to have happened or to happen in the future; so, if compilation
89+
ultimately succeeds, make an ICE with the message `"some
90+
message"`. This is basically just a precaution in case you are wrong.
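
To make the idea of a recoverable cycle more concrete, here is a small self-contained sketch of one way cycle detection can work: keep a stack of the keys currently being computed and return an error when a key is requested while it is already on that stack. This is only an illustration under assumed names (`CycleCtxt`, `ToyCycleError`, `try_get_depth`); the real `try_get` returns a `DiagnosticBuilder` and its internals may differ.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical error type; the real `try_get` returns a `DiagnosticBuilder`.
#[derive(Debug, PartialEq)]
struct ToyCycleError {
    /// Keys that were being computed when the cycle was detected.
    stack: Vec<u32>,
}

/// Toy context: each key optionally depends on one other key.
struct CycleCtxt {
    deps: HashMap<u32, u32>,
    active: RefCell<Vec<u32>>,
    cache: RefCell<HashMap<u32, u32>>,
}

impl CycleCtxt {
    fn new(deps: &[(u32, u32)]) -> Self {
        CycleCtxt {
            deps: deps.iter().cloned().collect(),
            active: RefCell::new(Vec::new()),
            cache: RefCell::new(HashMap::new()),
        }
    }

    /// Toy query: length of the dependency chain starting at `key`.
    fn try_get_depth(&self, key: u32) -> Result<u32, ToyCycleError> {
        if let Some(&hit) = self.cache.borrow().get(&key) {
            return Ok(hit);
        }
        // A key that is already on the active stack means we re-entered it.
        if self.active.borrow().contains(&key) {
            return Err(ToyCycleError { stack: self.active.borrow().clone() });
        }
        self.active.borrow_mut().push(key);
        let result = match self.deps.get(&key) {
            Some(&dep) => self.try_get_depth(dep).map(|depth| depth + 1),
            None => Ok(0),
        };
        self.active.borrow_mut().pop();
        // Only successful results are cached; errors are left to the caller.
        if let Ok(depth) = &result {
            self.cache.borrow_mut().insert(key, *depth);
        }
        result
    }
}
```

A caller can `match` on the returned `Result` and, in the `Err` arm, decide whether to report the error or suppress it, mirroring the `emit`/`cancel` choice described above.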
91+
92+
### How the compiler executes a query
93+
94+
So you may be wondering what happens when you invoke a query
95+
method. The answer is that, for each query, the compiler maintains a
96+
cache -- if your query has already been executed, then, the answer is
97+
simple: we clone the return value out of the cache and return it
98+
(therefore, you should try to ensure that the return types of queries
99+
are cheaply cloneable; insert a `Rc` if necessary).
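
The advice about cheap clones can be illustrated with a small sketch. Here a toy cache stores `Rc<Vec<u32>>` values, so returning a result from the cache only bumps a reference count instead of copying the vector. The names `RcCache` and `items_of` are made up for this example and do not exist in the compiler.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Toy per-query cache whose values are reference-counted.
struct RcCache {
    cache: RefCell<HashMap<u32, Rc<Vec<u32>>>>,
}

impl RcCache {
    fn new() -> Self {
        RcCache { cache: RefCell::new(HashMap::new()) }
    }

    /// Hypothetical query returning a potentially large result.
    fn items_of(&self, key: u32) -> Rc<Vec<u32>> {
        let mut cache = self.cache.borrow_mut();
        // Compute on first use; afterwards the stored `Rc` is reused.
        let entry = cache
            .entry(key)
            .or_insert_with(|| Rc::new((0..key).collect()));
        // Cloning an `Rc` copies a pointer, not the underlying vector.
        Rc::clone(entry)
    }
}
```

Two calls with the same key return handles to the same allocation, which is what keeps cloning out of the cache cheap.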
100+
101+
#### Providers
102+
103+
If, however, the query is *not* in the cache, then the compiler will
104+
try to find a suitable **provider**. A provider is a function that has
105+
been defined and linked into the compiler somewhere that contains the
106+
code to compute the result of the query.
107+
108+
**Providers are defined per-crate.** The compiler maintains,
109+
internally, a table of providers for every crate, at least
110+
conceptually. Right now, there are really two sets: the providers for
111+
queries about the **local crate** (that is, the one being compiled)
112+
and providers for queries about **external crates** (that is,
113+
dependencies of the local crate). Note that what determines the crate
114+
that a query is targeting is not the *kind* of query, but the *key*.
115+
For example, when you invoke `tcx.type_of(def_id)`, that could be a
116+
local query or an external query, depending on what crate the `def_id`
117+
is referring to (see the `self::keys::Key` trait for more information
118+
on how that works).
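
The point that the *key* selects the provider can be sketched as follows. This is a simplified model, not the real `Key` trait: `ToyKey`, its `query_crate` method, `ToyCrateNum`, and `TOY_LOCAL_CRATE` are all assumptions introduced for illustration.

```rust
/// Hypothetical crate number; 0 plays the role of the local crate here.
#[derive(Clone, Copy, PartialEq, Debug)]
struct ToyCrateNum(u32);

const TOY_LOCAL_CRATE: ToyCrateNum = ToyCrateNum(0);

/// Hypothetical def-id: a crate plus an index within that crate.
#[derive(Clone, Copy, Debug)]
struct ToyDefId {
    krate: ToyCrateNum,
    index: u32,
}

/// Loosely modeled on the idea behind the `Key` trait: a key knows its crate.
trait ToyKey {
    fn query_crate(&self) -> ToyCrateNum;
}

impl ToyKey for ToyDefId {
    fn query_crate(&self) -> ToyCrateNum {
        self.krate
    }
}

/// Pretend provider for items defined in the crate being compiled.
fn local_type_of(def_id: ToyDefId) -> String {
    format!("local type #{}", def_id.index)
}

/// Pretend provider for items loaded from a dependency.
fn extern_type_of(def_id: ToyDefId) -> String {
    format!("metadata type #{} from crate {}", def_id.index, def_id.krate.0)
}

/// The same query dispatches to a different provider depending on the key.
fn type_of(def_id: ToyDefId) -> String {
    if def_id.query_crate() == TOY_LOCAL_CRATE {
        local_type_of(def_id)
    } else {
        extern_type_of(def_id)
    }
}
```

The caller always writes `type_of(def_id)`; whether the answer comes from the local or the external provider is decided entirely by the crate the key belongs to.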
119+
120+
Providers always have the same signature:
121+
122+
```rust
fn provider<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>,
                       key: QUERY_KEY)
                       -> QUERY_RESULT
{
    ...
}
```
130+
131+
Providers take two arguments: the `tcx` and the query key. Note also
132+
that they take the *global* tcx (i.e., they use the `'tcx` lifetime
133+
twice), rather than taking a tcx with some active inference context.
134+
They return the result of the query.
135+
136+
#### How providers are set up
137+
138+
When the tcx is created, it is given the providers by its creator using
139+
the `Providers` struct. This struct is generated by the macros here, but it
140+
is basically a big list of function pointers:
141+
142+
```rust
struct Providers {
    type_of: for<'cx, 'tcx> fn(TyCtxt<'cx, 'tcx, 'tcx>, DefId) -> Ty<'tcx>,
    ...
}
```
148+
149+
At present, we have one copy of the struct for local crates, and one
150+
for external crates, though the plan is that we may eventually have
151+
one per crate.
152+
153+
These `Providers` structs are ultimately created and populated by
154+
`librustc_driver`, but it does this by distributing the work
155+
throughout the other `rustc_*` crates. This is done by invoking
156+
various `provide` functions. These functions tend to look something
157+
like this:
158+
159+
```rust
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        ..*providers
    };
}
```
167+
168+
That is, they take an `&mut Providers` and mutate it in place. Usually
169+
we use the formulation above just because it looks nice, but you could
170+
as well do `providers.type_of = type_of`, which would be equivalent.
171+
(Here, `type_of` would be a top-level function, defined as we saw
172+
before.) So, if we wanted to add a provider for some other query,
173+
let's call it `fubar`, into the crate above, we might modify the `provide()`
174+
function like so:
175+
176+
```rust
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        fubar,
        ..*providers
    };
}

// Providers take the global tcx, so the `'tcx` lifetime appears twice.
fn fubar<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>, key: DefId) -> Fubar<'tcx> { .. }
```
187+
188+
NB. Most of the `rustc_*` crates only provide **local
189+
providers**. Almost all **extern providers** wind up going through the
190+
`rustc_metadata` crate, which loads the information from the crate
191+
metadata. But in some cases there are crates that provide queries for
192+
*both* local and external crates, in which case they define both a
193+
`provide` and a `provide_extern` function that `rustc_driver` can
194+
invoke.
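
The registration scheme described in this section can be modeled in a few lines of standalone Rust. The sketch below uses a hypothetical `ToyProviders` table with `u32` keys instead of the real `Providers` struct and `TyCtxt`; the placeholder functions and the `provide_more` helper are likewise assumptions made for the example. It keeps one table for the local crate and one for external crates, and shows both the struct-update form and plain field assignment.

```rust
/// Toy table of function pointers, one field per query.
#[derive(Clone, Copy)]
struct ToyProviders {
    type_of: fn(u32) -> String,
    fubar: fn(u32) -> u32,
}

/// Placeholder entries used until some crate registers a real provider.
fn missing_type_of(key: u32) -> String {
    panic!("no provider registered for type_of({})", key)
}

fn missing_fubar(key: u32) -> u32 {
    panic!("no provider registered for fubar({})", key)
}

impl ToyProviders {
    fn empty() -> Self {
        ToyProviders { type_of: missing_type_of, fubar: missing_fubar }
    }
}

fn local_type_of(key: u32) -> String {
    format!("local type #{}", key)
}

fn local_fubar(key: u32) -> u32 {
    key * 2
}

fn extern_type_of(key: u32) -> String {
    format!("extern type #{}", key)
}

/// Registers a local provider using struct-update syntax.
fn provide(providers: &mut ToyProviders) {
    *providers = ToyProviders { type_of: local_type_of, ..*providers };
}

/// Registers another local provider using plain field assignment.
fn provide_more(providers: &mut ToyProviders) {
    providers.fubar = local_fubar;
}

/// Registers a provider for queries about external crates.
fn provide_extern(providers: &mut ToyProviders) {
    *providers = ToyProviders { type_of: extern_type_of, ..*providers };
}
```

A driver would create one empty table per set, pass each to the relevant `provide` functions, and then call through the function pointers, e.g. `(local.type_of)(1)`. The struct-update form works through `&mut` here because every field is a `Copy` function pointer.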
195+
196+
### Adding a new kind of query
197+
198+
Suppose you want to add a new kind of query. How do you do so?
199+
Well, defining a query takes place in two steps:
200+
201+
1. first, you have to specify the query name and arguments; and then,
202+
2. you have to supply query providers where needed.
203+
204+
To specify the query name and arguments, you simply add an entry
205+
to the big macro invocation in `mod.rs`. This will probably have changed
206+
by the time you read this README, but at present it looks something
207+
like:
208+
209+
```
define_maps! { <'tcx>
    /// Records the type of every item.
    [] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,

    ...
}
```
217+
218+
Each line of the macro defines one query. An entry is broken up like this:
219+
220+
```
[] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
^^    ^^^^^^^  ^^^^^^^^^^ ^^^^^     ^^^^^^^^
|     |        |          |         |
|     |        |          |         result type of query
|     |        |          query key type
|     |        dep-node constructor
|     name of query
query flags
```
230+
231+
Let's go over them one by one:
232+
233+
- **Query flags:** these are largely unused right now, but the intention
234+
is that we'll be able to customize various aspects of how the query is
235+
processed.
236+
- **Name of query:** the name of the query method
237+
(`tcx.type_of(..)`). Also used as the name of a struct
238+
(`ty::maps::queries::type_of`) that will be generated to represent
239+
this query.
240+
- **Dep-node constructor:** indicates the constructor function that
241+
connects this query to incremental compilation. Typically, this is a
242+
`DepNode` variant, which can be added by modifying the
243+
`define_dep_nodes!` macro invocation in
244+
`librustc/dep_graph/dep_node.rs`.
245+
- However, sometimes we use a custom function, in which case the
246+
name will be in snake case and the function will be defined at the
247+
bottom of the file. This is typically used when the query key is
248+
not a def-id, or just not the type that the dep-node expects.
249+
- **Query key type:** the type of the argument to this query.
250+
This type must implement the `ty::maps::keys::Key` trait, which
251+
defines (for example) how to map it to a crate, and so forth.
252+
- **Result type of query:** the type produced by this query. This type
253+
should (a) not use `RefCell` or other interior mutability and (b) be
254+
cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for
255+
non-trivial data types.
256+
- The one exception to those rules is the `ty::steal::Steal` type,
257+
which is used to cheaply modify MIR in place. See the definition
258+
of `Steal` for more details. New uses of `Steal` should **not** be
259+
added without alerting `@rust-lang/compiler`.
260+
261+
So, to add a query:
262+
263+
- Add an entry to `define_maps!` using the format above.
264+
- Possibly add a corresponding entry to the dep-node macro.
265+
- Link the provider by modifying the appropriate `provide` method;
266+
or add a new one if needed and ensure that `rustc_driver` is invoking it.
267+
268+
#### Query structs and descriptions
269+
270+
For each kind, the `define_maps` macro will generate a "query struct"
271+
named after the query. This struct is a kind of a place-holder
272+
describing the query. Each such struct implements the
273+
`self::config::QueryConfig` trait, which has associated types for the
274+
key/value of that particular query. Basically the code generated looks something
275+
like this:
276+
277+
```rust
// Dummy struct representing a particular kind of query:
pub struct type_of<'tcx> { phantom: PhantomData<&'tcx ()> }

impl<'tcx> QueryConfig for type_of<'tcx> {
    type Key = DefId;
    type Value = Ty<'tcx>;
}
```
286+
287+
There is an additional trait that you may wish to implement called
288+
`self::config::QueryDescription`. This trait is used during cycle
289+
errors to give a "human readable" name for the query, so that we can
290+
summarize what was happening when the cycle occurred. Implementing
291+
this trait is optional if the query key is `DefId`, but if you *don't*
292+
implement it, you get a pretty generic error ("processing `foo`...").
293+
You can put new impls into the `config` module. They look something like this:
294+
295+
```rust
impl<'tcx> QueryDescription for queries::type_of<'tcx> {
    fn describe(tcx: TyCtxt, key: DefId) -> String {
        format!("computing the type of `{}`", tcx.item_path_str(key))
    }
}
```
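
For readers who want to experiment with this pattern outside the compiler, the following standalone sketch mimics the relationship between a query struct, its config trait, and its description. The traits `ToyQueryConfig` and `ToyQueryDescription`, the `u32` key, and the `cycle_note` helper are simplified assumptions; in particular, the toy `describe` takes no `tcx`.

```rust
use std::marker::PhantomData;

/// Simplified stand-in for `QueryConfig`: names the key and value types.
trait ToyQueryConfig {
    type Key;
    type Value;
}

/// Simplified stand-in for `QueryDescription` (no `tcx` argument here).
trait ToyQueryDescription: ToyQueryConfig {
    fn describe(key: Self::Key) -> String;
}

/// Dummy struct representing one kind of query, named after the query.
#[allow(non_camel_case_types)]
struct type_of<'tcx> {
    _phantom: PhantomData<&'tcx ()>,
}

impl<'tcx> ToyQueryConfig for type_of<'tcx> {
    type Key = u32;
    type Value = String;
}

impl<'tcx> ToyQueryDescription for type_of<'tcx> {
    fn describe(key: u32) -> String {
        format!("computing the type of `item#{}`", key)
    }
}

/// Generic over the query struct: builds one line of a cycle report.
fn cycle_note<Q: ToyQueryDescription>(key: Q::Key) -> String {
    format!("note: cycle detected while {}", Q::describe(key))
}
```

Because the query struct is only a type-level tag, code such as `cycle_note::<type_of>(4)` can be written once and reused for every query that has a description.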
302+
