Commit 44e45d9

rework the README.md for rustc and add other readmes
This takes way longer than I thought it would. =)
1 parent 9a00f3c commit 44e45d9

11 files changed: +468 -46 lines changed

src/librustc/README.md

Lines changed: 99 additions & 40 deletions
Original file line number | Diff line number | Diff line change
@@ -13,49 +13,82 @@ https://github.com/rust-lang/rust/issues
1313

1414
Your concerns are probably the same as someone else's.
1515

16-
The crates of rustc
17-
===================
18-
19-
Rustc consists of a number of crates, including `libsyntax`,
20-
`librustc`, `librustc_back`, `librustc_trans`, and `librustc_driver`
21-
(the names and divisions are not set in stone and may change;
22-
in general, a finer-grained division of crates is preferable):
23-
24-
- [`libsyntax`][libsyntax] contains those things concerned purely with syntax –
25-
that is, the AST, parser, pretty-printer, lexer, macro expander, and
26-
utilities for traversing ASTs – are in a separate crate called
27-
"syntax", whose files are in `./../libsyntax`, where `.` is the
28-
current directory (that is, the parent directory of front/, middle/,
29-
back/, and so on).
30-
31-
- `librustc` (the current directory) contains the high-level analysis
32-
passes, such as the type checker, borrow checker, and so forth.
33-
It is the heart of the compiler.
34-
35-
- [`librustc_back`][back] contains some very low-level details that are
36-
specific to different LLVM targets and so forth.
16+
You may also be interested in the
17+
[Rust Forge](https://forge.rust-lang.org/), which includes a number of
18+
interesting bits of information.
3719

38-
- [`librustc_trans`][trans] contains the code to convert from Rust IR into LLVM
39-
IR, and then from LLVM IR into machine code, as well as the main
40-
driver that orchestrates all the other passes and various other bits
41-
of miscellany. In general it contains code that runs towards the
42-
end of the compilation process.
43-
44-
- [`librustc_driver`][driver] invokes the compiler from
45-
[`libsyntax`][libsyntax], then the analysis phases from `librustc`, and
46-
finally the lowering and codegen passes from [`librustc_trans`][trans].
47-
48-
Roughly speaking the "order" of the three crates is as follows:
49-
50-
librustc_driver
51-
|
52-
+-----------------+-------------------+
53-
| |
54-
libsyntax -> librustc -> librustc_trans
20+
Finally, at the end of this file is a GLOSSARY defining a number of
21+
common (and not necessarily obvious!) names that are used in the Rust
22+
compiler code. If you see some funky name and you'd like to know what
23+
it stands for, check there!
5524

25+
The crates of rustc
26+
===================
5627

57-
The compiler process:
58-
=====================
28+
Rustc consists of a number of crates, including `syntax`,
29+
`rustc`, `rustc_back`, `rustc_trans`, `rustc_driver`, and
30+
many more. The source for each crate can be found in a directory
31+
like `src/libXXX`, where `XXX` is the crate name.
32+
33+
(NB. The names and divisions of these crates are not set in
34+
stone and may change over time -- for the time being, we tend towards
35+
a finer-grained division to help with compilation time, though as
36+
incremental compilation improves that may change.)
37+
38+
The dependency structure of these crates is roughly a diamond:
39+
40+
```
41+
                  rustc_driver
42+
                /      |       \
43+
               /       |        \
44+
              /        |         \
45+
             /         v          \
46+
rustc_trans    rustc_borrowck  ...  rustc_metadata
47+
             \         |          /
48+
              \        |         /
49+
               \       |        /
50+
                \      v       /
51+
                    rustc
52+
                       |
53+
                       v
54+
                    syntax
55+
                    /    \
56+
                   /      \
57+
           syntax_pos  syntax_ext
58+
```
59+
60+
61+
The idea is that `rustc_driver`, at the top of this lattice, basically
62+
defines the overall control-flow of the compiler. It doesn't have much
63+
"real code", but instead ties together all of the code defined in the
64+
other crates and defines the overall flow of execution.
65+
66+
At the other extreme, the `rustc` crate defines the common and
67+
pervasive data structures that all the rest of the compiler uses
68+
(e.g., how to represent types, traits, and the program itself). It
69+
also contains some amount of the compiler itself, although that is
70+
relatively limited.
71+
72+
Finally, all the crates in the bulge in the middle define the bulk of
73+
the compiler -- they all depend on `rustc`, so that they can make use
74+
of the various types defined there, and they export public routines
75+
that `rustc_driver` will invoke as needed (more and more, what these
76+
crates export are "query definitions", but those are covered later
77+
on).
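
The layering described above can be modelled as a small dependency graph. The following is only a sketch: it encodes an illustrative subset of the edges drawn in the diagram (stopping at `syntax`), and the helper names `direct_deps` and `transitive_deps` are hypothetical, not part of rustc.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Direct dependencies, following an illustrative subset of the diagram.
/// (Assumption: only the crates named in the diagram, down to `syntax`.)
fn direct_deps() -> BTreeMap<&'static str, Vec<&'static str>> {
    let mut deps = BTreeMap::new();
    deps.insert("rustc_driver", vec!["rustc_trans", "rustc_borrowck", "rustc_metadata"]);
    deps.insert("rustc_trans", vec!["rustc"]);
    deps.insert("rustc_borrowck", vec!["rustc"]);
    deps.insert("rustc_metadata", vec!["rustc"]);
    deps.insert("rustc", vec!["syntax"]);
    deps
}

/// All crates reachable from `root`, excluding `root` itself.
fn transitive_deps(
    deps: &BTreeMap<&'static str, Vec<&'static str>>,
    root: &str,
) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut stack: Vec<&str> = vec![root];
    while let Some(krate) = stack.pop() {
        // A crate with no entry in the map simply has no dependencies here.
        for &dep in deps.get(krate).into_iter().flatten() {
            if seen.insert(dep) {
                stack.push(dep);
            }
        }
    }
    seen
}
```

In this toy graph, `transitive_deps(&direct_deps(), "rustc_driver")` reaches every other crate, while nothing reachable from `rustc` leads back up to the middle layer or the driver, which is the property the diamond is meant to convey.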
78+
79+
Below `rustc` lie various crates that make up the parser and error
80+
reporting mechanism. For historical reasons, these crates do not have
81+
the `rustc_` prefix, but they are really just as much an internal part
82+
of the compiler and not intended to be stable (though they do wind up
83+
getting used by some crates in the wild; a practice we hope to
84+
gradually phase out).
85+
86+
Each crate has a `README.md` file that describes, at a high level,
87+
what it contains, and tries to give some kind of explanation (some
88+
better than others).
89+
90+
The compiler process
91+
====================
5992
6093
The Rust compiler is comprised of six main compilation phases.
6194
@@ -172,3 +205,29 @@ The 3 central data structures:
172205
[back]: https://github.com/rust-lang/rust/tree/master/src/librustc_back/
173206
[rustc]: https://github.com/rust-lang/rust/tree/master/src/librustc/
174207
[driver]: https://github.com/rust-lang/rust/tree/master/src/librustc_driver
208+
209+
Glossary
210+
========
211+
212+
The compiler uses a number of...idiosyncratic abbreviations and
213+
things. This glossary attempts to list them and give you a few
214+
pointers for understanding them better.
215+
216+
- AST -- the **abstract syntax tree** produced by the `syntax` crate; reflects user syntax
217+
very closely.
218+
- cx -- we tend to use "cx" as an abbreviation for context. See also tcx, infcx, etc.
219+
- HIR -- the **High-level IR**, created by lowering and desugaring the AST. See `librustc/hir`.
220+
- `'gcx` -- the lifetime of the global arena (see `librustc/ty`).
221+
- generics -- the set of generic type parameters defined on a type or item
222+
- infcx -- the inference context (see `librustc/infer`)
223+
- MIR -- the **Mid-level IR** that is created after type-checking for use by borrowck and trans.
224+
Defined in the `src/librustc/mir/` module, but much of the code that manipulates it is
225+
found in `src/librustc_mir`.
226+
- obligation -- something that must be proven by the trait system.
227+
- sess -- the **compiler session**, which stores global data used throughout compilation
228+
- substs -- the **substitutions** for a given generic type or item
229+
(e.g., the `i32, u32` in `HashMap<i32, u32>`)
230+
- tcx -- the "typing context", main data structure of the compiler (see `librustc/ty`).
231+
- trans -- the code to **translate** MIR into LLVM IR.
232+
- trait reference -- a trait and values for its type parameters (see `librustc/ty`).
233+
- ty -- the internal representation of a **type** (see `librustc/ty`).

src/librustc/hir/README.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
1+
# Introduction to the HIR
2+
3+
The HIR -- "High-level IR" -- is the primary IR used in most of
4+
rustc. It is a desugared version of the "abstract syntax tree" (AST)
5+
that is generated after parsing, macro expansion, and name resolution
6+
have completed. Many parts of HIR resemble Rust surface syntax quite
7+
closely, with the exception that some of Rust's expression forms have
8+
been desugared away (as an example, `for` loops are converted into a
9+
`loop` and do not appear in the HIR).
10+
11+
This README covers the main concepts of the HIR.
12+
13+
### Out-of-band storage and the `Crate` type
14+
15+
The top-level data-structure in the HIR is the `Crate`, which stores
16+
the contents of the crate currently being compiled (we only ever
17+
construct HIR for the current crate). Whereas in the AST the crate
18+
data structure basically just contains the root module, the HIR
19+
`Crate` structure contains a number of maps and other things that
20+
serve to organize the content of the crate for easier access.
21+
22+
For example, the contents of individual items (e.g., modules,
23+
functions, traits, impls, etc) in the HIR are not immediately
24+
accessible in the parents. So, for example, if you had a module item `foo`
25+
containing a function `bar()`:
26+
27+
```
28+
mod foo {
29+
    fn bar() { }
30+
}
31+
```
32+
33+
Then in the HIR the representation of module `foo` (the `Mod`
34+
struct) would have only the **`ItemId`** `I` of `bar()`. To get the
35+
details of the function `bar()`, we would look up `I` in the
36+
`items` map.
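
To make the out-of-band idea concrete, here is a toy model of it. This is a sketch under stated assumptions, not the real HIR: the `ItemId`, `Item`, and `Crate` types below are simplified stand-ins, and `build_example` and `child_names` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the HIR `ItemId` (assumption: a plain integer).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ItemId(u32);

/// Toy item. A module stores only the ids of its children, not their contents.
#[derive(Debug)]
enum Item {
    Mod { name: &'static str, children: Vec<ItemId> },
    Fn { name: &'static str },
}

/// Toy crate: the contents of every item live in one flat map.
struct Crate {
    items: BTreeMap<ItemId, Item>,
}

/// Builds the toy representation of `mod foo { fn bar() { } }`.
fn build_example() -> Crate {
    let mut items = BTreeMap::new();
    items.insert(ItemId(0), Item::Mod { name: "foo", children: vec![ItemId(1)] });
    items.insert(ItemId(1), Item::Fn { name: "bar" });
    Crate { items }
}

fn item_name(item: &Item) -> &'static str {
    match item {
        Item::Mod { name, .. } | Item::Fn { name } => *name,
    }
}

/// Names of a module's direct children, found by looking each id up in the map.
fn child_names(krate: &Crate, module: ItemId) -> Vec<&'static str> {
    match krate.items.get(&module) {
        Some(Item::Mod { children, .. }) => children
            .iter()
            .filter_map(|id| krate.items.get(id))
            .map(item_name)
            .collect(),
        _ => Vec::new(),
    }
}
```

Here `child_names(&build_example(), ItemId(0))` has to go through `items` to learn anything about `bar`, which mirrors the lookup of `I` described above.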
37+
38+
One nice result from this representation is that one can iterate
39+
over all items in the crate by iterating over the key-value pairs
40+
in these maps (without the need to trawl through the IR in total).
41+
There are similar maps for things like trait items and impl items,
42+
as well as "bodies" (explained below).
43+
44+
The other reason to set up the representation this way is for better
45+
integration with incremental compilation. This way, if you gain access
46+
to a `&hir::Item` (e.g. for the mod `foo`), you do not immediately
47+
gain access to the contents of the function `bar()`. Instead, you only
48+
gain access to the **id** for `bar()`, and you must call some function to
49+
look up the contents of `bar()` given its id; this gives us a chance to
50+
observe that you accessed the data for `bar()` and record the
51+
dependency.
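
The dependency-recording idea can be sketched with a lookup function that logs every id it is asked about. This is an illustration only: `TrackedItems` and its methods are hypothetical names, and the real incremental compilation machinery is considerably more involved.

```rust
use std::cell::RefCell;
use std::collections::{BTreeMap, BTreeSet};

/// Simplified stand-in for an item id (assumption: a plain integer).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ItemId(u32);

/// Toy item store that remembers which ids have been read.
struct TrackedItems {
    contents: BTreeMap<ItemId, String>,
    reads: RefCell<BTreeSet<ItemId>>,
}

impl TrackedItems {
    /// Builds the store from `(id, source text)` pairs.
    fn from_pairs(pairs: &[(u32, &str)]) -> Self {
        let contents = pairs
            .iter()
            .map(|&(id, text)| (ItemId(id), text.to_string()))
            .collect();
        TrackedItems { contents, reads: RefCell::new(BTreeSet::new()) }
    }

    /// The only way to reach an item's contents; every call is recorded.
    fn item(&self, id: ItemId) -> Option<&str> {
        self.reads.borrow_mut().insert(id);
        self.contents.get(&id).map(|s| s.as_str())
    }

    /// Ids read so far, i.e. the dependencies observed up to this point.
    fn dependencies(&self) -> Vec<ItemId> {
        self.reads.borrow().iter().copied().collect()
    }
}
```

Because holding an `ItemId` gives no access to the contents, a caller cannot read `bar()` without going through `item`, and so cannot avoid having the access recorded.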
52+
53+
### Identifiers in the HIR
54+
55+
Most of the code that has to deal with things in HIR tends not to
56+
carry around references into the HIR, but rather to carry around
57+
*identifier numbers* (or just "ids"). Right now, you will find four
58+
sorts of identifiers in active use:
59+
60+
- `DefId`, which primarily names "definitions" or top-level items.
61+
  - You can think of a `DefId` as being shorthand for a very explicit
62+
and complete path, like `std::collections::HashMap`. However,
63+
these paths are able to name things that are not nameable in
64+
normal Rust (e.g., impls), and they also include extra information
65+
about the crate (such as its version number, as two versions of
66+
the same crate can co-exist).
67+
  - A `DefId` really consists of two parts, a `CrateNum` (which
68+
identifies the crate) and a `DefIndex` (which indexes into a list
69+
of items that is maintained per crate).
70+
- `HirId`, which combines the index of a particular item with an
71+
offset within that item.
72+
  - The key point of a `HirId` is that it is *relative* to some item (which is named
73+
via a `DefId`).
74+
- `BodyId`, which is an absolute identifier that refers to a specific
75+
body (definition of a function or constant) in the crate. It is currently
76+
effectively a "newtype'd" `NodeId`.
77+
- `NodeId`, which is an absolute id that identifies a single node in the HIR tree.
78+
  - While these are still in common use, **they are being slowly phased out**.
79+
  - Since they are absolute within the crate, adding a new node
80+
anywhere in the tree causes the node-ids of all subsequent code in
81+
the crate to change. This is terrible for incremental compilation,
82+
as you can perhaps imagine.
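
The difference between absolute and item-relative ids can be shown with a small model. The types below are simplified stand-ins for the real ones: the field names, the use of `CrateNum(0)` for the local crate, and the `absolute_node_id` helper are all assumptions made for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CrateNum(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DefIndex(u32);

/// Assumption for this sketch: crate number 0 is the crate being compiled.
const LOCAL_CRATE: CrateNum = CrateNum(0);

/// A definition id: which crate, plus an index into that crate's item list.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DefId {
    krate: CrateNum,
    index: DefIndex,
}

impl DefId {
    fn is_local(self) -> bool {
        self.krate == LOCAL_CRATE
    }
}

/// An id relative to its owning item: the owner plus an offset within it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct HirId {
    owner: DefIndex,
    local_id: u32,
}

/// Absolute, crate-wide numbering: nodes are counted in order across all
/// items, so the result depends on the size of every earlier item.
fn absolute_node_id(nodes_per_item: &[u32], id: HirId) -> u32 {
    let before: u32 = nodes_per_item[..id.owner.0 as usize].iter().sum();
    before + id.local_id
}
```

With two items of sizes `[3, 4]`, the node `HirId { owner: DefIndex(1), local_id: 2 }` gets absolute number 5. If the first item grows to 4 nodes, the absolute number becomes 6 even though the `HirId` itself is unchanged, which is why relative ids suit incremental compilation better.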
83+
84+
### HIR Map
85+
86+
Most of the time when you are working with the HIR, you will do so via
87+
the **HIR Map**, accessible in the tcx via `tcx.hir` (and defined in
88+
the `hir::map` module). The HIR map contains a number of methods to
89+
convert between ids of various kinds and to look up data associated
90+
with a HIR node.
91+
92+
For example, if you have a `DefId`, and you would like to convert it
93+
to a `NodeId`, you can use `tcx.hir.as_local_node_id(def_id)`. This
94+
returns an `Option<NodeId>` -- this will be `None` if the def-id
95+
refers to something outside of the current crate (since then it has no
96+
HIR node), but otherwise returns `Some(n)` where `n` is the node-id of
97+
the definition.
98+
99+
Similarly, you can use `tcx.hir.find(n)` to look up the node for a
100+
`NodeId`. This returns an `Option<Node<'tcx>>`, where `Node` is an enum
101+
defined in the map; by matching on this you can find out what sort of
102+
node the node-id referred to and also get a pointer to the data
103+
itself. Often, you know what sort of node `n` is -- e.g., if you know
104+
that `n` must be some HIR expression, you can do
105+
`tcx.hir.expect_expr(n)`, which will extract and return the
106+
`&hir::Expr`, panicking if `n` is not in fact an expression.
107+
108+
Finally, you can use the HIR map to find the parents of nodes, via
109+
calls like `tcx.hir.get_parent_node(n)`.
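
The shape of these lookups can be imitated with a toy map. This is not the compiler's implementation: `ToyHirMap` only borrows the method names mentioned above, its signatures are simplified, nodes are represented as plain strings, and returning the node itself as the parent of a root is a choice made for this sketch.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct NodeId(u32);

/// Simplified `DefId`; in this sketch crate 0 is the current crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DefId {
    krate: u32,
    index: u32,
}

#[derive(Debug, PartialEq)]
enum Node {
    Item(&'static str),
    Expr(&'static str),
}

struct ToyHirMap {
    def_to_node: HashMap<DefId, NodeId>, // local definitions only
    nodes: HashMap<NodeId, Node>,
    parents: HashMap<NodeId, NodeId>,
}

impl ToyHirMap {
    /// `None` for definitions from other crates, which have no HIR node.
    fn as_local_node_id(&self, def_id: DefId) -> Option<NodeId> {
        if def_id.krate == 0 {
            self.def_to_node.get(&def_id).copied()
        } else {
            None
        }
    }

    fn find(&self, id: NodeId) -> Option<&Node> {
        self.nodes.get(&id)
    }

    /// Panics if `id` does not refer to an expression.
    fn expect_expr(&self, id: NodeId) -> &'static str {
        match self.find(id) {
            Some(Node::Expr(text)) => *text,
            other => panic!("expected expression, found {:?}", other),
        }
    }

    fn get_parent_node(&self, id: NodeId) -> NodeId {
        self.parents.get(&id).copied().unwrap_or(id)
    }
}

/// A map holding one function item and one expression inside it.
fn sample_map() -> ToyHirMap {
    let mut map = ToyHirMap {
        def_to_node: HashMap::new(),
        nodes: HashMap::new(),
        parents: HashMap::new(),
    };
    map.def_to_node.insert(DefId { krate: 0, index: 1 }, NodeId(10));
    map.nodes.insert(NodeId(10), Node::Item("fn bar"));
    map.nodes.insert(NodeId(11), Node::Expr("x + y"));
    map.parents.insert(NodeId(11), NodeId(10));
    map
}
```

The pattern to notice is the same as in the prose: convert a `DefId` to a node id (which can fail for non-local definitions), then either match on the result of `find` or use an `expect_*` helper when the kind of node is already known.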
110+
111+
### HIR Bodies
112+
113+
A **body** represents some kind of executable code, such as the body
114+
of a function/closure or the definition of a constant. Bodies are
115+
associated with an **owner**, which is typically some kind of item
116+
(e.g., a `fn()` or `const`), but could also be a closure expression
117+
(e.g., `|x, y| x + y`). You can use the HIR map to find the body
118+
associated with a given def-id (`maybe_body_owned_by()`) or to find
119+
the owner of a body (`body_owner_def_id()`).
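
The two-way relationship between bodies and owners might be modelled as follows. This is a sketch only: `ToyBodies` reuses the method names `maybe_body_owned_by` and `body_owner_def_id` from the text, but the types, signatures, and the string representation of patterns and expressions are assumptions.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DefId(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct BodyId(u32);

/// Toy body: argument patterns plus the expression that is evaluated.
struct Body {
    arguments: Vec<&'static str>,
    value: &'static str,
}

struct ToyBodies {
    bodies: HashMap<BodyId, Body>,
    owner_of: HashMap<BodyId, DefId>,
    body_of: HashMap<DefId, BodyId>,
}

impl ToyBodies {
    fn new() -> Self {
        ToyBodies {
            bodies: HashMap::new(),
            owner_of: HashMap::new(),
            body_of: HashMap::new(),
        }
    }

    /// Registers `body` as owned by `owner` and returns its fresh id.
    fn add(&mut self, owner: DefId, body: Body) -> BodyId {
        let id = BodyId(self.bodies.len() as u32);
        self.bodies.insert(id, body);
        self.owner_of.insert(id, owner);
        self.body_of.insert(owner, id);
        id
    }

    /// `None` if the definition owns no body (e.g., a struct).
    fn maybe_body_owned_by(&self, owner: DefId) -> Option<BodyId> {
        self.body_of.get(&owner).copied()
    }

    /// Every body has an owner, so this panics only on an unknown id.
    fn body_owner_def_id(&self, body: BodyId) -> DefId {
        self.owner_of[&body]
    }

    fn body(&self, id: BodyId) -> &Body {
        &self.bodies[&id]
    }
}
```

For a function such as `fn foo((x, y): (u32, u32)) -> u32 { x + y }`, the toy `Body` would hold the pattern `(x, y)` in `arguments` and `x + y` as its `value`, and the owner would be the `DefId` of `foo`.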
120+
121+
122+
123+

src/librustc/hir/map/README.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
The HIR map, accessible via `tcx.hir`, allows you to quickly navigate the
2+
HIR and convert between various forms of identifiers. See [the HIR README] for more information.
3+
4+
[the HIR README]: ../README.md

src/librustc/hir/mod.rs

Lines changed: 24 additions & 1 deletion
@@ -413,6 +413,9 @@ pub struct WhereEqPredicate {
413413

414414
pub type CrateConfig = HirVec<P<MetaItem>>;
415415

416+
/// The top-level data structure that stores the entire contents of
417+
/// the crate currently being compiled.
418+
///
416419
#[derive(Clone, PartialEq, Eq, RustcEncodable, RustcDecodable, Debug)]
417420
pub struct Crate {
418421
pub module: Mod,
@@ -927,7 +930,27 @@ pub struct BodyId {
927930
pub node_id: NodeId,
928931
}
929932

930-
/// The body of a function or constant value.
933+
/// The body of a function, closure, or constant value. In the case of
934+
/// a function, the body contains not only the function body itself
935+
/// (which is an expression), but also the argument patterns, since
936+
/// those are something that the caller doesn't really care about.
937+
///
938+
/// Example:
939+
///
940+
/// ```rust
941+
/// fn foo((x, y): (u32, u32)) -> u32 {
942+
///     x + y
943+
/// }
944+
/// ```
945+
///
946+
/// Here, the `Body` associated with `foo()` would contain:
947+
///
948+
/// - an `arguments` array containing the `(x, y)` pattern
949+
/// - a `value` containing the `x + y` expression (maybe wrapped in a block)
950+
/// - `is_generator` would be false
951+
///
952+
/// All bodies have an **owner**, which can be accessed via the HIR
953+
/// map using `body_owner_def_id()`.
931954
#[derive(Clone, PartialEq, Eq, RustcEncodable, RustcDecodable, Hash, Debug)]
932955
pub struct Body {
933956
pub arguments: HirVec<Arg>,

src/librustc/lib.rs

Lines changed: 22 additions & 1 deletion
@@ -8,7 +8,28 @@
88
// option. This file may not be copied, modified, or distributed
99
// except according to those terms.
1010

11-
//! The Rust compiler.
11+
//! The "main crate" of the Rust compiler. This crate contains common
12+
//! type definitions that are used by the other crates in the rustc
13+
//! "family". Some prominent examples (note that each of these modules
14+
//! has its own README with further details):
15+
//!
16+
//! - **HIR.** The "high-level (H) intermediate representation (IR)" is
17+
//! defined in the `hir` module.
18+
//! - **MIR.** The "mid-level (M) intermediate representation (IR)" is
19+
//! defined in the `mir` module. This module contains only the
20+
//! *definition* of the MIR; the passes that transform and operate
21+
//! on MIR are found in the `librustc_mir` crate.
22+
//! - **Types.** The internal representation of types used in rustc is
23+
//! defined in the `ty` module. This includes the **type context**
24+
//! (or `tcx`), which is the central context during most of
25+
//! compilation, containing the interners and other things.
26+
//! - **Traits.** Trait resolution is implemented in the `traits` module.
27+
//! - **Type inference.** The type inference code can be found in the `infer` module;
28+
//! this code handles low-level equality and subtyping operations. The
29+
//! type check pass in the compiler is found in the `librustc_typeck` crate.
30+
//!
31+
//! For a deeper explanation of how the compiler works and is
32+
//! organized, see the README.md file in this directory.
1233
//!
1334
//! # Note
1435
//!
