@@ -2,8 +2,9 @@ An informal guide to reading and working on the rustc compiler.
  ==================================================================

  If you wish to expand on this document, or have one of the
- slightly-more-familiar authors add anything else to it, please get in touch or
- file a bug. Your concerns are probably the same as someone else's.
+ slightly-more-familiar authors add anything else to it, please get in
+ touch or file a bug. Your concerns are probably the same as someone
+ else's.


  High-level concepts
@@ -13,65 +14,63 @@ Rustc consists of the following subdirectories:

  front/ - front-end: lexer, parser, AST.
  middle/ - middle-end: resolving, typechecking, translating
+ back/ - back-end: linking and ABI
  driver/ - command-line processing, main() entrypoint
  util/ - ubiquitous types and helper functions
  lib/ - bindings to LLVM
+ pretty/ - pretty-printing

- The entry-point for the compiler is main() in driver/rustc.rs, and this file
- sequences the various parts together.
+ The entry-point for the compiler is main() in driver/rustc.rs, and
+ this file sequences the various parts together.


  The 3 central data structures:
  ------------------------------

- #1: front/ast.rs defines the AST. The AST is treated as immutable after
- parsing despite containing some mutable types (hashtables and such).
- There are three interesting details to know about this structure:
-
- - Many -- though not all -- nodes within this data structure are wrapped
- in the type spanned[T], meaning that the front-end has marked the
- input coordinates of that node. The member .node is the data itself,
- the member .span is the input location (file, line, column; both low
- and high).
-
- - Many other nodes within this data structure carry a def_id. These
- nodes represent the 'target' of some name reference elsewhere in the
- tree. When the AST is resolved, by middle/resolve.rs, all names wind
- up acquiring a def that they point to. So anything that can be
- pointed-to by a name winds up with a def_id.
-
- - Many nodes carry an additional type 'ann', for annotations. These
- nodes are those that later stages of the middle-end add information
- to, augmenting the basic structure of the tree. Currently that
- includes the calculated type of any node that has a type; it will also
- likely include typestates, layers and effects, when such things are
- calculated.
-
- #2: middle/ty.rs defines the datatype ty.t, with its central member ty.struct.
- This is the type that represents types after they have been resolved and
- normalized by the middle-end. The typeck phase converts every ast type to
- a ty.t, and the latter is used to drive later phases of compilation. Most
- variants in the ast.ty tag have a corresponding variant in the ty.struct
- tag.
-
- #3: lib/llvm.rs defines the exported types ValueRef, TypeRef, BasicBlockRef,
- and several others. Each of these is an opaque pointer to an LLVM type,
- manipulated through the lib.llvm interface.
+ #1: front/ast.rs defines the AST. The AST is treated as immutable
+ after parsing despite containing some mutable types (hashtables
+ and such). There are three interesting details to know about this
+ structure:
+
+ - Many -- though not all -- nodes within this data structure are
+ wrapped in the type spanned[T], meaning that the front-end has
+ marked the input coordinates of that node. The member .node is
+ the data itself, the member .span is the input location (file,
+ line, column; both low and high).
+
+ - Many other nodes within this data structure carry a
+ def_id. These nodes represent the 'target' of some name
+ reference elsewhere in the tree. When the AST is resolved, by
+ middle/resolve.rs, all names wind up acquiring a def that they
+ point to. So anything that can be pointed-to by a name winds
+ up with a def_id.
+
+ #2: middle/ty.rs defines the datatype sty. This is the type that
+ represents types after they have been resolved and normalized by
+ the middle-end. The typeck phase converts every ast type to a
+ ty::sty, and the latter is used to drive later phases of
+ compilation. Most variants in the ast::ty tag have a
+ corresponding variant in the ty::sty tag.
+
+ #3: lib/llvm.rs defines the exported types ValueRef, TypeRef,
+ BasicBlockRef, and several others. Each of these is an opaque
+ pointer to an LLVM type, manipulated through the lib.llvm
+ interface.


  Control and information flow within the compiler:
  -------------------------------------------------

- - main() in driver/rustc.rs assumes control on startup. Options are parsed,
- platform is detected, etc.
+ - main() in driver/rustc.rs assumes control on startup. Options are
+ parsed, platform is detected, etc.

  - front/parser.rs is driven over the input files.

- - Multiple middle-end passes (middle/resolve.rs, middle/typeck.rs) are run
- over the resulting AST. Each pass produces a new AST with some number of
- annotations or modifications.
+ - Multiple middle-end passes (middle/resolve.rs, middle/typeck.rs) are
+ run over the resulting AST. Each pass generates new information
+ about the AST which is stored in various side data structures.

  - Finally middle/trans.rs is applied to the AST, which performs a
- type-directed translation to LLVM-ese. When it's finished synthesizing LLVM
- values, rustc asks LLVM to write them out as an executable, on which the
- normal LLVM pipeline (opt, llc, as) was run.
+ type-directed translation to LLVM-ese. When it's finished
+ synthesizing LLVM values, rustc asks LLVM to write them out in some
+ form (.bc, .o) and possibly run the system linker.
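
Note on the revised text: the README now says that nodes are wrapped in
spanned[T], that name targets carry a def_id, and that each middle-end
pass stores its results in side data structures instead of producing a
new AST. The following is a rough sketch of that shape in present-day
Rust, not the code in front/ast.rs or middle/resolve.rs. Everything in
it is an assumption made for illustration: the names Span, Spanned,
DefId, Node, resolve and sample_ast, the use of plain integers for ids,
and byte offsets standing in for the file/line/column coordinates.

```rust
use std::collections::HashMap;

/// Input location with a low and a high end (hypothetical: byte offsets
/// stand in for the file/line/column coordinates the README mentions).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    lo: usize,
    hi: usize,
}

/// Present-day spelling of the README's spanned[T]: the data in `node`,
/// the input location in `span`.
#[derive(Debug, Clone, PartialEq)]
struct Spanned<T> {
    node: T,
    span: Span,
}

/// Identifies something a name can point to (hypothetical integer form).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct DefId(u32);

/// A tiny stand-in AST: items define names, name references use them.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Item { id: DefId, name: String },
    NameRef { ref_id: u32, name: String },
}

/// A resolve-like pass. It only reads the AST and returns a side table
/// mapping each resolvable reference id to the DefId its name points to.
fn resolve(ast: &[Spanned<Node>]) -> HashMap<u32, DefId> {
    // First collect every definition by name.
    let mut defs: HashMap<&str, DefId> = HashMap::new();
    for n in ast {
        if let Node::Item { id, name } = &n.node {
            defs.insert(name.as_str(), *id);
        }
    }
    // Then record, outside the tree, what each reference resolves to.
    let mut table = HashMap::new();
    for n in ast {
        if let Node::NameRef { ref_id, name } = &n.node {
            if let Some(def) = defs.get(name.as_str()) {
                table.insert(*ref_id, *def);
            }
        }
    }
    table
}

/// One item `f`, one reference to `f`, and one reference to an unknown `g`.
fn sample_ast() -> Vec<Spanned<Node>> {
    vec![
        Spanned {
            node: Node::Item { id: DefId(7), name: "f".to_string() },
            span: Span { lo: 0, hi: 10 },
        },
        Spanned {
            node: Node::NameRef { ref_id: 0, name: "f".to_string() },
            span: Span { lo: 12, hi: 13 },
        },
        Spanned {
            node: Node::NameRef { ref_id: 1, name: "g".to_string() },
            span: Span { lo: 15, hi: 16 },
        },
    ]
}

fn main() {
    let ast = sample_ast();
    let table = resolve(&ast);
    for n in &ast {
        println!("{:?} at {}..{}", n.node, n.span.lo, n.span.hi);
    }
    println!("reference 0 resolves to {:?}", table.get(&0));
}
```

Because resolve takes the AST by shared reference and returns a separate
table, the tree stays unchanged after parsing, which matches the README's
description of the AST as treated as immutable. A later pass could take
both the AST and the table as inputs and build its own side table in the
same way.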