Commit 4e7aafb

author
Matthew Hammer
committed
update doc of profile-queries
1 parent 763b8c3 commit 4e7aafb

File tree

4 files changed

+139851
-75495
lines changed

profile-queries.md

Lines changed: 143 additions & 56 deletions
@@ -11,8 +11,13 @@ how we profile its performance. We intend this profiling effort to address
1111

1212
## Quick Start
1313

14+
### 0. Enable debug assertions
15+
```
16+
./configure --enable-debug-assertions
17+
```
18+
1419
### 1. Compile `rustc`
15-
Compile the compiler, as usual:
20+
Compile the compiler, up to at least stage 1:
1621

1722
```
1823
python x.py --stage 1
@@ -28,8 +33,8 @@ rustc -Z profile-queries -Z incremental=cache foo.rs
2833
Regarding the two additional parameters:
2934

3035
- `-Z profile-queries` tells the compiler to run a separate thread
31-
that profiles the queries made by the main compiler thread(s). -
32-
`-Z incremental=cache` tells the compiler to "cache" various files
36+
that profiles the queries made by the main compiler thread(s).
37+
- `-Z incremental=cache` tells the compiler to "cache" various files
3338
that describe the compilation dependencies, in the subdirectory
3439
`cache`.
3540

@@ -39,15 +44,23 @@ This command will generate the following files:
3944
the [trace of queries](#trace-of-queries).
4045
- `profile_queries.counts.txt` consists of a histogram, where each histogram "bucket" is a query provider.
4146

42-
### 3. Inspect the output
43-
3(a). Open the HTML file (`profile_queries.html`) with a browser. See [this section](#interpret-the-html-output) for an explanation of this file.
4447

45-
3(b). Open the data file (`profile_queries.counts.txt`) with a text editor, or spreadsheet. See [this section](#interpret-the-data-output) for an explanation of this file.
48+
### 3. Run `rustc`, with `-Z time-passes`:
4649

47-
3(c). Older stuff, also generated as output (you can _ignore these files_; we won't discuss them further here):
50+
- This additional flag will add all timed passes to the output files
51+
mentioned above, in step 2. As described below, these passes appear
52+
visually distinct from the queries in the HTML output (they
53+
currently appear as green boxes, via CSS).
4854

49-
- `dep_graph.dot` consists of old stuff: a representation of dependencies that are _outside_ the newer query model.
50-
- `dep_graph.txt` consists of old stuff: a representation of dependencies that are _outside_ the newer query model.
55+
### 4. Inspect the output
56+
57+
- 4(a). Open the HTML file (`profile_queries.html`) with a browser.
58+
See [this section](#interpret-the-html-output) for an explanation of
59+
this file.
60+
- 4(b). Open the data file (`profile_queries.counts.txt`) with a text
61+
editor, or spreadsheet. See [this
62+
section](#interpret-the-data-output) for an explanation of this
63+
file.
5164

5265

5366
## Interpret the HTML Output
@@ -70,27 +83,28 @@ The trace of the queries has a formal structure; see
7083

7184
We style this formal structure as follows:
7285

73-
- Blue dots represent query hits. They consist of leaves in the
74-
trace's tree. (CSS class: `hit`).
75-
- Red boxes represent query misses. They consist of internal nodes in
76-
the trace's tree. (CSS class: `miss`).
77-
- Many red boxes contain _nested boxes and dots_. This nesting structure
78-
reflects that some providers _depend on_ results from other
79-
providers, which consist of their nested children.
80-
- For example, the red box labeled as `typeck_tables_of` depends
81-
on the one labeled `adt_dtorck_constraint`, which itself
82-
depends on one labeled `coherent_trait`.
83-
- Some red boxes are _labeled_ with text, and have highlighted borders
84-
(light red, and bolded). (See [heuristics](#heuristics) for
85-
details). Where they are present, the labels give the following
86-
information:
86+
- **Timed passes:** Green boxes, when present (via `-Z time-passes`), represent _timed
87+
passes_ in the compiler. In future versions, these passes may be
88+
replaced by queries, explained below.
89+
- **Labels:** Some green and red boxes are labeled with text. Where they are
90+
present, the labels give the following information:
8791
- The [query's _provider_](#queries), sans its _key_ and its _result_,
8892
which are often too long to include in these labels.
8993
- The _duration_ of the provider, as a fraction of the total time
9094
(for the entire trace). This fraction includes the query's
9195
entire extent (that is, the sum total of all of its
9296
sub-queries).
93-
97+
- **Query hits:** Blue dots represent query hits. They consist of leaves in the
98+
trace's tree. (CSS class: `hit`).
99+
- **Query misses:** Red boxes represent query misses. They consist of internal nodes in
100+
the trace's tree. (CSS class: `miss`).
101+
- **Nesting structure:** Many red boxes contain _nested boxes and
102+
dots_. This nesting structure reflects that some providers _depend
103+
on_ results from other providers, which consist of their nested
104+
children.
105+
- Some red boxes are _labeled_ with text, and have highlighted borders
106+
(light red, and bolded). (See [heuristics](#heuristics) for
107+
details).
94108

95109
## Heuristics
96110

@@ -102,12 +116,17 @@ Heuristics-based CSS Classes:
102116
but easy to modify). Important nodes are styled with textual
103117
labels, and highlighted borders (light red, and bolded).
104118

119+
- `frac-50`, `-40`, ... -- Trace nodes whose total duration (self and
120+
children) takes a large fraction of the total duration, at or above
121+
50%, 40%, and so on. We style these nodes with a larger font and
122+
padding.
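
The threshold rule described for the `frac-*` classes can be sketched as follows. This is only an illustration, not the profiler's actual code: the function name `frac_class` is hypothetical, and the threshold list (50 down to 10, in steps of 10) is an assumption, since the text above only says "50%, 40%, and so on".

```python
from typing import Optional

# Assumed thresholds, in percent, checked from largest to smallest.
# The document names 50 and 40 explicitly; the rest are a guess.
THRESHOLDS = (50, 40, 30, 20, 10)


def frac_class(duration: float, total: float) -> Optional[str]:
    """Return the CSS class for a trace node, or None if it is below every threshold.

    `duration` covers the node itself and all of its children;
    `total` is the duration of the entire trace.
    """
    if total <= 0:
        return None
    percent = 100.0 * duration / total
    for threshold in THRESHOLDS:
        if percent >= threshold:
            # The first (largest) threshold that is met decides the class.
            return f"frac-{threshold}"
    return None


if __name__ == "__main__":
    print(frac_class(0.45, 1.0))
    print(frac_class(0.05, 1.0))
```

Checking the thresholds from largest to smallest gives each node exactly one class, which keeps the styling rules from overlapping.
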
123+
105124
## Interpret the Data Output
106125

107126
The file `profile_queries.counts.txt` contains a table of information
108127
about the queries, organized around their providers.
109128

110-
For each provider, we produce:
129+
For each provider (or timed pass, when `-Z time-passes` is present), we produce:
111130

112131
- A total **count** --- the total number of times this provider was
113132
queried
@@ -126,45 +145,113 @@ The following example `profile_queries.counts.txt` file results from
126145
running on a hello world program (a single main function that uses
127146
`println` to print `"hello world"`).
128147

129-
As explained above, the columns consist of `provider`, `count`, `duration`:
148+
As explained above, the columns consist of `provider/pass`, `count`, `duration`:
130149

131150
```
132-
symbol_name,2441,0.362
133-
def_symbol_name,2414,0.129
134-
item_attrs,5300,0.060
135-
type_of,4841,0.059
136-
generics_of,7216,0.049
137-
impl_trait_ref,2898,0.037
138-
def_span,20381,0.030
139-
adt_def,1142,0.028
140-
is_foreign_item,2425,0.021
141-
adt_dtorck_constraint,2,0.016
142-
typeck_tables_of,33,0.014
143-
typeck_item_bodies,1,0.010
144-
coherent_trait,7,0.008
145-
adt_destructor,10,0.008
146-
borrowck,4,0.008
147-
mir_validated,4,0.007
148-
impl_parent,306,0.003
149-
trait_def,216,0.001
150-
mir_const,2,0.001
151-
optimized_mir,6,0.000
152-
adt_sized_constraint,9,0.000
153-
predicates_of,82,0.000
154-
privacy_access_levels,5,0.000
151+
translation,1,0.891
152+
symbol_name,2658,0.733
153+
def_symbol_name,2556,0.268
154+
item_attrs,5566,0.162
155+
type_of,6922,0.117
156+
generics_of,8020,0.084
157+
serialize dep graph,1,0.079
158+
relevant_trait_impls_for,50,0.063
159+
def_span,24875,0.061
160+
expansion,1,0.059
161+
const checking,1,0.055
162+
adt_def,1141,0.048
163+
trait_impls_of,32,0.045
164+
is_copy_raw,47,0.045
165+
is_foreign_item,2638,0.042
166+
fn_sig,2172,0.033
167+
adt_dtorck_constraint,2,0.023
168+
impl_trait_ref,2434,0.023
169+
typeck_tables_of,29,0.022
170+
item-bodies checking,1,0.017
171+
typeck_item_bodies,1,0.017
172+
is_default_impl,2320,0.017
173+
borrow checking,1,0.014
174+
borrowck,4,0.014
175+
mir_validated,4,0.013
176+
adt_destructor,10,0.012
177+
layout_raw,258,0.010
178+
load_dep_graph,1,0.007
179+
item-types checking,1,0.005
180+
mir_const,2,0.005
181+
name resolution,1,0.004
182+
is_object_safe,35,0.003
183+
is_sized_raw,89,0.003
184+
parsing,1,0.003
185+
is_freeze_raw,11,0.001
186+
privacy checking,1,0.001
187+
privacy_access_levels,5,0.001
188+
resolving dependency formats,1,0.001
189+
adt_sized_constraint,9,0.001
190+
wf checking,1,0.001
191+
liveness checking,1,0.001
192+
compute_incremental_hashes_map,1,0.001
193+
match checking,1,0.001
194+
type collecting,1,0.001
195+
param_env,31,0.000
196+
effect checking,1,0.000
197+
trait_def,140,0.000
198+
lowering ast -> hir,1,0.000
199+
predicates_of,70,0.000
200+
extern_crate,319,0.000
201+
lifetime resolution,1,0.000
202+
is_const_fn,6,0.000
203+
intrinsic checking,1,0.000
204+
translation item collection,1,0.000
155205
impl_polarity,15,0.000
156-
trait_of_item,7,0.000
157-
region_maps,11,0.000
206+
creating allocators,1,0.000
207+
language item collection,1,0.000
208+
crate injection,1,0.000
209+
early lint checks,1,0.000
210+
indexing hir,1,0.000
211+
maybe creating a macro crate,1,0.000
212+
coherence checking,1,0.000
213+
optimized_mir,6,0.000
214+
is_panic_runtime,33,0.000
158215
associated_item_def_ids,7,0.000
216+
needs_drop_raw,10,0.000
217+
lint checking,1,0.000
218+
complete gated feature checking,1,0.000
219+
stability index,1,0.000
220+
region_maps,11,0.000
159221
super_predicates_of,8,0.000
160-
variances_of,12,0.000
222+
coherent_trait,2,0.000
223+
AST validation,1,0.000
224+
loop checking,1,0.000
225+
static item recursion checking,1,0.000
226+
variances_of,11,0.000
227+
associated_item,5,0.000
228+
plugin loading,1,0.000
229+
looking for plugin registrar,1,0.000
230+
stability checking,1,0.000
231+
describe_def,15,0.000
232+
variance testing,1,0.000
233+
codegen unit partitioning,1,0.000
234+
looking for entry point,1,0.000
235+
checking for inline asm in case the target doesn't support it,1,0.000
236+
inherent_impls,1,0.000
161237
crate_inherent_impls,1,0.000
162-
is_exported_symbol,2,0.000
163-
associated_item,3,0.000
238+
trait_of_item,7,0.000
164239
crate_inherent_impls_overlap_check,1,0.000
240+
attribute checking,1,0.000
241+
internalize symbols,1,0.000
242+
impl wf inference,1,0.000
243+
death checking,1,0.000
244+
reachability checking,1,0.000
165245
reachable_set,1,0.000
166-
is_mir_available,1,0.000
167-
inherent_impls,1,0.000
246+
is_exported_symbol,3,0.000
247+
is_mir_available,2,0.000
248+
unused lib feature checking,1,0.000
249+
maybe building test harness,1,0.000
250+
recursion limit,1,0.000
251+
write allocator module,1,0.000
252+
assert dep graph,1,0.000
253+
plugin registration,1,0.000
254+
write metadata,1,0.000
168255
```
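
To show how a file in this three-column format might be inspected outside a spreadsheet, here is a minimal Python sketch that parses the rows and ranks them by the third column. The names `Row`, `parse_counts`, and `top_by_duration` are hypothetical helpers, not part of `rustc`, and the sample rows are copied from the listing above.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    name: str        # provider or timed-pass name (first column)
    count: int       # number of times it was queried or run (second column)
    duration: float  # third column, exactly as written in the file


def parse_counts(text: str) -> List[Row]:
    """Parse the contents of a `profile_queries.counts.txt`-style file."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so a name that happens to contain a comma stays intact.
        name, count, duration = line.rsplit(",", 2)
        rows.append(Row(name, int(count), float(duration)))
    return rows


def top_by_duration(rows: List[Row], n: int = 5) -> List[Row]:
    """Return the n rows with the largest duration, largest first."""
    return sorted(rows, key=lambda r: r.duration, reverse=True)[:n]


sample = """\
symbol_name,2658,0.733
translation,1,0.891
def_span,24875,0.061
lowering ast -> hir,1,0.000
"""

if __name__ == "__main__":
    for row in top_by_duration(parse_counts(sample), n=3):
        print(f"{row.name:<24} {row.count:>6} {row.duration:.3f}")
```

Splitting from the right is a defensive choice: pass names are free-form text (for example `lowering ast -> hir`), while the last two columns are always numeric.
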
169256

170257
# Background

profile-queries/example0.png

47.8 KB
Lines changed: 98 additions & 30 deletions
@@ -1,36 +1,104 @@
1-
symbol_name,2441,0.362
2-
def_symbol_name,2414,0.129
3-
item_attrs,5300,0.060
4-
type_of,4841,0.059
5-
generics_of,7216,0.049
6-
impl_trait_ref,2898,0.037
7-
def_span,20381,0.030
8-
adt_def,1142,0.028
9-
is_foreign_item,2425,0.021
10-
adt_dtorck_constraint,2,0.016
11-
typeck_tables_of,33,0.014
12-
typeck_item_bodies,1,0.010
13-
coherent_trait,7,0.008
14-
adt_destructor,10,0.008
15-
borrowck,4,0.008
16-
mir_validated,4,0.007
17-
impl_parent,306,0.003
18-
trait_def,216,0.001
19-
mir_const,2,0.001
20-
optimized_mir,6,0.000
21-
adt_sized_constraint,9,0.000
22-
predicates_of,82,0.000
23-
privacy_access_levels,5,0.000
1+
translation,1,0.891
2+
symbol_name,2658,0.733
3+
def_symbol_name,2556,0.268
4+
item_attrs,5566,0.162
5+
type_of,6922,0.117
6+
generics_of,8020,0.084
7+
serialize dep graph,1,0.079
8+
relevant_trait_impls_for,50,0.063
9+
def_span,24875,0.061
10+
expansion,1,0.059
11+
const checking,1,0.055
12+
adt_def,1141,0.048
13+
trait_impls_of,32,0.045
14+
is_copy_raw,47,0.045
15+
is_foreign_item,2638,0.042
16+
fn_sig,2172,0.033
17+
adt_dtorck_constraint,2,0.023
18+
impl_trait_ref,2434,0.023
19+
typeck_tables_of,29,0.022
20+
item-bodies checking,1,0.017
21+
typeck_item_bodies,1,0.017
22+
is_default_impl,2320,0.017
23+
borrow checking,1,0.014
24+
borrowck,4,0.014
25+
mir_validated,4,0.013
26+
adt_destructor,10,0.012
27+
layout_raw,258,0.010
28+
load_dep_graph,1,0.007
29+
item-types checking,1,0.005
30+
mir_const,2,0.005
31+
name resolution,1,0.004
32+
is_object_safe,35,0.003
33+
is_sized_raw,89,0.003
34+
parsing,1,0.003
35+
is_freeze_raw,11,0.001
36+
privacy checking,1,0.001
37+
privacy_access_levels,5,0.001
38+
resolving dependency formats,1,0.001
39+
adt_sized_constraint,9,0.001
40+
wf checking,1,0.001
41+
liveness checking,1,0.001
42+
compute_incremental_hashes_map,1,0.001
43+
match checking,1,0.001
44+
type collecting,1,0.001
45+
param_env,31,0.000
46+
effect checking,1,0.000
47+
trait_def,140,0.000
48+
lowering ast -> hir,1,0.000
49+
predicates_of,70,0.000
50+
extern_crate,319,0.000
51+
lifetime resolution,1,0.000
52+
is_const_fn,6,0.000
53+
intrinsic checking,1,0.000
54+
translation item collection,1,0.000
2455
impl_polarity,15,0.000
25-
trait_of_item,7,0.000
26-
region_maps,11,0.000
56+
creating allocators,1,0.000
57+
language item collection,1,0.000
58+
crate injection,1,0.000
59+
early lint checks,1,0.000
60+
indexing hir,1,0.000
61+
maybe creating a macro crate,1,0.000
62+
coherence checking,1,0.000
63+
optimized_mir,6,0.000
64+
is_panic_runtime,33,0.000
2765
associated_item_def_ids,7,0.000
66+
needs_drop_raw,10,0.000
67+
lint checking,1,0.000
68+
complete gated feature checking,1,0.000
69+
stability index,1,0.000
70+
region_maps,11,0.000
2871
super_predicates_of,8,0.000
29-
variances_of,12,0.000
72+
coherent_trait,2,0.000
73+
AST validation,1,0.000
74+
loop checking,1,0.000
75+
static item recursion checking,1,0.000
76+
variances_of,11,0.000
77+
associated_item,5,0.000
78+
plugin loading,1,0.000
79+
looking for plugin registrar,1,0.000
80+
stability checking,1,0.000
81+
describe_def,15,0.000
82+
variance testing,1,0.000
83+
codegen unit partitioning,1,0.000
84+
looking for entry point,1,0.000
85+
checking for inline asm in case the target doesn't support it,1,0.000
86+
inherent_impls,1,0.000
3087
crate_inherent_impls,1,0.000
31-
is_exported_symbol,2,0.000
32-
associated_item,3,0.000
88+
trait_of_item,7,0.000
3389
crate_inherent_impls_overlap_check,1,0.000
90+
attribute checking,1,0.000
91+
internalize symbols,1,0.000
92+
impl wf inference,1,0.000
93+
death checking,1,0.000
94+
reachability checking,1,0.000
3495
reachable_set,1,0.000
35-
is_mir_available,1,0.000
36-
inherent_impls,1,0.000
96+
is_exported_symbol,3,0.000
97+
is_mir_available,2,0.000
98+
unused lib feature checking,1,0.000
99+
maybe building test harness,1,0.000
100+
recursion limit,1,0.000
101+
write allocator module,1,0.000
102+
assert dep graph,1,0.000
103+
plugin registration,1,0.000
104+
write metadata,1,0.000
