Document the usage of cargo-llvm-lines and -Ztimings. #905
Depending on what you're trying to measure, there are several different approaches:

  See [their docs](https://github.com/rust-lang/measureme/blob/master/summarize/Readme.md) for more information.

- If you want function level performance data or even just more details than the above approaches:
  - Consider using a native code profiler such as [perf](profiling/with_perf.html)
    or [tracy](https://github.com/nagisa/rust_tracy_client) for a nanosecond-precision,
    full-featured graphical interface.

- If you want a nice visual representation of the compile times of your crate graph,
  you can use [cargo's `-Ztimings` flag](https://doc.rust-lang.org/cargo/reference/unstable.html#timings),
  e.g. `cargo -Ztimings build`.
  You can use this flag on the compiler itself with `CARGOFLAGS="-Ztimings" ./x.py build`.
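As a rough sketch, on an ordinary crate this might look like the following. It assumes a nightly toolchain (since `-Ztimings` is unstable) and the default output mode, which writes an HTML report into the working directory:
```
# Nightly-only: enable cargo's timing instrumentation for this build.
cargo +nightly -Ztimings build

# Assumption: the default mode writes a report named `cargo-timing.html`
# into the current directory; open it in a browser to inspect the crate graph.
```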
## Optimizing rustc's self-compile-times with cargo-llvm-lines
Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the
number of lines of LLVM IR across all instantiations of a generic function.
Since most of the time compiling rustc is spent in LLVM, the idea is that by
reducing the amount of code passed to LLVM, compiling rustc gets faster.
Example usage:
```
cargo install cargo-llvm-lines
# On a normal crate you could now run `cargo llvm-lines`, but x.py isn't normal :P

# Do a clean before every run, to not mix in the results from previous runs.
./x.py clean
RUSTFLAGS="--emit=llvm-ir" ./x.py build --stage 0 compiler/rustc

# Single crate, e.g. rustc_middle
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/rustc_middle* > llvm-lines-middle.txt
# Whole compiler at once
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines.txt
```
Example output:
```
 Lines            Copies          Function name
 -----            ------          -------------
 11802479 (100%)  52848 (100%)    (TOTAL)
  1663902 (14.1%)   400 (0.8%)    rustc_query_system::query::plumbing::get_query_impl::{{closure}}
   683526 (5.8%)  10579 (20.0%)   core::ptr::drop_in_place
   568523 (4.8%)    528 (1.0%)    rustc_query_system::query::plumbing::get_query_impl
   472715 (4.0%)   1134 (2.1%)    hashbrown::raw::RawTable<T>::reserve_rehash
   306782 (2.6%)   1320 (2.5%)    rustc_middle::ty::query::plumbing::<impl rustc_query_system::query::QueryContext for rustc_middle::ty::context::TyCtxt>::start_query::{{closure}}::{{closure}}::{{closure}}
   212800 (1.8%)    514 (1.0%)    rustc_query_system::dep_graph::graph::DepGraph<K>::with_task_impl
   194813 (1.7%)    124 (0.2%)    rustc_query_system::query::plumbing::force_query_impl
   158488 (1.3%)      1 (0.0%)    rustc_middle::ty::query::<impl rustc_middle::ty::context::TyCtxt>::alloc_self_profile_query_strings
   119768 (1.0%)    418 (0.8%)    core::ops::function::FnOnce::call_once
   119644 (1.0%)      1 (0.0%)    rustc_target::spec::load_specific
   104153 (0.9%)      7 (0.0%)    rustc_middle::ty::context::_DERIVE_rustc_serialize_Decodable_D_FOR_TypeckResults::<impl rustc_serialize::serialize::Decodable<__D> for rustc_middle::ty::context::TypeckResults>::decode::{{closure}}
    81173 (0.7%)      1 (0.0%)    rustc_middle::ty::query::stats::query_stats
    80306 (0.7%)   2029 (3.8%)    core::ops::function::FnOnce::call_once{{vtable.shim}}
    78019 (0.7%)   1611 (3.0%)    stacker::grow::{{closure}}
    69720 (0.6%)   3286 (6.2%)    <&T as core::fmt::Debug>::fmt
    56327 (0.5%)    186 (0.4%)    rustc_query_system::query::plumbing::incremental_verify_ich
    49714 (0.4%)     14 (0.0%)    rustc_mir::dataflow::framework::graphviz::BlockFormatter<A>::write_node_label
```
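One common way to act on this kind of output is to shrink the biggest offenders by moving the bulk of a heavily-instantiated generic function into a non-generic inner function, so that only a thin shim is monomorphized per type. The sketch below is purely illustrative (the function and types are made up, not taken from rustc):
```rust
use std::fs;
use std::path::Path;

// Hypothetical example: without the inner function, the whole body would be
// monomorphized (and handed to LLVM) once for every concrete `P` this is
// called with.
pub fn read_config<P: AsRef<Path>>(path: P) -> String {
    // Non-generic inner function: its LLVM IR is emitted only once, no matter
    // how many different `P` types call the thin generic wrapper.
    fn inner(path: &Path) -> String {
        fs::read_to_string(path).unwrap_or_default()
    }
    inner(path.as_ref())
}
```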
Since this doesn't seem to work with incremental compilation or `x.py check`,
you will be compiling rustc _a lot_.
I recommend changing a few settings in `config.toml` to make it bearable:
```
[rust]
# A debug build takes _a fourth_ as long on my machine,
# but compiling more than stage0 rustc becomes unbearably slow.
optimize = false

# We can't use incremental anyway, so we disable it for a little speed boost.
incremental = false
# We won't be running it, so no point in compiling debug checks.
debug = false

# Caution: This changes the output of llvm-lines.
# Using a single codegen unit gives more accurate output, but is slower to compile.
# Changing it to the number of cores on my machine increased the output
# from 3.5GB to 4.1GB and decreased compile times from 5½ min to 4 min.
codegen-units = 1
#codegen-units = 0 # num_cpus
```
What I'm still not sure about is whether inlining in MIR optimizations affects llvm-lines.
The output with `-Zmir-opt-level=0` and `-Zmir-opt-level=1` is the same,
but it feels like some functions that show up at the top should be too small
to have such a high impact. Inlining should only happen in LLVM, though.
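For reference, a comparison like that can be run with the same `RUSTFLAGS` approach as in the example above; this is only a sketch (`-Zmir-opt-level` is an unstable rustc flag, and the output file names are just illustrative):
```
./x.py clean
RUSTFLAGS="--emit=llvm-ir -Zmir-opt-level=0" ./x.py build --stage 0 compiler/rustc
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines-mir0.txt

./x.py clean
RUSTFLAGS="--emit=llvm-ir -Zmir-opt-level=1" ./x.py build --stage 0 compiler/rustc
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines-mir1.txt

# Compare the two runs.
diff llvm-lines-mir0.txt llvm-lines-mir1.txt
```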