3. Now that the combined profile data from all *rustc* invocations can be found in `<PROFDATA_DIR>/rustc-llvm.profdata` it is time to re-compile LLVM and *rustc* again, this time instructing Clang to make use of this valuable new information.
-To this end, we modify `config.toml` as follows:
+To this end we modify `config.toml` as follows:

```toml
[llvm]
@@ -137,7 +137,7 @@ Diving more into details shows the expected profile:
Workloads that spend most of their time in LLVM (e.g. optimized builds) show the most improvement, while workloads that don't invoke LLVM at all (e.g. check builds) also don't profit from a faster LLVM.
-Let's take a look how we can take things further by applying PGO to the other half of the compiler.
+Let's take a look at how we can take things further by applying PGO to the other half of the compiler.
@@ -214,7 +218,7 @@ As expected the results are similar to when PGO was applied to LLVM: a reduction
Because different workloads execute different amounts of Rust code (vs C++/LLVM code), the total reduction can be a lot less for LLVM-heavy cases.
For example, a full *webrender-opt* build will spend more than 80% of its time in LLVM, so reducing the remaining 20% by 5% can only reduce the total number by 1%.
-On the other hand, a *check* build or an *incr-unchanged* build spends almost no time in LLVM, so the 5% Rust performance improvement translates almost entirely into a 5% build time reduction for these cases:
+On the other hand, a *check* build or an *incr-unchanged* build spends almost no time in LLVM, so the 5% Rust performance improvement translates almost entirely into a 5% instruction count reduction for these cases:

![Performance improvements gained from applying PGO to (only) the Rust part of the compiler (details)][rustc-perf-pgo-rust-expanded]
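The 1% figure for *webrender-opt* comes from weighting the Rust-side improvement by the share of total time spent outside LLVM, in the spirit of Amdahl's law. Below is a minimal Rust sketch of that arithmetic. It assumes the time split stays fixed and that the 5% applies evenly to all Rust code. `overall_reduction` is a hypothetical helper and is not part of `rustc-perf`.

```rust
/// Fraction by which the *total* build cost drops when only part of the
/// work gets faster. `share` is the fraction of total cost spent in the
/// improved code; `local` is the reduction achieved within that part.
/// New total = (1 - share) + share * (1 - local) = 1 - share * local.
fn overall_reduction(share: f64, local: f64) -> f64 {
    share * local
}

fn main() {
    // webrender-opt: more than 80% of the time is in LLVM, so at most
    // ~20% is Rust code that benefits from the 5% improvement.
    let webrender = overall_reduction(0.20, 0.05);
    // check / incr-unchanged: almost no LLVM time, idealized here as 100% Rust.
    let check = overall_reduction(1.00, 0.05);
    println!("webrender-opt: at most {:.1}% total reduction", webrender * 100.0);
    println!("check build:   about {:.1}% total reduction", check * 100.0);
}
```

Because 0.20 × 0.05 = 0.01, a sizeable speedup of the Rust half barely changes LLVM-heavy builds. Builds that never reach LLVM keep nearly all of it.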
@@ -248,7 +252,7 @@ Given that PGO adds quite a few complications to the build process of the compil
-I then took a glance that the benchmarks' wall time measurements (instead of the instruction count measurements) and saw quite a different picture: *webrender-opt* minus 15%, *style-servo-opt* minus 14%, *serde-check* minus 15%?
+I then took a glance at the benchmarks' wall time measurements (instead of the instruction count measurements) and saw quite a different picture: *webrender-opt* minus 15%, *style-servo-opt* minus 14%, *serde-check* minus 15%?
This looked decidedly better than for instruction counts.
But wall time measurements can be very noisy (which is why most people only look at instruction counts on perf.rust-lang.org), and `rustc-perf` only does a single iteration for each benchmark, so I was not prepared to trust these numbers just yet.
I decided to try and reduce the noise by increasing the number of benchmark iterations from one to twenty.
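Running more iterations helps if the run-to-run noise is independent. In that case the standard error of the mean shrinks with the square root of the number of samples. A single iteration gives no noise estimate at all. Below is a minimal Rust sketch of that reasoning. It assumes independent, identically distributed noise, which real machines only approximate. `mean_and_std_error` is a hypothetical helper, not how `rustc-perf` aggregates results, and the sample values are made up for illustration.

```rust
/// Mean and standard error of the mean for repeated benchmark samples.
/// Requires at least two samples (the sample variance divides by n - 1).
fn mean_and_std_error(samples: &[f64]) -> (f64, f64) {
    assert!(samples.len() >= 2, "need at least two samples");
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Sample variance with Bessel's correction.
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, (variance / n).sqrt())
}

fn main() {
    // Hypothetical wall-time samples in seconds, for illustration only.
    let samples = [10.2, 9.8, 10.1, 9.9, 10.0];
    let (mean, std_error) = mean_and_std_error(&samples);
    println!("mean = {mean:.2} s, standard error = {std_error:.3} s");

    // Under independent noise the standard error scales with 1/sqrt(n),
    // so 20 iterations instead of 1 shrink it by a factor of sqrt(20).
    println!("1 -> 20 iterations: noise shrinks ~{:.1}x", 20.0_f64.sqrt());
}
```

Systematic effects such as thermal throttling or background load are not independent. Extra iterations do not average those away.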
@@ -260,7 +264,7 @@ After roughly eight hours to complete both the PGO and the non-PGO versions of t
-As you can see we get a 10-16% reduction of build times almost across the board.
+As you can see we get a 10-16% reduction of build times almost across the board for real world test cases.
This was more in line with what I had initially hoped to get from PGO.
It is a bit surprising that the difference between instruction counts and wall time is so pronounced.
One plausible explanation would be that PGO improves instruction cache utilization, something which makes a difference for execution time but would not be reflected in the amount of instructions executed.
@@ -300,4 +304,4 @@ It's unlikely that I can spend a lot of time on this personally -- but my hope i
-**PS** -- Special thanks to Mark Rousskov for uploading my local benchmarking data to [perf.rust-lang.org][perf.rlo], which makes it much nicer to explore!
+**PS** -- Special thanks to Mark Rousskov for uploading my local benchmarking data to [perf.rust-lang.org][rustc-perf-pgo-both-walltime], which makes it much nicer to explore!