posts/inside-rust/2020-10-30-exploring-pgo-for-the-rust-compiler.md
Lines changed: 9 additions & 7 deletions
@@ -6,7 +6,7 @@ description: "Investigate the effects that profile guided optimization has on ru
6
6
team: the compiler team <https://www.rust-lang.org/governance/teams/compiler>
7
7
---
8
8
9
-**TLDR** -- PGO makes the compiler [faster](#Final-Benchmark-Numbers-and-a-Measurement-Surprise) but is [not straightforward](#Where-to-go-from-here) to realize in CI.
9
+**TLDR** -- PGO makes the compiler [faster](#final-numbers-and-a-benchmarking-plot-twist) but is [not straightforward](#where-to-go-from-here) to realize in CI.
10
10
11
11
For the last few months Mozilla has been using Profile-Guided Optimization (PGO) to build their own [optimized version of Clang][moz-clang], leading to an up to 9% reduction of Firefox compile times on their build infrastructure.
12
12
Would the same be possible for the Rust compiler, that is, could we apply profile-guided optimization to *rustc* itself in order to make it faster?
@@ -126,7 +126,7 @@ Here's a glance at the effect that a PGOed LLVM has on *rustc*'s performance:
126
126
[![Performance improvements gained from apply PGO to LLVM][rustc-perf-pgo-llvm-thumb]][rustc-perf-pgo-llvm]
Because different workloads execute different amounts of Rust code (vs C++/LLVM code), the total reduction can be a lot less for LLVM-heavy cases.
216
216
For example, a full *webrender-opt* build will spend more than 80% of its time in LLVM, so reducing the remaining 20% by 5% can only reduce the total number by 1%.
@@ -246,7 +246,7 @@ Sure, PGO seems to lead to a pretty solid 5% reduction of instruction counts acr
246
246
That is pretty nice -- but also far away from the 20% improvement mentioned in the Clang documentation.
247
247
Given that PGO adds quite a few complications to the build process of the compiler itself (not to mention the almost tripled build times) I started to think that applying PGO to the compiler would probably not be worth the trouble.
I then took a glance that the benchmarks' wall time measurements (instead of the instruction count measurements) and saw quite a different picture: *webrender-opt* minus 15%, *style-servo-opt* minus 14%, *serde-check* minus 15%?
252
252
This looked decidedly better than for instruction counts.
@@ -255,10 +255,10 @@ I decided to try and reduce the noise by increasing the number of benchmark iter
255
255
I only did "full" builds in this configuration as PGO's effect seemed to translate pretty predictably to incremental builds.
256
256
After roughly eight hours to complete both the PGO and the non-PGO versions of the benchmarks these are the numbers I got:
257
257
258
-[![Wall time improvements gained from applying PGO to the entire compiler][rustc-perf-pgo-both-walltime-thump]][rustc-perf-pgo-both-walltime]
258
+[![Wall time improvements gained from applying PGO to the entire compiler][rustc-perf-pgo-both-walltime-thumb]][rustc-perf-pgo-both-walltime]
As you can see we get a 10-16% reduction of build times almost across the board.
264
264
This was more in line with what I had initially hoped to get from PGO.
@@ -299,3 +299,5 @@ Having a straightforward way of obtaining a PGOed compiler (e.g. by adding a sim
299
299
It's unlikely that I can spend a lot of time on this personally -- but my hope is that others will pick up the baton. I'd be happy to provide guidance on how to use PGO specifically.
**PS** -- Special thanks to Mark Rousskov for uploading my local benchmarking data to [perf.rust-lang.org][perf.rlo], which makes it much nicer to explore!