Skip to content

perf triage for this week. #1340

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 1, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
251 changes: 251 additions & 0 deletions triage/2022-05-31.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,251 @@
# 2022-05-31 Triage Log

A good week: The regressions were small; some have follow-up PR's in flight to
address them; and we saw a big improvement from PR
[#97345](https://github.com/rust-lang/rust/pull/97345), which adds more fast
paths for quickly exiting comparisons between two types (such as `BitsImpl<M>`
and `BitsImpl<N>` for const integers `M` and `N`). This improved compile-times
for the `bitmaps` benchmark by 50-65% in some cases (including the trunk
`nalgebra`, according to independent investigation from nnethercote). That same
PR had more modest improvements (1% to 2%) to the compile-times for a number of
other crates. Many thanks to lcnr and nnethercote for some excellent work here!

Triage done by **@pnkfelix**.
Revision range: [43d9f385..0a43923a](https://perf.rust-lang.org/?start=43d9f3859e0204e764161ee085a360274b5f3e9a&end=0a43923a86c3b8f11d005884871b152f59b746f7&absolute=false&stat=instructions%3Au)

**Summary**:

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | N/A | N/A | 0 |
| Regressions 😿 <br /> (secondary) | 0.8% | 0.9% | 3 |
| Improvements 🎉 <br /> (primary) | -4.0% | -65.9% | 227 |
| Improvements 🎉 <br /> (secondary) | -2.0% | -7.7% | 217 |
| All 😿🎉 (primary) | -4.0% | -65.9% | 227 |


3 Regressions, 1 Improvements, 9 Mixed; 0 of them in rollups
59 artifact comparisons made in total

#### Regressions

Proc macro tweaks [#97004](https://github.com/rust-lang/rust/pull/97004) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=9a42c6509d399fd205917ebce474b31315c5d3e9&end=f558990814bb43cfb67db321b299dfdf275663e3&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | 0.2% | 0.2% | 1 |
| Regressions 😿 <br /> (secondary) | 0.6% | 1.5% | 10 |
| Improvements 🎉 <br /> (primary) | N/A | N/A | 0 |
| Improvements 🎉 <br /> (secondary) | N/A | N/A | 0 |
| All 😿🎉 (primary) | 0.2% | 0.2% | 1 |

* aparently making a `Buffer<T>` non-generic (its solely instantiated at `u8`) caused a performance regression.
* there is ongoing follow-up work to address the regression, either by making the buffer generic again, or by adding `#[inline]` annotations.
* see e.g. [PR 97539](https://github.com/rust-lang/rust/pull/97539)


rustdoc: include impl generics / self in search index [#96652](https://github.com/rust-lang/rust/pull/96652) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=84288ed6d5307ed44a0f78e2f1ee55fbadf4e978&end=0acc4a35853215a6f9388ab61455ced309711003&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | 0.5% | 1.0% | 3 |
| Regressions 😿 <br /> (secondary) | 1.1% | 1.1% | 3 |
| Improvements 🎉 <br /> (primary) | N/A | N/A | 0 |
| Improvements 🎉 <br /> (secondary) | N/A | N/A | 0 |
| All 😿🎉 (primary) | 0.5% | 1.0% | 3 |

* This regressed doc-generation for a few primary benchmarks, but I think that might be a inherent cost of a change like this. I marked it as triaged based on that assumption.

Move things to `rustc_type_ir` [#97287](https://github.com/rust-lang/rust/pull/97287) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=303d916867040e269b54adf3cfc7f5c903dc26ff&end=0f06824013761ed6829887019033f1001e68f623&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | 0.5% | 0.7% | 9 |
| Regressions 😿 <br /> (secondary) | 1.1% | 1.1% | 3 |
| Improvements 🎉 <br /> (primary) | N/A | N/A | 0 |
| Improvements 🎉 <br /> (secondary) | N/A | N/A | 0 |
| All 😿🎉 (primary) | 0.5% | 0.7% | 9 |

* During development, this PR was identified as regressing one primary benchmark, `bitmaps` (in a variety of contexts), and the PR author [said](https://github.com/rust-lang/rust/pull/97287#issuecomment-1139917174) better to take that perf hit and deal with it later.
* In the PR that landed, the number of regressing variants was smaller, but the affected set of benchmarks slightly larger: [both `bitmaps` and `unicode-normalization`](https://perf.rust-lang.org/compare.html?start=303d916867040e269b54adf3cfc7f5c903dc26ff&end=0f06824013761ed6829887019033f1001e68f623&stat=instructions:u) are affected, regressing by 0.35% to 0.70%.
* My very brief inpsection of the flamegraphs ([old](https://perf.rust-lang.org/perf/processed-self-profile?commit=303d916867040e269b54adf3cfc7f5c903dc26ff&benchmark=bitmaps-3.1.0-check&scenario=full&type=flamegraph), [new](https://perf.rust-lang.org/perf/processed-self-profile?commit=0f06824013761ed6829887019033f1001e68f623&benchmark=bitmaps-3.1.0-check&scenario=full&type=flamegraph)) didn't show any smoking guns.
* At this point I think we can just live with this performance hit.

#### Improvements

Add suggestion for relaxing static lifetime bounds on dyn trait impls in NLL [#97284](https://github.com/rust-lang/rust/pull/97284) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=764b8615e9149431d8790e3c07cb663642fe393d&end=ed76b773b57cf0aa48ec4e2fc6d6a3f7a9079491&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | N/A | N/A | 0 |
| Regressions 😿 <br /> (secondary) | N/A | N/A | 0 |
| Improvements 🎉 <br /> (primary) | -0.2% | -0.2% | 1 |
| Improvements 🎉 <br /> (secondary) | -5.3% | -5.9% | 6 |
| All 😿🎉 (primary) | -0.2% | -0.2% | 1 |

* This managed to improve performance (slightly), probably because it [restructured](https://github.com/rust-lang/rust/pull/97284/commits/3c6c8d5a8dbf4db20450ed5793ef35f29c13466c#r882897413) the `CallArgument` and that had fallout that ended up being positive overall (despite our intuition being that it would hurt performance due to the increase in size).
* It might be nice to confirm that hypothesis independently (by isolating just that structural change and confirming that it has a similar effect on performance here) ...
* ... but, the improvements are essentially isolated to just the secondary wg-grammar benchmark, so its not really worth too much digging, except if we think it might reveal other structural changes we should make elsewhere.

#### Mixed

add a deep fast_reject routine [#97345](https://github.com/rust-lang/rust/pull/97345) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=cbdce423201f1b155c46f3ec690a644cf3b4ba53&end=4a99c5f504ab65a0fd9d60f515811e1d9cff8c0a&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | N/A | N/A | 0 |
| Regressions 😿 <br /> (secondary) | 0.9% | 1.6% | 10 |
| Improvements 🎉 <br /> (primary) | -11.0% | -65.8% | 45 |
| Improvements 🎉 <br /> (secondary) | -0.5% | -0.9% | 12 |
| All 😿🎉 (primary) | -11.0% | -65.8% | 45 |

* this was great, as noted in summary.


Move various checks to typeck so them failing causes the typeck result to get tainted [#96046](https://github.com/rust-lang/rust/pull/96046) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=46147119ec545045948bc799581d93edd3b1617b&end=56fd680cf9226ab424f88d4e3b43c5e088d17f19&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | N/A | N/A | 0 |
| Regressions 😿 <br /> (secondary) | 0.9% | 1.1% | 4 |
| Improvements 🎉 <br /> (primary) | -0.3% | -0.4% | 4 |
| Improvements 🎉 <br /> (secondary) | -0.3% | -0.3% | 24 |
| All 😿🎉 (primary) | -0.3% | -0.4% | 4 |

* This had some small improvements to primary benchmarks.
* The CTFE stress test regressed, but I assume that was expected since this was a change to the CTFE engine (to address some ICE's).

Update jemalloc to v5.3 [#96790](https://github.com/rust-lang/rust/pull/96790) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=56fd680cf9226ab424f88d4e3b43c5e088d17f19&end=ebbcbfc236ced21d5e6a92269edb704692ff26b8&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | N/A | N/A | 0 |
| Regressions 😿 <br /> (secondary) | 0.3% | 0.3% | 1 |
| Improvements 🎉 <br /> (primary) | -0.9% | -6.2% | 212 |
| Improvements 🎉 <br /> (secondary) | -1.1% | -3.2% | 191 |
| All 😿🎉 (primary) | -0.9% | -6.2% | 212 |

* This had various improvements to our primary benchmarks' instruction counts.
* The other big thing to observe: the max-rss was improved in several primary
benchmarks (by 1% to 6%), and some secondary benchmarks saw even more
significant improvements to their max-rss.

Split dead store elimination off dest prop [#97158](https://github.com/rust-lang/rust/pull/97158) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=19abca1172ae10d5c084b6c3013d92680f92dd8b&end=68314177e70017c08f6cdf295631bb508f9f85bc&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | 0.5% | 1.3% | 15 |
| Regressions 😿 <br /> (secondary) | 0.6% | 2.3% | 12 |
| Improvements 🎉 <br /> (primary) | -0.4% | -1.9% | 50 |
| Improvements 🎉 <br /> (secondary) | -0.6% | -1.3% | 33 |
| All 😿🎉 (primary) | -0.2% | -1.9% | 65 |

* The changes here were investigated by [the PR author](https://github.com/rust-lang/rust/pull/97158#issuecomment-1140550402).
* Marking as triaged based on their investigation.

Try to cache region_scope_tree as a query [#97383](https://github.com/rust-lang/rust/pull/97383) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=68314177e70017c08f6cdf295631bb508f9f85bc&end=4f39fb1f34d4bd25d9ce96afe7b2d109f073e286&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | N/A | N/A | 0 |
| Regressions 😿 <br /> (secondary) | 1.0% | 1.0% | 3 |
| Improvements 🎉 <br /> (primary) | -1.7% | -4.7% | 98 |
| Improvements 🎉 <br /> (secondary) | -2.4% | -7.7% | 43 |
| All 😿🎉 (primary) | -1.7% | -4.7% | 98 |

* This was a targeted PR to address the regressions introduced by PR [#95563](https://github.com/rust-lang/rust/pull/95563) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=acfd327fd4e3a302ebb0a077f422a527a7935333&end=653463731a7f01f519cf85f444869def27f00395&stat=instructions:u).
* It seems to succeed at recovering the instruction-count performance lost in that PR.
* It doesn't *quite* recover all of the max-rss lost
(PR [#97383 improves max-rss by about 3.75%](https://perf.rust-lang.org/compare.html?start=6ac8adad1f7d733b5b97d1df4e7f96e73a46db42&end=209f91e1787d0d29a0e566fa93f35d52e60ea84a&stat=max-rss)
while PR [#95563 degraded max-rss by about 4%](https://perf.rust-lang.org/compare.html?start=acfd327fd4e3a302ebb0a077f422a527a7935333&end=653463731a7f01f519cf85f444869def27f00395&stat=max-rss)),
but its close enough to satisfy our needs.

proc_macro: don't pass a client-side function pointer through the server. [#97461](https://github.com/rust-lang/rust/pull/97461) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=4f39fb1f34d4bd25d9ce96afe7b2d109f073e286&end=116201eefebcf45074ae232377e7145f5fbb704b&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | 0.5% | 0.5% | 2 |
| Regressions 😿 <br /> (secondary) | N/A | N/A | 0 |
| Improvements 🎉 <br /> (primary) | -0.4% | -0.8% | 11 |
| Improvements 🎉 <br /> (secondary) | -2.6% | -5.4% | 10 |
| All 😿🎉 (primary) | -0.3% | -0.8% | 13 |

* On the primary benchmarks, this mostly yielded small improvements; the one exception was `serde_derive` (debug), which regressed by ~0.5% for the full and incr-full variants.
* Looking at the flamegraphs for the before and after commits, and at the [table at bottom of details page](https://perf.rust-lang.org/detailed-query.html?commit=116201eefebcf45074ae232377e7145f5fbb704b&base_commit=4f39fb1f34d4bd25d9ce96afe7b2d109f073e286&benchmark=serde_derive-1.0.136-debug&scenario=full), it seems like the instruction-count regression is in `codegen_module`
* Looking at the history of serde_derive-debug (massively zoomed in on the "Percent Delta from First" Graph kind), it seems reasonable to think that something *did* happen on this PR.
* .... but I also don't really think its a big enough regression to be worth tearing our hair out over. This is a (smallish) win overall, and even for serde_derive-debug, it is a small regression in the context of much larger wins, so overall the trajectory is good.
* Marked as triaged.

Replace `#[default_method_body_is_const]` with `#[const_trait]` [#96964](https://github.com/rust-lang/rust/pull/96964) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=855fc022fe879f4e3493a024f9c6b981d6317612&end=5c780b98d10f48d6255cf2deb2643194b9221c02&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | 0.2% | 0.3% | 9 |
| Regressions 😿 <br /> (secondary) | 0.2% | 0.3% | 3 |
| Improvements 🎉 <br /> (primary) | N/A | N/A | 0 |
| Improvements 🎉 <br /> (secondary) | -1.1% | -1.1% | 3 |
| All 😿🎉 (primary) | 0.2% | 0.3% | 9 |

* The primary regressions are mostly in variants of stm32f4, with a two serde and one syn thrown in for good measure.
* The improvements are solely to ctfe stress test, which I guess makes sense given the PR?
* Anyway, the regressions seem minor, and they are contained to an unstable feature that the stdlib is using.
* Marking as triaged.

improve format impl for literals [#97480](https://github.com/rust-lang/rust/pull/97480) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=e810f750a2a407f9caeabba39059578e844add14&end=4a8d2e3856c0c75c71998b6c85937203839b946d&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | 0.4% | 0.5% | 4 |
| Regressions 😿 <br /> (secondary) | N/A | N/A | 0 |
| Improvements 🎉 <br /> (primary) | N/A | N/A | 0 |
| Improvements 🎉 <br /> (secondary) | -1.5% | -1.5% | 1 |
| All 😿🎉 (primary) | 0.4% | 0.5% | 4 |

* This is adding a fast-path so that `format!("literal")` will compile into the same code as `"literal".to_owned()`.
* The primary regression is solely contained to `bitmaps`.
* Its possible that the regression to `bitmaps` is due to `format!("literal")` being totally unused in that code; all instances of `format!` there take an additional argument.
So its *possible* that the extra code to check about whether to use the fast-path is slowing things down there.
* But I personally don't believe that explanation here: Unless I'm misunderstanding the code, there is some amount of macro-expansion into multiple instances of `format!`, but
most of the expanded code is going to be dominated by all the `impl` blocks, not the relatively few `format!` instances. (Unless I massively misunderstand how the macros and/or codegen and/or inlining end up linking up here.)
* So: I don't believe the best hypothesis I have for what is happening here.
* But I also do not think the regression here is large enough to warrant further investigation.
* Marking as triaged.

errors: simplify referring to fluent attributes [#97357](https://github.com/rust-lang/rust/pull/97357) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=c35035cefc709abddabfb28ecc6a326458d46ce2&end=7be9ec27652f2c3b820d341158b0e005f42e248e&stat=instructions:u)

| | mean | max | count |
|:----------:|:----:|:---:|:-----:|
| Regressions 😿 <br /> (primary) | N/A | N/A | 0 |
| Regressions 😿 <br /> (secondary) | 0.4% | 0.6% | 9 |
| Improvements 🎉 <br /> (primary) | N/A | N/A | 0 |
| Improvements 🎉 <br /> (secondary) | -0.4% | -0.5% | 4 |
| All 😿🎉 (primary) | N/A | N/A | 0 |

* There were some regressions here, but they are few, minor, and contained solely to secondary benchmarks (specifically projection-caching and wg-grammar). Marking as triaged.

#### Untriaged Pull Requests

- [#97019 Transition to valtrees pt1](https://github.com/rust-lang/rust/pull/97019)
- [#97004 Proc macro tweaks](https://github.com/rust-lang/rust/pull/97004)
- [#96883 Add EarlyBinder](https://github.com/rust-lang/rust/pull/96883)
- [#96825 Retire `ItemLikeVisitor` trait](https://github.com/rust-lang/rust/pull/96825)
- [#96652 rustdoc: include impl generics / self in search index](https://github.com/rust-lang/rust/pull/96652)
- [#96010 Implement `core::ptr::Unique` on top of `NonNull`](https://github.com/rust-lang/rust/pull/96010)
- [#95990 Rollup of 7 pull requests](https://github.com/rust-lang/rust/pull/95990)
- [#95899 rustc_metadata: Do not encode unnecessary module children](https://github.com/rust-lang/rust/pull/95899)
- [#95893 Respect -Z verify-llvm-ir and other flags that add extra passes when combined with -C no-prepopulate-passes in the new LLVM Pass Manager.](https://github.com/rust-lang/rust/pull/95893)
- [#95835 Rollup of 8 pull requests](https://github.com/rust-lang/rust/pull/95835)
- [#95794 `parse_tt`: a few more tweaks](https://github.com/rust-lang/rust/pull/95794)
- [#95742 Rollup of 6 pull requests](https://github.com/rust-lang/rust/pull/95742)
- [#95715 Shrink `Nonterminal`](https://github.com/rust-lang/rust/pull/95715)
- [#95706 rustdoc: Early doc link resolution fixes and refactorings](https://github.com/rust-lang/rust/pull/95706)
- [#95702 Rollup of 8 pull requests](https://github.com/rust-lang/rust/pull/95702)
- [#95667 Rollup of 5 pull requests](https://github.com/rust-lang/rust/pull/95667)
- [#95662 Rollup of 7 pull requests](https://github.com/rust-lang/rust/pull/95662)
- [#95645 Fix intra doc link ICE when trying to get traits in scope for primitive](https://github.com/rust-lang/rust/pull/95645)
- [#95604 Generate synthetic object file to ensure all exported and used symbols participate in the linking](https://github.com/rust-lang/rust/pull/95604)
- [#95563 Move the extended lifetime resolution into typeck context](https://github.com/rust-lang/rust/pull/95563)
- [#95399 Faster parsing for lower numbers for radix up to 16 (cont.)](https://github.com/rust-lang/rust/pull/95399)
- [#95250 Rollup of 5 pull requests](https://github.com/rust-lang/rust/pull/95250)
- [#94824 Rollup of 7 pull requests](https://github.com/rust-lang/rust/pull/94824)
- [#94814 Rollup of 8 pull requests](https://github.com/rust-lang/rust/pull/94814)