Merge pull request #1340 from pnkfelix/triage-2022-05-31

rylev · web-flow · commit 08c031997d0e · 2022-06-01T10:20:13.000+02:00
perf triage for this week.
diff --git a/triage/2022-05-31.md b/triage/2022-05-31.md
@@ -0,0 +1,251 @@
+# 2022-05-31 Triage Log
+
+A good week: The regressions were small; some have follow-up PR's in flight to
+address them; and we saw a big improvement from PR
+[#97345](https://github.com/rust-lang/rust/pull/97345), which adds more fast
+paths for quickly exiting comparisons between two types (such as `BitsImpl<M>`
+and `BitsImpl<N>` for const integers `M` and `N`). This improved compile-times
+for the `bitmaps` benchmark by 50-65% in some cases (including the trunk
+`nalgebra`, according to independent investigation from nnethercote). That same
+PR had more modest improvements (1% to 2%) to the compile-times for a number of
+other crates. Many thanks to lcnr and nnethercote for some excellent work here!
+
+Triage done by **@pnkfelix**.
+Revision range: [43d9f385..0a43923a](https://perf.rust-lang.org/?start=43d9f3859e0204e764161ee085a360274b5f3e9a&end=0a43923a86c3b8f11d005884871b152f59b746f7&absolute=false&stat=instructions%3Au)
+
+**Summary**:
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | N/A  | N/A | 0     |
+| Regressions 😿 <br /> (secondary) | 0.8% | 0.9% | 3     |
+| Improvements 🎉 <br /> (primary) | -4.0% | -65.9% | 227   |
+| Improvements 🎉 <br /> (secondary) | -2.0% | -7.7% | 217   |
+| All 😿🎉 (primary) | -4.0% | -65.9% | 227   |
+
+
+3 Regressions, 1 Improvements, 9 Mixed; 0 of them in rollups
+59 artifact comparisons made in total
+
+#### Regressions
+
+Proc macro tweaks [#97004](https://github.com/rust-lang/rust/pull/97004) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=9a42c6509d399fd205917ebce474b31315c5d3e9&end=f558990814bb43cfb67db321b299dfdf275663e3&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | 0.2% | 0.2% | 1     |
+| Regressions 😿 <br /> (secondary) | 0.6% | 1.5% | 10    |
+| Improvements 🎉 <br /> (primary) | N/A  | N/A | 0     |
+| Improvements 🎉 <br /> (secondary) | N/A  | N/A | 0     |
+| All 😿🎉 (primary) | 0.2% | 0.2% | 1     |
+
+* aparently making a `Buffer<T>` non-generic (its solely instantiated at `u8`) caused a performance regression.
+* there is ongoing follow-up work to address the regression, either by making the buffer generic again, or by adding `#[inline]` annotations.
+* see e.g. [PR 97539](https://github.com/rust-lang/rust/pull/97539)
+
+
+rustdoc: include impl generics / self in search index [#96652](https://github.com/rust-lang/rust/pull/96652) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=84288ed6d5307ed44a0f78e2f1ee55fbadf4e978&end=0acc4a35853215a6f9388ab61455ced309711003&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | 0.5% | 1.0% | 3     |
+| Regressions 😿 <br /> (secondary) | 1.1% | 1.1% | 3     |
+| Improvements 🎉 <br /> (primary) | N/A  | N/A | 0     |
+| Improvements 🎉 <br /> (secondary) | N/A  | N/A | 0     |
+| All 😿🎉 (primary) | 0.5% | 1.0% | 3     |
+
+* This regressed doc-generation for a few primary benchmarks, but I think that might be a inherent cost of a change like this. I marked it as triaged based on that assumption.
+
+Move things to `rustc_type_ir` [#97287](https://github.com/rust-lang/rust/pull/97287) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=303d916867040e269b54adf3cfc7f5c903dc26ff&end=0f06824013761ed6829887019033f1001e68f623&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | 0.5% | 0.7% | 9     |
+| Regressions 😿 <br /> (secondary) | 1.1% | 1.1% | 3     |
+| Improvements 🎉 <br /> (primary) | N/A  | N/A | 0     |
+| Improvements 🎉 <br /> (secondary) | N/A  | N/A | 0     |
+| All 😿🎉 (primary) | 0.5% | 0.7% | 9     |
+
+* During development, this PR was identified as regressing one primary benchmark, `bitmaps` (in a variety of contexts), and the PR author [said](https://github.com/rust-lang/rust/pull/97287#issuecomment-1139917174) better to take that perf hit and deal with it later.
+* In the PR that landed, the number of regressing variants was smaller, but the affected set of benchmarks slightly larger: [both `bitmaps` and `unicode-normalization`](https://perf.rust-lang.org/compare.html?start=303d916867040e269b54adf3cfc7f5c903dc26ff&end=0f06824013761ed6829887019033f1001e68f623&stat=instructions:u) are affected, regressing by 0.35% to 0.70%.
+* My very brief inpsection of the flamegraphs ([old](https://perf.rust-lang.org/perf/processed-self-profile?commit=303d916867040e269b54adf3cfc7f5c903dc26ff&benchmark=bitmaps-3.1.0-check&scenario=full&type=flamegraph), [new](https://perf.rust-lang.org/perf/processed-self-profile?commit=0f06824013761ed6829887019033f1001e68f623&benchmark=bitmaps-3.1.0-check&scenario=full&type=flamegraph)) didn't show any smoking guns.
+* At this point I think we can just live with this performance hit.
+
+#### Improvements
+
+Add suggestion for relaxing static lifetime bounds on dyn trait impls in NLL [#97284](https://github.com/rust-lang/rust/pull/97284) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=764b8615e9149431d8790e3c07cb663642fe393d&end=ed76b773b57cf0aa48ec4e2fc6d6a3f7a9079491&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | N/A  | N/A | 0     |
+| Regressions 😿 <br /> (secondary) | N/A  | N/A | 0     |
+| Improvements 🎉 <br /> (primary) | -0.2% | -0.2% | 1     |
+| Improvements 🎉 <br /> (secondary) | -5.3% | -5.9% | 6     |
+| All 😿🎉 (primary) | -0.2% | -0.2% | 1     |
+
+* This managed to improve performance (slightly), probably because it [restructured](https://github.com/rust-lang/rust/pull/97284/commits/3c6c8d5a8dbf4db20450ed5793ef35f29c13466c#r882897413) the `CallArgument` and that had fallout that ended up being positive overall (despite our intuition being that it would hurt performance due to the increase in size).
+* It might be nice to confirm that hypothesis independently (by isolating just that structural change and confirming that it has a similar effect on performance here) ...
+* ... but, the improvements are essentially isolated to just the secondary wg-grammar benchmark, so its not really worth too much digging, except if we think it might reveal other structural changes we should make elsewhere.
+
+#### Mixed
+
+add a deep fast_reject routine [#97345](https://github.com/rust-lang/rust/pull/97345) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=cbdce423201f1b155c46f3ec690a644cf3b4ba53&end=4a99c5f504ab65a0fd9d60f515811e1d9cff8c0a&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | N/A  | N/A | 0     |
+| Regressions 😿 <br /> (secondary) | 0.9% | 1.6% | 10    |
+| Improvements 🎉 <br /> (primary) | -11.0% | -65.8% | 45    |
+| Improvements 🎉 <br /> (secondary) | -0.5% | -0.9% | 12    |
+| All 😿🎉 (primary) | -11.0% | -65.8% | 45    |
+
+* this was great, as noted in summary.
+
+
+Move various checks to typeck so them failing causes the typeck result to get tainted [#96046](https://github.com/rust-lang/rust/pull/96046) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=46147119ec545045948bc799581d93edd3b1617b&end=56fd680cf9226ab424f88d4e3b43c5e088d17f19&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | N/A  | N/A | 0     |
+| Regressions 😿 <br /> (secondary) | 0.9% | 1.1% | 4     |
+| Improvements 🎉 <br /> (primary) | -0.3% | -0.4% | 4     |
+| Improvements 🎉 <br /> (secondary) | -0.3% | -0.3% | 24    |
+| All 😿🎉 (primary) | -0.3% | -0.4% | 4     |
+
+* This had some small improvements to primary benchmarks.
+* The CTFE stress test regressed, but I assume that was expected since this was a change to the CTFE engine (to address some ICE's).
+
+Update jemalloc to v5.3 [#96790](https://github.com/rust-lang/rust/pull/96790) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=56fd680cf9226ab424f88d4e3b43c5e088d17f19&end=ebbcbfc236ced21d5e6a92269edb704692ff26b8&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | N/A  | N/A | 0     |
+| Regressions 😿 <br /> (secondary) | 0.3% | 0.3% | 1     |
+| Improvements 🎉 <br /> (primary) | -0.9% | -6.2% | 212   |
+| Improvements 🎉 <br /> (secondary) | -1.1% | -3.2% | 191   |
+| All 😿🎉 (primary) | -0.9% | -6.2% | 212   |
+
+* This had various improvements to our primary benchmarks' instruction counts.
+* The other big thing to observe: the max-rss was improved in several primary
+  benchmarks (by 1% to 6%), and some secondary benchmarks saw even more
+  significant improvements to their max-rss.
+
+Split dead store elimination off dest prop [#97158](https://github.com/rust-lang/rust/pull/97158) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=19abca1172ae10d5c084b6c3013d92680f92dd8b&end=68314177e70017c08f6cdf295631bb508f9f85bc&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | 0.5% | 1.3% | 15    |
+| Regressions 😿 <br /> (secondary) | 0.6% | 2.3% | 12    |
+| Improvements 🎉 <br /> (primary) | -0.4% | -1.9% | 50    |
+| Improvements 🎉 <br /> (secondary) | -0.6% | -1.3% | 33    |
+| All 😿🎉 (primary) | -0.2% | -1.9% | 65    |
+
+* The changes here were investigated by [the PR author](https://github.com/rust-lang/rust/pull/97158#issuecomment-1140550402).
+* Marking as triaged based on their investigation.
+
+Try to cache region_scope_tree as a query [#97383](https://github.com/rust-lang/rust/pull/97383) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=68314177e70017c08f6cdf295631bb508f9f85bc&end=4f39fb1f34d4bd25d9ce96afe7b2d109f073e286&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | N/A  | N/A | 0     |
+| Regressions 😿 <br /> (secondary) | 1.0% | 1.0% | 3     |
+| Improvements 🎉 <br /> (primary) | -1.7% | -4.7% | 98    |
+| Improvements 🎉 <br /> (secondary) | -2.4% | -7.7% | 43    |
+| All 😿🎉 (primary) | -1.7% | -4.7% | 98    |
+
+* This was a targeted PR to address the regressions introduced by PR [#95563](https://github.com/rust-lang/rust/pull/95563) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=acfd327fd4e3a302ebb0a077f422a527a7935333&end=653463731a7f01f519cf85f444869def27f00395&stat=instructions:u).
+* It seems to succeed at recovering the instruction-count performance lost in that PR.
+* It doesn't *quite* recover all of the max-rss lost
+  (PR [#97383 improves max-rss by about 3.75%](https://perf.rust-lang.org/compare.html?start=6ac8adad1f7d733b5b97d1df4e7f96e73a46db42&end=209f91e1787d0d29a0e566fa93f35d52e60ea84a&stat=max-rss)
+  while PR [#95563 degraded max-rss by about 4%](https://perf.rust-lang.org/compare.html?start=acfd327fd4e3a302ebb0a077f422a527a7935333&end=653463731a7f01f519cf85f444869def27f00395&stat=max-rss)),
+  but its close enough to satisfy our needs.
+
+proc_macro: don't pass a client-side function pointer through the server. [#97461](https://github.com/rust-lang/rust/pull/97461) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=4f39fb1f34d4bd25d9ce96afe7b2d109f073e286&end=116201eefebcf45074ae232377e7145f5fbb704b&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | 0.5% | 0.5% | 2     |
+| Regressions 😿 <br /> (secondary) | N/A  | N/A | 0     |
+| Improvements 🎉 <br /> (primary) | -0.4% | -0.8% | 11    |
+| Improvements 🎉 <br /> (secondary) | -2.6% | -5.4% | 10    |
+| All 😿🎉 (primary) | -0.3% | -0.8% | 13    |
+
+* On the primary benchmarks, this mostly yielded small improvements; the one exception was `serde_derive` (debug), which regressed by ~0.5% for the full and incr-full variants.
+* Looking at the flamegraphs for the before and after commits, and at the [table at bottom of details page](https://perf.rust-lang.org/detailed-query.html?commit=116201eefebcf45074ae232377e7145f5fbb704b&base_commit=4f39fb1f34d4bd25d9ce96afe7b2d109f073e286&benchmark=serde_derive-1.0.136-debug&scenario=full), it seems like the instruction-count regression is in `codegen_module`
+* Looking at the history of serde_derive-debug (massively zoomed in on the "Percent Delta from First" Graph kind), it seems reasonable to think that something *did* happen on this PR.
+* .... but I also don't really think its a big enough regression to be worth tearing our hair out over. This is a (smallish) win overall, and even for serde_derive-debug, it is a small regression in the context of much larger wins, so overall the trajectory is good.
+* Marked as triaged.
+
+Replace `#[default_method_body_is_const]` with `#[const_trait]` [#96964](https://github.com/rust-lang/rust/pull/96964) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=855fc022fe879f4e3493a024f9c6b981d6317612&end=5c780b98d10f48d6255cf2deb2643194b9221c02&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | 0.2% | 0.3% | 9     |
+| Regressions 😿 <br /> (secondary) | 0.2% | 0.3% | 3     |
+| Improvements 🎉 <br /> (primary) | N/A  | N/A | 0     |
+| Improvements 🎉 <br /> (secondary) | -1.1% | -1.1% | 3     |
+| All 😿🎉 (primary) | 0.2% | 0.3% | 9     |
+
+* The primary regressions are mostly in variants of stm32f4, with a two serde and one syn thrown in for good measure.
+* The improvements are solely to ctfe stress test, which I guess makes sense given the PR?
+* Anyway, the regressions seem minor, and they are contained to an unstable feature that the stdlib is using.
+* Marking as triaged.
+
+improve format impl for literals [#97480](https://github.com/rust-lang/rust/pull/97480) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=e810f750a2a407f9caeabba39059578e844add14&end=4a8d2e3856c0c75c71998b6c85937203839b946d&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | 0.4% | 0.5% | 4     |
+| Regressions 😿 <br /> (secondary) | N/A  | N/A | 0     |
+| Improvements 🎉 <br /> (primary) | N/A  | N/A | 0     |
+| Improvements 🎉 <br /> (secondary) | -1.5% | -1.5% | 1     |
+| All 😿🎉 (primary) | 0.4% | 0.5% | 4     |
+
+* This is adding a fast-path so that `format!("literal")` will compile into the same code as `"literal".to_owned()`.
+* The primary regression is solely contained to `bitmaps`.
+* Its possible that the regression to `bitmaps` is due to `format!("literal")` being totally unused in that code; all instances of `format!` there take an additional argument.
+  So its *possible* that the extra code to check about whether to use the fast-path is slowing things down there.
+* But I personally don't believe that explanation here: Unless I'm misunderstanding the code, there is some amount of macro-expansion into multiple instances of `format!`, but
+  most of the expanded code is going to be dominated by all the `impl` blocks, not the relatively few `format!` instances. (Unless I massively misunderstand how the macros and/or codegen and/or inlining end up linking up here.)
+* So: I don't believe the best hypothesis I have for what is happening here.
+* But I also do not think the regression here is large enough to warrant further investigation.
+* Marking as triaged.
+
+errors: simplify referring to fluent attributes [#97357](https://github.com/rust-lang/rust/pull/97357) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=c35035cefc709abddabfb28ecc6a326458d46ce2&end=7be9ec27652f2c3b820d341158b0e005f42e248e&stat=instructions:u)
+
+|            | mean | max | count |
+|:----------:|:----:|:---:|:-----:|
+| Regressions 😿 <br /> (primary) | N/A  | N/A | 0     |
+| Regressions 😿 <br /> (secondary) | 0.4% | 0.6% | 9     |
+| Improvements 🎉 <br /> (primary) | N/A  | N/A | 0     |
+| Improvements 🎉 <br /> (secondary) | -0.4% | -0.5% | 4     |
+| All 😿🎉 (primary) | N/A  | N/A | 0     |
+
+* There were some regressions here, but they are few, minor, and contained solely to secondary benchmarks (specifically projection-caching and wg-grammar). Marking as triaged.
+
+#### Untriaged Pull Requests
+
+- [#97019 Transition to valtrees pt1](https://github.com/rust-lang/rust/pull/97019)
+- [#97004 Proc macro tweaks](https://github.com/rust-lang/rust/pull/97004)
+- [#96883 Add EarlyBinder](https://github.com/rust-lang/rust/pull/96883)
+- [#96825  Retire `ItemLikeVisitor` trait](https://github.com/rust-lang/rust/pull/96825)
+- [#96652 rustdoc: include impl generics / self in search index](https://github.com/rust-lang/rust/pull/96652)
+- [#96010 Implement `core::ptr::Unique` on top of `NonNull`](https://github.com/rust-lang/rust/pull/96010)
+- [#95990 Rollup of 7 pull requests](https://github.com/rust-lang/rust/pull/95990)
+- [#95899 rustc_metadata: Do not encode unnecessary module children](https://github.com/rust-lang/rust/pull/95899)
+- [#95893 Respect -Z verify-llvm-ir and other flags that add extra passes when combined with -C no-prepopulate-passes in the new LLVM Pass Manager.](https://github.com/rust-lang/rust/pull/95893)
+- [#95835 Rollup of 8 pull requests](https://github.com/rust-lang/rust/pull/95835)
+- [#95794 `parse_tt`: a few more tweaks](https://github.com/rust-lang/rust/pull/95794)
+- [#95742 Rollup of 6 pull requests](https://github.com/rust-lang/rust/pull/95742)
+- [#95715 Shrink `Nonterminal`](https://github.com/rust-lang/rust/pull/95715)
+- [#95706 rustdoc: Early doc link resolution fixes and refactorings](https://github.com/rust-lang/rust/pull/95706)
+- [#95702 Rollup of 8 pull requests](https://github.com/rust-lang/rust/pull/95702)
+- [#95667 Rollup of 5 pull requests](https://github.com/rust-lang/rust/pull/95667)
+- [#95662 Rollup of 7 pull requests](https://github.com/rust-lang/rust/pull/95662)
+- [#95645 Fix intra doc link ICE when trying to get traits in scope for primitive](https://github.com/rust-lang/rust/pull/95645)
+- [#95604 Generate synthetic object file to ensure all exported and used symbols participate in the linking](https://github.com/rust-lang/rust/pull/95604)
+- [#95563 Move the extended lifetime resolution into typeck context](https://github.com/rust-lang/rust/pull/95563)
+- [#95399 Faster parsing for lower numbers for radix up to 16 (cont.)](https://github.com/rust-lang/rust/pull/95399)
+- [#95250 Rollup of 5 pull requests](https://github.com/rust-lang/rust/pull/95250)
+- [#94824 Rollup of 7 pull requests](https://github.com/rust-lang/rust/pull/94824)
+- [#94814 Rollup of 8 pull requests](https://github.com/rust-lang/rust/pull/94814)