Add extended stats to GH summary table #1322

Kobzol · 2022-05-15T11:15:23Z

This is an attempt to improve the (currently non-existing) display of secondary performance stats (like max RSS or cycles) in GH summary reports. This corresponds to the "better UI/UX" item from the performance roadmap.

Sometimes changes in icounts are virtually just noise, but some other metric changes in significant ways (typically max RSS, but sometimes also cycles). This can be missed by onlookers, since to show these secondary steps, you have to manually switch to them in the compare page, which is sometimes neglected.

I think that it would be nice if we could show the information in the summary, if we find that the changes are reasonably large (this part will be difficult, as always). The PR currently shows the same summary table that is currently used for icounts also for Max RSS, if we find that there have been large changes (TODO: define what does this mean). We can also do something different than showing the full table, like saying "Watch out, it seems that RSS has changed significantly, check this link to find more.".

The first commit is just small cleanup that enables using multiple summary tables in a single PR without duplicating footer with the footnote definition.

lqd · 2022-05-15T12:56:26Z

Nice, thanks for working on this!

Since the secondary stats may have too much variance, we could even maybe hide them in <details> when we notice a reasonably large change, e.g. with a label that is tentative about the change being actually significant and not just the usual random perturbations) ?

(I wonder if the historical significance factor could also be helpful here, to smooth out the number of times these detections happen. We can iterate later in any case, if the initial steps are not super obtrusive and misleading, which is hard to predict.)

nnethercote · 2022-05-16T01:12:12Z

One concern is that I've noticed max-rss can be noisy in a correlated way: often all the benchmarks will improve or regress slightly on one run, but it's not actually meaningful. I don't know why this is.

But I think it's definitely worth trying. It will likely take some experimentation to find the right settings.

rylev

I'm in favor of this change though it's obvious we'll need to rapidly iterate as so much of the way things currently work is tuned to instruction counts and is not adjusted here.

The one thing I'd like to see changed before we merge this is a clearer indication of max-rss being a secondary stat. As noted in my comments, the way things currently are, it will be possible for us to report "no relevant results" but then show max-rss stats. We need to make this clearer to avoid a bunch of confusion.

site/src/github.rs

Kobzol · 2022-05-16T10:24:55Z

Since the secondary stats may have too much variance, we could even maybe hide them in
when we notice a reasonably large change, e.g. with a label that is tentative about the change being actually significant and not just the usual random perturbations) ?

Hiding the secondary table(s) in <details> sounds like a good idea. What do others think?

One concern is that I've noticed max-rss can be noisy in a correlated way: often all the benchmarks will improve or regress slightly on one run, but it's not actually meaningful. I don't know why this is.

I have also noticed this. Maybe we can tune the change detection algorithm for this and say something like "if at least X benchmarks had large/very large change, then we consider it to be interesting".

I have refactored the code slightly. The deserves_attention naming is a bit confusing, since for icounts we use it for next steps on PR summaries and for inclusion in the triage, while for Max RSS we use it for deciding whether we should display the summary table in the PR body.. sigh :D

Kobzol · 2022-05-18T06:55:16Z

I thought about this a bit more, and I really like @lqd's suggestion. Instead of coming up with arbitrary thresholds, let's just always show MaxRSS and cycles, but at the same time:

Hide them in <details>, so that they won't take that much space, but they will be "one click away".
Include their mean in the <details> header, so we quickly see whether it's even worth it to click on it.

This will make the raw text content of the comment a bit larger, but I hope that won't be a problem. It will also do a few more work on the backend, but since perf. report PRs are not that frequent, I think that should be fine.

WDYT?

rylev · 2022-05-18T11:27:49Z

Are we always going to show max-rss and cycles even if the instruction count is not significant? So, we keep the (not quite so arbitrary but still somewhat arbitrary) threshold for instruction count but get ride of it for other stats?

Kobzol · 2022-05-18T11:36:48Z

Good point. I suggest this:

Always add Max RSS and cycles table, but keep it hidden.
Always add icount table. If there are relevant results, show it as before.
If there are no relevant results, hide it by default (same as RSS/cycles) + add the current string "the run did not find any relevant results".

I think that we should treat icounts a bit specially, as they are usually the most stable/least noisy stat and one that we primarily try to optimize.

Showing the icount table (even though it will be hidden by default) if there are no results in it seems a bit wasteful, but I don't consider that to be a big problem and I would rather do that than add another special case to the code. And also we can further convince the reader that the string "no relevant results were found" indeed means that the table is "empty". I think that this mostly only happens for icounts. I don't ever recall seeing a perf. result where cycles/max RSS didn't have at least a single relevant result (it might happen of course, but I think that it's quite rare).

Kobzol · 2022-05-18T11:38:03Z

Just a reminder (I remembered this wrongly): currently there is no real dedicated heuristic for not showing the icount table in PR summaries (deserves_attention is not used for this!). The only case where the table is not shown is when the relevant result count is zero.

Kobzol · 2022-05-18T11:44:14Z

As an example, I hand-crafted it here. If there are no relevant results, there is no point in showing the empty table.

rylev · 2022-05-18T11:51:10Z

I think this is fine as long as we treat max-rss the same. If there are 0 results for max-rss, we should also not show an empty table.

rylev · 2022-05-18T11:55:40Z

I am a bit worried though that we won't be providing any assistance to folks interpreting max-rss or cycles. We've tuned instruction counts to the point where noise doesn't normally show up, but for max-rss and cycles, noise will come through more often. It would be nice if we had someone of helping people interpret these results.

Kobzol · 2022-05-18T11:57:05Z

We could create some Markdown page in the rustc-perf repository with a short "Hitchhiker's guide to interpreting perf results" and link to it from the summary.

Kobzol · 2022-05-18T12:17:25Z

I changed the code to always include Max RSS and cycles. Yes, I do realize that the instruction comparison and summaries are calculated twice, but it was just so painful to modify the code to remove the duplication that I left it like this :D Until we decide if we want to do this.

rylev

A few small things need to be addressed, but otherwise I'm ok with merging almost as is. We can then refactor.

site/src/github.rs

Kobzol · 2022-05-18T12:48:45Z

I refactored the stat summaries to remove duplication and the panic call.

site/src/github.rs

nnethercote · 2022-05-18T21:16:39Z

In the "Max RSS" section is it worth having a note that RSS measurements can be noisy?

Kobzol · 2022-05-18T21:23:14Z

Directly to the PR comment or to the proposed page with a guide for interpreting the results?

nnethercote · 2022-05-18T22:45:42Z

Either sounds ok, as long as it's somewhere that's fairly easy to find.

rylev · 2022-05-19T09:53:27Z

I think maybe using ### instead of ## headers might be nicer, thoughts?

With `##`

With `###`

Kobzol · 2022-05-19T10:49:10Z

That looks better, indeed.

Kobzol added 2 commits May 15, 2022 12:58

Separate footer from summary table

e1087d2

Include more stats in GH summary table

33a3461

Kobzol requested review from lqd, rylev and nnethercote May 15, 2022 11:15

nnethercote approved these changes May 16, 2022

View reviewed changes

rylev requested changes May 16, 2022

View reviewed changes

Turn Stat into an enum

935c4da

rylev mentioned this pull request May 18, 2022

Additional details per metric #1328

Closed

Always add Max RSS and cycles to PR summary

98b4aba

rylev requested changes May 18, 2022

View reviewed changes

site/src/github.rs Show resolved Hide resolved

site/src/github.rs Outdated Show resolved Hide resolved

Refactor stat summaries

a77669f

rylev reviewed May 18, 2022

View reviewed changes

site/src/github.rs Show resolved Hide resolved

rylev approved these changes May 18, 2022

View reviewed changes

rylev enabled auto-merge May 19, 2022 07:10

rylev merged commit 53f6df8 into master May 19, 2022

rylev deleted the summary-extended-stats branch May 19, 2022 07:10

This was referenced May 19, 2022

Create a guide on interpreting performance results from summaries #1329

Open

Add newline before stat title in GH summary #1330

Merged

klensy mentioned this pull request May 25, 2022

unable to select some comparison metrics: Failed to deserialize request #1337

Closed

Kobzol mentioned this pull request May 25, 2022

Generalize metrics #1338

Merged

Add extended stats to GH summary table #1322

Add extended stats to GH summary table #1322

Uh oh!

Conversation

Kobzol commented May 15, 2022

Uh oh!

lqd commented May 15, 2022

Uh oh!

nnethercote commented May 16, 2022

Uh oh!

rylev left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Kobzol commented May 16, 2022

Uh oh!

Kobzol commented May 18, 2022

Uh oh!

rylev commented May 18, 2022

Uh oh!

Kobzol commented May 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kobzol commented May 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kobzol commented May 18, 2022

Uh oh!

rylev commented May 18, 2022

Uh oh!

rylev commented May 18, 2022

Uh oh!

Kobzol commented May 18, 2022

Uh oh!

Kobzol commented May 18, 2022

Uh oh!

rylev left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Kobzol commented May 18, 2022

Uh oh!

Uh oh!

nnethercote commented May 18, 2022

Uh oh!

Kobzol commented May 18, 2022

Uh oh!

nnethercote commented May 18, 2022

Uh oh!

rylev commented May 19, 2022

With ##

With ###

Uh oh!

Kobzol commented May 19, 2022

Uh oh!

Uh oh!

Kobzol commented May 18, 2022 •

edited

Loading

Kobzol commented May 18, 2022 •

edited

Loading

With `##`

With `###`