-
Notifications
You must be signed in to change notification settings - Fork 157
Add extended stats to GH summary table #1322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Nice, thanks for working on this! Since the secondary stats may have too much variance, we could even maybe hide them in (I wonder if the historical significance factor could also be helpful here, to smooth out the number of times these detections happen. We can iterate later in any case, if the initial steps are not super obtrusive and misleading, which is hard to predict.) |
One concern is that I've noticed max-rss can be noisy in a correlated way: often all the benchmarks will improve or regress slightly on one run, but it's not actually meaningful. I don't know why this is. But I think it's definitely worth trying. It will likely take some experimentation to find the right settings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of this change though it's obvious we'll need to rapidly iterate as so much of the way things currently work is tuned to instruction counts and is not adjusted here.
The one thing I'd like to see changed before we merge this is a clearer indication of max-rss being a secondary stat. As noted in my comments, the way things currently are, it will be possible for us to report "no relevant results" but then show max-rss stats. We need to make this clearer to avoid a bunch of confusion.
Hiding the secondary table(s) in
I have also noticed this. Maybe we can tune the change detection algorithm for this and say something like "if at least X benchmarks had large/very large change, then we consider it to be interesting". I have refactored the code slightly. The |
I thought about this a bit more, and I really like @lqd's suggestion. Instead of coming up with arbitrary thresholds, let's just always show MaxRSS and cycles, but at the same time:
This will make the raw text content of the comment a bit larger, but I hope that won't be a problem. It will also do a few more work on the backend, but since perf. report PRs are not that frequent, I think that should be fine. WDYT? |
Are we always going to show max-rss and cycles even if the instruction count is not significant? So, we keep the (not quite so arbitrary but still somewhat arbitrary) threshold for instruction count but get ride of it for other stats? |
Good point. I suggest this:
I think that we should treat icounts a bit specially, as they are usually the most stable/least noisy stat and one that we primarily try to optimize.
|
Just a reminder (I remembered this wrongly): currently there is no real dedicated heuristic for not showing the icount table in PR summaries ( |
As an example, I hand-crafted it here. If there are no relevant results, there is no point in showing the empty table. |
I think this is fine as long as we treat max-rss the same. If there are 0 results for max-rss, we should also not show an empty table. |
I am a bit worried though that we won't be providing any assistance to folks interpreting max-rss or cycles. We've tuned instruction counts to the point where noise doesn't normally show up, but for max-rss and cycles, noise will come through more often. It would be nice if we had someone of helping people interpret these results. |
We could create some Markdown page in the |
I changed the code to always include Max RSS and cycles. Yes, I do realize that the instruction comparison and summaries are calculated twice, but it was just so painful to modify the code to remove the duplication that I left it like this :D Until we decide if we want to do this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small things need to be addressed, but otherwise I'm ok with merging almost as is. We can then refactor.
I refactored the stat summaries to remove duplication and the panic call. |
In the "Max RSS" section is it worth having a note that RSS measurements can be noisy? |
Directly to the PR comment or to the proposed page with a guide for interpreting the results? |
Either sounds ok, as long as it's somewhere that's fairly easy to find. |
That looks better, indeed. |
This is an attempt to improve the (currently non-existing) display of secondary performance stats (like max RSS or cycles) in GH summary reports. This corresponds to the "better UI/UX" item from the performance roadmap.
Sometimes changes in icounts are virtually just noise, but some other metric changes in significant ways (typically max RSS, but sometimes also cycles). This can be missed by onlookers, since to show these secondary steps, you have to manually switch to them in the compare page, which is sometimes neglected.
I think that it would be nice if we could show the information in the summary, if we find that the changes are reasonably large (this part will be difficult, as always). The PR currently shows the same summary table that is currently used for icounts also for Max RSS, if we find that there have been large changes (TODO: define what does this mean). We can also do something different than showing the full table, like saying "Watch out, it seems that RSS has changed significantly, check this link to find more.".
The first commit is just small cleanup that enables using multiple summary tables in a single PR without duplicating footer with the footnote definition.