Benchmark categorization #1181

Kobzol · 2022-02-24T10:08:12Z

This PR adds a category attribute to each benchmark, which should serve to quickly filter benchmarks that fall into different categories (mostly real-world vs stress-test benchmarks). The category is entered in perf-config.json (default is secondary), then it's put into the DB and threaded from the DB to the compare page. The compare page now shows two tables (for primary and secondary benchmarks) and it also has a filter so that you can show e.g. only the primary ones (the filter is mainly useful for the summary, so that you can see e.g. only the percentual changes for real-world crates).

Couple of comments:

I modified the DB to store the benchmark category, instead of hardcoding it in the dashboard or the site crate. I thus had to work around the supports-stable column in the DB, which looks quite complicated (its handling is complicated). And the attribute is not even used or read anywhere, the crates that support stable are hardcoded (

rustc-perf/site/src/request_handlers/dashboard.rs

Line 88 in 9a9d1a8

"futures",

). I suggest to either remove this attribute from DB or start using it (that's for another PR though).
Doing large changes to the compare page is quite cumbersome. It could get some refactoring, and switching to Vue 3 would also be nice (for example in this PR I had to put the whole benchmark table into the component, instead of just the rows, because Vue 2 doesn't support components that render as multiple DIV elements). This change was IMO on the verge of what can still be changed until the page's code gets messed up.
The categorization of crates into primary/secondary was done according to my taste, it should be discussed.
Primary/secondary naming seems a bit opaque, maybe real-world/artificial or real-world/stress-test would be better.
I'm not sure how to pass the category state from SQLite to Postgre in the import binary, as that code doesn't even touch the benchmarks table. It will probably require some larger changes (or some hack :) ).

collector/src/execute.rs

collector/benchmarks/native-tls-0.1.5/perf-config.json

collector/benchmarks/rust-mozjs/perf-config.json

nnethercote · 2022-02-24T22:49:17Z

Primary/secondary naming seems a bit opaque, maybe real-world/artificial or real-world/stress-test would be better.

When thinking about this change I originally had a real-world/artifical distinction, but after some thought I now have a strong preference for primary/secondary because it's more flexible. E.g. there might be a real-world benchmark (i.e. from crates.io) that stresses the compiler in an interesting but obscure way, and we could put it in the secondary category. Likewise, helloworld isn't exactly real-world, but I do want to have it in the primary category because it's a good measure of the shortest possible compile time, which is interesting. It also avoids arguments about the exact meaning of "real-world" :)

Having said that, most of the crates in the primary category will be "real-world", and most of the crates in the secondary category will be artificial in some way, and we can certainly mention that in the glossary. This PR should also update the benchmark description list should be updated. Currently it has three categories ("Real programs that are important", "Real programs that stress the compiler", "Artificial stress tests"), it should just be split into primary/secondary.

nnethercote · 2022-02-24T22:51:06Z

Here's the draft primary/secondary split that I came up. Jakub's primary choices are marked with *, there's a lot of overlap but a few differences: inflate, keccak, tokio-webpush-simple, wg-grammar.

Primary

*cargo
*clap-rs
*cranelift-codegen
*diesel
*encoding
*futures
*helloworld
*html5ever
*hyper-2
inflate
keccak
*piston-image
*regex
*ripgrep
*serde
*stm32f4
*style-servo
*syn
tokio-webpush-simple
*ucd
*unicode_normalization
*webrender
*webrender-wrench

Secondary

await-call-tree
coercions
ctfe-stress-4
deeply-nested
deeply-nested-async
deeply-nested-closures
deep-vector
derive
externs
issue-46449
issue-58319
issue-88862
many-assoc-items
match-stress-enum
match-stress-exhaustive_patterns
projection-caching
regression-31157
token-stream-stress
tuple-stress
unify-linearly
unused-warnings
wf-projection-stress-65510
*wg-grammar

nnethercote · 2022-02-24T22:52:18Z

Finally, I'll say that the screen capture above looks great, very much what I was imagining and hoping for, so thank you for doing this! I have commented on some parts of the code, but I am no expert on the database stuff or the webdev stuff, so I will leave that so @Mark-Simulacrum and @rylev.

Kobzol · 2022-02-25T08:26:52Z

I rebased over master and addressed the comments. I agree with your revised categorization, the table with the asterisks was incredibly helpful, thanks!

collector/benchmarks/README.md

collector/src/main.rs

nnethercote · 2022-02-25T08:49:35Z

One presentation nitpick: the "Primary benchmarks" and "Secondary benchmarks" could be a big bigger, bolder, and with a bit more space around them. Maybe even centred above the table.

Kobzol · 2022-02-25T09:42:59Z

rylev · 2022-02-25T09:54:37Z

@Kobzol is this PR ready for review - I marked as such without thinking because it seems like it had already been reviewed by @nnethercote, but I should have asked first.

Overall, things look good (depending on your answer above, I will go through the PR and comment on some nits I have). I'm unsure though if we should be keeping this info in the database. Changing the database is quite difficult and risky, and I'm not sure what we get from it. Thoughts?

Kobzol · 2022-02-25T10:00:38Z

It is, I just didn't want to unmark it as draft because there was still one TODO (import from SQLite into Postgre), so it wouldn't be good if it was merged with this TODO.

Regarding the database, I discussed this with @Mark-Simulacrum and our agreement was that the database solution is more general, because then the site doesn't need access to the benchmarks directory and can fully work with the DB. That being said, currently the supports-stable attribute is stored into the DB, but then it's still being hardcoded in the source code.

Hardcoding the values seems a bit inflexible. A solution that would be rather flexible but didn't require info in DB would be if the site crate would load the benchmarks (and information about stable support and category) from the benchmarks directory upon startup. But of course then it would depend on the directory being up-to-date and accessible.

nnethercote · 2022-02-25T11:56:25Z

I did a partial review but as mentioned above I'm no expert on the database and website side of things, so others should review those changes.

collector/src/execute.rs

collector/src/main.rs

database/src/bin/import-sqlite.rs

Mark-Simulacrum · 2022-02-28T23:16:36Z

I think this largely makes sense; I am a little worried that the secondary benchmarks might be hidden below the fold if there's a lot of primary benchmarks. Maybe that's ok, but I wonder if we should try to do something to explicitly show them -- particularly when there's regressions there.

One option might be to put the summary (+ neutral -) stats, with avg, on each of the 'primary' or 'secondary' headers, and then fold them if they get too long? Not sure how I feel about that though.

Kobzol · 2022-03-01T20:41:31Z

Regarding the implementation - do we have an agreement whether we want to store the category in DB? Or should it be loaded from the benchmark directory (is the directory always available for the site crate?).

Mark-Simulacrum · 2022-03-01T20:42:27Z

Yes, I think we should store it in the DB -- it's much easier to migrate away from that if we see a need than vice-versa.

Kobzol · 2022-03-03T21:37:59Z

Moved Category to DB and did something with the SQLite import (based on git blame it should be used to import local data to the perf server). It seems like changing the database is quite tricky and brittle, some ORM sure could make this easier.

nnethercote · 2022-03-07T11:56:22Z

After some thought, I think keccak should be a secondary benchmark. @Kobzol, can you update accordingly? Thanks.

Kobzol · 2022-03-07T13:26:03Z

Switched keccak to secondary category, rebased on master and squashed and added support for categories into the download command.

Kobzol · 2022-03-07T15:12:20Z

Finally CI is green! This is ready for another round of review.

collector/src/main.rs

Mark-Simulacrum · 2022-03-07T21:53:39Z

Very small nit, and then let's move ahead with this!

Kobzol · 2022-03-07T22:17:19Z

Added BufWriter and also this, because after moving Category to DB and getting it in a strongly-typed form from the DB, we need to convert it to a string, so that JavaScript doesn't have to deal with rmp_serde serialization of enums.

Mark-Simulacrum · 2022-03-07T22:19:09Z

Hm, ok. I would have expected serde to just serialize the enum to a string value which should be pretty easy for JS to deal with, but maybe I'm missing something?

Kobzol · 2022-03-08T22:27:19Z

If it was serde JSON, then yes. However, msgpack is used here, which currently serializes the enum as "category": {"1": null}, which is not really usable in Javascript.

nnethercote reviewed Feb 24, 2022

View reviewed changes

collector/src/execute.rs Outdated Show resolved Hide resolved

nnethercote reviewed Feb 24, 2022

View reviewed changes

collector/src/execute.rs Outdated Show resolved Hide resolved

nnethercote reviewed Feb 24, 2022

View reviewed changes

collector/benchmarks/native-tls-0.1.5/perf-config.json Outdated Show resolved Hide resolved

nnethercote reviewed Feb 24, 2022

View reviewed changes

collector/benchmarks/rust-mozjs/perf-config.json Outdated Show resolved Hide resolved

Kobzol force-pushed the benchmark-categorization branch 2 times, most recently from 454f2b0 to 9ef06fd Compare February 25, 2022 08:26

nnethercote reviewed Feb 25, 2022

View reviewed changes

collector/benchmarks/README.md Outdated Show resolved Hide resolved

nnethercote reviewed Feb 25, 2022

View reviewed changes

collector/src/main.rs Outdated Show resolved Hide resolved

rylev marked this pull request as ready for review February 25, 2022 09:52

rylev mentioned this pull request Feb 25, 2022

Improve perf.rlo frontend experience #1184

Closed

Mark-Simulacrum reviewed Feb 28, 2022

View reviewed changes

collector/src/execute.rs Outdated Show resolved Hide resolved

Mark-Simulacrum reviewed Feb 28, 2022

View reviewed changes

collector/src/main.rs Outdated Show resolved Hide resolved

Mark-Simulacrum reviewed Feb 28, 2022

View reviewed changes

database/src/bin/import-sqlite.rs Outdated Show resolved Hide resolved

nnethercote mentioned this pull request Mar 2, 2022

Add command for downloading a new benchmark #1183

Merged

Kobzol force-pushed the benchmark-categorization branch from 6533031 to 6a5900f Compare March 3, 2022 21:36

Kobzol force-pushed the benchmark-categorization branch from 6a5900f to 6bebfaf Compare March 7, 2022 12:51

Kobzol force-pushed the benchmark-categorization branch 2 times, most recently from 095f2e0 to 8fa387a Compare March 7, 2022 13:25

Kobzol force-pushed the benchmark-categorization branch 3 times, most recently from 0c7f72a to b13e850 Compare March 7, 2022 14:43

Mark-Simulacrum reviewed Mar 7, 2022

View reviewed changes

collector/src/main.rs Show resolved Hide resolved

Split benchmarks into primary and secondary categories

02c29b5

Kobzol force-pushed the benchmark-categorization branch from b13e850 to 02c29b5 Compare March 7, 2022 22:15

Mark-Simulacrum approved these changes Mar 8, 2022

View reviewed changes

Mark-Simulacrum merged commit 7011a96 into rust-lang:master Mar 8, 2022

Kobzol deleted the benchmark-categorization branch March 8, 2022 22:32

Benchmark categorization #1181

Benchmark categorization #1181

Uh oh!

Conversation

Kobzol commented Feb 24, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nnethercote commented Feb 24, 2022

Uh oh!

nnethercote commented Feb 24, 2022

Primary

Secondary

Uh oh!

nnethercote commented Feb 24, 2022

Uh oh!

Kobzol commented Feb 25, 2022

Uh oh!

Uh oh!

Uh oh!

nnethercote commented Feb 25, 2022

Uh oh!

Kobzol commented Feb 25, 2022

Uh oh!

rylev commented Feb 25, 2022

Uh oh!

Kobzol commented Feb 25, 2022

Uh oh!

nnethercote commented Feb 25, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Mark-Simulacrum commented Feb 28, 2022

Uh oh!

Kobzol commented Mar 1, 2022

Uh oh!

Mark-Simulacrum commented Mar 1, 2022

Uh oh!

Kobzol commented Mar 3, 2022

Uh oh!

nnethercote commented Mar 7, 2022

Uh oh!

Kobzol commented Mar 7, 2022

Uh oh!

Kobzol commented Mar 7, 2022

Uh oh!

Uh oh!

Mark-Simulacrum commented Mar 7, 2022

Uh oh!

Kobzol commented Mar 7, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Mark-Simulacrum commented Mar 7, 2022

Uh oh!

Kobzol commented Mar 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Kobzol commented Mar 7, 2022 •

edited

Loading

Kobzol commented Mar 8, 2022 •

edited

Loading