Skip to content

Benchmark categorization #1181

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 8, 2022

Conversation

Kobzol
Copy link
Contributor

@Kobzol Kobzol commented Feb 24, 2022

This PR adds a category attribute to each benchmark, which should serve to quickly filter benchmarks that fall into different categories (mostly real-world vs stress-test benchmarks). The category is entered in perf-config.json (default is secondary), then it's put into the DB and threaded from the DB to the compare page. The compare page now shows two tables (for primary and secondary benchmarks) and it also has a filter so that you can show e.g. only the primary ones (the filter is mainly useful for the summary, so that you can see e.g. only the percentual changes for real-world crates).

benchmarks

Couple of comments:

  1. I modified the DB to store the benchmark category, instead of hardcoding it in the dashboard or the site crate. I thus had to work around the supports-stable column in the DB, which looks quite complicated (its handling is complicated). And the attribute is not even used or read anywhere, the crates that support stable are hardcoded ( ). I suggest to either remove this attribute from DB or start using it (that's for another PR though).
  2. Doing large changes to the compare page is quite cumbersome. It could get some refactoring, and switching to Vue 3 would also be nice (for example in this PR I had to put the whole benchmark table into the component, instead of just the rows, because Vue 2 doesn't support components that render as multiple DIV elements). This change was IMO on the verge of what can still be changed until the page's code gets messed up.
  3. The categorization of crates into primary/secondary was done according to my taste, it should be discussed.
  4. Primary/secondary naming seems a bit opaque, maybe real-world/artificial or real-world/stress-test would be better.
  5. I'm not sure how to pass the category state from SQLite to Postgre in the import binary, as that code doesn't even touch the benchmarks table. It will probably require some larger changes (or some hack :) ).

@nnethercote
Copy link
Contributor

Primary/secondary naming seems a bit opaque, maybe real-world/artificial or real-world/stress-test would be better.

When thinking about this change I originally had a real-world/artifical distinction, but after some thought I now have a strong preference for primary/secondary because it's more flexible. E.g. there might be a real-world benchmark (i.e. from crates.io) that stresses the compiler in an interesting but obscure way, and we could put it in the secondary category. Likewise, helloworld isn't exactly real-world, but I do want to have it in the primary category because it's a good measure of the shortest possible compile time, which is interesting. It also avoids arguments about the exact meaning of "real-world" :)

Having said that, most of the crates in the primary category will be "real-world", and most of the crates in the secondary category will be artificial in some way, and we can certainly mention that in the glossary. This PR should also update the benchmark description list should be updated. Currently it has three categories ("Real programs that are important", "Real programs that stress the compiler", "Artificial stress tests"), it should just be split into primary/secondary.

@nnethercote
Copy link
Contributor

Here's the draft primary/secondary split that I came up. Jakub's primary choices are marked with *, there's a lot of overlap but a few differences: inflate, keccak, tokio-webpush-simple, wg-grammar.

Primary

*cargo
*clap-rs
*cranelift-codegen
*diesel
*encoding
*futures
*helloworld
*html5ever
*hyper-2
inflate
keccak
*piston-image
*regex
*ripgrep
*serde
*stm32f4
*style-servo
*syn
tokio-webpush-simple
*ucd
*unicode_normalization
*webrender
*webrender-wrench

Secondary

await-call-tree
coercions
ctfe-stress-4
deeply-nested
deeply-nested-async
deeply-nested-closures
deep-vector
derive
externs
issue-46449
issue-58319
issue-88862
many-assoc-items
match-stress-enum
match-stress-exhaustive_patterns
projection-caching
regression-31157
token-stream-stress
tuple-stress
unify-linearly
unused-warnings
wf-projection-stress-65510
*wg-grammar

@nnethercote
Copy link
Contributor

Finally, I'll say that the screen capture above looks great, very much what I was imagining and hoping for, so thank you for doing this! I have commented on some parts of the code, but I am no expert on the database stuff or the webdev stuff, so I will leave that so @Mark-Simulacrum and @rylev.

@Kobzol Kobzol force-pushed the benchmark-categorization branch 2 times, most recently from 454f2b0 to 9ef06fd Compare February 25, 2022 08:26
@Kobzol
Copy link
Contributor Author

Kobzol commented Feb 25, 2022

I rebased over master and addressed the comments. I agree with your revised categorization, the table with the asterisks was incredibly helpful, thanks!

@nnethercote
Copy link
Contributor

One presentation nitpick: the "Primary benchmarks" and "Secondary benchmarks" could be a big bigger, bolder, and with a bit more space around them. Maybe even centred above the table.

@Kobzol
Copy link
Contributor Author

Kobzol commented Feb 25, 2022

image

@rylev rylev marked this pull request as ready for review February 25, 2022 09:52
@rylev
Copy link
Member

rylev commented Feb 25, 2022

@Kobzol is this PR ready for review - I marked as such without thinking because it seems like it had already been reviewed by @nnethercote, but I should have asked first.

Overall, things look good (depending on your answer above, I will go through the PR and comment on some nits I have). I'm unsure though if we should be keeping this info in the database. Changing the database is quite difficult and risky, and I'm not sure what we get from it. Thoughts?

@Kobzol
Copy link
Contributor Author

Kobzol commented Feb 25, 2022

It is, I just didn't want to unmark it as draft because there was still one TODO (import from SQLite into Postgre), so it wouldn't be good if it was merged with this TODO.

Regarding the database, I discussed this with @Mark-Simulacrum and our agreement was that the database solution is more general, because then the site doesn't need access to the benchmarks directory and can fully work with the DB. That being said, currently the supports-stable attribute is stored into the DB, but then it's still being hardcoded in the source code.

Hardcoding the values seems a bit inflexible. A solution that would be rather flexible but didn't require info in DB would be if the site crate would load the benchmarks (and information about stable support and category) from the benchmarks directory upon startup. But of course then it would depend on the directory being up-to-date and accessible.

@nnethercote
Copy link
Contributor

I did a partial review but as mentioned above I'm no expert on the database and website side of things, so others should review those changes.

@Mark-Simulacrum
Copy link
Member

I think this largely makes sense; I am a little worried that the secondary benchmarks might be hidden below the fold if there's a lot of primary benchmarks. Maybe that's ok, but I wonder if we should try to do something to explicitly show them -- particularly when there's regressions there.

One option might be to put the summary (+ neutral -) stats, with avg, on each of the 'primary' or 'secondary' headers, and then fold them if they get too long? Not sure how I feel about that though.

@Kobzol
Copy link
Contributor Author

Kobzol commented Mar 1, 2022

Regarding the implementation - do we have an agreement whether we want to store the category in DB? Or should it be loaded from the benchmark directory (is the directory always available for the site crate?).

@Mark-Simulacrum
Copy link
Member

Yes, I think we should store it in the DB -- it's much easier to migrate away from that if we see a need than vice-versa.

@Kobzol
Copy link
Contributor Author

Kobzol commented Mar 3, 2022

Moved Category to DB and did something with the SQLite import (based on git blame it should be used to import local data to the perf server). It seems like changing the database is quite tricky and brittle, some ORM sure could make this easier.

@nnethercote
Copy link
Contributor

After some thought, I think keccak should be a secondary benchmark. @Kobzol, can you update accordingly? Thanks.

@Kobzol Kobzol force-pushed the benchmark-categorization branch from 6a5900f to 6bebfaf Compare March 7, 2022 12:51
@Kobzol Kobzol force-pushed the benchmark-categorization branch 2 times, most recently from 095f2e0 to 8fa387a Compare March 7, 2022 13:25
@Kobzol
Copy link
Contributor Author

Kobzol commented Mar 7, 2022

Switched keccak to secondary category, rebased on master and squashed and added support for categories into the download command.

@Kobzol Kobzol force-pushed the benchmark-categorization branch 3 times, most recently from 0c7f72a to b13e850 Compare March 7, 2022 14:43
@Kobzol
Copy link
Contributor Author

Kobzol commented Mar 7, 2022

Finally CI is green! This is ready for another round of review.

@Mark-Simulacrum
Copy link
Member

Very small nit, and then let's move ahead with this!

@Kobzol Kobzol force-pushed the benchmark-categorization branch from b13e850 to 02c29b5 Compare March 7, 2022 22:15
@Kobzol
Copy link
Contributor Author

Kobzol commented Mar 7, 2022

Added BufWriter and also this, because after moving Category to DB and getting it in a strongly-typed form from the DB, we need to convert it to a string, so that JavaScript doesn't have to deal with rmp_serde serialization of enums.

@Mark-Simulacrum
Copy link
Member

Hm, ok. I would have expected serde to just serialize the enum to a string value which should be pretty easy for JS to deal with, but maybe I'm missing something?

@Kobzol
Copy link
Contributor Author

Kobzol commented Mar 8, 2022

If it was serde JSON, then yes. However, msgpack is used here, which currently serializes the enum as "category": {"1": null}, which is not really usable in Javascript.

@Mark-Simulacrum Mark-Simulacrum merged commit 7011a96 into rust-lang:master Mar 8, 2022
@Kobzol Kobzol deleted the benchmark-categorization branch March 8, 2022 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants