-
Notifications
You must be signed in to change notification settings - Fork 157
Benchmark categorization #1181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Benchmark categorization #1181
Conversation
When thinking about this change I originally had a real-world/artifical distinction, but after some thought I now have a strong preference for primary/secondary because it's more flexible. E.g. there might be a real-world benchmark (i.e. from crates.io) that stresses the compiler in an interesting but obscure way, and we could put it in the secondary category. Likewise, Having said that, most of the crates in the primary category will be "real-world", and most of the crates in the secondary category will be artificial in some way, and we can certainly mention that in the glossary. This PR should also update the benchmark description list should be updated. Currently it has three categories ("Real programs that are important", "Real programs that stress the compiler", "Artificial stress tests"), it should just be split into primary/secondary. |
Here's the draft primary/secondary split that I came up. Jakub's primary choices are marked with Primary*cargo Secondaryawait-call-tree |
Finally, I'll say that the screen capture above looks great, very much what I was imagining and hoping for, so thank you for doing this! I have commented on some parts of the code, but I am no expert on the database stuff or the webdev stuff, so I will leave that so @Mark-Simulacrum and @rylev. |
454f2b0
to
9ef06fd
Compare
I rebased over |
One presentation nitpick: the "Primary benchmarks" and "Secondary benchmarks" could be a big bigger, bolder, and with a bit more space around them. Maybe even centred above the table. |
@Kobzol is this PR ready for review - I marked as such without thinking because it seems like it had already been reviewed by @nnethercote, but I should have asked first. Overall, things look good (depending on your answer above, I will go through the PR and comment on some nits I have). I'm unsure though if we should be keeping this info in the database. Changing the database is quite difficult and risky, and I'm not sure what we get from it. Thoughts? |
It is, I just didn't want to unmark it as draft because there was still one TODO (import from SQLite into Postgre), so it wouldn't be good if it was merged with this TODO. Regarding the database, I discussed this with @Mark-Simulacrum and our agreement was that the database solution is more general, because then the Hardcoding the values seems a bit inflexible. A solution that would be rather flexible but didn't require info in DB would be if the |
I did a partial review but as mentioned above I'm no expert on the database and website side of things, so others should review those changes. |
I think this largely makes sense; I am a little worried that the secondary benchmarks might be hidden below the fold if there's a lot of primary benchmarks. Maybe that's ok, but I wonder if we should try to do something to explicitly show them -- particularly when there's regressions there. One option might be to put the summary (+ neutral -) stats, with avg, on each of the 'primary' or 'secondary' headers, and then fold them if they get too long? Not sure how I feel about that though. |
Regarding the implementation - do we have an agreement whether we want to store the category in DB? Or should it be loaded from the benchmark directory (is the directory always available for the |
Yes, I think we should store it in the DB -- it's much easier to migrate away from that if we see a need than vice-versa. |
6533031
to
6a5900f
Compare
Moved Category to DB and did something with the SQLite import (based on |
After some thought, I think |
6a5900f
to
6bebfaf
Compare
095f2e0
to
8fa387a
Compare
Switched |
0c7f72a
to
b13e850
Compare
Finally CI is green! This is ready for another round of review. |
Very small nit, and then let's move ahead with this! |
b13e850
to
02c29b5
Compare
Added |
Hm, ok. I would have expected serde to just serialize the enum to a string value which should be pretty easy for JS to deal with, but maybe I'm missing something? |
If it was serde JSON, then yes. However, msgpack is used here, which currently serializes the enum as |
This PR adds a
category
attribute to each benchmark, which should serve to quickly filter benchmarks that fall into different categories (mostly real-world vs stress-test benchmarks). The category is entered inperf-config.json
(default issecondary
), then it's put into the DB and threaded from the DB to the compare page. The compare page now shows two tables (for primary and secondary benchmarks) and it also has a filter so that you can show e.g. only the primary ones (the filter is mainly useful for the summary, so that you can see e.g. only the percentual changes for real-world crates).Couple of comments:
site
crate. I thus had to work around thesupports-stable
column in the DB, which looks quite complicated (its handling is complicated). And the attribute is not even used or read anywhere, the crates that support stable are hardcoded (rustc-perf/site/src/request_handlers/dashboard.rs
Line 88 in 9a9d1a8
benchmarks
table. It will probably require some larger changes (or some hack :) ).