-
Notifications
You must be signed in to change notification settings - Fork 156
Support collecting hardware performance counters on Windows #885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support collecting hardware performance counters on Windows #885
Conversation
Haven't reviewed too closely but at a high level this seems reasonable to me, will review closely later. |
Thanks @Mark-Simulacrum! |
6d3719f
to
4c4a5ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
♥ Great!
|
||
anyhow::ensure!(line.starts_with("OS Version"), "OS version line not found"); | ||
|
||
let components: Vec<_> = line.split(',').collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: is allocating a vec here necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not but I didn't want to resplit the string twice. I'm not sure which is more expensive...
Self { | ||
instructions_retired: 0, | ||
total_cycles: 0, | ||
// FIXME(wesleywiser): We should be properly calculating this value by taking the total time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we create issues in the repo for these FIXMEs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems reasonable to me! I'll create issues when this merges and link to the FIXME lines.
@@ -182,6 +185,7 @@ impl Profiler { | |||
// is rejected because it can't be used with the `profiler` | |||
// subcommand. (It's used with `bench_local` instead.) | |||
"perf-stat" => Err(anyhow!("'perf-stat' cannot be used as the profiler")), | |||
"xperf-stat" => Err(anyhow!("'xperf-stat' cannot be used as the profiler")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we explain why these profilers can't be used in the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure why this isn't allowed. I think technically it should work but the results will just be the statistics captured during the run which is probably not very useful for profiling (vs benchmarking).
Thanks for the review @rylev! |
collector/src/rustc-fake.rs
Outdated
let mut cmd = Command::new(tracelog); | ||
assert!(cmd.output().is_ok(), "tracelog.exe could not be started"); | ||
|
||
cmd.args(&["-start", "counters", "-f", "counters.etl", "-eflag", "CSWITCH+PROC_THREAD+LOADER", "-PMC", "InstructionRetired,TotalCycles:CSWITCH"]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind adding documentation somewhere on how this all works? I.e. what the workflow is, which programs are called in which order, what the flags mean? A link to a good description would be fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From playing around with this it looks like the "counters" parameter a globally visible "LoggerName" used for coordinating between the tracelog
and the xperf
invocations, right? I think it might be a good idea to generate a more meaningful name here that includes the fact that it belongs to "rustc-perf" and maybe the name of benchmark being run. Something like rustc-perf-{benchmark}-logger
.
I don't know if it would also make sense to add some randomly generated ID to it too. At least I had some problems where I had to run xperf -stop counters
manually because otherwise tracelog
would complain that counters
already exists. I think my system got into that state because there was some crash after tracelog
allocated the logger with name "counters" and the xperf
command stopping the logger was never executed. Making that more robust one way or other seems like a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments here explaining what the workflow for collecting the counters is.
I also added code to issue a xperf -stop
before we do our current run. I think that's better than generating a different name each time as I've read there is a limit to how many concurrent collections you can have running and, if we know the collection name, we can just try stopping it. I also updated the name to "rustc-perf-counters"
Happy to merge this whenever, looks like there's a few unresolved comments left though -- just let me know. |
Also fake the cpu-clock counter because it's necessary to get the self-profiling data to display in the site.
Also update the collection name so it's obvious that it is related to the rustc-perf tool.
38bec7b
to
5384e7d
Compare
This is ready to merge! I'll leave it open for now in case @Mark-Simulacrum wants to take another look but if not, I'll merge later today. |
Collect the
InstructionRetired
andTotalCycles
hardware performance counters on Windows using a combination ofxperf
andtracelog
. After collection, we process the resulting trace file to calculate the totals for the performance counters and store those in the appropriate parts of the database for the run. The site is able to display the results without modification and we also have self-profiling data available to site users.Note: the
cpu-clock
perf
event is not an actual hardware event and no corresponding event is available out of the box on Windows. I believe we can calculate an equivalent value from the trace data but that is left as a future TODO item.Part of #834