Skip to content

Support collecting hardware performance counters on Windows #885

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jul 8, 2021

Conversation

wesleywiser
Copy link
Member

@wesleywiser wesleywiser commented Jun 18, 2021

Collect the InstructionRetired and TotalCycles hardware performance counters on Windows using a combination of xperf and tracelog. After collection, we process the resulting trace file to calculate the totals for the performance counters and store those in the appropriate parts of the database for the run. The site is able to display the results without modification and we also have self-profiling data available to site users.

Note: the cpu-clock perf event is not an actual hardware event and no corresponding event is available out of the box on Windows. I believe we can calculate an equivalent value from the trace data but that is left as a future TODO item.

Part of #834

@Mark-Simulacrum
Copy link
Member

Haven't reviewed too closely but at a high level this seems reasonable to me, will review closely later.

@wesleywiser wesleywiser mentioned this pull request Jun 18, 2021
6 tasks
@wesleywiser
Copy link
Member Author

Thanks @Mark-Simulacrum!

@wesleywiser wesleywiser force-pushed the hardware_counters_windows branch from 6d3719f to 4c4a5ee Compare June 18, 2021 21:09
Copy link
Member

@rylev rylev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♥ Great!


anyhow::ensure!(line.starts_with("OS Version"), "OS version line not found");

let components: Vec<_> = line.split(',').collect();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: is allocating a vec here necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not but I didn't want to resplit the string twice. I'm not sure which is more expensive...

Self {
instructions_retired: 0,
total_cycles: 0,
// FIXME(wesleywiser): We should be properly calculating this value by taking the total time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create issues in the repo for these FIXMEs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable to me! I'll create issues when this merges and link to the FIXME lines.

@@ -182,6 +185,7 @@ impl Profiler {
// is rejected because it can't be used with the `profiler`
// subcommand. (It's used with `bench_local` instead.)
"perf-stat" => Err(anyhow!("'perf-stat' cannot be used as the profiler")),
"xperf-stat" => Err(anyhow!("'xperf-stat' cannot be used as the profiler")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we explain why these profilers can't be used in the error message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure why this isn't allowed. I think technically it should work but the results will just be the statistics captured during the run which is probably not very useful for profiling (vs benchmarking).

@wesleywiser
Copy link
Member Author

Thanks for the review @rylev!

let mut cmd = Command::new(tracelog);
assert!(cmd.output().is_ok(), "tracelog.exe could not be started");

cmd.args(&["-start", "counters", "-f", "counters.etl", "-eflag", "CSWITCH+PROC_THREAD+LOADER", "-PMC", "InstructionRetired,TotalCycles:CSWITCH"]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding documentation somewhere on how this all works? I.e. what the workflow is, which programs are called in which order, what the flags mean? A link to a good description would be fine too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From playing around with this it looks like the "counters" parameter a globally visible "LoggerName" used for coordinating between the tracelog and the xperf invocations, right? I think it might be a good idea to generate a more meaningful name here that includes the fact that it belongs to "rustc-perf" and maybe the name of benchmark being run. Something like rustc-perf-{benchmark}-logger.

I don't know if it would also make sense to add some randomly generated ID to it too. At least I had some problems where I had to run xperf -stop counters manually because otherwise tracelog would complain that counters already exists. I think my system got into that state because there was some crash after tracelog allocated the logger with name "counters" and the xperf command stopping the logger was never executed. Making that more robust one way or other seems like a good idea.

Copy link
Member Author

@wesleywiser wesleywiser Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments here explaining what the workflow for collecting the counters is.

I also added code to issue a xperf -stop before we do our current run. I think that's better than generating a different name each time as I've read there is a limit to how many concurrent collections you can have running and, if we know the collection name, we can just try stopping it. I also updated the name to "rustc-perf-counters"

@Mark-Simulacrum
Copy link
Member

Happy to merge this whenever, looks like there's a few unresolved comments left though -- just let me know.

@wesleywiser wesleywiser force-pushed the hardware_counters_windows branch from 38bec7b to 5384e7d Compare July 7, 2021 19:52
@wesleywiser
Copy link
Member Author

This is ready to merge! I'll leave it open for now in case @Mark-Simulacrum wants to take another look but if not, I'll merge later today.

@wesleywiser wesleywiser merged commit c648f39 into rust-lang:master Jul 8, 2021
@wesleywiser wesleywiser deleted the hardware_counters_windows branch July 8, 2021 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants