Change runtime benchmark definition to always refer to a self-contained function #1470

Kobzol · 2022-10-17T11:42:18Z

Ok, hear me out 😁 I realize that this change might be a bit controversial, but I think that it has some benefits.

Basically, this PR changes the define_benchmark macro in the runtime benchmark suite in a way that forces us to pass a named function instead of a closure (we now have to pass an identifier, example below). This makes the benchmark definition a bit more verbose for the simplest cases, but the fact that we will have a proper, named function in the resulting benchmark binary for each piece of code that we want to benchmark also has advantages:

We can provide functionality that diffs the codegen (assembly, LLVM IR, MIR) for a specific benchmark between two compiler versions. I think that this functionality could be quite useful for investigating performance regressions. This is in theory doable with closures (maybe we could emit some special markers at the beginning/start of the closure that are propagated to assembly?), but it gets so much easier when it's just a normal function. We can use e.g. cargo-show-asm for this (I have already tried it). Of course, the diff will be less useful if the function is not self contained and uses a lot of other code.
By being a function, we can be more confident in what we are actually benchmarking. Currently we have dynamically created a closure from another closure that prepared data, but it's unclear whether some inlining is happening between them and what part of the code is being benchmarked exactly (it's also not that easy to control the inlining of closures). By using a function, we can be quite explicit, and even force #[inline(never)] for the benchmarked functions, if it's needed (either automatically by providing some macro for defining the functions or some wrapper function or just by checking it in review).

I also think that it might be useful to try some of the benchmarks with e.g. different input sizes. We can't be as wild and dynamic as you could be in e.g. criterion, by trying thousands of combinations, because we will want to have stable names for the benchmarks for easier DB and UI processing (unless we come up with some other parametrization system for runtime benchmarks, but that seems difficult and unlikely). But I think that something like this might make sense:

fn map_insert(mut map: HashMap<usize, usize, FxBuildHasher>) -> HashMap<usize, usize, FxBuildHasher> {
    let capacity = map.capacity();
    for index in 0..capacity {
        map.insert(index, index);
    }
    map
}

fn main() {
    run_benchmark_group(|group| {
        // Measures how long does it take to insert 10 thousand numbers into a `hashbrown` hashmap.
        define_benchmark!(
            group,
            "hashmap_insert_10k",
            map_insert,
            hashbrown::HashMap::with_capacity_and_hasher(10000, fxhash::FxBuildHasher::default())
        );

        // Measures how long does it take to insert 100 thousand numbers into a `hashbrown` hashmap.
        define_benchmark!(
            group,
            "hashmap_insert_100k",
            map_insert,
            hashbrown::HashMap::with_capacity_and_hasher(100000, fxhash::FxBuildHasher::default())
        );
    });
}

The code is a bit long because of formatting and long type names, but I suppose that the gist is clear. For trying multiple inputs for the same function, it's only natural to put the benchmarked code into a separate function. Of course, this could be done with a closure too.

There are of course also some disadvantages, probably the main one is that the benchmarked code moves further away from the definition of the benchmark, and we have to come up with some name for it.

In theory, all this could probably be done with closures too, and closures are definitely more general. If we wanted the diffing feature, we could just put the code into a function and call it from the closure. But if we don't enforce this behaviour, it will hardly be done for most benchmarks and then it won't be so easy to diff any benchmark without the need to manually modify it. So for me, the biggest benefit of this approach is enforcing the creation of functions for benchmarked code so that we can diff any of the benchmarks easily.

… a self-contained function

nnethercote

I have no problem with requiring a named function, but I am uncertain about a few things.

collector/benchlib/src/benchmark.rs

nnethercote · 2022-10-18T10:07:44Z

collector/runtime-benchmarks/hashmap/src/main.rs


 fn main() {
    run_benchmark_group(|group| {
        // Measures how long does it take to insert 10 thousand numbers into a `hashbrown` hashmap.
-        define_benchmark!(group, hashmap_insert_10k, {
-            let mut map = hashbrown::HashMap::with_capacity_and_hasher(
+        define_benchmark!(group, "hashmap_insert_10k", map_insert, {


This { isn't necessary? (And the matching } below.)

nnethercote · 2022-10-18T10:11:33Z

collector/benchlib/src/benchmark.rs

-    };
+    // Version without init
+    ($group:expr, $name:literal, $fun:ident) => {{
+        let constructor = || {


I'm having trouble with all the nested closures. Are they all necessary? constructor is a zero-arg closure that returns a zero-arg closure. Feels like that could be collapsed into a single closure. And then register wraps the constructor in yet another closure.

So I wanted to achieve this:
You have a constructor, which is a function that returns another function, and that returned function will be benchmarked. You should be able to call the constructor multiple times (e.g. because of wall-time/perf. counter measurement split or simply because we run more iterations). So that's why I need the first closure.

The second closure is needed to apply black_box on the return value of the benchmarked function. Before I could use black_box in the benchmark harness itself, but now that we pass a named function, I can't easily store an arbitrary function in a hashmap, because I couldn't name it's return type. So I apply black_box here directly on whatever the function returns, and then the created closure will just return ().

In that case, please add more comments to the code explaining all this.

nnethercote · 2022-10-18T10:17:02Z

collector/benchlib/src/benchmark.rs

+        $group.register($name, constructor);
+    }};
+    // Version with init
+    ($group:expr, $name:literal, $fun:ident, $init:expr) => {{


It should be possible to make this take any number of arguments with $($init:expr),* (or whatever the exact syntax is), like println! and other macros. That would avoid the need for tuples to pass multiple arguments. You might even be able to combine this macro rules with the zero-init one above.

I wanted to do it, but my macro-fu isn't on par 😆 I can make arbitrary number of arguments work, but I don't know how to avoid the tuples. Normally I would do something like this:

($group:expr, $name:literal, $fun:ident, $($init:expr),*) => {{ let constructor = || { || fun($($init),*); };

But in this way the init expression evaluation would be measured by the harness, which I want to avoid. So instead, I'd like to do something like this:

($group:expr, $name:literal, $fun:ident, $($init:expr),*) => {{ let constructor = || { let args = ($($init),*); || fun(*args); };

With *args meaning: unroll the args tuple into individual arguments. But afaik something like this isn't expressible in Rust. Any ideas?

Does this work?

($group:expr, $name:literal, $fun:ident, $($init:expr),*) => {{ let constructor = || { || fun($($init),*); };

That looks like the first example that I have sent in my previous commit I think. But in this case the init expression would be evaluated inside the measured closure, which we want to avoid.

I see.

It makes me uncomfortable that the evaluation position is so subtle, and not visible from the call sites. E.g. this:

define_benchmark!(group, "nbody_10k", calculate_nbody, nbody::init(10000));

has a very different behaviour to this:

let init = nbody::init(10000); define_benchmark!(group, "nbody_10k", calculate_nbody, init);

Perhaps the macro syntax could be changed so it doesn't look like a normal call, something like this:

define_benchmark!(group, "nbody_10k", calculate_nbody, delayed_init: init);

It's not great but at least makes it clearer that something non-standard is happening.

But now I'm wondering if the macro is hiding too much subtle stuff :/

nnethercote · 2022-10-18T10:21:11Z

collector/runtime-benchmarks/bufreader/src/main.rs

-                    .unwrap();
-                let mut reader =
-                    BufReader::with_capacity(BYTES, FrameDecoder::new(&compressed[..]));
+        define_benchmark!(group, "bufreader_snappy", read_buf, {


This is a complex $init value. Is its execution measured as part of the benchmark?

It shouldn't be, and that is also the point of why the constructor exists. So that I can call the benchmark in a two part way:

Create init data

Start measuring

Pass init data and execute benchmark

Stop measuring

And also ideally, then large init data could be returned from the function and dropped (or mem::forget-ed later). But then we would need to pass the argument by &mut, to make sure that we have control over its destruction.

I think that criterion offers similar tradeoffs by letting users pass data by value or by reference.

That all sounds reasonable, it's just not currently clear. More documentation please! :)

nnethercote · 2022-10-21T06:04:07Z

I put some changes in #1476 which I think address my concerns and misunderstandings here.

Kobzol · 2022-10-26T08:43:36Z

This complicates things for a feature that might not even be needed. If we find in the future that named identifiers are actually useful for something, I'll revisit this.

Change benchmark definition so that the benchmark itself is always in…

527a18a

… a self-contained function

Kobzol requested a review from nnethercote October 17, 2022 11:42

nnethercote reviewed Oct 18, 2022

View reviewed changes

nnethercote mentioned this pull request Oct 21, 2022

Runtime benchmarking tweaks #1476

Merged

Kobzol closed this Oct 26, 2022

Change runtime benchmark definition to always refer to a self-contained function #1470

Change runtime benchmark definition to always refer to a self-contained function #1470

Uh oh!

Conversation

Kobzol commented Oct 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nnethercote left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Kobzol Oct 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Kobzol Oct 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nnethercote commented Oct 21, 2022

Uh oh!

Kobzol commented Oct 26, 2022

Uh oh!

Uh oh!

Kobzol commented Oct 17, 2022 •

edited

Loading

Kobzol Oct 18, 2022 •

edited

Loading

Kobzol Oct 18, 2022 •

edited

Loading