Add MT benchmarks #204
Conversation
Wait for PR #167 with a fix for the TSAN CI builds before merging this one.
Rebased on current main.
Force-pushed from 8715b28 to 0bba9e3.
Force-pushed from c13fede to a2aaa58.
The results of the benchmark are sometimes very strange. It seems that the measurement methodology is wrong, since the standard deviation can even exceed the mean, which means the accuracy of the measurements is far too low:
scalable_pool mt_alloc_free: mean: 16.13 [ms] std_dev: 21.2146 [ms]
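For reference, here is a minimal sketch of how a mean/std_dev pair like the one above can be computed from per-iteration timings (hypothetical helpers, not the benchmark's actual code); a sample standard deviation larger than the mean simply means the per-iteration times vary wildly:

```cpp
// Hypothetical illustration of the mean/std_dev statistics printed above;
// not the benchmark's actual code.
#include <cmath>
#include <numeric>
#include <vector>

double mean(const std::vector<double> &samples) {
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

double std_dev(const std::vector<double> &samples) {
    const double m = mean(samples);
    double sq_sum = 0.0;
    for (double s : samples)
        sq_sum += (s - m) * (s - m);
    // sample standard deviation (n - 1 in the denominator)
    return std::sqrt(sq_sum / static_cast<double>(samples.size() - 1));
}
```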
When I run this benchmark on Windows WSL or on a bare-metal server, it hangs after printing the following (I waited 55 minutes and it did not finish):
scalable_pool mt_alloc_free: mean: 24.34 [ms] std_dev: 18.1215 [ms] (total alloc failures: 0 out of 5000000)
jemalloc_pool mt_alloc_free: mean: 1112.15 [ms] std_dev: 116.737 [ms] (total alloc failures: 0 out of 5000000)
probably here:
Thread 1 "multithread_ben" received signal SIGINT, Interrupt.
__futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=11177, futex_word=0x7fffe79ff910) at ./nptl/futex-internal.c:57
57 in ./nptl/futex-internal.c
(gdb) bt
#0 __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=11177, futex_word=0x7fffe79ff910) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=128, abstime=0x0, clockid=0, expected=11177, futex_word=0x7fffe79ff910) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7fffe79ff910, expected=11177, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128) at ./nptl/futex-internal.c:139
#3 0x00007ffff77dd624 in __pthread_clockjoin_ex (threadid=140737079408192, thread_return=0x0, clockid=0, abstime=0x0, block=<optimized out>) at ./nptl/pthread_join_common.c:105
#4 0x00007ffff7b352c7 in std::thread::join() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x0000555555559b1b in umf_bench::parallel_exec<umf_bench::measure<std::chrono::duration<long int, std::ratio<1, 1000> >, mt_alloc_free(poolCreateExtParams)::<lambda(auto:2)> >(size_t, size_t, mt_alloc_free(poolCreateExtParams)::<lambda(auto:2)>&&)::<lambda(size_t)> >(size_t, struct {...} &&) (threads_number=20, f=...) at /home/ldorau/work/unified-memory-framework/benchmark/multithread.hpp:32
#6 0x000055555555935f in umf_bench::measure<std::chrono::duration<long int, std::ratio<1, 1000> >, mt_alloc_free(poolCreateExtParams)::<lambda(auto:2)> >(size_t, size_t, struct {...} &&) (iterations=5, concurrency=20, run_workload=...)
at /home/ldorau/work/unified-memory-framework/benchmark/multithread.hpp:110
#7 0x00005555555595e7 in mt_alloc_free (params=std::tuple containing = {...}) at /home/ldorau/work/unified-memory-framework/benchmark/multithread.cpp:76
#8 0x0000555555559988 in main () at /home/ldorau/work/unified-memory-framework/benchmark/multithread.cpp:114
Can you try it again now? scalable_pool seems to need some time to warm up and the first iteration was taking longer. I've made a change to skip the first iteration in the calculations.
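A rough sketch of the "skip the first iteration" idea, with an assumed helper name and return type (the real change lives in benchmark/multithread.hpp):

```cpp
// Hypothetical sketch: run one extra, discarded warmup iteration before the
// measured ones, so the first (slower) run does not skew the statistics.
#include <cstddef>
#include <vector>

template <typename RunWorkload>
std::vector<double> measure_without_warmup(std::size_t iterations,
                                           RunWorkload &&run_workload) {
    run_workload(); // warmup run, result discarded
    std::vector<double> times;
    times.reserve(iterations);
    for (std::size_t i = 0; i < iterations; i++)
        times.push_back(run_workload()); // only these runs enter the stats
    return times;
}
```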
This only happens in debug mode because of the debug checks in base_alloc. We should either not run benchmarks in debug mode or optimize those checks.
See the last CI build: https://github.com/oneapi-src/unified-memory-framework/actions/runs/7817812924/job/21327245255?pr=204
Still, 13.3 ms +/- 10.5 ms means the result can be anywhere from 2.8 to 23.8 ms; I'm afraid this is not a good benchmark result. ubench allows a maximum confidence interval of +/- 2.5% (it fails if the confidence interval exceeds 2.5%).
See the output of the last ubench CI build: "confidence interval 5.807915% exceeds maximum permitted 2.500000%"
But here we have 10.5836/13.3125 = 79.5%, so almost 80% ...
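For illustration only, a quick re-check of the numbers quoted above:

```cpp
// Illustration only: re-computes the spread and the std_dev/mean ratio
// from the values quoted in the CI log above.
#include <cstdio>

int main() {
    const double mean = 13.3125;    // [ms]
    const double std_dev = 10.5836; // [ms]
    std::printf("range: %.1f .. %.1f ms\n", mean - std_dev, mean + std_dev); // ~2.7 .. 23.9 ms
    std::printf("std_dev/mean: %.1f%%\n", 100.0 * std_dev / mean);           // ~79.5%
    return 0;
}
```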
A confidence interval of about 80% is not good enough, but we can merge it as is and fix it later.
Please add a TODO in the code to fix that later.
It would be better to add a GitHub issue.
I've increased the number of iterations for scalable_pool and now it looks like this:
I think this is acceptable for a multithreaded benchmark. Even with ubench, scalable_pool reports a higher-than-expected confidence interval (above the 2.5% limit).
Helper functions taken from the pmemstream repo.
So that we do not measure the warmup time.
It's not possible to use ubench for MT benchmarks because we want to measure only the workload, not thread creation time, etc. This means that we have to write our own measurement code and use it within each worker thread. We also have to make sure that all threads start running at the same time (hence the use of syncthreads()).
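A minimal sketch of this approach, assuming C++20 std::barrier in place of the actual syncthreads() helper (this is not the actual multithread.hpp code): each worker times only its own workload, and the barrier makes all workers start the timed section at the same moment.

```cpp
// Minimal sketch, not the actual multithread.hpp implementation.
#include <barrier>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

template <typename Workload>
std::vector<std::chrono::milliseconds> run_in_parallel(std::size_t threads_number,
                                                       Workload &&work) {
    std::vector<std::chrono::milliseconds> results(threads_number);
    std::barrier sync_point(static_cast<std::ptrdiff_t>(threads_number));
    std::vector<std::thread> threads;

    for (std::size_t i = 0; i < threads_number; i++) {
        threads.emplace_back([&, i] {
            sync_point.arrive_and_wait(); // exclude thread creation time
            const auto start = std::chrono::steady_clock::now();
            work(i);                      // only the workload is timed
            const auto stop = std::chrono::steady_clock::now();
            results[i] =
                std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
        });
    }
    for (auto &t : threads)
        t.join();
    return results;
}
```

The per-thread durations returned here would then feed the mean/std_dev statistics shown in the comments above.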