[tzdb] Replace shared_mutex with mutex. #87929

EricWF · 2024-04-07T16:56:38Z

The overhead of taking a std::mutex is much lower than taking a reader
lock on a shared mutex, even under heavy contention.

The benefit of shared_mutex only occurs as the amount of
time spent in the critical sections grows large enough.

In our case all we do is read a pointer and return the lock.
As a result, using a shared lock can be ~50%-100% slower

Here are the results for the provided benchmark on my machine:

2024-04-07T12:48:51-04:00
Running ./libcxx/benchmarks/shared_mutex_vs_mutex.libcxx.out
Run on (12 X 400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 1024 KiB (x6)
  L3 Unified 32768 KiB (x1)
Load Average: 2.70, 2.70, 1.63
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
BM_shared_mutex/threads:1        13.9 ns         13.9 ns     50533700
BM_shared_mutex/threads:2        34.5 ns         68.9 ns      9957784
BM_shared_mutex/threads:4        38.4 ns          137 ns      4987772
BM_shared_mutex/threads:8        51.1 ns          358 ns      1974160
BM_shared_mutex/threads:32       57.1 ns          682 ns      1043648
BM_mutex/threads:1               5.54 ns         5.53 ns    125867422
BM_mutex/threads:2               15.5 ns         30.9 ns     21830116
BM_mutex/threads:4               15.4 ns         57.2 ns     12136920
BM_mutex/threads:8               19.3 ns          140 ns      4997080
BM_mutex/threads:32              20.8 ns          252 ns      2859808

The overhead of taking a std::mutex is much lower than taking a reader lock on a shared mutex, even under heavy contention. The benifit of shared_mutex only occurs as the amount of time spent in the critical sections grows large enough. In our case all we do is read a pointer and return the lock. As a result, using a shared lock can be ~50%-100% slower

llvmbot · 2024-04-07T16:57:09Z

@llvm/pr-subscribers-libcxx

Author: Eric (EricWF)

Changes

The overhead of taking a std::mutex is much lower than taking a reader
lock on a shared mutex, even under heavy contention.

The benifit of shared_mutex only occurs as the amount of
time spent in the critical sections grows large enough.

In our case all we do is read a pointer and return the lock.
As a result, using a shared lock can be ~50%-100% slower

Full diff: https://github.com/llvm/llvm-project/pull/87929.diff

3 Files Affected:

(modified) libcxx/benchmarks/CMakeLists.txt (+1)
(added) libcxx/benchmarks/shared_mutex_vs_mutex.bench.cpp (+41)
(modified) libcxx/src/include/tzdb/tzdb_list_private.h (+4-4)

diff --git a/libcxx/benchmarks/CMakeLists.txt b/libcxx/benchmarks/CMakeLists.txt
index 928238c1ac69ba..527a2acf2d3b36 100644
--- a/libcxx/benchmarks/CMakeLists.txt
+++ b/libcxx/benchmarks/CMakeLists.txt
@@ -221,6 +221,7 @@ set(BENCHMARK_TESTS
     map.bench.cpp
     monotonic_buffer.bench.cpp
     ordered_set.bench.cpp
+    shared_mutex_vs_mutex.bench.cpp
     stop_token.bench.cpp
     std_format_spec_string_unicode.bench.cpp
     string.bench.cpp
diff --git a/libcxx/benchmarks/shared_mutex_vs_mutex.bench.cpp b/libcxx/benchmarks/shared_mutex_vs_mutex.bench.cpp
new file mode 100644
index 00000000000000..7d06857d0ad6e7
--- /dev/null
+++ b/libcxx/benchmarks/shared_mutex_vs_mutex.bench.cpp
@@ -0,0 +1,41 @@
+//===----------------------------------------------------------------------===//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// This benchmark compares the performance of std::mutex and std::shared_mutex in contended scenarios.
+// it's meant to establish a baseline overhead for std::shared_mutex and std::mutex, and to help inform decisions about
+// which mutex to use when selecting a mutex type for a given use case.
+
+#include <atomic>
+#include <mutex>
+#include <numeric>
+#include <thread>
+#include <shared_mutex>
+
+#include "benchmark/benchmark.h"
+
+int global_value = 42;
+std::mutex m;
+std::shared_mutex sm;
+
+static void BM_shared_mutex(benchmark::State& state) {
+  for (auto _ : state) {
+    std::shared_lock<std::shared_mutex> lock(sm);
+    benchmark::DoNotOptimize(global_value);
+  }
+}
+
+static void BM_mutex(benchmark::State& state) {
+  for (auto _ : state) {
+    std::lock_guard<std::mutex> lock(m);
+    benchmark::DoNotOptimize(global_value);
+  }
+}
+
+BENCHMARK(BM_shared_mutex)->Threads(1)->Threads(2)->Threads(4)->Threads(8)->Threads(32);
+BENCHMARK(BM_mutex)->Threads(1)->Threads(2)->Threads(4)->Threads(8)->Threads(32);
+
+BENCHMARK_MAIN();
diff --git a/libcxx/src/include/tzdb/tzdb_list_private.h b/libcxx/src/include/tzdb/tzdb_list_private.h
index 969b2b9f8a9f63..c0598cb6e24b16 100644
--- a/libcxx/src/include/tzdb/tzdb_list_private.h
+++ b/libcxx/src/include/tzdb/tzdb_list_private.h
@@ -16,7 +16,7 @@
 
 // When threads are not available the locking is not required.
 #ifndef _LIBCPP_HAS_NO_THREADS
-#  include <shared_mutex>
+#  include <mutex>
 #endif
 
 #include "types_private.h"
@@ -56,7 +56,7 @@ class tzdb_list::__impl {
 
   const tzdb& __front() const noexcept {
 #ifndef _LIBCPP_HAS_NO_THREADS
-    shared_lock __lock{__mutex_};
+    unique_lock __lock{__mutex_};
 #endif
     return __tzdb_.front();
   }
@@ -72,7 +72,7 @@ class tzdb_list::__impl {
 
   const_iterator __begin() const noexcept {
 #ifndef _LIBCPP_HAS_NO_THREADS
-    shared_lock __lock{__mutex_};
+    unique_lock __lock{__mutex_};
 #endif
     return __tzdb_.begin();
   }
@@ -87,7 +87,7 @@ class tzdb_list::__impl {
   void __load_no_lock() { chrono::__init_tzdb(__tzdb_.emplace_front(), __rules_.emplace_front()); }
 
 #ifndef _LIBCPP_HAS_NO_THREADS
-  mutable shared_mutex __mutex_;
+  mutable mutex __mutex_;
 #endif
   forward_list<tzdb> __tzdb_;

EricWF · 2024-04-07T16:57:53Z

Here are the results for the provided benchmark on my machine:

2024-04-07T12:48:51-04:00
Running ./libcxx/benchmarks/shared_mutex_vs_mutex.libcxx.out
Run on (12 X 400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 1024 KiB (x6)
  L3 Unified 32768 KiB (x1)
Load Average: 2.70, 2.70, 1.63
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
BM_shared_mutex/threads:1        13.9 ns         13.9 ns     50533700
BM_shared_mutex/threads:2        34.5 ns         68.9 ns      9957784
BM_shared_mutex/threads:4        38.4 ns          137 ns      4987772
BM_shared_mutex/threads:8        51.1 ns          358 ns      1974160
BM_shared_mutex/threads:32       57.1 ns          682 ns      1043648
BM_mutex/threads:1               5.54 ns         5.53 ns    125867422
BM_mutex/threads:2               15.5 ns         30.9 ns     21830116
BM_mutex/threads:4               15.4 ns         57.2 ns     12136920
BM_mutex/threads:8               19.3 ns          140 ns      4997080
BM_mutex/threads:32              20.8 ns          252 ns      2859808

github-actions · 2024-04-07T16:59:29Z

✅ With the latest revision this PR passed the C/C++ code formatter.

mordante

Thanks a lot for this optimization!
Can you add the benchmark results to the commit message, that way it's part of the history and does not depend on the availability of GitHub.

LGTM modulo one nit.

libcxx/src/include/tzdb/tzdb_list_private.h

Address review comments.

EricWF · 2024-04-12T17:09:31Z

Cool, thanks for the review. I believe I've addressed all of the comments.

mordante

Cool, thanks for the review. I believe I've addressed all of the comments.

Thanks! LGTM!

libcxx/src/include/tzdb/tzdb_list_private.h

The overhead of taking a std::mutex is much lower than taking a reader lock on a shared mutex, even under heavy contention. The benefit of shared_mutex only occurs as the amount of time spent in the critical sections grows large enough. In our case all we do is read a pointer and return the lock. As a result, using a shared lock can be ~50%-100% slower Here are the results for the provided benchmark on my machine: ``` 2024-04-07T12:48:51-04:00 Running ./libcxx/benchmarks/shared_mutex_vs_mutex.libcxx.out Run on (12 X 400 MHz CPU s) CPU Caches: L1 Data 32 KiB (x6) L1 Instruction 32 KiB (x6) L2 Unified 1024 KiB (x6) L3 Unified 32768 KiB (x1) Load Average: 2.70, 2.70, 1.63 --------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------- BM_shared_mutex/threads:1 13.9 ns 13.9 ns 50533700 BM_shared_mutex/threads:2 34.5 ns 68.9 ns 9957784 BM_shared_mutex/threads:4 38.4 ns 137 ns 4987772 BM_shared_mutex/threads:8 51.1 ns 358 ns 1974160 BM_shared_mutex/threads:32 57.1 ns 682 ns 1043648 BM_mutex/threads:1 5.54 ns 5.53 ns 125867422 BM_mutex/threads:2 15.5 ns 30.9 ns 21830116 BM_mutex/threads:4 15.4 ns 57.2 ns 12136920 BM_mutex/threads:8 19.3 ns 140 ns 4997080 BM_mutex/threads:32 20.8 ns 252 ns 2859808 ```

EricWF requested a review from a team as a code owner April 7, 2024 16:56

llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Apr 7, 2024

EricWF requested a review from mordante April 7, 2024 16:56

respect the formatter

4c233a0

EricWF assigned mordante Apr 7, 2024

mordante approved these changes Apr 7, 2024

View reviewed changes

libcxx/src/include/tzdb/tzdb_list_private.h Show resolved Hide resolved

Update tzdb_list_private.h

b36cb0f

Address review comments.

mordante approved these changes Apr 12, 2024

View reviewed changes

libcxx/src/include/tzdb/tzdb_list_private.h Show resolved Hide resolved

EricWF merged commit 11f22f1 into llvm:main Apr 13, 2024

EricWF deleted the bm-shared-mutex branch April 13, 2024 07:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[tzdb] Replace shared_mutex with mutex. #87929

[tzdb] Replace shared_mutex with mutex. #87929

Uh oh!

EricWF commented Apr 7, 2024 •

edited by ldionne

Loading

Uh oh!

llvmbot commented Apr 7, 2024

Uh oh!

EricWF commented Apr 7, 2024

Uh oh!

github-actions bot commented Apr 7, 2024 •

edited

Loading

Uh oh!

mordante left a comment

Uh oh!

Uh oh!

EricWF commented Apr 12, 2024

Uh oh!

mordante left a comment

Uh oh!

Uh oh!

Uh oh!

[tzdb] Replace shared_mutex with mutex. #87929

[tzdb] Replace shared_mutex with mutex. #87929

Uh oh!

Conversation

EricWF commented Apr 7, 2024 • edited by ldionne Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Apr 7, 2024

Uh oh!

EricWF commented Apr 7, 2024

Uh oh!

github-actions bot commented Apr 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mordante left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

EricWF commented Apr 12, 2024

Uh oh!

mordante left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

EricWF commented Apr 7, 2024 •

edited by ldionne

Loading

github-actions bot commented Apr 7, 2024 •

edited

Loading