[compiler-rt] Use __atomic builtins whenever possible #84439

Conversation

@arichardson (Member) commented Mar 8, 2024

The code in this file dates back to 2012 when Clang's support for atomic
builtins was still quite limited. The bugs referenced in the comment
at the top of the file have long been fixed and using the compiler
builtins directly should now generate slightly better code.
Additionally, this allows using the __atomic-based header on platforms
where the __sync builtins are lacking (e.g. Arm Morello).

This change does not introduce any code generation changes for
__tsan_read*/__tsan_write* or __tsan_func_{entry,exit} on x86, which
indicates that the previously noted compiler issues have been fixed.

We also have to touch the non-Clang codepaths here, since the simplest
way to make this work is to make the memory_order enum values match the
compiler-provided __ATOMIC_* macros; this means updating the debug
checks that assumed the enum was always a bitflag.
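
For illustration (not part of the patch), the bitflag-style checks break because the __ATOMIC_* macro values are small consecutive integers rather than disjoint bits:

// With the __ATOMIC_* values the enumerators are consecutive integers,
// so bit-testing an order against a set of allowed orders no longer works.
static_assert(__ATOMIC_RELAXED == 0 && __ATOMIC_ACQUIRE == 2 &&
              __ATOMIC_RELEASE == 3 && __ATOMIC_SEQ_CST == 5, "");

// Old check (only correct when every enumerator was a distinct bit):
//   DCHECK(mo & (memory_order_relaxed | memory_order_release | memory_order_seq_cst));
// With the macro values this wrongly accepts memory_order_acquire (2 & 7 != 0)
// and wrongly rejects memory_order_relaxed (0 & anything == 0).
// New check: compare against each allowed order explicitly:
//   DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
//          mo == memory_order_seq_cst);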

The one downside of this change is that 32-bit MIPS now definitely
requires libatomic (but that may already have been needed for RMW ops).
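
For illustration, a hypothetical standalone snippet (not part of the patch) showing where libatomic comes in: on a 32-bit target without native 64-bit atomics, the builtin lowers to a library call such as __atomic_fetch_add_8, which libatomic provides.

#include <stdint.h>

uint64_t counter;

uint64_t bump(uint64_t v) {
  // On e.g. 32-bit MIPS this is not lock-free and becomes a call into
  // libatomic; on 64-bit targets it compiles to a native atomic RMW.
  return __atomic_fetch_add(&counter, v, __ATOMIC_SEQ_CST);
}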

Created using spr 1.3.6-beta.1
@llvmbot (Member) commented Mar 8, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Alexander Richardson (arichardson)

Changes

The code in this file dates back to 2012 when Clang's support for atomic
builtins was still quite limited. The bugs referenced in the comment
at the top of the file have long been fixed and using the compiler
builtins directly should now generate slightly better code.
Additionally, this allows using the __atomic-based header on platforms
where the __sync builtins are lacking (e.g. Arm Morello).

We also have to touch the non-Clang codepaths here, since the simplest
way to make this work is to make the memory_order enum values match the
compiler-provided __ATOMIC_* macros; this means updating the debug
checks that assumed the enum was always a bitflag.


Full diff: https://github.com/llvm/llvm-project/pull/84439.diff

6 Files Affected:

  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_atomic.h (+9)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h (+17-32)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_mips.h (+8-8)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_other.h (+4-4)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_x86.h (+4-4)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_atomic_msvc.h (+3-4)
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_atomic.h b/compiler-rt/lib/sanitizer_common/sanitizer_atomic.h
index 46f06957228c9b..2dab4e4b8725d7 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_atomic.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_atomic.h
@@ -18,12 +18,21 @@
 namespace __sanitizer {
 
 enum memory_order {
+#ifdef __ATOMIC_SEQ_CST
+  memory_order_relaxed = __ATOMIC_RELAXED,
+  memory_order_consume = __ATOMIC_CONSUME,
+  memory_order_acquire = __ATOMIC_ACQUIRE,
+  memory_order_release = __ATOMIC_RELEASE,
+  memory_order_acq_rel = __ATOMIC_ACQ_REL,
+  memory_order_seq_cst = __ATOMIC_SEQ_CST
+#else
   memory_order_relaxed = 1 << 0,
   memory_order_consume = 1 << 1,
   memory_order_acquire = 1 << 2,
   memory_order_release = 1 << 3,
   memory_order_acq_rel = 1 << 4,
   memory_order_seq_cst = 1 << 5
+#endif
 };
 
 struct atomic_uint8_t {
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h
index 4318d64d16cfa2..e4f02b7f6f685c 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h
@@ -22,52 +22,37 @@
 
 namespace __sanitizer {
 
-// We would like to just use compiler builtin atomic operations
-// for loads and stores, but they are mostly broken in clang:
-// - they lead to vastly inefficient code generation
-// (http://llvm.org/bugs/show_bug.cgi?id=17281)
-// - 64-bit atomic operations are not implemented on x86_32
-// (http://llvm.org/bugs/show_bug.cgi?id=15034)
-// - they are not implemented on ARM
-// error: undefined reference to '__atomic_load_4'
+// We use the compiler builtin atomic operations for loads and stores, which
+// generates correct code for all architectures, but may require libatomic
+// on platforms where e.g. 64-bit atomics are not supported natively.
 
 // See http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
 // for mappings of the memory model to different processors.
 
-inline void atomic_signal_fence(memory_order) {
-  __asm__ __volatile__("" ::: "memory");
-}
+inline void atomic_signal_fence(memory_order mo) { __atomic_signal_fence(mo); }
 
-inline void atomic_thread_fence(memory_order) {
-  __sync_synchronize();
-}
+inline void atomic_thread_fence(memory_order mo) { __atomic_thread_fence(mo); }
 
-template<typename T>
-inline typename T::Type atomic_fetch_add(volatile T *a,
-    typename T::Type v, memory_order mo) {
-  (void)mo;
+template <typename T>
+inline typename T::Type atomic_fetch_add(volatile T *a, typename T::Type v,
+                                         memory_order mo) {
   DCHECK(!((uptr)a % sizeof(*a)));
-  return __sync_fetch_and_add(&a->val_dont_use, v);
+  return __atomic_fetch_add(&a->val_dont_use, v, mo);
 }
 
-template<typename T>
-inline typename T::Type atomic_fetch_sub(volatile T *a,
-    typename T::Type v, memory_order mo) {
+template <typename T>
+inline typename T::Type atomic_fetch_sub(volatile T *a, typename T::Type v,
+                                         memory_order mo) {
   (void)mo;
   DCHECK(!((uptr)a % sizeof(*a)));
-  return __sync_fetch_and_add(&a->val_dont_use, -v);
+  return __atomic_fetch_sub(&a->val_dont_use, v, mo);
 }
 
-template<typename T>
-inline typename T::Type atomic_exchange(volatile T *a,
-    typename T::Type v, memory_order mo) {
+template <typename T>
+inline typename T::Type atomic_exchange(volatile T *a, typename T::Type v,
+                                        memory_order mo) {
   DCHECK(!((uptr)a % sizeof(*a)));
-  if (mo & (memory_order_release | memory_order_acq_rel | memory_order_seq_cst))
-    __sync_synchronize();
-  v = __sync_lock_test_and_set(&a->val_dont_use, v);
-  if (mo == memory_order_seq_cst)
-    __sync_synchronize();
-  return v;
+  return __atomic_exchange_n(&a->val_dont_use, v, mo);
 }
 
 template <typename T>
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_mips.h b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_mips.h
index f3d3052e5b7c5c..5fc2bf25f3cd51 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_mips.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_mips.h
@@ -40,8 +40,8 @@ template <>
 inline atomic_uint64_t::Type atomic_fetch_add(volatile atomic_uint64_t *ptr,
                                               atomic_uint64_t::Type val,
                                               memory_order mo) {
-  DCHECK(mo &
-         (memory_order_relaxed | memory_order_release | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
+         mo == memory_order_seq_cst);
   DCHECK(!((uptr)ptr % sizeof(*ptr)));
 
   atomic_uint64_t::Type ret;
@@ -66,8 +66,8 @@ inline bool atomic_compare_exchange_strong(volatile atomic_uint64_t *ptr,
                                            atomic_uint64_t::Type *cmp,
                                            atomic_uint64_t::Type xchg,
                                            memory_order mo) {
-  DCHECK(mo &
-         (memory_order_relaxed | memory_order_release | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
+         mo == memory_order_seq_cst);
   DCHECK(!((uptr)ptr % sizeof(*ptr)));
 
   typedef atomic_uint64_t::Type Type;
@@ -89,8 +89,8 @@ inline bool atomic_compare_exchange_strong(volatile atomic_uint64_t *ptr,
 template <>
 inline atomic_uint64_t::Type atomic_load(const volatile atomic_uint64_t *ptr,
                                          memory_order mo) {
-  DCHECK(mo &
-         (memory_order_relaxed | memory_order_release | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
+         mo == memory_order_seq_cst);
   DCHECK(!((uptr)ptr % sizeof(*ptr)));
 
   atomic_uint64_t::Type zero = 0;
@@ -102,8 +102,8 @@ inline atomic_uint64_t::Type atomic_load(const volatile atomic_uint64_t *ptr,
 template <>
 inline void atomic_store(volatile atomic_uint64_t *ptr, atomic_uint64_t::Type v,
                          memory_order mo) {
-  DCHECK(mo &
-         (memory_order_relaxed | memory_order_release | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
+         mo == memory_order_seq_cst);
   DCHECK(!((uptr)ptr % sizeof(*ptr)));
 
   __spin_lock(&lock.lock);
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_other.h b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_other.h
index 557082a636b879..5ace0d2213275d 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_other.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_other.h
@@ -24,8 +24,8 @@ inline void proc_yield(int cnt) {
 template<typename T>
 inline typename T::Type atomic_load(
     const volatile T *a, memory_order mo) {
-  DCHECK(mo & (memory_order_relaxed | memory_order_consume
-      | memory_order_acquire | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_consume ||
+         mo == memory_order_acquire || mo == memory_order_seq_cst);
   DCHECK(!((uptr)a % sizeof(*a)));
   typename T::Type v;
 
@@ -58,8 +58,8 @@ inline typename T::Type atomic_load(
 
 template<typename T>
 inline void atomic_store(volatile T *a, typename T::Type v, memory_order mo) {
-  DCHECK(mo & (memory_order_relaxed | memory_order_release
-      | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
+         mo == memory_order_seq_cst);
   DCHECK(!((uptr)a % sizeof(*a)));
 
   if (sizeof(*a) < 8 || sizeof(void*) == 8) {
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_x86.h b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_x86.h
index b81a354d209872..966463acd04b8a 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_x86.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_x86.h
@@ -26,8 +26,8 @@ inline void proc_yield(int cnt) {
 template<typename T>
 inline typename T::Type atomic_load(
     const volatile T *a, memory_order mo) {
-  DCHECK(mo & (memory_order_relaxed | memory_order_consume
-      | memory_order_acquire | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_consume ||
+         mo == memory_order_acquire || mo == memory_order_seq_cst);
   DCHECK(!((uptr)a % sizeof(*a)));
   typename T::Type v;
 
@@ -71,8 +71,8 @@ inline typename T::Type atomic_load(
 
 template<typename T>
 inline void atomic_store(volatile T *a, typename T::Type v, memory_order mo) {
-  DCHECK(mo & (memory_order_relaxed | memory_order_release
-      | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
+         mo == memory_order_seq_cst);
   DCHECK(!((uptr)a % sizeof(*a)));
 
   if (sizeof(*a) < 8 || sizeof(void*) == 8) {
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_msvc.h b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_msvc.h
index 31317adcdfc99f..ed82f62a1c96c3 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_msvc.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_msvc.h
@@ -70,8 +70,8 @@ inline void proc_yield(int cnt) {
 template<typename T>
 inline typename T::Type atomic_load(
     const volatile T *a, memory_order mo) {
-  DCHECK(mo & (memory_order_relaxed | memory_order_consume
-      | memory_order_acquire | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_consume
+      || mo == memory_order_acquire || mo == memory_order_seq_cst);
   DCHECK(!((uptr)a % sizeof(*a)));
   typename T::Type v;
   // FIXME(dvyukov): 64-bit load is not atomic on 32-bits.
@@ -87,8 +87,7 @@ inline typename T::Type atomic_load(
 
 template<typename T>
 inline void atomic_store(volatile T *a, typename T::Type v, memory_order mo) {
-  DCHECK(mo & (memory_order_relaxed | memory_order_release
-      | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release || mo == memory_order_seq_cst);
   DCHECK(!((uptr)a % sizeof(*a)));
   // FIXME(dvyukov): 64-bit store is not atomic on 32-bits.
   if (mo == memory_order_relaxed) {

github-actions bot commented Mar 8, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@dvyukov (Collaborator) commented Mar 8, 2024

Hi Alexander,

Thanks for working on this.

Please double-check that the codegen for the following tsan functions in release builds doesn't change: __tsan_read1/2/4/8, __tsan_write1/2/4/8, __tsan_func_entry/exit. Please post a diff of the asm for these functions, if any.

If we are refactoring this, can we simplify the code further? I guess Visual Studio does not support these builtins, so we can't just switch to them. But perhaps we could switch to std::atomic? If yes, I would assume we still keep the atomic operation wrapper functions to minimize the diff.
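
A hypothetical sketch (illustrative only; the type and function names mirror the existing wrappers, but this is not code from the PR) of what keeping the wrapper API on top of std::atomic could look like:

#include <atomic>

struct atomic_uint32_t {
  typedef unsigned Type;
  std::atomic<Type> val_dont_use;
};

// Same signature shape as the current wrappers, backed by std::atomic.
inline atomic_uint32_t::Type atomic_fetch_add(atomic_uint32_t *a,
                                              atomic_uint32_t::Type v,
                                              std::memory_order mo) {
  return a->val_dont_use.fetch_add(v, mo);
}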

Also, the clang-format CI check complains; please re-format the code.

@arichardson (Member Author)

> Hi Alexander,
>
> Thanks for working on this.
>
> Please double-check that the codegen for the following tsan functions in release builds doesn't change: __tsan_read1/2/4/8, __tsan_write1/2/4/8, __tsan_func_entry/exit. Please post a diff of the asm for these functions, if any.
>
> If we are refactoring this, can we simplify the code further? I guess Visual Studio does not support these builtins, so we can't just switch to them. But perhaps we could switch to std::atomic? If yes, I would assume we still keep the atomic operation wrapper functions to minimize the diff.
>
> Also, the clang-format CI check complains; please re-format the code.

Which architectures do you care about for these diffs? X86 should be the easiest to generate, but I can also show AArch64, which might actually be more interesting given its weaker memory model.

The reason I did not use std::atomic is that I was under the assumption that we can't rely on the C++ standard library here and have to roll our own.

@arichardson (Member Author) commented Mar 8, 2024

Also, one more thing: this only touches the RMW operations; the loads/stores will require a separate patch, which should allow for further simplification.

@dvyukov (Collaborator) commented Mar 9, 2024

> Which architectures do you care about for these diffs? X86 should be the easiest to generate

x86 is good.

> The reason I did not use std::atomic is that I was under the assumption that we can't rely on the C++ standard library here and have to roll our own.

Yes, this may be the case. I think I will withdraw this part of my comment; let's just ensure there is no codegen degradation for the most important functions.

@jyknight (Member)

To use the __atomic builtins and rely on libatomic for non-native atomics, you must do so consistently. You can't correctly mix and match RMW and load/store operations, or they might not be atomic with respect to each other.

That means:

  • delete sanitizer_atomic_clang_x86.h (preserve the architecture-specific definition of proc_yield, though it's unrelated to atomics)
  • delete sanitizer_atomic_clang_mips.h
  • delete sanitizer_atomic_clang_other.h, moving the functions into sanitizer_atomic_clang.h and changing atomic_load/atomic_store to call __atomic_load/__atomic_store instead of the custom implementations (see the sketch below).
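
A minimal sketch, assuming the existing wrapper signatures, of what the consolidated load/store in sanitizer_atomic_clang.h could look like (using the _n builtin variants for integer-sized types; not necessarily the final patch):

template <typename T>
inline typename T::Type atomic_load(const volatile T *a, memory_order mo) {
  DCHECK(mo == memory_order_relaxed || mo == memory_order_consume ||
         mo == memory_order_acquire || mo == memory_order_seq_cst);
  DCHECK(!((uptr)a % sizeof(*a)));
  // Let the compiler (or libatomic) pick the right implementation per target.
  return __atomic_load_n(&a->val_dont_use, mo);
}

template <typename T>
inline void atomic_store(volatile T *a, typename T::Type v, memory_order mo) {
  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
         mo == memory_order_seq_cst);
  DCHECK(!((uptr)a % sizeof(*a)));
  __atomic_store_n(&a->val_dont_use, v, mo);
}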

@arichardson (Member Author) commented Apr 15, 2024

> Hi Alexander,
>
> Thanks for working on this.
>
> Please double-check that the codegen for the following tsan functions in release builds doesn't change: __tsan_read1/2/4/8, __tsan_write1/2/4/8, __tsan_func_entry/exit. Please post a diff of the asm for these functions, if any.
>
> If we are refactoring this, can we simplify the code further? I guess Visual Studio does not support these builtins, so we can't just switch to them. But perhaps we could switch to std::atomic? If yes, I would assume we still keep the atomic operation wrapper functions to minimize the diff.
>
> Also, the clang-format CI check complains; please re-format the code.

On x86 I don't see any difference in the assembly for __tsan_read*/__tsan_write* (other than the relative addresses). Same for __tsan_func_entry/exit.

I do see one large diff. Code before:

<_ZN11__sanitizer23InternalAllocatorUnlockEv>:
               	pushq	%rax
               	callq	0x3cc00 <_ZN11__sanitizer18internal_allocatorEv>
               	movb	$0x0, 0x17be73(%rip)    # 0x1b8f80 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x81000>
               	xorl	%eax, %eax
               	leaq	0xfae6a(%rip), %rcx     # 0x137f80 <_ZN11__sanitizerL26internal_alloc_placeholderE>
               	nopw	%cs:(%rax,%rax)
               	movb	$0x0, 0x40d80(%rax,%rcx)
               	addq	$-0x40, %rax
               	cmpq	$-0xd80, %rax           # imm = 0xF280
               	jne	0x3d120 <_ZN11__sanitizer23InternalAllocatorUnlockEv+0x20>
               	movb	$0x0, 0x17be87(%rip)    # 0x1b8fc2 <_ZN11__sanitizerL27internal_allocator_cache_muE>
               	popq	%rax
               	retq
               	nopl	(%rax)

After, there are a lot of extra stores:

<_ZN11__sanitizer23InternalAllocatorUnlockEv>:
               	pushq	%rax
               	callq	0x3cc00 <_ZN11__sanitizer18internal_allocatorEv>
               	movb	$0x0, 0x17bcb3(%rip)    # 0x1b8f80 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x81000>
               	movb	$0x0, 0x13ba2c(%rip)    # 0x178d00 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40d80>
               	movb	$0x0, 0x13b9e5(%rip)    # 0x178cc0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40d40>
               	movb	$0x0, 0x13b99e(%rip)    # 0x178c80 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40d00>
               	movb	$0x0, 0x13b957(%rip)    # 0x178c40 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40cc0>
               	movb	$0x0, 0x13b910(%rip)    # 0x178c00 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40c80>
               	movb	$0x0, 0x13b8c9(%rip)    # 0x178bc0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40c40>
               	movb	$0x0, 0x13b882(%rip)    # 0x178b80 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40c00>
               	movb	$0x0, 0x13b83b(%rip)    # 0x178b40 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40bc0>
               	movb	$0x0, 0x13b7f4(%rip)    # 0x178b00 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40b80>
               	movb	$0x0, 0x13b7ad(%rip)    # 0x178ac0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40b40>
               	movb	$0x0, 0x13b766(%rip)    # 0x178a80 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40b00>
               	movb	$0x0, 0x13b71f(%rip)    # 0x178a40 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40ac0>
               	movb	$0x0, 0x13b6d8(%rip)    # 0x178a00 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40a80>
               	movb	$0x0, 0x13b691(%rip)    # 0x1789c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40a40>
               	movb	$0x0, 0x13b64a(%rip)    # 0x178980 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40a00>
               	movb	$0x0, 0x13b603(%rip)    # 0x178940 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x409c0>
               	movb	$0x0, 0x13b5bc(%rip)    # 0x178900 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40980>
               	movb	$0x0, 0x13b575(%rip)    # 0x1788c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40940>
               	movb	$0x0, 0x13b52e(%rip)    # 0x178880 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40900>
               	movb	$0x0, 0x13b4e7(%rip)    # 0x178840 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x408c0>
               	movb	$0x0, 0x13b4a0(%rip)    # 0x178800 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40880>
               	movb	$0x0, 0x13b459(%rip)    # 0x1787c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40840>
               	movb	$0x0, 0x13b412(%rip)    # 0x178780 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40800>
               	movb	$0x0, 0x13b3cb(%rip)    # 0x178740 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x407c0>
               	movb	$0x0, 0x13b384(%rip)    # 0x178700 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40780>
               	movb	$0x0, 0x13b33d(%rip)    # 0x1786c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40740>
               	movb	$0x0, 0x13b2f6(%rip)    # 0x178680 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40700>
               	movb	$0x0, 0x13b2af(%rip)    # 0x178640 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x406c0>
               	movb	$0x0, 0x13b268(%rip)    # 0x178600 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40680>
               	movb	$0x0, 0x13b221(%rip)    # 0x1785c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40640>
               	movb	$0x0, 0x13b1da(%rip)    # 0x178580 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40600>
               	movb	$0x0, 0x13b193(%rip)    # 0x178540 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x405c0>
               	movb	$0x0, 0x13b14c(%rip)    # 0x178500 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40580>
               	movb	$0x0, 0x13b105(%rip)    # 0x1784c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40540>
               	movb	$0x0, 0x13b0be(%rip)    # 0x178480 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40500>
               	movb	$0x0, 0x13b077(%rip)    # 0x178440 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x404c0>
               	movb	$0x0, 0x13b030(%rip)    # 0x178400 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40480>
               	movb	$0x0, 0x13afe9(%rip)    # 0x1783c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40440>
               	movb	$0x0, 0x13afa2(%rip)    # 0x178380 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40400>
               	movb	$0x0, 0x13af5b(%rip)    # 0x178340 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x403c0>
               	movb	$0x0, 0x13af14(%rip)    # 0x178300 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40380>
               	movb	$0x0, 0x13aecd(%rip)    # 0x1782c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40340>
               	movb	$0x0, 0x13ae86(%rip)    # 0x178280 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40300>
               	movb	$0x0, 0x13ae3f(%rip)    # 0x178240 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x402c0>
               	movb	$0x0, 0x13adf8(%rip)    # 0x178200 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40280>
               	movb	$0x0, 0x13adb1(%rip)    # 0x1781c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40240>
               	movb	$0x0, 0x13ad6a(%rip)    # 0x178180 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40200>
               	movb	$0x0, 0x13ad23(%rip)    # 0x178140 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x401c0>
               	movb	$0x0, 0x13acdc(%rip)    # 0x178100 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40180>
               	movb	$0x0, 0x13ac95(%rip)    # 0x1780c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40140>
               	movb	$0x0, 0x13ac4e(%rip)    # 0x178080 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40100>
               	movb	$0x0, 0x13ac07(%rip)    # 0x178040 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x400c0>
               	movb	$0x0, 0x13abc0(%rip)    # 0x178000 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40080>
               	movb	$0x0, 0x13ab79(%rip)    # 0x177fc0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40040>
               	movb	$0x0, 0x17bb74(%rip)    # 0x1b8fc2 <_ZN11__sanitizerL27internal_allocator_cache_muE>
               	popq	%rax
               	retq

It looks like the mo_release store loop was unrolled here, if I read this correctly.
I have a very limited understanding of x86 assembly, so this is a guess based on knowing Arm/AArch64/RISC-V/MIPS instructions.
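
For illustration, a minimal standalone sketch (not the actual allocator code; the trip count of 54 is taken from the analysis below) of the kind of fixed-count release-store loop a compiler may fully unroll into the sequence of movb instructions seen above:

constexpr int kNumClasses = 54;  // illustrative; matches the kNumClasses noted below
unsigned char locks[kNumClasses];

void UnlockAll() {
  for (int i = 0; i < kNumClasses; i++)
    // One release store per size class; with a compile-time-constant trip
    // count the compiler may fully unroll this into 54 separate byte stores.
    __atomic_store_n(&locks[i], 0, __ATOMIC_RELEASE);
}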

@Enna1 (Contributor) commented Apr 16, 2024

> It looks like the mo_release store loop was unrolled here, if I read this correctly.

I think you are right.

There are 56 movb stores in the __sanitizer::InternalAllocatorUnlock() x86 assembly.
I had a look at the implementation of __sanitizer::InternalAllocatorUnlock(), https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cpp#L134:

  1. The first store corresponds to https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_allocator_secondary.h#L272.
  2. The following 54 stores correspond to the unrolled loop at https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_allocator_primary32.h#L247.
     kNumClasses is 54 for the SizeClassMap of PrimaryInternalAllocator.
  3. The last store, movb $0x0, 0x17bb74(%rip), corresponds to internal_allocator_cache_mu.Unlock().

@dvyukov (Collaborator) commented Apr 16, 2024

> On x86 I don't see any difference in the assembly for __tsan_read*/__tsan_write* (other than the relative addresses). Same for __tsan_func_entry/exit.

Thanks for double checking.
If __tsan_read*/__tsan_write* are not negatively affected, then it's good enough for me.
The change in _ZN11__sanitizer23InternalAllocatorUnlockEv looks fine, if the compiler thinks it's better.

@Enna1 (Contributor) commented Apr 16, 2024

https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/CMakeLists.txt#L125 should also be updated?

@arichardson (Member Author)

> https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/CMakeLists.txt#L125 should also be updated?

Thanks, that's probably why CI is failing (not sure why it didn't fail locally).

arichardson merged commit abd5e45 into main on Apr 17, 2024
arichardson deleted the users/arichardson/spr/compiler-rt-use-__atomic-builtins-whenever-possible branch on April 17, 2024 at 15:42