[compiler-rt] Use __atomic builtins whenever possible #84439

Conversation

@arichardson (Member) commented Mar 8, 2024

The code in this file dates back to 2012 when Clang's support for atomic
builtins was still quite limited. The bugs referenced in the comment
at the top of the file have long been fixed and using the compiler
builtins directly should now generate slightly better code.
Additionally, this allows using the __atomic-based header on platforms
where the __sync builtins are lacking (e.g. Arm Morello).

This change does not introduce any code generation changes for
__tsan_read*/__tsan_write* or __tsan_func_{entry,exit} on x86, which
indicates that the previously noted compiler issues have been fixed.

We also have to touch the non-Clang codepaths here, since the simplest
way to make this work is to make the memory_order enum values match the
compiler-provided __ATOMIC_* macros; this means updating the debug
checks that assumed the enum was always a bitflag.
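
For illustration (not part of the patch), the bitflag-style checks break because the __ATOMIC_* macro values are small consecutive integers rather than disjoint bits:

// With the __ATOMIC_* values the enumerators are consecutive integers,
// so bit-testing an order against a set of allowed orders no longer works.
static_assert(__ATOMIC_RELAXED == 0 && __ATOMIC_ACQUIRE == 2 &&
              __ATOMIC_RELEASE == 3 && __ATOMIC_SEQ_CST == 5, "");

// Old check (only correct when every enumerator was a distinct bit):
//   DCHECK(mo & (memory_order_relaxed | memory_order_release | memory_order_seq_cst));
// With the macro values this wrongly accepts memory_order_acquire (2 & 7 != 0)
// and wrongly rejects memory_order_relaxed (0 & anything == 0).
// New check: compare against each allowed order explicitly:
//   DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
//          mo == memory_order_seq_cst);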

The one downside of this change is that 32-bit MIPS now definitely
requires libatomic (but that may already have been needed for RMW ops).
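
For illustration, a hypothetical standalone snippet (not part of the patch) showing where libatomic comes in: on a 32-bit target without native 64-bit atomics, the builtin lowers to a library call such as __atomic_fetch_add_8, which libatomic provides.

#include <stdint.h>

uint64_t counter;

uint64_t bump(uint64_t v) {
  // On e.g. 32-bit MIPS this is not lock-free and becomes a call into
  // libatomic; on 64-bit targets it compiles to a native atomic RMW.
  return __atomic_fetch_add(&counter, v, __ATOMIC_SEQ_CST);
}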

Created using spr 1.3.6-beta.1
@llvmbot (Member) commented Mar 8, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Alexander Richardson (arichardson)

Changes

The code in this file dates back to 2012 when Clang's support for atomic
builtins was still quite limited. The bugs referenced in the comment
at the top of the file have long been fixed and using the compiler
builtins directly should now generate slightly better code.
Additionally, this allows using the __atomic-based header on platforms
where the __sync builtins are lacking (e.g. Arm Morello).

We also have to touch the non-Clang codepaths here, since the simplest
way to make this work is to make the memory_order enum values match the
compiler-provided __ATOMIC_* macros; this means updating the debug
checks that assumed the enum was always a bitflag.


Full diff: https://github.com/llvm/llvm-project/pull/84439.diff

6 Files Affected:

  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_atomic.h (+9)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h (+17-32)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_mips.h (+8-8)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_other.h (+4-4)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_x86.h (+4-4)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_atomic_msvc.h (+3-4)
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_atomic.h b/compiler-rt/lib/sanitizer_common/sanitizer_atomic.h
index 46f06957228c9b..2dab4e4b8725d7 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_atomic.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_atomic.h
@@ -18,12 +18,21 @@
 namespace __sanitizer {
 
 enum memory_order {
+#ifdef __ATOMIC_SEQ_CST
+  memory_order_relaxed = __ATOMIC_RELAXED,
+  memory_order_consume = __ATOMIC_CONSUME,
+  memory_order_acquire = __ATOMIC_ACQUIRE,
+  memory_order_release = __ATOMIC_RELEASE,
+  memory_order_acq_rel = __ATOMIC_ACQ_REL,
+  memory_order_seq_cst = __ATOMIC_SEQ_CST
+#else
   memory_order_relaxed = 1 << 0,
   memory_order_consume = 1 << 1,
   memory_order_acquire = 1 << 2,
   memory_order_release = 1 << 3,
   memory_order_acq_rel = 1 << 4,
   memory_order_seq_cst = 1 << 5
+#endif
 };
 
 struct atomic_uint8_t {
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h
index 4318d64d16cfa2..e4f02b7f6f685c 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang.h
@@ -22,52 +22,37 @@
 
 namespace __sanitizer {
 
-// We would like to just use compiler builtin atomic operations
-// for loads and stores, but they are mostly broken in clang:
-// - they lead to vastly inefficient code generation
-// (http://llvm.org/bugs/show_bug.cgi?id=17281)
-// - 64-bit atomic operations are not implemented on x86_32
-// (http://llvm.org/bugs/show_bug.cgi?id=15034)
-// - they are not implemented on ARM
-// error: undefined reference to '__atomic_load_4'
+// We use the compiler builtin atomic operations for loads and stores, which
+// generates correct code for all architectures, but may require libatomic
+// on platforms where e.g. 64-bit atomics are not supported natively.
 
 // See http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
 // for mappings of the memory model to different processors.
 
-inline void atomic_signal_fence(memory_order) {
-  __asm__ __volatile__("" ::: "memory");
-}
+inline void atomic_signal_fence(memory_order mo) { __atomic_signal_fence(mo); }
 
-inline void atomic_thread_fence(memory_order) {
-  __sync_synchronize();
-}
+inline void atomic_thread_fence(memory_order mo) { __atomic_thread_fence(mo); }
 
-template<typename T>
-inline typename T::Type atomic_fetch_add(volatile T *a,
-    typename T::Type v, memory_order mo) {
-  (void)mo;
+template <typename T>
+inline typename T::Type atomic_fetch_add(volatile T *a, typename T::Type v,
+                                         memory_order mo) {
   DCHECK(!((uptr)a % sizeof(*a)));
-  return __sync_fetch_and_add(&a->val_dont_use, v);
+  return __atomic_fetch_add(&a->val_dont_use, v, mo);
 }
 
-template<typename T>
-inline typename T::Type atomic_fetch_sub(volatile T *a,
-    typename T::Type v, memory_order mo) {
+template <typename T>
+inline typename T::Type atomic_fetch_sub(volatile T *a, typename T::Type v,
+                                         memory_order mo) {
   (void)mo;
   DCHECK(!((uptr)a % sizeof(*a)));
-  return __sync_fetch_and_add(&a->val_dont_use, -v);
+  return __atomic_fetch_sub(&a->val_dont_use, v, mo);
 }
 
-template<typename T>
-inline typename T::Type atomic_exchange(volatile T *a,
-    typename T::Type v, memory_order mo) {
+template <typename T>
+inline typename T::Type atomic_exchange(volatile T *a, typename T::Type v,
+                                        memory_order mo) {
   DCHECK(!((uptr)a % sizeof(*a)));
-  if (mo & (memory_order_release | memory_order_acq_rel | memory_order_seq_cst))
-    __sync_synchronize();
-  v = __sync_lock_test_and_set(&a->val_dont_use, v);
-  if (mo == memory_order_seq_cst)
-    __sync_synchronize();
-  return v;
+  return __atomic_exchange_n(&a->val_dont_use, v, mo);
 }
 
 template <typename T>
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_mips.h b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_mips.h
index f3d3052e5b7c5c..5fc2bf25f3cd51 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_mips.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_mips.h
@@ -40,8 +40,8 @@ template <>
 inline atomic_uint64_t::Type atomic_fetch_add(volatile atomic_uint64_t *ptr,
                                               atomic_uint64_t::Type val,
                                               memory_order mo) {
-  DCHECK(mo &
-         (memory_order_relaxed | memory_order_release | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
+         mo == memory_order_seq_cst);
   DCHECK(!((uptr)ptr % sizeof(*ptr)));
 
   atomic_uint64_t::Type ret;
@@ -66,8 +66,8 @@ inline bool atomic_compare_exchange_strong(volatile atomic_uint64_t *ptr,
                                            atomic_uint64_t::Type *cmp,
                                            atomic_uint64_t::Type xchg,
                                            memory_order mo) {
-  DCHECK(mo &
-         (memory_order_relaxed | memory_order_release | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
+         mo == memory_order_seq_cst);
   DCHECK(!((uptr)ptr % sizeof(*ptr)));
 
   typedef atomic_uint64_t::Type Type;
@@ -89,8 +89,8 @@ inline bool atomic_compare_exchange_strong(volatile atomic_uint64_t *ptr,
 template <>
 inline atomic_uint64_t::Type atomic_load(const volatile atomic_uint64_t *ptr,
                                          memory_order mo) {
-  DCHECK(mo &
-         (memory_order_relaxed | memory_order_release | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
+         mo == memory_order_seq_cst);
   DCHECK(!((uptr)ptr % sizeof(*ptr)));
 
   atomic_uint64_t::Type zero = 0;
@@ -102,8 +102,8 @@ inline atomic_uint64_t::Type atomic_load(const volatile atomic_uint64_t *ptr,
 template <>
 inline void atomic_store(volatile atomic_uint64_t *ptr, atomic_uint64_t::Type v,
                          memory_order mo) {
-  DCHECK(mo &
-         (memory_order_relaxed | memory_order_release | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
+         mo == memory_order_seq_cst);
   DCHECK(!((uptr)ptr % sizeof(*ptr)));
 
   __spin_lock(&lock.lock);
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_other.h b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_other.h
index 557082a636b879..5ace0d2213275d 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_other.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_other.h
@@ -24,8 +24,8 @@ inline void proc_yield(int cnt) {
 template<typename T>
 inline typename T::Type atomic_load(
     const volatile T *a, memory_order mo) {
-  DCHECK(mo & (memory_order_relaxed | memory_order_consume
-      | memory_order_acquire | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_consume ||
+         mo == memory_order_acquire || mo == memory_order_seq_cst);
   DCHECK(!((uptr)a % sizeof(*a)));
   typename T::Type v;
 
@@ -58,8 +58,8 @@ inline typename T::Type atomic_load(
 
 template<typename T>
 inline void atomic_store(volatile T *a, typename T::Type v, memory_order mo) {
-  DCHECK(mo & (memory_order_relaxed | memory_order_release
-      | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
+         mo == memory_order_seq_cst);
   DCHECK(!((uptr)a % sizeof(*a)));
 
   if (sizeof(*a) < 8 || sizeof(void*) == 8) {
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_x86.h b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_x86.h
index b81a354d209872..966463acd04b8a 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_x86.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_clang_x86.h
@@ -26,8 +26,8 @@ inline void proc_yield(int cnt) {
 template<typename T>
 inline typename T::Type atomic_load(
     const volatile T *a, memory_order mo) {
-  DCHECK(mo & (memory_order_relaxed | memory_order_consume
-      | memory_order_acquire | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_consume ||
+         mo == memory_order_acquire || mo == memory_order_seq_cst);
   DCHECK(!((uptr)a % sizeof(*a)));
   typename T::Type v;
 
@@ -71,8 +71,8 @@ inline typename T::Type atomic_load(
 
 template<typename T>
 inline void atomic_store(volatile T *a, typename T::Type v, memory_order mo) {
-  DCHECK(mo & (memory_order_relaxed | memory_order_release
-      | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
+         mo == memory_order_seq_cst);
   DCHECK(!((uptr)a % sizeof(*a)));
 
   if (sizeof(*a) < 8 || sizeof(void*) == 8) {
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_msvc.h b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_msvc.h
index 31317adcdfc99f..ed82f62a1c96c3 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_atomic_msvc.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_atomic_msvc.h
@@ -70,8 +70,8 @@ inline void proc_yield(int cnt) {
 template<typename T>
 inline typename T::Type atomic_load(
     const volatile T *a, memory_order mo) {
-  DCHECK(mo & (memory_order_relaxed | memory_order_consume
-      | memory_order_acquire | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_consume
+      || mo == memory_order_acquire || mo == memory_order_seq_cst);
   DCHECK(!((uptr)a % sizeof(*a)));
   typename T::Type v;
   // FIXME(dvyukov): 64-bit load is not atomic on 32-bits.
@@ -87,8 +87,7 @@ inline typename T::Type atomic_load(
 
 template<typename T>
 inline void atomic_store(volatile T *a, typename T::Type v, memory_order mo) {
-  DCHECK(mo & (memory_order_relaxed | memory_order_release
-      | memory_order_seq_cst));
+  DCHECK(mo == memory_order_relaxed || mo == memory_order_release || mo == memory_order_seq_cst);
   DCHECK(!((uptr)a % sizeof(*a)));
   // FIXME(dvyukov): 64-bit store is not atomic on 32-bits.
   if (mo == memory_order_relaxed) {

github-actions bot commented Mar 8, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@dvyukov (Collaborator) commented Mar 8, 2024

Hi Alexander,

Thanks for working on this.

Please double-check that the codegen for the following tsan functions in release builds doesn't change: __tsan_read1/2/4/8, __tsan_write1/2/4/8, __tsan_func_entry/exit. Please post a diff of the asm for these functions, if any.

If we are refactoring this, can we simplify the code further? I guess Visual Studio does not support these builtins, so we can't just switch to them. But perhaps we could switch to std::atomic? If yes, I would assume we still keep the atomic operation wrapper functions to minimize the diff.
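
A hypothetical sketch (illustrative only; the type and function names mirror the existing wrappers, but this is not code from the PR) of what keeping the wrapper API on top of std::atomic could look like:

#include <atomic>

struct atomic_uint32_t {
  typedef unsigned Type;
  std::atomic<Type> val_dont_use;
};

// Same signature shape as the current wrappers, backed by std::atomic.
inline atomic_uint32_t::Type atomic_fetch_add(atomic_uint32_t *a,
                                              atomic_uint32_t::Type v,
                                              std::memory_order mo) {
  return a->val_dont_use.fetch_add(v, mo);
}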

Also, the clang-format CI check complains; please re-format the code.

@arichardson (Member Author)

> Hi Alexander,
>
> Thanks for working on this.
>
> Please double-check that the codegen for the following tsan functions in release builds doesn't change: __tsan_read1/2/4/8, __tsan_write1/2/4/8, __tsan_func_entry/exit. Please post a diff of the asm for these functions, if any.
>
> If we are refactoring this, can we simplify the code further? I guess Visual Studio does not support these builtins, so we can't just switch to them. But perhaps we could switch to std::atomic? If yes, I would assume we still keep the atomic operation wrapper functions to minimize the diff.
>
> Also, the clang-format CI check complains; please re-format the code.

Which architectures do you care about for these diffs? X86 should be the easiest to generate, but I can also show AArch64, which might actually be more interesting given its weaker memory model.

The reason I did not use std::atomic is that I was under the assumption that we can't rely on the C++ standard library here and have to roll our own.

@arichardson (Member Author) commented Mar 8, 2024

Also, one more thing: this only touches the RMW operations; the loads/stores will require a separate patch, which should allow for further simplification.

@dvyukov (Collaborator) commented Mar 9, 2024

> Which architectures do you care about for these diffs? X86 should be the easiest to generate

x86 is good.

> The reason I did not use std::atomic is that I was under the assumption that we can't rely on the C++ standard library here and have to roll our own.

Yes, this may be the case. I think I will withdraw this part of my comment; let's just ensure there is no codegen degradation for the most important functions.

@jyknight (Member)

To use the __atomic builtins and rely on libatomic for non-native atomics, you must do so consistently. You can't correctly mix and match RMW and load/store operations, or they might not be atomic with respect to each other.

That means:

  • delete sanitizer_atomic_clang_x86.h (preserve the architecture-specific definition of proc_yield, though it's unrelated to atomics)
  • delete sanitizer_atomic_clang_mips.h
  • delete sanitizer_atomic_clang_other.h, moving the functions into sanitizer_atomic_clang.h and changing atomic_load/atomic_store to call __atomic_load/__atomic_store instead of the custom implementations (see the sketch below).
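
A minimal sketch, assuming the existing wrapper signatures, of what the consolidated load/store in sanitizer_atomic_clang.h could look like (using the _n builtin variants for integer-sized types; not necessarily the final patch):

template <typename T>
inline typename T::Type atomic_load(const volatile T *a, memory_order mo) {
  DCHECK(mo == memory_order_relaxed || mo == memory_order_consume ||
         mo == memory_order_acquire || mo == memory_order_seq_cst);
  DCHECK(!((uptr)a % sizeof(*a)));
  // Let the compiler (or libatomic) pick the right implementation per target.
  return __atomic_load_n(&a->val_dont_use, mo);
}

template <typename T>
inline void atomic_store(volatile T *a, typename T::Type v, memory_order mo) {
  DCHECK(mo == memory_order_relaxed || mo == memory_order_release ||
         mo == memory_order_seq_cst);
  DCHECK(!((uptr)a % sizeof(*a)));
  __atomic_store_n(&a->val_dont_use, v, mo);
}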

@arichardson (Member Author) commented Apr 15, 2024

> Hi Alexander,
>
> Thanks for working on this.
>
> Please double-check that the codegen for the following tsan functions in release builds doesn't change: __tsan_read1/2/4/8, __tsan_write1/2/4/8, __tsan_func_entry/exit. Please post a diff of the asm for these functions, if any.
>
> If we are refactoring this, can we simplify the code further? I guess Visual Studio does not support these builtins, so we can't just switch to them. But perhaps we could switch to std::atomic? If yes, I would assume we still keep the atomic operation wrapper functions to minimize the diff.
>
> Also, the clang-format CI check complains; please re-format the code.

On x86 I don't see any difference in the assembly for __tsan_read*/__tsan_write* (other than the relative addresses). Same for __tsan_func_entry/exit.

I do see one large diff. Code before:

<_ZN11__sanitizer23InternalAllocatorUnlockEv>:
               	pushq	%rax
               	callq	0x3cc00 <_ZN11__sanitizer18internal_allocatorEv>
               	movb	$0x0, 0x17be73(%rip)    # 0x1b8f80 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x81000>
               	xorl	%eax, %eax
               	leaq	0xfae6a(%rip), %rcx     # 0x137f80 <_ZN11__sanitizerL26internal_alloc_placeholderE>
               	nopw	%cs:(%rax,%rax)
               	movb	$0x0, 0x40d80(%rax,%rcx)
               	addq	$-0x40, %rax
               	cmpq	$-0xd80, %rax           # imm = 0xF280
               	jne	0x3d120 <_ZN11__sanitizer23InternalAllocatorUnlockEv+0x20>
               	movb	$0x0, 0x17be87(%rip)    # 0x1b8fc2 <_ZN11__sanitizerL27internal_allocator_cache_muE>
               	popq	%rax
               	retq
               	nopl	(%rax)

After, there are a lot of extra stores:

<_ZN11__sanitizer23InternalAllocatorUnlockEv>:
               	pushq	%rax
               	callq	0x3cc00 <_ZN11__sanitizer18internal_allocatorEv>
               	movb	$0x0, 0x17bcb3(%rip)    # 0x1b8f80 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x81000>
               	movb	$0x0, 0x13ba2c(%rip)    # 0x178d00 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40d80>
               	movb	$0x0, 0x13b9e5(%rip)    # 0x178cc0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40d40>
               	movb	$0x0, 0x13b99e(%rip)    # 0x178c80 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40d00>
               	movb	$0x0, 0x13b957(%rip)    # 0x178c40 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40cc0>
               	movb	$0x0, 0x13b910(%rip)    # 0x178c00 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40c80>
               	movb	$0x0, 0x13b8c9(%rip)    # 0x178bc0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40c40>
               	movb	$0x0, 0x13b882(%rip)    # 0x178b80 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40c00>
               	movb	$0x0, 0x13b83b(%rip)    # 0x178b40 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40bc0>
               	movb	$0x0, 0x13b7f4(%rip)    # 0x178b00 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40b80>
               	movb	$0x0, 0x13b7ad(%rip)    # 0x178ac0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40b40>
               	movb	$0x0, 0x13b766(%rip)    # 0x178a80 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40b00>
               	movb	$0x0, 0x13b71f(%rip)    # 0x178a40 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40ac0>
               	movb	$0x0, 0x13b6d8(%rip)    # 0x178a00 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40a80>
               	movb	$0x0, 0x13b691(%rip)    # 0x1789c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40a40>
               	movb	$0x0, 0x13b64a(%rip)    # 0x178980 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40a00>
               	movb	$0x0, 0x13b603(%rip)    # 0x178940 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x409c0>
               	movb	$0x0, 0x13b5bc(%rip)    # 0x178900 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40980>
               	movb	$0x0, 0x13b575(%rip)    # 0x1788c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40940>
               	movb	$0x0, 0x13b52e(%rip)    # 0x178880 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40900>
               	movb	$0x0, 0x13b4e7(%rip)    # 0x178840 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x408c0>
               	movb	$0x0, 0x13b4a0(%rip)    # 0x178800 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40880>
               	movb	$0x0, 0x13b459(%rip)    # 0x1787c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40840>
               	movb	$0x0, 0x13b412(%rip)    # 0x178780 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40800>
               	movb	$0x0, 0x13b3cb(%rip)    # 0x178740 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x407c0>
               	movb	$0x0, 0x13b384(%rip)    # 0x178700 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40780>
               	movb	$0x0, 0x13b33d(%rip)    # 0x1786c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40740>
               	movb	$0x0, 0x13b2f6(%rip)    # 0x178680 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40700>
               	movb	$0x0, 0x13b2af(%rip)    # 0x178640 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x406c0>
               	movb	$0x0, 0x13b268(%rip)    # 0x178600 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40680>
               	movb	$0x0, 0x13b221(%rip)    # 0x1785c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40640>
               	movb	$0x0, 0x13b1da(%rip)    # 0x178580 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40600>
               	movb	$0x0, 0x13b193(%rip)    # 0x178540 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x405c0>
               	movb	$0x0, 0x13b14c(%rip)    # 0x178500 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40580>
               	movb	$0x0, 0x13b105(%rip)    # 0x1784c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40540>
               	movb	$0x0, 0x13b0be(%rip)    # 0x178480 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40500>
               	movb	$0x0, 0x13b077(%rip)    # 0x178440 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x404c0>
               	movb	$0x0, 0x13b030(%rip)    # 0x178400 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40480>
               	movb	$0x0, 0x13afe9(%rip)    # 0x1783c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40440>
               	movb	$0x0, 0x13afa2(%rip)    # 0x178380 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40400>
               	movb	$0x0, 0x13af5b(%rip)    # 0x178340 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x403c0>
               	movb	$0x0, 0x13af14(%rip)    # 0x178300 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40380>
               	movb	$0x0, 0x13aecd(%rip)    # 0x1782c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40340>
               	movb	$0x0, 0x13ae86(%rip)    # 0x178280 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40300>
               	movb	$0x0, 0x13ae3f(%rip)    # 0x178240 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x402c0>
               	movb	$0x0, 0x13adf8(%rip)    # 0x178200 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40280>
               	movb	$0x0, 0x13adb1(%rip)    # 0x1781c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40240>
               	movb	$0x0, 0x13ad6a(%rip)    # 0x178180 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40200>
               	movb	$0x0, 0x13ad23(%rip)    # 0x178140 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x401c0>
               	movb	$0x0, 0x13acdc(%rip)    # 0x178100 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40180>
               	movb	$0x0, 0x13ac95(%rip)    # 0x1780c0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40140>
               	movb	$0x0, 0x13ac4e(%rip)    # 0x178080 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40100>
               	movb	$0x0, 0x13ac07(%rip)    # 0x178040 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x400c0>
               	movb	$0x0, 0x13abc0(%rip)    # 0x178000 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40080>
               	movb	$0x0, 0x13ab79(%rip)    # 0x177fc0 <_ZN11__sanitizerL26internal_alloc_placeholderE+0x40040>
               	movb	$0x0, 0x17bb74(%rip)    # 0x1b8fc2 <_ZN11__sanitizerL27internal_allocator_cache_muE>
               	popq	%rax
               	retq

It looks like the mo_release store loop was unrolled here, if I read this correctly.
I have a very limited understanding of x86 assembly, so this is a guess based on knowing Arm/AArch64/RISC-V/MIPS instructions.
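
For illustration, a minimal standalone sketch (not the actual allocator code; the trip count of 54 is taken from the analysis below) of the kind of fixed-count release-store loop a compiler may fully unroll into the sequence of movb instructions seen above:

constexpr int kNumClasses = 54;  // illustrative; matches the kNumClasses noted below
unsigned char locks[kNumClasses];

void UnlockAll() {
  for (int i = 0; i < kNumClasses; i++)
    // One release store per size class; with a compile-time-constant trip
    // count the compiler may fully unroll this into 54 separate byte stores.
    __atomic_store_n(&locks[i], 0, __ATOMIC_RELEASE);
}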

@Enna1 (Contributor) commented Apr 16, 2024

> It looks like the mo_release store loop was unrolled here, if I read this correctly.

I think you are right.

There are 56 movb stores in the __sanitizer::InternalAllocatorUnlock() x86 assembly.
I had a look at the implementation of __sanitizer::InternalAllocatorUnlock(), https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cpp#L134:

  1. The first store corresponds to https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_allocator_secondary.h#L272.
  2. The following 54 stores correspond to the unrolled loop at https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_allocator_primary32.h#L247.
     kNumClasses is 54 for the SizeClassMap of PrimaryInternalAllocator.
  3. The last store, movb $0x0, 0x17bb74(%rip), corresponds to internal_allocator_cache_mu.Unlock().

@dvyukov (Collaborator) commented Apr 16, 2024

> On x86 I don't see any difference in the assembly for __tsan_read*/__tsan_write* (other than the relative addresses). Same for __tsan_func_entry/exit.

Thanks for double checking.
If __tsan_read*/__tsan_write* are not negatively affected, then it's good enough for me.
The change in _ZN11__sanitizer23InternalAllocatorUnlockEv looks fine, if the compiler thinks it's better.

@Enna1 (Contributor) commented Apr 16, 2024

https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/CMakeLists.txt#L125 should also be updated?

@arichardson (Member Author)

> https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/CMakeLists.txt#L125 should also be updated?

Thanks, that's probably why CI is failing (not sure why it didn't fail locally).

arichardson merged commit abd5e45 into main on Apr 17, 2024
arichardson deleted the users/arichardson/spr/compiler-rt-use-__atomic-builtins-whenever-possible branch on April 17, 2024 at 15:42