[RTSan][Apple] Disable AccessingALargeAtomicVariableDiesWhenRealtime #129309


Closed · wants to merge 2 commits

Conversation

@thetruestblue (Contributor) commented Feb 28, 2025

@llvmbot (Member) commented Feb 28, 2025

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (thetruestblue)

Changes

…on Apple devices

Disabling this test since it is failing Apple CI jobs:
https://green.lab.llvm.org/job/llvm.org/job/clang-stage1-RA/3694/testReport/RealtimeSanitizer-Unit/__Rtsan-x86_64h-Test_TestRtsan/AccessingALargeAtomicVariableDiesWhenRealtime/

rdar://145488759

Full diff: https://github.com/llvm/llvm-project/pull/129309.diff

1 file affected:

  • (modified) compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp (+3)
diff --git a/compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp b/compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp
index e05d7ae78b5d9..85ccc98400918 100644
--- a/compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp
+++ b/compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp
@@ -171,6 +171,9 @@ TEST(TestRtsan, CopyingALambdaWithLargeCaptureDiesWhenRealtime) {
 }
 
 TEST(TestRtsan, AccessingALargeAtomicVariableDiesWhenRealtime) {
+  #ifdef __APPLE__
+  GTEST_SKIP(); // Test is failing on Apple Devices.
+  #endif
   std::atomic<float> small_atomic{0.0f};
   ASSERT_TRUE(small_atomic.is_lock_free());
   RealtimeInvoke([&small_atomic]() {

@cjappl (Contributor) commented Feb 28, 2025 via email

@thetruestblue (Contributor, Author) commented Feb 28, 2025

Hi @thetruestblue, I think ideally we would hunt down and figure out why this is failing to die. Both of the main RTSan devs work on macs primarily, and this test has always passed for us locally. Is it possible to figure out what kind of lock is being taken by the 'not is_lock_free' large atomic, and intercept that? That would be preferable to disabling the test.

I agree it would be ideal to hunt this down and get the correct interceptor set up. Unfortunately, this has been failing in our CI for a few weeks now. Would it be possible to use the failing test above to try to get the devs who work on this set up for a repro? We need to address the failing test, but we also want to work with you to get this working. There is a time-commitment issue currently.

I can spend some limited time on this maybe next week, but I think ideally the people working on this should also try to get set up so that they can reproduce failures that pop up in the Apple CI. Happy to troubleshoot that with you as well.

@cjappl (Contributor) commented Feb 28, 2025 via email

@cjappl (Contributor) commented Mar 1, 2025

Alright, here is what I have for you:

https://gist.github.com/cjappl/be6544731e973497b58cd6af4a0e318f

This reproduces the problem in the simplest way I can think of, basically a re-hash of the unit test but standalone.
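For anyone reading along without opening the gist, a minimal standalone repro in the same spirit would look roughly like the sketch below (BigStruct and the array size here are illustrative assumptions, not necessarily the gist's exact contents; build with -fsanitize=realtime):

#include <array>
#include <atomic>

struct BigStruct {
  std::array<float, 2048> data; // far too large to be a lock-free atomic
};

static std::atomic<BigStruct> big_atomic{};

void RealtimeLoad() [[clang::nonblocking]] {
  // Not lock-free, so the library takes an internal lock for this load;
  // RTSan should die here if it intercepts that lock.
  BigStruct copy = big_atomic.load();
  (void)copy;
}

int main() {
  static_assert(!std::atomic<BigStruct>::is_always_lock_free);
  RealtimeLoad(); // expected: an RTSan death naming the intercepted lock
  return 0;
}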

From my poking around on my machine, I think possibly the fastest way to find the culprit is to look at the internal Apple implementation of __atomic_load in libcompiler_rt:
#1 0x00019a27cb78 in __atomic_load+0xb4 (libcompiler_rt.dylib:arm64+0xb78).

On my machine, this eventually goes to OSSpinLockLock, one of our intercepted calls. On the problematic machine, it invokes something else that we seemingly do not intercept. I am hopeful that looking at the source points us to the function we are missing, or at least gives us more info. I was not able to get satisfying answers out of Instruments for this one.

but I think ideally the people working on this should also try to get set up so that they can reproduce failures that pop up in the Apple CI. Happy to troubleshoot that with you as well.

I'd be more than happy to get this instruction! I also assume it is off the table, but if I were to get SSH access to this machine (or a machine that repros this problem) I would be more than happy to spend some time and see if I could figure it out.

Some possible hints on where this may be:

  • Is it an STL version thing? A newer update, or an older version?
  • Is it ARM vs X86?

Shout if you get any good leads, happy to help.

@davidtrevelyan (Contributor) commented
@thetruestblue is there any chance you could update us on your discussions once you've had them? I've investigated this and am able to reproduce the test failure on my machine using the macOS 15.2 SDK. Unfortunately, there doesn't appear to be a straightforward solution to the problem, at least from what I can see. We'd very much appreciate you sharing any insight you and your team may have had!

@thetruestblue (Contributor, Author) commented
@thetruestblue is there any chance you could update us on your discussions once you've had them? I've investigated this and am able to reproduce the test failure on my machine using the macOS 15.2 SDK. Unfortunately, there doesn't appear to be a straightforward solution to the problem, at least from what I can see. We'd very much appreciate you sharing any insight you and your team may have had!

My repro was specific to my local OS, and I wasn't actually reproducing the issue in CI. Though I think I know what's going on: it has to do with how we are handling the deprecation of OSSpinLock.

Can you look at how we recently updated the TSan OSSpinLock interceptor, and see if adopting that same mechanism fixes the issues on 15.2? See: a34159f

I am provisioning a device to try to confirm what you're seeing.

@davidtrevelyan (Contributor) commented
@thetruestblue is there any chance you could update us on your discussions once you've had them? I've investigated this and am able to reproduce the test failure on my machine using the macOS 15.2 SDK. Unfortunately, there doesn't appear to be a straightforward solution to the problem, at least from what I can see. We'd very much appreciate you sharing any insight you and your team may have had!

My repro was specific to my local OS, and I wasn't actually reproducing the issue in CI. Though I think I know what's going on: it has to do with how we are handling the deprecation of OSSpinLock.

Can you look at how we recently updated the TSan OSSpinLock interceptor, and see if adopting that same mechanism fixes the issues on 15.2? See: a34159f

I am provisioning a device to try to confirm what you're seeing.

Many thanks for the update @thetruestblue. I tried the change you linked in a34159f and unfortunately did not observe it fixing the test. Based on my (incomplete) understanding of the problem, this was not unexpected. I'll try to outline why below, based on what I currently understand, in the hope that you may be able to help complete the picture - and hopefully this leads to us solving it together.

I'd like to start with a short recap of the purpose of this test, for anyone joining this discussion later wanting a quick overview. Atomic variables are ubiquitous in multi-threaded real-time programming and necessary for cross-thread data communication in nonblocking contexts. Atomic operations are lock-free only for data types no larger than some maximum size supported by the hardware, which can be queried via the constexpr member std::atomic<T>::is_always_lock_free. For anything larger, the implementation may insert a mutex or some other lock around reads and writes. The difficulty for real-time programmers is that this is implicit - and avoiding it relies on i) deep knowledge of the hardware and STL implementation, and, perhaps with more risk, ii) nobody innocuously increasing the size of a type used in std::atomic beyond the lock-free limit. RTSan seeks to alert programmers to situations where the implementation of std::atomic uses locks (i.e. blocking operations) inside nonblocking contexts.
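To make that boundary concrete, here is a minimal sketch mirroring the test's own setup (which types are lock-free is hardware- and implementation-dependent, so the exact threshold may vary):

#include <array>
#include <atomic>
#include <cstdio>

int main() {
  // A float fits in a hardware atomic on mainstream targets...
  std::atomic<float> small_atomic{0.0f};
  std::printf("float lock-free: %d\n", small_atomic.is_lock_free()); // 1

  // ...while an 8 KiB array cannot, so its loads and stores must lock.
  using Big = std::array<float, 2048>;
  static_assert(!std::atomic<Big>::is_always_lock_free);
  std::atomic<Big> large_atomic{};
  std::printf("array lock-free: %d\n", large_atomic.is_lock_free()); // 0
  return 0;
}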

On Linux, we observed that std::atomic implementations insert calls to pthread_mutex_lock for larger data types. Intercepting pthread_mutex_lock therefore became our mechanism for "detecting" this blocking behaviour. Until now, on macOS, we have relied on being able to do something very similar - we observed that OSSpinLockLock was eventually invoked within std::atomic for larger types as the locking mechanism. @cjappl is still observing this behaviour on his machine, as you can see in his previous comment above. Anyway, this was great - we could easily intercept the call and detect the unwanted blocking behaviour. But now, here comes our problem - it appears that recent changes to the std::atomic implementation may have eliminated the calls to OSSpinLockLock, leaving us with no easy means of intercepting the locking for larger atomic types. Please bear in mind that this is where my understanding is incomplete - I'm not confident that this last statement is true, and I'd be very grateful for your help in establishing whether my understanding is correct.
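For context, the "detection" mechanism itself is just an interceptor that reports before forwarding to the real function, in the same style as the _os_nospin_lock_lock interceptor that appears later in this thread (a simplified sketch, not the verbatim rtsan source):

// Simplified sketch of rtsan's locking-interceptor pattern, built on the
// sanitizer interception macros; the production version lives in
// compiler-rt's rtsan interceptor sources.
INTERCEPTOR(int, pthread_mutex_lock, pthread_mutex_t *mutex) {
  // Raises an RTSan error if we are currently in a [[clang::nonblocking]]
  // context, then forwards to the real implementation.
  __rtsan_notify_intercepted_call("pthread_mutex_lock");
  return REAL(pthread_mutex_lock)(mutex);
}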

Here's a backtrace from the failing test on my machine (macOS SDK 15.2):

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 7.2
  * frame #0: 0x0000000100008a58 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody() [inlined] std::__1::array<float, 2048ul> std::__1::__cxx_atomic_load[abi:ne180100]<std::__1::array<float, 2048ul>>(__a=0x000000016fdfa988, __order=memory_order_seq_cst) at cxx_atomic_impl.h:349:10 [opt]
    frame #1: 0x0000000100008a58 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody() [inlined] std::__1::__atomic_base<std::__1::array<float, 2048ul>, false>::load[abi:ne180100](this=0x000000016fdfa988, __m=memory_order_seq_cst) const at atomic_base.h:60:12 [opt]
    frame #2: 0x0000000100008a58 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody() [inlined] TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody()::$_1::operator()(this=<unavailable>) const at rtsan_test_functional.cpp:184:46 [opt]
    frame #3: 0x0000000100008a58 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody() [inlined] void rtsan_testing::ExpectNonRealtimeSurvival<TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody()::$_1&>(Func=<unavailable>) at rtsan_test_utilities.h:45:3 [opt]
    frame #4: 0x0000000100008a58 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody(this=<unavailable>) at rtsan_test_functional.cpp:188:3 [opt]
    frame #5: 0x0000000100066f68 Rtsan-arm64-Test`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) [inlined] void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(object=<unavailable>, method=0x00000000000000010000000000000020, location="the test body") at gtest.cc:2612:10 [opt]
    frame #6: 0x0000000100066f58 Rtsan-arm64-Test`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(object=0x0000600002984980, method=0x00000000000000000000000000000020, location="the test body") at gtest.cc:2648:14 [opt]
    frame #7: 0x0000000100066e44 Rtsan-arm64-Test`testing::Test::Run(this=0x0000600002984980) at gtest.cc:2687:5 [opt]
    frame #8: 0x000000010006842c Rtsan-arm64-Test`testing::TestInfo::Run(this=0x00000001560052e0) at gtest.cc:2836:11 [opt]
    frame #9: 0x0000000100069308 Rtsan-arm64-Test`testing::TestSuite::Run(this=0x00000001560043f0) at gtest.cc:3015:30 [opt]
    frame #10: 0x0000000100079ff0 Rtsan-arm64-Test`testing::internal::UnitTestImpl::RunAllTests(this=0x0000000156004190) at gtest.cc:5920:44 [opt]
    frame #11: 0x0000000100079a04 Rtsan-arm64-Test`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) [inlined] bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(object=<unavailable>, method=(Rtsan-arm64-Test`testing::internal::UnitTestImpl::RunAllTests() at gtest.cc:5797), location="auxiliary test code (environments or event listeners)") at gtest.cc:2612:10 [opt]
    frame #12: 0x00000001000799f4 Rtsan-arm64-Test`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(object=0x0000000156004190, method=(Rtsan-arm64-Test`testing::internal::UnitTestImpl::RunAllTests() at gtest.cc:5797), location="auxiliary test code (environments or event listeners)") at gtest.cc:2648:14 [opt]
    frame #13: 0x000000010007995c Rtsan-arm64-Test`testing::UnitTest::Run(this=0x00000001000b8668) at gtest.cc:5484:10 [opt]
    frame #14: 0x000000010004cdd4 Rtsan-arm64-Test`main [inlined] RUN_ALL_TESTS() at gtest.h:2317:73 [opt]
    frame #15: 0x000000010004cdcc Rtsan-arm64-Test`main(argc=1, argv=<unavailable>) at rtsan_test_main.cpp:36:10 [opt]
    frame #16: 0x000000019e300274 dyld`start + 2840

To get this backtrace, I set a breakpoint in std::atomic::load and stepped in as far as I could. Where previously we ended up in OSSpinLockLock, here we have landed in std::__1::__cxx_atomic_load, which I found is implemented as follows:

template <class _Tp>
_LIBCPP_HIDE_FROM_ABI _Tp __cxx_atomic_load(__cxx_atomic_base_impl<_Tp> const* __a, memory_order __order) _NOEXCEPT {
  using __ptr_type = __remove_const_t<decltype(__a->__a_value)>*;
  return __c11_atomic_load(
      const_cast<__ptr_type>(std::addressof(__a->__a_value)), static_cast<__memory_order_underlying_t>(__order));
}

Note that this delegates immediately to the builtin __c11_atomic_load - no invocation of pthread_mutex_lock, OSSpinLockLock, or any other obvious locking. It's also clear that we are no longer in the atomic implementation provided by libcompiler_rt, which is what @cjappl is observing on his machine. Comparing this to the backtrace that Chris shared here hints at one key difference: in atomic_base.h, it appears that the implementation of std::atomic::load has changed from calling __atomic_load (seemingly implemented in libcompiler_rt.dylib) to calling std::__cxx_atomic_load (implemented in the macOS SDK).

The implementation from libcompiler_rt can be found in compiler-rt/lib/builtins/atomic.c. The branching based on the size of the atomic type is clear - there are some lock-free fast paths before defaulting to memcpy wrapped in lock and unlock:

void __atomic_load_c(int size, void *src, void *dest, int model) {
#define LOCK_FREE_ACTION(type)                                                 \
  *((type *)dest) = __c11_atomic_load((_Atomic(type) *)src, model);            \
  return;
  LOCK_FREE_CASES(src);
#undef LOCK_FREE_ACTION
  Lock *l = lock_for_pointer(src);
  lock(l);
  memcpy(dest, src, size);
  unlock(l);
}
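The lock_for_pointer call above picks one lock out of a small static table by hashing the object's address - the same 0x3ff-masked locks table that shows up in the disassembly further down this thread. A paraphrase with simplified constants (not the verbatim source):

#include <stdint.h>

typedef int32_t Lock; // OSSpinLock is an int32_t on Apple platforms
#define SPINLOCK_COUNT (1 << 10)
#define SPINLOCK_MASK (SPINLOCK_COUNT - 1)
static Lock locks[SPINLOCK_COUNT];

static inline Lock *lock_for_pointer(void *ptr) {
  intptr_t hash = (intptr_t)ptr;
  hash >>= 4; // drop the low bits so nearby words share one lock
  intptr_t low = hash & SPINLOCK_MASK;
  hash >>= 16; // fold in higher address bits to spread the hash
  hash ^= low;
  return locks + (hash & SPINLOCK_MASK);
}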

The implementations of lock and unlock are private, and are simply:

#elif defined(__APPLE__)
#include <libkern/OSAtomic.h>
typedef OSSpinLock Lock;
__inline static void unlock(Lock *l) { OSSpinLockUnlock(l); }
/// Locks a lock.  In the current implementation, this is potentially
/// unbounded in the contended case.
__inline static void lock(Lock *l) { OSSpinLockLock(l); }

This is where the invocations of OSSpinLockLock must have been coming from. But we're clearly no longer invoking this implementation in std::atomic::load. One key question I have at the moment is this: in the macOS SDK v15.2, where is the logical switch happening between implementations of atomic::load for types where is_always_lock_free is true, and those where it is false?

The answer to this question, I think, will determine whether we will be able to simply add an interceptor to detect the locking, or whether we're going to need some more advanced instrumentation at the codegen stage or in the optimisation pass.

Hope this is somewhat helpful! Looking forward to hearing whether this is consistent with your observations!

@thetruestblue (Contributor, Author) commented Mar 4, 2025

Thank you. I have not been able to give this adequate time. Your insights are helpful. Let me do some digging as well.

@davidtrevelyan (Contributor) commented
@thetruestblue no problem - thanks for your support here. Good luck with the digging and let us know how you get on!

@davidtrevelyan (Contributor) commented
@thetruestblue no pressure - just checking in here. Did you and the team manage to find any time to consider this one?

@thetruestblue (Contributor, Author) commented Mar 11, 2025

@thetruestblue no pressure - just checking in here. Did you and the team manage to find any time to consider this one?

I do believe I have a decent understanding of what is going on. Apologies. I will work on getting you something to move forward with. It will be a day or two!

Can I ask -- what SDK are you using where the OSSpinLock interceptor is working for the atomics?

@davidtrevelyan (Contributor) commented
@thetruestblue no pressure - just checking in here. Did you and the team manage to find any time to consider this one?

I do believe I have a decent understanding of what is going on. Apologies. I will work on getting you something to move forward with. It will be a day or two!

Can I ask -- what SDK are you using where the OSSpinLock interceptor is working for the atomics?

Excellent - appreciate your efforts, and no pressure on timing from our side. As for the SDK you ask about that eventually triggers OSSpinLockLock (via libcompiler_rt's __atomic_load), @cjappl will have to report back as I'm not able to get it to work on my machine. @cjappl - could you confirm?

@thetruestblue (Contributor, Author) commented Mar 11, 2025

@davidtrevelyan you said you can only step to __cxx_atomic_load. But in the other stack trace I see that __atomic_load is what calls OSSpinLockLock. Can you try to set a breakpoint on __atomic_load and confirm it is never hit? Can you disassemble from that frame? I'd really appreciate it if we could get a comparison between the working and non-working devices on that symbol.

@cjappl (Contributor) commented Mar 11, 2025

@thetruestblue no pressure - just checking in here. Did you and the team manage to find any time to consider this one?

I do believe I have a decent understanding of what is going on. Apologies. I will work on getting you something to move forward with. It will be a day or two!
Can I ask -- what SDK are you using where the OSSpinLock interceptor is working for the atomics?

Excellent - appreciate your efforts, and no pressure on timing from our side. As for the SDK you ask about that eventually triggers OSSpinLockLock (via libcompiler_rt's __atomic_load), @cjappl will have to report back as I'm not able to get it to work on my machine. @cjappl - could you confirm?

The SDK I am running on (the one that calls/intercepts OSSpinLockLock successfully) is MacOSX15.1.sdk

For the record the backtrace I get running the code in my gist is:

(lldb) break set --name __atomic_load
Breakpoint 4: where = libcompiler_rt.dylib`__atomic_load, address = 0x000000019a27cac4
(lldb) run
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 4.1
    frame #0: 0x000000019a27cac4 libcompiler_rt.dylib`__atomic_load
libcompiler_rt.dylib`__atomic_load:
->  0x19a27cac4 <+0>:  pacibsp
    0x19a27cac8 <+4>:  stp    x22, x21, [sp, #-0x30]!
    0x19a27cacc <+8>:  stp    x20, x19, [sp, #0x10]
    0x19a27cad0 <+12>: stp    x29, x30, [sp, #0x20]
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 4.1
  * frame #0: 0x000000019a27cac4 libcompiler_rt.dylib`__atomic_load
    frame #1: 0x0000000100003eec a.out`BigStruct std::__1::__cxx_atomic_load[abi:ne180100]<BigStruct>(__a=0x000000016fdfea00, __order=memory_order_seq_cst) at cxx_atomic_impl.h:349:10
    frame #2: 0x0000000100003e64 a.out`std::__1::__atomic_base<BigStruct, false>::load[abi:ne180100](this=0x000000016fdfea00, __m=memory_order_seq_cst) const at atomic_base.h:60:12
    frame #3: 0x0000000100003e0c a.out`main at main.cpp:53:32
    frame #4: 0x000000018d2960e0 dyld`start + 2360

@cjappl (Contributor) commented Mar 11, 2025

And for completeness:

(lldb) break set --name OSSpinLockLock
Breakpoint 5: where = libsystem_platform.dylib`OSSpinLockLock, address = 0x000000018d64f518
(lldb) run
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 5.1
    frame #0: 0x000000018d64f518 libsystem_platform.dylib`OSSpinLockLock
libsystem_platform.dylib`OSSpinLockLock:
->  0x18d64f518 <+0>:  mov    w8, #0x0 ; =0
    0x18d64f51c <+4>:  mov    w9, #-0x1 ; =-1
    0x18d64f520 <+8>:  casa   w8, w9, [x0]
    0x18d64f524 <+12>: cmp    w8, #0x0
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 5.1
  * frame #0: 0x000000018d64f518 libsystem_platform.dylib`OSSpinLockLock
    frame #1: 0x000000019a27cb7c libcompiler_rt.dylib`__atomic_load + 184
    frame #2: 0x0000000100003eec a.out`BigStruct std::__1::__cxx_atomic_load[abi:ne180100]<BigStruct>(__a=0x000000016fdfea00, __order=memory_order_seq_cst) at cxx_atomic_impl.h:349:10
    frame #3: 0x0000000100003e64 a.out`std::__1::__atomic_base<BigStruct, false>::load[abi:ne180100](this=0x000000016fdfea00, __m=memory_order_seq_cst) const at atomic_base.h:60:12
    frame #4: 0x0000000100003e0c a.out`main at main.cpp:53:32
    frame #5: 0x000000018d2960e0 dyld`start + 2360
(lldb)

@davidtrevelyan (Contributor) commented
@davidtrevelyan you said you can only step to __cxx_atomic_load. But in the other stack trace I see that __atomic_load is what calls OSSpinLockLock. Can you try to set a breakpoint on __atomic_load and confirm it is never hit? Can you disassemble from that frame? I'd really appreciate it if we could get a comparison between the working and non-working devices on that symbol.

@thetruestblue thanks so much for the insightful questions - your prompting has led me to what I believe could be a suitable fix. I'll answer your questions directly first, and then outline my idea for the fix. Reading between the lines, I suspect you may already be planning something similar.

Can you try to set a breakpoint on __atomic_load and confirm it is never hit?

Great question, thanks for asking me to revisit this. I had not previously appreciated that lldb doesn't automatically attach to forked child processes, which is why I thought it wasn't being hit. I'll quickly add some background as to why this is relevant: as you're probably aware, the unit tests in question check that real-time-unsafe operations do two things:

  1. trigger an RTSan error when invoked in a real-time context, as defined by the [[clang::nonblocking]] attribute, and
  2. do not trigger an RTSan error when invoked in a non-real-time context.

Asserting that 2. is true is, of course, trivial; we just invoke the function in the main test process and if it returns then we're good. But asserting that 1. is true is a bit more involved: we currently make use of googletest's EXPECT_DEATH macro, which forks the process and checks the return code. We have two helper methods that, given some function, make this testing easy:

  1. ExpectRealtimeDeath, and
  2. ExpectNonRealtimeSurvival.

Both accept a function as an argument and invoke it. ExpectRealtimeDeath is the one that invokes the function, via another [[clang::nonblocking]] function named RealtimeInvoke, in a child process. What I found was:

  • __atomic_load was not hit during ExpectRealtimeDeath, but
  • __atomic_load was hit during ExpectNonRealtimeSurvival.

This makes sense! My debugger was not attaching automatically to the child process, so wasn't breaking in __atomic_load during ExpectRealtimeDeath. I temporarily switched to running RealtimeInvoke in the main process for exploration and, with my new-found ability to trap __atomic_load in the debugger, along with your next question, this led me to some new info.
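For reference, the helpers behave roughly like this (a simplified sketch of rtsan_test_utilities.h, not the verbatim source):

#include "gtest/gtest.h"

template <typename Function>
void RealtimeInvoke(Function &&func) [[clang::nonblocking]] {
  // Any blocking call func makes in here is an RTSan error.
  func();
}

template <typename Function>
void ExpectRealtimeDeath(Function &&func) {
  // EXPECT_DEATH forks a child process and asserts that it dies, which is
  // why a debugger attached to the parent never sees this call.
  EXPECT_DEATH(RealtimeInvoke(func), "");
}

template <typename Function>
void ExpectNonRealtimeSurvival(Function &&func) {
  func(); // runs in the main test process; returning is the assertion
}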

Can you disassemble from that frame?

Excellent idea - this was the key insight - thanks again. Yes. Here's what I found (backtrace first):

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
  * frame #0: 0x00000001ac241b18 libcompiler_rt.dylib`__atomic_load
    frame #1: 0x00000001000087a8 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody() [inlined] std::__1::array<float, 2048ul> std::__1::__cxx_atomic_load[abi:ne180100]<std::__1::array<float, 2048ul>>(__a=0x000000016fdfa8d8, __order=memory_order_seq_cst) at cxx_atomic_impl.h:349:10 [opt]
    frame #2: 0x0000000100008790 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody() [inlined] std::__1::__atomic_base<std::__1::array<float, 2048ul>, false>::load[abi:ne180100](this=0x000000016fdfa8d8, __m=memory_order_seq_cst) const at atomic_base.h:60:12 [opt]
    frame #3: 0x0000000100008790 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody() [inlined] TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody()::$_1::operator()(this=<unavailable>) const at rtsan_test_functional.cpp:184:46 [opt]
    frame #4: 0x0000000100008790 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody() [inlined] void rtsan_testing::RealtimeInvoke<TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody()::$_1&>(Func=<unavailable>) at rtsan_test_utilities.h:21:3 [opt]
    frame #5: 0x000000010000878c Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody(this=<unavailable>) at rtsan_test_functional.cpp:187:3 [opt]
    frame #6: 0x0000000100066f68 Rtsan-arm64-Test`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) [inlined] void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(object=<unavailable>, method=0x00000000000000010000000000000020, location="the test body") at gtest.cc:2612:10 [opt]
    frame #7: 0x0000000100066f58 Rtsan-arm64-Test`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(object=0x000060000298c980, method=0x00000000000000000000000000000020, location="the test body") at gtest.cc:2648:14 [opt]
    frame #8: 0x0000000100066e44 Rtsan-arm64-Test`testing::Test::Run(this=0x000060000298c980) at gtest.cc:2687:5 [opt]
    frame #9: 0x000000010006842c Rtsan-arm64-Test`testing::TestInfo::Run(this=0x00000001278052e0) at gtest.cc:2836:11 [opt]
    frame #10: 0x0000000100069308 Rtsan-arm64-Test`testing::TestSuite::Run(this=0x00000001278043f0) at gtest.cc:3015:30 [opt]
    frame #11: 0x0000000100079ff0 Rtsan-arm64-Test`testing::internal::UnitTestImpl::RunAllTests(this=0x0000000127804190) at gtest.cc:5920:44 [opt]
    frame #12: 0x0000000100079a04 Rtsan-arm64-Test`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) [inlined] bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(object=<unavailable>, method=(Rtsan-arm64-Test`testing::internal::UnitTestImpl::RunAllTests() at gtest.cc:5797), location="auxiliary test code (environments or event listeners)") at gtest.cc:2612:10 [opt]
    frame #13: 0x00000001000799f4 Rtsan-arm64-Test`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(object=0x0000000127804190, method=(Rtsan-arm64-Test`testing::internal::UnitTestImpl::RunAllTests() at gtest.cc:5797), location="auxiliary test code (environments or event listeners)") at gtest.cc:2648:14 [opt]
    frame #14: 0x000000010007995c Rtsan-arm64-Test`testing::UnitTest::Run(this=0x00000001000b8668) at gtest.cc:5484:10 [opt]
    frame #15: 0x000000010004cdd4 Rtsan-arm64-Test`main [inlined] RUN_ALL_TESTS() at gtest.h:2317:73 [opt]
    frame #16: 0x000000010004cdcc Rtsan-arm64-Test`main(argc=1, argv=<unavailable>) at rtsan_test_main.cpp:36:10 [opt]
    frame #17: 0x000000019e300274 dyld`start + 2840
(lldb) disassemble --frame
libcompiler_rt.dylib`__atomic_load:
->  0x1ac241b18 <+0>:   pacibsp
    0x1ac241b1c <+4>:   stp    x22, x21, [sp, #-0x30]!
    0x1ac241b20 <+8>:   stp    x20, x19, [sp, #0x10]
    0x1ac241b24 <+12>:  stp    x29, x30, [sp, #0x20]
    0x1ac241b28 <+16>:  mov    x19, x2
    0x1ac241b2c <+20>:  mov    x20, x1
    0x1ac241b30 <+24>:  mov    x21, x0
    0x1ac241b34 <+28>:  sub    w16, w0, #0x1
    0x1ac241b38 <+32>:  cmp    w16, #0x7
    0x1ac241b3c <+36>:  b.hi   0x1ac241bb0    ; <+152>
    0x1ac241b40 <+40>:  cmp    x16, #0x7
    0x1ac241b44 <+44>:  csel   x16, x16, xzr, ls
    0x1ac241b48 <+48>:  adrp   x17, 0
    0x1ac241b4c <+52>:  add    x17, x17, #0xc94 ; ___lldb_unnamed_symbol91
    0x1ac241b50 <+56>:  ldrsw  x16, [x17, x16, lsl #2]
    0x1ac241b54 <+60>:  adr    x17, 0x1ac241b54 ; <+60>
    0x1ac241b58 <+64>:  add    x16, x17, x16
    0x1ac241b5c <+68>:  br     x16
    0x1ac241b60 <+72>:  sub    w8, w3, #0x1
    0x1ac241b64 <+76>:  cmp    w8, #0x2
    0x1ac241b68 <+80>:  b.hs   0x1ac241c04    ; <+236>
    0x1ac241b6c <+84>:  ldaprb w8, [x20]
    0x1ac241b70 <+88>:  b      0x1ac241c3c    ; <+292>
    0x1ac241b74 <+92>:  tbnz   w20, #0x0, 0x1ac241bb0 ; <+152>
    0x1ac241b78 <+96>:  sub    w8, w3, #0x1
    0x1ac241b7c <+100>: cmp    w8, #0x2
    0x1ac241b80 <+104>: b.hs   0x1ac241c28    ; <+272>
    0x1ac241b84 <+108>: ldaprh w8, [x20]
    0x1ac241b88 <+112>: b      0x1ac241c68    ; <+336>
    0x1ac241b8c <+116>: tst    x20, #0x3
    0x1ac241b90 <+120>: b.ne   0x1ac241bb0    ; <+152>
    0x1ac241b94 <+124>: sub    w8, w3, #0x1
    0x1ac241b98 <+128>: cmp    w8, #0x2
    0x1ac241b9c <+132>: b.hs   0x1ac241c44    ; <+300>
    0x1ac241ba0 <+136>: ldapr  w8, [x20]
    0x1ac241ba4 <+140>: b      0x1ac241c74    ; <+348>
    0x1ac241ba8 <+144>: tst    x20, #0x7
    0x1ac241bac <+148>: b.eq   0x1ac241c14    ; <+252>
    0x1ac241bb0 <+152>: ubfx   x8, x20, #20, #12
    0x1ac241bb4 <+156>: eor    w8, w8, w20, lsr #4
    0x1ac241bb8 <+160>: and    x8, x8, #0x3ff
    0x1ac241bbc <+164>: adrp   x9, 367688
    0x1ac241bc0 <+168>: add    x9, x9, #0xdd8 ; locks
    0x1ac241bc4 <+172>: add    x22, x9, x8, lsl #2
    0x1ac241bc8 <+176>: mov    x0, x22
    0x1ac241bcc <+180>: bl     0x1ac2448ac    ; symbol stub for: _os_nospin_lock_lock
    0x1ac241bd0 <+184>: sxtw   x2, w21
    0x1ac241bd4 <+188>: mov    x0, x19
    0x1ac241bd8 <+192>: mov    x1, x20
    0x1ac241bdc <+196>: bl     0x1ac2448dc    ; symbol stub for: memcpy
    0x1ac241be0 <+200>: mov    x0, x22
    0x1ac241be4 <+204>: ldp    x29, x30, [sp, #0x20]
    0x1ac241be8 <+208>: ldp    x20, x19, [sp, #0x10]
    0x1ac241bec <+212>: ldp    x22, x21, [sp], #0x30
    0x1ac241bf0 <+216>: autibsp
    0x1ac241bf4 <+220>: eor    x16, x30, x30, lsl #1
    0x1ac241bf8 <+224>: tbz    x16, #0x3e, 0x1ac241c00 ; <+232>
    0x1ac241bfc <+228>: brk    #0xc471
    0x1ac241c00 <+232>: b      0x1ac2448bc    ; symbol stub for: _os_nospin_lock_unlock
    0x1ac241c04 <+236>: cmp    w3, #0x5
    0x1ac241c08 <+240>: b.ne   0x1ac241c38    ; <+288>
    0x1ac241c0c <+244>: ldarb  w8, [x20]
    0x1ac241c10 <+248>: b      0x1ac241c3c    ; <+292>
    0x1ac241c14 <+252>: sub    w8, w3, #0x1
    0x1ac241c18 <+256>: cmp    w8, #0x2
    0x1ac241c1c <+260>: b.hs   0x1ac241c54    ; <+316>
    0x1ac241c20 <+264>: ldapr  x8, [x20]
    0x1ac241c24 <+268>: b      0x1ac241c80    ; <+360>
    0x1ac241c28 <+272>: cmp    w3, #0x5
    0x1ac241c2c <+276>: b.ne   0x1ac241c64    ; <+332>
    0x1ac241c30 <+280>: ldarh  w8, [x20]
    0x1ac241c34 <+284>: b      0x1ac241c68    ; <+336>
    0x1ac241c38 <+288>: ldrb   w8, [x20]
    0x1ac241c3c <+292>: strb   w8, [x19]
    0x1ac241c40 <+296>: b      0x1ac241c84    ; <+364>
    0x1ac241c44 <+300>: cmp    w3, #0x5
    0x1ac241c48 <+304>: b.ne   0x1ac241c70    ; <+344>
    0x1ac241c4c <+308>: ldar   w8, [x20]
    0x1ac241c50 <+312>: b      0x1ac241c74    ; <+348>
    0x1ac241c54 <+316>: cmp    w3, #0x5
    0x1ac241c58 <+320>: b.ne   0x1ac241c7c    ; <+356>
    0x1ac241c5c <+324>: ldar   x8, [x20]
    0x1ac241c60 <+328>: b      0x1ac241c80    ; <+360>
    0x1ac241c64 <+332>: ldrh   w8, [x20]
    0x1ac241c68 <+336>: strh   w8, [x19]
    0x1ac241c6c <+340>: b      0x1ac241c84    ; <+364>
    0x1ac241c70 <+344>: ldr    w8, [x20]
    0x1ac241c74 <+348>: str    w8, [x19]
    0x1ac241c78 <+352>: b      0x1ac241c84    ; <+364>
    0x1ac241c7c <+356>: ldr    x8, [x20]
    0x1ac241c80 <+360>: str    x8, [x19]
    0x1ac241c84 <+364>: ldp    x29, x30, [sp, #0x20]
    0x1ac241c88 <+368>: ldp    x20, x19, [sp, #0x10]
    0x1ac241c8c <+372>: ldp    x22, x21, [sp], #0x30
    0x1ac241c90 <+376>: retab

Check that out! The implementation is using _os_nospin_lock_lock as its locking mechanism. I've git grepped for the string os_nospin in the llvm-project repo and it's not present, so I can only presume that this is coming from the macOS SDK.

So - here are my thoughts about a potential fix. I added an interceptor for _os_nospin_lock_lock (I had to forward-declare its argument type, because it's private to Darwin's os/lock.c implementation). Now the test passes on my machine. I'll raise a PR later today to show you and @cjappl, but here's the interceptor implementation for illustration in the meantime:

struct _os_nospin_lock_t;
INTERCEPTOR(void, _os_nospin_lock_lock, _os_nospin_lock_t *lock) {
  __rtsan_notify_intercepted_call("_os_nospin_lock_lock");
  return REAL(_os_nospin_lock_lock)(lock);
}

Let me know what you think. I'm not sure whether this solution is particularly sustainable - what if the Darwin implementation switches to a different locking mechanism in future, for example? I guess that's perhaps just something we have to deal with. Anyway, I'm keen to hear your thoughts.

@thetruestblue (Contributor, Author) commented
Let me know what you think. I'm not sure whether this solution is particularly sustainable - what if the Darwin implementation switches to a different locking mechanism in future, for example? I guess that's perhaps just something we have to deal with. Anyway, I'm keen to hear your thoughts.

This matches my observations and concerns.

@davidtrevelyan (Contributor) commented
Let me know what you think. I'm not sure whether this solution is particularly sustainable - what if the Darwin implementation switches to a different locking mechanism in future, for example? I guess that's perhaps just something we have to deal with. Anyway, I'm keen to hear your thoughts.

This matches my observations and concerns.

Great. Glad we're on the same page. Concerns-wise, I haven't got any smart ideas at the moment. But I have put together this PR to hopefully get your CI green again: #131034 - let me know what you think, and please invite any other reviewers you feel would be helpful. Hopefully this unblocks your work, and we can formulate a plan later for making this interception more stable going forward.

@cjappl (Contributor) commented Mar 12, 2025

Great sleuthing @davidtrevelyan

My 2c is that we should intercept this new method for now if that fixes the problem. In the future, if the implementation changes again to ANOTHER approach, we can cross the bridge of something more durable.

If this ends up being the long-term stable approach on macOS, the simple solution is great.

Looking at that other PR now.

@thetruestblue (Contributor, Author) commented
No longer necessary to disable this test.

davidtrevelyan added a commit that referenced this pull request Mar 13, 2025
Follows the discussion here:
#129309

Recently, the test `TestRtsan.AccessingALargeAtomicVariableDiesWhenRealtime` has been failing on newer macOS versions, because the internal locking mechanism in `std::atomic<T>::load` (for types `T` larger than the hardware lock-free limit) has changed to a function that wasn't being intercepted by rtsan.

This PR introduces an interceptor for `_os_nospin_lock_lock`, which is
the new internal locking mechanism.

_Note: we'd probably do well to introduce interceptors for
`_os_nospin_lock_unlock` (and `os_unfair_lock_unlock`) too, which also
appear to have blocking implementations. This can follow in a separate
PR._
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 13, 2025 (same commit message as above).
frederik-h pushed a commit to frederik-h/llvm-project that referenced this pull request Mar 18, 2025 (same commit message as above).
thetruestblue pushed a commit to thetruestblue/llvm-project that referenced this pull request Apr 25, 2025 (same commit message as above; cherry picked from commit 481a55a).
thetruestblue deleted the disable-rtsan-failing branch April 29, 2025 18:34
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request May 10, 2025 (same commit message as above; cherry picked from commit 481a55a).
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 15, 2025 (same commit message as above; cherry picked from commit 481a55a).