-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[RTSan][Apple] Disable AccessingALargeAtomicVariableDiesWhenRealtime #129309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…on Apple devices Disabling this test since it failing Apple CI jobs: https://green.lab.llvm.org/job/llvm.org/job/clang-stage1-RA/3694/testReport/RealtimeSanitizer-Unit/__Rtsan-x86_64h-Test_TestRtsan/AccessingALargeAtomicVariableDiesWhenRealtime/ rdar://145488759
@llvm/pr-subscribers-compiler-rt-sanitizer Author: None (thetruestblue) Changes…on Apple devices Disabling this test since it failing Apple CI jobs:
Full diff: https://github.com/llvm/llvm-project/pull/129309.diff 1 Files Affected:
diff --git a/compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp b/compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp
index e05d7ae78b5d9..85ccc98400918 100644
--- a/compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp
+++ b/compiler-rt/lib/rtsan/tests/rtsan_test_functional.cpp
@@ -171,6 +171,9 @@ TEST(TestRtsan, CopyingALambdaWithLargeCaptureDiesWhenRealtime) {
}
TEST(TestRtsan, AccessingALargeAtomicVariableDiesWhenRealtime) {
+ #ifdef __APPLE__
+ GTEST_SKIP(); // Test is failing on Apple Devices.
+ #endif
std::atomic<float> small_atomic{0.0f};
ASSERT_TRUE(small_atomic.is_lock_free());
RealtimeInvoke([&small_atomic]() {
|
Hi @thetruestblue,
I think ideally we would hunt down and figure out why this is failing to die. Both of the main RTSan devs work on macs primarily, and this test has always passed for us locally.
Is it possible to figure out what kind of lock is being taken by the 'not is_lock_free' large atomic, and intercept that? That would be preferable to disabling the test.
…On Sat, Mar 1, 2025 at 6:16 AM, thetruestblue ***@***.***(mailto:On Sat, Mar 1, 2025 at 6:16 AM, thetruestblue <<a href=)> wrote:
***@***.***(https://github.com/thetruestblue) requested your review on: [#129309](#129309) [RTSan][Apple] Disable AccessingALargeAtomicVariableDiesWhenRealtime.
—
Reply to this email directly, [view it on GitHub](#129309 (comment)), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/ADMDXY3WUO2HPL52U6G2WOT2SDG37AVCNFSM6AAAAABYDF3DZSVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJWGUYTOOJZGU3DQNI).
You are receiving this because your review was requested.Message ID: ***@***.***>
|
I agree it would be ideal to hunt this down and get the correct interceptor set up. Unfortunately, this has been failing in our CI for a few weeks now. Would it be possible to use the failing test above to try and get the devs who work on this set up for a repro? We need to address the failing test, but want to also work with you to get this working. But there is a time commitment issue currently. I can spend some limited time on this maybe next week, but I think ideally the people working on this should also try to get set up so that they can reproduce failures that pop up in the Apple CI. Happy to troubleshoot that with you as well. |
Yes, I will report back with a minimal binary by the end of the day that should repro the problem. (APAC timezone right now)
Hopefully hooking that binary up to Instruments or similar on the machine with the failure will point us right there.
…On Sat, Mar 1, 2025 at 7:03 AM, thetruestblue ***@***.***(mailto:On Sat, Mar 1, 2025 at 7:03 AM, thetruestblue <<a href=)> wrote:
> Hi ***@***.***(https://github.com/thetruestblue), I think ideally we would hunt down and figure out why this is failing to die. Both of the main RTSan devs work on macs primarily, and this test has always passed for us locally. Is it possible to figure out what kind of lock is being taken by the 'not is_lock_free' large atomic, and intercept that? That would be preferable to disabling the test.
> […](#)
> On Sat, Mar 1, 2025 at 6:16 AM, thetruestblue @.(mailto:On Sat, Mar 1, 2025 at 6:16 AM, thetruestblue <[wrote: @.]())(https://github.com/thetruestblue) requested your review on: #129309 [RTSan][Apple] Disable AccessingALargeAtomicVariableDiesWhenRealtime. — Reply to this email directly, [view it on GitHub]([#129309 (comment)](#129309 (comment))), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/ADMDXY3WUO2HPL52U6G2WOT2SDG37AVCNFSM6AAAAABYDF3DZSVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJWGUYTOOJZGU3DQNI). You are receiving this because your review was requested.Message ID: @.***>
I agree it would be ideal to hunt this down and get the correct interceptor set up. Unfortunately, this has been failing in our CI for a few weeks now. Would it be possible to use the failing test above to try and get set up for a repro? We are currently resource constrained to dig into rtsan support currently, but need to clean up the CI.
—
Reply to this email directly, [view it on GitHub](#129309 (comment)), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/ADMDXYYJQ3DL5QJXQWTUEA32SDMLRAVCNFSM6AAAAABYDF3DZSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOJRGYZDCOJWGA).
You are receiving this because your review was requested.Message ID: ***@***.***>
[thetruestblue] thetruestblue left a comment [(llvm/llvm-project#129309)](#129309 (comment))
> Hi ***@***.***(https://github.com/thetruestblue), I think ideally we would hunt down and figure out why this is failing to die. Both of the main RTSan devs work on macs primarily, and this test has always passed for us locally. Is it possible to figure out what kind of lock is being taken by the 'not is_lock_free' large atomic, and intercept that? That would be preferable to disabling the test.
> […](#)
> On Sat, Mar 1, 2025 at 6:16 AM, thetruestblue @.(mailto:On Sat, Mar 1, 2025 at 6:16 AM, thetruestblue <[wrote: @.]())(https://github.com/thetruestblue) requested your review on: #129309 [RTSan][Apple] Disable AccessingALargeAtomicVariableDiesWhenRealtime. — Reply to this email directly, [view it on GitHub]([#129309 (comment)](#129309 (comment))), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/ADMDXY3WUO2HPL52U6G2WOT2SDG37AVCNFSM6AAAAABYDF3DZSVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJWGUYTOOJZGU3DQNI). You are receiving this because your review was requested.Message ID: @.***>
I agree it would be ideal to hunt this down and get the correct interceptor set up. Unfortunately, this has been failing in our CI for a few weeks now. Would it be possible to use the failing test above to try and get set up for a repro? We are currently resource constrained to dig into rtsan support currently, but need to clean up the CI.
—
Reply to this email directly, [view it on GitHub](#129309 (comment)), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/ADMDXYYJQ3DL5QJXQWTUEA32SDMLRAVCNFSM6AAAAABYDF3DZSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOJRGYZDCOJWGA).
You are receiving this because your review was requested.Message ID: ***@***.***>
|
Alright, here is what I have for you: https://gist.github.com/cjappl/be6544731e973497b58cd6af4a0e318f This reproduces the problem in the simplest way I can think of, basically a re-hash of the unit test but standalone. From my poking around on my machine, I think the possibly fastest way to finding the culprit is to look at the internal Apple implementation of On my machine, this eventually goes to
I'd be more than happy to get this instruction! I also assume it is off the table, but if I were to get SSH access to this machine (or a machine that repros this problem) I would be more than happy to spend some time and see if I could figure it out. Some possible hints on where this may be:
Shout if you get any good leads, happy to help. |
@thetruestblue is there any chance you could possibly update us on your discussions once you've had them? I've investigated this and am able to reproduce the test failure on my machine using the MacOS15.2 SDK. Unfortunately, however, there doesn't appear to be a straightforward solution to the problem, at least from what I can see. We'd very much appreciate you sharing any insight you and your team may have had! |
My repro was specific to my local OS, and I wasn't actually reproing the issue in CI. Though I think I know whats going on. It has to do with how we are handling the deprecation of OSSpinLock. Can you look how we recently updated the TSan OSSpinLock interceptor, and see if adopting that same mechanism fixes the issues on 15.2? See: a34159f I am provisions a device to try to confirm what you're seeing. |
Many thanks for the update @thetruestblue. I tried the change you linked in a34159f and unfortunately did not observe it fixing the test. Based on my (incomplete) understanding of the problem, this was not unexpected - I'll try to outline why I think this based on what I currently understand, in the hope that you may be able to help complete my understanding - and hopefully this may lead to us solving it together. I'd like to start with a short recap of the purpose of this test, for anyone joining this discussion later wanting a quick overview. Atomic variables are ubiquitous in multi-threaded real-time programming and necessary for cross-thread data communication in On Linux, we observed that Here's a backtrace from the failing test on my machine (macOS SDK 15.2):
To get this backtrace, I set a breakpoint in template <class _Tp>
_LIBCPP_HIDE_FROM_ABI _Tp __cxx_atomic_load(__cxx_atomic_base_impl<_Tp> const* __a, memory_order __order) _NOEXCEPT {
using __ptr_type = __remove_const_t<decltype(__a->__a_value)>*;
return __c11_atomic_load(
const_cast<__ptr_type>(std::addressof(__a->__a_value)), static_cast<__memory_order_underlying_t>(__order));
} Note that this delegates immediately to the builtin The implementation from void __atomic_load_c(int size, void *src, void *dest, int model) {
#define LOCK_FREE_ACTION(type) \
*((type *)dest) = __c11_atomic_load((_Atomic(type) *)src, model); \
return;
LOCK_FREE_CASES(src);
#undef LOCK_FREE_ACTION
Lock *l = lock_for_pointer(src);
lock(l);
memcpy(dest, src, size);
unlock(l);
} The implementations of #elif defined(__APPLE__)
#include <libkern/OSAtomic.h>
typedef OSSpinLock Lock;
__inline static void unlock(Lock *l) { OSSpinLockUnlock(l); }
/// Locks a lock. In the current implementation, this is potentially
/// unbounded in the contended case.
__inline static void lock(Lock *l) { OSSpinLockLock(l); } This is where the invocations of The answer to this question, I think, will determine whether we will be able to simply add an interceptor to detect the locking, or whether we're going to need some more advanced instrumentation at the codegen stage or in the optimisation pass. Hope this is somewhat helpful! Looking forward to hearing whether this is consistent with your observations! |
Thank you. I have not been able to give this adequate time. Your insights are helpful. Let me do digging as well. |
@thetruestblue no problem - thanks for your support here. Good luck with the digging and let us know how you get on! |
@thetruestblue no-pressure just checking in here. Did you and the team manage to find any time to consider this one? |
I do believe I have a decent understanding of what is going on. Apologies. I will work on getting you something to move forward with. It will be a day or two! Can I ask -- what SDK are you using where OSSpinLock interceptor is working for the atomics? |
Excellent - appreciate your efforts, and no pressure on timing from our side. As for the SDK you ask about that eventually triggers |
@davidtrevelyan you said you can only step to cxx_atomic_load. But in the other stack trace I see __atomic_load is what calls OSSpinLock. Can you try to set a break point on __atomic_load and confirm it is never hit? can you disassemble from that frame? I'd really appreciate if we could get a comparison between working device vs non-working device on that symbol. |
The SDK I am running on (the one that calls/intercepts For the record the backtrace I get running the code in my gist is:
|
And for completeness:
|
@thetruestblue thanks so much for the insightful questions - your prompting has led me to what I believe could be a suitable fix. I'll answer your questions directly first, and then outline my idea for the fix. Reading between the lines of your questions, I suspect you may already be planning something along the same lines.
Great question, thanks for asking me to revisit this. I had not previously appreciated that lldb doesn't automatically attach to forked child processes, and this is why I thought it wasn't being hit. I'll add some additional background info quickly here as to why this is relevant: as you're probably aware already, the unit tests in question here check that invoking some real-time unsafe operations do two things:
Asserting that 2. is true is, of course, trivial; we just invoke the function in the main test process and if it returns then we're good. But asserting that 1. is true is a bit more involved: we currently make use of googletest's
Both accept a function as an argument and invoke it.
This makes sense! My debugger was not attaching automatically to the child process, so wasn't breaking in
Excellent idea - this was the key insight - thanks again. Yes. Here's what I found (backtrace first): (lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
* frame #0: 0x00000001ac241b18 libcompiler_rt.dylib`__atomic_load
frame #1: 0x00000001000087a8 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody() [inlined] std::__1::array<float, 2048ul> std::__1::__cxx_atomic_load[abi:ne180100]<std::__1::array<float, 2048ul>>(__a=0x000000016fdfa8d8, __order=memory_order_seq_cst) at cxx_atomic_impl.h:349:10 [opt]
frame #2: 0x0000000100008790 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody() [inlined] std::__1::__atomic_base<std::__1::array<float, 2048ul>, false>::load[abi:ne180100](this=0x000000016fdfa8d8, __m=memory_order_seq_cst) const at atomic_base.h:60:12 [opt]
frame #3: 0x0000000100008790 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody() [inlined] TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody()::$_1::operator()(this=<unavailable>) const at rtsan_test_functional.cpp:184:46 [opt]
frame #4: 0x0000000100008790 Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody() [inlined] void rtsan_testing::RealtimeInvoke<TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody()::$_1&>(Func=<unavailable>) at rtsan_test_utilities.h:21:3 [opt]
frame #5: 0x000000010000878c Rtsan-arm64-Test`TestRtsan_AccessingALargeAtomicVariableDiesWhenRealtime_Test::TestBody(this=<unavailable>) at rtsan_test_functional.cpp:187:3 [opt]
frame #6: 0x0000000100066f68 Rtsan-arm64-Test`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) [inlined] void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(object=<unavailable>, method=0x00000000000000010000000000000020, location="the test body") at gtest.cc:2612:10 [opt]
frame #7: 0x0000000100066f58 Rtsan-arm64-Test`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(object=0x000060000298c980, method=0x00000000000000000000000000000020, location="the test body") at gtest.cc:2648:14 [opt]
frame #8: 0x0000000100066e44 Rtsan-arm64-Test`testing::Test::Run(this=0x000060000298c980) at gtest.cc:2687:5 [opt]
frame #9: 0x000000010006842c Rtsan-arm64-Test`testing::TestInfo::Run(this=0x00000001278052e0) at gtest.cc:2836:11 [opt]
frame #10: 0x0000000100069308 Rtsan-arm64-Test`testing::TestSuite::Run(this=0x00000001278043f0) at gtest.cc:3015:30 [opt]
frame #11: 0x0000000100079ff0 Rtsan-arm64-Test`testing::internal::UnitTestImpl::RunAllTests(this=0x0000000127804190) at gtest.cc:5920:44 [opt]
frame #12: 0x0000000100079a04 Rtsan-arm64-Test`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) [inlined] bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(object=<unavailable>, method=(Rtsan-arm64-Test`testing::internal::UnitTestImpl::RunAllTests() at gtest.cc:5797), location="auxiliary test code (environments or event listeners)") at gtest.cc:2612:10 [opt]
frame #13: 0x00000001000799f4 Rtsan-arm64-Test`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(object=0x0000000127804190, method=(Rtsan-arm64-Test`testing::internal::UnitTestImpl::RunAllTests() at gtest.cc:5797), location="auxiliary test code (environments or event listeners)") at gtest.cc:2648:14 [opt]
frame #14: 0x000000010007995c Rtsan-arm64-Test`testing::UnitTest::Run(this=0x00000001000b8668) at gtest.cc:5484:10 [opt]
frame #15: 0x000000010004cdd4 Rtsan-arm64-Test`main [inlined] RUN_ALL_TESTS() at gtest.h:2317:73 [opt]
frame #16: 0x000000010004cdcc Rtsan-arm64-Test`main(argc=1, argv=<unavailable>) at rtsan_test_main.cpp:36:10 [opt]
frame #17: 0x000000019e300274 dyld`start + 2840
(lldb) disassemble --frame
libcompiler_rt.dylib`__atomic_load:
-> 0x1ac241b18 <+0>: pacibsp
0x1ac241b1c <+4>: stp x22, x21, [sp, #-0x30]!
0x1ac241b20 <+8>: stp x20, x19, [sp, #0x10]
0x1ac241b24 <+12>: stp x29, x30, [sp, #0x20]
0x1ac241b28 <+16>: mov x19, x2
0x1ac241b2c <+20>: mov x20, x1
0x1ac241b30 <+24>: mov x21, x0
0x1ac241b34 <+28>: sub w16, w0, #0x1
0x1ac241b38 <+32>: cmp w16, #0x7
0x1ac241b3c <+36>: b.hi 0x1ac241bb0 ; <+152>
0x1ac241b40 <+40>: cmp x16, #0x7
0x1ac241b44 <+44>: csel x16, x16, xzr, ls
0x1ac241b48 <+48>: adrp x17, 0
0x1ac241b4c <+52>: add x17, x17, #0xc94 ; ___lldb_unnamed_symbol91
0x1ac241b50 <+56>: ldrsw x16, [x17, x16, lsl #2]
0x1ac241b54 <+60>: adr x17, 0x1ac241b54 ; <+60>
0x1ac241b58 <+64>: add x16, x17, x16
0x1ac241b5c <+68>: br x16
0x1ac241b60 <+72>: sub w8, w3, #0x1
0x1ac241b64 <+76>: cmp w8, #0x2
0x1ac241b68 <+80>: b.hs 0x1ac241c04 ; <+236>
0x1ac241b6c <+84>: ldaprb w8, [x20]
0x1ac241b70 <+88>: b 0x1ac241c3c ; <+292>
0x1ac241b74 <+92>: tbnz w20, #0x0, 0x1ac241bb0 ; <+152>
0x1ac241b78 <+96>: sub w8, w3, #0x1
0x1ac241b7c <+100>: cmp w8, #0x2
0x1ac241b80 <+104>: b.hs 0x1ac241c28 ; <+272>
0x1ac241b84 <+108>: ldaprh w8, [x20]
0x1ac241b88 <+112>: b 0x1ac241c68 ; <+336>
0x1ac241b8c <+116>: tst x20, #0x3
0x1ac241b90 <+120>: b.ne 0x1ac241bb0 ; <+152>
0x1ac241b94 <+124>: sub w8, w3, #0x1
0x1ac241b98 <+128>: cmp w8, #0x2
0x1ac241b9c <+132>: b.hs 0x1ac241c44 ; <+300>
0x1ac241ba0 <+136>: ldapr w8, [x20]
0x1ac241ba4 <+140>: b 0x1ac241c74 ; <+348>
0x1ac241ba8 <+144>: tst x20, #0x7
0x1ac241bac <+148>: b.eq 0x1ac241c14 ; <+252>
0x1ac241bb0 <+152>: ubfx x8, x20, #20, #12
0x1ac241bb4 <+156>: eor w8, w8, w20, lsr #4
0x1ac241bb8 <+160>: and x8, x8, #0x3ff
0x1ac241bbc <+164>: adrp x9, 367688
0x1ac241bc0 <+168>: add x9, x9, #0xdd8 ; locks
0x1ac241bc4 <+172>: add x22, x9, x8, lsl #2
0x1ac241bc8 <+176>: mov x0, x22
0x1ac241bcc <+180>: bl 0x1ac2448ac ; symbol stub for: _os_nospin_lock_lock
0x1ac241bd0 <+184>: sxtw x2, w21
0x1ac241bd4 <+188>: mov x0, x19
0x1ac241bd8 <+192>: mov x1, x20
0x1ac241bdc <+196>: bl 0x1ac2448dc ; symbol stub for: memcpy
0x1ac241be0 <+200>: mov x0, x22
0x1ac241be4 <+204>: ldp x29, x30, [sp, #0x20]
0x1ac241be8 <+208>: ldp x20, x19, [sp, #0x10]
0x1ac241bec <+212>: ldp x22, x21, [sp], #0x30
0x1ac241bf0 <+216>: autibsp
0x1ac241bf4 <+220>: eor x16, x30, x30, lsl #1
0x1ac241bf8 <+224>: tbz x16, #0x3e, 0x1ac241c00 ; <+232>
0x1ac241bfc <+228>: brk #0xc471
0x1ac241c00 <+232>: b 0x1ac2448bc ; symbol stub for: _os_nospin_lock_unlock
0x1ac241c04 <+236>: cmp w3, #0x5
0x1ac241c08 <+240>: b.ne 0x1ac241c38 ; <+288>
0x1ac241c0c <+244>: ldarb w8, [x20]
0x1ac241c10 <+248>: b 0x1ac241c3c ; <+292>
0x1ac241c14 <+252>: sub w8, w3, #0x1
0x1ac241c18 <+256>: cmp w8, #0x2
0x1ac241c1c <+260>: b.hs 0x1ac241c54 ; <+316>
0x1ac241c20 <+264>: ldapr x8, [x20]
0x1ac241c24 <+268>: b 0x1ac241c80 ; <+360>
0x1ac241c28 <+272>: cmp w3, #0x5
0x1ac241c2c <+276>: b.ne 0x1ac241c64 ; <+332>
0x1ac241c30 <+280>: ldarh w8, [x20]
0x1ac241c34 <+284>: b 0x1ac241c68 ; <+336>
0x1ac241c38 <+288>: ldrb w8, [x20]
0x1ac241c3c <+292>: strb w8, [x19]
0x1ac241c40 <+296>: b 0x1ac241c84 ; <+364>
0x1ac241c44 <+300>: cmp w3, #0x5
0x1ac241c48 <+304>: b.ne 0x1ac241c70 ; <+344>
0x1ac241c4c <+308>: ldar w8, [x20]
0x1ac241c50 <+312>: b 0x1ac241c74 ; <+348>
0x1ac241c54 <+316>: cmp w3, #0x5
0x1ac241c58 <+320>: b.ne 0x1ac241c7c ; <+356>
0x1ac241c5c <+324>: ldar x8, [x20]
0x1ac241c60 <+328>: b 0x1ac241c80 ; <+360>
0x1ac241c64 <+332>: ldrh w8, [x20]
0x1ac241c68 <+336>: strh w8, [x19]
0x1ac241c6c <+340>: b 0x1ac241c84 ; <+364>
0x1ac241c70 <+344>: ldr w8, [x20]
0x1ac241c74 <+348>: str w8, [x19]
0x1ac241c78 <+352>: b 0x1ac241c84 ; <+364>
0x1ac241c7c <+356>: ldr x8, [x20]
0x1ac241c80 <+360>: str x8, [x19]
0x1ac241c84 <+364>: ldp x29, x30, [sp, #0x20]
0x1ac241c88 <+368>: ldp x20, x19, [sp, #0x10]
0x1ac241c8c <+372>: ldp x22, x21, [sp], #0x30
0x1ac241c90 <+376>: retab Check that out! The implementation is using So - here are my thoughts about a potential fix. I added an interceptor for struct _os_nospin_lock_t;
INTERCEPTOR(void, _os_nospin_lock_lock, _os_nospin_lock_t *lock) {
__rtsan_notify_intercepted_call("_os_nospin_lock_lock");
return REAL(_os_nospin_lock_lock)(lock);
} Let me know what you think. I'm not sure whether this solution is particularly sustainable - what if the Darwin implementation switches to a different locking mechanism in future, for example? I guess that's perhaps just something we have to deal with. Anyway, I'm keen to hear your thoughts. |
This matches my observations and concerns. |
Great. Glad we're on the same page. Concerns-wise, I haven't got any smart ideas at the moment. But I have put together this PR to hopefully get your CI green again: #131034 - let me know what you think, and please invite any other reviewers you feel would be helpful. Hopefully this should unblock your work and we can formulate a plan for making this interception more stable going forwards later? |
Great sleuthing @davidtrevelyan My 2c is that we should intercept this new method for now if that fixes the problem. In the future if the implementation changes again to ANOTHER approach we should cross the bridge of something more durable. If this ends up being the long terms stable approach from MacOS the simple solution is great. Looking at that other PR now. |
No longer necessary to disable this test. |
Follows the discussion here: #129309 Recently, the test `TestRtsan.AccessingALargeAtomicVariableDiesWhenRealtime` has been failing on newer MacOS versions, because the internal locking mechanism in `std::atomic<T>::load` (for types `T` that are larger than the hardware lock-free limit), has changed to a function that wasn't being intercepted by rtsan. This PR introduces an interceptor for `_os_nospin_lock_lock`, which is the new internal locking mechanism. _Note: we'd probably do well to introduce interceptors for `_os_nospin_lock_unlock` (and `os_unfair_lock_unlock`) too, which also appear to have blocking implementations. This can follow in a separate PR._
…131034) Follows the discussion here: llvm/llvm-project#129309 Recently, the test `TestRtsan.AccessingALargeAtomicVariableDiesWhenRealtime` has been failing on newer MacOS versions, because the internal locking mechanism in `std::atomic<T>::load` (for types `T` that are larger than the hardware lock-free limit), has changed to a function that wasn't being intercepted by rtsan. This PR introduces an interceptor for `_os_nospin_lock_lock`, which is the new internal locking mechanism. _Note: we'd probably do well to introduce interceptors for `_os_nospin_lock_unlock` (and `os_unfair_lock_unlock`) too, which also appear to have blocking implementations. This can follow in a separate PR._
Follows the discussion here: llvm#129309 Recently, the test `TestRtsan.AccessingALargeAtomicVariableDiesWhenRealtime` has been failing on newer MacOS versions, because the internal locking mechanism in `std::atomic<T>::load` (for types `T` that are larger than the hardware lock-free limit), has changed to a function that wasn't being intercepted by rtsan. This PR introduces an interceptor for `_os_nospin_lock_lock`, which is the new internal locking mechanism. _Note: we'd probably do well to introduce interceptors for `_os_nospin_lock_unlock` (and `os_unfair_lock_unlock`) too, which also appear to have blocking implementations. This can follow in a separate PR._
Follows the discussion here: llvm#129309 Recently, the test `TestRtsan.AccessingALargeAtomicVariableDiesWhenRealtime` has been failing on newer MacOS versions, because the internal locking mechanism in `std::atomic<T>::load` (for types `T` that are larger than the hardware lock-free limit), has changed to a function that wasn't being intercepted by rtsan. This PR introduces an interceptor for `_os_nospin_lock_lock`, which is the new internal locking mechanism. _Note: we'd probably do well to introduce interceptors for `_os_nospin_lock_unlock` (and `os_unfair_lock_unlock`) too, which also appear to have blocking implementations. This can follow in a separate PR._ (cherry picked from commit 481a55a)
Follows the discussion here: llvm#129309 Recently, the test `TestRtsan.AccessingALargeAtomicVariableDiesWhenRealtime` has been failing on newer MacOS versions, because the internal locking mechanism in `std::atomic<T>::load` (for types `T` that are larger than the hardware lock-free limit), has changed to a function that wasn't being intercepted by rtsan. This PR introduces an interceptor for `_os_nospin_lock_lock`, which is the new internal locking mechanism. _Note: we'd probably do well to introduce interceptors for `_os_nospin_lock_unlock` (and `os_unfair_lock_unlock`) too, which also appear to have blocking implementations. This can follow in a separate PR._ (cherry picked from commit 481a55a)
…131034) Follows the discussion here: llvm/llvm-project#129309 Recently, the test `TestRtsan.AccessingALargeAtomicVariableDiesWhenRealtime` has been failing on newer MacOS versions, because the internal locking mechanism in `std::atomic<T>::load` (for types `T` that are larger than the hardware lock-free limit), has changed to a function that wasn't being intercepted by rtsan. This PR introduces an interceptor for `_os_nospin_lock_lock`, which is the new internal locking mechanism. _Note: we'd probably do well to introduce interceptors for `_os_nospin_lock_unlock` (and `os_unfair_lock_unlock`) too, which also appear to have blocking implementations. This can follow in a separate PR._ (cherry picked from commit 481a55a)
…on Apple devices
Disabling this test since it is failing Apple CI jobs:
https://green.lab.llvm.org/job/llvm.org/job/clang-stage1-RA/3694/testReport/RealtimeSanitizer-Unit/__Rtsan-x86_64h-Test_TestRtsan/AccessingALargeAtomicVariableDiesWhenRealtime/
rdar://145488759