Skip to content

[tsan] Fix nested signal handling #138599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 22, 2025

Conversation

jwidauer
Copy link
Contributor

@jwidauer jwidauer commented May 5, 2025

This PR fixes the bug reported in #134358.

In the current implementation of the tsan posix interceptors, the signal set does not get restored to the correct original set, if a signal handler gets called, while already inside of a signal handler. This leads to the wrong signal set being set for the thread in which the signal handler was called.

To fix this I introduced a stack of __sanitizer_sigset_t to keep all the correct old signal sets and restore them in the correct order.

There was also already an existing test that tested nested / recursive signal handlers, but it was disabled.
I therefore reenabled it, made it more robust by waiting for the second thread to have been properly started and added checks for the signal sets.
This test then failed before the introduction of the interceptor fix and didn't fail with the fix.

@dvyukov What are your thoughts?

Copy link

github-actions bot commented May 5, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented May 5, 2025

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Jakob Widauer (jwidauer)

Changes

This PR fixes the bug reported in #134358.

In the current implementation of the tsan posix interceptors, the signal set does not get restored to the correct original set, if a signal handler gets called, while already inside of a signal handler. This leads to the wrong signal set being set for the thread in which the signal handler was called.

To fix this I introduced a stack of __sanitizer_sigset_t to keep all the correct old signal sets and restore them in the correct order.

There was also already an existing test that tested nested / recursive signal handlers, but it was disabled.
I therefore reenabled it, made it more robust by waiting for the second thread to have been properly started and added checks for the signal sets.
This test then failed before the introduction of the interceptor fix and didn't fail with the fix.

@dvyukov What are your thoughts?


Full diff: https://github.com/llvm/llvm-project/pull/138599.diff

2 Files Affected:

  • (modified) compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp (+13-9)
  • (modified) compiler-rt/test/tsan/signal_recursive.cpp (+62-19)
diff --git a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
index 6d79b80593379..7b91c54e91007 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
@@ -12,6 +12,9 @@
 // sanitizer_common/sanitizer_common_interceptors.inc
 //===----------------------------------------------------------------------===//
 
+#include <stdarg.h>
+
+#include "interception/interception.h"
 #include "sanitizer_common/sanitizer_allocator_dlsym.h"
 #include "sanitizer_common/sanitizer_atomic.h"
 #include "sanitizer_common/sanitizer_errno.h"
@@ -24,16 +27,14 @@
 #include "sanitizer_common/sanitizer_posix.h"
 #include "sanitizer_common/sanitizer_stacktrace.h"
 #include "sanitizer_common/sanitizer_tls_get_addr.h"
-#include "interception/interception.h"
+#include "sanitizer_common/sanitizer_vector.h"
+#include "tsan_fd.h"
 #include "tsan_interceptors.h"
 #include "tsan_interface.h"
+#include "tsan_mman.h"
 #include "tsan_platform.h"
-#include "tsan_suppressions.h"
 #include "tsan_rtl.h"
-#include "tsan_mman.h"
-#include "tsan_fd.h"
-
-#include <stdarg.h>
+#include "tsan_suppressions.h"
 
 using namespace __tsan;
 
@@ -177,7 +178,7 @@ struct ThreadSignalContext {
   SignalDesc pending_signals[kSigCount];
   // emptyset and oldset are too big for stack.
   __sanitizer_sigset_t emptyset;
-  __sanitizer_sigset_t oldset;
+  __sanitizer::Vector<__sanitizer_sigset_t> oldset;
 };
 
 void EnterBlockingFunc(ThreadState *thr) {
@@ -980,6 +981,7 @@ void PlatformCleanUpThreadState(ThreadState *thr) {
       &thr->signal_ctx, memory_order_relaxed);
   if (sctx) {
     atomic_store(&thr->signal_ctx, 0, memory_order_relaxed);
+    sctx->oldset.Reset();
     UnmapOrDie(sctx, sizeof(*sctx));
   }
 }
@@ -2176,7 +2178,8 @@ void ProcessPendingSignalsImpl(ThreadState *thr) {
     return;
   atomic_fetch_add(&thr->in_signal_handler, 1, memory_order_relaxed);
   internal_sigfillset(&sctx->emptyset);
-  int res = REAL(pthread_sigmask)(SIG_SETMASK, &sctx->emptyset, &sctx->oldset);
+  __sanitizer_sigset_t *oldset = sctx->oldset.PushBack();
+  int res = REAL(pthread_sigmask)(SIG_SETMASK, &sctx->emptyset, oldset);
   CHECK_EQ(res, 0);
   for (int sig = 0; sig < kSigCount; sig++) {
     SignalDesc *signal = &sctx->pending_signals[sig];
@@ -2186,8 +2189,9 @@ void ProcessPendingSignalsImpl(ThreadState *thr) {
                             &signal->ctx);
     }
   }
-  res = REAL(pthread_sigmask)(SIG_SETMASK, &sctx->oldset, 0);
+  res = REAL(pthread_sigmask)(SIG_SETMASK, oldset, 0);
   CHECK_EQ(res, 0);
+  sctx->oldset.PopBack();
   atomic_fetch_add(&thr->in_signal_handler, -1, memory_order_relaxed);
 }
 
diff --git a/compiler-rt/test/tsan/signal_recursive.cpp b/compiler-rt/test/tsan/signal_recursive.cpp
index 40be2d01502b6..fca8757d2a952 100644
--- a/compiler-rt/test/tsan/signal_recursive.cpp
+++ b/compiler-rt/test/tsan/signal_recursive.cpp
@@ -3,8 +3,6 @@
 // Test case for recursive signal handlers, adopted from:
 // https://github.com/google/sanitizers/issues/478
 
-// REQUIRES: disabled
-
 #include "test.h"
 #include <semaphore.h>
 #include <signal.h>
@@ -15,8 +13,6 @@ static const int kSigRestart = SIGUSR2;
 
 static sem_t g_thread_suspend_ack_sem;
 
-static bool g_busy_thread_received_restart;
-
 static volatile bool g_busy_thread_garbage_collected;
 
 static void SaveRegistersInStack() {
@@ -30,22 +26,51 @@ static void fail(const char *what) {
   exit(1);
 }
 
+static void CheckSigBlocked(const sigset_t &oldset, const sigset_t &newset,
+                            int sig) {
+  const int is_old_member = sigismember(&oldset, sig);
+  const int is_new_member = sigismember(&newset, sig);
+
+  if (is_old_member == -1 || is_new_member == -1)
+    fail("sigismember failed");
+
+  if (is_old_member != is_new_member)
+    fail("restoring signals failed");
+}
+
+sigset_t GetCurrentSigSet() {
+  sigset_t set;
+  if (sigemptyset(&set) != 0)
+    fail("sigemptyset failed");
+
+  if (pthread_sigmask(SIG_BLOCK, NULL, &set) != 0)
+    fail("pthread_sigmask failed");
+
+  return set;
+}
+
 static void SuspendHandler(int sig) {
   int old_errno = errno;
   SaveRegistersInStack();
 
   // Enable kSigRestart handling, tsan disables signals around signal handlers.
-  sigset_t sigset;
-  sigemptyset(&sigset);
-  pthread_sigmask(SIG_SETMASK, &sigset, 0);
+  const auto oldset = GetCurrentSigSet();
 
   // Acknowledge that thread is saved and suspended
   if (sem_post(&g_thread_suspend_ack_sem) != 0)
     fail("sem_post failed");
 
   // Wait for wakeup signal.
-  while (!g_busy_thread_received_restart)
-    usleep(100);  // wait for kSigRestart signal
+  sigset_t sigset;
+  sigemptyset(&sigset);
+  if (sigsuspend(&sigset) != 0 && errno != EINTR)
+    fail("sigsuspend failed");
+
+  const auto newset = GetCurrentSigSet();
+
+  // Check that the same signals are blocked as before
+  CheckSigBlocked(oldset, newset, kSigSuspend);
+  CheckSigBlocked(oldset, newset, kSigRestart);
 
   // Acknowledge that thread restarted
   if (sem_post(&g_thread_suspend_ack_sem) != 0)
@@ -56,31 +81,33 @@ static void SuspendHandler(int sig) {
   errno = old_errno;
 }
 
-static void RestartHandler(int sig) {
-  g_busy_thread_received_restart = true;
+static void RestartHandler(int sig) {}
+
+static void WaitSem() {
+  while (sem_wait(&g_thread_suspend_ack_sem) != 0) {
+    if (errno != EINTR)
+      fail("sem_wait failed");
+  }
 }
 
 static void StopWorld(pthread_t thread) {
   if (pthread_kill(thread, kSigSuspend) != 0)
     fail("pthread_kill failed");
 
-  while (sem_wait(&g_thread_suspend_ack_sem) != 0) {
-    if (errno != EINTR)
-      fail("sem_wait failed");
-  }
+  WaitSem();
 }
 
 static void StartWorld(pthread_t thread) {
   if (pthread_kill(thread, kSigRestart) != 0)
     fail("pthread_kill failed");
 
-  while (sem_wait(&g_thread_suspend_ack_sem) != 0) {
-    if (errno != EINTR)
-      fail("sem_wait failed");
-  }
+  WaitSem();
 }
 
 static void CollectGarbage(pthread_t thread) {
+  // Wait for the thread to start
+  WaitSem();
+
   StopWorld(thread);
   // Walk stacks
   StartWorld(thread);
@@ -102,21 +129,37 @@ static void Init() {
 
 void* BusyThread(void *arg) {
   (void)arg;
+  const auto oldset = GetCurrentSigSet();
+
+  if (sem_post(&g_thread_suspend_ack_sem) != 0)
+    fail("sem_post failed");
+
   while (!g_busy_thread_garbage_collected) {
     usleep(100); // Tsan deadlocks without these sleeps
   }
+
+  const auto newset = GetCurrentSigSet();
+
+  // Check that we have the same signals blocked as before
+  CheckSigBlocked(oldset, newset, kSigSuspend);
+  CheckSigBlocked(oldset, newset, kSigRestart);
+
   return NULL;
 }
 
 int main(int argc, const char *argv[]) {
   Init();
+
   pthread_t busy_thread;
   if (pthread_create(&busy_thread, NULL, &BusyThread, NULL) != 0)
     fail("pthread_create failed");
+
   CollectGarbage(busy_thread);
   if (pthread_join(busy_thread, 0) != 0)
     fail("pthread_join failed");
+
   fprintf(stderr, "DONE\n");
+
   return 0;
 }
 

@dvyukov
Copy link
Collaborator

dvyukov commented May 6, 2025

I agree we messed signal mask on recursive signals.
This code is tricky and fragile, I always afraid of touching it. But I don't see anything obviously wrong with this change. It probably should be OK to call internal allocator at that point.

I think LongJmp function should also pop from the vector to restore the original size captured in SetJmp.

Please also run all signal-related tests multiple times.

@jwidauer
Copy link
Contributor Author

@dvyukov Thanks for your quick review!

I added a commit to store and restore the old signal set stack in the LongJmp and SetJmp methods. Is this similar to what you had in mind? What do you think?

I also ran all the tsan tests multiple times and didn't get any failures.

@jwidauer
Copy link
Contributor Author

@dvyukov Thanks for your quick approval!
Please let me know if there's anything else that would be required from my side!
Since I can't merge PRs, I believe I'd require you to do that for me.

@jwidauer
Copy link
Contributor Author

@dvyukov Any update on when this might get merged?

@dvyukov
Copy link
Collaborator

dvyukov commented May 22, 2025

Oh, sorry, this got off my radar.
@vitalybuka can you please merge this, or ask somebody to merge? I do it so rarely lately that forgot all procedures.

@thurstond thurstond merged commit 1aa746d into llvm:main May 22, 2025
10 checks passed
Copy link

@jwidauer Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@cachemeifyoucan
Copy link
Collaborator

Test breaks Darwin Bots on iOSSim simulator: https://green.lab.llvm.org/job/llvm.org/job/clang-san-iossim/9036/testReport/ThreadSanitizer-x86_64-iossim/ThreadSanitizer-x86_64-iossim/signal_recursive_cpp/

I think the reason is that sem_init is deprecated on Darwin systems. I will mark the test as UNSUPPORTED for now.

@cachemeifyoucan
Copy link
Collaborator

PR for disabling on Darwin: #141135

@jwidauer jwidauer deleted the enable-nested-signals-for-tsan branch May 22, 2025 22:17
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Jun 3, 2025
This PR fixes the bug reported in llvm#134358.

In the current implementation of the tsan posix interceptors, the signal
set does not get restored to the correct original set, if a signal
handler gets called, while already inside of a signal handler. This
leads to the wrong signal set being set for the thread in which the
signal handler was called.

To fix this I introduced a stack of `__sanitizer_sigset_t` to keep all
the correct old signal sets and restore them in the correct order.

There was also already an existing test that tested nested / recursive
signal handlers, but it was disabled.
I therefore reenabled it, made it more robust by waiting for the second
thread to have been properly started and added checks for the signal
sets.
This test then failed before the introduction of the interceptor fix and
didn't fail with the fix.

@dvyukov What are your thoughts?
ajaden-codes pushed a commit to Jaddyen/llvm-project that referenced this pull request Jun 6, 2025
This PR fixes the bug reported in llvm#134358.

In the current implementation of the tsan posix interceptors, the signal
set does not get restored to the correct original set, if a signal
handler gets called, while already inside of a signal handler. This
leads to the wrong signal set being set for the thread in which the
signal handler was called.

To fix this I introduced a stack of `__sanitizer_sigset_t` to keep all
the correct old signal sets and restore them in the correct order.

There was also already an existing test that tested nested / recursive
signal handlers, but it was disabled.
I therefore reenabled it, made it more robust by waiting for the second
thread to have been properly started and added checks for the signal
sets.
This test then failed before the introduction of the interceptor fix and
didn't fail with the fix.

@dvyukov What are your thoughts?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants