-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[lldb][test] Don't call SBDebugger::Terminate if TestMultipleDebuggers times out #143732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…s times out Fixes llvm#101162 This test did this: * SBDebugger::Initialize * Spawn a bunch of threads that do * SBDebugger::Create * some work * SBDebugger::Destroy * Wait on those threads to finish then call SBDebugger::Terminate, or - * Reach a time limit before all the threads finish, call SBDebugger::Terminate and exit. The problem was that in the timeout case, calling SBDebugger::Terminate destroys data being used by threads that are still running. This test was expecting said threads to be so broken they were probably stuck, but when the machine is just heavily loaded, one of them might read that data before the whole program can exit. This means what should have been a timeout is now a crash. Sometimes. Which explains why we saw both timeouts and various signals on the AArch64 Linux bot. It depends on the timings. So I'm changing it not to call SBDebugger::Terminate in the timeout case. We will have to tweak the timeout value based on what happens on the buildbot.
@llvm/pr-subscribers-lldb Author: David Spickett (DavidSpickett) ChangesFixes #101162 This test did this:
The problem was that in the timeout case, calling SBDebugger::Terminate destroys data being used by threads that are still running. I expect this test was expecting said threads to be so broken they were probably stuck, but when the machine is just heavily loaded, one of them might read that data before the whole program exits. This means what should have been a timeout becomes a crash. Sometimes. Which explains why we saw both timeouts and various signals on the AArch64 Linux bot. It depends on the timings. So I'm changing it not to call SBDebugger::Terminate in the timeout case. We will have to tweak the timeout value based on what happens on the buildbot, but we will know it's machine load not an lldb bug. Full diff: https://github.com/llvm/llvm-project/pull/143732.diff 2 Files Affected:
diff --git a/lldb/test/API/api/multiple-debuggers/TestMultipleDebuggers.py b/lldb/test/API/api/multiple-debuggers/TestMultipleDebuggers.py
index 1fd4806cd74f4..f0a3893f53aab 100644
--- a/lldb/test/API/api/multiple-debuggers/TestMultipleDebuggers.py
+++ b/lldb/test/API/api/multiple-debuggers/TestMultipleDebuggers.py
@@ -12,8 +12,6 @@
class TestMultipleSimultaneousDebuggers(TestBase):
NO_DEBUG_INFO_TESTCASE = True
- # Sometimes times out on Linux, see https://github.com/llvm/llvm-project/issues/101162.
- @skipIfLinux
@skipIfNoSBHeaders
@skipIfWindows
@skipIfHostIncompatibleWithTarget
diff --git a/lldb/test/API/api/multiple-debuggers/multi-process-driver.cpp b/lldb/test/API/api/multiple-debuggers/multi-process-driver.cpp
index 64728fb7c29a1..fcec9bae0ed9c 100644
--- a/lldb/test/API/api/multiple-debuggers/multi-process-driver.cpp
+++ b/lldb/test/API/api/multiple-debuggers/multi-process-driver.cpp
@@ -296,6 +296,8 @@ int main (int argc, char **argv)
NUMBER_OF_SIMULTANEOUS_DEBUG_SESSIONS);
}
- SBDebugger::Terminate();
+ // We do not call SBDebugger::Terminate() here because it will destroy
+ // data that might be being used by threads that are still running. Which
+ // would change the timeout into an unrelated crash.
exit (1);
}
|
Side note: this test doesn't actually care if all the threads are successful in what they do, but it's been that way forever and I don't want to get any more involved than I have to :) |
This makes the assumption that child threads are torn down before their parent, which might be a faulty one. I looked for a way to cancel std::threads but it seems like you have to get a native handle to it and use the specific OS APIs, which I'd rather not complicate the test with. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exit() will also run some cleanup functions and could cause things to crash. You could one-up it to _exit(), but ultimately, there's no way to guarantee that misbehaving code will umm... behave in a certain way.
Then what I'm doing is narrowing the window in which crashes could happen, but not closing it. _exit seems more safe for the timeout situation. If there's a genuine bug then yes anything could happen but at least we'd be rewarded for looking into it when it did. If there's still flakiness, I'll declare this impossible to get 100% right and disable it again. I did think I could send a signal from the main thread, but other threads continue running so that doesn't work. Unless it's SIGSTOP, but you can't catch that, and I don't know what return code we'd end up with. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/19295 Here is the relevant piece of the build log for the reference
|
Timed out on the very first build 🤣 On second thought this test is better disabled. It's more correct than it was, but it's never going to be truly stable. |
…s times out (llvm#143732) Fixes llvm#101162 This test did this: * SBDebugger::Initialize * Spawn a bunch of threads that do: * SBDebugger::Create * some work * SBDebugger::Destroy * Wait on those threads to finish then call SBDebugger::Terminate and exit, or - * Reach a time limit before all the threads finish, call SBDebugger::Terminate and exit. The problem was that in the timeout case, calling SBDebugger::Terminate destroys data being used by threads that are still running. I expect this test was expecting said threads to be so broken they were probably stuck, but when the machine is just heavily loaded, one of them might read that data before the whole program exits. This means what should have been a timeout becomes a crash. Sometimes. Which explains why we saw both timeouts and various signals on the AArch64 Linux bot. It depends on the timings. So I'm changing it not to call SBDebugger::Terminate in the timeout case. We will have to tweak the timeout value based on what happens on the buildbot, but we will know it's machine load not an lldb bug. Also use _exit instead of exit, to skip more cleanup that might cause a crash.
…s times out (llvm#143732) Fixes llvm#101162 This test did this: * SBDebugger::Initialize * Spawn a bunch of threads that do: * SBDebugger::Create * some work * SBDebugger::Destroy * Wait on those threads to finish then call SBDebugger::Terminate and exit, or - * Reach a time limit before all the threads finish, call SBDebugger::Terminate and exit. The problem was that in the timeout case, calling SBDebugger::Terminate destroys data being used by threads that are still running. I expect this test was expecting said threads to be so broken they were probably stuck, but when the machine is just heavily loaded, one of them might read that data before the whole program exits. This means what should have been a timeout becomes a crash. Sometimes. Which explains why we saw both timeouts and various signals on the AArch64 Linux bot. It depends on the timings. So I'm changing it not to call SBDebugger::Terminate in the timeout case. We will have to tweak the timeout value based on what happens on the buildbot, but we will know it's machine load not an lldb bug. Also use _exit instead of exit, to skip more cleanup that might cause a crash.
Fixes #101162
This test did this:
The problem was that in the timeout case, calling SBDebugger::Terminate destroys data being used by threads that are still running. I expect this test was expecting said threads to be so broken they were probably stuck, but when the machine is just heavily loaded, one of them might read that data before the whole program exits.
This means what should have been a timeout becomes a crash. Sometimes. Which explains why we saw both timeouts and various signals on the AArch64 Linux bot. It depends on the timings.
So I'm changing it not to call SBDebugger::Terminate in the timeout case. We will have to tweak the timeout value based on what happens on the buildbot, but we will know it's machine load not an lldb bug.
Also use _exit instead of exit, to skip more cleanup that might cause a crash.