-
Notifications
You must be signed in to change notification settings - Fork 14.3k
Disable ThreadPlanSingleThreadTimeout during step over breakpoint #104532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable ThreadPlanSingleThreadTimeout during step over breakpoint #104532
Conversation
@llvm/pr-subscribers-lldb Author: None (jeffreytan81) ChangesThis PR fixes another potential race condition in #90930. The failure is found by @labath with this log: https://paste.debian.net/hidden/30235a5c/. The investigation shows the failure happens when timeout is short and async interrupt is sent to lldb-server immediately after vCont so ptrace() resumes and then async interrupts debuggee immediately so debuggee does not get a chance to execute and move PC. So it enters stop mode immediately at original PC. To fix this, the PR prevents Full diff: https://github.com/llvm/llvm-project/pull/104532.diff 4 Files Affected:
diff --git a/lldb/include/lldb/Target/ThreadPlan.h b/lldb/include/lldb/Target/ThreadPlan.h
index c336b6bb37df1b..d6da484f4fc137 100644
--- a/lldb/include/lldb/Target/ThreadPlan.h
+++ b/lldb/include/lldb/Target/ThreadPlan.h
@@ -385,7 +385,13 @@ class ThreadPlan : public std::enable_shared_from_this<ThreadPlan>,
virtual void SetStopOthers(bool new_value);
virtual bool StopOthers();
-
+
+ // Returns true if the thread plan supports ThreadPlanSingleThreadTimeout to
+ // resume other threads after timeout. If the thread plan returns false it
+ // will prevent ThreadPlanSingleThreadTimeout from being created when this
+ // thread plan is alive.
+ virtual bool SupportsResumeOthers() { return true; }
+
virtual bool ShouldRunBeforePublicStop() { return false; }
// This is the wrapper for DoWillResume that does generic ThreadPlan logic,
diff --git a/lldb/include/lldb/Target/ThreadPlanStepOverBreakpoint.h b/lldb/include/lldb/Target/ThreadPlanStepOverBreakpoint.h
index 1f3aff45c49abe..0da8dbf44ffd8a 100644
--- a/lldb/include/lldb/Target/ThreadPlanStepOverBreakpoint.h
+++ b/lldb/include/lldb/Target/ThreadPlanStepOverBreakpoint.h
@@ -23,6 +23,7 @@ class ThreadPlanStepOverBreakpoint : public ThreadPlan {
void GetDescription(Stream *s, lldb::DescriptionLevel level) override;
bool ValidatePlan(Stream *error) override;
bool ShouldStop(Event *event_ptr) override;
+ bool SupportsResumeOthers() override;
bool StopOthers() override;
lldb::StateType GetPlanRunState() override;
bool WillStop() override;
diff --git a/lldb/source/Target/ThreadPlanSingleThreadTimeout.cpp b/lldb/source/Target/ThreadPlanSingleThreadTimeout.cpp
index 806ba95c508b7c..71be81365a2668 100644
--- a/lldb/source/Target/ThreadPlanSingleThreadTimeout.cpp
+++ b/lldb/source/Target/ThreadPlanSingleThreadTimeout.cpp
@@ -76,6 +76,9 @@ void ThreadPlanSingleThreadTimeout::PushNewWithTimeout(Thread &thread,
if (!thread.GetCurrentPlan()->StopOthers())
return;
+ if (!thread.GetCurrentPlan()->SupportsResumeOthers())
+ return;
+
auto timeout_plan = new ThreadPlanSingleThreadTimeout(thread, info);
ThreadPlanSP thread_plan_sp(timeout_plan);
auto status = thread.QueueThreadPlan(thread_plan_sp,
@@ -102,6 +105,9 @@ void ThreadPlanSingleThreadTimeout::ResumeFromPrevState(Thread &thread,
if (!thread.GetCurrentPlan()->StopOthers())
return;
+ if (!thread.GetCurrentPlan()->SupportsResumeOthers())
+ return;
+
auto timeout_plan = new ThreadPlanSingleThreadTimeout(thread, info);
ThreadPlanSP thread_plan_sp(timeout_plan);
auto status = thread.QueueThreadPlan(thread_plan_sp,
diff --git a/lldb/source/Target/ThreadPlanStepOverBreakpoint.cpp b/lldb/source/Target/ThreadPlanStepOverBreakpoint.cpp
index f88a2b895931cd..97c27ad4cd0493 100644
--- a/lldb/source/Target/ThreadPlanStepOverBreakpoint.cpp
+++ b/lldb/source/Target/ThreadPlanStepOverBreakpoint.cpp
@@ -103,6 +103,12 @@ bool ThreadPlanStepOverBreakpoint::ShouldStop(Event *event_ptr) {
bool ThreadPlanStepOverBreakpoint::StopOthers() { return true; }
+// The ThreadPlanSingleThreadTimeout can interrupt and resume all threads during
+// stepping, which may cause them to miss breakpoint. Therefore, we should
+// prevent the creation of ThreadPlanSingleThreadTimeout during a step-over
+// breakpoint.
+bool ThreadPlanStepOverBreakpoint::SupportsResumeOthers() { return false; }
+
StateType ThreadPlanStepOverBreakpoint::GetPlanRunState() {
return eStateStepping;
}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems pretty clean, but we need to get Jim's opinion in case there is a better way to do this that he would prefer.
// The ThreadPlanSingleThreadTimeout can interrupt and resume all threads during | ||
// stepping, which may cause them to miss breakpoint. Therefore, we should | ||
// prevent the creation of ThreadPlanSingleThreadTimeout during a step-over | ||
// breakpoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be a bit more clear to say something like:
// This thread plan does a single instruction step over a breakpoint instruction and needs
// to not resume other threads, so return false to stop the ThreadPlanSingleThreadTimeout
// from timing out and trying to resume all threads. If all threads gets resumed before we
// disable, single step and re-enable the breakpoint, we can miss breakpoints on other
// threads.
Thanks for looking into this. I'll also defer to Jim, but I'll note two things:
|
If there are a few (< 10 maybe) specific lines from the log that are the key ones, please include those in the PR description. Just because there's a slim chance someone in the future might have their own strange timeouts and it would be a useful clue. If the specific packets are not useful to see then no worries. |
Yes, I discussed this concern with @clayborg before writing it. However, this is not a new issue—the default 'step over' (next command) uses ThreadPlanStepOverBreakpoint to trace a single instruction across a breakpoint without resuming other threads, so it could, in theory, encounter the syscall deadlock as well. @jimingham, let me know if this PR makes sense to you. |
This seems an okay solution for now. We really don't want to miss breakpoint hits if we can at all avoid it. This isn't a regression, it's just one case where the proposed enhancement to stepping doesn't enhance stepping. We haven't had many (any?) reports of single stepping over a breakpoint blocking because we're running only one thread. Mostly that's because the actual instructions that might block are in system libraries that people tend to Note, there's no inherent problem with the instruction we're stepping over not returning, it's only an issue if WE cause it to not return. By "block" we really mean "artificially block because the thread that should have caused the return was suspended by lldb". So this would have to be something like a read that was expecting another thread in the program to be the writer, which also limits the scope of the problem. In any case, if we are going to implement true "non-stop" debugging we will need to avoid the "remove and replace instructions" dance and instead leave the traps always in place. Since we are contractually obligated not to stop the other threads from the outside, that's the only way we can avoid missing breakpoint hits. We'll have to come up with some scheme to execute the instruction that was under the breakpoint out of place, so that we never have to remove the traps. At that point, ThreadPlanStepOverBreakpoint will be fine with running the other threads, and this problem will go away. |
@jimingham, right, I am pretty sure that Microsoft Visual Studio debugger implements step-over breakpoint the same as lldb so it is a universal standard behavior and I haven't heard anyone really complaining about this. Based on this, if there is no further concern, anyone can accept PR so that we can address the issue? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR fixes another race condition in #90930. The failure was found by @labath with this log: https://paste.debian.net/hidden/30235a5c/:
It shows an async interrupt "\x03" was sent immediately after
vCont;s
single step over breakpoint at address0x224505
(which was disabled before vCont). And the later stop was still at the original PC (0x224505) not moving forward.The investigation shows the failure happens when timeout is short and async interrupt is sent to lldb-server immediately after vCont so ptrace() resumes and then async interrupts debuggee immediately so debuggee does not get a chance to execute and move PC. So it enters stop mode immediately at original PC.
ThreadPlanStepOverBreakpoint
does not expect PC not moving and reports stop at the original place.To fix this, the PR prevents
ThreadPlanSingleThreadTimeout
from being created duringThreadPlanStepOverBreakpoint
by introduces a newSupportsResumeOthers()
method andThreadPlanStepOverBreakpoint
returns false for it. This makes sense because we should never resume threads during step over breakpoint anyway otherwise it might cause other threads to miss breakpoint.