Skip to content

[lldb] Add pc check for thread-step-by-bp algorithms #108504

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jasonmolenda
Copy link
Collaborator

lldb-server built with NativeProcessLinux.cpp and NativeProcessFreeBSD.cpp can use breakpoints to implement instruction stepping on cores where there is no native instruction-step primitive. Currently these set a breakpoint, continue, and if we hit the breakpoint with the original thread, set the stop reason to be "trace".

I am wrapping up a change to lldb's breakpoint algorithm where I change its current behavior of

"if a thread stops at a breakpoint site, we set
the thread's stop reason to breakpoint-hit, even if the breakpoint hasn't been executed" +
"when resuming any thread at a breakpoint site, instruction-step past the breakpoint before resuming"

to a behavior of

"when a thread executes a breakpoint, set the stop reason to breakpoint-hit" +
"when a thread has hit a breakpoint, when the thread resumes, we silently step past the breakpoint and then resume the thread".

For these lldb-server targets doing breakpoint stepping, this means that if we are sitting on a breakpoint that has not yet executed, and instruction-step the thread, we will execute the breakpoint instruction at $pc (instead of $next-pc where it meant to go), and stop again -- at the same pc value. Then we will rewrite the stop reason to 'trace'. The higher level logic will see that we haven't hit the breakpoint instruction again, so it will try to instruction step again, hitting the breakpoint again forever.

To fix this, I'm checking that the thread matches the one we are instruction-stepping-by-breakpoint AND that we've stopped at the breakpoint address we are stepping to. Only in that case will the stop reason be rewritten to "trace" hiding the implementation detail that the step was done by breakpoints.

lldb-server built with NativeProcessLinux.cpp and NativeProcessFreeBSD.cpp
can use breakpoints to implement instruction stepping on cores where
there is no native instruction-step primitive.  Currently these set a
breakpoint, continue, and if we hit the breakpoint with the original
thread, set the stop reason to be "trace".

I am wrapping up a change to lldb's breakpoint algorithm where I change
its current behavior of

"if a thread stops at a breakpoint site, we set
the thread's stop reason to breakpoint-hit, even if the breakpoint
hasn't been executed" +
"when resuming any thread at a breakpoint site, instruction-step past
the breakpoint before resuming"

to a behavior of

"when a thread executes a breakpoint, set the stop reason to
breakpoint-hit" +
"when a thread has hit a breakpoint, when the thread resumes,
we silently step past the breakpoint and then resume the thread".

For these lldb-server targets doing breakpoint stepping, this means
that if we are sitting on a breakpoint that has not yet executed,
and instruction-step the thread, we will execute the breakpoint
instruction at $pc (instead of $next-pc where it meant to go), and
stop again -- at the same pc value.  Then we will rewrite the stop
reason to 'trace'.  The higher level logic will see that we haven't
hit the breakpoint instruction again, so it will try to instruction
step again, hitting the breakpoint again forever.

To fix this, I'm checking that the thread matches the one we are
instruction-stepping-by-breakpoint AND that we've stopped at the
breakpoint address we are stepping to.  Only in that case will the
stop reason be rewritten to "trace" hiding the implementation detail
that the step was done by breakpoints.
@llvmbot
Copy link
Member

llvmbot commented Sep 13, 2024

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

lldb-server built with NativeProcessLinux.cpp and NativeProcessFreeBSD.cpp can use breakpoints to implement instruction stepping on cores where there is no native instruction-step primitive. Currently these set a breakpoint, continue, and if we hit the breakpoint with the original thread, set the stop reason to be "trace".

I am wrapping up a change to lldb's breakpoint algorithm where I change its current behavior of

"if a thread stops at a breakpoint site, we set
the thread's stop reason to breakpoint-hit, even if the breakpoint hasn't been executed" +
"when resuming any thread at a breakpoint site, instruction-step past the breakpoint before resuming"

to a behavior of

"when a thread executes a breakpoint, set the stop reason to breakpoint-hit" +
"when a thread has hit a breakpoint, when the thread resumes, we silently step past the breakpoint and then resume the thread".

For these lldb-server targets doing breakpoint stepping, this means that if we are sitting on a breakpoint that has not yet executed, and instruction-step the thread, we will execute the breakpoint instruction at $pc (instead of $next-pc where it meant to go), and stop again -- at the same pc value. Then we will rewrite the stop reason to 'trace'. The higher level logic will see that we haven't hit the breakpoint instruction again, so it will try to instruction step again, hitting the breakpoint again forever.

To fix this, I'm checking that the thread matches the one we are instruction-stepping-by-breakpoint AND that we've stopped at the breakpoint address we are stepping to. Only in that case will the stop reason be rewritten to "trace" hiding the implementation detail that the step was done by breakpoints.


Full diff: https://github.com/llvm/llvm-project/pull/108504.diff

2 Files Affected:

  • (modified) lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp (+4-1)
  • (modified) lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp (+5-2)
diff --git a/lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp b/lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
index 97fff4b9f65a82..80b27571f43d55 100644
--- a/lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
+++ b/lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
@@ -319,9 +319,12 @@ void NativeProcessFreeBSD::MonitorSIGTRAP(lldb::pid_t pid) {
                info.pl_siginfo.si_addr);
 
       if (thread) {
+        auto &regctx = static_cast<NativeRegisterContextFreeBSD &>(
+            thread->GetRegisterContext());
         auto thread_info =
             m_threads_stepping_with_breakpoint.find(thread->GetID());
-        if (thread_info != m_threads_stepping_with_breakpoint.end()) {
+        if (thread_info != m_threads_stepping_with_breakpoint.end() &&
+            threads_info->second == regctx.GetPC()) {
           thread->SetStoppedByTrace();
           Status brkpt_error = RemoveBreakpoint(thread_info->second);
           if (brkpt_error.Fail())
diff --git a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
index 5c262db8db7fde..38b7092682873b 100644
--- a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
+++ b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
@@ -829,8 +829,11 @@ void NativeProcessLinux::MonitorBreakpoint(NativeThreadLinux &thread) {
   thread.SetStoppedByBreakpoint();
   FixupBreakpointPCAsNeeded(thread);
 
-  if (m_threads_stepping_with_breakpoint.find(thread.GetID()) !=
-      m_threads_stepping_with_breakpoint.end())
+  NativeRegisterContextLinux &reg_ctx = thread.GetRegisterContext();
+  auto stepping_with_bp_it =
+      m_threads_stepping_with_breakpoint.find(thread.GetID());
+  if (stepping_with_bp_it != m_threads_stepping_with_breakpoint.end() &&
+      stepping_with_bp_it->second == reg_ctx.GetPC())
     thread.SetStoppedByTrace();
 
   StopRunningThreads(thread.GetID());

@jasonmolenda
Copy link
Collaborator Author

This is addressing the discussion about the CI failures (and David's repoing of them) from when I tried to land #96260 .

I had this as a commit for the PR where I'm addressing the CI fails so I can re-land ( #105594 ) but this one has no dependance on my change and I'd like to land it separately, like I already did with the Dexter fix to work with current lldb or my new lldb stepping behavior.

These plugins are only used for targets without an instruction step capability and I don't have access to any of those so I haven't tested it -- but I think it's straightforward enough that landing it and seeing the CI bots react is fine.

jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Sep 13, 2024
Landing this in a separate PR
llvm#108504

This reverts commit 48a0fc1.
@DavidSpickett
Copy link
Collaborator

"if a thread stops at a breakpoint site, we set the thread's stop reason to breakpoint-hit, even if the breakpoint hasn't been executed"

What does it look like for a thread to be stopped at a breakpoint site but not to have executed the breakpoint?

Is this just a timing thing, if one thread executes the breakpoint and that causes all others to stop, and one of those others happens to be about to execute the same breakpoint?

T0:
  nop
  brk << $pc, hit the breakpoint and generated the debug exception

T1:
  nop
  brk << also $pc but didn't execute it yet

@labath
Copy link
Collaborator

labath commented Sep 13, 2024

It can happen through timing, but also in single-threaded code as well. A thread could reach-but-not execute a breakpoint insn due to some activity on the previous insn. E.g. it could be single-stepping over it (so the stop reason is "i've completed a step", not "I've hit a breakpoint"), or (on x86, where watchpoint hits are reported after the fact), the previous insn could have triggered a watchpoint.

None of this really matters for this patch though. The problem here was that the lldb-server implementation has assuming that the stop reason of a software-single-stepping thread must be "i've completed a step" , but that won't be the case if the instruction you are stepping over is actually a breakpoint insn -- in that case, the reason will be "i've hit a breakpoint (other than the temporary bkpt i've set to single-step)". It just so happens that without that Jason's patch, lldb would never actually try to step over a breakpoint insn it knew of, so we'd never hit this problem.

@DavidSpickett
Copy link
Collaborator

Thanks I understand now.

@jasonmolenda
Copy link
Collaborator Author

Yeah, the problem with our current algorithm is that you lose the actual stop reason, and sometimes that's important. For instance, if you have a store instruction that modifies a watchpointed memory, and you have a user breakpoint on the next instruction, on AArch64, the store triggers the watchpoint. lldb reads the old value, instruction steps, reads the new value and then tells the user. But now it's sitting on a breakpoint site so that overwrites the watchpoint stop reason and we never tell the user about that - we tell it we've hit the breakpoint. gdb behaves the same way.

The user-visible down side to this new approach was seen with Dexter debuginfo engine, where it sets a breakpoint on every source line in a program, and then does step-in across the source lines. In this case, the step-in executes to the first instruction of the next source line (where there is a breakpoint). So we have a "step-in completed" stop reason. Then Dexter step-in's again, and now it executes the breakpoint instruction and stops, without having advanced the pc, with a "breakpoint hit" stop reason. then doing a step-in will advance to the next source line.

Users will surely experience this (with only one breakpoint in code they're stepping over), and I'm sure they will scratch their heads about why they needed to Step twice to get past a breakpoint (if they notice at all), but that's the only bad fallout from this change to the algorithm.

@jasonmolenda jasonmolenda merged commit 213c59d into llvm:main Sep 13, 2024
9 checks passed
@jasonmolenda jasonmolenda deleted the trace-by-breakpoints-enforce-pc-check branch September 13, 2024 16:02
DavidSpickett pushed a commit that referenced this pull request Sep 23, 2024
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Feb 13, 2025
lldb-server built with NativeProcessLinux.cpp and
NativeProcessFreeBSD.cpp can use breakpoints to implement instruction
stepping on cores where there is no native instruction-step primitive.
Currently these set a breakpoint, continue, and if we hit the breakpoint
with the original thread, set the stop reason to be "trace".

I am wrapping up a change to lldb's breakpoint algorithm where I change
its current behavior of

"if a thread stops at a breakpoint site, we set
the thread's stop reason to breakpoint-hit, even if the breakpoint
hasn't been executed" +
"when resuming any thread at a breakpoint site, instruction-step past
the breakpoint before resuming"

to a behavior of

"when a thread executes a breakpoint, set the stop reason to
breakpoint-hit" +
"when a thread has hit a breakpoint, when the thread resumes, we
silently step past the breakpoint and then resume the thread".

For these lldb-server targets doing breakpoint stepping, this means that
if we are sitting on a breakpoint that has not yet executed, and
instruction-step the thread, we will execute the breakpoint instruction
at $pc (instead of $next-pc where it meant to go), and stop again -- at
the same pc value. Then we will rewrite the stop reason to 'trace'. The
higher level logic will see that we haven't hit the breakpoint
instruction again, so it will try to instruction step again, hitting the
breakpoint again forever.

To fix this, I'm checking that the thread matches the one we are
instruction-stepping-by-breakpoint AND that we've stopped at the
breakpoint address we are stepping to. Only in that case will the stop
reason be rewritten to "trace" hiding the implementation detail that the
step was done by breakpoints.

(cherry picked from commit 213c59d)
jasonmolenda pushed a commit to jasonmolenda/llvm-project that referenced this pull request Feb 13, 2025
…lvm#109643)

Apparently a typo is causing compile error, added by llvm#108504.

(cherry picked from commit 85220a0)
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Mar 29, 2025
lldb-server built with NativeProcessLinux.cpp and
NativeProcessFreeBSD.cpp can use breakpoints to implement instruction
stepping on cores where there is no native instruction-step primitive.
Currently these set a breakpoint, continue, and if we hit the breakpoint
with the original thread, set the stop reason to be "trace".

I am wrapping up a change to lldb's breakpoint algorithm where I change
its current behavior of

"if a thread stops at a breakpoint site, we set
the thread's stop reason to breakpoint-hit, even if the breakpoint
hasn't been executed" +
"when resuming any thread at a breakpoint site, instruction-step past
the breakpoint before resuming"

to a behavior of

"when a thread executes a breakpoint, set the stop reason to
breakpoint-hit" +
"when a thread has hit a breakpoint, when the thread resumes, we
silently step past the breakpoint and then resume the thread".

For these lldb-server targets doing breakpoint stepping, this means that
if we are sitting on a breakpoint that has not yet executed, and
instruction-step the thread, we will execute the breakpoint instruction
at $pc (instead of $next-pc where it meant to go), and stop again -- at
the same pc value. Then we will rewrite the stop reason to 'trace'. The
higher level logic will see that we haven't hit the breakpoint
instruction again, so it will try to instruction step again, hitting the
breakpoint again forever.

To fix this, I'm checking that the thread matches the one we are
instruction-stepping-by-breakpoint AND that we've stopped at the
breakpoint address we are stepping to. Only in that case will the stop
reason be rewritten to "trace" hiding the implementation detail that the
step was done by breakpoints.

(cherry picked from commit 213c59d)
jasonmolenda pushed a commit to jasonmolenda/llvm-project that referenced this pull request Mar 29, 2025
…lvm#109643)

Apparently a typo is causing compile error, added by llvm#108504.

(cherry picked from commit 85220a0)
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Apr 8, 2025
lldb-server built with NativeProcessLinux.cpp and
NativeProcessFreeBSD.cpp can use breakpoints to implement instruction
stepping on cores where there is no native instruction-step primitive.
Currently these set a breakpoint, continue, and if we hit the breakpoint
with the original thread, set the stop reason to be "trace".

I am wrapping up a change to lldb's breakpoint algorithm where I change
its current behavior of

"if a thread stops at a breakpoint site, we set
the thread's stop reason to breakpoint-hit, even if the breakpoint
hasn't been executed" +
"when resuming any thread at a breakpoint site, instruction-step past
the breakpoint before resuming"

to a behavior of

"when a thread executes a breakpoint, set the stop reason to
breakpoint-hit" +
"when a thread has hit a breakpoint, when the thread resumes, we
silently step past the breakpoint and then resume the thread".

For these lldb-server targets doing breakpoint stepping, this means that
if we are sitting on a breakpoint that has not yet executed, and
instruction-step the thread, we will execute the breakpoint instruction
at $pc (instead of $next-pc where it meant to go), and stop again -- at
the same pc value. Then we will rewrite the stop reason to 'trace'. The
higher level logic will see that we haven't hit the breakpoint
instruction again, so it will try to instruction step again, hitting the
breakpoint again forever.

To fix this, I'm checking that the thread matches the one we are
instruction-stepping-by-breakpoint AND that we've stopped at the
breakpoint address we are stepping to. Only in that case will the stop
reason be rewritten to "trace" hiding the implementation detail that the
step was done by breakpoints.

(cherry picked from commit 213c59d)
jasonmolenda pushed a commit to jasonmolenda/llvm-project that referenced this pull request Apr 8, 2025
…lvm#109643)

Apparently a typo is causing compile error, added by llvm#108504.

(cherry picked from commit 85220a0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants