[lldb] Add pc check for thread-step-by-bp algorithms #108504

jasonmolenda · 2024-09-13T06:25:29Z

lldb-server built with NativeProcessLinux.cpp and NativeProcessFreeBSD.cpp can use breakpoints to implement instruction stepping on cores where there is no native instruction-step primitive. Currently these set a breakpoint, continue, and if we hit the breakpoint with the original thread, set the stop reason to be "trace".

I am wrapping up a change to lldb's breakpoint algorithm where I change its current behavior of

"if a thread stops at a breakpoint site, we set
the thread's stop reason to breakpoint-hit, even if the breakpoint hasn't been executed" +
"when resuming any thread at a breakpoint site, instruction-step past the breakpoint before resuming"

to a behavior of

"when a thread executes a breakpoint, set the stop reason to breakpoint-hit" +
"when a thread has hit a breakpoint, when the thread resumes, we silently step past the breakpoint and then resume the thread".

For these lldb-server targets doing breakpoint stepping, this means that if we are sitting on a breakpoint that has not yet executed, and instruction-step the thread, we will execute the breakpoint instruction at $pc (instead of $next-pc where it meant to go), and stop again -- at the same pc value. Then we will rewrite the stop reason to 'trace'. The higher level logic will see that we haven't hit the breakpoint instruction again, so it will try to instruction step again, hitting the breakpoint again forever.

To fix this, I'm checking that the thread matches the one we are instruction-stepping-by-breakpoint AND that we've stopped at the breakpoint address we are stepping to. Only in that case will the stop reason be rewritten to "trace" hiding the implementation detail that the step was done by breakpoints.

lldb-server built with NativeProcessLinux.cpp and NativeProcessFreeBSD.cpp can use breakpoints to implement instruction stepping on cores where there is no native instruction-step primitive. Currently these set a breakpoint, continue, and if we hit the breakpoint with the original thread, set the stop reason to be "trace". I am wrapping up a change to lldb's breakpoint algorithm where I change its current behavior of "if a thread stops at a breakpoint site, we set the thread's stop reason to breakpoint-hit, even if the breakpoint hasn't been executed" + "when resuming any thread at a breakpoint site, instruction-step past the breakpoint before resuming" to a behavior of "when a thread executes a breakpoint, set the stop reason to breakpoint-hit" + "when a thread has hit a breakpoint, when the thread resumes, we silently step past the breakpoint and then resume the thread". For these lldb-server targets doing breakpoint stepping, this means that if we are sitting on a breakpoint that has not yet executed, and instruction-step the thread, we will execute the breakpoint instruction at $pc (instead of $next-pc where it meant to go), and stop again -- at the same pc value. Then we will rewrite the stop reason to 'trace'. The higher level logic will see that we haven't hit the breakpoint instruction again, so it will try to instruction step again, hitting the breakpoint again forever. To fix this, I'm checking that the thread matches the one we are instruction-stepping-by-breakpoint AND that we've stopped at the breakpoint address we are stepping to. Only in that case will the stop reason be rewritten to "trace" hiding the implementation detail that the step was done by breakpoints.

llvmbot · 2024-09-13T06:26:07Z

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

lldb-server built with NativeProcessLinux.cpp and NativeProcessFreeBSD.cpp can use breakpoints to implement instruction stepping on cores where there is no native instruction-step primitive. Currently these set a breakpoint, continue, and if we hit the breakpoint with the original thread, set the stop reason to be "trace".

I am wrapping up a change to lldb's breakpoint algorithm where I change its current behavior of

"if a thread stops at a breakpoint site, we set
the thread's stop reason to breakpoint-hit, even if the breakpoint hasn't been executed" +
"when resuming any thread at a breakpoint site, instruction-step past the breakpoint before resuming"

to a behavior of

"when a thread executes a breakpoint, set the stop reason to breakpoint-hit" +
"when a thread has hit a breakpoint, when the thread resumes, we silently step past the breakpoint and then resume the thread".

For these lldb-server targets doing breakpoint stepping, this means that if we are sitting on a breakpoint that has not yet executed, and instruction-step the thread, we will execute the breakpoint instruction at $pc (instead of $next-pc where it meant to go), and stop again -- at the same pc value. Then we will rewrite the stop reason to 'trace'. The higher level logic will see that we haven't hit the breakpoint instruction again, so it will try to instruction step again, hitting the breakpoint again forever.

To fix this, I'm checking that the thread matches the one we are instruction-stepping-by-breakpoint AND that we've stopped at the breakpoint address we are stepping to. Only in that case will the stop reason be rewritten to "trace" hiding the implementation detail that the step was done by breakpoints.

Full diff: https://github.com/llvm/llvm-project/pull/108504.diff

2 Files Affected:

(modified) lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp (+4-1)
(modified) lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp (+5-2)

diff --git a/lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp b/lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
index 97fff4b9f65a82..80b27571f43d55 100644
--- a/lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
+++ b/lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
@@ -319,9 +319,12 @@ void NativeProcessFreeBSD::MonitorSIGTRAP(lldb::pid_t pid) {
                info.pl_siginfo.si_addr);
 
       if (thread) {
+        auto &regctx = static_cast<NativeRegisterContextFreeBSD &>(
+            thread->GetRegisterContext());
         auto thread_info =
             m_threads_stepping_with_breakpoint.find(thread->GetID());
-        if (thread_info != m_threads_stepping_with_breakpoint.end()) {
+        if (thread_info != m_threads_stepping_with_breakpoint.end() &&
+            threads_info->second == regctx.GetPC()) {
           thread->SetStoppedByTrace();
           Status brkpt_error = RemoveBreakpoint(thread_info->second);
           if (brkpt_error.Fail())
diff --git a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
index 5c262db8db7fde..38b7092682873b 100644
--- a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
+++ b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
@@ -829,8 +829,11 @@ void NativeProcessLinux::MonitorBreakpoint(NativeThreadLinux &thread) {
   thread.SetStoppedByBreakpoint();
   FixupBreakpointPCAsNeeded(thread);
 
-  if (m_threads_stepping_with_breakpoint.find(thread.GetID()) !=
-      m_threads_stepping_with_breakpoint.end())
+  NativeRegisterContextLinux &reg_ctx = thread.GetRegisterContext();
+  auto stepping_with_bp_it =
+      m_threads_stepping_with_breakpoint.find(thread.GetID());
+  if (stepping_with_bp_it != m_threads_stepping_with_breakpoint.end() &&
+      stepping_with_bp_it->second == reg_ctx.GetPC())
     thread.SetStoppedByTrace();
 
   StopRunningThreads(thread.GetID());

jasonmolenda · 2024-09-13T06:27:35Z

This is addressing the discussion about the CI failures (and David's repoing of them) from when I tried to land #96260 .

I had this as a commit for the PR where I'm addressing the CI fails so I can re-land ( #105594 ) but this one has no dependance on my change and I'd like to land it separately, like I already did with the Dexter fix to work with current lldb or my new lldb stepping behavior.

These plugins are only used for targets without an instruction step capability and I don't have access to any of those so I haven't tested it -- but I think it's straightforward enough that landing it and seeing the CI bots react is fine.

Landing this in a separate PR llvm#108504 This reverts commit 48a0fc1.

DavidSpickett · 2024-09-13T10:40:11Z

"if a thread stops at a breakpoint site, we set the thread's stop reason to breakpoint-hit, even if the breakpoint hasn't been executed"

What does it look like for a thread to be stopped at a breakpoint site but not to have executed the breakpoint?

Is this just a timing thing, if one thread executes the breakpoint and that causes all others to stop, and one of those others happens to be about to execute the same breakpoint?

T0:
  nop
  brk << $pc, hit the breakpoint and generated the debug exception

T1:
  nop
  brk << also $pc but didn't execute it yet

labath · 2024-09-13T11:07:31Z

It can happen through timing, but also in single-threaded code as well. A thread could reach-but-not execute a breakpoint insn due to some activity on the previous insn. E.g. it could be single-stepping over it (so the stop reason is "i've completed a step", not "I've hit a breakpoint"), or (on x86, where watchpoint hits are reported after the fact), the previous insn could have triggered a watchpoint.

None of this really matters for this patch though. The problem here was that the lldb-server implementation has assuming that the stop reason of a software-single-stepping thread must be "i've completed a step" , but that won't be the case if the instruction you are stepping over is actually a breakpoint insn -- in that case, the reason will be "i've hit a breakpoint (other than the temporary bkpt i've set to single-step)". It just so happens that without that Jason's patch, lldb would never actually try to step over a breakpoint insn it knew of, so we'd never hit this problem.

DavidSpickett · 2024-09-13T12:21:08Z

Thanks I understand now.

jasonmolenda · 2024-09-13T15:59:21Z

Yeah, the problem with our current algorithm is that you lose the actual stop reason, and sometimes that's important. For instance, if you have a store instruction that modifies a watchpointed memory, and you have a user breakpoint on the next instruction, on AArch64, the store triggers the watchpoint. lldb reads the old value, instruction steps, reads the new value and then tells the user. But now it's sitting on a breakpoint site so that overwrites the watchpoint stop reason and we never tell the user about that - we tell it we've hit the breakpoint. gdb behaves the same way.

The user-visible down side to this new approach was seen with Dexter debuginfo engine, where it sets a breakpoint on every source line in a program, and then does step-in across the source lines. In this case, the step-in executes to the first instruction of the next source line (where there is a breakpoint). So we have a "step-in completed" stop reason. Then Dexter step-in's again, and now it executes the breakpoint instruction and stops, without having advanced the pc, with a "breakpoint hit" stop reason. then doing a step-in will advance to the next source line.

Users will surely experience this (with only one breakpoint in code they're stepping over), and I'm sure they will scratch their heads about why they needed to Step twice to get past a breakpoint (if they notice at all), but that's the only bad fallout from this change to the algorithm.

…109643) Apparently a typo is causing compile error, added by #108504.

lldb-server built with NativeProcessLinux.cpp and NativeProcessFreeBSD.cpp can use breakpoints to implement instruction stepping on cores where there is no native instruction-step primitive. Currently these set a breakpoint, continue, and if we hit the breakpoint with the original thread, set the stop reason to be "trace". I am wrapping up a change to lldb's breakpoint algorithm where I change its current behavior of "if a thread stops at a breakpoint site, we set the thread's stop reason to breakpoint-hit, even if the breakpoint hasn't been executed" + "when resuming any thread at a breakpoint site, instruction-step past the breakpoint before resuming" to a behavior of "when a thread executes a breakpoint, set the stop reason to breakpoint-hit" + "when a thread has hit a breakpoint, when the thread resumes, we silently step past the breakpoint and then resume the thread". For these lldb-server targets doing breakpoint stepping, this means that if we are sitting on a breakpoint that has not yet executed, and instruction-step the thread, we will execute the breakpoint instruction at $pc (instead of $next-pc where it meant to go), and stop again -- at the same pc value. Then we will rewrite the stop reason to 'trace'. The higher level logic will see that we haven't hit the breakpoint instruction again, so it will try to instruction step again, hitting the breakpoint again forever. To fix this, I'm checking that the thread matches the one we are instruction-stepping-by-breakpoint AND that we've stopped at the breakpoint address we are stepping to. Only in that case will the stop reason be rewritten to "trace" hiding the implementation detail that the step was done by breakpoints. (cherry picked from commit 213c59d)

…lvm#109643) Apparently a typo is causing compile error, added by llvm#108504. (cherry picked from commit 85220a0)

lldb-server built with NativeProcessLinux.cpp and NativeProcessFreeBSD.cpp can use breakpoints to implement instruction stepping on cores where there is no native instruction-step primitive. Currently these set a breakpoint, continue, and if we hit the breakpoint with the original thread, set the stop reason to be "trace". I am wrapping up a change to lldb's breakpoint algorithm where I change its current behavior of "if a thread stops at a breakpoint site, we set the thread's stop reason to breakpoint-hit, even if the breakpoint hasn't been executed" + "when resuming any thread at a breakpoint site, instruction-step past the breakpoint before resuming" to a behavior of "when a thread executes a breakpoint, set the stop reason to breakpoint-hit" + "when a thread has hit a breakpoint, when the thread resumes, we silently step past the breakpoint and then resume the thread". For these lldb-server targets doing breakpoint stepping, this means that if we are sitting on a breakpoint that has not yet executed, and instruction-step the thread, we will execute the breakpoint instruction at $pc (instead of $next-pc where it meant to go), and stop again -- at the same pc value. Then we will rewrite the stop reason to 'trace'. The higher level logic will see that we haven't hit the breakpoint instruction again, so it will try to instruction step again, hitting the breakpoint again forever. To fix this, I'm checking that the thread matches the one we are instruction-stepping-by-breakpoint AND that we've stopped at the breakpoint address we are stepping to. Only in that case will the stop reason be rewritten to "trace" hiding the implementation detail that the step was done by breakpoints. (cherry picked from commit 213c59d)

…lvm#109643) Apparently a typo is causing compile error, added by llvm#108504. (cherry picked from commit 85220a0)

lldb-server built with NativeProcessLinux.cpp and NativeProcessFreeBSD.cpp can use breakpoints to implement instruction stepping on cores where there is no native instruction-step primitive. Currently these set a breakpoint, continue, and if we hit the breakpoint with the original thread, set the stop reason to be "trace". I am wrapping up a change to lldb's breakpoint algorithm where I change its current behavior of "if a thread stops at a breakpoint site, we set the thread's stop reason to breakpoint-hit, even if the breakpoint hasn't been executed" + "when resuming any thread at a breakpoint site, instruction-step past the breakpoint before resuming" to a behavior of "when a thread executes a breakpoint, set the stop reason to breakpoint-hit" + "when a thread has hit a breakpoint, when the thread resumes, we silently step past the breakpoint and then resume the thread". For these lldb-server targets doing breakpoint stepping, this means that if we are sitting on a breakpoint that has not yet executed, and instruction-step the thread, we will execute the breakpoint instruction at $pc (instead of $next-pc where it meant to go), and stop again -- at the same pc value. Then we will rewrite the stop reason to 'trace'. The higher level logic will see that we haven't hit the breakpoint instruction again, so it will try to instruction step again, hitting the breakpoint again forever. To fix this, I'm checking that the thread matches the one we are instruction-stepping-by-breakpoint AND that we've stopped at the breakpoint address we are stepping to. Only in that case will the stop reason be rewritten to "trace" hiding the implementation detail that the step was done by breakpoints. (cherry picked from commit 213c59d)

…lvm#109643) Apparently a typo is causing compile error, added by llvm#108504. (cherry picked from commit 85220a0)

jasonmolenda requested review from DavidSpickett and labath September 13, 2024 06:25

jasonmolenda requested a review from JDevlieghere as a code owner September 13, 2024 06:25

llvmbot added the lldb label Sep 13, 2024

jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Sep 13, 2024

Revert "Add pc check for instruction stepping by breakpoints"

5b5a5b6

Landing this in a separate PR llvm#108504 This reverts commit 48a0fc1.

jasonmolenda mentioned this pull request Sep 13, 2024

[lldb] Set the stop reason when receiving swbreak/hwbreak #108518

Merged

labath approved these changes Sep 13, 2024

View reviewed changes

jasonmolenda merged commit 213c59d into llvm:main Sep 13, 2024
9 checks passed

jasonmolenda deleted the trace-by-breakpoints-enforce-pc-check branch September 13, 2024 16:02

DavidSpickett mentioned this pull request Sep 23, 2024

[lldb][FreeBSD] Fix a typo in NativeProcessFreeBSD::MonitorSIGTRAP() #109643

Merged

DavidSpickett pushed a commit that referenced this pull request Sep 23, 2024

[lldb][FreeBSD] Fix a typo in NativeProcessFreeBSD::MonitorSIGTRAP() (#…

85220a0

…109643) Apparently a typo is causing compile error, added by #108504.

jasonmolenda mentioned this pull request Feb 13, 2025

[lldb] Change lldb's breakpoint handling behavior, reland #126988

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lldb] Add pc check for thread-step-by-bp algorithms #108504

[lldb] Add pc check for thread-step-by-bp algorithms #108504

Uh oh!

jasonmolenda commented Sep 13, 2024

Uh oh!

llvmbot commented Sep 13, 2024

Uh oh!

jasonmolenda commented Sep 13, 2024

Uh oh!

DavidSpickett commented Sep 13, 2024

Uh oh!

labath commented Sep 13, 2024

Uh oh!

DavidSpickett commented Sep 13, 2024

Uh oh!

jasonmolenda commented Sep 13, 2024

Uh oh!

Uh oh!

Uh oh!

[lldb] Add pc check for thread-step-by-bp algorithms #108504

[lldb] Add pc check for thread-step-by-bp algorithms #108504

Uh oh!

Conversation

jasonmolenda commented Sep 13, 2024

Uh oh!

llvmbot commented Sep 13, 2024

Uh oh!

jasonmolenda commented Sep 13, 2024

Uh oh!

DavidSpickett commented Sep 13, 2024

Uh oh!

labath commented Sep 13, 2024

Uh oh!

DavidSpickett commented Sep 13, 2024

Uh oh!

jasonmolenda commented Sep 13, 2024

Uh oh!

Uh oh!

Uh oh!