Skip to content

[lldb] Change lldb's breakpoint handling behavior #96260

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jasonmolenda
Copy link
Collaborator

@jasonmolenda jasonmolenda commented Jun 21, 2024

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped at the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming.

One non-intuitive UX from this change, but I'm convinced it is necessary: If you're stopped at a BreakpointSite that has not yet executed, you stepi, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the thread plan is still on the stack. If you then continue the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. I discussed this with Jim, and trying to paper over this behavior will lead to more complicated scenarios behaving non-intuitively. And mostly it's the testsuite that was trying to instruction step past a breakpoint and getting thrown off -- and I changed those tests to expect the new behavior.

The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

  1. If we've stopped at a BreakpointSite (whether we hit it or not), we call Thread::SetThreadStoppedAtBreakpointSite(pc) to record the state at the point when the thread stopped. (so we can detect newly-added breakpoints, or when the pc is changed to an instruction that is a BreakpointSite)

  2. When we have actually hit a breakpoint (regardless of whether it is enablied for this thread), we call Thread::SetThreadHitBreakpointAtAddr(pc) so we know that it should be silently stepped past when we resume execution.

When resuming, we silently step over a breakpoint if we've hit it, or if it is newly added (or the pc was changed to an existing BreakpointSite).

The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch.

rdar://123942164

lldb today has two rules:  When a thread stops at a BreakpointSite,
we set the thread's StopReason to be "breakpoint hit" (regardless
if we've actually hit the breakpoint, or if we've merely stopped
*at* the breakpoint instruction/point and haven't tripped it yet).
And second, when resuming a process, any thread sitting at a
BreakpointSite is silently stepped over the BreakpointSite -- because
we've already flagged the breakpoint hit when we stopped there
originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint.  When we resume, we only silently step past a
BreakpointSite that we've registered as hit.  We preserve this state
across inferior function calls that the user may do while stopped,
etc.

Also, when a user adds a new breakpoint at $pc while stopped, or
changes $pc to be the address of a BreakpointSite, we will silently
step past that breakpoint when the process resumes.  This is purely
a UX call, I don't think there's any person who wants to set a
breakpoint at $pc and then hit it immediately on resuming.

One non-intuitive UX from this change, but I'm convinced it is
necessary:  If you're stopped at a BreakpointSite that has not yet
executed, you `stepi`, you will hit the breakpoint and the pc will
not yet advance.  This thread has not completed its stepi, and the
thread plan is still on the stack.  If you then `continue` the
thread, lldb will now stop and say, "instruction step completed",
one instruction past the BreakpointSite.  You can continue a second
time to resume execution.  I discussed this with Jim, and trying
to paper over this behavior will lead to more complicated scenarios
behaving non-intuitively.  And mostly it's the testsuite that was
trying to instruction step past a breakpoint and getting thrown off
-- and I changed those tests to expect the new behavior.

The bugs driving this change are all from lldb dropping the real
stop reason for a thread and setting it to breakpoint-hit when that
was not the case.  Jim hit one where we have an aarch64 watchpoint
that triggers one instruction before a BreakpointSite.  On this
arch we are notified of the watchpoint hit after the instruction
has been unrolled -- we disable the watchpoint, instruction step,
re-enable the watchpoint and collect the new value.  But now we're
on a BreakpointSite so the watchpoint-hit stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282
we attach to/launch a process with the pc at a BreakpointSite and
misbehave.  Caroline Tice mentioned it is also a problem they've
had with putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control
is that

1. If we've stopped at a BreakpointSite (whether we hit it or not),
we call Thread::SetThreadStoppedAtBreakpointSite(pc) to record the
state at the point when the thread stopped.  (so we can detect
newly-added breakpoints, or when the pc is changed to an instruction
that is a BreakpointSite)

2. When we have actually hit a breakpoint, and it is enabled for
this thread, we call Thread::SetThreadHitBreakpointAtAddr(pc) so
we know that it should be silently stepped past when we resume
execution.

When resuming, we silently step over a breakpoint if we've hit it,
or if it is newly added (or the pc was changed to an existing
BreakpointSite).

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason.  The Mach exception
codes differ in a few places depending on the target (unambiguously),
and I didn't want to duplicate the new code for each target so I've
tested what mach exceptions we get for each action on each target,
and reorganized StopInfoMachException::CreateStopReasonWithMachException
to document these possible values, and handle them without specializing
based on the target arch.

rdar://123942164
@llvmbot
Copy link
Member

llvmbot commented Jun 21, 2024

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped at the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming.

One non-intuitive UX from this change, but I'm convinced it is necessary: If you're stopped at a BreakpointSite that has not yet executed, you stepi, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the thread plan is still on the stack. If you then continue the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. I discussed this with Jim, and trying to paper over this behavior will lead to more complicated scenarios behaving non-intuitively. And mostly it's the testsuite that was trying to instruction step past a breakpoint and getting thrown off -- and I changed those tests to expect the new behavior.

The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

  1. If we've stopped at a BreakpointSite (whether we hit it or not), we call Thread::SetThreadStoppedAtBreakpointSite(pc) to record the state at the point when the thread stopped. (so we can detect newly-added breakpoints, or when the pc is changed to an instruction that is a BreakpointSite)

  2. When we have actually hit a breakpoint, and it is enabled for this thread, we call Thread::SetThreadHitBreakpointAtAddr(pc) so we know that it should be silently stepped past when we resume execution.

When resuming, we silently step over a breakpoint if we've hit it, or if it is newly added (or the pc was changed to an existing BreakpointSite).

The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch.

rdar://123942164


Patch is 33.48 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96260.diff

8 Files Affected:

  • (modified) lldb/include/lldb/Target/Thread.h (+29)
  • (modified) lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp (+118-178)
  • (modified) lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp (+3-13)
  • (modified) lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp (+36-82)
  • (modified) lldb/source/Plugins/Process/scripted/ScriptedThread.cpp (+9)
  • (modified) lldb/source/Target/Thread.cpp (+16-1)
  • (modified) lldb/test/API/functionalities/breakpoint/consecutive_breakpoints/TestConsecutiveBreakpoints.py (+19-7)
  • (modified) lldb/test/API/functionalities/breakpoint/step_over_breakpoint/TestStepOverBreakpoint.py (+5-1)
diff --git a/lldb/include/lldb/Target/Thread.h b/lldb/include/lldb/Target/Thread.h
index c17bddf4d98b8..1e1aead896018 100644
--- a/lldb/include/lldb/Target/Thread.h
+++ b/lldb/include/lldb/Target/Thread.h
@@ -128,6 +128,11 @@ class Thread : public std::enable_shared_from_this<Thread>,
         register_backup_sp; // You need to restore the registers, of course...
     uint32_t current_inlined_depth;
     lldb::addr_t current_inlined_pc;
+    lldb::addr_t
+        hit_bp_at_addr; // Set to the address of a breakpoint that we have hit.
+    lldb::addr_t bpsite_at_stop_pc; // Set to the address of a breakpoint
+                                    // instruction that we have not yet hit, but
+                                    // will hit when we resume.
   };
 
   /// Constructor
@@ -377,6 +382,19 @@ class Thread : public std::enable_shared_from_this<Thread>,
 
   virtual void SetQueueLibdispatchQueueAddress(lldb::addr_t dispatch_queue_t) {}
 
+  /// When a thread has executed/trapped a breakpoint, set the address of that
+  /// breakpoint so we know it has been hit already, and should be silently
+  /// stepped past on resume.
+  void SetThreadHitBreakpointAtAddr(lldb::addr_t pc) { m_hit_bp_at_addr = pc; }
+
+  /// When a thread stops at a breakpoint instruction/address, but has not yet
+  /// executed/triggered it, record that so we can detect when a user adds a
+  /// breakpoint (or changes a thread to a breakpoint site) and we need to
+  /// silently step past that when resuming.
+  void SetThreadStoppedAtBreakpointSite(lldb::addr_t pc) {
+    m_bpsite_at_stop_pc = pc;
+  }
+
   /// Whether this Thread already has all the Queue information cached or not
   ///
   /// A Thread may be associated with a libdispatch work Queue at a given
@@ -1311,6 +1329,17 @@ class Thread : public std::enable_shared_from_this<Thread>,
   bool m_should_run_before_public_stop;  // If this thread has "stop others" 
                                          // private work to do, then it will
                                          // set this.
+  lldb::addr_t m_hit_bp_at_addr;    // If this thread originally stopped at a
+                                    // breakpoint instruction, AND HIT IT,
+                                    // record the address of that breakpoint.
+                                    // LLDB_INVALID_ADDRESS if this thread did
+                                    // not stop at a breakpoint insn, or did not
+                                    // hit it yet.
+  lldb::addr_t m_bpsite_at_stop_pc; // If this thread originally stopped at a
+                                    // breakpoint site, record the address of
+                                    // that breakpoint site.
+                                    // LLDB_INVALID_ADDRESS if this thread did
+                                    // not stop at a breakpoint site.
   const uint32_t m_index_id; ///< A unique 1 based index assigned to each thread
                              /// for easy UI/command line access.
   lldb::RegisterContextSP m_reg_context_sp; ///< The register context for this
diff --git a/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp b/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
index 25cee369d7ee3..8087ef1accde8 100644
--- a/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
+++ b/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
@@ -488,38 +488,6 @@ const char *StopInfoMachException::GetDescription() {
   return m_description.c_str();
 }
 
-static StopInfoSP GetStopInfoForHardwareBP(Thread &thread, Target *target,
-                                           uint32_t exc_data_count,
-                                           uint64_t exc_sub_code,
-                                           uint64_t exc_sub_sub_code) {
-  // Try hardware watchpoint.
-  if (target) {
-    // The exc_sub_code indicates the data break address.
-    WatchpointResourceSP wp_rsrc_sp =
-        target->GetProcessSP()->GetWatchpointResourceList().FindByAddress(
-            (addr_t)exc_sub_code);
-    if (wp_rsrc_sp && wp_rsrc_sp->GetNumberOfConstituents() > 0) {
-      return StopInfo::CreateStopReasonWithWatchpointID(
-          thread, wp_rsrc_sp->GetConstituentAtIndex(0)->GetID());
-    }
-  }
-
-  // Try hardware breakpoint.
-  ProcessSP process_sp(thread.GetProcess());
-  if (process_sp) {
-    // The exc_sub_code indicates the data break address.
-    lldb::BreakpointSiteSP bp_sp =
-        process_sp->GetBreakpointSiteList().FindByAddress(
-            (lldb::addr_t)exc_sub_code);
-    if (bp_sp && bp_sp->IsEnabled()) {
-      return StopInfo::CreateStopReasonWithBreakpointSiteID(thread,
-                                                            bp_sp->GetID());
-    }
-  }
-
-  return nullptr;
-}
-
 #if defined(__APPLE__)
 const char *
 StopInfoMachException::MachException::Name(exception_type_t exc_type) {
@@ -633,171 +601,143 @@ StopInfoSP StopInfoMachException::CreateStopReasonWithMachException(
     }
     break;
 
+  // [exc_type, exc_code, exc_sub_code, exc_sub_sub_code]
+  //
+  // Instruction step:
+  //   [6, 1, 0]
+  //   Intel KDP [6, 3, ??]
+  //   armv7 [6, 0x102, <stop-pc>]  Same as software breakpoint!
+  //
+  // Software breakpoint:
+  //   x86 [6, 2, 0]
+  //   Intel KDP [6, 2, <bp-addr + 1>]
+  //   arm64 [6, 1, <bp-addr>]
+  //   armv7 [6, 0x102, <bp-addr>]  Same as instruction step!
+  //
+  // Hardware breakpoint:
+  //   x86 [6, 1, <bp-addr>, 0]
+  //   x86/Rosetta not implemented, see software breakpoint
+  //   arm64 [6, 1, <bp-addr>]
+  //   armv7 not implemented, see software breakpoint
+  //
+  // Hardware watchpoint:
+  //   x86 [6, 1, <accessed-addr>, 0] (both Intel hw and Rosetta)
+  //   arm64 [6, 0x102, <accessed-addr>, 0]
+  //   armv7 [6, 0x102, <accessed-addr>, 0]
+  //
+  // arm64 BRK instruction (imm arg not reflected in the ME)
+  //   [ 6, 1, <addr-of-BRK-insn>]
+  //
+  // In order of codes mach exceptions:
+  //   [6, 1, 0] - instruction step
+  //   [6, 1, <bp-addr>] - hardware breakpoint or watchpoint
+  //
+  //   [6, 2, 0] - software breakpoint
+  //   [6, 2, <bp-addr + 1>] - software breakpoint
+  //
+  //   [6, 3] - instruction step
+  //
+  //   [6, 0x102, <stop-pc>] armv7 instruction step
+  //   [6, 0x102, <bp-addr>] armv7 software breakpoint
+  //   [6, 0x102, <accessed-addr>, 0] arm64/armv7 watchpoint
   case 6: // EXC_BREAKPOINT
   {
-    bool is_actual_breakpoint = false;
-    bool is_trace_if_actual_breakpoint_missing = false;
-    switch (cpu) {
-    case llvm::Triple::x86:
-    case llvm::Triple::x86_64:
-      if (exc_code == 1) // EXC_I386_SGL
-      {
-        if (!exc_sub_code) {
-          // This looks like a plain trap.
-          // Have to check if there is a breakpoint here as well.  When you
-          // single-step onto a trap, the single step stops you not to trap.
-          // Since we also do that check below, let's just use that logic.
-          is_actual_breakpoint = true;
-          is_trace_if_actual_breakpoint_missing = true;
-        } else {
-          if (StopInfoSP stop_info =
-                  GetStopInfoForHardwareBP(thread, target, exc_data_count,
-                                           exc_sub_code, exc_sub_sub_code))
-            return stop_info;
-        }
-      } else if (exc_code == 2 || // EXC_I386_BPT
-                 exc_code == 3)   // EXC_I386_BPTFLT
-      {
-        // KDP returns EXC_I386_BPTFLT for trace breakpoints
-        if (exc_code == 3)
-          is_trace_if_actual_breakpoint_missing = true;
-
-        is_actual_breakpoint = true;
-        if (!pc_already_adjusted)
-          pc_decrement = 1;
-      }
-      break;
+    bool stopped_by_hitting_breakpoint = false;
+    bool stopped_by_completing_stepi = false;
+    bool stopped_watchpoint = false;
+    std::optional<addr_t> value;
+
+    // exc_code 1
+    if (exc_code == 1 && exc_sub_code == 0)
+      stopped_by_completing_stepi = true;
+    if (exc_code == 1 && exc_sub_code != 0) {
+      stopped_by_hitting_breakpoint = true;
+      stopped_watchpoint = true;
+      value = exc_sub_code;
+    }
 
-    case llvm::Triple::arm:
-    case llvm::Triple::thumb:
-      if (exc_code == 0x102) // EXC_ARM_DA_DEBUG
-      {
-        // LWP_TODO: We need to find the WatchpointResource that matches
-        // the address, and evaluate its Watchpoints.
-
-        // It's a watchpoint, then, if the exc_sub_code indicates a
-        // known/enabled data break address from our watchpoint list.
-        lldb::WatchpointSP wp_sp;
-        if (target)
-          wp_sp = target->GetWatchpointList().FindByAddress(
-              (lldb::addr_t)exc_sub_code);
-        if (wp_sp && wp_sp->IsEnabled()) {
-          return StopInfo::CreateStopReasonWithWatchpointID(thread,
-                                                            wp_sp->GetID());
-        } else {
-          is_actual_breakpoint = true;
-          is_trace_if_actual_breakpoint_missing = true;
-        }
-      } else if (exc_code == 1) // EXC_ARM_BREAKPOINT
-      {
-        is_actual_breakpoint = true;
-        is_trace_if_actual_breakpoint_missing = true;
-      } else if (exc_code == 0) // FIXME not EXC_ARM_BREAKPOINT but a kernel
-                                // is currently returning this so accept it
-                                // as indicating a breakpoint until the
-                                // kernel is fixed
-      {
-        is_actual_breakpoint = true;
-        is_trace_if_actual_breakpoint_missing = true;
-      }
-      break;
+    // exc_code 2
+    if (exc_code == 2 && exc_sub_code == 0)
+      stopped_by_hitting_breakpoint = true;
+    if (exc_code == 2 && exc_sub_code != 0) {
+      stopped_by_hitting_breakpoint = true;
+      // Intel KDP software breakpoint
+      if (!pc_already_adjusted)
+        pc_decrement = 1;
+    }
 
-    case llvm::Triple::aarch64_32:
-    case llvm::Triple::aarch64: {
-      // xnu describes three things with type EXC_BREAKPOINT:
-      //
-      //   exc_code 0x102 [EXC_ARM_DA_DEBUG], exc_sub_code addr-of-insn
-      //      Watchpoint access.  exc_sub_code is the address of the
-      //      instruction which trigged the watchpoint trap.
-      //      debugserver may add the watchpoint number that was triggered
-      //      in exc_sub_sub_code.
-      //
-      //   exc_code 1 [EXC_ARM_BREAKPOINT], exc_sub_code 0
-      //      Instruction step has completed.
-      //
-      //   exc_code 1 [EXC_ARM_BREAKPOINT], exc_sub_code address-of-instruction
-      //      Software breakpoint instruction executed.
-
-      if (exc_code == 1 && exc_sub_code == 0) // EXC_ARM_BREAKPOINT
-      {
-        // This is hit when we single instruction step aka MDSCR_EL1 SS bit 0
-        // is set
-        is_actual_breakpoint = true;
-        is_trace_if_actual_breakpoint_missing = true;
-        if (thread.GetTemporaryResumeState() != eStateStepping)
-          not_stepping_but_got_singlestep_exception = true;
-      }
-      if (exc_code == 0x102) // EXC_ARM_DA_DEBUG
-      {
-        // LWP_TODO: We need to find the WatchpointResource that matches
-        // the address, and evaluate its Watchpoints.
-
-        // It's a watchpoint, then, if the exc_sub_code indicates a
-        // known/enabled data break address from our watchpoint list.
-        lldb::WatchpointSP wp_sp;
-        if (target)
-          wp_sp = target->GetWatchpointList().FindByAddress(
-              (lldb::addr_t)exc_sub_code);
-        if (wp_sp && wp_sp->IsEnabled()) {
-          return StopInfo::CreateStopReasonWithWatchpointID(thread,
-                                                            wp_sp->GetID());
-        }
-        // EXC_ARM_DA_DEBUG seems to be reused for EXC_BREAKPOINT as well as
-        // EXC_BAD_ACCESS
-        if (thread.GetTemporaryResumeState() == eStateStepping)
-          return StopInfo::CreateStopReasonToTrace(thread);
+    // exc_code 3
+    if (exc_code == 3)
+      stopped_by_completing_stepi = true;
+
+    // exc_code 0x102
+    if (exc_code == 0x102 && exc_sub_code != 0) {
+      if (cpu == llvm::Triple::arm || cpu == llvm::Triple::thumb) {
+        stopped_by_hitting_breakpoint = true;
+        stopped_by_completing_stepi = true;
       }
-      // It looks like exc_sub_code has the 4 bytes of the instruction that
-      // triggered the exception, i.e. our breakpoint opcode
-      is_actual_breakpoint = exc_code == 1;
-      break;
+      stopped_watchpoint = true;
+      value = exc_sub_code;
     }
 
-    default:
-      break;
-    }
+    // Go through the reasons why we stopped, starting
+    // with the easiest to detect unambiguously.  We
+    // may have multiple possible reasons set.
 
-    if (is_actual_breakpoint) {
+    if (stopped_by_hitting_breakpoint) {
+      ProcessSP process_sp(thread.GetProcess());
       RegisterContextSP reg_ctx_sp(thread.GetRegisterContext());
-      addr_t pc = reg_ctx_sp->GetPC() - pc_decrement;
-
-      ProcessSP process_sp(thread.CalculateProcess());
-
-      lldb::BreakpointSiteSP bp_site_sp;
-      if (process_sp)
+      BreakpointSiteSP bp_site_sp;
+      addr_t pc = LLDB_INVALID_ADDRESS;
+      if (reg_ctx_sp)
+        pc = reg_ctx_sp->GetPC() - pc_decrement;
+      else if (value)
+        pc = *value;
+
+      if (value)
+        bp_site_sp = process_sp->GetBreakpointSiteList().FindByAddress(*value);
+      if (!bp_site_sp && reg_ctx_sp) {
         bp_site_sp = process_sp->GetBreakpointSiteList().FindByAddress(pc);
+      }
       if (bp_site_sp && bp_site_sp->IsEnabled()) {
-        // Update the PC if we were asked to do so, but only do so if we find
-        // a breakpoint that we know about cause this could be a trap
-        // instruction in the code
-        if (pc_decrement > 0 && adjust_pc_if_needed)
-          reg_ctx_sp->SetPC(pc);
-
-        // If the breakpoint is for this thread, then we'll report the hit,
-        // but if it is for another thread, we can just report no reason.  We
-        // don't need to worry about stepping over the breakpoint here, that
-        // will be taken care of when the thread resumes and notices that
-        // there's a breakpoint under the pc. If we have an operating system
-        // plug-in, we might have set a thread specific breakpoint using the
-        // operating system thread ID, so we can't make any assumptions about
-        // the thread ID so we must always report the breakpoint regardless
-        // of the thread.
+        // If we have an operating system plug-in, we might have set a thread
+        // specific breakpoint using the operating system thread ID, so we
+        // can't make any assumptions about the thread ID so we must always
+        // report the breakpoint regardless of the thread.
         if (bp_site_sp->ValidForThisThread(thread) ||
-            thread.GetProcess()->GetOperatingSystem() != nullptr)
+            thread.GetProcess()->GetOperatingSystem() != nullptr) {
+          // Update the PC if we were asked to do so, but only do so if we find
+          // a breakpoint that we know about cause this could be a trap
+          // instruction in the code
+          if (pc_decrement > 0 && adjust_pc_if_needed && reg_ctx_sp)
+            reg_ctx_sp->SetPC(pc);
+          thread.SetThreadHitBreakpointAtAddr(pc);
           return StopInfo::CreateStopReasonWithBreakpointSiteID(
               thread, bp_site_sp->GetID());
-        else if (is_trace_if_actual_breakpoint_missing)
-          return StopInfo::CreateStopReasonToTrace(thread);
-        else
+        } else {
+          thread.SetThreadHitBreakpointAtAddr(pc);
           return StopInfoSP();
+        }
       }
+    }
 
-      // Don't call this a trace if we weren't single stepping this thread.
-      if (is_trace_if_actual_breakpoint_missing &&
-          thread.GetTemporaryResumeState() == eStateStepping) {
-        return StopInfo::CreateStopReasonToTrace(thread);
+    if (stopped_watchpoint && value) {
+      WatchpointResourceSP wp_rsrc_sp =
+          target->GetProcessSP()->GetWatchpointResourceList().FindByAddress(
+              *value);
+      if (wp_rsrc_sp && wp_rsrc_sp->GetNumberOfConstituents() > 0) {
+        return StopInfo::CreateStopReasonWithWatchpointID(
+            thread, wp_rsrc_sp->GetConstituentAtIndex(0)->GetID());
       }
     }
+
+    if (stopped_by_completing_stepi) {
+      if (thread.GetTemporaryResumeState() != eStateStepping)
+        not_stepping_but_got_singlestep_exception = true;
+      else
+        return StopInfo::CreateStopReasonToTrace(thread);
+    }
+
   } break;
 
   case 7:  // EXC_SYSCALL
diff --git a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
index f383b3d40a4f3..c4005a1f51150 100644
--- a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
+++ b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
@@ -382,19 +382,8 @@ void ProcessWindows::RefreshStateAfterStop() {
     RegisterContextSP register_context = stop_thread->GetRegisterContext();
     const uint64_t pc = register_context->GetPC();
     BreakpointSiteSP site(GetBreakpointSiteList().FindByAddress(pc));
-    if (site && site->ValidForThisThread(*stop_thread)) {
-      LLDB_LOG(log,
-               "Single-stepped onto a breakpoint in process {0} at "
-               "address {1:x} with breakpoint site {2}",
-               m_session_data->m_debugger->GetProcess().GetProcessId(), pc,
-               site->GetID());
-      stop_info = StopInfo::CreateStopReasonWithBreakpointSiteID(*stop_thread,
-                                                                 site->GetID());
-      stop_thread->SetStopInfo(stop_info);
-
-      return;
-    }
-
+    if (site)
+      stop_thread->SetThreadStoppedAtBreakpointSite(pc);
     auto *reg_ctx = static_cast<RegisterContextWindows *>(
         stop_thread->GetRegisterContext().get());
     uint32_t slot_id = reg_ctx->GetTriggeredHardwareBreakpointSlotId();
@@ -462,6 +451,7 @@ void ProcessWindows::RefreshStateAfterStop() {
         stop_info = StopInfo::CreateStopReasonWithBreakpointSiteID(
             *stop_thread, site->GetID());
         register_context->SetPC(pc);
+        stop_thread->SetThreadHitBreakpointAtAddr(pc);
       } else {
         LLDB_LOG(log,
                  "Breakpoint site {0} is not valid for this thread, "
diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
index a5a731981299f..bad42d44503fd 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
@@ -1598,29 +1598,8 @@ bool ProcessGDBRemote::CalculateThreadStopInfo(ThreadGDBRemote *thread) {
     // that have stop reasons, and if there is no entry for a thread, then it
     // has no stop reason.
     thread->GetRegisterContext()->InvalidateIfNeeded(true);
-    if (!GetThreadStopInfoFromJSON(thread, m_jstopinfo_sp)) {
-      // If a thread is stopped at a breakpoint site, set that as the stop
-      // reason even if it hasn't executed the breakpoint instruction yet.
-      // We will silently step over the breakpoint when we resume execution
-      // and miss the fact that this thread hit the breakpoint.
-      const size_t num_thread_ids = m_thread_ids.size();
-      for (size_t i = 0; i < num_thread_ids; i++) {
-        if (m_thread_ids[i] == thread->GetID() && m_thread_pcs.size() > i) {
-          addr_t pc = m_thread_pcs[i];
-          lldb::BreakpointSiteSP bp_site_sp =
-              thread->GetProcess()->GetBreakpointSiteList().FindByAddress(pc);
-          if (bp_site_sp) {
-            if (bp_site_sp->ValidForThisThread(*thread)) {
-              thread->SetStopInfo(
-                  StopInfo::CreateStopReasonWithBreakpointSiteID(
-                      *thread, bp_site_sp->GetID()));
-              return true;
-            }
-          }
-        }
-      }
+    if (!GetThreadStopInfoFromJSON(thread, m_jstopinfo_sp))
       thread->SetStopInfo(StopInfoSP());
-    }
     return true;
   }
 
@@ -1729,6 +1708,12 @@ ThreadSP ProcessGDBRemote::SetThreadStopInfo(
     if (ThreadSP memory_thread_sp = m_thread_list.GetBackingThread(thread_sp))
       thread_sp = memory_thread_sp;
 
+    addr_t pc = thread_sp->GetRegisterContext()->GetPC();
+    BreakpointSiteSP bp_site_sp =
+        thread_sp->GetProcess()...
[truncated]

@jasonmolenda
Copy link
Collaborator Author

This patch definitely needs review from Jim Ingham, who is out this week. @ZequanWu and @cmtice FYI.

I tested this on arm64 macOS and on aarch64 Ubuntu - the important Process changes are were in ProcessGDBRemote, StopInfoMachException, and in ProcessWindows, I believe my testing will have exercised the first two set of changes (the largest and most complex). I updated ProcessWindows as I believe it needs to be updated, but don't have a way of testing it short of merging the PR and seeing what happens to the windows CI bot.

@AlexK0 if you have a setup to build and run the testsuite on windows, could you try it with this patch some time when you're able? I can't see this being merged for at least another 5-6 days, there's no rush. You can download the unidiff style diff of the patch from https://patch-diff.githubusercontent.com/raw/llvm/llvm-project/pull/96260.diff

There's a comment,

```
The current PC is AFTER the BP opcode, on all architectures
```

and code to decrement the pc value before looking for a BreakpointSite,
if this is how Windows really behaves with breakpoints then we don't
need to silently step past the breakpoint, windows has already done
that.

I was originally fixing a mistake where I called
Thread::SetThreadHitBreakpointAtAddr(pc) only if the breakpoint was
valid for this thread.  But we need to mark the breakpoint that way
regardless of whether it is valid for this thread or not -- for a
bp not on this thread, we still need to silently step past it.

But as I was making that change, I suspect none of this is necessary
on Windows, so I'm removing it.
Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change a lot, but I'm wondering if there's a clearer way to store the intermediate state. It seems like this info could go into the StopInfo class, which would have the benefit of automatically being backed up when doing expression evaluation. Is there a reason to not do that?

If the reason is that the StopInfo of the thread has been cleared by the time we call SetupForResume (and there's no good way to change that), maybe we could pass the old stop info to the function as an argument, since that's effectively what the function is doing -- deciding how to resume the thread based on how it was previously stopped.

Regarding the scenarios you mention in the description (moving the PC to a breakpoint site, setting a breakpoint site at the current pc), are these tested by any of the existing tests?

Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state

I believe this was fixed by 60b90b5.

@@ -377,6 +382,19 @@ class Thread : public std::enable_shared_from_this<Thread>,

virtual void SetQueueLibdispatchQueueAddress(lldb::addr_t dispatch_queue_t) {}

/// When a thread has executed/trapped a breakpoint, set the address of that
/// breakpoint so we know it has been hit already, and should be silently
/// stepped past on resume.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little unclear as to why do we need to store this separately. Shouldn't this already be stored in the stop reason of the thread (i.e., StopInfoBreakpoint implies we've hit a breakpoint, and the breakpoint site within it should give us the PC value)?

Comment on lines 390 to 396
/// When a thread stops at a breakpoint instruction/address, but has not yet
/// executed/triggered it, record that so we can detect when a user adds a
/// breakpoint (or changes a thread to a breakpoint site) and we need to
/// silently step past that when resuming.
void SetThreadStoppedAtBreakpointSite(lldb::addr_t pc) {
m_bpsite_at_stop_pc = pc;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also finding it hard to wrap my head around the meaning of this variable. If I understand correctly it tells us: the pc that we've stopped at; that there was a breakpoint site there at the time of the stop; and we did not hit that site.

I'm wondering if it would be clearer if we unpacked that. What if we:

  • unconditionally stored the PC of the last stop. Maybe this could be even a part of the StopInfo class, as I think it could be useful to see the PC value at the time of the stop, even if the user changed the PC afterwards.
  • a flag indicating whether we stopped at a breakpoint site, regardless of whether we've hit it or not (per the previous comment, that could be indicated by the stop reason). This doesn't really look like it belongs to StopInfo class, but I think I'd be fine with putting it there for collocation purposes.

@AlexK0
Copy link
Contributor

AlexK0 commented Jun 21, 2024

@AlexK0 if you have a setup to build and run the testsuite on windows, could you try it with this patch some time when you're able? I can't see this being merged for at least another 5-6 days, there's no rush. You can download the unidiff style diff of the patch from https://patch-diff.githubusercontent.com/raw/llvm/llvm-project/pull/96260.diff

@jasonmolenda I checked the tests with VS2022/x86-64/win11 in a debug build with assertions enabled. Unfortunately, the main branch is a bit unstable, and some tests constantly fail :( . Nonetheless, I noticed one new failure with the applied patch:

functionalities/breakpoint/consecutive_breakpoints/TestConsecutiveBreakpoints.py

backtrace:

FAIL: test_single_step_thread_specific (TestConsecutiveBreakpoints.ConsecutiveBreakpointsTestCase)
   Test that single step stops, even though the second breakpoint is not valid.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\Projects\github\llvm-project-jasonmolenda\lldb\packages\Python\lldbsuite\test\decorators.py", line 451, in wrapper
    return func(self, *args, **kwargs)
  File "D:\Projects\github\llvm-project-jasonmolenda\lldb\test\API\functionalities\breakpoint\consecutive_breakpoints\TestConsecutiveBreakpoints.py", line 121, in test_single_step_thread_specific
    self.finish_test()
  File "D:\Projects\github\llvm-project-jasonmolenda\lldb\test\API\functionalities\breakpoint\consecutive_breakpoints\TestConsecutiveBreakpoints.py", line 42, in finish_test
    self.assertState(self.process.GetState(), lldb.eStateExited)
  File "D:\Projects\github\llvm-project-jasonmolenda\lldb\packages\Python\lldbsuite\test\lldbtest.py", line 2578, in assertState
    self.fail(self._formatMessage(msg, error))
AssertionError: stopped (5) != exited (10)
Config=x86_64-D:\Projects\github\llvm-project-jasonmolenda\build\bin\clang.exe
----------------------------------------------------------------------

@jasonmolenda
Copy link
Collaborator Author

@AlexK0 if you have a setup to build and run the testsuite on windows, could you try it with this patch some time when you're able? I can't see this being merged for at least another 5-6 days, there's no rush. You can download the unidiff style diff of the patch from https://patch-diff.githubusercontent.com/raw/llvm/llvm-project/pull/96260.diff

@jasonmolenda I checked the tests with VS2022/x86-64/win11 in a debug build with assertions enabled. Unfortunately, the main branch is a bit unstable, and some tests constantly fail :( . Nonetheless, I noticed one new failure with the applied patch:

functionalities/breakpoint/consecutive_breakpoints/TestConsecutiveBreakpoints.py

These TestConcurrent / TestConsecutiveBreakpoints tests are great for finding corner cases with these changes, thanks so much. I was a little uncertain about one part of my ProcessWindows change, where the pc is past the breakpoint when a software breakpoint instruction is used, on Windows. For a moment, I thought, "oh, I don't need to record that we hit the breakpoint because we're already past it and we don't need to instruction past it" but of course that was wrong -- ProcessWindows::RefreshStateAfterStop decrements the $pc value by the size of its breakpoint instruction, which is necessary because e.g. the size of an x86 breakpoint instruction is 1 byte (0xcc) but x86_64 instructions can be between 1 to 15 bytes long, so stopping 1 byte after the BreakpointSite means we are possibly in the middle of the real instruction. We must set the pc to the BreakpointSite address and use lldb's normal logic.

Anyway, tl;dr, I believe this will fix it:

--- a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
+++ b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
@@ -442,6 +442,7 @@ void ProcessWindows::RefreshStateAfterStop() {
                m_session_data->m_debugger->GetProcess().GetProcessId(), pc,
                site->GetID());
 
+      stop_thread->SetThreadHitBreakpointAtAddr(pc);
       if (site->ValidForThisThread(*stop_thread)) {
         LLDB_LOG(log,
                  "Breakpoint site {0} is valid for this thread ({1:x}), "

I'll push it right now. Sorry for my confusion, and thanks again for looking at it, this was a big help.

ProcessWindows plugin.  I misread this method's behavior
initially, when I saw that the stop pc is past the breakpoint
instruction.  But we're not ready to resume the process now -
we need to set the pc back to the start of the breakpoint, then
do lldb's normal "disable breakpoint, instruction step, re-insert
breakpoint" behavior just like all other targets.  Re-adding the
call to register the thread as having hit the breakpoint after
testing help from Aleksandr Korepanov.
@AlexK0
Copy link
Contributor

AlexK0 commented Jun 24, 2024

Anyway, tl;dr, I believe this will fix it:

--- a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
+++ b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
@@ -442,6 +442,7 @@ void ProcessWindows::RefreshStateAfterStop() {
                m_session_data->m_debugger->GetProcess().GetProcessId(), pc,
                site->GetID());
 
+      stop_thread->SetThreadHitBreakpointAtAddr(pc);
       if (site->ValidForThisThread(*stop_thread)) {
         LLDB_LOG(log,
                  "Breakpoint site {0} is valid for this thread ({1:x}), "

I'll push it right now. Sorry for my confusion, and thanks again for looking at it, this was a big help.

@jasonmolenda unfortunately, the test still fails :(
I did a little investigation, here is a possible patch that fixes the test:

diff --git a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
index 231b22f5f189..fb0404f1c4b9 100644
--- a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
+++ b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
@@ -382,7 +382,7 @@ void ProcessWindows::RefreshStateAfterStop() {
     RegisterContextSP register_context = stop_thread->GetRegisterContext();
     const uint64_t pc = register_context->GetPC();
     BreakpointSiteSP site(GetBreakpointSiteList().FindByAddress(pc));
-    if (site)
+    if (site && site->ValidForThisThread(*stop_thread))
       stop_thread->SetThreadStoppedAtBreakpointSite(pc);
     auto *reg_ctx = static_cast<RegisterContextWindows *>(
         stop_thread->GetRegisterContext().get());

@jasonmolenda
Copy link
Collaborator Author

Thanks @AlexK0 ! After Pavel's suggestion, I need to do a minor bit of changes to the patch soon I think, for all platforms, so I'll see if this is easier to express in ProcessWindows that way.

I've was thinking about @labath 's suggestion over the weekend and I agree that the way the patch is written today isn't ideal about tracking state at stop-time and resume-time, as I developed the change I had to change its behavior and I didn't rethink what I was doing.

The original goal was: If we hit a breakpoint (whether it is enabled for this thread or not), when we Resume the thread, we silently instruction step past that breakpoint instruction before resuming. Then I realized the problem of "user sets a breakpoint at $pc, or changes $pc to a BreakpointSite" and I want to also silently step past the breakpoint in that case, so I added a separate variable to track that.

Looking at the total requirements, the rule can be condensed to: If this thread stopped at a BreakpointSite which it did not execute, we should execute the breakpoint on Resume. In any other case, a thread sitting at a BreakpointSite should silently step past it and resume execution.

So when a thread stops at a BreakpointSite that has executed (whether it is valid for this thread or not), we record nothing. When a thread stops at a BreakpointSite that has not executed, we need to record the address of that BreakpointSite. And on Resume, if the thread is still at this same address, we want to hit the breakpoint.

I don't think we can store this information in the StopInfo easily, because a thread with no stop reason (e.g. a multi-threaded program that hits a breakpoint on one thread, but the others were executing normally) wouldn't have a way to record that they were sitting at a BreakpointSite that needed to be hit still.

I outlined the idea of storing this data in the StopInfo to @jimingham earlier today briefly, and he agreed that we should have StopInfos around until we Resume, but I hadn't thought this through enough to account for threads with no stop reason. I'll check in with him about the details on this before I make the change, but I think I need to keep tracking this in the Thread object.

@labath
Copy link
Collaborator

labath commented Jun 25, 2024

Thanks for the explanation, Jason. I believe I understand the problem now. Here, we're concerned with whether the physical breakpoint opcode got executed, and not about the higher level machinery (conditional breakpoints, ignore count, ...), which builds on top of that and can cause a breakpoint hit to "disappear". Since StopInfo tracks the "logical" stop reason of the process (after all of these filters are applied), using it to track the "physical" one would be tricky at best.

Looking at the total requirements, the rule can be condensed to: If this thread stopped at a BreakpointSite which it did not execute, we should execute the breakpoint on Resume. In any other case, a thread sitting at a BreakpointSite should silently step past it and resume execution.

So, IIUC, there would just be one field (the "unexecuted breakpoint site") instead of two. I'll take that, I think having just one field to think about would make it easier to understand the overall logic.

track if we're at a BreakpointSite that we haven't hit
yet, when we stop.

This adds two methods that the Process plugins need
to call.

When a thread has stopped at an enabled BreakpointSite, but has not
yet hit the breakpoint, it will call
Thread::SetThreadStoppedAtUnexecutedBP(pc).

When a thread has hit a breakpoint, it will call
Thread::SetThreadHitBreakpointSite().

These are mutually exclusive, and the last one called wins.  So a
Process plugin may first detect that it is at an enabled BreakpointSite,
call SetThreadStoppedAtUnexecutedBP(pc), and later realize breakpoint
was actually hit, and update the state by calling
SetThreadHitBreakpointSite().

When resuming a thread, if SetThreadStoppedAtUnexecutedBP(pc) was
called and the thread's pc is still the same as that value when
the thread stopped, we will execute the Breakpoint and stop again
with a breakpoint-hit stop-reason.  Otherwise we will silently
step past the breakpoint.

I've tested this update to the patch on macOS, I need to test on
aarch64 ubuntu still, and ask Aleksandr if it's possible to
check on Windows before we try to merge the PR.
@jasonmolenda
Copy link
Collaborator Author

Hi @AlexK0 I restructured this PR to track everything with a single piece of state in the Thread object, and redid all three Process plugins. I'm feeling a lot better about my changes to ProcessWindows now. I haven't tested the aarch64 ubuntu yet, I will do that tomorrow and if that looks good, I'm going to ask for more careful reviews of the PR. No rush at all, but if you have a chance to try this version on Windows I would really appreciate it. Thanks!

RegisterContextSP register_context = stop_thread->GetRegisterContext();
// If we're at a BreakpointSite, mark this as an Unexecuted Breakpoint.
// We'll clear that state if we've actually executed the breakpoint.
if (RegisterContextSP register_context = stop_thread->GetRegisterContext()) {
Copy link
Contributor

@AlexK0 AlexK0 Jul 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving pc into a scope causes a compilation error at the line with the logging, here: ProcessWindows.cpp:396

lldb\source\Plugins\Process\Windows\Common\ProcessWindows.cpp(396): error C2065: 'pc': undeclared identifier

@AlexK0
Copy link
Contributor

AlexK0 commented Jul 6, 2024

@jasonmolenda, I checked the tests with the latest fix.
There is a minor compilation error: #96260 (review)
If fix it (I just removed pc from the logging), all breakpoint tests work, and I don’t see any new failures compared to the main branch.

I'm setting `register_context`, `site`, and `pc` before either of
the EXCEPTION_SINGLE_STEP or EXCEPTION_BREAKPOINT blocks, and they
are used/updated in both.  Thanks again to Aleksandr for helping
me to avoid extra round-trips through the CI, and testing that it
behaves correctly once these are straightened out.
@jasonmolenda
Copy link
Collaborator Author

Thanks again for all the help @AlexK0 I pushed an update that should compile correctly with my update. It's great to hear that the testsuite looks clean -- that's the part that worried me the most about these changes.

I finished an aarch64 Ubuntu testsuite run and it is clean. I think this patch is about ready for review by @jimingham when he has time.

@@ -377,6 +380,26 @@ class Thread : public std::enable_shared_from_this<Thread>,

virtual void SetQueueLibdispatchQueueAddress(lldb::addr_t dispatch_queue_t) {}

/// When a thread stops at an enabled BreakpointSite that has not exectued,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exectued -> executed

/// When a thread stops at an enabled BreakpointSite that has not exectued,
/// the Process plugin should call SetThreadStoppedAtUnexecutedBP(pc).
/// If that BreakpointSite was actually triggered (the instruction was
/// executed, for a software breakpoint), regardless of wheether the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wheether -> whether

/// If that BreakpointSite was actually triggered (the instruction was
/// executed, for a software breakpoint), regardless of wheether the
/// breakpoint is valid for this thread, SetThreadHitBreakpointSite()
/// should be called to clear that state.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to clear that state" is confusing. A thread could stop because it executed the breakpoint trap without ever having stopped "with the PC at the unexecuted trap". That's actually the more common case. So you won't be clearing that state usually. Maybe "to clear that state" -> "to record that fact".

@@ -128,6 +128,9 @@ class Thread : public std::enable_shared_from_this<Thread>,
register_backup_sp; // You need to restore the registers, of course...
uint32_t current_inlined_depth;
lldb::addr_t current_inlined_pc;
lldb::addr_t stopped_at_unexecuted_bp; // Set to the address of a breakpoint
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the stored version of the Thread ivar with roughly the same name. It's better documented there, maybe here you can just refer to that ivar for docs?

lldb::addr_t
m_stopped_at_unexecuted_bp; // When the thread stops at a
// breakpoint instruction/BreakpointSite
// but has not yet hit/exectued it,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

executed

bool stopped_by_hitting_breakpoint = false;
bool stopped_by_completing_stepi = false;
bool stopped_watchpoint = false;
std::optional<addr_t> value;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better name for this than "value" - that's pretty generic.

std::optional<addr_t> value;

// exc_code 1
if (exc_code == 1 && exc_sub_code == 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite our fear of indentation, since the two if's exhaust the exe_code == 1 possibilities, this would be clearer as:

if (exc_code == 1) {
if (exc_sub_code == 0) {
} else {
}
}

if (exc_code == 1 && exc_sub_code == 0)
stopped_by_completing_stepi = true;
if (exc_code == 1 && exc_sub_code != 0) {
stopped_by_hitting_breakpoint = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the face of it, it seems weird that you are asserting that this is both a "breakpoint" and a "watchpoint" hit. Later on you'll tell the difference based on the exe_sub_code stored in value. This would be easier to read if there were a comment explaining that.

@@ -633,171 +613,142 @@ StopInfoSP StopInfoMachException::CreateStopReasonWithMachException(
}
break;

// [exc_type, exc_code, exc_sub_code, exc_sub_sub_code]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This table is great, but it deserves a line saying what it is.

@jasonmolenda
Copy link
Collaborator Author

Ah, I see, NativeProcessLinux::Resume will use a software single step (emulate current instruction, insert breakpoint, continue) for any processor without a hardware instruction step and this includes

bool NativeProcessLinux::SupportHardwareSingleStepping() const {
  if (m_arch.IsMIPS() || m_arch.GetMachine() == llvm::Triple::arm ||
      m_arch.GetTriple().isRISCV() || m_arch.GetTriple().isLoongArch())

debugserver uses the hardware breakpoint because we know we have that feature on our armv7 darwin cores, but it looks like lldb-server on armv7 linux will be entirely software based.

@jasonmolenda
Copy link
Collaborator Author

With my patch from this PR applied, the Mach Exception from debugserver is interpreted the same way -- it could be a watchpoint, a breakpoint, or an "instruction step", we use the address included in the ME to say which one it is, and in the ambiguous instance of instruction stepping to a BreakpointSite, we will incorrectly say we hit the breakpoint.

I haven't found the mechanism that an armv7 lldb-server would report reason:trace when it emulated the current instruction, put a breakpoint on the next instruction we'll execute, and 'continued' the thread when we resume. But I might have just missed something in ProcessNativeLinux where it knows it was instruction stepping. (but in the old stepping model in lldb, when you're at a BreakpointSite, the stop reason would be "breakpoint hit" even if it hadn't actually been hit)

@jasonmolenda
Copy link
Collaborator Author

jasonmolenda commented Aug 7, 2024

I built TestStepOverBreakpoint.py for armv7 and ran it against an armv7k apple watch (the only 32-bit target we still support) with debugserver. Four breakpoints on the source lines in main(), the middle two breakpoints have a condition of false, and the test counts the number of instructions and expects that many stops while it next-instruction's through, and doesn't expect to see breakpoint-hit notifications on the two condition==false breakpoints. I only ran lldb on the test program by hand but it looks like it worked correctly, e.g. one of my condition==false breakpoints is on 0x27bf9a and,

<  16> send packet: $vCont;s:343b#ee
(lldb)  < 369:368> read packet: $T05thread:343b;threads:343b;thread-pcs:27bf9a;[registers];metype:6;mecount:2;medata:102;medata:27bf9a;memory:0x27b8fc48=48fdb827d7912c00;memory:0x27b8fd48=0000000000000000;#00

(that's our magical "could be a watchpoint, could be a breakpoint, could be instruction step Mach Exception" that debugserver relays up from the darwin kernel)

 <  15> send packet: $x2bc000,200#bd
 <  65:516> read packet: $000000000000[...]

(lldb allocated memory to store 'false' lol)

Process 257 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0027bf9a a.out`main(argc=1, argv=0x27b8fd68) at main.cpp:9:9
-> 9   	    a = b + func(); // breakpoint_3

lldb has told us that we stopped with instruction-step because the breakpoint is not supposed to stop here (the thread would have resumed if I hadn't been ni'ing), very nice.

But of course the details of what ProcessGDBRemote got back for the reason: field and how that was handled are likely where the problem is -- and debugserver reporting mach exceptions is a completely different mechanism. Maybe I should just read through how this is handled in ProcessGDBRemote if we have the same ambiguity when breakpoints are used for instruction stepping.

@labath
Copy link
Collaborator

labath commented Aug 7, 2024

I haven't found the mechanism that an armv7 lldb-server would report reason:trace when it emulated the current instruction, put a breakpoint on the next instruction we'll execute, and 'continued' the thread when we resume. But I might have just missed something in ProcessNativeLinux where it knows it was instruction stepping. (but in the old stepping model in lldb, when you're at a BreakpointSite, the stop reason would be "breakpoint hit" even if it hadn't actually been hit)

I think that's this code here (I have missed it at first as well):

void NativeProcessLinux::MonitorBreakpoint(NativeThreadLinux &thread) {
  Log *log = GetLog(LLDBLog::Process | LLDBLog::Breakpoints);
  LLDB_LOG(log, "received breakpoint event, pid = {0}", thread.GetID());

  // Mark the thread as stopped at breakpoint.
  thread.SetStoppedByBreakpoint();
  FixupBreakpointPCAsNeeded(thread);

  if (m_threads_stepping_with_breakpoint.find(thread.GetID()) !=
      m_threads_stepping_with_breakpoint.end())
    thread.SetStoppedByTrace();

  StopRunningThreads(thread.GetID());
}

I think this is the correct thing to do, since lldb is completely unaware of whether software single stepping is taking place(*) I got lost in all of the mach exceptions, so it's not clear to me what is the conclusion. Are you saying that debugserver does not do this (i.e., it reports the stop as a "breakpoint hit", in the mach exception form)?

Just as a curiosity the linux kernel does allow you do use the hardware single step capability if debugging a 32 bit application on a 64-bit kernel (I'm not sure if the debugger process needs to be 64-bit or not). At one point, I wanted to make use of that, which would make our stepping behavior more reliable in these situations. However it would also make it harder to reproduce situations like these, as one would have to find actual 32-bit hardware...

(*) The code is probably too casual about this check though, in that it does not check whether it was the actual single-stepping breakpoint that got hit. If there was a breakpoint under the current PC, it would report this as "trace" even though it hasn't actually stepped anything. Could this maybe be the problem that you ran into with this patch?

@jasonmolenda
Copy link
Collaborator Author

jasonmolenda commented Aug 8, 2024

// Mark the thread as stopped at breakpoint.
thread.SetStoppedByBreakpoint();
FixupBreakpointPCAsNeeded(thread);

if (m_threads_stepping_with_breakpoint.find(thread.GetID()) !=
m_threads_stepping_with_breakpoint.end())
thread.SetStoppedByTrace();

StopRunningThreads(thread.GetID());
}


I think this is the correct thing to do, since lldb is completely unaware of whether software single stepping is taking place(*) I got lost in all of the mach exceptions, so it's not clear to me what is the conclusion. Are you saying that debugserver does not do this (i.e., it reports the stop as a "breakpoint hit", in the mach exception form)?

Ah, very nice, thanks for finding this. I think this helps to show why this is failing with the new "breakpoints are only 'hit' when they are actually executed'" scheme.

If the pc is at a BreakpointSite and we're stopped in the debugger. We do "si", a breakpoint is inserted on the next instruction (pc+4 or 2), this thread and pc+4 is added to m_threads_stepping_with_breakpoint. We resume execution. The breakpoint at pc is hit, and the pc value does not advance. This method is only checking that the thread is included in the m_threads_stepping_with_breakpoint map, not checking that the pc of the breakpoint we hit is the one we inserted (at pc+4/2). I wonder if this can be fixed by

NativeRegisterContextLinux &reg_ctx = thread.GetRegisterContext();
if (m_threads_stepping_with_breakpoint.find(thread.GetID()) !=
       m_threads_stepping_with_breakpoint.end() &&
    m_threads_stepping_with_breakpoint.find(thread.GetID())->second == reg_ctx.GetPC())

This wasn't necessary with the previous behavior -- if we stopped at a BreakpointSite, we would declare that we hit the breakpoint, even if we'd just stepped up to the BreakpointSite. But now that you may have stepped up to a BreakpointSite OR you may have executed & hit the breakpoint, we're trusting the gdb-remote stub's response more.

I think @DavidSpickett was seeing some looping in the test case, and I could imagine a scenario where the pc is at a BreakpointSite, lldb instruction steps, and we get back an instruction-step stop reason without the pc advancing. We haven't hit the breakpoint yet (as far as lldb knows), so we try again. And again.

Just as a curiosity the linux kernel does allow you do use the hardware single step capability if debugging a 32 bit application on a 64-bit kernel (I'm not sure if the debugger process needs to be 64-bit or not). At one point, I wanted to make use of that, which would make our stepping behavior more reliable in these situations. However it would also make it harder to reproduce situations like these, as one would have to find actual 32-bit hardware...

Yeah on an armv8 system that can run AArch32 or AArch64 code, both will have MDSCR_EL1.SS instruction stepping capability. It's only on a genuine armv7 processor where that feature doesn't exist. I don't know how much lldb distinguishes between AArch32 and armv7, but this is one important difference. (Apple's AArch64 cores don't support AArch32 any more, so I haven't worked with this combination in years).

(*) The code is probably too casual about this check though, in that it does not check whether it was the actual single-stepping breakpoint that got hit. If there was a breakpoint under the current PC, it would report this as "trace" even though it hasn't actually stepped anything. Could this maybe be the problem that you ran into with this patch?

Ah yes, exactly so, we reached the same conclusion, I didn't read your whole response at first. :)

@jasonmolenda
Copy link
Collaborator Author

David, no need to test this latest fix unless you'd like to - I'll make sure it compiles on an aarch64 ubuntu VM but the fix is obvious once we were looking at the right place, and explains the behavior you saw earlier, I'm pretty confident. I need to figure out the debuginfo test issues and the mingw problem that Martin detailed still.

@DavidSpickett
Copy link
Collaborator

Thanks @DavidSpickett just to be sure, this is armv7 right, not AArch32?

This is an AArch32 container on Armv8. I'm not sure if lldb-server knows the difference though. Maybe it doesn't need to, and ptrace single step is done in the kernel which uses the h/w step.

But it looks like it may software single step on any Arm, v7 and v8...

debugserver uses the hardware breakpoint because we know we have that feature on our armv7 darwin cores, but it looks like lldb-server on armv7 linux will be entirely software based.

I don't know how much lldb distinguishes between AArch32 and armv7, but this is one important difference.

The single step code you posted doesn't seem to make a distinction. This is perhaps why we saw armv7 style behaviour on AArch32.

@jasonmolenda
Copy link
Collaborator Author

Thanks @DavidSpickett just to be sure, this is armv7 right, not AArch32?

This is an AArch32 container on Armv8. I'm not sure if lldb-server knows the difference though. Maybe it doesn't need to, and ptrace single step is done in the kernel which uses the h/w step.

Yeah I'm thinking lldb-server probably does a software instruction step on AArch32, even thought the hardware has instruction-step. Probably a small enough perf hit that there's little value in looking into that further.

@jasonmolenda
Copy link
Collaborator Author

The problem I now have is that debugserver doesn't do this nice behavior that lldb-server does, recognizing that a thread was instruction-stepping and rewriting the stop-reason as "trace" when it was actually "breakpoint-hit". I can't update the debugserver on the armv7 watches, and we need to support debugging on them for a while to come. I may need to make a generic code change where Thread::GetTemporaryResumeState can return both eStateStepping and its original pc it started the step from, only for the armv7 debugservers. With that small fix to NativeProcessLinux::MonitorBreakpoint outlined above, it should report the correct stop reason all by itself in this case.

jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Aug 21, 2024
lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive UX from this change, but I'm convinced it is
necessary: If you're stopped at a BreakpointSite that has not yet
executed, you `stepi`, you will hit the breakpoint and the pc will not
yet advance. This thread has not completed its stepi, and the thread
plan is still on the stack. If you then `continue` the thread, lldb will
now stop and say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution. I
discussed this with Jim, and trying to paper over this behavior will
lead to more complicated scenarios behaving non-intuitively. And mostly
it's the testsuite that was trying to instruction step past a breakpoint
and getting thrown off -- and I changed those tests to expect the new
behavior.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record
that.  When the thread resumes, if the pc is still at the same site, we
will continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or not),
the Process plugin should call Thread::SetThreadHitBreakpointSite().
When we go to resume the thread, we will push a step-over-breakpoint
ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

rdar://123942164
@jasonmolenda
Copy link
Collaborator Author

I opened a new PR that starts with this PR's patchset which I had to revert, and I will address the issues that need to be fixed there with separate additional commits in preparation for merging it again.
#105594

jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Aug 23, 2024
lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive UX from this change, but I'm convinced it is
necessary: If you're stopped at a BreakpointSite that has not yet
executed, you `stepi`, you will hit the breakpoint and the pc will not
yet advance. This thread has not completed its stepi, and the thread
plan is still on the stack. If you then `continue` the thread, lldb will
now stop and say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution. I
discussed this with Jim, and trying to paper over this behavior will
lead to more complicated scenarios behaving non-intuitively. And mostly
it's the testsuite that was trying to instruction step past a breakpoint
and getting thrown off -- and I changed those tests to expect the new
behavior.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record
that.  When the thread resumes, if the pc is still at the same site, we
will continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or not),
the Process plugin should call Thread::SetThreadHitBreakpointSite().
When we go to resume the thread, we will push a step-over-breakpoint
ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

rdar://123942164
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Feb 13, 2025
lldb today has two rules: When a thread stops at a BreakpointSite,
we set the thread's StopReason to be "breakpoint hit" (regardless
if we've actually hit the breakpoint, or if we've merely stopped
*at* the breakpoint instruction/point and haven't tripped it yet).
And second, when resuming a process, any thread sitting at a
BreakpointSite is silently stepped over the BreakpointSite -- because
we've already flagged the breakpoint hit when we stopped there
originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped,
etc.

Also, when a user adds a new breakpoint at $pc while stopped, or
changes $pc to be the address of a BreakpointSite, we will silently
step past that breakpoint when the process resumes. This is purely
a UX call, I don't think there's any person who wants to set a
breakpoint at $pc and then hit it immediately on resuming.

One non-intuitive UX from this change, butt is necessary: If you're
stopped at a BreakpointSite that has not yet executed, you `stepi`,
you will hit the breakpoint and the pc will not yet advance. This
thread has not completed its stepi, and the ThreadPlanStepInstruction
is still on the stack. If you then `continue` the thread, lldb will
now stop and say, "instruction step completed", one instruction
past the BreakpointSite. You can continue a second time to resume
execution.

The bugs driving this change are all from lldb dropping the real
stop reason for a thread and setting it to breakpoint-hit when that
was not the case. Jim hit one where we have an aarch64 watchpoint
that triggers one instruction before a BreakpointSite. On this arch
we are notified of the watchpoint hit after the instruction has
been unrolled -- we disable the watchpoint, instruction step,
re-enable the watchpoint and collect the new value. But now we're
on a BreakpointSite so the watchpoint-hit stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282
we attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had
with putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is
that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record
that.  When the thread resumes, if the pc is still at the same site, we
will continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or not),
the Process plugin should call Thread::SetThreadHitBreakpointSite().
When we go to resume the thread, we will push a step-over-breakpoint
ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

I first landed this patch in July 2024 via
llvm#96260

but the CI bots and wider testing found a number of test case
failures that needed to be updated, I reverted it.  I've fixed all
of those issues in separate PRs and this change should run cleanly
on all the CI bots now.

rdar://123942164
jasonmolenda added a commit that referenced this pull request Feb 13, 2025
lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive UX from this change, butt is necessary: If you're
stopped at a BreakpointSite that has not yet executed, you `stepi`, you
will hit the breakpoint and the pc will not yet advance. This thread has
not completed its stepi, and the ThreadPlanStepInstruction is still on
the stack. If you then `continue` the thread, lldb will now stop and
say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that.
When the thread resumes, if the pc is still at the same site, we will
continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or
not), the Process plugin should call
Thread::SetThreadHitBreakpointSite(). When we go to resume the thread,
we will push a step-over-breakpoint ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

I first landed this patch in July 2024 via
#96260

but the CI bots and wider testing found a number of test case failures
that needed to be updated, I reverted it. I've fixed all of those issues
in separate PRs and this change should run cleanly on all the CI bots
now.

rdar://123942164
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 13, 2025
…(#126988)

lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive UX from this change, butt is necessary: If you're
stopped at a BreakpointSite that has not yet executed, you `stepi`, you
will hit the breakpoint and the pc will not yet advance. This thread has
not completed its stepi, and the ThreadPlanStepInstruction is still on
the stack. If you then `continue` the thread, lldb will now stop and
say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that.
When the thread resumes, if the pc is still at the same site, we will
continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or
not), the Process plugin should call
Thread::SetThreadHitBreakpointSite(). When we go to resume the thread,
we will push a step-over-breakpoint ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

I first landed this patch in July 2024 via
llvm/llvm-project#96260

but the CI bots and wider testing found a number of test case failures
that needed to be updated, I reverted it. I've fixed all of those issues
in separate PRs and this change should run cleanly on all the CI bots
now.

rdar://123942164
@jasonmolenda
Copy link
Collaborator Author

For the record, this PR was finally re-landed as #126988 after four separate commits to address issues found in CI testing when I originally merged this.

@jasonmolenda jasonmolenda deleted the change-breakpoint-hit-behavior-in-lldb branch February 13, 2025 21:32
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Feb 13, 2025
lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive UX from this change, butt is necessary: If you're
stopped at a BreakpointSite that has not yet executed, you `stepi`, you
will hit the breakpoint and the pc will not yet advance. This thread has
not completed its stepi, and the ThreadPlanStepInstruction is still on
the stack. If you then `continue` the thread, lldb will now stop and
say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that.
When the thread resumes, if the pc is still at the same site, we will
continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or
not), the Process plugin should call
Thread::SetThreadHitBreakpointSite(). When we go to resume the thread,
we will push a step-over-breakpoint ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

I first landed this patch in July 2024 via
llvm#96260

but the CI bots and wider testing found a number of test case failures
that needed to be updated, I reverted it. I've fixed all of those issues
in separate PRs and this change should run cleanly on all the CI bots
now.

rdar://123942164
(cherry picked from commit b666ac3)
@mstorsjo
Copy link
Member

For the record, this PR was finally re-landed as #126988 after four separate commits to address issues found in CI testing when I originally merged this.

FWIW, it looks like my smoke tests are still passing fine after these changes have been relanded now: https://github.com/mstorsjo/llvm-mingw/actions/runs/13320200475/job/37215994344

Thanks for sorting it out!

joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Feb 14, 2025
lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive UX from this change, butt is necessary: If you're
stopped at a BreakpointSite that has not yet executed, you `stepi`, you
will hit the breakpoint and the pc will not yet advance. This thread has
not completed its stepi, and the ThreadPlanStepInstruction is still on
the stack. If you then `continue` the thread, lldb will now stop and
say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that.
When the thread resumes, if the pc is still at the same site, we will
continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or
not), the Process plugin should call
Thread::SetThreadHitBreakpointSite(). When we go to resume the thread,
we will push a step-over-breakpoint ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

I first landed this patch in July 2024 via
llvm#96260

but the CI bots and wider testing found a number of test case failures
that needed to be updated, I reverted it. I've fixed all of those issues
in separate PRs and this change should run cleanly on all the CI bots
now.

rdar://123942164
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive UX from this change, butt is necessary: If you're
stopped at a BreakpointSite that has not yet executed, you `stepi`, you
will hit the breakpoint and the pc will not yet advance. This thread has
not completed its stepi, and the ThreadPlanStepInstruction is still on
the stack. If you then `continue` the thread, lldb will now stop and
say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that.
When the thread resumes, if the pc is still at the same site, we will
continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or
not), the Process plugin should call
Thread::SetThreadHitBreakpointSite(). When we go to resume the thread,
we will push a step-over-breakpoint ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

I first landed this patch in July 2024 via
llvm#96260

but the CI bots and wider testing found a number of test case failures
that needed to be updated, I reverted it. I've fixed all of those issues
in separate PRs and this change should run cleanly on all the CI bots
now.

rdar://123942164
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Mar 29, 2025
lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive UX from this change, butt is necessary: If you're
stopped at a BreakpointSite that has not yet executed, you `stepi`, you
will hit the breakpoint and the pc will not yet advance. This thread has
not completed its stepi, and the ThreadPlanStepInstruction is still on
the stack. If you then `continue` the thread, lldb will now stop and
say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that.
When the thread resumes, if the pc is still at the same site, we will
continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or
not), the Process plugin should call
Thread::SetThreadHitBreakpointSite(). When we go to resume the thread,
we will push a step-over-breakpoint ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

I first landed this patch in July 2024 via
llvm#96260

but the CI bots and wider testing found a number of test case failures
that needed to be updated, I reverted it. I've fixed all of those issues
in separate PRs and this change should run cleanly on all the CI bots
now.

rdar://123942164
(cherry picked from commit b666ac3)
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Apr 8, 2025
lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive UX from this change, butt is necessary: If you're
stopped at a BreakpointSite that has not yet executed, you `stepi`, you
will hit the breakpoint and the pc will not yet advance. This thread has
not completed its stepi, and the ThreadPlanStepInstruction is still on
the stack. If you then `continue` the thread, lldb will now stop and
say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that.
When the thread resumes, if the pc is still at the same site, we will
continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or
not), the Process plugin should call
Thread::SetThreadHitBreakpointSite(). When we go to resume the thread,
we will push a step-over-breakpoint ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

I first landed this patch in July 2024 via
llvm#96260

but the CI bots and wider testing found a number of test case failures
that needed to be updated, I reverted it. I've fixed all of those issues
in separate PRs and this change should run cleanly on all the CI bots
now.

rdar://123942164
(cherry picked from commit b666ac3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants