[lldb] Change lldb's breakpoint handling behavior #96260

jasonmolenda · 2024-06-21T00:56:00Z

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped at the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming.

One non-intuitive UX from this change, but I'm convinced it is necessary: If you're stopped at a BreakpointSite that has not yet executed, you stepi, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the thread plan is still on the stack. If you then continue the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. I discussed this with Jim, and trying to paper over this behavior will lead to more complicated scenarios behaving non-intuitively. And mostly it's the testsuite that was trying to instruction step past a breakpoint and getting thrown off -- and I changed those tests to expect the new behavior.

The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

If we've stopped at a BreakpointSite (whether we hit it or not), we call Thread::SetThreadStoppedAtBreakpointSite(pc) to record the state at the point when the thread stopped. (so we can detect newly-added breakpoints, or when the pc is changed to an instruction that is a BreakpointSite)
When we have actually hit a breakpoint (regardless of whether it is enablied for this thread), we call Thread::SetThreadHitBreakpointAtAddr(pc) so we know that it should be silently stepped past when we resume execution.

When resuming, we silently step over a breakpoint if we've hit it, or if it is newly added (or the pc was changed to an existing BreakpointSite).

The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch.

rdar://123942164

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped *at* the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally. In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc. Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming. One non-intuitive UX from this change, but I'm convinced it is necessary: If you're stopped at a BreakpointSite that has not yet executed, you `stepi`, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the thread plan is still on the stack. If you then `continue` the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. I discussed this with Jim, and trying to paper over this behavior will lead to more complicated scenarios behaving non-intuitively. And mostly it's the testsuite that was trying to instruction step past a breakpoint and getting thrown off -- and I changed those tests to expect the new behavior. The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost. Another was reported by ZequanWu in https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state. The change to each Process plugin that does execution control is that 1. If we've stopped at a BreakpointSite (whether we hit it or not), we call Thread::SetThreadStoppedAtBreakpointSite(pc) to record the state at the point when the thread stopped. (so we can detect newly-added breakpoints, or when the pc is changed to an instruction that is a BreakpointSite) 2. When we have actually hit a breakpoint, and it is enabled for this thread, we call Thread::SetThreadHitBreakpointAtAddr(pc) so we know that it should be silently stepped past when we resume execution. When resuming, we silently step over a breakpoint if we've hit it, or if it is newly added (or the pc was changed to an existing BreakpointSite). The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch. rdar://123942164

llvmbot · 2024-06-21T00:56:30Z

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped at the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming.

One non-intuitive UX from this change, but I'm convinced it is necessary: If you're stopped at a BreakpointSite that has not yet executed, you stepi, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the thread plan is still on the stack. If you then continue the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. I discussed this with Jim, and trying to paper over this behavior will lead to more complicated scenarios behaving non-intuitively. And mostly it's the testsuite that was trying to instruction step past a breakpoint and getting thrown off -- and I changed those tests to expect the new behavior.

The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

If we've stopped at a BreakpointSite (whether we hit it or not), we call Thread::SetThreadStoppedAtBreakpointSite(pc) to record the state at the point when the thread stopped. (so we can detect newly-added breakpoints, or when the pc is changed to an instruction that is a BreakpointSite)
When we have actually hit a breakpoint, and it is enabled for this thread, we call Thread::SetThreadHitBreakpointAtAddr(pc) so we know that it should be silently stepped past when we resume execution.

When resuming, we silently step over a breakpoint if we've hit it, or if it is newly added (or the pc was changed to an existing BreakpointSite).

The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch.

rdar://123942164

Patch is 33.48 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96260.diff

8 Files Affected:

(modified) lldb/include/lldb/Target/Thread.h (+29)
(modified) lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp (+118-178)
(modified) lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp (+3-13)
(modified) lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp (+36-82)
(modified) lldb/source/Plugins/Process/scripted/ScriptedThread.cpp (+9)
(modified) lldb/source/Target/Thread.cpp (+16-1)
(modified) lldb/test/API/functionalities/breakpoint/consecutive_breakpoints/TestConsecutiveBreakpoints.py (+19-7)
(modified) lldb/test/API/functionalities/breakpoint/step_over_breakpoint/TestStepOverBreakpoint.py (+5-1)

diff --git a/lldb/include/lldb/Target/Thread.h b/lldb/include/lldb/Target/Thread.h
index c17bddf4d98b8..1e1aead896018 100644
--- a/lldb/include/lldb/Target/Thread.h
+++ b/lldb/include/lldb/Target/Thread.h
@@ -128,6 +128,11 @@ class Thread : public std::enable_shared_from_this<Thread>,
         register_backup_sp; // You need to restore the registers, of course...
     uint32_t current_inlined_depth;
     lldb::addr_t current_inlined_pc;
+    lldb::addr_t
+        hit_bp_at_addr; // Set to the address of a breakpoint that we have hit.
+    lldb::addr_t bpsite_at_stop_pc; // Set to the address of a breakpoint
+                                    // instruction that we have not yet hit, but
+                                    // will hit when we resume.
   };
 
   /// Constructor
@@ -377,6 +382,19 @@ class Thread : public std::enable_shared_from_this<Thread>,
 
   virtual void SetQueueLibdispatchQueueAddress(lldb::addr_t dispatch_queue_t) {}
 
+  /// When a thread has executed/trapped a breakpoint, set the address of that
+  /// breakpoint so we know it has been hit already, and should be silently
+  /// stepped past on resume.
+  void SetThreadHitBreakpointAtAddr(lldb::addr_t pc) { m_hit_bp_at_addr = pc; }
+
+  /// When a thread stops at a breakpoint instruction/address, but has not yet
+  /// executed/triggered it, record that so we can detect when a user adds a
+  /// breakpoint (or changes a thread to a breakpoint site) and we need to
+  /// silently step past that when resuming.
+  void SetThreadStoppedAtBreakpointSite(lldb::addr_t pc) {
+    m_bpsite_at_stop_pc = pc;
+  }
+
   /// Whether this Thread already has all the Queue information cached or not
   ///
   /// A Thread may be associated with a libdispatch work Queue at a given
@@ -1311,6 +1329,17 @@ class Thread : public std::enable_shared_from_this<Thread>,
   bool m_should_run_before_public_stop;  // If this thread has "stop others" 
                                          // private work to do, then it will
                                          // set this.
+  lldb::addr_t m_hit_bp_at_addr;    // If this thread originally stopped at a
+                                    // breakpoint instruction, AND HIT IT,
+                                    // record the address of that breakpoint.
+                                    // LLDB_INVALID_ADDRESS if this thread did
+                                    // not stop at a breakpoint insn, or did not
+                                    // hit it yet.
+  lldb::addr_t m_bpsite_at_stop_pc; // If this thread originally stopped at a
+                                    // breakpoint site, record the address of
+                                    // that breakpoint site.
+                                    // LLDB_INVALID_ADDRESS if this thread did
+                                    // not stop at a breakpoint site.
   const uint32_t m_index_id; ///< A unique 1 based index assigned to each thread
                              /// for easy UI/command line access.
   lldb::RegisterContextSP m_reg_context_sp; ///< The register context for this
diff --git a/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp b/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
index 25cee369d7ee3..8087ef1accde8 100644
--- a/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
+++ b/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
@@ -488,38 +488,6 @@ const char *StopInfoMachException::GetDescription() {
   return m_description.c_str();
 }
 
-static StopInfoSP GetStopInfoForHardwareBP(Thread &thread, Target *target,
-                                           uint32_t exc_data_count,
-                                           uint64_t exc_sub_code,
-                                           uint64_t exc_sub_sub_code) {
-  // Try hardware watchpoint.
-  if (target) {
-    // The exc_sub_code indicates the data break address.
-    WatchpointResourceSP wp_rsrc_sp =
-        target->GetProcessSP()->GetWatchpointResourceList().FindByAddress(
-            (addr_t)exc_sub_code);
-    if (wp_rsrc_sp && wp_rsrc_sp->GetNumberOfConstituents() > 0) {
-      return StopInfo::CreateStopReasonWithWatchpointID(
-          thread, wp_rsrc_sp->GetConstituentAtIndex(0)->GetID());
-    }
-  }
-
-  // Try hardware breakpoint.
-  ProcessSP process_sp(thread.GetProcess());
-  if (process_sp) {
-    // The exc_sub_code indicates the data break address.
-    lldb::BreakpointSiteSP bp_sp =
-        process_sp->GetBreakpointSiteList().FindByAddress(
-            (lldb::addr_t)exc_sub_code);
-    if (bp_sp && bp_sp->IsEnabled()) {
-      return StopInfo::CreateStopReasonWithBreakpointSiteID(thread,
-                                                            bp_sp->GetID());
-    }
-  }
-
-  return nullptr;
-}
-
 #if defined(__APPLE__)
 const char *
 StopInfoMachException::MachException::Name(exception_type_t exc_type) {
@@ -633,171 +601,143 @@ StopInfoSP StopInfoMachException::CreateStopReasonWithMachException(
     }
     break;
 
+  // [exc_type, exc_code, exc_sub_code, exc_sub_sub_code]
+  //
+  // Instruction step:
+  //   [6, 1, 0]
+  //   Intel KDP [6, 3, ??]
+  //   armv7 [6, 0x102, <stop-pc>]  Same as software breakpoint!
+  //
+  // Software breakpoint:
+  //   x86 [6, 2, 0]
+  //   Intel KDP [6, 2, <bp-addr + 1>]
+  //   arm64 [6, 1, <bp-addr>]
+  //   armv7 [6, 0x102, <bp-addr>]  Same as instruction step!
+  //
+  // Hardware breakpoint:
+  //   x86 [6, 1, <bp-addr>, 0]
+  //   x86/Rosetta not implemented, see software breakpoint
+  //   arm64 [6, 1, <bp-addr>]
+  //   armv7 not implemented, see software breakpoint
+  //
+  // Hardware watchpoint:
+  //   x86 [6, 1, <accessed-addr>, 0] (both Intel hw and Rosetta)
+  //   arm64 [6, 0x102, <accessed-addr>, 0]
+  //   armv7 [6, 0x102, <accessed-addr>, 0]
+  //
+  // arm64 BRK instruction (imm arg not reflected in the ME)
+  //   [ 6, 1, <addr-of-BRK-insn>]
+  //
+  // In order of codes mach exceptions:
+  //   [6, 1, 0] - instruction step
+  //   [6, 1, <bp-addr>] - hardware breakpoint or watchpoint
+  //
+  //   [6, 2, 0] - software breakpoint
+  //   [6, 2, <bp-addr + 1>] - software breakpoint
+  //
+  //   [6, 3] - instruction step
+  //
+  //   [6, 0x102, <stop-pc>] armv7 instruction step
+  //   [6, 0x102, <bp-addr>] armv7 software breakpoint
+  //   [6, 0x102, <accessed-addr>, 0] arm64/armv7 watchpoint
   case 6: // EXC_BREAKPOINT
   {
-    bool is_actual_breakpoint = false;
-    bool is_trace_if_actual_breakpoint_missing = false;
-    switch (cpu) {
-    case llvm::Triple::x86:
-    case llvm::Triple::x86_64:
-      if (exc_code == 1) // EXC_I386_SGL
-      {
-        if (!exc_sub_code) {
-          // This looks like a plain trap.
-          // Have to check if there is a breakpoint here as well.  When you
-          // single-step onto a trap, the single step stops you not to trap.
-          // Since we also do that check below, let's just use that logic.
-          is_actual_breakpoint = true;
-          is_trace_if_actual_breakpoint_missing = true;
-        } else {
-          if (StopInfoSP stop_info =
-                  GetStopInfoForHardwareBP(thread, target, exc_data_count,
-                                           exc_sub_code, exc_sub_sub_code))
-            return stop_info;
-        }
-      } else if (exc_code == 2 || // EXC_I386_BPT
-                 exc_code == 3)   // EXC_I386_BPTFLT
-      {
-        // KDP returns EXC_I386_BPTFLT for trace breakpoints
-        if (exc_code == 3)
-          is_trace_if_actual_breakpoint_missing = true;
-
-        is_actual_breakpoint = true;
-        if (!pc_already_adjusted)
-          pc_decrement = 1;
-      }
-      break;
+    bool stopped_by_hitting_breakpoint = false;
+    bool stopped_by_completing_stepi = false;
+    bool stopped_watchpoint = false;
+    std::optional<addr_t> value;
+
+    // exc_code 1
+    if (exc_code == 1 && exc_sub_code == 0)
+      stopped_by_completing_stepi = true;
+    if (exc_code == 1 && exc_sub_code != 0) {
+      stopped_by_hitting_breakpoint = true;
+      stopped_watchpoint = true;
+      value = exc_sub_code;
+    }
 
-    case llvm::Triple::arm:
-    case llvm::Triple::thumb:
-      if (exc_code == 0x102) // EXC_ARM_DA_DEBUG
-      {
-        // LWP_TODO: We need to find the WatchpointResource that matches
-        // the address, and evaluate its Watchpoints.
-
-        // It's a watchpoint, then, if the exc_sub_code indicates a
-        // known/enabled data break address from our watchpoint list.
-        lldb::WatchpointSP wp_sp;
-        if (target)
-          wp_sp = target->GetWatchpointList().FindByAddress(
-              (lldb::addr_t)exc_sub_code);
-        if (wp_sp && wp_sp->IsEnabled()) {
-          return StopInfo::CreateStopReasonWithWatchpointID(thread,
-                                                            wp_sp->GetID());
-        } else {
-          is_actual_breakpoint = true;
-          is_trace_if_actual_breakpoint_missing = true;
-        }
-      } else if (exc_code == 1) // EXC_ARM_BREAKPOINT
-      {
-        is_actual_breakpoint = true;
-        is_trace_if_actual_breakpoint_missing = true;
-      } else if (exc_code == 0) // FIXME not EXC_ARM_BREAKPOINT but a kernel
-                                // is currently returning this so accept it
-                                // as indicating a breakpoint until the
-                                // kernel is fixed
-      {
-        is_actual_breakpoint = true;
-        is_trace_if_actual_breakpoint_missing = true;
-      }
-      break;
+    // exc_code 2
+    if (exc_code == 2 && exc_sub_code == 0)
+      stopped_by_hitting_breakpoint = true;
+    if (exc_code == 2 && exc_sub_code != 0) {
+      stopped_by_hitting_breakpoint = true;
+      // Intel KDP software breakpoint
+      if (!pc_already_adjusted)
+        pc_decrement = 1;
+    }
 
-    case llvm::Triple::aarch64_32:
-    case llvm::Triple::aarch64: {
-      // xnu describes three things with type EXC_BREAKPOINT:
-      //
-      //   exc_code 0x102 [EXC_ARM_DA_DEBUG], exc_sub_code addr-of-insn
-      //      Watchpoint access.  exc_sub_code is the address of the
-      //      instruction which trigged the watchpoint trap.
-      //      debugserver may add the watchpoint number that was triggered
-      //      in exc_sub_sub_code.
-      //
-      //   exc_code 1 [EXC_ARM_BREAKPOINT], exc_sub_code 0
-      //      Instruction step has completed.
-      //
-      //   exc_code 1 [EXC_ARM_BREAKPOINT], exc_sub_code address-of-instruction
-      //      Software breakpoint instruction executed.
-
-      if (exc_code == 1 && exc_sub_code == 0) // EXC_ARM_BREAKPOINT
-      {
-        // This is hit when we single instruction step aka MDSCR_EL1 SS bit 0
-        // is set
-        is_actual_breakpoint = true;
-        is_trace_if_actual_breakpoint_missing = true;
-        if (thread.GetTemporaryResumeState() != eStateStepping)
-          not_stepping_but_got_singlestep_exception = true;
-      }
-      if (exc_code == 0x102) // EXC_ARM_DA_DEBUG
-      {
-        // LWP_TODO: We need to find the WatchpointResource that matches
-        // the address, and evaluate its Watchpoints.
-
-        // It's a watchpoint, then, if the exc_sub_code indicates a
-        // known/enabled data break address from our watchpoint list.
-        lldb::WatchpointSP wp_sp;
-        if (target)
-          wp_sp = target->GetWatchpointList().FindByAddress(
-              (lldb::addr_t)exc_sub_code);
-        if (wp_sp && wp_sp->IsEnabled()) {
-          return StopInfo::CreateStopReasonWithWatchpointID(thread,
-                                                            wp_sp->GetID());
-        }
-        // EXC_ARM_DA_DEBUG seems to be reused for EXC_BREAKPOINT as well as
-        // EXC_BAD_ACCESS
-        if (thread.GetTemporaryResumeState() == eStateStepping)
-          return StopInfo::CreateStopReasonToTrace(thread);
+    // exc_code 3
+    if (exc_code == 3)
+      stopped_by_completing_stepi = true;
+
+    // exc_code 0x102
+    if (exc_code == 0x102 && exc_sub_code != 0) {
+      if (cpu == llvm::Triple::arm || cpu == llvm::Triple::thumb) {
+        stopped_by_hitting_breakpoint = true;
+        stopped_by_completing_stepi = true;
       }
-      // It looks like exc_sub_code has the 4 bytes of the instruction that
-      // triggered the exception, i.e. our breakpoint opcode
-      is_actual_breakpoint = exc_code == 1;
-      break;
+      stopped_watchpoint = true;
+      value = exc_sub_code;
     }
 
-    default:
-      break;
-    }
+    // Go through the reasons why we stopped, starting
+    // with the easiest to detect unambiguously.  We
+    // may have multiple possible reasons set.
 
-    if (is_actual_breakpoint) {
+    if (stopped_by_hitting_breakpoint) {
+      ProcessSP process_sp(thread.GetProcess());
       RegisterContextSP reg_ctx_sp(thread.GetRegisterContext());
-      addr_t pc = reg_ctx_sp->GetPC() - pc_decrement;
-
-      ProcessSP process_sp(thread.CalculateProcess());
-
-      lldb::BreakpointSiteSP bp_site_sp;
-      if (process_sp)
+      BreakpointSiteSP bp_site_sp;
+      addr_t pc = LLDB_INVALID_ADDRESS;
+      if (reg_ctx_sp)
+        pc = reg_ctx_sp->GetPC() - pc_decrement;
+      else if (value)
+        pc = *value;
+
+      if (value)
+        bp_site_sp = process_sp->GetBreakpointSiteList().FindByAddress(*value);
+      if (!bp_site_sp && reg_ctx_sp) {
         bp_site_sp = process_sp->GetBreakpointSiteList().FindByAddress(pc);
+      }
       if (bp_site_sp && bp_site_sp->IsEnabled()) {
-        // Update the PC if we were asked to do so, but only do so if we find
-        // a breakpoint that we know about cause this could be a trap
-        // instruction in the code
-        if (pc_decrement > 0 && adjust_pc_if_needed)
-          reg_ctx_sp->SetPC(pc);
-
-        // If the breakpoint is for this thread, then we'll report the hit,
-        // but if it is for another thread, we can just report no reason.  We
-        // don't need to worry about stepping over the breakpoint here, that
-        // will be taken care of when the thread resumes and notices that
-        // there's a breakpoint under the pc. If we have an operating system
-        // plug-in, we might have set a thread specific breakpoint using the
-        // operating system thread ID, so we can't make any assumptions about
-        // the thread ID so we must always report the breakpoint regardless
-        // of the thread.
+        // If we have an operating system plug-in, we might have set a thread
+        // specific breakpoint using the operating system thread ID, so we
+        // can't make any assumptions about the thread ID so we must always
+        // report the breakpoint regardless of the thread.
         if (bp_site_sp->ValidForThisThread(thread) ||
-            thread.GetProcess()->GetOperatingSystem() != nullptr)
+            thread.GetProcess()->GetOperatingSystem() != nullptr) {
+          // Update the PC if we were asked to do so, but only do so if we find
+          // a breakpoint that we know about cause this could be a trap
+          // instruction in the code
+          if (pc_decrement > 0 && adjust_pc_if_needed && reg_ctx_sp)
+            reg_ctx_sp->SetPC(pc);
+          thread.SetThreadHitBreakpointAtAddr(pc);
           return StopInfo::CreateStopReasonWithBreakpointSiteID(
               thread, bp_site_sp->GetID());
-        else if (is_trace_if_actual_breakpoint_missing)
-          return StopInfo::CreateStopReasonToTrace(thread);
-        else
+        } else {
+          thread.SetThreadHitBreakpointAtAddr(pc);
           return StopInfoSP();
+        }
       }
+    }
 
-      // Don't call this a trace if we weren't single stepping this thread.
-      if (is_trace_if_actual_breakpoint_missing &&
-          thread.GetTemporaryResumeState() == eStateStepping) {
-        return StopInfo::CreateStopReasonToTrace(thread);
+    if (stopped_watchpoint && value) {
+      WatchpointResourceSP wp_rsrc_sp =
+          target->GetProcessSP()->GetWatchpointResourceList().FindByAddress(
+              *value);
+      if (wp_rsrc_sp && wp_rsrc_sp->GetNumberOfConstituents() > 0) {
+        return StopInfo::CreateStopReasonWithWatchpointID(
+            thread, wp_rsrc_sp->GetConstituentAtIndex(0)->GetID());
       }
     }
+
+    if (stopped_by_completing_stepi) {
+      if (thread.GetTemporaryResumeState() != eStateStepping)
+        not_stepping_but_got_singlestep_exception = true;
+      else
+        return StopInfo::CreateStopReasonToTrace(thread);
+    }
+
   } break;
 
   case 7:  // EXC_SYSCALL
diff --git a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
index f383b3d40a4f3..c4005a1f51150 100644
--- a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
+++ b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
@@ -382,19 +382,8 @@ void ProcessWindows::RefreshStateAfterStop() {
     RegisterContextSP register_context = stop_thread->GetRegisterContext();
     const uint64_t pc = register_context->GetPC();
     BreakpointSiteSP site(GetBreakpointSiteList().FindByAddress(pc));
-    if (site && site->ValidForThisThread(*stop_thread)) {
-      LLDB_LOG(log,
-               "Single-stepped onto a breakpoint in process {0} at "
-               "address {1:x} with breakpoint site {2}",
-               m_session_data->m_debugger->GetProcess().GetProcessId(), pc,
-               site->GetID());
-      stop_info = StopInfo::CreateStopReasonWithBreakpointSiteID(*stop_thread,
-                                                                 site->GetID());
-      stop_thread->SetStopInfo(stop_info);
-
-      return;
-    }
-
+    if (site)
+      stop_thread->SetThreadStoppedAtBreakpointSite(pc);
     auto *reg_ctx = static_cast<RegisterContextWindows *>(
         stop_thread->GetRegisterContext().get());
     uint32_t slot_id = reg_ctx->GetTriggeredHardwareBreakpointSlotId();
@@ -462,6 +451,7 @@ void ProcessWindows::RefreshStateAfterStop() {
         stop_info = StopInfo::CreateStopReasonWithBreakpointSiteID(
             *stop_thread, site->GetID());
         register_context->SetPC(pc);
+        stop_thread->SetThreadHitBreakpointAtAddr(pc);
       } else {
         LLDB_LOG(log,
                  "Breakpoint site {0} is not valid for this thread, "
diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
index a5a731981299f..bad42d44503fd 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
@@ -1598,29 +1598,8 @@ bool ProcessGDBRemote::CalculateThreadStopInfo(ThreadGDBRemote *thread) {
     // that have stop reasons, and if there is no entry for a thread, then it
     // has no stop reason.
     thread->GetRegisterContext()->InvalidateIfNeeded(true);
-    if (!GetThreadStopInfoFromJSON(thread, m_jstopinfo_sp)) {
-      // If a thread is stopped at a breakpoint site, set that as the stop
-      // reason even if it hasn't executed the breakpoint instruction yet.
-      // We will silently step over the breakpoint when we resume execution
-      // and miss the fact that this thread hit the breakpoint.
-      const size_t num_thread_ids = m_thread_ids.size();
-      for (size_t i = 0; i < num_thread_ids; i++) {
-        if (m_thread_ids[i] == thread->GetID() && m_thread_pcs.size() > i) {
-          addr_t pc = m_thread_pcs[i];
-          lldb::BreakpointSiteSP bp_site_sp =
-              thread->GetProcess()->GetBreakpointSiteList().FindByAddress(pc);
-          if (bp_site_sp) {
-            if (bp_site_sp->ValidForThisThread(*thread)) {
-              thread->SetStopInfo(
-                  StopInfo::CreateStopReasonWithBreakpointSiteID(
-                      *thread, bp_site_sp->GetID()));
-              return true;
-            }
-          }
-        }
-      }
+    if (!GetThreadStopInfoFromJSON(thread, m_jstopinfo_sp))
       thread->SetStopInfo(StopInfoSP());
-    }
     return true;
   }
 
@@ -1729,6 +1708,12 @@ ThreadSP ProcessGDBRemote::SetThreadStopInfo(
     if (ThreadSP memory_thread_sp = m_thread_list.GetBackingThread(thread_sp))
       thread_sp = memory_thread_sp;
 
+    addr_t pc = thread_sp->GetRegisterContext()->GetPC();
+    BreakpointSiteSP bp_site_sp =
+        thread_sp->GetProcess()...
[truncated]

jasonmolenda · 2024-06-21T01:01:27Z

This patch definitely needs review from Jim Ingham, who is out this week. @ZequanWu and @cmtice FYI.

I tested this on arm64 macOS and on aarch64 Ubuntu - the important Process changes are were in ProcessGDBRemote, StopInfoMachException, and in ProcessWindows, I believe my testing will have exercised the first two set of changes (the largest and most complex). I updated ProcessWindows as I believe it needs to be updated, but don't have a way of testing it short of merging the PR and seeing what happens to the windows CI bot.

@AlexK0 if you have a setup to build and run the testsuite on windows, could you try it with this patch some time when you're able? I can't see this being merged for at least another 5-6 days, there's no rush. You can download the unidiff style diff of the patch from https://patch-diff.githubusercontent.com/raw/llvm/llvm-project/pull/96260.diff

There's a comment, ``` The current PC is AFTER the BP opcode, on all architectures ``` and code to decrement the pc value before looking for a BreakpointSite, if this is how Windows really behaves with breakpoints then we don't need to silently step past the breakpoint, windows has already done that. I was originally fixing a mistake where I called Thread::SetThreadHitBreakpointAtAddr(pc) only if the breakpoint was valid for this thread. But we need to mark the breakpoint that way regardless of whether it is valid for this thread or not -- for a bp not on this thread, we still need to silently step past it. But as I was making that change, I suspect none of this is necessary on Windows, so I'm removing it.

labath

I like this change a lot, but I'm wondering if there's a clearer way to store the intermediate state. It seems like this info could go into the StopInfo class, which would have the benefit of automatically being backed up when doing expression evaluation. Is there a reason to not do that?

If the reason is that the StopInfo of the thread has been cleared by the time we call SetupForResume (and there's no good way to change that), maybe we could pass the old stop info to the function as an argument, since that's effectively what the function is doing -- deciding how to resume the thread based on how it was previously stopped.

Regarding the scenarios you mention in the description (moving the PC to a breakpoint site, setting a breakpoint site at the current pc), are these tested by any of the existing tests?

Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state

I believe this was fixed by 60b90b5.

labath · 2024-06-21T09:35:08Z

lldb/include/lldb/Target/Thread.h

@@ -377,6 +382,19 @@ class Thread : public std::enable_shared_from_this<Thread>,

  virtual void SetQueueLibdispatchQueueAddress(lldb::addr_t dispatch_queue_t) {}

+  /// When a thread has executed/trapped a breakpoint, set the address of that
+  /// breakpoint so we know it has been hit already, and should be silently
+  /// stepped past on resume.


I'm a little unclear as to why do we need to store this separately. Shouldn't this already be stored in the stop reason of the thread (i.e., StopInfoBreakpoint implies we've hit a breakpoint, and the breakpoint site within it should give us the PC value)?

labath · 2024-06-21T09:52:30Z

lldb/include/lldb/Target/Thread.h

+  /// When a thread stops at a breakpoint instruction/address, but has not yet
+  /// executed/triggered it, record that so we can detect when a user adds a
+  /// breakpoint (or changes a thread to a breakpoint site) and we need to
+  /// silently step past that when resuming.
+  void SetThreadStoppedAtBreakpointSite(lldb::addr_t pc) {
+    m_bpsite_at_stop_pc = pc;
+  }


I'm also finding it hard to wrap my head around the meaning of this variable. If I understand correctly it tells us: the pc that we've stopped at; that there was a breakpoint site there at the time of the stop; and we did not hit that site.

I'm wondering if it would be clearer if we unpacked that. What if we:

unconditionally stored the PC of the last stop. Maybe this could be even a part of the StopInfo class, as I think it could be useful to see the PC value at the time of the stop, even if the user changed the PC afterwards.

a flag indicating whether we stopped at a breakpoint site, regardless of whether we've hit it or not (per the previous comment, that could be indicated by the stop reason). This doesn't really look like it belongs to StopInfo class, but I think I'd be fine with putting it there for collocation purposes.

AlexK0 · 2024-06-21T11:02:27Z

@AlexK0 if you have a setup to build and run the testsuite on windows, could you try it with this patch some time when you're able? I can't see this being merged for at least another 5-6 days, there's no rush. You can download the unidiff style diff of the patch from https://patch-diff.githubusercontent.com/raw/llvm/llvm-project/pull/96260.diff

@jasonmolenda I checked the tests with VS2022/x86-64/win11 in a debug build with assertions enabled. Unfortunately, the main branch is a bit unstable, and some tests constantly fail :( . Nonetheless, I noticed one new failure with the applied patch:

functionalities/breakpoint/consecutive_breakpoints/TestConsecutiveBreakpoints.py

backtrace:

FAIL: test_single_step_thread_specific (TestConsecutiveBreakpoints.ConsecutiveBreakpointsTestCase)
   Test that single step stops, even though the second breakpoint is not valid.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\Projects\github\llvm-project-jasonmolenda\lldb\packages\Python\lldbsuite\test\decorators.py", line 451, in wrapper
    return func(self, *args, **kwargs)
  File "D:\Projects\github\llvm-project-jasonmolenda\lldb\test\API\functionalities\breakpoint\consecutive_breakpoints\TestConsecutiveBreakpoints.py", line 121, in test_single_step_thread_specific
    self.finish_test()
  File "D:\Projects\github\llvm-project-jasonmolenda\lldb\test\API\functionalities\breakpoint\consecutive_breakpoints\TestConsecutiveBreakpoints.py", line 42, in finish_test
    self.assertState(self.process.GetState(), lldb.eStateExited)
  File "D:\Projects\github\llvm-project-jasonmolenda\lldb\packages\Python\lldbsuite\test\lldbtest.py", line 2578, in assertState
    self.fail(self._formatMessage(msg, error))
AssertionError: stopped (5) != exited (10)
Config=x86_64-D:\Projects\github\llvm-project-jasonmolenda\build\bin\clang.exe
----------------------------------------------------------------------

jasonmolenda · 2024-06-21T21:11:29Z

@AlexK0 if you have a setup to build and run the testsuite on windows, could you try it with this patch some time when you're able? I can't see this being merged for at least another 5-6 days, there's no rush. You can download the unidiff style diff of the patch from https://patch-diff.githubusercontent.com/raw/llvm/llvm-project/pull/96260.diff

@jasonmolenda I checked the tests with VS2022/x86-64/win11 in a debug build with assertions enabled. Unfortunately, the main branch is a bit unstable, and some tests constantly fail :( . Nonetheless, I noticed one new failure with the applied patch:

functionalities/breakpoint/consecutive_breakpoints/TestConsecutiveBreakpoints.py

These TestConcurrent / TestConsecutiveBreakpoints tests are great for finding corner cases with these changes, thanks so much. I was a little uncertain about one part of my ProcessWindows change, where the pc is past the breakpoint when a software breakpoint instruction is used, on Windows. For a moment, I thought, "oh, I don't need to record that we hit the breakpoint because we're already past it and we don't need to instruction past it" but of course that was wrong -- ProcessWindows::RefreshStateAfterStop decrements the $pc value by the size of its breakpoint instruction, which is necessary because e.g. the size of an x86 breakpoint instruction is 1 byte (0xcc) but x86_64 instructions can be between 1 to 15 bytes long, so stopping 1 byte after the BreakpointSite means we are possibly in the middle of the real instruction. We must set the pc to the BreakpointSite address and use lldb's normal logic.

Anyway, tl;dr, I believe this will fix it:

--- a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
+++ b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
@@ -442,6 +442,7 @@ void ProcessWindows::RefreshStateAfterStop() {
                m_session_data->m_debugger->GetProcess().GetProcessId(), pc,
                site->GetID());
 
+      stop_thread->SetThreadHitBreakpointAtAddr(pc);
       if (site->ValidForThisThread(*stop_thread)) {
         LLDB_LOG(log,
                  "Breakpoint site {0} is valid for this thread ({1:x}), "

I'll push it right now. Sorry for my confusion, and thanks again for looking at it, this was a big help.

ProcessWindows plugin. I misread this method's behavior initially, when I saw that the stop pc is past the breakpoint instruction. But we're not ready to resume the process now - we need to set the pc back to the start of the breakpoint, then do lldb's normal "disable breakpoint, instruction step, re-insert breakpoint" behavior just like all other targets. Re-adding the call to register the thread as having hit the breakpoint after testing help from Aleksandr Korepanov.

AlexK0 · 2024-06-24T09:59:06Z

Anyway, tl;dr, I believe this will fix it:

--- a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
+++ b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
@@ -442,6 +442,7 @@ void ProcessWindows::RefreshStateAfterStop() {
                m_session_data->m_debugger->GetProcess().GetProcessId(), pc,
                site->GetID());
 
+      stop_thread->SetThreadHitBreakpointAtAddr(pc);
       if (site->ValidForThisThread(*stop_thread)) {
         LLDB_LOG(log,
                  "Breakpoint site {0} is valid for this thread ({1:x}), "

I'll push it right now. Sorry for my confusion, and thanks again for looking at it, this was a big help.

@jasonmolenda unfortunately, the test still fails :(
I did a little investigation, here is a possible patch that fixes the test:

diff --git a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
index 231b22f5f189..fb0404f1c4b9 100644
--- a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
+++ b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
@@ -382,7 +382,7 @@ void ProcessWindows::RefreshStateAfterStop() {
     RegisterContextSP register_context = stop_thread->GetRegisterContext();
     const uint64_t pc = register_context->GetPC();
     BreakpointSiteSP site(GetBreakpointSiteList().FindByAddress(pc));
-    if (site)
+    if (site && site->ValidForThisThread(*stop_thread))
       stop_thread->SetThreadStoppedAtBreakpointSite(pc);
     auto *reg_ctx = static_cast<RegisterContextWindows *>(
         stop_thread->GetRegisterContext().get());

jasonmolenda · 2024-06-24T22:58:43Z

Thanks @AlexK0 ! After Pavel's suggestion, I need to do a minor bit of changes to the patch soon I think, for all platforms, so I'll see if this is easier to express in ProcessWindows that way.

I've was thinking about @labath 's suggestion over the weekend and I agree that the way the patch is written today isn't ideal about tracking state at stop-time and resume-time, as I developed the change I had to change its behavior and I didn't rethink what I was doing.

The original goal was: If we hit a breakpoint (whether it is enabled for this thread or not), when we Resume the thread, we silently instruction step past that breakpoint instruction before resuming. Then I realized the problem of "user sets a breakpoint at $pc, or changes $pc to a BreakpointSite" and I want to also silently step past the breakpoint in that case, so I added a separate variable to track that.

Looking at the total requirements, the rule can be condensed to: If this thread stopped at a BreakpointSite which it did not execute, we should execute the breakpoint on Resume. In any other case, a thread sitting at a BreakpointSite should silently step past it and resume execution.

So when a thread stops at a BreakpointSite that has executed (whether it is valid for this thread or not), we record nothing. When a thread stops at a BreakpointSite that has not executed, we need to record the address of that BreakpointSite. And on Resume, if the thread is still at this same address, we want to hit the breakpoint.

I don't think we can store this information in the StopInfo easily, because a thread with no stop reason (e.g. a multi-threaded program that hits a breakpoint on one thread, but the others were executing normally) wouldn't have a way to record that they were sitting at a BreakpointSite that needed to be hit still.

I outlined the idea of storing this data in the StopInfo to @jimingham earlier today briefly, and he agreed that we should have StopInfos around until we Resume, but I hadn't thought this through enough to account for threads with no stop reason. I'll check in with him about the details on this before I make the change, but I think I need to keep tracking this in the Thread object.

labath · 2024-06-25T06:56:25Z

Thanks for the explanation, Jason. I believe I understand the problem now. Here, we're concerned with whether the physical breakpoint opcode got executed, and not about the higher level machinery (conditional breakpoints, ignore count, ...), which builds on top of that and can cause a breakpoint hit to "disappear". Since StopInfo tracks the "logical" stop reason of the process (after all of these filters are applied), using it to track the "physical" one would be tricky at best.

Looking at the total requirements, the rule can be condensed to: If this thread stopped at a BreakpointSite which it did not execute, we should execute the breakpoint on Resume. In any other case, a thread sitting at a BreakpointSite should silently step past it and resume execution.

So, IIUC, there would just be one field (the "unexecuted breakpoint site") instead of two. I'll take that, I think having just one field to think about would make it easier to understand the overall logic.

track if we're at a BreakpointSite that we haven't hit yet, when we stop. This adds two methods that the Process plugins need to call. When a thread has stopped at an enabled BreakpointSite, but has not yet hit the breakpoint, it will call Thread::SetThreadStoppedAtUnexecutedBP(pc). When a thread has hit a breakpoint, it will call Thread::SetThreadHitBreakpointSite(). These are mutually exclusive, and the last one called wins. So a Process plugin may first detect that it is at an enabled BreakpointSite, call SetThreadStoppedAtUnexecutedBP(pc), and later realize breakpoint was actually hit, and update the state by calling SetThreadHitBreakpointSite(). When resuming a thread, if SetThreadStoppedAtUnexecutedBP(pc) was called and the thread's pc is still the same as that value when the thread stopped, we will execute the Breakpoint and stop again with a breakpoint-hit stop-reason. Otherwise we will silently step past the breakpoint. I've tested this update to the patch on macOS, I need to test on aarch64 ubuntu still, and ask Aleksandr if it's possible to check on Windows before we try to merge the PR.

jasonmolenda · 2024-07-02T04:02:34Z

Hi @AlexK0 I restructured this PR to track everything with a single piece of state in the Thread object, and redid all three Process plugins. I'm feeling a lot better about my changes to ProcessWindows now. I haven't tested the aarch64 ubuntu yet, I will do that tomorrow and if that looks good, I'm going to ask for more careful reviews of the PR. No rush at all, but if you have a chance to try this version on Windows I would really appreciate it. Thanks!

was enabled.

AlexK0 · 2024-07-06T08:15:21Z

lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp

-    RegisterContextSP register_context = stop_thread->GetRegisterContext();
+  // If we're at a BreakpointSite, mark this as an Unexecuted Breakpoint.
+  // We'll clear that state if we've actually executed the breakpoint.
+  if (RegisterContextSP register_context = stop_thread->GetRegisterContext()) {


Moving pc into a scope causes a compilation error at the line with the logging, here: ProcessWindows.cpp:396

lldb\source\Plugins\Process\Windows\Common\ProcessWindows.cpp(396): error C2065: 'pc': undeclared identifier

AlexK0 · 2024-07-06T08:37:28Z

@jasonmolenda, I checked the tests with the latest fix.
There is a minor compilation error: #96260 (review)
If fix it (I just removed pc from the logging), all breakpoint tests work, and I don’t see any new failures compared to the main branch.

I'm setting `register_context`, `site`, and `pc` before either of the EXCEPTION_SINGLE_STEP or EXCEPTION_BREAKPOINT blocks, and they are used/updated in both. Thanks again to Aleksandr for helping me to avoid extra round-trips through the CI, and testing that it behaves correctly once these are straightened out.

jasonmolenda · 2024-07-09T01:58:42Z

Thanks again for all the help @AlexK0 I pushed an update that should compile correctly with my update. It's great to hear that the testsuite looks clean -- that's the part that worried me the most about these changes.

I finished an aarch64 Ubuntu testsuite run and it is clean. I think this patch is about ready for review by @jimingham when he has time.

jimingham · 2024-07-12T18:14:39Z

lldb/include/lldb/Target/Thread.h

@@ -377,6 +380,26 @@ class Thread : public std::enable_shared_from_this<Thread>,

  virtual void SetQueueLibdispatchQueueAddress(lldb::addr_t dispatch_queue_t) {}

+  /// When a thread stops at an enabled BreakpointSite that has not exectued,


exectued -> executed

jimingham · 2024-07-12T18:16:03Z

lldb/include/lldb/Target/Thread.h

+  /// When a thread stops at an enabled BreakpointSite that has not exectued,
+  /// the Process plugin should call SetThreadStoppedAtUnexecutedBP(pc).
+  /// If that BreakpointSite was actually triggered (the instruction was
+  /// executed, for a software breakpoint), regardless of wheether the


wheether -> whether

jimingham · 2024-07-12T18:18:21Z

lldb/include/lldb/Target/Thread.h

+  /// If that BreakpointSite was actually triggered (the instruction was
+  /// executed, for a software breakpoint), regardless of wheether the
+  /// breakpoint is valid for this thread, SetThreadHitBreakpointSite()
+  /// should be called to clear that state.


"to clear that state" is confusing. A thread could stop because it executed the breakpoint trap without ever having stopped "with the PC at the unexecuted trap". That's actually the more common case. So you won't be clearing that state usually. Maybe "to clear that state" -> "to record that fact".

jimingham · 2024-07-12T18:24:51Z

lldb/include/lldb/Target/Thread.h

@@ -128,6 +128,9 @@ class Thread : public std::enable_shared_from_this<Thread>,
        register_backup_sp; // You need to restore the registers, of course...
    uint32_t current_inlined_depth;
    lldb::addr_t current_inlined_pc;
+    lldb::addr_t stopped_at_unexecuted_bp; // Set to the address of a breakpoint


This is the stored version of the Thread ivar with roughly the same name. It's better documented there, maybe here you can just refer to that ivar for docs?

jimingham · 2024-07-12T18:25:33Z

lldb/include/lldb/Target/Thread.h

+  lldb::addr_t
+      m_stopped_at_unexecuted_bp; // When the thread stops at a
+                                  // breakpoint instruction/BreakpointSite
+                                  // but has not yet hit/exectued it,


jimingham · 2024-07-12T18:41:19Z

lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp

+    bool stopped_by_hitting_breakpoint = false;
+    bool stopped_by_completing_stepi = false;
+    bool stopped_watchpoint = false;
+    std::optional<addr_t> value;


Is there a better name for this than "value" - that's pretty generic.

jimingham · 2024-07-12T18:44:47Z

lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp

+    std::optional<addr_t> value;
+
+    // exc_code 1
+    if (exc_code == 1 && exc_sub_code == 0)


Despite our fear of indentation, since the two if's exhaust the exe_code == 1 possibilities, this would be clearer as:

if (exc_code == 1) {
if (exc_sub_code == 0) {
} else {
}
}

jimingham · 2024-07-12T18:50:42Z

lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp

+    if (exc_code == 1 && exc_sub_code == 0)
+      stopped_by_completing_stepi = true;
+    if (exc_code == 1 && exc_sub_code != 0) {
+      stopped_by_hitting_breakpoint = true;


On the face of it, it seems weird that you are asserting that this is both a "breakpoint" and a "watchpoint" hit. Later on you'll tell the difference based on the exe_sub_code stored in value. This would be easier to read if there were a comment explaining that.

jimingham · 2024-07-12T18:51:46Z

lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp

@@ -633,171 +613,142 @@ StopInfoSP StopInfoMachException::CreateStopReasonWithMachException(
    }
    break;

+  // [exc_type, exc_code, exc_sub_code, exc_sub_sub_code]


This table is great, but it deserves a line saying what it is.

jasonmolenda · 2024-08-06T22:23:02Z

Ah, I see, NativeProcessLinux::Resume will use a software single step (emulate current instruction, insert breakpoint, continue) for any processor without a hardware instruction step and this includes

bool NativeProcessLinux::SupportHardwareSingleStepping() const {
  if (m_arch.IsMIPS() || m_arch.GetMachine() == llvm::Triple::arm ||
      m_arch.GetTriple().isRISCV() || m_arch.GetTriple().isLoongArch())

debugserver uses the hardware breakpoint because we know we have that feature on our armv7 darwin cores, but it looks like lldb-server on armv7 linux will be entirely software based.

jasonmolenda · 2024-08-06T22:57:22Z

With my patch from this PR applied, the Mach Exception from debugserver is interpreted the same way -- it could be a watchpoint, a breakpoint, or an "instruction step", we use the address included in the ME to say which one it is, and in the ambiguous instance of instruction stepping to a BreakpointSite, we will incorrectly say we hit the breakpoint.

I haven't found the mechanism that an armv7 lldb-server would report reason:trace when it emulated the current instruction, put a breakpoint on the next instruction we'll execute, and 'continued' the thread when we resume. But I might have just missed something in ProcessNativeLinux where it knows it was instruction stepping. (but in the old stepping model in lldb, when you're at a BreakpointSite, the stop reason would be "breakpoint hit" even if it hadn't actually been hit)

jasonmolenda · 2024-08-07T00:30:27Z

I built TestStepOverBreakpoint.py for armv7 and ran it against an armv7k apple watch (the only 32-bit target we still support) with debugserver. Four breakpoints on the source lines in main(), the middle two breakpoints have a condition of false, and the test counts the number of instructions and expects that many stops while it next-instruction's through, and doesn't expect to see breakpoint-hit notifications on the two condition==false breakpoints. I only ran lldb on the test program by hand but it looks like it worked correctly, e.g. one of my condition==false breakpoints is on 0x27bf9a and,

<  16> send packet: $vCont;s:343b#ee
(lldb)  < 369:368> read packet: $T05thread:343b;threads:343b;thread-pcs:27bf9a;[registers];metype:6;mecount:2;medata:102;medata:27bf9a;memory:0x27b8fc48=48fdb827d7912c00;memory:0x27b8fd48=0000000000000000;#00

(that's our magical "could be a watchpoint, could be a breakpoint, could be instruction step Mach Exception" that debugserver relays up from the darwin kernel)

 <  15> send packet: $x2bc000,200#bd
 <  65:516> read packet: $000000000000[...]

(lldb allocated memory to store 'false' lol)

Process 257 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0027bf9a a.out`main(argc=1, argv=0x27b8fd68) at main.cpp:9:9
-> 9   	    a = b + func(); // breakpoint_3

lldb has told us that we stopped with instruction-step because the breakpoint is not supposed to stop here (the thread would have resumed if I hadn't been ni'ing), very nice.

But of course the details of what ProcessGDBRemote got back for the reason: field and how that was handled are likely where the problem is -- and debugserver reporting mach exceptions is a completely different mechanism. Maybe I should just read through how this is handled in ProcessGDBRemote if we have the same ambiguity when breakpoints are used for instruction stepping.

labath · 2024-08-07T08:33:11Z

I haven't found the mechanism that an armv7 lldb-server would report reason:trace when it emulated the current instruction, put a breakpoint on the next instruction we'll execute, and 'continued' the thread when we resume. But I might have just missed something in ProcessNativeLinux where it knows it was instruction stepping. (but in the old stepping model in lldb, when you're at a BreakpointSite, the stop reason would be "breakpoint hit" even if it hadn't actually been hit)

I think that's this code here (I have missed it at first as well):

void NativeProcessLinux::MonitorBreakpoint(NativeThreadLinux &thread) {
  Log *log = GetLog(LLDBLog::Process | LLDBLog::Breakpoints);
  LLDB_LOG(log, "received breakpoint event, pid = {0}", thread.GetID());

  // Mark the thread as stopped at breakpoint.
  thread.SetStoppedByBreakpoint();
  FixupBreakpointPCAsNeeded(thread);

  if (m_threads_stepping_with_breakpoint.find(thread.GetID()) !=
      m_threads_stepping_with_breakpoint.end())
    thread.SetStoppedByTrace();

  StopRunningThreads(thread.GetID());
}

I think this is the correct thing to do, since lldb is completely unaware of whether software single stepping is taking place(*) I got lost in all of the mach exceptions, so it's not clear to me what is the conclusion. Are you saying that debugserver does not do this (i.e., it reports the stop as a "breakpoint hit", in the mach exception form)?

Just as a curiosity the linux kernel does allow you do use the hardware single step capability if debugging a 32 bit application on a 64-bit kernel (I'm not sure if the debugger process needs to be 64-bit or not). At one point, I wanted to make use of that, which would make our stepping behavior more reliable in these situations. However it would also make it harder to reproduce situations like these, as one would have to find actual 32-bit hardware...

(*) The code is probably too casual about this check though, in that it does not check whether it was the actual single-stepping breakpoint that got hit. If there was a breakpoint under the current PC, it would report this as "trace" even though it hasn't actually stepped anything. Could this maybe be the problem that you ran into with this patch?

jasonmolenda · 2024-08-08T00:19:53Z

// Mark the thread as stopped at breakpoint.
thread.SetStoppedByBreakpoint();
FixupBreakpointPCAsNeeded(thread);

if (m_threads_stepping_with_breakpoint.find(thread.GetID()) !=
m_threads_stepping_with_breakpoint.end())
thread.SetStoppedByTrace();

StopRunningThreads(thread.GetID());
}
I think this is the correct thing to do, since lldb is completely unaware of whether software single stepping is taking place(*) I got lost in all of the mach exceptions, so it's not clear to me what is the conclusion. Are you saying that debugserver does not do this (i.e., it reports the stop as a "breakpoint hit", in the mach exception form)?

Ah, very nice, thanks for finding this. I think this helps to show why this is failing with the new "breakpoints are only 'hit' when they are actually executed'" scheme.

If the pc is at a BreakpointSite and we're stopped in the debugger. We do "si", a breakpoint is inserted on the next instruction (pc+4 or 2), this thread and pc+4 is added to m_threads_stepping_with_breakpoint. We resume execution. The breakpoint at pc is hit, and the pc value does not advance. This method is only checking that the thread is included in the m_threads_stepping_with_breakpoint map, not checking that the pc of the breakpoint we hit is the one we inserted (at pc+4/2). I wonder if this can be fixed by

NativeRegisterContextLinux &reg_ctx = thread.GetRegisterContext();
if (m_threads_stepping_with_breakpoint.find(thread.GetID()) !=
       m_threads_stepping_with_breakpoint.end() &&
    m_threads_stepping_with_breakpoint.find(thread.GetID())->second == reg_ctx.GetPC())

This wasn't necessary with the previous behavior -- if we stopped at a BreakpointSite, we would declare that we hit the breakpoint, even if we'd just stepped up to the BreakpointSite. But now that you may have stepped up to a BreakpointSite OR you may have executed & hit the breakpoint, we're trusting the gdb-remote stub's response more.

I think @DavidSpickett was seeing some looping in the test case, and I could imagine a scenario where the pc is at a BreakpointSite, lldb instruction steps, and we get back an instruction-step stop reason without the pc advancing. We haven't hit the breakpoint yet (as far as lldb knows), so we try again. And again.

Just as a curiosity the linux kernel does allow you do use the hardware single step capability if debugging a 32 bit application on a 64-bit kernel (I'm not sure if the debugger process needs to be 64-bit or not). At one point, I wanted to make use of that, which would make our stepping behavior more reliable in these situations. However it would also make it harder to reproduce situations like these, as one would have to find actual 32-bit hardware...

Yeah on an armv8 system that can run AArch32 or AArch64 code, both will have MDSCR_EL1.SS instruction stepping capability. It's only on a genuine armv7 processor where that feature doesn't exist. I don't know how much lldb distinguishes between AArch32 and armv7, but this is one important difference. (Apple's AArch64 cores don't support AArch32 any more, so I haven't worked with this combination in years).

(*) The code is probably too casual about this check though, in that it does not check whether it was the actual single-stepping breakpoint that got hit. If there was a breakpoint under the current PC, it would report this as "trace" even though it hasn't actually stepped anything. Could this maybe be the problem that you ran into with this patch?

Ah yes, exactly so, we reached the same conclusion, I didn't read your whole response at first. :)

jasonmolenda · 2024-08-08T01:10:09Z

David, no need to test this latest fix unless you'd like to - I'll make sure it compiles on an aarch64 ubuntu VM but the fix is obvious once we were looking at the right place, and explains the behavior you saw earlier, I'm pretty confident. I need to figure out the debuginfo test issues and the mingw problem that Martin detailed still.

DavidSpickett · 2024-08-08T08:41:02Z

Thanks @DavidSpickett just to be sure, this is armv7 right, not AArch32?

This is an AArch32 container on Armv8. I'm not sure if lldb-server knows the difference though. Maybe it doesn't need to, and ptrace single step is done in the kernel which uses the h/w step.

But it looks like it may software single step on any Arm, v7 and v8...

debugserver uses the hardware breakpoint because we know we have that feature on our armv7 darwin cores, but it looks like lldb-server on armv7 linux will be entirely software based.

I don't know how much lldb distinguishes between AArch32 and armv7, but this is one important difference.

The single step code you posted doesn't seem to make a distinction. This is perhaps why we saw armv7 style behaviour on AArch32.

jasonmolenda · 2024-08-09T00:40:53Z

Thanks @DavidSpickett just to be sure, this is armv7 right, not AArch32?

This is an AArch32 container on Armv8. I'm not sure if lldb-server knows the difference though. Maybe it doesn't need to, and ptrace single step is done in the kernel which uses the h/w step.

Yeah I'm thinking lldb-server probably does a software instruction step on AArch32, even thought the hardware has instruction-step. Probably a small enough perf hit that there's little value in looking into that further.

jasonmolenda · 2024-08-09T00:44:23Z

The problem I now have is that debugserver doesn't do this nice behavior that lldb-server does, recognizing that a thread was instruction-stepping and rewriting the stop-reason as "trace" when it was actually "breakpoint-hit". I can't update the debugserver on the armv7 watches, and we need to support debugging on them for a while to come. I may need to make a generic code change where Thread::GetTemporaryResumeState can return both eStateStepping and its original pc it started the step from, only for the armv7 debugservers. With that small fix to NativeProcessLinux::MonitorBreakpoint outlined above, it should report the correct stop reason all by itself in this case.

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped *at* the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally. In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc. Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming. One non-intuitive UX from this change, but I'm convinced it is necessary: If you're stopped at a BreakpointSite that has not yet executed, you `stepi`, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the thread plan is still on the stack. If you then `continue` the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. I discussed this with Jim, and trying to paper over this behavior will lead to more complicated scenarios behaving non-intuitively. And mostly it's the testsuite that was trying to instruction step past a breakpoint and getting thrown off -- and I changed those tests to expect the new behavior. The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost. Another was reported by ZequanWu in https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state. The change to each Process plugin that does execution control is that 1. If we've stopped at a BreakpointSite that has not been executed yet, we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that. When the thread resumes, if the pc is still at the same site, we will continue, hit the breakpoint, and stop again. 2. When we've actually hit a breakpoint (enabled for this thread or not), the Process plugin should call Thread::SetThreadHitBreakpointSite(). When we go to resume the thread, we will push a step-over-breakpoint ThreadPlan before resuming. The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch. rdar://123942164

jasonmolenda · 2024-08-21T22:58:30Z

I opened a new PR that starts with this PR's patchset which I had to revert, and I will address the issues that need to be fixed there with separate additional commits in preparation for merging it again.
#105594

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped *at* the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally. In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc. Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming. One non-intuitive UX from this change, but I'm convinced it is necessary: If you're stopped at a BreakpointSite that has not yet executed, you `stepi`, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the thread plan is still on the stack. If you then `continue` the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. I discussed this with Jim, and trying to paper over this behavior will lead to more complicated scenarios behaving non-intuitively. And mostly it's the testsuite that was trying to instruction step past a breakpoint and getting thrown off -- and I changed those tests to expect the new behavior. The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost. Another was reported by ZequanWu in https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state. The change to each Process plugin that does execution control is that 1. If we've stopped at a BreakpointSite that has not been executed yet, we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that. When the thread resumes, if the pc is still at the same site, we will continue, hit the breakpoint, and stop again. 2. When we've actually hit a breakpoint (enabled for this thread or not), the Process plugin should call Thread::SetThreadHitBreakpointSite(). When we go to resume the thread, we will push a step-over-breakpoint ThreadPlan before resuming. The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch. rdar://123942164

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped *at* the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally. In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc. Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming. One non-intuitive UX from this change, butt is necessary: If you're stopped at a BreakpointSite that has not yet executed, you `stepi`, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the ThreadPlanStepInstruction is still on the stack. If you then `continue` the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost. Another was reported by ZequanWu in https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state. The change to each Process plugin that does execution control is that 1. If we've stopped at a BreakpointSite that has not been executed yet, we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that. When the thread resumes, if the pc is still at the same site, we will continue, hit the breakpoint, and stop again. 2. When we've actually hit a breakpoint (enabled for this thread or not), the Process plugin should call Thread::SetThreadHitBreakpointSite(). When we go to resume the thread, we will push a step-over-breakpoint ThreadPlan before resuming. The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch. I first landed this patch in July 2024 via llvm#96260 but the CI bots and wider testing found a number of test case failures that needed to be updated, I reverted it. I've fixed all of those issues in separate PRs and this change should run cleanly on all the CI bots now. rdar://123942164

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped *at* the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally. In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc. Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming. One non-intuitive UX from this change, butt is necessary: If you're stopped at a BreakpointSite that has not yet executed, you `stepi`, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the ThreadPlanStepInstruction is still on the stack. If you then `continue` the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost. Another was reported by ZequanWu in https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state. The change to each Process plugin that does execution control is that 1. If we've stopped at a BreakpointSite that has not been executed yet, we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that. When the thread resumes, if the pc is still at the same site, we will continue, hit the breakpoint, and stop again. 2. When we've actually hit a breakpoint (enabled for this thread or not), the Process plugin should call Thread::SetThreadHitBreakpointSite(). When we go to resume the thread, we will push a step-over-breakpoint ThreadPlan before resuming. The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch. I first landed this patch in July 2024 via #96260 but the CI bots and wider testing found a number of test case failures that needed to be updated, I reverted it. I've fixed all of those issues in separate PRs and this change should run cleanly on all the CI bots now. rdar://123942164

…(#126988) lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped *at* the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally. In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc. Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming. One non-intuitive UX from this change, butt is necessary: If you're stopped at a BreakpointSite that has not yet executed, you `stepi`, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the ThreadPlanStepInstruction is still on the stack. If you then `continue` the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost. Another was reported by ZequanWu in https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state. The change to each Process plugin that does execution control is that 1. If we've stopped at a BreakpointSite that has not been executed yet, we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that. When the thread resumes, if the pc is still at the same site, we will continue, hit the breakpoint, and stop again. 2. When we've actually hit a breakpoint (enabled for this thread or not), the Process plugin should call Thread::SetThreadHitBreakpointSite(). When we go to resume the thread, we will push a step-over-breakpoint ThreadPlan before resuming. The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch. I first landed this patch in July 2024 via llvm/llvm-project#96260 but the CI bots and wider testing found a number of test case failures that needed to be updated, I reverted it. I've fixed all of those issues in separate PRs and this change should run cleanly on all the CI bots now. rdar://123942164

jasonmolenda · 2025-02-13T21:31:57Z

For the record, this PR was finally re-landed as #126988 after four separate commits to address issues found in CI testing when I originally merged this.

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped *at* the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally. In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc. Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming. One non-intuitive UX from this change, butt is necessary: If you're stopped at a BreakpointSite that has not yet executed, you `stepi`, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the ThreadPlanStepInstruction is still on the stack. If you then `continue` the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost. Another was reported by ZequanWu in https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state. The change to each Process plugin that does execution control is that 1. If we've stopped at a BreakpointSite that has not been executed yet, we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that. When the thread resumes, if the pc is still at the same site, we will continue, hit the breakpoint, and stop again. 2. When we've actually hit a breakpoint (enabled for this thread or not), the Process plugin should call Thread::SetThreadHitBreakpointSite(). When we go to resume the thread, we will push a step-over-breakpoint ThreadPlan before resuming. The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch. I first landed this patch in July 2024 via llvm#96260 but the CI bots and wider testing found a number of test case failures that needed to be updated, I reverted it. I've fixed all of those issues in separate PRs and this change should run cleanly on all the CI bots now. rdar://123942164 (cherry picked from commit b666ac3)

mstorsjo · 2025-02-14T11:32:31Z

For the record, this PR was finally re-landed as #126988 after four separate commits to address issues found in CI testing when I originally merged this.

FWIW, it looks like my smoke tests are still passing fine after these changes have been relanded now: https://github.com/mstorsjo/llvm-mingw/actions/runs/13320200475/job/37215994344

Thanks for sorting it out!

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped *at* the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally. In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc. Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming. One non-intuitive UX from this change, butt is necessary: If you're stopped at a BreakpointSite that has not yet executed, you `stepi`, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the ThreadPlanStepInstruction is still on the stack. If you then `continue` the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost. Another was reported by ZequanWu in https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state. The change to each Process plugin that does execution control is that 1. If we've stopped at a BreakpointSite that has not been executed yet, we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that. When the thread resumes, if the pc is still at the same site, we will continue, hit the breakpoint, and stop again. 2. When we've actually hit a breakpoint (enabled for this thread or not), the Process plugin should call Thread::SetThreadHitBreakpointSite(). When we go to resume the thread, we will push a step-over-breakpoint ThreadPlan before resuming. The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch. I first landed this patch in July 2024 via llvm#96260 but the CI bots and wider testing found a number of test case failures that needed to be updated, I reverted it. I've fixed all of those issues in separate PRs and this change should run cleanly on all the CI bots now. rdar://123942164

lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped *at* the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally. In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc. Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming. One non-intuitive UX from this change, butt is necessary: If you're stopped at a BreakpointSite that has not yet executed, you `stepi`, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the ThreadPlanStepInstruction is still on the stack. If you then `continue` the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost. Another was reported by ZequanWu in https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state. The change to each Process plugin that does execution control is that 1. If we've stopped at a BreakpointSite that has not been executed yet, we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that. When the thread resumes, if the pc is still at the same site, we will continue, hit the breakpoint, and stop again. 2. When we've actually hit a breakpoint (enabled for this thread or not), the Process plugin should call Thread::SetThreadHitBreakpointSite(). When we go to resume the thread, we will push a step-over-breakpoint ThreadPlan before resuming. The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch. I first landed this patch in July 2024 via llvm#96260 but the CI bots and wider testing found a number of test case failures that needed to be updated, I reverted it. I've fixed all of those issues in separate PRs and this change should run cleanly on all the CI bots now. rdar://123942164 (cherry picked from commit b666ac3)

jasonmolenda requested a review from jimingham June 21, 2024 00:56

jasonmolenda requested a review from JDevlieghere as a code owner June 21, 2024 00:56

llvmbot added the lldb label Jun 21, 2024

labath reviewed Jun 21, 2024

View reviewed changes

Fix two spots where I forgot to check if the BreakpointSite

356e56a

was enabled.

AlexK0 reviewed Jul 6, 2024

View reviewed changes

jimingham reviewed Jul 12, 2024

View reviewed changes

jasonmolenda mentioned this pull request Aug 13, 2024

[lldb] Claim to support swbreak and hwbreak packets when debugging a gdbremote #102873

Merged

jasonmolenda mentioned this pull request Aug 21, 2024

[lldb] Change lldb's breakpoint detection behavior #105594

Closed

jasonmolenda mentioned this pull request Sep 13, 2024

[lldb] Add pc check for thread-step-by-bp algorithms #108504

Merged

jasonmolenda mentioned this pull request Feb 12, 2025

[lldb] Update ThreadPlanStepOut to handle new breakpoint behavior #126838

Merged

jasonmolenda mentioned this pull request Feb 13, 2025

[lldb] Change lldb's breakpoint handling behavior, reland #126988

Merged

jasonmolenda deleted the change-breakpoint-hit-behavior-in-lldb branch February 13, 2025 21:32

		@@ -377,6 +380,26 @@ class Thread : public std::enable_shared_from_this<Thread>,

		virtual void SetQueueLibdispatchQueueAddress(lldb::addr_t dispatch_queue_t) {}

		/// When a thread stops at an enabled BreakpointSite that has not exectued,

[lldb] Change lldb's breakpoint handling behavior #96260

[lldb] Change lldb's breakpoint handling behavior #96260

Uh oh!

Conversation

jasonmolenda commented Jun 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jun 21, 2024

Uh oh!

jasonmolenda commented Jun 21, 2024

Uh oh!

labath left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AlexK0 commented Jun 21, 2024

Uh oh!

jasonmolenda commented Jun 21, 2024

Uh oh!

AlexK0 commented Jun 24, 2024

Uh oh!

jasonmolenda commented Jun 24, 2024

Uh oh!

labath commented Jun 25, 2024

Uh oh!

jasonmolenda commented Jul 2, 2024

Uh oh!

AlexK0 Jul 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AlexK0 commented Jul 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jasonmolenda commented Jul 9, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jasonmolenda commented Aug 6, 2024

Uh oh!

jasonmolenda commented Aug 6, 2024

Uh oh!

jasonmolenda commented Aug 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

labath commented Aug 7, 2024

Uh oh!

jasonmolenda commented Aug 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jasonmolenda commented Aug 8, 2024

Uh oh!

DavidSpickett commented Aug 8, 2024

Uh oh!

jasonmolenda commented Aug 9, 2024

Uh oh!

jasonmolenda commented Aug 9, 2024

Uh oh!

jasonmolenda commented Aug 21, 2024

Uh oh!

jasonmolenda commented Feb 13, 2025

jasonmolenda commented Jun 21, 2024 •

edited

Loading

AlexK0 Jul 6, 2024 •

edited

Loading

AlexK0 commented Jul 6, 2024 •

edited

Loading

jasonmolenda commented Aug 7, 2024 •

edited

Loading

jasonmolenda commented Aug 8, 2024 •

edited

Loading