Skip to content

[lldb] Change lldb's breakpoint detection behavior #105594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jasonmolenda
Copy link
Collaborator

@jasonmolenda jasonmolenda commented Aug 21, 2024

This is a PR to track the changes I need to make to
#96260

to address the CI bot failures when I merged that PR, as well as investigating the windows problem Martin Storsjö saw. There's likely going to be 4-5 additional changes to this PR before it's ready to be merged again, I'll create a new PR with the patches I originally landed and fixing those separate issues in additional commits.

Issues:

  1. lldb-server armv7/aarch32 stepping-by-breakpoint doesn't verify the pc changed as expected, when reporting a trace stop-reason, so it will fail to recognize stepping over a breakpoint correctly.
  2. ProcessGDBRemote needs to add a reason = "breakpoint" when we have a swbreak or hwbreak from the remote stub, like it does today for watch/awatch. [lldb] Claim to support swbreak and hwbreak packets when debugging a gdbremote #102873
  3. debugserver on armv7k watches is doesn't correctly report a "step by breakpoint" as a trace event, need to verify that we can step / step past breakpoints correctly on old watches running that old debugserver.
  4. fix debuginfo dexter fails because stepping behavior was different.
  5. Look into the Martin Storsjö finish failure on windows.
  6. Review Omair's aarch64 windows hw bp/wp changes in [lldb][Windows] WoA HW Watchpoint support in LLDB #108072 once this PR is merged.

lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call, I
don't think there's any person who wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive UX from this change, but I'm convinced it is
necessary: If you're stopped at a BreakpointSite that has not yet
executed, you `stepi`, you will hit the breakpoint and the pc will not
yet advance. This thread has not completed its stepi, and the thread
plan is still on the stack. If you then `continue` the thread, lldb will
now stop and say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution. I
discussed this with Jim, and trying to paper over this behavior will
lead to more complicated scenarios behaving non-intuitively. And mostly
it's the testsuite that was trying to instruction step past a breakpoint
and getting thrown off -- and I changed those tests to expect the new
behavior.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record
that.  When the thread resumes, if the pc is still at the same site, we
will continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or not),
the Process plugin should call Thread::SetThreadHitBreakpointSite().
When we go to resume the thread, we will push a step-over-breakpoint
ThreadPlan before resuming.

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

rdar://123942164
@llvmbot
Copy link
Member

llvmbot commented Aug 21, 2024

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

This is a PR to track the changes I need to make to
#96260

to address the CI bot failures when I merged that PR, as well as investigating the windows problem Martin Storsjö saw. There's likely going to be 4-5 additional changes to this PR before it's ready to be merged again, I'll create a new PR with the patches I originally landed and fixing those separate issues in additional commits.

Issues:

  1. lldb-server armv7/aarch32 stepping-by-breakpoint doesn't verify the pc changed as expected, when reporting a trace stop-reason, so it will fail to recognize stepping over a breakpoint correctly.
  2. ProcessGDBRemote needs to add a reason = "breakpoint" when we have a swbreak or hwbreak from the remote stub, like it does today for watch/awatch. [lldb] Claim to support swbreak and hwbreak packets when debugging a gdbremote #102873
  3. debugserver on armv7k watches is doesn't correctly report a "step by breakpoint" as a trace event, need to verify that we can step / step past breakpoints correctly on old watches running that old debugserver.
  4. fix debuginfo dexter fails because stepping behavior was different.
  5. Look into the Martin Storsjö finish failure on windows.

Patch is 35.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/105594.diff

9 Files Affected:

  • (modified) lldb/include/lldb/Target/Thread.h (+24)
  • (modified) lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp (+148-174)
  • (modified) lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp (+12-20)
  • (modified) lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp (+31-62)
  • (modified) lldb/source/Plugins/Process/scripted/ScriptedThread.cpp (+11)
  • (modified) lldb/source/Target/StopInfo.cpp (+2)
  • (modified) lldb/source/Target/Thread.cpp (+14-1)
  • (modified) lldb/test/API/functionalities/breakpoint/consecutive_breakpoints/TestConsecutiveBreakpoints.py (+19-7)
  • (modified) lldb/test/API/functionalities/breakpoint/step_over_breakpoint/TestStepOverBreakpoint.py (+5-1)
diff --git a/lldb/include/lldb/Target/Thread.h b/lldb/include/lldb/Target/Thread.h
index 38b65b2bc58490..22d452c7b4b8a3 100644
--- a/lldb/include/lldb/Target/Thread.h
+++ b/lldb/include/lldb/Target/Thread.h
@@ -131,6 +131,7 @@ class Thread : public std::enable_shared_from_this<Thread>,
         register_backup_sp; // You need to restore the registers, of course...
     uint32_t current_inlined_depth;
     lldb::addr_t current_inlined_pc;
+    lldb::addr_t stopped_at_unexecuted_bp;
   };
 
   /// Constructor
@@ -380,6 +381,26 @@ class Thread : public std::enable_shared_from_this<Thread>,
 
   virtual void SetQueueLibdispatchQueueAddress(lldb::addr_t dispatch_queue_t) {}
 
+  /// When a thread stops at an enabled BreakpointSite that has not executed,
+  /// the Process plugin should call SetThreadStoppedAtUnexecutedBP(pc).
+  /// If that BreakpointSite was actually triggered (the instruction was
+  /// executed, for a software breakpoint), regardless of whether the
+  /// breakpoint is valid for this thread, SetThreadHitBreakpointSite()
+  /// should be called to record that fact.
+  ///
+  /// Depending on the structure of the Process plugin, it may be easiest
+  /// to call SetThreadStoppedAtUnexecutedBP(pc) unconditionally when at
+  /// a BreakpointSite, and later when it is known that it was triggered,
+  /// SetThreadHitBreakpointSite() can be called.  These two methods
+  /// overwrite the same piece of state in the Thread, the last one
+  /// called on a Thread wins.
+  void SetThreadStoppedAtUnexecutedBP(lldb::addr_t pc) {
+    m_stopped_at_unexecuted_bp = pc;
+  }
+  void SetThreadHitBreakpointSite() {
+    m_stopped_at_unexecuted_bp = LLDB_INVALID_ADDRESS;
+  }
+
   /// Whether this Thread already has all the Queue information cached or not
   ///
   /// A Thread may be associated with a libdispatch work Queue at a given
@@ -1314,6 +1335,9 @@ class Thread : public std::enable_shared_from_this<Thread>,
   bool m_should_run_before_public_stop;  // If this thread has "stop others" 
                                          // private work to do, then it will
                                          // set this.
+  lldb::addr_t m_stopped_at_unexecuted_bp; // Set to the address of a breakpoint
+                                           // instruction that we have not yet
+                                           // hit, but will hit when we resume.
   const uint32_t m_index_id; ///< A unique 1 based index assigned to each thread
                              /// for easy UI/command line access.
   lldb::RegisterContextSP m_reg_context_sp; ///< The register context for this
diff --git a/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp b/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
index 25cee369d7ee3d..f1853be12638ea 100644
--- a/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
+++ b/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
@@ -488,38 +488,6 @@ const char *StopInfoMachException::GetDescription() {
   return m_description.c_str();
 }
 
-static StopInfoSP GetStopInfoForHardwareBP(Thread &thread, Target *target,
-                                           uint32_t exc_data_count,
-                                           uint64_t exc_sub_code,
-                                           uint64_t exc_sub_sub_code) {
-  // Try hardware watchpoint.
-  if (target) {
-    // The exc_sub_code indicates the data break address.
-    WatchpointResourceSP wp_rsrc_sp =
-        target->GetProcessSP()->GetWatchpointResourceList().FindByAddress(
-            (addr_t)exc_sub_code);
-    if (wp_rsrc_sp && wp_rsrc_sp->GetNumberOfConstituents() > 0) {
-      return StopInfo::CreateStopReasonWithWatchpointID(
-          thread, wp_rsrc_sp->GetConstituentAtIndex(0)->GetID());
-    }
-  }
-
-  // Try hardware breakpoint.
-  ProcessSP process_sp(thread.GetProcess());
-  if (process_sp) {
-    // The exc_sub_code indicates the data break address.
-    lldb::BreakpointSiteSP bp_sp =
-        process_sp->GetBreakpointSiteList().FindByAddress(
-            (lldb::addr_t)exc_sub_code);
-    if (bp_sp && bp_sp->IsEnabled()) {
-      return StopInfo::CreateStopReasonWithBreakpointSiteID(thread,
-                                                            bp_sp->GetID());
-    }
-  }
-
-  return nullptr;
-}
-
 #if defined(__APPLE__)
 const char *
 StopInfoMachException::MachException::Name(exception_type_t exc_type) {
@@ -607,6 +575,25 @@ StopInfoSP StopInfoMachException::CreateStopReasonWithMachException(
       target ? target->GetArchitecture().GetMachine()
              : llvm::Triple::UnknownArch;
 
+  ProcessSP process_sp(thread.GetProcess());
+  RegisterContextSP reg_ctx_sp(thread.GetRegisterContext());
+  // Caveat: with x86 KDP if we've hit a breakpoint, the pc we
+  // receive is past the breakpoint instruction.
+  // If we have a breakpoints at 0x100 and 0x101, we hit the
+  // 0x100 breakpoint and the pc is reported at 0x101.
+  // We will initially mark this thread as being stopped at an
+  // unexecuted breakpoint at 0x101. Later when we see that
+  // we stopped for a Breakpoint reason, we will decrement the
+  // pc, and update the thread to record that we hit the
+  // breakpoint at 0x100.
+  // The fact that the pc may be off by one at this point
+  // (for an x86 KDP breakpoint hit) is not a problem.
+  addr_t pc = reg_ctx_sp->GetPC();
+  BreakpointSiteSP bp_site_sp =
+      process_sp->GetBreakpointSiteList().FindByAddress(pc);
+  if (bp_site_sp && bp_site_sp->IsEnabled())
+    thread.SetThreadStoppedAtUnexecutedBP(pc);
+
   switch (exc_type) {
   case 1: // EXC_BAD_ACCESS
   case 2: // EXC_BAD_INSTRUCTION
@@ -633,171 +620,158 @@ StopInfoSP StopInfoMachException::CreateStopReasonWithMachException(
     }
     break;
 
+    // A mach exception comes with 2-4 pieces of data.
+    // The sub-codes are only provided for certain types
+    // of mach exceptions.
+    // [exc_type, exc_code, exc_sub_code, exc_sub_sub_code]
+    //
+    // Here are all of the EXC_BREAKPOINT, exc_type==6,
+    // exceptions we can receive.
+    //
+    // Instruction step:
+    //   [6, 1, 0]
+    //   Intel KDP [6, 3, ??]
+    //   armv7 [6, 0x102, <stop-pc>]  Same as software breakpoint!
+    //
+    // Software breakpoint:
+    //   x86 [6, 2, 0]
+    //   Intel KDP [6, 2, <bp-addr + 1>]
+    //   arm64 [6, 1, <bp-addr>]
+    //   armv7 [6, 0x102, <bp-addr>]  Same as instruction step!
+    //
+    // Hardware breakpoint:
+    //   x86 [6, 1, <bp-addr>, 0]
+    //   x86/Rosetta not implemented, see software breakpoint
+    //   arm64 [6, 1, <bp-addr>]
+    //   armv7 not implemented, see software breakpoint
+    //
+    // Hardware watchpoint:
+    //   x86 [6, 1, <accessed-addr>, 0] (both Intel hw and Rosetta)
+    //   arm64 [6, 0x102, <accessed-addr>, 0]
+    //   armv7 [6, 0x102, <accessed-addr>, 0]
+    //
+    // arm64 BRK instruction (imm arg not reflected in the ME)
+    //   [ 6, 1, <addr-of-BRK-insn>]
+    //
+    // In order of codes mach exceptions:
+    //   [6, 1, 0] - instruction step
+    //   [6, 1, <bp-addr>] - hardware breakpoint or watchpoint
+    //
+    //   [6, 2, 0] - software breakpoint
+    //   [6, 2, <bp-addr + 1>] - software breakpoint
+    //
+    //   [6, 3] - instruction step
+    //
+    //   [6, 0x102, <stop-pc>] armv7 instruction step
+    //   [6, 0x102, <bp-addr>] armv7 software breakpoint
+    //   [6, 0x102, <accessed-addr>, 0] arm64/armv7 watchpoint
+
   case 6: // EXC_BREAKPOINT
   {
-    bool is_actual_breakpoint = false;
-    bool is_trace_if_actual_breakpoint_missing = false;
-    switch (cpu) {
-    case llvm::Triple::x86:
-    case llvm::Triple::x86_64:
-      if (exc_code == 1) // EXC_I386_SGL
-      {
-        if (!exc_sub_code) {
-          // This looks like a plain trap.
-          // Have to check if there is a breakpoint here as well.  When you
-          // single-step onto a trap, the single step stops you not to trap.
-          // Since we also do that check below, let's just use that logic.
-          is_actual_breakpoint = true;
-          is_trace_if_actual_breakpoint_missing = true;
-        } else {
-          if (StopInfoSP stop_info =
-                  GetStopInfoForHardwareBP(thread, target, exc_data_count,
-                                           exc_sub_code, exc_sub_sub_code))
-            return stop_info;
-        }
-      } else if (exc_code == 2 || // EXC_I386_BPT
-                 exc_code == 3)   // EXC_I386_BPTFLT
-      {
-        // KDP returns EXC_I386_BPTFLT for trace breakpoints
-        if (exc_code == 3)
-          is_trace_if_actual_breakpoint_missing = true;
-
-        is_actual_breakpoint = true;
+    bool stopped_by_hitting_breakpoint = false;
+    bool stopped_by_completing_stepi = false;
+    bool stopped_watchpoint = false;
+    std::optional<addr_t> address;
+
+    // exc_code 1
+    if (exc_code == 1) {
+      if (exc_sub_code == 0) {
+        stopped_by_completing_stepi = true;
+      } else {
+        // Ambiguous: could be signalling a
+        // breakpoint or watchpoint hit.
+        stopped_by_hitting_breakpoint = true;
+        stopped_watchpoint = true;
+        address = exc_sub_code;
+      }
+    }
+
+    // exc_code 2
+    if (exc_code == 2) {
+      if (exc_sub_code == 0)
+        stopped_by_hitting_breakpoint = true;
+      else {
+        stopped_by_hitting_breakpoint = true;
+        // Intel KDP software breakpoint
         if (!pc_already_adjusted)
           pc_decrement = 1;
       }
-      break;
+    }
 
-    case llvm::Triple::arm:
-    case llvm::Triple::thumb:
-      if (exc_code == 0x102) // EXC_ARM_DA_DEBUG
-      {
-        // LWP_TODO: We need to find the WatchpointResource that matches
-        // the address, and evaluate its Watchpoints.
-
-        // It's a watchpoint, then, if the exc_sub_code indicates a
-        // known/enabled data break address from our watchpoint list.
-        lldb::WatchpointSP wp_sp;
-        if (target)
-          wp_sp = target->GetWatchpointList().FindByAddress(
-              (lldb::addr_t)exc_sub_code);
-        if (wp_sp && wp_sp->IsEnabled()) {
-          return StopInfo::CreateStopReasonWithWatchpointID(thread,
-                                                            wp_sp->GetID());
-        } else {
-          is_actual_breakpoint = true;
-          is_trace_if_actual_breakpoint_missing = true;
-        }
-      } else if (exc_code == 1) // EXC_ARM_BREAKPOINT
-      {
-        is_actual_breakpoint = true;
-        is_trace_if_actual_breakpoint_missing = true;
-      } else if (exc_code == 0) // FIXME not EXC_ARM_BREAKPOINT but a kernel
-                                // is currently returning this so accept it
-                                // as indicating a breakpoint until the
-                                // kernel is fixed
-      {
-        is_actual_breakpoint = true;
-        is_trace_if_actual_breakpoint_missing = true;
-      }
-      break;
+    // exc_code 3
+    if (exc_code == 3)
+      stopped_by_completing_stepi = true;
 
-    case llvm::Triple::aarch64_32:
-    case llvm::Triple::aarch64: {
-      // xnu describes three things with type EXC_BREAKPOINT:
-      //
-      //   exc_code 0x102 [EXC_ARM_DA_DEBUG], exc_sub_code addr-of-insn
-      //      Watchpoint access.  exc_sub_code is the address of the
-      //      instruction which trigged the watchpoint trap.
-      //      debugserver may add the watchpoint number that was triggered
-      //      in exc_sub_sub_code.
-      //
-      //   exc_code 1 [EXC_ARM_BREAKPOINT], exc_sub_code 0
-      //      Instruction step has completed.
-      //
-      //   exc_code 1 [EXC_ARM_BREAKPOINT], exc_sub_code address-of-instruction
-      //      Software breakpoint instruction executed.
-
-      if (exc_code == 1 && exc_sub_code == 0) // EXC_ARM_BREAKPOINT
-      {
-        // This is hit when we single instruction step aka MDSCR_EL1 SS bit 0
-        // is set
-        is_actual_breakpoint = true;
-        is_trace_if_actual_breakpoint_missing = true;
-        if (thread.GetTemporaryResumeState() != eStateStepping)
-          not_stepping_but_got_singlestep_exception = true;
-      }
-      if (exc_code == 0x102) // EXC_ARM_DA_DEBUG
-      {
-        // LWP_TODO: We need to find the WatchpointResource that matches
-        // the address, and evaluate its Watchpoints.
-
-        // It's a watchpoint, then, if the exc_sub_code indicates a
-        // known/enabled data break address from our watchpoint list.
-        lldb::WatchpointSP wp_sp;
-        if (target)
-          wp_sp = target->GetWatchpointList().FindByAddress(
-              (lldb::addr_t)exc_sub_code);
-        if (wp_sp && wp_sp->IsEnabled()) {
-          return StopInfo::CreateStopReasonWithWatchpointID(thread,
-                                                            wp_sp->GetID());
-        }
-        // EXC_ARM_DA_DEBUG seems to be reused for EXC_BREAKPOINT as well as
-        // EXC_BAD_ACCESS
-        if (thread.GetTemporaryResumeState() == eStateStepping)
-          return StopInfo::CreateStopReasonToTrace(thread);
+    // exc_code 0x102
+    if (exc_code == 0x102 && exc_sub_code != 0) {
+      if (cpu == llvm::Triple::arm || cpu == llvm::Triple::thumb) {
+        stopped_by_hitting_breakpoint = true;
+        stopped_by_completing_stepi = true;
       }
-      // It looks like exc_sub_code has the 4 bytes of the instruction that
-      // triggered the exception, i.e. our breakpoint opcode
-      is_actual_breakpoint = exc_code == 1;
-      break;
+      stopped_watchpoint = true;
+      address = exc_sub_code;
     }
 
-    default:
-      break;
-    }
+    // The Mach Exception may have been ambiguous --
+    // e.g. we stopped either because of a breakpoint
+    // or a watchpoint.  We'll disambiguate which it
+    // really was.
 
-    if (is_actual_breakpoint) {
-      RegisterContextSP reg_ctx_sp(thread.GetRegisterContext());
+    if (stopped_by_hitting_breakpoint) {
       addr_t pc = reg_ctx_sp->GetPC() - pc_decrement;
 
-      ProcessSP process_sp(thread.CalculateProcess());
-
-      lldb::BreakpointSiteSP bp_site_sp;
-      if (process_sp)
+      if (address)
+        bp_site_sp =
+            process_sp->GetBreakpointSiteList().FindByAddress(*address);
+      if (!bp_site_sp && reg_ctx_sp) {
         bp_site_sp = process_sp->GetBreakpointSiteList().FindByAddress(pc);
+      }
       if (bp_site_sp && bp_site_sp->IsEnabled()) {
-        // Update the PC if we were asked to do so, but only do so if we find
-        // a breakpoint that we know about cause this could be a trap
-        // instruction in the code
-        if (pc_decrement > 0 && adjust_pc_if_needed)
-          reg_ctx_sp->SetPC(pc);
-
-        // If the breakpoint is for this thread, then we'll report the hit,
-        // but if it is for another thread, we can just report no reason.  We
-        // don't need to worry about stepping over the breakpoint here, that
-        // will be taken care of when the thread resumes and notices that
-        // there's a breakpoint under the pc. If we have an operating system
-        // plug-in, we might have set a thread specific breakpoint using the
-        // operating system thread ID, so we can't make any assumptions about
-        // the thread ID so we must always report the breakpoint regardless
-        // of the thread.
+        // We've hit this breakpoint, whether it was intended for this thread
+        // or not.  Clear this in the Tread object so we step past it on resume.
+        thread.SetThreadHitBreakpointSite();
+
+        // If we have an operating system plug-in, we might have set a thread
+        // specific breakpoint using the operating system thread ID, so we
+        // can't make any assumptions about the thread ID so we must always
+        // report the breakpoint regardless of the thread.
         if (bp_site_sp->ValidForThisThread(thread) ||
-            thread.GetProcess()->GetOperatingSystem() != nullptr)
+            thread.GetProcess()->GetOperatingSystem() != nullptr) {
+          // Update the PC if we were asked to do so, but only do so if we find
+          // a breakpoint that we know about cause this could be a trap
+          // instruction in the code
+          if (pc_decrement > 0 && adjust_pc_if_needed && reg_ctx_sp)
+            reg_ctx_sp->SetPC(pc);
           return StopInfo::CreateStopReasonWithBreakpointSiteID(
               thread, bp_site_sp->GetID());
-        else if (is_trace_if_actual_breakpoint_missing)
-          return StopInfo::CreateStopReasonToTrace(thread);
-        else
+        } else {
           return StopInfoSP();
+        }
       }
+    }
 
-      // Don't call this a trace if we weren't single stepping this thread.
-      if (is_trace_if_actual_breakpoint_missing &&
-          thread.GetTemporaryResumeState() == eStateStepping) {
-        return StopInfo::CreateStopReasonToTrace(thread);
+    // Breakpoint-hit events are handled.
+    // Now handle watchpoints.
+
+    if (stopped_watchpoint && address) {
+      WatchpointResourceSP wp_rsrc_sp =
+          target->GetProcessSP()->GetWatchpointResourceList().FindByAddress(
+              *address);
+      if (wp_rsrc_sp && wp_rsrc_sp->GetNumberOfConstituents() > 0) {
+        return StopInfo::CreateStopReasonWithWatchpointID(
+            thread, wp_rsrc_sp->GetConstituentAtIndex(0)->GetID());
       }
     }
+
+    // Finally, handle instruction step.
+
+    if (stopped_by_completing_stepi) {
+      if (thread.GetTemporaryResumeState() != eStateStepping)
+        not_stepping_but_got_singlestep_exception = true;
+      else
+        return StopInfo::CreateStopReasonToTrace(thread);
+    }
+
   } break;
 
   case 7:  // EXC_SYSCALL
diff --git a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
index f383b3d40a4f3a..8b12b87eacbc61 100644
--- a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
+++ b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
@@ -377,24 +377,17 @@ void ProcessWindows::RefreshStateAfterStop() {
   if (!stop_thread)
     return;
 
-  switch (active_exception->GetExceptionCode()) {
-  case EXCEPTION_SINGLE_STEP: {
-    RegisterContextSP register_context = stop_thread->GetRegisterContext();
-    const uint64_t pc = register_context->GetPC();
-    BreakpointSiteSP site(GetBreakpointSiteList().FindByAddress(pc));
-    if (site && site->ValidForThisThread(*stop_thread)) {
-      LLDB_LOG(log,
-               "Single-stepped onto a breakpoint in process {0} at "
-               "address {1:x} with breakpoint site {2}",
-               m_session_data->m_debugger->GetProcess().GetProcessId(), pc,
-               site->GetID());
-      stop_info = StopInfo::CreateStopReasonWithBreakpointSiteID(*stop_thread,
-                                                                 site->GetID());
-      stop_thread->SetStopInfo(stop_info);
+  RegisterContextSP register_context = stop_thread->GetRegisterContext();
+  uint64_t pc = register_context->GetPC();
 
-      return;
-    }
+  // If we're at a BreakpointSite, mark this as an Unexecuted Breakpoint.
+  // We'll clear that state if we've actually executed the breakpoint.
+  BreakpointSiteSP site(GetBreakpointSiteList().FindByAddress(pc));
+  if (site && site->IsEnabled())
+    stop_thread->SetThreadStoppedAtUnexecutedBP(pc);
 
+  switch (active_exception->GetExceptionCode()) {
+  case EXCEPTION_SINGLE_STEP: {
     auto *reg_ctx = static_cast<RegisterContextWindows *>(
         stop_thread->GetRegisterContext().get());
     uint32_t slot_id = reg_ctx->GetTriggeredHardwareBreakpointSlotId();
@@ -419,8 +412,6 @@ void ProcessWindows::RefreshStateAfterStop() {
   }
 
   case EXCEPTION_BREAKPOINT: {
-    RegisterContextSP register_context = stop_thread->GetRegisterContext();
-
     int breakpoint_size = 1;
     switch (GetTarget().GetArchitecture().GetMachine()) {
     case llvm::Triple::aarch64:
@@ -443,9 +434,9 @@ void ProcessWindows::RefreshStateAfterStop() {
     }
 
     // The current PC is AFTER the BP opcode, on all architectures.
-    uint64_t pc = register_context->GetPC() - breakpoint_size;
+    pc = register_context->GetPC() - breakpoint_size;
 
-    BreakpointSiteSP site(GetBreakpointSiteList().FindByAddress(pc));
+    site = GetBreakpointSiteList().FindByAddress(pc);
     if (site) {
       LLDB_LOG(log,
                "detected breakp...
[truncated]

mark it as a `reason:breakpoint` so the correct StopInfo is created.
NativeProcessFreeBSD::MonitorSIGTRAP (MIPS) and
NativeProcessLinux::MonitorBreakpoint (MIPS, armv7, RISCV, LoongArch)
do instruction stepping by setting a breakpoint on the next instruction
that will be executed, and resuming the thread.  They check that there
was one of these "instruction step" breakpoints set for this thread,
and rewrite the stop reason from "breakpoint hit" to "trace".

However, if a user is sitting at a BreakpointSite (break instruction),
and does `stepi`, then a breakpoint instruction will be put on the
next instruction, the thread resumes, and we stop at the same
location having hit our *original* breakpoint, not the one inserted
on the next instruction for stepping.  The stop reason is rewritten to
"trace" but the pc hasn't advanced.  The stepping logic will try to
instruction step again and it infinite loops.

I changed NativeProcessLinux and NativeProcessFreeBSD to only rewrite
the stop reason to "trace" when the pc is the breakpoint we added
for the insn-step.  I don't have any of the devices that use this
behavior, but I did at least compile test NativeProcessLinux in an
Ubuntu AArch64 VM (but my core doesn't support aarch32 so I can't
test it).
@jasonmolenda
Copy link
Collaborator Author

jasonmolenda commented Aug 29, 2024

Regarding the third of my todo's

  1. debugserver on armv7k watches is doesn't correctly report a "step by breakpoint" as a trace event, need to verify that we can step / step past breakpoints correctly on old watches running that old debugserver.

I've tested the new breakpoint stepping behavior with the old debugserver on the armv7k watches and it works well enough, I won't change lldb to handle this specially.

@jasonmolenda
Copy link
Collaborator Author

jasonmolenda commented Sep 6, 2024

I have a fix for the fourth of five issues needing investigation - the debuginfo dexter test fails. In this case, it places a breakpoint on every source line, then step-in's through the program. With old lldb, we would step to a BreakpointLocation at the start of a source line, and the breakpoint-hit stop reason would override the "step-in completed" stop reason. With this new algorithm, we stop at the BreakpointLocation that has not executed yet, so our stop reason is "step-in completed". Then when we step again, we hit the breakpoint we're stopped at, and the stop reason is "breakpoint-hit" -- but we have not advanced. So dexter needs to step twice to get past these breakpoints-on-every-source-line as it step-in's through the program. I have a PR that fixes it (when dexter steps, if it stops at a BreakpointLocation with a step-in-completed stop reason, step again), asking around if I should put it up as a separate PR or include it as a commit on this.

dexter sets breakpoints on every source line in the file.
Then it step-in's to every source line, and tests are written
to expect that one step will move between lines, hitting the
breakpoints that are set there.

With this new algorithm, lldb will now step to the next source
line, and be stopped at a breakpoint, but it has not yet hit the
breakpoint.  The stop reason will be "step-in completed".  When
you do "step-in" again, we will hit the breakpoint instruction,
stop with a "breakpoint-hit" stop reason -- but the pc has not
advanced.  Then when you "step-in" again, we will move to the
next source line.

In short, lldb needs to step twice in a situation with breakpoints
on every source line and stepping across them.

This patch detects when lldb is behaving this way, and preserves
the behavior the tests expect.  It will step-in, and if we stop
with a step-in stop reason and we are stopped at a breakpoint site,
we will step once again to hit the breakpoint.
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Sep 11, 2024
lldb will change how it reports stop reasons around breakpoints in
the near future.  I landed an earlier version of this change and
noticed debuginfo test failures on the CI bots due to the changes.
I'm addressing the issues found by CI at
llvm#105594 and will re-land
once I've done all of them.

Currently, when lldb stops at a breakpoint instruction -- but has
not yet executed the instruction -- it will overwrite the thread's
Stop Reason with "breakpoint-hit".  This caused bugs when the
original stop reason was important to the user - for instance, a
watchpoint on an AArch64 system where we have to instruction-step
past the watchpoint to find the new value.  Normally we would
instruction step, fetch the new value, then report the user that a
watchpoint has been hit with the old and new values.  But if the
instruction after this access is a breakpoint site, we overwrite
the "watchpoint hit" stop reason (and related actions) with "breakpoint
hit".

dexter sets breakpoints on all source lines, then steps line-to-line,
hitting the breakpoints.  But with this new behavior, we see two
steps per source line:  The first step gets us to the start of the
next line, with a "step completed" stop reason.  Then we step again
and we execute the breakpoint instruction, stop with the pc the
same, and report "breakpoint hit".  Now we can step a second time
and move past the breakpoint.

I've changed the `step` method in LLDB.py to check if we step to a
breakpoint site but have a "step completed" stop reason -- in which
case we have this new breakpoint behavior, and we need to step a
second time to actually hit the breakpoint like the debuginfo tests
expect.
jasonmolenda added a commit that referenced this pull request Sep 11, 2024
lldb will change how it reports stop reasons around breakpoints in the
near future. I landed an earlier version of this change and noticed
debuginfo test failures on the CI bots due to the changes. I'm
addressing the issues found by CI at
#105594 and will re-land once
I've done all of them.

Currently, when lldb stops at a breakpoint instruction -- but has not
yet executed the instruction -- it will overwrite the thread's Stop
Reason with "breakpoint-hit". This caused bugs when the original stop
reason was important to the user - for instance, a watchpoint on an
AArch64 system where we have to instruction-step past the watchpoint to
find the new value. Normally we would instruction step, fetch the new
value, then report the user that a watchpoint has been hit with the old
and new values. But if the instruction after this access is a breakpoint
site, we overwrite the "watchpoint hit" stop reason (and related
actions) with "breakpoint hit".

dexter sets breakpoints on all source lines, then steps line-to-line,
hitting the breakpoints. But with this new behavior, we see two steps
per source line: The first step gets us to the start of the next line,
with a "step completed" stop reason. Then we step again and we execute
the breakpoint instruction, stop with the pc the same, and report
"breakpoint hit". Now we can step a second time and move past the
breakpoint.

I've changed the `step` method in LLDB.py to check if we step to a
breakpoint site but have a "step completed" stop reason -- in which case
we have this new breakpoint behavior, and we need to step a second time
to actually hit the breakpoint like the debuginfo tests expect.
@jasonmolenda
Copy link
Collaborator Author

(1) the armv7/aarch32 step-by-breakpoint fix for lldb-server is merged on github main.
(2) the change to ProcessGDBRemote to recognize "swbreak" and "hwbreak" as a breakpoint is merged on github main.
(3) lldb with this patch works fine with the existing debugserver on armv7k where we get the breakpoint hit mach exception when we instruction step (StopInfoMachException handles it)
(4) dexter has been updated on github main to work with this patch or with old lldb stepping

which leaves

(5) Martin's mingw finish bug report
(6) Review Omair's Windows aarch64 hardware breakpoint/watchpoint PR for any changes necessary to work with this PR.

@jasonmolenda
Copy link
Collaborator Author

Looking at this again. The last issue that I needed to investigate was Martin's mingw testsuite that had a regression when I landed it. The test program was https://github.com/mstorsjo/llvm-mingw/blob/master/test/hello-exception.cpp and when run

bin/lldb -x -b -o 'b done' -o r -o fin /tmp/a.out -- -noop

put a breakpoint on done(), run to it, finish. On arm64 darwin, where done() is a single ret instruction, I can reproduce this failure with my stepping PR. The thread plan stack when I do finish has a "step over breakpoint" thread plan, and a "step out" thread plan. We do the step over breakpoint thread plan, correctly pop both thread plans from the stack, but with my patch we're asking if either thread plan says we should resume execution, and one of them voted yes and won. (or possibly returned should stop == 0) It's been a while since i looked into that logic, need to dig back in.

Important thing is that the failure is not mingw specific, I'll probably add this as an API test in lldb, maybe updated to not require a flag to run done.

Copy link

github-actions bot commented Jan 30, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

the breakpoint.  Being in the correct stack frame is not sufficient.

Fixing the case of hitting a breakpoint on the last instruction of
a function and doing `finish`.  A ThreadPlanStepOut is pushed, the
thread is set to eStateRunning, and then because we're at a breakpoint
site that we've hit, a ThreadPlanStepOverBreakpoint is pushed on
the thread plan stack.  It sets itself to AutoContinue==true because
the thread state is eStateRunning.

We instruction step past the breakpoint, stop with stop reason
`trace`.  ThreadPlanStepOverBreakpoint has completed, and it says
that we should AutoContinue.  ThreadPlanStepOut detects that it is
now in the stack frame it wanted to step to -- does not check whether
it hit its breakpoint -- and pulls it self off the stack, removes
its breakpoint.

We resume execution via the ThreadPlanStepOverBreakpoint's AutoContinue,
and there is now no breakpoint for ThreadPlanStepOut - we lose
control of the process.

Instead, until we hit the ThreadPlanStepOut breakpoint, we say that
it has not completed.  So now in this scenario of a breakpoint hit on
the last instruction and "finish", we instruction step, then continue
the thread and immediately hit the ThreadPlanStepOut breakpoint that
we're sitting at (but have not yet executed).

This fixes the llvm-mingw test failure that Martin reported, and I
captured in the new API test TestEmptyFuncThreadStepOut.py.
@jasonmolenda jasonmolenda changed the title [lldb] Change lldb's breakpoint detection behavior [WIP] [lldb] Change lldb's breakpoint detection behavior Feb 11, 2025
@jasonmolenda
Copy link
Collaborator Author

jasonmolenda commented Feb 11, 2025

@JDevlieghere @jimingham OK I'm at the point where I'm ready to land this PR again.

To recap, I landed this originally in July 2024 via #96260 .
It hit three CI bot failures, and @mstorsjo reported a regression on llvm-mingw with a hello-exception-dwarf test case. I listed all of the issues at the top of this PR, and I've fixed all of them, except Martin's, in separate commits to llvm-project main as I debugged them. I fixed Martin's with a commit included in this PR.

The problem Martin observed was a bug with ThreadPlanStepOut and ThreadPlanStepOverBreakpoint when you hit a breakpoint on the last instruction of a function. I also added an lldb API test that does this. I fixed this with a change to ThreadPlanStepOut which requires it to hit its breakpoint before it considers itself complete. I looked for other thread plans that may have similar issues but didn't see any that matched the same code pattern as ThreadPlanStepOut did.

I believe this is ready to land now.

@jasonmolenda
Copy link
Collaborator Author

The details of the bug that Martin found were actually bit a interesting.

The core of the issue is that the thread has a ThreadPlanStepOut, and before we resume execution, because we're at a BreakpointSite that has been hit, we push a ThreadPlanStepOverBreakpoint plan. It sees that the thread's state is eStateRunning (not eStateStepping -- so this is a thread doing a "continue" style resume). StepOverBreakpoint marks itself as "AutoResume == true" so that when it completes, it will run the thread as it was intended to be doing originally.

We do our instruction step. StepOverBreakpoint has completed, and it says we should resume execution. StepOut says it has completed execution because the StackID == the StackID it needed to execute to. It pops itself off the stack and removes its breakpoint. Thread::ShouldResume uses the StepOverBreakpoint's "auto resume == true" state to resume execution, and we lose control.

Jim points out that StepOverBreakpoint is unique because it was not tied to any higher level/controlling thread plan. It was injected on the thread just before we resume execution, and it tries to guess is the thread wants to stop when it's done or resume execution based on the thread's running state when it's added.

He thinks StepOverBreakpoint should maybe be owned by the first thread plan on the stack, so when StepOut removes itself from the stack, StepOverBreakpoint's opinion about resuming is not relevant.

The other part I did not dig in to was what exactly Thread::ShouldStop is doing when the current thread plan (StepOverBreakpoint) says 'auto resume' and the next one up the stack (StepOut) says "I'm complete, stop", and it resumed. I wanted to avoid touching something as central as Thread::ShouldStop's logic here, but I don't quite understand why the result of these two inputs was, "resume".

@jasonmolenda
Copy link
Collaborator Author

Small change in plan. The original PR I landed in July 2024 was #96260 and when we found a few problems with CIs and wider testing, I reverted it and opened this PR to come up with fixes for all the issues. But I was able to land all of the individual fixes (in ways that worked with the old stepping algorithm and the new) on llvm-project main by themselves.

I'll either use this to track the rebased version of my July 2024 PR, or create a clean one with just the original fix rebased and none of the other work in progress, once the last fix to ThreadPlanStepOut is merged.

@jasonmolenda
Copy link
Collaborator Author

I'm abandoning this PR where I was working through all the CI regressions, but landed them in separate clean PRs.

I created #126988 to re-land the July 2024 patch, rebased to current sources.

jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Feb 13, 2025
lldb will change how it reports stop reasons around breakpoints in the
near future. I landed an earlier version of this change and noticed
debuginfo test failures on the CI bots due to the changes. I'm
addressing the issues found by CI at
llvm#105594 and will re-land once
I've done all of them.

Currently, when lldb stops at a breakpoint instruction -- but has not
yet executed the instruction -- it will overwrite the thread's Stop
Reason with "breakpoint-hit". This caused bugs when the original stop
reason was important to the user - for instance, a watchpoint on an
AArch64 system where we have to instruction-step past the watchpoint to
find the new value. Normally we would instruction step, fetch the new
value, then report the user that a watchpoint has been hit with the old
and new values. But if the instruction after this access is a breakpoint
site, we overwrite the "watchpoint hit" stop reason (and related
actions) with "breakpoint hit".

dexter sets breakpoints on all source lines, then steps line-to-line,
hitting the breakpoints. But with this new behavior, we see two steps
per source line: The first step gets us to the start of the next line,
with a "step completed" stop reason. Then we step again and we execute
the breakpoint instruction, stop with the pc the same, and report
"breakpoint hit". Now we can step a second time and move past the
breakpoint.

I've changed the `step` method in LLDB.py to check if we step to a
breakpoint site but have a "step completed" stop reason -- in which case
we have this new breakpoint behavior, and we need to step a second time
to actually hit the breakpoint like the debuginfo tests expect.

(cherry picked from commit 93e45a6)
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Mar 29, 2025
lldb will change how it reports stop reasons around breakpoints in the
near future. I landed an earlier version of this change and noticed
debuginfo test failures on the CI bots due to the changes. I'm
addressing the issues found by CI at
llvm#105594 and will re-land once
I've done all of them.

Currently, when lldb stops at a breakpoint instruction -- but has not
yet executed the instruction -- it will overwrite the thread's Stop
Reason with "breakpoint-hit". This caused bugs when the original stop
reason was important to the user - for instance, a watchpoint on an
AArch64 system where we have to instruction-step past the watchpoint to
find the new value. Normally we would instruction step, fetch the new
value, then report the user that a watchpoint has been hit with the old
and new values. But if the instruction after this access is a breakpoint
site, we overwrite the "watchpoint hit" stop reason (and related
actions) with "breakpoint hit".

dexter sets breakpoints on all source lines, then steps line-to-line,
hitting the breakpoints. But with this new behavior, we see two steps
per source line: The first step gets us to the start of the next line,
with a "step completed" stop reason. Then we step again and we execute
the breakpoint instruction, stop with the pc the same, and report
"breakpoint hit". Now we can step a second time and move past the
breakpoint.

I've changed the `step` method in LLDB.py to check if we step to a
breakpoint site but have a "step completed" stop reason -- in which case
we have this new breakpoint behavior, and we need to step a second time
to actually hit the breakpoint like the debuginfo tests expect.

(cherry picked from commit 93e45a6)
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Apr 8, 2025
lldb will change how it reports stop reasons around breakpoints in the
near future. I landed an earlier version of this change and noticed
debuginfo test failures on the CI bots due to the changes. I'm
addressing the issues found by CI at
llvm#105594 and will re-land once
I've done all of them.

Currently, when lldb stops at a breakpoint instruction -- but has not
yet executed the instruction -- it will overwrite the thread's Stop
Reason with "breakpoint-hit". This caused bugs when the original stop
reason was important to the user - for instance, a watchpoint on an
AArch64 system where we have to instruction-step past the watchpoint to
find the new value. Normally we would instruction step, fetch the new
value, then report the user that a watchpoint has been hit with the old
and new values. But if the instruction after this access is a breakpoint
site, we overwrite the "watchpoint hit" stop reason (and related
actions) with "breakpoint hit".

dexter sets breakpoints on all source lines, then steps line-to-line,
hitting the breakpoints. But with this new behavior, we see two steps
per source line: The first step gets us to the start of the next line,
with a "step completed" stop reason. Then we step again and we execute
the breakpoint instruction, stop with the pc the same, and report
"breakpoint hit". Now we can step a second time and move past the
breakpoint.

I've changed the `step` method in LLDB.py to check if we step to a
breakpoint site but have a "step completed" stop reason -- in which case
we have this new breakpoint behavior, and we need to step a second time
to actually hit the breakpoint like the debuginfo tests expect.

(cherry picked from commit 93e45a6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants