-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[lldb] Change lldb's breakpoint detection behavior #105594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lldb] Change lldb's breakpoint detection behavior #105594
Conversation
lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless if we've actually hit the breakpoint, or if we've merely stopped *at* the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally. In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc. Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming. One non-intuitive UX from this change, but I'm convinced it is necessary: If you're stopped at a BreakpointSite that has not yet executed, you `stepi`, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the thread plan is still on the stack. If you then `continue` the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. I discussed this with Jim, and trying to paper over this behavior will lead to more complicated scenarios behaving non-intuitively. And mostly it's the testsuite that was trying to instruction step past a breakpoint and getting thrown off -- and I changed those tests to expect the new behavior. The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite. On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost. Another was reported by ZequanWu in https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state. The change to each Process plugin that does execution control is that 1. If we've stopped at a BreakpointSite that has not been executed yet, we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that. When the thread resumes, if the pc is still at the same site, we will continue, hit the breakpoint, and stop again. 2. When we've actually hit a breakpoint (enabled for this thread or not), the Process plugin should call Thread::SetThreadHitBreakpointSite(). When we go to resume the thread, we will push a step-over-breakpoint ThreadPlan before resuming. The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch. rdar://123942164
@llvm/pr-subscribers-lldb Author: Jason Molenda (jasonmolenda) ChangesThis is a PR to track the changes I need to make to to address the CI bot failures when I merged that PR, as well as investigating the windows problem Martin Storsjö saw. There's likely going to be 4-5 additional changes to this PR before it's ready to be merged again, I'll create a new PR with the patches I originally landed and fixing those separate issues in additional commits. Issues:
Patch is 35.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/105594.diff 9 Files Affected:
diff --git a/lldb/include/lldb/Target/Thread.h b/lldb/include/lldb/Target/Thread.h
index 38b65b2bc58490..22d452c7b4b8a3 100644
--- a/lldb/include/lldb/Target/Thread.h
+++ b/lldb/include/lldb/Target/Thread.h
@@ -131,6 +131,7 @@ class Thread : public std::enable_shared_from_this<Thread>,
register_backup_sp; // You need to restore the registers, of course...
uint32_t current_inlined_depth;
lldb::addr_t current_inlined_pc;
+ lldb::addr_t stopped_at_unexecuted_bp;
};
/// Constructor
@@ -380,6 +381,26 @@ class Thread : public std::enable_shared_from_this<Thread>,
virtual void SetQueueLibdispatchQueueAddress(lldb::addr_t dispatch_queue_t) {}
+ /// When a thread stops at an enabled BreakpointSite that has not executed,
+ /// the Process plugin should call SetThreadStoppedAtUnexecutedBP(pc).
+ /// If that BreakpointSite was actually triggered (the instruction was
+ /// executed, for a software breakpoint), regardless of whether the
+ /// breakpoint is valid for this thread, SetThreadHitBreakpointSite()
+ /// should be called to record that fact.
+ ///
+ /// Depending on the structure of the Process plugin, it may be easiest
+ /// to call SetThreadStoppedAtUnexecutedBP(pc) unconditionally when at
+ /// a BreakpointSite, and later when it is known that it was triggered,
+ /// SetThreadHitBreakpointSite() can be called. These two methods
+ /// overwrite the same piece of state in the Thread, the last one
+ /// called on a Thread wins.
+ void SetThreadStoppedAtUnexecutedBP(lldb::addr_t pc) {
+ m_stopped_at_unexecuted_bp = pc;
+ }
+ void SetThreadHitBreakpointSite() {
+ m_stopped_at_unexecuted_bp = LLDB_INVALID_ADDRESS;
+ }
+
/// Whether this Thread already has all the Queue information cached or not
///
/// A Thread may be associated with a libdispatch work Queue at a given
@@ -1314,6 +1335,9 @@ class Thread : public std::enable_shared_from_this<Thread>,
bool m_should_run_before_public_stop; // If this thread has "stop others"
// private work to do, then it will
// set this.
+ lldb::addr_t m_stopped_at_unexecuted_bp; // Set to the address of a breakpoint
+ // instruction that we have not yet
+ // hit, but will hit when we resume.
const uint32_t m_index_id; ///< A unique 1 based index assigned to each thread
/// for easy UI/command line access.
lldb::RegisterContextSP m_reg_context_sp; ///< The register context for this
diff --git a/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp b/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
index 25cee369d7ee3d..f1853be12638ea 100644
--- a/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
+++ b/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
@@ -488,38 +488,6 @@ const char *StopInfoMachException::GetDescription() {
return m_description.c_str();
}
-static StopInfoSP GetStopInfoForHardwareBP(Thread &thread, Target *target,
- uint32_t exc_data_count,
- uint64_t exc_sub_code,
- uint64_t exc_sub_sub_code) {
- // Try hardware watchpoint.
- if (target) {
- // The exc_sub_code indicates the data break address.
- WatchpointResourceSP wp_rsrc_sp =
- target->GetProcessSP()->GetWatchpointResourceList().FindByAddress(
- (addr_t)exc_sub_code);
- if (wp_rsrc_sp && wp_rsrc_sp->GetNumberOfConstituents() > 0) {
- return StopInfo::CreateStopReasonWithWatchpointID(
- thread, wp_rsrc_sp->GetConstituentAtIndex(0)->GetID());
- }
- }
-
- // Try hardware breakpoint.
- ProcessSP process_sp(thread.GetProcess());
- if (process_sp) {
- // The exc_sub_code indicates the data break address.
- lldb::BreakpointSiteSP bp_sp =
- process_sp->GetBreakpointSiteList().FindByAddress(
- (lldb::addr_t)exc_sub_code);
- if (bp_sp && bp_sp->IsEnabled()) {
- return StopInfo::CreateStopReasonWithBreakpointSiteID(thread,
- bp_sp->GetID());
- }
- }
-
- return nullptr;
-}
-
#if defined(__APPLE__)
const char *
StopInfoMachException::MachException::Name(exception_type_t exc_type) {
@@ -607,6 +575,25 @@ StopInfoSP StopInfoMachException::CreateStopReasonWithMachException(
target ? target->GetArchitecture().GetMachine()
: llvm::Triple::UnknownArch;
+ ProcessSP process_sp(thread.GetProcess());
+ RegisterContextSP reg_ctx_sp(thread.GetRegisterContext());
+ // Caveat: with x86 KDP if we've hit a breakpoint, the pc we
+ // receive is past the breakpoint instruction.
+ // If we have a breakpoints at 0x100 and 0x101, we hit the
+ // 0x100 breakpoint and the pc is reported at 0x101.
+ // We will initially mark this thread as being stopped at an
+ // unexecuted breakpoint at 0x101. Later when we see that
+ // we stopped for a Breakpoint reason, we will decrement the
+ // pc, and update the thread to record that we hit the
+ // breakpoint at 0x100.
+ // The fact that the pc may be off by one at this point
+ // (for an x86 KDP breakpoint hit) is not a problem.
+ addr_t pc = reg_ctx_sp->GetPC();
+ BreakpointSiteSP bp_site_sp =
+ process_sp->GetBreakpointSiteList().FindByAddress(pc);
+ if (bp_site_sp && bp_site_sp->IsEnabled())
+ thread.SetThreadStoppedAtUnexecutedBP(pc);
+
switch (exc_type) {
case 1: // EXC_BAD_ACCESS
case 2: // EXC_BAD_INSTRUCTION
@@ -633,171 +620,158 @@ StopInfoSP StopInfoMachException::CreateStopReasonWithMachException(
}
break;
+ // A mach exception comes with 2-4 pieces of data.
+ // The sub-codes are only provided for certain types
+ // of mach exceptions.
+ // [exc_type, exc_code, exc_sub_code, exc_sub_sub_code]
+ //
+ // Here are all of the EXC_BREAKPOINT, exc_type==6,
+ // exceptions we can receive.
+ //
+ // Instruction step:
+ // [6, 1, 0]
+ // Intel KDP [6, 3, ??]
+ // armv7 [6, 0x102, <stop-pc>] Same as software breakpoint!
+ //
+ // Software breakpoint:
+ // x86 [6, 2, 0]
+ // Intel KDP [6, 2, <bp-addr + 1>]
+ // arm64 [6, 1, <bp-addr>]
+ // armv7 [6, 0x102, <bp-addr>] Same as instruction step!
+ //
+ // Hardware breakpoint:
+ // x86 [6, 1, <bp-addr>, 0]
+ // x86/Rosetta not implemented, see software breakpoint
+ // arm64 [6, 1, <bp-addr>]
+ // armv7 not implemented, see software breakpoint
+ //
+ // Hardware watchpoint:
+ // x86 [6, 1, <accessed-addr>, 0] (both Intel hw and Rosetta)
+ // arm64 [6, 0x102, <accessed-addr>, 0]
+ // armv7 [6, 0x102, <accessed-addr>, 0]
+ //
+ // arm64 BRK instruction (imm arg not reflected in the ME)
+ // [ 6, 1, <addr-of-BRK-insn>]
+ //
+ // In order of codes mach exceptions:
+ // [6, 1, 0] - instruction step
+ // [6, 1, <bp-addr>] - hardware breakpoint or watchpoint
+ //
+ // [6, 2, 0] - software breakpoint
+ // [6, 2, <bp-addr + 1>] - software breakpoint
+ //
+ // [6, 3] - instruction step
+ //
+ // [6, 0x102, <stop-pc>] armv7 instruction step
+ // [6, 0x102, <bp-addr>] armv7 software breakpoint
+ // [6, 0x102, <accessed-addr>, 0] arm64/armv7 watchpoint
+
case 6: // EXC_BREAKPOINT
{
- bool is_actual_breakpoint = false;
- bool is_trace_if_actual_breakpoint_missing = false;
- switch (cpu) {
- case llvm::Triple::x86:
- case llvm::Triple::x86_64:
- if (exc_code == 1) // EXC_I386_SGL
- {
- if (!exc_sub_code) {
- // This looks like a plain trap.
- // Have to check if there is a breakpoint here as well. When you
- // single-step onto a trap, the single step stops you not to trap.
- // Since we also do that check below, let's just use that logic.
- is_actual_breakpoint = true;
- is_trace_if_actual_breakpoint_missing = true;
- } else {
- if (StopInfoSP stop_info =
- GetStopInfoForHardwareBP(thread, target, exc_data_count,
- exc_sub_code, exc_sub_sub_code))
- return stop_info;
- }
- } else if (exc_code == 2 || // EXC_I386_BPT
- exc_code == 3) // EXC_I386_BPTFLT
- {
- // KDP returns EXC_I386_BPTFLT for trace breakpoints
- if (exc_code == 3)
- is_trace_if_actual_breakpoint_missing = true;
-
- is_actual_breakpoint = true;
+ bool stopped_by_hitting_breakpoint = false;
+ bool stopped_by_completing_stepi = false;
+ bool stopped_watchpoint = false;
+ std::optional<addr_t> address;
+
+ // exc_code 1
+ if (exc_code == 1) {
+ if (exc_sub_code == 0) {
+ stopped_by_completing_stepi = true;
+ } else {
+ // Ambiguous: could be signalling a
+ // breakpoint or watchpoint hit.
+ stopped_by_hitting_breakpoint = true;
+ stopped_watchpoint = true;
+ address = exc_sub_code;
+ }
+ }
+
+ // exc_code 2
+ if (exc_code == 2) {
+ if (exc_sub_code == 0)
+ stopped_by_hitting_breakpoint = true;
+ else {
+ stopped_by_hitting_breakpoint = true;
+ // Intel KDP software breakpoint
if (!pc_already_adjusted)
pc_decrement = 1;
}
- break;
+ }
- case llvm::Triple::arm:
- case llvm::Triple::thumb:
- if (exc_code == 0x102) // EXC_ARM_DA_DEBUG
- {
- // LWP_TODO: We need to find the WatchpointResource that matches
- // the address, and evaluate its Watchpoints.
-
- // It's a watchpoint, then, if the exc_sub_code indicates a
- // known/enabled data break address from our watchpoint list.
- lldb::WatchpointSP wp_sp;
- if (target)
- wp_sp = target->GetWatchpointList().FindByAddress(
- (lldb::addr_t)exc_sub_code);
- if (wp_sp && wp_sp->IsEnabled()) {
- return StopInfo::CreateStopReasonWithWatchpointID(thread,
- wp_sp->GetID());
- } else {
- is_actual_breakpoint = true;
- is_trace_if_actual_breakpoint_missing = true;
- }
- } else if (exc_code == 1) // EXC_ARM_BREAKPOINT
- {
- is_actual_breakpoint = true;
- is_trace_if_actual_breakpoint_missing = true;
- } else if (exc_code == 0) // FIXME not EXC_ARM_BREAKPOINT but a kernel
- // is currently returning this so accept it
- // as indicating a breakpoint until the
- // kernel is fixed
- {
- is_actual_breakpoint = true;
- is_trace_if_actual_breakpoint_missing = true;
- }
- break;
+ // exc_code 3
+ if (exc_code == 3)
+ stopped_by_completing_stepi = true;
- case llvm::Triple::aarch64_32:
- case llvm::Triple::aarch64: {
- // xnu describes three things with type EXC_BREAKPOINT:
- //
- // exc_code 0x102 [EXC_ARM_DA_DEBUG], exc_sub_code addr-of-insn
- // Watchpoint access. exc_sub_code is the address of the
- // instruction which trigged the watchpoint trap.
- // debugserver may add the watchpoint number that was triggered
- // in exc_sub_sub_code.
- //
- // exc_code 1 [EXC_ARM_BREAKPOINT], exc_sub_code 0
- // Instruction step has completed.
- //
- // exc_code 1 [EXC_ARM_BREAKPOINT], exc_sub_code address-of-instruction
- // Software breakpoint instruction executed.
-
- if (exc_code == 1 && exc_sub_code == 0) // EXC_ARM_BREAKPOINT
- {
- // This is hit when we single instruction step aka MDSCR_EL1 SS bit 0
- // is set
- is_actual_breakpoint = true;
- is_trace_if_actual_breakpoint_missing = true;
- if (thread.GetTemporaryResumeState() != eStateStepping)
- not_stepping_but_got_singlestep_exception = true;
- }
- if (exc_code == 0x102) // EXC_ARM_DA_DEBUG
- {
- // LWP_TODO: We need to find the WatchpointResource that matches
- // the address, and evaluate its Watchpoints.
-
- // It's a watchpoint, then, if the exc_sub_code indicates a
- // known/enabled data break address from our watchpoint list.
- lldb::WatchpointSP wp_sp;
- if (target)
- wp_sp = target->GetWatchpointList().FindByAddress(
- (lldb::addr_t)exc_sub_code);
- if (wp_sp && wp_sp->IsEnabled()) {
- return StopInfo::CreateStopReasonWithWatchpointID(thread,
- wp_sp->GetID());
- }
- // EXC_ARM_DA_DEBUG seems to be reused for EXC_BREAKPOINT as well as
- // EXC_BAD_ACCESS
- if (thread.GetTemporaryResumeState() == eStateStepping)
- return StopInfo::CreateStopReasonToTrace(thread);
+ // exc_code 0x102
+ if (exc_code == 0x102 && exc_sub_code != 0) {
+ if (cpu == llvm::Triple::arm || cpu == llvm::Triple::thumb) {
+ stopped_by_hitting_breakpoint = true;
+ stopped_by_completing_stepi = true;
}
- // It looks like exc_sub_code has the 4 bytes of the instruction that
- // triggered the exception, i.e. our breakpoint opcode
- is_actual_breakpoint = exc_code == 1;
- break;
+ stopped_watchpoint = true;
+ address = exc_sub_code;
}
- default:
- break;
- }
+ // The Mach Exception may have been ambiguous --
+ // e.g. we stopped either because of a breakpoint
+ // or a watchpoint. We'll disambiguate which it
+ // really was.
- if (is_actual_breakpoint) {
- RegisterContextSP reg_ctx_sp(thread.GetRegisterContext());
+ if (stopped_by_hitting_breakpoint) {
addr_t pc = reg_ctx_sp->GetPC() - pc_decrement;
- ProcessSP process_sp(thread.CalculateProcess());
-
- lldb::BreakpointSiteSP bp_site_sp;
- if (process_sp)
+ if (address)
+ bp_site_sp =
+ process_sp->GetBreakpointSiteList().FindByAddress(*address);
+ if (!bp_site_sp && reg_ctx_sp) {
bp_site_sp = process_sp->GetBreakpointSiteList().FindByAddress(pc);
+ }
if (bp_site_sp && bp_site_sp->IsEnabled()) {
- // Update the PC if we were asked to do so, but only do so if we find
- // a breakpoint that we know about cause this could be a trap
- // instruction in the code
- if (pc_decrement > 0 && adjust_pc_if_needed)
- reg_ctx_sp->SetPC(pc);
-
- // If the breakpoint is for this thread, then we'll report the hit,
- // but if it is for another thread, we can just report no reason. We
- // don't need to worry about stepping over the breakpoint here, that
- // will be taken care of when the thread resumes and notices that
- // there's a breakpoint under the pc. If we have an operating system
- // plug-in, we might have set a thread specific breakpoint using the
- // operating system thread ID, so we can't make any assumptions about
- // the thread ID so we must always report the breakpoint regardless
- // of the thread.
+ // We've hit this breakpoint, whether it was intended for this thread
+ // or not. Clear this in the Tread object so we step past it on resume.
+ thread.SetThreadHitBreakpointSite();
+
+ // If we have an operating system plug-in, we might have set a thread
+ // specific breakpoint using the operating system thread ID, so we
+ // can't make any assumptions about the thread ID so we must always
+ // report the breakpoint regardless of the thread.
if (bp_site_sp->ValidForThisThread(thread) ||
- thread.GetProcess()->GetOperatingSystem() != nullptr)
+ thread.GetProcess()->GetOperatingSystem() != nullptr) {
+ // Update the PC if we were asked to do so, but only do so if we find
+ // a breakpoint that we know about cause this could be a trap
+ // instruction in the code
+ if (pc_decrement > 0 && adjust_pc_if_needed && reg_ctx_sp)
+ reg_ctx_sp->SetPC(pc);
return StopInfo::CreateStopReasonWithBreakpointSiteID(
thread, bp_site_sp->GetID());
- else if (is_trace_if_actual_breakpoint_missing)
- return StopInfo::CreateStopReasonToTrace(thread);
- else
+ } else {
return StopInfoSP();
+ }
}
+ }
- // Don't call this a trace if we weren't single stepping this thread.
- if (is_trace_if_actual_breakpoint_missing &&
- thread.GetTemporaryResumeState() == eStateStepping) {
- return StopInfo::CreateStopReasonToTrace(thread);
+ // Breakpoint-hit events are handled.
+ // Now handle watchpoints.
+
+ if (stopped_watchpoint && address) {
+ WatchpointResourceSP wp_rsrc_sp =
+ target->GetProcessSP()->GetWatchpointResourceList().FindByAddress(
+ *address);
+ if (wp_rsrc_sp && wp_rsrc_sp->GetNumberOfConstituents() > 0) {
+ return StopInfo::CreateStopReasonWithWatchpointID(
+ thread, wp_rsrc_sp->GetConstituentAtIndex(0)->GetID());
}
}
+
+ // Finally, handle instruction step.
+
+ if (stopped_by_completing_stepi) {
+ if (thread.GetTemporaryResumeState() != eStateStepping)
+ not_stepping_but_got_singlestep_exception = true;
+ else
+ return StopInfo::CreateStopReasonToTrace(thread);
+ }
+
} break;
case 7: // EXC_SYSCALL
diff --git a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
index f383b3d40a4f3a..8b12b87eacbc61 100644
--- a/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
+++ b/lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
@@ -377,24 +377,17 @@ void ProcessWindows::RefreshStateAfterStop() {
if (!stop_thread)
return;
- switch (active_exception->GetExceptionCode()) {
- case EXCEPTION_SINGLE_STEP: {
- RegisterContextSP register_context = stop_thread->GetRegisterContext();
- const uint64_t pc = register_context->GetPC();
- BreakpointSiteSP site(GetBreakpointSiteList().FindByAddress(pc));
- if (site && site->ValidForThisThread(*stop_thread)) {
- LLDB_LOG(log,
- "Single-stepped onto a breakpoint in process {0} at "
- "address {1:x} with breakpoint site {2}",
- m_session_data->m_debugger->GetProcess().GetProcessId(), pc,
- site->GetID());
- stop_info = StopInfo::CreateStopReasonWithBreakpointSiteID(*stop_thread,
- site->GetID());
- stop_thread->SetStopInfo(stop_info);
+ RegisterContextSP register_context = stop_thread->GetRegisterContext();
+ uint64_t pc = register_context->GetPC();
- return;
- }
+ // If we're at a BreakpointSite, mark this as an Unexecuted Breakpoint.
+ // We'll clear that state if we've actually executed the breakpoint.
+ BreakpointSiteSP site(GetBreakpointSiteList().FindByAddress(pc));
+ if (site && site->IsEnabled())
+ stop_thread->SetThreadStoppedAtUnexecutedBP(pc);
+ switch (active_exception->GetExceptionCode()) {
+ case EXCEPTION_SINGLE_STEP: {
auto *reg_ctx = static_cast<RegisterContextWindows *>(
stop_thread->GetRegisterContext().get());
uint32_t slot_id = reg_ctx->GetTriggeredHardwareBreakpointSlotId();
@@ -419,8 +412,6 @@ void ProcessWindows::RefreshStateAfterStop() {
}
case EXCEPTION_BREAKPOINT: {
- RegisterContextSP register_context = stop_thread->GetRegisterContext();
-
int breakpoint_size = 1;
switch (GetTarget().GetArchitecture().GetMachine()) {
case llvm::Triple::aarch64:
@@ -443,9 +434,9 @@ void ProcessWindows::RefreshStateAfterStop() {
}
// The current PC is AFTER the BP opcode, on all architectures.
- uint64_t pc = register_context->GetPC() - breakpoint_size;
+ pc = register_context->GetPC() - breakpoint_size;
- BreakpointSiteSP site(GetBreakpointSiteList().FindByAddress(pc));
+ site = GetBreakpointSiteList().FindByAddress(pc);
if (site) {
LLDB_LOG(log,
"detected breakp...
[truncated]
|
mark it as a `reason:breakpoint` so the correct StopInfo is created.
NativeProcessFreeBSD::MonitorSIGTRAP (MIPS) and NativeProcessLinux::MonitorBreakpoint (MIPS, armv7, RISCV, LoongArch) do instruction stepping by setting a breakpoint on the next instruction that will be executed, and resuming the thread. They check that there was one of these "instruction step" breakpoints set for this thread, and rewrite the stop reason from "breakpoint hit" to "trace". However, if a user is sitting at a BreakpointSite (break instruction), and does `stepi`, then a breakpoint instruction will be put on the next instruction, the thread resumes, and we stop at the same location having hit our *original* breakpoint, not the one inserted on the next instruction for stepping. The stop reason is rewritten to "trace" but the pc hasn't advanced. The stepping logic will try to instruction step again and it infinite loops. I changed NativeProcessLinux and NativeProcessFreeBSD to only rewrite the stop reason to "trace" when the pc is the breakpoint we added for the insn-step. I don't have any of the devices that use this behavior, but I did at least compile test NativeProcessLinux in an Ubuntu AArch64 VM (but my core doesn't support aarch32 so I can't test it).
Regarding the third of my todo's
I've tested the new breakpoint stepping behavior with the old debugserver on the armv7k watches and it works well enough, I won't change lldb to handle this specially. |
I have a fix for the fourth of five issues needing investigation - the debuginfo dexter test fails. In this case, it places a breakpoint on every source line, then step-in's through the program. With old lldb, we would step to a BreakpointLocation at the start of a source line, and the breakpoint-hit stop reason would override the "step-in completed" stop reason. With this new algorithm, we stop at the BreakpointLocation that has not executed yet, so our stop reason is "step-in completed". Then when we step again, we hit the breakpoint we're stopped at, and the stop reason is "breakpoint-hit" -- but we have not advanced. So dexter needs to step twice to get past these breakpoints-on-every-source-line as it step-in's through the program. I have a PR that fixes it (when dexter steps, if it stops at a BreakpointLocation with a step-in-completed stop reason, step again), asking around if I should put it up as a separate PR or include it as a commit on this. |
dexter sets breakpoints on every source line in the file. Then it step-in's to every source line, and tests are written to expect that one step will move between lines, hitting the breakpoints that are set there. With this new algorithm, lldb will now step to the next source line, and be stopped at a breakpoint, but it has not yet hit the breakpoint. The stop reason will be "step-in completed". When you do "step-in" again, we will hit the breakpoint instruction, stop with a "breakpoint-hit" stop reason -- but the pc has not advanced. Then when you "step-in" again, we will move to the next source line. In short, lldb needs to step twice in a situation with breakpoints on every source line and stepping across them. This patch detects when lldb is behaving this way, and preserves the behavior the tests expect. It will step-in, and if we stop with a step-in stop reason and we are stopped at a breakpoint site, we will step once again to hit the breakpoint.
lldb will change how it reports stop reasons around breakpoints in the near future. I landed an earlier version of this change and noticed debuginfo test failures on the CI bots due to the changes. I'm addressing the issues found by CI at llvm#105594 and will re-land once I've done all of them. Currently, when lldb stops at a breakpoint instruction -- but has not yet executed the instruction -- it will overwrite the thread's Stop Reason with "breakpoint-hit". This caused bugs when the original stop reason was important to the user - for instance, a watchpoint on an AArch64 system where we have to instruction-step past the watchpoint to find the new value. Normally we would instruction step, fetch the new value, then report the user that a watchpoint has been hit with the old and new values. But if the instruction after this access is a breakpoint site, we overwrite the "watchpoint hit" stop reason (and related actions) with "breakpoint hit". dexter sets breakpoints on all source lines, then steps line-to-line, hitting the breakpoints. But with this new behavior, we see two steps per source line: The first step gets us to the start of the next line, with a "step completed" stop reason. Then we step again and we execute the breakpoint instruction, stop with the pc the same, and report "breakpoint hit". Now we can step a second time and move past the breakpoint. I've changed the `step` method in LLDB.py to check if we step to a breakpoint site but have a "step completed" stop reason -- in which case we have this new breakpoint behavior, and we need to step a second time to actually hit the breakpoint like the debuginfo tests expect.
This reverts commit a82695c.
lldb will change how it reports stop reasons around breakpoints in the near future. I landed an earlier version of this change and noticed debuginfo test failures on the CI bots due to the changes. I'm addressing the issues found by CI at #105594 and will re-land once I've done all of them. Currently, when lldb stops at a breakpoint instruction -- but has not yet executed the instruction -- it will overwrite the thread's Stop Reason with "breakpoint-hit". This caused bugs when the original stop reason was important to the user - for instance, a watchpoint on an AArch64 system where we have to instruction-step past the watchpoint to find the new value. Normally we would instruction step, fetch the new value, then report the user that a watchpoint has been hit with the old and new values. But if the instruction after this access is a breakpoint site, we overwrite the "watchpoint hit" stop reason (and related actions) with "breakpoint hit". dexter sets breakpoints on all source lines, then steps line-to-line, hitting the breakpoints. But with this new behavior, we see two steps per source line: The first step gets us to the start of the next line, with a "step completed" stop reason. Then we step again and we execute the breakpoint instruction, stop with the pc the same, and report "breakpoint hit". Now we can step a second time and move past the breakpoint. I've changed the `step` method in LLDB.py to check if we step to a breakpoint site but have a "step completed" stop reason -- in which case we have this new breakpoint behavior, and we need to step a second time to actually hit the breakpoint like the debuginfo tests expect.
Landing this in a separate PR llvm#108504 This reverts commit 48a0fc1.
Breaking this patch out into a separate PR llvm#108518 This reverts commit 3408ce1.
(1) the armv7/aarch32 step-by-breakpoint fix for lldb-server is merged on github main. which leaves (5) Martin's mingw finish bug report |
Looking at this again. The last issue that I needed to investigate was Martin's mingw testsuite that had a regression when I landed it. The test program was https://github.com/mstorsjo/llvm-mingw/blob/master/test/hello-exception.cpp and when run
put a breakpoint on Important thing is that the failure is not mingw specific, I'll probably add this as an API test in lldb, maybe updated to not require a flag to run |
test failure with this change in breakpoint behavior.
✅ With the latest revision this PR passed the C/C++ code formatter. |
the breakpoint. Being in the correct stack frame is not sufficient. Fixing the case of hitting a breakpoint on the last instruction of a function and doing `finish`. A ThreadPlanStepOut is pushed, the thread is set to eStateRunning, and then because we're at a breakpoint site that we've hit, a ThreadPlanStepOverBreakpoint is pushed on the thread plan stack. It sets itself to AutoContinue==true because the thread state is eStateRunning. We instruction step past the breakpoint, stop with stop reason `trace`. ThreadPlanStepOverBreakpoint has completed, and it says that we should AutoContinue. ThreadPlanStepOut detects that it is now in the stack frame it wanted to step to -- does not check whether it hit its breakpoint -- and pulls it self off the stack, removes its breakpoint. We resume execution via the ThreadPlanStepOverBreakpoint's AutoContinue, and there is now no breakpoint for ThreadPlanStepOut - we lose control of the process. Instead, until we hit the ThreadPlanStepOut breakpoint, we say that it has not completed. So now in this scenario of a breakpoint hit on the last instruction and "finish", we instruction step, then continue the thread and immediately hit the ThreadPlanStepOut breakpoint that we're sitting at (but have not yet executed). This fixes the llvm-mingw test failure that Martin reported, and I captured in the new API test TestEmptyFuncThreadStepOut.py.
@JDevlieghere @jimingham OK I'm at the point where I'm ready to land this PR again. To recap, I landed this originally in July 2024 via #96260 . The problem Martin observed was a bug with ThreadPlanStepOut and ThreadPlanStepOverBreakpoint when you hit a breakpoint on the last instruction of a function. I also added an lldb API test that does this. I fixed this with a change to ThreadPlanStepOut which requires it to hit its breakpoint before it considers itself complete. I looked for other thread plans that may have similar issues but didn't see any that matched the same code pattern as ThreadPlanStepOut did. I believe this is ready to land now. |
The details of the bug that Martin found were actually bit a interesting. The core of the issue is that the thread has a ThreadPlanStepOut, and before we resume execution, because we're at a BreakpointSite that has been hit, we push a ThreadPlanStepOverBreakpoint plan. It sees that the thread's state is eStateRunning (not eStateStepping -- so this is a thread doing a "continue" style resume). StepOverBreakpoint marks itself as "AutoResume == true" so that when it completes, it will run the thread as it was intended to be doing originally. We do our instruction step. StepOverBreakpoint has completed, and it says we should resume execution. StepOut says it has completed execution because the StackID == the StackID it needed to execute to. It pops itself off the stack and removes its breakpoint. Thread::ShouldResume uses the StepOverBreakpoint's "auto resume == true" state to resume execution, and we lose control. Jim points out that StepOverBreakpoint is unique because it was not tied to any higher level/controlling thread plan. It was injected on the thread just before we resume execution, and it tries to guess is the thread wants to stop when it's done or resume execution based on the thread's running state when it's added. He thinks StepOverBreakpoint should maybe be owned by the first thread plan on the stack, so when StepOut removes itself from the stack, StepOverBreakpoint's opinion about resuming is not relevant. The other part I did not dig in to was what exactly Thread::ShouldStop is doing when the current thread plan (StepOverBreakpoint) says 'auto resume' and the next one up the stack (StepOut) says "I'm complete, stop", and it resumed. I wanted to avoid touching something as central as Thread::ShouldStop's logic here, but I don't quite understand why the result of these two inputs was, "resume". |
Small change in plan. The original PR I landed in July 2024 was #96260 and when we found a few problems with CIs and wider testing, I reverted it and opened this PR to come up with fixes for all the issues. But I was able to land all of the individual fixes (in ways that worked with the old stepping algorithm and the new) on llvm-project main by themselves. I'll either use this to track the rebased version of my July 2024 PR, or create a clean one with just the original fix rebased and none of the other work in progress, once the last fix to ThreadPlanStepOut is merged. |
I'm abandoning this PR where I was working through all the CI regressions, but landed them in separate clean PRs. I created #126988 to re-land the July 2024 patch, rebased to current sources. |
lldb will change how it reports stop reasons around breakpoints in the near future. I landed an earlier version of this change and noticed debuginfo test failures on the CI bots due to the changes. I'm addressing the issues found by CI at llvm#105594 and will re-land once I've done all of them. Currently, when lldb stops at a breakpoint instruction -- but has not yet executed the instruction -- it will overwrite the thread's Stop Reason with "breakpoint-hit". This caused bugs when the original stop reason was important to the user - for instance, a watchpoint on an AArch64 system where we have to instruction-step past the watchpoint to find the new value. Normally we would instruction step, fetch the new value, then report the user that a watchpoint has been hit with the old and new values. But if the instruction after this access is a breakpoint site, we overwrite the "watchpoint hit" stop reason (and related actions) with "breakpoint hit". dexter sets breakpoints on all source lines, then steps line-to-line, hitting the breakpoints. But with this new behavior, we see two steps per source line: The first step gets us to the start of the next line, with a "step completed" stop reason. Then we step again and we execute the breakpoint instruction, stop with the pc the same, and report "breakpoint hit". Now we can step a second time and move past the breakpoint. I've changed the `step` method in LLDB.py to check if we step to a breakpoint site but have a "step completed" stop reason -- in which case we have this new breakpoint behavior, and we need to step a second time to actually hit the breakpoint like the debuginfo tests expect. (cherry picked from commit 93e45a6)
lldb will change how it reports stop reasons around breakpoints in the near future. I landed an earlier version of this change and noticed debuginfo test failures on the CI bots due to the changes. I'm addressing the issues found by CI at llvm#105594 and will re-land once I've done all of them. Currently, when lldb stops at a breakpoint instruction -- but has not yet executed the instruction -- it will overwrite the thread's Stop Reason with "breakpoint-hit". This caused bugs when the original stop reason was important to the user - for instance, a watchpoint on an AArch64 system where we have to instruction-step past the watchpoint to find the new value. Normally we would instruction step, fetch the new value, then report the user that a watchpoint has been hit with the old and new values. But if the instruction after this access is a breakpoint site, we overwrite the "watchpoint hit" stop reason (and related actions) with "breakpoint hit". dexter sets breakpoints on all source lines, then steps line-to-line, hitting the breakpoints. But with this new behavior, we see two steps per source line: The first step gets us to the start of the next line, with a "step completed" stop reason. Then we step again and we execute the breakpoint instruction, stop with the pc the same, and report "breakpoint hit". Now we can step a second time and move past the breakpoint. I've changed the `step` method in LLDB.py to check if we step to a breakpoint site but have a "step completed" stop reason -- in which case we have this new breakpoint behavior, and we need to step a second time to actually hit the breakpoint like the debuginfo tests expect. (cherry picked from commit 93e45a6)
lldb will change how it reports stop reasons around breakpoints in the near future. I landed an earlier version of this change and noticed debuginfo test failures on the CI bots due to the changes. I'm addressing the issues found by CI at llvm#105594 and will re-land once I've done all of them. Currently, when lldb stops at a breakpoint instruction -- but has not yet executed the instruction -- it will overwrite the thread's Stop Reason with "breakpoint-hit". This caused bugs when the original stop reason was important to the user - for instance, a watchpoint on an AArch64 system where we have to instruction-step past the watchpoint to find the new value. Normally we would instruction step, fetch the new value, then report the user that a watchpoint has been hit with the old and new values. But if the instruction after this access is a breakpoint site, we overwrite the "watchpoint hit" stop reason (and related actions) with "breakpoint hit". dexter sets breakpoints on all source lines, then steps line-to-line, hitting the breakpoints. But with this new behavior, we see two steps per source line: The first step gets us to the start of the next line, with a "step completed" stop reason. Then we step again and we execute the breakpoint instruction, stop with the pc the same, and report "breakpoint hit". Now we can step a second time and move past the breakpoint. I've changed the `step` method in LLDB.py to check if we step to a breakpoint site but have a "step completed" stop reason -- in which case we have this new breakpoint behavior, and we need to step a second time to actually hit the breakpoint like the debuginfo tests expect. (cherry picked from commit 93e45a6)
This is a PR to track the changes I need to make to
#96260
to address the CI bot failures when I merged that PR, as well as investigating the windows problem Martin Storsjö saw. There's likely going to be 4-5 additional changes to this PR before it's ready to be merged again, I'll create a new PR with the patches I originally landed and fixing those separate issues in additional commits.
Issues:
trace
stop-reason, so it will fail to recognize stepping over a breakpoint correctly.reason = "breakpoint"
when we have aswbreak
orhwbreak
from the remote stub, like it does today forwatch
/awatch
. [lldb] Claim to support swbreak and hwbreak packets when debugging a gdbremote #102873trace
event, need to verify that we can step / step past breakpoints correctly on old watches running that old debugserver.finish
failure on windows.