-
Notifications
You must be signed in to change notification settings - Fork 14.2k
[lldb] Correctly annotate threads at a bp site as hitting it #82709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lldb] Correctly annotate threads at a bp site as hitting it #82709
Conversation
This is next in my series of "fix the racey tests that fail on greendragon" addressing the failure of TestConcurrentManyBreakpoints.py where we set a breakpoint in a function that 100 threads execute, and we check that we hit the breakpoint 100 times. But sometimes it is only hit 99 times, and the test fails. When we hit a software breakpoint, the pc value for the thread is the address of the breakpoint instruction - as if it had not been hit yet. And because a user might ADD a breakpoint for the current pc from the commandline, when we go to resume execution, any thread that is sitting at a breakpoint site will be silently advanced past the breakpoint instruction (disable bp, instruction step that thread, re-enable bp) before resuming. What this test is exposing is that there is another corner case, a thread that is sitting at a breakpoint site but has not yet executed the breakpoint instruction. The thread will have no stop reason, no mach exception, so it will not be recorded as having hit the breakpoint (because it hasn't yet). But when we resume execution, because it is sitting at a breakpoint site, we advance past it and miss the breakpoint hit. In 2016 Abhishek Aggarwal handled a similar issue with a patch in `ProcessGDBRemote::SetThreadStopInfo()`, adding a breakpoint StopInfo for a thread sitting at a breakpoint site that has no stop reason. debugserver's `jThreadsInfo` would not correctly execute Abhishek's code though because it would respond with `"reason":"none"` for a thread with no stop reason, and `SetThreadStopInfo()` expected an empty reason here. The first part of my patch is to clear the `reason` if it is `"none"` so we flow through the code correctly. On Darwin, though, our stop reply packet (Txx...) includes the `threads`, `thread-pcs`, and `jstopinfo` keys, which give us the tids for all current threads, the pc values for those threads, and `jstopinfo` has a JSON dictionary with the mach exceptions for all threads that have a mach exception. In `ProcessGDBRemote::CalculateThreadStopInfo()` we set the StopInfo for each thread for a private stop and if we have `jstopinfo` it is the source of all the StopInfos. I have to add the same logic here, to give the thread a breakpoint StopInfo even though it hasn't executed the breakpoint yet. In this case we are very early in thread construction and I only have the information in the Txx stop reply packet -- tids, pcs, and jstopinfo, so I can't use the normal general mechanisms of going through the RegisterContext to get the pc, it's a bit different. If I hack debugserver to not issue `jstopinfo`, `CalculateThreadStopInfo` will fall back to sending `qThreadStopInfo` for each thread and going through `ProcessGDBRemote::SetThreadStopInfo()` to set the stop infos (and with the `reason:none` fix, use Abhishek's code). rdar://110549165
@llvm/pr-subscribers-lldb Author: Jason Molenda (jasonmolenda) ChangesThis is next in my series of "fix the racey tests that fail on greendragon" addressing the failure of TestConcurrentManyBreakpoints.py where we set a breakpoint in a function that 100 threads execute, and we check that we hit the breakpoint 100 times. But sometimes it is only hit 99 times, and the test fails. When we hit a software breakpoint, the pc value for the thread is the address of the breakpoint instruction - as if it had not been hit yet. And because a user might ADD a breakpoint for the current pc from the commandline, when we go to resume execution, any thread that is sitting at a breakpoint site will be silently advanced past the breakpoint instruction (disable bp, instruction step that thread, re-enable bp) before resuming. What this test is exposing is that there is another corner case, a thread that is sitting at a breakpoint site but has not yet executed the breakpoint instruction. The thread will have no stop reason, no mach exception, so it will not be recorded as having hit the breakpoint (because it hasn't yet). But when we resume execution, because it is sitting at a breakpoint site, we advance past it and miss the breakpoint hit. In 2016 Abhishek Aggarwal handled a similar issue with a patch in On Darwin, though, our stop reply packet (Txx...) includes the If I hack debugserver to not issue rdar://110549165 Full diff: https://github.com/llvm/llvm-project/pull/82709.diff 2 Files Affected:
diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
index 629b191f3117aa..eabbc8ad433212 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
@@ -1600,6 +1600,26 @@ bool ProcessGDBRemote::CalculateThreadStopInfo(ThreadGDBRemote *thread) {
// has no stop reason.
thread->GetRegisterContext()->InvalidateIfNeeded(true);
if (!GetThreadStopInfoFromJSON(thread, m_jstopinfo_sp)) {
+ // If a thread is stopped at a breakpoint site, set that as the stop
+ // reason even if it hasn't executed the breakpoint instruction yet.
+ // We will silently step over the breakpoint when we resume execution
+ // and miss the fact that this thread hit the breakpoint.
+ const size_t num_thread_ids = m_thread_ids.size();
+ for (size_t i = 0; i < num_thread_ids; i++) {
+ if (m_thread_ids[i] == thread->GetID() && m_thread_pcs.size() > i) {
+ addr_t pc = m_thread_pcs[i];
+ lldb::BreakpointSiteSP bp_site_sp =
+ thread->GetProcess()->GetBreakpointSiteList().FindByAddress(pc);
+ if (bp_site_sp) {
+ if (bp_site_sp->ValidForThisThread(*thread)) {
+ thread->SetStopInfo(
+ StopInfo::CreateStopReasonWithBreakpointSiteID(
+ *thread, bp_site_sp->GetID()));
+ return true;
+ }
+ }
+ }
+ }
thread->SetStopInfo(StopInfoSP());
}
return true;
@@ -1630,7 +1650,7 @@ void ProcessGDBRemote::ParseExpeditedRegisters(
ThreadSP ProcessGDBRemote::SetThreadStopInfo(
lldb::tid_t tid, ExpeditedRegisterMap &expedited_register_map,
- uint8_t signo, const std::string &thread_name, const std::string &reason,
+ uint8_t signo, const std::string &thread_name, std::string reason,
const std::string &description, uint32_t exc_type,
const std::vector<addr_t> &exc_data, addr_t thread_dispatch_qaddr,
bool queue_vars_valid, // Set to true if queue_name, queue_kind and
@@ -1722,6 +1742,8 @@ ThreadSP ProcessGDBRemote::SetThreadStopInfo(
} else {
bool handled = false;
bool did_exec = false;
+ if (reason == "none")
+ reason.clear();
if (!reason.empty()) {
if (reason == "trace") {
addr_t pc = thread_sp->GetRegisterContext()->GetPC();
@@ -1864,11 +1886,10 @@ ThreadSP ProcessGDBRemote::SetThreadStopInfo(
lldb::BreakpointSiteSP bp_site_sp =
thread_sp->GetProcess()->GetBreakpointSiteList().FindByAddress(pc);
- // If the current pc is a breakpoint site then the StopInfo should be
- // set to Breakpoint even though the remote stub did not set it as such.
- // This can happen when the thread is involuntarily interrupted (e.g.
- // due to stops on other threads) just as it is about to execute the
- // breakpoint instruction.
+ // If a thread is stopped at a breakpoint site, set that as the stop
+ // reason even if it hasn't executed the breakpoint instruction yet.
+ // We will silently step over the breakpoint when we resume execution
+ // and miss the fact that this thread hit the breakpoint.
if (bp_site_sp && bp_site_sp->ValidForThisThread(*thread_sp)) {
thread_sp->SetStopInfo(StopInfo::CreateStopReasonWithBreakpointSiteID(
*thread_sp, bp_site_sp->GetID()));
diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
index c1ea1cc7905587..f8b49c318180e3 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
@@ -366,7 +366,7 @@ class ProcessGDBRemote : public Process,
lldb::ThreadSP
SetThreadStopInfo(lldb::tid_t tid,
ExpeditedRegisterMap &expedited_register_map, uint8_t signo,
- const std::string &thread_name, const std::string &reason,
+ const std::string &thread_name, std::string reason,
const std::string &description, uint32_t exc_type,
const std::vector<lldb::addr_t> &exc_data,
lldb::addr_t thread_dispatch_qaddr, bool queue_vars_valid,
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, but I'll defer to @jimingham as he's the expert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems safe to me, and it can't hurt to take care of this corner case here regardless of what the higher levels of lldb does.
It bugs me a little that we're treating what seems like a general problem down in the GDBRemote process plugin. But anyway, if we are going to make this a process plugin policy: that the plugins are responsible for producing breakpoint stop reasons both for when the underlying system generates one and when we see "artificial" hits like this we should probably say so somewhere.
With the couple doc comments aside, I'm fine with this.
@@ -1722,6 +1742,8 @@ ThreadSP ProcessGDBRemote::SetThreadStopInfo( | |||
} else { | |||
bool handled = false; | |||
bool did_exec = false; | |||
if (reason == "none") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely needs a comment, or maybe somewhere else saying that none == empty, so we always convert none to empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the patch where I handle this a little more cleanly, checking that reason
has a value, and that the value is not "none"
, and added a short comment.
Treat `reason:none` as equivalent to no `reason` more cleanly in ProcessGDBRemote::SetThreadStopInfo. Add documentation to the Thread base class `CalculateStopInfo` method noting that the case of a thread at a breakpoint site which has not yet executed the breakpoint must be handled.
This all falls under |
) This is next in my series of "fix the racey tests that fail on greendragon" addressing the failure of TestConcurrentManyBreakpoints.py where we set a breakpoint in a function that 100 threads execute, and we check that we hit the breakpoint 100 times. But sometimes it is only hit 99 times, and the test fails. When we hit a software breakpoint, the pc value for the thread is the address of the breakpoint instruction - as if it had not been hit yet. And because a user might ADD a breakpoint for the current pc from the commandline, when we go to resume execution, any thread that is sitting at a breakpoint site will be silently advanced past the breakpoint instruction (disable bp, instruction step that thread, re-enable bp) before resuming -- whether that thread has hit its breakpoint or not. What this test is exposing is that there is another corner case, a thread that is sitting at a breakpoint site but has not yet executed the breakpoint instruction. The thread will have no stop reason, no mach exception, so it will not be recorded as having hit the breakpoint (because it hasn't yet). But when we resume execution, because it is sitting at a breakpoint site, we advance past it and miss the breakpoint hit. In 2016 Abhishek Aggarwal handled a similar issue with a patch in `ProcessGDBRemote::SetThreadStopInfo()`, adding a breakpoint StopInfo for a thread sitting at a breakpoint site that has no stop reason. debugserver's `jThreadsInfo` would not correctly execute Abhishek's code though because it would respond with `"reason":"none"` for a thread with no stop reason, and `SetThreadStopInfo()` expected an empty reason here. The first part of my patch is to clear the `reason` if it is `"none"` so we flow through the code correctly. On Darwin, though, our stop reply packet (Txx...) includes the `threads`, `thread-pcs`, and `jstopinfo` keys, which give us the tids for all current threads, the pc values for those threads, and `jstopinfo` has a JSON dictionary with the mach exceptions for all threads that have a mach exception. In `ProcessGDBRemote::CalculateThreadStopInfo()` we set the StopInfo for each thread for a private stop and if we have `jstopinfo` it is the source of all the StopInfos. I have to add the same logic here, to give the thread a breakpoint StopInfo even though it hasn't executed the breakpoint yet. In this case we are very early in thread construction and I only have the information in the Txx stop reply packet -- tids, pcs, and jstopinfo, so I can't use the normal general mechanisms of going through the RegisterContext to get the pc, it's a bit different. If I hack debugserver to not issue `jstopinfo`, `CalculateThreadStopInfo` will fall back to sending `qThreadStopInfo` for each thread and going through `ProcessGDBRemote::SetThreadStopInfo()` to set the stop infos (and with the `reason:none` fix, use Abhishek's code). rdar://110549165 (cherry picked from commit 87fadb3)
This is next in my series of "fix the racey tests that fail on greendragon" addressing the failure of TestConcurrentManyBreakpoints.py where we set a breakpoint in a function that 100 threads execute, and we check that we hit the breakpoint 100 times. But sometimes it is only hit 99 times, and the test fails.
When we hit a software breakpoint, the pc value for the thread is the address of the breakpoint instruction - as if it had not been hit yet. And because a user might ADD a breakpoint for the current pc from the commandline, when we go to resume execution, any thread that is sitting at a breakpoint site will be silently advanced past the breakpoint instruction (disable bp, instruction step that thread, re-enable bp) before resuming -- whether that thread has hit its breakpoint or not.
What this test is exposing is that there is another corner case, a thread that is sitting at a breakpoint site but has not yet executed the breakpoint instruction. The thread will have no stop reason, no mach exception, so it will not be recorded as having hit the breakpoint (because it hasn't yet). But when we resume execution, because it is sitting at a breakpoint site, we advance past it and miss the breakpoint hit.
In 2016 Abhishek Aggarwal handled a similar issue with a patch in
ProcessGDBRemote::SetThreadStopInfo()
, adding a breakpoint StopInfo for a thread sitting at a breakpoint site that has no stop reason. debugserver'sjThreadsInfo
would not correctly execute Abhishek's code though because it would respond with"reason":"none"
for a thread with no stop reason, andSetThreadStopInfo()
expected an empty reason here. The first part of my patch is to clear thereason
if it is"none"
so we flow through the code correctly.On Darwin, though, our stop reply packet (Txx...) includes the
threads
,thread-pcs
, andjstopinfo
keys, which give us the tids for all current threads, the pc values for those threads, andjstopinfo
has a JSON dictionary with the mach exceptions for all threads that have a mach exception. InProcessGDBRemote::CalculateThreadStopInfo()
we set the StopInfo for each thread for a private stop and if we havejstopinfo
it is the source of all the StopInfos. I have to add the same logic here, to give the thread a breakpoint StopInfo even though it hasn't executed the breakpoint yet. In this case we are very early in thread construction and I only have the information in the Txx stop reply packet -- tids, pcs, and jstopinfo, so I can't use the normal general mechanisms of going through the RegisterContext to get the pc, it's a bit different.If I hack debugserver to not issue
jstopinfo
,CalculateThreadStopInfo
will fall back to sendingqThreadStopInfo
for each thread and going throughProcessGDBRemote::SetThreadStopInfo()
to set the stop infos (and with thereason:none
fix, use Abhishek's code).rdar://110549165