Send an explicit interrupt to cancel an attach waitfor. #72565

jimingham · 2023-11-16T20:32:40Z

Currently when you interrupt a:

(lldb) process attach -w -n some_process

lldb just closes the connection to the stub and kills the lldb_private::Process it made for the attach. The stub at the other end notices the connection go down and exits because of that. But when communication to a device is handled through some kind of proxy server which isn't as well behaved as one would wish, that signal might not be reliable, causing debugserver to persist on the machine, waiting to steal the next instance of that process.

We can work around those failures by sending an explicit interrupt before closing down the connection. The stub will also have to be waiting for the interrupt for this to make any difference. I changed debugserver to do that.

I didn't make the equivalent change in lldb-server. So long as you aren't faced with a flakey connection, this should not be necessary.

Currently when you interrupt a: (lldb) process attach -w -n some_process lldb just closes the connection to the stub and kills the process it made for the attach. The stub at the other end notices the connection go down and exits because of that. But when communication to a device is handled through some kind of proxy server, that signal might not be reliable, causing debugserver to persist on the machine. We can work around those failures by sending an explicit interrupt before closing down the connection. The stub will also have to be waiting for the interrupt for this to make any difference. I changed debugserver to do that. I didn't make the equivalent change in lldb-server. So long as you aren't faced with a flakey connection, this should not be necessary.

llvmbot · 2023-11-16T20:33:10Z

@llvm/pr-subscribers-lldb

Author: None (jimingham)

Changes

Currently when you interrupt a:

(lldb) process attach -w -n some_process

lldb just closes the connection to the stub and kills the process it made for the attach. The stub at the other end notices the connection go down and exits because of that. But when communication to a device is handled through some kind of proxy server, that signal might not be reliable, causing debugserver to persist on the machine.

We can work around those failures by sending an explicit interrupt before closing down the connection. The stub will also have to be waiting for the interrupt for this to make any difference. I changed debugserver to do that.

I didn't make the equivalent change in lldb-server. So long as you aren't faced with a flakey connection, this should not be necessary.

Full diff: https://github.com/llvm/llvm-project/pull/72565.diff

3 Files Affected:

(modified) lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp (+4-2)
(modified) lldb/source/Target/Process.cpp (+8-1)
(modified) lldb/tools/debugserver/source/DNB.cpp (+28-2)

diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
index f90a561aae2e3b7..b6fe1bd7104b280 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
@@ -2372,8 +2372,10 @@ Status ProcessGDBRemote::DoHalt(bool &caused_stop) {
   Status error;
 
   if (m_public_state.GetValue() == eStateAttaching) {
-    // We are being asked to halt during an attach. We need to just close our
-    // file handle and debugserver will go away, and we can be done...
+    // We are being asked to halt during an attach. We used to just close our
+    // file handle and debugserver will go away, but with remote proxies, it
+    // is better to send a positive signal, so let's send the interrupt first...
+    caused_stop = m_gdb_comm.Interrupt(GetInterruptTimeout());
     m_gdb_comm.Disconnect();
   } else
     caused_stop = m_gdb_comm.Interrupt(GetInterruptTimeout());
diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index 21b80b8240ab64b..f3da2839e262e23 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -3156,8 +3156,8 @@ Status Process::Halt(bool clear_thread_plans, bool use_run_lock) {
     // Don't hijack and eat the eStateExited as the code that was doing the
     // attach will be waiting for this event...
     RestoreProcessEvents();
-    SetExitStatus(SIGKILL, "Cancelled async attach.");
     Destroy(false);
+    SetExitStatus(SIGKILL, "Cancelled async attach.");
     return Status();
   }
 
@@ -3843,6 +3843,13 @@ thread_result_t Process::RunPrivateStateThread(bool is_secondary_thread) {
                   ") woke up with an interrupt while attaching - "
                   "forwarding interrupt.",
                   __FUNCTION__, static_cast<void *>(this), GetID());
+        // The server may be spinning waiting for a process to appear, in which
+        // case we should tell it to stop doing that.  Normally, we don't NEED
+        // to do that because we will next close the communication to the stub
+        // and that will get it to shut down.  But there are remote debugging
+        // cases where relying on that side-effect causes the shutdown to be 
+        // flakey, so we should send a positive signal to interrupt the wait. 
+        Status error = HaltPrivate();
         BroadcastEvent(eBroadcastBitInterrupt, nullptr);
       } else if (StateIsRunningState(m_last_broadcast_state)) {
         LLDB_LOGF(log,
diff --git a/lldb/tools/debugserver/source/DNB.cpp b/lldb/tools/debugserver/source/DNB.cpp
index f6c1130fdd8054c..0ec50df42d1fedc 100644
--- a/lldb/tools/debugserver/source/DNB.cpp
+++ b/lldb/tools/debugserver/source/DNB.cpp
@@ -50,6 +50,7 @@
 #include "MacOSX/MachProcess.h"
 #include "MacOSX/MachTask.h"
 #include "MacOSX/ThreadInfo.h"
+#include "RNBRemote.h"
 
 typedef std::shared_ptr<MachProcess> MachProcessSP;
 typedef std::map<nub_process_t, MachProcessSP> ProcessMap;
@@ -745,7 +746,6 @@ DNBProcessAttachWait(RNBContext *ctx, const char *waitfor_process_name,
         break;
       }
     } else {
-
       // Get the current process list, and check for matches that
       // aren't in our original list. If anyone wants to attach
       // to an existing process by name, they should do it with
@@ -799,7 +799,33 @@ DNBProcessAttachWait(RNBContext *ctx, const char *waitfor_process_name,
         break;
       }
 
-      ::usleep(waitfor_interval); // Sleep for WAITFOR_INTERVAL, then poll again
+      // Now we're going to wait a while before polling again.  But we also
+      // need to check whether we've gotten an event from the debugger  
+      // telling us to interrupt the wait.  So we'll use the wait for a possible
+      // next event to also be our short pause...
+      struct timespec short_timeout;
+      DNBTimer::OffsetTimeOfDay(&short_timeout, 0, waitfor_interval);
+      uint32_t event_mask = RNBContext::event_read_packet_available 
+          | RNBContext::event_read_thread_exiting;
+      nub_event_t set_events = ctx->Events().WaitForSetEvents(event_mask, 
+          &short_timeout);
+      if (set_events & RNBContext::event_read_packet_available) {
+        // If we get any packet from the debugger while waiting on the async,
+        // it has to be telling us to interrupt.  So always exit here.
+        // Over here in DNB land we can see that there was a packet, but all
+        // the methods to actually handle it are protected.  It's not worth
+        // rearranging all that just to get which packet we were sent...
+        DNBLogError("Interrupted by packet while waiting for '%s' to appear.\n",
+                   waitfor_process_name);
+        break;
+      }
+      if (set_events & RNBContext::event_read_thread_exiting) {
+        // The packet thread is shutting down, get out of here...
+        DNBLogError("Interrupted by packet thread shutdown while waiting for "
+                    "%s to appear.\n", waitfor_process_name);
+        break;
+      }
+      
     }
   }

github-actions · 2023-11-16T20:35:13Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff a322d50804c5248f9e2981f3733bcb598fa13d51 f6be566cda37974027d2fb591465f385965bd7d1 -- lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp lldb/source/Target/Process.cpp lldb/tools/debugserver/source/DNB.cpp

View the diff from clang-format here.

diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index f3da2839e2..382daa92b6 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -3847,8 +3847,8 @@ thread_result_t Process::RunPrivateStateThread(bool is_secondary_thread) {
         // case we should tell it to stop doing that.  Normally, we don't NEED
         // to do that because we will next close the communication to the stub
         // and that will get it to shut down.  But there are remote debugging
-        // cases where relying on that side-effect causes the shutdown to be 
-        // flakey, so we should send a positive signal to interrupt the wait. 
+        // cases where relying on that side-effect causes the shutdown to be
+        // flakey, so we should send a positive signal to interrupt the wait.
         Status error = HaltPrivate();
         BroadcastEvent(eBroadcastBitInterrupt, nullptr);
       } else if (StateIsRunningState(m_last_broadcast_state)) {
diff --git a/lldb/tools/debugserver/source/DNB.cpp b/lldb/tools/debugserver/source/DNB.cpp
index 0ec50df42d..37d71fd47a 100644
--- a/lldb/tools/debugserver/source/DNB.cpp
+++ b/lldb/tools/debugserver/source/DNB.cpp
@@ -800,15 +800,15 @@ DNBProcessAttachWait(RNBContext *ctx, const char *waitfor_process_name,
       }
 
       // Now we're going to wait a while before polling again.  But we also
-      // need to check whether we've gotten an event from the debugger  
+      // need to check whether we've gotten an event from the debugger
       // telling us to interrupt the wait.  So we'll use the wait for a possible
       // next event to also be our short pause...
       struct timespec short_timeout;
       DNBTimer::OffsetTimeOfDay(&short_timeout, 0, waitfor_interval);
-      uint32_t event_mask = RNBContext::event_read_packet_available 
-          | RNBContext::event_read_thread_exiting;
-      nub_event_t set_events = ctx->Events().WaitForSetEvents(event_mask, 
-          &short_timeout);
+      uint32_t event_mask = RNBContext::event_read_packet_available |
+                            RNBContext::event_read_thread_exiting;
+      nub_event_t set_events =
+          ctx->Events().WaitForSetEvents(event_mask, &short_timeout);
       if (set_events & RNBContext::event_read_packet_available) {
         // If we get any packet from the debugger while waiting on the async,
         // it has to be telling us to interrupt.  So always exit here.
@@ -816,16 +816,16 @@ DNBProcessAttachWait(RNBContext *ctx, const char *waitfor_process_name,
         // the methods to actually handle it are protected.  It's not worth
         // rearranging all that just to get which packet we were sent...
         DNBLogError("Interrupted by packet while waiting for '%s' to appear.\n",
-                   waitfor_process_name);
+                    waitfor_process_name);
         break;
       }
       if (set_events & RNBContext::event_read_thread_exiting) {
         // The packet thread is shutting down, get out of here...
         DNBLogError("Interrupted by packet thread shutdown while waiting for "
-                    "%s to appear.\n", waitfor_process_name);
+                    "%s to appear.\n",
+                    waitfor_process_name);
         break;
       }
-      
     }
   }

jasonmolenda

LGTM.

JDevlieghere · 2023-11-16T21:37:26Z

lldb/source/Target/Process.cpp

+        // and that will get it to shut down.  But there are remote debugging
+        // cases where relying on that side-effect causes the shutdown to be 
+        // flakey, so we should send a positive signal to interrupt the wait. 
+        Status error = HaltPrivate();


Should we log the return value?

Sure, I'll add that.

JDevlieghere · 2023-11-16T21:39:55Z

lldb/tools/debugserver/source/DNB.cpp

+        // Over here in DNB land we can see that there was a packet, but all
+        // the methods to actually handle it are protected.  It's not worth
+        // rearranging all that just to get which packet we were sent...
+        DNBLogError("Interrupted by packet while waiting for '%s' to appear.\n",


Can we check the packet and assert/log when it's not an interrupt? I know that's technically not allowed by the GDB remote protocol, but that would make it easy to catch such a mistake.

All the code that does the actual packet handling (as opposed to just waiting for a packet type) is private to RNBRemote.cpp, and not available here. I didn't think getting the packet here was worth rejiggering that relationship. You can turn on packet logging in debugserver if you wanted to see which packet this was.

The structure of the gdb remote protocol is such that when you've sent a packet, you can't send anything else until you get a response. "continue" is special in that you can send ^C to the remote stub to interrupt execution. A "waitfor" attach request most reasonably behaves the same as "continue" - the only thing it would make any sense for the debugger to send when we're waiting is ^C. If the debugger were to send anything else, yeah, something has gone very wrong, but when Jim described the layering and how it wasn't straightforward to check that, I didn't push further, it seemed fine to me to assume the debugger didn't do something unsupported like send a non-interrupt packet.

I agree that we are attaching and the only thing we can send a ^C, so no need for anything else here really.

clayborg

Looks good to me from a code perspective as long as this works.

It would be nice to add a test for Darwin only. Luckily this is easy to test to ensure it works.

clayborg · 2023-11-17T19:08:26Z

lldb/tools/debugserver/source/DNB.cpp

+        // Over here in DNB land we can see that there was a packet, but all
+        // the methods to actually handle it are protected.  It's not worth
+        // rearranging all that just to get which packet we were sent...
+        DNBLogError("Interrupted by packet while waiting for '%s' to appear.\n",


I agree that we are attaching and the only thing we can send a ^C, so no need for anything else here really.

since cancelling attach actually does work everywhere that we are currently testing.

github-actions · 2023-11-30T01:28:53Z

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:

darker --check --diff -r a322d50804c5248f9e2981f3733bcb598fa13d51..f6be566cda37974027d2fb591465f385965bd7d1 lldb/test/API/python_api/process/cancel_attach/TestCancelAttach.py

View the diff from darker here.

--- TestCancelAttach.py	2023-11-30 01:25:00.000000 +0000
+++ TestCancelAttach.py	2023-11-30 01:28:51.443331 +0000
@@ -26,14 +26,16 @@
                 # Make this a daemon thread so if we don't manage to interrupt,
                 # Python will keep this thread from hanging the test.
                 threading.Thread.__init__(self, daemon=True)
                 self.target = target
                 self.error = error
-                
+
             def run(self):
-                self.target.AttachToProcessWithName(lldb.SBListener(), "LLDB-No-Such-Process", True, self.error)
-                
+                self.target.AttachToProcessWithName(
+                    lldb.SBListener(), "LLDB-No-Such-Process", True, self.error
+                )
+
         error = lldb.SBError()
         thread = AttachThread(target, error)
         thread.start()
 
         # Now wait till the attach on the child thread has made a process
@@ -48,10 +50,9 @@
         # Now send the attach interrupt:
         target.process.SendAsyncInterrupt()
         # We don't want to stall if we can't interrupt, so join with a timeout:
         thread.join(60)
         if thread.is_alive():
-          self.fail("The attach thread is alive after timeout interval")
+            self.fail("The attach thread is alive after timeout interval")
 
         # Now check the error, should say the attach was interrupted:
         self.assertTrue(error.Fail(), "We succeeded in not attaching")
-

jimingham · 2023-11-30T01:29:18Z

I added a test case for cancelling attach. This should actually work everywhere so I didn't limit it to Darwin. It doesn't test the actual bug I was trying to fix because that would require coming up with an unreliable file handle repeater and that's way more work than it's worth. But since this uses Python threading, and interruption so I wouldn't be surprised if it doesn't work reliably everywhere.

jimingham requested a review from JDevlieghere as a code owner November 16, 2023 20:32

jimingham requested review from jasonmolenda and DavidSpickett and removed request for JDevlieghere and jasonmolenda November 16, 2023 20:32

llvmbot added the lldb label Nov 16, 2023

jimingham requested review from jasonmolenda, clayborg and JDevlieghere November 16, 2023 20:32

jasonmolenda approved these changes Nov 16, 2023

View reviewed changes

JDevlieghere reviewed Nov 16, 2023

View reviewed changes

clayborg reviewed Nov 17, 2023

View reviewed changes

Add a test case for cancelling attach. This should work everywhere,

f6be566

since cancelling attach actually does work everywhere that we are currently testing.

jimingham merged commit d1bf194 into llvm:main Nov 30, 2023

jimingham deleted the attach-wait-cancel branch November 30, 2023 17:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Send an explicit interrupt to cancel an attach waitfor. #72565

Send an explicit interrupt to cancel an attach waitfor. #72565

Uh oh!

jimingham commented Nov 16, 2023 •

edited

Loading

Uh oh!

llvmbot commented Nov 16, 2023

Uh oh!

github-actions bot commented Nov 16, 2023 •

edited

Loading

Uh oh!

jasonmolenda left a comment

Uh oh!

JDevlieghere Nov 16, 2023

Uh oh!

jimingham Nov 16, 2023

Uh oh!

JDevlieghere Nov 16, 2023

Uh oh!

jimingham Nov 16, 2023

Uh oh!

jasonmolenda Nov 16, 2023

Uh oh!

clayborg Nov 17, 2023

Uh oh!

clayborg left a comment

Uh oh!

clayborg Nov 17, 2023

Uh oh!

github-actions bot commented Nov 30, 2023

Uh oh!

jimingham commented Nov 30, 2023

Uh oh!

Uh oh!

Send an explicit interrupt to cancel an attach waitfor. #72565

Send an explicit interrupt to cancel an attach waitfor. #72565

Uh oh!

Conversation

jimingham commented Nov 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 16, 2023

Uh oh!

github-actions bot commented Nov 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jasonmolenda left a comment

Choose a reason for hiding this comment

Uh oh!

JDevlieghere Nov 16, 2023

Choose a reason for hiding this comment

Uh oh!

jimingham Nov 16, 2023

Choose a reason for hiding this comment

Uh oh!

JDevlieghere Nov 16, 2023

Choose a reason for hiding this comment

Uh oh!

jimingham Nov 16, 2023

Choose a reason for hiding this comment

Uh oh!

jasonmolenda Nov 16, 2023

Choose a reason for hiding this comment

Uh oh!

clayborg Nov 17, 2023

Choose a reason for hiding this comment

Uh oh!

clayborg left a comment

Choose a reason for hiding this comment

Uh oh!

clayborg Nov 17, 2023

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Nov 30, 2023

Uh oh!

jimingham commented Nov 30, 2023

Uh oh!

Uh oh!

jimingham commented Nov 16, 2023 •

edited

Loading

github-actions bot commented Nov 16, 2023 •

edited

Loading