Skip to content

[lldb] Print a message when background tasks take a while to complete #82799

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 5, 2024

Conversation

JDevlieghere
Copy link
Member

@JDevlieghere JDevlieghere commented Feb 23, 2024

When terminating the debugger, we wait for all background tasks to
complete. Given that there's no way to interrupt those treads, this can
take a while. When that happens, the debugger appears to hang at exit.

The above situation is unfortunately not uncommon when background
downloading of dSYMs is enabled (symbols.auto-download background).
Even when calling dsymForUUID with a reasonable timeout, it can take a
while to complete.

This patch improves the user experience by printing a message from the
driver when it takes more than one (1) second to terminate the debugger.

@llvmbot
Copy link
Member

llvmbot commented Feb 23, 2024

@llvm/pr-subscribers-lldb

Author: Jonas Devlieghere (JDevlieghere)

Changes

When terminating the debugger, we wait for all background tasks to complete. Given that there's no way to interrupt those treads, this can take a while. When that happens, the debugger appears to hang at exit.

The above situation is unfortunately not uncommon when background downloading of dSYMs is enabled (symbols.auto-download background). Even when calling dsymForUUID with a reasonable timeout, it can take a while to complete.

This patch improves the user experience by registering a callback and printing a message when we're waiting on the thread pool for more than a second. Both the timeout and the callback is configurable.

Fixing the underlying issue is a much harder problem. Neither C++ threads no pthread support interrupting threads, so it would have to be a cooperative interruption mechanism, similar to how we deal with this for HostThreads. If we go that route, we should do it at the thread pool level so that it applies to all background tasks.

A cooperative interruption mechanism alone solve the dsymForUUID problem. It's a blocking call to an external binary, so we'd have to monitor for interrupts and kill the execution from yet another thread.


Full diff: https://github.com/llvm/llvm-project/pull/82799.diff

7 Files Affected:

  • (modified) lldb/include/lldb/API/SBDebugger.h (+7)
  • (modified) lldb/include/lldb/API/SBDefines.h (+1)
  • (modified) lldb/include/lldb/Core/Debugger.h (+5)
  • (modified) lldb/include/lldb/lldb-types.h (+1)
  • (modified) lldb/source/API/SBDebugger.cpp (+8)
  • (modified) lldb/source/Core/Debugger.cpp (+29-2)
  • (modified) lldb/tools/driver/Driver.cpp (+6)
diff --git a/lldb/include/lldb/API/SBDebugger.h b/lldb/include/lldb/API/SBDebugger.h
index 62b2f91f5076d5..c2b4e6adfa5d27 100644
--- a/lldb/include/lldb/API/SBDebugger.h
+++ b/lldb/include/lldb/API/SBDebugger.h
@@ -331,6 +331,13 @@ class LLDB_API SBDebugger {
   void SetDestroyCallback(lldb::SBDebuggerDestroyCallback destroy_callback,
                           void *baton);
 
+  /// Register a callback that is called when destroying the thread pool exceeds
+  /// the given timeout.
+  static void
+  SetThreadPoolTimeoutCallback(uint64_t timeout_milliseconds,
+                               lldb::SBDebuggerTimeoutCallback timeout_callback,
+                               void *baton);
+
 #ifndef SWIG
   LLDB_DEPRECATED_FIXME("Use DispatchInput(const void *, size_t)",
                         "DispatchInput(const void *, size_t)")
diff --git a/lldb/include/lldb/API/SBDefines.h b/lldb/include/lldb/API/SBDefines.h
index 1181920677b46f..5df380a250278c 100644
--- a/lldb/include/lldb/API/SBDefines.h
+++ b/lldb/include/lldb/API/SBDefines.h
@@ -138,6 +138,7 @@ typedef bool (*SBBreakpointHitCallback)(void *baton, SBProcess &process,
 
 typedef void (*SBDebuggerDestroyCallback)(lldb::user_id_t debugger_id,
                                           void *baton);
+typedef void (*SBDebuggerTimeoutCallback)(void *baton);
 
 typedef SBError (*SBPlatformLocateModuleCallback)(
     void *baton, const SBModuleSpec &module_spec, SBFileSpec &module_file_spec,
diff --git a/lldb/include/lldb/Core/Debugger.h b/lldb/include/lldb/Core/Debugger.h
index 6ba90eb6ed8fdf..b16334d302a47c 100644
--- a/lldb/include/lldb/Core/Debugger.h
+++ b/lldb/include/lldb/Core/Debugger.h
@@ -250,6 +250,11 @@ class Debugger : public std::enable_shared_from_this<Debugger>,
 
   void SetLoggingCallback(lldb::LogOutputCallback log_callback, void *baton);
 
+  static void
+  SetThreadPoolTimeoutCallback(std::chrono::milliseconds timeout_milliseconds,
+                               lldb::TimeoutCallback timeout_callback,
+                               void *baton);
+
   // Properties Functions
   enum StopDisassemblyType {
     eStopDisassemblyTypeNever = 0,
diff --git a/lldb/include/lldb/lldb-types.h b/lldb/include/lldb/lldb-types.h
index d60686e33142ac..e1d58f02f60820 100644
--- a/lldb/include/lldb/lldb-types.h
+++ b/lldb/include/lldb/lldb-types.h
@@ -69,6 +69,7 @@ typedef int pipe_t;                     // Host pipe type
 #define LLDB_INVALID_HOST_THREAD ((lldb::thread_t)NULL)
 #define LLDB_INVALID_PIPE ((lldb::pipe_t)-1)
 
+typedef void (*TimeoutCallback)(void *baton);
 typedef void (*LogOutputCallback)(const char *, void *baton);
 typedef bool (*CommandOverrideCallback)(void *baton, const char **argv);
 typedef bool (*ExpressionCancelCallback)(ExpressionEvaluationPhase phase,
diff --git a/lldb/source/API/SBDebugger.cpp b/lldb/source/API/SBDebugger.cpp
index fbcf30e67fc1cd..5c78b4fa05aaaf 100644
--- a/lldb/source/API/SBDebugger.cpp
+++ b/lldb/source/API/SBDebugger.cpp
@@ -1695,6 +1695,14 @@ void SBDebugger::SetDestroyCallback(
   }
 }
 
+void SBDebugger::SetThreadPoolTimeoutCallback(
+    uint64_t timeout_milliseconds,
+    lldb::SBDebuggerTimeoutCallback timeout_callback, void *baton) {
+  LLDB_INSTRUMENT_VA(timeout_milliseconds, timeout_callback, baton);
+  Debugger::SetThreadPoolTimeoutCallback(
+      std::chrono::milliseconds(timeout_milliseconds), timeout_callback, baton);
+}
+
 SBTrace
 SBDebugger::LoadTraceFromFile(SBError &error,
                               const SBFileSpec &trace_description_file) {
diff --git a/lldb/source/Core/Debugger.cpp b/lldb/source/Core/Debugger.cpp
index c3e603dbae8961..c48ca3bfe677b0 100644
--- a/lldb/source/Core/Debugger.cpp
+++ b/lldb/source/Core/Debugger.cpp
@@ -103,7 +103,14 @@ static std::recursive_mutex *g_debugger_list_mutex_ptr =
     nullptr; // NOTE: intentional leak to avoid issues with C++ destructor chain
 static Debugger::DebuggerList *g_debugger_list_ptr =
     nullptr; // NOTE: intentional leak to avoid issues with C++ destructor chain
+
+/// Thread Pool
+/// @{
 static llvm::ThreadPool *g_thread_pool = nullptr;
+static std::chrono::milliseconds g_timeout_milliseconds;
+static lldb::TimeoutCallback g_timeout_callback = nullptr;
+static void *g_timeout_callback_baton = nullptr;
+/// @}
 
 static constexpr OptionEnumValueElement g_show_disassembly_enum_values[] = {
     {
@@ -623,8 +630,20 @@ void Debugger::Terminate() {
   }
 
   if (g_thread_pool) {
-    // The destructor will wait for all the threads to complete.
-    delete g_thread_pool;
+    // Delete the thread pool in a different thread so we can set a timeout.
+    std::future<void> future = std::async(std::launch::async, []() {
+      // The destructor will wait for all the threads to complete.
+      delete g_thread_pool;
+    });
+
+    if (g_timeout_callback) {
+      if (future.wait_for(g_timeout_milliseconds) ==
+          std::future_status::timeout)
+        g_timeout_callback(g_timeout_callback_baton);
+    }
+
+    // Wait for the asynchronous task to complete.
+    future.wait();
   }
 
   if (g_debugger_list_ptr && g_debugger_list_mutex_ptr) {
@@ -2195,3 +2214,11 @@ llvm::ThreadPool &Debugger::GetThreadPool() {
          "Debugger::GetThreadPool called before Debugger::Initialize");
   return *g_thread_pool;
 }
+
+void Debugger::SetThreadPoolTimeoutCallback(
+    std::chrono::milliseconds timeout_milliseconds,
+    lldb::TimeoutCallback timeout_callback, void *baton) {
+  g_timeout_milliseconds = timeout_milliseconds;
+  g_timeout_callback = timeout_callback;
+  g_timeout_callback_baton = baton;
+}
diff --git a/lldb/tools/driver/Driver.cpp b/lldb/tools/driver/Driver.cpp
index c63ff0ff597e5c..b3ba1962141967 100644
--- a/lldb/tools/driver/Driver.cpp
+++ b/lldb/tools/driver/Driver.cpp
@@ -96,6 +96,10 @@ static void reset_stdin_termios() {
   }
 }
 
+static void thread_pool_timeout_callback(void *) {
+  fprintf(stderr, "Waiting for background tasks to complete...");
+}
+
 Driver::Driver()
     : SBBroadcaster("Driver"), m_debugger(SBDebugger::Create(false)) {
   // We want to be able to handle CTRL+D in the terminal to have it terminate
@@ -816,6 +820,8 @@ int main(int argc, char const *argv[]) {
     }
   }
 
+  SBDebugger::SetThreadPoolTimeoutCallback(
+      /*timeout_milliseconds=*/1000, &thread_pool_timeout_callback, nullptr);
   SBDebugger::Terminate();
   return exit_code;
 }

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not just use a Progress object instead of installing a callback? Seems like a very specific use case to install a callback for in our public API. Do we create Progress objects for each background download? If we don't, should we? Then this would show up in the command line like "Downloading symbols for ..." and it might communicate to the user what is happening better than "Waiting for background tasks to complete...".

What would happen if we could add an internal API that would cancel all background downloads of symbols?

I am not sure I like installing a specific callback for thread pool issues. What if a thread gets deadlocked waiting for data and it never completes the download?

@@ -623,8 +630,20 @@ void Debugger::Terminate() {
}

if (g_thread_pool) {
// The destructor will wait for all the threads to complete.
delete g_thread_pool;
// Delete the thread pool in a different thread so we can set a timeout.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create a Progress object here? Maybe something like:

Progress progress("Shutting down worker threads");

Or is it too late and no one would receive events?

Another option would be to have the thread pool detach from all threads so that we don't need to wait until they complete and just go ahead and exit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is it too late and no one would receive events?

Yup, that's exactly the problem. The debugger has already been destroyed at this point, otherwise I could've written to the debugger's output stream. By the time we terminate, there's pretty much nothing left at this point. Before I had the callback I just wrote to stderr directly, but that's not a very nice thing to do as a library.

Another option would be to have the thread pool detach from all threads so that we don't need to wait until they complete and just go ahead and exit?

That's what we did in the past and that caused a bunch of crashes. If you want to do that without crashing, you need to be very careful about what you do in a task running on the thread pool, because the debugger might have been destroyed and terminated while you were running.

@JDevlieghere
Copy link
Member Author

Can we not just use a Progress object instead of installing a callback? Seems like a very specific use case to install a callback for in our public API. Do we create Progress objects for each background download? If we don't, should we? Then this would show up in the command line like "Downloading symbols for ..." and it might communicate to the user what is happening better than "Waiting for background tasks to complete...".

We do emit progress events for all the downloads, but there's two reasons this doesn't work as you expect:

  1. Currently, the default event handler ignores events that are shadowed. So if I kick off 4 downloads at the same time, we'll show the progress event for the first one only. The reason for this how we use ANSI escape codes to redraw the current line. I actually have something in the works to change that, but even if I can fix that, there's a second issue:

  2. The thread pool is global and by the time we terminate, the debugger has already been destroyed, which includes it event handler thread and even its output stream.

What would happen if we could add an internal API that would cancel all background downloads of symbols?

That's what I described at the bottom of the description. You could come up with an ad-hoc solution for symbol downloads, but if we want to lean on the thread pool more, we'll see the same issue in other places too. If we want to go that route I think we should do it at the ThreadPool level and make all tasks (cooperatively) interruptible.

I am not sure I like installing a specific callback for thread pool issues. What if a thread gets deadlocked waiting for data and it never completes the download?

That's already a risk. The ThreadPool explicitly provides no guarantees around deadlocking.

@JDevlieghere
Copy link
Member Author

Greg and I talked this over offline. We both feel that progress events are the best way to convey that work is going on in the background. To make that work for the driver, we'll check if the debugger being destroyed is the very last one and if it is, we'll delay its destruction until the thread pool as completed. That still leaves the first problem of not shadowing progress events, but that can be deal with separately.

Copy link
Collaborator

@jasonmolenda jasonmolenda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't used the std::async feature before, but it looks reasonable. This looks good to me.

@JDevlieghere
Copy link
Member Author

I tried implementing this and waiting on the thread pool when the last debugger is being destroyed is easy:

  // Only hold the debugger list lock long enough to check if we're the last
  // debugger being destroyed.
  bool last_debugger_being_destroyed = false;
  if (g_debugger_list_ptr && g_debugger_list_mutex_ptr) {
    std::lock_guard<std::recursive_mutex> guard(*g_debugger_list_mutex_ptr);
    last_debugger_being_destroyed =
        g_debugger_list_ptr->size() == 1 &&
        (*g_debugger_list_ptr)[0].get() == debugger_sp.get();
  }

  // Check if we should wait on the thread pool. If the debugger being destroyed
  // is the very last one, we can safely assume work is being done on its
  // behalf. We want to keep the debugger alive at least until the background
  // tasks are finished so they can report their progress.
  if (last_debugger_being_destroyed)
    g_thread_pool->wait();

However, at that point there's no default event handler thread anymore: we tear it down when exiting RunCommandInterpreter. I think I'm just going to move the timeout to the driver and call it a day...

When terminating the debugger, we wait for all background tasks to
complete. Given that there's no way to interrupt those treads, this can
take a while. When that happens, the debugger appears to hang at exit.

The above situation is unfortunately not uncommon when background
downloading of dSYMs is enabled (`symbols.auto-download background`).
Even when calling dsymForUUID with a reasonable timeout, it can take a
while to complete.

This patch improves the user experience by printing a message from the
driver when it takes more than one (1) second to terminate the debugger.
@JDevlieghere JDevlieghere merged commit 2f343fc into llvm:main Mar 5, 2024
@JDevlieghere JDevlieghere deleted the timeout-callback branch March 5, 2024 20:04
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Apr 22, 2024
…llvm#82799)

When terminating the debugger, we wait for all background tasks to
complete. Given that there's no way to interrupt those treads, this can
take a while. When that happens, the debugger appears to hang at exit.

The above situation is unfortunately not uncommon when background
downloading of dSYMs is enabled (`symbols.auto-download background`).
Even when calling dsymForUUID with a reasonable timeout, it can take a
while to complete.

This patch improves the user experience by printing a message from the
driver when it takes more than one (1) second to terminate the debugger.

(cherry picked from commit 2f343fc)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants