-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[lldb] Make SBProcess thread related actions listen to StopLocker #134339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lldb] Make SBProcess thread related actions listen to StopLocker #134339
Conversation
@llvm/pr-subscribers-lldb Author: Wanyi (kusmour) ChangesRecently upon debugging a program with thousands of threads in VS Code, lldb-dap would hang at a
This is because LLDB is still in the middle of resuming all the threads. And requesting threads will require stopping the process. From the gdb-remote log it ended up getting I don't think it's reasonable to allow getting threads from a running process. The alternative will be reject the Full diff: https://github.com/llvm/llvm-project/pull/134339.diff 2 Files Affected:
diff --git a/lldb/test/API/tools/lldb-dap/attach/multithreading.cpp b/lldb/test/API/tools/lldb-dap/attach/multithreading.cpp
new file mode 100644
index 0000000000000..0e9c340916ece
--- /dev/null
+++ b/lldb/test/API/tools/lldb-dap/attach/multithreading.cpp
@@ -0,0 +1,102 @@
+#include <condition_variable>
+#include <functional>
+#include <iostream>
+#include <mutex>
+#include <queue>
+#include <thread>
+
+int plus_one(int n) {
+ std::cout << "In plus_one" << std::endl;
+ return n + 1;
+}
+
+class ThreadPool {
+ public:
+ ThreadPool(size_t num_threads = std::thread::hardware_concurrency()) {
+ for (uint32_t i = 0; i < num_threads; ++i) {
+ threads.emplace_back(std::thread(&ThreadPool::ThreadLoop, this));
+ }
+ }
+ ~ThreadPool() {
+ {
+ std::unique_lock<std::mutex> lock(queue_mutex);
+ terminated = true;
+ }
+ condition.notify_all();
+ for (std::thread& thread : threads) {
+ thread.join();
+ }
+ }
+ void enqueue(const std::function<void()>& job) {
+ {
+ std::unique_lock<std::mutex> lock(queue_mutex);
+ if (terminated)
+ throw std::runtime_error("enqueue on stopped ThreadPool");
+ jobs.push(job);
+ }
+ condition.notify_one();
+ }
+
+ private:
+ void ThreadLoop() {
+ while (true) {
+ std::function<void()> job;
+ {
+ std::unique_lock<std::mutex> lock(queue_mutex);
+ condition.wait(lock, [this] { return !jobs.empty() || terminated; });
+ if (terminated && jobs.empty())
+ return;
+ job = jobs.front();
+ jobs.pop();
+ }
+ job();
+ }
+ }
+
+ bool terminated = false;
+ std::mutex queue_mutex;
+ std::condition_variable condition;
+ std::vector<std::thread> threads;
+ std::queue<std::function<void()>> jobs;
+};
+
+void thread_func(int job_index) {
+ std::cout << __FUNCTION__ << " (job index = " << job_index
+ << ") running on thread " << std::this_thread::get_id() << "\n";
+ std::cout << "Calling function plus_one(int)\nResult = "
+ << plus_one(job_index) << "\n";
+}
+
+void thread_sleep(int job_index, int sleep_sec) {
+ std::cout << __FUNCTION__ << " (job index = " << job_index
+ << ") starting on thread " << std::this_thread::get_id() << "\n";
+ std::this_thread::sleep_for(std::chrono::seconds(sleep_sec));
+ std::cout << __FUNCTION__ << " (job index = " << job_index
+ << ") finished on thread " << std::this_thread::get_id() << "\n";
+}
+
+int main() {
+ ThreadPool tp1(2000);
+ ThreadPool tp2;
+ std::cout << "main() running on thread " << std::this_thread::get_id()
+ << "\n";
+ std::cout
+ << "Program is expecting stdin. Please attach debugger and hit enter to continue\n";
+ std::cin.get();
+ // At least one of the thread in tp1 will be sceduled with a task twice or
+ // more if num_jobs is larger than #threads in tp1.
+ int num_jobs = 3000;
+ for (int i = 0; i < num_jobs; ++i) {
+ tp1.enqueue(std::bind(thread_func, i));
+ }
+ // We may or may not hit the breakpoint thread_sleep here, as the tread pool
+ // might finish executing before debugger release the bp for tp1.
+ // To make sure we hit the bp, we can increase the sleep time in the call.
+ for (int i = 0; i < num_jobs; ++i) {
+ tp2.enqueue([i] { thread_sleep(i, 1); });
+ }
+ for (int i = 0; i < num_jobs; ++i) {
+ tp1.enqueue(std::bind(thread_sleep, i, 1));
+ return 0;
+ }
+}
diff --git a/lldb/tools/lldb-dap/Handler/AttachRequestHandler.cpp b/lldb/tools/lldb-dap/Handler/AttachRequestHandler.cpp
index 5e622f3d3dcd4..aa7f3c0d57f9d 100644
--- a/lldb/tools/lldb-dap/Handler/AttachRequestHandler.cpp
+++ b/lldb/tools/lldb-dap/Handler/AttachRequestHandler.cpp
@@ -73,9 +73,11 @@ void AttachRequestHandler::operator()(const llvm::json::Object &request) const {
llvm::StringRef core_file = GetString(arguments, "coreFile").value_or("");
const uint64_t timeout_seconds =
GetInteger<uint64_t>(arguments, "timeout").value_or(30);
- dap.stop_at_entry = core_file.empty()
- ? GetBoolean(arguments, "stopOnEntry").value_or(false)
- : true;
+ // Clients like VS Code sends threads request right after receiving
+ // configurationDone reponse where the process might be resuming.
+ // Getting threads list on a running process is not supported by LLDB.
+ // Always stop the process after attaching.
+ dap.stop_at_entry = true;
dap.configuration.postRunCommands = GetStrings(arguments, "postRunCommands");
const llvm::StringRef debuggerRoot =
GetString(arguments, "debuggerRoot").value_or("");
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
ba22857
to
18e4af1
Compare
// Clients like VS Code sends threads request right after receiving | ||
// configurationDone reponse where the process might be resuming. | ||
// Getting threads list on a running process is not supported by LLDB. | ||
// Always stop the process after attaching. | ||
dap.stop_at_entry = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that the 'stop on entry' setting of VSCode will now have no effect on LLDB when debugging? The user will have no choice but to stop now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for attaching users will always see the stop. If you read the launch.json config for attaching. It actually does not mention this flag at all. So it was doing the attaching without sending stopped event to the IDE by default. However, it still accept this flag if it's presented in the launch.json config.
Reading the description, it seems like the third option would be the right one:
and this patch is working around that while and breaking existing behavior in the process. What would it take to go with (3)? |
|
I have debugged lldb-server and it didn't crash. Instead it finished the and exit with status 0. From my investigation this is because the gdb-remote packets |
I have tried a fix attempts but failed. I am exploring related code around the |
In the PR description you said "I don't think it's reasonable to allow getting threads from a running process." and I agree with that. However, that's not what The current implementation is incorrect and unsafe in multiple ways:
I created a draft PR to stop the process before we get the threads, but that's not sufficient. We need proper locking, like we added for #131242, but this time to protect the process control. That probably requires exposing the |
Note, the current stop locker is a locker = process.GetStopLocker() and the latter will also acquire need to acquire the stop locker's reader end. That should be okay since we would only expose the reader side, and we would always set the state in the locker on some lldb private thread before running any code. But we might want to make sure that you can't Continue the process while still holding the stop locker to keep from having the reader still in force when the writer needs to change the value. |
So the issue from only the lldb-dap side here is VS Code is sending a threads request, but we have resumed the process upon attach after sending the configuration done request. So the race condition happens like:
So from a DAP only perspective, we need to sync with our lldb-dap event thread. Current code looks like:
But probably should look like:
The call to The fact that GetNumThreads() is able to interrupt (send a CTRL+C) to the lldb-server is another bug that needs fixing. I thought that only the breakpoint setting code would interrupt a process and send GDB remote packets while running. Why are we allowing the threads request to interrupt? |
If it's helpful here's a stack trace from debugging lldb-dap. We got to frame #7 in
|
So after looking further into the, the issue is that |
That code has been there at least since the dread reformatting. I have no idea why past one of us thought it was a good idea to return the previous number of threads or the old thread list when you weren't stopped. We don't do that with most anything else you ask about threads, and I can't think of any way that would be useful. Whatever we do with DAP, we should stop doing this in SBThread... |
I am totally fine updating So the fix now should be we modify
New code should be:
Then we also need to sync in : Old code:
New code:
Where Does that sound ok to everyone? |
That sounds right to me. |
I wonder if we should add a test about the things we DON'T hand out when a process is running just so we claim the behavior concretely. |
Unit tests on the SB API or DAP API tests? I am happy to add tests as follow-ups |
I meant SB API tests, to assert that we don't hand out this stale data when the process is running. |
@clayborg's suggestion sounds good to me 👍 If we make |
18e4af1
to
9c6a379
Compare
811b06d
to
9618197
Compare
✅ With the latest revision this PR passed the Python code formatter. |
9618197
to
0db4c3f
Compare
lldb/tools/lldb-dap/Handler/ConfigurationDoneRequestHandler.cpp
Outdated
Show resolved
Hide resolved
// Client requests the baseline of currently existing threads after | ||
// a successful launch or attach. | ||
// A 'threads' request will be sent after configurationDone response | ||
// Obtain the list of threads before we resume the process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe expand on this a bit saying something like:
// If a 'threads' request returns no threads, then the interrupt
// button won't be available in the GUI debugger controls, so
// we need to report valid threads back to the IDE. We also need
// a valid thread for the "continued" packet to be sent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pause button relies on having some threads informations and it always send the threadID
in the pause request. I have also checked the VSCode source code about fetching threads. If no threads are returned in the threads response then VSCode won't update the know thread list.
Which means, we really just need to return the baseline of threads for the debug session to carry on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the continued
event, VSCode will ignore the threadID if allThreadsContinued: true
. Looks like can send an invalid threadID if resuming all threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, we just need to change our code to always send it even if the dap.focus_tid
is not valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to DAP
Please note: a debug adapter is not expected to send this event in response to a request that implies that execution continues, e.g. launch or continue.
It is only necessary to send a continued event if there was no previous request that implied this.
In this case, if the attach
is expecting the execution continues then we should treat it the same way as launch
. I am happy to discuss more about this as well as if we should support stopOnEntry
for attaching :) going forward.
lldb/tools/lldb-dap/Handler/ConfigurationDoneRequestHandler.cpp
Outdated
Show resolved
Hide resolved
44af671
to
931b3fb
Compare
6e3ec6c
to
38bfc5b
Compare
38bfc5b
to
37340cb
Compare
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/16066 Here is the relevant piece of the build log for the reference
|
…vm#134339) # Summary This PR updates `SBProcess::GetNumThreads()` and `SBProcess::GetThreadAtIndex()` to listen to the stop locker. `SBProcess::GetNumThreads()` will return 0 if the process is running. ## Problem Description Recently upon debugging a program with thousands of threads in VS Code, lldb-dap would hang at a `threads` request sent right after receiving the `configurationDone` response. Soon after it will end the debug session with the following error ``` Process <pid> exited with status = -1 (0xffffffff) lost connection ``` This is because LLDB is still in the middle of resuming all the threads. And requesting threads will end up interrupt the process on Linux. From the gdb-remote log it ended up getting `lldb::StateType::eStateInvalid` and just exit with status -1. I don't think it's reasonable to allow getting threads from a running process. There are a few approaches to fix this: 1) Send the stopped event to IDE after `configurationDone`. This aligns with the CLI behavior. 2) However, the above approach will break the existing user facing behavior. The alternative will be reject the `threads` request if the process is not stopped. 3) Improve the run lock. This is a synchronize issue where process was in the middle of resuming while lldb-dap attempts to interrupt it. **This PR implements the option 3** ## HOWEVER This fixed the "lost connection" issue below but new issue has surfaced. From testing, and also from checking the [VSCode source code](https://github.com/microsoft/vscode/blob/174af221c9ea2ccdb64abe4aab8e1a805e77beae/src/vs/workbench/contrib/debug/browser/debugSession.ts#L791), it expects having threadID to perform `pause`. So after attaching, without any threads reported to the client, the user will not be able to pause the attached process. `setBreakpoint` will still work and once we make a stop at the bp (or any stop that will report threads, client can perform pause again. ## NEXT 1) Made an attempt to return initial thread list so that VSCode can pause (second commit in the PR) 2) Investigate why threads will trigger unwinding the second frame of a thread, which leads to sending the interrupt 3) Decided if we want to support `stopOnEntry` for attaching, given i. This is not an official specification ii. If enable stopOnEntry, we need to fix attaching on Linux, to send only one stopped event. Currently, all threads upon attaching will have stop reason `SIGSTOP` and lldb-dap will send `stopped` event for each one of them. Every `stopped` will trigger the client request for threads. iii. Alternatively, we can support auto continue correspond to `(lldb) process attach --continue`. This require the ii above. ### Additionally lldb-dap will not send a `continued` event after `configurationDone` because it checks `dap.focus_tid == LLDB_INVALID_THREAD_ID` (so that we don't send it for `launch` request). Notice `dap.focus_tid` will only get assigned when handling stop or stepping. According to DAP > Please note: a debug adapter is not expected to send this event in response to a request that implies that execution continues, e.g. launch or continue. It is only necessary to send a continued event if there was no previous request that implied this. So I guess we are not violating DAP if we don't send `continued` event. But I'd like to get some sense about this. ## Test Plan Used following program for testing: https://gist.github.com/kusmour/1729d2e07b7b1063897db77de194e47d **NOTE: Utilize stdin to get pid and attach AFTER hitting enter. Attach should happen when all the threads start running.** DAP messages before the change <img width="1165" alt="image" src="https://github.com/user-attachments/assets/a9ad85fb-81ce-419c-95e5-612639905c66" /> DAP message after the change - report zero threads after attaching <img width="1165" alt="image" src="https://github.com/user-attachments/assets/a1179e18-6844-437a-938c-0383702294cd" /> --------- Co-authored-by: Jonas Devlieghere <[email protected]>
Summary
This PR updates
SBProcess::GetNumThreads()
andSBProcess::GetThreadAtIndex()
to listen to the stop locker.SBProcess::GetNumThreads()
will return 0 if the process is running.Problem Description
Recently upon debugging a program with thousands of threads in VS Code, lldb-dap would hang at a
threads
request sent right after receiving theconfigurationDone
response. Soon after it will end the debug session with the following errorThis is because LLDB is still in the middle of resuming all the threads. And requesting threads will end up interrupt the process on Linux. From the gdb-remote log it ended up getting
lldb::StateType::eStateInvalid
and just exit with status -1.I don't think it's reasonable to allow getting threads from a running process. There are a few approaches to fix this:
configurationDone
. This aligns with the CLI behavior.threads
request if the process is not stopped.This PR implements the option 3
HOWEVER
This fixed the "lost connection" issue below but new issue has surfaced. From testing, and also from checking the VSCode source code, it expects having threadID to perform
pause
. So after attaching, without any threads reported to the client, the user will not be able to pause the attached process.setBreakpoint
will still work and once we make a stop at the bp (or any stop that will report threads, client can perform pause again.NEXT
stopOnEntry
for attaching, giveni. This is not an official specification
ii. If enable stopOnEntry, we need to fix attaching on Linux, to send only one stopped event. Currently, all threads upon attaching will have stop reason
SIGSTOP
and lldb-dap will sendstopped
event for each one of them. Everystopped
will trigger the client request for threads.iii. Alternatively, we can support auto continue correspond to
(lldb) process attach --continue
. This require the ii above.Additionally
lldb-dap will not send a
continued
event afterconfigurationDone
because it checksdap.focus_tid == LLDB_INVALID_THREAD_ID
(so that we don't send it forlaunch
request). Noticedap.focus_tid
will only get assigned when handling stop or stepping.According to DAP
So I guess we are not violating DAP if we don't send
continued
event. But I'd like to get some sense about this.Test Plan
Used following program for testing: https://gist.github.com/kusmour/1729d2e07b7b1063897db77de194e47d
NOTE: Utilize stdin to get pid and attach AFTER hitting enter. Attach should happen when all the threads start running.
DAP messages before the change

DAP message after the change - report zero threads after attaching
