[lldb] Don't invalid register context after setting thread pc's #109499

jasonmolenda · 2024-09-20T23:59:42Z

Some gdb remote serial protocol stubs will send the thread IDs and PCs for all threads in a process in the stop-reply packet. lldb often needs to know the pc values for all threads while at a private stop, and that results in read-register packets for threads, and can be a big performance problem when this is a hot code path.

GDBRemoteRegisterContext tracks the StopID of when its values were set, and when the thread's StopID has incremented, it marks all values it has as Invalid, and knows to refetch them.

We have a code path that resulted in setting the PCs for all the threads, and then ProcessGDBRemote::CalculateThreadStopInfo forcing an invalidation of all the register contexts, forcing us to re-read the pc values for all threads except the one that stopped.

There are times when it is valid to force an invalidation of the regsiter cache - for instance, if the layout of the registers has changed because the processor state is different, or we've sent a write-all-registers packet to the inferior and we want to make sure we stay in sync with the inferior. But there was no reason for this method to be forcing the register context to be invalid.

I added a test when running on Darwin systems, where debugserver always sends the thread IDs and PCs, which turns on packet logging. The test runs against an inferior which has 4 threads; it steps over a dlopen() call, steps in to a user function with debug info, steps-over and steps-in across source lines with multiple function calls, and then examines the packet log and flags it as an error if lldb asked for the pc value of any thread at any point in the debug session.

For this program and the operations we're doing, with debugserver that provides thread IDs and PCs, we should never ask for the value of a pc register.

rdar://136247381

Some gdb remote serial protocol stubs will send the thread IDs and PCs for all threads in a process in the stop-reply packet. lldb often needs to know the pc values for all threads while at a private stop, and that results in <n-1> read-register packets for <n> threads, and can be a big performance problem when this is a hot code path. GDBRemoteRegisterContext tracks the StopID of when its values were set, and when the thread's StopID has incremented, it marks all values it has as Invalid, and knows to refetch them. We have a code path that resulted in setting the PCs for all the threads, and then `ProcessGDBRemote::CalculateThreadStopInfo` *forcing* an invalidation of all the register contexts, forcing us to re-read the pc values for all threads except the one that stopped. There are times when it is valid to force an invalidation of the regsiter cache - for instance, if the layout of the registers has changed because the processor state is different, or we've sent a write-all-registers packet to the inferior and we want to make sure we stay in sync with the inferior. But there was no reason for this method to be forcing the register context to be invalid. I added a test when running on Darwin systems, where debugserver always sends the thread IDs and PCs, which turns on packet logging. The test runs against an inferior which has 4 threads; it steps over a dlopen() call, steps in to a user function with debug info, steps-over and steps-in across source lines with multiple function calls, and then examines the packet log and flags it as an error if lldb asked for the pc value of any thread at any point in the debug session. For this program and the operations we're doing, with debugserver that provides thread IDs and PCs, we should never ask for the value of a pc register. rdar://136247381

llvmbot · 2024-09-21T00:00:19Z

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

Some gdb remote serial protocol stubs will send the thread IDs and PCs for all threads in a process in the stop-reply packet. lldb often needs to know the pc values for all threads while at a private stop, and that results in <n-1> read-register packets for <n> threads, and can be a big performance problem when this is a hot code path.

GDBRemoteRegisterContext tracks the StopID of when its values were set, and when the thread's StopID has incremented, it marks all values it has as Invalid, and knows to refetch them.

We have a code path that resulted in setting the PCs for all the threads, and then ProcessGDBRemote::CalculateThreadStopInfo forcing an invalidation of all the register contexts, forcing us to re-read the pc values for all threads except the one that stopped.

There are times when it is valid to force an invalidation of the regsiter cache - for instance, if the layout of the registers has changed because the processor state is different, or we've sent a write-all-registers packet to the inferior and we want to make sure we stay in sync with the inferior. But there was no reason for this method to be forcing the register context to be invalid.

I added a test when running on Darwin systems, where debugserver always sends the thread IDs and PCs, which turns on packet logging. The test runs against an inferior which has 4 threads; it steps over a dlopen() call, steps in to a user function with debug info, steps-over and steps-in across source lines with multiple function calls, and then examines the packet log and flags it as an error if lldb asked for the pc value of any thread at any point in the debug session.

For this program and the operations we're doing, with debugserver that provides thread IDs and PCs, we should never ask for the value of a pc register.

rdar://136247381

Full diff: https://github.com/llvm/llvm-project/pull/109499.diff

5 Files Affected:

(modified) lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp (-1)
(added) lldb/test/API/macosx/expedited-thread-pcs/Makefile (+11)
(added) lldb/test/API/macosx/expedited-thread-pcs/TestExpeditedThreadPCs.py (+91)
(added) lldb/test/API/macosx/expedited-thread-pcs/foo.c (+1)
(added) lldb/test/API/macosx/expedited-thread-pcs/main.cpp (+62)

diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
index d5dfe79fd8862a..9e8c6046179631 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
@@ -1600,7 +1600,6 @@ bool ProcessGDBRemote::CalculateThreadStopInfo(ThreadGDBRemote *thread) {
     // If we have "jstopinfo" then we have stop descriptions for all threads
     // that have stop reasons, and if there is no entry for a thread, then it
     // has no stop reason.
-    thread->GetRegisterContext()->InvalidateIfNeeded(true);
     if (!GetThreadStopInfoFromJSON(thread, m_jstopinfo_sp)) {
       // If a thread is stopped at a breakpoint site, set that as the stop
       // reason even if it hasn't executed the breakpoint instruction yet.
diff --git a/lldb/test/API/macosx/expedited-thread-pcs/Makefile b/lldb/test/API/macosx/expedited-thread-pcs/Makefile
new file mode 100644
index 00000000000000..7799f06e770970
--- /dev/null
+++ b/lldb/test/API/macosx/expedited-thread-pcs/Makefile
@@ -0,0 +1,11 @@
+CXX_SOURCES := main.cpp
+
+.PHONY: build-libfoo
+all: build-libfoo a.out
+
+include Makefile.rules
+
+build-libfoo: foo.c
+	$(MAKE) -f $(MAKEFILE_RULES) \
+		DYLIB_C_SOURCES=foo.c DYLIB_NAME=foo DYLIB_ONLY=YES
+
diff --git a/lldb/test/API/macosx/expedited-thread-pcs/TestExpeditedThreadPCs.py b/lldb/test/API/macosx/expedited-thread-pcs/TestExpeditedThreadPCs.py
new file mode 100644
index 00000000000000..0611907a34b0d6
--- /dev/null
+++ b/lldb/test/API/macosx/expedited-thread-pcs/TestExpeditedThreadPCs.py
@@ -0,0 +1,91 @@
+"""Test that the expedited thread pc values are not re-fetched by lldb."""
+
+import subprocess
+import lldb
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
+
+file_index = 0
+
+
+class TestExpeditedThreadPCs(TestBase):
+    NO_DEBUG_INFO_TESTCASE = True
+
+    @skipUnlessDarwin
+    def test_expedited_thread_pcs(self):
+        TestBase.setUp(self)
+
+        global file_index
+        ++file_index
+        logfile = os.path.join(
+            self.getBuildDir(),
+            "packet-log-" + self.getArchitecture() + "-" + str(file_index) + ".txt",
+        )
+        self.runCmd("log enable -f %s gdb-remote packets" % (logfile))
+
+        def cleanup():
+            self.runCmd("log disable gdb-remote packets")
+            if os.path.exists(logfile):
+                os.unlink(logfile)
+
+        self.addTearDownHook(cleanup)
+
+        self.source = "main.cpp"
+        self.build()
+        (target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(
+            self, "break here", lldb.SBFileSpec(self.source, False)
+        )
+
+        # verify that libfoo.dylib hasn't loaded yet
+        for m in target.modules:
+            self.assertNotEqual(m.GetFileSpec().GetFilename(), "libfoo.dylib")
+
+        thread.StepInto()
+        thread.StepInto()
+
+        thread.StepInto()
+        thread.StepInto()
+        thread.StepInto()
+
+        # verify that libfoo.dylib has loaded
+        for m in target.modules:
+            if m.GetFileSpec().GetFilename() == "libfoo.dylib":
+                found_libfoo = True
+        self.assertTrue(found_libfoo)
+
+        thread.StepInto()
+        thread.StepInto()
+        thread.StepOver()
+        thread.StepOver()
+        thread.StepOver()
+        thread.StepOver()
+        thread.StepOver()
+        thread.StepOver()
+        thread.StepOver()
+        thread.StepOver()
+        thread.StepOver()
+        thread.StepOver()
+
+        process.Kill()
+
+        # Confirm that we never fetched the pc for any threads during
+        # this debug session.
+        if os.path.exists(logfile):
+            f = open(logfile)
+            lines = f.readlines()
+            num_errors = 0
+            for line in lines:
+                arch = self.getArchitecture()
+                if arch == "arm64" or arch == "arm64_32":
+                    #   <reg name="pc" regnum="32" offset="256" bitsize="64" group="general" group_id="1" ehframe_regnum="32" dwarf_regnum="32" generic="pc"/>
+                    # A fetch of $pc on arm64 looks like
+                    #  <  22> send packet: $p20;thread:91698e;#70
+                    self.assertNotIn("$p20;thread", line)
+                else:
+                    #   <reg name="rip" regnum="16" offset="128" bitsize="64" group="general" altname="pc" group_id="1" ehframe_regnum="16" dwarf_regnum="16" generic="pc"/>
+                    # A fetch of $pc on x86_64 looks like
+                    #  <  22> send packet: $p10;thread:91889c;#6f
+                    self.assertNotIn("$p10;thread", line)
+
+            f.close()
diff --git a/lldb/test/API/macosx/expedited-thread-pcs/foo.c b/lldb/test/API/macosx/expedited-thread-pcs/foo.c
new file mode 100644
index 00000000000000..de1cbc4c4648a1
--- /dev/null
+++ b/lldb/test/API/macosx/expedited-thread-pcs/foo.c
@@ -0,0 +1 @@
+int foo() { return 5; }
diff --git a/lldb/test/API/macosx/expedited-thread-pcs/main.cpp b/lldb/test/API/macosx/expedited-thread-pcs/main.cpp
new file mode 100644
index 00000000000000..d77c6793afb6b2
--- /dev/null
+++ b/lldb/test/API/macosx/expedited-thread-pcs/main.cpp
@@ -0,0 +1,62 @@
+#include <dlfcn.h>
+#include <stdio.h>
+#include <thread>
+#include <unistd.h>
+
+void f1() {
+  while (1)
+    sleep(1);
+}
+void f2() {
+  while (1)
+    sleep(1);
+}
+void f3() {
+  while (1)
+    sleep(1);
+}
+
+int main() {
+  std::thread t1{f1};
+  std::thread t2{f2};
+  std::thread t3{f3};
+
+  puts("break here");
+
+  void *handle = dlopen("libfoo.dylib", RTLD_LAZY);
+  int (*foo_ptr)() = (int (*)())dlsym(handle, "foo");
+  int c = foo_ptr();
+
+  // clang-format off
+  // multiple function calls on a single source line so 'step'
+  // and 'next' need to do multiple steps of work.
+  puts("1"); puts("2"); puts("3"); puts("4"); puts("5");
+  puts("6"); puts("7"); puts("8"); puts("9"); puts("10");
+  puts("11"); puts("12"); puts("13"); puts("14"); puts("15");
+  puts("16"); puts("17"); puts("18"); puts("19"); puts("20");
+  puts("21"); puts("22"); puts("23"); puts("24"); puts("24");
+  // clang-format on
+  puts("one");
+  puts("two");
+  puts("three");
+  puts("four");
+  puts("five");
+  puts("six");
+  puts("seven");
+  puts("eight");
+  puts("nine");
+  puts("ten");
+  c++;
+  c++;
+  c++;
+  c++;
+  c++;
+  c++;
+  c++;
+  c++;
+  c++;
+  c++;
+  c++;
+  c++;
+  return c;
+}

labath · 2024-09-23T07:51:38Z

What's the exact situation that triggered these extra packets. Could it be simulated from a gdb-remote client test (i.e., by mocking server responses)?

jasonmolenda · 2024-09-23T18:11:36Z

What's the exact situation that triggered these extra packets. Could it be simulated from a gdb-remote client test (i.e., by mocking server responses)?

We've always done it when we step over our dynamic loader notification breakpoint (when new solibs are loaded and lldb is informed), I think it depends on the ordering that we invoke these methods in ProcessGDBRemote, I didn't try to debug what shifted in our call ordering to make this become a hotter problem recently.

(that's why my test case dlopen's a solib, because that was a specific issue)

jimingham

LGTM.

Given InvalidateIfNeeded (which by name seems harmless) can have these undesirably effects (presumably only when you pass true for force) it might be nice to warn developers about this by adding an instructive comment to InvalidateIfNeeded?

But the content seems fine here.

…#109499) Some gdb remote serial protocol stubs will send the thread IDs and PCs for all threads in a process in the stop-reply packet. lldb often needs to know the pc values for all threads while at a private stop, and that results in <n-1> read-register packets for <n> threads, and can be a big performance problem when this is a hot code path. GDBRemoteRegisterContext tracks the StopID of when its values were set, and when the thread's StopID has incremented, it marks all values it has as Invalid, and knows to refetch them. We have a code path that resulted in setting the PCs for all the threads, and then `ProcessGDBRemote::CalculateThreadStopInfo` *forcing* an invalidation of all the register contexts, forcing us to re-read the pc values for all threads except the one that stopped. There are times when it is valid to force an invalidation of the regsiter cache - for instance, if the layout of the registers has changed because the processor state is different, or we've sent a write-all-registers packet to the inferior and we want to make sure we stay in sync with the inferior. But there was no reason for this method to be forcing the register context to be invalid. I added a test when running on Darwin systems, where debugserver always sends the thread IDs and PCs, which turns on packet logging. The test runs against an inferior which has 4 threads; it steps over a dlopen() call, steps in to a user function with debug info, steps-over and steps-in across source lines with multiple function calls, and then examines the packet log and flags it as an error if lldb asked for the pc value of any thread at any point in the debug session. For this program and the operations we're doing, with debugserver that provides thread IDs and PCs, we should never ask for the value of a pc register. rdar://136247381 (cherry picked from commit 6e6d5ea)

…register-context-unnecessarily [lldb] Don't invalid register context after setting thread pc's (llvm#109499)

…#109499) Some gdb remote serial protocol stubs will send the thread IDs and PCs for all threads in a process in the stop-reply packet. lldb often needs to know the pc values for all threads while at a private stop, and that results in <n-1> read-register packets for <n> threads, and can be a big performance problem when this is a hot code path. GDBRemoteRegisterContext tracks the StopID of when its values were set, and when the thread's StopID has incremented, it marks all values it has as Invalid, and knows to refetch them. We have a code path that resulted in setting the PCs for all the threads, and then `ProcessGDBRemote::CalculateThreadStopInfo` *forcing* an invalidation of all the register contexts, forcing us to re-read the pc values for all threads except the one that stopped. There are times when it is valid to force an invalidation of the regsiter cache - for instance, if the layout of the registers has changed because the processor state is different, or we've sent a write-all-registers packet to the inferior and we want to make sure we stay in sync with the inferior. But there was no reason for this method to be forcing the register context to be invalid. I added a test when running on Darwin systems, where debugserver always sends the thread IDs and PCs, which turns on packet logging. The test runs against an inferior which has 4 threads; it steps over a dlopen() call, steps in to a user function with debug info, steps-over and steps-in across source lines with multiple function calls, and then examines the packet log and flags it as an error if lldb asked for the pc value of any thread at any point in the debug session. For this program and the operations we're doing, with debugserver that provides thread IDs and PCs, we should never ask for the value of a pc register. rdar://136247381 (cherry picked from commit 6e6d5ea)

…te-register-context-unnecessarily [lldb] Don't invalid register context after setting thread pc's (llvm#109499)

jasonmolenda requested a review from jimingham September 20, 2024 23:59

jasonmolenda requested a review from JDevlieghere as a code owner September 20, 2024 23:59

llvmbot added the lldb label Sep 20, 2024

jimingham approved these changes Sep 23, 2024

View reviewed changes

jasonmolenda merged commit 6e6d5ea into llvm:main Sep 23, 2024
9 checks passed

jasonmolenda deleted the do-not-fetch-pc-values-with-debugserver branch September 23, 2024 19:13

jasonmolenda added a commit to swiftlang/llvm-project that referenced this pull request Sep 23, 2024

Merge pull request #9315 from jasonmolenda/cp/136247381-dont-invalid-…

0123db6

…register-context-unnecessarily [lldb] Don't invalid register context after setting thread pc's (llvm#109499)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lldb] Don't invalid register context after setting thread pc's #109499

[lldb] Don't invalid register context after setting thread pc's #109499

Uh oh!

jasonmolenda commented Sep 20, 2024

Uh oh!

llvmbot commented Sep 21, 2024

Uh oh!

labath commented Sep 23, 2024

Uh oh!

jasonmolenda commented Sep 23, 2024

Uh oh!

jimingham left a comment

Uh oh!

Uh oh!

Uh oh!

[lldb] Don't invalid register context after setting thread pc's #109499

[lldb] Don't invalid register context after setting thread pc's #109499

Uh oh!

Conversation

jasonmolenda commented Sep 20, 2024

Uh oh!

llvmbot commented Sep 21, 2024

Uh oh!

labath commented Sep 23, 2024

Uh oh!

jasonmolenda commented Sep 23, 2024

Uh oh!

jimingham left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!