Add a test for evicting unreachable modules from the global module cache #74894
Conversation
@llvm/pr-subscribers-lldb

Author: None (jimingham)

Changes

When you debug a binary, then change it, rebuild, and rerun in lldb without quitting lldb, the Modules in the Global Module Cache for the old binary (and its .o files, if used) become "unreachable". Nothing in lldb is holding them alive, and they have already been unlinked. lldb will properly discard them if there is no other Target referencing them.

However, this only works in simple cases at present. If you have several Targets that reference the same modules, it's pretty easy to end up stranding Modules that are no longer reachable, and if you use a sequence of SBDebuggers, unreachable modules can also get stranded. If you run a long-lived lldb process and are iteratively developing on a large code base, lldb's memory fills up with useless Modules.

This patch adds a test for the mode that currently works:

(lldb) target create foo
(lldb) run
(lldb) run

In that case, we do delete the unreachable Modules.

The next step will be to add tests for the cases where we fail to do this, then see how to safely/efficiently evict unreachable modules in those cases as well.

Full diff: https://github.com/llvm/llvm-project/pull/74894.diff

4 Files Affected:
diff --git a/lldb/test/API/python_api/global_module_cache/Makefile b/lldb/test/API/python_api/global_module_cache/Makefile
new file mode 100644
index 0000000000000..22f1051530f87
--- /dev/null
+++ b/lldb/test/API/python_api/global_module_cache/Makefile
@@ -0,0 +1 @@
+include Makefile.rules
diff --git a/lldb/test/API/python_api/global_module_cache/TestGlobalModuleCache.py b/lldb/test/API/python_api/global_module_cache/TestGlobalModuleCache.py
new file mode 100644
index 0000000000000..ff74d09a12818
--- /dev/null
+++ b/lldb/test/API/python_api/global_module_cache/TestGlobalModuleCache.py
@@ -0,0 +1,110 @@
+"""
+Test the use of the global module cache in lldb
+"""
+
+import lldb
+
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
+import os
+import shutil
+from pathlib import Path
+import time
+
+
+class GlobalModuleCacheTestCase(TestBase):
+ # NO_DEBUG_INFO_TESTCASE = True
+
+ def check_counter_var(self, thread, value):
+ frame = thread.frames[0]
+ var = frame.FindVariable("counter")
+ self.assertTrue(var.GetError().Success(), "Got counter variable")
+ self.assertEqual(var.GetValueAsUnsigned(), value, "This was one-print")
+
+ def copy_to_main(self, src, dst):
+ # We are relying on the source file being newer than the .o file from
+ # a previous build, so sleep a bit here to ensure that the touch is later.
+ time.sleep(2)
+ try:
+ shutil.copy(src, dst)
+ except:
+ self.fail(f"Could not copy {src} to {dst}")
+ Path(dst).touch()
+
+ # The rerun tests indicate rerunning on Windows doesn't really work, so
+ # this one won't either.
+ @skipIfWindows
+ def test_OneTargetOneDebugger(self):
+ # Make sure that if we have one target, and we run, then
+ # change the binary and rerun, the binary (and any .o files
+ # if using dwarf in .o file debugging) get removed from the
+ # shared module cache. They are no longer reachable.
+ debug_style = self.getDebugInfo()
+
+ # Before we do anything, clear the global module cache so we don't
+ # see objects from other runs:
+ lldb.SBDebugger.MemoryPressureDetected()
+
+ # Set up the paths for our two versions of main.c:
+ main_c_path = os.path.join(self.getBuildDir(), "main.c")
+ one_print_path = os.path.join(self.getSourceDir(), "one-print.c")
+ two_print_path = os.path.join(self.getSourceDir(), "two-print.c")
+ main_filespec = lldb.SBFileSpec(main_c_path)
+
+ # First copy the one-print.c to main.c in the build folder and
+ # build our a.out from there:
+ self.copy_to_main(one_print_path, main_c_path)
+ self.build(dictionary={"C_SOURCES": main_c_path, "EXE": "a.out"})
+
+ (target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(
+ self, "return counter;", main_filespec
+ )
+
+ # Make sure we ran the version we intended here:
+ self.check_counter_var(thread, 1)
+ process.Kill()
+
+ # Now copy two-print.c over main.c, rebuild, and rerun:
+ # os.unlink(target.GetExecutable().fullpath)
+ self.copy_to_main(two_print_path, main_c_path)
+
+ self.build(dictionary={"C_SOURCES": main_c_path, "EXE": "a.out"})
+ error = lldb.SBError()
+ (_, process, thread, _) = lldbutil.run_to_breakpoint_do_run(self, target, bkpt)
+ # In two-print.c counter will be 2:
+ self.check_counter_var(thread, 2)
+
+ num_a_dot_out_entries = 1
+ # For dSYM's there will be two lines of output, one for the a.out and one
+ # for the dSYM.
+ if debug_style == "dsym":
+ num_a_dot_out_entries += 1
+
+ self.check_image_list_result(num_a_dot_out_entries, 1)
+
+ def check_image_list_result(self, num_a_dot_out, num_main_dot_o):
+ # Now look at the global module list, there should only be one a.out, and if we are
+ # doing dwarf in .o file, there should only be one .o file:
+ image_cmd_result = lldb.SBCommandReturnObject()
+ interp = self.dbg.GetCommandInterpreter()
+ interp.HandleCommand("image list -g", image_cmd_result)
+ image_list_str = image_cmd_result.GetOutput()
+ image_list = image_list_str.splitlines()
+ found_a_dot_out = 0
+ found_main_dot_o = 0
+
+ for line in image_list:
+ # FIXME: force this to be at the end of the string:
+ if "a.out" in line:
+ found_a_dot_out += 1
+ if "main.o" in line:
+ found_main_dot_o += 1
+
+ self.assertEqual(
+ num_a_dot_out, found_a_dot_out, "Got the right number of a.out's"
+ )
+ if found_main_dot_o > 0:
+ self.assertEqual(
+ num_main_dot_o, found_main_dot_o, "Got the right number of main.o's"
+ )
diff --git a/lldb/test/API/python_api/global_module_cache/one-print.c b/lldb/test/API/python_api/global_module_cache/one-print.c
new file mode 100644
index 0000000000000..5a572ca7c65fe
--- /dev/null
+++ b/lldb/test/API/python_api/global_module_cache/one-print.c
@@ -0,0 +1,8 @@
+#include <stdio.h>
+
+int
+main() {
+ int counter = 0;
+ printf("I only print one time: %d.\n", counter++);
+ return counter;
+}
diff --git a/lldb/test/API/python_api/global_module_cache/two-print.c b/lldb/test/API/python_api/global_module_cache/two-print.c
new file mode 100644
index 0000000000000..ce6cb84c5c46e
--- /dev/null
+++ b/lldb/test/API/python_api/global_module_cache/two-print.c
@@ -0,0 +1,9 @@
+#include <stdio.h>
+
+int
+main() {
+ int counter = 0;
+ printf("I print one time: %d.\n", counter++);
+ printf("I print two times: %d.\n", counter++);
+ return counter;
+}
Note, I thought about adding an SBDebugger::GetSharedModules or something, but I don't actually think that's a good thing to give external access to. Some day we should make an SBTestAPI with things that are useful for testing but don't belong in the SB API library, so we can make this sort of testing easier; but for this test, grubbing through the command output is not all that bad.
I wouldn't mind exposing such an API for the global module list as static functions (get count + get at index) on SBDebugger. One issue is that only real modules will show up here. This works fine for all modules on Darwin, since executables, dSYM files, and the .o files are each represented by an lldb_private::Module. It is not true for .dwo files, though, as they share the original module for the executable and only create a new ObjectFile, not a new Module, for each. Not a big deal though. This API would be useful for testing, and also for people who want to know what modules are currently in the global module cache taking up memory in the LLDB process. To test this, you might run your first executable, check the number of shared modules (and access them if needed), and then run:
As this used to get rid of orphaned modules in the past, and I think it still does, you could then check the shared module count and list again after creating and running the second executable to see what you find.
If I knew how to hand out SBReadOnlyModule objects, I think this would be a fine API. But I am wary of adding an API where one SBDebugger could inadvertently change another SBDebugger's environment behind its back. We do allow this by accident today: for instance, Debugger1 turns off "external symbol search", but Debugger2 has it on and loads the same module as Debugger1, forcing symbols on Debugger1. But IMO that's a bug, and I'd rather not add more ways to write that bug. Fortunately, the output of
I added a test for the case where one target has the old version and another target has the new one. In this case, I don't think (short of calling MemoryPressureDetected by hand) that we'll ever get rid of these unreachable Modules.
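For reference, a minimal sketch of what "MemoryPressureDetected by hand" plus a re-check of the shared cache looks like from Python, using only calls that already appear in this patch; the helper name and the "a.out" basename are illustrative, not part of the change:

import lldb

def count_shared_modules(debugger, basename):
    # There is no dedicated SB API for the global module list, so parse the
    # textual output of "image list -g", the same approach the test takes.
    result = lldb.SBCommandReturnObject()
    debugger.GetCommandInterpreter().HandleCommand("image list -g", result)
    return sum(basename in line for line in result.GetOutput().splitlines())

# Inside an API test, self.dbg already has a target, so the command is valid:
#   before = count_shared_modules(self.dbg, "a.out")
#   lldb.SBDebugger.MemoryPressureDetected()  # ask lldb to drop unreachable modules
#   after = count_shared_modules(self.dbg, "a.out")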
You can test this locally with the following command:

darker --check --diff -r 32d535195ec5d9b0caf03fee13f796fc8c66a79f...90119f1d3955029931939d529a9f4b3c113b8284 lldb/test/API/python_api/global_module_cache/TestGlobalModuleCache.py

View the diff from darker here.

--- TestGlobalModuleCache.py 2023-12-12 19:26:09.000000 +0000
+++ TestGlobalModuleCache.py 2023-12-12 19:28:53.039653 +0000
@@ -9,10 +9,11 @@
from lldbsuite.test import lldbutil
import os
import shutil
from pathlib import Path
import time
+
class GlobalModuleCacheTestCase(TestBase):
# NO_DEBUG_INFO_TESTCASE = True
def check_counter_var(self, thread, value):
@@ -97,10 +98,11 @@
else:
if one_target:
new_debugger = lldb.SBDebugger().Create()
self.old_debugger = self.dbg
self.dbg = new_debugger
+
def cleanupDebugger(self):
lldb.SBDebugger.Destroy(self.dbg)
self.dbg = self.old_debugger
self.old_debugger = None
@@ -128,11 +130,11 @@
lldb.SBDebugger.MemoryPressureDetected()
error_after_mpd = self.check_image_list_result(num_a_dot_out_entries, 1)
fail_msg = ""
if error != "":
fail_msg = "Error before MPD: " + error
-
+
if error_after_mpd != "":
fail_msg = fail_msg + "\nError after MPD: " + error_after_mpd
if fail_msg != "":
self.fail(fail_msg)
@@ -157,13 +159,15 @@
# FIXME: force this to be at the end of the string:
if "a.out" in line:
found_a_dot_out += 1
if "main.o" in line:
found_main_dot_o += 1
-
+
if num_a_dot_out != found_a_dot_out:
return f"Got {found_a_dot_out} number of a.out's, expected {num_a_dot_out}"
-
+
if found_main_dot_o > 0 and num_main_dot_o != found_main_dot_o:
- return f"Got {found_main_dot_o} number of main.o's, expected {num_main_dot_o}"
+ return (
+ f"Got {found_main_dot_o} number of main.o's, expected {num_main_dot_o}"
+ )
return ""
This test crashes on Arm and AArch64:
Though not consistently. I'm going to revert this until I have time to figure out why.
Nothing that test does should cause a crash. It's fine to revert for investigation, but it seems like a good test in that it is uncovering a real bug (though not the one intended).
Jim
On Dec 13, 2023, at 3:34 AM, David Spickett wrote:
This test crashes on Arm and AArch64:
******************** TEST 'lldb-api :: python_api/global_module_cache/TestGlobalModuleCache.py' FAILED ********************
Script:
--
/usr/bin/python3.8 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env ARCHIVER=/usr/local/bin/llvm-ar --env OBJCOPY=/usr/bin/llvm-objcopy --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/python_api/global_module_cache -p TestGlobalModuleCache.py
--
Exit Code: -6
Command Output (stdout):
--
lldb version 18.0.0git (https://github.com/llvm/llvm-project.git revision bb18611)
clang revision bb18611
llvm revision bb18611
--
Command Output (stderr):
--
python3.8: ../llvm-project/lldb/source/Target/ThreadPlanStack.cpp:151: lldb::ThreadPlanSP lldb_private::ThreadPlanStack::PopPlan(): Assertion `m_plans.size() > 1 && "Can't pop the base thread plan"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
#0 0x0000ffff7f3d8d60 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x51d7d60)
#1 0x0000ffff7f3d6d1c llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x51d5d1c)
#2 0x0000ffff7f3d9494 SignalHandler(int) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x51d8494)
#3 0x0000ffff85fee7dc (linux-vdso.so.1+0x7dc)
#4 0x0000ffff85e7cd78 raise /build/glibc-RIFKjK/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
#5 0x0000ffff85e69aac abort /build/glibc-RIFKjK/glibc-2.31/stdlib/abort.c:81:7
#6 0x0000ffff85e76490 __assert_fail_base /build/glibc-RIFKjK/glibc-2.31/assert/assert.c:89:7
#7 0x0000ffff85e764f4 (/lib/aarch64-linux-gnu/libc.so.6+0x2d4f4)
#8 0x0000ffff7ef6f744 lldb_private::ThreadPlanStack::PopPlan() (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4d6e744)
#9 0x0000ffff7ef482f8 lldb_private::Thread::PopPlan() (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4d472f8)
#10 0x0000ffff7ef47ca8 lldb_private::Thread::ShouldStop(lldb_private::Event*) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4d46ca8)
#11 0x0000ffff7ef544c8 lldb_private::ThreadList::ShouldStop(lldb_private::Event*) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4d534c8)
#12 0x0000ffff7eed3c30 lldb_private::Process::ShouldBroadcastEvent(lldb_private::Event*) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4cd2c30)
#13 0x0000ffff7eecd220 lldb_private::Process::HandlePrivateEvent(std::shared_ptr<lldb_private::Event>&) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4ccc220)
#14 0x0000ffff7eed4eb4 lldb_private::Process::RunPrivateStateThread(bool) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4cd3eb4)
#15 0x0000ffff7eee06f0 std::_Function_handler<void* (), lldb_private::Process::StartPrivateStateThread(bool)::$_0>::_M_invoke(std::_Any_data const&) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4cdf6f0)
#16 0x0000ffff7edd3f80 lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(void*) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4bd2f80)
#17 0x0000ffff85e1f624 start_thread /build/glibc-RIFKjK/glibc-2.31/nptl/pthread_create.c:477:8
#18 0x0000ffff85f1a49c /build/glibc-RIFKjK/glibc-2.31/misc/../sysdeps/unix/sysv/linux/aarch64/clone.S:81:0
Though not consistently.
I'm going to revert this until I have time to figure out why.
import time

class GlobalModuleCacheTestCase(TestBase):
    # NO_DEBUG_INFO_TESTCASE = True
why is this commented out?
process.Kill()

    # Now copy two-print.c over main.c, rebuild, and rerun:
    # os.unlink(target.GetExecutable().fullpath)
Also commented out code here.
…odule cache (#74894)"

This reverts commit 35dacf2 and relands the original change with two additions so I can debug the failure on Arm/AArch64:

* Enable lldb step logging in the tests.
* Assert that the current plan is not the base plan at the spot I believe is calling PopPlan.

These will be removed and replaced with a proper fix once I see some failures on the bots; I couldn't reproduce it locally. (Also, no sign of it on the x86_64 bot.)
Failures didn't reproduce locally, so this is back in with an extra assert and logging enabled, so I can catch it if it happens again.
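For context, a rough sketch of how that step logging can be enabled from inside an API test, assuming the standard "lldb step" log channel; the log-file name is illustrative:

# Route the "lldb step" log channel to a file for the duration of the test,
# so thread-plan push/pop activity is captured alongside any failure.
log_file = self.getBuildArtifact("step.log")  # TestBase helper for a per-test path
self.runCmd(f"log enable -f {log_file} lldb step")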
I've caught it happening on Arm. We appear to be trying to pop the base plan after finishing a single step; I'm going to read through the code tomorrow to figure out why. In the meantime, I've removed all my modifications and skipped the test on Arm and AArch64 Linux.
It came from the test case I copied this one from and wasn't appropriate here. It should have been deleted, not commented out.
Jim
This test does kill off a process by destroying the Debugger that owned the process. It would be interesting to see if changing the test to explicitly kill the process, then its target, then the Debugger makes the crash go away. That should not be necessary, of course, but if it fixes the problem it will help narrow down the cause.
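A sketch of the explicit teardown ordering being suggested, assuming the second-debugger variant of the test keeps its debugger in new_debugger; the names are illustrative:

# Tear down in a deterministic order instead of letting SBDebugger.Destroy
# reap the process and target implicitly.
process.Kill()                         # 1. stop the inferior
new_debugger.DeleteTarget(target)      # 2. then drop the target
lldb.SBDebugger.Destroy(new_debugger)  # 3. finally destroy the debugger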
My extra logging resulted in this:
With the assert going off here, which is the last place I'd expect it to be with all the
Though the original assert was m_plans.size() > 1 && "Can't pop the base thread plan", not that there was exactly 1 plan on the stack. So current_plan might be the single step plan as expected, but then when we go to pop it off the stack, the stack has already been emptied/destroyed.
Trying this as a756dc4.
The problem will require a more involved fix, I think; I've written up the details in #76057.
This can happen if you don't succeed in shutting down the private state thread as part of exiting, so that it is still trying to operate on a process that has been finalized.
Not sure why the shutdown ordering seems wrong in the Linux case, however.
Jim
Note I'm also seeing some flakiness on the public macOS buildbots: https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/lldb-cmake-matrix/322/execution/node/69/log/
When you debug a binary, then change it, rebuild, and rerun in lldb without quitting lldb, the Modules in the Global Module Cache for the old binary (and its .o files, if used) become "unreachable". Nothing in lldb is holding them alive, and they have already been unlinked. lldb will properly discard them if there is no other Target referencing them.
However, this only works in simple cases at present. If you have several Targets that reference the same modules, it's pretty easy to end up stranding Modules that are no longer reachable, and if you use a sequence of SBDebuggers, unreachable modules can also get stranded. If you run a long-lived lldb process and are iteratively developing on a large code base, lldb's memory fills up with useless Modules.
This patch adds a test for the mode that currently works:
(lldb) target create foo
(lldb) run
(lldb) run
In that case, we do delete the unreachable Modules.
The next step will be to add tests for the cases where we fail to do this, then see how to safely/efficiently evict unreachable modules in those cases as well.
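For illustration, the same workflow sketched minimally at the SB API level; "foo" is an illustrative executable path and error checking is omitted:

import lldb

dbg = lldb.SBDebugger.Create()
target = dbg.CreateTarget("foo")                 # (lldb) target create foo
process = target.LaunchSimple(None, None, None)  # (lldb) run
process.Kill()
# ... rebuild "foo" outside of lldb ...
process = target.LaunchSimple(None, None, None)  # (lldb) run
# The Modules for the old "foo" are now unreachable; with a single target and
# a single debugger they should have been dropped from the global module cache.
lldb.SBDebugger.Destroy(dbg)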