
Add a test for evicting unreachable modules from the global module cache #74894


Merged: jimingham merged 4 commits into llvm:main from test-global-module-cache on Dec 12, 2023

Conversation

jimingham
Collaborator

When you debug a binary, then change it, rebuild, and rerun in lldb without quitting lldb, the Modules in the Global Module Cache for the old binary (and its .o files, if used) become "unreachable". Nothing in lldb is holding them alive, and they've already been unlinked. lldb will properly discard them if no other Target references them.

However, this only works in simple cases at present. If you have several Targets that reference the same modules, it's pretty easy to end up stranding Modules that are no longer reachable, and if you use a sequence of SBDebuggers, unreachable modules can also get stranded. If you run a long-lived lldb process and are iteratively developing on a large code base, lldb's memory fills up with useless Modules.

This patch adds a test for the mode that currently works:

(lldb) target create foo
(lldb) run
<rebuild foo outside lldb>
(lldb) run

In that case, we do delete the unreachable Modules.

The next step will be to add tests for the cases where we fail to do this, then see how to safely/efficiently evict unreachable modules in those cases as well.

@llvmbot
Member

llvmbot commented Dec 8, 2023

@llvm/pr-subscribers-lldb

Author: None (jimingham)


Full diff: https://github.com/llvm/llvm-project/pull/74894.diff

4 Files Affected:

  • (added) lldb/test/API/python_api/global_module_cache/Makefile (+1)
  • (added) lldb/test/API/python_api/global_module_cache/TestGlobalModuleCache.py (+110)
  • (added) lldb/test/API/python_api/global_module_cache/one-print.c (+8)
  • (added) lldb/test/API/python_api/global_module_cache/two-print.c (+9)
diff --git a/lldb/test/API/python_api/global_module_cache/Makefile b/lldb/test/API/python_api/global_module_cache/Makefile
new file mode 100644
index 0000000000000..22f1051530f87
--- /dev/null
+++ b/lldb/test/API/python_api/global_module_cache/Makefile
@@ -0,0 +1 @@
+include Makefile.rules
diff --git a/lldb/test/API/python_api/global_module_cache/TestGlobalModuleCache.py b/lldb/test/API/python_api/global_module_cache/TestGlobalModuleCache.py
new file mode 100644
index 0000000000000..ff74d09a12818
--- /dev/null
+++ b/lldb/test/API/python_api/global_module_cache/TestGlobalModuleCache.py
@@ -0,0 +1,110 @@
+"""
+Test the use of the global module cache in lldb
+"""
+
+import lldb
+
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
+import os
+import shutil
+from pathlib import Path
+import time
+
+
+class GlobalModuleCacheTestCase(TestBase):
+    # NO_DEBUG_INFO_TESTCASE = True
+
+    def check_counter_var(self, thread, value):
+        frame = thread.frames[0]
+        var = frame.FindVariable("counter")
+        self.assertTrue(var.GetError().Success(), "Got counter variable")
+        self.assertEqual(var.GetValueAsUnsigned(), value, "This was one-print")
+
+    def copy_to_main(self, src, dst):
+        # We are relying on the source file being newer than the .o file from
+        # a previous build, so sleep a bit here to ensure that the touch is later.
+        time.sleep(2)
+        try:
+            shutil.copy(src, dst)
+        except Exception:
+            self.fail(f"Could not copy {src} to {dst}")
+        Path(dst).touch()
+
+    # The rerun tests indicate rerunning on Windows doesn't really work, so
+    # this one won't either.
+    @skipIfWindows
+    def test_OneTargetOneDebugger(self):
+        # Make sure that if we have one target, and we run, then
+        # change the binary and rerun, the binary (and any .o files
+        # if using dwarf in .o file debugging) get removed from the
+        # shared module cache.  They are no longer reachable.
+        debug_style = self.getDebugInfo()
+
+        # Before we do anything, clear the global module cache so we don't
+        # see objects from other runs:
+        lldb.SBDebugger.MemoryPressureDetected()
+
+        # Set up the paths for our two versions of main.c:
+        main_c_path = os.path.join(self.getBuildDir(), "main.c")
+        one_print_path = os.path.join(self.getSourceDir(), "one-print.c")
+        two_print_path = os.path.join(self.getSourceDir(), "two-print.c")
+        main_filespec = lldb.SBFileSpec(main_c_path)
+
+        # First copy the one-print.c to main.c in the build folder and
+        # build our a.out from there:
+        self.copy_to_main(one_print_path, main_c_path)
+        self.build(dictionary={"C_SOURCES": main_c_path, "EXE": "a.out"})
+
+        (target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(
+            self, "return counter;", main_filespec
+        )
+
+        # Make sure we ran the version we intended here:
+        self.check_counter_var(thread, 1)
+        process.Kill()
+
+        # Now copy two-print.c over main.c, rebuild, and rerun:
+        # os.unlink(target.GetExecutable().fullpath)
+        self.copy_to_main(two_print_path, main_c_path)
+
+        self.build(dictionary={"C_SOURCES": main_c_path, "EXE": "a.out"})
+        error = lldb.SBError()
+        (_, process, thread, _) = lldbutil.run_to_breakpoint_do_run(self, target, bkpt)
+        # In two-print.c counter will be 2:
+        self.check_counter_var(thread, 2)
+
+        num_a_dot_out_entries = 1
+        # For dSYM's there will be two lines of output, one for the a.out and one
+        # for the dSYM.
+        if debug_style == "dsym":
+            num_a_dot_out_entries += 1
+
+        self.check_image_list_result(num_a_dot_out_entries, 1)
+
+    def check_image_list_result(self, num_a_dot_out, num_main_dot_o):
+        # Now look at the global module list, there should only be one a.out, and if we are
+        # doing dwarf in .o file, there should only be one .o file:
+        image_cmd_result = lldb.SBCommandReturnObject()
+        interp = self.dbg.GetCommandInterpreter()
+        interp.HandleCommand("image list -g", image_cmd_result)
+        image_list_str = image_cmd_result.GetOutput()
+        image_list = image_list_str.splitlines()
+        found_a_dot_out = 0
+        found_main_dot_o = 0
+
+        for line in image_list:
+            # FIXME: force this to be at the end of the string:
+            if "a.out" in line:
+                found_a_dot_out += 1
+            if "main.o" in line:
+                found_main_dot_o += 1
+
+        self.assertEqual(
+            num_a_dot_out, found_a_dot_out, "Got the right number of a.out's"
+        )
+        if found_main_dot_o > 0:
+            self.assertEqual(
+                num_main_dot_o, found_main_dot_o, "Got the right number of main.o's"
+            )
diff --git a/lldb/test/API/python_api/global_module_cache/one-print.c b/lldb/test/API/python_api/global_module_cache/one-print.c
new file mode 100644
index 0000000000000..5a572ca7c65fe
--- /dev/null
+++ b/lldb/test/API/python_api/global_module_cache/one-print.c
@@ -0,0 +1,8 @@
+#include <stdio.h>
+
+int
+main() {
+  int counter = 0;
+  printf("I only print one time: %d.\n", counter++);
+  return counter;
+}
diff --git a/lldb/test/API/python_api/global_module_cache/two-print.c b/lldb/test/API/python_api/global_module_cache/two-print.c
new file mode 100644
index 0000000000000..ce6cb84c5c46e
--- /dev/null
+++ b/lldb/test/API/python_api/global_module_cache/two-print.c
@@ -0,0 +1,9 @@
+#include <stdio.h>
+
+int
+main() {
+  int counter = 0;
+  printf("I print one time: %d.\n", counter++);
+  printf("I print two times: %d.\n", counter++);
+  return counter;
+}


github-actions bot commented Dec 8, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@jimingham
Collaborator Author

Note, I thought about adding an SBDebugger::GetSharedModules or something, but I don't actually think that's a good thing to give external access to. Some day we should make an SBTestAPI with some calls that are useful for testing but don't belong in the SB API library, so we can make this sort of testing easier; but for this test, grubbing the command output is not all that bad.

@clayborg
Collaborator

clayborg commented Dec 9, 2023

> Note, I thought about adding an SBDebugger::GetSharedModules or something, but I don't actually think that's a good thing to give external access to. Some day we should make an SBTestAPI with some calls that are useful for testing but don't belong in the SB API library, so we can make this sort of testing easier; but for this test, grubbing the command output is not all that bad.

I wouldn't mind exposing such an API for the global module list as static functions (get count + get at index) on SBDebugger. One issue is that only real modules will show up here. This works fine for all modules on Darwin, as executables, dSYM files, and the .o files are each represented by an lldb_private::Module. Not true for .dwo files though, as they share the original module for the executable and only create a new ObjectFile, not a new Module, for each. Not a big deal though.

This API would be useful for testing, and also for people to know what modules are currently in the global module cache taking up memory in the LLDB process. To test this, you might run your first executable, check the number of shared modules (and access them if needed), and then call:

static void SBDebugger::MemoryPressureDetected();

This used to get rid of orphaned modules in the past, and I think it still does; you could then check the shared module count and list again after creating and running the second executable to see what you find.
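
For illustration, here is a rough sketch of that testing flow in Python. GetNumSharedModules and GetSharedModuleAtIndex are hypothetical names for the proposed (not yet existing) accessors; only MemoryPressureDetected is a real SBDebugger call today:

import lldb

def shared_module_names():
    # Hypothetical accessors: these do NOT exist in today's SB API; they
    # stand in for the proposed "get count + get at index" statics.
    count = lldb.SBDebugger.GetNumSharedModules()
    return [
        lldb.SBDebugger.GetSharedModuleAtIndex(i).GetFileSpec().GetFilename()
        for i in range(count)
    ]

# Run the first executable, then snapshot the global module cache:
before = shared_module_names()

# Real API: evicts orphaned Modules from the shared cache.
lldb.SBDebugger.MemoryPressureDetected()

# After rebuilding and running the second executable, compare:
after = shared_module_names()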

@jimingham
Collaborator Author

> > Note, I thought about adding an SBDebugger::GetSharedModules or something, but I don't actually think that's a good thing to give external access to. Some day we should make an SBTestAPI with some calls that are useful for testing but don't belong in the SB API library, so we can make this sort of testing easier; but for this test, grubbing the command output is not all that bad.
>
> I wouldn't mind exposing such an API for the global module list as static functions (get count + get at index) on SBDebugger. One issue is that only real modules will show up here. This works fine for all modules on Darwin, as executables, dSYM files, and the .o files are each represented by an lldb_private::Module. Not true for .dwo files though, as they share the original module for the executable and only create a new ObjectFile, not a new Module, for each. Not a big deal though.
>
> This API would be useful for testing, and also for people to know what modules are currently in the global module cache taking up memory in the LLDB process. To test this, you might run your first executable, check the number of shared modules (and access them if needed), and then call:
>
> static void SBDebugger::MemoryPressureDetected();
>
> This used to get rid of orphaned modules in the past, and I think it still does; you could then check the shared module count and list again after creating and running the second executable to see what you find.

If I knew how to hand out SBReadOnlyModule objects, I think this would be a fine API. But I am wary of adding an API where one SBDebugger could inadvertently change another SBDebugger's environment behind its back. We do allow this by accident today: for instance, have Debugger1 turn off "external symbol search" while Debugger2 has it on; if Debugger2 loads the same module as Debugger1, that forces symbols on Debugger1.

But IMO that's a bug, and I'd rather not add more ways to write that bug.

Fortunately, the output of image list -g is super easy to parse, so I don't need the API for this investigation.
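
For illustration, a minimal sketch of that parsing approach, mirroring check_image_list_result from the test above (the helper name is ours):

import lldb

def count_global_images(debugger, basename):
    # Ask the command interpreter for the global (shared) module list and
    # count output lines that mention the given basename.
    result = lldb.SBCommandReturnObject()
    debugger.GetCommandInterpreter().HandleCommand("image list -g", result)
    return sum(basename in line for line in result.GetOutput().splitlines())

# For example, after the stale binary is evicted this should drop back to 1:
# count_global_images(self.dbg, "a.out")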

@jimingham
Collaborator Author

I added a test for the case where one target has the old version and another target has the new one. In this case, I don't think that (short of calling MemoryPressureDetected by hand) we'll ever get rid of these unreachable Modules.
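
For reference, the rough shape of that two-target scenario as a sketch (not the test itself; it assumes a.out is rebuilt between the two CreateTarget calls, as in the test's copy-and-rebuild step):

import lldb

debugger = lldb.SBDebugger.Create()
old_target = debugger.CreateTarget("a.out")  # holds Modules for the old binary
# ... rebuild a.out outside lldb ...
new_target = debugger.CreateTarget("a.out")  # pulls in fresh Modules
# old_target still references the old Modules, keeping them reachable.
# Deleting it leaves them unreachable, but nothing evicts them automatically:
debugger.DeleteTarget(old_target)
# Today, only an explicit sweep reclaims them:
lldb.SBDebugger.MemoryPressureDetected()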

@jimingham jimingham requested a review from bulbazord December 11, 2023 23:56

github-actions bot commented Dec 11, 2023

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r 32d535195ec5d9b0caf03fee13f796fc8c66a79f...90119f1d3955029931939d529a9f4b3c113b8284 lldb/test/API/python_api/global_module_cache/TestGlobalModuleCache.py
View the diff from darker here.
--- TestGlobalModuleCache.py	2023-12-12 19:26:09.000000 +0000
+++ TestGlobalModuleCache.py	2023-12-12 19:28:53.039653 +0000
@@ -9,10 +9,11 @@
 from lldbsuite.test import lldbutil
 import os
 import shutil
 from pathlib import Path
 import time
+
 
 class GlobalModuleCacheTestCase(TestBase):
     # NO_DEBUG_INFO_TESTCASE = True
 
     def check_counter_var(self, thread, value):
@@ -97,10 +98,11 @@
         else:
             if one_target:
                 new_debugger = lldb.SBDebugger().Create()
                 self.old_debugger = self.dbg
                 self.dbg = new_debugger
+
                 def cleanupDebugger(self):
                     lldb.SBDebugger.Destroy(self.dbg)
                     self.dbg = self.old_debugger
                     self.old_debugger = None
 
@@ -128,11 +130,11 @@
         lldb.SBDebugger.MemoryPressureDetected()
         error_after_mpd = self.check_image_list_result(num_a_dot_out_entries, 1)
         fail_msg = ""
         if error != "":
             fail_msg = "Error before MPD: " + error
-            
+
         if error_after_mpd != "":
             fail_msg = fail_msg + "\nError after MPD: " + error_after_mpd
         if fail_msg != "":
             self.fail(fail_msg)
 
@@ -157,13 +159,15 @@
             # FIXME: force this to be at the end of the string:
             if "a.out" in line:
                 found_a_dot_out += 1
             if "main.o" in line:
                 found_main_dot_o += 1
-        
+
         if num_a_dot_out != found_a_dot_out:
             return f"Got {found_a_dot_out} number of a.out's, expected {num_a_dot_out}"
-            
+
         if found_main_dot_o > 0 and num_main_dot_o != found_main_dot_o:
-            return f"Got {found_main_dot_o} number of main.o's, expected {num_main_dot_o}"
+            return (
+                f"Got {found_main_dot_o} number of main.o's, expected {num_main_dot_o}"
+            )
 
         return ""

@jimingham jimingham merged commit 2684281 into llvm:main Dec 12, 2023
@jimingham jimingham deleted the test-global-module-cache branch December 12, 2023 19:28
@DavidSpickett
Collaborator

This test crashes on Arm and AArch64:

******************** TEST 'lldb-api :: python_api/global_module_cache/TestGlobalModuleCache.py' FAILED ********************
Script:
--
/usr/bin/python3.8 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env ARCHIVER=/usr/local/bin/llvm-ar --env OBJCOPY=/usr/bin/llvm-objcopy --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/python_api/global_module_cache -p TestGlobalModuleCache.py
--
Exit Code: -6

Command Output (stdout):
--
lldb version 18.0.0git (https://github.com/llvm/llvm-project.git revision bb18611e3630d956b74fe33209321cb8ba9272a8)
  clang revision bb18611e3630d956b74fe33209321cb8ba9272a8
  llvm revision bb18611e3630d956b74fe33209321cb8ba9272a8

--
Command Output (stderr):
--
python3.8: ../llvm-project/lldb/source/Target/ThreadPlanStack.cpp:151: lldb::ThreadPlanSP lldb_private::ThreadPlanStack::PopPlan(): Assertion `m_plans.size() > 1 && "Can't pop the base thread plan"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
 #0 0x0000ffff7f3d8d60 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x51d7d60)
 #1 0x0000ffff7f3d6d1c llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x51d5d1c)
 #2 0x0000ffff7f3d9494 SignalHandler(int) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x51d8494)
 #3 0x0000ffff85fee7dc (linux-vdso.so.1+0x7dc)
 #4 0x0000ffff85e7cd78 raise /build/glibc-RIFKjK/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #5 0x0000ffff85e69aac abort /build/glibc-RIFKjK/glibc-2.31/stdlib/abort.c:81:7
 #6 0x0000ffff85e76490 __assert_fail_base /build/glibc-RIFKjK/glibc-2.31/assert/assert.c:89:7
 #7 0x0000ffff85e764f4 (/lib/aarch64-linux-gnu/libc.so.6+0x2d4f4)
 #8 0x0000ffff7ef6f744 lldb_private::ThreadPlanStack::PopPlan() (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4d6e744)
 #9 0x0000ffff7ef482f8 lldb_private::Thread::PopPlan() (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4d472f8)
#10 0x0000ffff7ef47ca8 lldb_private::Thread::ShouldStop(lldb_private::Event*) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4d46ca8)
#11 0x0000ffff7ef544c8 lldb_private::ThreadList::ShouldStop(lldb_private::Event*) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4d534c8)
#12 0x0000ffff7eed3c30 lldb_private::Process::ShouldBroadcastEvent(lldb_private::Event*) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4cd2c30)
#13 0x0000ffff7eecd220 lldb_private::Process::HandlePrivateEvent(std::shared_ptr<lldb_private::Event>&) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4ccc220)
#14 0x0000ffff7eed4eb4 lldb_private::Process::RunPrivateStateThread(bool) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4cd3eb4)
#15 0x0000ffff7eee06f0 std::_Function_handler<void* (), lldb_private::Process::StartPrivateStateThread(bool)::$_0>::_M_invoke(std::_Any_data const&) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4cdf6f0)
#16 0x0000ffff7edd3f80 lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(void*) (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-aarch64-linux-gnu.so+0x4bd2f80)
#17 0x0000ffff85e1f624 start_thread /build/glibc-RIFKjK/glibc-2.31/nptl/pthread_create.c:477:8
#18 0x0000ffff85f1a49c /build/glibc-RIFKjK/glibc-2.31/misc/../sysdeps/unix/sysv/linux/aarch64/clone.S:81:0

Though not consistently.

I'm going to revert this until I have time to figure out why.

DavidSpickett added a commit that referenced this pull request Dec 13, 2023
Revert "Add a test for evicting unreachable modules from the global module cache (#74894)"

This reverts commit 2684281.

Due to being flaky on Arm and AArch64 buildbots.
@jimingham
Collaborator Author

jimingham commented Dec 13, 2023 via email

import time

class GlobalModuleCacheTestCase(TestBase):
# NO_DEBUG_INFO_TESTCASE = True

why is this commented out?

process.Kill()

# Now copy two-print.c over main.c, rebuild, and rerun:
# os.unlink(target.GetExecutable().fullpath)

Also commented out code here.

DavidSpickett added a commit that referenced this pull request Dec 14, 2023
Reland "Add a test for evicting unreachable modules from the global module cache (#74894)"

This reverts commit 35dacf2.

And relands the original change with two additions so I can debug the failure on Arm/AArch64:
* Enable lldb step logging in the tests.
* Assert that the current plan is not the base plan at the spot I believe is calling PopPlan.

These will be removed and replaced with a proper fix once I see some failures on the bots; I couldn't reproduce it locally.

(also, no sign of it on the x86_64 bot)
@DavidSpickett
Collaborator

Failures didn't reproduce locally, so this is back in with an extra assert and logging enabled so I can catch it if it happens again.

@DavidSpickett
Collaborator

I've caught it happening on Arm. We appear to be trying to pop the base plan after finishing a single step; I'm going to read through the code tomorrow to figure out why.

In the meantime, I've removed all my modifications and skipped it on Arm and AArch64 Linux.

@jimingham
Collaborator Author

jimingham commented Dec 14, 2023 via email

@jimingham
Collaborator Author

This test does kill off a process by destroying the Debugger that owned the process. It would be interesting to see if changing the test to explicitly kill the process, then its target, then the Debugger makes the crash go away. That should not be necessary of course, but if that fixes the problem it will help narrow down the cause.
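
A minimal sketch of that explicit teardown order (the helper is ours; it assumes the test's SBDebugger and SBTarget are passed in):

import lldb

def explicit_teardown(debugger, target):
    # Kill the process first, then delete its target, and only then
    # destroy the debugger, instead of letting Destroy do all three.
    process = target.GetProcess()
    if process.IsValid():
        process.Kill()
    debugger.DeleteTarget(target)
    lldb.SBDebugger.Destroy(debugger)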

@DavidSpickett
Collaborator

My extra logging resulted in this:

intern-state     ^^^^^^^^ Thread::ShouldStop Begin ^^^^^^^^
intern-state     Plan stack initial state:
  thread #1: tid = 0x47d2d:
    Active plan stack:
      Element 0: Base thread plan.
      Element 1: Single stepping past breakpoint site 11 at 0xf7fd1258

intern-state     Step over breakpoint stopped for reason: trace.
python3.8        Discarding thread plans for thread (tid = 0x47d2d, force 1)
intern-state     Plan Step over breakpoint trap should stop: 0.
intern-state     Completed step over breakpoint plan.
intern-state     Plan Step over breakpoint trap auto-continue: true.

--
Command Output (stderr):
--
python3.8: ../llvm-project/lldb/source/Target/Thread.cpp:898: bool lldb_private::Thread::ShouldStop(Event *): Assertion `i != 1 && "Cannot pop plan when there is only one plan (the base plan)"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
#0 0xee160c64 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-arm-linux-gnueabihf.so+0xec1c64)
#1 0xee15e688 llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-arm-linux-gnueabihf.so+0xebf688)
#2 0xee1614f8 SignalHandler(int) (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lib/python3.8/site-packages/lldb/_lldb.cpython-38-arm-linux-gnueabihf.so+0xec24f8)
#3 0xf7b21530 __default_sa_restorer /build/glibc-tftl1u/glibc-2.31/signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:67:0
#4 0xf7b117e6 /build/glibc-tftl1u/glibc-2.31/csu/../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47:0
#5 0xf7b207fe raise /build/glibc-tftl1u/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:28:1

With the assert going off here.

Which is the last place I'd expect it to be with all the IsBasePlan checks, but there might be a situation where current_plan is something unexpected. Maybe current_plan refers to something that has already been removed from the stack; I'm looking into it now.

@DavidSpickett
Collaborator

Though the original assert was: m_plans.size() > 1 && "Can't pop the base thread plan"

That asserts there was more than one plan, not that there was exactly 1 plan on the stack. So current_plan might be the single-step plan as expected, but when we go to pop it off the stack, the stack has already been emptied/destroyed.

@DavidSpickett
Collaborator

> It would be interesting to see if changing the test to explicitly kill the process, then its target, then the Debugger makes the crash go away.

Trying this as a756dc4.

@DavidSpickett
Collaborator

The problem will require a more involved fix, I think; I've written up the details in #76057.

@jimingham
Collaborator Author

jimingham commented Dec 20, 2023 via email

@Michael137
Member

Note: I'm also seeing some flakiness on the public macOS buildbots: https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/lldb-cmake-matrix/322/execution/node/69/log/
