Allow option to ignore module load errors in ScriptedProcess #127153

rchamala · 2025-02-14T01:17:19Z

Current state in scripted process expects all the modules passed into "get_loaded_images" to load successfully else none of them load. Even if a module loads fine, but has already been appended it still fails. This is restrictive and does not help our usecase.

Usecase: We have a parent scripted process using coredump + tombstone.

Scripted process uses child elf-core process to read memory dump
Uses tombstones to pass thread names and modules.

We do not know whether the modules will be successfully downloaded before creating the scripted process. We use python module callbacks to download a module from symbol server at LLDB load time when the scripted process is being created. The issue is that if one of the symbol is not found from the list specified in tombstone, none of the modules load in scripted process. Even if we ensure symbols are present in symbol server before creating the scripted process, if the load address is wrong or if the module is already appended, all module loads are skipped.

Solution: Pass in a custom boolean option arg for every module from python scripted process plugin which will indicate whether to ignore the module load error. This will provide the flexibility to user for loading the successfully fetched modules into target while ignoring the failed ones

github-actions · 2025-02-14T01:20:43Z

✅ With the latest revision this PR passed the Python code formatter.

llvmbot · 2025-02-14T05:06:29Z

@llvm/pr-subscribers-lldb

Author: None (rchamala)

Changes

Current state in scripted process expects all the modules passed into "get_loaded_images" to load successfully else none of them load. Even if a module loads fine, but has already been appended it still fails. This is restrictive and does not help our usecase.

Usecase: We have a parent scripted process using coredump + tombstone.

Scripted process uses child elf-core process to read memory dump
Uses tombstones to pass thread names and modules.

We do not know whether the modules will be successfully downloaded before creating the scripted process. We use python module callbacks to download a module from symbol server at LLDB load time when the scripted process is being created. The issue is that if one of the symbol is not found from the list specified in tombstone, none of the modules load in scripted process.

Solution: Pass in a custom boolean option arg for every module from python scripted process plugin which will indicate whether to ignore the module load error. This will provide the flexibility to user for loading the successfully fetched modules into target while ignoring the failed ones

Full diff: https://github.com/llvm/llvm-project/pull/127153.diff

3 Files Affected:

(modified) lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp (+26-13)
(modified) lldb/test/API/functionalities/scripted_process/TestStackCoreScriptedProcess.py (+16-1)
(modified) lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py (+75-15)

diff --git a/lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp b/lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp
index d2111ce877ce5..79d0bc51bc18c 100644
--- a/lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp
+++ b/lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp
@@ -165,7 +165,7 @@ Status ScriptedProcess::DoLoadCore() {
 Status ScriptedProcess::DoLaunch(Module *exe_module,
                                  ProcessLaunchInfo &launch_info) {
   LLDB_LOGF(GetLog(LLDBLog::Process), "ScriptedProcess::%s launching process", __FUNCTION__);
-  
+
   /* MARK: This doesn't reflect how lldb actually launches a process.
            In reality, it attaches to debugserver, then resume the process.
            That's not true in all cases.  If debugserver is remote, lldb
@@ -422,9 +422,11 @@ bool ScriptedProcess::GetProcessInfo(ProcessInstanceInfo &info) {
 lldb_private::StructuredData::ObjectSP
 ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
   Status error;
-  auto error_with_message = [&error](llvm::StringRef message) {
-    return ScriptedInterface::ErrorWithMessage<bool>(LLVM_PRETTY_FUNCTION,
-                                                     message.data(), error);
+  auto handle_error_with_message = [&error](llvm::StringRef message,
+                                            bool ignore_error) {
+    ScriptedInterface::ErrorWithMessage<bool>(LLVM_PRETTY_FUNCTION,
+                                              message.data(), error);
+    return ignore_error;
   };
 
   StructuredData::ArraySP loaded_images_sp = GetInterface().GetLoadedImages();
@@ -436,12 +438,13 @@ ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
   ModuleList module_list;
   Target &target = GetTarget();
 
-  auto reload_image = [&target, &module_list, &error_with_message](
+  auto reload_image = [&target, &module_list, &handle_error_with_message](
                           StructuredData::Object *obj) -> bool {
     StructuredData::Dictionary *dict = obj->GetAsDictionary();
 
     if (!dict)
-      return error_with_message("Couldn't cast image object into dictionary.");
+      return handle_error_with_message(
+          "Couldn't cast image object into dictionary.", false);
 
     ModuleSpec module_spec;
     llvm::StringRef value;
@@ -449,9 +452,11 @@ ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
     bool has_path = dict->HasKey("path");
     bool has_uuid = dict->HasKey("uuid");
     if (!has_path && !has_uuid)
-      return error_with_message("Dictionary should have key 'path' or 'uuid'");
+      return handle_error_with_message(
+          "Dictionary should have key 'path' or 'uuid'", false);
     if (!dict->HasKey("load_addr"))
-      return error_with_message("Dictionary is missing key 'load_addr'");
+      return handle_error_with_message("Dictionary is missing key 'load_addr'",
+                                       false);
 
     if (has_path) {
       dict->GetValueForKeyAsString("path", value);
@@ -467,16 +472,21 @@ ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
     ModuleSP module_sp =
         target.GetOrCreateModule(module_spec, true /* notify */);
 
+    bool ignore_module_load_error = false;
+    dict->GetValueForKeyAsBoolean("ignore_module_load_error",
+                                  ignore_module_load_error);
     if (!module_sp)
-      return error_with_message("Couldn't create or get module.");
+      return handle_error_with_message("Couldn't create or get module.",
+                                       ignore_module_load_error);
 
     lldb::addr_t load_addr = LLDB_INVALID_ADDRESS;
     lldb::offset_t slide = LLDB_INVALID_OFFSET;
     dict->GetValueForKeyAsInteger("load_addr", load_addr);
     dict->GetValueForKeyAsInteger("slide", slide);
     if (load_addr == LLDB_INVALID_ADDRESS)
-      return error_with_message(
-          "Couldn't get valid load address or slide offset.");
+      return handle_error_with_message(
+          "Couldn't get valid load address or slide offset.",
+          ignore_module_load_error);
 
     if (slide != LLDB_INVALID_OFFSET)
       load_addr += slide;
@@ -486,13 +496,16 @@ ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
                               changed);
 
     if (!changed && !module_sp->GetObjectFile())
-      return error_with_message("Couldn't set the load address for module.");
+      return handle_error_with_message(
+          "Couldn't set the load address for module.",
+          ignore_module_load_error);
 
     dict->GetValueForKeyAsString("path", value);
     FileSpec objfile(value);
     module_sp->SetFileSpecAndObjectName(objfile, objfile.GetFilename());
 
-    return module_list.AppendIfNeeded(module_sp);
+    return ignore_module_load_error ? true
+                                    : module_list.AppendIfNeeded(module_sp);
   };
 
   if (!loaded_images_sp->ForEach(reload_image))
diff --git a/lldb/test/API/functionalities/scripted_process/TestStackCoreScriptedProcess.py b/lldb/test/API/functionalities/scripted_process/TestStackCoreScriptedProcess.py
index a5c79378bab50..fa887390f4c9f 100644
--- a/lldb/test/API/functionalities/scripted_process/TestStackCoreScriptedProcess.py
+++ b/lldb/test/API/functionalities/scripted_process/TestStackCoreScriptedProcess.py
@@ -76,6 +76,12 @@ def cleanup():
             )
         self.assertTrue(corefile_process, PROCESS_IS_VALID)
 
+        # Create a random lib which does not exist in the corefile.
+        random_dylib = self.get_module_with_name(corefile_target, "random.dylib")
+        self.assertFalse(
+            random_dylib, "Dynamic library random.dylib should not be found."
+        )
+
         structured_data = lldb.SBStructuredData()
         structured_data.SetFromJSON(
             json.dumps(
@@ -83,7 +89,16 @@ def cleanup():
                     "backing_target_idx": self.dbg.GetIndexOfTarget(
                         corefile_process.GetTarget()
                     ),
-                    "libbaz_path": self.getBuildArtifact("libbaz.dylib"),
+                    "custom_modules": [
+                        {
+                            "path": self.getBuildArtifact("libbaz.dylib"),
+                        },
+                        {
+                            "path": "/random/path/random.dylib",
+                            "load_addr": 12345678,
+                            "ignore_module_load_error": True,
+                        },
+                    ],
                 }
             )
         )
diff --git a/lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py b/lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py
index 8641d9a7ced35..6f3124e64d3ce 100644
--- a/lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py
+++ b/lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py
@@ -46,22 +46,82 @@ def __init__(self, exe_ctx: lldb.SBExecutionContext, args: lldb.SBStructuredData
         if len(self.threads) == 2:
             self.threads[len(self.threads) - 1].is_stopped = True
 
-        corefile_module = self.get_module_with_name(
-            self.corefile_target, "libbaz.dylib"
-        )
-        if not corefile_module or not corefile_module.IsValid():
-            return
-        module_path = os.path.join(
-            corefile_module.GetFileSpec().GetDirectory(),
-            corefile_module.GetFileSpec().GetFilename(),
-        )
-        if not os.path.exists(module_path):
-            return
-        module_load_addr = corefile_module.GetObjectFileHeaderAddress().GetLoadAddress(
-            self.corefile_target
-        )
+        custom_modules = args.GetValueForKey("custom_modules")
+        if custom_modules.GetType() == lldb.eStructuredDataTypeArray:
+            for id in range(custom_modules.GetSize()):
+
+                custom_module = custom_modules.GetItemAtIndex(id)
+                if (
+                    not custom_module
+                    or not custom_module.IsValid()
+                    or not custom_module.GetType() == lldb.eStructuredDataTypeDictionary
+                ):
+                    continue
+
+                # Get custom module path from args
+                module_path_arg = custom_module.GetValueForKey("path")
+                module_path = None
+                if (
+                    not module_path_arg
+                    or not module_path_arg.IsValid()
+                    or not module_path_arg.GetType() == lldb.eStructuredDataTypeString
+                ):
+                    return
+
+                module_path = module_path_arg.GetStringValue(100)
+                module_name = os.path.basename(module_path)
+
+                # Get ignore_module_load_error boolean from args
+                ignore_module_load_error = False
+                ignore_module_load_error_arg = custom_module.GetValueForKey(
+                    "ignore_module_load_error"
+                )
+                if (
+                    ignore_module_load_error_arg
+                    and ignore_module_load_error_arg.IsValid()
+                    and ignore_module_load_error_arg.GetType()
+                    == lldb.eStructuredDataTypeBoolean
+                ):
+                    ignore_module_load_error = (
+                        ignore_module_load_error_arg.GetBooleanValue()
+                    )
 
-        self.loaded_images.append({"path": module_path, "load_addr": module_load_addr})
+                if not os.path.exists(module_path) and not ignore_module_load_error:
+                    return
+
+                # Get custom module load address from args
+                module_load_addr = None
+                module_load_addr_arg = custom_module.GetValueForKey("load_addr")
+                if (
+                    module_load_addr_arg
+                    and module_load_addr_arg.IsValid()
+                    and module_load_addr_arg.GetType()
+                    == lldb.eStructuredDataTypeInteger
+                ):
+                    module_load_addr = module_load_addr_arg.GetIntegerValue()
+
+                # If module load address is not specified/valid, try to find it from corefile module
+                if module_load_addr is None:
+                    corefile_module = self.get_module_with_name(
+                        self.corefile_target, module_name
+                    )
+
+                    if not corefile_module or not corefile_module.IsValid():
+                        return
+
+                    module_load_addr = (
+                        corefile_module.GetObjectFileHeaderAddress().GetLoadAddress(
+                            self.corefile_target
+                        )
+                    )
+
+                self.loaded_images.append(
+                    {
+                        "path": module_path,
+                        "load_addr": module_load_addr,
+                        "ignore_module_load_error": ignore_module_load_error,
+                    }
+                )
 
     def get_memory_region_containing_address(
         self, addr: int

labath · 2025-02-14T10:27:37Z

Let's see what @medismailben has to say, but I think it would be fine to "ignore" (meaning, to continue trying to load other modules) load errors even without the special flags, as I think that's how our non-scripted processes work (do they?)

rchamala · 2025-02-14T15:44:41Z

Yes, for non-scripted process, we ignore the modules which fail to load. Note that currently, I am returning failure if a wrong type is passed in to the dictionary via "get_loaded_images" or if "load_addr" is missing even for one module. Once the type checks are done, I am conditionally ignoring module load errors.

medismailben · 2025-02-17T23:08:40Z

I'm not convinced of that this change needs to be happen in ScriptedProcess ... May be we could have a setting to not discard all modules loaded in case 1 of them failed to load that would also work with other Process plugins. @jimingham what do you think ?

rchamala · 2025-02-17T23:48:38Z

I'm not convinced of that this change needs to be happen in ScriptedProcess ..

Just so that we’re on the same page, the logic to skip loading modules upon a module load error is part of ScriptedProcess conversion logic in “scriptedprocess::GetLoadedDynamicLibrariesInfos”. The method does not call target to load modules upon a single module load error. Can you please expand your thoughts on why this change should not be part of ScriptedProcess ?

medismailben · 2025-02-18T00:48:44Z

I'd like to get a better understanding of what you're trying to achieve:

Does the child elf-core process doesn't have any module loaded ? Are they only described in the tombstone and require to be downloaded which could potentially fail ?

rchamala · 2025-02-18T01:30:09Z

I'd like to get a better understanding of what you're trying to achieve:

Does the child elf-core process doesn't have any module loaded ? Are they only described in the tombstone and require to be downloaded which could potentially fail ?

Child elf-core process has module information for shared libraries directly loaded from the shared library file(.so file) but does not have module information from the ones loaded from .apk. At Android app runtime, some shared libraries are directly loaded from .so files and some are loaded from .apk. Apk is like a compressed file used by the installer to run the app and contains all artifacts needed to run the app which includes shared libraries. Due to the limitation in Android Kernel as explained here, the coredump generated from Android kernel does not dump program headers for the shared libraries in .apk. Note that Android linker when running the Android app, is able to understand the shared libraries inside .apk, can mmap the address of all shared libraries in /proc/PID/maps correctly and can run the app correctly.

Tombstone is basically extracted from the live memory map in /proc/PID/maps, and has information about all shared libraries and their load addresses. Additionally, tombstone has thread names which are not contained in coredump. So, our design is as follows for the parent scripted process:

All the read memory api's to read memory regions from coredump file are used from child elf-core process
Get all modules(shared library name & load address), thread names from tombstone file

We use locate module callback to fetch the module from our symbol server during lldb process creation. For scripted process, when we call target.GetOrCreatemodule the module callback is called to fetch the module from symbol server. When the symbol is not found or when the module is already appended, ScriptedProcess method GetLoadedDynamicLibrariesInfos is designed to fail and does not return any modules when even one module is not loaded or already appended to target. Due to this, all modules end up not being loaded. My request is to return the successful loaded modules in target when ScriptedProcess is asked to get loaded dynamic libraries instead of returning nothing. Please let me know if you have more questions

labath · 2025-02-18T08:15:23Z

I'm not convinced of that this change needs to be happen in ScriptedProcess ... May be we could have a setting to not discard all modules loaded in case 1 of them failed to load that would also work with other Process plugins. @jimingham what do you think ?

I'm confused. Isn't that already what happens for non-scripted processes?

$ lldb -n a.out -o "image list" -b
(lldb) process attach --name "a.out"
Process 11191 stopped
* thread #1, name = 'a.out', stop reason = signal SIGSTOP
    frame #0: 0x00007f7083e088e4 libc.so.6`pause + 20
libc.so.6`pause:
->  0x7f7083e088e4 <+20>: cmpq   $-0x1000, %rax ; imm = 0xF000 
    0x7f7083e088ea <+26>: ja     0x7f7083e08920 ; <+80>
    0x7f7083e088ec <+28>: retq   
    0x7f7083e088ed <+29>: nopl   (%rax)
Executable module set to "/tmp/X/a.out".
Architecture set to: x86_64-pc-linux-gnu.
(lldb) image list
[  0] 171A52C4 0x000056532dec0000 /tmp/X/a.out 
[  1] 421DCFD2-138A-B321-D6F1-7AFD7B7FC999-F79CA445 0x00007ffdeb3a5000 [vdso] (0x00007ffdeb3a5000)
[  2] 838926F7 0x00007f7083f45000 libb.so          <====== LIBB.SO HERE
[  3] C88DE6C8 0x00007f7083d1f000 /lib64/libc.so.6 
      /usr/lib/debug/lib64/libc.so.6.debug
[  4] D626A570 0x00007f7083f4c000 /lib64/ld-linux-x86-64.so.2 
$ rm libb.so     <========= FILE DELETED
$ lldb -n a.out -o "image list" -b
(lldb) process attach --name "a.out"
Process 11191 stopped
* thread #1, name = 'a.out', stop reason = signal SIGSTOP
    frame #0: 0x00007f7083e088e4 libc.so.6`pause + 20
libc.so.6`pause:
->  0x7f7083e088e4 <+20>: cmpq   $-0x1000, %rax ; imm = 0xF000 
    0x7f7083e088ea <+26>: ja     0x7f7083e08920 ; <+80>
    0x7f7083e088ec <+28>: retq   
    0x7f7083e088ed <+29>: nopl   (%rax)
Executable module set to "/tmp/X/a.out".
Architecture set to: x86_64-pc-linux-gnu.
(lldb) image list    <===== LIBB.SO MISSING (BUT OTHER MODULES STILL THERE)
[  0] 171A52C4 0x000056532dec0000 /tmp/X/a.out 
[  1] 421DCFD2-138A-B321-D6F1-7AFD7B7FC999-F79CA445 0x00007ffdeb3a5000 [vdso] (0x00007ffdeb3a5000)
[  2] C88DE6C8 0x00007f7083d1f000 /lib64/libc.so.6 
      /usr/lib/debug/lib64/libc.so.6.debug
[  3] D626A570 0x00007f7083f4c000 /lib64/ld-linux-x86-64.so.2

I got this on linux. On darwin, the module might still be present because it might be read from process memory directly. Still, if that failed for some reason (perhaps because instead of a live process we were dealing with a core file -- one which does not include the module contents in the dump), I would expect other modules to be present.

Rejecting all of them because one couldn't be found seems to go against our "do our best to keep going with limited info" philosophy.

jimingham · 2025-02-18T18:22:39Z

It's never the case on Darwin that you know nothing about a loaded shared library. You will always know at least its name, UUID, and load addresses. It doesn't matter whether the binary exists on disk or not, we will always be able to reconstruct at least a skeleton form from the loader's information. So thinking just about Darwin, I'd opine that you should never totally omit information about a loaded library. If the loader tells you about it, you should at least make its presence available even if you can't reconstruct much information from it. The ELF behavior surprises me, I would have definitely though it was a bug that the absence of the library on disk would cause the library to totally disappear from a debug session that was clearly using it. That seems to me like information we really should try to surface the user. I really haven't played around with ELF much so I'm not up on how lossy it can be. In the case where you've deleted the library on disk is there really NOTHING that lldb can tell you about that loaded library? Jim

…

On Feb 18, 2025, at 12:15 AM, Pavel Labath ***@***.***> wrote: labath left a comment (llvm/llvm-project#127153) I'm not convinced of that this change needs to be happen in ScriptedProcess ... May be we could have a setting to not discard all modules loaded in case 1 of them failed to load that would also work with other Process plugins. @jimingham what do you think ? I'm confused. Isn't that already what happens for non-scripted processes? $ lldb -n a.out -o "image list" -b (lldb) process attach --name "a.out" Process 11191 stopped * thread #1, name = 'a.out', stop reason = signal SIGSTOP frame #0: 0x00007f7083e088e4 libc.so.6`pause + 20 libc.so.6`pause: -> 0x7f7083e088e4 <+20>: cmpq $-0x1000, %rax ; imm = 0xF000 0x7f7083e088ea <+26>: ja 0x7f7083e08920 ; <+80> 0x7f7083e088ec <+28>: retq 0x7f7083e088ed <+29>: nopl (%rax) Executable module set to "/tmp/X/a.out". Architecture set to: x86_64-pc-linux-gnu. (lldb) image list [ 0] 171A52C4 0x000056532dec0000 /tmp/X/a.out [ 1] 421DCFD2-138A-B321-D6F1-7AFD7B7FC999-F79CA445 0x00007ffdeb3a5000 [vdso] (0x00007ffdeb3a5000) [ 2] 838926F7 0x00007f7083f45000 libb.so <====== LIBB.SO HERE [ 3] C88DE6C8 0x00007f7083d1f000 /lib64/libc.so.6 /usr/lib/debug/lib64/libc.so.6.debug [ 4] D626A570 0x00007f7083f4c000 /lib64/ld-linux-x86-64.so.2 $ rm libb.so <========= FILE DELETED $ lldb -n a.out -o "image list" -b (lldb) process attach --name "a.out" Process 11191 stopped * thread #1, name = 'a.out', stop reason = signal SIGSTOP frame #0: 0x00007f7083e088e4 libc.so.6`pause + 20 libc.so.6`pause: -> 0x7f7083e088e4 <+20>: cmpq $-0x1000, %rax ; imm = 0xF000 0x7f7083e088ea <+26>: ja 0x7f7083e08920 ; <+80> 0x7f7083e088ec <+28>: retq 0x7f7083e088ed <+29>: nopl (%rax) Executable module set to "/tmp/X/a.out". Architecture set to: x86_64-pc-linux-gnu. (lldb) image list <===== LIBB.SO MISSING (BUT OTHER MODULES STILL THERE) [ 0] 171A52C4 0x000056532dec0000 /tmp/X/a.out [ 1] 421DCFD2-138A-B321-D6F1-7AFD7B7FC999-F79CA445 0x00007ffdeb3a5000 [vdso] (0x00007ffdeb3a5000) [ 2] C88DE6C8 0x00007f7083d1f000 /lib64/libc.so.6 /usr/lib/debug/lib64/libc.so.6.debug [ 3] D626A570 0x00007f7083f4c000 /lib64/ld-linux-x86-64.so.2 I got this on linux. On darwin, the module might still be present because it might be read from process memory directly. Still, if that failed for some reason (perhaps because instead of a live process we were dealing with a core file -- one which does not include the module contents in the dump), I would expect other modules to be present. Rejecting all of them because one couldn't be found seems to go against our "do our best to keep going with limited info" philosophy. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned. <https://github.com/jimingham> <#127153 (comment)> <https://github.com/notifications/unsubscribe-auth/ADUPVW72PO3RURTHZHULBDD2QLT3NAVCNFSM6AAAAABXDQPSSSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNRUHA4TQMJRHE> labath left a comment (llvm/llvm-project#127153) <#127153 (comment)> I'm not convinced of that this change needs to be happen in ScriptedProcess ... May be we could have a setting to not discard all modules loaded in case 1 of them failed to load that would also work with other Process plugins. @jimingham <https://github.com/jimingham> what do you think ? I'm confused. Isn't that already what happens for non-scripted processes? $ lldb -n a.out -o "image list" -b (lldb) process attach --name "a.out" Process 11191 stopped * thread #1, name = 'a.out', stop reason = signal SIGSTOP frame #0: 0x00007f7083e088e4 libc.so.6`pause + 20 libc.so.6`pause: -> 0x7f7083e088e4 <+20>: cmpq $-0x1000, %rax ; imm = 0xF000 0x7f7083e088ea <+26>: ja 0x7f7083e08920 ; <+80> 0x7f7083e088ec <+28>: retq 0x7f7083e088ed <+29>: nopl (%rax) Executable module set to "/tmp/X/a.out". Architecture set to: x86_64-pc-linux-gnu. (lldb) image list [ 0] 171A52C4 0x000056532dec0000 /tmp/X/a.out [ 1] 421DCFD2-138A-B321-D6F1-7AFD7B7FC999-F79CA445 0x00007ffdeb3a5000 [vdso] (0x00007ffdeb3a5000) [ 2] 838926F7 0x00007f7083f45000 libb.so <====== LIBB.SO HERE [ 3] C88DE6C8 0x00007f7083d1f000 /lib64/libc.so.6 /usr/lib/debug/lib64/libc.so.6.debug [ 4] D626A570 0x00007f7083f4c000 /lib64/ld-linux-x86-64.so.2 $ rm libb.so <========= FILE DELETED $ lldb -n a.out -o "image list" -b (lldb) process attach --name "a.out" Process 11191 stopped * thread #1, name = 'a.out', stop reason = signal SIGSTOP frame #0: 0x00007f7083e088e4 libc.so.6`pause + 20 libc.so.6`pause: -> 0x7f7083e088e4 <+20>: cmpq $-0x1000, %rax ; imm = 0xF000 0x7f7083e088ea <+26>: ja 0x7f7083e08920 ; <+80> 0x7f7083e088ec <+28>: retq 0x7f7083e088ed <+29>: nopl (%rax) Executable module set to "/tmp/X/a.out". Architecture set to: x86_64-pc-linux-gnu. (lldb) image list <===== LIBB.SO MISSING (BUT OTHER MODULES STILL THERE) [ 0] 171A52C4 0x000056532dec0000 /tmp/X/a.out [ 1] 421DCFD2-138A-B321-D6F1-7AFD7B7FC999-F79CA445 0x00007ffdeb3a5000 [vdso] (0x00007ffdeb3a5000) [ 2] C88DE6C8 0x00007f7083d1f000 /lib64/libc.so.6 /usr/lib/debug/lib64/libc.so.6.debug [ 3] D626A570 0x00007f7083f4c000 /lib64/ld-linux-x86-64.so.2 I got this on linux. On darwin, the module might still be present because it might be read from process memory directly. Still, if that failed for some reason (perhaps because instead of a live process we were dealing with a core file -- one which does not include the module contents in the dump), I would expect other modules to be present. Rejecting all of them because one couldn't be found seems to go against our "do our best to keep going with limited info" philosophy. — Reply to this email directly, view it on GitHub <#127153 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW72PO3RURTHZHULBDD2QLT3NAVCNFSM6AAAAABXDQPSSSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNRUHA4TQMJRHE>. You are receiving this because you were mentioned.

rchamala · 2025-02-18T19:40:25Z

In my case, for non-scripted process of coredump, I see the module information even when symbol is not found and I see a placeholder object being created for the module. For scripted process, it shows only the main module due to the limitation I pointed in this PR

Target 1 is child elf-core process target. Target 2 is scripted process target

Current State (without my change)

Target list

Current targets:
  target #0: <none> ( platform=host )
  target #1: [vdso] ( arch=aarch64-*-linux, platform=remote-android, pid=1234, state=stopped )
* target #2: /path/to/coredump ( arch=aarch64-*-*, platform=remote-android, pid=3456, state=stopped )

a. Sample Image List of child elf-core target. Note the * at the end, which indicates placeholder object

[0]  0x1234455 [vdso]   /path/to/a.out
[1] 0x1234566  /path/to/libc.so(*)
[2]     ...

b. Sample Image list of scripted process target. Only one module is shown

[0]  0x1234445 [vdso]   /path/to/a.out

With my PR to ignore load module errors:

a. Sample image list of scripted process target. All modules are shown, for the ones whose symbols are not found, place holder objects are created similar to non-scripted process targets

[0]  0x1234455 [vdso]   /path/to/a.out
[1]  0x1245666 /path/to/libb.so
[2]  ...

jimingham · 2025-02-18T20:41:36Z

So it seems like if we ignore module loading errors for these cases, the generic code in lldb does the right thing and produces an "as good as it can do" module entry. So why would we ever want to elide a library if the scripted side can't find a physical file for it? Whatever is producing the module after errors should always be allowed to do that, shouldn't it?

rchamala · 2025-02-19T05:36:24Z

So it seems like if we ignore module loading errors for these cases, the generic code in lldb does the right thing and produces an "as good as it can do" module entry. So why would we ever want to elide a library if the scripted side can't find a physical file for it? Whatever is producing the module after errors should always be allowed to do that, shouldn't it?

Yes, if we ignore the modules which are not found, generic code in lldb can load the ones which are found. This is what happens in non-scripted process. But for scripted process, we skip loading the successfully found modules even if we could not find one module. The modules are created here, and this foreach block returns false even if one module is not found, and thereby skips loading any module here

labath · 2025-02-19T09:20:48Z

It's never the case on Darwin that you know nothing about a loaded shared library. You will always know at least its name, UUID, and load addresses. It doesn't matter whether the binary exists on disk or not, we will always be able to reconstruct at least a skeleton form from the loader's information.

So thinking just about Darwin, I'd opine that you should never totally omit information about a loaded library. If the loader tells you about it, you should at least make its presence available even if you can't reconstruct much information from it.

The ELF behavior surprises me, I would have definitely though it was a bug that the absence of the library on disk would cause the library to totally disappear from a debug session that was clearly using it. That seems to me like information we really should try to surface the user.

I really haven't played around with ELF much so I'm not up on how lossy it can be. In the case where you've deleted the library on disk is there really NOTHING that lldb can tell you about that loaded library? Jim

Reading an ELF file from memory is slightly tricky, but not impossible (I think Greg is working on making that work). I agree that showing /some/ information about the module would be better than pretending it doesn't exist, but that's not the point I was trying to make. My point was that not showing information about the missing module is (MUCH) better than not showing information about ANY module (even those that were found successfully).

Even in the darwin case, I can reproduce the missing module by deleting the file and overwriting the mach header in the memory of the process. Here I am doing it on purpose, but it could happen, however unlikely, due to a bug in the debugged binary. I'm assuming that in this case, the loader information will still contain information about the library being loaded at this address, but give up on trying to load it because the magic value doesn't match. The other modules still load fine, though.

rchamala · 2025-02-20T16:41:44Z

@jimingham @medismailben Can you please let me know if you have more questions regarding the problem and solution ? Right now, I have conditionally ignored the modules which are not created and only loaded the successful ones, if needed I can always load the successfully created modules in scripted process

medismailben · 2025-02-20T21:16:49Z

@jimingham @medismailben Can you please let me know if you have more questions regarding the problem and solution ? Right now, I have conditionally ignored the modules which are not created and only loaded the successful ones, if needed I can always load the successfully created modules in scripted process

So we've talked offline with @jimingham and there are 2 things we think should change here:

ScriptedProcess module loading is too strict (compared to non-ScriptedProcess approach). I don't think having a variable in the ScriptedProcess to skip loading error per module is a good way to address this issue. Instead I'd just make the ForEach call not return and log loading errors in the lambda.

a. Sample Image List of child elf-core target. Note the * at the end, which indicates placeholder object

[0] 0x1234455 [vdso] /path/to/a.out
[1] 0x1234566 /path/to/libc.so(*)
[2] ...

Is this behavior something that's specific to ElfCore or is that generic to any process plugin ? If it's the former, I really think this should become an lldb-wide thing. This can be done as a follow-up.

medismailben

I think this patch could be heavily simplified: we could just not return in case the ForEach call fails.

Also, I think it would be nice if you could add a placeholder module for the ones that failed to load, so we would see them in the image list output with the (*) next to them

lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp

medismailben · 2025-02-20T21:18:12Z

lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp

+  auto handle_error_with_message = [&error](llvm::StringRef message,
+                                            bool ignore_error) {


This doesn't need to change

Yes. Based on the recent discussion to make it the default, I can remove this

medismailben · 2025-02-20T21:27:29Z

lldb/test/API/functionalities/scripted_process/TestStackCoreScriptedProcess.py

+                        {
+                            "path": "/random/path/random.dylib",
+                            "load_addr": 12345678,
+                            "ignore_module_load_error": True,


No need for this anymore

Suggested change

"ignore_module_load_error": True,

medismailben · 2025-02-20T21:29:24Z

lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp

@@ -436,22 +438,25 @@ ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
  ModuleList module_list;
  Target &target = GetTarget();

-  auto reload_image = [&target, &module_list, &error_with_message](
+  auto reload_image = [&target, &module_list, &handle_error_with_message](


You can just revert the error_with_message & reload_image lambda exactly how it was before this change.

medismailben · 2025-02-20T21:31:52Z

lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py

+                # Get ignore_module_load_error boolean from args
+                ignore_module_load_error = False
+                ignore_module_load_error_arg = custom_module.GetValueForKey(
+                    "ignore_module_load_error"
+                )
+                if (
+                    ignore_module_load_error_arg
+                    and ignore_module_load_error_arg.IsValid()
+                    and ignore_module_load_error_arg.GetType()
+                    == lldb.eStructuredDataTypeBoolean
+                ):
+                    ignore_module_load_error = (
+                        ignore_module_load_error_arg.GetBooleanValue()
+                    )


No need for this anymore

Suggested change

# Get ignore_module_load_error boolean from args

ignore_module_load_error = False

ignore_module_load_error_arg = custom_module.GetValueForKey(

"ignore_module_load_error"

)

if (

ignore_module_load_error_arg

and ignore_module_load_error_arg.IsValid()

and ignore_module_load_error_arg.GetType()

== lldb.eStructuredDataTypeBoolean

):

ignore_module_load_error = (

ignore_module_load_error_arg.GetBooleanValue()

)

Yes, will remove it based on our recent discussion to enable it by default

medismailben · 2025-02-20T21:33:12Z

lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py

+                if not os.path.exists(module_path) and not ignore_module_load_error:
+                    return


ditto

Suggested change

if not os.path.exists(module_path) and not ignore_module_load_error:

return

medismailben · 2025-02-20T21:34:11Z

lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py

+                # Get custom module load address from args
+                module_load_addr = None
+                module_load_addr_arg = custom_module.GetValueForKey("load_addr")
+                if (
+                    module_load_addr_arg
+                    and module_load_addr_arg.IsValid()
+                    and module_load_addr_arg.GetType()
+                    == lldb.eStructuredDataTypeInteger
+                ):
+                    module_load_addr = module_load_addr_arg.GetIntegerValue()
+
+                # If module load address is not specified/valid, try to find it from corefile module
+                if module_load_addr is None:
+                    corefile_module = self.get_module_with_name(
+                        self.corefile_target, module_name
+                    )
+
+                    if not corefile_module or not corefile_module.IsValid():
+                        return
+
+                    module_load_addr = (
+                        corefile_module.GetObjectFileHeaderAddress().GetLoadAddress(
+                            self.corefile_target
+                        )
+                    )


nit: Seems like you're re-implementing the C++ module loading logic in python here which doesn't seems necessary.

The current test logic is hardcoded to find a single module named "libbaz.dylib" and get load address from corefile_module. I was hoping to have a test that accepts different modules given module path and module load address from the caller. If it is not given, I was planning to get the load address from corefile_module similar to how the current test is doing. Lmk if that sounds fine

Fair, the reason I mentioned this is because I wanted to exercise the error handling code path in C++ rather than do in python

medismailben · 2025-02-20T21:34:25Z

lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py

+                    {
+                        "path": module_path,
+                        "load_addr": module_load_addr,
+                        "ignore_module_load_error": ignore_module_load_error,


You don't need that anymore

Suggested change

"ignore_module_load_error": ignore_module_load_error,

medismailben · 2025-02-20T21:36:30Z

lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp


    dict->GetValueForKeyAsString("path", value);
    FileSpec objfile(value);
    module_sp->SetFileSpecAndObjectName(objfile, objfile.GetFilename());

-    return module_list.AppendIfNeeded(module_sp);
+    return ignore_module_load_error ? true
+                                    : module_list.AppendIfNeeded(module_sp);
  };

  if (!loaded_images_sp->ForEach(reload_image))


I can't make a comment on the next line, but all you need to do here is to remove the return keyword on the next line. This way, we will still log Couldn't reload all images in case some modules failed to load but we wouldn't return false.

rchamala · 2025-02-21T00:21:43Z

So we've talked offline with @jimingham and there are 2 things we think should change here:

ScriptedProcess module loading is too strict (compared to non-ScriptedProcess approach). I don't think having a variable in the ScriptedProcess to skip loading error per module is a good way to address this issue. Instead I'd just make the ForEach call not return and log loading errors in the lambda.

Yes, I can make it the default. I was initially not sure if we want to retain the old behavior for some reason, so had added the conditional logic. If we plan to make it the default, the change can be simple but we probably can't use the ForEach block directly. The ForEach block runs callback for all entries serially and returns false if one of them fails without running rest of the entries . I can either

get the value at each index and run the callback
add a boolean in ForEach block to indicate whether to fail on an entry failure.

Will do 1) for now

Is this behavior something that's specific to ElfCore or is that generic to any process plugin ? If it's the former, I really think this should become an lldb-wide thing. This can be done as a follow-up.

I realized that it is our internal logic that modified the dynamic loader to create a placeholder module, whenever a module is not found. I believe the current lldb-wide logic does not create placeholder modules looking at the logic, We can follow up with @clayborg on how to integrate similar logic to be lldb-wide.

rchamala · 2025-02-21T00:39:26Z

I think this patch could be heavily simplified: we could just not return in case the ForEach call fails.

Yes, based on our recent discussion , will remove it and instead of using ForEach call, will retrieve each item at index and call the callback separately.

Also, I think it would be nice if you could add a placeholder module for the ones that failed to load, so we would see them in the image list output with the (*) next to them

Sure

…or missing modules

medismailben · 2025-02-23T06:15:08Z

lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp

@@ -453,22 +455,16 @@ ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
    if (!dict->HasKey("load_addr"))
      return error_with_message("Dictionary is missing key 'load_addr'");

+    llvm::StringRef path = "";


If you're using a new StringRef here for path, I'd rename the StringRef used to get UUID accordingly instead of leaving the value variable name.

medismailben

LGTM!

Allow option to ignore module load errors in ScriptedProcess

a4fdb2d

rchamala requested review from labath and clayborg February 14, 2025 01:17

rchamala added 2 commits February 13, 2025 18:07

Fix code formatting

ba03d9d

Fix nit formatting

45b45e0

rchamala marked this pull request as ready for review February 14, 2025 05:05

rchamala requested a review from JDevlieghere as a code owner February 14, 2025 05:05

llvmbot added the lldb label Feb 14, 2025

labath requested a review from medismailben February 14, 2025 10:25

medismailben requested changes Feb 20, 2025

View reviewed changes

Ignore module load errors by default and create placeholder modules f…

c14b6df

…or missing modules

medismailben reviewed Feb 23, 2025

View reviewed changes

medismailben approved these changes Feb 23, 2025

View reviewed changes

Fix nit

3d3c48b

rchamala merged commit 2ff3b18 into llvm:main Feb 23, 2025
6 of 10 checks passed

		auto handle_error_with_message = [&error](llvm::StringRef message,
		bool ignore_error) {

		if not os.path.exists(module_path) and not ignore_module_load_error:
		return

Allow option to ignore module load errors in ScriptedProcess #127153

Allow option to ignore module load errors in ScriptedProcess #127153

Uh oh!

Conversation

rchamala commented Feb 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Feb 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Feb 14, 2025

Uh oh!

labath commented Feb 14, 2025

Uh oh!

rchamala commented Feb 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

medismailben commented Feb 17, 2025

Uh oh!

rchamala commented Feb 17, 2025

Uh oh!

medismailben commented Feb 18, 2025

Uh oh!

rchamala commented Feb 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

labath commented Feb 18, 2025

Uh oh!

jimingham commented Feb 18, 2025 via email

Uh oh!

rchamala commented Feb 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jimingham commented Feb 18, 2025

Uh oh!

rchamala commented Feb 19, 2025

Uh oh!

labath commented Feb 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rchamala commented Feb 20, 2025

Uh oh!

medismailben commented Feb 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

medismailben left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rchamala Feb 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rchamala commented Feb 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

rchamala commented Feb 14, 2025 •

edited

Loading

github-actions bot commented Feb 14, 2025 •

edited

Loading

rchamala commented Feb 14, 2025 •

edited

Loading

rchamala commented Feb 18, 2025 •

edited

Loading

rchamala commented Feb 18, 2025 •

edited

Loading

labath commented Feb 19, 2025 •

edited

Loading

medismailben commented Feb 20, 2025 •

edited

Loading

medismailben left a comment •

edited

Loading

rchamala Feb 21, 2025 •

edited

Loading

rchamala commented Feb 21, 2025 •

edited

Loading