Skip to content

Allow option to ignore module load errors in ScriptedProcess #127153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 23, 2025

Conversation

rchamala
Copy link
Contributor

@rchamala rchamala commented Feb 14, 2025

Current state in scripted process expects all the modules passed into "get_loaded_images" to load successfully else none of them load. Even if a module loads fine, but has already been appended it still fails. This is restrictive and does not help our usecase.

Usecase: We have a parent scripted process using coredump + tombstone.

  1. Scripted process uses child elf-core process to read memory dump

  2. Uses tombstones to pass thread names and modules.

We do not know whether the modules will be successfully downloaded before creating the scripted process. We use python module callbacks to download a module from symbol server at LLDB load time when the scripted process is being created. The issue is that if one of the symbol is not found from the list specified in tombstone, none of the modules load in scripted process. Even if we ensure symbols are present in symbol server before creating the scripted process, if the load address is wrong or if the module is already appended, all module loads are skipped.

Solution: Pass in a custom boolean option arg for every module from python scripted process plugin which will indicate whether to ignore the module load error. This will provide the flexibility to user for loading the successfully fetched modules into target while ignoring the failed ones

@rchamala rchamala requested review from labath and clayborg February 14, 2025 01:17
Copy link

github-actions bot commented Feb 14, 2025

✅ With the latest revision this PR passed the Python code formatter.

@rchamala rchamala marked this pull request as ready for review February 14, 2025 05:05
@llvmbot llvmbot added the lldb label Feb 14, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 14, 2025

@llvm/pr-subscribers-lldb

Author: None (rchamala)

Changes

Current state in scripted process expects all the modules passed into "get_loaded_images" to load successfully else none of them load. Even if a module loads fine, but has already been appended it still fails. This is restrictive and does not help our usecase.

Usecase: We have a parent scripted process using coredump + tombstone.

  1. Scripted process uses child elf-core process to read memory dump

  2. Uses tombstones to pass thread names and modules.

We do not know whether the modules will be successfully downloaded before creating the scripted process. We use python module callbacks to download a module from symbol server at LLDB load time when the scripted process is being created. The issue is that if one of the symbol is not found from the list specified in tombstone, none of the modules load in scripted process.

Solution: Pass in a custom boolean option arg for every module from python scripted process plugin which will indicate whether to ignore the module load error. This will provide the flexibility to user for loading the successfully fetched modules into target while ignoring the failed ones


Full diff: https://github.com/llvm/llvm-project/pull/127153.diff

3 Files Affected:

  • (modified) lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp (+26-13)
  • (modified) lldb/test/API/functionalities/scripted_process/TestStackCoreScriptedProcess.py (+16-1)
  • (modified) lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py (+75-15)
diff --git a/lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp b/lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp
index d2111ce877ce5..79d0bc51bc18c 100644
--- a/lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp
+++ b/lldb/source/Plugins/Process/scripted/ScriptedProcess.cpp
@@ -165,7 +165,7 @@ Status ScriptedProcess::DoLoadCore() {
 Status ScriptedProcess::DoLaunch(Module *exe_module,
                                  ProcessLaunchInfo &launch_info) {
   LLDB_LOGF(GetLog(LLDBLog::Process), "ScriptedProcess::%s launching process", __FUNCTION__);
-  
+
   /* MARK: This doesn't reflect how lldb actually launches a process.
            In reality, it attaches to debugserver, then resume the process.
            That's not true in all cases.  If debugserver is remote, lldb
@@ -422,9 +422,11 @@ bool ScriptedProcess::GetProcessInfo(ProcessInstanceInfo &info) {
 lldb_private::StructuredData::ObjectSP
 ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
   Status error;
-  auto error_with_message = [&error](llvm::StringRef message) {
-    return ScriptedInterface::ErrorWithMessage<bool>(LLVM_PRETTY_FUNCTION,
-                                                     message.data(), error);
+  auto handle_error_with_message = [&error](llvm::StringRef message,
+                                            bool ignore_error) {
+    ScriptedInterface::ErrorWithMessage<bool>(LLVM_PRETTY_FUNCTION,
+                                              message.data(), error);
+    return ignore_error;
   };
 
   StructuredData::ArraySP loaded_images_sp = GetInterface().GetLoadedImages();
@@ -436,12 +438,13 @@ ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
   ModuleList module_list;
   Target &target = GetTarget();
 
-  auto reload_image = [&target, &module_list, &error_with_message](
+  auto reload_image = [&target, &module_list, &handle_error_with_message](
                           StructuredData::Object *obj) -> bool {
     StructuredData::Dictionary *dict = obj->GetAsDictionary();
 
     if (!dict)
-      return error_with_message("Couldn't cast image object into dictionary.");
+      return handle_error_with_message(
+          "Couldn't cast image object into dictionary.", false);
 
     ModuleSpec module_spec;
     llvm::StringRef value;
@@ -449,9 +452,11 @@ ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
     bool has_path = dict->HasKey("path");
     bool has_uuid = dict->HasKey("uuid");
     if (!has_path && !has_uuid)
-      return error_with_message("Dictionary should have key 'path' or 'uuid'");
+      return handle_error_with_message(
+          "Dictionary should have key 'path' or 'uuid'", false);
     if (!dict->HasKey("load_addr"))
-      return error_with_message("Dictionary is missing key 'load_addr'");
+      return handle_error_with_message("Dictionary is missing key 'load_addr'",
+                                       false);
 
     if (has_path) {
       dict->GetValueForKeyAsString("path", value);
@@ -467,16 +472,21 @@ ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
     ModuleSP module_sp =
         target.GetOrCreateModule(module_spec, true /* notify */);
 
+    bool ignore_module_load_error = false;
+    dict->GetValueForKeyAsBoolean("ignore_module_load_error",
+                                  ignore_module_load_error);
     if (!module_sp)
-      return error_with_message("Couldn't create or get module.");
+      return handle_error_with_message("Couldn't create or get module.",
+                                       ignore_module_load_error);
 
     lldb::addr_t load_addr = LLDB_INVALID_ADDRESS;
     lldb::offset_t slide = LLDB_INVALID_OFFSET;
     dict->GetValueForKeyAsInteger("load_addr", load_addr);
     dict->GetValueForKeyAsInteger("slide", slide);
     if (load_addr == LLDB_INVALID_ADDRESS)
-      return error_with_message(
-          "Couldn't get valid load address or slide offset.");
+      return handle_error_with_message(
+          "Couldn't get valid load address or slide offset.",
+          ignore_module_load_error);
 
     if (slide != LLDB_INVALID_OFFSET)
       load_addr += slide;
@@ -486,13 +496,16 @@ ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
                               changed);
 
     if (!changed && !module_sp->GetObjectFile())
-      return error_with_message("Couldn't set the load address for module.");
+      return handle_error_with_message(
+          "Couldn't set the load address for module.",
+          ignore_module_load_error);
 
     dict->GetValueForKeyAsString("path", value);
     FileSpec objfile(value);
     module_sp->SetFileSpecAndObjectName(objfile, objfile.GetFilename());
 
-    return module_list.AppendIfNeeded(module_sp);
+    return ignore_module_load_error ? true
+                                    : module_list.AppendIfNeeded(module_sp);
   };
 
   if (!loaded_images_sp->ForEach(reload_image))
diff --git a/lldb/test/API/functionalities/scripted_process/TestStackCoreScriptedProcess.py b/lldb/test/API/functionalities/scripted_process/TestStackCoreScriptedProcess.py
index a5c79378bab50..fa887390f4c9f 100644
--- a/lldb/test/API/functionalities/scripted_process/TestStackCoreScriptedProcess.py
+++ b/lldb/test/API/functionalities/scripted_process/TestStackCoreScriptedProcess.py
@@ -76,6 +76,12 @@ def cleanup():
             )
         self.assertTrue(corefile_process, PROCESS_IS_VALID)
 
+        # Create a random lib which does not exist in the corefile.
+        random_dylib = self.get_module_with_name(corefile_target, "random.dylib")
+        self.assertFalse(
+            random_dylib, "Dynamic library random.dylib should not be found."
+        )
+
         structured_data = lldb.SBStructuredData()
         structured_data.SetFromJSON(
             json.dumps(
@@ -83,7 +89,16 @@ def cleanup():
                     "backing_target_idx": self.dbg.GetIndexOfTarget(
                         corefile_process.GetTarget()
                     ),
-                    "libbaz_path": self.getBuildArtifact("libbaz.dylib"),
+                    "custom_modules": [
+                        {
+                            "path": self.getBuildArtifact("libbaz.dylib"),
+                        },
+                        {
+                            "path": "/random/path/random.dylib",
+                            "load_addr": 12345678,
+                            "ignore_module_load_error": True,
+                        },
+                    ],
                 }
             )
         )
diff --git a/lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py b/lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py
index 8641d9a7ced35..6f3124e64d3ce 100644
--- a/lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py
+++ b/lldb/test/API/functionalities/scripted_process/stack_core_scripted_process.py
@@ -46,22 +46,82 @@ def __init__(self, exe_ctx: lldb.SBExecutionContext, args: lldb.SBStructuredData
         if len(self.threads) == 2:
             self.threads[len(self.threads) - 1].is_stopped = True
 
-        corefile_module = self.get_module_with_name(
-            self.corefile_target, "libbaz.dylib"
-        )
-        if not corefile_module or not corefile_module.IsValid():
-            return
-        module_path = os.path.join(
-            corefile_module.GetFileSpec().GetDirectory(),
-            corefile_module.GetFileSpec().GetFilename(),
-        )
-        if not os.path.exists(module_path):
-            return
-        module_load_addr = corefile_module.GetObjectFileHeaderAddress().GetLoadAddress(
-            self.corefile_target
-        )
+        custom_modules = args.GetValueForKey("custom_modules")
+        if custom_modules.GetType() == lldb.eStructuredDataTypeArray:
+            for id in range(custom_modules.GetSize()):
+
+                custom_module = custom_modules.GetItemAtIndex(id)
+                if (
+                    not custom_module
+                    or not custom_module.IsValid()
+                    or not custom_module.GetType() == lldb.eStructuredDataTypeDictionary
+                ):
+                    continue
+
+                # Get custom module path from args
+                module_path_arg = custom_module.GetValueForKey("path")
+                module_path = None
+                if (
+                    not module_path_arg
+                    or not module_path_arg.IsValid()
+                    or not module_path_arg.GetType() == lldb.eStructuredDataTypeString
+                ):
+                    return
+
+                module_path = module_path_arg.GetStringValue(100)
+                module_name = os.path.basename(module_path)
+
+                # Get ignore_module_load_error boolean from args
+                ignore_module_load_error = False
+                ignore_module_load_error_arg = custom_module.GetValueForKey(
+                    "ignore_module_load_error"
+                )
+                if (
+                    ignore_module_load_error_arg
+                    and ignore_module_load_error_arg.IsValid()
+                    and ignore_module_load_error_arg.GetType()
+                    == lldb.eStructuredDataTypeBoolean
+                ):
+                    ignore_module_load_error = (
+                        ignore_module_load_error_arg.GetBooleanValue()
+                    )
 
-        self.loaded_images.append({"path": module_path, "load_addr": module_load_addr})
+                if not os.path.exists(module_path) and not ignore_module_load_error:
+                    return
+
+                # Get custom module load address from args
+                module_load_addr = None
+                module_load_addr_arg = custom_module.GetValueForKey("load_addr")
+                if (
+                    module_load_addr_arg
+                    and module_load_addr_arg.IsValid()
+                    and module_load_addr_arg.GetType()
+                    == lldb.eStructuredDataTypeInteger
+                ):
+                    module_load_addr = module_load_addr_arg.GetIntegerValue()
+
+                # If module load address is not specified/valid, try to find it from corefile module
+                if module_load_addr is None:
+                    corefile_module = self.get_module_with_name(
+                        self.corefile_target, module_name
+                    )
+
+                    if not corefile_module or not corefile_module.IsValid():
+                        return
+
+                    module_load_addr = (
+                        corefile_module.GetObjectFileHeaderAddress().GetLoadAddress(
+                            self.corefile_target
+                        )
+                    )
+
+                self.loaded_images.append(
+                    {
+                        "path": module_path,
+                        "load_addr": module_load_addr,
+                        "ignore_module_load_error": ignore_module_load_error,
+                    }
+                )
 
     def get_memory_region_containing_address(
         self, addr: int

@labath labath requested a review from medismailben February 14, 2025 10:25
@labath
Copy link
Collaborator

labath commented Feb 14, 2025

Let's see what @medismailben has to say, but I think it would be fine to "ignore" (meaning, to continue trying to load other modules) load errors even without the special flags, as I think that's how our non-scripted processes work (do they?)

@rchamala
Copy link
Contributor Author

rchamala commented Feb 14, 2025

Yes, for non-scripted process, we ignore the modules which fail to load. Note that currently, I am returning failure if a wrong type is passed in to the dictionary via "get_loaded_images" or if "load_addr" is missing even for one module. Once the type checks are done, I am conditionally ignoring module load errors.

@medismailben
Copy link
Member

I'm not convinced of that this change needs to be happen in ScriptedProcess ... May be we could have a setting to not discard all modules loaded in case 1 of them failed to load that would also work with other Process plugins. @jimingham what do you think ?

@rchamala
Copy link
Contributor Author

I'm not convinced of that this change needs to be happen in ScriptedProcess ..

Just so that we’re on the same page, the logic to skip loading modules upon a module load error is part of ScriptedProcess conversion logic in “scriptedprocess::GetLoadedDynamicLibrariesInfos”. The method does not call target to load modules upon a single module load error. Can you please expand your thoughts on why this change should not be part of ScriptedProcess ?

@medismailben
Copy link
Member

I'd like to get a better understanding of what you're trying to achieve:

Does the child elf-core process doesn't have any module loaded ? Are they only described in the tombstone and require to be downloaded which could potentially fail ?

@rchamala
Copy link
Contributor Author

rchamala commented Feb 18, 2025

I'd like to get a better understanding of what you're trying to achieve:

Does the child elf-core process doesn't have any module loaded ? Are they only described in the tombstone and require to be downloaded which could potentially fail ?

Child elf-core process has module information for shared libraries directly loaded from the shared library file(.so file) but does not have module information from the ones loaded from .apk. At Android app runtime, some shared libraries are directly loaded from .so files and some are loaded from .apk. Apk is like a compressed file used by the installer to run the app and contains all artifacts needed to run the app which includes shared libraries. Due to the limitation in Android Kernel as explained here, the coredump generated from Android kernel does not dump program headers for the shared libraries in .apk. Note that Android linker when running the Android app, is able to understand the shared libraries inside .apk, can mmap the address of all shared libraries in /proc/PID/maps correctly and can run the app correctly.

Tombstone is basically extracted from the live memory map in /proc/PID/maps, and has information about all shared libraries and their load addresses. Additionally, tombstone has thread names which are not contained in coredump. So, our design is as follows for the parent scripted process:

  1. All the read memory api's to read memory regions from coredump file are used from child elf-core process
  2. Get all modules(shared library name & load address), thread names from tombstone file

We use locate module callback to fetch the module from our symbol server during lldb process creation. For scripted process, when we call target.GetOrCreatemodule the module callback is called to fetch the module from symbol server. When the symbol is not found or when the module is already appended, ScriptedProcess method GetLoadedDynamicLibrariesInfos is designed to fail and does not return any modules when even one module is not loaded or already appended to target. Due to this, all modules end up not being loaded. My request is to return the successful loaded modules in target when ScriptedProcess is asked to get loaded dynamic libraries instead of returning nothing. Please let me know if you have more questions

@labath
Copy link
Collaborator

labath commented Feb 18, 2025

I'm not convinced of that this change needs to be happen in ScriptedProcess ... May be we could have a setting to not discard all modules loaded in case 1 of them failed to load that would also work with other Process plugins. @jimingham what do you think ?

I'm confused. Isn't that already what happens for non-scripted processes?

$ lldb -n a.out -o "image list" -b
(lldb) process attach --name "a.out"
Process 11191 stopped
* thread #1, name = 'a.out', stop reason = signal SIGSTOP
    frame #0: 0x00007f7083e088e4 libc.so.6`pause + 20
libc.so.6`pause:
->  0x7f7083e088e4 <+20>: cmpq   $-0x1000, %rax ; imm = 0xF000 
    0x7f7083e088ea <+26>: ja     0x7f7083e08920 ; <+80>
    0x7f7083e088ec <+28>: retq   
    0x7f7083e088ed <+29>: nopl   (%rax)
Executable module set to "/tmp/X/a.out".
Architecture set to: x86_64-pc-linux-gnu.
(lldb) image list
[  0] 171A52C4 0x000056532dec0000 /tmp/X/a.out 
[  1] 421DCFD2-138A-B321-D6F1-7AFD7B7FC999-F79CA445 0x00007ffdeb3a5000 [vdso] (0x00007ffdeb3a5000)
[  2] 838926F7 0x00007f7083f45000 libb.so          <====== LIBB.SO HERE
[  3] C88DE6C8 0x00007f7083d1f000 /lib64/libc.so.6 
      /usr/lib/debug/lib64/libc.so.6.debug
[  4] D626A570 0x00007f7083f4c000 /lib64/ld-linux-x86-64.so.2 
$ rm libb.so     <========= FILE DELETED
$ lldb -n a.out -o "image list" -b
(lldb) process attach --name "a.out"
Process 11191 stopped
* thread #1, name = 'a.out', stop reason = signal SIGSTOP
    frame #0: 0x00007f7083e088e4 libc.so.6`pause + 20
libc.so.6`pause:
->  0x7f7083e088e4 <+20>: cmpq   $-0x1000, %rax ; imm = 0xF000 
    0x7f7083e088ea <+26>: ja     0x7f7083e08920 ; <+80>
    0x7f7083e088ec <+28>: retq   
    0x7f7083e088ed <+29>: nopl   (%rax)
Executable module set to "/tmp/X/a.out".
Architecture set to: x86_64-pc-linux-gnu.
(lldb) image list    <===== LIBB.SO MISSING (BUT OTHER MODULES STILL THERE)
[  0] 171A52C4 0x000056532dec0000 /tmp/X/a.out 
[  1] 421DCFD2-138A-B321-D6F1-7AFD7B7FC999-F79CA445 0x00007ffdeb3a5000 [vdso] (0x00007ffdeb3a5000)
[  2] C88DE6C8 0x00007f7083d1f000 /lib64/libc.so.6 
      /usr/lib/debug/lib64/libc.so.6.debug
[  3] D626A570 0x00007f7083f4c000 /lib64/ld-linux-x86-64.so.2 

I got this on linux. On darwin, the module might still be present because it might be read from process memory directly. Still, if that failed for some reason (perhaps because instead of a live process we were dealing with a core file -- one which does not include the module contents in the dump), I would expect other modules to be present.

Rejecting all of them because one couldn't be found seems to go against our "do our best to keep going with limited info" philosophy.

@jimingham
Copy link
Collaborator

jimingham commented Feb 18, 2025 via email

@rchamala
Copy link
Contributor Author

rchamala commented Feb 18, 2025

In my case, for non-scripted process of coredump, I see the module information even when symbol is not found and I see a placeholder object being created for the module. For scripted process, it shows only the main module due to the limitation I pointed in this PR

Target 1 is child elf-core process target. Target 2 is scripted process target

  1. Current State (without my change)

Target list

Current targets:
  target #0: <none> ( platform=host )
  target #1: [vdso] ( arch=aarch64-*-linux, platform=remote-android, pid=1234, state=stopped )
* target #2: /path/to/coredump ( arch=aarch64-*-*, platform=remote-android, pid=3456, state=stopped )

a. Sample Image List of child elf-core target. Note the * at the end, which indicates placeholder object

[0]  0x1234455 [vdso]   /path/to/a.out
[1] 0x1234566  /path/to/libc.so(*)
[2]     ...

b. Sample Image list of scripted process target. Only one module is shown

[0]  0x1234445 [vdso]   /path/to/a.out
  1. With my PR to ignore load module errors:

a. Sample image list of scripted process target. All modules are shown, for the ones whose symbols are not found, place holder objects are created similar to non-scripted process targets

[0]  0x1234455 [vdso]   /path/to/a.out
[1]  0x1245666 /path/to/libb.so
[2]  ...

@jimingham
Copy link
Collaborator

So it seems like if we ignore module loading errors for these cases, the generic code in lldb does the right thing and produces an "as good as it can do" module entry. So why would we ever want to elide a library if the scripted side can't find a physical file for it? Whatever is producing the module after errors should always be allowed to do that, shouldn't it?

@rchamala
Copy link
Contributor Author

So it seems like if we ignore module loading errors for these cases, the generic code in lldb does the right thing and produces an "as good as it can do" module entry. So why would we ever want to elide a library if the scripted side can't find a physical file for it? Whatever is producing the module after errors should always be allowed to do that, shouldn't it?

Yes, if we ignore the modules which are not found, generic code in lldb can load the ones which are found. This is what happens in non-scripted process. But for scripted process, we skip loading the successfully found modules even if we could not find one module. The modules are created here, and this foreach block returns false even if one module is not found, and thereby skips loading any module here

@labath
Copy link
Collaborator

labath commented Feb 19, 2025

It's never the case on Darwin that you know nothing about a loaded shared library. You will always know at least its name, UUID, and load addresses. It doesn't matter whether the binary exists on disk or not, we will always be able to reconstruct at least a skeleton form from the loader's information.

So thinking just about Darwin, I'd opine that you should never totally omit information about a loaded library. If the loader tells you about it, you should at least make its presence available even if you can't reconstruct much information from it.

The ELF behavior surprises me, I would have definitely though it was a bug that the absence of the library on disk would cause the library to totally disappear from a debug session that was clearly using it. That seems to me like information we really should try to surface the user.

I really haven't played around with ELF much so I'm not up on how lossy it can be. In the case where you've deleted the library on disk is there really NOTHING that lldb can tell you about that loaded library? Jim

Reading an ELF file from memory is slightly tricky, but not impossible (I think Greg is working on making that work). I agree that showing /some/ information about the module would be better than pretending it doesn't exist, but that's not the point I was trying to make. My point was that not showing information about the missing module is (MUCH) better than not showing information about ANY module (even those that were found successfully).

Even in the darwin case, I can reproduce the missing module by deleting the file and overwriting the mach header in the memory of the process. Here I am doing it on purpose, but it could happen, however unlikely, due to a bug in the debugged binary. I'm assuming that in this case, the loader information will still contain information about the library being loaded at this address, but give up on trying to load it because the magic value doesn't match. The other modules still load fine, though.

@rchamala
Copy link
Contributor Author

@jimingham @medismailben Can you please let me know if you have more questions regarding the problem and solution ? Right now, I have conditionally ignored the modules which are not created and only loaded the successful ones, if needed I can always load the successfully created modules in scripted process

@medismailben
Copy link
Member

medismailben commented Feb 20, 2025

@jimingham @medismailben Can you please let me know if you have more questions regarding the problem and solution ? Right now, I have conditionally ignored the modules which are not created and only loaded the successful ones, if needed I can always load the successfully created modules in scripted process

So we've talked offline with @jimingham and there are 2 things we think should change here:

  • ScriptedProcess module loading is too strict (compared to non-ScriptedProcess approach). I don't think having a variable in the ScriptedProcess to skip loading error per module is a good way to address this issue. Instead I'd just make the ForEach call not return and log loading errors in the lambda.

a. Sample Image List of child elf-core target. Note the * at the end, which indicates placeholder object

[0] 0x1234455 [vdso] /path/to/a.out
[1] 0x1234566 /path/to/libc.so(*)
[2] ...

  • Is this behavior something that's specific to ElfCore or is that generic to any process plugin ? If it's the former, I really think this should become an lldb-wide thing. This can be done as a follow-up.

Copy link
Member

@medismailben medismailben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this patch could be heavily simplified: we could just not return in case the ForEach call fails.

Also, I think it would be nice if you could add a placeholder module for the ones that failed to load, so we would see them in the image list output with the (*) next to them

Comment on lines 425 to 426
auto handle_error_with_message = [&error](llvm::StringRef message,
bool ignore_error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Based on the recent discussion to make it the default, I can remove this

{
"path": "/random/path/random.dylib",
"load_addr": 12345678,
"ignore_module_load_error": True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this anymore

Suggested change
"ignore_module_load_error": True,

@@ -436,22 +438,25 @@ ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
ModuleList module_list;
Target &target = GetTarget();

auto reload_image = [&target, &module_list, &error_with_message](
auto reload_image = [&target, &module_list, &handle_error_with_message](
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just revert the error_with_message & reload_image lambda exactly how it was before this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Comment on lines 74 to 87
# Get ignore_module_load_error boolean from args
ignore_module_load_error = False
ignore_module_load_error_arg = custom_module.GetValueForKey(
"ignore_module_load_error"
)
if (
ignore_module_load_error_arg
and ignore_module_load_error_arg.IsValid()
and ignore_module_load_error_arg.GetType()
== lldb.eStructuredDataTypeBoolean
):
ignore_module_load_error = (
ignore_module_load_error_arg.GetBooleanValue()
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this anymore

Suggested change
# Get ignore_module_load_error boolean from args
ignore_module_load_error = False
ignore_module_load_error_arg = custom_module.GetValueForKey(
"ignore_module_load_error"
)
if (
ignore_module_load_error_arg
and ignore_module_load_error_arg.IsValid()
and ignore_module_load_error_arg.GetType()
== lldb.eStructuredDataTypeBoolean
):
ignore_module_load_error = (
ignore_module_load_error_arg.GetBooleanValue()
)

Copy link
Contributor Author

@rchamala rchamala Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will remove it based on our recent discussion to enable it by default

Comment on lines 89 to 90
if not os.path.exists(module_path) and not ignore_module_load_error:
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Suggested change
if not os.path.exists(module_path) and not ignore_module_load_error:
return

Comment on lines 92 to 116
# Get custom module load address from args
module_load_addr = None
module_load_addr_arg = custom_module.GetValueForKey("load_addr")
if (
module_load_addr_arg
and module_load_addr_arg.IsValid()
and module_load_addr_arg.GetType()
== lldb.eStructuredDataTypeInteger
):
module_load_addr = module_load_addr_arg.GetIntegerValue()

# If module load address is not specified/valid, try to find it from corefile module
if module_load_addr is None:
corefile_module = self.get_module_with_name(
self.corefile_target, module_name
)

if not corefile_module or not corefile_module.IsValid():
return

module_load_addr = (
corefile_module.GetObjectFileHeaderAddress().GetLoadAddress(
self.corefile_target
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Seems like you're re-implementing the C++ module loading logic in python here which doesn't seems necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current test logic is hardcoded to find a single module named "libbaz.dylib" and get load address from corefile_module. I was hoping to have a test that accepts different modules given module path and module load address from the caller. If it is not given, I was planning to get the load address from corefile_module similar to how the current test is doing. Lmk if that sounds fine

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, the reason I mentioned this is because I wanted to exercise the error handling code path in C++ rather than do in python

{
"path": module_path,
"load_addr": module_load_addr,
"ignore_module_load_error": ignore_module_load_error,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need that anymore

Suggested change
"ignore_module_load_error": ignore_module_load_error,


dict->GetValueForKeyAsString("path", value);
FileSpec objfile(value);
module_sp->SetFileSpecAndObjectName(objfile, objfile.GetFilename());

return module_list.AppendIfNeeded(module_sp);
return ignore_module_load_error ? true
: module_list.AppendIfNeeded(module_sp);
};

if (!loaded_images_sp->ForEach(reload_image))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't make a comment on the next line, but all you need to do here is to remove the return keyword on the next line. This way, we will still log Couldn't reload all images in case some modules failed to load but we wouldn't return false.

@rchamala
Copy link
Contributor Author

rchamala commented Feb 21, 2025

So we've talked offline with @jimingham and there are 2 things we think should change here:

  • ScriptedProcess module loading is too strict (compared to non-ScriptedProcess approach). I don't think having a variable in the ScriptedProcess to skip loading error per module is a good way to address this issue. Instead I'd just make the ForEach call not return and log loading errors in the lambda.

Yes, I can make it the default. I was initially not sure if we want to retain the old behavior for some reason, so had added the conditional logic. If we plan to make it the default, the change can be simple but we probably can't use the ForEach block directly. The ForEach block runs callback for all entries serially and returns false if one of them fails without running rest of the entries . I can either

  1. get the value at each index and run the callback
  2. add a boolean in ForEach block to indicate whether to fail on an entry failure.

Will do 1) for now

  • Is this behavior something that's specific to ElfCore or is that generic to any process plugin ? If it's the former, I really think this should become an lldb-wide thing. This can be done as a follow-up.

I realized that it is our internal logic that modified the dynamic loader to create a placeholder module, whenever a module is not found. I believe the current lldb-wide logic does not create placeholder modules looking at the logic, We can follow up with @clayborg on how to integrate similar logic to be lldb-wide.

@rchamala
Copy link
Contributor Author

I think this patch could be heavily simplified: we could just not return in case the ForEach call fails.

Yes, based on our recent discussion , will remove it and instead of using ForEach call, will retrieve each item at index and call the callback separately.

Also, I think it would be nice if you could add a placeholder module for the ones that failed to load, so we would see them in the image list output with the (*) next to them

Sure

@@ -453,22 +455,16 @@ ScriptedProcess::GetLoadedDynamicLibrariesInfos() {
if (!dict->HasKey("load_addr"))
return error_with_message("Dictionary is missing key 'load_addr'");

llvm::StringRef path = "";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're using a new StringRef here for path, I'd rename the StringRef used to get UUID accordingly instead of leaving the value variable name.

Copy link
Member

@medismailben medismailben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rchamala rchamala merged commit 2ff3b18 into llvm:main Feb 23, 2025
6 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants