Skip to content

Clean up PlatformDarwinKernel::GetSharedModule, document #78652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

jasonmolenda
Copy link
Collaborator

PlatformDarwinKernel::GetSharedModule, which can find a kernel or kext from a local filesystem scan, needed a little cleanup. The method which finds kernels was (1) not looking for the SymbolFileSpec when creating a Module, and (2) adding that newly created Module to a Target, which GetSharedModule should not be doing - after auditing many other subclass implementations of this method, I haven't found any others doing it. Platform::GetSharedModule didn't have a headerdoc so it took a little work to piece together the intended behaviors.

This is addressing a bug where PlatformDarwinKernel::GetSharedModuleKernel would find the ObjectFile for a kernel, create a Module, and add it to the Target. Then up in DynamicLoaderDarwinKernel, it would check if the Module had a SymbolFile FileSpec, and because it did not, it would do its own search for a binary & dSYM, find them, and then add that to the Target. Now we have two copies of the Module in the Target, one with a dSYM and the other without, and only one of them has its load addresses set.

GetSharedModule should not be adding binaries to the Target, and it should set the SymbolFile FileSpec when it is creating the Module.

rdar://120895951

PlatformDarwinKernel::GetSharedModule, which can find a kernel or
kext from a local filesystem scan, needed a little cleanup.  The
method which finds kernels was (1) not looking for the SymbolFileSpec
when creating a Module, and (2) adding that newly created Module
to a Target, which GetSharedModule should not be doing - after
auditing many other subclass implementations of this method, I
haven't found any others doing it.  Platform::GetSharedModule didn't
have a headerdoc so it took a little work to piece together the
intended behaviors.

This is addressing a bug where PlatformDarwinKernel::GetSharedModuleKernel
would find the ObjectFile for a kernel, create a Module, and add it
to the Target.  Then up in DynamicLoaderDarwinKernel, it would check if
the Module had a SymbolFile FileSpec, and because it did not, it would
do its own search for a binary & dSYM, find them, and then add that to
the Target.  Now we have two copies of the Module in the Target, one
with a dSYM and the other without, and only one of them has its load
addresses set.

GetSharedModule should not be adding binaries to the Target, and it
should set the SymbolFile FileSpec when it is creating the Module.

rdar://120895951
@llvmbot
Copy link
Member

llvmbot commented Jan 19, 2024

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

PlatformDarwinKernel::GetSharedModule, which can find a kernel or kext from a local filesystem scan, needed a little cleanup. The method which finds kernels was (1) not looking for the SymbolFileSpec when creating a Module, and (2) adding that newly created Module to a Target, which GetSharedModule should not be doing - after auditing many other subclass implementations of this method, I haven't found any others doing it. Platform::GetSharedModule didn't have a headerdoc so it took a little work to piece together the intended behaviors.

This is addressing a bug where PlatformDarwinKernel::GetSharedModuleKernel would find the ObjectFile for a kernel, create a Module, and add it to the Target. Then up in DynamicLoaderDarwinKernel, it would check if the Module had a SymbolFile FileSpec, and because it did not, it would do its own search for a binary & dSYM, find them, and then add that to the Target. Now we have two copies of the Module in the Target, one with a dSYM and the other without, and only one of them has its load addresses set.

GetSharedModule should not be adding binaries to the Target, and it should set the SymbolFile FileSpec when it is creating the Module.

rdar://120895951


Full diff: https://github.com/llvm/llvm-project/pull/78652.diff

2 Files Affected:

  • (modified) lldb/include/lldb/Target/Platform.h (+26)
  • (modified) lldb/source/Plugins/Platform/MacOSX/PlatformDarwinKernel.cpp (+21-41)
diff --git a/lldb/include/lldb/Target/Platform.h b/lldb/include/lldb/Target/Platform.h
index 129e4565d9ff99..13196ff74d663f 100644
--- a/lldb/include/lldb/Target/Platform.h
+++ b/lldb/include/lldb/Target/Platform.h
@@ -288,6 +288,32 @@ class Platform : public PluginInterface {
   LocateExecutableScriptingResources(Target *target, Module &module,
                                      Stream &feedback_stream);
 
+  /// \param[in] module_spec
+  ///     The ModuleSpec of a binary to find.
+  ///
+  /// \param[in] process
+  ///     A Process.
+  ///
+  /// \param[out] module_sp
+  ///     A Module that matches the ModuleSpec, if one is found.
+  ///
+  /// \param[in] module_search_paths_ptr
+  ///     Locations to possibly look for a binary that matches the ModuleSpec.
+  ///
+  /// \param[out] old_modules
+  ///     Existing Modules in the Process' Target image list which match
+  ///     the FileSpec.
+  ///
+  /// \param[out] did_create_ptr
+  ///     Optional boolean, nullptr may be passed for this argument.
+  ///     If this method is returning a *new* ModuleSP, this
+  ///     will be set to true.
+  ///     If this method is returning a ModuleSP that is already in the
+  ///     Target's image list, it will be false.
+  ///
+  /// \return
+  ///     The Status object for any errors found while searching for
+  ///     the binary.
   virtual Status GetSharedModule(
       const ModuleSpec &module_spec, Process *process,
       lldb::ModuleSP &module_sp, const FileSpecList *module_search_paths_ptr,
diff --git a/lldb/source/Plugins/Platform/MacOSX/PlatformDarwinKernel.cpp b/lldb/source/Plugins/Platform/MacOSX/PlatformDarwinKernel.cpp
index e2839f3285cce4..6490432c72e18d 100644
--- a/lldb/source/Plugins/Platform/MacOSX/PlatformDarwinKernel.cpp
+++ b/lldb/source/Plugins/Platform/MacOSX/PlatformDarwinKernel.cpp
@@ -791,9 +791,10 @@ Status PlatformDarwinKernel::GetSharedModuleKernel(
     const ModuleSpec &module_spec, Process *process, ModuleSP &module_sp,
     const FileSpecList *module_search_paths_ptr,
     llvm::SmallVectorImpl<ModuleSP> *old_modules, bool *did_create_ptr) {
-  Status error;
   module_sp.reset();
   UpdateKextandKernelsLocalScan();
+  if (did_create_ptr)
+    *did_create_ptr = false;
 
   // First try all kernel binaries that have a dSYM next to them
   for (auto possible_kernel : m_kernel_binaries_with_dsyms) {
@@ -803,22 +804,19 @@ Status PlatformDarwinKernel::GetSharedModuleKernel(
       module_sp.reset(new Module(kern_spec));
       if (module_sp && module_sp->GetObjectFile() &&
           module_sp->MatchesModuleSpec(kern_spec)) {
-        // module_sp is an actual kernel binary we want to add.
-        if (process) {
-          const bool notify = false;
-          process->GetTarget().GetImages().AppendIfNeeded(module_sp, notify);
-          error.Clear();
-          return error;
-        } else {
-          error = ModuleList::GetSharedModule(kern_spec, module_sp, nullptr,
-                                              nullptr, nullptr);
-          if (module_sp && module_sp->GetObjectFile() &&
-              module_sp->GetObjectFile()->GetType() !=
-                  ObjectFile::Type::eTypeCoreFile) {
-            return error;
-          }
-          module_sp.reset();
-        }
+        // The dSYM is next to the binary (that's the only
+        // way it ends up in the index), but it might be a
+        // .dSYM.yaa that needs to be expanded, don't just
+        // append ".dSYM" to the filename for the SymbolFile.
+        FileSpecList search_paths =
+            process->GetTarget().GetDebugFileSearchPaths();
+        FileSpec dsym_fspec =
+            PluginManager::LocateExecutableSymbolFile(kern_spec, search_paths);
+        if (FileSystem::Instance().Exists(dsym_fspec))
+          module_sp->SetSymbolFileFileSpec(dsym_fspec);
+        if (did_create_ptr)
+          *did_create_ptr = true;
+        return {};
       }
     }
   }
@@ -836,36 +834,18 @@ Status PlatformDarwinKernel::GetSharedModuleKernel(
       module_sp.reset(new Module(kern_spec));
       if (module_sp && module_sp->GetObjectFile() &&
           module_sp->MatchesModuleSpec(kern_spec)) {
-        // module_sp is an actual kernel binary we want to add.
-        if (process) {
-          const bool notify = false;
-          process->GetTarget().GetImages().AppendIfNeeded(module_sp, notify);
-          error.Clear();
-          return error;
-        } else {
-          error = ModuleList::GetSharedModule(kern_spec, module_sp, nullptr,
-                                              nullptr, nullptr);
-          if (module_sp && module_sp->GetObjectFile() &&
-              module_sp->GetObjectFile()->GetType() !=
-                  ObjectFile::Type::eTypeCoreFile) {
-            return error;
-          }
-          module_sp.reset();
-        }
+        if (did_create_ptr)
+          *did_create_ptr = true;
+        return {};
       }
     }
   }
 
   // Give the generic methods, including possibly calling into  DebugSymbols
   // framework on macOS systems, a chance.
-  error = PlatformDarwin::GetSharedModule(module_spec, process, module_sp,
-                                          module_search_paths_ptr, old_modules,
-                                          did_create_ptr);
-  if (error.Success() && module_sp.get()) {
-    return error;
-  }
-
-  return error;
+  return PlatformDarwin::GetSharedModule(module_spec, process, module_sp,
+                                         module_search_paths_ptr, old_modules,
+                                         did_create_ptr);
 }
 
 std::vector<lldb_private::FileSpec>

@jasonmolenda
Copy link
Collaborator Author

Testing this is a bit tricky, I don't think a corefile will trigger the bug because ProcessMachCore has already done a full search for the kernel binary & dSYM and loaded it into lldb; it needs to be a live debug session, and I need something which looks enough like a kernel binary to be accepted by this (I could make a little mach-o header modifier program and that converts a userland binary into a kernel binary), but then creating an SB API test that convinces DynamicLoaderDarwinKernel to active would be very difficult.

Copy link
Member

@bulbazord bulbazord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah testing this is probably tricky to do.

Nonetheless, I think your cleanup logic makes sense. It's also nice that you've documented the intended behavior. 😄

}

return error;
return PlatformDarwin::GetSharedModule(module_spec, process, module_sp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure that this is a dumb question, so forgive me ...

It seems like we can end up here with a "useless" pointer to an instance of Module in module_sp. Fortunately, the first thing that PlatformDarwin::GetSharedModule does is to reset that pointer which means that there is no leak. However, would it make sense to do a reset before calling just in case the code in that function changes in the future?

Again, I apologize if this is a stupid question!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a dumb question at all! module_sp is an "out parameter" and not clearing an out parameter is a common opportunity for bugs. Whether it's a "bug" depends on the contract of course. Per the documentation that Jason added here ("A Module that matches the ModuleSpec, if one is found.") it should only be set when a module is found (and assuming it applies to all the overloads) the same expectation could be had about PlatformDarwin::GetSharedModule.

The obvious way to sidestep this issue is to not use out parameters at all. If the shared pointer was the only out parameter we could've potentially refactored this to return an Expected<ModuleSP> or something, but we have a handful of other out parameters that suffer from the same problem.

Regardless, as we're using smart pointers, there's no risk of a leak. Therefore I don't think the extra reset is warranted. If we really wanted to be defensive, I'd prefer to use asserts to catch cases where we're not holding ourselves to these kind of contracts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! Now that I looked at the signature and saw that module_sp is passed by reference, we assume that it will go out of the caller's scope (or the caller's, caller's scope ...) at some point and the "leaked" instance of Module would be cleaned up then!

Thank you for the explanation.

It just seemed odd to have that contain a pointer at all at that point in the code because the point of calling PlatformDarwin::GetSharedModule is that a valid one was not found. I.e., if the condition on lines 805/806 was false, it seemed to me that "A Module that matches the ModuleSpec" was not actually found and so module_sp should have been reset before either falling out of the loop or going to the next iteration.

Thank you again for the explanation!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the question, yeah Jonas is right I shouldn't have bothered reseting the shared pointer at the beginning, I did that more to demonstrate in the code that it does not have a value on entry to the method, to make it clearer for the reader. An assert would work just as well.

Copy link
Contributor

@hawkinsw hawkinsw Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to follow up again, @jasonmolenda, but I think that putting an assert at the top of PlatformDarwin::GetSharedModule might have exposed the curiosity that I am thinking about ...

Let's say that we are on the last iteration of this loop:

  // First try all kernel binaries that have a dSYM next to them
  for (auto possible_kernel : m_kernel_binaries_with_dsyms) {
    if (FileSystem::Instance().Exists(possible_kernel)) {
      ModuleSpec kern_spec(possible_kernel);
      kern_spec.GetUUID() = module_spec.GetUUID();
      module_sp.reset(new Module(kern_spec));

module_sp is reset to point to a pointer to a new instantiation of a Module. Further, assume that

      (module_sp && module_sp->GetObjectFile() &&
          module_sp->MatchesModuleSpec(kern_spec))

is not true. That means that we fall out of that loop and module_sp still points to an instance of a Module. We proceed to the next loop and, for the sake of argument, let's say that objfile_names contains 0 entries for every possible_kernel_dsym. Effectively, then, there will be no iterations of that second loop.

As a result, control will pass to PlatformDarwin::GetSharedModule with module_sp pointing to a leftover instance of a Module from the last iteration of the first loop. It seems like it should be that we reset module_sp before doing "the next" iteration on both of the loops in the function. I am sorry if it seems like I am being hard headed. I am just trying to understand and help!

Thank you for your time!!

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Instead of resetting the module_sp out variable, add an
assert that it is empty to make it easier for readers of
this method to follow the logic.
@jasonmolenda jasonmolenda merged commit 430e145 into llvm:main Jan 19, 2024
@jasonmolenda jasonmolenda deleted the clean-up-platformdarwinkernel-getsharedmodulekernel branch January 19, 2024 07:26
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Jan 19, 2024
PlatformDarwinKernel::GetSharedModule, which can find a kernel or kext
from a local filesystem scan, needed a little cleanup. The method which
finds kernels was (1) not looking for the SymbolFileSpec when creating a
Module, and (2) adding that newly created Module to a Target, which
GetSharedModule should not be doing - after auditing many other subclass
implementations of this method, I haven't found any others doing it.
Platform::GetSharedModule didn't have a headerdoc so it took a little
work to piece together the intended behaviors.

This is addressing a bug where
PlatformDarwinKernel::GetSharedModuleKernel would find the ObjectFile
for a kernel, create a Module, and add it to the Target. Then up in
DynamicLoaderDarwinKernel, it would check if the Module had a SymbolFile
FileSpec, and because it did not, it would do its own search for a
binary & dSYM, find them, and then add that to the Target. Now we have
two copies of the Module in the Target, one with a dSYM and the other
without, and only one of them has its load addresses set.

GetSharedModule should not be adding binaries to the Target, and it
should set the SymbolFile FileSpec when it is creating the Module.

rdar://120895951
(cherry picked from commit 430e145)
jasonmolenda added a commit to swiftlang/llvm-project that referenced this pull request Jan 20, 2024
…rmdarwinkernel-getsharedmodule

Clean up PlatformDarwinKernel::GetSharedModule, document (llvm#78652)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants