Skip to content

Remove redundant symbol lookups in IRExecutionUnit::FindInSymbols #102835

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 7, 2024

Conversation

DmT021
Copy link
Contributor

@DmT021 DmT021 commented Aug 11, 2024

When we search for a symbol, we first check if it is in the module_sp of the current SymbolContext, and if not, we check in the target's modules. However, the target's ModuleList also includes the already checked module, which leads to a redundant search in it.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@DmT021
Copy link
Contributor Author

DmT021 commented Aug 11, 2024

@augusto2112 Please take a look

Copy link
Contributor

@augusto2112 augusto2112 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think overall this looks good! Just some small things to fix.

Could you run ninja check-lldb from your build folder to make sure this change doesn't break anything?

if (auto load_addr = resolver.Resolve(sc_list))
return *load_addr;
}

if (sc.target_sp) {
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you removed the check here, you can remove the parenthesis as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just wanted to emphasize that these two lookups are similar, but intentionally separeted.

@augusto2112 augusto2112 requested a review from bulbazord August 12, 2024 17:09
@DmT021
Copy link
Contributor Author

DmT021 commented Aug 12, 2024

Could you run ninja check-lldb from your build folder to make sure this change doesn't break anything?

Already did. All tests are ok.

@augusto2112
Copy link
Contributor

@DmT021 I think you can update this patch and set it ready for review then :)

Copy link
Member

@bulbazord bulbazord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me and the change looks good. Feel free to move this out of draft mode :)

@DmT021 DmT021 force-pushed the wp/redundant-symbol-lookups-main branch from c417f45 to 45c91e4 Compare August 12, 2024 17:49
@DmT021 DmT021 marked this pull request as ready for review August 12, 2024 17:49
@DmT021 DmT021 requested a review from JDevlieghere as a code owner August 12, 2024 17:49
@llvmbot llvmbot added the lldb label Aug 12, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 12, 2024

@llvm/pr-subscribers-backend-hexagon

@llvm/pr-subscribers-lldb

Author: Dmitrii Galimzianov (DmT021)

Changes

When we search for a symbol, we first check if it is in the module_sp of the current SymbolContext, and if not, we check in the target's modules. However, the target's ModuleList also includes the already checked module, which leads to a redundant search in it.


Full diff: https://github.com/llvm/llvm-project/pull/102835.diff

1 Files Affected:

  • (modified) lldb/source/Expression/IRExecutionUnit.cpp (+15-12)
diff --git a/lldb/source/Expression/IRExecutionUnit.cpp b/lldb/source/Expression/IRExecutionUnit.cpp
index f220704423627d..aaf9cc3cfc9a23 100644
--- a/lldb/source/Expression/IRExecutionUnit.cpp
+++ b/lldb/source/Expression/IRExecutionUnit.cpp
@@ -785,6 +785,10 @@ IRExecutionUnit::FindInSymbols(const std::vector<ConstString> &names,
     return LLDB_INVALID_ADDRESS;
   }
 
+  ModuleList images = target->GetImages();
+  // We'll process module_sp separately, before the other modules.
+  images.Remove(sc.module_sp);
+
   LoadAddressResolver resolver(target, symbol_was_missing_weak);
 
   ModuleFunctionSearchOptions function_options;
@@ -799,23 +803,22 @@ IRExecutionUnit::FindInSymbols(const std::vector<ConstString> &names,
                                   sc_list);
       if (auto load_addr = resolver.Resolve(sc_list))
         return *load_addr;
-    }
 
-    if (sc.target_sp) {
-      SymbolContextList sc_list;
-      sc.target_sp->GetImages().FindFunctions(name, lldb::eFunctionNameTypeFull,
-                                              function_options, sc_list);
+      sc.module_sp->FindSymbolsWithNameAndType(name, lldb::eSymbolTypeAny,
+                                               sc_list);
       if (auto load_addr = resolver.Resolve(sc_list))
         return *load_addr;
     }
 
-    if (sc.target_sp) {
-      SymbolContextList sc_list;
-      sc.target_sp->GetImages().FindSymbolsWithNameAndType(
-          name, lldb::eSymbolTypeAny, sc_list);
-      if (auto load_addr = resolver.Resolve(sc_list))
-        return *load_addr;
-    }
+    SymbolContextList sc_list;
+    images.FindFunctions(name, lldb::eFunctionNameTypeFull, function_options,
+                         sc_list);
+    if (auto load_addr = resolver.Resolve(sc_list))
+      return *load_addr;
+
+    images.FindSymbolsWithNameAndType(name, lldb::eSymbolTypeAny, sc_list);
+    if (auto load_addr = resolver.Resolve(sc_list))
+      return *load_addr;
 
     lldb::addr_t best_internal_load_address =
         resolver.GetBestInternalLoadAddress();

Copy link
Member

@Michael137 Michael137 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, makes sense to me. Out of curiosity, could you provide some context around where/how this came up?

Ideally we would have versions of FindFunctions/FindSymbolsWithNameAndType that take a ModuleSP hint (aka where to search first). We already have such an API for FindTypes (which got introduced with the TypeQuery rework). We just never added one for the other FindXXX APIs because those weren't reworked to use TypeQuery.

Not sure it's worth adding that search_first parameter to all overloads of FindFunctions, but maybe that wouldn't look as awful as I think it might?

@DmT021
Copy link
Contributor Author

DmT021 commented Aug 12, 2024

Out of curiosity, could you provide some context around where/how this came up?

I just noticed this by accident in the dwarf all log when I was trying to figure out why LLDB was evaluating expressions slowly in my project.
Here's a swift forum thread with some logs.
https://forums.swift.org/t/symbolfiledwarf-findtypes-is-called-too-many-times-apparently/72902
Notice for example how swift_release is searched twice in F1.o

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this does solve the problem, it means other clients that want to do the same thing will need to create this same kind of code.

I would propose we modify ModuleList::FindSymbolsWithNameAndType(...) to take an extra parameter SymbolContext &sc which indicates the context from which we are doing the search:

void ModuleList::FindSymbolsWithNameAndType(
ConstString name,
lldb::SymbolType symbol_type,
const SymbolContext &sc, /// <-- The symbol context from which to perform the lookup
SymbolContextList &sc_list) const;

Then the code inside will check if "sc" has a module, and if so, prefer that module first since someone is performing the search from somewhere (like an expression has a symbol context of the location in the current stack frame).

Then we don't duplicate this kind of code everywhere.

We need these SymbolContext &sc in a lot of searches:

  • search for types for an expression so we can prefer types from the current function, then compile unit, then module, and all modules.
  • search for functions, for same reason

@augusto2112
Copy link
Contributor

I agree that passing some sort of "hint" object to prioritize the search would be a more complete solution, but since this is @DmT021's first contribution to LLVM I think we should merge this change if there are no concerns with the current implementation, and if @DmT021 wants to expand on a more general solution for the Find... family of functions he could work on that on a separate patch.

@clayborg I can see how a module could be used as a hint, but I'm not sure how we'd prioritize lookups using functions or compile unit. The code in SymTab.cpp does a binary search to find the symbol, how could we restrict this further by passing a function hint, for example?

@clayborg
Copy link
Collaborator

I agree that passing some sort of "hint" object to prioritize the search would be a more complete solution, but since this is @DmT021's first contribution to LLVM I think we should merge this change if there are no concerns with the current implementation, and if @DmT021 wants to expand on a more general solution for the Find... family of functions he could work on that on a separate patch.

It isn't required, but it really isn't any more work to move the functionality so it can be re-used. I am fine either way, but would ok it right away if we just moved the functionality.

@clayborg I can see how a module could be used as a hint, but I'm not sure how we'd prioritize lookups using functions or compile unit. The code in SymTab.cpp does a binary search to find the symbol, how could we restrict this further by passing a function hint, for example?

For symbol lookups, we would only check the Module. If there is not module, we would just search through the target's image list. If there was a module, then we search it first, and avoid that module if we search the module list just by checking the pointer of the module when iterating over the target's module list. The full symbol context isn't required, but it would be nice to set the precedent that others can duplicate when doing lookups of other kinds.

Type lookups are the ones that can benefit from having a SymbolContext to use as the search context. If we type in an expression that mentions a typename, if would be good to start looking in the current function, then fall back to the compile unit, then fall back to the module, and then fall back to all modules. That would help find the best type when searching, and I believe we do have some code that does something similar, but it is done after getting all of the results back.

@DmT021 DmT021 force-pushed the wp/redundant-symbol-lookups-main branch from 45c91e4 to e20d83c Compare August 13, 2024 05:19
@DmT021
Copy link
Contributor Author

DmT021 commented Aug 13, 2024

@clayborg Please check if I got you right. I moved this logic into ModuleList, but only to FindSymbolsWithNameAndType and FindFunctions methods, since they were the only ones used in the original IRExecutionUnit::FindInSymbols.
Also, I'm not sure if using SymbolContext() explicitly in places where we don't have context is a better solution, then having a default. But I think it will lead to better discoverability that such optimization already exists.

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for trying this, and yes, this is exactly what I was thinking!

After seeing all of the code changes that this created, I suggested in ModuleList.h that we add a the symbol context as a default parameter to the functions as const SymbolContext *sc = nullptr. If we do this, most of the changes in this diff go away and it will be small again. Also if the symbol context comes in a as a pointer, users can know that this parameter is optional and it is ok for it to be nullptr, where it might not be clear to users that they can pass in a default constructed SymbolContext.

Thanks for the changes, this improves this patch quite a bit and I think with a few more modifications this will be a small patch again.

Thanks again! I look forward to seeing the next round of changes.

Comment on lines 286 to 288
void FindFunctions(ConstString name, lldb::FunctionNameType name_type_mask,
const ModuleFunctionSearchOptions &options,
SymbolContextList &sc_list) const;
const SymbolContext &sc, SymbolContextList &sc_list) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to reduce the number of changes in the codebase if we add the symbol context at the end and make it a pointer. If the pointer is NULL, then no context matching will occur, else it will try to use the sc->module_sp. It will help reduce the changes in our codebase and having a pointer might help indicate that the symbol context is optional.

Maybe we do:


void FindFunctions(ConstString name, lldb::FunctionNameType name_type_mask,
                     const ModuleFunctionSearchOptions &options,
                     SymbolContextList &sc_list, const SymbolContext *sc = nullptr) const;

@@ -295,7 +295,7 @@ class ModuleList {
/// \see Module::FindFunctions ()
void FindFunctions(const RegularExpression &name,
const ModuleFunctionSearchOptions &options,
SymbolContextList &sc_list);
const SymbolContext &sc, SymbolContextList &sc_list);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto from above

Comment on lines 356 to 358
void FindSymbolsWithNameAndType(ConstString name,
lldb::SymbolType symbol_type,
const SymbolContext &sc,
SymbolContextList &sc_list) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Comment on lines 1767 to 1768
target_sp->GetImages().FindFunctions(
ConstString(name), mask, function_options, SymbolContext(), *sb_sc_list);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use the default parameter as suggested above, all changes in this file go away.

Comment on lines 354 to 355
GetTarget().GetImages().FindFunctions(
name, eFunctionNameTypeAuto, function_options, SymbolContext(), sc_list);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use the default parameter as suggested above, all changes in this file go away.

@clayborg
Copy link
Collaborator

clayborg commented Aug 13, 2024

We should also document in the headerdoc this new const SymbolContext *sc = nullptr argument and say something like:

/// \param[in] sc If the value is NULL, then all modules will be searched in 
/// order. If the value is a valid pointer and if a module is specified in the 
/// symbol context, that module will be searched first followed by all other
/// modules in the list.

@@ -895,32 +895,14 @@ const Symbol *SymbolContext::FindBestGlobalDataSymbol(ConstString name,
}
};

if (module) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay

for (const ModuleSP &module_sp : m_modules)
module_sp->FindSymbolsWithNameAndType(name, symbol_type, sc_list);
if (module_sp != sc.module_sp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extremely minor nit, but can we be consistent with how we skip over the module hint?
I.e., either always do:

if (module_sp != sc.module_sp)
  ...

or

if (module_sp == sc.module_sp)
  continue;

Which one to use I'm not too bothered about. Just makes it easer to read if this is consistent in all the FindSymbolsWithNameAndType/FindFunctions APIs

@@ -285,7 +285,7 @@ class ModuleList {
/// \see Module::FindFunctions ()
void FindFunctions(ConstString name, lldb::FunctionNameType name_type_mask,
const ModuleFunctionSearchOptions &options,
SymbolContextList &sc_list) const;
const SymbolContext &sc, SymbolContextList &sc_list) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we call sc, search_first instead, or something more self-documenting like that? (which is what FindTypes also does).

And as Greg pointed out, could we add documentation to the function describing what the parameter does?

Copy link
Contributor Author

@DmT021 DmT021 Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we call sc, search_first instead, or something more self-documenting like that? (which is what FindTypes also does).

The word "hint" came up a couple times in the discussion as well. Maybe search_hint ?

could we add documentation to the function describing what the parameter does?

Sure.

@DmT021
Copy link
Contributor Author

DmT021 commented Aug 13, 2024

Oh, wait a sec. I actually changed the behavior of the IRExecutionUnit::FindInSymbols. It used to exit early if the function was found in module_sp, but now we will always scan through the whole ModuleList.
And we can't change the behavior of the ModuleList::FindFunctions to return after the first match, because it might be not what we want in general case.
Maybe I should add more generic versions of these functions taking a callback to invoke on each match instead of SymbolContextList? Something like

void ModuleList::FindSymbolsWithNameAndType(
  ConstString name,
  lldb::SymbolType symbol_type,
  const SymbolContext *search_hint,
  llvm::function_ref<bool(const Symbol&)> callback
  )

where the result of callback indicates whether the search should be continued or not.

@augusto2112
Copy link
Contributor

@clayborg I can see how a module could be used as a hint, but I'm not sure how we'd prioritize lookups using functions or compile unit. The code in SymTab.cpp does a binary search to find the symbol, how could we restrict this further by passing a function hint, for example?

For symbol lookups, we would only check the Module. If there is not module, we would just search through the target's image list. If there was a module, then we search it first, and avoid that module if we search the module list just by checking the pointer of the module when iterating over the target's module list. The full symbol context isn't required, but it would be nice to set the precedent that others can duplicate when doing lookups of other kinds.

If that's the case, personally I think that it would make more sense if FindFunctions and FindSymbolsWithNameAndType took in a module because it wouldn't be possible to pass in something that doesn't make sense to those functions (like a symbol context with only a compile unit set, for example).

@Michael137
Copy link
Member

Oh, wait a sec. I actually changed the behavior of the IRExecutionUnit::FindInSymbols. It used to exit early if the function was found in module_sp, but now we will always scan through the whole ModuleList. And we can't change the behavior of the ModuleList::FindFunctions to return after the first match, because it might be not what we want in general case. Maybe I should add more generic versions of these functions taking a callback to invoke on each match instead of SymbolContextList? Something like

void ModuleList::FindSymbolsWithNameAndType(
  ConstString name,
  lldb::SymbolType symbol_type,
  const SymbolContext *search_hint,
  llvm::function_ref<bool(const Symbol&)> callback
  )

where the result of callback indicates whether the search should be continued or not.

Can we make it configurable behind the ModuleSearchOptions? This is what the FindTypes API does via TypeQuery (which has a bunch of flags to configure your search, including eFindOne, etc.). There's already a ModuleFunctionSearchOptions that could be re-used for this?

@augusto2112
Copy link
Contributor

Oh, wait a sec. I actually changed the behavior of the IRExecutionUnit::FindInSymbols. It used to exit early if the function was found in module_sp, but now we will always scan through the whole ModuleList. And we can't change the behavior of the ModuleList::FindFunctions to return after the first match, because it might be not what we want in general case. Maybe I should add more generic versions of these functions taking a callback to invoke on each match instead of SymbolContextList? Something like

void ModuleList::FindSymbolsWithNameAndType(
  ConstString name,
  lldb::SymbolType symbol_type,
  const SymbolContext *search_hint,
  llvm::function_ref<bool(const Symbol&)> callback
  )

where the result of callback indicates whether the search should be continued or not.

I think it'd be fine to return early if the type if found with the search_hint, as long as this is clearly documented in the API. My impression is that this would be a more useful behavior than just reordering the search results so the search_hint results are first, and I don't think we need the extra flexibility provided by that callback at the moment. I don't touch this code very often though so if @clayborg or @Michael137 disagree I'd defer to them.

Copy link
Member

@Michael137 Michael137 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't thoroughly reviewed the latest callback-based approach but wonder if it is just more straightforward to keep the original plan of adding asearch_hint to the FindFunctions APIs, and add a flag to ModuleFunctionSearchOptions that indicates whether we're looking for a single function or multiple. And if we're looking for a single function, short-circuit if we find one. Thanks for working through our suggestions so far!

@DmT021
Copy link
Contributor Author

DmT021 commented Aug 19, 2024

The tricky part is in the post-processing. IRExecutionUnit::FindInSymbols wants an external symbol and falls back to an internal one after all the modules are scanned. SymbolContext::FindBestGlobalDataSymbol prefers an internal symbol if it is found in the hinted module.
Also resolution of a reexported symbol is part of the post-processing

@Michael137
Copy link
Member

Michael137 commented Aug 21, 2024

I would still prefer it if we just added a search_hint into ModuleFunctionSearchOptions (and as a parameter to FindSymbolsWithNameAndType). And document that if this hint is provided, we short-circuit the search. I'm not sure there is a lot of use in having a search_hint but still continue trawling through all the images even if we found it in the module we hinted at.

In the future we could consider consolidating the FindFunctions APIs, similar to what Greg did with FindTypes (e.g., we do we need both a CompilerContext and CompilerDeclContext based API, feels like those should really be implemented in terms of each other. But maybe that's the extent of which this can be cleaned up? Would need to think more about this). I would just prefer to keep the current patch small so it doesn't inhibit refactoring in the future.

I'll let @clayborg @augusto2112 chime in here

@augusto2112
Copy link
Contributor

I agree with Michael, I think just adding the search hint and clearly documenting it would already be a great improvement and would make implementing this patch a lot simpler.

@DmT021
Copy link
Contributor Author

DmT021 commented Aug 21, 2024

I'm not sure I understand because this will break the logic that is currently in place. Many nuances in post-processing wouldn't be possible to implement with a simple short-circuit.
If a symbol is found in the hinted module, it might be internal. IRExecutionUnit::FindInSymbols processes the results by preferring external symbols over internal ones. If I short-circuit the search after the first match in the hinted module IRExecutionUnit::FindInSymbols just wouldn't see potentially more preferable results.
On the other hand SymbolContext::FindBestGlobalDataSymbol prefers symbols of any type from the hinted module over the other symbols. But it still prefers external symbols over internal ones. So the priority order looks like this: hint external > hint internal > other external > other internal.
Moreover, SymbolContext::FindBestGlobalDataSymbol verifies that there are no multiple matches found, but it's tricky as well. It checks if there's a result in the module - it does verification only matches from the module, otherwise it verifies all the results.
Perhaps I can't explain all the nuances quite accurately, you should look at the post-processing code itself in these functions. But the most important thing is that we don't have a full understanding of when we can stop inside FindFunction/FindSymbolsWithNameAndType.

@Michael137
Copy link
Member

Michael137 commented Aug 26, 2024

I'm not sure I understand because this will break the logic that is currently in place. Many nuances in post-processing wouldn't be possible to implement with a simple short-circuit. If a symbol is found in the hinted module, it might be internal. IRExecutionUnit::FindInSymbols processes the results by preferring external symbols over internal ones. If I short-circuit the search after the first match in the hinted module IRExecutionUnit::FindInSymbols just wouldn't see potentially more preferable results. On the other hand SymbolContext::FindBestGlobalDataSymbol prefers symbols of any type from the hinted module over the other symbols. But it still prefers external symbols over internal ones. So the priority order looks like this: hint external > hint internal > other external > other internal. Moreover, SymbolContext::FindBestGlobalDataSymbol verifies that there are no multiple matches found, but it's tricky as well. It checks if there's a result in the module - it does verification only matches from the module, otherwise it verifies all the results. Perhaps I can't explain all the nuances quite accurately, you should look at the post-processing code itself in these functions. But the most important thing is that we don't have a full understanding of when we can stop inside FindFunction/FindSymbolsWithNameAndType.

Sorry, just getting back to this. Got lost in my review queue.

Ok thanks for pointing this out. I didn't realize Resolve was tracking the internal load address and only succeeding when it found an external symbol. That's quite a subtle API. I would agree with Greg's expectation of how symbol resolution should work when evaluating expressions. Starting with the most local match, search outward until we find something appropriate to call. I'm not sure why exactly we have the current external/internal preference order, and also why FindInSymbols and FindBestGlobalDataSymbol differ slightly in how they deal with the local module. I suspect these just naturally evolved as the expression parser use-cases grew.

What happens if we stop preferring the external symbols over internal ones in IRExecutionUnit::FindInSymbols? What tests break?

(side-note, ClangExpressionDeclMap::LookupFunction also has logic that prefers an external symbol over a non-external one..)

@DmT021
Copy link
Contributor Author

DmT021 commented Aug 28, 2024

What happens if we stop preferring the external symbols over internal ones in IRExecutionUnit::FindInSymbols? What tests break?

It looks like there are no failing tests.

@clayborg
Copy link
Collaborator

What happens if we stop preferring the external symbols over internal ones in IRExecutionUnit::FindInSymbols? What tests break?

It looks like there are no failing tests.

We should always prefer symbols from the current module first, probably external first, then fall back to internal. If we do a search of all modules, we should prefer external symbols first and then internal, but only if they are unique. Just think about how a symbol would be resolved inside of a shared library vs how it would get accessed from outside of the shared library. Debuggers can break the rules when we need to, but we should try to stay true to how things would actually happen when possible first.

@DmT021
Copy link
Contributor Author

DmT021 commented Aug 28, 2024

We should always prefer symbols from the current module first, probably external first, then fall back to internal. If we do a search of all modules, we should prefer external symbols first and then internal, but only if they are unique.

So it's current module external, current module internal, other modules external, other modules internal, right? This is how SymbolContext.cpp implements it.
What if we use SymbolContext::FindBestGlobalDataSymbol and SymbolContext::FindBestFunction (a new function*) in IRExecutionUnit::FindInSymbols? This way we can simplify LoadAddressResolver by removing all this logic about tracking internal symbols.

* SymbolContext::FindBestFunction will use the same lookup order but search functions instead of symbols.

@DmT021
Copy link
Contributor Author

DmT021 commented Aug 28, 2024

Nope, LoadAddressResolver has its own logic about eSymbolTypeUndefined that doesn't match FindBestGlobalDataSymbol. And it breaks TestWeakSymbols.TestWeakSymbolsInExpressions.test_weak_symbol_in_expr.
At this point, I think it would be wiser to return to the original patch where I just remove module module_sp from a copy of ModuleList.

@clayborg
Copy link
Collaborator

Don't let us hold this patch up if returning the the original patch that fixes things makes things work if you can't easily get the same functionality from this version of the patch. Will one part of this patch work with the new stuff and the other part not? If so, maybe we allow the place that works use the new functionality and just hard code the other one?

@Michael137
Copy link
Member

Michael137 commented Aug 28, 2024

What happens if we stop preferring the external symbols over internal ones in IRExecutionUnit::FindInSymbols? What tests break?

It looks like there are no failing tests.

We should always prefer symbols from the current module first, probably external first, then fall back to internal. If we do a search of all modules, we should prefer external symbols first and then internal, but only if they are unique. Just think about how a symbol would be resolved inside of a shared library vs how it would get accessed from outside of the shared library. Debuggers can break the rules when we need to, but we should try to stay true to how things would actually happen when possible first.

Just for my own understanding, could you elaborate on this? If we're doing a local module search and find an internal symbol (AFAIU, here an internal symbol means local in the sense of STB_LOCAL, not a weak definition) why would we still prefer the external symbol? As you said in your previous comment: "If you have an expression like my_global, you really want to look at the compile unit your expression is being run from, and if that compile unit has a global or static variable named my_global, we should pick that one. If there isn't a match in the current compile unit, then we should search in the current module. If it has one, that is most likely the global the user wants."

@jimingham
Copy link
Collaborator

jimingham commented Aug 28, 2024 via email

@clayborg
Copy link
Collaborator

I think Greg is saying that if we don't find a local symbol, and don't find the symbol in the global symbols of the current module, then we should search all the other modules, but in that case preferring external symbols to internal since those are the ones that are potentially visible from the CU we started from. Jim

On Aug 28, 2024, at 1:04 AM, Michael Buch @.***> wrote: What happens if we stop preferring the external symbols over internal ones in IRExecutionUnit::FindInSymbols? What tests break? It looks like there are no failing tests. We should always prefer symbols from the current module first, probably external first, then fall back to internal. If we do a search of all modules, we should prefer external symbols first and then internal, but only if they are unique. Just think about how a symbol would be resolved inside of a shared library vs how it would get accessed from outside of the shared library. Debuggers can break the rules when we need to, but we should try to stay true to how things would actually happen when possible first. Just for my own understanding, could you elaborate on this? If we're doing a local module search and find an internal symbol (AFAIU, here an internal symbol means local to the CU, not a weak definition) why would we still prefer the external symbol? As you said in your previous comment <#102835 (comment)>: "If you have an expression like my_global, you really want to look at the compile unit your expression is being run from, and if that compile unit has a global or static variable named my_global, we should pick that one. If there isn't a match in the current compile unit, then we should search in the current module. If it has one, that is most likely the global the user wants." — Reply to this email directly, view it on GitHub <#102835 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADUPVW6JZ2OQ6EGACD72SYDZTWACNAVCNFSM6AAAAABMLEK6UGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJUGYYTINJXGA. You are receiving this because you are on a team that was mentioned.

yes. If we can do it, find something in the local compile unit first, if there are two symbols in the CU, then the public one, else the private. Fall back to the current module, first public, then private. And repeat with all modules, and if we find only 1 public symbol, use that, else fall back to a single private symbol. If there are multiple symbols on all modules, then we emit an error.

@DmT021
Copy link
Contributor Author

DmT021 commented Aug 28, 2024

Don't let us hold this patch up if returning the the original patch that fixes things makes things work if you can't easily get the same functionality from this version of the patch. Will one part of this patch work with the new stuff and the other part not? If so, maybe we allow the place that works use the new functionality and just hard code the other one?

The current version of the patch does archive the desired result (no redundant scan of the current module) and all the tests are green. But the issue is it's awfully complex. All this callback logic is hard to read and even harder to modify next time. Especially in SymbolContext.
I'd prefer a slightly repetitive approach. At each lookup function (in this case SymbolContext::FindBestGlobalDataSymbol and IRExecutionUnit::FindInSymbols):

  • check the current module
  • if no suitable result is found:
    • get a copy of ModuleList and remove current module from it
    • search through the copy.

But I don't insist; if you're ok with this implementation, I'm ok too.

@Michael137
Copy link
Member

Michael137 commented Aug 28, 2024

I think Greg is saying that if we don't find a local symbol, and don't find the symbol in the global symbols of the current module, then we should search all the other modules, but in that case preferring external symbols to internal since those are the ones that are potentially visible from the CU we started from.

yes. If we can do it, find something in the local compile unit first, if there are two symbols in the CU, then the public one, else the private.

I guess here lies my hangup. What is the distinction between local and private lldb_private::Symbols here? And why prefer the global over the private (if we're stopped in a context where the private symbol is visible according to language rules)?

@jimingham
Copy link
Collaborator

jimingham commented Aug 29, 2024 via email

@DmT021 DmT021 force-pushed the wp/redundant-symbol-lookups-main branch from c647b26 to 424b3b7 Compare September 23, 2024 22:34
@DmT021
Copy link
Contributor Author

DmT021 commented Sep 24, 2024

@clayborg @Michael137 I've reverted the changes here and rewrote this patch pretty much like it was originally presented with a slightly different order of operation. The only change is in the IRExecutionUnit::FindInSymbols function and the order of lookup is described in the comment here. If you don't mind let's merge this variant.
In my opinion, the variant with lambda is too complex for a little win and I'm afraid future changes in this logic will be even more complicated.

@DmT021
Copy link
Contributor Author

DmT021 commented Oct 3, 2024

ping?

Copy link
Member

@Michael137 Michael137 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we're paying the cost of copying the entire list of module pointers (and removing an element) for one less FindFunctions/FindSymbolsWithNameAndType call. Probably fine? But it still feels a bit off having to do this (though as you demonstrated, baking it into the FindFunctions API has proven to be pretty hairy).

Why is it so expensive to perform consecutive lookups into the same module for the same name? Should that be perhaps where we should focus the performance investigation on? So users don't need to care about how many times we're doing the lookup?

But if others are ok with this tradeoff here, I don't want to further block this PR

sc.module_sp->FindFunctions(name, CompilerDeclContext(),
lldb::eFunctionNameTypeFull, function_options,
sc_list);
if (auto load_addr = resolver.Resolve(sc_list))
return *load_addr;
sc_list.Clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create a new block scope with a local sc_list instead of clearing the list each time? Might be more readable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remembered why it was calling Clear() - to avoid repetitive allocations of the internal vector in SymbolContextList.

@DmT021
Copy link
Contributor Author

DmT021 commented Oct 3, 2024

Why is it so expensive to perform consecutive lookups into the same module for the same name?
It depends on circumstances but basically, the worst-case scenario is when you:

  1. lookup a symbol in the context of a huge module
  2. the symbol is in another module
  3. the huge module appears before the other module in the ModuleList

For example, I have a project that actively links its dependencies statically when possible, so the total number of symbols in the main executable is enormous(order of 10^6). When I perform a lookup for a foreign symbol (let's say swift_retain) in the context of this module I'll always search through this huge module twice.
And this is probably quite common in practice. One will likely evaluate an expression while standing on a breakpoint in the main executable, and swift_retain will likely be looked up for any expression evaluation, and libswiftCore will appear after the main executable in the module list.

Should that be perhaps where we should focus the performance investigation on? So users don't need to care about how many times we're doing the lookup?

I was thinking about it and these two optimizations in particular:

  1. Swift symbols contain module names. So we probably can infer a pretty good guess about the module from the symbol itself.
  2. We probably can build an index of all the symbols in all the modules. But keeping this index in sync with the target's module list doesn't seem to be an easy task.

Both of these approaches are of the scope of this change though.

@DmT021 DmT021 force-pushed the wp/redundant-symbol-lookups-main branch from 424b3b7 to aad4d1c Compare October 3, 2024 14:12
Copy link
Member

@Michael137 Michael137 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the discussion so far this seems like a good enough perf. improvement without having to revamp the FindFunctions API. So lgtm, thanks for the patience.

Please give it a couple of days before merging in case Greg (or someone else) have any other comments

@DmT021
Copy link
Contributor Author

DmT021 commented Oct 7, 2024

@Michael137 I don't have merge rights, can you merge this pls?

@augusto2112 augusto2112 merged commit d2457e6 into llvm:main Oct 7, 2024
7 checks passed
Copy link

github-actions bot commented Oct 7, 2024

@DmT021 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@DmT021
Copy link
Contributor Author

DmT021 commented Oct 7, 2024

@augusto2112 Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants