-
Notifications
You must be signed in to change notification settings - Fork 14.3k
Remove redundant symbol lookups in IRExecutionUnit::FindInSymbols #102835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@augusto2112 Please take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think overall this looks good! Just some small things to fix.
Could you run ninja check-lldb
from your build folder to make sure this change doesn't break anything?
if (auto load_addr = resolver.Resolve(sc_list)) | ||
return *load_addr; | ||
} | ||
|
||
if (sc.target_sp) { | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you removed the check here, you can remove the parenthesis as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I just wanted to emphasize that these two lookups are similar, but intentionally separeted.
Already did. All tests are ok. |
@DmT021 I think you can update this patch and set it ready for review then :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me and the change looks good. Feel free to move this out of draft mode :)
c417f45
to
45c91e4
Compare
@llvm/pr-subscribers-backend-hexagon @llvm/pr-subscribers-lldb Author: Dmitrii Galimzianov (DmT021) ChangesWhen we search for a symbol, we first check if it is in the module_sp of the current SymbolContext, and if not, we check in the target's modules. However, the target's ModuleList also includes the already checked module, which leads to a redundant search in it. Full diff: https://github.com/llvm/llvm-project/pull/102835.diff 1 Files Affected:
diff --git a/lldb/source/Expression/IRExecutionUnit.cpp b/lldb/source/Expression/IRExecutionUnit.cpp
index f220704423627d..aaf9cc3cfc9a23 100644
--- a/lldb/source/Expression/IRExecutionUnit.cpp
+++ b/lldb/source/Expression/IRExecutionUnit.cpp
@@ -785,6 +785,10 @@ IRExecutionUnit::FindInSymbols(const std::vector<ConstString> &names,
return LLDB_INVALID_ADDRESS;
}
+ ModuleList images = target->GetImages();
+ // We'll process module_sp separately, before the other modules.
+ images.Remove(sc.module_sp);
+
LoadAddressResolver resolver(target, symbol_was_missing_weak);
ModuleFunctionSearchOptions function_options;
@@ -799,23 +803,22 @@ IRExecutionUnit::FindInSymbols(const std::vector<ConstString> &names,
sc_list);
if (auto load_addr = resolver.Resolve(sc_list))
return *load_addr;
- }
- if (sc.target_sp) {
- SymbolContextList sc_list;
- sc.target_sp->GetImages().FindFunctions(name, lldb::eFunctionNameTypeFull,
- function_options, sc_list);
+ sc.module_sp->FindSymbolsWithNameAndType(name, lldb::eSymbolTypeAny,
+ sc_list);
if (auto load_addr = resolver.Resolve(sc_list))
return *load_addr;
}
- if (sc.target_sp) {
- SymbolContextList sc_list;
- sc.target_sp->GetImages().FindSymbolsWithNameAndType(
- name, lldb::eSymbolTypeAny, sc_list);
- if (auto load_addr = resolver.Resolve(sc_list))
- return *load_addr;
- }
+ SymbolContextList sc_list;
+ images.FindFunctions(name, lldb::eFunctionNameTypeFull, function_options,
+ sc_list);
+ if (auto load_addr = resolver.Resolve(sc_list))
+ return *load_addr;
+
+ images.FindSymbolsWithNameAndType(name, lldb::eSymbolTypeAny, sc_list);
+ if (auto load_addr = resolver.Resolve(sc_list))
+ return *load_addr;
lldb::addr_t best_internal_load_address =
resolver.GetBestInternalLoadAddress();
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, makes sense to me. Out of curiosity, could you provide some context around where/how this came up?
Ideally we would have versions of FindFunctions
/FindSymbolsWithNameAndType
that take a ModuleSP
hint (aka where to search first). We already have such an API for FindTypes
(which got introduced with the TypeQuery
rework). We just never added one for the other FindXXX
APIs because those weren't reworked to use TypeQuery
.
Not sure it's worth adding that search_first
parameter to all overloads of FindFunctions
, but maybe that wouldn't look as awful as I think it might?
I just noticed this by accident in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this does solve the problem, it means other clients that want to do the same thing will need to create this same kind of code.
I would propose we modify ModuleList::FindSymbolsWithNameAndType(...)
to take an extra parameter SymbolContext &sc
which indicates the context from which we are doing the search:
void ModuleList::FindSymbolsWithNameAndType(
ConstString name,
lldb::SymbolType symbol_type,
const SymbolContext &sc, /// <-- The symbol context from which to perform the lookup
SymbolContextList &sc_list) const;
Then the code inside will check if "sc" has a module, and if so, prefer that module first since someone is performing the search from somewhere (like an expression has a symbol context of the location in the current stack frame).
Then we don't duplicate this kind of code everywhere.
We need these SymbolContext &sc
in a lot of searches:
- search for types for an expression so we can prefer types from the current function, then compile unit, then module, and all modules.
- search for functions, for same reason
I agree that passing some sort of "hint" object to prioritize the search would be a more complete solution, but since this is @DmT021's first contribution to LLVM I think we should merge this change if there are no concerns with the current implementation, and if @DmT021 wants to expand on a more general solution for the @clayborg I can see how a module could be used as a hint, but I'm not sure how we'd prioritize lookups using functions or compile unit. The code in |
It isn't required, but it really isn't any more work to move the functionality so it can be re-used. I am fine either way, but would ok it right away if we just moved the functionality.
For symbol lookups, we would only check the Module. If there is not module, we would just search through the target's image list. If there was a module, then we search it first, and avoid that module if we search the module list just by checking the pointer of the module when iterating over the target's module list. The full symbol context isn't required, but it would be nice to set the precedent that others can duplicate when doing lookups of other kinds. Type lookups are the ones that can benefit from having a SymbolContext to use as the search context. If we type in an expression that mentions a typename, if would be good to start looking in the current function, then fall back to the compile unit, then fall back to the module, and then fall back to all modules. That would help find the best type when searching, and I believe we do have some code that does something similar, but it is done after getting all of the results back. |
45c91e4
to
e20d83c
Compare
@clayborg Please check if I got you right. I moved this logic into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for trying this, and yes, this is exactly what I was thinking!
After seeing all of the code changes that this created, I suggested in ModuleList.h that we add a the symbol context as a default parameter to the functions as const SymbolContext *sc = nullptr
. If we do this, most of the changes in this diff go away and it will be small again. Also if the symbol context comes in a as a pointer, users can know that this parameter is optional and it is ok for it to be nullptr
, where it might not be clear to users that they can pass in a default constructed SymbolContext.
Thanks for the changes, this improves this patch quite a bit and I think with a few more modifications this will be a small patch again.
Thanks again! I look forward to seeing the next round of changes.
lldb/include/lldb/Core/ModuleList.h
Outdated
void FindFunctions(ConstString name, lldb::FunctionNameType name_type_mask, | ||
const ModuleFunctionSearchOptions &options, | ||
SymbolContextList &sc_list) const; | ||
const SymbolContext &sc, SymbolContextList &sc_list) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to reduce the number of changes in the codebase if we add the symbol context at the end and make it a pointer. If the pointer is NULL, then no context matching will occur, else it will try to use the sc->module_sp
. It will help reduce the changes in our codebase and having a pointer might help indicate that the symbol context is optional.
Maybe we do:
void FindFunctions(ConstString name, lldb::FunctionNameType name_type_mask,
const ModuleFunctionSearchOptions &options,
SymbolContextList &sc_list, const SymbolContext *sc = nullptr) const;
lldb/include/lldb/Core/ModuleList.h
Outdated
@@ -295,7 +295,7 @@ class ModuleList { | |||
/// \see Module::FindFunctions () | |||
void FindFunctions(const RegularExpression &name, | |||
const ModuleFunctionSearchOptions &options, | |||
SymbolContextList &sc_list); | |||
const SymbolContext &sc, SymbolContextList &sc_list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto from above
lldb/include/lldb/Core/ModuleList.h
Outdated
void FindSymbolsWithNameAndType(ConstString name, | ||
lldb::SymbolType symbol_type, | ||
const SymbolContext &sc, | ||
SymbolContextList &sc_list) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
lldb/source/API/SBTarget.cpp
Outdated
target_sp->GetImages().FindFunctions( | ||
ConstString(name), mask, function_options, SymbolContext(), *sb_sc_list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use the default parameter as suggested above, all changes in this file go away.
GetTarget().GetImages().FindFunctions( | ||
name, eFunctionNameTypeAuto, function_options, SymbolContext(), sc_list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use the default parameter as suggested above, all changes in this file go away.
We should also document in the headerdoc this new
|
lldb/source/Symbol/SymbolContext.cpp
Outdated
@@ -895,32 +895,14 @@ const Symbol *SymbolContext::FindBestGlobalDataSymbol(ConstString name, | |||
} | |||
}; | |||
|
|||
if (module) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yay
lldb/source/Core/ModuleList.cpp
Outdated
for (const ModuleSP &module_sp : m_modules) | ||
module_sp->FindSymbolsWithNameAndType(name, symbol_type, sc_list); | ||
if (module_sp != sc.module_sp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extremely minor nit, but can we be consistent with how we skip over the module hint?
I.e., either always do:
if (module_sp != sc.module_sp)
...
or
if (module_sp == sc.module_sp)
continue;
Which one to use I'm not too bothered about. Just makes it easer to read if this is consistent in all the FindSymbolsWithNameAndType
/FindFunctions
APIs
lldb/include/lldb/Core/ModuleList.h
Outdated
@@ -285,7 +285,7 @@ class ModuleList { | |||
/// \see Module::FindFunctions () | |||
void FindFunctions(ConstString name, lldb::FunctionNameType name_type_mask, | |||
const ModuleFunctionSearchOptions &options, | |||
SymbolContextList &sc_list) const; | |||
const SymbolContext &sc, SymbolContextList &sc_list) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we call sc
, search_first
instead, or something more self-documenting like that? (which is what FindTypes
also does).
And as Greg pointed out, could we add documentation to the function describing what the parameter does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we call sc, search_first instead, or something more self-documenting like that? (which is what FindTypes also does).
The word "hint" came up a couple times in the discussion as well. Maybe search_hint
?
could we add documentation to the function describing what the parameter does?
Sure.
Oh, wait a sec. I actually changed the behavior of the
where the result of |
If that's the case, personally I think that it would make more sense if |
Can we make it configurable behind the |
I think it'd be fine to return early if the type if found with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't thoroughly reviewed the latest callback-based approach but wonder if it is just more straightforward to keep the original plan of adding asearch_hint
to the FindFunctions
APIs, and add a flag to ModuleFunctionSearchOptions
that indicates whether we're looking for a single function or multiple. And if we're looking for a single function, short-circuit if we find one. Thanks for working through our suggestions so far!
The tricky part is in the post-processing. IRExecutionUnit::FindInSymbols wants an external symbol and falls back to an internal one after all the modules are scanned. SymbolContext::FindBestGlobalDataSymbol prefers an internal symbol if it is found in the hinted module. |
I would still prefer it if we just added a In the future we could consider consolidating the I'll let @clayborg @augusto2112 chime in here |
I agree with Michael, I think just adding the search hint and clearly documenting it would already be a great improvement and would make implementing this patch a lot simpler. |
I'm not sure I understand because this will break the logic that is currently in place. Many nuances in post-processing wouldn't be possible to implement with a simple short-circuit. |
Sorry, just getting back to this. Got lost in my review queue. Ok thanks for pointing this out. I didn't realize What happens if we stop preferring the external symbols over internal ones in (side-note, |
It looks like there are no failing tests. |
We should always prefer symbols from the current module first, probably external first, then fall back to internal. If we do a search of all modules, we should prefer external symbols first and then internal, but only if they are unique. Just think about how a symbol would be resolved inside of a shared library vs how it would get accessed from outside of the shared library. Debuggers can break the rules when we need to, but we should try to stay true to how things would actually happen when possible first. |
So it's * |
Nope, |
Don't let us hold this patch up if returning the the original patch that fixes things makes things work if you can't easily get the same functionality from this version of the patch. Will one part of this patch work with the new stuff and the other part not? If so, maybe we allow the place that works use the new functionality and just hard code the other one? |
Just for my own understanding, could you elaborate on this? If we're doing a local module search and find an internal symbol (AFAIU, here an internal symbol means local in the sense of |
I think Greg is saying that if we don't find a local symbol, and don't find the symbol in the global symbols of the current module, then we should search all the other modules, but in that case preferring external symbols to internal since those are the ones that are potentially visible from the CU we started from.
Jim
… On Aug 28, 2024, at 1:04 AM, Michael Buch ***@***.***> wrote:
What happens if we stop preferring the external symbols over internal ones in IRExecutionUnit::FindInSymbols? What tests break?
It looks like there are no failing tests.
We should always prefer symbols from the current module first, probably external first, then fall back to internal. If we do a search of all modules, we should prefer external symbols first and then internal, but only if they are unique. Just think about how a symbol would be resolved inside of a shared library vs how it would get accessed from outside of the shared library. Debuggers can break the rules when we need to, but we should try to stay true to how things would actually happen when possible first.
Just for my own understanding, could you elaborate on this? If we're doing a local module search and find an internal symbol (AFAIU, here an internal symbol means local to the CU, not a weak definition) why would we still prefer the external symbol? As you said in your previous comment <#102835 (comment)>: "If you have an expression like my_global, you really want to look at the compile unit your expression is being run from, and if that compile unit has a global or static variable named my_global, we should pick that one. If there isn't a match in the current compile unit, then we should search in the current module. If it has one, that is most likely the global the user wants."
—
Reply to this email directly, view it on GitHub <#102835 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW6JZ2OQ6EGACD72SYDZTWACNAVCNFSM6AAAAABMLEK6UGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJUGYYTINJXGA>.
You are receiving this because you are on a team that was mentioned.
|
yes. If we can do it, find something in the local compile unit first, if there are two symbols in the CU, then the public one, else the private. Fall back to the current module, first public, then private. And repeat with all modules, and if we find only 1 public symbol, use that, else fall back to a single private symbol. If there are multiple symbols on all modules, then we emit an error. |
The current version of the patch does archive the desired result (no redundant scan of the current module) and all the tests are green. But the issue is it's awfully complex. All this callback logic is hard to read and even harder to modify next time. Especially in
But I don't insist; if you're ok with this implementation, I'm ok too. |
I guess here lies my hangup. What is the distinction between local and private |
On Aug 28, 2024, at 3:43 PM, Michael Buch ***@***.***> wrote:
yes. If we can do it, find something in the local compile unit first, if there are two symbols in the CU, then the public one, else the private.
I guess here lies my hangup. What is the distinction between local and private lldb_private::Symbols here? And why prefer the global over the private (if we're stopped in a context where the private symbol is visible according to language rules)?
The first goal of the expression parser is to emulate the symbol lookups that the compiler would have done were you compiling code in this context. So if you can resolve a symbol name to one that the compiler could see, then you should stop there - you have the right answer. So for instance if there is a local with that name, we should look no further. If there is a static in the CU you are stopped in with that name, we should use that and stop looking.
The issue comes when we can't find a symbol that would be visible according to the language rules. Do we just give up? That isn't very helpful because sometimes you really do want to access symbols beyond what those that can be seen in that CU. So we made up some heuristics that we can use to find the best candidate.
If we search the entire symbol world and there's only one instance of a matching symbol to be most helpful we should use that one regardless of visibility. So we could make our heuristic be "search the rest of the world for the symbol, and if we find only one instance, use that, otherwise give up".
But that's also not very helpful, so we added some more rules. For instance, an exported symbol in some other library might have been visible in our CU's context (we don't know because we can't tell how the binary was linked, but because it's an exported symbol, it could have been). OTOH, a non-exported symbol in some other library could never have been visible to it. So it makes sense to resolve duplicate symbols in favor of the exported symbol.
The resolution of your hangup is that we're talking here about the "failed to find an obviously correct symbol match" fallbacks. If there's an obviously correct one according to the language rules, we definitely should not be using these fallbacks.
Jim
… —
Reply to this email directly, view it on GitHub <#102835 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW4XZ7R6TDUFHIDACITZTZHANAVCNFSM6AAAAABMLEK6UGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJWGM3DINZXG4>.
You are receiving this because you are on a team that was mentioned.
|
c647b26
to
424b3b7
Compare
@clayborg @Michael137 I've reverted the changes here and rewrote this patch pretty much like it was originally presented with a slightly different order of operation. The only change is in the |
ping? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we're paying the cost of copying the entire list of module pointers (and removing an element) for one less FindFunctions
/FindSymbolsWithNameAndType
call. Probably fine? But it still feels a bit off having to do this (though as you demonstrated, baking it into the FindFunctions
API has proven to be pretty hairy).
Why is it so expensive to perform consecutive lookups into the same module for the same name? Should that be perhaps where we should focus the performance investigation on? So users don't need to care about how many times we're doing the lookup?
But if others are ok with this tradeoff here, I don't want to further block this PR
sc.module_sp->FindFunctions(name, CompilerDeclContext(), | ||
lldb::eFunctionNameTypeFull, function_options, | ||
sc_list); | ||
if (auto load_addr = resolver.Resolve(sc_list)) | ||
return *load_addr; | ||
sc_list.Clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we create a new block scope with a local sc_list
instead of clearing the list each time? Might be more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remembered why it was calling Clear()
- to avoid repetitive allocations of the internal vector in SymbolContextList
.
For example, I have a project that actively links its dependencies statically when possible, so the total number of symbols in the main executable is enormous(order of 10^6). When I perform a lookup for a foreign symbol (let's say
I was thinking about it and these two optimizations in particular:
Both of these approaches are of the scope of this change though. |
424b3b7
to
aad4d1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the discussion so far this seems like a good enough perf. improvement without having to revamp the FindFunctions
API. So lgtm, thanks for the patience.
Please give it a couple of days before merging in case Greg (or someone else) have any other comments
@Michael137 I don't have merge rights, can you merge this pls? |
@DmT021 Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
@augusto2112 Thank you! |
When we search for a symbol, we first check if it is in the module_sp of the current SymbolContext, and if not, we check in the target's modules. However, the target's ModuleList also includes the already checked module, which leads to a redundant search in it.