-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[BOLT] Use symbol table info in registerFragment #89648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fragment matching relies on symbol names to identify and register split function fragments. However, as split fragments are local symbols, name aliasing is possible. For such cases, use symbol table to resolve ambiguities: - find FILE symbol the fragment belongs to, - iterate over local symbols belonging to the same file and check if parent symbol is present, - if a local parent symbol is found, use it, otherwise use global parent symbol. This requires the presence of FILE symbols in the input binary. As BOLT requires non-stripped binary, this is a reasonable assumption. Note however that `strip -g` removes FILE symbols by default, but `--keep-file-symbols` can be used to preserve these. We opted to resolve ambiguous cases one by one because there are typically very few such cases and symbol table parsing is relatively expensive: python3.8.6: Total symbols: 17522 Function symbols: 7877 (45% total) Fragments: 2551 (15% total) Fragments with ambiguous parent: 31 (0.1% total) Large binary: Total symbols: 3032180 Function symbols: 1637209 (54% total) Fragments: 43077 (0.1% total) Fragments with ambiguous parent: 180 (0.006% total) Test Plan: Updated X86/fragment-lite.s
✅ With the latest revision this PR passed the C/C++ code formatter. |
@@ -490,6 +490,9 @@ class RewriteInstance { | |||
/// Store all non-zero symbols in this map for a quick address lookup. | |||
std::map<uint64_t, llvm::object::SymbolRef> FileSymRefs; | |||
|
|||
/// FILE symbols used for disambiguating split function parents. | |||
std::vector<object::DataRefImpl> FileSymbols; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you using DataRefImpl
instead of SymbolRef
/ELFSymbolRef
to save space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. SymbolRef contains DataRefImpl and an object pointer which in our case is always InputFile
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If memory is a concern, you should free it once done processing. I find the savings questionable to be worth it over readability, especially when you compare symbols.
@@ -490,6 +490,9 @@ class RewriteInstance { | |||
/// Store all non-zero symbols in this map for a quick address lookup. | |||
std::map<uint64_t, llvm::object::SymbolRef> FileSymRefs; | |||
|
|||
/// FILE symbols used for disambiguating split function parents. | |||
std::vector<object::DataRefImpl> FileSymbols; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If memory is a concern, you should free it once done processing. I find the savings questionable to be worth it over readability, especially when you compare symbols.
Follow-up to llvm#89648. Emit a FILE symbol to distinguish BOLT-created local symbols for split functions, to prevent polluting the namespace of the last file symbol. Test Plan: Updated cdsplit-symbol-names.s
Simplify lookups and comparisons using llvm#89861
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM.
bolt/lib/Rewrite/RewriteInstance.cpp
Outdated
break; | ||
} | ||
} | ||
if (FirstGlobal == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have a symbol table without globals? Even if it’s unlikely but still a valid file, we shouldn’t fail to process it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've re-read the documentation carefully and it says that sh_info has the value one larger than the last local symbol index: https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-94076.html#scrolltoc (Table 12-9 ELF sh_link and sh_info Interpretation).
So I'm dropping the check and exit.
Use known order of BOLT split function symbols: fragment symbols immediately precede the parent fragment symbol. Depends On: #89648 Test Plan: Added register-fragments-bolt-symbols.s
Fragment matching relies on symbol names to identify and register split
function fragments. However, as split fragments are often local symbols,
name aliasing is possible. For such cases, use symbol table to resolve
ambiguities.
This requires the presence of FILE symbols in the input binary. As BOLT
requires non-stripped binary, this is a reasonable assumption. Note that
strip -g
removes FILE symbols by default, but--keep-file-symbols
can be used to preserve them.
Depends on: #89861
Test Plan:
Updated X86/fragment-lite.s