Skip to content

[BOLT] Use symbol table info in registerFragment #89648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Apr 29, 2024

Conversation

aaupov
Copy link
Contributor

@aaupov aaupov commented Apr 22, 2024

Fragment matching relies on symbol names to identify and register split
function fragments. However, as split fragments are often local symbols,
name aliasing is possible. For such cases, use symbol table to resolve
ambiguities.

This requires the presence of FILE symbols in the input binary. As BOLT
requires non-stripped binary, this is a reasonable assumption. Note that
strip -g removes FILE symbols by default, but --keep-file-symbols
can be used to preserve them.

Depends on: #89861

Test Plan:
Updated X86/fragment-lite.s

Fragment matching relies on symbol names to identify and register split
function fragments. However, as split fragments are local symbols,
name aliasing is possible. For such cases, use symbol table to resolve
ambiguities:
- find FILE symbol the fragment belongs to,
- iterate over local symbols belonging to the same file and check if
  parent symbol is present,
- if a local parent symbol is found, use it, otherwise use global
  parent symbol.

This requires the presence of FILE symbols in the input binary. As BOLT
requires non-stripped binary, this is a reasonable assumption. Note
however that `strip -g` removes FILE symbols by default, but
`--keep-file-symbols` can be used to preserve these.

We opted to resolve ambiguous cases one by one because there are
typically very few such cases and symbol table parsing is relatively
expensive:

python3.8.6:
Total symbols: 17522
Function symbols: 7877 (45% total)
Fragments: 2551 (15% total)
Fragments with ambiguous parent: 31 (0.1% total)

Large binary:
Total symbols: 3032180
Function symbols: 1637209 (54% total)
Fragments: 43077 (0.1% total)
Fragments with ambiguous parent: 180 (0.006% total)

Test Plan:
Updated X86/fragment-lite.s
Copy link

github-actions bot commented Apr 22, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -490,6 +490,9 @@ class RewriteInstance {
/// Store all non-zero symbols in this map for a quick address lookup.
std::map<uint64_t, llvm::object::SymbolRef> FileSymRefs;

/// FILE symbols used for disambiguating split function parents.
std::vector<object::DataRefImpl> FileSymbols;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using DataRefImpl instead of SymbolRef/ELFSymbolRef to save space?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. SymbolRef contains DataRefImpl and an object pointer which in our case is always InputFile.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If memory is a concern, you should free it once done processing. I find the savings questionable to be worth it over readability, especially when you compare symbols.

@@ -490,6 +490,9 @@ class RewriteInstance {
/// Store all non-zero symbols in this map for a quick address lookup.
std::map<uint64_t, llvm::object::SymbolRef> FileSymRefs;

/// FILE symbols used for disambiguating split function parents.
std::vector<object::DataRefImpl> FileSymbols;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If memory is a concern, you should free it once done processing. I find the savings questionable to be worth it over readability, especially when you compare symbols.

aaupov added a commit to aaupov/llvm-project that referenced this pull request Apr 23, 2024
Follow-up to llvm#89648.
Emit a FILE symbol to distinguish BOLT-created local symbols for split
functions, to prevent polluting the namespace of the last file symbol.

Test Plan: Updated cdsplit-symbol-names.s
@aaupov
Copy link
Contributor Author

aaupov commented Apr 25, 2024

@maksfb: I've addressed the existing concerns – does that look good for you? Note we now depend on #89861 for ELFSymbolRef comparisons. We can add heuristics based on bolt_pseudo.o to make fragment matching even faster in BOLTed binary case as follow-up diffs.

Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

break;
}
}
if (FirstGlobal == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have a symbol table without globals? Even if it’s unlikely but still a valid file, we shouldn’t fail to process it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've re-read the documentation carefully and it says that sh_info has the value one larger than the last local symbol index: https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-94076.html#scrolltoc (Table 12-9 ELF sh_link and sh_info Interpretation).
So I'm dropping the check and exit.

@aaupov aaupov merged commit a1e9608 into llvm:main Apr 29, 2024
aaupov added a commit that referenced this pull request Apr 29, 2024
Use known order of BOLT split function symbols: fragment symbols
immediately precede the parent fragment symbol.

Depends On: #89648

Test Plan: Added register-fragments-bolt-symbols.s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants