Skip to content

[Object] Provide operator< for ELFSymbolRef #89861

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 29, 2024
Merged

Conversation

aaupov
Copy link
Contributor

@aaupov aaupov commented Apr 24, 2024

Normally, operator< accepting DataRefImpl is used when comparing
SymbolRef/ELFSymbolRef. However, it uses std::memcmp which interprets
DataRefImpl union as char string so that the result depends on host endianness.
For ELFSymbolRef a specialized operator< can be used instead to produce
consistent ordering regardless of endianness by comparing the symbol table index
and symbol index fields separately.

Normally, operator< accepting DataRefImpl is used when comparing
SymbolRef/ELFSymbolRef. However, it uses std::memcmp which interprets
DataRefImpl union as char string. For ELFSymbolRef it's known that it
uses the struct view of the union, therefore a specialized operator< can
be used instead.
@llvmbot
Copy link
Member

llvmbot commented Apr 24, 2024

@llvm/pr-subscribers-llvm-binary-utilities

Author: Amir Ayupov (aaupov)

Changes

Normally, operator< accepting DataRefImpl is used when comparing
SymbolRef/ELFSymbolRef. However, it uses std::memcmp which interprets
DataRefImpl union as char string. For ELFSymbolRef it's known that it
uses the struct view of the union, therefore a specialized operator< can
be used instead.


Full diff: https://github.com/llvm/llvm-project/pull/89861.diff

1 Files Affected:

  • (modified) llvm/include/llvm/Object/ELFObjectFile.h (+8)
diff --git a/llvm/include/llvm/Object/ELFObjectFile.h b/llvm/include/llvm/Object/ELFObjectFile.h
index 1d457be93741f2..8ec0e3fe15dc9d 100644
--- a/llvm/include/llvm/Object/ELFObjectFile.h
+++ b/llvm/include/llvm/Object/ELFObjectFile.h
@@ -199,6 +199,14 @@ class ELFSymbolRef : public SymbolRef {
   }
 };
 
+inline bool operator<(const ELFSymbolRef &A, const ELFSymbolRef &B) {
+  const DataRefImpl &DRIA = A.getRawDataRefImpl();
+  const DataRefImpl &DRIB = B.getRawDataRefImpl();
+  if (DRIA.d.a == DRIB.d.a)
+    return DRIA.d.b < DRIB.d.b;
+  return DRIA.d.a < DRIB.d.b;
+}
+
 class elf_symbol_iterator : public symbol_iterator {
 public:
   elf_symbol_iterator(const basic_symbol_iterator &B)

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could use some simple unit tests in the ELFObjectFileTest.cppp file.

@jh7370
Copy link
Collaborator

jh7370 commented Apr 24, 2024

I've not really looked myself, but are there instances where you actually have two ELFSymbolRef objects that are compared (using <)? If not, this code is dead, I believe. Also, is there a reason you provide < but not ==? I note that DataRefImpl has both.

@aaupov aaupov marked this pull request as draft April 25, 2024 01:31
@aaupov
Copy link
Contributor Author

aaupov commented Apr 25, 2024

I've not really looked myself, but are there instances where you actually have two ELFSymbolRef objects that are compared (using <)? If not, this code is dead, I believe.

I'm going to use it in FILE symbol lookup using upper_bound on ELFSymbolRefs here, instead of using DRI.d.b (symbol index) directly: #89648

Also, is there a reason you provide < but not ==? I note that DataRefImpl has both.

DataRefImpl's operator== is good enough for comparison.

@aaupov aaupov marked this pull request as ready for review April 25, 2024 01:56
aaupov added a commit to aaupov/llvm-project that referenced this pull request Apr 25, 2024
Simplify lookups and comparisons using llvm#89861
@jh7370
Copy link
Collaborator

jh7370 commented Apr 25, 2024

Also, is there a reason you provide < but not ==? I note that DataRefImpl has both.

DataRefImpl's operator== is good enough for comparison.

Why is DataRefImpl's operator< not good enough when the operator== is?

Also, please try to avoid force pushes in active reviews, per the LLVM guidelines. If rebasing is necessary, consider merging main into your branch instead.

@aaupov
Copy link
Contributor Author

aaupov commented Apr 25, 2024

Also, is there a reason you provide < but not ==? I note that DataRefImpl has both.

DataRefImpl's operator== is good enough for comparison.

Why is DataRefImpl's operator< not good enough when the operator== is?

Sorry, I've made the commit message more clear. DataRefImpl's operator< uses std::memcmp whose result is endianness-dependent (as shown by added test), which is exactly the issue addressed by a custom operator< for ELFSymbolRefs.
operator== also uses std::memcmp but equality is invariant wrt endianness.

Also, please try to avoid force pushes in active reviews, per the LLVM guidelines. If rebasing is necessary, consider merging main into your branch instead.

Noted, thank you.

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation about why this whole thing is important, the new commit message is certainly clearer. I think the last sentence of the description would be clearer saying:
"For ELFSymbolRef a specialized operator< can be used instead to produce consistent ordering regardless of endianness by comparing the symbol table index and symbol index fields separately."

One other thing: does DataRefImpl have an operator>? If so, you'll need that here too as otherwise two ELFSymbolRefs will not have inconsistent orderings based on which comparison operator is being used.

@aaupov
Copy link
Contributor Author

aaupov commented Apr 26, 2024

Thanks for the explanation about why this whole thing is important, the new commit message is certainly clearer. I think the last sentence of the description would be clearer saying: "For ELFSymbolRef a specialized operator< can be used instead to produce consistent ordering regardless of endianness by comparing the symbol table index and symbol index fields separately."

Thanks, updated.

One other thing: does DataRefImpl have an operator>? If so, you'll need that here too as otherwise two ELFSymbolRefs will not have inconsistent orderings based on which comparison operator is being used.

No, it doesn't:

inline bool operator==(const DataRefImpl &a, const DataRefImpl &b) {
// Check bitwise identical. This is the only legal way to compare a union w/o
// knowing which member is in use.
return std::memcmp(&a, &b, sizeof(DataRefImpl)) == 0;
}
inline bool operator!=(const DataRefImpl &a, const DataRefImpl &b) {
return !operator==(a, b);
}
inline bool operator<(const DataRefImpl &a, const DataRefImpl &b) {
// Check bitwise identical. This is the only legal way to compare a union w/o
// knowing which member is in use.
return std::memcmp(&a, &b, sizeof(DataRefImpl)) < 0;
}

@aaupov
Copy link
Contributor Author

aaupov commented Apr 29, 2024

@jh7370: thank you for extensive comments, the test is a lot more readable now. Hope it's good to go now :)

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with one nit.

@aaupov aaupov merged commit df6d2fa into llvm:main Apr 29, 2024
@aaupov aaupov deleted the elfsymbolref branch April 29, 2024 16:07
aaupov added a commit that referenced this pull request Apr 29, 2024
Fragment matching relies on symbol names to identify and register split
function fragments. However, as split fragments are often local symbols,
name aliasing is possible. For such cases, use symbol table to resolve
ambiguities.

This requires the presence of FILE symbols in the input binary. As BOLT
requires non-stripped binary, this is a reasonable assumption. Note that
`strip -g` removes FILE symbols by default, but `--keep-file-symbols`
can be used to preserve them.

Depends on: #89861

Test Plan:
Updated X86/fragment-lite.s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants