-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[lld-macho] Category merger: handle addends when getting symbol at offset #91238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-lld-macho @llvm/pr-subscribers-lld Author: None (alx32) ChangesCurrently the However, it does not deal with the reloc Full diff: https://github.com/llvm/llvm-project/pull/91238.diff 1 Files Affected:
diff --git a/lld/MachO/ObjC.cpp b/lld/MachO/ObjC.cpp
index 4760fffebe3b30..919829b36f06fd 100644
--- a/lld/MachO/ObjC.cpp
+++ b/lld/MachO/ObjC.cpp
@@ -455,6 +455,7 @@ class ObjcCategoryMerger {
Defined *emitCategoryName(const std::string &name, ObjFile *objFile);
void createSymbolReference(Defined *refFrom, const Symbol *refTo,
uint32_t offset, const Reloc &relocTemplate);
+ Defined *tryFindDefinedOnIsec(const InputSection *isec, uint32_t offset);
Symbol *tryGetSymbolAtIsecOffset(const ConcatInputSection *isec,
uint32_t offset);
Defined *tryGetDefinedAtIsecOffset(const ConcatInputSection *isec,
@@ -518,12 +519,32 @@ void ObjcCategoryMerger::collectSectionWriteInfoFromIsec(
Symbol *
ObjcCategoryMerger::tryGetSymbolAtIsecOffset(const ConcatInputSection *isec,
uint32_t offset) {
+ if (!isec)
+ return nullptr;
const Reloc *reloc = isec->getRelocAt(offset);
if (!reloc)
return nullptr;
- return reloc->referent.get<Symbol *>();
+ Symbol *sym = reloc->referent.get<Symbol *>();
+
+ if (reloc->addend) {
+ assert(isa<Defined>(sym) && "Expected defined for non-zero addend");
+ Defined *definedSym = cast<Defined>(sym);
+ sym = tryFindDefinedOnIsec(definedSym->isec(),
+ definedSym->value + reloc->addend);
+ }
+
+ return sym;
+}
+
+Defined *ObjcCategoryMerger::tryFindDefinedOnIsec(const InputSection *isec,
+ uint32_t offset) {
+ for (Defined *sym : isec->symbols)
+ if ((sym->value <= offset) && (sym->value + sym->size > offset))
+ return sym;
+
+ return nullptr;
}
Defined *
|
Defined *ObjcCategoryMerger::tryFindDefinedOnIsec(const InputSection *isec, | ||
uint32_t offset) { | ||
for (Defined *sym : isec->symbols) | ||
if ((sym->value <= offset) && (sym->value + sym->size > offset)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why we have an offset in the middle of symbol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not fully debug it but it does happen in very rare cases. I don't think it violates any rule, the reloc can refer to within a symbol, not necessarily to the start of one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to tighten the condition as possible for the expected case or value, instead of loosely accepting a valid case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand. I can't think of a narrower case that would actually still work - we do need to find a symbol by a reloc into the middle of it (or any arbitrary position).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, I'd like to understand why we have an offset in the middle of symbol. If it's unexpected, we should fix the root-cause. If expected, we should document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will look into the cases where this happens and come back to this change later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the test, the addend comes from:
__CATEGORY__TtC11SimpleClass11SimpleClass_$_SimpleClass:
[...]
.quad _$s11SimpleClassAACMf+24 // <---- Addend = 24
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function here seems similar to this. Perhaps, implementation could be shared or refactored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they might be different enough to remain separate because:
- In InputFiles it requires the offset to be to the start of the symbol:
// The offset should point at the exact address of a symbol (with no addend.)
Here the offset can be to the inside of the symbol - see here
.quad __DATA__TtC11SimpleClass11SimpleClass+2
We could remove this requirement and share them - but would have to remove the assert from the version in InputFiles.
- In InputFiles it relies on the symbols being ordered - which is not necessarily the case inside optimization passes. So here we have to do a linear scan. I don't think in practice this would be a restriction as the stage of this pass I don't think we run anything that reorders the symbols for these sections - but it's not a hard requirement to have them sorted as far as I know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the comment says that but the implementation seems to handle it for the case when it doesn't. Okay with the linear scan, but it would be a nice to make it faster if we know it's already sorted (probably worth checking). If it is sorted, it's worth adding an assert in that case as well to make sure we do do not invalidate correctness behavior.
4798b9c
to
fb979de
Compare
Defined *ObjcCategoryMerger::tryFindDefinedOnIsec(const InputSection *isec, | ||
uint32_t offset) { | ||
for (Defined *sym : isec->symbols) | ||
if ((sym->value <= offset) && (sym->value + sym->size > offset)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function here seems similar to this. Perhaps, implementation could be shared or refactored?
Defined *ObjcCategoryMerger::tryFindDefinedOnIsec(const InputSection *isec, | ||
uint32_t offset) { | ||
for (Defined *sym : isec->symbols) | ||
if ((sym->value <= offset) && (sym->value + sym->size > offset)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the comment says that but the implementation seems to handle it for the case when it doesn't. Okay with the linear scan, but it would be a nice to make it faster if we know it's already sorted (probably worth checking). If it is sorted, it's worth adding an assert in that case as well to make sure we do do not invalidate correctness behavior.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/783 Here is the relevant piece of the build log for the reference:
|
…fset (llvm#91238) Currently the `tryFindDefinedOnIsec` takes in an `InputSection` and an `offset` and is supposed to return the target symbol that is referenced on that `InputSection` at the given offset. However, it does not deal with the reloc `addend` and might return the incorrect symbol. Here we add support for handling the reloc's `addend`.
…fset (llvm#91238) Currently the `tryFindDefinedOnIsec` takes in an `InputSection` and an `offset` and is supposed to return the target symbol that is referenced on that `InputSection` at the given offset. However, it does not deal with the reloc `addend` and might return the incorrect symbol. Here we add support for handling the reloc's `addend`.
Currently the
tryFindDefinedOnIsec
takes in anInputSection
and anoffset
and is supposed to return the target symbol that is referenced on thatInputSection
at the given offset.However, it does not deal with the reloc
addend
and might return the incorrect symbol.Here we add support for handling the reloc's
addend
.