Skip to content

[lld][ELF] Merge equivalent symbols found during ICF #139493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

pranavk
Copy link
Contributor

@pranavk pranavk commented May 12, 2025

This reapplies #134342 which was reverted in fd3fecf

Original description:
Fixes a correctness issue for AArch64 when ADRP and LDR instructions are outlined in separate sections and sections are fed to ICF for deduplication.

See test case (based on #129122) for details. All rodata.* sections are folded into a single section with ICF. This leads to all f2_* function sections getting folded into one (as their relocation target symbols g* belong to .rodata.g* sections that have already been folded into one). Since relocations still refer original g* symbols, we end up creating duplicate GOT entry for all such symbols. This PR addresses that by tracking such folded symbols and create one GOT entry for all such symbols.

Fixes #129122

In addition to the original fix, this reland also fixes following issues:

  1. When two relocations have non-zero addends, we must prevent symbol folding.
  2. It takes care of non-globals (original PR in [lld][ICF] Prevent merging two sections when they point to non-globals #136641).

@llvmbot
Copy link
Member

llvmbot commented May 12, 2025

@llvm/pr-subscribers-lld-elf

@llvm/pr-subscribers-lld

Author: Pranav Kant (pranavk)

Changes

Fixes scenario reported in #134342 (comment)


Full diff: https://github.com/llvm/llvm-project/pull/139493.diff

2 Files Affected:

  • (modified) lld/ELF/ICF.cpp (+15-14)
  • (added) lld/test/ELF/icf-addend.s (+33)
diff --git a/lld/ELF/ICF.cpp b/lld/ELF/ICF.cpp
index 1882116d4d5f0..797d0968da6c2 100644
--- a/lld/ELF/ICF.cpp
+++ b/lld/ELF/ICF.cpp
@@ -334,26 +334,27 @@ bool ICF<ELFT>::equalsConstant(const InputSection *a, const InputSection *b) {
              : constantEq(a, ra.relas, b, rb.relas);
 }
 
-template <class RelTy>
-static SmallVector<Symbol *> getReloc(const InputSection *sec,
+template <class ELFT, class RelTy>
+static SmallVector<std::pair<Symbol *, uint64_t>> getReloc(const InputSection *sec,
                                       Relocs<RelTy> relocs) {
-  SmallVector<Symbol *> syms;
+  SmallVector<std::pair<Symbol *, uint64_t>> syms;
   for (auto ri = relocs.begin(), re = relocs.end(); ri != re; ++ri) {
     Symbol &sym = sec->file->getRelocTargetSym(*ri);
-    syms.push_back(&sym);
+    uint64_t addend = getAddend<ELFT>(*ri);
+    syms.push_back(std::make_pair(&sym, addend));
   }
   return syms;
 }
 
 template <class ELFT>
-static SmallVector<Symbol *> getRelocTargetSyms(const InputSection *sec) {
+static SmallVector<std::pair<Symbol *, uint64_t>> getRelocTargets(const InputSection *sec) {
   const RelsOrRelas<ELFT> rel = sec->template relsOrRelas<ELFT>();
   if (rel.areRelocsCrel())
-    return getReloc(sec, rel.crels);
+    return getReloc<ELFT>(sec, rel.crels);
   if (rel.areRelocsRel())
-    return getReloc(sec, rel.rels);
+    return getReloc<ELFT>(sec, rel.rels);
 
-  return getReloc(sec, rel.relas);
+  return getReloc<ELFT>(sec, rel.relas);
 }
 
 // Compare two lists of relocations. Returns true if all pairs of
@@ -568,19 +569,19 @@ template <class ELFT> void ICF<ELFT>::run() {
     if (end - begin == 1)
       return;
     print() << "selected section " << sections[begin];
-    SmallVector<Symbol *> syms = getRelocTargetSyms<ELFT>(sections[begin]);
+    SmallVector<std::pair<Symbol *, uint64_t>> syms = getRelocTargets<ELFT>(sections[begin]);
     for (size_t i = begin + 1; i < end; ++i) {
       print() << "  removing identical section " << sections[i];
       sections[begin]->replace(sections[i]);
-      SmallVector<Symbol *> replacedSyms =
-          getRelocTargetSyms<ELFT>(sections[i]);
+      SmallVector<std::pair<Symbol *, uint64_t>> replacedSyms =
+          getRelocTargets<ELFT>(sections[i]);
       assert(syms.size() == replacedSyms.size() &&
              "Should have same number of syms!");
       for (size_t i = 0; i < syms.size(); i++) {
-        if (syms[i] == replacedSyms[i] || !syms[i]->isGlobal() ||
-            !replacedSyms[i]->isGlobal())
+        if (syms[i].first == replacedSyms[i].first || !syms[i].first->isGlobal() ||
+            !replacedSyms[i].first->isGlobal() || syms[i].second != replacedSyms[i].second)
           continue;
-        symbolEquivalence.unionSets(syms[i], replacedSyms[i]);
+        symbolEquivalence.unionSets(syms[i].first, replacedSyms[i].first);
       }
 
       // At this point we know sections merged are fully identical and hence
diff --git a/lld/test/ELF/icf-addend.s b/lld/test/ELF/icf-addend.s
new file mode 100644
index 0000000000000..0023bbf2ff421
--- /dev/null
+++ b/lld/test/ELF/icf-addend.s
@@ -0,0 +1,33 @@
+# REQUIRES: x86
+# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o
+# RUN: ld.lld %t.o -o /dev/null --icf=all --print-icf-sections | FileCheck %s
+
+# Check that ICF doesn't fold different symbols when functions referencing them
+# can be folded because they are pointing to the same address.
+
+# CHECK: selected section {{.*}}:(.text.f1)
+# CHECK:   removing identical section {{.*}}:(.text.f2)
+# CHECK-NOT: redirecting 'y' in symtab to 'x'
+# CHECK-NOT: redirecting 'y' to 'x'
+
+.globl x, y
+
+.section .rodata,"a",@progbits
+x:
+.long 11
+y:
+.long 12
+
+.section .text.f1,"ax",@progbits
+f1:
+movq x+4(%rip), %rax 
+
+.section .text.f2,"ax",@progbits
+f2:
+movq y(%rip), %rax
+
+.section .text._start,"ax",@progbits
+.globl _start
+_start:
+call f1
+call f2

Copy link

github-actions bot commented May 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@pranavk pranavk changed the title [DO NOT MERGE][WIP] -- [lld][ICF] Don't merge symbols with different addends [lld][ICF] Don't merge symbols with different addends May 12, 2025
@pranavk pranavk requested review from jyknight and MaskRay May 12, 2025 14:33
@jyknight
Copy link
Member

As per previous discussion, the symbol folding needs to be done in all circumstances where section folding considers the symbols as equivalent -- at least for the special relocation types.

So (at least for those kinds of relocs) I think we must forbid the fold of sections with relocations having different offsets in the first place. (Rather than just avoid folding the symbols, as this patch does).

lld/ELF/ICF.cpp Outdated
// We should not merge sections in the first place if their symbols
// cannot be merged together.
bool canMergeSymbols = addA == addB;
if (da->value + addA == db->value + addB && canMergeSymbols)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can simplify this condition too but i think this is more readable in its current form.

@pranavk
Copy link
Contributor Author

pranavk commented May 13, 2025

Original lld commit was reverted. So I have reapplied that commit in this PR and committed changes to fix the problem in a different commit. You can just look at latest commit to see those changes to make review simple.

This commit basically stops section folding when addends are different which would automatically stop symbol folding. Yes, it means that we would be folding less sections now. I am hoping we wouldn't see too much of this pattern. So it's not going to be noticeable regression. One way to fix this would be to only stop section folding when the relocation is non-trivial but as we discussed in #136641, that's most likely going to bring target-specific logic in which we are not sure we want to do. We can continue building consensus on that in that PR while we let this in?

@pranavk pranavk requested a review from smithp35 May 13, 2025 21:47
@pranavk
Copy link
Contributor Author

pranavk commented May 14, 2025

I have added separate commits for addressing concern in #136641. I am okay splitting it across multiple PRs but I feel everything is so closely related, perhaps it should be just one PR.

@MaskRay
Copy link
Member

MaskRay commented May 14, 2025

[lld][ICF] Don't merge symbols with different addends

Since the "merge symbol" PR has been reverted. This title is no longer accurate. You need to include the original description and append description about addends, potentially simplify the concatenated description.

@pranavk pranavk changed the title [lld][ICF] Don't merge symbols with different addends [lld][ELF] Merge equivalent symbols found during ICF May 14, 2025
@pranavk
Copy link
Contributor Author

pranavk commented May 15, 2025

@MaskRay done.

@@ -286,9 +313,14 @@ bool ICF<ELFT>::constantEq(const InputSection *secA, Relocs<RelTy> ra,
// Relocations referring to InputSections are constant-equal if their
// section offsets are equal.
if (isa<InputSection>(da->section)) {
if (da->value + addA == db->value + addB)
if (da->value + addA == db->value + addB) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was not in #134342 . I have tried to understand why the logic is so complex.
Looks like https://reviews.llvm.org/D34094 , while supporting MergeInputSection, also introduced the untested condition for InputSection

    if (isa<InputSection>(DA->Section))
      return DA->Value + AddA == DB->Value + AddB;

I feel that we should switch to the safer da->value == db->value && addA == addB, which likely doesn't lose anything (I believe it's difficult for clang emitted code to utilize the relaxed condition.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think da->value + addA == db->value + addB is more readable. I can read it as if the address resolves to the same place, sections are considered same. It also takes care of section folding for more cases. Changing it to da->value == db->value && addA == addB would exclude some cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it to da->value == db->value && addA == addB would exclude some cases.

The excluded cases are likely negligible. This condition is for InputSection (non merge string), where symbols defined not at offset 0 in -ffunction-sections -fdata-sections code are extremely rare. I don't know a case LLVM codegen would do this.

Copy link
Contributor Author

@pranavk pranavk May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chromium build that made us revert the original change did have many sections that were being merged due to this property and @zmodem tried my second commit in this PR (e4041b7) separately and noticed a binary size regression that was unacceptable to them. This comment on that Chromium bug is relevant: ~~ https://b.corp.google.com/issues/415810137#comment44
(https://issues.chromium.org/issues/415810137#comment44)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you insist, I propose to do that in a separate PR so that these changes are not affected by any potential revert.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it helps break disagreement, I lean towards @MaskRay 's suggestion to use da->value == db->value && addA == addB as the condition (same value, same addend). Maybe it misses something, but it's conceptually simpler even if it misses some case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I corrected the link in my previous comment to a public one. Apologies I didn't notice that I pasted an internal link to the Chromium bug.

Using da->value == db->value && addA == addB may seem trivial but it misses cases that increases the size of chromium builds to unacceptable levels. As I mentioned in my previous comment, @zmodem had tried it already and noticed a large size regression.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misremembering, but IIRC the only size impact we complained about was from the suggested workaround of using --icf=safe.

I don't remember evaluating the size impact of this PR. We just tried it and confirmed it works (https://crbug.com/415810137#comment50), waited for it to get through review, and reverted the original change as the fix seemed to be taking a while.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Thanks for clarification. I was under the impression that after my comment here (https://issues.chromium.org/issues/415810137#comment50), you tried it and confirmed it works but the size regression was not acceptable to you.

I am okay simplifying the condition here after this clarification.

lld/ELF/ICF.cpp Outdated
// getting merged into each other (done later in ICF). We do this as
// post-ICF passes cannot handle duplicates when iterating over local
// symbols. There are also assertions that prevent this.
if ((!da->isGlobal() || !db->isGlobal()) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No test seems to cover this block . Actually I don't understand why we need this check (#134342 (comment)). If we merge two sections, symbols defined relative to them, even if global, can be merged as well, regardless whether there are GOT/TLS relocations.

If we delete the isTrivialRelocation check from #136641 , we can get rid of the OffsetGetter complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add the test for this.

Regarding why we need this, the issue was brought to the attention first here in this comment: #131630 (comment)

Quoting part of James' comment:

I also added a new check that ensures local symbols aren't merged with global symbols or vice-versa, otherwise assert(b->isLocal() && "should have been caught in initializeSymbols()"); is triggered by lld/test/ELF/icf-ineligible.s. (Instead, that test now fails because the sections referring to a local symbol aren't merged with those referring to the equivalent global symbol, as the test expects, but maybe that's OK.)

So we are not merging sections (and that automatically implies not merging their symbols) when there are local symbols involved. Note that we are only avoiding merging of such sections when it's a non-trivial relocation as that's when it will cause the original 'GOT merging' problem that motivated this series of changes.

Regarding deleting isTrivialRelocation(), I am currently using it in two places -- here and previous block. Both usages try to merge sections when only trivial relocations are involved as our original 'GOT merging' problem only occurs with non-trivial relocation.

Basically, the invariant we are trying to maintain here is:
When there's a non-trivial relocation, we shouldn't merge the sections unless their symbols can be merged

We can try to satisfy this invariant broadly by removing !isTrivialRelocation() check but that will fold less sections. I believe the current code satisfies the above invariant as tightly as possible to maximize section folding without causing 'GOT merging' problem.

@MaskRay
Copy link
Member

MaskRay commented May 21, 2025

Let'me rephrase the problem. When we have (#131630 (comment))

  • a Section A1, with a relocation on a symbol S1, where S1 is at offset K in section B1.
  • a Section A2, with a relocation on a symbol S2, where S2 is at offset K in section B2.

Currently, ICF mostly assumes that if you can merge B1 and B2, then you can also merge A1 and A2, without worrying about the symbol's identity.

This aggressive section merging, without accounting for symbol identity, causes correctness issues in the AArch64 MachineOutliner case (#129122).
Specifically, when the relocation type is R_AARCH64_ADR_GOT_PAGE, and sections B1 and B2 are merged while keeping symbols S1 and S2 separate, sections A1 and A2 must not be merged to maintain correctness.

To address this, we propose merging symbols S1 and S2 (by redirecting their Symbol * pointers to a single leader Symbol *) when their respective sections B1 and B2 are merged.
This merging process involves redirecting the Symbol * entry in the file's symbol table.
However, due to the local-before-non-local symbol table requirement, merging a local symbol with a non-local symbol is prohibited.

To resolve this, the proposed solution in this pull request is to:

  • When scanning A1/A2, try merging referenced symbols S1/S2.
  • Prevent A1/A2 merge if the relocation is "non-trivial" and any of S1/S2 is local. (As S1/S2 will be separate, sections A1 and A2 must not be merged)

  • The patch uses !isGlobal() (STB_GLOBAL), which should switch to isLocal(). STB_WEAK/STB_GNU_UNIQUE should be handled the same way as STB_GLOBAL.
  • Symbol merging is performed by scanning referended symbols (getRelocTargetSyms). Should we build a section-to-symbol map by scanning the file symbol table instead? In general, I want to reduce getRelocTargetSym calls.
  • The da->value + addA == db->value + addB condition in constantEq is not well-motivated (I doubt there is any benefit). Can probably switch to da->value == db->value && addA == addB and remove a isTrivialRelocation call.
  • We should not move OffsetGetter and needsGot from Relocations.cpp. Most targets' getRelExpr don't need the loc argument. We don't need 100% accurate getRelExpr. I think we can pass a fake loc ensuring validity of loc[-1] (used by X86.cpp) and loc[3] (used by LoongArch). Then we don't need OffsetGetter at all.

The proposed changes add considerable complexity, and I'm still uncertain about incorporating them into ICF.cpp. That said, thank you for creating the test cases! If I were designing an ABI, I would simply disallow GOT code sequences that span multiple sections.

@smithp35
Copy link
Collaborator

The proposed changes add considerable complexity, and I'm still uncertain about incorporating them into ICF.cpp. That said, thank you for creating the test cases! If I were designing an ABI, I would simply disallow GOT code sequences that span multiple sections.

I did try to see if there was any support internally for restricting this in the ABI. A summary of the outcome:

  • This would be a retrospective change and there would need to be a very good case to make it.
  • The known cases of this going wrong are limited to a combination of the outliner and ICF which haven't been strong enough to convince people on the GNU side that this would be worth making.
  • There is an agreement that splitting apart ADRP and (ADD|LDR) is not a good idea as it inhibits optimisations by software and hardware (particularly on in order CPUs). GCC has moved in the direction of always keeping ADRP and (ADD|LDR) together. However this is a recommendation not a requirement.

So I don't think the ABI is going to mandate that the outliner be changed. However I think there's sufficient recommendation that there is justification to do it.

At the moment I expect GOT relocations to local symbols to be restricted to non-default code-models or extensions like memory tagging/PAuthABI.

Is there a cheaper way of detecting when this case has occurred? One alternative might be to detect and error out with a message to turn off icf, or the outliner if the binary is affected? I can't immediately think of one though.

@pranavk
Copy link
Contributor Author

pranavk commented May 23, 2025

@MaskRay

  • The patch uses !isGlobal() (STB_GLOBAL), which should switch to isLocal(). STB_WEAK/STB_GNU_UNIQUE should be handled the same way as STB_GLOBAL.

Done.

  • Symbol merging is performed by scanning referended symbols (getRelocTargetSyms). Should we build a section-to-symbol map by scanning the file symbol table instead? In general, I want to reduce getRelocTargetSym calls.

Originally, we tried creating such a map in variableEq towards the end of that function but that didn't work because that function is called by multiple threads. We could serially try to create such a map at start of the ICF but I don't see much benefit of that.

  • The da->value + addA == db->value + addB condition in constantEq is not well-motivated (I doubt there is any benefit). Can probably switch to da->value == db->value && addA == addB and remove a isTrivialRelocation call.

Addressed separately above.

  • We should not move OffsetGetter and needsGot from Relocations.cpp. Most targets' getRelExpr don't need the loc argument. We don't need 100% accurate getRelExpr. I think we can pass a fake loc ensuring validity of loc[-1] (used by X86.cpp) and loc[3] (used by LoongArch). Then we don't need OffsetGetter at all.

Done. Thanks for suggestion here.

@pranavk
Copy link
Contributor Author

pranavk commented Jun 5, 2025

I asked @zmodem to verify if this fixes the regression that made us revert the previous reland and he has confirmed and it fixes the regression.

@rnk
Copy link
Collaborator

rnk commented Jun 10, 2025

If I were designing an ABI, I would simply disallow GOT code sequences that span multiple sections.

Sure, but we have to live with the ABI we have, not the one we want.

So I don't think the ABI is going to mandate that the outliner be changed.

... and I think that's a second opinion supporting the position that what LLD is doing right now isn't correct.


Standing back, I feel like this is blocked on making the right tradeoff between corner-case correctness and architectural simplicity/complexity. Usually in the toolchain space I make the argument that "correctness comes first, we can always simplify or optimize later", and I think that applies in this instance. Is there a simpler way that we could bail out of doing ICF on GOT relocations altogether? Is there some other compromise we could be making in ICF that achieves simplicity while making it less powerful in the presence of GOT-using relocations?

Somehow, I feel like this is part of the origin of the conceptual bug: we have relocations that refer to the GOT entry for symbol foo, which is not actually in the section that contains foo, and ICF has accidentally abstracted away that detail.

The obvious recommended compromise on the table is "don't do that" (teach the machine outliner to stop separating these relocations), but this leaves a latent correctness issue, and I'm not really comfortable with that.

@MaskRay
Copy link
Member

MaskRay commented Jun 11, 2025

I typically prioritize correctness too, but exposing OffsetGetter and needsGot from Relocations.cpp is completely unacceptable.
The distinction between local and non-local symbols is also troubling.
The local versus non-local symbol distinction is also problematic, as it keeps potential correctness pitfalls.
I believe the added complexity isn't justified. Has the MachineOutliner issue been resolved? Fixing it would also address COFF and Mach-O.
Obtaining ABI approval would be ideal, but on the compiler side, we could treat separate R_AARCH64_ADR_GOT_PAGE and the paired relocation as incorrect and avoid generating such code sequences.

@smithp35
Copy link
Collaborator

Before I start, please tell me if these suggestions aren't helping. I've managed to cross at least one person's red-line with most of them.

Arm does have a team that work on Chromium performance, if you know who they are it would be useful to contact them to see if they can prioritise/lobby for someone to work on the outliner. If you don't, I can refer this to them although it would likely have more weight going directly.

If the outliner were fixed, IIUC that would fix the current situation but we'd be left with legacy objects or some highly non-idiomatic assembly code. In that case we could try and detect the problematic case. I thought that this might be just as complex as the code here, but I think there may be a simple enough way to do it in a separate pass before ICF [1]. For the problem to occur we need a pair of GOT generating relocations split across two separate sections. For this to be legal AArch64 code there would have to be a branch instruction with a R_AARCH64_JUMP26 relocation to a symbol in another section, in between the ADRP (with GOT generating relocation) and the ADD/LDR (with GOT generating relocation).

A linear [2] scan through the relocations could detect problematic sections, and either refuse to do ICF or exclude the sections from ICF. Something like:

if ADRP got generating reloc to local symbol
  adrp_local_count++;
if ADD or LDR got generating symbol to local symbol
  state_adrp_local--;
if unconditional branch to another section S and state_adrp_local != 0
  section S cannot be merged by ICF.

I don't think that we would need to track which symbol. Only that there is at least one branch relocation to another section and there is still an unbalanced pair of ADRP, LDR or ADD [3]

[1] Could be run as an option, perhaps with an option to skip if we already know our inputs are clear.
[2] Strictly speaking relocations don't have to be ordered in r_offset order, but this may be true for clang using the outliner.
[3] In theory something like the following could slip through, but doing the second half of the calculation first is highly unidiomatic.

ADRP x0, :got: foo
ADD x1, x1 :got_lo12: bar
B <other section>

@pranavk
Copy link
Contributor Author

pranavk commented Jun 11, 2025

I typically prioritize correctness too, but exposing OffsetGetter and needsGot from Relocations.cpp is completely unacceptable.
The distinction between local and non-local symbols is also troubling.
The local versus non-local symbol distinction is also problematic, as it keeps potential correctness pitfalls.

Thanks for detailing your concerns. So it seems you are okay if we reland this PR with only first two commits (faa11ed and e4041b7). Let me know if I am reading you incorrectly. I can simplify the condition in the second commit further as you pointed out in a thread above. This would be very similar to earlier merged PR with minor changes on top of it.

It appears it's the handling of non-global symbols mostly that's the cause of concern along with introduction of TrivialRelocation(). For Google's internal needs, for now, we are okay with just handling this for global symbols and leaving local symbols out but this doesn't really address concerns @jyknight brought here to fix the problem as a whole. Since it has been a point of contention, I think it's worthwhile to address those concerns separately in a different PR (although tbh I am not sure if we can do that successfully).

If you confirm this, I can change this PR accordingly to only include first two commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mislink with ICF and multi-instruction GOT entry references
7 participants