Skip to content

[lld-macho] Fix ICF differentiation of safe_thunks relocs #111811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 10, 2024

Conversation

alx32
Copy link
Contributor

@alx32 alx32 commented Oct 10, 2024

In --icf=safe_thunks mode, the linker differentiates keepUnique functions by creating thunks during a post-processing step after Identical Code Folding (ICF). While this ensures that keepUnique functions themselves are not incorrectly merged, it overlooks functions that reference these keepUnique symbols.

If two functions are identical except for references to different keepUnique functions, the current ICF algorithm incorrectly considers them identical because it doesn't account for the future differentiation introduced by thunks. This leads to incorrect deduplication of functions that should remain distinct.

To address this issue, we modify the ICF comparison to explicitly check for references to keepUnique functions during deduplication. By doing so, functions that reference different keepUnique symbols are correctly identified as distinct, preventing erroneous merging and ensuring the correctness of the linked output.

@alx32 alx32 changed the title [lld-macho] Fix incorrect merging of funcs referencing 'keepUnique' funcs in '--icf=safe_thunks' mode [lld-macho] Fix ICF differentiation of safe_thunks relocs Oct 10, 2024
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@alx32 alx32 requested a review from kyulee-com October 10, 2024 11:20
@alx32 alx32 marked this pull request as ready for review October 10, 2024 11:20
@llvmbot
Copy link
Member

llvmbot commented Oct 10, 2024

@llvm/pr-subscribers-lld

Author: None (alx32)

Changes

In --icf=safe_thunks mode, the linker differentiates keepUnique functions by creating thunks during a post-processing step after Identical Code Folding (ICF). While this ensures that keepUnique functions themselves are not incorrectly merged, it overlooks functions that reference these keepUnique symbols.

If two functions are identical except for references to different keepUnique functions, the current ICF algorithm incorrectly considers them identical because it doesn't account for the future differentiation introduced by thunks. This leads to incorrect deduplication of functions that should remain distinct.

To address this issue, we modify the ICF comparison to explicitly check for references to keepUnique functions during deduplication. By doing so, functions that reference different keepUnique symbols are correctly identified as distinct, preventing erroneous merging and ensuring the correctness of the linked output.


Full diff: https://github.com/llvm/llvm-project/pull/111811.diff

2 Files Affected:

  • (modified) lld/MachO/ICF.cpp (+11)
  • (modified) lld/test/MachO/icf-safe-thunks.ll (+44)
diff --git a/lld/MachO/ICF.cpp b/lld/MachO/ICF.cpp
index 2ff962b06e3679..aedaecfdeb2c01 100644
--- a/lld/MachO/ICF.cpp
+++ b/lld/MachO/ICF.cpp
@@ -147,6 +147,17 @@ bool ICF::equalsConstant(const ConcatInputSection *ia,
       isecB = rb.referent.get<InputSection *>();
     }
 
+    // Typically, we should not encounter sections marked with `keepUnique` at
+    // this point as they would have resulted in different hashes and therefore
+    // no need for a full comparison.
+    // However, in `safe_thunks` mode, it's possible for two different
+    // relocations to reference identical `keepUnique` functions that will be
+    // distinguished later via thunks - so we need to handle this case
+    // explicitly.
+    if ((isecA != isecB) && ((isecA->keepUnique && isCodeSection(isecA)) ||
+                             (isecB->keepUnique && isCodeSection(isecB))))
+      return false;
+
     if (isecA->parent != isecB->parent)
       return false;
     // Sections with identical parents should be of the same kind.
diff --git a/lld/test/MachO/icf-safe-thunks.ll b/lld/test/MachO/icf-safe-thunks.ll
index 238e90f952e160..da2a8e46a5e9b5 100644
--- a/lld/test/MachO/icf-safe-thunks.ll
+++ b/lld/test/MachO/icf-safe-thunks.ll
@@ -22,6 +22,13 @@
 ; CHECK-ARM64-NEXT:   _func_3identical_v3_canmerge:
 ; CHECK-ARM64-NEXT:        mov {{.*}}, #0x21
 ;
+; CHECK-ARM64:        _func_call_thunked_1_nomerge:
+; CHECK-ARM64-NEXT:        stp	x29
+;
+; CHECK-ARM64:        _func_call_thunked_2_nomerge:
+; CHECK-ARM64-NEXT:   _func_call_thunked_2_merge:
+; CHECK-ARM64-NEXT:        stp	x29
+;
 ; CHECK-ARM64:        _call_all_funcs:
 ; CHECK-ARM64-NEXT:        stp  x29
 ;
@@ -43,6 +50,9 @@
 ; CHECK-ARM64-MAP-NEXT: 0x00000010 [  2] _func_3identical_v1_canmerge
 ; CHECK-ARM64-MAP-NEXT: 0x00000000 [  2] _func_3identical_v2_canmerge
 ; CHECK-ARM64-MAP-NEXT: 0x00000000 [  2] _func_3identical_v3_canmerge
+; CHECK-ARM64-MAP-NEXT: 0x00000020 [  2] _func_call_thunked_1_nomerge
+; CHECK-ARM64-MAP-NEXT: 0x00000020 [  2] _func_call_thunked_2_nomerge
+; CHECK-ARM64-MAP-NEXT: 0x00000000 [  2] _func_call_thunked_2_merge
 ; CHECK-ARM64-MAP-NEXT: 0x00000034 [  2] _call_all_funcs
 ; CHECK-ARM64-MAP-NEXT: 0x00000050 [  2] _take_func_addr
 ; CHECK-ARM64-MAP-NEXT: 0x00000004 [  2] _func_2identical_v2
@@ -125,6 +135,30 @@ entry:
   ret void
 }
 
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_call_thunked_1_nomerge() local_unnamed_addr #0 {
+entry:
+  tail call void @func_2identical_v1()
+  store volatile i8 77, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_call_thunked_2_nomerge() local_unnamed_addr #0 {
+entry:
+  tail call void @func_2identical_v2()
+  store volatile i8 77, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_call_thunked_2_merge() local_unnamed_addr #0 {
+entry:
+  tail call void @func_2identical_v2()
+  store volatile i8 77, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
 ; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp uwtable(sync)
 define void @call_all_funcs() local_unnamed_addr #1 {
 entry:
@@ -227,6 +261,16 @@ attributes #1 = { mustprogress nofree noinline norecurse nounwind ssp uwtable(sy
 ;     g_val = 33;
 ; }
 ;
+; ATTR void func_call_thunked_1_nomerge() {
+;     func_2identical_v1();
+;     g_val = 77;
+; }
+;
+; ATTR void func_call_thunked_2_nomerge() {
+;     func_2identical_v2();
+;     g_val = 77;
+; }
+;
 ; ATTR void call_all_funcs() {
 ;     func_unique_1();
 ;     func_unique_2_canmerge();

@llvmbot
Copy link
Member

llvmbot commented Oct 10, 2024

@llvm/pr-subscribers-lld-macho

Author: None (alx32)

Changes

In --icf=safe_thunks mode, the linker differentiates keepUnique functions by creating thunks during a post-processing step after Identical Code Folding (ICF). While this ensures that keepUnique functions themselves are not incorrectly merged, it overlooks functions that reference these keepUnique symbols.

If two functions are identical except for references to different keepUnique functions, the current ICF algorithm incorrectly considers them identical because it doesn't account for the future differentiation introduced by thunks. This leads to incorrect deduplication of functions that should remain distinct.

To address this issue, we modify the ICF comparison to explicitly check for references to keepUnique functions during deduplication. By doing so, functions that reference different keepUnique symbols are correctly identified as distinct, preventing erroneous merging and ensuring the correctness of the linked output.


Full diff: https://github.com/llvm/llvm-project/pull/111811.diff

2 Files Affected:

  • (modified) lld/MachO/ICF.cpp (+11)
  • (modified) lld/test/MachO/icf-safe-thunks.ll (+44)
diff --git a/lld/MachO/ICF.cpp b/lld/MachO/ICF.cpp
index 2ff962b06e3679..aedaecfdeb2c01 100644
--- a/lld/MachO/ICF.cpp
+++ b/lld/MachO/ICF.cpp
@@ -147,6 +147,17 @@ bool ICF::equalsConstant(const ConcatInputSection *ia,
       isecB = rb.referent.get<InputSection *>();
     }
 
+    // Typically, we should not encounter sections marked with `keepUnique` at
+    // this point as they would have resulted in different hashes and therefore
+    // no need for a full comparison.
+    // However, in `safe_thunks` mode, it's possible for two different
+    // relocations to reference identical `keepUnique` functions that will be
+    // distinguished later via thunks - so we need to handle this case
+    // explicitly.
+    if ((isecA != isecB) && ((isecA->keepUnique && isCodeSection(isecA)) ||
+                             (isecB->keepUnique && isCodeSection(isecB))))
+      return false;
+
     if (isecA->parent != isecB->parent)
       return false;
     // Sections with identical parents should be of the same kind.
diff --git a/lld/test/MachO/icf-safe-thunks.ll b/lld/test/MachO/icf-safe-thunks.ll
index 238e90f952e160..da2a8e46a5e9b5 100644
--- a/lld/test/MachO/icf-safe-thunks.ll
+++ b/lld/test/MachO/icf-safe-thunks.ll
@@ -22,6 +22,13 @@
 ; CHECK-ARM64-NEXT:   _func_3identical_v3_canmerge:
 ; CHECK-ARM64-NEXT:        mov {{.*}}, #0x21
 ;
+; CHECK-ARM64:        _func_call_thunked_1_nomerge:
+; CHECK-ARM64-NEXT:        stp	x29
+;
+; CHECK-ARM64:        _func_call_thunked_2_nomerge:
+; CHECK-ARM64-NEXT:   _func_call_thunked_2_merge:
+; CHECK-ARM64-NEXT:        stp	x29
+;
 ; CHECK-ARM64:        _call_all_funcs:
 ; CHECK-ARM64-NEXT:        stp  x29
 ;
@@ -43,6 +50,9 @@
 ; CHECK-ARM64-MAP-NEXT: 0x00000010 [  2] _func_3identical_v1_canmerge
 ; CHECK-ARM64-MAP-NEXT: 0x00000000 [  2] _func_3identical_v2_canmerge
 ; CHECK-ARM64-MAP-NEXT: 0x00000000 [  2] _func_3identical_v3_canmerge
+; CHECK-ARM64-MAP-NEXT: 0x00000020 [  2] _func_call_thunked_1_nomerge
+; CHECK-ARM64-MAP-NEXT: 0x00000020 [  2] _func_call_thunked_2_nomerge
+; CHECK-ARM64-MAP-NEXT: 0x00000000 [  2] _func_call_thunked_2_merge
 ; CHECK-ARM64-MAP-NEXT: 0x00000034 [  2] _call_all_funcs
 ; CHECK-ARM64-MAP-NEXT: 0x00000050 [  2] _take_func_addr
 ; CHECK-ARM64-MAP-NEXT: 0x00000004 [  2] _func_2identical_v2
@@ -125,6 +135,30 @@ entry:
   ret void
 }
 
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_call_thunked_1_nomerge() local_unnamed_addr #0 {
+entry:
+  tail call void @func_2identical_v1()
+  store volatile i8 77, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_call_thunked_2_nomerge() local_unnamed_addr #0 {
+entry:
+  tail call void @func_2identical_v2()
+  store volatile i8 77, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_call_thunked_2_merge() local_unnamed_addr #0 {
+entry:
+  tail call void @func_2identical_v2()
+  store volatile i8 77, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
 ; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp uwtable(sync)
 define void @call_all_funcs() local_unnamed_addr #1 {
 entry:
@@ -227,6 +261,16 @@ attributes #1 = { mustprogress nofree noinline norecurse nounwind ssp uwtable(sy
 ;     g_val = 33;
 ; }
 ;
+; ATTR void func_call_thunked_1_nomerge() {
+;     func_2identical_v1();
+;     g_val = 77;
+; }
+;
+; ATTR void func_call_thunked_2_nomerge() {
+;     func_2identical_v2();
+;     g_val = 77;
+; }
+;
 ; ATTR void call_all_funcs() {
 ;     func_unique_1();
 ;     func_unique_2_canmerge();

@alx32 alx32 force-pushed the 13_icf_safe_thunks_unique_child branch from 8feac95 to 85b148b Compare October 10, 2024 11:22
@alx32 alx32 merged commit 97a4324 into llvm:main Oct 10, 2024
8 checks passed
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
In `--icf=safe_thunks` mode, the linker differentiates `keepUnique`
functions by creating thunks during a post-processing step after
Identical Code Folding (ICF). While this ensures that `keepUnique`
functions themselves are not incorrectly merged, it overlooks functions
that reference these `keepUnique` symbols.

If two functions are identical except for references to different
`keepUnique` functions, the current ICF algorithm incorrectly considers
them identical because it doesn't account for the future differentiation
introduced by thunks. This leads to incorrect deduplication of functions
that should remain distinct.

To address this issue, we modify the ICF comparison to explicitly check
for references to `keepUnique` functions during deduplication. By doing
so, functions that reference different `keepUnique` symbols are
correctly identified as distinct, preventing erroneous merging and
ensuring the correctness of the linked output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants