Skip to content

[lld-macho][arm64] Enhance safe ICF with thunk-based deduplication #106573

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Sep 5, 2024

Conversation

alx32
Copy link
Contributor

@alx32 alx32 commented Aug 29, 2024

Currently, our safe ICF mode only merges non-address-significant code, leaving duplicate address-significant functions in the output. This patch introduces safe_thunks ICF mode, which keeps a single master copy of each function and replaces address-significant duplicates with thunks that branch to the master copy.
Currently --icf=safe_thunks is only supported for arm64 architectures.

Perf stats for a large binary:

ICF Option Total Size __text Size __unwind_info % total
--icf=none 91.738 MB 55.220 MB 1.424 MB 0%
--icf=safe 85.042 MB 49.572 MB 1.168 MB 7.30%
--icf=safe_thunks 84.650 MB 49.219 MB 1.143 MB 7.72%
--icf=all 82.060 MB 48.726 MB 1.111 MB 10.55%

So overall we can expect a ~0.45% binary size reduction for a typical large binary compared to the --icf=safe option.

Runtime:
Linking the above binary took ~10 seconds. Comparing the link performance of --icf=safe_thunks vs --icf=safe, a ~2% slowdown was observed.

Currently, our `safe` ICF mode only merges non-address-significant code, leaving duplicate address-significant functions in the output. This patch introduces `safe_thunks` ICF mode, which keeps a single master copy of each function and replaces address-significant duplicates with thunks that branch to the master copy.
@llvmbot
Copy link
Member

llvmbot commented Aug 29, 2024

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-macho

Author: None (alx32)

Changes

Currently, our safe ICF mode only merges non-address-significant code, leaving duplicate address-significant functions in the output. This patch introduces safe_thunks ICF mode, which keeps a single master copy of each function and replaces address-significant duplicates with thunks that branch to the master copy.

Perf stats for a large binary:

ICF Option Total Size __text Size __unwind_info % reduction
--icf=none 91.738 MB 55.220 MB 1.424 MB 0%
--icf=safe 85.042 MB 49.572 MB 1.168 MB 7.30%
--icf=safe_thunks 84.667 MB 49.239 MB 1.143 MB 7.71%
--icf=all 82.060 MB 48.726 MB 1.111 MB 10.55%

So overall we can expect a ~0.45% binary size reduction for a typical large binary compared to the --icf=safe option.

Runtime:
Linking the above binary took ~10 seconds. Comparing the link performance of --icf=safe_thunks vs --icf=safe, a ~2% slowdown was observed.


Full diff: https://github.com/llvm/llvm-project/pull/106573.diff

6 Files Affected:

  • (modified) lld/MachO/Arch/ARM64.cpp (+23)
  • (modified) lld/MachO/Config.h (+1)
  • (modified) lld/MachO/Driver.cpp (+9-1)
  • (modified) lld/MachO/ICF.cpp (+83-3)
  • (modified) lld/MachO/Target.h (+10)
  • (added) lld/test/MachO/icf-safe-thunks.ll (+241)
diff --git a/lld/MachO/Arch/ARM64.cpp b/lld/MachO/Arch/ARM64.cpp
index e192676394c965..30b6c27ff99f89 100644
--- a/lld/MachO/Arch/ARM64.cpp
+++ b/lld/MachO/Arch/ARM64.cpp
@@ -41,6 +41,10 @@ struct ARM64 : ARM64Common {
                             Symbol *objcMsgSend) const override;
   void populateThunk(InputSection *thunk, Symbol *funcSym) override;
   void applyOptimizationHints(uint8_t *, const ObjFile &) const override;
+
+  virtual void initICFSafeThunkBody(InputSection *thunk,
+                                    InputSection *branchTarget) const override;
+  virtual uint32_t getICFSafeThunkSize() const override;
 };
 
 } // namespace
@@ -175,6 +179,25 @@ void ARM64::populateThunk(InputSection *thunk, Symbol *funcSym) {
                              /*offset=*/0, /*addend=*/0,
                              /*referent=*/funcSym);
 }
+// Just a single direct branch to the target function.
+static constexpr uint32_t icfSafeThunkCode[] = {
+    0x94000000, // 08: b    target
+};
+
+void ARM64::initICFSafeThunkBody(InputSection *thunk,
+                                 InputSection *branchTarget) const {
+  // The base data here will not be itself modified, we'll just be adding a
+  // reloc below. So we can directly use the constexpr above as the data.
+  thunk->data = {reinterpret_cast<const uint8_t *>(icfSafeThunkCode),
+                 sizeof(icfSafeThunkCode)};
+
+  thunk->relocs.emplace_back(/*type=*/ARM64_RELOC_BRANCH26,
+                             /*pcrel=*/true, /*length=*/2,
+                             /*offset=*/0, /*addend=*/0,
+                             /*referent=*/branchTarget);
+}
+
+uint32_t ARM64::getICFSafeThunkSize() const { return sizeof(icfSafeThunkCode); }
 
 ARM64::ARM64() : ARM64Common(LP64()) {
   cpuType = CPU_TYPE_ARM64;
diff --git a/lld/MachO/Config.h b/lld/MachO/Config.h
index 5beb0662ba7274..4e940693602c95 100644
--- a/lld/MachO/Config.h
+++ b/lld/MachO/Config.h
@@ -68,6 +68,7 @@ enum class ICFLevel {
   unknown,
   none,
   safe,
+  safe_thunks,
   all,
 };
 
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index 6a1ff96ed65697..7b23fdfea1303d 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -847,8 +847,15 @@ static ICFLevel getICFLevel(const ArgList &args) {
   auto icfLevel = StringSwitch<ICFLevel>(icfLevelStr)
                       .Cases("none", "", ICFLevel::none)
                       .Case("safe", ICFLevel::safe)
+                      .Case("safe_thunks", ICFLevel::safe_thunks)
                       .Case("all", ICFLevel::all)
                       .Default(ICFLevel::unknown);
+
+  if (icfLevel == ICFLevel::safe_thunks &&
+      !is_contained({AK_x86_64h, AK_arm64}, config->arch())) {
+    error("--icf=safe_thunks is only supported on arm64 targets");
+  }
+
   if (icfLevel == ICFLevel::unknown) {
     warn(Twine("unknown --icf=OPTION `") + icfLevelStr +
          "', defaulting to `none'");
@@ -2104,7 +2111,8 @@ bool link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
     // foldIdenticalLiterals before foldIdenticalSections.
     foldIdenticalLiterals();
     if (config->icfLevel != ICFLevel::none) {
-      if (config->icfLevel == ICFLevel::safe)
+      if (config->icfLevel == ICFLevel::safe ||
+          config->icfLevel == ICFLevel::safe_thunks)
         markAddrSigSymbols();
       foldIdenticalSections(/*onlyCfStrings=*/false);
     } else if (config->dedupStrings) {
diff --git a/lld/MachO/ICF.cpp b/lld/MachO/ICF.cpp
index fc786b571dc64f..cac5e673986829 100644
--- a/lld/MachO/ICF.cpp
+++ b/lld/MachO/ICF.cpp
@@ -45,6 +45,7 @@ class ICF {
                       const ConcatInputSection *ib);
   bool equalsVariable(const ConcatInputSection *ia,
                       const ConcatInputSection *ib);
+  void applySafeThunksToRange(size_t begin, size_t end);
 
   // ICF needs a copy of the inputs vector because its equivalence-class
   // segregation algorithm destroys the proper sequence.
@@ -251,6 +252,63 @@ void ICF::forEachClassRange(size_t begin, size_t end,
   }
 }
 
+// Given a range of identical icfInputs's, replace address significant functions
+// with a thunk that is just a direct branch to the first function in the
+// series. This way we end up we keep only one main body of the function but we
+// still retain address uniqueness of rellevant functions by having them be a
+// direct branch thunk rather than contain a full copy of the actual function
+// body.
+void ICF::applySafeThunksToRange(size_t begin, size_t end) {
+  // If we need to create a unique ICF thunk, use the first section as the
+  // section that all thunks will branch to.
+  ConcatInputSection *masterIsec = icfInputs[begin];
+  uint32_t thunkSize = target->getICFSafeThunkSize();
+  static std::mutex thunkInsertionMutex;
+
+  uint32_t keepUniqueCount = masterIsec->keepUnique ? 1 : 0;
+  for (size_t i = begin + 1; i < end; ++i) {
+    ConcatInputSection *isec = icfInputs[i];
+    if (isec->keepUnique)
+      ++keepUniqueCount;
+
+    // We create thunks for the 2nd, 3rd, ... keepUnique sections. The first
+    // keepUnique section we leave as is - as it will not end up sharing an
+    // address with any other keepUnique section.
+    if (keepUniqueCount >= 2 && isec->keepUnique) {
+      // If the target to be folded is smaller than the thunk size, then just
+      // leave it as-is - creating the thunk would be a net loss.
+      if (isec->data.size() <= thunkSize)
+        return;
+
+      // applySafeThunksToRange is called from multiple threads, but
+      // `makeSyntheticInputSection` and `addInputSection` are not thread safe.
+      // So we need to guard them with a mutex.
+      ConcatInputSection *thunk;
+      {
+        std::lock_guard<std::mutex> lock(thunkInsertionMutex);
+        thunk = makeSyntheticInputSection(isec->getSegName(), isec->getName());
+        addInputSection(thunk);
+      }
+
+      target->initICFSafeThunkBody(thunk, masterIsec);
+      thunk->foldIdentical(isec);
+
+      // Since we're folding the target function into a thunk, we need to adjust
+      // the symbols that now got relocated from the target function to the
+      // thunk.
+      // Since the thunk is only one branch, we move all symbols to offset 0 and
+      // make sure that the size of all non-zero-size symbols is equal to the
+      // size of the branch.
+      for (auto *sym : isec->symbols) {
+        if (sym->value != 0)
+          sym->value = 0;
+        if (sym->size != 0)
+          sym->size = thunkSize;
+      }
+    }
+  }
+}
+
 // Split icfInputs into shards, then parallelize invocation of FUNC on subranges
 // with matching equivalence class
 void ICF::forEachClass(llvm::function_ref<void(size_t, size_t)> func) {
@@ -335,9 +393,20 @@ void ICF::run() {
   forEachClass([&](size_t begin, size_t end) {
     if (end - begin < 2)
       return;
+    bool useSafeThunks = config->icfLevel == ICFLevel::safe_thunks;
+
+    // For ICF level safe_thunks, replace keepUnique function bodies with
+    // thunks. For all other ICF levles, directly merge the functions.
+    if (useSafeThunks)
+      applySafeThunksToRange(begin, end);
+
     ConcatInputSection *beginIsec = icfInputs[begin];
-    for (size_t i = begin + 1; i < end; ++i)
+    for (size_t i = begin + 1; i < end; ++i) {
+      // When using safe_thunks, keepUnique inputs are already handeled above
+      if (useSafeThunks && icfInputs[i]->keepUnique)
+        continue;
       beginIsec->foldIdentical(icfInputs[i]);
+    }
   });
 }
 
@@ -421,11 +490,22 @@ void macho::foldIdenticalSections(bool onlyCfStrings) {
     // can still fold it.
     bool hasFoldableFlags = (isSelRefsSection(isec) ||
                              sectionType(isec->getFlags()) == MachO::S_REGULAR);
+
+    bool isCodeSec = isCodeSection(isec);
+
+    // When keepUnique is true, the section is not foldable. Unless we are at
+    // icf level safe_thunks, in which case we still want to fold code sections.
+    // When using safe_thunks we'll apply the safe_thunks logic at merge time
+    // based on the 'keepUnique' flag.
+    bool noUniqueRequirement =
+        !isec->keepUnique ||
+        ((config->icfLevel == ICFLevel::safe_thunks) && isCodeSec);
+
     // FIXME: consider non-code __text sections as foldable?
     bool isFoldable = (!onlyCfStrings || isCfStringSection(isec)) &&
-                      (isCodeSection(isec) || isFoldableWithAddendsRemoved ||
+                      (isCodeSec || isFoldableWithAddendsRemoved ||
                        isGccExceptTabSection(isec)) &&
-                      !isec->keepUnique && !isec->hasAltEntry &&
+                      noUniqueRequirement && !isec->hasAltEntry &&
                       !isec->shouldOmitFromOutput() && hasFoldableFlags;
     if (isFoldable) {
       foldable.push_back(isec);
diff --git a/lld/MachO/Target.h b/lld/MachO/Target.h
index cc47ae4386b477..eaa0336e70cb6b 100644
--- a/lld/MachO/Target.h
+++ b/lld/MachO/Target.h
@@ -74,6 +74,16 @@ class TargetInfo {
                                     uint64_t selrefVA,
                                     Symbol *objcMsgSend) const = 0;
 
+  // Init 'thunk' so that it be a direct jump to 'branchTarget'.
+  virtual void initICFSafeThunkBody(InputSection *thunk,
+                                    InputSection *branchTarget) const {
+    llvm_unreachable("target does not support ICF safe thunks");
+  }
+
+  virtual uint32_t getICFSafeThunkSize() const {
+    llvm_unreachable("target does not support ICF safe thunks");
+  }
+
   // Symbols may be referenced via either the GOT or the stubs section,
   // depending on the relocation type. prepareSymbolRelocation() will set up the
   // GOT/stubs entries, and resolveSymbolVA() will return the addresses of those
diff --git a/lld/test/MachO/icf-safe-thunks.ll b/lld/test/MachO/icf-safe-thunks.ll
new file mode 100644
index 00000000000000..2a0ca8314036f8
--- /dev/null
+++ b/lld/test/MachO/icf-safe-thunks.ll
@@ -0,0 +1,241 @@
+; REQUIRES: aarch64
+
+; RUN: rm -rf %t; mkdir %t
+; RUN: llc -filetype=obj %s -O3 -o %t/icf-obj-safe-thunks.o -enable-machine-outliner=never -mtriple arm64-apple-macos -addrsig
+; RUN: %lld -arch arm64 -lSystem --icf=safe_thunks -dylib -o %t/icf-safe.dylib %t/icf-obj-safe-thunks.o
+; RUN: llvm-objdump %t/icf-safe.dylib -d --macho | FileCheck %s --check-prefixes=CHECK-ARM64
+
+; CHECK-ARM64:        (__TEXT,__text) section
+; CHECK-ARM64-NEXT:   _func_unique_1:
+; CHECK-ARM64-NEXT:        mov {{.*}}, #0x1
+;
+; CHECK-ARM64:        _func_unique_2_canmerge:
+; CHECK-ARM64-NEXT:        mov {{.*}}, #0x2
+;
+; CHECK-ARM64:        _func_2identical_v1:
+; CHECK-ARM64-NEXT:        mov {{.*}}, #0x2
+;
+; CHECK-ARM64:        _func_3identical_v1:
+; CHECK-ARM64-NEXT:        mov {{.*}}, #0x3
+;
+; CHECK-ARM64:        _func_3identical_v1_canmerge:
+; CHECK-ARM64-NEXT:   _func_3identical_v2_canmerge:
+; CHECK-ARM64-NEXT:   _func_3identical_v3_canmerge:
+; CHECK-ARM64-NEXT:        mov {{.*}}, #0x21
+;
+; CHECK-ARM64:        _call_all_funcs:
+; CHECK-ARM64-NEXT:        stp  x29
+;
+; CHECK-ARM64:        _take_func_addr:
+; CHECK-ARM64-NEXT:        adr
+;
+; CHECK-ARM64:        _func_2identical_v2:
+; CHECK-ARM64-NEXT:         bl  _func_unique_2_canmerge
+; CHECK-ARM64-NEXT:   _func_3identical_v2:
+; CHECK-ARM64-NEXT:        bl  _func_3identical_v1
+; CHECK-ARM64-NEXT:   _func_3identical_v3:
+; CHECK-ARM64-NEXT:        bl  _func_3identical_v1
+
+target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128-Fn32"
+target triple = "arm64-apple-macosx11.0.0"
+
+@g_val = global i8 0, align 1
+@g_ptr = global ptr null, align 8
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_unique_1() #0 {
+entry:
+  store volatile i8 1, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_unique_2_canmerge() local_unnamed_addr #0 {
+entry:
+  store volatile i8 2, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_2identical_v1() #0 {
+entry:
+  store volatile i8 2, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_2identical_v2() #0 {
+entry:
+  store volatile i8 2, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_3identical_v1() #0 {
+entry:
+  store volatile i8 3, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_3identical_v2() #0 {
+entry:
+  store volatile i8 3, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_3identical_v3() #0 {
+entry:
+  store volatile i8 3, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_3identical_v1_canmerge() local_unnamed_addr #0 {
+entry:
+  store volatile i8 33, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_3identical_v2_canmerge() local_unnamed_addr #0 {
+entry:
+  store volatile i8 33, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_3identical_v3_canmerge() local_unnamed_addr #0 {
+entry:
+  store volatile i8 33, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp uwtable(sync)
+define void @call_all_funcs() local_unnamed_addr #1 {
+entry:
+  tail call void @func_unique_1()
+  tail call void @func_unique_2_canmerge()
+  tail call void @func_2identical_v1()
+  tail call void @func_2identical_v2()
+  tail call void @func_3identical_v1()
+  tail call void @func_3identical_v2()
+  tail call void @func_3identical_v3()
+  tail call void @func_3identical_v1_canmerge()
+  tail call void @func_3identical_v2_canmerge()
+  tail call void @func_3identical_v3_canmerge()
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @take_func_addr() local_unnamed_addr #0 {
+entry:
+  store volatile ptr @func_unique_1, ptr @g_ptr, align 8, !tbaa !8
+  store volatile ptr @func_2identical_v1, ptr @g_ptr, align 8, !tbaa !8
+  store volatile ptr @func_2identical_v2, ptr @g_ptr, align 8, !tbaa !8
+  store volatile ptr @func_3identical_v1, ptr @g_ptr, align 8, !tbaa !8
+  store volatile ptr @func_3identical_v2, ptr @g_ptr, align 8, !tbaa !8
+  store volatile ptr @func_3identical_v3, ptr @g_ptr, align 8, !tbaa !8
+  ret void
+}
+
+attributes #0 = { mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync) "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+altnzcv,+ccdp,+complxnum,+crc,+dotprod,+fp-armv8,+fp16fml,+fptoint,+fullfp16,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+sha3,+specrestrict,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8a,+zcm,+zcz" }
+attributes #1 = { mustprogress nofree noinline norecurse nounwind ssp uwtable(sync) "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+altnzcv,+ccdp,+complxnum,+crc,+dotprod,+fp-armv8,+fp16fml,+fptoint,+fullfp16,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+sha3,+specrestrict,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8a,+zcm,+zcz" }
+
+!llvm.module.flags = !{!0, !1, !2, !3}
+!llvm.ident = !{!4}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{i32 8, !"PIC Level", i32 2}
+!2 = !{i32 7, !"uwtable", i32 1}
+!3 = !{i32 7, !"frame-pointer", i32 1}
+!4 = !{!"clang"}
+!5 = !{!6, !6, i64 0}
+!6 = !{!"omnipotent char", !7, i64 0}
+!7 = !{!"Simple C++ TBAA"}
+!8 = !{!9, !9, i64 0}
+!9 = !{!"any pointer", !6, i64 0}
+
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;;;;;;;;;;;;; Generate the above LLVM IR with the below script ;;;;;;;;;;;;;;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+; #!/bin/bash
+; set -ex
+; TOOLCHAIN_BIN="llvm-project/build/Debug/bin"
+;
+; # Create icf-safe-thunks.cpp file
+; cat > icf-safe-thunks.cpp <<EOF
+;
+; #define ATTR __attribute__((noinline)) extern "C"
+; typedef unsigned long long ULL;
+;
+; volatile char g_val = 0;
+; void *volatile g_ptr = 0;
+;
+; ATTR void func_unique_1() {
+;     g_val = 1;
+; }
+;
+; ATTR void func_unique_2_canmerge() {
+;     g_val = 2;
+; }
+;
+; ATTR void func_2identical_v1() {
+;     g_val = 2;
+; }
+;
+; ATTR void func_2identical_v2() {
+;     g_val = 2;
+; }
+;
+; ATTR void func_3identical_v1() {
+;     g_val = 3;
+; }
+;
+; ATTR void func_3identical_v2() {
+;     g_val = 3;
+; }
+;
+; ATTR void func_3identical_v3() {
+;     g_val = 3;
+; }
+;
+; ATTR void func_3identical_v1_canmerge() {
+;     g_val = 33;
+; }
+;
+; ATTR void func_3identical_v2_canmerge() {
+;     g_val = 33;
+; }
+;
+; ATTR void func_3identical_v3_canmerge() {
+;     g_val = 33;
+; }
+;
+; ATTR void call_all_funcs() {
+;     func_unique_1();
+;     func_unique_2_canmerge();
+;     func_2identical_v1();
+;     func_2identical_v2();
+;     func_3identical_v1();
+;     func_3identical_v2();
+;     func_3identical_v3();
+;     func_3identical_v1_canmerge();
+;     func_3identical_v2_canmerge();
+;     func_3identical_v3_canmerge();
+; }
+;
+; ATTR void take_func_addr() {
+;     g_ptr = (void*)func_unique_1;
+;     g_ptr = (void*)func_2identical_v1;
+;     g_ptr = (void*)func_2identical_v2;
+;     g_ptr = (void*)func_3identical_v1;
+;     g_ptr = (void*)func_3identical_v2;
+;     g_ptr = (void*)func_3identical_v3;
+; }
+; EOF
+;
+; $TOOLCHAIN_BIN/clang -target arm64-apple-macos11.0 -S -emit-llvm \
+;                      icf-safe-thunks.cpp -O3 -o icf-safe-thunks.ll

@alx32 alx32 changed the title [lld-macho] Enhance safe ICF with thunk-based deduplication [lld-macho][arm64] Enhance safe ICF with thunk-based deduplication Aug 29, 2024
@ellishg ellishg requested a review from MaskRay August 30, 2024 17:20
@@ -175,6 +179,25 @@ void ARM64::populateThunk(InputSection *thunk, Symbol *funcSym) {
/*offset=*/0, /*addend=*/0,
/*referent=*/funcSym);
}
// Just a single direct branch to the target function.
static constexpr uint32_t icfSafeThunkCode[] = {
0x14000000, // 08: b target
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a binary is large, we may not reach the target using this direct branch. What's the plan for this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLD needs generic support for such scenarios - not only relating to this feature, but relating to code generation in general. So this is a general problem - and yes, we don't introduce any special feature to mitigate this for this situation. When generic support will be added for this scenario, we will be inheriting it.

Copy link

github-actions bot commented Aug 31, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

for (size_t i = begin + 1; i < end; ++i) {
if (icfInputs[i]->keepUnique) {
std::swap(icfInputs[begin], icfInputs[i]);
haveKeepUnique = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Fix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. remove the whole block.

// Enum that describes the kind of ICF folding that a symbol is involved in.
// We need to keep track of this to correctly display symbol sizes in the map
// file.
enum ICFFoldKind { None, Folded_Body, Folded_Thunk };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prefix Folded_ seems redundant. Can you clarify the comments for each case something like:

	// Enum that describes the type of Identical Code Folding (ICF) applied to a symbol.
	// This information is crucial for accurately representing symbol sizes in the map file.
enum ICFFoldKind {
  None,    // No folding is applied.
  Body,    // The entire body (function or data) is folded.
  Thunk    // The function body is folded into a thunk.
};

if (!isec->keepUnique)
break;

ConcatInputSection *thunk;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could delete this separate declaration, but place it with the definition below with auto *thunk = ...

for (size_t i = begin + 1; i < end; ++i) {
if (icfInputs[i]->keepUnique) {
std::swap(icfInputs[begin], icfInputs[i]);
haveKeepUnique = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. remove the whole block.

Copy link
Contributor

@kyulee-com kyulee-com left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DataCorrupted DataCorrupted self-requested a review September 5, 2024 20:45
@alx32 alx32 merged commit d175616 into llvm:main Sep 5, 2024
8 checks passed
@alx32 alx32 mentioned this pull request Sep 6, 2024
@alx32
Copy link
Contributor Author

alx32 commented Sep 6, 2024

This triggers a static assert on --target=x86_64-windows-gnu: #107511

   24 | static_assert(sizeof(void *) != 8 || sizeof(Defined) == 88,
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Looking into now. Will forward fix / revert soon.

@alx32
Copy link
Contributor Author

alx32 commented Sep 6, 2024

Fix for various platforms: #107514

@Colibrow
Copy link
Contributor

@alx32 Does the options.td need to be more clear?

diff --git a/lld/MachO/Options.td b/lld/MachO/Options.td
index 4b1e9e439107..bc2a8d18f14b 100644
--- a/lld/MachO/Options.td
+++ b/lld/MachO/Options.td
@@ -83,7 +83,7 @@ def print_dylib_search: Flag<["--"], "print-dylib-search">,
     Group<grp_lld>;
 def icf_eq: Joined<["--"], "icf=">,
     HelpText<"Set level for identical code folding (default: none)">,
-    MetaVarName<"[none,safe,all]">,
+    MetaVarName<"[none,safe,safe_thunks,all]">,
     Group<grp_lld>;
 def keep_icf_stabs: Joined<["--"], "keep-icf-stabs">,
     HelpText<"Generate STABS entries for symbols folded by ICF. These entries can then be used by dsymutil to discover the address range where folded symbols are located.">,

I plan to port this into ELF and test it on some projects to evaluate the code size improvements. Does this approach sound reasonable?

If so, I will draft a proposal for the LLVM discussion group and prepare a PR for the ELF-specific changes.
@ellishg cc

@alx32
Copy link
Contributor Author

alx32 commented Jan 16, 2025

@alx32 Does the options.td need to be more clear?

Do you mean should we add docs on the various ICF levels ?

I plan to port this into ELF and test it on some projects to evaluate the code size improvements. Does this approach sound reasonable?

Yes, we should port these behaviors as much as possible. Which features are you looking to port ? Also, unsure if ELF has support for __addrsig which would be a prerequisite for safe and safe_thunks.

@Colibrow
Copy link
Contributor

Since safe_thunks functionality is already implemented for Mach-O ARM64, I'm questioning whether we need to expose this option to users explicitly in the way shown in the patch I referenced.
My plan is to port the icf=safe_thunks optimization from Mach-O to ELF format and evaluate the potential binary size improvements. Currently, we use icf=safe with -faddrsig for some of our production ELF builds.

@alx32
Copy link
Contributor Author

alx32 commented Jan 16, 2025

I'm questioning whether we need to expose this option to users explicitly in the way shown in the patch I referenced.

Ah now I see - I forgot to put the option in Options.td. This is indeed a miss on my part - I will submit a PR to update it and add some docs also.

My plan is to port the icf=safe_thunks optimization from Mach-O to ELF format and evaluate the potential binary size improvements. Currently, we use icf=safe with -faddrsig for some of our production ELF builds.

This sounds like a great plan - can I help in any way ?

@Colibrow
Copy link
Contributor

I'm glad you are interested. I'll try the safe_thunks option in my local build and post the result soon. After that, I'll create an ELF patch like this. I may need your advice on this pull request. Thanks.

alx32 added a commit that referenced this pull request Jan 17, 2025
Adding the `safe_thunks` option in `Options.td` as it was missing there
- mentioned by @Colibrow in
#106573
Also documenting what the various options mean. 

Help now looks like this:
```
..........
  --error-limit=<value>   Maximum number of errors to print before exiting (default: 20)
  --help-hidden           Display help for hidden options
  --icf=[none,safe,safe_thunks,all]
                          Set level for identical code folding (default: none). Possible values:
                            none        - Disable ICF
                            safe        - Only folds non-address significant functions (as described by `__addrsig` section)
                            safe_thunks - Like safe, but replaces address-significant functions with thunks
                            all         - Fold all identical functions
  --ignore-auto-link-option=<value>
                          Ignore a single auto-linked library or framework. Useful to ignore invalid options that ld64 ignores
  --irpgo-profile-sort=<profile>
                          Deprecated. Please use --irpgo-profile and --bp-startup-sort=function
..........
```
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 17, 2025
Adding the `safe_thunks` option in `Options.td` as it was missing there
- mentioned by @Colibrow in
llvm/llvm-project#106573
Also documenting what the various options mean.

Help now looks like this:
```
..........
  --error-limit=<value>   Maximum number of errors to print before exiting (default: 20)
  --help-hidden           Display help for hidden options
  --icf=[none,safe,safe_thunks,all]
                          Set level for identical code folding (default: none). Possible values:
                            none        - Disable ICF
                            safe        - Only folds non-address significant functions (as described by `__addrsig` section)
                            safe_thunks - Like safe, but replaces address-significant functions with thunks
                            all         - Fold all identical functions
  --ignore-auto-link-option=<value>
                          Ignore a single auto-linked library or framework. Useful to ignore invalid options that ld64 ignores
  --irpgo-profile-sort=<profile>
                          Deprecated. Please use --irpgo-profile and --bp-startup-sort=function
..........
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants