[lld-macho][arm64] Enhance safe ICF with thunk-based deduplication #106573

alx32 · 2024-08-29T15:37:04Z

Currently, our safe ICF mode only merges non-address-significant code, leaving duplicate address-significant functions in the output. This patch introduces safe_thunks ICF mode, which keeps a single master copy of each function and replaces address-significant duplicates with thunks that branch to the master copy.
Currently --icf=safe_thunks is only supported for arm64 architectures.

Perf stats for a large binary:

ICF Option	Total Size	__text Size	__unwind_info	% total
`--icf=none`	91.738 MB	55.220 MB	1.424 MB	0%
`--icf=safe`	85.042 MB	49.572 MB	1.168 MB	7.30%
`--icf=safe_thunks`	84.650 MB	49.219 MB	1.143 MB	7.72%
`--icf=all`	82.060 MB	48.726 MB	1.111 MB	10.55%

So overall we can expect a ~0.45% binary size reduction for a typical large binary compared to the --icf=safe option.

Runtime:
Linking the above binary took ~10 seconds. Comparing the link performance of --icf=safe_thunks vs --icf=safe, a ~2% slowdown was observed.

Currently, our `safe` ICF mode only merges non-address-significant code, leaving duplicate address-significant functions in the output. This patch introduces `safe_thunks` ICF mode, which keeps a single master copy of each function and replaces address-significant duplicates with thunks that branch to the master copy.

llvmbot · 2024-08-29T21:41:27Z

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-macho

Author: None (alx32)

Changes

Currently, our safe ICF mode only merges non-address-significant code, leaving duplicate address-significant functions in the output. This patch introduces safe_thunks ICF mode, which keeps a single master copy of each function and replaces address-significant duplicates with thunks that branch to the master copy.

Perf stats for a large binary:

ICF Option	Total Size	__text Size	__unwind_info	% reduction
`--icf=none`	91.738 MB	55.220 MB	1.424 MB	0%
`--icf=safe`	85.042 MB	49.572 MB	1.168 MB	7.30%
`--icf=safe_thunks`	84.667 MB	49.239 MB	1.143 MB	7.71%
`--icf=all`	82.060 MB	48.726 MB	1.111 MB	10.55%

So overall we can expect a ~0.45% binary size reduction for a typical large binary compared to the --icf=safe option.

Runtime:
Linking the above binary took ~10 seconds. Comparing the link performance of --icf=safe_thunks vs --icf=safe, a ~2% slowdown was observed.

Full diff: https://github.com/llvm/llvm-project/pull/106573.diff

6 Files Affected:

(modified) lld/MachO/Arch/ARM64.cpp (+23)
(modified) lld/MachO/Config.h (+1)
(modified) lld/MachO/Driver.cpp (+9-1)
(modified) lld/MachO/ICF.cpp (+83-3)
(modified) lld/MachO/Target.h (+10)
(added) lld/test/MachO/icf-safe-thunks.ll (+241)

diff --git a/lld/MachO/Arch/ARM64.cpp b/lld/MachO/Arch/ARM64.cpp
index e192676394c965..30b6c27ff99f89 100644
--- a/lld/MachO/Arch/ARM64.cpp
+++ b/lld/MachO/Arch/ARM64.cpp
@@ -41,6 +41,10 @@ struct ARM64 : ARM64Common {
                             Symbol *objcMsgSend) const override;
   void populateThunk(InputSection *thunk, Symbol *funcSym) override;
   void applyOptimizationHints(uint8_t *, const ObjFile &) const override;
+
+  virtual void initICFSafeThunkBody(InputSection *thunk,
+                                    InputSection *branchTarget) const override;
+  virtual uint32_t getICFSafeThunkSize() const override;
 };
 
 } // namespace
@@ -175,6 +179,25 @@ void ARM64::populateThunk(InputSection *thunk, Symbol *funcSym) {
                              /*offset=*/0, /*addend=*/0,
                              /*referent=*/funcSym);
 }
+// Just a single direct branch to the target function.
+static constexpr uint32_t icfSafeThunkCode[] = {
+    0x94000000, // 08: b    target
+};
+
+void ARM64::initICFSafeThunkBody(InputSection *thunk,
+                                 InputSection *branchTarget) const {
+  // The base data here will not be itself modified, we'll just be adding a
+  // reloc below. So we can directly use the constexpr above as the data.
+  thunk->data = {reinterpret_cast<const uint8_t *>(icfSafeThunkCode),
+                 sizeof(icfSafeThunkCode)};
+
+  thunk->relocs.emplace_back(/*type=*/ARM64_RELOC_BRANCH26,
+                             /*pcrel=*/true, /*length=*/2,
+                             /*offset=*/0, /*addend=*/0,
+                             /*referent=*/branchTarget);
+}
+
+uint32_t ARM64::getICFSafeThunkSize() const { return sizeof(icfSafeThunkCode); }
 
 ARM64::ARM64() : ARM64Common(LP64()) {
   cpuType = CPU_TYPE_ARM64;
diff --git a/lld/MachO/Config.h b/lld/MachO/Config.h
index 5beb0662ba7274..4e940693602c95 100644
--- a/lld/MachO/Config.h
+++ b/lld/MachO/Config.h
@@ -68,6 +68,7 @@ enum class ICFLevel {
   unknown,
   none,
   safe,
+  safe_thunks,
   all,
 };
 
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index 6a1ff96ed65697..7b23fdfea1303d 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -847,8 +847,15 @@ static ICFLevel getICFLevel(const ArgList &args) {
   auto icfLevel = StringSwitch<ICFLevel>(icfLevelStr)
                       .Cases("none", "", ICFLevel::none)
                       .Case("safe", ICFLevel::safe)
+                      .Case("safe_thunks", ICFLevel::safe_thunks)
                       .Case("all", ICFLevel::all)
                       .Default(ICFLevel::unknown);
+
+  if (icfLevel == ICFLevel::safe_thunks &&
+      !is_contained({AK_x86_64h, AK_arm64}, config->arch())) {
+    error("--icf=safe_thunks is only supported on arm64 targets");
+  }
+
   if (icfLevel == ICFLevel::unknown) {
     warn(Twine("unknown --icf=OPTION `") + icfLevelStr +
          "', defaulting to `none'");
@@ -2104,7 +2111,8 @@ bool link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
     // foldIdenticalLiterals before foldIdenticalSections.
     foldIdenticalLiterals();
     if (config->icfLevel != ICFLevel::none) {
-      if (config->icfLevel == ICFLevel::safe)
+      if (config->icfLevel == ICFLevel::safe ||
+          config->icfLevel == ICFLevel::safe_thunks)
         markAddrSigSymbols();
       foldIdenticalSections(/*onlyCfStrings=*/false);
     } else if (config->dedupStrings) {
diff --git a/lld/MachO/ICF.cpp b/lld/MachO/ICF.cpp
index fc786b571dc64f..cac5e673986829 100644
--- a/lld/MachO/ICF.cpp
+++ b/lld/MachO/ICF.cpp
@@ -45,6 +45,7 @@ class ICF {
                       const ConcatInputSection *ib);
   bool equalsVariable(const ConcatInputSection *ia,
                       const ConcatInputSection *ib);
+  void applySafeThunksToRange(size_t begin, size_t end);
 
   // ICF needs a copy of the inputs vector because its equivalence-class
   // segregation algorithm destroys the proper sequence.
@@ -251,6 +252,63 @@ void ICF::forEachClassRange(size_t begin, size_t end,
   }
 }
 
+// Given a range of identical icfInputs's, replace address significant functions
+// with a thunk that is just a direct branch to the first function in the
+// series. This way we end up we keep only one main body of the function but we
+// still retain address uniqueness of rellevant functions by having them be a
+// direct branch thunk rather than contain a full copy of the actual function
+// body.
+void ICF::applySafeThunksToRange(size_t begin, size_t end) {
+  // If we need to create a unique ICF thunk, use the first section as the
+  // section that all thunks will branch to.
+  ConcatInputSection *masterIsec = icfInputs[begin];
+  uint32_t thunkSize = target->getICFSafeThunkSize();
+  static std::mutex thunkInsertionMutex;
+
+  uint32_t keepUniqueCount = masterIsec->keepUnique ? 1 : 0;
+  for (size_t i = begin + 1; i < end; ++i) {
+    ConcatInputSection *isec = icfInputs[i];
+    if (isec->keepUnique)
+      ++keepUniqueCount;
+
+    // We create thunks for the 2nd, 3rd, ... keepUnique sections. The first
+    // keepUnique section we leave as is - as it will not end up sharing an
+    // address with any other keepUnique section.
+    if (keepUniqueCount >= 2 && isec->keepUnique) {
+      // If the target to be folded is smaller than the thunk size, then just
+      // leave it as-is - creating the thunk would be a net loss.
+      if (isec->data.size() <= thunkSize)
+        return;
+
+      // applySafeThunksToRange is called from multiple threads, but
+      // `makeSyntheticInputSection` and `addInputSection` are not thread safe.
+      // So we need to guard them with a mutex.
+      ConcatInputSection *thunk;
+      {
+        std::lock_guard<std::mutex> lock(thunkInsertionMutex);
+        thunk = makeSyntheticInputSection(isec->getSegName(), isec->getName());
+        addInputSection(thunk);
+      }
+
+      target->initICFSafeThunkBody(thunk, masterIsec);
+      thunk->foldIdentical(isec);
+
+      // Since we're folding the target function into a thunk, we need to adjust
+      // the symbols that now got relocated from the target function to the
+      // thunk.
+      // Since the thunk is only one branch, we move all symbols to offset 0 and
+      // make sure that the size of all non-zero-size symbols is equal to the
+      // size of the branch.
+      for (auto *sym : isec->symbols) {
+        if (sym->value != 0)
+          sym->value = 0;
+        if (sym->size != 0)
+          sym->size = thunkSize;
+      }
+    }
+  }
+}
+
 // Split icfInputs into shards, then parallelize invocation of FUNC on subranges
 // with matching equivalence class
 void ICF::forEachClass(llvm::function_ref<void(size_t, size_t)> func) {
@@ -335,9 +393,20 @@ void ICF::run() {
   forEachClass([&](size_t begin, size_t end) {
     if (end - begin < 2)
       return;
+    bool useSafeThunks = config->icfLevel == ICFLevel::safe_thunks;
+
+    // For ICF level safe_thunks, replace keepUnique function bodies with
+    // thunks. For all other ICF levles, directly merge the functions.
+    if (useSafeThunks)
+      applySafeThunksToRange(begin, end);
+
     ConcatInputSection *beginIsec = icfInputs[begin];
-    for (size_t i = begin + 1; i < end; ++i)
+    for (size_t i = begin + 1; i < end; ++i) {
+      // When using safe_thunks, keepUnique inputs are already handeled above
+      if (useSafeThunks && icfInputs[i]->keepUnique)
+        continue;
       beginIsec->foldIdentical(icfInputs[i]);
+    }
   });
 }
 
@@ -421,11 +490,22 @@ void macho::foldIdenticalSections(bool onlyCfStrings) {
     // can still fold it.
     bool hasFoldableFlags = (isSelRefsSection(isec) ||
                              sectionType(isec->getFlags()) == MachO::S_REGULAR);
+
+    bool isCodeSec = isCodeSection(isec);
+
+    // When keepUnique is true, the section is not foldable. Unless we are at
+    // icf level safe_thunks, in which case we still want to fold code sections.
+    // When using safe_thunks we'll apply the safe_thunks logic at merge time
+    // based on the 'keepUnique' flag.
+    bool noUniqueRequirement =
+        !isec->keepUnique ||
+        ((config->icfLevel == ICFLevel::safe_thunks) && isCodeSec);
+
     // FIXME: consider non-code __text sections as foldable?
     bool isFoldable = (!onlyCfStrings || isCfStringSection(isec)) &&
-                      (isCodeSection(isec) || isFoldableWithAddendsRemoved ||
+                      (isCodeSec || isFoldableWithAddendsRemoved ||
                        isGccExceptTabSection(isec)) &&
-                      !isec->keepUnique && !isec->hasAltEntry &&
+                      noUniqueRequirement && !isec->hasAltEntry &&
                       !isec->shouldOmitFromOutput() && hasFoldableFlags;
     if (isFoldable) {
       foldable.push_back(isec);
diff --git a/lld/MachO/Target.h b/lld/MachO/Target.h
index cc47ae4386b477..eaa0336e70cb6b 100644
--- a/lld/MachO/Target.h
+++ b/lld/MachO/Target.h
@@ -74,6 +74,16 @@ class TargetInfo {
                                     uint64_t selrefVA,
                                     Symbol *objcMsgSend) const = 0;
 
+  // Init 'thunk' so that it be a direct jump to 'branchTarget'.
+  virtual void initICFSafeThunkBody(InputSection *thunk,
+                                    InputSection *branchTarget) const {
+    llvm_unreachable("target does not support ICF safe thunks");
+  }
+
+  virtual uint32_t getICFSafeThunkSize() const {
+    llvm_unreachable("target does not support ICF safe thunks");
+  }
+
   // Symbols may be referenced via either the GOT or the stubs section,
   // depending on the relocation type. prepareSymbolRelocation() will set up the
   // GOT/stubs entries, and resolveSymbolVA() will return the addresses of those
diff --git a/lld/test/MachO/icf-safe-thunks.ll b/lld/test/MachO/icf-safe-thunks.ll
new file mode 100644
index 00000000000000..2a0ca8314036f8
--- /dev/null
+++ b/lld/test/MachO/icf-safe-thunks.ll
@@ -0,0 +1,241 @@
+; REQUIRES: aarch64
+
+; RUN: rm -rf %t; mkdir %t
+; RUN: llc -filetype=obj %s -O3 -o %t/icf-obj-safe-thunks.o -enable-machine-outliner=never -mtriple arm64-apple-macos -addrsig
+; RUN: %lld -arch arm64 -lSystem --icf=safe_thunks -dylib -o %t/icf-safe.dylib %t/icf-obj-safe-thunks.o
+; RUN: llvm-objdump %t/icf-safe.dylib -d --macho | FileCheck %s --check-prefixes=CHECK-ARM64
+
+; CHECK-ARM64:        (__TEXT,__text) section
+; CHECK-ARM64-NEXT:   _func_unique_1:
+; CHECK-ARM64-NEXT:        mov {{.*}}, #0x1
+;
+; CHECK-ARM64:        _func_unique_2_canmerge:
+; CHECK-ARM64-NEXT:        mov {{.*}}, #0x2
+;
+; CHECK-ARM64:        _func_2identical_v1:
+; CHECK-ARM64-NEXT:        mov {{.*}}, #0x2
+;
+; CHECK-ARM64:        _func_3identical_v1:
+; CHECK-ARM64-NEXT:        mov {{.*}}, #0x3
+;
+; CHECK-ARM64:        _func_3identical_v1_canmerge:
+; CHECK-ARM64-NEXT:   _func_3identical_v2_canmerge:
+; CHECK-ARM64-NEXT:   _func_3identical_v3_canmerge:
+; CHECK-ARM64-NEXT:        mov {{.*}}, #0x21
+;
+; CHECK-ARM64:        _call_all_funcs:
+; CHECK-ARM64-NEXT:        stp  x29
+;
+; CHECK-ARM64:        _take_func_addr:
+; CHECK-ARM64-NEXT:        adr
+;
+; CHECK-ARM64:        _func_2identical_v2:
+; CHECK-ARM64-NEXT:         bl  _func_unique_2_canmerge
+; CHECK-ARM64-NEXT:   _func_3identical_v2:
+; CHECK-ARM64-NEXT:        bl  _func_3identical_v1
+; CHECK-ARM64-NEXT:   _func_3identical_v3:
+; CHECK-ARM64-NEXT:        bl  _func_3identical_v1
+
+target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128-Fn32"
+target triple = "arm64-apple-macosx11.0.0"
+
+@g_val = global i8 0, align 1
+@g_ptr = global ptr null, align 8
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_unique_1() #0 {
+entry:
+  store volatile i8 1, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_unique_2_canmerge() local_unnamed_addr #0 {
+entry:
+  store volatile i8 2, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_2identical_v1() #0 {
+entry:
+  store volatile i8 2, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_2identical_v2() #0 {
+entry:
+  store volatile i8 2, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_3identical_v1() #0 {
+entry:
+  store volatile i8 3, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_3identical_v2() #0 {
+entry:
+  store volatile i8 3, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_3identical_v3() #0 {
+entry:
+  store volatile i8 3, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_3identical_v1_canmerge() local_unnamed_addr #0 {
+entry:
+  store volatile i8 33, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_3identical_v2_canmerge() local_unnamed_addr #0 {
+entry:
+  store volatile i8 33, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @func_3identical_v3_canmerge() local_unnamed_addr #0 {
+entry:
+  store volatile i8 33, ptr @g_val, align 1, !tbaa !5
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp uwtable(sync)
+define void @call_all_funcs() local_unnamed_addr #1 {
+entry:
+  tail call void @func_unique_1()
+  tail call void @func_unique_2_canmerge()
+  tail call void @func_2identical_v1()
+  tail call void @func_2identical_v2()
+  tail call void @func_3identical_v1()
+  tail call void @func_3identical_v2()
+  tail call void @func_3identical_v3()
+  tail call void @func_3identical_v1_canmerge()
+  tail call void @func_3identical_v2_canmerge()
+  tail call void @func_3identical_v3_canmerge()
+  ret void
+}
+
+; Function Attrs: mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync)
+define void @take_func_addr() local_unnamed_addr #0 {
+entry:
+  store volatile ptr @func_unique_1, ptr @g_ptr, align 8, !tbaa !8
+  store volatile ptr @func_2identical_v1, ptr @g_ptr, align 8, !tbaa !8
+  store volatile ptr @func_2identical_v2, ptr @g_ptr, align 8, !tbaa !8
+  store volatile ptr @func_3identical_v1, ptr @g_ptr, align 8, !tbaa !8
+  store volatile ptr @func_3identical_v2, ptr @g_ptr, align 8, !tbaa !8
+  store volatile ptr @func_3identical_v3, ptr @g_ptr, align 8, !tbaa !8
+  ret void
+}
+
+attributes #0 = { mustprogress nofree noinline norecurse nounwind ssp memory(readwrite, argmem: none) uwtable(sync) "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+altnzcv,+ccdp,+complxnum,+crc,+dotprod,+fp-armv8,+fp16fml,+fptoint,+fullfp16,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+sha3,+specrestrict,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8a,+zcm,+zcz" }
+attributes #1 = { mustprogress nofree noinline norecurse nounwind ssp uwtable(sync) "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+altnzcv,+ccdp,+complxnum,+crc,+dotprod,+fp-armv8,+fp16fml,+fptoint,+fullfp16,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+sha3,+specrestrict,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8a,+zcm,+zcz" }
+
+!llvm.module.flags = !{!0, !1, !2, !3}
+!llvm.ident = !{!4}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{i32 8, !"PIC Level", i32 2}
+!2 = !{i32 7, !"uwtable", i32 1}
+!3 = !{i32 7, !"frame-pointer", i32 1}
+!4 = !{!"clang"}
+!5 = !{!6, !6, i64 0}
+!6 = !{!"omnipotent char", !7, i64 0}
+!7 = !{!"Simple C++ TBAA"}
+!8 = !{!9, !9, i64 0}
+!9 = !{!"any pointer", !6, i64 0}
+
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;;;;;;;;;;;;; Generate the above LLVM IR with the below script ;;;;;;;;;;;;;;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+; #!/bin/bash
+; set -ex
+; TOOLCHAIN_BIN="llvm-project/build/Debug/bin"
+;
+; # Create icf-safe-thunks.cpp file
+; cat > icf-safe-thunks.cpp <<EOF
+;
+; #define ATTR __attribute__((noinline)) extern "C"
+; typedef unsigned long long ULL;
+;
+; volatile char g_val = 0;
+; void *volatile g_ptr = 0;
+;
+; ATTR void func_unique_1() {
+;     g_val = 1;
+; }
+;
+; ATTR void func_unique_2_canmerge() {
+;     g_val = 2;
+; }
+;
+; ATTR void func_2identical_v1() {
+;     g_val = 2;
+; }
+;
+; ATTR void func_2identical_v2() {
+;     g_val = 2;
+; }
+;
+; ATTR void func_3identical_v1() {
+;     g_val = 3;
+; }
+;
+; ATTR void func_3identical_v2() {
+;     g_val = 3;
+; }
+;
+; ATTR void func_3identical_v3() {
+;     g_val = 3;
+; }
+;
+; ATTR void func_3identical_v1_canmerge() {
+;     g_val = 33;
+; }
+;
+; ATTR void func_3identical_v2_canmerge() {
+;     g_val = 33;
+; }
+;
+; ATTR void func_3identical_v3_canmerge() {
+;     g_val = 33;
+; }
+;
+; ATTR void call_all_funcs() {
+;     func_unique_1();
+;     func_unique_2_canmerge();
+;     func_2identical_v1();
+;     func_2identical_v2();
+;     func_3identical_v1();
+;     func_3identical_v2();
+;     func_3identical_v3();
+;     func_3identical_v1_canmerge();
+;     func_3identical_v2_canmerge();
+;     func_3identical_v3_canmerge();
+; }
+;
+; ATTR void take_func_addr() {
+;     g_ptr = (void*)func_unique_1;
+;     g_ptr = (void*)func_2identical_v1;
+;     g_ptr = (void*)func_2identical_v2;
+;     g_ptr = (void*)func_3identical_v1;
+;     g_ptr = (void*)func_3identical_v2;
+;     g_ptr = (void*)func_3identical_v3;
+; }
+; EOF
+;
+; $TOOLCHAIN_BIN/clang -target arm64-apple-macos11.0 -S -emit-llvm \
+;                      icf-safe-thunks.cpp -O3 -o icf-safe-thunks.ll

lld/MachO/Arch/ARM64.cpp

kyulee-com · 2024-08-30T21:10:06Z

lld/MachO/Arch/ARM64.cpp

@@ -175,6 +179,25 @@ void ARM64::populateThunk(InputSection *thunk, Symbol *funcSym) {
                             /*offset=*/0, /*addend=*/0,
                             /*referent=*/funcSym);
 }
+// Just a single direct branch to the target function.
+static constexpr uint32_t icfSafeThunkCode[] = {
+    0x14000000, // 08: b    target


When a binary is large, we may not reach the target using this direct branch. What's the plan for this case?

LLD needs generic support for such scenarios - not only relating to this feature, but relating to code generation in general. So this is a general problem - and yes, we don't introduce any special feature to mitigate this for this situation. When generic support will be added for this scenario, we will be inheriting it.

lld/MachO/Driver.cpp

lld/MachO/ICF.cpp

lld/test/MachO/icf-safe-thunks.ll

github-actions · 2024-08-31T02:22:50Z

✅ With the latest revision this PR passed the C/C++ code formatter.

lld/MachO/ICF.cpp

alx32 · 2024-09-05T18:42:29Z

lld/MachO/ICF.cpp

+    for (size_t i = begin + 1; i < end; ++i) {
+      if (icfInputs[i]->keepUnique) {
+        std::swap(icfInputs[begin], icfInputs[i]);
+        haveKeepUnique = true;


yeah. remove the whole block.

kyulee-com · 2024-09-05T18:49:40Z

lld/MachO/Symbols.h

+  // Enum that describes the kind of ICF folding that a symbol is involved in.
+  // We need to keep track of this to correctly display symbol sizes in the map
+  // file.
+  enum ICFFoldKind { None, Folded_Body, Folded_Thunk };


The prefix Folded_ seems redundant. Can you clarify the comments for each case something like:

// Enum that describes the type of Identical Code Folding (ICF) applied to a symbol. // This information is crucial for accurately representing symbol sizes in the map file. enum ICFFoldKind { None, // No folding is applied. Body, // The entire body (function or data) is folded. Thunk // The function body is folded into a thunk. };

kyulee-com · 2024-09-05T18:53:03Z

lld/MachO/ICF.cpp

+    if (!isec->keepUnique)
+      break;
+
+    ConcatInputSection *thunk;


You could delete this separate declaration, but place it with the definition below with auto *thunk = ...

kyulee-com · 2024-09-05T18:53:29Z

lld/MachO/ICF.cpp

+    for (size_t i = begin + 1; i < end; ++i) {
+      if (icfInputs[i]->keepUnique) {
+        std::swap(icfInputs[begin], icfInputs[i]);
+        haveKeepUnique = true;


yeah. remove the whole block.

kyulee-com

LGTM

alx32 · 2024-09-06T03:50:38Z

This triggers a static assert on --target=x86_64-windows-gnu: #107511

   24 | static_assert(sizeof(void *) != 8 || sizeof(Defined) == 88,
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Looking into now. Will forward fix / revert soon.

alx32 · 2024-09-06T04:52:05Z

Fix for various platforms: #107514

Colibrow · 2025-01-16T08:32:35Z

@alx32 Does the options.td need to be more clear?

diff --git a/lld/MachO/Options.td b/lld/MachO/Options.td
index 4b1e9e439107..bc2a8d18f14b 100644
--- a/lld/MachO/Options.td
+++ b/lld/MachO/Options.td
@@ -83,7 +83,7 @@ def print_dylib_search: Flag<["--"], "print-dylib-search">,
     Group<grp_lld>;
 def icf_eq: Joined<["--"], "icf=">,
     HelpText<"Set level for identical code folding (default: none)">,
-    MetaVarName<"[none,safe,all]">,
+    MetaVarName<"[none,safe,safe_thunks,all]">,
     Group<grp_lld>;
 def keep_icf_stabs: Joined<["--"], "keep-icf-stabs">,
     HelpText<"Generate STABS entries for symbols folded by ICF. These entries can then be used by dsymutil to discover the address range where folded symbols are located.">,

I plan to port this into ELF and test it on some projects to evaluate the code size improvements. Does this approach sound reasonable?

If so, I will draft a proposal for the LLVM discussion group and prepare a PR for the ELF-specific changes.
@ellishg cc

alx32 · 2025-01-16T13:01:56Z

@alx32 Does the options.td need to be more clear?

Do you mean should we add docs on the various ICF levels ?

I plan to port this into ELF and test it on some projects to evaluate the code size improvements. Does this approach sound reasonable?

Yes, we should port these behaviors as much as possible. Which features are you looking to port ? Also, unsure if ELF has support for __addrsig which would be a prerequisite for safe and safe_thunks.

Colibrow · 2025-01-16T13:27:57Z

Since safe_thunks functionality is already implemented for Mach-O ARM64, I'm questioning whether we need to expose this option to users explicitly in the way shown in the patch I referenced.
My plan is to port the icf=safe_thunks optimization from Mach-O to ELF format and evaluate the potential binary size improvements. Currently, we use icf=safe with -faddrsig for some of our production ELF builds.

alx32 · 2025-01-16T19:56:37Z

I'm questioning whether we need to expose this option to users explicitly in the way shown in the patch I referenced.

Ah now I see - I forgot to put the option in Options.td. This is indeed a miss on my part - I will submit a PR to update it and add some docs also.

My plan is to port the icf=safe_thunks optimization from Mach-O to ELF format and evaluate the potential binary size improvements. Currently, we use icf=safe with -faddrsig for some of our production ELF builds.

This sounds like a great plan - can I help in any way ?

Colibrow · 2025-01-17T03:16:45Z

I'm glad you are interested. I'll try the safe_thunks option in my local build and post the result soon. After that, I'll create an ELF patch like this. I may need your advice on this pull request. Thanks.

@Colibrow

Adding the `safe_thunks` option in `Options.td` as it was missing there - mentioned by @Colibrow in #106573 Also documenting what the various options mean. Help now looks like this: ``` .......... --error-limit=<value> Maximum number of errors to print before exiting (default: 20) --help-hidden Display help for hidden options --icf=[none,safe,safe_thunks,all] Set level for identical code folding (default: none). Possible values: none - Disable ICF safe - Only folds non-address significant functions (as described by `__addrsig` section) safe_thunks - Like safe, but replaces address-significant functions with thunks all - Fold all identical functions --ignore-auto-link-option=<value> Ignore a single auto-linked library or framework. Useful to ignore invalid options that ld64 ignores --irpgo-profile-sort=<profile> Deprecated. Please use --irpgo-profile and --bp-startup-sort=function .......... ```

@Colibrow

Adding the `safe_thunks` option in `Options.td` as it was missing there - mentioned by @Colibrow in llvm/llvm-project#106573 Also documenting what the various options mean. Help now looks like this: ``` .......... --error-limit=<value> Maximum number of errors to print before exiting (default: 20) --help-hidden Display help for hidden options --icf=[none,safe,safe_thunks,all] Set level for identical code folding (default: none). Possible values: none - Disable ICF safe - Only folds non-address significant functions (as described by `__addrsig` section) safe_thunks - Like safe, but replaces address-significant functions with thunks all - Fold all identical functions --ignore-auto-link-option=<value> Ignore a single auto-linked library or framework. Useful to ignore invalid options that ld64 ignores --irpgo-profile-sort=<profile> Deprecated. Please use --irpgo-profile and --bp-startup-sort=function .......... ```

alx32 requested review from int3 and keith August 29, 2024 21:22

alx32 assigned ellishg, DataCorrupted and kyulee-com Aug 29, 2024

alx32 requested review from ellishg, DataCorrupted, BertalanD and kyulee-com August 29, 2024 21:23

alx32 unassigned ellishg, DataCorrupted and kyulee-com Aug 29, 2024

alx32 force-pushed the 07_icf_safe_thunks branch from 975225c to e2f8e5c Compare August 29, 2024 21:24

alx32 marked this pull request as ready for review August 29, 2024 21:40

llvmbot added lld lld:MachO labels Aug 29, 2024

alx32 changed the title ~~[lld-macho] Enhance safe ICF with thunk-based deduplication~~ [lld-macho][arm64] Enhance safe ICF with thunk-based deduplication Aug 29, 2024

Fix "br" => "b" branch

334d096

ellishg requested a review from MaskRay August 30, 2024 17:20

kyulee-com reviewed Aug 30, 2024

View reviewed changes

Address Feedback nr.1

8bc9599

Fix Formatting

4c69771

DataCorrupted reviewed Sep 3, 2024

View reviewed changes

lld/MachO/ICF.cpp Outdated Show resolved Hide resolved

lld/MachO/ICF.cpp Show resolved Hide resolved

kyulee-com reviewed Sep 4, 2024

View reviewed changes

lld/MachO/ICF.cpp Outdated Show resolved Hide resolved

lld/MachO/ICF.cpp Outdated Show resolved Hide resolved

lld/MachO/ICF.cpp Outdated Show resolved Hide resolved

lld/MachO/ICF.cpp Outdated Show resolved Hide resolved

Alex B added 3 commits September 5, 2024 08:20

Fix map file, remove thread locking

d48c157

Add assert

fe7d026

Fix assert

9f1a42f

alx32 commented Sep 5, 2024

View reviewed changes

kyulee-com reviewed Sep 5, 2024

View reviewed changes

Alex B added 2 commits September 5, 2024 12:49

Address feedback

5ed74ee

Fix formatting

88d7983

kyulee-com approved these changes Sep 5, 2024

View reviewed changes

DataCorrupted self-requested a review September 5, 2024 20:45

DataCorrupted approved these changes Sep 5, 2024

View reviewed changes

alx32 merged commit d175616 into llvm:main Sep 5, 2024
8 checks passed

alx32 mentioned this pull request Sep 6, 2024

[lld] build failure #107511

Closed

alx32 mentioned this pull request Sep 6, 2024

[lld-macho] Disable static assert failing on several platforms #107514

Closed

alx32 mentioned this pull request Jan 17, 2025

[lld-macho] Document '-icf' flag options #123372

Merged

Colibrow mentioned this pull request Feb 11, 2025

[lld][elf] add safe-thunks in ELF #126695

Closed

[lld-macho][arm64] Enhance safe ICF with thunk-based deduplication #106573

[lld-macho][arm64] Enhance safe ICF with thunk-based deduplication #106573

Uh oh!

Conversation

alx32 commented Aug 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kyulee-com Aug 30, 2024

Choose a reason for hiding this comment

Uh oh!

alx32 Aug 31, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Aug 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alx32 Sep 5, 2024

Choose a reason for hiding this comment

Uh oh!

kyulee-com Sep 5, 2024

Choose a reason for hiding this comment

Uh oh!

kyulee-com Sep 5, 2024

Choose a reason for hiding this comment

Uh oh!

kyulee-com Sep 5, 2024

Choose a reason for hiding this comment

Uh oh!

kyulee-com Sep 5, 2024

Choose a reason for hiding this comment

Uh oh!

kyulee-com left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alx32 commented Sep 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alx32 commented Sep 6, 2024

Uh oh!

Colibrow commented Jan 16, 2025

Uh oh!

alx32 commented Jan 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Colibrow commented Jan 16, 2025

Uh oh!

alx32 commented Jan 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Colibrow commented Jan 17, 2025

Uh oh!

Uh oh!

alx32 commented Aug 29, 2024 •

edited

Loading

llvmbot commented Aug 29, 2024 •

edited

Loading

github-actions bot commented Aug 31, 2024 •

edited

Loading

alx32 commented Sep 6, 2024 •

edited

Loading

alx32 commented Jan 16, 2025 •

edited

Loading

alx32 commented Jan 16, 2025 •

edited

Loading