[clang][llvm][fatlto] Avoid cloning modules in FatLTO #72180

ilovepi · 2023-11-14T00:48:13Z

#70703 pointed out that cloning LLVM modules could lead to miscompiles when using FatLTO.

This is due to an existing issue when cloning modules with labels (see #55991 and #47769). Since this can lead to miscompilation, we can avoid cloning the LLVM modules, which was desirable anyway.

This patch modifies the EmbedBitcodePass to no longer clone the module or run an input pipeline over it. Further, it makes FatLTO always perform UnifiedLTO, so we can still defer the Thin/Full LTO decision to link-time. Lastly, it removes dead/obsolete code related to now-defunct options that no longer work with the EmbedBitcodePass implementation.

llvmbot · 2023-11-14T00:48:34Z

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-lto

@llvm/pr-subscribers-clang

Author: Paul Kirth (ilovepi)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/72180.diff

12 Files Affected:

(modified) clang/lib/CodeGen/BackendUtil.cpp (+4-5)
(modified) clang/lib/Driver/ToolChains/Clang.cpp (+3-1)
(modified) clang/test/CodeGen/fat-lto-objects.c (+5-5)
(modified) llvm/docs/FatLTO.rst (+18-17)
(modified) llvm/include/llvm/Passes/PassBuilder.h (+1-2)
(modified) llvm/include/llvm/Transforms/IPO/EmbedBitcodePass.h (+1-16)
(modified) llvm/lib/Passes/PassBuilder.cpp (-20)
(modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+4-7)
(modified) llvm/lib/Passes/PassRegistry.def (+1-7)
(modified) llvm/lib/Transforms/IPO/EmbedBitcodePass.cpp (+1-15)
(modified) llvm/test/CodeGen/X86/fat-lto-section.ll (+1-1)
(modified) llvm/test/Transforms/EmbedBitcode/embed.ll (-3)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index a7a47d17723cb73..4114860545ade1b 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -810,7 +810,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   // Only enable CGProfilePass when using integrated assembler, since
   // non-integrated assemblers don't recognize .cgprofile section.
   PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
-  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO || CodeGenOpts.FatLTO;
 
   LoopAnalysisManager LAM;
   FunctionAnalysisManager FAM;
@@ -996,9 +996,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
     }
 
     if (CodeGenOpts.FatLTO) {
-      MPM.addPass(PB.buildFatLTODefaultPipeline(
-          Level, PrepareForThinLTO,
-          PrepareForThinLTO || shouldEmitRegularLTOSummary()));
+      MPM.addPass(PB.buildFatLTODefaultPipeline(Level));
     } else if (PrepareForThinLTO) {
       MPM.addPass(PB.buildThinLTOPreLinkDefaultPipeline(Level));
     } else if (PrepareForLTO) {
@@ -1073,7 +1071,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
     if (!TheModule->getModuleFlag("EnableSplitLTOUnit"))
       TheModule->addModuleFlag(llvm::Module::Error, "EnableSplitLTOUnit",
                                uint32_t(CodeGenOpts.EnableSplitLTOUnit));
-    if (CodeGenOpts.UnifiedLTO && !TheModule->getModuleFlag("UnifiedLTO"))
+    // FatLTO always means UnifiedLTO
+    if (!TheModule->getModuleFlag("UnifiedLTO"))
       TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO", uint32_t(1));
   }
 
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 3b98c7ae6e6ec66..f4cd9cbc5eccdec 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -4845,7 +4845,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   bool UnifiedLTO = false;
   if (IsUsingLTO) {
     UnifiedLTO = Args.hasFlag(options::OPT_funified_lto,
-                              options::OPT_fno_unified_lto, Triple.isPS());
+                              options::OPT_fno_unified_lto, Triple.isPS()) ||
+                 Args.hasFlag(options::OPT_ffat_lto_objects,
+                              options::OPT_fno_fat_lto_objects, false);
     if (UnifiedLTO)
       CmdArgs.push_back("-funified-lto");
   }
diff --git a/clang/test/CodeGen/fat-lto-objects.c b/clang/test/CodeGen/fat-lto-objects.c
index 2c3a4ef9c615529..6085762fa5a2467 100644
--- a/clang/test/CodeGen/fat-lto-objects.c
+++ b/clang/test/CodeGen/fat-lto-objects.c
@@ -9,22 +9,22 @@
 // RUN: %clang -cc1 -triple x86_64-unknown-linux-gnu -flto=full -ffat-lto-objects -fsplit-lto-unit -emit-obj < %s -o %t.full.split.o
 // RUN: llvm-readelf -S %t.full.split.o | FileCheck %s --check-prefixes=ELF
 // RUN: llvm-objcopy --dump-section=.llvm.lto=%t.full.split.bc %t.full.split.o
-// RUN: llvm-dis %t.full.split.bc -o - | FileCheck %s --check-prefixes=FULL,SPLIT,NOUNIFIED
+// RUN: llvm-dis %t.full.split.bc -o - | FileCheck %s --check-prefixes=FULL,SPLIT,UNIFIED
 
 // RUN: %clang -cc1 -triple x86_64-unknown-linux-gnu -flto=full -ffat-lto-objects -emit-obj < %s -o %t.full.nosplit.o
 // RUN: llvm-readelf -S %t.full.nosplit.o | FileCheck %s --check-prefixes=ELF
 // RUN: llvm-objcopy --dump-section=.llvm.lto=%t.full.nosplit.bc %t.full.nosplit.o
-// RUN: llvm-dis %t.full.nosplit.bc -o - | FileCheck %s --check-prefixes=FULL,NOSPLIT,NOUNIFIED
+// RUN: llvm-dis %t.full.nosplit.bc -o - | FileCheck %s --check-prefixes=FULL,NOSPLIT,UNIFIED
 
 // RUN: %clang -cc1 -triple x86_64-unknown-linux-gnu -flto=thin -fsplit-lto-unit -ffat-lto-objects -emit-obj < %s -o %t.thin.split.o
 // RUN: llvm-readelf -S %t.thin.split.o | FileCheck %s --check-prefixes=ELF
 // RUN: llvm-objcopy --dump-section=.llvm.lto=%t.thin.split.bc %t.thin.split.o
-// RUN: llvm-dis %t.thin.split.bc -o - | FileCheck %s --check-prefixes=THIN,SPLIT,NOUNIFIED
+// RUN: llvm-dis %t.thin.split.bc -o - | FileCheck %s --check-prefixes=THIN,SPLIT,UNIFIED
 
 // RUN: %clang -cc1 -triple x86_64-unknown-linux-gnu -flto=thin -ffat-lto-objects -emit-obj < %s -o %t.thin.nosplit.o
 // RUN: llvm-readelf -S %t.thin.nosplit.o | FileCheck %s --check-prefixes=ELF
 // RUN: llvm-objcopy --dump-section=.llvm.lto=%t.thin.nosplit.bc %t.thin.nosplit.o
-// RUN: llvm-dis %t.thin.nosplit.bc -o -  | FileCheck %s --check-prefixes=THIN,NOSPLIT,NOUNIFIED
+// RUN: llvm-dis %t.thin.nosplit.bc -o -  | FileCheck %s --check-prefixes=THIN,NOSPLIT,UNIFIED
 
 // RUN: %clang -cc1 -triple x86_64-unknown-linux-gnu -flto=thin -funified-lto -ffat-lto-objects -emit-obj < %s -o %t.unified.o
 // RUN: llvm-readelf -S %t.unified.o | FileCheck %s --check-prefixes=ELF
@@ -42,8 +42,8 @@
 // SPLIT: ![[#]] = !{i32 1, !"EnableSplitLTOUnit", i32 1}
 // NOSPLIT: ![[#]] = !{i32 1, !"EnableSplitLTOUnit", i32 0}
 
+/// FatLTO always uses UnifiedLTO
 // UNIFIED: ![[#]] = !{i32 1, !"UnifiedLTO", i32 1}
-// NOUNIFIED-NOT: ![[#]] = !{i32 1, !"UnifiedLTO", i32 1}
 
 // ELF: .llvm.lto
 
diff --git a/llvm/docs/FatLTO.rst b/llvm/docs/FatLTO.rst
index b505bb2a96fe160..0e424f694b1bf09 100644
--- a/llvm/docs/FatLTO.rst
+++ b/llvm/docs/FatLTO.rst
@@ -29,30 +29,31 @@ Overview
 Within LLVM, FatLTO is supported by choosing the ``FatLTODefaultPipeline``.
 This pipeline will:
 
-#) Clone the IR module.
-#) Run the pre-link (Thin)LTO pipeline using the cloned module.
+#) Run the pre-link UnifiedLTO pipeline on the current module.
 #) Embed the pre-link bitcode in a special ``.llvm.lto`` section.
-#) Optimize the unmodified copy of the module using the normal compilation pipeline.
+#) Finish optimizing the module using the post-link ThinLTO pipeline.
 #) Emit the object file, including the new ``.llvm.lto`` section.
 
 .. NOTE
 
-   At the time of writing, we conservatively run independent pipelines to
-   generate the bitcode section and the object code, which happen to be
-   identical to those used outside of FatLTO. This results in  compiled
-   artifacts that are identical to those produced by the default and (Thin)LTO
-   pipelines. However, this is not a guarantee, and we reserve the right to
-   change this at any time. Explicitly, users should not rely on the produced
-   bitcode or object code to mach their non-LTO counterparts precisely. They
-   will exhibit similar performance characteristics, but may not be bit-for-bit
-   the same.
+   Previously, we conservatively ran independent pipelines on separate copies
+   of the LLVM module to generate the bitcode section and the object code,
+   which happen to be identical to those used outside of FatLTO. While that
+   resulted in  compiled artifacts that were identical to those produced by the
+   default and (Thin)LTO pipelines, module cloning led to some cases of
+   miscompilation, and we have moved away from trying to keep bitcode
+   generation and optimization completely disjoint.
+
+   Bit-for-bit compatibility is not (and never was) a guarantee, and we reserve
+   the right to change this at any time. Explicitly, users should not rely on
+   the produced bitcode or object code to mach their non-LTO counterparts
+   precisely. They will exhibit similar performance characteristics, but may
+   not be bit-for-bit the same.
 
 Internally, the ``.llvm.lto`` section is created by running the
-``EmbedBitcodePass`` at the start of the ``PerModuleDefaultPipeline``. This
-pass is responsible for cloning and optimizing the module with the appropriate
-LTO pipeline and emitting the ``.llvm.lto`` section. Afterwards, the
-``PerModuleDefaultPipeline`` runs normally and the compiler can emit the fat
-object file.
+``EmbedBitcodePass`` after the ``ThinLTOPreLinkDefaultPipeline``. This pass is
+responsible for emitting the ``.llvm.lto`` section. Afterwards, the
+``ThinLTODefaultPipeline`` runs and the compiler can emit the fat object file.
 
 Limitations
 ===========
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 23bc891a8f1e97c..19ac90842bcb08d 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -246,8 +246,7 @@ class PassBuilder {
   /// separately to avoid any inconsistencies with an ad-hoc pipeline that tries
   /// to approximate the PerModuleDefaultPipeline from the pre-link LTO
   /// pipelines.
-  ModulePassManager buildFatLTODefaultPipeline(OptimizationLevel Level,
-                                               bool ThinLTO, bool EmitSummary);
+  ModulePassManager buildFatLTODefaultPipeline(OptimizationLevel Level);
 
   /// Build a pre-link, ThinLTO-targeting default optimization pipeline to
   /// a pass manager.
diff --git a/llvm/include/llvm/Transforms/IPO/EmbedBitcodePass.h b/llvm/include/llvm/Transforms/IPO/EmbedBitcodePass.h
index f323c61483fd30a..c35048c91aba207 100644
--- a/llvm/include/llvm/Transforms/IPO/EmbedBitcodePass.h
+++ b/llvm/include/llvm/Transforms/IPO/EmbedBitcodePass.h
@@ -25,28 +25,13 @@ class Module;
 class ModulePass;
 class Pass;
 
-struct EmbedBitcodeOptions {
-  EmbedBitcodeOptions() : EmbedBitcodeOptions(false, false) {}
-  EmbedBitcodeOptions(bool IsThinLTO, bool EmitLTOSummary)
-      : IsThinLTO(IsThinLTO), EmitLTOSummary(EmitLTOSummary) {}
-  bool IsThinLTO;
-  bool EmitLTOSummary;
-};
-
 /// Pass embeds a copy of the module optimized with the provided pass pipeline
 /// into a global variable.
 class EmbedBitcodePass : public PassInfoMixin<EmbedBitcodePass> {
-  bool IsThinLTO;
-  bool EmitLTOSummary;
   ModulePassManager MPM;
 
 public:
-  EmbedBitcodePass(EmbedBitcodeOptions Opts)
-      : EmbedBitcodePass(Opts.IsThinLTO, Opts.EmitLTOSummary,
-                         ModulePassManager()) {}
-  EmbedBitcodePass(bool IsThinLTO, bool EmitLTOSummary, ModulePassManager &&MPM)
-      : IsThinLTO(IsThinLTO), EmitLTOSummary(EmitLTOSummary),
-        MPM(std::move(MPM)) {}
+  EmbedBitcodePass() {}
 
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &);
 
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index dd9d799f9d55dcc..b2e45b50898640f 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -757,26 +757,6 @@ Expected<HWAddressSanitizerOptions> parseHWASanPassOptions(StringRef Params) {
   return Result;
 }
 
-Expected<EmbedBitcodeOptions> parseEmbedBitcodePassOptions(StringRef Params) {
-  EmbedBitcodeOptions Result;
-  while (!Params.empty()) {
-    StringRef ParamName;
-    std::tie(ParamName, Params) = Params.split(';');
-
-    if (ParamName == "thinlto") {
-      Result.IsThinLTO = true;
-    } else if (ParamName == "emit-summary") {
-      Result.EmitLTOSummary = true;
-    } else {
-      return make_error<StringError>(
-          formatv("invalid EmbedBitcode pass parameter '{0}' ", ParamName)
-              .str(),
-          inconvertibleErrorCode());
-    }
-  }
-  return Result;
-}
-
 Expected<MemorySanitizerOptions> parseMSanPassOptions(StringRef Params) {
   MemorySanitizerOptions Result;
   while (!Params.empty()) {
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index f3d280316e04077..80c191f880087c5 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1530,14 +1530,11 @@ PassBuilder::buildPerModuleDefaultPipeline(OptimizationLevel Level,
 }
 
 ModulePassManager
-PassBuilder::buildFatLTODefaultPipeline(OptimizationLevel Level, bool ThinLTO,
-                                        bool EmitSummary) {
+PassBuilder::buildFatLTODefaultPipeline(OptimizationLevel Level) {
   ModulePassManager MPM;
-  MPM.addPass(EmbedBitcodePass(ThinLTO, EmitSummary,
-                               ThinLTO
-                                   ? buildThinLTOPreLinkDefaultPipeline(Level)
-                                   : buildLTOPreLinkDefaultPipeline(Level)));
-  MPM.addPass(buildPerModuleDefaultPipeline(Level));
+  MPM.addPass(buildThinLTOPreLinkDefaultPipeline(Level));
+  MPM.addPass(EmbedBitcodePass());
+  MPM.addPass(buildThinLTODefaultPipeline(Level, /*ImportSummary=*/nullptr));
   return MPM;
 }
 
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 2067fc473b522db..9961312b7a59022 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -59,6 +59,7 @@ MODULE_PASS("debugify", NewPMDebugifyPass())
 MODULE_PASS("dot-callgraph", CallGraphDOTPrinterPass())
 MODULE_PASS("dxil-upgrade", DXILUpgradePass())
 MODULE_PASS("elim-avail-extern", EliminateAvailableExternallyPass())
+MODULE_PASS("embed-bitcode", EmbedBitcodePass())
 MODULE_PASS("extract-blocks", BlockExtractorPass({}, false))
 MODULE_PASS("forceattrs", ForceFunctionAttrsPass())
 MODULE_PASS("function-import", FunctionImportPass())
@@ -181,13 +182,6 @@ MODULE_PASS_WITH_PARAMS("ipsccp",
                         },
                         parseIPSCCPOptions,
                         "no-func-spec;func-spec")
-MODULE_PASS_WITH_PARAMS("embed-bitcode",
-                         "EmbedBitcodePass",
-                        [](EmbedBitcodeOptions Opts) {
-                          return EmbedBitcodePass(Opts);
-                        },
-                        parseEmbedBitcodePassOptions,
-                        "thinlto;emit-summary")
 MODULE_PASS_WITH_PARAMS("memprof-use",
                          "MemProfUsePass",
                         [](std::string Opts) {
diff --git a/llvm/lib/Transforms/IPO/EmbedBitcodePass.cpp b/llvm/lib/Transforms/IPO/EmbedBitcodePass.cpp
index fa56a5b564ae668..93a246e3a480320 100644
--- a/llvm/lib/Transforms/IPO/EmbedBitcodePass.cpp
+++ b/llvm/lib/Transforms/IPO/EmbedBitcodePass.cpp
@@ -7,8 +7,6 @@
 //===----------------------------------------------------------------------===//
 
 #include "llvm/Transforms/IPO/EmbedBitcodePass.h"
-#include "llvm/Bitcode/BitcodeWriter.h"
-#include "llvm/Bitcode/BitcodeWriterPass.h"
 #include "llvm/IR/PassManager.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/ErrorHandling.h"
@@ -16,10 +14,8 @@
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TargetParser/Triple.h"
 #include "llvm/Transforms/IPO/ThinLTOBitcodeWriter.h"
-#include "llvm/Transforms/Utils/Cloning.h"
 #include "llvm/Transforms/Utils/ModuleUtils.h"
 
-#include <memory>
 #include <string>
 
 using namespace llvm;
@@ -34,19 +30,9 @@ PreservedAnalyses EmbedBitcodePass::run(Module &M, ModuleAnalysisManager &AM) {
     report_fatal_error(
         "EmbedBitcode pass currently only supports ELF object format",
         /*gen_crash_diag=*/false);
-
-  std::unique_ptr<Module> NewModule = CloneModule(M);
-  MPM.run(*NewModule, AM);
-
   std::string Data;
   raw_string_ostream OS(Data);
-  if (IsThinLTO)
-    ThinLTOBitcodeWriterPass(OS, /*ThinLinkOS=*/nullptr).run(*NewModule, AM);
-  else
-    BitcodeWriterPass(OS, /*ShouldPreserveUseListOrder=*/false, EmitLTOSummary)
-        .run(*NewModule, AM);
-
+    ThinLTOBitcodeWriterPass(OS, /*ThinLinkOS=*/nullptr).run(M, AM);
   embedBufferInModule(M, MemoryBufferRef(Data, "ModuleData"), ".llvm.lto");
-
   return PreservedAnalyses::all();
 }
diff --git a/llvm/test/CodeGen/X86/fat-lto-section.ll b/llvm/test/CodeGen/X86/fat-lto-section.ll
index 30c56229a0e2a31..9a4359bab6b5ddc 100644
--- a/llvm/test/CodeGen/X86/fat-lto-section.ll
+++ b/llvm/test/CodeGen/X86/fat-lto-section.ll
@@ -1,5 +1,5 @@
 ;; Ensure that the .llvm.lto section has SHT_EXCLUDE set.
-; RUN: opt --mtriple x86_64-unknown-linux-gnu < %s -passes="embed-bitcode<thinlto;emit-summary>" -S \
+; RUN: opt --mtriple x86_64-unknown-linux-gnu < %s -passes="embed-bitcode" -S \
 ; RUN:   | llc --mtriple x86_64-unknown-linux-gnu -filetype=obj \
 ; RUN:   | llvm-readelf - --sections \
 ; RUN:   | FileCheck %s --check-prefix=EXCLUDE
diff --git a/llvm/test/Transforms/EmbedBitcode/embed.ll b/llvm/test/Transforms/EmbedBitcode/embed.ll
index dffb5cf7554772a..734bf5274a5f2e5 100644
--- a/llvm/test/Transforms/EmbedBitcode/embed.ll
+++ b/llvm/test/Transforms/EmbedBitcode/embed.ll
@@ -1,7 +1,4 @@
 ; RUN: opt --mtriple x86_64-unknown-linux-gnu < %s -passes="embed-bitcode" -S | FileCheck %s
-; RUN: opt --mtriple x86_64-unknown-linux-gnu < %s -passes="embed-bitcode<thinlto>" -S | FileCheck %s
-; RUN: opt --mtriple x86_64-unknown-linux-gnu < %s -passes="embed-bitcode<emit-summary>" -S | FileCheck %s
-; RUN: opt --mtriple x86_64-unknown-linux-gnu < %s -passes="embed-bitcode<thinlto;emit-summary>" -S | FileCheck %s
 
 @a = global i32 1

github-actions · 2023-11-14T00:50:26Z

✅ With the latest revision this PR passed the C/C++ code formatter.

teresajohnson · 2023-11-27T19:46:15Z

clang/lib/CodeGen/BackendUtil.cpp

@@ -810,7 +810,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
  // Only enable CGProfilePass when using integrated assembler, since
  // non-integrated assemblers don't recognize .cgprofile section.
  PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
-  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+  PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO || CodeGenOpts.FatLTO;


Won't your change to lib/Driver/ToolChains/Clang.cpp below mean that CodeGenOpts.UnifiedLTO will always be set with FatLTO? Do we thus need this change or the one further below for the module flag (and maybe add an assert that UnifiedLTO is set if FatLTO is set)?

Yeah, this is probably redundant since I made that change earlier in the driver. The assert is also a good suggestion, so I'll update this patch to do that too.

For the module flag, I'll need to check, since I think that it won't be set otherwise, but maybe I can factor it out into a common block.

Ah, I remember why I did this now. The code in Driver/ToolChains/Clang.cpp set things up nicely from clang, but didn't handle cc1. I believe that is because these are both CodeGenOpts defined in the TableGen files.

This feels like it ought to be simple, but the driver bits have never been that intuitive to me. I think if I wanted to handle this directly, then I'd need to supply the implementation for both FatLTO and UnifiedLTO instead of relying on the TableGened impl, which seems worse than the current version.

Do you have any thoughts on a nicer way to avoid the redundant code here? Maybe we don't need the changes to Driver/ToolChains/Clang.cpp?

Not sure I follow. Isn't the change in Driver/ToolChains/Clang.cpp going to ensure that -funified-lto is also passed to the cc1 invocation, and thus both options should be set in CodeGenOpts during the cc1 invocation?

Exactly, but that doesn't change the behavior for cc1 invocations themselves ... or is that the expected behavior? E.g., it was surprising to me that UnifiedLTO needed to be set on the cc1 invocation in the tests when passing -ffat-lto-objects -flto.

Ah ok. I guess that would be expected. Whether it is test friendly is another question, but in general the driver does a lot of option set up for the cc1 invocation.

Thanks for the clarification. In that case I guess I can go back to my original plan and just update the test invocations accordingly.

Actually, I believe the cc1 parsing is handled by ParseCodeGenArgs in clang/lib/Frontend/CompilerInvocation.cpp. You can potentially add something in there to ensure UnifiedLTO is set with FatLTO? Or give your error there that UnifiedLTO must be specified with Fat LTO.

Thank you! That was exactly the kind of place I was hoping to find.
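The driver-side behavior discussed in this thread can be modeled in isolation. The sketch below is a standalone approximation and not Clang code: `hasFlag` is a hypothetical helper that mimics the last-one-wins semantics of `Args.hasFlag`, and `driverRequestsUnifiedLTO` and `checkFatLTOInvariant` are illustrative names. It assumes the driver expression exactly as shown in this revision of the diff, and it illustrates why a driver invocation with `-ffat-lto-objects` would forward `-funified-lto`, while a hand-written cc1 command line gets no such help.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for Args.hasFlag(Pos, Neg, Default): the last
// occurrence of either spelling wins; Default applies if neither appears.
bool hasFlag(const std::vector<std::string> &Args, const std::string &Pos,
             const std::string &Neg, bool Default) {
  bool Result = Default;
  for (const std::string &Arg : Args) {
    if (Arg == Pos)
      Result = true;
    else if (Arg == Neg)
      Result = false;
  }
  return Result;
}

// Models the driver expression from this revision of the patch: under LTO,
// unified LTO is requested explicitly, by the PS default, or by FatLTO.
bool driverRequestsUnifiedLTO(const std::vector<std::string> &Args,
                              bool IsUsingLTO, bool IsPS) {
  if (!IsUsingLTO)
    return false;
  return hasFlag(Args, "-funified-lto", "-fno-unified-lto", IsPS) ||
         hasFlag(Args, "-ffat-lto-objects", "-fno-fat-lto-objects", false);
}

// The invariant the reviewer suggested asserting once the options reach
// code generation: FatLTO implies UnifiedLTO.
void checkFatLTOInvariant(bool FatLTO, bool UnifiedLTO) {
  assert((!FatLTO || UnifiedLTO) && "FatLTO requires UnifiedLTO");
  (void)FatLTO;     // Avoid unused-parameter warnings when NDEBUG
  (void)UnifiedLTO; // compiles the assert away.
}
```

In this model the driver covers only the `clang` entry point; anything that builds a cc1 command line directly has to pass both options itself, which is the gap the `ParseCodeGenArgs` suggestion addresses.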

teresajohnson · 2023-11-28T00:55:20Z

clang/lib/Frontend/CompilerInvocation.cpp

+  if (Arg *A = Args.getLastArg(options::OPT_ffat_lto_objects,
+                               options::OPT_fno_fat_lto_objects)) {
+    if (!Args.hasArg(OPT_funified_lto))
+      Diags.Report(diag::err_drv_incompatible_options)


It might be less confusing to users if this error message is only given upon an explicit -fno-unified-lto, and diag::err_drv_argument_only_allowed_with is used for the lack of -funified-lto.

Also can you add driver tests to check that we get the expected error(s) in the expected option combinations?

Thanks for the suggestion. I think the new version handles that now.
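The diagnostic split suggested above can be sketched as a small decision function. This is a standalone model under stated assumptions, not the code that landed: `FlagState`, `FatLTODiag`, `lastOf`, and `checkFatLTOArgs` are hypothetical names, and the two non-`None` enumerators merely stand in for `diag::err_drv_incompatible_options` and `diag::err_drv_argument_only_allowed_with`.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class FlagState { Absent, Negative, Positive };
enum class FatLTODiag { None, IncompatibleOptions, ArgumentOnlyAllowedWith };

// Reports which spelling of a flag pair appears last, if any.
FlagState lastOf(const std::vector<std::string> &Args, const std::string &Pos,
                 const std::string &Neg) {
  assert(Pos != Neg && "positive and negative spellings must differ");
  FlagState State = FlagState::Absent;
  for (const std::string &Arg : Args) {
    if (Arg == Pos)
      State = FlagState::Positive;
    else if (Arg == Neg)
      State = FlagState::Negative;
  }
  return State;
}

// Picks the diagnostic for a cc1-style argument list:
//  - FatLTO off or absent: nothing to check.
//  - explicit -fno-unified-lto: the options are incompatible.
//  - -funified-lto missing: FatLTO is only allowed with it.
FatLTODiag checkFatLTOArgs(const std::vector<std::string> &Args) {
  if (lastOf(Args, "-ffat-lto-objects", "-fno-fat-lto-objects") !=
      FlagState::Positive)
    return FatLTODiag::None;
  switch (lastOf(Args, "-funified-lto", "-fno-unified-lto")) {
  case FlagState::Positive:
    return FatLTODiag::None;
  case FlagState::Negative:
    return FatLTODiag::IncompatibleOptions;
  case FlagState::Absent:
    return FatLTODiag::ArgumentOnlyAllowedWith;
  }
  return FatLTODiag::None; // Not reached; all enumerators are handled above.
}
```

Separating the two error cases means a user who never mentioned unified LTO is told which option to add, rather than being told that two options conflict.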

teresajohnson

lgtm

nikic · 2023-11-28T17:53:11Z

llvm/docs/FatLTO.rst

+
+   Bit-for-bit compatibility is not (and never was) a guarantee, and we reserve
+   the right to change this at any time. Explicitly, users should not rely on
+   the produced bitcode or object code to mach their non-LTO counterparts


Suggested change

-   the produced bitcode or object code to mach their non-LTO counterparts
+   the produced bitcode or object code to match their non-LTO counterparts

Ah, good catch. Thank you!

nikic · 2023-11-28T17:59:27Z

I like the approach of saying that FatLTO implies UnifiedLTO. This nicely justifies why we are always using the ThinLTO pre-link pipeline.

nikic · 2023-11-28T18:30:37Z

llvm/lib/Passes/PassBuilderPipelines.cpp

-  MPM.addPass(buildPerModuleDefaultPipeline(Level));
+  MPM.addPass(buildThinLTOPreLinkDefaultPipeline(Level));
+  MPM.addPass(EmbedBitcodePass());
+  MPM.addPass(buildThinLTODefaultPipeline(Level, /*ImportSummary=*/nullptr));


I still don't agree with the use of the ThinLTO post-link pipeline for the non-LTO result, but I can submit a followup to change that.

Yeah, I'm not super happy with it, but based on the discussion from the initial patches, we'd need something that is close to make sure that all the various issues from optional passes like profiling, etc. were handled, correct?

Otherwise, I guess we can sub in ModuleOptimization here? @teresajohnson do you think that would run into any trouble based on your concerns in https://reviews.llvm.org/D146776#4302238 and the subsequent discussion?

Yes, ModuleOptimization (plus I guess a call to addAnnotationRemarksPass) is what I would substitute here, and then add a note to the FatLTO docs that says something along the lines of:

If FatLTO is used together with SamplePGO (as opposed to normal PGO), some profile-based optimizations will only be applied when linking with LTO.

I believe the other discrepancies that we discussed at the time have already been addressed in the meantime and SamplePGO is the only remaining issue.

Another option would be to use ThinLTODefaultPipeline if SampleUse is set and ModuleOptimization otherwise.

Oh, that's fantastic. Let me give that a try, since it seems better to just handle this right now instead of patching it after the fact.

Running some quick checks, that seems to work well, and non-LTO codegen looks more or less as expected (at least on a few test programs). I probably won't be able to benchmark this on compile times until later today or more likely tomorrow, though.

What was the downside of using ThinLTODefaultPipeline always? I guess it was essentially over-optimizing in the non-LTO case? I guess using the ThinLTODefaultPipeline only under SamplePGO is ok with me, although it seems like over time as the pipelines get modified it will be difficult to ensure that is the only case where the pipelines get out of sync. I think in either case we are essentially saying: if you use Fat LTO then don't expect the resulting non-LTO native code to be the same as that of a fully non-LTO compile. In one case there is more optimization, in the other there is the risk of future deoptimization if things don't stay in sync.

I think compile time was the big concern; otherwise, using the ThinLTODefaultPipeline would be fine.
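The option discussed in this thread (ThinLTODefaultPipeline when SampleUse is set, ModuleOptimization otherwise) can be sketched as a stage list. This is a toy model of the proposal, not the PassBuilder code that landed: `fatLTOStages` is a hypothetical function, and the strings are labels for the pipelines named in the review, not real pass objects.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the ordered stage labels for a FatLTO compile, following the
// proposal in this thread. SampleUse models whether a sample profile is used.
std::vector<std::string> fatLTOStages(bool SampleUse) {
  std::vector<std::string> Stages = {"ThinLTOPreLinkDefaultPipeline",
                                     "EmbedBitcodePass"};
  if (SampleUse) {
    // SamplePGO keeps the ThinLTO post-link pipeline so that profile-based
    // optimizations still apply to the non-LTO object code.
    Stages.push_back("ThinLTODefaultPipeline");
  } else {
    // Otherwise finish with module optimization plus annotation remarks.
    Stages.push_back("ModuleOptimization");
    Stages.push_back("AnnotationRemarks");
  }
  // The bitcode must be embedded before any post-embed optimization runs.
  assert(Stages[1] == "EmbedBitcodePass");
  return Stages;
}
```

The trade-off raised above shows up directly in the branch: the SampleUse path pays the compile-time cost of the full post-link pipeline, while the other path is cheaper but has to be kept in sync with the non-LTO pipeline over time.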

ilovepi · 2023-11-30T18:42:19Z

@nikic are there any other changes you'd like to see? Otherwise, I plan to rebase and land this later today.

nikic · 2023-11-30T19:19:21Z

llvm/docs/FatLTO.rst

+
+If FatLTO is used together with SamplePGO (as opposed to normal
+instrumentation-based PGO), some profile-based optimizations will only be
+applied when linking with LTO.


This note isn't necessary, as you're running the ThinLTO pipeline for SampleUse, so there should be no issues there.

Thanks for pointing that out. I forgot that I didn't need this comment after updating the SampleProfile case.

ilovepi added clang Clang issues not falling into any other category miscompilation LTO Link time optimization (regular/full LTO or ThinLTO) labels Nov 14, 2023

ilovepi requested review from nikic, teresajohnson and mandlebug November 14, 2023 00:48

llvmbot added backend:X86 clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:codegen IR generation bugs: mangling, exceptions, etc. llvm:transforms labels Nov 14, 2023

ilovepi mentioned this pull request Nov 14, 2023

[LTO] Fat LTO pipeline miss-optimizes indirect goto. #70703

Closed

ilovepi force-pushed the fix_fatlto_pipeline branch from 0616509 to 6f07f56 Compare November 14, 2023 00:53

teresajohnson reviewed Nov 27, 2023

teresajohnson reviewed Nov 28, 2023

teresajohnson approved these changes Nov 28, 2023

nikic reviewed Nov 28, 2023

nikic reviewed Nov 30, 2023

ilovepi force-pushed the fix_fatlto_pipeline branch from cedc4ad to e8e0542 Compare November 30, 2023 21:47

ilovepi added 7 commits November 30, 2023 22:24

fixup! Properly handle CC1 options, and update tests

61a55fa

fixup! Make error reporting more useful

a8b2fe6

fixup! fix typo in docs

87aec2c

fixup! Use ModuleOptimization over the ThinLTOPostLink pipeline

dc36d6d

fixup! Handle sample profiling with the ThinLTO pipeline

6946e24

fixup! Document limitations with SamplePGO

310b46c

fixup! Correct documentation change.

93ce51e

ilovepi force-pushed the fix_fatlto_pipeline branch from e8e0542 to 93ce51e Compare November 30, 2023 22:24

ilovepi merged commit cfe1ece into llvm:main Dec 1, 2023

ilovepi deleted the fix_fatlto_pipeline branch December 4, 2023 17:42

ilovepi mentioned this pull request May 15, 2025

[llvm][EmbedBitcodePass] Prevent modifying the module with ThinLTO #139999

Merged
