Skip to content

[CodeGen] Implement post-opt linking option for builtin bitocdes #69371

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 8, 2023

Conversation

lamb-j
Copy link
Contributor

@lamb-j lamb-j commented Oct 17, 2023

In this patch, we create a new ModulePass that mimics the LinkInModules API from CodeGenAction.cpp, and a new command line option to enable the pass. As part of the implementation, we needed to refactor the BackendConsumer class definition into a new separate header (instead of embedded in CodeGenAction.cpp). With this new pass, we can now re-link bitcodes supplied via the -mlink-built-in bitcodes as part of the RunOptimizationPipeline.

With the re-linking pass, we now handle cases where new device library functions are introduced as part of the optimization pipeline. Previously, these newly introduced functions (for example a fused sincos call) would result in a linking error due to a missing function definition. This new pass can be initiated via:

  -mllvm -relink-builtin-bitcode-postop

Also note we intentionally exclude bitcodes supplied via the -mlink-bitcode-file option from the second linking step

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Oct 17, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 17, 2023

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Jacob Lambert (lamb-j)

Changes

This set of patches updates device library linking and optimization to do the following:

Link
Optimize
Link (New)

This handles the edge case where optimization introduces new device library functions, such as a fused sincos() from separate sin() and cos() calls. The second link step ensures the sincos() definition is also linked in from the device libraries.


Patch is 51.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69371.diff

6 Files Affected:

  • (added) clang/lib/CodeGen/BackendConsumer.h (+166)
  • (modified) clang/lib/CodeGen/BackendUtil.cpp (+42-30)
  • (modified) clang/lib/CodeGen/CMakeLists.txt (+1)
  • (modified) clang/lib/CodeGen/CodeGenAction.cpp (+303-361)
  • (added) clang/lib/CodeGen/LinkInModulesPass.cpp (+29)
  • (added) clang/lib/CodeGen/LinkInModulesPass.h (+43)
diff --git a/clang/lib/CodeGen/BackendConsumer.h b/clang/lib/CodeGen/BackendConsumer.h
new file mode 100644
index 000000000000000..72a814cd43d738d
--- /dev/null
+++ b/clang/lib/CodeGen/BackendConsumer.h
@@ -0,0 +1,166 @@
+//===--- BackendConsumer.h - LLVM BackendConsumer Header File -------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_LIB_CODEGEN_BACKENDCONSUMER_H
+#define LLVM_CLANG_LIB_CODEGEN_BACKENDCONSUMER_H
+
+#include "clang/CodeGen/BackendUtil.h"
+#include "clang/CodeGen/CodeGenAction.h"
+
+#include "llvm/IR/DiagnosticInfo.h"
+#include "llvm/Support/Timer.h"
+
+namespace llvm {
+  class DiagnosticInfoDontCall;
+}
+
+namespace clang {
+class ASTContext;
+class CodeGenAction;
+class CoverageSourceInfo;
+
+class BackendConsumer : public ASTConsumer {
+  using LinkModule = CodeGenAction::LinkModule;
+
+  virtual void anchor();
+  DiagnosticsEngine &Diags;
+  BackendAction Action;
+  const HeaderSearchOptions &HeaderSearchOpts;
+  const CodeGenOptions &CodeGenOpts;
+  const TargetOptions &TargetOpts;
+  const LangOptions &LangOpts;
+  std::unique_ptr<raw_pwrite_stream> AsmOutStream;
+  ASTContext *Context;
+  IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS;
+
+  llvm::Timer LLVMIRGeneration;
+  unsigned LLVMIRGenerationRefCount;
+
+  /// True if we've finished generating IR. This prevents us from generating
+  /// additional LLVM IR after emitting output in HandleTranslationUnit. This
+  /// can happen when Clang plugins trigger additional AST deserialization.
+  bool IRGenFinished = false;
+
+  bool TimerIsEnabled = false;
+
+  std::unique_ptr<CodeGenerator> Gen;
+
+  SmallVector<LinkModule, 4> LinkModules;
+
+  // A map from mangled names to their function's source location, used for
+  // backend diagnostics as the Clang AST may be unavailable. We actually use
+  // the mangled name's hash as the key because mangled names can be very
+  // long and take up lots of space. Using a hash can cause name collision,
+  // but that is rare and the consequences are pointing to a wrong source
+  // location which is not severe. This is a vector instead of an actual map
+  // because we optimize for time building this map rather than time
+  // retrieving an entry, as backend diagnostics are uncommon.
+  std::vector<std::pair<llvm::hash_code, FullSourceLoc>>
+    ManglingFullSourceLocs;
+
+
+  // This is here so that the diagnostic printer knows the module a diagnostic
+  // refers to.
+  llvm::Module *CurLinkModule = nullptr;
+
+public:
+  BackendConsumer(BackendAction Action, DiagnosticsEngine &Diags,
+                  IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
+                  const HeaderSearchOptions &HeaderSearchOpts,
+                  const PreprocessorOptions &PPOpts,
+                  const CodeGenOptions &CodeGenOpts,
+                  const TargetOptions &TargetOpts,
+                  const LangOptions &LangOpts, const std::string &InFile,
+                  SmallVector<LinkModule, 4> LinkModules,
+                  std::unique_ptr<raw_pwrite_stream> OS, llvm::LLVMContext &C,
+                  CoverageSourceInfo *CoverageInfo = nullptr);
+
+  // This constructor is used in installing an empty BackendConsumer
+  // to use the clang diagnostic handler for IR input files. It avoids
+  // initializing the OS field.
+  BackendConsumer(BackendAction Action, DiagnosticsEngine &Diags,
+                  IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
+                  const HeaderSearchOptions &HeaderSearchOpts,
+                  const PreprocessorOptions &PPOpts,
+                  const CodeGenOptions &CodeGenOpts,
+                  const TargetOptions &TargetOpts,
+                  const LangOptions &LangOpts, llvm::Module *Module,
+                  SmallVector<LinkModule, 4> LinkModules, llvm::LLVMContext &C,
+                  CoverageSourceInfo *CoverageInfo = nullptr);
+
+  llvm::Module *getModule() const;
+  std::unique_ptr<llvm::Module> takeModule();
+
+  CodeGenerator *getCodeGenerator();
+
+  void HandleCXXStaticMemberVarInstantiation(VarDecl *VD) override;
+  void Initialize(ASTContext &Ctx) override;
+  bool HandleTopLevelDecl(DeclGroupRef D) override;
+  void HandleInlineFunctionDefinition(FunctionDecl *D) override;
+  void HandleInterestingDecl(DeclGroupRef D) override;
+  void HandleTranslationUnit(ASTContext &C) override;
+  void HandleTagDeclDefinition(TagDecl *D) override;
+  void HandleTagDeclRequiredDefinition(const TagDecl *D) override;
+  void CompleteTentativeDefinition(VarDecl *D) override;
+  void CompleteExternalDeclaration(VarDecl *D) override;
+  void AssignInheritanceModel(CXXRecordDecl *RD) override;
+  void HandleVTable(CXXRecordDecl *RD) override;
+
+
+ // Links each entry in LinkModules into our module.  Returns true on error.
+  bool LinkInModules(llvm::Module *M, bool ShouldLinkFiles = true);
+
+  /// Get the best possible source location to represent a diagnostic that
+  /// may have associated debug info.
+  const FullSourceLoc getBestLocationFromDebugLoc(
+    const llvm::DiagnosticInfoWithLocationBase &D,
+    bool &BadDebugInfo, StringRef &Filename,
+    unsigned &Line, unsigned &Column) const;
+
+  std::optional<FullSourceLoc> getFunctionSourceLocation(
+    const llvm::Function &F) const;
+
+  void DiagnosticHandlerImpl(const llvm::DiagnosticInfo &DI);
+  /// Specialized handler for InlineAsm diagnostic.
+  /// \return True if the diagnostic has been successfully reported, false
+  /// otherwise.
+  bool InlineAsmDiagHandler(const llvm::DiagnosticInfoInlineAsm &D);
+  /// Specialized handler for diagnostics reported using SMDiagnostic.
+  void SrcMgrDiagHandler(const llvm::DiagnosticInfoSrcMgr &D);
+  /// Specialized handler for StackSize diagnostic.
+  /// \return True if the diagnostic has been successfully reported, false
+  /// otherwise.
+  bool StackSizeDiagHandler(const llvm::DiagnosticInfoStackSize &D);
+  /// Specialized handler for ResourceLimit diagnostic.
+  /// \return True if the diagnostic has been successfully reported, false
+  /// otherwise.
+  bool ResourceLimitDiagHandler(const llvm::DiagnosticInfoResourceLimit &D);
+
+  /// Specialized handler for unsupported backend feature diagnostic.
+  void UnsupportedDiagHandler(const llvm::DiagnosticInfoUnsupported &D);
+  /// Specialized handlers for optimization remarks.
+  /// Note that these handlers only accept remarks and they always handle
+  /// them.
+  void EmitOptimizationMessage(const llvm::DiagnosticInfoOptimizationBase &D,
+                               unsigned DiagID);
+  void
+    OptimizationRemarkHandler(const llvm::DiagnosticInfoOptimizationBase &D);
+  void OptimizationRemarkHandler(
+    const llvm::OptimizationRemarkAnalysisFPCommute &D);
+  void OptimizationRemarkHandler(
+    const llvm::OptimizationRemarkAnalysisAliasing &D);
+  void OptimizationFailureHandler(
+    const llvm::DiagnosticInfoOptimizationFailure &D);
+  void DontCallDiagHandler(const llvm::DiagnosticInfoDontCall &D);
+  /// Specialized handler for misexpect warnings.
+  /// Note that misexpect remarks are emitted through ORE
+  void MisExpectDiagHandler(const llvm::DiagnosticInfoMisExpect &D);
+};
+
+} // namespace clang
+#endif
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index d066819871dfde3..894daced377fb55 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -6,6 +6,8 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "BackendConsumer.h"
+#include "LinkInModulesPass.h"
 #include "clang/CodeGen/BackendUtil.h"
 #include "clang/Basic/CodeGenOptions.h"
 #include "clang/Basic/Diagnostic.h"
@@ -112,7 +114,7 @@ class EmitAssemblyHelper {
   const CodeGenOptions &CodeGenOpts;
   const clang::TargetOptions &TargetOpts;
   const LangOptions &LangOpts;
-  Module *TheModule;
+  llvm::Module *TheModule;
   IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS;
 
   Timer CodeGenerationTime;
@@ -155,10 +157,10 @@ class EmitAssemblyHelper {
     return F;
   }
 
-  void
-  RunOptimizationPipeline(BackendAction Action,
+  void RunOptimizationPipeline(BackendAction Action,
                           std::unique_ptr<raw_pwrite_stream> &OS,
-                          std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS);
+                          std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS,
+                          BackendConsumer *BC);
   void RunCodegenPipeline(BackendAction Action,
                           std::unique_ptr<raw_pwrite_stream> &OS,
                           std::unique_ptr<llvm::ToolOutputFile> &DwoOS);
@@ -178,7 +180,7 @@ class EmitAssemblyHelper {
                      const HeaderSearchOptions &HeaderSearchOpts,
                      const CodeGenOptions &CGOpts,
                      const clang::TargetOptions &TOpts,
-                     const LangOptions &LOpts, Module *M,
+                     const LangOptions &LOpts, llvm::Module *M,
                      IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS)
       : Diags(_Diags), HSOpts(HeaderSearchOpts), CodeGenOpts(CGOpts),
         TargetOpts(TOpts), LangOpts(LOpts), TheModule(M), VFS(std::move(VFS)),
@@ -194,7 +196,8 @@ class EmitAssemblyHelper {
 
   // Emit output using the new pass manager for the optimization pipeline.
   void EmitAssembly(BackendAction Action,
-                    std::unique_ptr<raw_pwrite_stream> OS);
+                    std::unique_ptr<raw_pwrite_stream> OS,
+                    BackendConsumer *BC);
 };
 }
 
@@ -690,7 +693,7 @@ static void addSanitizers(const Triple &TargetTriple,
           // the logic of the original code, but operates on "shadow" values. It
           // can benefit from re-running some general purpose optimization
           // passes.
-          MPM.addPass(RequireAnalysisPass<GlobalsAA, Module>());
+          MPM.addPass(RequireAnalysisPass<GlobalsAA, llvm::Module>());
           FunctionPassManager FPM;
           FPM.addPass(EarlyCSEPass(true /* Enable mem-ssa. */));
           FPM.addPass(InstCombinePass());
@@ -749,7 +752,7 @@ static void addSanitizers(const Triple &TargetTriple,
           SanitizersCallback(NewMPM, Level);
           if (!NewMPM.isEmpty()) {
             // Sanitizers can abandon<GlobalsAA>.
-            NewMPM.addPass(RequireAnalysisPass<GlobalsAA, Module>());
+            NewMPM.addPass(RequireAnalysisPass<GlobalsAA, llvm::Module>());
             MPM.addPass(std::move(NewMPM));
           }
         });
@@ -761,7 +764,7 @@ static void addSanitizers(const Triple &TargetTriple,
 
 void EmitAssemblyHelper::RunOptimizationPipeline(
     BackendAction Action, std::unique_ptr<raw_pwrite_stream> &OS,
-    std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS) {
+    std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS, BackendConsumer *BC) {
   std::optional<PGOOptions> PGOOpt;
 
   if (CodeGenOpts.hasProfileIRInstr())
@@ -1035,6 +1038,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
     }
   }
 
+  // Re-link against any bitcodes supplied via the -mlink-builtin-bitcode option
+  // Some optimizations may generate new function calls that would not have
+  // been linked pre-optimization (i.e. fused sincos calls generated by
+  // AMDGPULibCalls::fold_sincos.)
+  MPM.addPass(LinkInModulesPass(BC, false));
+
   // Add a verifier pass if requested. We don't have to do this if the action
   // requires code generation because there will already be a verifier pass in
   // the code-generation pipeline.
@@ -1046,7 +1055,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   if (Action == Backend_EmitBC || Action == Backend_EmitLL) {
     if (CodeGenOpts.PrepareForThinLTO && !CodeGenOpts.DisableLLVMPasses) {
       if (!TheModule->getModuleFlag("EnableSplitLTOUnit"))
-        TheModule->addModuleFlag(Module::Error, "EnableSplitLTOUnit",
+        TheModule->addModuleFlag(llvm::Module::Error, "EnableSplitLTOUnit",
                                  CodeGenOpts.EnableSplitLTOUnit);
       if (Action == Backend_EmitBC) {
         if (!CodeGenOpts.ThinLinkBitcodeFile.empty()) {
@@ -1055,7 +1064,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
             return;
         }
         if (CodeGenOpts.UnifiedLTO)
-          TheModule->addModuleFlag(Module::Error, "UnifiedLTO", uint32_t(1));
+          TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO", uint32_t(1));
         MPM.addPass(ThinLTOBitcodeWriterPass(
             *OS, ThinLinkOS ? &ThinLinkOS->os() : nullptr));
       } else {
@@ -1069,12 +1078,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
       bool EmitLTOSummary = shouldEmitRegularLTOSummary();
       if (EmitLTOSummary) {
         if (!TheModule->getModuleFlag("ThinLTO") && !CodeGenOpts.UnifiedLTO)
-          TheModule->addModuleFlag(Module::Error, "ThinLTO", uint32_t(0));
+          TheModule->addModuleFlag(llvm::Module::Error, "ThinLTO", uint32_t(0));
         if (!TheModule->getModuleFlag("EnableSplitLTOUnit"))
-          TheModule->addModuleFlag(Module::Error, "EnableSplitLTOUnit",
+          TheModule->addModuleFlag(llvm::Module::Error, "EnableSplitLTOUnit",
                                    uint32_t(1));
         if (CodeGenOpts.UnifiedLTO)
-          TheModule->addModuleFlag(Module::Error, "UnifiedLTO", uint32_t(1));
+          TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO", uint32_t(1));
       }
       if (Action == Backend_EmitBC)
         MPM.addPass(BitcodeWriterPass(*OS, CodeGenOpts.EmitLLVMUseLists,
@@ -1088,13 +1097,13 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
     // Set module flags, like EnableSplitLTOUnit and UnifiedLTO, since FatLTO
     // uses a different action than Backend_EmitBC or Backend_EmitLL.
     if (!TheModule->getModuleFlag("ThinLTO"))
-      TheModule->addModuleFlag(Module::Error, "ThinLTO",
+      TheModule->addModuleFlag(llvm::Module::Error, "ThinLTO",
                                uint32_t(CodeGenOpts.PrepareForThinLTO));
     if (!TheModule->getModuleFlag("EnableSplitLTOUnit"))
-      TheModule->addModuleFlag(Module::Error, "EnableSplitLTOUnit",
+      TheModule->addModuleFlag(llvm::Module::Error, "EnableSplitLTOUnit",
                                uint32_t(CodeGenOpts.EnableSplitLTOUnit));
     if (CodeGenOpts.UnifiedLTO && !TheModule->getModuleFlag("UnifiedLTO"))
-      TheModule->addModuleFlag(Module::Error, "UnifiedLTO", uint32_t(1));
+      TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO", uint32_t(1));
   }
 
   // Print a textual, '-passes=' compatible, representation of pipeline if
@@ -1160,7 +1169,8 @@ void EmitAssemblyHelper::RunCodegenPipeline(
 }
 
 void EmitAssemblyHelper::EmitAssembly(BackendAction Action,
-                                      std::unique_ptr<raw_pwrite_stream> OS) {
+                                      std::unique_ptr<raw_pwrite_stream> OS,
+                                      BackendConsumer *BC) {
   TimeRegion Region(CodeGenOpts.TimePasses ? &CodeGenerationTime : nullptr);
   setCommandLineOpts(CodeGenOpts);
 
@@ -1176,7 +1186,7 @@ void EmitAssemblyHelper::EmitAssembly(BackendAction Action,
   cl::PrintOptionValues();
 
   std::unique_ptr<llvm::ToolOutputFile> ThinLinkOS, DwoOS;
-  RunOptimizationPipeline(Action, OS, ThinLinkOS);
+  RunOptimizationPipeline(Action, OS, ThinLinkOS, BC);
   RunCodegenPipeline(Action, OS, DwoOS);
 
   if (ThinLinkOS)
@@ -1186,11 +1196,12 @@ void EmitAssemblyHelper::EmitAssembly(BackendAction Action,
 }
 
 static void runThinLTOBackend(
-    DiagnosticsEngine &Diags, ModuleSummaryIndex *CombinedIndex, Module *M,
-    const HeaderSearchOptions &HeaderOpts, const CodeGenOptions &CGOpts,
-    const clang::TargetOptions &TOpts, const LangOptions &LOpts,
-    std::unique_ptr<raw_pwrite_stream> OS, std::string SampleProfile,
-    std::string ProfileRemapping, BackendAction Action) {
+    DiagnosticsEngine &Diags, ModuleSummaryIndex *CombinedIndex,
+    llvm::Module *M, const HeaderSearchOptions &HeaderOpts,
+    const CodeGenOptions &CGOpts, const clang::TargetOptions &TOpts,
+    const LangOptions &LOpts, std::unique_ptr<raw_pwrite_stream> OS,
+    std::string SampleProfile, std::string ProfileRemapping,
+    BackendAction Action) {
   DenseMap<StringRef, DenseMap<GlobalValue::GUID, GlobalValueSummary *>>
       ModuleToDefinedGVSummaries;
   CombinedIndex->collectDefinedGVSummariesPerModule(ModuleToDefinedGVSummaries);
@@ -1259,18 +1270,18 @@ static void runThinLTOBackend(
   Conf.SplitDwarfOutput = CGOpts.SplitDwarfOutput;
   switch (Action) {
   case Backend_EmitNothing:
-    Conf.PreCodeGenModuleHook = [](size_t Task, const Module &Mod) {
+    Conf.PreCodeGenModuleHook = [](size_t Task, const llvm::Module &Mod) {
       return false;
     };
     break;
   case Backend_EmitLL:
-    Conf.PreCodeGenModuleHook = [&](size_t Task, const Module &Mod) {
+    Conf.PreCodeGenModuleHook = [&](size_t Task, const llvm::Module &Mod) {
       M->print(*OS, nullptr, CGOpts.EmitLLVMUseLists);
       return false;
     };
     break;
   case Backend_EmitBC:
-    Conf.PreCodeGenModuleHook = [&](size_t Task, const Module &Mod) {
+    Conf.PreCodeGenModuleHook = [&](size_t Task, const llvm::Module &Mod) {
       WriteBitcodeToFile(*M, *OS, CGOpts.EmitLLVMUseLists);
       return false;
     };
@@ -1294,9 +1305,10 @@ void clang::EmitBackendOutput(DiagnosticsEngine &Diags,
                               const CodeGenOptions &CGOpts,
                               const clang::TargetOptions &TOpts,
                               const LangOptions &LOpts, StringRef TDesc,
-                              Module *M, BackendAction Action,
+                              llvm::Module *M, BackendAction Action,
                               IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
-                              std::unique_ptr<raw_pwrite_stream> OS) {
+                              std::unique_ptr<raw_pwrite_stream> OS,
+                              BackendConsumer *BC) {
 
   llvm::TimeTraceScope TimeScope("Backend");
 
@@ -1339,7 +1351,7 @@ void clang::EmitBackendOutput(DiagnosticsEngine &Diags,
   }
 
   EmitAssemblyHelper AsmHelper(Diags, HeaderOpts, CGOpts, TOpts, LOpts, M, VFS);
-  AsmHelper.EmitAssembly(Action, std::move(OS));
+  AsmHelper.EmitAssembly(Action, std::move(OS), BC);
 
   // Verify clang's TargetInfo DataLayout against the LLVM TargetMachine's
   // DataLayout.
diff --git a/clang/lib/CodeGen/CMakeLists.txt b/clang/lib/CodeGen/CMakeLists.txt
index 1debeb6d9cce9e0..41e31c12a16e641 100644
--- a/clang/lib/CodeGen/CMakeLists.txt
+++ b/clang/lib/CodeGen/CMakeLists.txt
@@ -81,6 +81,7 @@ add_clang_library(clangCodeGen
   ConstantInitBuilder.cpp
   CoverageMappingGen.cpp
   ItaniumCXXABI.cpp
+  LinkInModulesPass.cpp
   MacroPPCallbacks.cpp
   MicrosoftCXXABI.cpp
   ModuleBuilder.cpp
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index a3b72381d73fc54..f4586a776c0fefd 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -7,6 +7,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "clang/CodeGen/CodeGenAction.h"
+#include "BackendConsumer.h"
 #include "CGCall.h"
 #include "CodeGenModule.h"
 #include "CoverageMappingGen.h"
@@ -48,8 +49,8 @@
 #include "llvm/Support/ToolOutputFile.h"
 #include "llvm/Support/YAMLTraits.h"
 #include "llvm/Transforms/IPO/Internalize.h"
+#include "llvm/Transforms/Utils/Cloning.h"
 
-#include <memory>
 #include <optional>
 using namespace clang;
 using namespace llvm;
@@ -57,419 +58,360 @@ using namespace llvm;
 #define DEBUG_TYPE "codegenaction"
 
 namespace clang {
-  class BackendConsumer;
-  class ClangDiagnosticHandler final : public DiagnosticHandler {
-  public:
-    ClangDiagnosticHandler(const CodeGenOptions &CGOpts, BackendConsumer *BCon)
-        : CodeGenOpts(CGOpts), BackendCon(BCon) {}
+class BackendConsumer;
+class ClangDiagnosticHandler final : public DiagnosticHandler {
+public:
+  ClangDiagnosticHandler(const CodeGenOptions &CGOpts, BackendConsumer *BCon)
+      : CodeGenOpts(CGOpts), BackendCon(BCon) {}
 
-    bool handleDiagnostics(const DiagnosticInfo &DI) override;
+  bool handleDiagnostics(const DiagnosticInfo &DI) over...
[truncated]

@jhuber6
Copy link
Contributor

jhuber6 commented Oct 17, 2023

This handles the edge case where optimization introduces new device library functions, such as a fused sincos() from separate sin() and cos() calls.

This seems like a pretty big change to solve just this problem. Could we not just put nobuiltin on said function to stop it from doing that? Is this optimization explicitly valuable in this case?

@lamb-j
Copy link
Contributor Author

lamb-j commented Oct 17, 2023

sincos() is just one example. There are several other cases that can trigger this issue. fold_rootn() generates new function calls for square and cubic roots, fold_pow() does a similar thing for specific powers (ex 2), etc.

We did try disabling -amdgpu-prelink, and it did lead to a significant performance difference for a couple of key applications

With this approach we're also hoping to cover future optimizations added that may fall under this category

@github-actions
Copy link

github-actions bot commented Oct 17, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 5aa934e2af8727852dec0ec1cfa0cba05d858f70 cf294be05ebdd10df35e7e857b118c8102b68362 -- clang/lib/CodeGen/BackendConsumer.h clang/lib/CodeGen/LinkInModulesPass.cpp clang/lib/CodeGen/LinkInModulesPass.h clang/include/clang/CodeGen/BackendUtil.h clang/lib/CodeGen/BackendUtil.cpp clang/lib/CodeGen/CodeGenAction.cpp
View the diff from clang-format here.
diff --git a/clang/lib/CodeGen/BackendConsumer.h b/clang/lib/CodeGen/BackendConsumer.h
index 72a814cd4..17d5ca3d1 100644
--- a/clang/lib/CodeGen/BackendConsumer.h
+++ b/clang/lib/CodeGen/BackendConsumer.h
@@ -16,7 +16,7 @@
 #include "llvm/Support/Timer.h"
 
 namespace llvm {
-  class DiagnosticInfoDontCall;
+class DiagnosticInfoDontCall;
 }
 
 namespace clang {
@@ -60,9 +60,7 @@ class BackendConsumer : public ASTConsumer {
   // location which is not severe. This is a vector instead of an actual map
   // because we optimize for time building this map rather than time
   // retrieving an entry, as backend diagnostics are uncommon.
-  std::vector<std::pair<llvm::hash_code, FullSourceLoc>>
-    ManglingFullSourceLocs;
-
+  std::vector<std::pair<llvm::hash_code, FullSourceLoc>> ManglingFullSourceLocs;
 
   // This is here so that the diagnostic printer knows the module a diagnostic
   // refers to.
@@ -74,8 +72,8 @@ public:
                   const HeaderSearchOptions &HeaderSearchOpts,
                   const PreprocessorOptions &PPOpts,
                   const CodeGenOptions &CodeGenOpts,
-                  const TargetOptions &TargetOpts,
-                  const LangOptions &LangOpts, const std::string &InFile,
+                  const TargetOptions &TargetOpts, const LangOptions &LangOpts,
+                  const std::string &InFile,
                   SmallVector<LinkModule, 4> LinkModules,
                   std::unique_ptr<raw_pwrite_stream> OS, llvm::LLVMContext &C,
                   CoverageSourceInfo *CoverageInfo = nullptr);
@@ -88,9 +86,9 @@ public:
                   const HeaderSearchOptions &HeaderSearchOpts,
                   const PreprocessorOptions &PPOpts,
                   const CodeGenOptions &CodeGenOpts,
-                  const TargetOptions &TargetOpts,
-                  const LangOptions &LangOpts, llvm::Module *Module,
-                  SmallVector<LinkModule, 4> LinkModules, llvm::LLVMContext &C,
+                  const TargetOptions &TargetOpts, const LangOptions &LangOpts,
+                  llvm::Module *Module, SmallVector<LinkModule, 4> LinkModules,
+                  llvm::LLVMContext &C,
                   CoverageSourceInfo *CoverageInfo = nullptr);
 
   llvm::Module *getModule() const;
@@ -111,19 +109,18 @@ public:
   void AssignInheritanceModel(CXXRecordDecl *RD) override;
   void HandleVTable(CXXRecordDecl *RD) override;
 
-
- // Links each entry in LinkModules into our module.  Returns true on error.
+  // Links each entry in LinkModules into our module.  Returns true on error.
   bool LinkInModules(llvm::Module *M, bool ShouldLinkFiles = true);
 
   /// Get the best possible source location to represent a diagnostic that
   /// may have associated debug info.
-  const FullSourceLoc getBestLocationFromDebugLoc(
-    const llvm::DiagnosticInfoWithLocationBase &D,
-    bool &BadDebugInfo, StringRef &Filename,
-    unsigned &Line, unsigned &Column) const;
+  const FullSourceLoc
+  getBestLocationFromDebugLoc(const llvm::DiagnosticInfoWithLocationBase &D,
+                              bool &BadDebugInfo, StringRef &Filename,
+                              unsigned &Line, unsigned &Column) const;
 
-  std::optional<FullSourceLoc> getFunctionSourceLocation(
-    const llvm::Function &F) const;
+  std::optional<FullSourceLoc>
+  getFunctionSourceLocation(const llvm::Function &F) const;
 
   void DiagnosticHandlerImpl(const llvm::DiagnosticInfo &DI);
   /// Specialized handler for InlineAsm diagnostic.
@@ -148,14 +145,13 @@ public:
   /// them.
   void EmitOptimizationMessage(const llvm::DiagnosticInfoOptimizationBase &D,
                                unsigned DiagID);
+  void OptimizationRemarkHandler(const llvm::DiagnosticInfoOptimizationBase &D);
+  void
+  OptimizationRemarkHandler(const llvm::OptimizationRemarkAnalysisFPCommute &D);
+  void
+  OptimizationRemarkHandler(const llvm::OptimizationRemarkAnalysisAliasing &D);
   void
-    OptimizationRemarkHandler(const llvm::DiagnosticInfoOptimizationBase &D);
-  void OptimizationRemarkHandler(
-    const llvm::OptimizationRemarkAnalysisFPCommute &D);
-  void OptimizationRemarkHandler(
-    const llvm::OptimizationRemarkAnalysisAliasing &D);
-  void OptimizationFailureHandler(
-    const llvm::DiagnosticInfoOptimizationFailure &D);
+  OptimizationFailureHandler(const llvm::DiagnosticInfoOptimizationFailure &D);
   void DontCallDiagHandler(const llvm::DiagnosticInfoDontCall &D);
   /// Specialized handler for misexpect warnings.
   /// Note that misexpect remarks are emitted through ORE
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index bd4a5b0e7..37ecb1238 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -57,6 +57,7 @@
 #include "llvm/Target/TargetOptions.h"
 #include "llvm/TargetParser/SubtargetFeature.h"
 #include "llvm/TargetParser/Triple.h"
+#include "llvm/Transforms/HipStdPar/HipStdPar.h"
 #include "llvm/Transforms/IPO/EmbedBitcodePass.h"
 #include "llvm/Transforms/IPO/LowerTypeTests.h"
 #include "llvm/Transforms/IPO/ThinLTOBitcodeWriter.h"
@@ -80,7 +81,6 @@
 #include "llvm/Transforms/Scalar/EarlyCSE.h"
 #include "llvm/Transforms/Scalar/GVN.h"
 #include "llvm/Transforms/Scalar/JumpThreading.h"
-#include "llvm/Transforms/HipStdPar/HipStdPar.h"
 #include "llvm/Transforms/Utils/Debugify.h"
 #include "llvm/Transforms/Utils/EntryExitInstrumenter.h"
 #include "llvm/Transforms/Utils/ModuleUtils.h"
@@ -1073,7 +1073,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
             return;
         }
         if (CodeGenOpts.UnifiedLTO)
-          TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO", uint32_t(1));
+          TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO",
+                                   uint32_t(1));
         MPM.addPass(ThinLTOBitcodeWriterPass(
             *OS, ThinLinkOS ? &ThinLinkOS->os() : nullptr));
       } else {
@@ -1092,7 +1093,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
           TheModule->addModuleFlag(llvm::Module::Error, "EnableSplitLTOUnit",
                                    uint32_t(1));
         if (CodeGenOpts.UnifiedLTO)
-          TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO", uint32_t(1));
+          TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO",
+                                   uint32_t(1));
       }
       if (Action == Backend_EmitBC)
         MPM.addPass(BitcodeWriterPass(*OS, CodeGenOpts.EmitLLVMUseLists,
@@ -1208,13 +1210,14 @@ void EmitAssemblyHelper::EmitAssembly(BackendAction Action,
     DwoOS->keep();
 }
 
-static void runThinLTOBackend(
-    DiagnosticsEngine &Diags, ModuleSummaryIndex *CombinedIndex,
-    llvm::Module *M, const HeaderSearchOptions &HeaderOpts,
-    const CodeGenOptions &CGOpts, const clang::TargetOptions &TOpts,
-    const LangOptions &LOpts, std::unique_ptr<raw_pwrite_stream> OS,
-    std::string SampleProfile, std::string ProfileRemapping,
-    BackendAction Action) {
+static void
+runThinLTOBackend(DiagnosticsEngine &Diags, ModuleSummaryIndex *CombinedIndex,
+                  llvm::Module *M, const HeaderSearchOptions &HeaderOpts,
+                  const CodeGenOptions &CGOpts,
+                  const clang::TargetOptions &TOpts, const LangOptions &LOpts,
+                  std::unique_ptr<raw_pwrite_stream> OS,
+                  std::string SampleProfile, std::string ProfileRemapping,
+                  BackendAction Action) {
   DenseMap<StringRef, DenseMap<GlobalValue::GUID, GlobalValueSummary *>>
       ModuleToDefinedGVSummaries;
   CombinedIndex->collectDefinedGVSummariesPerModule(ModuleToDefinedGVSummaries);
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index a31a271ed..a2dc937ea 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -91,83 +91,73 @@ static void reportOptRecordError(Error E, DiagnosticsEngine &Diags,
                                  const CodeGenOptions &CodeGenOpts) {
   handleAllErrors(
       std::move(E),
-    [&](const LLVMRemarkSetupFileError &E) {
+      [&](const LLVMRemarkSetupFileError &E) {
         Diags.Report(diag::err_cannot_open_file)
             << CodeGenOpts.OptRecordFile << E.message();
       },
-    [&](const LLVMRemarkSetupPatternError &E) {
+      [&](const LLVMRemarkSetupPatternError &E) {
         Diags.Report(diag::err_drv_optimization_remark_pattern)
             << E.message() << CodeGenOpts.OptRecordPasses;
       },
-    [&](const LLVMRemarkSetupFormatError &E) {
+      [&](const LLVMRemarkSetupFormatError &E) {
         Diags.Report(diag::err_drv_optimization_remark_format)
             << CodeGenOpts.OptRecordFormat;
       });
 }
 
-BackendConsumer::BackendConsumer(BackendAction Action, DiagnosticsEngine &Diags,
-                                 IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
-                                 const HeaderSearchOptions &HeaderSearchOpts,
-                                 const PreprocessorOptions &PPOpts,
-                                 const CodeGenOptions &CodeGenOpts,
-                                 const TargetOptions &TargetOpts,
-                                 const LangOptions &LangOpts,
-                                 const std::string &InFile,
-                                 SmallVector<LinkModule, 4> LinkModules,
-                                 std::unique_ptr<raw_pwrite_stream> OS,
-                                 LLVMContext &C,
-                                 CoverageSourceInfo *CoverageInfo)
-  : Diags(Diags), Action(Action), HeaderSearchOpts(HeaderSearchOpts),
-  CodeGenOpts(CodeGenOpts), TargetOpts(TargetOpts), LangOpts(LangOpts),
-  AsmOutStream(std::move(OS)), Context(nullptr), FS(VFS),
-  LLVMIRGeneration("irgen", "LLVM IR Generation Time"),
-  LLVMIRGenerationRefCount(0),
-  Gen(CreateLLVMCodeGen(Diags, InFile, std::move(VFS), HeaderSearchOpts,
-                        PPOpts, CodeGenOpts, C, CoverageInfo)),
-  LinkModules(std::move(LinkModules)) {
-    TimerIsEnabled = CodeGenOpts.TimePasses;
-    llvm::TimePassesIsEnabled = CodeGenOpts.TimePasses;
-    llvm::TimePassesPerRun = CodeGenOpts.TimePassesPerRun;
+BackendConsumer::BackendConsumer(
+    BackendAction Action, DiagnosticsEngine &Diags,
+    IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
+    const HeaderSearchOptions &HeaderSearchOpts,
+    const PreprocessorOptions &PPOpts, const CodeGenOptions &CodeGenOpts,
+    const TargetOptions &TargetOpts, const LangOptions &LangOpts,
+    const std::string &InFile, SmallVector<LinkModule, 4> LinkModules,
+    std::unique_ptr<raw_pwrite_stream> OS, LLVMContext &C,
+    CoverageSourceInfo *CoverageInfo)
+    : Diags(Diags), Action(Action), HeaderSearchOpts(HeaderSearchOpts),
+      CodeGenOpts(CodeGenOpts), TargetOpts(TargetOpts), LangOpts(LangOpts),
+      AsmOutStream(std::move(OS)), Context(nullptr), FS(VFS),
+      LLVMIRGeneration("irgen", "LLVM IR Generation Time"),
+      LLVMIRGenerationRefCount(0),
+      Gen(CreateLLVMCodeGen(Diags, InFile, std::move(VFS), HeaderSearchOpts,
+                            PPOpts, CodeGenOpts, C, CoverageInfo)),
+      LinkModules(std::move(LinkModules)) {
+  TimerIsEnabled = CodeGenOpts.TimePasses;
+  llvm::TimePassesIsEnabled = CodeGenOpts.TimePasses;
+  llvm::TimePassesPerRun = CodeGenOpts.TimePassesPerRun;
 }
 
 // This constructor is used in installing an empty BackendConsumer
 // to use the clang diagnostic handler for IR input files. It avoids
 // initializing the OS field.
-BackendConsumer::BackendConsumer(BackendAction Action, DiagnosticsEngine &Diags,
-                                 IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
-                                 const HeaderSearchOptions &HeaderSearchOpts,
-                                 const PreprocessorOptions &PPOpts,
-                                 const CodeGenOptions &CodeGenOpts,
-                                 const TargetOptions &TargetOpts,
-                                 const LangOptions &LangOpts,
-                                 llvm::Module *Module,
-                                 SmallVector<LinkModule, 4> LinkModules,
-                                 LLVMContext &C,
-                                 CoverageSourceInfo *CoverageInfo)
-  : Diags(Diags), Action(Action), HeaderSearchOpts(HeaderSearchOpts),
-  CodeGenOpts(CodeGenOpts), TargetOpts(TargetOpts), LangOpts(LangOpts),
-  Context(nullptr), FS(VFS),
-  LLVMIRGeneration("irgen", "LLVM IR Generation Time"),
-  LLVMIRGenerationRefCount(0),
-  Gen(CreateLLVMCodeGen(Diags, "", std::move(VFS), HeaderSearchOpts,
-                        PPOpts, CodeGenOpts, C, CoverageInfo)),
-  LinkModules(std::move(LinkModules)), CurLinkModule(Module) {
-    TimerIsEnabled = CodeGenOpts.TimePasses;
-    llvm::TimePassesIsEnabled = CodeGenOpts.TimePasses;
-    llvm::TimePassesPerRun = CodeGenOpts.TimePassesPerRun;
+BackendConsumer::BackendConsumer(
+    BackendAction Action, DiagnosticsEngine &Diags,
+    IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
+    const HeaderSearchOptions &HeaderSearchOpts,
+    const PreprocessorOptions &PPOpts, const CodeGenOptions &CodeGenOpts,
+    const TargetOptions &TargetOpts, const LangOptions &LangOpts,
+    llvm::Module *Module, SmallVector<LinkModule, 4> LinkModules,
+    LLVMContext &C, CoverageSourceInfo *CoverageInfo)
+    : Diags(Diags), Action(Action), HeaderSearchOpts(HeaderSearchOpts),
+      CodeGenOpts(CodeGenOpts), TargetOpts(TargetOpts), LangOpts(LangOpts),
+      Context(nullptr), FS(VFS),
+      LLVMIRGeneration("irgen", "LLVM IR Generation Time"),
+      LLVMIRGenerationRefCount(0),
+      Gen(CreateLLVMCodeGen(Diags, "", std::move(VFS), HeaderSearchOpts, PPOpts,
+                            CodeGenOpts, C, CoverageInfo)),
+      LinkModules(std::move(LinkModules)), CurLinkModule(Module) {
+  TimerIsEnabled = CodeGenOpts.TimePasses;
+  llvm::TimePassesIsEnabled = CodeGenOpts.TimePasses;
+  llvm::TimePassesPerRun = CodeGenOpts.TimePassesPerRun;
 }
 
-llvm::Module* BackendConsumer::getModule() const {
-  return Gen->GetModule();
-}
+llvm::Module *BackendConsumer::getModule() const { return Gen->GetModule(); }
 
 std::unique_ptr<llvm::Module> BackendConsumer::takeModule() {
   return std::unique_ptr<llvm::Module>(Gen->ReleaseModule());
 }
 
-CodeGenerator* BackendConsumer::getCodeGenerator() {
-  return Gen.get();
-}
+CodeGenerator *BackendConsumer::getCodeGenerator() { return Gen.get(); }
 
 void BackendConsumer::HandleCXXStaticMemberVarInstantiation(VarDecl *VD) {
   Gen->HandleCXXStaticMemberVarInstantiation(VD);
@@ -247,7 +237,7 @@ bool BackendConsumer::LinkInModules(llvm::Module *M, bool ShouldLinkFiles) {
         if (F.isIntrinsic())
           continue;
         CodeGen::mergeDefaultFunctionDefinitionAttributes(
-          F, CodeGenOpts, LangOpts, TargetOpts, LM.Internalize);
+            F, CodeGenOpts, LangOpts, TargetOpts, LM.Internalize);
       }
 
     CurLinkModule = LM.Module.get();
@@ -309,15 +299,15 @@ void BackendConsumer::HandleTranslationUnit(ASTContext &C) {
 
   LLVMContext &Ctx = getModule()->getContext();
   std::unique_ptr<DiagnosticHandler> OldDiagnosticHandler =
-    Ctx.getDiagnosticHandler();
-  Ctx.setDiagnosticHandler(std::make_unique<ClangDiagnosticHandler>(
-      CodeGenOpts, this));
+      Ctx.getDiagnosticHandler();
+  Ctx.setDiagnosticHandler(
+      std::make_unique<ClangDiagnosticHandler>(CodeGenOpts, this));
 
   Expected<std::unique_ptr<llvm::ToolOutputFile>> OptRecordFileOrErr =
-    setupLLVMOptimizationRemarks(
-      Ctx, CodeGenOpts.OptRecordFile, CodeGenOpts.OptRecordPasses,
-      CodeGenOpts.OptRecordFormat, CodeGenOpts.DiagnosticsWithHotness,
-      CodeGenOpts.DiagnosticsHotnessThreshold);
+      setupLLVMOptimizationRemarks(
+          Ctx, CodeGenOpts.OptRecordFile, CodeGenOpts.OptRecordPasses,
+          CodeGenOpts.OptRecordFormat, CodeGenOpts.DiagnosticsWithHotness,
+          CodeGenOpts.DiagnosticsHotnessThreshold);
 
   if (Error E = OptRecordFileOrErr.takeError()) {
     reportOptRecordError(std::move(E), Diags, CodeGenOpts);
@@ -325,7 +315,7 @@ void BackendConsumer::HandleTranslationUnit(ASTContext &C) {
   }
 
   std::unique_ptr<llvm::ToolOutputFile> OptRecordFile =
-    std::move(*OptRecordFileOrErr);
+      std::move(*OptRecordFileOrErr);
 
   if (OptRecordFile &&
       CodeGenOpts.getProfileUse() != CodeGenOptions::ProfileNone)
@@ -337,7 +327,7 @@ void BackendConsumer::HandleTranslationUnit(ASTContext &C) {
 
   if (CodeGenOpts.DiagnosticsMisExpectTolerance) {
     Ctx.setDiagnosticsMisExpectTolerance(
-      CodeGenOpts.DiagnosticsMisExpectTolerance);
+        CodeGenOpts.DiagnosticsMisExpectTolerance);
   }
 
   // Link each LinkModule into our module.
@@ -400,11 +390,9 @@ void BackendConsumer::AssignInheritanceModel(CXXRecordDecl *RD) {
   Gen->AssignInheritanceModel(RD);
 }
 
-void BackendConsumer::HandleVTable(CXXRecordDecl *RD) {
-  Gen->HandleVTable(RD);
-}
+void BackendConsumer::HandleVTable(CXXRecordDecl *RD) { Gen->HandleVTable(RD); }
 
-void BackendConsumer::anchor() { }
+void BackendConsumer::anchor() {}
 
 } // namespace clang
 

@lamb-j lamb-j requested a review from arsenm October 17, 2023 19:40
@jhuber6
Copy link
Contributor

jhuber6 commented Oct 17, 2023

sincos() is just one example. There are several other cases that can trigger this issue. fold_rootn() generates new function calls for square and cubic roots, fold_pow() does a similar thing for specific powers (ex 2), etc.

We did try disabling -amdgpu-prelink, and it did lead to a significant performance difference for a couple of key applications

With this approach we're also hoping to cover future optimizations added that may fall under this category

This approach assumes that whatever the function call was transformed into also exists in the same library, which isn't necessarily true. This is related to a whole host of problems with handling known runtime library calls inside of bitcode. It's possible that we could use some module metadata to indicate when these sorts of transformations are legal within LLVM-IR. I've had similar issues with LTO, where a sin call gets turned into llvm.sin.f64 which then no longer links with the sin function implemented in an LTO library. Then when the backend runs it will turn back into sin but will not be resolved.

@lamb-j
Copy link
Contributor Author

lamb-j commented Oct 17, 2023

This approach assumes that whatever the function call was transformed into also exists in the same library, which isn't necessarily true.

True, good point. But I don't think it's necessarily due to this approach, but more of how AMDGPULibCalls is implemented. It seems like the instruction folding implementations are assuming the definitions of the new functions they're inserting will be linked in at a later point?

Previously we were doing it via a separate llvm-link call to re-link all the device libraries. With this approach, we're doing it as part of the clang optimization pipeline, and would no longer need that extra link step.

Maybe this approach could be a solution for the time being? With a longer term solution being to think more carefully about the legality of AMDGPULibCalls inserting function calls that may or may not exist? Maybe device library calls are the exception?

@jhuber6
Copy link
Contributor

jhuber6 commented Oct 17, 2023

This approach assumes that whatever the function call was transformed into also exists in the same library, which isn't necessarily true.

True, good point. But I don't think it's necessarily due to this approach, but more of how AMDGPULibCalls is implemented. It seems like the instruction folding implementations are assuming the definitions of the new functions they're inserting will be linked in at a later point?

Previously we were doing it via a separate llvm-link call to re-link all the device libraries. With this approach, we're doing it as part of the clang optimization pipeline, and would no longer need that extra link step.

Maybe this approach could be a solution for the time being? With a longer term solution being to think more carefully about the legality of AMDGPULibCalls inserting function calls that may or may not exist? Maybe device library calls are the exception?

This is part of why I'm not very happy with the current approach to just -mlink-builtin-bitcode everything and hope it works. We actually had similar issues with the OpenMPOpt pass as it would emit OpenMP runtime calls after the device library was already linked. The solution there was to simply guard emission of function calls to only be possible if we are either executing the optimizations pre-linking, or if the function call already exists in the module. I think this is a much easier fix than making this already quite complicated solution even more complicated. However, this would prevent these optimizations from firing without fixing it in other ways.

Generally this occurs because -mlink-builtin-bitcode only links needed symbols, so other ones are ignored. I think there's a few hacks in the ROCm device libs to get around this, such as emitting a printf symbol that only calls __printf_alloc so that it can be called. You could potentially link in all the symbols and internalize them, assuming that a future DCE or global opt pass will trim it down, but that would require some phase ordering to make sure this AMDGPULibCalls pass is run first.

@yxsamliu
Copy link
Collaborator

You could potentially link in all the symbols and internalize them

That probably won't work. After they are internalized, they have internal linkage and cannot be used to resolve newly added call of the same function.

The purpose of internalization is to allow you to optimize those functions, e.g. changing their signatures. After internalization, there is no guarantee they behave like the original functions.

@lamb-j lamb-j force-pushed the device-lib-linking branch from b7af255 to 155e15e Compare October 19, 2023 16:38
@jhuber6
Copy link
Contributor

jhuber6 commented Oct 19, 2023

You could potentially link in all the symbols and internalize them

That probably won't work. After they are internalized, they have internal linkage and cannot be used to resolve newly added call of the same function.

The purpose of internalization is to allow you to optimize those functions, e.g. changing their signatures. After internalization, there is no guarantee they behave like the original functions.

I meant this more in this particular case, if we internalize but keep all the symbols, then the functions will be preserved so we can call them inside of the module when the AMDGPU pass is run. This would then rely on some later pass to strip out the unused ones.

@lamb-j lamb-j force-pushed the device-lib-linking branch from 155e15e to ff10b14 Compare October 31, 2023 21:56
Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. I remember there was a reason we couldn't use the existing linking support and needed the new pass, what was that again?

// Some optimizations may generate new function calls that would not have
// been linked pre-optimization (i.e. fused sincos calls generated by
// AMDGPULibCalls::fold_sincos.)
if (ClRelinkBuiltinBitcodePostop)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, what I had in mind is that we could make a new clang option similar to -mlink-builtin-bitcode. This would then be used by the HIP toolchain or similar when constructing the list of files to pass via -mlink-builtin-bitcode. We would then simply register those with this secondary pass.

This approach seems much simpler, being a boolean option that just relinks everything, but I somewhat like the idea of -mlink-builtin-bitcode being a pre-link operation and having another one for post-linking. That being said, it may not be worth the extra work because this is a huge hack around this ecosystem already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did look into having a file-specific option to link specific bitcodes post-optimization, and I think it is a good idea overall.

It is possible to come up with a specific set of bitcodes that need to be re-linked, based on the current optimizations we've implemented that introduce undefined functions. And we could then specify that set for a file-specific post-optimization linking.

But there are some downsides:

  • Any users would also need to know the specific set of bitcodes needed for post-optimization linking
  • We may need to hardcode that list in a bunch of different places, both within LLVM and external APIs
  • If a new optimization is implemented that expands the set of bitcodes needed for post-optimization linking, we'd need to go around and update any place that uses the file-specific options

Given that, I think the boolean relink option may serve us better for the time being

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the reason for LinkInModules pass is that we need to insert the second linking step into the RunOptimizationPipeline, specifically after the AMDGPULibCalls pass, but before the BitcodeWriter pass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely the easier option. This problem is pretty specific, but I can see it being easier to just remove this class of bugs entirely. It's an ugly solution for an ugly problem overall.

std::unique_ptr<raw_pwrite_stream> &OS,
std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS);
std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS,
BackendConsumer *BC);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this properly formatted?

Copy link
Contributor Author

@lamb-j lamb-j Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so? I'm using the same indentation level as the other arguments, and BackendConsumer needs its own line to no not overflow the character limit. What looks off about it?

} else
Err = Linker::linkModules(*M, std::move(Clone), LM.LinkFlags);

if (Err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we do the attribute propagation now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeGen::mergeDefaultFunctionDefinitionAttributes()? That shouldn't be affected, lines 245-253. Or maybe I'm not understanding the question

@lamb-j lamb-j force-pushed the device-lib-linking branch from ff10b14 to 47c88a7 Compare November 6, 2023 19:09
@lamb-j lamb-j force-pushed the device-lib-linking branch 2 times, most recently from bd4e423 to 1aa1be1 Compare November 6, 2023 19:14
Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely happy with the existence of this hack, but it's an ugly solution to an ugly self-inflicted problem, so I can live with it for now.

@jhuber6
Copy link
Contributor

jhuber6 commented Nov 6, 2023

The code formatter says that it's not happy. Can you try git clang-format HEAD~1 in your branch?

@lamb-j lamb-j force-pushed the device-lib-linking branch from 1aa1be1 to 83c922c Compare November 6, 2023 21:37
@lamb-j
Copy link
Contributor Author

lamb-j commented Nov 6, 2023

I am getting this from the formatter:

-  void RunOptimizationPipeline(BackendAction Action,
-                          std::unique_ptr<raw_pwrite_stream> &OS,
-                          std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS,
-                          BackendConsumer *BC);
+  void RunOptimizationPipeline(
+      BackendAction Action, std::unique_ptr<raw_pwrite_stream> &OS,
+      std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS, BackendConsumer *BC);

But in this case I am just following the existing style. I did notice a couple of other improvements from the formatter though, and I've added those changes.

@jhuber6
Copy link
Contributor

jhuber6 commented Nov 6, 2023

I am getting this from the formatter:

-  void RunOptimizationPipeline(BackendAction Action,
-                          std::unique_ptr<raw_pwrite_stream> &OS,
-                          std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS,
-                          BackendConsumer *BC);
+  void RunOptimizationPipeline(
+      BackendAction Action, std::unique_ptr<raw_pwrite_stream> &OS,
+      std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS, BackendConsumer *BC);

But in this case I am just following the existing style. I did notice a couple of other improvements from the formatter though, and I've added those changes.

Just do what the formatter says, not every file is 100% clang-formatted so there's bits of old code that haven't been properly cleaned yet. This was the same line that I thought looked wrong so it should probably be fixed. Using git clang-format HEAD~1 only formats what you've changed, so you don't need to worry about spurious edits.

@lamb-j
Copy link
Contributor Author

lamb-j commented Nov 6, 2023

Just do what the formatter says, not every file is 100% clang-formatted so there's bits of old code that haven't been properly cleaned yet. This was the same line that I thought looked wrong so it should probably be fixed. Using git clang-format HEAD~1 only formats what you've changed, so you don't need to worry about spurious edits.

Isn't the standard to follow the existing style, not re-format small sections of code during a commit to a different style?

[Always follow the golden rule:

If you are extending, enhancing, or bug fixing already implemented code, use the style that is already being used so that the source is uniform and easy to follow.](https://llvm.org/docs/CodingStandards.html)

@jhuber6
Copy link
Contributor

jhuber6 commented Nov 6, 2023

Just do what the formatter says, not every file is 100% clang-formatted so there's bits of old code that haven't been properly cleaned yet. This was the same line that I thought looked wrong so it should probably be fixed. Using git clang-format HEAD~1 only formats what you've changed, so you don't need to worry about spurious edits.

Isn't the standard to follow the existing style, not re-format small sections of code during a commit to a different style?

[Always follow the golden rule:

If you are extending, enhancing, or bug fixing already implemented code, use the style that is already being used so that the source is uniform and easy to follow.](https://llvm.org/docs/CodingStandards.html)

Yes, but this doesn't really apply since you changed the function signature so it needs to be reformatted. That rule primarily applies to sections that have been manually formatted but don't exactly match the clang-format rules. Another reason we don't do bulk clang-format everywhere is because it confuses git blame. However, in this case there's really no reason not to.

@lamb-j
Copy link
Contributor Author

lamb-j commented Nov 6, 2023

this doesn't really apply since you changed the function signature so it needs to be reformatted.

I don't follow the logic there. The function signature can have a style as well. And I think this is actually a good example to demonstrate a reason not to reformat if we look at the next few lines:

Following existing style:

  void RunOptimizationPipeline(BackendAction Action,
                          std::unique_ptr<raw_pwrite_stream> &OS,
                          std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS,
                          BackendConsumer *BC);
  void RunCodegenPipeline(BackendAction Action,
                          std::unique_ptr<raw_pwrite_stream> &OS,
                          std::unique_ptr<llvm::ToolOutputFile> &DwoOS);

Following clang format:

  void RunOptimizationPipeline(
      BackendAction Action, std::unique_ptr<raw_pwrite_stream> &OS,
      std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS,
      BackendConsumer *BC);
  void RunCodegenPipeline(BackendAction Action,
                          std::unique_ptr<raw_pwrite_stream> &OS,
                          std::unique_ptr<llvm::ToolOutputFile> &DwoOS);

To me, the first case aligns more with the spirit of the "golden rule". And if someone wants to reformat the whole file at a later point, that would be a good time to change the style for both signatures.

That said, I definitely don't want this to be a barrier to getting this patch in, so if you still feel like we should go with the clang-format recommendation, I'll change it and also update the EmitAssembly and EmitBackendOutput signatures which were flagged by clang-format for the same reasons.

@jhuber6
Copy link
Contributor

jhuber6 commented Nov 6, 2023

That said, I definitely don't want this to be a barrier to getting this patch in, so if you still feel like we should go with the clang-format recommendation, I'll change it and also update the EmitAssembly and EmitBackendOutput signatures which were flagged by clang-format for the same reasons.

You should generally just go with what clang-format says unless there's a compelling reason not to. There's a reason the CI complains if git clang-format HEAD~1 doesn't come back clean.

@lamb-j lamb-j force-pushed the device-lib-linking branch from 83c922c to 5c35c01 Compare November 6, 2023 23:17
@yxsamliu
Copy link
Collaborator

yxsamliu commented Nov 7, 2023

sometimes the formatter checks the diff of the rebase when you rebase your code. However if you rebase your commits against ToT of trunk the formatter issue may disappear.

Copy link
Collaborator

@yxsamliu yxsamliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

@lamb-j lamb-j force-pushed the device-lib-linking branch from 5c35c01 to e358ed6 Compare November 7, 2023 18:59
Previously the BackendConsumer class was defined in CodeGenAction.cpp.
In order to use this class and its member functions in other files
for upcoming changes, we first need to refactor the class definition
into a separate header
Add an llvm namespace to Module references to disambiguate it from
clang::Module. This is needed to include new header files that
expose clang::Module
…odes

In this patch, we create a new ModulePass that mimics the LinkInModules
API from CodeGenAction.cpp, and a new command line option to enable the
pass. With this new pass, we can now re-link bitcodes supplied via the
-mlink-built-in bitcodes as part of the RunOptimizationPipeline.

With the re-linking pass, we now handle cases where new device library
functions are introduced as part of the optimization pipeline. Previously,
these newly introduced functions (for example a fused sincos call) would
result in a linking error due to a missing function defintion.

This new pass can be initiated via:

  -mllvm -relink-builtin-bitcode-postop

Also note we intentionally exclude bitcodes supplied via the
-mlink-bitcode-file option from the second linking step
@lamb-j lamb-j changed the title Correctly link and optimize device libraries with -mlink-builtin-bitcode [CodeGen] Implement post-opt linking option for builtin bitocdes Nov 8, 2023
@lamb-j lamb-j force-pushed the device-lib-linking branch from e358ed6 to cf294be Compare November 8, 2023 18:52
@lamb-j lamb-j merged commit c6cf329 into llvm:main Nov 8, 2023
@@ -113,7 +120,7 @@ class EmitAssemblyHelper {
const CodeGenOptions &CodeGenOpts;
const clang::TargetOptions &TargetOpts;
const LangOptions &LangOpts;
Module *TheModule;
llvm::Module *TheModule;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this suddenly need qualification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When including the new BackendConsumer.h header file to BackendUtils.cpp, we were indirectly including another header several layers deep that had defined clang::Module. Previously this file had no references to clang::Module via any includes, but with the new include we were getting an "ambiguous reference to Module" error. This was fixed by specifying which Module class we were using in this file.

I don't remember exactly which nested include referenced clang::Module, I could track down the include path again if it's helpful though.

Copy link
Contributor Author

@lamb-j lamb-j Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the qualification, we get the following:

llvm-project/clang/lib/CodeGen/BackendUtil.cpp:123:3: error: reference to ‘Module’ is ambiguous
  123 |   Module *TheModule;
      |   ^~~~~~

In file included from llvm-project/llvm/include/llvm/IR/ModuleSummaryIndex.h:28,
                 from llvm-project/clang/include/clang/CodeGen/BackendUtil.h:13,
                 from llvm-project/clang/lib/CodeGen/BackendUtil.cpp:9:
llvm-project/llvm/include/llvm/IR/Module.h:65:32: note: candidates are: ‘class llvm::Module’
   65 | class LLVM_EXTERNAL_VISIBILITY Module {
      |                                ^~~~~~
In file included from llvm-project/clang/include/clang/Lex/ModuleLoader.h:18,
                 from llvm-project/clang/include/clang/Frontend/ASTUnit.h:26,
                 from llvm-project/clang/include/clang/Frontend/FrontendAction.h:23,
                 from llvm-project/clang/include/clang/CodeGen/CodeGenAction.h:12,
                 from llvm-project/clang/lib/CodeGen/BackendConsumer.h:13,
                 from llvm-project/clang/lib/CodeGen/BackendUtil.cpp:10:
llvm-project/clang/include/clang/Basic/Module.h:105:18: note:                 ‘class clang::Module’
  105 | class alignas(8) Module {
      |                  ^~~~~~

Comment on lines +104 to +107
// Re-link builtin bitcodes after optimization
static cl::opt<bool> ClRelinkBuiltinBitcodePostop(
"relink-builtin-bitcode-postop", cl::Optional,
cl::desc("Re-link builtin bitcodes after optimization."), cl::init(false));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a proper flag? Where/how is -mlink-builtin-bitcode defined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a clang flag, this is presumably more of an LLVM one because this added a new pass that lives in Clang. I still think the solution to this was to just stop the backend from doing this optimization if it will obviously break it, but supposedly that caused performance regressions.

Comment on lines +261 to +262
// Create a Clone to move to the linker, which preserves the original
// linking modules, allowing them to be linked again in the future
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this is outside my area of expertise, but I have a question about this. We are seeing huge slowdown running the LIBCLC tests after this change. Each test was running in a fraction of a second, but now with the cloning the tests are running 1-2 minutes (and there are a few sets of 1000 tests).

Is this expected? The command line looks something like this for the tests:

clang -fno-builtin -I /my/llvm/libclc/generic/include -target amdgcn--amdhsa -Xclang -mlink-builtin-bitcode -Xclang /mybuild/llvm/./lib/clc/libspirv-amdgcn--amdhsa.bc -nogpulib -Xclang -cl-ext=-cl_khr_fp16 -emit-llvm -S -o - /my/llvm/libclc/test/binding/core/ConvertUToF_Rdouble4_rtn.cl

Looks like the .bc file here has about 19,000 functions to clone.

Just thought I'd ask before I dig into this too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, we definitely did not expect module cloning to have significant performance impacts for compilation.

One thing we considered was to only do module cloning if we expect to need the clones for a second link step. That may address your performance issue. I'll draft up a patch and link it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Nov 30, 2023
Compiling with the ROCm device libraries is currently possible via
the *_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC action. However, with
this action it is possible to encounter LLD errors if new device
library calls are introduced during optimization.

In this patch, we add a newly introduced LLVM option,
-relink-builtin-bitcode-postop (llvm#69371),
to re-link the device libraries after optimization, resolving any
newly introduced definitions. We also update the relevant test
to reflect this improvement

Change-Id: I3889b7a5ffab795065684c59486c6a9965a038f2
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Dec 12, 2023
…m#69371)

In this patch, we create a new ModulePass that mimics the LinkInModules
API from CodeGenAction.cpp, and a new command line option to enable the
pass. As part of the implementation, we needed to refactor the
BackendConsumer class definition into a new separate header (instead of
embedded in CodeGenAction.cpp). With this new pass, we can now re-link
bitcodes supplied via the -mlink-built-in bitcodes as part of the
RunOptimizationPipeline.

With the re-linking pass, we now handle cases where new device library
functions are introduced as part of the optimization pipeline.
Previously, these newly introduced functions (for example a fused sincos
call) would result in a linking error due to a missing function
definition. This new pass can be initiated via:

      -mllvm -relink-builtin-bitcode-postop

Also note we intentionally exclude bitcodes supplied via the
-mlink-bitcode-file option from the second linking step

Change-Id: I77b6e79e69b89feb8ca7c3ac6f792f6881a23129
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Dec 12, 2023
Compiling with the ROCm device libraries is currently possible via
the *_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC action. However, with
this action it is possible to encounter LLD errors if new device
library calls are introduced during optimization.

In this patch, we add a newly introduced LLVM option,
-relink-builtin-bitcode-postop (llvm#69371),
to re-link the device libraries after optimization, resolving any
newly introduced definitions. We also update the relevant test
to reflect this improvement

Change-Id: I3889b7a5ffab795065684c59486c6a9965a038f2
MaskRay added a commit that referenced this pull request Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants