-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[CodeGen] Implement post-opt linking option for builtin bitocdes #69371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: Jacob Lambert (lamb-j) ChangesThis set of patches updates device library linking and optimization to do the following: Link This handles the edge case where optimization introduces new device library functions, such as a fused sincos() from separate sin() and cos() calls. The second link step ensures the sincos() definition is also linked in from the device libraries. Patch is 51.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69371.diff 6 Files Affected:
diff --git a/clang/lib/CodeGen/BackendConsumer.h b/clang/lib/CodeGen/BackendConsumer.h
new file mode 100644
index 000000000000000..72a814cd43d738d
--- /dev/null
+++ b/clang/lib/CodeGen/BackendConsumer.h
@@ -0,0 +1,166 @@
+//===--- BackendConsumer.h - LLVM BackendConsumer Header File -------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_LIB_CODEGEN_BACKENDCONSUMER_H
+#define LLVM_CLANG_LIB_CODEGEN_BACKENDCONSUMER_H
+
+#include "clang/CodeGen/BackendUtil.h"
+#include "clang/CodeGen/CodeGenAction.h"
+
+#include "llvm/IR/DiagnosticInfo.h"
+#include "llvm/Support/Timer.h"
+
+namespace llvm {
+ class DiagnosticInfoDontCall;
+}
+
+namespace clang {
+class ASTContext;
+class CodeGenAction;
+class CoverageSourceInfo;
+
+class BackendConsumer : public ASTConsumer {
+ using LinkModule = CodeGenAction::LinkModule;
+
+ virtual void anchor();
+ DiagnosticsEngine &Diags;
+ BackendAction Action;
+ const HeaderSearchOptions &HeaderSearchOpts;
+ const CodeGenOptions &CodeGenOpts;
+ const TargetOptions &TargetOpts;
+ const LangOptions &LangOpts;
+ std::unique_ptr<raw_pwrite_stream> AsmOutStream;
+ ASTContext *Context;
+ IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS;
+
+ llvm::Timer LLVMIRGeneration;
+ unsigned LLVMIRGenerationRefCount;
+
+ /// True if we've finished generating IR. This prevents us from generating
+ /// additional LLVM IR after emitting output in HandleTranslationUnit. This
+ /// can happen when Clang plugins trigger additional AST deserialization.
+ bool IRGenFinished = false;
+
+ bool TimerIsEnabled = false;
+
+ std::unique_ptr<CodeGenerator> Gen;
+
+ SmallVector<LinkModule, 4> LinkModules;
+
+ // A map from mangled names to their function's source location, used for
+ // backend diagnostics as the Clang AST may be unavailable. We actually use
+ // the mangled name's hash as the key because mangled names can be very
+ // long and take up lots of space. Using a hash can cause name collision,
+ // but that is rare and the consequences are pointing to a wrong source
+ // location which is not severe. This is a vector instead of an actual map
+ // because we optimize for time building this map rather than time
+ // retrieving an entry, as backend diagnostics are uncommon.
+ std::vector<std::pair<llvm::hash_code, FullSourceLoc>>
+ ManglingFullSourceLocs;
+
+
+ // This is here so that the diagnostic printer knows the module a diagnostic
+ // refers to.
+ llvm::Module *CurLinkModule = nullptr;
+
+public:
+ BackendConsumer(BackendAction Action, DiagnosticsEngine &Diags,
+ IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
+ const HeaderSearchOptions &HeaderSearchOpts,
+ const PreprocessorOptions &PPOpts,
+ const CodeGenOptions &CodeGenOpts,
+ const TargetOptions &TargetOpts,
+ const LangOptions &LangOpts, const std::string &InFile,
+ SmallVector<LinkModule, 4> LinkModules,
+ std::unique_ptr<raw_pwrite_stream> OS, llvm::LLVMContext &C,
+ CoverageSourceInfo *CoverageInfo = nullptr);
+
+ // This constructor is used in installing an empty BackendConsumer
+ // to use the clang diagnostic handler for IR input files. It avoids
+ // initializing the OS field.
+ BackendConsumer(BackendAction Action, DiagnosticsEngine &Diags,
+ IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
+ const HeaderSearchOptions &HeaderSearchOpts,
+ const PreprocessorOptions &PPOpts,
+ const CodeGenOptions &CodeGenOpts,
+ const TargetOptions &TargetOpts,
+ const LangOptions &LangOpts, llvm::Module *Module,
+ SmallVector<LinkModule, 4> LinkModules, llvm::LLVMContext &C,
+ CoverageSourceInfo *CoverageInfo = nullptr);
+
+ llvm::Module *getModule() const;
+ std::unique_ptr<llvm::Module> takeModule();
+
+ CodeGenerator *getCodeGenerator();
+
+ void HandleCXXStaticMemberVarInstantiation(VarDecl *VD) override;
+ void Initialize(ASTContext &Ctx) override;
+ bool HandleTopLevelDecl(DeclGroupRef D) override;
+ void HandleInlineFunctionDefinition(FunctionDecl *D) override;
+ void HandleInterestingDecl(DeclGroupRef D) override;
+ void HandleTranslationUnit(ASTContext &C) override;
+ void HandleTagDeclDefinition(TagDecl *D) override;
+ void HandleTagDeclRequiredDefinition(const TagDecl *D) override;
+ void CompleteTentativeDefinition(VarDecl *D) override;
+ void CompleteExternalDeclaration(VarDecl *D) override;
+ void AssignInheritanceModel(CXXRecordDecl *RD) override;
+ void HandleVTable(CXXRecordDecl *RD) override;
+
+
+ // Links each entry in LinkModules into our module. Returns true on error.
+ bool LinkInModules(llvm::Module *M, bool ShouldLinkFiles = true);
+
+ /// Get the best possible source location to represent a diagnostic that
+ /// may have associated debug info.
+ const FullSourceLoc getBestLocationFromDebugLoc(
+ const llvm::DiagnosticInfoWithLocationBase &D,
+ bool &BadDebugInfo, StringRef &Filename,
+ unsigned &Line, unsigned &Column) const;
+
+ std::optional<FullSourceLoc> getFunctionSourceLocation(
+ const llvm::Function &F) const;
+
+ void DiagnosticHandlerImpl(const llvm::DiagnosticInfo &DI);
+ /// Specialized handler for InlineAsm diagnostic.
+ /// \return True if the diagnostic has been successfully reported, false
+ /// otherwise.
+ bool InlineAsmDiagHandler(const llvm::DiagnosticInfoInlineAsm &D);
+ /// Specialized handler for diagnostics reported using SMDiagnostic.
+ void SrcMgrDiagHandler(const llvm::DiagnosticInfoSrcMgr &D);
+ /// Specialized handler for StackSize diagnostic.
+ /// \return True if the diagnostic has been successfully reported, false
+ /// otherwise.
+ bool StackSizeDiagHandler(const llvm::DiagnosticInfoStackSize &D);
+ /// Specialized handler for ResourceLimit diagnostic.
+ /// \return True if the diagnostic has been successfully reported, false
+ /// otherwise.
+ bool ResourceLimitDiagHandler(const llvm::DiagnosticInfoResourceLimit &D);
+
+ /// Specialized handler for unsupported backend feature diagnostic.
+ void UnsupportedDiagHandler(const llvm::DiagnosticInfoUnsupported &D);
+ /// Specialized handlers for optimization remarks.
+ /// Note that these handlers only accept remarks and they always handle
+ /// them.
+ void EmitOptimizationMessage(const llvm::DiagnosticInfoOptimizationBase &D,
+ unsigned DiagID);
+ void
+ OptimizationRemarkHandler(const llvm::DiagnosticInfoOptimizationBase &D);
+ void OptimizationRemarkHandler(
+ const llvm::OptimizationRemarkAnalysisFPCommute &D);
+ void OptimizationRemarkHandler(
+ const llvm::OptimizationRemarkAnalysisAliasing &D);
+ void OptimizationFailureHandler(
+ const llvm::DiagnosticInfoOptimizationFailure &D);
+ void DontCallDiagHandler(const llvm::DiagnosticInfoDontCall &D);
+ /// Specialized handler for misexpect warnings.
+ /// Note that misexpect remarks are emitted through ORE
+ void MisExpectDiagHandler(const llvm::DiagnosticInfoMisExpect &D);
+};
+
+} // namespace clang
+#endif
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index d066819871dfde3..894daced377fb55 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -6,6 +6,8 @@
//
//===----------------------------------------------------------------------===//
+#include "BackendConsumer.h"
+#include "LinkInModulesPass.h"
#include "clang/CodeGen/BackendUtil.h"
#include "clang/Basic/CodeGenOptions.h"
#include "clang/Basic/Diagnostic.h"
@@ -112,7 +114,7 @@ class EmitAssemblyHelper {
const CodeGenOptions &CodeGenOpts;
const clang::TargetOptions &TargetOpts;
const LangOptions &LangOpts;
- Module *TheModule;
+ llvm::Module *TheModule;
IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS;
Timer CodeGenerationTime;
@@ -155,10 +157,10 @@ class EmitAssemblyHelper {
return F;
}
- void
- RunOptimizationPipeline(BackendAction Action,
+ void RunOptimizationPipeline(BackendAction Action,
std::unique_ptr<raw_pwrite_stream> &OS,
- std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS);
+ std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS,
+ BackendConsumer *BC);
void RunCodegenPipeline(BackendAction Action,
std::unique_ptr<raw_pwrite_stream> &OS,
std::unique_ptr<llvm::ToolOutputFile> &DwoOS);
@@ -178,7 +180,7 @@ class EmitAssemblyHelper {
const HeaderSearchOptions &HeaderSearchOpts,
const CodeGenOptions &CGOpts,
const clang::TargetOptions &TOpts,
- const LangOptions &LOpts, Module *M,
+ const LangOptions &LOpts, llvm::Module *M,
IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS)
: Diags(_Diags), HSOpts(HeaderSearchOpts), CodeGenOpts(CGOpts),
TargetOpts(TOpts), LangOpts(LOpts), TheModule(M), VFS(std::move(VFS)),
@@ -194,7 +196,8 @@ class EmitAssemblyHelper {
// Emit output using the new pass manager for the optimization pipeline.
void EmitAssembly(BackendAction Action,
- std::unique_ptr<raw_pwrite_stream> OS);
+ std::unique_ptr<raw_pwrite_stream> OS,
+ BackendConsumer *BC);
};
}
@@ -690,7 +693,7 @@ static void addSanitizers(const Triple &TargetTriple,
// the logic of the original code, but operates on "shadow" values. It
// can benefit from re-running some general purpose optimization
// passes.
- MPM.addPass(RequireAnalysisPass<GlobalsAA, Module>());
+ MPM.addPass(RequireAnalysisPass<GlobalsAA, llvm::Module>());
FunctionPassManager FPM;
FPM.addPass(EarlyCSEPass(true /* Enable mem-ssa. */));
FPM.addPass(InstCombinePass());
@@ -749,7 +752,7 @@ static void addSanitizers(const Triple &TargetTriple,
SanitizersCallback(NewMPM, Level);
if (!NewMPM.isEmpty()) {
// Sanitizers can abandon<GlobalsAA>.
- NewMPM.addPass(RequireAnalysisPass<GlobalsAA, Module>());
+ NewMPM.addPass(RequireAnalysisPass<GlobalsAA, llvm::Module>());
MPM.addPass(std::move(NewMPM));
}
});
@@ -761,7 +764,7 @@ static void addSanitizers(const Triple &TargetTriple,
void EmitAssemblyHelper::RunOptimizationPipeline(
BackendAction Action, std::unique_ptr<raw_pwrite_stream> &OS,
- std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS) {
+ std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS, BackendConsumer *BC) {
std::optional<PGOOptions> PGOOpt;
if (CodeGenOpts.hasProfileIRInstr())
@@ -1035,6 +1038,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
}
}
+ // Re-link against any bitcodes supplied via the -mlink-builtin-bitcode option
+ // Some optimizations may generate new function calls that would not have
+ // been linked pre-optimization (i.e. fused sincos calls generated by
+ // AMDGPULibCalls::fold_sincos.)
+ MPM.addPass(LinkInModulesPass(BC, false));
+
// Add a verifier pass if requested. We don't have to do this if the action
// requires code generation because there will already be a verifier pass in
// the code-generation pipeline.
@@ -1046,7 +1055,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
if (Action == Backend_EmitBC || Action == Backend_EmitLL) {
if (CodeGenOpts.PrepareForThinLTO && !CodeGenOpts.DisableLLVMPasses) {
if (!TheModule->getModuleFlag("EnableSplitLTOUnit"))
- TheModule->addModuleFlag(Module::Error, "EnableSplitLTOUnit",
+ TheModule->addModuleFlag(llvm::Module::Error, "EnableSplitLTOUnit",
CodeGenOpts.EnableSplitLTOUnit);
if (Action == Backend_EmitBC) {
if (!CodeGenOpts.ThinLinkBitcodeFile.empty()) {
@@ -1055,7 +1064,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
return;
}
if (CodeGenOpts.UnifiedLTO)
- TheModule->addModuleFlag(Module::Error, "UnifiedLTO", uint32_t(1));
+ TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO", uint32_t(1));
MPM.addPass(ThinLTOBitcodeWriterPass(
*OS, ThinLinkOS ? &ThinLinkOS->os() : nullptr));
} else {
@@ -1069,12 +1078,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
bool EmitLTOSummary = shouldEmitRegularLTOSummary();
if (EmitLTOSummary) {
if (!TheModule->getModuleFlag("ThinLTO") && !CodeGenOpts.UnifiedLTO)
- TheModule->addModuleFlag(Module::Error, "ThinLTO", uint32_t(0));
+ TheModule->addModuleFlag(llvm::Module::Error, "ThinLTO", uint32_t(0));
if (!TheModule->getModuleFlag("EnableSplitLTOUnit"))
- TheModule->addModuleFlag(Module::Error, "EnableSplitLTOUnit",
+ TheModule->addModuleFlag(llvm::Module::Error, "EnableSplitLTOUnit",
uint32_t(1));
if (CodeGenOpts.UnifiedLTO)
- TheModule->addModuleFlag(Module::Error, "UnifiedLTO", uint32_t(1));
+ TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO", uint32_t(1));
}
if (Action == Backend_EmitBC)
MPM.addPass(BitcodeWriterPass(*OS, CodeGenOpts.EmitLLVMUseLists,
@@ -1088,13 +1097,13 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
// Set module flags, like EnableSplitLTOUnit and UnifiedLTO, since FatLTO
// uses a different action than Backend_EmitBC or Backend_EmitLL.
if (!TheModule->getModuleFlag("ThinLTO"))
- TheModule->addModuleFlag(Module::Error, "ThinLTO",
+ TheModule->addModuleFlag(llvm::Module::Error, "ThinLTO",
uint32_t(CodeGenOpts.PrepareForThinLTO));
if (!TheModule->getModuleFlag("EnableSplitLTOUnit"))
- TheModule->addModuleFlag(Module::Error, "EnableSplitLTOUnit",
+ TheModule->addModuleFlag(llvm::Module::Error, "EnableSplitLTOUnit",
uint32_t(CodeGenOpts.EnableSplitLTOUnit));
if (CodeGenOpts.UnifiedLTO && !TheModule->getModuleFlag("UnifiedLTO"))
- TheModule->addModuleFlag(Module::Error, "UnifiedLTO", uint32_t(1));
+ TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO", uint32_t(1));
}
// Print a textual, '-passes=' compatible, representation of pipeline if
@@ -1160,7 +1169,8 @@ void EmitAssemblyHelper::RunCodegenPipeline(
}
void EmitAssemblyHelper::EmitAssembly(BackendAction Action,
- std::unique_ptr<raw_pwrite_stream> OS) {
+ std::unique_ptr<raw_pwrite_stream> OS,
+ BackendConsumer *BC) {
TimeRegion Region(CodeGenOpts.TimePasses ? &CodeGenerationTime : nullptr);
setCommandLineOpts(CodeGenOpts);
@@ -1176,7 +1186,7 @@ void EmitAssemblyHelper::EmitAssembly(BackendAction Action,
cl::PrintOptionValues();
std::unique_ptr<llvm::ToolOutputFile> ThinLinkOS, DwoOS;
- RunOptimizationPipeline(Action, OS, ThinLinkOS);
+ RunOptimizationPipeline(Action, OS, ThinLinkOS, BC);
RunCodegenPipeline(Action, OS, DwoOS);
if (ThinLinkOS)
@@ -1186,11 +1196,12 @@ void EmitAssemblyHelper::EmitAssembly(BackendAction Action,
}
static void runThinLTOBackend(
- DiagnosticsEngine &Diags, ModuleSummaryIndex *CombinedIndex, Module *M,
- const HeaderSearchOptions &HeaderOpts, const CodeGenOptions &CGOpts,
- const clang::TargetOptions &TOpts, const LangOptions &LOpts,
- std::unique_ptr<raw_pwrite_stream> OS, std::string SampleProfile,
- std::string ProfileRemapping, BackendAction Action) {
+ DiagnosticsEngine &Diags, ModuleSummaryIndex *CombinedIndex,
+ llvm::Module *M, const HeaderSearchOptions &HeaderOpts,
+ const CodeGenOptions &CGOpts, const clang::TargetOptions &TOpts,
+ const LangOptions &LOpts, std::unique_ptr<raw_pwrite_stream> OS,
+ std::string SampleProfile, std::string ProfileRemapping,
+ BackendAction Action) {
DenseMap<StringRef, DenseMap<GlobalValue::GUID, GlobalValueSummary *>>
ModuleToDefinedGVSummaries;
CombinedIndex->collectDefinedGVSummariesPerModule(ModuleToDefinedGVSummaries);
@@ -1259,18 +1270,18 @@ static void runThinLTOBackend(
Conf.SplitDwarfOutput = CGOpts.SplitDwarfOutput;
switch (Action) {
case Backend_EmitNothing:
- Conf.PreCodeGenModuleHook = [](size_t Task, const Module &Mod) {
+ Conf.PreCodeGenModuleHook = [](size_t Task, const llvm::Module &Mod) {
return false;
};
break;
case Backend_EmitLL:
- Conf.PreCodeGenModuleHook = [&](size_t Task, const Module &Mod) {
+ Conf.PreCodeGenModuleHook = [&](size_t Task, const llvm::Module &Mod) {
M->print(*OS, nullptr, CGOpts.EmitLLVMUseLists);
return false;
};
break;
case Backend_EmitBC:
- Conf.PreCodeGenModuleHook = [&](size_t Task, const Module &Mod) {
+ Conf.PreCodeGenModuleHook = [&](size_t Task, const llvm::Module &Mod) {
WriteBitcodeToFile(*M, *OS, CGOpts.EmitLLVMUseLists);
return false;
};
@@ -1294,9 +1305,10 @@ void clang::EmitBackendOutput(DiagnosticsEngine &Diags,
const CodeGenOptions &CGOpts,
const clang::TargetOptions &TOpts,
const LangOptions &LOpts, StringRef TDesc,
- Module *M, BackendAction Action,
+ llvm::Module *M, BackendAction Action,
IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
- std::unique_ptr<raw_pwrite_stream> OS) {
+ std::unique_ptr<raw_pwrite_stream> OS,
+ BackendConsumer *BC) {
llvm::TimeTraceScope TimeScope("Backend");
@@ -1339,7 +1351,7 @@ void clang::EmitBackendOutput(DiagnosticsEngine &Diags,
}
EmitAssemblyHelper AsmHelper(Diags, HeaderOpts, CGOpts, TOpts, LOpts, M, VFS);
- AsmHelper.EmitAssembly(Action, std::move(OS));
+ AsmHelper.EmitAssembly(Action, std::move(OS), BC);
// Verify clang's TargetInfo DataLayout against the LLVM TargetMachine's
// DataLayout.
diff --git a/clang/lib/CodeGen/CMakeLists.txt b/clang/lib/CodeGen/CMakeLists.txt
index 1debeb6d9cce9e0..41e31c12a16e641 100644
--- a/clang/lib/CodeGen/CMakeLists.txt
+++ b/clang/lib/CodeGen/CMakeLists.txt
@@ -81,6 +81,7 @@ add_clang_library(clangCodeGen
ConstantInitBuilder.cpp
CoverageMappingGen.cpp
ItaniumCXXABI.cpp
+ LinkInModulesPass.cpp
MacroPPCallbacks.cpp
MicrosoftCXXABI.cpp
ModuleBuilder.cpp
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index a3b72381d73fc54..f4586a776c0fefd 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -7,6 +7,7 @@
//===----------------------------------------------------------------------===//
#include "clang/CodeGen/CodeGenAction.h"
+#include "BackendConsumer.h"
#include "CGCall.h"
#include "CodeGenModule.h"
#include "CoverageMappingGen.h"
@@ -48,8 +49,8 @@
#include "llvm/Support/ToolOutputFile.h"
#include "llvm/Support/YAMLTraits.h"
#include "llvm/Transforms/IPO/Internalize.h"
+#include "llvm/Transforms/Utils/Cloning.h"
-#include <memory>
#include <optional>
using namespace clang;
using namespace llvm;
@@ -57,419 +58,360 @@ using namespace llvm;
#define DEBUG_TYPE "codegenaction"
namespace clang {
- class BackendConsumer;
- class ClangDiagnosticHandler final : public DiagnosticHandler {
- public:
- ClangDiagnosticHandler(const CodeGenOptions &CGOpts, BackendConsumer *BCon)
- : CodeGenOpts(CGOpts), BackendCon(BCon) {}
+class BackendConsumer;
+class ClangDiagnosticHandler final : public DiagnosticHandler {
+public:
+ ClangDiagnosticHandler(const CodeGenOptions &CGOpts, BackendConsumer *BCon)
+ : CodeGenOpts(CGOpts), BackendCon(BCon) {}
- bool handleDiagnostics(const DiagnosticInfo &DI) override;
+ bool handleDiagnostics(const DiagnosticInfo &DI) over...
[truncated]
|
This seems like a pretty big change to solve just this problem. Could we not just put |
sincos() is just one example. There are several other cases that can trigger this issue. fold_rootn() generates new function calls for square and cubic roots, fold_pow() does a similar thing for specific powers (ex 2), etc. We did try disabling -amdgpu-prelink, and it did lead to a significant performance difference for a couple of key applications With this approach we're also hoping to cover future optimizations added that may fall under this category |
You can test this locally with the following command:git-clang-format --diff 5aa934e2af8727852dec0ec1cfa0cba05d858f70 cf294be05ebdd10df35e7e857b118c8102b68362 -- clang/lib/CodeGen/BackendConsumer.h clang/lib/CodeGen/LinkInModulesPass.cpp clang/lib/CodeGen/LinkInModulesPass.h clang/include/clang/CodeGen/BackendUtil.h clang/lib/CodeGen/BackendUtil.cpp clang/lib/CodeGen/CodeGenAction.cpp View the diff from clang-format here.diff --git a/clang/lib/CodeGen/BackendConsumer.h b/clang/lib/CodeGen/BackendConsumer.h
index 72a814cd4..17d5ca3d1 100644
--- a/clang/lib/CodeGen/BackendConsumer.h
+++ b/clang/lib/CodeGen/BackendConsumer.h
@@ -16,7 +16,7 @@
#include "llvm/Support/Timer.h"
namespace llvm {
- class DiagnosticInfoDontCall;
+class DiagnosticInfoDontCall;
}
namespace clang {
@@ -60,9 +60,7 @@ class BackendConsumer : public ASTConsumer {
// location which is not severe. This is a vector instead of an actual map
// because we optimize for time building this map rather than time
// retrieving an entry, as backend diagnostics are uncommon.
- std::vector<std::pair<llvm::hash_code, FullSourceLoc>>
- ManglingFullSourceLocs;
-
+ std::vector<std::pair<llvm::hash_code, FullSourceLoc>> ManglingFullSourceLocs;
// This is here so that the diagnostic printer knows the module a diagnostic
// refers to.
@@ -74,8 +72,8 @@ public:
const HeaderSearchOptions &HeaderSearchOpts,
const PreprocessorOptions &PPOpts,
const CodeGenOptions &CodeGenOpts,
- const TargetOptions &TargetOpts,
- const LangOptions &LangOpts, const std::string &InFile,
+ const TargetOptions &TargetOpts, const LangOptions &LangOpts,
+ const std::string &InFile,
SmallVector<LinkModule, 4> LinkModules,
std::unique_ptr<raw_pwrite_stream> OS, llvm::LLVMContext &C,
CoverageSourceInfo *CoverageInfo = nullptr);
@@ -88,9 +86,9 @@ public:
const HeaderSearchOptions &HeaderSearchOpts,
const PreprocessorOptions &PPOpts,
const CodeGenOptions &CodeGenOpts,
- const TargetOptions &TargetOpts,
- const LangOptions &LangOpts, llvm::Module *Module,
- SmallVector<LinkModule, 4> LinkModules, llvm::LLVMContext &C,
+ const TargetOptions &TargetOpts, const LangOptions &LangOpts,
+ llvm::Module *Module, SmallVector<LinkModule, 4> LinkModules,
+ llvm::LLVMContext &C,
CoverageSourceInfo *CoverageInfo = nullptr);
llvm::Module *getModule() const;
@@ -111,19 +109,18 @@ public:
void AssignInheritanceModel(CXXRecordDecl *RD) override;
void HandleVTable(CXXRecordDecl *RD) override;
-
- // Links each entry in LinkModules into our module. Returns true on error.
+ // Links each entry in LinkModules into our module. Returns true on error.
bool LinkInModules(llvm::Module *M, bool ShouldLinkFiles = true);
/// Get the best possible source location to represent a diagnostic that
/// may have associated debug info.
- const FullSourceLoc getBestLocationFromDebugLoc(
- const llvm::DiagnosticInfoWithLocationBase &D,
- bool &BadDebugInfo, StringRef &Filename,
- unsigned &Line, unsigned &Column) const;
+ const FullSourceLoc
+ getBestLocationFromDebugLoc(const llvm::DiagnosticInfoWithLocationBase &D,
+ bool &BadDebugInfo, StringRef &Filename,
+ unsigned &Line, unsigned &Column) const;
- std::optional<FullSourceLoc> getFunctionSourceLocation(
- const llvm::Function &F) const;
+ std::optional<FullSourceLoc>
+ getFunctionSourceLocation(const llvm::Function &F) const;
void DiagnosticHandlerImpl(const llvm::DiagnosticInfo &DI);
/// Specialized handler for InlineAsm diagnostic.
@@ -148,14 +145,13 @@ public:
/// them.
void EmitOptimizationMessage(const llvm::DiagnosticInfoOptimizationBase &D,
unsigned DiagID);
+ void OptimizationRemarkHandler(const llvm::DiagnosticInfoOptimizationBase &D);
+ void
+ OptimizationRemarkHandler(const llvm::OptimizationRemarkAnalysisFPCommute &D);
+ void
+ OptimizationRemarkHandler(const llvm::OptimizationRemarkAnalysisAliasing &D);
void
- OptimizationRemarkHandler(const llvm::DiagnosticInfoOptimizationBase &D);
- void OptimizationRemarkHandler(
- const llvm::OptimizationRemarkAnalysisFPCommute &D);
- void OptimizationRemarkHandler(
- const llvm::OptimizationRemarkAnalysisAliasing &D);
- void OptimizationFailureHandler(
- const llvm::DiagnosticInfoOptimizationFailure &D);
+ OptimizationFailureHandler(const llvm::DiagnosticInfoOptimizationFailure &D);
void DontCallDiagHandler(const llvm::DiagnosticInfoDontCall &D);
/// Specialized handler for misexpect warnings.
/// Note that misexpect remarks are emitted through ORE
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index bd4a5b0e7..37ecb1238 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -57,6 +57,7 @@
#include "llvm/Target/TargetOptions.h"
#include "llvm/TargetParser/SubtargetFeature.h"
#include "llvm/TargetParser/Triple.h"
+#include "llvm/Transforms/HipStdPar/HipStdPar.h"
#include "llvm/Transforms/IPO/EmbedBitcodePass.h"
#include "llvm/Transforms/IPO/LowerTypeTests.h"
#include "llvm/Transforms/IPO/ThinLTOBitcodeWriter.h"
@@ -80,7 +81,6 @@
#include "llvm/Transforms/Scalar/EarlyCSE.h"
#include "llvm/Transforms/Scalar/GVN.h"
#include "llvm/Transforms/Scalar/JumpThreading.h"
-#include "llvm/Transforms/HipStdPar/HipStdPar.h"
#include "llvm/Transforms/Utils/Debugify.h"
#include "llvm/Transforms/Utils/EntryExitInstrumenter.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
@@ -1073,7 +1073,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
return;
}
if (CodeGenOpts.UnifiedLTO)
- TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO", uint32_t(1));
+ TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO",
+ uint32_t(1));
MPM.addPass(ThinLTOBitcodeWriterPass(
*OS, ThinLinkOS ? &ThinLinkOS->os() : nullptr));
} else {
@@ -1092,7 +1093,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
TheModule->addModuleFlag(llvm::Module::Error, "EnableSplitLTOUnit",
uint32_t(1));
if (CodeGenOpts.UnifiedLTO)
- TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO", uint32_t(1));
+ TheModule->addModuleFlag(llvm::Module::Error, "UnifiedLTO",
+ uint32_t(1));
}
if (Action == Backend_EmitBC)
MPM.addPass(BitcodeWriterPass(*OS, CodeGenOpts.EmitLLVMUseLists,
@@ -1208,13 +1210,14 @@ void EmitAssemblyHelper::EmitAssembly(BackendAction Action,
DwoOS->keep();
}
-static void runThinLTOBackend(
- DiagnosticsEngine &Diags, ModuleSummaryIndex *CombinedIndex,
- llvm::Module *M, const HeaderSearchOptions &HeaderOpts,
- const CodeGenOptions &CGOpts, const clang::TargetOptions &TOpts,
- const LangOptions &LOpts, std::unique_ptr<raw_pwrite_stream> OS,
- std::string SampleProfile, std::string ProfileRemapping,
- BackendAction Action) {
+static void
+runThinLTOBackend(DiagnosticsEngine &Diags, ModuleSummaryIndex *CombinedIndex,
+ llvm::Module *M, const HeaderSearchOptions &HeaderOpts,
+ const CodeGenOptions &CGOpts,
+ const clang::TargetOptions &TOpts, const LangOptions &LOpts,
+ std::unique_ptr<raw_pwrite_stream> OS,
+ std::string SampleProfile, std::string ProfileRemapping,
+ BackendAction Action) {
DenseMap<StringRef, DenseMap<GlobalValue::GUID, GlobalValueSummary *>>
ModuleToDefinedGVSummaries;
CombinedIndex->collectDefinedGVSummariesPerModule(ModuleToDefinedGVSummaries);
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index a31a271ed..a2dc937ea 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -91,83 +91,73 @@ static void reportOptRecordError(Error E, DiagnosticsEngine &Diags,
const CodeGenOptions &CodeGenOpts) {
handleAllErrors(
std::move(E),
- [&](const LLVMRemarkSetupFileError &E) {
+ [&](const LLVMRemarkSetupFileError &E) {
Diags.Report(diag::err_cannot_open_file)
<< CodeGenOpts.OptRecordFile << E.message();
},
- [&](const LLVMRemarkSetupPatternError &E) {
+ [&](const LLVMRemarkSetupPatternError &E) {
Diags.Report(diag::err_drv_optimization_remark_pattern)
<< E.message() << CodeGenOpts.OptRecordPasses;
},
- [&](const LLVMRemarkSetupFormatError &E) {
+ [&](const LLVMRemarkSetupFormatError &E) {
Diags.Report(diag::err_drv_optimization_remark_format)
<< CodeGenOpts.OptRecordFormat;
});
}
-BackendConsumer::BackendConsumer(BackendAction Action, DiagnosticsEngine &Diags,
- IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
- const HeaderSearchOptions &HeaderSearchOpts,
- const PreprocessorOptions &PPOpts,
- const CodeGenOptions &CodeGenOpts,
- const TargetOptions &TargetOpts,
- const LangOptions &LangOpts,
- const std::string &InFile,
- SmallVector<LinkModule, 4> LinkModules,
- std::unique_ptr<raw_pwrite_stream> OS,
- LLVMContext &C,
- CoverageSourceInfo *CoverageInfo)
- : Diags(Diags), Action(Action), HeaderSearchOpts(HeaderSearchOpts),
- CodeGenOpts(CodeGenOpts), TargetOpts(TargetOpts), LangOpts(LangOpts),
- AsmOutStream(std::move(OS)), Context(nullptr), FS(VFS),
- LLVMIRGeneration("irgen", "LLVM IR Generation Time"),
- LLVMIRGenerationRefCount(0),
- Gen(CreateLLVMCodeGen(Diags, InFile, std::move(VFS), HeaderSearchOpts,
- PPOpts, CodeGenOpts, C, CoverageInfo)),
- LinkModules(std::move(LinkModules)) {
- TimerIsEnabled = CodeGenOpts.TimePasses;
- llvm::TimePassesIsEnabled = CodeGenOpts.TimePasses;
- llvm::TimePassesPerRun = CodeGenOpts.TimePassesPerRun;
+BackendConsumer::BackendConsumer(
+ BackendAction Action, DiagnosticsEngine &Diags,
+ IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
+ const HeaderSearchOptions &HeaderSearchOpts,
+ const PreprocessorOptions &PPOpts, const CodeGenOptions &CodeGenOpts,
+ const TargetOptions &TargetOpts, const LangOptions &LangOpts,
+ const std::string &InFile, SmallVector<LinkModule, 4> LinkModules,
+ std::unique_ptr<raw_pwrite_stream> OS, LLVMContext &C,
+ CoverageSourceInfo *CoverageInfo)
+ : Diags(Diags), Action(Action), HeaderSearchOpts(HeaderSearchOpts),
+ CodeGenOpts(CodeGenOpts), TargetOpts(TargetOpts), LangOpts(LangOpts),
+ AsmOutStream(std::move(OS)), Context(nullptr), FS(VFS),
+ LLVMIRGeneration("irgen", "LLVM IR Generation Time"),
+ LLVMIRGenerationRefCount(0),
+ Gen(CreateLLVMCodeGen(Diags, InFile, std::move(VFS), HeaderSearchOpts,
+ PPOpts, CodeGenOpts, C, CoverageInfo)),
+ LinkModules(std::move(LinkModules)) {
+ TimerIsEnabled = CodeGenOpts.TimePasses;
+ llvm::TimePassesIsEnabled = CodeGenOpts.TimePasses;
+ llvm::TimePassesPerRun = CodeGenOpts.TimePassesPerRun;
}
// This constructor is used in installing an empty BackendConsumer
// to use the clang diagnostic handler for IR input files. It avoids
// initializing the OS field.
-BackendConsumer::BackendConsumer(BackendAction Action, DiagnosticsEngine &Diags,
- IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
- const HeaderSearchOptions &HeaderSearchOpts,
- const PreprocessorOptions &PPOpts,
- const CodeGenOptions &CodeGenOpts,
- const TargetOptions &TargetOpts,
- const LangOptions &LangOpts,
- llvm::Module *Module,
- SmallVector<LinkModule, 4> LinkModules,
- LLVMContext &C,
- CoverageSourceInfo *CoverageInfo)
- : Diags(Diags), Action(Action), HeaderSearchOpts(HeaderSearchOpts),
- CodeGenOpts(CodeGenOpts), TargetOpts(TargetOpts), LangOpts(LangOpts),
- Context(nullptr), FS(VFS),
- LLVMIRGeneration("irgen", "LLVM IR Generation Time"),
- LLVMIRGenerationRefCount(0),
- Gen(CreateLLVMCodeGen(Diags, "", std::move(VFS), HeaderSearchOpts,
- PPOpts, CodeGenOpts, C, CoverageInfo)),
- LinkModules(std::move(LinkModules)), CurLinkModule(Module) {
- TimerIsEnabled = CodeGenOpts.TimePasses;
- llvm::TimePassesIsEnabled = CodeGenOpts.TimePasses;
- llvm::TimePassesPerRun = CodeGenOpts.TimePassesPerRun;
+BackendConsumer::BackendConsumer(
+ BackendAction Action, DiagnosticsEngine &Diags,
+ IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
+ const HeaderSearchOptions &HeaderSearchOpts,
+ const PreprocessorOptions &PPOpts, const CodeGenOptions &CodeGenOpts,
+ const TargetOptions &TargetOpts, const LangOptions &LangOpts,
+ llvm::Module *Module, SmallVector<LinkModule, 4> LinkModules,
+ LLVMContext &C, CoverageSourceInfo *CoverageInfo)
+ : Diags(Diags), Action(Action), HeaderSearchOpts(HeaderSearchOpts),
+ CodeGenOpts(CodeGenOpts), TargetOpts(TargetOpts), LangOpts(LangOpts),
+ Context(nullptr), FS(VFS),
+ LLVMIRGeneration("irgen", "LLVM IR Generation Time"),
+ LLVMIRGenerationRefCount(0),
+ Gen(CreateLLVMCodeGen(Diags, "", std::move(VFS), HeaderSearchOpts, PPOpts,
+ CodeGenOpts, C, CoverageInfo)),
+ LinkModules(std::move(LinkModules)), CurLinkModule(Module) {
+ TimerIsEnabled = CodeGenOpts.TimePasses;
+ llvm::TimePassesIsEnabled = CodeGenOpts.TimePasses;
+ llvm::TimePassesPerRun = CodeGenOpts.TimePassesPerRun;
}
-llvm::Module* BackendConsumer::getModule() const {
- return Gen->GetModule();
-}
+llvm::Module *BackendConsumer::getModule() const { return Gen->GetModule(); }
std::unique_ptr<llvm::Module> BackendConsumer::takeModule() {
return std::unique_ptr<llvm::Module>(Gen->ReleaseModule());
}
-CodeGenerator* BackendConsumer::getCodeGenerator() {
- return Gen.get();
-}
+CodeGenerator *BackendConsumer::getCodeGenerator() { return Gen.get(); }
void BackendConsumer::HandleCXXStaticMemberVarInstantiation(VarDecl *VD) {
Gen->HandleCXXStaticMemberVarInstantiation(VD);
@@ -247,7 +237,7 @@ bool BackendConsumer::LinkInModules(llvm::Module *M, bool ShouldLinkFiles) {
if (F.isIntrinsic())
continue;
CodeGen::mergeDefaultFunctionDefinitionAttributes(
- F, CodeGenOpts, LangOpts, TargetOpts, LM.Internalize);
+ F, CodeGenOpts, LangOpts, TargetOpts, LM.Internalize);
}
CurLinkModule = LM.Module.get();
@@ -309,15 +299,15 @@ void BackendConsumer::HandleTranslationUnit(ASTContext &C) {
LLVMContext &Ctx = getModule()->getContext();
std::unique_ptr<DiagnosticHandler> OldDiagnosticHandler =
- Ctx.getDiagnosticHandler();
- Ctx.setDiagnosticHandler(std::make_unique<ClangDiagnosticHandler>(
- CodeGenOpts, this));
+ Ctx.getDiagnosticHandler();
+ Ctx.setDiagnosticHandler(
+ std::make_unique<ClangDiagnosticHandler>(CodeGenOpts, this));
Expected<std::unique_ptr<llvm::ToolOutputFile>> OptRecordFileOrErr =
- setupLLVMOptimizationRemarks(
- Ctx, CodeGenOpts.OptRecordFile, CodeGenOpts.OptRecordPasses,
- CodeGenOpts.OptRecordFormat, CodeGenOpts.DiagnosticsWithHotness,
- CodeGenOpts.DiagnosticsHotnessThreshold);
+ setupLLVMOptimizationRemarks(
+ Ctx, CodeGenOpts.OptRecordFile, CodeGenOpts.OptRecordPasses,
+ CodeGenOpts.OptRecordFormat, CodeGenOpts.DiagnosticsWithHotness,
+ CodeGenOpts.DiagnosticsHotnessThreshold);
if (Error E = OptRecordFileOrErr.takeError()) {
reportOptRecordError(std::move(E), Diags, CodeGenOpts);
@@ -325,7 +315,7 @@ void BackendConsumer::HandleTranslationUnit(ASTContext &C) {
}
std::unique_ptr<llvm::ToolOutputFile> OptRecordFile =
- std::move(*OptRecordFileOrErr);
+ std::move(*OptRecordFileOrErr);
if (OptRecordFile &&
CodeGenOpts.getProfileUse() != CodeGenOptions::ProfileNone)
@@ -337,7 +327,7 @@ void BackendConsumer::HandleTranslationUnit(ASTContext &C) {
if (CodeGenOpts.DiagnosticsMisExpectTolerance) {
Ctx.setDiagnosticsMisExpectTolerance(
- CodeGenOpts.DiagnosticsMisExpectTolerance);
+ CodeGenOpts.DiagnosticsMisExpectTolerance);
}
// Link each LinkModule into our module.
@@ -400,11 +390,9 @@ void BackendConsumer::AssignInheritanceModel(CXXRecordDecl *RD) {
Gen->AssignInheritanceModel(RD);
}
-void BackendConsumer::HandleVTable(CXXRecordDecl *RD) {
- Gen->HandleVTable(RD);
-}
+void BackendConsumer::HandleVTable(CXXRecordDecl *RD) { Gen->HandleVTable(RD); }
-void BackendConsumer::anchor() { }
+void BackendConsumer::anchor() {}
} // namespace clang
|
This approach assumes that whatever the function call was transformed into also exists in the same library, which isn't necessarily true. This is related to a whole host of problems with handling known runtime library calls inside of bitcode. It's possible that we could use some module metadata to indicate when these sorts of transformations are legal within LLVM-IR. I've had similar issues with LTO, where a |
True, good point. But I don't think it's necessarily due to this approach, but more of how AMDGPULibCalls is implemented. It seems like the instruction folding implementations are assuming the definitions of the new functions they're inserting will be linked in at a later point? Previously we were doing it via a separate llvm-link call to re-link all the device libraries. With this approach, we're doing it as part of the clang optimization pipeline, and would no longer need that extra link step. Maybe this approach could be a solution for the time being? With a longer term solution being to think more carefully about the legality of AMDGPULibCalls inserting function calls that may or may not exist? Maybe device library calls are the exception? |
This is part of why I'm not very happy with the current approach to just Generally this occurs because |
That probably won't work. After they are internalized, they have internal linkage and cannot be used to resolve newly added call of the same function. The purpose of internalization is to allow you to optimize those functions, e.g. changing their signatures. After internalization, there is no guarantee they behave like the original functions. |
b7af255
to
155e15e
Compare
I meant this more in this particular case, if we internalize but keep all the symbols, then the functions will be preserved so we can call them inside of the module when the AMDGPU pass is run. This would then rely on some later pass to strip out the unused ones. |
155e15e
to
ff10b14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments. I remember there was a reason we couldn't use the existing linking support and needed the new pass, what was that again?
// Some optimizations may generate new function calls that would not have | ||
// been linked pre-optimization (i.e. fused sincos calls generated by | ||
// AMDGPULibCalls::fold_sincos.) | ||
if (ClRelinkBuiltinBitcodePostop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, what I had in mind is that we could make a new clang
option similar to -mlink-builtin-bitcode
. This would then be used by the HIP toolchain or similar when constructing the list of files to pass via -mlink-builtin-bitcode
. We would then simply register those with this secondary pass.
This approach seems much simpler, being a boolean option that just relinks everything, but I somewhat like the idea of -mlink-builtin-bitcode
being a pre-link operation and having another one for post-linking. That being said, it may not be worth the extra work because this is a huge hack around this ecosystem already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did look into having a file-specific option to link specific bitcodes post-optimization, and I think it is a good idea overall.
It is possible to come up with a specific set of bitcodes that need to be re-linked, based on the current optimizations we've implemented that introduce undefined functions. And we could then specify that set for a file-specific post-optimization linking.
But there are some downsides:
- Any users would also need to know the specific set of bitcodes needed for post-optimization linking
- We may need to hardcode that list in a bunch of different places, both within LLVM and external APIs
- If a new optimization is implemented that expands the set of bitcodes needed for post-optimization linking, we'd need to go around and update any place that uses the file-specific options
Given that, I think the boolean relink option may serve us better for the time being
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the reason for LinkInModules pass is that we need to insert the second linking step into the RunOptimizationPipeline, specifically after the AMDGPULibCalls pass, but before the BitcodeWriter pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely the easier option. This problem is pretty specific, but I can see it being easier to just remove this class of bugs entirely. It's an ugly solution for an ugly problem overall.
clang/lib/CodeGen/BackendUtil.cpp
Outdated
std::unique_ptr<raw_pwrite_stream> &OS, | ||
std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS); | ||
std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS, | ||
BackendConsumer *BC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this properly formatted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so? I'm using the same indentation level as the other arguments, and BackendConsumer needs its own line to no not overflow the character limit. What looks off about it?
} else | ||
Err = Linker::linkModules(*M, std::move(Clone), LM.LinkFlags); | ||
|
||
if (Err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do we do the attribute propagation now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CodeGen::mergeDefaultFunctionDefinitionAttributes()? That shouldn't be affected, lines 245-253. Or maybe I'm not understanding the question
ff10b14
to
47c88a7
Compare
bd4e423
to
1aa1be1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely happy with the existence of this hack, but it's an ugly solution to an ugly self-inflicted problem, so I can live with it for now.
The code formatter says that it's not happy. Can you try |
1aa1be1
to
83c922c
Compare
I am getting this from the formatter:
But in this case I am just following the existing style. I did notice a couple of other improvements from the formatter though, and I've added those changes. |
Just do what the formatter says, not every file is 100% clang-formatted so there's bits of old code that haven't been properly cleaned yet. This was the same line that I thought looked wrong so it should probably be fixed. Using |
Isn't the standard to follow the existing style, not re-format small sections of code during a commit to a different style? [Always follow the golden rule: If you are extending, enhancing, or bug fixing already implemented code, use the style that is already being used so that the source is uniform and easy to follow.](https://llvm.org/docs/CodingStandards.html) |
Yes, but this doesn't really apply since you changed the function signature so it needs to be reformatted. That rule primarily applies to sections that have been manually formatted but don't exactly match the |
I don't follow the logic there. The function signature can have a style as well. And I think this is actually a good example to demonstrate a reason not to reformat if we look at the next few lines: Following existing style:
Following clang format:
To me, the first case aligns more with the spirit of the "golden rule". And if someone wants to reformat the whole file at a later point, that would be a good time to change the style for both signatures. That said, I definitely don't want this to be a barrier to getting this patch in, so if you still feel like we should go with the clang-format recommendation, I'll change it and also update the EmitAssembly and EmitBackendOutput signatures which were flagged by clang-format for the same reasons. |
You should generally just go with what |
83c922c
to
5c35c01
Compare
sometimes the formatter checks the diff of the rebase when you rebase your code. However if you rebase your commits against ToT of trunk the formatter issue may disappear. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks
5c35c01
to
e358ed6
Compare
Previously the BackendConsumer class was defined in CodeGenAction.cpp. In order to use this class and its member functions in other files for upcoming changes, we first need to refactor the class definition into a separate header
Add an llvm namespace to Module references to disambiguate it from clang::Module. This is needed to include new header files that expose clang::Module
…odes In this patch, we create a new ModulePass that mimics the LinkInModules API from CodeGenAction.cpp, and a new command line option to enable the pass. With this new pass, we can now re-link bitcodes supplied via the -mlink-built-in bitcodes as part of the RunOptimizationPipeline. With the re-linking pass, we now handle cases where new device library functions are introduced as part of the optimization pipeline. Previously, these newly introduced functions (for example a fused sincos call) would result in a linking error due to a missing function defintion. This new pass can be initiated via: -mllvm -relink-builtin-bitcode-postop Also note we intentionally exclude bitcodes supplied via the -mlink-bitcode-file option from the second linking step
e358ed6
to
cf294be
Compare
@@ -113,7 +120,7 @@ class EmitAssemblyHelper { | |||
const CodeGenOptions &CodeGenOpts; | |||
const clang::TargetOptions &TargetOpts; | |||
const LangOptions &LangOpts; | |||
Module *TheModule; | |||
llvm::Module *TheModule; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this suddenly need qualification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When including the new BackendConsumer.h header file to BackendUtils.cpp, we were indirectly including another header several layers deep that had defined clang::Module. Previously this file had no references to clang::Module via any includes, but with the new include we were getting an "ambiguous reference to Module" error. This was fixed by specifying which Module class we were using in this file.
I don't remember exactly which nested include referenced clang::Module, I could track down the include path again if it's helpful though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the qualification, we get the following:
llvm-project/clang/lib/CodeGen/BackendUtil.cpp:123:3: error: reference to ‘Module’ is ambiguous
123 | Module *TheModule;
| ^~~~~~
In file included from llvm-project/llvm/include/llvm/IR/ModuleSummaryIndex.h:28,
from llvm-project/clang/include/clang/CodeGen/BackendUtil.h:13,
from llvm-project/clang/lib/CodeGen/BackendUtil.cpp:9:
llvm-project/llvm/include/llvm/IR/Module.h:65:32: note: candidates are: ‘class llvm::Module’
65 | class LLVM_EXTERNAL_VISIBILITY Module {
| ^~~~~~
In file included from llvm-project/clang/include/clang/Lex/ModuleLoader.h:18,
from llvm-project/clang/include/clang/Frontend/ASTUnit.h:26,
from llvm-project/clang/include/clang/Frontend/FrontendAction.h:23,
from llvm-project/clang/include/clang/CodeGen/CodeGenAction.h:12,
from llvm-project/clang/lib/CodeGen/BackendConsumer.h:13,
from llvm-project/clang/lib/CodeGen/BackendUtil.cpp:10:
llvm-project/clang/include/clang/Basic/Module.h:105:18: note: ‘class clang::Module’
105 | class alignas(8) Module {
| ^~~~~~
// Re-link builtin bitcodes after optimization | ||
static cl::opt<bool> ClRelinkBuiltinBitcodePostop( | ||
"relink-builtin-bitcode-postop", cl::Optional, | ||
cl::desc("Re-link builtin bitcodes after optimization."), cl::init(false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a proper flag? Where/how is -mlink-builtin-bitcode defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a clang flag, this is presumably more of an LLVM one because this added a new pass that lives in Clang. I still think the solution to this was to just stop the backend from doing this optimization if it will obviously break it, but supposedly that caused performance regressions.
// Create a Clone to move to the linker, which preserves the original | ||
// linking modules, allowing them to be linked again in the future |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this is outside my area of expertise, but I have a question about this. We are seeing huge slowdown running the LIBCLC tests after this change. Each test was running in a fraction of a second, but now with the cloning the tests are running 1-2 minutes (and there are a few sets of 1000 tests).
Is this expected? The command line looks something like this for the tests:
clang -fno-builtin -I /my/llvm/libclc/generic/include -target amdgcn--amdhsa -Xclang -mlink-builtin-bitcode -Xclang /mybuild/llvm/./lib/clc/libspirv-amdgcn--amdhsa.bc -nogpulib -Xclang -cl-ext=-cl_khr_fp16 -emit-llvm -S -o - /my/llvm/libclc/test/binding/core/ConvertUToF_Rdouble4_rtn.cl
Looks like the .bc file here has about 19,000 functions to clone.
Just thought I'd ask before I dig into this too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, we definitely did not expect module cloning to have significant performance impacts for compilation.
One thing we considered was to only do module cloning if we expect to need the clones for a second link step. That may address your performance issue. I'll draft up a patch and link it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compiling with the ROCm device libraries is currently possible via the *_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC action. However, with this action it is possible to encounter LLD errors if new device library calls are introduced during optimization. In this patch, we add a newly introduced LLVM option, -relink-builtin-bitcode-postop (llvm#69371), to re-link the device libraries after optimization, resolving any newly introduced definitions. We also update the relevant test to reflect this improvement Change-Id: I3889b7a5ffab795065684c59486c6a9965a038f2
…m#69371) In this patch, we create a new ModulePass that mimics the LinkInModules API from CodeGenAction.cpp, and a new command line option to enable the pass. As part of the implementation, we needed to refactor the BackendConsumer class definition into a new separate header (instead of embedded in CodeGenAction.cpp). With this new pass, we can now re-link bitcodes supplied via the -mlink-built-in bitcodes as part of the RunOptimizationPipeline. With the re-linking pass, we now handle cases where new device library functions are introduced as part of the optimization pipeline. Previously, these newly introduced functions (for example a fused sincos call) would result in a linking error due to a missing function definition. This new pass can be initiated via: -mllvm -relink-builtin-bitcode-postop Also note we intentionally exclude bitcodes supplied via the -mlink-bitcode-file option from the second linking step Change-Id: I77b6e79e69b89feb8ca7c3ac6f792f6881a23129
Compiling with the ROCm device libraries is currently possible via the *_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC action. However, with this action it is possible to encounter LLD errors if new device library calls are introduced during optimization. In this patch, we add a newly introduced LLVM option, -relink-builtin-bitcode-postop (llvm#69371), to re-link the device libraries after optimization, resolving any newly introduced definitions. We also update the relevant test to reflect this improvement Change-Id: I3889b7a5ffab795065684c59486c6a9965a038f2
In this patch, we create a new ModulePass that mimics the LinkInModules API from CodeGenAction.cpp, and a new command line option to enable the pass. As part of the implementation, we needed to refactor the BackendConsumer class definition into a new separate header (instead of embedded in CodeGenAction.cpp). With this new pass, we can now re-link bitcodes supplied via the -mlink-built-in bitcodes as part of the RunOptimizationPipeline.
With the re-linking pass, we now handle cases where new device library functions are introduced as part of the optimization pipeline. Previously, these newly introduced functions (for example a fused sincos call) would result in a linking error due to a missing function definition. This new pass can be initiated via:
Also note we intentionally exclude bitcodes supplied via the -mlink-bitcode-file option from the second linking step