-
Notifications
You must be signed in to change notification settings - Fork 14.3k
Reland [HIP] use offload wrapper for non-device-only non-rdc (#132869) #143964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR relands the HIP offload wrapper changes to address two issues: an assertion failure with -flto due to a missing linker wrapper action for the device binary and kernel lookup failures when multiple HIP files are present.
- Update the section renaming logic in the Clang linker wrapper to correctly handle the offload entries for HIP and LLVM.
- Adjust tests and driver/toolchain logic for both -fgpu‑rdc and -fno‑gpu‑rdc modes.
- Modify conditions in CodeGen and Driver to properly account for HIP non-RDC compilation.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp | Update offload entries section renaming to include "llvm" and adjust the corresponding arguments. |
clang/test/Driver/linker-wrapper.c | Update test expectations to match the new section rename. |
clang/test/Driver/hip-toolchain-no-rdc.hip | Adjust the FileCheck prefixes in tests for HIP non-RDC mode. |
clang/test/Driver/hip-phases.hip | Update phase printing checks to align with new offload entries and packaging behavior. |
clang/test/Driver/hip-binding.hip | Revise LTO packaging expectations for HIP binding. |
clang/lib/Driver/ToolChains/Clang.cpp | Narrow the condition to only CUDA for one branch, excluding HIP; adjust linker wrapper argument logic. |
clang/lib/Driver/Driver.cpp | Introduce and use HIPNoRDC flag and adjust conditions for offloading actions in HIP mode. |
clang/lib/CodeGen/CGCUDANV.cpp | Modify condition to include HIP along with relocatable device code when using the new driver. |
Comments suppressed due to low confidence (4)
clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:313
- Consider adding an inline comment explaining the addition of the 'llvm' prefix to clarify that it ensures the offload entries are made unique across object files according to test expectations.
for (StringRef Prefix : {"omp", "cuda", "hip", "llvm"}) {
clang/lib/Driver/ToolChains/Clang.cpp:7705
- Please add a comment clarifying why HIP is excluded from this branch, which would improve the maintainability and clarity of this condition.
if (IsCuda && !IsRDCMode) {
clang/lib/Driver/Driver.cpp:5203
- [nitpick] Consider refactoring this compound conditional to improve its readability and maintainability, possibly by extracting parts of the condition into a well-named variable.
((Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) || (Args.hasFlag(options::OPT_offload_new_driver, options::OPT_no_offload_new_driver, false) && !offloadDeviceOnly())) ||
clang/lib/CodeGen/CGCUDANV.cpp:1283
- Add a comment explaining why the HIP flag is combined with RelocatableDeviceCode in this conditional to clarify the intended behavior for offloading when using the new driver.
if (CGM.getLangOpts().OffloadingNewDriver && (CGM.getLangOpts().HIP || RelocatableDeviceCode))
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-driver Author: Yaxun (Sam) Liu (yxsamliu) ChangesFixed two issues:
Patch is 26.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143964.diff 8 Files Affected:
diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp
index 38f514304df5e..dd26be74e561b 100644
--- a/clang/lib/CodeGen/CGCUDANV.cpp
+++ b/clang/lib/CodeGen/CGCUDANV.cpp
@@ -1280,7 +1280,8 @@ llvm::Function *CGNVCUDARuntime::finalizeModule() {
return nullptr;
}
if (CGM.getLangOpts().OffloadViaLLVM ||
- (CGM.getLangOpts().OffloadingNewDriver && RelocatableDeviceCode))
+ (CGM.getLangOpts().OffloadingNewDriver &&
+ (CGM.getLangOpts().HIP || RelocatableDeviceCode)))
createOffloadingEntries();
else
return makeModuleCtorFunction();
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index eb60d907d2218..060f76fb653c9 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4423,6 +4423,10 @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
options::OPT_no_offload_new_driver,
C.isOffloadingHostKind(Action::OFK_Cuda));
+ bool HIPNoRDC =
+ C.isOffloadingHostKind(Action::OFK_HIP) &&
+ !Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false);
+
// Builder to be used to build offloading actions.
std::unique_ptr<OffloadingActionBuilder> OffloadBuilder =
!UseNewOffloadingDriver
@@ -4556,7 +4560,7 @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
// Check if this Linker Job should emit a static library.
if (ShouldEmitStaticLibrary(Args)) {
LA = C.MakeAction<StaticLibJobAction>(LinkerInputs, types::TY_Image);
- } else if (UseNewOffloadingDriver ||
+ } else if ((UseNewOffloadingDriver && !HIPNoRDC) ||
Args.hasArg(options::OPT_offload_link)) {
LA = C.MakeAction<LinkerWrapperJobAction>(LinkerInputs, types::TY_Image);
LA->propagateHostOffloadInfo(C.getActiveOffloadKinds(),
@@ -4867,10 +4871,31 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
const InputTy &Input, StringRef CUID,
Action *HostAction) const {
// Don't build offloading actions if explicitly disabled or we do not have a
- // valid source input and compile action to embed it in. If preprocessing only
- // ignore embedding.
- if (offloadHostOnly() || !types::isSrcFile(Input.first) ||
- !(isa<CompileJobAction>(HostAction) ||
+ // valid source input.
+ if (offloadHostOnly() || !types::isSrcFile(Input.first))
+ return HostAction;
+
+ bool HIPNoRDC =
+ C.isOffloadingHostKind(Action::OFK_HIP) &&
+ !Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false);
+
+ // For HIP non-rdc non-device-only compilation, create a linker wrapper
+ // action for each host object to link, bundle and wrap device files in
+ // it.
+ if ((isa<AssembleJobAction>(HostAction) ||
+ (isa<BackendJobAction>(HostAction) &&
+ HostAction->getType() == types::TY_LTO_BC)) &&
+ HIPNoRDC && !offloadDeviceOnly()) {
+ ActionList AL{HostAction};
+ HostAction = C.MakeAction<LinkerWrapperJobAction>(AL, types::TY_Object);
+ HostAction->propagateHostOffloadInfo(C.getActiveOffloadKinds(),
+ /*BoundArch=*/nullptr);
+ return HostAction;
+ }
+
+ // Don't build offloading actions if we do not have a compile action. If
+ // preprocessing only ignore embedding.
+ if (!(isa<CompileJobAction>(HostAction) ||
getFinalPhase(Args) == phases::Preprocess))
return HostAction;
@@ -4966,12 +4991,12 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
}
}
- // Compiling HIP in non-RDC mode requires linking each action individually.
+ // Compiling HIP in device-only non-RDC mode requires linking each action
+ // individually.
for (Action *&A : DeviceActions) {
if ((A->getType() != types::TY_Object &&
A->getType() != types::TY_LTO_BC) ||
- Kind != Action::OFK_HIP ||
- Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false))
+ !HIPNoRDC || !offloadDeviceOnly())
continue;
ActionList LinkerInput = {A};
A = C.MakeAction<LinkJobAction>(LinkerInput, types::TY_Image);
@@ -4995,12 +5020,12 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
}
}
- // HIP code in non-RDC mode will bundle the output if it invoked the linker.
+ // HIP code in device-only non-RDC mode will bundle the output if it invoked
+ // the linker.
bool ShouldBundleHIP =
- C.isOffloadingHostKind(Action::OFK_HIP) &&
+ HIPNoRDC && offloadDeviceOnly() &&
Args.hasFlag(options::OPT_gpu_bundle_output,
options::OPT_no_gpu_bundle_output, true) &&
- !Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) &&
!llvm::any_of(OffloadActions,
[](Action *A) { return A->getType() != types::TY_Image; });
@@ -5020,11 +5045,9 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
C.MakeAction<LinkJobAction>(OffloadActions, types::TY_CUDA_FATBIN);
DDep.add(*FatbinAction, *C.getSingleOffloadToolChain<Action::OFK_Cuda>(),
nullptr, Action::OFK_Cuda);
- } else if (C.isOffloadingHostKind(Action::OFK_HIP) &&
- !Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
- false)) {
- // If we are not in RDC-mode we just emit the final HIP fatbinary for each
- // translation unit, linking each input individually.
+ } else if (HIPNoRDC && offloadDeviceOnly()) {
+ // If we are in device-only non-RDC-mode we just emit the final HIP
+ // fatbinary for each translation unit, linking each input individually.
Action *FatbinAction =
C.MakeAction<LinkJobAction>(OffloadActions, types::TY_HIP_FATBIN);
DDep.add(*FatbinAction, *C.getSingleOffloadToolChain<Action::OFK_HIP>(),
@@ -5177,8 +5200,11 @@ Action *Driver::ConstructPhaseAction(
(((Input->getOffloadingToolChain() &&
Input->getOffloadingToolChain()->getTriple().isAMDGPU()) ||
TargetDeviceOffloadKind == Action::OFK_HIP) &&
- (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
- false) ||
+ ((Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
+ false) ||
+ (Args.hasFlag(options::OPT_offload_new_driver,
+ options::OPT_no_offload_new_driver, false) &&
+ !offloadDeviceOnly())) ||
TargetDeviceOffloadKind == Action::OFK_OpenMP))) {
types::ID Output =
Args.hasArg(options::OPT_S) &&
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 1d11be1d82be8..d6b89c324936f 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7702,7 +7702,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs.push_back("-fcuda-include-gpubinary");
CmdArgs.push_back(CudaDeviceInput->getFilename());
} else if (!HostOffloadingInputs.empty()) {
- if ((IsCuda || IsHIP) && !IsRDCMode) {
+ if (IsCuda && !IsRDCMode) {
assert(HostOffloadingInputs.size() == 1 && "Only one input expected");
CmdArgs.push_back("-fcuda-include-gpubinary");
CmdArgs.push_back(HostOffloadingInputs.front().getFilename());
@@ -9249,8 +9249,20 @@ void LinkerWrapper::ConstructJob(Compilation &C, const JobAction &JA,
// Add the linker arguments to be forwarded by the wrapper.
CmdArgs.push_back(Args.MakeArgString(Twine("--linker-path=") +
LinkCommand->getExecutable()));
- for (const char *LinkArg : LinkCommand->getArguments())
- CmdArgs.push_back(LinkArg);
+
+ // We use action type to differentiate two use cases of the linker wrapper.
+ // TY_Image for normal linker wrapper work.
+ // TY_Object for HIP fno-gpu-rdc embedding device binary in a relocatable
+ // object.
+ assert(JA.getType() == types::TY_Object || JA.getType() == types::TY_Image);
+ if (JA.getType() == types::TY_Object) {
+ CmdArgs.append({"-o", Output.getFilename()});
+ for (auto Input : Inputs)
+ CmdArgs.push_back(Input.getFilename());
+ CmdArgs.push_back("-r");
+ } else
+ for (const char *LinkArg : LinkCommand->getArguments())
+ CmdArgs.push_back(LinkArg);
addOffloadCompressArgs(Args, CmdArgs);
diff --git a/clang/test/Driver/hip-binding.hip b/clang/test/Driver/hip-binding.hip
index 57e57194ec87b..d8b3f1e242018 100644
--- a/clang/test/Driver/hip-binding.hip
+++ b/clang/test/Driver/hip-binding.hip
@@ -93,7 +93,7 @@
// RUN: -nogpulib -nogpuinc -foffload-lto --offload-arch=gfx90a --offload-arch=gfx908 -c %s 2>&1 \
// RUN: | FileCheck -check-prefix=LTO-NO-RDC %s
// LTO-NO-RDC: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[LTO_908:.+]]"
-// LTO-NO-RDC-NEXT: # "amdgcn-amd-amdhsa" - "AMDGCN::Linker", inputs: ["[[LTO_908]]"], output: "[[OBJ_908:.+]]"
// LTO-NO-RDC-NEXT: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]"], output: "[[LTO_90A:.+]]"
-// LTO-NO-RDC-NEXT: # "amdgcn-amd-amdhsa" - "AMDGCN::Linker", inputs: ["[[LTO_90A]]"], output: "[[OBJ_90A:.+]]"
-// LTO-NO-RDC-NEXT: # "amdgcn-amd-amdhsa" - "AMDGCN::Linker", inputs: ["[[OBJ_908]]", "[[OBJ_90A]]"], output: "[[HIPFB:.+]]"
+// LTO-NO-RDC-NEXT: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[LTO_908]]", "[[LTO_90A]]"], output: "[[PKG:.+]]"
+// LTO-NO-RDC-NEXT: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT]]", "[[PKG]]"], output: "[[OBJ:.+]]"
+// LTO-NO-RDC-NEXT: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[OBJ]]"], output: "hip-binding.o"
diff --git a/clang/test/Driver/hip-phases.hip b/clang/test/Driver/hip-phases.hip
index 5fd2c0216ccc3..d8a58b78d6d5c 100644
--- a/clang/test/Driver/hip-phases.hip
+++ b/clang/test/Driver/hip-phases.hip
@@ -8,39 +8,57 @@
//
// RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
// RUN: --no-offload-new-driver --cuda-gpu-arch=gfx803 %s 2>&1 \
-// RUN: | FileCheck -check-prefixes=BIN,NRD,OLD %s
+// RUN: | FileCheck -check-prefixes=BIN,OLD,OLDN %s
// RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
// RUN: --offload-new-driver --cuda-gpu-arch=gfx803 %s 2>&1 \
-// RUN: | FileCheck -check-prefixes=BIN,NRD,NEW %s
+// RUN: | FileCheck -check-prefixes=BIN,NEW,NEWN %s
+// RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
+// RUN: --offload-new-driver --cuda-gpu-arch=gfx803 -flto -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=BIN,NEW,NEWLTO %s
//
// RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
// RUN: --no-offload-new-driver --cuda-gpu-arch=gfx803 -fgpu-rdc %s 2>&1 \
-// RUN: | FileCheck -check-prefixes=BIN,RDC %s
+// RUN: | FileCheck -check-prefixes=BIN,OLD,OLDR %s
+// RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
+// RUN: --offload-new-driver --cuda-gpu-arch=gfx803 -fgpu-rdc %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=BIN,NEW,NEWR %s
//
// BIN-DAG: [[P0:[0-9]+]]: input, "{{.*}}hip-phases.hip", [[T:hip]], (host-[[T]])
// BIN-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (host-[[T]])
// BIN-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-[[T]])
-// RDC-DAG: [[P12:[0-9]+]]: backend, {[[P2]]}, assembler, (host-[[T]])
-// RDC-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
+// OLDR-DAG: [[P12:[0-9]+]]: backend, {[[P2]]}, assembler, (host-[[T]])
+// OLDR-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
// BIN-DAG: [[P3:[0-9]+]]: input, "{{.*}}hip-phases.hip", [[T]], (device-[[T]], [[ARCH:gfx803]])
// BIN-DAG: [[P4:[0-9]+]]: preprocessor, {[[P3]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]])
// BIN-DAG: [[P5:[0-9]+]]: compiler, {[[P4]]}, ir, (device-[[T]], [[ARCH]])
-// NRD-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, assembler, (device-[[T]], [[ARCH]])
-// NRD-DAG: [[P7:[0-9]+]]: assembler, {[[P6]]}, object, (device-[[T]], [[ARCH]])
-// RDC-DAG: [[P7:[0-9]+]]: backend, {[[P5]]}, ir, (device-[[T]], [[ARCH]])
-// BIN-DAG: [[P8:[0-9]+]]: linker, {[[P7]]}, image, (device-[[T]], [[ARCH]])
-// BIN-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa:[[ARCH]])" {[[P8]]}, image
-// NRD-DAG: [[P10:[0-9]+]]: linker, {[[P9]]}, hip-fatbin, (device-[[T]])
-// RDC-DAG: [[P10:[0-9]+]]: linker, {[[P9]]}, object, (device-[[T]])
-
-// NRD-DAG: [[P11:[0-9]+]]: offload, "host-[[T]] (x86_64-unknown-linux-gnu)" {[[P2]]}, "device-[[T]] (amdgcn-amd-amdhsa)" {[[P10]]}, ir
-// RDC-DAG: [[P11:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa)" {[[P10]]}, object
-// NRD-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-[[T]])
-// NRD-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
-// OLD-DAG: [[P14:[0-9]+]]: linker, {[[P13]]}, image, (host-[[T]])
-// NEW-DAG: [[P14:[0-9]+]]: clang-linker-wrapper, {[[P13]]}, image, (host-[[T]])
-// RDC-DAG: [[P14:[0-9]+]]: linker, {[[P13]], [[P11]]}, image, (host-[[T]])
+// OLDN-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, assembler, (device-[[T]], [[ARCH]])
+// NEW-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, ir, (device-[[T]], [[ARCH]])
+// OLDN-DAG: [[P7:[0-9]+]]: assembler, {[[P6]]}, object, (device-[[T]], [[ARCH]])
+// OLDR-DAG: [[P7:[0-9]+]]: backend, {[[P5]]}, ir, (device-[[T]], [[ARCH]])
+// OLD-DAG: [[P8:[0-9]+]]: linker, {[[P7]]}, image, (device-[[T]], [[ARCH]])
+// OLD-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa:[[ARCH]])" {[[P8]]}, image
+// NEW-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa:[[ARCH]])" {[[P6]]}, ir
+// OLDN-DAG: [[P10:[0-9]+]]: linker, {[[P9]]}, hip-fatbin, (device-[[T]])
+// NEW-DAG: [[P10:[0-9]+]]: clang-offload-packager, {[[P9]]}, image, (device-[[T]])
+// OLDR-DAG: [[P10:[0-9]+]]: linker, {[[P9]]}, object, (device-[[T]])
+
+// OLDN-DAG: [[P11:[0-9]+]]: offload, "host-[[T]] (x86_64-unknown-linux-gnu)" {[[P2]]}, "device-[[T]] (amdgcn-amd-amdhsa)" {[[P10]]}, ir
+// NEW-DAG: [[P11:[0-9]+]]: offload, "host-[[T]] (x86_64-unknown-linux-gnu)" {[[P2]]}, "device-[[T]] (x86_64-unknown-linux-gnu)" {[[P10]]}, ir
+// OLDR-DAG: [[P11:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa)" {[[P10]]}, object
+// OLDN-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-[[T]])
+// OLDN-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
+// NEWN-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-[[T]])
+// NEWN-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
+// NEWLTO-DAG: [[P13:[0-9]+]]: backend, {[[P11]]}, lto-bc, (host-hip)
+// NEWR-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-[[T]])
+// NEWR-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
+// OLDN-DAG: [[P14:[0-9]+]]: linker, {[[P13]]}, image, (host-[[T]])
+// NEWN-DAG: [[P14:[0-9]+]]: clang-linker-wrapper, {[[P13]]}, object, (host-[[T]])
+// NEWLTO-DAG: [[P14:[0-9]+]]: clang-linker-wrapper, {[[P13]]}, object, (host-[[T]])
+// OLDR-DAG: [[P14:[0-9]+]]: linker, {[[P13]], [[P11]]}, image, (host-[[T]])
+// NEWR-DAG: [[P14:[0-9]+]]: clang-linker-wrapper, {[[P13]]}, image, (host-[[T]])
+// NEWN-DAG: [[P15:[0-9]+]]: linker, {[[P14]]}, image
//
// Test single gpu architecture up to the assemble phase.
diff --git a/clang/test/Driver/hip-toolchain-no-rdc.hip b/clang/test/Driver/hip-toolchain-no-rdc.hip
index 6c69d1d51a260..ddd251b67cc57 100644
--- a/clang/test/Driver/hip-toolchain-no-rdc.hip
+++ b/clang/test/Driver/hip-toolchain-no-rdc.hip
@@ -7,7 +7,7 @@
// RUN: -fuse-ld=lld -B%S/Inputs/lld -nogpuinc \
// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck -check-prefixes=CHECK,LINK %s
+// RUN: 2>&1 | FileCheck -check-prefixes=CHECK,LINK,OLD %s
// RUN: %clang -### --target=x86_64-linux-gnu -fno-gpu-rdc \
// RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
@@ -17,7 +17,7 @@
// RUN: -fuse-ld=lld -B%S/Inputs/lld -nogpuinc -c \
// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck -check-prefixes=CHECK %s
+// RUN: 2>&1 | FileCheck -check-prefixes=CHECK,OLD %s
// RUN: %clang -### --target=x86_64-linux-gnu -fno-gpu-rdc \
// RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
@@ -27,7 +27,7 @@
// RUN: -fuse-ld=lld -B%S/Inputs/lld -nogpuinc --offload-new-driver -c \
// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck -check-prefixes=CHECK %s
+// RUN: 2>&1 | FileCheck -check-prefixes=CHECK,NEW %s
// RUN: touch %t/a.o %t/b.o
// RUN: %clang -### --target=x86_64-linux-gnu \
@@ -47,22 +47,23 @@
// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
-// CHECK-SAME: "-emit-obj"
+// OLD-SAME: "-emit-obj"
+// NEW-SAME: "-emit-llvm-bc"
// CHECK-SAME: {{.*}} "-main-file-name" "a.cu"
// CHECK-SAME: "-fcuda-is-device" "-fno-threadsafe-statics" "-mllvm" "-amdgpu-internalize-symbols"
// CHECK-SAME: "-fcuda-allow-variadic-functions" "-fvisibility=hidden"
// CHECK-SAME: "-fapply-global-visibility-to-externs"
// CHECK-SAME: "{{.*}}lib1.bc" "{{.*}}lib2.bc"
// CHECK-SAME: "-target-cpu" "gfx803"
-// CHECK-SAME: {{.*}} "-o" [[OBJ_DEV_A_803:".*o"]] "-x" "hip"
+// CHECK-SAME: {{.*}} "-o" "[[OBJ_DEV_A_803:.*(o|bc)]]" "-x" "hip"
// CHECK-SAME: {{.*}} [[A_SRC:".*a.cu"]]
// CHECK-NOT: {{".*llvm-link"}}
// CHECK-NOT: {{".*opt"}}
// CHECK-NOT: {{".*llc"}}
-// CHECK: [[LLD: ".*lld.*"]] "-flavor" "gnu" "-m" "elf64_amdgpu" "--no-undefined" "-shared"
-// CHECK-SAME: "-o" "[[IMG_DEV_A_803:.*out]]" [[OBJ_DEV_A_803]]
+// OLD: [[LLD: ".*lld.*"]] "-flavor" "gnu" "-m" "elf64_amdgpu" "--no-undefined" "-shared"
+// OLD-SAME: "-o" "[[IMG_DEV_A_803:.*out]]" "[[OBJ_DEV_A_803]]"
//
// Compile device code in a.cu to code object for gfx900.
@@ -70,62 +71,71 @@
// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
-// CHECK-SAME: "-emit-obj"
+// CHECK-SAME: "-emit-{{(obj|llvm-bc)}}"
// CHECK-SAME: {{.*}} "-main-file-name" "a.cu"
// CHECK-SAME: "-fcuda-is-device" "-fno-threadsafe-statics" "-mllvm" "-amdgpu-internalize-symbols"
// CHECK-SAME: "-fcuda-allow-variadic-functions" "-fvisibility=hidden"
// CHECK-SAME: "-fapply-global-visibility-to-externs"
// CHECK-SAME: "{{.*}}lib1.bc" "{{.*}}lib2.bc"
// CHECK-SAME: "-target-cpu" "gfx900"
-// CHECK-SAME: {{.*}} "-o" [[OBJ_DEV_A_900:".*o"]] "-x" "hip"
+// CHECK-SAME: {{.*}} "-o" "[[OBJ_DEV_A_900:.*(o|bc)]]" "-x" "hip"
// CHECK-SAME: {{.*}} [[A_SRC]]
// CHECK-NOT: {{".*llvm-link"}}
// CHECK-NOT: {{".*opt"}}
// CHECK-NOT: {{".*llc"}}
-// CHECK: [[LLD]] "-flavor" "gnu" "-m" "elf64_amdgpu" "--no-undefined" "-shared"
-// CHECK-SAME: "-o" "[[IMG_DEV_A_900:.*out]]" [[OBJ_DEV_A_900]]
+// OLD: [[LLD]] "-flavor" "gnu" "-m" "elf64_amdgpu" "--no-undefined" "-shared"
+// OLD-SAME: "-o" "[[IMG_DEV_A_900:.*out]]" "[[OBJ_DEV_A_900]]"
//
// Bundle and embed device code in host object for a.cu.
//
-// CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
-// CHECK-SAME: "-bundle-align=4096"
-// CHECK-SAME: "-targets={{.*}},hipv4-amdgcn-amd-amdhsa--gfx803,hipv4-amdgcn-amd-amdhsa--gfx900"
-// CHECK-SAME: "-input={{.*}}" "-input=[[IMG_DEV_A_803]]" "-input=[[IMG_DEV_A_900]]" "-output=[[BUNDLE_A:.*hipfb]]"
+// OLD: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
+// OLD-SAME: "-bundle-align=4096"
+// OLD-SAME: "-targets={{.*}},hipv4-amdgcn-amd-amdhsa--gfx803,hipv4-amdgcn-amd-amdhsa--gfx900"
+// OLD-SAME: "-input={{.*}}" "-input=[[IMG_DEV_A_803]]" "-input=[[IMG_DEV_A_900]]" "-output=[[BUNDLE_A:.*hipfb]]"
+
+// NEW: [[PACKAGER:".*clang-offload-packager"]] "-o" "[[PACKAGE_A:.*.out]]"
+// NEW-SAME: "--image=file=[[OBJ_DEV_A_803]],triple=amdgcn-amd-amdhsa,arch=gfx803,kind=hip"
+// NEW-SAME: "--image=file=[[OBJ_DEV_A_900]],triple=amdgcn-amd-amdhsa,arch=gfx900,kind=hip"
// CHECK: [[CLANG]] "-cc1" "-triple" "x86_64-unknown-linux-gnu"
// CHECK-SAME: "-aux-triple" "amdgcn-amd-amdhsa"
// CHECK-SAME: "-emit-obj"
// CHECK-SAME: {{.*}} "-main-file-name" "a.cu"
-// CHECK-SAME: {{.*}} "-fcuda-include-gpubinary" "[[BUNDLE_A]]"
-// CHECK-SAME: {{.*}} "-o" [[A_OBJ_HOST:".*o"]] "-x" "hip"
+// OLD-SAME: {{.*}} "-fcuda-include-gpubinary" "[[BUNDLE_A]]"
+// NEW-SAME: {{.*}} "-fembed-offload-object=[[PACKAGE_A]]"
+// O...
[truncated]
|
@llvm/pr-subscribers-clang-codegen Author: Yaxun (Sam) Liu (yxsamliu) ChangesFixed two issues:
Patch is 26.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143964.diff 8 Files Affected:
diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp
index 38f514304df5e..dd26be74e561b 100644
--- a/clang/lib/CodeGen/CGCUDANV.cpp
+++ b/clang/lib/CodeGen/CGCUDANV.cpp
@@ -1280,7 +1280,8 @@ llvm::Function *CGNVCUDARuntime::finalizeModule() {
return nullptr;
}
if (CGM.getLangOpts().OffloadViaLLVM ||
- (CGM.getLangOpts().OffloadingNewDriver && RelocatableDeviceCode))
+ (CGM.getLangOpts().OffloadingNewDriver &&
+ (CGM.getLangOpts().HIP || RelocatableDeviceCode)))
createOffloadingEntries();
else
return makeModuleCtorFunction();
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index eb60d907d2218..060f76fb653c9 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4423,6 +4423,10 @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
options::OPT_no_offload_new_driver,
C.isOffloadingHostKind(Action::OFK_Cuda));
+ bool HIPNoRDC =
+ C.isOffloadingHostKind(Action::OFK_HIP) &&
+ !Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false);
+
// Builder to be used to build offloading actions.
std::unique_ptr<OffloadingActionBuilder> OffloadBuilder =
!UseNewOffloadingDriver
@@ -4556,7 +4560,7 @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
// Check if this Linker Job should emit a static library.
if (ShouldEmitStaticLibrary(Args)) {
LA = C.MakeAction<StaticLibJobAction>(LinkerInputs, types::TY_Image);
- } else if (UseNewOffloadingDriver ||
+ } else if ((UseNewOffloadingDriver && !HIPNoRDC) ||
Args.hasArg(options::OPT_offload_link)) {
LA = C.MakeAction<LinkerWrapperJobAction>(LinkerInputs, types::TY_Image);
LA->propagateHostOffloadInfo(C.getActiveOffloadKinds(),
@@ -4867,10 +4871,31 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
const InputTy &Input, StringRef CUID,
Action *HostAction) const {
// Don't build offloading actions if explicitly disabled or we do not have a
- // valid source input and compile action to embed it in. If preprocessing only
- // ignore embedding.
- if (offloadHostOnly() || !types::isSrcFile(Input.first) ||
- !(isa<CompileJobAction>(HostAction) ||
+ // valid source input.
+ if (offloadHostOnly() || !types::isSrcFile(Input.first))
+ return HostAction;
+
+ bool HIPNoRDC =
+ C.isOffloadingHostKind(Action::OFK_HIP) &&
+ !Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false);
+
+ // For HIP non-rdc non-device-only compilation, create a linker wrapper
+ // action for each host object to link, bundle and wrap device files in
+ // it.
+ if ((isa<AssembleJobAction>(HostAction) ||
+ (isa<BackendJobAction>(HostAction) &&
+ HostAction->getType() == types::TY_LTO_BC)) &&
+ HIPNoRDC && !offloadDeviceOnly()) {
+ ActionList AL{HostAction};
+ HostAction = C.MakeAction<LinkerWrapperJobAction>(AL, types::TY_Object);
+ HostAction->propagateHostOffloadInfo(C.getActiveOffloadKinds(),
+ /*BoundArch=*/nullptr);
+ return HostAction;
+ }
+
+ // Don't build offloading actions if we do not have a compile action. If
+ // preprocessing only ignore embedding.
+ if (!(isa<CompileJobAction>(HostAction) ||
getFinalPhase(Args) == phases::Preprocess))
return HostAction;
@@ -4966,12 +4991,12 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
}
}
- // Compiling HIP in non-RDC mode requires linking each action individually.
+ // Compiling HIP in device-only non-RDC mode requires linking each action
+ // individually.
for (Action *&A : DeviceActions) {
if ((A->getType() != types::TY_Object &&
A->getType() != types::TY_LTO_BC) ||
- Kind != Action::OFK_HIP ||
- Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false))
+ !HIPNoRDC || !offloadDeviceOnly())
continue;
ActionList LinkerInput = {A};
A = C.MakeAction<LinkJobAction>(LinkerInput, types::TY_Image);
@@ -4995,12 +5020,12 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
}
}
- // HIP code in non-RDC mode will bundle the output if it invoked the linker.
+ // HIP code in device-only non-RDC mode will bundle the output if it invoked
+ // the linker.
bool ShouldBundleHIP =
- C.isOffloadingHostKind(Action::OFK_HIP) &&
+ HIPNoRDC && offloadDeviceOnly() &&
Args.hasFlag(options::OPT_gpu_bundle_output,
options::OPT_no_gpu_bundle_output, true) &&
- !Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) &&
!llvm::any_of(OffloadActions,
[](Action *A) { return A->getType() != types::TY_Image; });
@@ -5020,11 +5045,9 @@ Action *Driver::BuildOffloadingActions(Compilation &C,
C.MakeAction<LinkJobAction>(OffloadActions, types::TY_CUDA_FATBIN);
DDep.add(*FatbinAction, *C.getSingleOffloadToolChain<Action::OFK_Cuda>(),
nullptr, Action::OFK_Cuda);
- } else if (C.isOffloadingHostKind(Action::OFK_HIP) &&
- !Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
- false)) {
- // If we are not in RDC-mode we just emit the final HIP fatbinary for each
- // translation unit, linking each input individually.
+ } else if (HIPNoRDC && offloadDeviceOnly()) {
+ // If we are in device-only non-RDC-mode we just emit the final HIP
+ // fatbinary for each translation unit, linking each input individually.
Action *FatbinAction =
C.MakeAction<LinkJobAction>(OffloadActions, types::TY_HIP_FATBIN);
DDep.add(*FatbinAction, *C.getSingleOffloadToolChain<Action::OFK_HIP>(),
@@ -5177,8 +5200,11 @@ Action *Driver::ConstructPhaseAction(
(((Input->getOffloadingToolChain() &&
Input->getOffloadingToolChain()->getTriple().isAMDGPU()) ||
TargetDeviceOffloadKind == Action::OFK_HIP) &&
- (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
- false) ||
+ ((Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
+ false) ||
+ (Args.hasFlag(options::OPT_offload_new_driver,
+ options::OPT_no_offload_new_driver, false) &&
+ !offloadDeviceOnly())) ||
TargetDeviceOffloadKind == Action::OFK_OpenMP))) {
types::ID Output =
Args.hasArg(options::OPT_S) &&
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 1d11be1d82be8..d6b89c324936f 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7702,7 +7702,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs.push_back("-fcuda-include-gpubinary");
CmdArgs.push_back(CudaDeviceInput->getFilename());
} else if (!HostOffloadingInputs.empty()) {
- if ((IsCuda || IsHIP) && !IsRDCMode) {
+ if (IsCuda && !IsRDCMode) {
assert(HostOffloadingInputs.size() == 1 && "Only one input expected");
CmdArgs.push_back("-fcuda-include-gpubinary");
CmdArgs.push_back(HostOffloadingInputs.front().getFilename());
@@ -9249,8 +9249,20 @@ void LinkerWrapper::ConstructJob(Compilation &C, const JobAction &JA,
// Add the linker arguments to be forwarded by the wrapper.
CmdArgs.push_back(Args.MakeArgString(Twine("--linker-path=") +
LinkCommand->getExecutable()));
- for (const char *LinkArg : LinkCommand->getArguments())
- CmdArgs.push_back(LinkArg);
+
+ // We use action type to differentiate two use cases of the linker wrapper.
+ // TY_Image for normal linker wrapper work.
+ // TY_Object for HIP fno-gpu-rdc embedding device binary in a relocatable
+ // object.
+ assert(JA.getType() == types::TY_Object || JA.getType() == types::TY_Image);
+ if (JA.getType() == types::TY_Object) {
+ CmdArgs.append({"-o", Output.getFilename()});
+ for (auto Input : Inputs)
+ CmdArgs.push_back(Input.getFilename());
+ CmdArgs.push_back("-r");
+ } else
+ for (const char *LinkArg : LinkCommand->getArguments())
+ CmdArgs.push_back(LinkArg);
addOffloadCompressArgs(Args, CmdArgs);
diff --git a/clang/test/Driver/hip-binding.hip b/clang/test/Driver/hip-binding.hip
index 57e57194ec87b..d8b3f1e242018 100644
--- a/clang/test/Driver/hip-binding.hip
+++ b/clang/test/Driver/hip-binding.hip
@@ -93,7 +93,7 @@
// RUN: -nogpulib -nogpuinc -foffload-lto --offload-arch=gfx90a --offload-arch=gfx908 -c %s 2>&1 \
// RUN: | FileCheck -check-prefix=LTO-NO-RDC %s
// LTO-NO-RDC: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[LTO_908:.+]]"
-// LTO-NO-RDC-NEXT: # "amdgcn-amd-amdhsa" - "AMDGCN::Linker", inputs: ["[[LTO_908]]"], output: "[[OBJ_908:.+]]"
// LTO-NO-RDC-NEXT: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]"], output: "[[LTO_90A:.+]]"
-// LTO-NO-RDC-NEXT: # "amdgcn-amd-amdhsa" - "AMDGCN::Linker", inputs: ["[[LTO_90A]]"], output: "[[OBJ_90A:.+]]"
-// LTO-NO-RDC-NEXT: # "amdgcn-amd-amdhsa" - "AMDGCN::Linker", inputs: ["[[OBJ_908]]", "[[OBJ_90A]]"], output: "[[HIPFB:.+]]"
+// LTO-NO-RDC-NEXT: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[LTO_908]]", "[[LTO_90A]]"], output: "[[PKG:.+]]"
+// LTO-NO-RDC-NEXT: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT]]", "[[PKG]]"], output: "[[OBJ:.+]]"
+// LTO-NO-RDC-NEXT: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[OBJ]]"], output: "hip-binding.o"
diff --git a/clang/test/Driver/hip-phases.hip b/clang/test/Driver/hip-phases.hip
index 5fd2c0216ccc3..d8a58b78d6d5c 100644
--- a/clang/test/Driver/hip-phases.hip
+++ b/clang/test/Driver/hip-phases.hip
@@ -8,39 +8,57 @@
//
// RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
// RUN: --no-offload-new-driver --cuda-gpu-arch=gfx803 %s 2>&1 \
-// RUN: | FileCheck -check-prefixes=BIN,NRD,OLD %s
+// RUN: | FileCheck -check-prefixes=BIN,OLD,OLDN %s
// RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
// RUN: --offload-new-driver --cuda-gpu-arch=gfx803 %s 2>&1 \
-// RUN: | FileCheck -check-prefixes=BIN,NRD,NEW %s
+// RUN: | FileCheck -check-prefixes=BIN,NEW,NEWN %s
+// RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
+// RUN: --offload-new-driver --cuda-gpu-arch=gfx803 -flto -c %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=BIN,NEW,NEWLTO %s
//
// RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
// RUN: --no-offload-new-driver --cuda-gpu-arch=gfx803 -fgpu-rdc %s 2>&1 \
-// RUN: | FileCheck -check-prefixes=BIN,RDC %s
+// RUN: | FileCheck -check-prefixes=BIN,OLD,OLDR %s
+// RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
+// RUN: --offload-new-driver --cuda-gpu-arch=gfx803 -fgpu-rdc %s 2>&1 \
+// RUN: | FileCheck -check-prefixes=BIN,NEW,NEWR %s
//
// BIN-DAG: [[P0:[0-9]+]]: input, "{{.*}}hip-phases.hip", [[T:hip]], (host-[[T]])
// BIN-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (host-[[T]])
// BIN-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-[[T]])
-// RDC-DAG: [[P12:[0-9]+]]: backend, {[[P2]]}, assembler, (host-[[T]])
-// RDC-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
+// OLDR-DAG: [[P12:[0-9]+]]: backend, {[[P2]]}, assembler, (host-[[T]])
+// OLDR-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
// BIN-DAG: [[P3:[0-9]+]]: input, "{{.*}}hip-phases.hip", [[T]], (device-[[T]], [[ARCH:gfx803]])
// BIN-DAG: [[P4:[0-9]+]]: preprocessor, {[[P3]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]])
// BIN-DAG: [[P5:[0-9]+]]: compiler, {[[P4]]}, ir, (device-[[T]], [[ARCH]])
-// NRD-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, assembler, (device-[[T]], [[ARCH]])
-// NRD-DAG: [[P7:[0-9]+]]: assembler, {[[P6]]}, object, (device-[[T]], [[ARCH]])
-// RDC-DAG: [[P7:[0-9]+]]: backend, {[[P5]]}, ir, (device-[[T]], [[ARCH]])
-// BIN-DAG: [[P8:[0-9]+]]: linker, {[[P7]]}, image, (device-[[T]], [[ARCH]])
-// BIN-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa:[[ARCH]])" {[[P8]]}, image
-// NRD-DAG: [[P10:[0-9]+]]: linker, {[[P9]]}, hip-fatbin, (device-[[T]])
-// RDC-DAG: [[P10:[0-9]+]]: linker, {[[P9]]}, object, (device-[[T]])
-
-// NRD-DAG: [[P11:[0-9]+]]: offload, "host-[[T]] (x86_64-unknown-linux-gnu)" {[[P2]]}, "device-[[T]] (amdgcn-amd-amdhsa)" {[[P10]]}, ir
-// RDC-DAG: [[P11:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa)" {[[P10]]}, object
-// NRD-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-[[T]])
-// NRD-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
-// OLD-DAG: [[P14:[0-9]+]]: linker, {[[P13]]}, image, (host-[[T]])
-// NEW-DAG: [[P14:[0-9]+]]: clang-linker-wrapper, {[[P13]]}, image, (host-[[T]])
-// RDC-DAG: [[P14:[0-9]+]]: linker, {[[P13]], [[P11]]}, image, (host-[[T]])
+// OLDN-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, assembler, (device-[[T]], [[ARCH]])
+// NEW-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, ir, (device-[[T]], [[ARCH]])
+// OLDN-DAG: [[P7:[0-9]+]]: assembler, {[[P6]]}, object, (device-[[T]], [[ARCH]])
+// OLDR-DAG: [[P7:[0-9]+]]: backend, {[[P5]]}, ir, (device-[[T]], [[ARCH]])
+// OLD-DAG: [[P8:[0-9]+]]: linker, {[[P7]]}, image, (device-[[T]], [[ARCH]])
+// OLD-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa:[[ARCH]])" {[[P8]]}, image
+// NEW-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa:[[ARCH]])" {[[P6]]}, ir
+// OLDN-DAG: [[P10:[0-9]+]]: linker, {[[P9]]}, hip-fatbin, (device-[[T]])
+// NEW-DAG: [[P10:[0-9]+]]: clang-offload-packager, {[[P9]]}, image, (device-[[T]])
+// OLDR-DAG: [[P10:[0-9]+]]: linker, {[[P9]]}, object, (device-[[T]])
+
+// OLDN-DAG: [[P11:[0-9]+]]: offload, "host-[[T]] (x86_64-unknown-linux-gnu)" {[[P2]]}, "device-[[T]] (amdgcn-amd-amdhsa)" {[[P10]]}, ir
+// NEW-DAG: [[P11:[0-9]+]]: offload, "host-[[T]] (x86_64-unknown-linux-gnu)" {[[P2]]}, "device-[[T]] (x86_64-unknown-linux-gnu)" {[[P10]]}, ir
+// OLDR-DAG: [[P11:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa)" {[[P10]]}, object
+// OLDN-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-[[T]])
+// OLDN-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
+// NEWN-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-[[T]])
+// NEWN-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
+// NEWLTO-DAG: [[P13:[0-9]+]]: backend, {[[P11]]}, lto-bc, (host-hip)
+// NEWR-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-[[T]])
+// NEWR-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
+// OLDN-DAG: [[P14:[0-9]+]]: linker, {[[P13]]}, image, (host-[[T]])
+// NEWN-DAG: [[P14:[0-9]+]]: clang-linker-wrapper, {[[P13]]}, object, (host-[[T]])
+// NEWLTO-DAG: [[P14:[0-9]+]]: clang-linker-wrapper, {[[P13]]}, object, (host-[[T]])
+// OLDR-DAG: [[P14:[0-9]+]]: linker, {[[P13]], [[P11]]}, image, (host-[[T]])
+// NEWR-DAG: [[P14:[0-9]+]]: clang-linker-wrapper, {[[P13]]}, image, (host-[[T]])
+// NEWN-DAG: [[P15:[0-9]+]]: linker, {[[P14]]}, image
//
// Test single gpu architecture up to the assemble phase.
diff --git a/clang/test/Driver/hip-toolchain-no-rdc.hip b/clang/test/Driver/hip-toolchain-no-rdc.hip
index 6c69d1d51a260..ddd251b67cc57 100644
--- a/clang/test/Driver/hip-toolchain-no-rdc.hip
+++ b/clang/test/Driver/hip-toolchain-no-rdc.hip
@@ -7,7 +7,7 @@
// RUN: -fuse-ld=lld -B%S/Inputs/lld -nogpuinc \
// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck -check-prefixes=CHECK,LINK %s
+// RUN: 2>&1 | FileCheck -check-prefixes=CHECK,LINK,OLD %s
// RUN: %clang -### --target=x86_64-linux-gnu -fno-gpu-rdc \
// RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
@@ -17,7 +17,7 @@
// RUN: -fuse-ld=lld -B%S/Inputs/lld -nogpuinc -c \
// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck -check-prefixes=CHECK %s
+// RUN: 2>&1 | FileCheck -check-prefixes=CHECK,OLD %s
// RUN: %clang -### --target=x86_64-linux-gnu -fno-gpu-rdc \
// RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
@@ -27,7 +27,7 @@
// RUN: -fuse-ld=lld -B%S/Inputs/lld -nogpuinc --offload-new-driver -c \
// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck -check-prefixes=CHECK %s
+// RUN: 2>&1 | FileCheck -check-prefixes=CHECK,NEW %s
// RUN: touch %t/a.o %t/b.o
// RUN: %clang -### --target=x86_64-linux-gnu \
@@ -47,22 +47,23 @@
// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
-// CHECK-SAME: "-emit-obj"
+// OLD-SAME: "-emit-obj"
+// NEW-SAME: "-emit-llvm-bc"
// CHECK-SAME: {{.*}} "-main-file-name" "a.cu"
// CHECK-SAME: "-fcuda-is-device" "-fno-threadsafe-statics" "-mllvm" "-amdgpu-internalize-symbols"
// CHECK-SAME: "-fcuda-allow-variadic-functions" "-fvisibility=hidden"
// CHECK-SAME: "-fapply-global-visibility-to-externs"
// CHECK-SAME: "{{.*}}lib1.bc" "{{.*}}lib2.bc"
// CHECK-SAME: "-target-cpu" "gfx803"
-// CHECK-SAME: {{.*}} "-o" [[OBJ_DEV_A_803:".*o"]] "-x" "hip"
+// CHECK-SAME: {{.*}} "-o" "[[OBJ_DEV_A_803:.*(o|bc)]]" "-x" "hip"
// CHECK-SAME: {{.*}} [[A_SRC:".*a.cu"]]
// CHECK-NOT: {{".*llvm-link"}}
// CHECK-NOT: {{".*opt"}}
// CHECK-NOT: {{".*llc"}}
-// CHECK: [[LLD: ".*lld.*"]] "-flavor" "gnu" "-m" "elf64_amdgpu" "--no-undefined" "-shared"
-// CHECK-SAME: "-o" "[[IMG_DEV_A_803:.*out]]" [[OBJ_DEV_A_803]]
+// OLD: [[LLD: ".*lld.*"]] "-flavor" "gnu" "-m" "elf64_amdgpu" "--no-undefined" "-shared"
+// OLD-SAME: "-o" "[[IMG_DEV_A_803:.*out]]" "[[OBJ_DEV_A_803]]"
//
// Compile device code in a.cu to code object for gfx900.
@@ -70,62 +71,71 @@
// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
-// CHECK-SAME: "-emit-obj"
+// CHECK-SAME: "-emit-{{(obj|llvm-bc)}}"
// CHECK-SAME: {{.*}} "-main-file-name" "a.cu"
// CHECK-SAME: "-fcuda-is-device" "-fno-threadsafe-statics" "-mllvm" "-amdgpu-internalize-symbols"
// CHECK-SAME: "-fcuda-allow-variadic-functions" "-fvisibility=hidden"
// CHECK-SAME: "-fapply-global-visibility-to-externs"
// CHECK-SAME: "{{.*}}lib1.bc" "{{.*}}lib2.bc"
// CHECK-SAME: "-target-cpu" "gfx900"
-// CHECK-SAME: {{.*}} "-o" [[OBJ_DEV_A_900:".*o"]] "-x" "hip"
+// CHECK-SAME: {{.*}} "-o" "[[OBJ_DEV_A_900:.*(o|bc)]]" "-x" "hip"
// CHECK-SAME: {{.*}} [[A_SRC]]
// CHECK-NOT: {{".*llvm-link"}}
// CHECK-NOT: {{".*opt"}}
// CHECK-NOT: {{".*llc"}}
-// CHECK: [[LLD]] "-flavor" "gnu" "-m" "elf64_amdgpu" "--no-undefined" "-shared"
-// CHECK-SAME: "-o" "[[IMG_DEV_A_900:.*out]]" [[OBJ_DEV_A_900]]
+// OLD: [[LLD]] "-flavor" "gnu" "-m" "elf64_amdgpu" "--no-undefined" "-shared"
+// OLD-SAME: "-o" "[[IMG_DEV_A_900:.*out]]" "[[OBJ_DEV_A_900]]"
//
// Bundle and embed device code in host object for a.cu.
//
-// CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
-// CHECK-SAME: "-bundle-align=4096"
-// CHECK-SAME: "-targets={{.*}},hipv4-amdgcn-amd-amdhsa--gfx803,hipv4-amdgcn-amd-amdhsa--gfx900"
-// CHECK-SAME: "-input={{.*}}" "-input=[[IMG_DEV_A_803]]" "-input=[[IMG_DEV_A_900]]" "-output=[[BUNDLE_A:.*hipfb]]"
+// OLD: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
+// OLD-SAME: "-bundle-align=4096"
+// OLD-SAME: "-targets={{.*}},hipv4-amdgcn-amd-amdhsa--gfx803,hipv4-amdgcn-amd-amdhsa--gfx900"
+// OLD-SAME: "-input={{.*}}" "-input=[[IMG_DEV_A_803]]" "-input=[[IMG_DEV_A_900]]" "-output=[[BUNDLE_A:.*hipfb]]"
+
+// NEW: [[PACKAGER:".*clang-offload-packager"]] "-o" "[[PACKAGE_A:.*.out]]"
+// NEW-SAME: "--image=file=[[OBJ_DEV_A_803]],triple=amdgcn-amd-amdhsa,arch=gfx803,kind=hip"
+// NEW-SAME: "--image=file=[[OBJ_DEV_A_900]],triple=amdgcn-amd-amdhsa,arch=gfx900,kind=hip"
// CHECK: [[CLANG]] "-cc1" "-triple" "x86_64-unknown-linux-gnu"
// CHECK-SAME: "-aux-triple" "amdgcn-amd-amdhsa"
// CHECK-SAME: "-emit-obj"
// CHECK-SAME: {{.*}} "-main-file-name" "a.cu"
-// CHECK-SAME: {{.*}} "-fcuda-include-gpubinary" "[[BUNDLE_A]]"
-// CHECK-SAME: {{.*}} "-o" [[A_OBJ_HOST:".*o"]] "-x" "hip"
+// OLD-SAME: {{.*}} "-fcuda-include-gpubinary" "[[BUNDLE_A]]"
+// NEW-SAME: {{.*}} "-fembed-offload-object=[[PACKAGE_A]]"
+// O...
[truncated]
|
I split it into 3 commits for ease of reviewing: the original change and the two fixes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks for digging into this, guess I forgot to update the relocatable bit when I made that change.
…llvm#132869)" (llvm#143432)" This reverts commit f5e499a.
Missing offload linker wrapper job action to wrap device binary for -flto.
HIP now uses llvm_offload_entries as section name.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/19330 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/95/builds/14453 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/13094 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/125/builds/8357 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/208/builds/1948 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/76/builds/10425 Here is the relevant piece of the build log for the reference
|
Fixed a typo: - auto Section = (Prefix + "llvm_offload_entries").str(); + auto Section = (Prefix + "_offload_entries").str(); which broke buildbot e.g. https://lab.llvm.org/buildbot/#/builders/208/builds/1948
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/34690 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/28320 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/30141 Here is the relevant piece of the build log for the reference
|
…2869) (llvm#143964) Fixed two issues: 1. assertion with -flto. the linker wrapper action is missing for wrapping the device binary. Added it for -flto. 2. when there are two HIP files, the kernels in the second file were not found. This is because the -r option of linker wrapper assumes offload entries section of HIP to be hip_offloading_entries but it is actually llvm_offload_entries, causing the offload entries sections not made unique for different object files. Fixed and tested working for both -fgpu-rdc and -fno-gpu-rdc case with and without -r
…lvm#132869) (llvm#143964)" This reverts commit 22f9b4a.
…3964) Fixed a typo: - auto Section = (Prefix + "llvm_offload_entries").str(); + auto Section = (Prefix + "_offload_entries").str(); which broke buildbot e.g. https://lab.llvm.org/buildbot/#/builders/208/builds/1948
…2869) (llvm#143964) Fixed two issues: 1. assertion with -flto. the linker wrapper action is missing for wrapping the device binary. Added it for -flto. 2. when there are two HIP files, the kernels in the second file were not found. This is because the -r option of linker wrapper assumes offload entries section of HIP to be hip_offloading_entries but it is actually llvm_offload_entries, causing the offload entries sections not made unique for different object files. Fixed and tested working for both -fgpu-rdc and -fno-gpu-rdc case with and without -r
…lvm#132869) (llvm#143964)" This reverts commit 22f9b4a.
…3964) Fixed a typo: - auto Section = (Prefix + "llvm_offload_entries").str(); + auto Section = (Prefix + "_offload_entries").str(); which broke buildbot e.g. https://lab.llvm.org/buildbot/#/builders/208/builds/1948
Fixed two issues:
assertion with -flto. the linker wrapper action is missing for wrapping the device binary. Added it for -flto.
when there are two HIP files, the kernels in the second file were not found. This is because the -r option of linker wrapper assumes offload entries section of HIP to be hip_offloading_entries but it is actually llvm_offload_entries, causing the offload entries sections not made unique for different object files. Fixed and tested working for both -fgpu-rdc and -fno-gpu-rdc case with and without -r