[NVPTX] Allow compiling LLVM-IR without -march set #79873
Conversation
Summary: The NVPTX tools require an architecture to be specified; however, if we are creating generic LLVM-IR we should be able to leave it unspecified. This results in the `target-cpu` attributes not being set on the functions, so the architecture can be chosen later when the IR is linked into other code. This allows the standalone `--target=nvptx64-nvidia-cuda` toolchain to create LLVM-IR similar to how CUDA's deviceRTL looks from C/C++.
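A brief sketch of the resulting driver behavior (the file name is illustrative; the diagnostics and flags are taken from the tests in the diff below):

```sh
# A full standalone compile still needs a real architecture for ptxas/nvlink,
# so omitting -march now produces driver errors instead of hitting an assert:
#   error: Must pass in an explicit nvptx64 gpu architecture to 'ptxas'
#   error: Must pass in an explicit nvptx64 gpu architecture to 'nvlink'
clang --target=nvptx64-nvidia-cuda kernel.c

# Stopping at LLVM-IR (e.g. for LTO) is now allowed with no -march at all;
# no -target-cpu is passed to -cc1, so the IR carries no "target-cpu"
# attribute and can adopt one later when linked into arch-specific code.
clang --target=nvptx64-nvidia-cuda -flto -c kernel.c -o kernel.o
```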
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-driver

Author: Joseph Huber (jhuber6)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/79873.diff

6 Files Affected:
diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 094fe1950941270..476528375fb8889 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -57,6 +57,8 @@ def warn_drv_avr_stdlib_not_linked: Warning<
InGroup<AVRRtlibLinkingQuirks>;
def err_drv_cuda_bad_gpu_arch : Error<"unsupported CUDA gpu architecture: %0">;
def err_drv_offload_bad_gpu_arch : Error<"unsupported %0 gpu architecture: %1">;
+def err_drv_offload_missing_gpu_arch : Error<
+ "Must pass in an explicit %0 gpu architecture to '%1'">;
def err_drv_no_cuda_installation : Error<
"cannot find CUDA installation; provide its path via '--cuda-path', or pass "
"'-nocudainc' to build without CUDA includes">;
diff --git a/clang/lib/Basic/Targets/NVPTX.cpp b/clang/lib/Basic/Targets/NVPTX.cpp
index 0b9d97f69d146af..7687e3faad770d0 100644
--- a/clang/lib/Basic/Targets/NVPTX.cpp
+++ b/clang/lib/Basic/Targets/NVPTX.cpp
@@ -59,7 +59,7 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple,
// Define available target features
// These must be defined in sorted order!
NoAsmVariants = true;
- GPU = CudaArch::SM_20;
+ GPU = CudaArch::UNUSED;
if (TargetPointerWidth == 32)
resetDataLayout("e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64");
@@ -169,6 +169,11 @@ void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts,
MacroBuilder &Builder) const {
Builder.defineMacro("__PTX__");
Builder.defineMacro("__NVPTX__");
+
+ // Skip setting architecture dependent macros if undefined.
+ if (GPU == CudaArch::UNUSED && !HostTarget)
+ return;
+
if (Opts.CUDAIsDevice || Opts.OpenMPIsTargetDevice || !HostTarget) {
// Set __CUDA_ARCH__ for the GPU specified.
std::string CUDAArchCode = [this] {
diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index 20d76b702a9426e..f476d49047c0138 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -79,7 +79,8 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public TargetInfo {
initFeatureMap(llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags,
StringRef CPU,
const std::vector<std::string> &FeaturesVec) const override {
- Features[CudaArchToString(GPU)] = true;
+ if (GPU != CudaArch::UNUSED)
+ Features[CudaArchToString(GPU)] = true;
Features["ptx" + std::to_string(PTXVersion)] = true;
return TargetInfo::initFeatureMap(Features, Diags, CPU, FeaturesVec);
}
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 8a9d0caaccf30bb..ca54d2d55426b9f 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -389,7 +389,11 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
GPUArchName = JA.getOffloadingArch();
} else {
GPUArchName = Args.getLastArgValue(options::OPT_march_EQ);
- assert(!GPUArchName.empty() && "Must have an architecture passed in.");
+ if (GPUArchName.empty()) {
+ C.getDriver().Diag(diag::err_drv_offload_missing_gpu_arch)
+ << getToolChain().getArchName() << getShortName();
+ return;
+ }
}
// Obtain architecture from the action.
@@ -593,7 +597,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs.push_back("-v");
StringRef GPUArch = Args.getLastArgValue(options::OPT_march_EQ);
- assert(!GPUArch.empty() && "At least one GPU Arch required for nvlink.");
+ if (GPUArch.empty()) {
+ C.getDriver().Diag(diag::err_drv_offload_missing_gpu_arch)
+ << getToolChain().getArchName() << getShortName();
+ return;
+ }
CmdArgs.push_back("-arch");
CmdArgs.push_back(Args.MakeArgString(GPUArch));
@@ -726,9 +734,8 @@ NVPTXToolChain::NVPTXToolChain(const Driver &D, const llvm::Triple &Triple,
llvm::opt::DerivedArgList *
NVPTXToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
StringRef BoundArch,
- Action::OffloadKind DeviceOffloadKind) const {
- DerivedArgList *DAL =
- ToolChain::TranslateArgs(Args, BoundArch, DeviceOffloadKind);
+ Action::OffloadKind OffloadKind) const {
+ DerivedArgList *DAL = ToolChain::TranslateArgs(Args, BoundArch, OffloadKind);
if (!DAL)
DAL = new DerivedArgList(Args.getBaseArgs());
@@ -738,7 +745,7 @@ NVPTXToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
if (!llvm::is_contained(*DAL, A))
DAL->append(A);
- if (!DAL->hasArg(options::OPT_march_EQ)) {
+ if (!DAL->hasArg(options::OPT_march_EQ) && OffloadKind != Action::OFK_None) {
DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
CudaArchToString(CudaArch::CudaDefault));
} else if (DAL->getLastArgValue(options::OPT_march_EQ) == "native") {
diff --git a/clang/test/Driver/cuda-cross-compiling.c b/clang/test/Driver/cuda-cross-compiling.c
index 12d0af3b45f32f6..6c9e2cac736b792 100644
--- a/clang/test/Driver/cuda-cross-compiling.c
+++ b/clang/test/Driver/cuda-cross-compiling.c
@@ -59,16 +59,6 @@
// LINK: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
-//
-// Test the generated arguments default to a value with no architecture.
-//
-// RUN: %clang --target=nvptx64-nvidia-cuda -### --cuda-path=%S/Inputs/CUDA/usr/local/cuda %s 2>&1 \
-// RUN: | FileCheck -check-prefix=DEFAULT %s
-
-// DEFAULT: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_52" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
-// DEFAULT-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_52" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
-// DEFAULT-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_52" {{.*}} "[[CUBIN]].cubin"
-
//
// Test to ensure that we enable handling global constructors in a freestanding
// Nvidia compilation.
@@ -77,3 +67,17 @@
// RUN: | FileCheck -check-prefix=LOWERING %s
// LOWERING: -cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}} "-mllvm" "--nvptx-lower-global-ctor-dtor"
+
+//
+// Tests for handling a missing architecture.
+//
+// RUN: not %clang -target nvptx64-nvidia-cuda %s -### 2>&1 \
+// RUN: | FileCheck -check-prefix=MISSING %s
+
+// MISSING: error: Must pass in an explicit nvptx64 gpu architecture to 'ptxas'
+// MISSING: error: Must pass in an explicit nvptx64 gpu architecture to 'nvlink'
+
+// RUN: %clang -target nvptx64-nvidia-cuda -flto -c %s -### 2>&1 \
+// RUN: | FileCheck -check-prefix=GENERIC %s
+
+// GENERIC-NOT: -cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}} "-target-cpu"
diff --git a/clang/test/Preprocessor/predefined-arch-macros.c b/clang/test/Preprocessor/predefined-arch-macros.c
index 27c7b4a271fee80..be9323adb4f640d 100644
--- a/clang/test/Preprocessor/predefined-arch-macros.c
+++ b/clang/test/Preprocessor/predefined-arch-macros.c
@@ -4292,6 +4292,18 @@
// RUN: | FileCheck -match-full-lines %s -check-prefix=CHECK_SYSTEMZ_ZVECTOR
// CHECK_SYSTEMZ_ZVECTOR: #define __VEC__ 10304
+// Begin nvptx tests ----------------
+
+// RUN: %clang -march=sm_75 -E -dM %s -o - 2>&1 \
+// RUN: -target nvptx64-unknown-unknown \
+// RUN: | FileCheck -match-full-lines %s -check-prefixes=CHECK_NVPTX,CHECK_ARCH_SM_75
+// RUN: %clang -E -dM %s -o - 2>&1 \
+// RUN: -target nvptx64-unknown-unknown \
+// RUN: | FileCheck -match-full-lines %s -check-prefixes=CHECK_NVPTX,CHECK_ARCH_UNSET
+// CHECK_ARCH_SM_75: #define __CUDA_ARCH__ 750
+// CHECK_ARCH_UNSET-NOT: #define __CUDA_ARCH__
+// CHECK_NVPTX: #define __NVPTX__ 1
+
// Begin amdgcn tests ----------------
// RUN: %clang -march=amdgcn -E -dM %s -o - 2>&1 \
Relying on something not being defined is probably not the best way to handle a 'generic' target. For starters, it makes it hard or impossible to recreate the same compilation state by undoing an already-specified option. It also breaks the established assumption that there is a default target CPU/GPU. If we do want a generic GPU target, then we should grow an explicit 'generic' GPU variant, IMO. It would be the functional opposite of 'native'.
AMDGPU uses a missing
It sounds more like a bug than a feature to me. The major difference between "you get sm_xx by default" and this "you get generic by default" is overridability. With a specific sm_XX, I can override it both ways: I can enable or disable it if I need to, regardless of how it was specified before my overriding options. With the magic unnameable 'generic' target, I can only disable it (by specifying a concrete architecture), but there's no way to enable it once a preceding option names some specific architecture.

It makes little difference when you control the complete build, but that is not the case for all builds. E.g. TensorFlow builds with Bazel, and the end user does not have access to whatever compiler flags the global build rules may set. So if you want to build for a generic GPU target, you will have to jump through way more hoops than is reasonable, as opposed to just specifying the few overriding options you're interested in.

I'm fine with defaulting to such a generic target, but I do believe we need to handle it the same way as specific targets.
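A minimal sketch of that overridability concern (architectures and file name are illustrative); the driver honors the last `-march` value, so concrete architectures can always be overridden by appending another flag, but an implicit 'generic' cannot be re-selected once any `-march` has been injected:

```sh
# Later flags win, so a user can override a build system's sm_52 default:
clang --target=nvptx64-nvidia-cuda -march=sm_52 -march=sm_70 -c kernel.c   # builds for sm_70

# But if "generic" exists only as the *absence* of -march, there is no later
# flag that can undo a -march=sm_70 added earlier by global build rules.
```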
We could make the driver require some special argument, but the question is how that should be reflected in the LLVM-IR. Right now if you specify `-march` it sets the `target-cpu` attribute on the functions in the emitted IR.
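For illustration, a hedged sketch of how this shows up in the emitted IR (attribute and feature strings are illustrative and depend on the clang/PTX version):

```sh
# With an explicit architecture, each function is annotated with a target-cpu:
clang --target=nvptx64-nvidia-cuda -march=sm_52 -S -emit-llvm kernel.c -o - | grep target-cpu
#   attributes #0 = { ... "target-cpu"="sm_52" "target-features"="+ptx60,+sm_52" ... }

# Without -march (after this patch), the same IR-only compile emits no
# "target-cpu" attribute, leaving the functions retargetable at link time.
clang --target=nvptx64-nvidia-cuda -S -emit-llvm kernel.c -o - | grep target-cpu   # no matches
```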
I'm fine with handling 'generic' in a special way under the hood and not specifying a target CPU. My concern is about the user-facing interface. Command-line options must be overridable.
I can add a `generic` architecture value in a follow-up.
Considering that it's for stand-alone compilation only, I'm not going to block this patch.
Follow-up (#85222): This PR adds `-march=generic` support for the NVPTX backend, fulfilling a TODO introduced in #79873. With it, users can explicitly request the "default" CUDA architecture, which ensures that no specific architecture is specified. It does not address any compatibility issues between different CUDA versions.

---------

Co-authored-by: Joseph Huber <[email protected]>
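Assuming the flag behaves as described above (a sketch, not verified against #85222's final semantics), this gives an explicit, overridable spelling for the generic target:

```sh
# -march=generic can now be appended to override an earlier concrete
# architecture injected by build rules, restoring the "no target-cpu" state:
clang --target=nvptx64-nvidia-cuda -march=sm_70 -march=generic -flto -c kernel.c
```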