-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] Do not emit arch dependent macros with unspecified cpu #79660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-backend-amdgpu Author: Joseph Huber (jhuber6) ChangesSummary: However, there are a handful of macros that can leak incorrect Full diff: https://github.com/llvm/llvm-project/pull/79660.diff 3 Files Affected:
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 6f3a4908623da77..1fe09c1ce67abc1 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -274,30 +274,31 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts,
else
Builder.defineMacro("__R600__");
- if (GPUKind != llvm::AMDGPU::GK_NONE) {
- StringRef CanonName = isAMDGCN(getTriple()) ?
- getArchNameAMDGCN(GPUKind) : getArchNameR600(GPUKind);
- Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
- // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
- if (isAMDGCN(getTriple())) {
- assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
- Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
- Twine("__"));
- }
- if (isAMDGCN(getTriple())) {
- Builder.defineMacro("__amdgcn_processor__",
- Twine("\"") + Twine(CanonName) + Twine("\""));
- Builder.defineMacro("__amdgcn_target_id__",
- Twine("\"") + Twine(*getTargetID()) + Twine("\""));
- for (auto F : getAllPossibleTargetIDFeatures(getTriple(), CanonName)) {
- auto Loc = OffloadArchFeatures.find(F);
- if (Loc != OffloadArchFeatures.end()) {
- std::string NewF = F.str();
- std::replace(NewF.begin(), NewF.end(), '-', '_');
- Builder.defineMacro(Twine("__amdgcn_feature_") + Twine(NewF) +
- Twine("__"),
- Loc->second ? "1" : "0");
- }
+ if (GPUKind == llvm::AMDGPU::GK_NONE)
+ return;
+
+ StringRef CanonName = isAMDGCN(getTriple()) ?
+ getArchNameAMDGCN(GPUKind) : getArchNameR600(GPUKind);
+ Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
+ // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
+ if (isAMDGCN(getTriple())) {
+ assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
+ Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
+ Twine("__"));
+ }
+ if (isAMDGCN(getTriple())) {
+ Builder.defineMacro("__amdgcn_processor__",
+ Twine("\"") + Twine(CanonName) + Twine("\""));
+ Builder.defineMacro("__amdgcn_target_id__",
+ Twine("\"") + Twine(*getTargetID()) + Twine("\""));
+ for (auto F : getAllPossibleTargetIDFeatures(getTriple(), CanonName)) {
+ auto Loc = OffloadArchFeatures.find(F);
+ if (Loc != OffloadArchFeatures.end()) {
+ std::string NewF = F.str();
+ std::replace(NewF.begin(), NewF.end(), '-', '_');
+ Builder.defineMacro(Twine("__amdgcn_feature_") + Twine(NewF) +
+ Twine("__"),
+ Loc->second ? "1" : "0");
}
}
}
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl
index 53f34c6a44ae7dc..5875f6fef2f2787 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl
@@ -44,6 +44,6 @@ void test_read_exec_hi(global ulong* out) {
*out = __builtin_amdgcn_read_exec_hi();
}
-#if __AMDGCN_WAVEFRONT_SIZE != 64
+#if defined(__AMDGCN_WAVEFRONT_SIZE__) && __AMDGCN_WAVEFRONT_SIZE__ != 64
#error Wrong wavesize detected
#endif
diff --git a/clang/test/Preprocessor/predefined-arch-macros.c b/clang/test/Preprocessor/predefined-arch-macros.c
index 27c7b4a271fee80..9879a61825fbbb1 100644
--- a/clang/test/Preprocessor/predefined-arch-macros.c
+++ b/clang/test/Preprocessor/predefined-arch-macros.c
@@ -4294,13 +4294,20 @@
// Begin amdgcn tests ----------------
-// RUN: %clang -march=amdgcn -E -dM %s -o - 2>&1 \
+// RUN: %clang -mcpu=gfx803 -E -dM %s -o - 2>&1 \
+// RUN: -target amdgcn-unknown-unknown \
+// RUN: | FileCheck -match-full-lines %s -check-prefixes=CHECK_AMDGCN,CHECK_AMDGCN_803
+// RUN: %clang -E -dM %s -o - 2>&1 \
// RUN: -target amdgcn-unknown-unknown \
-// RUN: | FileCheck -match-full-lines %s -check-prefix=CHECK_AMDGCN
+// RUN: | FileCheck -match-full-lines %s -check-prefixes=CHECK_AMDGCN,CHECK_AMDGCN_NONE
// CHECK_AMDGCN: #define __AMDGCN__ 1
-// CHECK_AMDGCN: #define __HAS_FMAF__ 1
-// CHECK_AMDGCN: #define __HAS_FP64__ 1
-// CHECK_AMDGCN: #define __HAS_LDEXPF__ 1
+// CHECK_AMDGCN_803: #define __HAS_FMAF__ 1
+// CHECK_AMDGCN_803: #define __HAS_FP64__ 1
+// CHECK_AMDGCN_803: #define __HAS_LDEXPF__ 1
+// CHECK_AMDGCN_NONE-NOT: #define __HAS_FMAF__
+// CHECK_AMDGCN_NONE-NOT: #define __HAS_FP64__
+// CHECK_AMDGCN_NONE-NOT: #define __HAS_LDEXPF__
+// CHECK_AMDGCN_NONE-NOT: #define __AMDGCN_WAVEFRONT_SIZE__
// Begin r600 tests ----------------
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulting IR will not contain any target dependent attributes and can then be inserted into another program via `-mlink-builtin-bitcode` to inherit its attributes. However, there are a handful of macros that can leak incorrect information when compiling for an unspecified architecture. Currently, things like the wavefront size will default to 64, which is actually variable. We should not expose these macros unless it is known.
63d0328
to
ba04b20
Compare
LGTM. AFAIK only device libs compile OpenCL code without -mcpu. I don't think it uses any of these predefined macros. |
That's what I figured from a cursory look at the ROCm-Device-Libs. The goal is to formalize this more to make more generic LLVM-IR. |
The "generic IR" thing is more emergent behaviour than a documented / intentional design. This patch is fine - we shouldn't set macros to nonsense values - but if this is a step towards building libc like the rocm-device-libs there may be push back on that one. |
This seems to have perturbed the HIP build. https://lab.llvm.org/staging/#/builders/22/builds/22 The problem is that we used to set We have two solutions, fix the headers so they don't rely on a device-only macro, or just update this to emit a dummy macro on the host. I like the first better, but given that we've shipped this with ROCm in the past it's likely to require a workaround. @yxsamliu What do you think? |
can you revert it for now, since this will break all HIP apps. Let's figure out a proper solution. |
Sure, but this seems to be caused by some absolutely awful technical debt. We are apparently defining |
…pu (#79660)" This reverts commit c9a6e99. This breaks HIP code that incorrectly depended on GPU-specific macros to be set. The code is totally wrong as using `__WAVEFRTONSIZE__` on the host is absolutely meaningless, but it seems this entire corner of the toolchain is fundmentally broken. Reverting for now to avoid breakages.
Reverted. I don't think there's a "proper" solution here since this seems to have leaked into the headers due to whoever set this up initially not properly setting these on the host. That seems to be endemic now, so the best we can do it just set it to some dummy values I think. |
Ideally we should not emit them for host compilation and ask users to quote their device code depending on these macros with conditions that these macros exist and if necessary replace device function definitions with declarations. However, this is a breaking change and we have to communicate such changes to users before we make such changes. On the other hand, most HIP apps do not care about the value of these macros in host compilation, as long as they exist to make the device code compile in host compilation. In most cases, which value to take does not affect the host code generated. In rare cases, it may cause ODR violation, e.g. defining struct type kernel argument layout using these macros. I think for now we should keep the original behavior for HIP host compilation, and document in https://clang.llvm.org/docs/HIPSupport.html that these macros take default values in host compilation which may differ from device compilation and there is risk of ODR violation if using them in host compilation. At the same time, I will open some internal tickets trying to eliminate their usage in host compilation in ROCm. |
HIP host compilation can be identified by LangOpts.HIP && !LangOpts.CUDAIsDevice |
Summary:
Currently, the AMDGPU toolchain accepts not passing
-mcpu
as a meansto create a sort of "generic" IR. The resulting IR will not contain any
target dependent attributes and can then be inserted into another
program via
-mlink-builtin-bitcode
to inherit its attributes.However, there are a handful of macros that can leak incorrect
information when compiling for an unspecified architecture. Currently,
things like the wavefront size will default to 64, which is actually
variable. We should not expose these macros unless it is known.