
[AMDGPU] Do not emit arch dependent macros with unspecified cpu #79660


Merged 2 commits into llvm:main on Jan 29, 2024

Conversation

@jhuber6 (Contributor) commented Jan 26, 2024

Summary:
Currently, the AMDGPU toolchain accepts not passing -mcpu as a means
to create a sort of "generic" IR. The resulting IR will not contain any
target dependent attributes and can then be inserted into another
program via -mlink-builtin-bitcode to inherit its attributes.

However, there are a handful of macros that can leak incorrect
information when compiling for an unspecified architecture. Currently,
things like the wavefront size will default to 64, even though it is actually
variable. We should not expose these macros unless the architecture is known.

@llvmbot added labels clang, backend:AMDGPU, clang:frontend on Jan 26, 2024
@llvmbot (Member) commented Jan 26, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-amdgpu

Author: Joseph Huber (jhuber6)


Full diff: https://github.com/llvm/llvm-project/pull/79660.diff

3 Files Affected:

  • (modified) clang/lib/Basic/Targets/AMDGPU.cpp (+25-24)
  • (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl (+1-1)
  • (modified) clang/test/Preprocessor/predefined-arch-macros.c (+12-5)
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 6f3a4908623da77..1fe09c1ce67abc1 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -274,30 +274,31 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts,
   else
     Builder.defineMacro("__R600__");
 
-  if (GPUKind != llvm::AMDGPU::GK_NONE) {
-    StringRef CanonName = isAMDGCN(getTriple()) ?
-      getArchNameAMDGCN(GPUKind) : getArchNameR600(GPUKind);
-    Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
-    // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
-    if (isAMDGCN(getTriple())) {
-      assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
-      Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
-                          Twine("__"));
-    }
-    if (isAMDGCN(getTriple())) {
-      Builder.defineMacro("__amdgcn_processor__",
-                          Twine("\"") + Twine(CanonName) + Twine("\""));
-      Builder.defineMacro("__amdgcn_target_id__",
-                          Twine("\"") + Twine(*getTargetID()) + Twine("\""));
-      for (auto F : getAllPossibleTargetIDFeatures(getTriple(), CanonName)) {
-        auto Loc = OffloadArchFeatures.find(F);
-        if (Loc != OffloadArchFeatures.end()) {
-          std::string NewF = F.str();
-          std::replace(NewF.begin(), NewF.end(), '-', '_');
-          Builder.defineMacro(Twine("__amdgcn_feature_") + Twine(NewF) +
-                                  Twine("__"),
-                              Loc->second ? "1" : "0");
-        }
+  if (GPUKind == llvm::AMDGPU::GK_NONE)
+    return;
+
+  StringRef CanonName = isAMDGCN(getTriple()) ?
+    getArchNameAMDGCN(GPUKind) : getArchNameR600(GPUKind);
+  Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
+  // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
+  if (isAMDGCN(getTriple())) {
+    assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
+    Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
+                        Twine("__"));
+  }
+  if (isAMDGCN(getTriple())) {
+    Builder.defineMacro("__amdgcn_processor__",
+                        Twine("\"") + Twine(CanonName) + Twine("\""));
+    Builder.defineMacro("__amdgcn_target_id__",
+                        Twine("\"") + Twine(*getTargetID()) + Twine("\""));
+    for (auto F : getAllPossibleTargetIDFeatures(getTriple(), CanonName)) {
+      auto Loc = OffloadArchFeatures.find(F);
+      if (Loc != OffloadArchFeatures.end()) {
+        std::string NewF = F.str();
+        std::replace(NewF.begin(), NewF.end(), '-', '_');
+        Builder.defineMacro(Twine("__amdgcn_feature_") + Twine(NewF) +
+                                Twine("__"),
+                            Loc->second ? "1" : "0");
       }
     }
   }
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl
index 53f34c6a44ae7dc..5875f6fef2f2787 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl
@@ -44,6 +44,6 @@ void test_read_exec_hi(global ulong* out) {
   *out = __builtin_amdgcn_read_exec_hi();
 }
 
-#if __AMDGCN_WAVEFRONT_SIZE != 64
+#if defined(__AMDGCN_WAVEFRONT_SIZE__) && __AMDGCN_WAVEFRONT_SIZE__ != 64
 #error Wrong wavesize detected
 #endif
diff --git a/clang/test/Preprocessor/predefined-arch-macros.c b/clang/test/Preprocessor/predefined-arch-macros.c
index 27c7b4a271fee80..9879a61825fbbb1 100644
--- a/clang/test/Preprocessor/predefined-arch-macros.c
+++ b/clang/test/Preprocessor/predefined-arch-macros.c
@@ -4294,13 +4294,20 @@
 
 // Begin amdgcn tests ----------------
 
-// RUN: %clang -march=amdgcn -E -dM %s -o - 2>&1 \
+// RUN: %clang -mcpu=gfx803 -E -dM %s -o - 2>&1 \
+// RUN:     -target amdgcn-unknown-unknown \
+// RUN:   | FileCheck -match-full-lines %s -check-prefixes=CHECK_AMDGCN,CHECK_AMDGCN_803
+// RUN: %clang -E -dM %s -o - 2>&1 \
 // RUN:     -target amdgcn-unknown-unknown \
-// RUN:   | FileCheck -match-full-lines %s -check-prefix=CHECK_AMDGCN
+// RUN:   | FileCheck -match-full-lines %s -check-prefixes=CHECK_AMDGCN,CHECK_AMDGCN_NONE
 // CHECK_AMDGCN: #define __AMDGCN__ 1
-// CHECK_AMDGCN: #define __HAS_FMAF__ 1
-// CHECK_AMDGCN: #define __HAS_FP64__ 1
-// CHECK_AMDGCN: #define __HAS_LDEXPF__ 1
+// CHECK_AMDGCN_803: #define __HAS_FMAF__ 1
+// CHECK_AMDGCN_803: #define __HAS_FP64__ 1
+// CHECK_AMDGCN_803: #define __HAS_LDEXPF__ 1
+// CHECK_AMDGCN_NONE-NOT: #define __HAS_FMAF__
+// CHECK_AMDGCN_NONE-NOT: #define __HAS_FP64__
+// CHECK_AMDGCN_NONE-NOT: #define __HAS_LDEXPF__
+// CHECK_AMDGCN_NONE-NOT: #define __AMDGCN_WAVEFRONT_SIZE__
 
 // Begin r600 tests ----------------
 

github-actions bot commented Jan 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@jhuber6 force-pushed the TargetDependentMacros branch from 63d0328 to ba04b20 on January 26, 2024 at 22:35
@yxsamliu (Collaborator) commented
LGTM. AFAIK only device libs compile OpenCL code without -mcpu. I don't think it uses any of these predefined macros.

@jhuber6 (Contributor, Author) commented Jan 27, 2024

LGTM. AFAIK only device libs compile OpenCL code without -mcpu. I don't think it uses any of these predefined macros.

That's what I figured from a cursory look at the ROCm-Device-Libs. The goal is to formalize this further so we can produce more generic LLVM IR.

@JonChesterfield (Collaborator) commented
The "generic IR" thing is more emergent behaviour than a documented / intentional design.

This patch is fine - we shouldn't set macros to nonsense values - but if this is a step towards building libc like the rocm-device-libs there may be push back on that one.

@jhuber6 jhuber6 merged commit c9a6e99 into llvm:main Jan 29, 2024
@jhuber6 (Contributor, Author) commented Jan 29, 2024

This seems to have perturbed the HIP build. https://lab.llvm.org/staging/#/builders/22/builds/22

The problem is that we used to set __AMDGCN_WAVEFRONT_SIZE__ for the host compilation as well in a bunch of the wave function macros. I think that this is just poor programming, because the host compilation has no way of knowing what the wave size is, considering that --offload-arch=gfx1030,gfx90a is totally legal and would result in two conflicting wavefront sizes from the perspective of the host. The old behavior would just default this to 64 all the time.

We have two solutions, fix the headers so they don't rely on a device-only macro, or just update this to emit a dummy macro on the host.

I like the first better, but given that we've shipped this with ROCm in the past it's likely to require a workaround. @yxsamliu What do you think?

@yxsamliu (Collaborator) commented

Can you revert it for now, since this will break all HIP apps? Let's figure out a proper solution.

@jhuber6 (Contributor, Author) commented Jan 29, 2024

Sure, but this seems to be caused by some absolutely awful technical debt. We are apparently defining __AMDGPU__ and other GPU-specific macros for all host-facing compilations in HIP.

jhuber6 added a commit that referenced this pull request Jan 29, 2024
Revert "[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660)"

This reverts commit c9a6e99.

This breaks HIP code that incorrectly depended on GPU-specific macros being
set. The code is totally wrong, as using `__AMDGCN_WAVEFRONT_SIZE__` on the
host is absolutely meaningless, but it seems this entire corner of the
toolchain is fundamentally broken. Reverting for now to avoid breakages.
@jhuber6 (Contributor, Author) commented Jan 29, 2024

Reverted. I don't think there's a "proper" solution here, since this seems to have leaked into the headers because whoever set this up initially didn't properly set these on the host. That seems to be endemic now, so the best we can do is just set them to some dummy values, I think.

@yxsamliu (Collaborator) commented

Ideally we should not emit them for host compilation and ask users to guard device code that depends on these macros with checks that the macros exist, and if necessary replace device function definitions with declarations. However, this is a breaking change and we have to communicate such changes to users before we make them.

On the other hand, most HIP apps do not care about the value of these macros in host compilation, as long as they exist so the device code still compiles during host compilation. In most cases, the value taken does not affect the generated host code. In rare cases, it may cause an ODR violation, e.g. defining a struct-type kernel argument layout using these macros.

I think for now we should keep the original behavior for HIP host compilation, and document in https://clang.llvm.org/docs/HIPSupport.html that these macros take default values in host compilation which may differ from device compilation and there is risk of ODR violation if using them in host compilation. At the same time, I will open some internal tickets trying to eliminate their usage in host compilation in ROCm.

@yxsamliu (Collaborator) commented
HIP host compilation can be identified by `LangOpts.HIP && !LangOpts.CUDAIsDevice`.
