
[AMDGPU] Do not emit arch dependent macros with unspecified cpu #79660


Merged 2 commits into llvm:main on Jan 29, 2024

Conversation

@jhuber6 (Contributor) commented Jan 26, 2024

Summary:
Currently, the AMDGPU toolchain accepts not passing -mcpu as a means
to create a sort of "generic" IR. The resulting IR will not contain any
target dependent attributes and can then be inserted into another
program via -mlink-builtin-bitcode to inherit its attributes.

However, there are a handful of macros that can leak incorrect
information when compiling for an unspecified architecture. Currently,
things like the wavefront size will default to 64, even though it is actually
variable. We should not expose these macros unless the architecture is known.

@llvmbot added labels clang, backend:AMDGPU, clang:frontend on Jan 26, 2024
@llvmbot (Member) commented Jan 26, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-amdgpu

Author: Joseph Huber (jhuber6)


Full diff: https://github.com/llvm/llvm-project/pull/79660.diff

3 Files Affected:

  • (modified) clang/lib/Basic/Targets/AMDGPU.cpp (+25-24)
  • (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl (+1-1)
  • (modified) clang/test/Preprocessor/predefined-arch-macros.c (+12-5)
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 6f3a4908623da77..1fe09c1ce67abc1 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -274,30 +274,31 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts,
   else
     Builder.defineMacro("__R600__");
 
-  if (GPUKind != llvm::AMDGPU::GK_NONE) {
-    StringRef CanonName = isAMDGCN(getTriple()) ?
-      getArchNameAMDGCN(GPUKind) : getArchNameR600(GPUKind);
-    Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
-    // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
-    if (isAMDGCN(getTriple())) {
-      assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
-      Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
-                          Twine("__"));
-    }
-    if (isAMDGCN(getTriple())) {
-      Builder.defineMacro("__amdgcn_processor__",
-                          Twine("\"") + Twine(CanonName) + Twine("\""));
-      Builder.defineMacro("__amdgcn_target_id__",
-                          Twine("\"") + Twine(*getTargetID()) + Twine("\""));
-      for (auto F : getAllPossibleTargetIDFeatures(getTriple(), CanonName)) {
-        auto Loc = OffloadArchFeatures.find(F);
-        if (Loc != OffloadArchFeatures.end()) {
-          std::string NewF = F.str();
-          std::replace(NewF.begin(), NewF.end(), '-', '_');
-          Builder.defineMacro(Twine("__amdgcn_feature_") + Twine(NewF) +
-                                  Twine("__"),
-                              Loc->second ? "1" : "0");
-        }
+  if (GPUKind == llvm::AMDGPU::GK_NONE)
+    return;
+
+  StringRef CanonName = isAMDGCN(getTriple()) ?
+    getArchNameAMDGCN(GPUKind) : getArchNameR600(GPUKind);
+  Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
+  // Emit macros for gfx family e.g. gfx906 -> __GFX9__, gfx1030 -> __GFX10___
+  if (isAMDGCN(getTriple())) {
+    assert(CanonName.starts_with("gfx") && "Invalid amdgcn canonical name");
+    Builder.defineMacro(Twine("__") + Twine(CanonName.drop_back(2).upper()) +
+                        Twine("__"));
+  }
+  if (isAMDGCN(getTriple())) {
+    Builder.defineMacro("__amdgcn_processor__",
+                        Twine("\"") + Twine(CanonName) + Twine("\""));
+    Builder.defineMacro("__amdgcn_target_id__",
+                        Twine("\"") + Twine(*getTargetID()) + Twine("\""));
+    for (auto F : getAllPossibleTargetIDFeatures(getTriple(), CanonName)) {
+      auto Loc = OffloadArchFeatures.find(F);
+      if (Loc != OffloadArchFeatures.end()) {
+        std::string NewF = F.str();
+        std::replace(NewF.begin(), NewF.end(), '-', '_');
+        Builder.defineMacro(Twine("__amdgcn_feature_") + Twine(NewF) +
+                                Twine("__"),
+                            Loc->second ? "1" : "0");
       }
     }
   }
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl
index 53f34c6a44ae7dc..5875f6fef2f2787 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-wave64.cl
@@ -44,6 +44,6 @@ void test_read_exec_hi(global ulong* out) {
   *out = __builtin_amdgcn_read_exec_hi();
 }
 
-#if __AMDGCN_WAVEFRONT_SIZE != 64
+#if defined(__AMDGCN_WAVEFRONT_SIZE__) && __AMDGCN_WAVEFRONT_SIZE__ != 64
 #error Wrong wavesize detected
 #endif
diff --git a/clang/test/Preprocessor/predefined-arch-macros.c b/clang/test/Preprocessor/predefined-arch-macros.c
index 27c7b4a271fee80..9879a61825fbbb1 100644
--- a/clang/test/Preprocessor/predefined-arch-macros.c
+++ b/clang/test/Preprocessor/predefined-arch-macros.c
@@ -4294,13 +4294,20 @@
 
 // Begin amdgcn tests ----------------
 
-// RUN: %clang -march=amdgcn -E -dM %s -o - 2>&1 \
+// RUN: %clang -mcpu=gfx803 -E -dM %s -o - 2>&1 \
+// RUN:     -target amdgcn-unknown-unknown \
+// RUN:   | FileCheck -match-full-lines %s -check-prefixes=CHECK_AMDGCN,CHECK_AMDGCN_803
+// RUN: %clang -E -dM %s -o - 2>&1 \
 // RUN:     -target amdgcn-unknown-unknown \
-// RUN:   | FileCheck -match-full-lines %s -check-prefix=CHECK_AMDGCN
+// RUN:   | FileCheck -match-full-lines %s -check-prefixes=CHECK_AMDGCN,CHECK_AMDGCN_NONE
 // CHECK_AMDGCN: #define __AMDGCN__ 1
-// CHECK_AMDGCN: #define __HAS_FMAF__ 1
-// CHECK_AMDGCN: #define __HAS_FP64__ 1
-// CHECK_AMDGCN: #define __HAS_LDEXPF__ 1
+// CHECK_AMDGCN_803: #define __HAS_FMAF__ 1
+// CHECK_AMDGCN_803: #define __HAS_FP64__ 1
+// CHECK_AMDGCN_803: #define __HAS_LDEXPF__ 1
+// CHECK_AMDGCN_NONE-NOT: #define __HAS_FMAF__
+// CHECK_AMDGCN_NONE-NOT: #define __HAS_FP64__
+// CHECK_AMDGCN_NONE-NOT: #define __HAS_LDEXPF__
+// CHECK_AMDGCN_NONE-NOT: #define __AMDGCN_WAVEFRONT_SIZE__
 
 // Begin r600 tests ----------------
 

github-actions bot commented Jan 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@jhuber6 force-pushed the TargetDependentMacros branch from 63d0328 to ba04b20 on January 26, 2024 at 22:35
@yxsamliu (Collaborator) commented
LGTM. AFAIK only device libs compile OpenCL code without -mcpu. I don't think it uses any of these predefined macros.

@jhuber6 (Contributor, Author) commented Jan 27, 2024

LGTM. AFAIK only device libs compile OpenCL code without -mcpu. I don't think it uses any of these predefined macros.

That's what I figured from a cursory look at the ROCm-Device-Libs. The goal is to formalize this further so we can produce more generic LLVM IR.

@JonChesterfield (Collaborator) commented
The "generic IR" thing is more emergent behaviour than a documented / intentional design.

This patch is fine - we shouldn't set macros to nonsense values - but if this is a step towards building libc like the rocm-device-libs there may be push back on that one.

@jhuber6 jhuber6 merged commit c9a6e99 into llvm:main Jan 29, 2024
@jhuber6 (Contributor, Author) commented Jan 29, 2024

This seems to have perturbed the HIP build. https://lab.llvm.org/staging/#/builders/22/builds/22

The problem is that we used to set __AMDGCN_WAVEFRONT_SIZE__ for the host compilation as well in a bunch of the wave function macros. I think that this is just poor programming, because the host compilation has no way of knowing what the wave size is, considering that --offload-arch=gfx1030,gfx90a is totally legal and would result in two conflicting wavefront sizes from the perspective of the host. The old behavior would just default this to 64 all the time.

We have two solutions, fix the headers so they don't rely on a device-only macro, or just update this to emit a dummy macro on the host.

I like the first better, but given that we've shipped this with ROCm in the past it's likely to require a workaround. @yxsamliu What do you think?

@yxsamliu (Collaborator) commented

Can you revert it for now, since this will break all HIP apps? Let's figure out a proper solution.

@jhuber6 (Contributor, Author) commented Jan 29, 2024

Sure, but this seems to be caused by some absolutely awful technical debt. We are apparently defining __AMDGPU__ and other GPU-specific macros for all host-facing compilations in HIP.

jhuber6 added a commit that referenced this pull request Jan 29, 2024
Revert "[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660)"

This reverts commit c9a6e99.

This breaks HIP code that incorrectly depended on GPU-specific macros being
set. The code is totally wrong, as using `__AMDGCN_WAVEFRONT_SIZE__` on the
host is absolutely meaningless, but it seems this entire corner of the
toolchain is fundamentally broken. Reverting for now to avoid breakages.
@jhuber6 (Contributor, Author) commented Jan 29, 2024

Reverted. I don't think there's a "proper" solution here, since this seems to have leaked into the headers because whoever set this up initially didn't properly set these on the host. That seems to be endemic now, so the best we can do is just set them to some dummy values, I think.

@yxsamliu (Collaborator) commented

Ideally we should not emit them for host compilation and ask users to guard device code that depends on these macros with checks that the macros exist, and if necessary replace device function definitions with declarations. However, this is a breaking change and we have to communicate such changes to users before we make them.

On the other hand, most HIP apps do not care about the value of these macros in host compilation, as long as they exist so the device code still compiles during host compilation. In most cases, the value taken does not affect the generated host code. In rare cases, it may cause an ODR violation, e.g. defining a struct-type kernel argument layout using these macros.

I think for now we should keep the original behavior for HIP host compilation, and document in https://clang.llvm.org/docs/HIPSupport.html that these macros take default values in host compilation which may differ from device compilation and there is risk of ODR violation if using them in host compilation. At the same time, I will open some internal tickets trying to eliminate their usage in host compilation in ROCm.

@yxsamliu (Collaborator) commented
HIP host compilation can be identified by `LangOpts.HIP && !LangOpts.CUDAIsDevice`.
