Skip to content

[AMDGPU] Allow w64 ballot to be used on w32 targets #80183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion clang/include/clang/Basic/BuiltinsAMDGPU.def
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc")
//===----------------------------------------------------------------------===//

TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32")
TARGET_BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc", "wavefrontsize64")
BUILTIN(__builtin_amdgcn_ballot_w64, "WUib", "nc")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether this will cause the intrinsic being removed by backend if the required target feature is missing. I remember some pass is added for that.

Also removing the check increases chance of miss-use in HIP or OpenCL.

device libs bypass this requirement by adding __attribute__((target("wavefrontsize64"))) to the callers. can you do the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is simply on the return value from the intrinsic. It's always legal to do type promotion, so on a system with a 32 wide wavefront it will just get promoted to a 64-bit value which will be correct.

The ROCm-Device-Libs can only do this because they have the __oclc_wavefrontsize64 variable, which I don't want to copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for reference, the main difference is whether or not we read exec_hi, here's some code https://godbolt.org/z/z64Yoo7ve. I believe @arsenm said that exec_hi is simply zero when read from wave32 mode? I could double check that.

The problem with using the attributes is that it puts the attribute into the function, which prevents it from being called in wave32 correctly. This would require some compile-time switch like the ROCm libs, or perhaps a runtime check on the wave size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some basic checks and it works as expected when called from w32 code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The target attribute is just a giant footgun for AMDGPU. This one only works in the library with special handholding by the compiler


// Deprecated intrinsics in favor of __builtin_amdgn_ballot_{w32|w64}
BUILTIN(__builtin_amdgcn_uicmp, "WUiUiUiIi", "nc")
Expand Down
6 changes: 3 additions & 3 deletions clang/test/SemaOpenCL/builtins-amdgcn-error-wave64.cl
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
// RUN: %clang_cc1 -triple amdgcn-- -verify -S -o - %s
// RUN: %clang_cc1 -triple amdgcn-- -target-feature +wavefrontsize32 -verify -S -o - %s
// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature +wavefrontsize32 -verify -S -o - %s
// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature -wavefrontsize64 -verify -S -o - %s
// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s

// expected-no-diagnostics

typedef unsigned long ulong;

void test_ballot_wave64(global ulong* out, int a, int b) {
*out = __builtin_amdgcn_ballot_w64(a == b); // expected-error {{'__builtin_amdgcn_ballot_w64' needs target feature wavefrontsize64}}
*out = __builtin_amdgcn_ballot_w64(a == b);
}

__attribute__((target("wavefrontsize64")))
Expand Down