[CUDA] Change '__activemask' to use '__nvvm_activemask()' #79892

jhuber6 · 2024-01-29T20:29:28Z

Summary:
We recently added builtin support for this function.

llvmbot · 2024-01-29T20:29:56Z

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
We recently added builtin support for this function.

Full diff: https://github.com/llvm/llvm-project/pull/79892.diff

1 Files Affected:

(modified) clang/lib/Headers/__clang_cuda_intrinsics.h (+1-3)

diff --git a/clang/lib/Headers/__clang_cuda_intrinsics.h b/clang/lib/Headers/__clang_cuda_intrinsics.h
index 3c3948863c1d453..a04e8b6de44d053 100644
--- a/clang/lib/Headers/__clang_cuda_intrinsics.h
+++ b/clang/lib/Headers/__clang_cuda_intrinsics.h
@@ -215,9 +215,7 @@ inline __device__ unsigned int __activemask() {
 #if CUDA_VERSION < 9020
   return __nvvm_vote_ballot(1);
 #else
-  unsigned int mask;
-  asm volatile("activemask.b32 %0;" : "=r"(mask));
-  return mask;
+  return __nvvm_activemask();
 #endif
 }

jhuber6 · 2024-01-29T22:58:39Z

I've actually encountered some really strange behavior when trying to update libc to use the new intrinsic. The function below returns a common 64-bit value to be compatible with AMDGPU's 64-lane-wide mode. When I run this against the test suite, it fails on tests that specifically check against divergence.

This works:

[[clang::convergent, gnu::noinline]] uint64_t get_lane_mask() {
  uint32_t mask;
  mask = __nvvm_activemask();
  return mask;
}

But this does not:

[[clang::convergent, gnu::noinline]] uint64_t get_lane_mask() {
  return __nvvm_activemask();
}

If I check the PTX, the main difference seems to be the cvt instruction. Here is the output for the working and the broken version, respectively.

.weak .func  (.param .b64 func_retval0) _ZN22__llvm_libc_19_0_0_git3gpu13get_lane_maskEv()
{
  .reg .b32   %r<2>;
  .reg .b64   %rd<2>;

// %bb.0:                               // %entry
  activemask.b32  %r1;
  cvt.u64.u32   %rd1, %r1;
  st.param.b64  [func_retval0+0], %rd1;
  ret;
}

.weak .func  (.param .b64 func_retval0) _ZN22__llvm_libc_19_0_0_git3gpu13get_lane_maskEv()
{
  .reg .b32   %r<2>;
  .reg .b64   %rd<2>;

// %bb.0:                               // %entry
  activemask.b32  %r1;
  cvt.s64.s32   %rd1, %r1;
  st.param.b64  [func_retval0+0], %rd1;
  ret;
}

So the version that works uses cvt.u64.u32, while the version that's broken uses cvt.s64.s32. This likely means the builtin is returning a "signed" value, so the conversion sign-extends it and treats it like a negative number when all threads are active. @Artem-B is there a correct way to assert that this is unsigned so it does the correct thing?
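A minimal host-side sketch of the effect described above, in plain C++ rather than device code. The two `mask_from_*_builtin` functions are hypothetical stand-ins (not part of Clang or CUDA) for a builtin declared with a signed versus an unsigned 32-bit return type. Both return the bit pattern for "all 32 lanes active". The widening to 64 bits then follows the source type, which is the same choice the compiler makes between cvt.s64.s32 and cvt.u64.u32.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the builtin: same 32-bit pattern (all bits set),
// but one is declared signed and the other unsigned.
constexpr std::int32_t mask_from_signed_builtin() { return -1; }
constexpr std::uint32_t mask_from_unsigned_builtin() { return 0xFFFFFFFFu; }

// Widening a signed 32-bit value sign-extends it (like cvt.s64.s32).
constexpr std::uint64_t widen_signed_source() {
  return static_cast<std::uint64_t>(mask_from_signed_builtin());
}

// Widening an unsigned 32-bit value zero-extends it (like cvt.u64.u32).
constexpr std::uint64_t widen_unsigned_source() {
  return static_cast<std::uint64_t>(mask_from_unsigned_builtin());
}

// The signed path sets the upper 32 bits; the unsigned path leaves them clear.
static_assert(widen_signed_source() == 0xFFFFFFFFFFFFFFFFull,
              "signed source is sign-extended");
static_assert(widen_unsigned_source() == 0x00000000FFFFFFFFull,
              "unsigned source is zero-extended");
```

With a full warp, the sign-extended result claims 64 active lanes instead of 32, which is the kind of discrepancy a divergence check would be expected to notice.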

jhuber6 · 2024-01-29T23:29:50Z

Scratch that, I missed Ui in the builtin definition. I'll do a quick fix.
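Independently of fixing the builtin's declared type, a caller can make the widening explicit. The sketch below is an assumption-laden illustration, not the change made in this PR: `fake_activemask`, `get_lane_mask_sketch`, and `count_lanes` are hypothetical host-side names, and `fake_activemask` models a mask builtin that is declared as returning a signed 32-bit value.

```cpp
#include <cstdint>

// Hypothetical stand-in for a mask builtin declared with a signed 32-bit
// return type; -1 has all 32 bits set ("all lanes active").
constexpr std::int32_t fake_activemask() { return -1; }

// Cast to uint32_t first so that widening to 64 bits is a zero-extension,
// whatever the signedness of the builtin's declared return type.
constexpr std::uint64_t get_lane_mask_sketch() {
  return static_cast<std::uint64_t>(
      static_cast<std::uint32_t>(fake_activemask()));
}

// Count set bits in the mask; each set bit represents one active lane.
constexpr int count_lanes(std::uint64_t mask) {
  int n = 0;
  for (; mask != 0; mask &= mask - 1) ++n;  // clear the lowest set bit
  return n;
}

static_assert(get_lane_mask_sketch() == 0x00000000FFFFFFFFull,
              "upper 32 bits stay clear");
static_assert(count_lanes(get_lane_mask_sketch()) == 32,
              "32 active lanes, not 64");
```

This is equivalent in effect to the working variant above that first stores the result in a `uint32_t` local.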

[CUDA] Change '__activemask' to use '__nvvm_activemask()'

5f316d3

Summary: We recently added builtin support for this function.

jhuber6 requested review from Artem-B and jlebar January 29, 2024 20:29

llvmbot added the clang, backend:X86, and clang:headers labels Jan 29, 2024

jlebar approved these changes Jan 29, 2024

Artem-B approved these changes Jan 29, 2024

jhuber6 merged commit 51379a9 into llvm:main Jan 29, 2024
