Commit c167a25

[libc] Fix lane-id utility function not using built-in (#84902)
Summary: Previously we got the lane-id by taking the global thread ID and masking it down to its bottom 5 bits. This works but is inefficient compared to the NVPTX intrinsic dedicated to reading this value.
1 parent 3924363 commit c167a25

1 file changed, with 1 addition and 1 deletion

libc/src/__support/GPU/nvptx/utils.h

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ LIBC_INLINE uint32_t get_lane_size() { return 32; }
 
 /// Returns the id of the thread inside of a CUDA warp executing together.
 [[clang::convergent]] LIBC_INLINE uint32_t get_lane_id() {
-  return get_thread_id() & (get_lane_size() - 1);
+  return __nvvm_read_ptx_sreg_laneid();
 }
 
 /// Returns the bit-mask of active threads in the current warp.
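
For readers who want to see what the removed line computed, here is a minimal host-side sketch of the old masking arithmetic. It is not part of the commit and does not run on a GPU. The names `lane_size`, `linear_thread_id`, and `lane_id_from_thread_id` are hypothetical stand-ins, and the row-major linearization of the thread index is an assumption about how `get_thread_id()` is defined. On the device, the new code skips this arithmetic and reads the value through `__nvvm_read_ptx_sreg_laneid()`.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model only. Warp width on NVPTX, mirroring get_lane_size()
// from the diff, which returns 32.
constexpr uint32_t lane_size() { return 32; }

// Hypothetical stand-in for get_thread_id(): linearizes a 3D thread index
// inside a block. The x-fastest ordering is an assumption of this sketch.
constexpr uint32_t linear_thread_id(uint32_t x, uint32_t y, uint32_t z,
                                    uint32_t dim_x, uint32_t dim_y) {
  return x + dim_x * y + dim_x * dim_y * z;
}

// The computation the commit removed: keep the low log2(32) = 5 bits of
// the thread id, which is the same as thread_id % 32.
constexpr uint32_t lane_id_from_thread_id(uint32_t thread_id) {
  return thread_id & (lane_size() - 1);
}

// Compile-time checks of the masking arithmetic.
static_assert(lane_id_from_thread_id(0) == 0, "first lane of warp 0");
static_assert(lane_id_from_thread_id(31) == 31, "last lane of warp 0");
static_assert(lane_id_from_thread_id(32) == 0, "first lane of warp 1");
static_assert(lane_id_from_thread_id(37) == 5, "lane 5 of warp 1");

// Runtime check for a multi-dimensional block: thread (5, 1, 0) in a block
// that is 32 threads wide has linear id 37, so it sits in lane 5.
inline void check_block_example() {
  uint32_t tid = linear_thread_id(5, 1, 0, 32, 4);
  assert(tid == 37);
  assert(lane_id_from_thread_id(tid) == 5);
}
```

The old path needed the thread id first, which under the assumption above takes several special-register reads and multiplies, and only then applied the mask. The intrinsic returns the lane id directly.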
