[Clang] Implement resource directory headers for common GPU intrinsics #110179

jhuber6 · 2024-09-26T22:21:09Z

Summary:
All GPU based languages provide some way to access things like the
thread ID or other resources. However, this is spread between many
different languages and it varies between targets. The goal here is to
provide a resource directory header that just provides these in an
easier to understand way, primarily so this can be used for C/C++ code.
The interface aims to be common, to faciliate easier porting, but target
specific stuff could be put in the individual headers.

JonChesterfield · 2024-09-27T06:07:45Z

Probably want a longer prefix. _gpu or_llvm or similar.

If the shared header gets the declarations then people can include the intrin.h and look at it to see what functions they have available, without going and looking through all the implementations. That seems like a good thing. Can put descriptive comments in the main header then.

With a little care Fortran could do the compile header to bitcode thing.

Otherwise all looks very familiar. We've written things like this repeatedly over the years, would be nice to stop doing that. Doesn't help with normalising IR as written though, would need the periodically suggested GPU IR intrinsics to do that, which presumably seems like too much work? Does have advantages like pattern matching on the intrinsics in instcombine though, and would make this header redundant.

jhuber6 · 2024-09-27T17:24:48Z

Probably want a longer prefix. _gpu or_llvm or similar.

Yeah, just wasn't sure. Also, do resource headers need to be in a reserved namespace? Probably nothing wrong with gpu_get_thread_id vs _gpu_get_thread_id.

If the shared header gets the declarations then people can include the intrin.h and look at it to see what functions they have available, without going and looking through all the implementations. That seems like a good thing. Can put descriptive comments in the main header then.

Yeah I was actually wondering if I should go for something like this:

#ifdef __NVPTX__
uint32_t nvptx_get_thread_id_x() { return __nvvm_ptx_read_sreg_tid_x(); }
#define IMPL nvptx
#endif
uint32_t gpu_get_thread_id_x() { return ##IMPL##_get_thread_id_x(); }
#undef IMPL

kparzysz · 2024-09-27T18:23:33Z

Yeah I was actually wondering if I should go for something like this:

#ifdef __NVPTX__
uint32_t nvptx_get_thread_id_x() { return __nvvm_ptx_read_sreg_tid_x(); }
#define IMPL nvptx
#endif
uint32_t gpu_get_thread_id_x() { return ##IMPL##_get_thread_id_x(); }
#undef IMPL

You could put all the common prototypes in the common include, e.g.

inline uint32_t gpu_get_thread_id_x() { return __impl_gpu_get_thread_id_x(); }

Then each arch-specific header would define the "impl" versions:

inline uint32_t __impl_gpu_get_thread_id_x() { return __nvvm_ptx_read_sreg_tid_x(); }

This way the common intrinsics would be defined in a single location, and it would be harder for someone to add a new intrinsic without realizing that all files should implement a common interface.

jdoerfert

I believe this is useful. Left lots of minor comments.

clang/lib/Headers/amdgpuintrin.h

jdoerfert · 2024-09-27T18:15:27Z

clang/lib/Headers/amdgpuintrin.h

+}
+
+// Returns the total number of workgruops in the grid.
+_DEFAULT_ATTRS static inline uint64_t _get_num_blocks() {


The block id's per dimension are 32 bit but the number is 64? Can we make it more uniform?

This is being conservative with the API as listed, AFAIK CUDA says that the maximum X block size is 2^31 - 1 and Y and Z are 2^16 -1, meaning if you get the "global" ID, it could technically be more than 2^32 - 1. Maybe @Artem-B could chime in here.

clang/lib/Headers/amdgpuintrin.h

jdoerfert · 2024-09-27T18:22:36Z

clang/lib/Headers/amdgpuintrin.h

+}
+
+// Returns the id of the thread inside of an AMD wavefront executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t _get_lane_id() {


I'd hope the gpu target triple would turn on auto-convergent. Setting it manually is always going to fail horribly (since it's not set transitively). Thus, adding it here is pointless and might just hide the real issue under one layer of "it seems to be working".

We should really just rip out the convergent source attribute. We should only have noconvergent. You have to compile any of this with -fconvergent-functions

I consider this more of a documentation thing honestly, it's useless to the compiler but lets the reader know that this function has convergent effects.

I also need to fix -fno-convergent-functions and then pass it by default for GPU architectures.

clang/lib/Headers/nvptxintrin.h

clang/lib/Headers/amdgpuintrin.h

clang/lib/Headers/nvptxintrin.h

clang/lib/Headers/amdgpuintrin.h

clang/lib/Headers/gpuintrin.h

jdoerfert · 2024-09-27T18:43:28Z

This way the common intrinsics would be defined in a single location, and it would be harder for someone to add a new intrinsic without realizing that all files should implement a common interface.

That's one of the reasons mapping.{h,cpp} in DeviceRTL uses the extra level of indirection.

Summary: There's an intrinsic for the warp size, we want to expose this to make the interface proposed in llvm#110179 more generic.

clang/lib/Headers/amdgpuintrin.h

Summary: There's an intrinsic for the warp size, we want to expose this to make the interface proposed in #110179 more generic.

jhuber6 · 2024-09-27T20:59:23Z

I am wondering if it would be easier to provide generic builtins in clang and just codegen them. I guess in that case we'd just upscale everything to 64-bit and say "If you need the other one use the target specific version".

shiltian · 2024-09-28T02:16:47Z

I am wondering if it would be easier to provide generic builtins in clang and just codegen them. I guess in that case we'd just upscale everything to 64-bit and say "If you need the other one use the target specific version".

I'm not sure if that's a good idea. For simple cases where there is a 1:1 mapping, it would be just fine. It's gonna be more difficult to implements things in codegen than in high level languages.

Summary: There's an intrinsic for the warp size, we want to expose this to make the interface proposed in llvm/llvm-project#110179 more generic.

llvmbot · 2024-10-03T21:04:20Z

@llvm/pr-subscribers-libc

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
All GPU based languages provide some way to access things like the
thread ID or other resources. However, this is spread between many
different languages and it varies between targets. The goal here is to
provide a resource directory header that just provides these in an
easier to understand way, primarily so this can be used for C/C++ code.
The interface aims to be common, to faciliate easier porting, but target
specific stuff could be put in the individual headers.

Patch is 58.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110179.diff

6 Files Affected:

(modified) clang/lib/Headers/CMakeLists.txt (+14)
(added) clang/lib/Headers/amdgpuintrin.h (+153)
(added) clang/lib/Headers/gpuintrin.h (+76)
(added) clang/lib/Headers/nvptxintrin.h (+153)
(added) clang/test/Headers/gpuintrin.c (+508)
(added) clang/test/Headers/gpuintrin_lang.c (+61)

diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt
index ff392e7122a448..a0e7ae67b7219a 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -268,6 +268,12 @@ set(x86_files
   cpuid.h
   )
 
+set(gpu_files
+  gpuintrin.h
+  nvptxintrin.h
+  amdgpuintrin.h
+  )
+
 set(windows_only_files
   intrin0.h
   intrin.h
@@ -296,6 +302,7 @@ set(files
   ${systemz_files}
   ${ve_files}
   ${x86_files}
+  ${gpu_files}
   ${webassembly_files}
   ${windows_only_files}
   ${utility_files}
@@ -518,6 +525,7 @@ add_header_target("systemz-resource-headers" "${systemz_files};${zos_wrapper_fil
 add_header_target("ve-resource-headers" "${ve_files}")
 add_header_target("webassembly-resource-headers" "${webassembly_files}")
 add_header_target("x86-resource-headers" "${x86_files}")
+add_header_target("gpu-resource-headers" "${gpu_files}")
 
 # Other header groupings
 add_header_target("hlsl-resource-headers" ${hlsl_files})
@@ -704,6 +712,12 @@ install(
   EXCLUDE_FROM_ALL
   COMPONENT x86-resource-headers)
 
+install(
+  FILES ${gpu_files}
+  DESTINATION ${header_install_dir}
+  EXCLUDE_FROM_ALL
+  COMPONENT gpu-resource-headers)
+
 if(NOT CLANG_ENABLE_HLSL)
   set(EXCLUDE_HLSL EXCLUDE_FROM_ALL)
 endif()
diff --git a/clang/lib/Headers/amdgpuintrin.h b/clang/lib/Headers/amdgpuintrin.h
new file mode 100644
index 00000000000000..1fd7261cf4ca75
--- /dev/null
+++ b/clang/lib/Headers/amdgpuintrin.h
@@ -0,0 +1,153 @@
+//===-- amdgpuintrin.h - AMDPGU intrinsic functions -----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __AMDGPUINTRIN_H
+#define __AMDGPUINTRIN_H
+
+#ifndef __AMDGPU__
+#error "This file is intended for AMDGPU targets or offloading to AMDGPU"
+#endif
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#if defined(__HIP__) || defined(__CUDA__)
+#define _DEFAULT_ATTRS __attribute__((device))
+#elif !defined(_DEFAULT_ATTRS)
+#define _DEFAULT_ATTRS
+#endif
+
+#pragma omp begin declare target device_type(nohost)
+#pragma omp begin declare variant match(device = {arch(amdgcn)})
+
+// Type aliases to the address spaces used by the AMDGPU backend.
+#define _Private __attribute__((opencl_private))
+#define _Constant __attribute__((opencl_constant))
+#define _Local __attribute__((opencl_local))
+#define _Global __attribute__((opencl_global))
+
+// Attribute to declare a function as a kernel.
+#define _Kernel __attribute__((amdgpu_kernel, visibility("protected")))
+
+// Returns the number of workgroups in the 'x' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_x() {
+  return __builtin_amdgcn_grid_size_x() / __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workgroups in the 'y' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_y() {
+  return __builtin_amdgcn_grid_size_y() / __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workgroups in the 'z' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_z() {
+  return __builtin_amdgcn_grid_size_z() / __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the 'x' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_x() {
+  return __builtin_amdgcn_workgroup_id_x();
+}
+
+// Returns the 'y' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_y() {
+  return __builtin_amdgcn_workgroup_id_y();
+}
+
+// Returns the 'z' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_z() {
+  return __builtin_amdgcn_workgroup_id_z();
+}
+
+// Returns the number of workitems in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_x() {
+  return __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workitems in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_y() {
+  return __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workitems in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_z() {
+  return __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the 'x' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_x() {
+  return __builtin_amdgcn_workitem_id_x();
+}
+
+// Returns the 'y' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_y() {
+  return __builtin_amdgcn_workitem_id_y();
+}
+
+// Returns the 'z' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_z() {
+  return __builtin_amdgcn_workitem_id_z();
+}
+
+// Returns the size of an AMD wavefront, either 32 or 64 depending on hardware
+// and compilation options.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_lanes() {
+  return __builtin_amdgcn_wavefrontsize();
+}
+
+// Returns the id of the thread inside of an AMD wavefront executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t __gpu_lane_id() {
+  return __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u));
+}
+
+// Returns the bit-mask of active threads in the current wavefront.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint64_t __gpu_lane_mask() {
+  return __builtin_amdgcn_read_exec();
+}
+
+// Copies the value from the first active thread in the wavefront to the rest.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t
+__gpu_broadcast(uint64_t __lane_mask, uint32_t __x) {
+  return __builtin_amdgcn_readfirstlane(__x);
+}
+
+// Returns a bitmask of threads in the current lane for which \p x is true.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint64_t
+__gpu_ballot(uint64_t __lane_mask, bool __x) {
+  // The lane_mask & gives the nvptx semantics when lane_mask is a subset of
+  // the active threads
+  return __lane_mask & __builtin_amdgcn_ballot_w64(__x);
+}
+
+// Waits for all the threads in the block to converge and issues a fence.
+_DEFAULT_ATTRS [[clang::convergent]] static inline void __gpu_sync_threads() {
+  __builtin_amdgcn_s_barrier();
+  __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "workgroup");
+}
+
+// Wait for all threads in the wavefront to converge, this is a noop on AMDGPU.
+_DEFAULT_ATTRS [[clang::convergent]] static inline void
+__gpu_sync_lane(uint64_t __lane_mask) {
+  __builtin_amdgcn_wave_barrier();
+}
+
+// Shuffles the the lanes inside the wavefront according to the given index.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t
+__gpu_shuffle_idx(uint64_t __lane_mask, uint32_t __idx, uint32_t __x) {
+  return __builtin_amdgcn_ds_bpermute(__idx << 2, __x);
+}
+
+// Terminates execution of the associated wavefront.
+_DEFAULT_ATTRS [[noreturn]] static inline void __gpu_exit() {
+  __builtin_amdgcn_endpgm();
+}
+
+#pragma omp end declare variant
+#pragma omp end declare target
+
+#endif // __AMDGPUINTRIN_H
diff --git a/clang/lib/Headers/gpuintrin.h b/clang/lib/Headers/gpuintrin.h
new file mode 100644
index 00000000000000..2531ad8bba70c8
--- /dev/null
+++ b/clang/lib/Headers/gpuintrin.h
@@ -0,0 +1,76 @@
+//===-- gpuintrin.h - Generic GPU intrinsic functions ---------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __GPUINTRIN_H
+#define __GPUINTRIN_H
+
+#if defined(__NVPTX__)
+#include <nvptxintrin.h>
+#elif defined(__AMDGPU__)
+#include <amdgpuintrin.h>
+#endif
+
+// Returns the total number of blocks / workgroups.
+_DEFAULT_ATTRS static inline uint64_t __gpu_num_blocks() {
+  return __gpu_num_blocks_x() * __gpu_num_blocks_y() * __gpu_num_blocks_z();
+}
+
+// Returns the absolute id of the block / workgroup.
+_DEFAULT_ATTRS static inline uint64_t __gpu_block_id() {
+  return __gpu_block_id_x() +
+         (uint64_t)__gpu_num_blocks_x() * __gpu_block_id_y() +
+         (uint64_t)__gpu_num_blocks_x() * __gpu_num_blocks_y() *
+             __gpu_block_id_z();
+}
+
+// Returns the total number of threads in the block / workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads() {
+  return __gpu_num_threads_x() * __gpu_num_threads_y() * __gpu_num_threads_z();
+}
+
+// Returns the absolute id of the thread in the current block / workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id() {
+  return __gpu_thread_id_x() + __gpu_num_threads_x() * __gpu_thread_id_y() +
+         __gpu_num_threads_x() * __gpu_num_threads_y() * __gpu_thread_id_z();
+}
+
+// Get the first active thread inside the lane.
+_DEFAULT_ATTRS static inline uint64_t
+__gpu_first_lane_id(uint64_t __lane_mask) {
+  return __builtin_ffsll(__lane_mask) - 1;
+}
+
+// Conditional that is only true for a single thread in a lane.
+_DEFAULT_ATTRS static inline bool __gpu_is_first_lane(uint64_t __lane_mask) {
+  return __gpu_lane_id() == __gpu_first_lane_id(__lane_mask);
+}
+
+// Gets the sum of all lanes inside the warp or wavefront.
+_DEFAULT_ATTRS static inline uint32_t __gpu_lane_reduce(uint64_t __lane_mask,
+                                                        uint32_t x) {
+  for (uint32_t step = __gpu_num_lanes() / 2; step > 0; step /= 2) {
+    uint32_t index = step + __gpu_lane_id();
+    x += __gpu_shuffle_idx(__lane_mask, index, x);
+  }
+  return __gpu_broadcast(__lane_mask, x);
+}
+
+// Gets the accumulator scan of the threads in the warp or wavefront.
+_DEFAULT_ATTRS static inline uint32_t __gpu_lane_scan(uint64_t __lane_mask,
+                                                      uint32_t x) {
+  for (uint32_t step = 1; step < __gpu_num_lanes(); step *= 2) {
+    uint32_t index = __gpu_lane_id() - step;
+    uint32_t bitmask = __gpu_lane_id() >= step;
+    x += -bitmask & __gpu_shuffle_idx(__lane_mask, index, x);
+  }
+  return x;
+}
+
+#undef _DEFAULT_ATTRS
+
+#endif // __GPUINTRIN_H
diff --git a/clang/lib/Headers/nvptxintrin.h b/clang/lib/Headers/nvptxintrin.h
new file mode 100644
index 00000000000000..fc9769d4c578dd
--- /dev/null
+++ b/clang/lib/Headers/nvptxintrin.h
@@ -0,0 +1,153 @@
+//===-- nvptxintrin.h - NVPTX intrinsic functions -------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __NVPTXINTRIN_H
+#define __NVPTXINTRIN_H
+
+#ifndef __NVPTX__
+#error "This file is intended for NVPTX targets or offloading to NVPTX"
+#endif
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#if defined(__HIP__) || defined(__CUDA__)
+#define _DEFAULT_ATTRS __attribute__((device))
+#elif !defined(_DEFAULT_ATTRS)
+#define _DEFAULT_ATTRS
+#endif
+
+#pragma omp begin declare target device_type(nohost)
+#pragma omp begin declare variant match(device = {arch(nvptx64)})
+
+// Type aliases to the address spaces used by the NVPTX backend.
+#define _Private __attribute__((opencl_private))
+#define _Constant __attribute__((opencl_constant))
+#define _Local __attribute__((opencl_local))
+#define _Global __attribute__((opencl_global))
+
+// Attribute to declare a function as a kernel.
+#define _Kernel __attribute__((nvptx_kernel, visibility("protected")))
+
+// Returns the number of CUDA blocks in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_x() {
+  return __nvvm_read_ptx_sreg_nctaid_x();
+}
+
+// Returns the number of CUDA blocks in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_y() {
+  return __nvvm_read_ptx_sreg_nctaid_y();
+}
+
+// Returns the number of CUDA blocks in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_z() {
+  return __nvvm_read_ptx_sreg_nctaid_z();
+}
+
+// Returns the 'x' dimension of the current CUDA block's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_x() {
+  return __nvvm_read_ptx_sreg_ctaid_x();
+}
+
+// Returns the 'y' dimension of the current CUDA block's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_y() {
+  return __nvvm_read_ptx_sreg_ctaid_y();
+}
+
+// Returns the 'z' dimension of the current CUDA block's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_z() {
+  return __nvvm_read_ptx_sreg_ctaid_z();
+}
+
+// Returns the number of CUDA threads in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_x() {
+  return __nvvm_read_ptx_sreg_ntid_x();
+}
+
+// Returns the number of CUDA threads in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_y() {
+  return __nvvm_read_ptx_sreg_ntid_y();
+}
+
+// Returns the number of CUDA threads in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_z() {
+  return __nvvm_read_ptx_sreg_ntid_z();
+}
+
+// Returns the 'x' dimension id of the thread in the current CUDA block.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_x() {
+  return __nvvm_read_ptx_sreg_tid_x();
+}
+
+// Returns the 'y' dimension id of the thread in the current CUDA block.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_y() {
+  return __nvvm_read_ptx_sreg_tid_y();
+}
+
+// Returns the 'z' dimension id of the thread in the current CUDA block.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_z() {
+  return __nvvm_read_ptx_sreg_tid_z();
+}
+
+// Returns the size of a CUDA warp, always 32 on NVIDIA hardware.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_lanes() {
+  return __nvvm_read_ptx_sreg_warpsize();
+}
+
+// Returns the id of the thread inside of a CUDA warp executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t __gpu_lane_id() {
+  return __nvvm_read_ptx_sreg_laneid();
+}
+
+// Returns the bit-mask of active threads in the current warp.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint64_t __gpu_lane_mask() {
+  return __nvvm_activemask();
+}
+
+// Copies the value from the first active thread in the warp to the rest.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t
+__gpu_broadcast(uint64_t __lane_mask, uint32_t __x) {
+  uint32_t __mask = (uint32_t)__lane_mask;
+  uint32_t __id = __builtin_ffs(__mask) - 1;
+  return __nvvm_shfl_sync_idx_i32(__mask, __x, __id, __gpu_num_lanes() - 1);
+}
+
+// Returns a bitmask of threads in the current lane for which \p x is true.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint64_t
+__gpu_ballot(uint64_t __lane_mask, bool __x) {
+  uint32_t __mask = (uint32_t)__lane_mask;
+  return __nvvm_vote_ballot_sync(__mask, __x);
+}
+
+// Waits for all the threads in the block to converge and issues a fence.
+_DEFAULT_ATTRS [[clang::convergent]] static inline void __gpu_sync_threads() {
+  __syncthreads();
+}
+
+// Waits for all threads in the warp to reconverge for independent scheduling.
+_DEFAULT_ATTRS [[clang::convergent]] static inline void
+__gpu_sync_lane(uint64_t __lane_mask) {
+  __nvvm_bar_warp_sync((uint32_t)__lane_mask);
+}
+
+// Shuffles the the lanes inside the warp according to the given index.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t
+__gpu_shuffle_idx(uint64_t __lane_mask, uint32_t __idx, uint32_t __x) {
+  uint32_t __mask = (uint32_t)__lane_mask;
+  uint32_t __bitmask = (__mask >> __idx) & 1u;
+  return -__bitmask &
+         __nvvm_shfl_sync_idx_i32(__mask, __x, __idx, __gpu_num_lanes() - 1u);
+}
+
+// Terminates execution of the calling thread.
+_DEFAULT_ATTRS [[noreturn]] static inline void __gpu_exit() { __nvvm_exit(); }
+
+#pragma omp end declare variant
+#pragma omp end declare target
+
+#endif // __NVPTXINTRIN_H
diff --git a/clang/test/Headers/gpuintrin.c b/clang/test/Headers/gpuintrin.c
new file mode 100644
index 00000000000000..ff5a816699ebef
--- /dev/null
+++ b/clang/test/Headers/gpuintrin.c
@@ -0,0 +1,508 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -internal-isystem %S/Inputs/include \
+// RUN:   -internal-isystem %S/../../lib/Headers/ \
+// RUN:   -triple amdgcn-amd-amdhsa -emit-llvm %s -o - \
+// RUN: | FileCheck %s --check-prefix=AMDGPU
+//
+// RUN: %clang_cc1 -internal-isystem %S/Inputs/include \
+// RUN:   -internal-isystem %S/../../lib/Headers/ \
+// RUN:   -target-feature +ptx62 \
+// RUN:   -triple nvptx64-nvidia-cuda -emit-llvm %s -o - \
+// RUN: | FileCheck %s --check-prefix=NVPTX
+
+#define _DEFAULT_ATTRS __attribute__((always_inline))
+#include <gpuintrin.h>
+
+// AMDGPU-LABEL: define dso_local void @foo(
+// AMDGPU-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGPU-NEXT:  [[ENTRY:.*:]]
+// AMDGPU-NEXT:    [[RETVAL_I116:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I114:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I112:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I110:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I19_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I17_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I15_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I12_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I9_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I103:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I104:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I101:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I99:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I97:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I7_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I4_I87:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I88:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I89:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I84:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I81:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I78:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I26_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I24_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I22_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I18_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I14_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I70:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I71:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I68:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I66:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I64:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I8_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I4_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I58:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I59:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I54:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I50:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I47:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[__LANE_MASK_ADDR_I_I:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I42:%.*]] = alloca i1, align 1, addrspace(5)
+// AMDGPU-NEXT:    [[__LANE_MASK_ADDR_I43:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I38:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[__LANE_MASK_ADDR_I39:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I32:%.*]] = a...
[truncated]

llvmbot · 2024-10-03T21:04:21Z

@llvm/pr-subscribers-backend-x86

Author: Joseph Huber (jhuber6)

Changes

Summary:
All GPU based languages provide some way to access things like the
thread ID or other resources. However, this is spread between many
different languages and it varies between targets. The goal here is to
provide a resource directory header that just provides these in an
easier to understand way, primarily so this can be used for C/C++ code.
The interface aims to be common, to faciliate easier porting, but target
specific stuff could be put in the individual headers.

Patch is 58.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110179.diff

6 Files Affected:

(modified) clang/lib/Headers/CMakeLists.txt (+14)
(added) clang/lib/Headers/amdgpuintrin.h (+153)
(added) clang/lib/Headers/gpuintrin.h (+76)
(added) clang/lib/Headers/nvptxintrin.h (+153)
(added) clang/test/Headers/gpuintrin.c (+508)
(added) clang/test/Headers/gpuintrin_lang.c (+61)

diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt
index ff392e7122a448..a0e7ae67b7219a 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -268,6 +268,12 @@ set(x86_files
   cpuid.h
   )
 
+set(gpu_files
+  gpuintrin.h
+  nvptxintrin.h
+  amdgpuintrin.h
+  )
+
 set(windows_only_files
   intrin0.h
   intrin.h
@@ -296,6 +302,7 @@ set(files
   ${systemz_files}
   ${ve_files}
   ${x86_files}
+  ${gpu_files}
   ${webassembly_files}
   ${windows_only_files}
   ${utility_files}
@@ -518,6 +525,7 @@ add_header_target("systemz-resource-headers" "${systemz_files};${zos_wrapper_fil
 add_header_target("ve-resource-headers" "${ve_files}")
 add_header_target("webassembly-resource-headers" "${webassembly_files}")
 add_header_target("x86-resource-headers" "${x86_files}")
+add_header_target("gpu-resource-headers" "${gpu_files}")
 
 # Other header groupings
 add_header_target("hlsl-resource-headers" ${hlsl_files})
@@ -704,6 +712,12 @@ install(
   EXCLUDE_FROM_ALL
   COMPONENT x86-resource-headers)
 
+install(
+  FILES ${gpu_files}
+  DESTINATION ${header_install_dir}
+  EXCLUDE_FROM_ALL
+  COMPONENT gpu-resource-headers)
+
 if(NOT CLANG_ENABLE_HLSL)
   set(EXCLUDE_HLSL EXCLUDE_FROM_ALL)
 endif()
diff --git a/clang/lib/Headers/amdgpuintrin.h b/clang/lib/Headers/amdgpuintrin.h
new file mode 100644
index 00000000000000..1fd7261cf4ca75
--- /dev/null
+++ b/clang/lib/Headers/amdgpuintrin.h
@@ -0,0 +1,153 @@
+//===-- amdgpuintrin.h - AMDPGU intrinsic functions -----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __AMDGPUINTRIN_H
+#define __AMDGPUINTRIN_H
+
+#ifndef __AMDGPU__
+#error "This file is intended for AMDGPU targets or offloading to AMDGPU"
+#endif
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#if defined(__HIP__) || defined(__CUDA__)
+#define _DEFAULT_ATTRS __attribute__((device))
+#elif !defined(_DEFAULT_ATTRS)
+#define _DEFAULT_ATTRS
+#endif
+
+#pragma omp begin declare target device_type(nohost)
+#pragma omp begin declare variant match(device = {arch(amdgcn)})
+
+// Type aliases to the address spaces used by the AMDGPU backend.
+#define _Private __attribute__((opencl_private))
+#define _Constant __attribute__((opencl_constant))
+#define _Local __attribute__((opencl_local))
+#define _Global __attribute__((opencl_global))
+
+// Attribute to declare a function as a kernel.
+#define _Kernel __attribute__((amdgpu_kernel, visibility("protected")))
+
+// Returns the number of workgroups in the 'x' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_x() {
+  return __builtin_amdgcn_grid_size_x() / __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workgroups in the 'y' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_y() {
+  return __builtin_amdgcn_grid_size_y() / __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workgroups in the 'z' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_z() {
+  return __builtin_amdgcn_grid_size_z() / __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the 'x' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_x() {
+  return __builtin_amdgcn_workgroup_id_x();
+}
+
+// Returns the 'y' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_y() {
+  return __builtin_amdgcn_workgroup_id_y();
+}
+
+// Returns the 'z' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_z() {
+  return __builtin_amdgcn_workgroup_id_z();
+}
+
+// Returns the number of workitems in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_x() {
+  return __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workitems in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_y() {
+  return __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workitems in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_z() {
+  return __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the 'x' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_x() {
+  return __builtin_amdgcn_workitem_id_x();
+}
+
+// Returns the 'y' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_y() {
+  return __builtin_amdgcn_workitem_id_y();
+}
+
+// Returns the 'z' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_z() {
+  return __builtin_amdgcn_workitem_id_z();
+}
+
+// Returns the size of an AMD wavefront, either 32 or 64 depending on hardware
+// and compilation options.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_lanes() {
+  return __builtin_amdgcn_wavefrontsize();
+}
+
+// Returns the id of the thread inside of an AMD wavefront executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t __gpu_lane_id() {
+  return __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u));
+}
+
+// Returns the bit-mask of active threads in the current wavefront.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint64_t __gpu_lane_mask() {
+  return __builtin_amdgcn_read_exec();
+}
+
+// Copies the value from the first active thread in the wavefront to the rest.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t
+__gpu_broadcast(uint64_t __lane_mask, uint32_t __x) {
+  return __builtin_amdgcn_readfirstlane(__x);
+}
+
+// Returns a bitmask of threads in the current lane for which \p x is true.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint64_t
+__gpu_ballot(uint64_t __lane_mask, bool __x) {
+  // The lane_mask & gives the nvptx semantics when lane_mask is a subset of
+  // the active threads
+  return __lane_mask & __builtin_amdgcn_ballot_w64(__x);
+}
+
+// Waits for all the threads in the block to converge and issues a fence.
+_DEFAULT_ATTRS [[clang::convergent]] static inline void __gpu_sync_threads() {
+  __builtin_amdgcn_s_barrier();
+  __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "workgroup");
+}
+
+// Wait for all threads in the wavefront to converge, this is a noop on AMDGPU.
+_DEFAULT_ATTRS [[clang::convergent]] static inline void
+__gpu_sync_lane(uint64_t __lane_mask) {
+  __builtin_amdgcn_wave_barrier();
+}
+
+// Shuffles the the lanes inside the wavefront according to the given index.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t
+__gpu_shuffle_idx(uint64_t __lane_mask, uint32_t __idx, uint32_t __x) {
+  return __builtin_amdgcn_ds_bpermute(__idx << 2, __x);
+}
+
+// Terminates execution of the associated wavefront.
+_DEFAULT_ATTRS [[noreturn]] static inline void __gpu_exit() {
+  __builtin_amdgcn_endpgm();
+}
+
+#pragma omp end declare variant
+#pragma omp end declare target
+
+#endif // __AMDGPUINTRIN_H
diff --git a/clang/lib/Headers/gpuintrin.h b/clang/lib/Headers/gpuintrin.h
new file mode 100644
index 00000000000000..2531ad8bba70c8
--- /dev/null
+++ b/clang/lib/Headers/gpuintrin.h
@@ -0,0 +1,76 @@
+//===-- gpuintrin.h - Generic GPU intrinsic functions ---------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __GPUINTRIN_H
+#define __GPUINTRIN_H
+
+#if defined(__NVPTX__)
+#include <nvptxintrin.h>
+#elif defined(__AMDGPU__)
+#include <amdgpuintrin.h>
+#endif
+
+// Returns the total number of blocks / workgroups.
+_DEFAULT_ATTRS static inline uint64_t __gpu_num_blocks() {
+  return __gpu_num_blocks_x() * __gpu_num_blocks_y() * __gpu_num_blocks_z();
+}
+
+// Returns the absolute id of the block / workgroup.
+_DEFAULT_ATTRS static inline uint64_t __gpu_block_id() {
+  return __gpu_block_id_x() +
+         (uint64_t)__gpu_num_blocks_x() * __gpu_block_id_y() +
+         (uint64_t)__gpu_num_blocks_x() * __gpu_num_blocks_y() *
+             __gpu_block_id_z();
+}
+
+// Returns the total number of threads in the block / workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads() {
+  return __gpu_num_threads_x() * __gpu_num_threads_y() * __gpu_num_threads_z();
+}
+
+// Returns the absolute id of the thread in the current block / workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id() {
+  return __gpu_thread_id_x() + __gpu_num_threads_x() * __gpu_thread_id_y() +
+         __gpu_num_threads_x() * __gpu_num_threads_y() * __gpu_thread_id_z();
+}
+
+// Get the first active thread inside the lane.
+_DEFAULT_ATTRS static inline uint64_t
+__gpu_first_lane_id(uint64_t __lane_mask) {
+  return __builtin_ffsll(__lane_mask) - 1;
+}
+
+// Conditional that is only true for a single thread in a lane.
+_DEFAULT_ATTRS static inline bool __gpu_is_first_lane(uint64_t __lane_mask) {
+  return __gpu_lane_id() == __gpu_first_lane_id(__lane_mask);
+}
+
+// Gets the sum of all lanes inside the warp or wavefront.
+_DEFAULT_ATTRS static inline uint32_t __gpu_lane_reduce(uint64_t __lane_mask,
+                                                        uint32_t x) {
+  for (uint32_t step = __gpu_num_lanes() / 2; step > 0; step /= 2) {
+    uint32_t index = step + __gpu_lane_id();
+    x += __gpu_shuffle_idx(__lane_mask, index, x);
+  }
+  return __gpu_broadcast(__lane_mask, x);
+}
+
+// Gets the accumulator scan of the threads in the warp or wavefront.
+_DEFAULT_ATTRS static inline uint32_t __gpu_lane_scan(uint64_t __lane_mask,
+                                                      uint32_t x) {
+  for (uint32_t step = 1; step < __gpu_num_lanes(); step *= 2) {
+    uint32_t index = __gpu_lane_id() - step;
+    uint32_t bitmask = __gpu_lane_id() >= step;
+    x += -bitmask & __gpu_shuffle_idx(__lane_mask, index, x);
+  }
+  return x;
+}
+
+#undef _DEFAULT_ATTRS
+
+#endif // __GPUINTRIN_H
diff --git a/clang/lib/Headers/nvptxintrin.h b/clang/lib/Headers/nvptxintrin.h
new file mode 100644
index 00000000000000..fc9769d4c578dd
--- /dev/null
+++ b/clang/lib/Headers/nvptxintrin.h
@@ -0,0 +1,153 @@
+//===-- nvptxintrin.h - NVPTX intrinsic functions -------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __NVPTXINTRIN_H
+#define __NVPTXINTRIN_H
+
+#ifndef __NVPTX__
+#error "This file is intended for NVPTX targets or offloading to NVPTX"
+#endif
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#if defined(__HIP__) || defined(__CUDA__)
+#define _DEFAULT_ATTRS __attribute__((device))
+#elif !defined(_DEFAULT_ATTRS)
+#define _DEFAULT_ATTRS
+#endif
+
+#pragma omp begin declare target device_type(nohost)
+#pragma omp begin declare variant match(device = {arch(nvptx64)})
+
+// Type aliases to the address spaces used by the NVPTX backend.
+#define _Private __attribute__((opencl_private))
+#define _Constant __attribute__((opencl_constant))
+#define _Local __attribute__((opencl_local))
+#define _Global __attribute__((opencl_global))
+
+// Attribute to declare a function as a kernel.
+#define _Kernel __attribute__((nvptx_kernel, visibility("protected")))
+
+// Returns the number of CUDA blocks in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_x() {
+  return __nvvm_read_ptx_sreg_nctaid_x();
+}
+
+// Returns the number of CUDA blocks in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_y() {
+  return __nvvm_read_ptx_sreg_nctaid_y();
+}
+
+// Returns the number of CUDA blocks in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_z() {
+  return __nvvm_read_ptx_sreg_nctaid_z();
+}
+
+// Returns the 'x' dimension of the current CUDA block's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_x() {
+  return __nvvm_read_ptx_sreg_ctaid_x();
+}
+
+// Returns the 'y' dimension of the current CUDA block's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_y() {
+  return __nvvm_read_ptx_sreg_ctaid_y();
+}
+
+// Returns the 'z' dimension of the current CUDA block's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_z() {
+  return __nvvm_read_ptx_sreg_ctaid_z();
+}
+
+// Returns the number of CUDA threads in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_x() {
+  return __nvvm_read_ptx_sreg_ntid_x();
+}
+
+// Returns the number of CUDA threads in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_y() {
+  return __nvvm_read_ptx_sreg_ntid_y();
+}
+
+// Returns the number of CUDA threads in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_z() {
+  return __nvvm_read_ptx_sreg_ntid_z();
+}
+
+// Returns the 'x' dimension id of the thread in the current CUDA block.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_x() {
+  return __nvvm_read_ptx_sreg_tid_x();
+}
+
+// Returns the 'y' dimension id of the thread in the current CUDA block.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_y() {
+  return __nvvm_read_ptx_sreg_tid_y();
+}
+
+// Returns the 'z' dimension id of the thread in the current CUDA block.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_z() {
+  return __nvvm_read_ptx_sreg_tid_z();
+}
+
+// Returns the size of a CUDA warp, always 32 on NVIDIA hardware.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_lanes() {
+  return __nvvm_read_ptx_sreg_warpsize();
+}
+
+// Returns the id of the thread inside of a CUDA warp executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t __gpu_lane_id() {
+  return __nvvm_read_ptx_sreg_laneid();
+}
+
+// Returns the bit-mask of active threads in the current warp.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint64_t __gpu_lane_mask() {
+  return __nvvm_activemask();
+}
+
+// Copies the value from the first active thread in the warp to the rest.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t
+__gpu_broadcast(uint64_t __lane_mask, uint32_t __x) {
+  uint32_t __mask = (uint32_t)__lane_mask;
+  uint32_t __id = __builtin_ffs(__mask) - 1;
+  return __nvvm_shfl_sync_idx_i32(__mask, __x, __id, __gpu_num_lanes() - 1);
+}
+
+// Returns a bitmask of threads in the current lane for which \p x is true.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint64_t
+__gpu_ballot(uint64_t __lane_mask, bool __x) {
+  uint32_t __mask = (uint32_t)__lane_mask;
+  return __nvvm_vote_ballot_sync(__mask, __x);
+}
+
+// Waits for all the threads in the block to converge and issues a fence.
+_DEFAULT_ATTRS [[clang::convergent]] static inline void __gpu_sync_threads() {
+  __syncthreads();
+}
+
+// Waits for all threads in the warp to reconverge for independent scheduling.
+_DEFAULT_ATTRS [[clang::convergent]] static inline void
+__gpu_sync_lane(uint64_t __lane_mask) {
+  __nvvm_bar_warp_sync((uint32_t)__lane_mask);
+}
+
+// Shuffles the the lanes inside the warp according to the given index.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t
+__gpu_shuffle_idx(uint64_t __lane_mask, uint32_t __idx, uint32_t __x) {
+  uint32_t __mask = (uint32_t)__lane_mask;
+  uint32_t __bitmask = (__mask >> __idx) & 1u;
+  return -__bitmask &
+         __nvvm_shfl_sync_idx_i32(__mask, __x, __idx, __gpu_num_lanes() - 1u);
+}
+
+// Terminates execution of the calling thread.
+_DEFAULT_ATTRS [[noreturn]] static inline void __gpu_exit() { __nvvm_exit(); }
+
+#pragma omp end declare variant
+#pragma omp end declare target
+
+#endif // __NVPTXINTRIN_H
diff --git a/clang/test/Headers/gpuintrin.c b/clang/test/Headers/gpuintrin.c
new file mode 100644
index 00000000000000..ff5a816699ebef
--- /dev/null
+++ b/clang/test/Headers/gpuintrin.c
@@ -0,0 +1,508 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -internal-isystem %S/Inputs/include \
+// RUN:   -internal-isystem %S/../../lib/Headers/ \
+// RUN:   -triple amdgcn-amd-amdhsa -emit-llvm %s -o - \
+// RUN: | FileCheck %s --check-prefix=AMDGPU
+//
+// RUN: %clang_cc1 -internal-isystem %S/Inputs/include \
+// RUN:   -internal-isystem %S/../../lib/Headers/ \
+// RUN:   -target-feature +ptx62 \
+// RUN:   -triple nvptx64-nvidia-cuda -emit-llvm %s -o - \
+// RUN: | FileCheck %s --check-prefix=NVPTX
+
+#define _DEFAULT_ATTRS __attribute__((always_inline))
+#include <gpuintrin.h>
+
+// AMDGPU-LABEL: define dso_local void @foo(
+// AMDGPU-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGPU-NEXT:  [[ENTRY:.*:]]
+// AMDGPU-NEXT:    [[RETVAL_I116:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I114:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I112:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I110:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I19_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I17_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I15_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I12_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I9_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I103:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I104:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I101:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I99:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I97:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I7_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I4_I87:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I88:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I89:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I84:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I81:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I78:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I26_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I24_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I22_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I18_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I14_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I70:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I71:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I68:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I66:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I64:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I8_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I4_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I58:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I59:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I54:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I50:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I47:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[__LANE_MASK_ADDR_I_I:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I42:%.*]] = alloca i1, align 1, addrspace(5)
+// AMDGPU-NEXT:    [[__LANE_MASK_ADDR_I43:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I38:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[__LANE_MASK_ADDR_I39:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I32:%.*]] = a...
[truncated]

llvmbot · 2024-10-03T21:04:21Z

@llvm/pr-subscribers-backend-amdgpu

Author: Joseph Huber (jhuber6)

Changes

Summary:
All GPU based languages provide some way to access things like the
thread ID or other resources. However, this is spread between many
different languages and it varies between targets. The goal here is to
provide a resource directory header that just provides these in an
easier to understand way, primarily so this can be used for C/C++ code.
The interface aims to be common, to faciliate easier porting, but target
specific stuff could be put in the individual headers.

Patch is 58.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110179.diff

6 Files Affected:

(modified) clang/lib/Headers/CMakeLists.txt (+14)
(added) clang/lib/Headers/amdgpuintrin.h (+153)
(added) clang/lib/Headers/gpuintrin.h (+76)
(added) clang/lib/Headers/nvptxintrin.h (+153)
(added) clang/test/Headers/gpuintrin.c (+508)
(added) clang/test/Headers/gpuintrin_lang.c (+61)

diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt
index ff392e7122a448..a0e7ae67b7219a 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -268,6 +268,12 @@ set(x86_files
   cpuid.h
   )
 
+set(gpu_files
+  gpuintrin.h
+  nvptxintrin.h
+  amdgpuintrin.h
+  )
+
 set(windows_only_files
   intrin0.h
   intrin.h
@@ -296,6 +302,7 @@ set(files
   ${systemz_files}
   ${ve_files}
   ${x86_files}
+  ${gpu_files}
   ${webassembly_files}
   ${windows_only_files}
   ${utility_files}
@@ -518,6 +525,7 @@ add_header_target("systemz-resource-headers" "${systemz_files};${zos_wrapper_fil
 add_header_target("ve-resource-headers" "${ve_files}")
 add_header_target("webassembly-resource-headers" "${webassembly_files}")
 add_header_target("x86-resource-headers" "${x86_files}")
+add_header_target("gpu-resource-headers" "${gpu_files}")
 
 # Other header groupings
 add_header_target("hlsl-resource-headers" ${hlsl_files})
@@ -704,6 +712,12 @@ install(
   EXCLUDE_FROM_ALL
   COMPONENT x86-resource-headers)
 
+install(
+  FILES ${gpu_files}
+  DESTINATION ${header_install_dir}
+  EXCLUDE_FROM_ALL
+  COMPONENT gpu-resource-headers)
+
 if(NOT CLANG_ENABLE_HLSL)
   set(EXCLUDE_HLSL EXCLUDE_FROM_ALL)
 endif()
diff --git a/clang/lib/Headers/amdgpuintrin.h b/clang/lib/Headers/amdgpuintrin.h
new file mode 100644
index 00000000000000..1fd7261cf4ca75
--- /dev/null
+++ b/clang/lib/Headers/amdgpuintrin.h
@@ -0,0 +1,153 @@
+//===-- amdgpuintrin.h - AMDPGU intrinsic functions -----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __AMDGPUINTRIN_H
+#define __AMDGPUINTRIN_H
+
+#ifndef __AMDGPU__
+#error "This file is intended for AMDGPU targets or offloading to AMDGPU"
+#endif
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#if defined(__HIP__) || defined(__CUDA__)
+#define _DEFAULT_ATTRS __attribute__((device))
+#elif !defined(_DEFAULT_ATTRS)
+#define _DEFAULT_ATTRS
+#endif
+
+#pragma omp begin declare target device_type(nohost)
+#pragma omp begin declare variant match(device = {arch(amdgcn)})
+
+// Type aliases to the address spaces used by the AMDGPU backend.
+#define _Private __attribute__((opencl_private))
+#define _Constant __attribute__((opencl_constant))
+#define _Local __attribute__((opencl_local))
+#define _Global __attribute__((opencl_global))
+
+// Attribute to declare a function as a kernel.
+#define _Kernel __attribute__((amdgpu_kernel, visibility("protected")))
+
+// Returns the number of workgroups in the 'x' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_x() {
+  return __builtin_amdgcn_grid_size_x() / __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workgroups in the 'y' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_y() {
+  return __builtin_amdgcn_grid_size_y() / __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workgroups in the 'z' dimension of the grid.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_z() {
+  return __builtin_amdgcn_grid_size_z() / __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the 'x' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_x() {
+  return __builtin_amdgcn_workgroup_id_x();
+}
+
+// Returns the 'y' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_y() {
+  return __builtin_amdgcn_workgroup_id_y();
+}
+
+// Returns the 'z' dimension of the current AMD workgroup's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_z() {
+  return __builtin_amdgcn_workgroup_id_z();
+}
+
+// Returns the number of workitems in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_x() {
+  return __builtin_amdgcn_workgroup_size_x();
+}
+
+// Returns the number of workitems in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_y() {
+  return __builtin_amdgcn_workgroup_size_y();
+}
+
+// Returns the number of workitems in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_z() {
+  return __builtin_amdgcn_workgroup_size_z();
+}
+
+// Returns the 'x' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_x() {
+  return __builtin_amdgcn_workitem_id_x();
+}
+
+// Returns the 'y' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_y() {
+  return __builtin_amdgcn_workitem_id_y();
+}
+
+// Returns the 'z' dimension id of the workitem in the current AMD workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_z() {
+  return __builtin_amdgcn_workitem_id_z();
+}
+
+// Returns the size of an AMD wavefront, either 32 or 64 depending on hardware
+// and compilation options.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_lanes() {
+  return __builtin_amdgcn_wavefrontsize();
+}
+
+// Returns the id of the thread inside of an AMD wavefront executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t __gpu_lane_id() {
+  return __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u));
+}
+
+// Returns the bit-mask of active threads in the current wavefront.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint64_t __gpu_lane_mask() {
+  return __builtin_amdgcn_read_exec();
+}
+
+// Copies the value from the first active thread in the wavefront to the rest.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t
+__gpu_broadcast(uint64_t __lane_mask, uint32_t __x) {
+  return __builtin_amdgcn_readfirstlane(__x);
+}
+
+// Returns a bitmask of threads in the current lane for which \p x is true.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint64_t
+__gpu_ballot(uint64_t __lane_mask, bool __x) {
+  // The lane_mask & gives the nvptx semantics when lane_mask is a subset of
+  // the active threads
+  return __lane_mask & __builtin_amdgcn_ballot_w64(__x);
+}
+
+// Waits for all the threads in the block to converge and issues a fence.
+_DEFAULT_ATTRS [[clang::convergent]] static inline void __gpu_sync_threads() {
+  __builtin_amdgcn_s_barrier();
+  __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "workgroup");
+}
+
+// Wait for all threads in the wavefront to converge, this is a noop on AMDGPU.
+_DEFAULT_ATTRS [[clang::convergent]] static inline void
+__gpu_sync_lane(uint64_t __lane_mask) {
+  __builtin_amdgcn_wave_barrier();
+}
+
+// Shuffles the the lanes inside the wavefront according to the given index.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t
+__gpu_shuffle_idx(uint64_t __lane_mask, uint32_t __idx, uint32_t __x) {
+  return __builtin_amdgcn_ds_bpermute(__idx << 2, __x);
+}
+
+// Terminates execution of the associated wavefront.
+_DEFAULT_ATTRS [[noreturn]] static inline void __gpu_exit() {
+  __builtin_amdgcn_endpgm();
+}
+
+#pragma omp end declare variant
+#pragma omp end declare target
+
+#endif // __AMDGPUINTRIN_H
diff --git a/clang/lib/Headers/gpuintrin.h b/clang/lib/Headers/gpuintrin.h
new file mode 100644
index 00000000000000..2531ad8bba70c8
--- /dev/null
+++ b/clang/lib/Headers/gpuintrin.h
@@ -0,0 +1,76 @@
+//===-- gpuintrin.h - Generic GPU intrinsic functions ---------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __GPUINTRIN_H
+#define __GPUINTRIN_H
+
+#if defined(__NVPTX__)
+#include <nvptxintrin.h>
+#elif defined(__AMDGPU__)
+#include <amdgpuintrin.h>
+#endif
+
+// Returns the total number of blocks / workgroups.
+_DEFAULT_ATTRS static inline uint64_t __gpu_num_blocks() {
+  return __gpu_num_blocks_x() * __gpu_num_blocks_y() * __gpu_num_blocks_z();
+}
+
+// Returns the absolute id of the block / workgroup.
+_DEFAULT_ATTRS static inline uint64_t __gpu_block_id() {
+  return __gpu_block_id_x() +
+         (uint64_t)__gpu_num_blocks_x() * __gpu_block_id_y() +
+         (uint64_t)__gpu_num_blocks_x() * __gpu_num_blocks_y() *
+             __gpu_block_id_z();
+}
+
+// Returns the total number of threads in the block / workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads() {
+  return __gpu_num_threads_x() * __gpu_num_threads_y() * __gpu_num_threads_z();
+}
+
+// Returns the absolute id of the thread in the current block / workgroup.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id() {
+  return __gpu_thread_id_x() + __gpu_num_threads_x() * __gpu_thread_id_y() +
+         __gpu_num_threads_x() * __gpu_num_threads_y() * __gpu_thread_id_z();
+}
+
+// Get the first active thread inside the lane.
+_DEFAULT_ATTRS static inline uint64_t
+__gpu_first_lane_id(uint64_t __lane_mask) {
+  return __builtin_ffsll(__lane_mask) - 1;
+}
+
+// Conditional that is only true for a single thread in a lane.
+_DEFAULT_ATTRS static inline bool __gpu_is_first_lane(uint64_t __lane_mask) {
+  return __gpu_lane_id() == __gpu_first_lane_id(__lane_mask);
+}
+
+// Gets the sum of all lanes inside the warp or wavefront.
+_DEFAULT_ATTRS static inline uint32_t __gpu_lane_reduce(uint64_t __lane_mask,
+                                                        uint32_t x) {
+  for (uint32_t step = __gpu_num_lanes() / 2; step > 0; step /= 2) {
+    uint32_t index = step + __gpu_lane_id();
+    x += __gpu_shuffle_idx(__lane_mask, index, x);
+  }
+  return __gpu_broadcast(__lane_mask, x);
+}
+
+// Gets the accumulator scan of the threads in the warp or wavefront.
+_DEFAULT_ATTRS static inline uint32_t __gpu_lane_scan(uint64_t __lane_mask,
+                                                      uint32_t x) {
+  for (uint32_t step = 1; step < __gpu_num_lanes(); step *= 2) {
+    uint32_t index = __gpu_lane_id() - step;
+    uint32_t bitmask = __gpu_lane_id() >= step;
+    x += -bitmask & __gpu_shuffle_idx(__lane_mask, index, x);
+  }
+  return x;
+}
+
+#undef _DEFAULT_ATTRS
+
+#endif // __GPUINTRIN_H
diff --git a/clang/lib/Headers/nvptxintrin.h b/clang/lib/Headers/nvptxintrin.h
new file mode 100644
index 00000000000000..fc9769d4c578dd
--- /dev/null
+++ b/clang/lib/Headers/nvptxintrin.h
@@ -0,0 +1,153 @@
+//===-- nvptxintrin.h - NVPTX intrinsic functions -------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __NVPTXINTRIN_H
+#define __NVPTXINTRIN_H
+
+#ifndef __NVPTX__
+#error "This file is intended for NVPTX targets or offloading to NVPTX"
+#endif
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#if defined(__HIP__) || defined(__CUDA__)
+#define _DEFAULT_ATTRS __attribute__((device))
+#elif !defined(_DEFAULT_ATTRS)
+#define _DEFAULT_ATTRS
+#endif
+
+#pragma omp begin declare target device_type(nohost)
+#pragma omp begin declare variant match(device = {arch(nvptx64)})
+
+// Type aliases to the address spaces used by the NVPTX backend.
+#define _Private __attribute__((opencl_private))
+#define _Constant __attribute__((opencl_constant))
+#define _Local __attribute__((opencl_local))
+#define _Global __attribute__((opencl_global))
+
+// Attribute to declare a function as a kernel.
+#define _Kernel __attribute__((nvptx_kernel, visibility("protected")))
+
+// Returns the number of CUDA blocks in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_x() {
+  return __nvvm_read_ptx_sreg_nctaid_x();
+}
+
+// Returns the number of CUDA blocks in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_y() {
+  return __nvvm_read_ptx_sreg_nctaid_y();
+}
+
+// Returns the number of CUDA blocks in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_blocks_z() {
+  return __nvvm_read_ptx_sreg_nctaid_z();
+}
+
+// Returns the 'x' dimension of the current CUDA block's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_x() {
+  return __nvvm_read_ptx_sreg_ctaid_x();
+}
+
+// Returns the 'y' dimension of the current CUDA block's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_y() {
+  return __nvvm_read_ptx_sreg_ctaid_y();
+}
+
+// Returns the 'z' dimension of the current CUDA block's id.
+_DEFAULT_ATTRS static inline uint32_t __gpu_block_id_z() {
+  return __nvvm_read_ptx_sreg_ctaid_z();
+}
+
+// Returns the number of CUDA threads in the 'x' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_x() {
+  return __nvvm_read_ptx_sreg_ntid_x();
+}
+
+// Returns the number of CUDA threads in the 'y' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_y() {
+  return __nvvm_read_ptx_sreg_ntid_y();
+}
+
+// Returns the number of CUDA threads in the 'z' dimension.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_threads_z() {
+  return __nvvm_read_ptx_sreg_ntid_z();
+}
+
+// Returns the 'x' dimension id of the thread in the current CUDA block.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_x() {
+  return __nvvm_read_ptx_sreg_tid_x();
+}
+
+// Returns the 'y' dimension id of the thread in the current CUDA block.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_y() {
+  return __nvvm_read_ptx_sreg_tid_y();
+}
+
+// Returns the 'z' dimension id of the thread in the current CUDA block.
+_DEFAULT_ATTRS static inline uint32_t __gpu_thread_id_z() {
+  return __nvvm_read_ptx_sreg_tid_z();
+}
+
+// Returns the size of a CUDA warp, always 32 on NVIDIA hardware.
+_DEFAULT_ATTRS static inline uint32_t __gpu_num_lanes() {
+  return __nvvm_read_ptx_sreg_warpsize();
+}
+
+// Returns the id of the thread inside of a CUDA warp executing together.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t __gpu_lane_id() {
+  return __nvvm_read_ptx_sreg_laneid();
+}
+
+// Returns the bit-mask of active threads in the current warp.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint64_t __gpu_lane_mask() {
+  return __nvvm_activemask();
+}
+
+// Copies the value from the first active thread in the warp to the rest.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t
+__gpu_broadcast(uint64_t __lane_mask, uint32_t __x) {
+  uint32_t __mask = (uint32_t)__lane_mask;
+  uint32_t __id = __builtin_ffs(__mask) - 1;
+  return __nvvm_shfl_sync_idx_i32(__mask, __x, __id, __gpu_num_lanes() - 1);
+}
+
+// Returns a bitmask of threads in the current lane for which \p x is true.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint64_t
+__gpu_ballot(uint64_t __lane_mask, bool __x) {
+  uint32_t __mask = (uint32_t)__lane_mask;
+  return __nvvm_vote_ballot_sync(__mask, __x);
+}
+
+// Waits for all the threads in the block to converge and issues a fence.
+_DEFAULT_ATTRS [[clang::convergent]] static inline void __gpu_sync_threads() {
+  __syncthreads();
+}
+
+// Waits for all threads in the warp to reconverge for independent scheduling.
+_DEFAULT_ATTRS [[clang::convergent]] static inline void
+__gpu_sync_lane(uint64_t __lane_mask) {
+  __nvvm_bar_warp_sync((uint32_t)__lane_mask);
+}
+
+// Shuffles the the lanes inside the warp according to the given index.
+_DEFAULT_ATTRS [[clang::convergent]] static inline uint32_t
+__gpu_shuffle_idx(uint64_t __lane_mask, uint32_t __idx, uint32_t __x) {
+  uint32_t __mask = (uint32_t)__lane_mask;
+  uint32_t __bitmask = (__mask >> __idx) & 1u;
+  return -__bitmask &
+         __nvvm_shfl_sync_idx_i32(__mask, __x, __idx, __gpu_num_lanes() - 1u);
+}
+
+// Terminates execution of the calling thread.
+_DEFAULT_ATTRS [[noreturn]] static inline void __gpu_exit() { __nvvm_exit(); }
+
+#pragma omp end declare variant
+#pragma omp end declare target
+
+#endif // __NVPTXINTRIN_H
diff --git a/clang/test/Headers/gpuintrin.c b/clang/test/Headers/gpuintrin.c
new file mode 100644
index 00000000000000..ff5a816699ebef
--- /dev/null
+++ b/clang/test/Headers/gpuintrin.c
@@ -0,0 +1,508 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -internal-isystem %S/Inputs/include \
+// RUN:   -internal-isystem %S/../../lib/Headers/ \
+// RUN:   -triple amdgcn-amd-amdhsa -emit-llvm %s -o - \
+// RUN: | FileCheck %s --check-prefix=AMDGPU
+//
+// RUN: %clang_cc1 -internal-isystem %S/Inputs/include \
+// RUN:   -internal-isystem %S/../../lib/Headers/ \
+// RUN:   -target-feature +ptx62 \
+// RUN:   -triple nvptx64-nvidia-cuda -emit-llvm %s -o - \
+// RUN: | FileCheck %s --check-prefix=NVPTX
+
+#define _DEFAULT_ATTRS __attribute__((always_inline))
+#include <gpuintrin.h>
+
+// AMDGPU-LABEL: define dso_local void @foo(
+// AMDGPU-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGPU-NEXT:  [[ENTRY:.*:]]
+// AMDGPU-NEXT:    [[RETVAL_I116:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I114:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I112:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I110:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I19_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I17_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I15_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I12_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I9_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I103:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I104:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I101:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I99:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I97:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I7_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I4_I87:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I88:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I89:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I84:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I81:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I78:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I26_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I24_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I22_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I18_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I14_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I70:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I71:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I68:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I66:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I64:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I8_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I4_I:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I58:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I59:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I54:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I50:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I47:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I_I:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[__LANE_MASK_ADDR_I_I:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I42:%.*]] = alloca i1, align 1, addrspace(5)
+// AMDGPU-NEXT:    [[__LANE_MASK_ADDR_I43:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I38:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[__LANE_MASK_ADDR_I39:%.*]] = alloca i64, align 8, addrspace(5)
+// AMDGPU-NEXT:    [[RETVAL_I32:%.*]] = a...
[truncated]

jdoerfert

LG

clang/lib/Headers/gpuintrin.h

github-actions · 2024-11-11T14:08:32Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm-ci · 2024-11-11T18:16:05Z

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-sie-ubuntu-fast running on sie-linux-worker while building clang at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/144/builds/11351

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/clang -cc1 -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/lib/clang/20/include -nostdsysteminc -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/Inputs/include    -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/gpuintrin_lang.c -o -  | /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/clang -cc1 -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/lib/clang/20/include -nostdsysteminc -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/Inputs/include -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/gpuintrin_lang.c -o -
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
�[1m/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/gpuintrin_lang.c:39:15: �[0m�[0;1;31merror: �[0m�[1mCUDA-NEXT: expected string not found in input
�[0m// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
�[0;1;32m              ^
�[0m�[1m<stdin>:8:7: �[0m�[0;1;30mnote: �[0m�[1mscanning from here
�[0mentry:
�[0;1;32m      ^
�[0m�[1m<stdin>:9:2: �[0m�[0;1;30mnote: �[0m�[1mpossible intended match here
�[0m %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
�[0;1;32m ^
�[0m
Input file: <stdin>
Check file: /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
�[1m�[0m�[0;1;30m            1: �[0m�[1m�[0;1;46m; ModuleID = '/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/gpuintrin_lang.c' �[0m
�[0;1;30m            2: �[0m�[1m�[0;1;46msource_filename = "/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Headers/gpuintrin_lang.c" �[0m
�[0;1;30m            3: �[0m�[1m�[0;1;46mtarget datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" �[0m
�[0;1;30m            4: �[0m�[1m�[0;1;46mtarget triple = "nvptx64" �[0m
�[0;1;30m            5: �[0m�[1m�[0;1;46m �[0m
�[0;1;30m            6: �[0m�[1m�[0;1;46m; Function Attrs: convergent noinline nounwind optnone �[0m
�[0;1;30m            7: �[0m�[1m�[0;1;46m�[0mdefine dso_local i32 @foo() #0 {�[0;1;46m �[0m
�[0;1;32mlabel:36'0     ^~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;32mlabel:36'1     ^~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;32msame:37'0                                ^~~~~~
�[0m�[0;1;32msame:37'1                                   ^    captured var "ATTR0"
�[0m�[0;1;30m            8: �[0m�[1m�[0;1;46m�[0mentry:�[0;1;46m �[0m
�[0;1;32mnext:38'0      ^~~~~~
�[0m�[0;1;32mnext:38'1      ^~~~~~  captured var "ENTRY"
�[0m�[0;1;31mnext:39'0            X error: no match found
�[0m�[0;1;30m            9: �[0m�[1m�[0;1;46m %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() �[0m
�[0;1;31mnext:39'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;35mnext:39'1       ?                                               possible intended match
�[0m�[0;1;30m           10: �[0m�[1m�[0;1;46m ret i32 %0 �[0m
�[0;1;31mnext:39'0      ~~~~~~~~~~~~
�[0m�[0;1;30m           11: �[0m�[1m�[0;1;46m} �[0m
�[0;1;31mnext:39'0      ~~
�[0m�[0;1;30m           12: �[0m�[1m�[0;1;46m �[0m
�[0;1;31mnext:39'0      ~
�[0m�[0;1;30m           13: �[0m�[1m�[0;1;46m; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) �[0m
...

llvm-ci · 2024-11-11T18:17:14Z

LLVM Buildbot has detected a new failure on builder clang-ve-ninja running on hpce-ve-main while building clang at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/12/builds/9354

Here is the relevant piece of the build log for the reference

Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/ve-linux.py ...' (failure)
...
[295/301] Linking CXX executable tools/clang/unittests/Driver/ClangDriverTests
[296/301] Linking CXX executable tools/clang/unittests/CodeGen/ClangCodeGenTests
[297/301] Linking CXX executable tools/clang/unittests/Tooling/ToolingTests
[298/301] Linking CXX executable tools/clang/unittests/Frontend/FrontendTests
[299/301] Linking CXX executable tools/clang/unittests/Interpreter/ExceptionTests/ClangReplInterpreterExceptionTests
[300/301] Linking CXX executable tools/clang/unittests/Interpreter/ClangReplInterpreterTests
[300/301] Running the Clang regression tests
-- Testing: 21288 tests, 48 workers --
llvm-lit: /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using clang: /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/clang
Testing:  0.. 10.. 20.. 30.. 40.. 50.
FAIL: Clang :: Headers/gpuintrin_lang.c (12017 of 21288)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/clang -cc1 -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/lib/clang/20/include -nostdsysteminc -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/Inputs/include    -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c -o -  | /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/FileCheck /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/clang -cc1 -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/lib/clang/20/include -nostdsysteminc -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/Inputs/include -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c -o -
+ /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/FileCheck /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
/scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c:39:15: error: CUDA-NEXT: expected string not found in input
// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
              ^
<stdin>:8:7: note: scanning from here
entry:
      ^
<stdin>:9:2: note: possible intended match here
 %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 ^

Input file: <stdin>
Check file: /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: ; ModuleID = '/scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c' 
           2: source_filename = "/scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c" 
           3: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" 
           4: target triple = "nvptx64" 
           5:  
           6: ; Function Attrs: convergent noinline nounwind optnone 
           7: define dso_local i32 @foo() #0 { 
           8: entry: 
next:39'0           X error: no match found
           9:  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:39'1      ?                                               possible intended match
          10:  ret i32 %0 
Step 8 (check-llvm) failure: check-llvm (failure)
...
[295/301] Linking CXX executable tools/clang/unittests/Driver/ClangDriverTests
[296/301] Linking CXX executable tools/clang/unittests/CodeGen/ClangCodeGenTests
[297/301] Linking CXX executable tools/clang/unittests/Tooling/ToolingTests
[298/301] Linking CXX executable tools/clang/unittests/Frontend/FrontendTests
[299/301] Linking CXX executable tools/clang/unittests/Interpreter/ExceptionTests/ClangReplInterpreterExceptionTests
[300/301] Linking CXX executable tools/clang/unittests/Interpreter/ClangReplInterpreterTests
[300/301] Running the Clang regression tests
-- Testing: 21288 tests, 48 workers --
llvm-lit: /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using clang: /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/clang
Testing:  0.. 10.. 20.. 30.. 40.. 50.
FAIL: Clang :: Headers/gpuintrin_lang.c (12017 of 21288)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/clang -cc1 -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/lib/clang/20/include -nostdsysteminc -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/Inputs/include    -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c -o -  | /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/FileCheck /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/clang -cc1 -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/lib/clang/20/include -nostdsysteminc -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/Inputs/include -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c -o -
+ /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/FileCheck /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
/scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c:39:15: error: CUDA-NEXT: expected string not found in input
// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
              ^
<stdin>:8:7: note: scanning from here
entry:
      ^
<stdin>:9:2: note: possible intended match here
 %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 ^

Input file: <stdin>
Check file: /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: ; ModuleID = '/scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c' 
           2: source_filename = "/scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Headers/gpuintrin_lang.c" 
           3: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" 
           4: target triple = "nvptx64" 
           5:  
           6: ; Function Attrs: convergent noinline nounwind optnone 
           7: define dso_local i32 @foo() #0 { 
           8: entry: 
next:39'0           X error: no match found
           9:  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:39'1      ?                                               possible intended match
          10:  ret i32 %0

llvm-ci · 2024-11-11T18:25:33Z

LLVM Buildbot has detected a new failure on builder openmp-offload-sles-build-only running on rocm-worker-hw-04-sles while building clang at step 6 "Add check check-clang".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/140/builds/10631

Here is the relevant piece of the build log for the reference

Step 6 (Add check check-clang) failure: test (failure)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/clang -cc1 -internal-isystem /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/lib/clang/20/include -nostdsysteminc -internal-isystem /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/Inputs/include    -internal-isystem /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/gpuintrin_lang.c -o -  | /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/clang -cc1 -internal-isystem /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/lib/clang/20/include -nostdsysteminc -internal-isystem /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/Inputs/include -internal-isystem /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/gpuintrin_lang.c -o -
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/gpuintrin_lang.c:39:15: error: CUDA-NEXT: expected string not found in input
// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
              ^
<stdin>:8:7: note: scanning from here
entry:
      ^
<stdin>:9:2: note: possible intended match here
 %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 ^

Input file: <stdin>
Check file: /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: ; ModuleID = '/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/gpuintrin_lang.c' 
           2: source_filename = "/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/Headers/gpuintrin_lang.c" 
           3: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" 
           4: target triple = "nvptx64" 
           5:  
           6: ; Function Attrs: convergent noinline nounwind optnone 
           7: define dso_local i32 @foo() #0 { 
           8: entry: 
next:39'0           X error: no match found
           9:  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:39'1      ?                                               possible intended match
          10:  ret i32 %0 
next:39'0     ~~~~~~~~~~~~
          11: } 
next:39'0     ~~
          12:  
next:39'0     ~
          13: ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          14: declare noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .
...

llvm-ci · 2024-11-11T18:26:24Z

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-5 while building clang at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/9263

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/clang -cc1 -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/build/lib/clang/20/include -nostdsysteminc -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/Inputs/include    -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c -o -  | /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/clang -cc1 -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/build/lib/clang/20/include -nostdsysteminc -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/Inputs/include -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c -o -
+ /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
�[1m/Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c:39:15: �[0m�[0;1;31merror: �[0m�[1mCUDA-NEXT: expected string not found in input
�[0m// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
�[0;1;32m              ^
�[0m�[1m<stdin>:8:7: �[0m�[0;1;30mnote: �[0m�[1mscanning from here
�[0mentry:
�[0;1;32m      ^
�[0m�[1m<stdin>:9:2: �[0m�[0;1;30mnote: �[0m�[1mpossible intended match here
�[0m %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
�[0;1;32m ^
�[0m
Input file: <stdin>
Check file: /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
�[1m�[0m�[0;1;30m            1: �[0m�[1m�[0;1;46m; ModuleID = '/Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c' �[0m
�[0;1;30m            2: �[0m�[1m�[0;1;46msource_filename = "/Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c" �[0m
�[0;1;30m            3: �[0m�[1m�[0;1;46mtarget datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" �[0m
�[0;1;30m            4: �[0m�[1m�[0;1;46mtarget triple = "nvptx64" �[0m
�[0;1;30m            5: �[0m�[1m�[0;1;46m �[0m
�[0;1;30m            6: �[0m�[1m�[0;1;46m; Function Attrs: convergent noinline nounwind optnone �[0m
�[0;1;30m            7: �[0m�[1m�[0;1;46m�[0mdefine dso_local i32 @foo() #0 {�[0;1;46m �[0m
�[0;1;32mlabel:36'0     ^~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;32mlabel:36'1     ^~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;32msame:37'0                                ^~~~~~
�[0m�[0;1;32msame:37'1                                   ^    captured var "ATTR0"
�[0m�[0;1;30m            8: �[0m�[1m�[0;1;46m�[0mentry:�[0;1;46m �[0m
�[0;1;32mnext:38'0      ^~~~~~
�[0m�[0;1;32mnext:38'1      ^~~~~~  captured var "ENTRY"
�[0m�[0;1;31mnext:39'0            X error: no match found
�[0m�[0;1;30m            9: �[0m�[1m�[0;1;46m %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() �[0m
�[0;1;31mnext:39'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;35mnext:39'1       ?                                               possible intended match
�[0m�[0;1;30m           10: �[0m�[1m�[0;1;46m ret i32 %0 �[0m
�[0;1;31mnext:39'0      ~~~~~~~~~~~~
�[0m�[0;1;30m           11: �[0m�[1m�[0;1;46m} �[0m
�[0;1;31mnext:39'0      ~~
�[0m�[0;1;30m           12: �[0m�[1m�[0;1;46m �[0m
�[0;1;31mnext:39'0      ~
�[0m�[0;1;30m           13: �[0m�[1m�[0;1;46m; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) �[0m
...

llvm-ci · 2024-11-11T18:29:11Z

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-sie-win running on sie-win-worker while building clang at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/46/builds/7704

Here is the relevant piece of the build log for the reference

Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
z:\b\llvm-clang-x86_64-sie-win\build\bin\clang.exe -cc1 -internal-isystem Z:\b\llvm-clang-x86_64-sie-win\build\lib\clang\20\include -nostdsysteminc -internal-isystem Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers/Inputs/include    -internal-isystem Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers/../../lib/Headers/cuda_wrappers    -internal-isystem Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers\gpuintrin_lang.c -o -  | z:\b\llvm-clang-x86_64-sie-win\build\bin\filecheck.exe Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers\gpuintrin_lang.c --check-prefix=CUDA
# executed command: 'z:\b\llvm-clang-x86_64-sie-win\build\bin\clang.exe' -cc1 -internal-isystem 'Z:\b\llvm-clang-x86_64-sie-win\build\lib\clang\20\include' -nostdsysteminc -internal-isystem 'Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers/Inputs/include' -internal-isystem 'Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers/../../lib/Headers/cuda_wrappers' -internal-isystem 'Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers/../../lib/Headers/' -fcuda-is-device -triple nvptx64 -emit-llvm 'Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers\gpuintrin_lang.c' -o -
# executed command: 'z:\b\llvm-clang-x86_64-sie-win\build\bin\filecheck.exe' 'Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers\gpuintrin_lang.c' --check-prefix=CUDA
# .---command stderr------------
# | �[1mZ:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers\gpuintrin_lang.c:39:15: �[0m�[0;1;31merror: �[0m�[1mCUDA-NEXT: expected string not found in input
�[0m# | �[1m�[0m// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
# | �[0;1;32m              ^
�[0m# | �[0;1;32m�[0m�[1m<stdin>:8:7: �[0m�[0;1;30mnote: �[0m�[1mscanning from here
�[0m# | �[1m�[0mentry:
# | �[0;1;32m      ^
�[0m# | �[0;1;32m�[0m�[1m<stdin>:9:2: �[0m�[0;1;30mnote: �[0m�[1mpossible intended match here
�[0m# | �[1m�[0m %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
# | �[0;1;32m ^
�[0m# | �[0;1;32m�[0m
# | Input file: <stdin>
# | Check file: Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers\gpuintrin_lang.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# | �[1m�[0m�[0;1;30m            1: �[0m�[1m�[0;1;46m; ModuleID = 'Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\Headers\gpuintrin_lang.c' �[0m
# | �[0;1;30m            2: �[0m�[1m�[0;1;46msource_filename = "Z:\\b\\llvm-clang-x86_64-sie-win\\llvm-project\\clang\\test\\Headers\\gpuintrin_lang.c" �[0m
# | �[0;1;30m            3: �[0m�[1m�[0;1;46mtarget datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" �[0m
# | �[0;1;30m            4: �[0m�[1m�[0;1;46mtarget triple = "nvptx64" �[0m
# | �[0;1;30m            5: �[0m�[1m�[0;1;46m �[0m
# | �[0;1;30m            6: �[0m�[1m�[0;1;46m; Function Attrs: convergent noinline nounwind optnone �[0m
# | �[0;1;30m            7: �[0m�[1m�[0;1;46m�[0mdefine dso_local i32 @foo() #0 {�[0;1;46m �[0m
# | �[0;1;32mlabel:36'0     ^~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m# | �[0;1;32m�[0m�[0;1;32mlabel:36'1     ^~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m# | �[0;1;32m�[0m�[0;1;32msame:37'0                                ^~~~~~
�[0m# | �[0;1;32m�[0m�[0;1;32msame:37'1                                   ^    captured var "ATTR0"
�[0m# | �[0;1;32m�[0m�[0;1;30m            8: �[0m�[1m�[0;1;46m�[0mentry:�[0;1;46m �[0m
# | �[0;1;32mnext:38'0      ^~~~~~
�[0m# | �[0;1;32m�[0m�[0;1;32mnext:38'1      ^~~~~~  captured var "ENTRY"
�[0m# | �[0;1;32m�[0m�[0;1;31mnext:39'0            X error: no match found
�[0m# | �[0;1;31m�[0m�[0;1;30m            9: �[0m�[1m�[0;1;46m %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() �[0m
# | �[0;1;31mnext:39'0      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m# | �[0;1;31m�[0m�[0;1;35mnext:39'1       ?                                               possible intended match
�[0m# | �[0;1;35m�[0m�[0;1;30m           10: �[0m�[1m�[0;1;46m ret i32 %0 �[0m
# | �[0;1;31mnext:39'0      ~~~~~~~~~~~~
�[0m# | �[0;1;31m�[0m�[0;1;30m           11: �[0m�[1m�[0;1;46m} �[0m
# | �[0;1;31mnext:39'0      ~~
�[0m# | �[0;1;31m�[0m�[0;1;30m           12: �[0m�[1m�[0;1;46m �[0m
...

jhuber6 · 2024-11-11T18:29:43Z

Seems there's something slightly different from the autogenerated IR for the language test. I'll see if I can fix it.

llvm-ci · 2024-11-11T18:33:28Z

LLVM Buildbot has detected a new failure on builder clang-cmake-x86_64-avx512-linux running on avx512-intel64 while building clang at step 7 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/133/builds/6621

Here is the relevant piece of the build log for the reference

Step 7 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/bin/clang -cc1 -internal-isystem /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/lib/clang/20/include -nostdsysteminc -internal-isystem /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/Inputs/include    -internal-isystem /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/gpuintrin_lang.c -o -  | /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/bin/FileCheck /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/bin/clang -cc1 -internal-isystem /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/lib/clang/20/include -nostdsysteminc -internal-isystem /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/Inputs/include -internal-isystem /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/gpuintrin_lang.c -o -
+ /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/bin/FileCheck /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
/localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/gpuintrin_lang.c:39:15: error: CUDA-NEXT: expected string not found in input
// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
              ^
<stdin>:8:7: note: scanning from here
entry:
      ^
<stdin>:9:2: note: possible intended match here
 %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 ^

Input file: <stdin>
Check file: /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: ; ModuleID = '/localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/gpuintrin_lang.c' 
           2: source_filename = "/localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/Headers/gpuintrin_lang.c" 
           3: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" 
           4: target triple = "nvptx64" 
           5:  
           6: ; Function Attrs: convergent noinline nounwind optnone 
           7: define dso_local i32 @foo() #0 { 
           8: entry: 
next:39'0           X error: no match found
           9:  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:39'1      ?                                               possible intended match
          10:  ret i32 %0 
next:39'0     ~~~~~~~~~~~~
          11: } 
next:39'0     ~~
          12:  
next:39'0     ~
          13: ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          14: declare noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .
...

llvm-ci · 2024-11-11T18:33:40Z

LLVM Buildbot has detected a new failure on builder clang-armv8-quick running on linaro-clang-armv8-quick while building clang at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/154/builds/7298

Here is the relevant piece of the build log for the reference

Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/bin/clang -cc1 -internal-isystem /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/lib/clang/20/include -nostdsysteminc -internal-isystem /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/Inputs/include    -internal-isystem /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/gpuintrin_lang.c -o -  | /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/bin/clang -cc1 -internal-isystem /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/lib/clang/20/include -nostdsysteminc -internal-isystem /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/Inputs/include -internal-isystem /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/gpuintrin_lang.c -o -
+ /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
/home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/gpuintrin_lang.c:39:15: error: CUDA-NEXT: expected string not found in input
// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
              ^
<stdin>:8:7: note: scanning from here
entry:
      ^
<stdin>:9:2: note: possible intended match here
 %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 ^

Input file: <stdin>
Check file: /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: ; ModuleID = '/home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/gpuintrin_lang.c' 
           2: source_filename = "/home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/Headers/gpuintrin_lang.c" 
           3: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" 
           4: target triple = "nvptx64" 
           5:  
           6: ; Function Attrs: convergent noinline nounwind optnone 
           7: define dso_local i32 @foo() #0 { 
           8: entry: 
next:39'0           X error: no match found
           9:  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:39'1      ?                                               possible intended match
          10:  ret i32 %0 
next:39'0     ~~~~~~~~~~~~
          11: } 
next:39'0     ~~
          12:  
next:39'0     ~
          13: ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          14: declare noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .
...

llvm-ci · 2024-11-11T18:33:41Z

LLVM Buildbot has detected a new failure on builder clang-aarch64-quick running on linaro-clang-aarch64-quick while building clang at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/65/builds/7589

Here is the relevant piece of the build log for the reference

Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/bin/clang -cc1 -internal-isystem /home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/lib/clang/20/include -nostdsysteminc -internal-isystem /home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/Inputs/include    -internal-isystem /home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/gpuintrin_lang.c -o -  | /home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/bin/FileCheck /home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/bin/clang -cc1 -internal-isystem /home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/lib/clang/20/include -nostdsysteminc -internal-isystem /home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/Inputs/include -internal-isystem /home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/gpuintrin_lang.c -o -
+ /home/tcwg-buildbot/worker/clang-aarch64-quick/stage1/bin/FileCheck /home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
/home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/gpuintrin_lang.c:39:15: error: CUDA-NEXT: expected string not found in input
// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
              ^
<stdin>:8:7: note: scanning from here
entry:
      ^
<stdin>:9:2: note: possible intended match here
 %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 ^

Input file: <stdin>
Check file: /home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: ; ModuleID = '/home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/gpuintrin_lang.c' 
           2: source_filename = "/home/tcwg-buildbot/worker/clang-aarch64-quick/llvm/clang/test/Headers/gpuintrin_lang.c" 
           3: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" 
           4: target triple = "nvptx64" 
           5:  
           6: ; Function Attrs: convergent noinline nounwind optnone 
           7: define dso_local i32 @foo() #0 { 
           8: entry: 
next:39'0           X error: no match found
           9:  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:39'1      ?                                               possible intended match
          10:  ret i32 %0 
next:39'0     ~~~~~~~~~~~~
          11: } 
next:39'0     ~~
          12:  
next:39'0     ~
          13: ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          14: declare noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .
...

llvm-ci · 2024-11-11T19:00:41Z

LLVM Buildbot has detected a new failure on builder arc-builder running on arc-worker while building clang at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/3/builds/7462

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /buildbot/worker/arc-folder/build/bin/clang -cc1 -internal-isystem /buildbot/worker/arc-folder/build/lib/clang/20/include -nostdsysteminc -internal-isystem /buildbot/worker/arc-folder/llvm-project/clang/test/Headers/Inputs/include    -internal-isystem /buildbot/worker/arc-folder/llvm-project/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /buildbot/worker/arc-folder/llvm-project/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /buildbot/worker/arc-folder/llvm-project/clang/test/Headers/gpuintrin_lang.c -o -  | /buildbot/worker/arc-folder/build/bin/FileCheck /buildbot/worker/arc-folder/llvm-project/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /buildbot/worker/arc-folder/build/bin/FileCheck /buildbot/worker/arc-folder/llvm-project/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /buildbot/worker/arc-folder/build/bin/clang -cc1 -internal-isystem /buildbot/worker/arc-folder/build/lib/clang/20/include -nostdsysteminc -internal-isystem /buildbot/worker/arc-folder/llvm-project/clang/test/Headers/Inputs/include -internal-isystem /buildbot/worker/arc-folder/llvm-project/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /buildbot/worker/arc-folder/llvm-project/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /buildbot/worker/arc-folder/llvm-project/clang/test/Headers/gpuintrin_lang.c -o -
/buildbot/worker/arc-folder/llvm-project/clang/test/Headers/gpuintrin_lang.c:39:15: error: CUDA-NEXT: expected string not found in input
// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
              ^
<stdin>:8:7: note: scanning from here
entry:
      ^
<stdin>:9:2: note: possible intended match here
 %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 ^

Input file: <stdin>
Check file: /buildbot/worker/arc-folder/llvm-project/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: ; ModuleID = '/buildbot/worker/arc-folder/llvm-project/clang/test/Headers/gpuintrin_lang.c' 
           2: source_filename = "/buildbot/worker/arc-folder/llvm-project/clang/test/Headers/gpuintrin_lang.c" 
           3: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" 
           4: target triple = "nvptx64" 
           5:  
           6: ; Function Attrs: convergent noinline nounwind optnone 
           7: define dso_local i32 @foo() #0 { 
           8: entry: 
next:39'0           X error: no match found
           9:  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:39'1      ?                                               possible intended match
          10:  ret i32 %0 
next:39'0     ~~~~~~~~~~~~
          11: } 
next:39'0     ~~
          12:  
next:39'0     ~
          13: ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          14: declare noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .
...

llvm-ci · 2024-11-11T19:00:51Z

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-darwin running on doug-worker-3 while building clang at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/23/builds/4705

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/clang -cc1 -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/lib/clang/20/include -nostdsysteminc -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/Inputs/include    -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c -o -  | /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/FileCheck /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/clang -cc1 -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/lib/clang/20/include -nostdsysteminc -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/Inputs/include -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c -o -
+ /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/FileCheck /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
/Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c:39:15: error: CUDA-NEXT: expected string not found in input
// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
              ^
<stdin>:8:7: note: scanning from here
entry:
      ^
<stdin>:9:2: note: possible intended match here
 %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 ^

Input file: <stdin>
Check file: /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: ; ModuleID = '/Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c' 
           2: source_filename = "/Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/gpuintrin_lang.c" 
           3: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" 
           4: target triple = "nvptx64" 
           5:  
           6: ; Function Attrs: convergent noinline nounwind optnone 
           7: define dso_local i32 @foo() #0 { 
           8: entry: 
next:39'0           X error: no match found
           9:  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:39'1      ?                                               possible intended match
          10:  ret i32 %0 
next:39'0     ~~~~~~~~~~~~
          11: } 
next:39'0     ~~
          12:  
next:39'0     ~
          13: ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          14: declare noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .
...

llvm-ci · 2024-11-11T19:06:39Z

LLVM Buildbot has detected a new failure on builder clang-m68k-linux-cross running on suse-gary-m68k-cross while building clang at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/1868

Here is the relevant piece of the build log for the reference

Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/clang -cc1 -internal-isystem /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/lib/clang/20/include -nostdsysteminc -internal-isystem /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/Inputs/include    -internal-isystem /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/gpuintrin_lang.c -o -  | /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/FileCheck /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/clang -cc1 -internal-isystem /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/lib/clang/20/include -nostdsysteminc -internal-isystem /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/Inputs/include -internal-isystem /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/gpuintrin_lang.c -o -
+ /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/FileCheck /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/gpuintrin_lang.c:39:15: error: CUDA-NEXT: expected string not found in input
// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
              ^
<stdin>:8:7: note: scanning from here
entry:
      ^
<stdin>:9:2: note: possible intended match here
 %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 ^

Input file: <stdin>
Check file: /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: ; ModuleID = '/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/gpuintrin_lang.c' 
           2: source_filename = "/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/test/Headers/gpuintrin_lang.c" 
           3: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" 
           4: target triple = "nvptx64" 
           5:  
           6: ; Function Attrs: convergent noinline nounwind optnone 
           7: define dso_local i32 @foo() #0 { 
           8: entry: 
next:39'0           X error: no match found
           9:  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:39'1      ?                                               possible intended match
          10:  ret i32 %0 
next:39'0     ~~~~~~~~~~~~
          11: } 
next:39'0     ~~
          12:  
next:39'0     ~
          13: ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          14: declare noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .
...

llvm-ci · 2024-11-11T21:00:36Z

LLVM Buildbot has detected a new failure on builder clang-hexagon-elf running on hexagon-build-03 while building clang at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/40/builds/2750

Here is the relevant piece of the build log for the reference

Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/stage1/bin/clang -cc1 -internal-isystem /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/stage1/lib/clang/20/include -nostdsysteminc -internal-isystem /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/Inputs/include    -internal-isystem /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/gpuintrin_lang.c -o -  | /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/stage1/bin/FileCheck /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/stage1/bin/clang -cc1 -internal-isystem /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/stage1/lib/clang/20/include -nostdsysteminc -internal-isystem /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/Inputs/include -internal-isystem /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/gpuintrin_lang.c -o -
+ /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/stage1/bin/FileCheck /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
/local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/gpuintrin_lang.c:39:15: error: CUDA-NEXT: expected string not found in input
// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
              ^
<stdin>:8:7: note: scanning from here
entry:
      ^
<stdin>:9:2: note: possible intended match here
 %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 ^

Input file: <stdin>
Check file: /local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: ; ModuleID = '/local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/gpuintrin_lang.c' 
           2: source_filename = "/local/mnt/workspace/bots/hexagon-build-03/clang-hexagon-elf/llvm/clang/test/Headers/gpuintrin_lang.c" 
           3: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" 
           4: target triple = "nvptx64" 
           5:  
           6: ; Function Attrs: convergent noinline nounwind optnone 
           7: define dso_local i32 @foo() #0 { 
           8: entry: 
next:39'0           X error: no match found
           9:  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:39'1      ?                                               possible intended match
          10:  ret i32 %0 
next:39'0     ~~~~~~~~~~~~
          11: } 
next:39'0     ~~
          12:  
next:39'0     ~
          13: ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          14: declare noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .
...

llvm-ci · 2024-11-11T21:03:35Z

LLVM Buildbot has detected a new failure on builder clang-solaris11-sparcv9 running on solaris11-sparcv9 while building clang at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/13/builds/3473

Here is the relevant piece of the build log for the reference

Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'Clang :: Headers/gpuintrin_lang.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/stage1/bin/clang -cc1 -internal-isystem /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/stage1/lib/clang/20/include -nostdsysteminc -internal-isystem /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/Inputs/include    -internal-isystem /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/../../lib/Headers/cuda_wrappers    -internal-isystem /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/../../lib/Headers/    -fcuda-is-device -triple nvptx64 -emit-llvm /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/gpuintrin_lang.c -o -  | /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/stage1/bin/FileCheck /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
+ /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/stage1/bin/clang -cc1 -internal-isystem /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/stage1/lib/clang/20/include -nostdsysteminc -internal-isystem /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/Inputs/include -internal-isystem /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/../../lib/Headers/cuda_wrappers -internal-isystem /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/../../lib/Headers/ -fcuda-is-device -triple nvptx64 -emit-llvm /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/gpuintrin_lang.c -o -
+ /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/stage1/bin/FileCheck /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/gpuintrin_lang.c --check-prefix=CUDA
/opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/gpuintrin_lang.c:39:15: error: CUDA-NEXT: expected string not found in input
// CUDA-NEXT: [[TMP0:%.*]] = call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
              ^
<stdin>:8:7: note: scanning from here
entry:
      ^
<stdin>:9:2: note: possible intended match here
 %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
 ^

Input file: <stdin>
Check file: /opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/gpuintrin_lang.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: ; ModuleID = '/opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/gpuintrin_lang.c' 
           2: source_filename = "/opt/llvm-buildbot/home/solaris11-sparcv9/clang-solaris11-sparcv9/llvm/clang/test/Headers/gpuintrin_lang.c" 
           3: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64" 
           4: target triple = "nvptx64" 
           5:  
           6: ; Function Attrs: convergent noinline nounwind optnone 
           7: define dso_local i32 @foo() #0 { 
           8: entry: 
next:39'0           X error: no match found
           9:  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:39'1      ?                                               possible intended match
          10:  ret i32 %0 
next:39'0     ~~~~~~~~~~~~
          11: } 
next:39'0     ~~
          12:  
next:39'0     ~
          13: ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          14: declare noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1 
next:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .
...

llvm#110179) Summary: All GPU based languages provide some way to access things like the thread ID or other resources. However, this is spread between many different languages and it varies between targets. The goal here is to provide a resource directory header that just provides these in an easier to understand way, primarily so this can be used for C/C++ code. The interface aims to be common, to faciliate easier porting, but target specific stuff could be put in the individual headers.

jhuber6 force-pushed the gpuintrinsics branch from 8a81731 to f5a8afe Compare September 26, 2024 22:31

jdoerfert reviewed Sep 27, 2024

View reviewed changes

jhuber6 mentioned this pull request Sep 27, 2024

[NVPTX] Add a clang builtin for the warpsize intrinsic #110316

Merged

jhuber6 added a commit to jhuber6/llvm-project that referenced this pull request Sep 27, 2024

[NVPTX] Add a clang builtin for the warpsize intrinsic

63d4584

Summary: There's an intrinsic for the warp size, we want to expose this to make the interface proposed in llvm#110179 more generic.

shiltian reviewed Sep 27, 2024

View reviewed changes

clang/lib/Headers/amdgpuintrin.h Outdated Show resolved Hide resolved

jhuber6 added a commit that referenced this pull request Sep 27, 2024

[NVPTX] Add a clang builtin for the warpsize intrinsic (#110316)

c650067

Summary: There's an intrinsic for the warp size, we want to expose this to make the interface proposed in #110179 more generic.

jhuber6 force-pushed the gpuintrinsics branch from f5a8afe to 952bb44 Compare October 3, 2024 20:46

jhuber6 marked this pull request as ready for review October 3, 2024 21:03

llvmbot added clang Clang issues not falling into any other category backend:AMDGPU backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics labels Oct 3, 2024

jhuber6 requested review from compnerd, Artem-B, AaronBallman, yxsamliu, tschuett, JonChesterfield and saiislam October 3, 2024 21:04

jhuber6 added 2 commits November 7, 2024 10:23

Add gpu variant for OpenMP

1703f01

Support for u64

f727e6a

jhuber6 force-pushed the gpuintrinsics branch from 467dd11 to f727e6a Compare November 7, 2024 18:20

jhuber6 added 3 commits November 7, 2024 15:16

Add floating point versions

ffc10e8

Fix floatoing point scans

f9f0e83

comments

860b550

jdoerfert approved these changes Nov 9, 2024

View reviewed changes

clang/lib/Headers/gpuintrin.h Show resolved Hide resolved

comments

2386733

jhuber6 force-pushed the gpuintrinsics branch from 35f20bb to 5f049b0 Compare November 11, 2024 14:09

Add more functions

d48b335

jhuber6 force-pushed the gpuintrinsics branch from 5f049b0 to d48b335 Compare November 11, 2024 14:50

jhuber6 added 2 commits November 11, 2024 09:54

Fix OpenCL

239f482

Fix OpenMP

3f70c93

jhuber6 merged commit 11cc826 into llvm:main Nov 11, 2024

[Clang] Implement resource directory headers for common GPU intrinsics #110179

[Clang] Implement resource directory headers for common GPU intrinsics #110179

Uh oh!

Conversation

jhuber6 commented Sep 26, 2024

Uh oh!

JonChesterfield commented Sep 27, 2024

Uh oh!

jhuber6 commented Sep 27, 2024

Uh oh!

kparzysz commented Sep 27, 2024

Uh oh!

jdoerfert left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jdoerfert Sep 27, 2024

Choose a reason for hiding this comment

Uh oh!

jhuber6 Sep 27, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jdoerfert Sep 27, 2024

Choose a reason for hiding this comment

Uh oh!

arsenm Oct 2, 2024

Choose a reason for hiding this comment

Uh oh!

jhuber6 Oct 2, 2024

Choose a reason for hiding this comment

Uh oh!

jhuber6 Oct 2, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jdoerfert commented Sep 27, 2024

Uh oh!

Uh oh!

jhuber6 commented Sep 27, 2024

Uh oh!

shiltian commented Sep 28, 2024

Uh oh!

llvmbot commented Oct 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 3, 2024

Uh oh!

llvmbot commented Oct 3, 2024

Uh oh!

jdoerfert left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Nov 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvm-ci commented Nov 11, 2024

Uh oh!

llvm-ci commented Nov 11, 2024

Uh oh!

llvm-ci commented Nov 11, 2024

Uh oh!

llvm-ci commented Nov 11, 2024

Uh oh!

llvm-ci commented Nov 11, 2024

Uh oh!

jhuber6 commented Nov 11, 2024

Uh oh!

llvm-ci commented Nov 11, 2024

Uh oh!

llvm-ci commented Nov 11, 2024

Uh oh!

llvm-ci commented Nov 11, 2024

Uh oh!

llvmbot commented Oct 3, 2024 •

edited

Loading

github-actions bot commented Nov 11, 2024 •

edited

Loading