Commit 33a5d21

[OpenMP][NVPTX] Added forward declaration to pave the way for building deviceRTLs with OpenMP
Once we switch to building deviceRTLs with OpenMP, primitives and CUDA intrinsics can no longer be used directly because `__device__` is not recognized by the OpenMP compiler. To avoid pulling in all the CUDA internal headers we have in `clang`, we forward declare these functions. Eventually they will be transformed into the right LLVM intrinsics.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95058
1 parent a3d7cee commit 33a5d21
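
The forward-declaration approach described in the commit message can be illustrated on the host with a minimal sketch. The names `demo_shfl_sync`, `kWarpSize`, `broadcast_from_lane0`, and `usage_example` are hypothetical stand-ins and are not part of the deviceRTL; the stub definition only exists so the sketch links without a GPU toolchain, whereas in the real runtime the declared symbol is expected to be resolved by the compiler.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime's WARPSIZE constant.
constexpr int kWarpSize = 32;

extern "C" {
// Declaration only: no CUDA header and no __device__ attribute is needed
// for callers to compile. The default argument lives on the declaration.
int demo_shfl_sync(unsigned mask, int val, int src_lane, int width = kWarpSize);
}

// A caller compiles against the declaration alone; the default `width`
// is filled in at the call site.
inline int broadcast_from_lane0(int val) {
  return demo_shfl_sync(0xFFFFFFFFu, val, /*src_lane=*/0);
}

// Host-only stub (an assumption made for this sketch). It returns `val`
// when called with the full warp width and -1 otherwise, which makes the
// effect of the default argument observable.
extern "C" int demo_shfl_sync(unsigned mask, int val, int src_lane, int width) {
  (void)mask;
  (void)src_lane;
  return width == kWarpSize ? val : -1;
}

// Usage: the first call relies on the default width, the second overrides it.
inline void usage_example() {
  assert(broadcast_from_lane0(7) == 7);
  assert(demo_shfl_sync(0xFFFFFFFFu, 7, 0, 16) == -1);
}
```

Keeping the default argument on the declaration means call sites do not need to change if the definition is later supplied by a different mechanism.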

1 file changed: +17 -0 lines changed

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu

Lines changed: 17 additions & 0 deletions
@@ -16,6 +16,23 @@
 
 #include <cuda.h>
 
+// Forward declaration of CUDA primitives which will be eventually transformed
+// into LLVM intrinsics.
+extern "C" {
+unsigned int __activemask();
+unsigned int __ballot(unsigned);
+// The default argument here is based on NVIDIA's website
+// https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/
+int __shfl_sync(unsigned mask, int val, int src_line, int width = WARPSIZE);
+int __shfl(int val, int src_line, int width = WARPSIZE);
+int __shfl_down(int var, unsigned delta, int width);
+int __shfl_down_sync(unsigned mask, int var, unsigned delta, int width);
+void __syncwarp(int mask);
+void __threadfence();
+void __threadfence_block();
+void __threadfence_system();
+}
+
 DEVICE void __kmpc_impl_unpack(uint64_t val, uint32_t &lo, uint32_t &hi) {
   asm volatile("mov.b64 {%0,%1}, %2;" : "=r"(lo), "=r"(hi) : "l"(val));
 }
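
For readers unfamiliar with the PTX in the unchanged context lines, the following host-side sketch shows what `mov.b64 {%0,%1}, %2` computes: the first destination receives bits 0..31 of the 64-bit source and the second receives bits 32..63. The names `unpack_portable`, `pack_portable`, and `round_trip_example` are hypothetical and do not appear in the deviceRTL; this is a portable illustration, not the runtime's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Portable equivalent of the PTX unpack: `lo` gets the low 32 bits of
// `val`, `hi` gets the high 32 bits. Hypothetical helper for illustration.
inline void unpack_portable(uint64_t val, uint32_t &lo, uint32_t &hi) {
  lo = static_cast<uint32_t>(val & 0xFFFFFFFFull);
  hi = static_cast<uint32_t>(val >> 32);
}

// Inverse operation: rebuild the 64-bit value from its two halves.
inline uint64_t pack_portable(uint32_t lo, uint32_t hi) {
  return (static_cast<uint64_t>(hi) << 32) | lo;
}

// Usage: split a value, check both halves, then reassemble it.
inline void round_trip_example() {
  uint32_t lo = 0, hi = 0;
  unpack_portable(0x1122334455667788ull, lo, hi);
  assert(lo == 0x55667788u);
  assert(hi == 0x11223344u);
  assert(pack_portable(lo, hi) == 0x1122334455667788ull);
}
```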
