
[flang][cuda] Add entry point to launch global function with cluster_dims #113958


Merged: 1 commit merged on Oct 29, 2024

Conversation

clementval (Contributor)

No description provided.

@llvmbot added labels flang:runtime, flang (Flang issues not falling into any other category) on Oct 28, 2024
@llvmbot (Member) commented Oct 28, 2024

@llvm/pr-subscribers-flang-runtime

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/113958.diff

2 Files Affected:

  • (modified) flang/include/flang/Runtime/CUDA/kernel.h (+7-1)
  • (modified) flang/runtime/CUDA/kernel.cpp (+24-1)
diff --git a/flang/include/flang/Runtime/CUDA/kernel.h b/flang/include/flang/Runtime/CUDA/kernel.h
index cf07d874a082c0..85afda09e347ae 100644
--- a/flang/include/flang/Runtime/CUDA/kernel.h
+++ b/flang/include/flang/Runtime/CUDA/kernel.h
@@ -15,13 +15,19 @@
 
 extern "C" {
 
-// This function uses intptr_t instead of CUDA's unsigned int to match
+// These functions use intptr_t instead of CUDA's unsigned int to match
 // the type of MLIR's index type. This avoids the need for casts in the
 // generated MLIR code.
+
 void RTDEF(CUFLaunchKernel)(const void *kernelName, intptr_t gridX,
     intptr_t gridY, intptr_t gridZ, intptr_t blockX, intptr_t blockY,
     intptr_t blockZ, int32_t smem, void **params, void **extra);
 
+void RTDEF(CUFLaunchClusterKernel)(const void *kernelName, intptr_t clusterX,
+    intptr_t clusterY, intptr_t clusterZ, intptr_t gridX, intptr_t gridY,
+    intptr_t gridZ, intptr_t blockX, intptr_t blockY, intptr_t blockZ,
+    int32_t smem, void **params, void **extra);
+
 } // extern "C"
 
 #endif // FORTRAN_RUNTIME_CUDA_KERNEL_H_
diff --git a/flang/runtime/CUDA/kernel.cpp b/flang/runtime/CUDA/kernel.cpp
index f81153a1af4bc7..abb7ebb72e5923 100644
--- a/flang/runtime/CUDA/kernel.cpp
+++ b/flang/runtime/CUDA/kernel.cpp
@@ -25,9 +25,32 @@ void RTDEF(CUFLaunchKernel)(const void *kernel, intptr_t gridX, intptr_t gridY,
   blockDim.x = blockX;
   blockDim.y = blockY;
   blockDim.z = blockZ;
-  cudaStream_t stream = 0;
+  cudaStream_t stream = 0; // TODO stream management
   CUDA_REPORT_IF_ERROR(
       cudaLaunchKernel(kernel, gridDim, blockDim, params, smem, stream));
 }
 
+void RTDEF(CUFLaunchClusterKernel)(const void *kernel, intptr_t clusterX,
+    intptr_t clusterY, intptr_t clusterZ, intptr_t gridX, intptr_t gridY,
+    intptr_t gridZ, intptr_t blockX, intptr_t blockY, intptr_t blockZ,
+    int32_t smem, void **params, void **extra) {
+  cudaLaunchConfig_t config;
+  config.gridDim.x = gridX;
+  config.gridDim.y = gridY;
+  config.gridDim.z = gridZ;
+  config.blockDim.x = blockX;
+  config.blockDim.y = blockY;
+  config.blockDim.z = blockZ;
+  config.dynamicSmemBytes = smem;
+  config.stream = 0; // TODO stream management
+  cudaLaunchAttribute launchAttr[1];
+  launchAttr[0].id = cudaLaunchAttributeClusterDimension;
+  launchAttr[0].val.clusterDim.x = clusterX;
+  launchAttr[0].val.clusterDim.y = clusterY;
+  launchAttr[0].val.clusterDim.z = clusterZ;
+  config.numAttrs = 1;
+  config.attrs = launchAttr;
+  CUDA_REPORT_IF_ERROR(cudaLaunchKernelExC(&config, kernel, params));
+}
+
 } // extern "C"
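For readers unfamiliar with thread block clusters, the new `CUFLaunchClusterKernel` entry point is essentially a thin wrapper around `cudaLaunchKernelExC` with a `cudaLaunchAttributeClusterDimension` attribute. A standalone sketch of that same launch pattern is shown below; `kernelExample`, the grid/block/cluster sizes, and the no-argument kernel are all illustrative assumptions, not part of this PR, and running it requires a cluster-capable GPU (sm_90+, CUDA 11.8 or later).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical no-argument kernel, for illustration only.
__global__ void kernelExample() {}

int main() {
  cudaLaunchConfig_t config = {};
  config.gridDim = dim3(8, 1, 1);   // grid dims must be divisible by cluster dims
  config.blockDim = dim3(128, 1, 1);
  config.dynamicSmemBytes = 0;      // corresponds to the `smem` parameter
  config.stream = 0;                // default stream, as in the PR

  // Attach the cluster dimension as a launch attribute, mirroring
  // the CUFLaunchClusterKernel implementation above.
  cudaLaunchAttribute launchAttr[1];
  launchAttr[0].id = cudaLaunchAttributeClusterDimension;
  launchAttr[0].val.clusterDim.x = 2;  // 2 thread blocks per cluster along x
  launchAttr[0].val.clusterDim.y = 1;
  launchAttr[0].val.clusterDim.z = 1;
  config.attrs = launchAttr;
  config.numAttrs = 1;

  // kernelParams may be nullptr for a kernel that takes no arguments.
  cudaError_t err =
      cudaLaunchKernelExC(&config, (const void *)kernelExample, nullptr);
  printf("launch: %s\n", cudaGetErrorString(err));
  cudaDeviceSynchronize();
  return 0;
}
```

One design note visible in the PR: the runtime zero-initializes nothing and sets every `cudaLaunchConfig_t` field explicitly, whereas the sketch uses `= {}` so unset fields are well-defined.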

@clementval clementval merged commit 0b700f2 into main Oct 29, 2024
11 checks passed
@clementval clementval deleted the users/clementval/cuf_cluster_rt branch October 29, 2024 17:01
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024