[flang][cuda] Propagate stream information to gpu.launch_func op #135227

Merged: 1 commit merged into llvm:main from cuf_stream_gpu on Apr 10, 2025

Conversation

clementval (Contributor) commented:

Propagate the stream information from `cuf.kernel_launch` to the `gpu.launch_func` op.
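
For context, a minimal before/after sketch of the rewrite, condensed from the test added in this PR (the `%stream`, `%gx`..`%gz`, and `%bx`..`%bz` value names are illustrative, and the conversion of the grid/block operands is elided):

    // Before CUFOpConversion: the stream value trails the launch configuration
    // operands in the chevrons (grid dims, block dims, dynamic shared memory, stream).
    %stream = fir.load %1#0 : !fir.ref<i64>
    cuf.kernel_launch @_QMdevptrPtest<<<%c1_i32, %c1_i32, %c1_i32, %c1_i32, %c1_i32, %c1_i32, %c0_i32, %stream : i64>>>()

    // After CUFOpConversion: the same stream value is attached as the async
    // object of the generated gpu.launch_func.
    gpu.launch_func <%stream : i64> @cuda_device_mod::@_QMdevptrPtest
        blocks in (%gx, %gy, %gz) threads in (%bx, %by, %bz)
        dynamic_shared_memory_size %c0_i32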

@clementval requested a review from wangzpgi April 10, 2025 18:16
@llvmbot added the flang (Flang issues not falling into any other category) and flang:fir-hlfir labels Apr 10, 2025
llvmbot (Member) commented Apr 10, 2025:

@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

Propagate the stream information from `cuf.kernel_launch` to the `gpu.launch_func` op.


Full diff: https://github.com/llvm/llvm-project/pull/135227.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/Transforms/CUFOpConversion.cpp (+2)
  • (modified) flang/test/Fir/CUDA/cuda-launch.fir (+28)
diff --git a/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp b/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
index a01100511ec66..caa59c6c17d0f 100644
--- a/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
+++ b/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
@@ -879,6 +879,8 @@ struct CUFLaunchOpConversion
       gpuLaunchOp.getClusterSizeYMutable().assign(clusterDimY);
       gpuLaunchOp.getClusterSizeZMutable().assign(clusterDimZ);
     }
+    if (op.getStream())
+      gpuLaunchOp.getAsyncObjectMutable().assign(op.getStream());
     if (procAttr)
       gpuLaunchOp->setAttr(cuf::getProcAttrName(), procAttr);
     rewriter.replaceOp(op, gpuLaunchOp);
diff --git a/flang/test/Fir/CUDA/cuda-launch.fir b/flang/test/Fir/CUDA/cuda-launch.fir
index b8d79ca06ffd6..621772efff415 100644
--- a/flang/test/Fir/CUDA/cuda-launch.fir
+++ b/flang/test/Fir/CUDA/cuda-launch.fir
@@ -129,3 +129,31 @@ module attributes {gpu.container_module, dlti.dl_spec = #dlti.dl_spec<#dlti.dl_e
 
 // CHECK-LABEL: func.func @_QQmain()
 // CHECK: gpu.launch_func  @cuda_device_mod::@_QMdevptrPtest blocks in (%{{.*}}, %{{.*}}, %{{.*}}) threads in (%{{.*}}, %{{.*}}, %{{.*}})  dynamic_shared_memory_size %{{.*}}  {cuf.proc_attr = #cuf.cuda_proc<grid_global>}
+
+// -----
+
+module attributes {gpu.container_module, dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<f80, dense<128> : vector<2xi64>>, #dlti.dl_entry<i128, dense<128> : vector<2xi64>>, #dlti.dl_entry<i64, dense<64> : vector<2xi64>>, #dlti.dl_entry<!llvm.ptr<272>, dense<64> : vector<4xi64>>, #dlti.dl_entry<!llvm.ptr<271>, dense<32> : vector<4xi64>>, #dlti.dl_entry<!llvm.ptr<270>, dense<32> : vector<4xi64>>, #dlti.dl_entry<f128, dense<128> : vector<2xi64>>, #dlti.dl_entry<f64, dense<64> : vector<2xi64>>, #dlti.dl_entry<f16, dense<16> : vector<2xi64>>, #dlti.dl_entry<i32, dense<32> : vector<2xi64>>, #dlti.dl_entry<i16, dense<16> : vector<2xi64>>, #dlti.dl_entry<i8, dense<8> : vector<2xi64>>, #dlti.dl_entry<i1, dense<8> : vector<2xi64>>, #dlti.dl_entry<!llvm.ptr, dense<64> : vector<4xi64>>, #dlti.dl_entry<"dlti.endianness", "little">, #dlti.dl_entry<"dlti.stack_alignment", 128 : i64>>} {
+  gpu.module @cuda_device_mod {
+    gpu.func @_QMdevptrPtest() kernel {
+      gpu.return
+    }
+  }
+  func.func @_QMdevptrPtest() attributes {cuf.proc_attr = #cuf.cuda_proc<grid_global>} {
+    return
+  }
+  func.func @_QQmain() {
+    %0 = fir.alloca i64 {bindc_name = "stream", uniq_name = "_QMtest_callFhostEstream"}
+    %1:2 = hlfir.declare %0 {uniq_name = "_QMtest_callFhostEstream"} : (!fir.ref<i64>) -> (!fir.ref<i64>, !fir.ref<i64>)
+    %c1_i32 = arith.constant 1 : i32
+    %c0_i32 = arith.constant 0 : i32
+    %2 = fir.load %1#0 : !fir.ref<i64>
+    cuf.kernel_launch @_QMdevptrPtest<<<%c1_i32, %c1_i32, %c1_i32, %c1_i32, %c1_i32, %c1_i32, %c0_i32, %2 : i64>>>()
+    return
+  }
+}
+
+// CHECK-LABEL: func.func @_QQmain()
+// CHECK: %[[STREAM:.*]] = fir.alloca i64 {bindc_name = "stream", uniq_name = "_QMtest_callFhostEstream"}
+// CHECK: %[[DECL_STREAM:.*]]:2 = hlfir.declare %[[STREAM]] {uniq_name = "_QMtest_callFhostEstream"} : (!fir.ref<i64>) -> (!fir.ref<i64>, !fir.ref<i64>)
+// CHECK: %[[STREAM_LOADED:.*]] = fir.load %[[DECL_STREAM]]#0 : !fir.ref<i64>
+// CHECK: gpu.launch_func <%[[STREAM_LOADED]] : i64> @cuda_device_mod::@_QMdevptrPtest

@clementval merged commit ca53463 into llvm:main Apr 10, 2025
12 of 13 checks passed
@clementval deleted the cuf_stream_gpu branch April 10, 2025 18:58
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
[flang][cuda] Propagate stream information to gpu.launch_func op (llvm#135227)

Use the information from `cuf.kernel_launch` to `gpu.launch_func`