[flang][cuda] Materialize the box in memory when src is emboxed #116289

Merged: clementval merged 1 commit into llvm:main from the cuf_data_transfer_box_src branch on Nov 15, 2024.

clementval (Contributor):

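Make sure the box is in memory so we can call the runtime. When the source of cuf.data_transfer is already a box but is produced directly by fir.embox, it exists only as an SSA value. Store it to a temporary first so that its address can be passed to the runtime data transfer call, which takes the source descriptor by reference (!fir.ref<!fir.box<none>>).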
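A plain C++ analogy of the problem, offered as a sketch under stated assumptions: Descriptor, transferDescDesc, makeBox and demoTransfer are hypothetical stand-ins, not the flang runtime API. The point is only that a function taking a descriptor by address cannot be handed a descriptor that exists solely as a value; the value must first be stored somewhere, which is what fir.alloca plus fir.store do in the lowered FIR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a Fortran array descriptor ("box"); the real
// flang runtime descriptor carries much more (rank, per-dimension info, type code).
struct Descriptor {
  void *base;             // address of the first element
  std::size_t elemBytes;  // size of one element in bytes
  std::size_t count;      // number of elements
};

// Hypothetical runtime-style entry point. Like the call checked in this PR's
// test, it receives both descriptors by address, not by value.
void transferDescDesc(Descriptor *dst, const Descriptor *src) {
  assert(dst->elemBytes == src->elemBytes && dst->count == src->count);
  std::memcpy(dst->base, src->base, src->elemBytes * src->count);
}

// Building a box yields a value (compare: fir.embox yields an SSA value).
Descriptor makeBox(float *data, std::size_t n) {
  return Descriptor{data, sizeof(float), n};
}

// Copies four floats through the runtime-style call and returns the last one.
float demoTransfer() {
  float host[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  float dev[4] = {};
  Descriptor dstBox = makeBox(dev, 4);
  // transferDescDesc(&dstBox, &makeBox(host, 4));  // ill-formed: cannot take the address of a temporary
  Descriptor srcTmp = makeBox(host, 4);  // materialize: temporary + store
  transferDescDesc(&dstBox, &srcTmp);    // pass the address, as the runtime expects
  return dev[3];
}
```

The commented-out line is the C++ counterpart of converting a bare fir.embox result to a reference: there is no memory location to point at until one is created.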

clementval merged commit 98daf22 into llvm:main on Nov 15, 2024 (9 of 10 checks passed).
clementval deleted the cuf_data_transfer_box_src branch on November 15, 2024 at 02:33.
clementval added a commit that referenced this pull request on Nov 15, 2024.
llvmbot added the flang (Flang issues not falling into any other category) and flang:fir-hlfir labels on Nov 16, 2024.
llvmbot (Member) commented on Nov 16, 2024:

@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタイン クレメン) (clementval)


Full diff: https://github.com/llvm/llvm-project/pull/116289.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/Transforms/CUFOpConversion.cpp (+7-1)
  • (modified) flang/test/Fir/CUDA/cuda-data-transfer.fir (+39)
diff --git a/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp b/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
index 58a348314573d5..7728bb068daf53 100644
--- a/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
+++ b/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
@@ -641,8 +641,14 @@ struct CUFDataTransferOpConversion
       mlir::Value dst = op.getDst();
       mlir::Value src = op.getSrc();
 
-      if (!mlir::isa<fir::BaseBoxType>(srcTy))
+      if (!mlir::isa<fir::BaseBoxType>(srcTy)) {
         src = emboxSrc(rewriter, op, symtab);
+      } else if (mlir::isa<fir::EmboxOp>(src.getDefiningOp())) {
+        // Materialize the box to memory to be able to call the runtime.
+        mlir::Value box = builder.createTemporary(loc, src.getType());
+        builder.create<fir::StoreOp>(loc, src, box);
+        src = box;
+      }
 
       auto fTy = func.getFunctionType();
       mlir::Value sourceFile = fir::factory::locationToFilename(builder, loc);
diff --git a/flang/test/Fir/CUDA/cuda-data-transfer.fir b/flang/test/Fir/CUDA/cuda-data-transfer.fir
index 9c6d9e0c100125..69baf7d15a7d03 100644
--- a/flang/test/Fir/CUDA/cuda-data-transfer.fir
+++ b/flang/test/Fir/CUDA/cuda-data-transfer.fir
@@ -385,4 +385,43 @@ func.func @_QPdevice_addr_conv() {
 // CHECK: fir.embox %[[DEV_ADDR_CONV]](%{{.*}}) : (!fir.ref<!fir.array<4xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<4xf32>>
 // CHECK: fir.call @_FortranACUFDataTransferDescDescNoRealloc
 
+func.func @_QPdevmul(%arg0: !fir.ref<!fir.array<1x?xf32>> {fir.bindc_name = "b"}, %arg1: !fir.ref<i32> {fir.bindc_name = "wa"}, %arg2: !fir.ref<i32> {fir.bindc_name = "wb"}) {
+  %c0_i64 = arith.constant 0 : i64
+  %c1_i32 = arith.constant 1 : i32
+  %c0_i32 = arith.constant 0 : i32
+  %c1 = arith.constant 1 : index
+  %c0 = arith.constant 0 : index
+  %0 = fir.dummy_scope : !fir.dscope
+  %1 = fir.declare %arg2 dummy_scope %0 {uniq_name = "_QFdevmulEwb"} : (!fir.ref<i32>, !fir.dscope) -> !fir.ref<i32>
+  %2 = cuf.alloc !fir.box<!fir.heap<!fir.array<?x?xf32>>> {bindc_name = "bdev", data_attr = #cuf.cuda<device>, uniq_name = "_QFdevmulEbdev"} -> !fir.ref<!fir.box<!fir.heap<!fir.array<?x?xf32>>>>
+  %6 = fir.declare %2 {data_attr = #cuf.cuda<device>, fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFdevmulEbdev"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x?xf32>>>>) -> !fir.ref<!fir.box<!fir.heap<!fir.array<?x?xf32>>>>
+  %7 = fir.declare %arg1 dummy_scope %0 {uniq_name = "_QFdevmulEwa"} : (!fir.ref<i32>, !fir.dscope) -> !fir.ref<i32>
+  %8 = fir.load %1 : !fir.ref<i32>
+  %9 = fir.convert %8 : (i32) -> index
+  %12 = fir.shape %c1, %9 : (index, index) -> !fir.shape<2>
+  %13 = fir.declare %arg0(%12) dummy_scope %0 {uniq_name = "_QFdevmulEb"} : (!fir.ref<!fir.array<1x?xf32>>, !fir.shape<2>, !fir.dscope) -> !fir.ref<!fir.array<1x?xf32>>
+
+  %24 = fir.load %7 : !fir.ref<i32>
+  %25 = fir.convert %24 : (i32) -> index
+  %26 = arith.cmpi sgt, %25, %c0 : index
+  %27 = arith.select %26, %25, %c0 : index
+  %28 = fir.load %1 : !fir.ref<i32>
+  %29 = fir.convert %28 : (i32) -> index
+  %30 = arith.cmpi sgt, %29, %c0 : index
+  %31 = arith.select %30, %29, %c0 : index
+  %32 = fir.shape %27, %31 : (index, index) -> !fir.shape<2>
+  %33 = fir.undefined index
+  %34 = fir.slice %c1, %25, %c1, %c1, %29, %c1 : (index, index, index, index, index, index) -> !fir.slice<2>
+  %35 = fir.embox %13(%12) [%34] : (!fir.ref<!fir.array<1x?xf32>>, !fir.shape<2>, !fir.slice<2>) -> !fir.box<!fir.array<?x?xf32>>
+  cuf.data_transfer %35 to %6 {transfer_kind = #cuf.cuda_transfer<host_device>} : !fir.box<!fir.array<?x?xf32>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x?xf32>>>>
+  return
+}
+
+// CHECK-LABEL: func.func @_QPdevmul(%arg0: !fir.ref<!fir.array<1x?xf32>> {fir.bindc_name = "b"}, %arg1: !fir.ref<i32> {fir.bindc_name = "wa"}, %arg2: !fir.ref<i32> {fir.bindc_name = "wb"}) {
+// CHECK: %[[ALLOCA:.*]] = fir.alloca !fir.box<!fir.array<?x?xf32>>
+// CHECK: %[[EMBOX:.*]] = fir.embox %{{.*}}(%{{.*}}) [%{{.*}}] : (!fir.ref<!fir.array<1x?xf32>>, !fir.shape<2>, !fir.slice<2>) -> !fir.box<!fir.array<?x?xf32>>
+// CHECK: fir.store %[[EMBOX]] to %[[ALLOCA]] : !fir.ref<!fir.box<!fir.array<?x?xf32>>>
+// CHECK: %[[SRC:.*]] = fir.convert %[[ALLOCA]] : (!fir.ref<!fir.box<!fir.array<?x?xf32>>>) -> !fir.ref<!fir.box<none>>
+// CHECK: fir.call @_FortranACUFDataTransferDescDesc(%{{.*}}, %[[SRC]], %{{.*}}, %{{.*}}, %{{.*}}) : (!fir.ref<!fir.box<none>>, !fir.ref<!fir.box<none>>, i32, !fir.ref<i8>, i32) -> none
+
 } // end of module
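The source-handling branch in CUFDataTransferOpConversion can be modeled, as a hypothetical MLIR-free sketch, by the classification below. SrcAction, SrcValue, classifySource and checkClassification are illustrative names, not flang code, and the strings stand in for real operations. One deliberate difference from the merged diff: the sketch checks that a producing op exists before inspecting it, because getDefiningOp() returns null for a block argument and llvm::isa<> asserts when given a null pointer (llvm::isa_and_nonnull combines the two checks).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical model of how the pattern treats the source operand after this
// patch. It only reports which action applies; it builds no IR.
enum class SrcAction {
  Embox,        // source is not a box: emboxSrc() builds one
  Materialize,  // box produced by fir.embox: createTemporary + fir.store
  Unchanged     // any other box source: this patch leaves it as it was
};

struct SrcValue {
  bool isBoxType;          // stands in for mlir::isa<fir::BaseBoxType>(srcTy)
  const char *definingOp;  // producing op name; nullptr models a block argument
};

SrcAction classifySource(const SrcValue &src) {
  if (!src.isBoxType)
    return SrcAction::Embox;
  // Guard against a missing producer before comparing its name.
  if (src.definingOp != nullptr && std::strcmp(src.definingOp, "fir.embox") == 0)
    return SrcAction::Materialize;
  return SrcAction::Unchanged;
}

// Usage: the @_QPdevmul test above exercises the Materialize case.
void checkClassification() {
  assert(classifySource({false, nullptr}) == SrcAction::Embox);
  assert(classifySource({true, "fir.embox"}) == SrcAction::Materialize);
  assert(classifySource({true, nullptr}) == SrcAction::Unchanged);
}
```

In the FIR test, the Materialize case corresponds to the checked sequence fir.alloca, fir.embox, fir.store, fir.convert to !fir.ref<!fir.box<none>>, then the call to _FortranACUFDataTransferDescDesc.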

Labels: flang:cuf, flang:fir-hlfir, flang