[flang][cuda] Materialize the box in memory when dst is emboxed #116320

Merged
clementval merged 1 commit into llvm:main from cuf_data_transfer_box_dst on Nov 15, 2024

Conversation

clementval
Contributor

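Similar to #116289, but for the destination: when the `dst` operand of `cuf.data_transfer` is produced by a `fir.embox`, the box is materialized in memory so that it can be passed to the runtime call.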

@clementval clementval merged commit 012fad9 into llvm:main Nov 15, 2024
8 of 9 checks passed
@clementval clementval deleted the cuf_data_transfer_box_dst branch November 15, 2024 22:31
@llvmbot llvmbot added the flang (Flang issues not falling into any other category) and flang:fir-hlfir labels Nov 16, 2024
@llvmbot
Member

llvmbot commented Nov 16, 2024

@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

Similar to #116289 but for the dst.


Full diff: https://github.com/llvm/llvm-project/pull/116320.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/Transforms/CUFOpConversion.cpp (+11-6)
  • (modified) flang/test/Fir/CUDA/cuda-data-transfer.fir (+8-4)
diff --git a/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp b/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
index b14a1c338e087f..c070c3de94cf81 100644
--- a/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
+++ b/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
@@ -640,15 +640,20 @@ struct CUFDataTransferOpConversion
                     loc, builder);
       mlir::Value dst = op.getDst();
       mlir::Value src = op.getSrc();
-
       if (!mlir::isa<fir::BaseBoxType>(srcTy)) {
         src = emboxSrc(rewriter, op, symtab);
-      } else if (mlir::isa<fir::EmboxOp>(src.getDefiningOp())) {
-        // Materialize the box to memory to be able to call the runtime.
-        mlir::Value box = builder.createTemporary(loc, src.getType());
-        builder.create<fir::StoreOp>(loc, src, box);
-        src = box;
       }
+      auto materializeBoxIfNeeded = [&](mlir::Value val) -> mlir::Value {
+        if (mlir::isa<fir::EmboxOp>(val.getDefiningOp())) {
+          // Materialize the box to memory to be able to call the runtime.
+          mlir::Value box = builder.createTemporary(loc, val.getType());
+          builder.create<fir::StoreOp>(loc, val, box);
+          return box;
+        }
+        return val;
+      };
+      src = materializeBoxIfNeeded(src);
+      dst = materializeBoxIfNeeded(dst);
 
       auto fTy = func.getFunctionType();
       mlir::Value sourceFile = fir::factory::locationToFilename(builder, loc);
diff --git a/flang/test/Fir/CUDA/cuda-data-transfer.fir b/flang/test/Fir/CUDA/cuda-data-transfer.fir
index 69baf7d15a7d03..ad392beed56d47 100644
--- a/flang/test/Fir/CUDA/cuda-data-transfer.fir
+++ b/flang/test/Fir/CUDA/cuda-data-transfer.fir
@@ -400,7 +400,6 @@ func.func @_QPdevmul(%arg0: !fir.ref<!fir.array<1x?xf32>> {fir.bindc_name = "b"}
   %9 = fir.convert %8 : (i32) -> index
   %12 = fir.shape %c1, %9 : (index, index) -> !fir.shape<2>
   %13 = fir.declare %arg0(%12) dummy_scope %0 {uniq_name = "_QFdevmulEb"} : (!fir.ref<!fir.array<1x?xf32>>, !fir.shape<2>, !fir.dscope) -> !fir.ref<!fir.array<1x?xf32>>
-
   %24 = fir.load %7 : !fir.ref<i32>
   %25 = fir.convert %24 : (i32) -> index
   %26 = arith.cmpi sgt, %25, %c0 : index
@@ -414,14 +413,19 @@ func.func @_QPdevmul(%arg0: !fir.ref<!fir.array<1x?xf32>> {fir.bindc_name = "b"}
   %34 = fir.slice %c1, %25, %c1, %c1, %29, %c1 : (index, index, index, index, index, index) -> !fir.slice<2>
   %35 = fir.embox %13(%12) [%34] : (!fir.ref<!fir.array<1x?xf32>>, !fir.shape<2>, !fir.slice<2>) -> !fir.box<!fir.array<?x?xf32>>
   cuf.data_transfer %35 to %6 {transfer_kind = #cuf.cuda_transfer<host_device>} : !fir.box<!fir.array<?x?xf32>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x?xf32>>>>
+  cuf.data_transfer %6 to %35 {transfer_kind = #cuf.cuda_transfer<device_host>} : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?xf32>>>>, !fir.box<!fir.array<?x?xf32>>
   return
 }
 
 // CHECK-LABEL: func.func @_QPdevmul(%arg0: !fir.ref<!fir.array<1x?xf32>> {fir.bindc_name = "b"}, %arg1: !fir.ref<i32> {fir.bindc_name = "wa"}, %arg2: !fir.ref<i32> {fir.bindc_name = "wb"}) {
-// CHECK: %[[ALLOCA:.*]] = fir.alloca !fir.box<!fir.array<?x?xf32>>
+// CHECK: %[[ALLOCA0:.*]] = fir.alloca !fir.box<!fir.array<?x?xf32>>
+// CHECK: %[[ALLOCA1:.*]] = fir.alloca !fir.box<!fir.array<?x?xf32>>
 // CHECK: %[[EMBOX:.*]] = fir.embox %{{.*}}(%{{.*}}) [%{{.*}}] : (!fir.ref<!fir.array<1x?xf32>>, !fir.shape<2>, !fir.slice<2>) -> !fir.box<!fir.array<?x?xf32>>
-// CHECK: fir.store %[[EMBOX]] to %[[ALLOCA]] : !fir.ref<!fir.box<!fir.array<?x?xf32>>>
-// CHECK: %[[SRC:.*]] = fir.convert %[[ALLOCA]] : (!fir.ref<!fir.box<!fir.array<?x?xf32>>>) -> !fir.ref<!fir.box<none>>
+// CHECK: fir.store %[[EMBOX]] to %[[ALLOCA1]] : !fir.ref<!fir.box<!fir.array<?x?xf32>>>
+// CHECK: %[[SRC:.*]] = fir.convert %[[ALLOCA1]] : (!fir.ref<!fir.box<!fir.array<?x?xf32>>>) -> !fir.ref<!fir.box<none>>
 // CHECK: fir.call @_FortranACUFDataTransferDescDesc(%{{.*}}, %[[SRC]], %{{.*}}, %{{.*}}, %{{.*}}) : (!fir.ref<!fir.box<none>>, !fir.ref<!fir.box<none>>, i32, !fir.ref<i8>, i32) -> none
+// CHECK: fir.store %[[EMBOX]] to %[[ALLOCA0]] : !fir.ref<!fir.box<!fir.array<?x?xf32>>>
+// CHECK: %[[DST:.*]] = fir.convert %[[ALLOCA0]] : (!fir.ref<!fir.box<!fir.array<?x?xf32>>>) -> !fir.ref<!fir.box<none>>
+// CHECK: fir.call @_FortranACUFDataTransferDescDesc(%[[DST]], %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) : (!fir.ref<!fir.box<none>>, !fir.ref<!fir.box<none>>, i32, !fir.ref<i8>, i32) -> none
 
 } // end of module
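
The `materializeBoxIfNeeded` lambda in the diff above can be modeled in isolation. The following is only a toy sketch, not Flang code: `OpKind`, `Value`, `Builder`, `createTemporary` and `createStore` here are hypothetical stand-ins for the MLIR/FIR types and builder calls, introduced purely to show the control flow (allocate a temporary, store the emboxed value into it, and hand back the temporary's address; otherwise return the value unchanged).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for MLIR/FIR concepts; none of these are real Flang APIs.
enum class OpKind { Embox, Load, Alloca };

struct Value {
  OpKind definingOp; // kind of operation that produced this value
  bool inMemory;     // true if the value is an address the runtime could read
};

struct Builder {
  std::vector<std::string> emitted; // log of the toy "operations" created

  // Models allocating a temporary slot (the diff uses builder.createTemporary).
  Value createTemporary() {
    emitted.push_back("alloca");
    return Value{OpKind::Alloca, true};
  }

  // Models storing a value into a slot (the diff creates a fir::StoreOp).
  void createStore(const Value & /*val*/, const Value & /*slot*/) {
    emitted.push_back("store");
  }
};

// Same shape as the lambda in the diff: only values produced directly by an
// embox are spilled to a temporary; everything else passes through untouched.
Value materializeBoxIfNeeded(Builder &builder, Value val) {
  if (val.definingOp == OpKind::Embox) {
    Value box = builder.createTemporary();
    builder.createStore(val, box);
    return box;
  }
  return val;
}
```

Because the helper is applied to both operands, a transfer whose source and destination are each emboxed produces two temporaries, which matches the two `fir.alloca` lines that the updated test checks for.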

Labels
flang:fir-hlfir flang Flang issues not falling into any other category
3 participants