[flang] introduce fir.copy to avoid load store of aggregates #130289

jeanPerier · 2025-03-07T14:50:52Z

Introduce a FIR operation to do memcopy/memove of compile time constant size types.

This is to avoid requiring derived type copies to done with load/store which is badly supported in LLVM when the aggregate type is "big" (no threshold can easily be defined here, better to always avoid them for fir.type).

This was the root cause of the regressions caused by #114002 which introduced a load/store of fir.type<> which caused hand/asserts to fire in LLVM on several benchmarks.

See https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type

We could directly use LLVM dialect memmove and memcopy operations, but I feel this would be intrusive for lowering tests because of the target dependent byte size, and also the conversions to llvm pointer types are noisy in the IR, so instead, this patch adds a simple op that will lower to these LLVM intrinsics and compute the byte size as late as possible.

I think it would be great to add it the FirAliasTagOpInterface, but the TBAA pass currently does not support multiple operands, and I am not familiar enough with this to know how the TBAA from the source/destination should be merged (a relevant patch on the topic is this llvm patch that preserved TBAA information in the load/store to memcopy pass, it is probably a matter of doing something similar).

llvmbot · 2025-03-07T14:51:28Z

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-codegen

Author: None (jeanPerier)

Changes

Introduce a FIR operation to do memcopy/memove of compile time constant size types.

This is to avoid requiring derived type copies to done with load/store which is badly supported in LLVM when the aggregate type is "big" (no threshold can easily be defined here, better to always avoid them for fir.type).

This was the root cause of the regressions caused by #114002 which introduced a load/store of fir.type<> which caused hand/asserts to fire in LLVM on several benchmarks.

See https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type

We could directly use LLVM dialect memmove and memcopy operations, but I feel this would be intrusive for lowering tests because of the target dependent byte size, and also the conversions to llvm pointer types are noisy in the IR, so instead, this patch adds a simple op that will lower to these LLVM intrinsics and compute the byte size as late as possible.

I think it would be great to add it the FirAliasTagOpInterface, but the TBAA pass currently does not support multiple operands, and I am not familiar enough with this to know how the TBAA from the source/destination should be merged (a relevant patch on the topic is this llvm patch that preserved TBAA information in the load/store to memcopy pass, it is probably a matter of doing something similar).

Full diff: https://github.com/llvm/llvm-project/pull/130289.diff

6 Files Affected:

(modified) flang/include/flang/Optimizer/Dialect/FIROps.td (+38)
(modified) flang/lib/Optimizer/CodeGen/CodeGen.cpp (+35-5)
(modified) flang/lib/Optimizer/Dialect/FIROps.cpp (+23)
(added) flang/test/Fir/copy-codegen.fir (+35)
(modified) flang/test/Fir/fir-ops.fir (+9)
(modified) flang/test/Fir/invalid.fir (+28)

diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td
index c83c57186b46d..74e5e1fd2d6be 100644
--- a/flang/include/flang/Optimizer/Dialect/FIROps.td
+++ b/flang/include/flang/Optimizer/Dialect/FIROps.td
@@ -342,6 +342,44 @@ def fir_StoreOp : fir_Op<"store", [FirAliasTagOpInterface]> {
   }];
 }
 
+def fir_CopyOp : fir_Op<"copy", []> {
+  let summary = "copy constant size memory";
+
+  let description = [{
+    Copy the memory from a source with compile time constant size to
+    a destination of the same type.
+
+    This is meant to be used for aggregate types where load and store
+    are not appropriate to make a copy because LLVM is not meant to
+    handle load and store of "big" aggregates.
+
+    Its "no_overlap" attribute allows indicating that the source and destination
+    are known to not overlap at compile time.
+
+    ```
+      !t =!fir.type<t{x:!fir.array<1000xi32>}>
+      fir.copy %x to %y : !fir.ref<!t>, !fir.ref<!t>
+    ```
+    TODO: add FirAliasTagOpInterface to carry TBAA.
+  }];
+
+  let arguments = (ins Arg<AnyReferenceLike, "", [MemRead]>:$source,
+                       Arg<AnyReferenceLike, "", [MemWrite]>:$destination,
+                       OptionalAttr<UnitAttr>:$no_overlap);
+
+  let builders = [OpBuilder<(ins "mlir::Value":$source,
+                                 "mlir::Value":$destination,
+                                  CArg<"bool", "false">:$no_overlap)>];
+
+  let assemblyFormat = [{
+    $source `to` $destination (`no_overlap` $no_overlap^)?
+    attr-dict `:` type(operands)
+  }];
+
+  let hasVerifier = 1;
+}
+
+
 def fir_SaveResultOp : fir_Op<"save_result", [AttrSizedOperandSegments]> {
   let summary = [{
     save an array, box, or record function result SSA-value to a memory location
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index a2743edd7844a..1d8372774ef9a 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -3539,6 +3539,36 @@ struct StoreOpConversion : public fir::FIROpConversion<fir::StoreOp> {
   }
 };
 
+/// `fir.copy` --> `llvm.memcpy` or `llvm.memmove`
+struct CopyOpConversion : public fir::FIROpConversion<fir::CopyOp> {
+  using FIROpConversion::FIROpConversion;
+
+  llvm::LogicalResult
+  matchAndRewrite(fir::CopyOp copy, OpAdaptor adaptor,
+                  mlir::ConversionPatternRewriter &rewriter) const override {
+    mlir::Location loc = copy.getLoc();
+    mlir::Value llvmSource = adaptor.getSource();
+    mlir::Value llvmDestination = adaptor.getDestination();
+    mlir::Type i64Ty = mlir::IntegerType::get(rewriter.getContext(), 64);
+    mlir::Type copyTy = fir::unwrapRefType(copy.getSource().getType());
+    mlir::Value copySize =
+        genTypeStrideInBytes(loc, i64Ty, rewriter, convertType(copyTy));
+
+    mlir::LLVM::AliasAnalysisOpInterface newOp;
+    if (copy.getNoOverlap())
+      newOp = rewriter.create<mlir::LLVM::MemcpyOp>(
+          loc, llvmDestination, llvmSource, copySize, /*isVolatile=*/false);
+    else
+      newOp = rewriter.create<mlir::LLVM::MemmoveOp>(
+          loc, llvmDestination, llvmSource, copySize, /*isVolatile=*/false);
+
+    // TODO: propagate TBAA once FirAliasTagOpInterface added to CopyOp.
+    attachTBAATag(newOp, copyTy, copyTy, nullptr);
+    rewriter.eraseOp(copy);
+    return mlir::success();
+  }
+};
+
 namespace {
 
 /// Convert `fir.unboxchar` into two `llvm.extractvalue` instructions. One for
@@ -4142,11 +4172,11 @@ void fir::populateFIRToLLVMConversionPatterns(
       BoxOffsetOpConversion, BoxProcHostOpConversion, BoxRankOpConversion,
       BoxTypeCodeOpConversion, BoxTypeDescOpConversion, CallOpConversion,
       CmpcOpConversion, ConvertOpConversion, CoordinateOpConversion,
-      DTEntryOpConversion, DeclareOpConversion, DivcOpConversion,
-      EmboxOpConversion, EmboxCharOpConversion, EmboxProcOpConversion,
-      ExtractValueOpConversion, FieldIndexOpConversion, FirEndOpConversion,
-      FreeMemOpConversion, GlobalLenOpConversion, GlobalOpConversion,
-      InsertOnRangeOpConversion, IsPresentOpConversion,
+      CopyOpConversion, DTEntryOpConversion, DeclareOpConversion,
+      DivcOpConversion, EmboxOpConversion, EmboxCharOpConversion,
+      EmboxProcOpConversion, ExtractValueOpConversion, FieldIndexOpConversion,
+      FirEndOpConversion, FreeMemOpConversion, GlobalLenOpConversion,
+      GlobalOpConversion, InsertOnRangeOpConversion, IsPresentOpConversion,
       LenParamIndexOpConversion, LoadOpConversion, MulcOpConversion,
       NegcOpConversion, NoReassocOpConversion, SelectCaseOpConversion,
       SelectOpConversion, SelectRankOpConversion, SelectTypeOpConversion,
diff --git a/flang/lib/Optimizer/Dialect/FIROps.cpp b/flang/lib/Optimizer/Dialect/FIROps.cpp
index 7efb733eb565c..059514cbb0183 100644
--- a/flang/lib/Optimizer/Dialect/FIROps.cpp
+++ b/flang/lib/Optimizer/Dialect/FIROps.cpp
@@ -3940,6 +3940,29 @@ void fir::StoreOp::build(mlir::OpBuilder &builder, mlir::OperationState &result,
   build(builder, result, value, memref, {});
 }
 
+//===----------------------------------------------------------------------===//
+// CopyOp
+//===----------------------------------------------------------------------===//
+
+void fir::CopyOp::build(mlir::OpBuilder &builder, mlir::OperationState &result,
+                        mlir::Value source, mlir::Value destination,
+                        bool noOverlap) {
+  mlir::UnitAttr noOverlapAttr =
+      noOverlap ? builder.getUnitAttr() : mlir::UnitAttr{};
+  build(builder, result, source, destination, noOverlapAttr);
+}
+
+llvm::LogicalResult fir::CopyOp::verify() {
+  mlir::Type sourceType = fir::unwrapRefType(getSource().getType());
+  mlir::Type destinationType = fir::unwrapRefType(getDestination().getType());
+  if (sourceType != destinationType)
+    return emitOpError("source and destination must have the same value type");
+  if (fir::hasDynamicSize(sourceType))
+    return emitOpError(
+        "source value type must have a compile time constant size");
+  return mlir::success();
+}
+
 //===----------------------------------------------------------------------===//
 // StringLitOp
 //===----------------------------------------------------------------------===//
diff --git a/flang/test/Fir/copy-codegen.fir b/flang/test/Fir/copy-codegen.fir
new file mode 100644
index 0000000000000..eef1885c6a49c
--- /dev/null
+++ b/flang/test/Fir/copy-codegen.fir
@@ -0,0 +1,35 @@
+// Test fir.copy codegen.
+// RUN: fir-opt --fir-to-llvm-ir %s -o - | FileCheck %s
+
+!t=!fir.type<sometype{i:!fir.array<9xi32>}>
+
+module attributes {llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"} {
+
+func.func @test_copy_1(%arg0: !fir.ref<!t>, %arg1: !fir.ref<!t>) {
+  fir.copy %arg0 to %arg1 no_overlap : !fir.ref<!t>, !fir.ref<!t>
+  return
+}
+// CHECK-LABEL:   llvm.func @test_copy_1(
+// CHECK-SAME:                           %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !llvm.ptr,
+// CHECK-SAME:                           %[[VAL_1:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !llvm.ptr) {
+// CHECK:           %[[VAL_2:.*]] = llvm.mlir.zero : !llvm.ptr
+// CHECK:           %[[VAL_3:.*]] = llvm.getelementptr %[[VAL_2]][1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<"sometype", (array<9 x i32>)>
+// CHECK:           %[[VAL_4:.*]] = llvm.ptrtoint %[[VAL_3]] : !llvm.ptr to i64
+// CHECK:           "llvm.intr.memcpy"(%[[VAL_1]], %[[VAL_0]], %[[VAL_4]]) <{isVolatile = false}> : (!llvm.ptr, !llvm.ptr, i64) -> ()
+// CHECK:           llvm.return
+// CHECK:         }
+
+func.func @test_copy_2(%arg0: !fir.ref<!t>, %arg1: !fir.ref<!t>) {
+  fir.copy %arg0 to %arg1 : !fir.ref<!t>, !fir.ref<!t>
+  return
+}
+// CHECK-LABEL:   llvm.func @test_copy_2(
+// CHECK-SAME:                           %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !llvm.ptr,
+// CHECK-SAME:                           %[[VAL_1:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !llvm.ptr) {
+// CHECK:           %[[VAL_2:.*]] = llvm.mlir.zero : !llvm.ptr
+// CHECK:           %[[VAL_3:.*]] = llvm.getelementptr %[[VAL_2]][1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<"sometype", (array<9 x i32>)>
+// CHECK:           %[[VAL_4:.*]] = llvm.ptrtoint %[[VAL_3]] : !llvm.ptr to i64
+// CHECK:           "llvm.intr.memmove"(%[[VAL_1]], %[[VAL_0]], %[[VAL_4]]) <{isVolatile = false}> : (!llvm.ptr, !llvm.ptr, i64) -> ()
+// CHECK:           llvm.return
+// CHECK:         }
+}
diff --git a/flang/test/Fir/fir-ops.fir b/flang/test/Fir/fir-ops.fir
index 1bfcb3a9f3dc8..06b0bbbf0bd20 100644
--- a/flang/test/Fir/fir-ops.fir
+++ b/flang/test/Fir/fir-ops.fir
@@ -933,3 +933,12 @@ func.func @test_call_arg_attrs_indirect(%arg0: i16, %arg1: (i16)-> i16) -> i16 {
   %0 = fir.call %arg1(%arg0) : (i16 {llvm.noundef, llvm.signext}) -> (i16 {llvm.signext})
   return %0 : i16
 }
+
+// CHECK-LABEL: @test_copy(
+// CHECK-SAME:    %[[VAL_0:.*]]: !fir.ref<!fir.type<sometype{i:i32}>>,
+// CHECK-SAME:    %[[VAL_1:.*]]: !fir.ptr<!fir.type<sometype{i:i32}>>
+func.func @test_copy(%arg0: !fir.ref<!fir.type<sometype{i:i32}>>, %arg1: !fir.ptr<!fir.type<sometype{i:i32}>>) {
+  // CHECK: fir.copy %[[VAL_0]] to %[[VAL_1]] no_overlap : !fir.ref<!fir.type<sometype{i:i32}>>, !fir.ptr<!fir.type<sometype{i:i32}>>
+  fir.copy %arg0 to %arg1 no_overlap : !fir.ref<!fir.type<sometype{i:i32}>>, !fir.ptr<!fir.type<sometype{i:i32}>>
+  return
+}
diff --git a/flang/test/Fir/invalid.fir b/flang/test/Fir/invalid.fir
index 7e3f9d6498412..0ef64b547f3e9 100644
--- a/flang/test/Fir/invalid.fir
+++ b/flang/test/Fir/invalid.fir
@@ -1018,3 +1018,31 @@ func.func @bad_is_assumed_size(%arg0: !fir.ref<!fir.array<*:none>>) {
   %1 = fir.is_assumed_size %arg0 : (!fir.ref<!fir.array<*:none>>) -> i1
   return
 }
+
+// -----
+
+!t=!fir.type<sometype{i:i32}>
+!t2=!fir.type<sometype2{j:i32}>
+func.func @bad_copy_1(%arg0: !fir.ref<!t>, %arg1: !fir.ref<!t2>) {
+  // expected-error@+1{{'fir.copy' op source and destination must have the same value type}}
+  fir.copy %arg0 to %arg1 no_overlap : !fir.ref<!t>, !fir.ref<!t2>
+  return
+}
+
+// -----
+
+!t=!fir.type<sometype{i:i32}>
+func.func @bad_copy_2(%arg0: !fir.ref<!t>, %arg1: !t) {
+  // expected-error@+1{{'fir.copy' op operand #0 must be any reference, but got '!fir.type<sometype{i:i32}>'}}
+  fir.copy %arg1 to %arg0 no_overlap : !t, !fir.ref<!t>
+  return
+}
+
+// -----
+
+!t=!fir.array<?xi32>
+func.func @test_copy(%arg0: !fir.ref<!t>, %arg1: !fir.ref<!t>) {
+  // expected-error@+1{{'fir.copy' op source value type must have a compile time constant size}}
+  fir.copy %arg0 to %arg1 no_overlap : !fir.ref<!t>, !fir.ref<!t>
+  return
+}

kiranchandramohan · 2025-03-07T15:03:31Z

flang/include/flang/Optimizer/Dialect/FIROps.td

+    This is meant to be used for aggregate types where load and store
+    are not appropriate to make a copy because LLVM is not meant to
+    handle load and store of "big" aggregates.


Would it make sense to add a verifier check or a predicate (AnyRefOfAggregateType) to ensure it is used only for aggregate types?

Yes, although there is nothing that would technically prevent it to be used for intrinsic types, it seems saner to prevent it. I added a predicate.

clementval

LGTM. Kiran's comment might make sense to avoid using it on any type.

tblah

LGTM with Kiran's comment

As for TBAA, it is difficult to compare it to the fabricator diff you mentioned because we don't really use tbaa in the way it was intended. TBAA is supposed to say that different types can't alias e.g. "loads of integers don't overlap with loads of floats".

The property we use to implement Fortran aliasing rules is that with tags arranged in a tree structure, parents alias with children, but siblings to not alias. Tags from trees with different roots are assumed to alias.

From what I remember, the tree looks something like this

root (unique per function and scope)
- any access
   - data
     - global data
         - global a
         - global b
     - locally allocated data
       - local alloca a
       - local alloca b
     - dummy arg data
       - dummy arg a
       - dummy arg b
     - direct data
  - descriptor member

I think the best we can do here would be to find the common parent of the tag given to each argument. For example, if copying one dummy argument to another then this should be tagged as an argument because it could alias with any dummy argument but not a global. If copying a local allocation to a global variable then we would have to make it alias with any other data access (so it would still not alias with descriptor accesses).

To be clear I mean tagging as that parent node, not as a child of it.

vzakhari · 2025-03-07T18:52:54Z

I think while we are using TBAA the way Tom described we have to be able to attach two TBAA tags for the source and the destination of fir.copy, then in CodeGen we will be able to apply them to the corresponding load and store.

The AddAliasTagsPass will need to handle fir.copy specially by tracing both operands to their sources.

kiranchandramohan

LGTM. Thanks @jeanPerier.

jeanPerier · 2025-03-10T12:41:49Z

then in CodeGen we will be able to apply them to the corresponding load and store.

Codegen is generating a llvm.memcopy, not individual loads/stores, and I think one can only set a single TBAA tag for llvm.memcopy.
Maybe the tag can point to a metadata array? I am just not quite sure what are LLVM expectations regarding memcopy /memmove TBAA tags (in C/C++, you would probably be copying from/to entities of the same type, so a single TBAA node would probably be enough for "normal" TBAA). https://llvm.org/docs/LangRef.html#tbaa-metadata does not really talk about memcopy/memove (except for the tbaa.struct, that seems to be a quite different metadata node).

vzakhari · 2025-03-10T16:39:57Z

Codegen is generating a llvm.memcopy, not individual loads/stores, and I think one can only set a single TBAA tag for llvm.memcopy. Maybe the tag can point to a metadata array? I am just not quite sure what are LLVM expectations regarding memcopy /memmove TBAA tags (in C/C++, you would probably be copying from/to entities of the same type, so a single TBAA node would probably be enough for "normal" TBAA). https://llvm.org/docs/LangRef.html#tbaa-metadata does not really talk about memcopy/memove (except for the tbaa.struct, that seems to be a quite different metadata node).

Right :(
Then the best we can do with the current TBAA scheme is what Tom described, i.e. tagging fir.copy with the common parent sub-root of the TBAA tree.

[flang] introduce fir.copy to avoid load store of aggregates

c54efad

jeanPerier requested review from clementval, tblah and vzakhari March 7, 2025 14:50

llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:codegen labels Mar 7, 2025

kiranchandramohan reviewed Mar 7, 2025

View reviewed changes

clementval approved these changes Mar 7, 2025

View reviewed changes

tblah approved these changes Mar 7, 2025

View reviewed changes

add predicate to strengthen verifier

adb58ab

kiranchandramohan approved these changes Mar 10, 2025

View reviewed changes

remove duplicate check in verifier

420f3ed

jeanPerier merged commit 1ddf180 into main Mar 11, 2025
9 of 11 checks passed

jeanPerier deleted the users/jeanPerier/fir_copy branch March 11, 2025 08:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[flang] introduce fir.copy to avoid load store of aggregates #130289

[flang] introduce fir.copy to avoid load store of aggregates #130289

Uh oh!

jeanPerier commented Mar 7, 2025

Uh oh!

llvmbot commented Mar 7, 2025 •

edited

Loading

Uh oh!

kiranchandramohan Mar 7, 2025

Uh oh!

jeanPerier Mar 10, 2025

Uh oh!

clementval left a comment

Uh oh!

tblah left a comment

Uh oh!

vzakhari commented Mar 7, 2025

Uh oh!

kiranchandramohan left a comment

Uh oh!

jeanPerier commented Mar 10, 2025 •

edited

Loading

Uh oh!

vzakhari commented Mar 10, 2025

Uh oh!

Uh oh!

Uh oh!

[flang] introduce fir.copy to avoid load store of aggregates #130289

[flang] introduce fir.copy to avoid load store of aggregates #130289

Uh oh!

Conversation

jeanPerier commented Mar 7, 2025

Uh oh!

llvmbot commented Mar 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kiranchandramohan Mar 7, 2025

Choose a reason for hiding this comment

Uh oh!

jeanPerier Mar 10, 2025

Choose a reason for hiding this comment

Uh oh!

clementval left a comment

Choose a reason for hiding this comment

Uh oh!

tblah left a comment

Choose a reason for hiding this comment

Uh oh!

vzakhari commented Mar 7, 2025

Uh oh!

kiranchandramohan left a comment

Choose a reason for hiding this comment

Uh oh!

jeanPerier commented Mar 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vzakhari commented Mar 10, 2025

Uh oh!

Uh oh!

Uh oh!

llvmbot commented Mar 7, 2025 •

edited

Loading

jeanPerier commented Mar 10, 2025 •

edited

Loading