Skip to content

[flang] introduce fir.copy to avoid load store of aggregates #130289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 11, 2025

Conversation

jeanPerier
Copy link
Contributor

Introduce a FIR operation to do memcopy/memove of compile time constant size types.

This is to avoid requiring derived type copies to done with load/store which is badly supported in LLVM when the aggregate type is "big" (no threshold can easily be defined here, better to always avoid them for fir.type).

This was the root cause of the regressions caused by #114002 which introduced a load/store of fir.type<> which caused hand/asserts to fire in LLVM on several benchmarks.

See https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type

We could directly use LLVM dialect memmove and memcopy operations, but I feel this would be intrusive for lowering tests because of the target dependent byte size, and also the conversions to llvm pointer types are noisy in the IR, so instead, this patch adds a simple op that will lower to these LLVM intrinsics and compute the byte size as late as possible.

I think it would be great to add it the FirAliasTagOpInterface, but the TBAA pass currently does not support multiple operands, and I am not familiar enough with this to know how the TBAA from the source/destination should be merged (a relevant patch on the topic is this llvm patch that preserved TBAA information in the load/store to memcopy pass, it is probably a matter of doing something similar).

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:codegen labels Mar 7, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 7, 2025

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-codegen

Author: None (jeanPerier)

Changes

Introduce a FIR operation to do memcopy/memove of compile time constant size types.

This is to avoid requiring derived type copies to done with load/store which is badly supported in LLVM when the aggregate type is "big" (no threshold can easily be defined here, better to always avoid them for fir.type).

This was the root cause of the regressions caused by #114002 which introduced a load/store of fir.type<> which caused hand/asserts to fire in LLVM on several benchmarks.

See https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type

We could directly use LLVM dialect memmove and memcopy operations, but I feel this would be intrusive for lowering tests because of the target dependent byte size, and also the conversions to llvm pointer types are noisy in the IR, so instead, this patch adds a simple op that will lower to these LLVM intrinsics and compute the byte size as late as possible.

I think it would be great to add it the FirAliasTagOpInterface, but the TBAA pass currently does not support multiple operands, and I am not familiar enough with this to know how the TBAA from the source/destination should be merged (a relevant patch on the topic is this llvm patch that preserved TBAA information in the load/store to memcopy pass, it is probably a matter of doing something similar).


Full diff: https://github.com/llvm/llvm-project/pull/130289.diff

6 Files Affected:

  • (modified) flang/include/flang/Optimizer/Dialect/FIROps.td (+38)
  • (modified) flang/lib/Optimizer/CodeGen/CodeGen.cpp (+35-5)
  • (modified) flang/lib/Optimizer/Dialect/FIROps.cpp (+23)
  • (added) flang/test/Fir/copy-codegen.fir (+35)
  • (modified) flang/test/Fir/fir-ops.fir (+9)
  • (modified) flang/test/Fir/invalid.fir (+28)
diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td
index c83c57186b46d..74e5e1fd2d6be 100644
--- a/flang/include/flang/Optimizer/Dialect/FIROps.td
+++ b/flang/include/flang/Optimizer/Dialect/FIROps.td
@@ -342,6 +342,44 @@ def fir_StoreOp : fir_Op<"store", [FirAliasTagOpInterface]> {
   }];
 }
 
+def fir_CopyOp : fir_Op<"copy", []> {
+  let summary = "copy constant size memory";
+
+  let description = [{
+    Copy the memory from a source with compile time constant size to
+    a destination of the same type.
+
+    This is meant to be used for aggregate types where load and store
+    are not appropriate to make a copy because LLVM is not meant to
+    handle load and store of "big" aggregates.
+
+    Its "no_overlap" attribute allows indicating that the source and destination
+    are known to not overlap at compile time.
+
+    ```
+      !t =!fir.type<t{x:!fir.array<1000xi32>}>
+      fir.copy %x to %y : !fir.ref<!t>, !fir.ref<!t>
+    ```
+    TODO: add FirAliasTagOpInterface to carry TBAA.
+  }];
+
+  let arguments = (ins Arg<AnyReferenceLike, "", [MemRead]>:$source,
+                       Arg<AnyReferenceLike, "", [MemWrite]>:$destination,
+                       OptionalAttr<UnitAttr>:$no_overlap);
+
+  let builders = [OpBuilder<(ins "mlir::Value":$source,
+                                 "mlir::Value":$destination,
+                                  CArg<"bool", "false">:$no_overlap)>];
+
+  let assemblyFormat = [{
+    $source `to` $destination (`no_overlap` $no_overlap^)?
+    attr-dict `:` type(operands)
+  }];
+
+  let hasVerifier = 1;
+}
+
+
 def fir_SaveResultOp : fir_Op<"save_result", [AttrSizedOperandSegments]> {
   let summary = [{
     save an array, box, or record function result SSA-value to a memory location
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index a2743edd7844a..1d8372774ef9a 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -3539,6 +3539,36 @@ struct StoreOpConversion : public fir::FIROpConversion<fir::StoreOp> {
   }
 };
 
+/// `fir.copy` --> `llvm.memcpy` or `llvm.memmove`
+struct CopyOpConversion : public fir::FIROpConversion<fir::CopyOp> {
+  using FIROpConversion::FIROpConversion;
+
+  llvm::LogicalResult
+  matchAndRewrite(fir::CopyOp copy, OpAdaptor adaptor,
+                  mlir::ConversionPatternRewriter &rewriter) const override {
+    mlir::Location loc = copy.getLoc();
+    mlir::Value llvmSource = adaptor.getSource();
+    mlir::Value llvmDestination = adaptor.getDestination();
+    mlir::Type i64Ty = mlir::IntegerType::get(rewriter.getContext(), 64);
+    mlir::Type copyTy = fir::unwrapRefType(copy.getSource().getType());
+    mlir::Value copySize =
+        genTypeStrideInBytes(loc, i64Ty, rewriter, convertType(copyTy));
+
+    mlir::LLVM::AliasAnalysisOpInterface newOp;
+    if (copy.getNoOverlap())
+      newOp = rewriter.create<mlir::LLVM::MemcpyOp>(
+          loc, llvmDestination, llvmSource, copySize, /*isVolatile=*/false);
+    else
+      newOp = rewriter.create<mlir::LLVM::MemmoveOp>(
+          loc, llvmDestination, llvmSource, copySize, /*isVolatile=*/false);
+
+    // TODO: propagate TBAA once FirAliasTagOpInterface added to CopyOp.
+    attachTBAATag(newOp, copyTy, copyTy, nullptr);
+    rewriter.eraseOp(copy);
+    return mlir::success();
+  }
+};
+
 namespace {
 
 /// Convert `fir.unboxchar` into two `llvm.extractvalue` instructions. One for
@@ -4142,11 +4172,11 @@ void fir::populateFIRToLLVMConversionPatterns(
       BoxOffsetOpConversion, BoxProcHostOpConversion, BoxRankOpConversion,
       BoxTypeCodeOpConversion, BoxTypeDescOpConversion, CallOpConversion,
       CmpcOpConversion, ConvertOpConversion, CoordinateOpConversion,
-      DTEntryOpConversion, DeclareOpConversion, DivcOpConversion,
-      EmboxOpConversion, EmboxCharOpConversion, EmboxProcOpConversion,
-      ExtractValueOpConversion, FieldIndexOpConversion, FirEndOpConversion,
-      FreeMemOpConversion, GlobalLenOpConversion, GlobalOpConversion,
-      InsertOnRangeOpConversion, IsPresentOpConversion,
+      CopyOpConversion, DTEntryOpConversion, DeclareOpConversion,
+      DivcOpConversion, EmboxOpConversion, EmboxCharOpConversion,
+      EmboxProcOpConversion, ExtractValueOpConversion, FieldIndexOpConversion,
+      FirEndOpConversion, FreeMemOpConversion, GlobalLenOpConversion,
+      GlobalOpConversion, InsertOnRangeOpConversion, IsPresentOpConversion,
       LenParamIndexOpConversion, LoadOpConversion, MulcOpConversion,
       NegcOpConversion, NoReassocOpConversion, SelectCaseOpConversion,
       SelectOpConversion, SelectRankOpConversion, SelectTypeOpConversion,
diff --git a/flang/lib/Optimizer/Dialect/FIROps.cpp b/flang/lib/Optimizer/Dialect/FIROps.cpp
index 7efb733eb565c..059514cbb0183 100644
--- a/flang/lib/Optimizer/Dialect/FIROps.cpp
+++ b/flang/lib/Optimizer/Dialect/FIROps.cpp
@@ -3940,6 +3940,29 @@ void fir::StoreOp::build(mlir::OpBuilder &builder, mlir::OperationState &result,
   build(builder, result, value, memref, {});
 }
 
+//===----------------------------------------------------------------------===//
+// CopyOp
+//===----------------------------------------------------------------------===//
+
+void fir::CopyOp::build(mlir::OpBuilder &builder, mlir::OperationState &result,
+                        mlir::Value source, mlir::Value destination,
+                        bool noOverlap) {
+  mlir::UnitAttr noOverlapAttr =
+      noOverlap ? builder.getUnitAttr() : mlir::UnitAttr{};
+  build(builder, result, source, destination, noOverlapAttr);
+}
+
+llvm::LogicalResult fir::CopyOp::verify() {
+  mlir::Type sourceType = fir::unwrapRefType(getSource().getType());
+  mlir::Type destinationType = fir::unwrapRefType(getDestination().getType());
+  if (sourceType != destinationType)
+    return emitOpError("source and destination must have the same value type");
+  if (fir::hasDynamicSize(sourceType))
+    return emitOpError(
+        "source value type must have a compile time constant size");
+  return mlir::success();
+}
+
 //===----------------------------------------------------------------------===//
 // StringLitOp
 //===----------------------------------------------------------------------===//
diff --git a/flang/test/Fir/copy-codegen.fir b/flang/test/Fir/copy-codegen.fir
new file mode 100644
index 0000000000000..eef1885c6a49c
--- /dev/null
+++ b/flang/test/Fir/copy-codegen.fir
@@ -0,0 +1,35 @@
+// Test fir.copy codegen.
+// RUN: fir-opt --fir-to-llvm-ir %s -o - | FileCheck %s
+
+!t=!fir.type<sometype{i:!fir.array<9xi32>}>
+
+module attributes {llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"} {
+
+func.func @test_copy_1(%arg0: !fir.ref<!t>, %arg1: !fir.ref<!t>) {
+  fir.copy %arg0 to %arg1 no_overlap : !fir.ref<!t>, !fir.ref<!t>
+  return
+}
+// CHECK-LABEL:   llvm.func @test_copy_1(
+// CHECK-SAME:                           %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !llvm.ptr,
+// CHECK-SAME:                           %[[VAL_1:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !llvm.ptr) {
+// CHECK:           %[[VAL_2:.*]] = llvm.mlir.zero : !llvm.ptr
+// CHECK:           %[[VAL_3:.*]] = llvm.getelementptr %[[VAL_2]][1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<"sometype", (array<9 x i32>)>
+// CHECK:           %[[VAL_4:.*]] = llvm.ptrtoint %[[VAL_3]] : !llvm.ptr to i64
+// CHECK:           "llvm.intr.memcpy"(%[[VAL_1]], %[[VAL_0]], %[[VAL_4]]) <{isVolatile = false}> : (!llvm.ptr, !llvm.ptr, i64) -> ()
+// CHECK:           llvm.return
+// CHECK:         }
+
+func.func @test_copy_2(%arg0: !fir.ref<!t>, %arg1: !fir.ref<!t>) {
+  fir.copy %arg0 to %arg1 : !fir.ref<!t>, !fir.ref<!t>
+  return
+}
+// CHECK-LABEL:   llvm.func @test_copy_2(
+// CHECK-SAME:                           %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !llvm.ptr,
+// CHECK-SAME:                           %[[VAL_1:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !llvm.ptr) {
+// CHECK:           %[[VAL_2:.*]] = llvm.mlir.zero : !llvm.ptr
+// CHECK:           %[[VAL_3:.*]] = llvm.getelementptr %[[VAL_2]][1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<"sometype", (array<9 x i32>)>
+// CHECK:           %[[VAL_4:.*]] = llvm.ptrtoint %[[VAL_3]] : !llvm.ptr to i64
+// CHECK:           "llvm.intr.memmove"(%[[VAL_1]], %[[VAL_0]], %[[VAL_4]]) <{isVolatile = false}> : (!llvm.ptr, !llvm.ptr, i64) -> ()
+// CHECK:           llvm.return
+// CHECK:         }
+}
diff --git a/flang/test/Fir/fir-ops.fir b/flang/test/Fir/fir-ops.fir
index 1bfcb3a9f3dc8..06b0bbbf0bd20 100644
--- a/flang/test/Fir/fir-ops.fir
+++ b/flang/test/Fir/fir-ops.fir
@@ -933,3 +933,12 @@ func.func @test_call_arg_attrs_indirect(%arg0: i16, %arg1: (i16)-> i16) -> i16 {
   %0 = fir.call %arg1(%arg0) : (i16 {llvm.noundef, llvm.signext}) -> (i16 {llvm.signext})
   return %0 : i16
 }
+
+// CHECK-LABEL: @test_copy(
+// CHECK-SAME:    %[[VAL_0:.*]]: !fir.ref<!fir.type<sometype{i:i32}>>,
+// CHECK-SAME:    %[[VAL_1:.*]]: !fir.ptr<!fir.type<sometype{i:i32}>>
+func.func @test_copy(%arg0: !fir.ref<!fir.type<sometype{i:i32}>>, %arg1: !fir.ptr<!fir.type<sometype{i:i32}>>) {
+  // CHECK: fir.copy %[[VAL_0]] to %[[VAL_1]] no_overlap : !fir.ref<!fir.type<sometype{i:i32}>>, !fir.ptr<!fir.type<sometype{i:i32}>>
+  fir.copy %arg0 to %arg1 no_overlap : !fir.ref<!fir.type<sometype{i:i32}>>, !fir.ptr<!fir.type<sometype{i:i32}>>
+  return
+}
diff --git a/flang/test/Fir/invalid.fir b/flang/test/Fir/invalid.fir
index 7e3f9d6498412..0ef64b547f3e9 100644
--- a/flang/test/Fir/invalid.fir
+++ b/flang/test/Fir/invalid.fir
@@ -1018,3 +1018,31 @@ func.func @bad_is_assumed_size(%arg0: !fir.ref<!fir.array<*:none>>) {
   %1 = fir.is_assumed_size %arg0 : (!fir.ref<!fir.array<*:none>>) -> i1
   return
 }
+
+// -----
+
+!t=!fir.type<sometype{i:i32}>
+!t2=!fir.type<sometype2{j:i32}>
+func.func @bad_copy_1(%arg0: !fir.ref<!t>, %arg1: !fir.ref<!t2>) {
+  // expected-error@+1{{'fir.copy' op source and destination must have the same value type}}
+  fir.copy %arg0 to %arg1 no_overlap : !fir.ref<!t>, !fir.ref<!t2>
+  return
+}
+
+// -----
+
+!t=!fir.type<sometype{i:i32}>
+func.func @bad_copy_2(%arg0: !fir.ref<!t>, %arg1: !t) {
+  // expected-error@+1{{'fir.copy' op operand #0 must be any reference, but got '!fir.type<sometype{i:i32}>'}}
+  fir.copy %arg1 to %arg0 no_overlap : !t, !fir.ref<!t>
+  return
+}
+
+// -----
+
+!t=!fir.array<?xi32>
+func.func @test_copy(%arg0: !fir.ref<!t>, %arg1: !fir.ref<!t>) {
+  // expected-error@+1{{'fir.copy' op source value type must have a compile time constant size}}
+  fir.copy %arg0 to %arg1 no_overlap : !fir.ref<!t>, !fir.ref<!t>
+  return
+}

Comment on lines +352 to +354
This is meant to be used for aggregate types where load and store
are not appropriate to make a copy because LLVM is not meant to
handle load and store of "big" aggregates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add a verifier check or a predicate (AnyRefOfAggregateType) to ensure it is used only for aggregate types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, although there is nothing that would technically prevent it to be used for intrinsic types, it seems saner to prevent it. I added a predicate.

Copy link
Contributor

@clementval clementval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Kiran's comment might make sense to avoid using it on any type.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with Kiran's comment

As for TBAA, it is difficult to compare it to the fabricator diff you mentioned because we don't really use tbaa in the way it was intended. TBAA is supposed to say that different types can't alias e.g. "loads of integers don't overlap with loads of floats".

The property we use to implement Fortran aliasing rules is that with tags arranged in a tree structure, parents alias with children, but siblings to not alias. Tags from trees with different roots are assumed to alias.

From what I remember, the tree looks something like this

root (unique per function and scope)
- any access
   - data
     - global data
         - global a
         - global b
     - locally allocated data
       - local alloca a
       - local alloca b
     - dummy arg data
       - dummy arg a
       - dummy arg b
     - direct data
  - descriptor member

I think the best we can do here would be to find the common parent of the tag given to each argument. For example, if copying one dummy argument to another then this should be tagged as an argument because it could alias with any dummy argument but not a global. If copying a local allocation to a global variable then we would have to make it alias with any other data access (so it would still not alias with descriptor accesses).

To be clear I mean tagging as that parent node, not as a child of it.

@vzakhari
Copy link
Contributor

vzakhari commented Mar 7, 2025

I think while we are using TBAA the way Tom described we have to be able to attach two TBAA tags for the source and the destination of fir.copy, then in CodeGen we will be able to apply them to the corresponding load and store.

The AddAliasTagsPass will need to handle fir.copy specially by tracing both operands to their sources.

Copy link
Contributor

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @jeanPerier.

@jeanPerier
Copy link
Contributor Author

jeanPerier commented Mar 10, 2025

then in CodeGen we will be able to apply them to the corresponding load and store.

Codegen is generating a llvm.memcopy, not individual loads/stores, and I think one can only set a single TBAA tag for llvm.memcopy.
Maybe the tag can point to a metadata array? I am just not quite sure what are LLVM expectations regarding memcopy /memmove TBAA tags (in C/C++, you would probably be copying from/to entities of the same type, so a single TBAA node would probably be enough for "normal" TBAA). https://llvm.org/docs/LangRef.html#tbaa-metadata does not really talk about memcopy/memove (except for the tbaa.struct, that seems to be a quite different metadata node).

@vzakhari
Copy link
Contributor

Codegen is generating a llvm.memcopy, not individual loads/stores, and I think one can only set a single TBAA tag for llvm.memcopy. Maybe the tag can point to a metadata array? I am just not quite sure what are LLVM expectations regarding memcopy /memmove TBAA tags (in C/C++, you would probably be copying from/to entities of the same type, so a single TBAA node would probably be enough for "normal" TBAA). https://llvm.org/docs/LangRef.html#tbaa-metadata does not really talk about memcopy/memove (except for the tbaa.struct, that seems to be a quite different metadata node).

Right :(
Then the best we can do with the current TBAA scheme is what Tom described, i.e. tagging fir.copy with the common parent sub-root of the TBAA tree.

@jeanPerier jeanPerier merged commit 1ddf180 into main Mar 11, 2025
9 of 11 checks passed
@jeanPerier jeanPerier deleted the users/jeanPerier/fir_copy branch March 11, 2025 08:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:codegen flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants