
[mlir][Linalg]: Optimize any structured linalg operation in transform::PromoteOp to avoid unnecessary copies #69876


Open · wants to merge 1 commit into main

Conversation

Contributor

@AviadCo AviadCo commented Oct 22, 2023

Before promotion, there is no need to copy outputs that are not considered init tensors.

Member

llvmbot commented Oct 22, 2023

@llvm/pr-subscribers-mlir-linalg

@llvm/pr-subscribers-mlir

Author: Aviad Cohen (AviadCo)

Changes

Before promotion, there is no need to copy outputs that are not considered init tensors.
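As an illustration (not part of the patch), consider a linalg.generic whose output block argument is never read in the payload. Such an output is not an init tensor, so its contents before the operation are irrelevant and copying them into the promoted buffer is wasted work:

    #map = affine_map<(d0) -> (d0)>
    // The output %out is fully overwritten: its block argument %b is unused,
    // so there is no need to copy %out into the promoted buffer beforehand.
    func.func @overwrite(%in: memref<128xf32>, %out: memref<128xf32>) {
      linalg.generic {indexing_maps = [#map, #map],
                      iterator_types = ["parallel"]}
          ins(%in : memref<128xf32>) outs(%out : memref<128xf32>) {
      ^bb0(%a: f32, %b: f32):
        linalg.yield %a : f32
      }
      return
    }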


Full diff: https://github.com/llvm/llvm-project/pull/69876.diff

4 Files Affected:

  • (modified) mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp (+2-4)
  • (modified) mlir/test/Dialect/Linalg/promote.mlir (+3-3)
  • (modified) mlir/test/Dialect/Linalg/promotion_options.mlir (+1-1)
  • (modified) mlir/test/Dialect/Linalg/transform-promotion.mlir (+1-1)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp b/mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp
index a131f3097666197..34f8bdf844b3fef 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp
@@ -177,10 +177,8 @@ LinalgOpInstancePromotionOptions::LinalgOpInstancePromotionOptions(
     Operation *op = opOperand.get().getDefiningOp();
     if (auto sv = dyn_cast_or_null<memref::SubViewOp>(op)) {
       subViews[operandNumber] = sv;
-      // In case of linalg generic, copy in only if subview is used in linalg
-      // payload.
-      if (!isa<linalg::GenericOp>(linalgOp) ||
-          linalgOp.payloadUsesValueFromOperand(&opOperand))
+      // Copy in only if subview is being used by the linalg operation.
+      if (linalgOp.isDpsInput(&opOperand) || !linalgOp.isInitTensor(&opOperand))
         operandsNumbersToCopyIn.insert(operandNumber);
       useFullTileBuffers[sv] = vUseFullTileBuffers[operandNumber];
     }
diff --git a/mlir/test/Dialect/Linalg/promote.mlir b/mlir/test/Dialect/Linalg/promote.mlir
index 31b29c0e105d99d..e97a2ca9cf8cb48 100644
--- a/mlir/test/Dialect/Linalg/promote.mlir
+++ b/mlir/test/Dialect/Linalg/promote.mlir
@@ -54,7 +54,7 @@ func.func @matmul_f32(%A: memref<?xi8>, %M: index, %N: index, %K: index) {
 
 //       CHECK:         memref.copy %[[vA]], %[[partialA]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
 //       CHECK:         memref.copy %[[vB]], %[[partialB]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
-//       CHECK:         memref.copy %[[vC]], %[[partialC]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
+//       CHECK-NOT:     memref.copy %[[vC]], %[[partialC]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
 //
 //       CHECK:         linalg.matmul ins(%[[partialA]], %[[partialB]]{{.*}} outs(%[[partialC]]
 //
@@ -124,7 +124,7 @@ func.func @matmul_f64(%A: memref<?xi8>, %M: index, %N: index, %K: index) {
 
 //       CHECK:         memref.copy %[[vA_f64]], %[[partialA_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>> to memref<?x?xf64, strided<[?, 1], offset: ?>>
 //       CHECK:         memref.copy %[[vB_f64]], %[[partialB_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>> to memref<?x?xf64, strided<[?, 1], offset: ?>>
-//       CHECK:         memref.copy %[[vC_f64]], %[[partialC_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>> to memref<?x?xf64, strided<[?, 1], offset: ?>>
+//       CHECK-NOT:     memref.copy %[[vC_f64]], %[[partialC_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>> to memref<?x?xf64, strided<[?, 1], offset: ?>>
 //
 //       CHECK:         linalg.matmul ins(%[[partialA_f64]], %[[partialB_f64]]{{.*}} outs(%[[partialC_f64]]
 //
@@ -255,7 +255,7 @@ func.func @promote_rank_reducing_subviews(%arg0:  memref<?x?x?x64xf32, strided<[
   // CHECK: %[[c_view:.+]] = memref.view
   // CHECK: %[[c_pro_subview:.+]] = memref.subview %[[c_view]]
 
-  // CHECK-COUNT-3: memref.copy
+  // CHECK-COUNT-2: memref.copy
   // CHECK: linalg.generic
   // CHECK-SAME: ins(%[[a_pro_subview]], %[[b_pro_subview]]
   // CHECK-SAME: outs(%[[c_pro_subview]]
diff --git a/mlir/test/Dialect/Linalg/promotion_options.mlir b/mlir/test/Dialect/Linalg/promotion_options.mlir
index a6daa9af2f37cec..9cce028df48bcb2 100644
--- a/mlir/test/Dialect/Linalg/promotion_options.mlir
+++ b/mlir/test/Dialect/Linalg/promotion_options.mlir
@@ -28,7 +28,7 @@ func.func @gemm(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?x?xf32>
 //      CHECK:       %[[svCC:.+]] = memref.subview %[[VC]]
 
 //      CHECK:       memref.copy %[[svA]], %[[svAA]]
-//      CHECK:       memref.copy %[[svC]], %[[svCC]]
+//      CHECK-NOT:   memref.copy %[[svC]], %[[svCC]]
 //      CHECK:       linalg.matmul ins(%[[VA]], %[[svB]]{{.*}} outs(%[[VC]]
 //      CHECK:       memref.copy %[[svCC]], %[[svC]]
 //      CHECK:       memref.dealloc %[[tmpA]]
diff --git a/mlir/test/Dialect/Linalg/transform-promotion.mlir b/mlir/test/Dialect/Linalg/transform-promotion.mlir
index 2f98e394fe05198..cd9d40863bc1a4d 100644
--- a/mlir/test/Dialect/Linalg/transform-promotion.mlir
+++ b/mlir/test/Dialect/Linalg/transform-promotion.mlir
@@ -53,7 +53,7 @@ func.func @promote_subview_matmul(%arg0: memref<?x?xf32, strided<[?, 1], offset:
 // CHECK-SAME:            memref<?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
 // CHECK:               memref.copy %[[s0]], %[[l0]] : memref<?x?xf32, strided{{.*}}> to memref<?x?xf32, strided{{.*}}>
 // CHECK:               memref.copy %[[s1]], %[[l1]] : memref<?x?xf32, strided{{.*}}> to memref<?x?xf32, strided{{.*}}>
-// CHECK:               memref.copy %[[s2]], %[[l2]] : memref<?x?xf32, strided{{.*}}> to memref<?x?xf32, strided{{.*}}>
+// CHECK-NOT:           memref.copy %[[s2]], %[[l2]] : memref<?x?xf32, strided{{.*}}> to memref<?x?xf32, strided{{.*}}>
 // CHECK:               linalg.matmul
 // CHECK-SAME:                 ins(%[[v0]], %[[v1]] : memref<?x?xf32>, memref<?x?xf32>)
 // CHECK-SAME:                outs(%[[v2]] : memref<?x?xf32>)

Contributor Author

AviadCo commented Oct 22, 2023

@nicolasvasilache @chelini
This patch continues #68555

I went over the Linalg structured operations and the LinalgOp interface. I concluded that we only need to copy in operands that are inputs, or outputs that are considered init tensors, where isInitTensor is defined as:

    InterfaceMethod<
      /*desc=*/[{
        Return true if `opOperand` is an init tensor. This is true when it is
        an output tensor operand whose value is used in the payload region.
      }],
      /*retTy=*/"bool",
      /*methodName=*/"isInitTensor",
      /*args=*/(ins "OpOperand *":$opOperand),
      /*methodBody=*/"",
      /*defaultImplementation=*/[{
        if (!$_op.isDpsInit(opOperand))
          return false;
        return payloadUsesValueFromOperand(opOperand);
      }]
    >,
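By that definition, the output of a matmul-style operation is an init tensor, because the payload reads the previous value of the output to accumulate into it. A minimal sketch (written as a linalg.generic for illustration; not taken from the patch):

    #mapA = affine_map<(d0, d1, d2) -> (d0, d2)>
    #mapB = affine_map<(d0, d1, d2) -> (d2, d1)>
    #mapC = affine_map<(d0, d1, d2) -> (d0, d1)>
    func.func @matmul_like(%A: memref<?x?xf32>, %B: memref<?x?xf32>,
                           %C: memref<?x?xf32>) {
      linalg.generic {indexing_maps = [#mapA, #mapB, #mapC],
                      iterator_types = ["parallel", "parallel", "reduction"]}
          ins(%A, %B : memref<?x?xf32>, memref<?x?xf32>)
          outs(%C : memref<?x?xf32>) {
      ^bb0(%a: f32, %b: f32, %c: f32):
        // %c is read here, so C is an init tensor: its current contents must
        // be copied into any promoted buffer before the computation runs.
        %0 = arith.mulf %a, %b : f32
        %1 = arith.addf %c, %0 : f32
        linalg.yield %1 : f32
      }
      return
    }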

@AviadCo AviadCo requested a review from ftynse October 24, 2023 08:25
…::PromoteOp to avoid unnecessary copies

Before promotion, there is no need to copy outputs that are not
considered init tensors.
@AviadCo AviadCo force-pushed the linalg/is-read-write-op branch from ac98c7a to 6dcc864 on October 27, 2023 08:47
Contributor Author

AviadCo commented Oct 27, 2023

@nicolasvasilache @chelini ping

Contributor

@nicolasvasilache nicolasvasilache left a comment


I am not completely clear about this PR:

-      if (!isa<linalg::GenericOp>(linalgOp) ||
-          linalgOp.payloadUsesValueFromOperand(&opOperand))
+      // Copy in only if subview is being used by the linalg operation.
+      if (linalgOp.isDpsInput(&opOperand) || !linalgOp.isInitTensor(&opOperand))
Contributor

I think isDpsInput does not check payloadUsesValueFromOperand, AFAIR.
To achieve what you seem to be reaching for here, you probably want a condition that does both: payloadUsesValueFromOperand for inputs and !linalgOp.isInitTensor(&opOperand) for the other operands.

In any case, you should also have a test where one of the inputs is ignored, I think.

Contributor Author

@nicolasvasilache thanks for the response.
Unfortunately, we cannot rely on payloadUsesValueFromOperand for inputs: in the general case (as with CopyOp), the input does not appear in the payload, but we still need to promote it.
I believe that, as things stand, we must promote every input and every output that is considered an init tensor.
For an unused input on a GenericOp, the user may first apply populateEraseUnnecessaryInputsPatterns (maybe worth a transform? I can add one if you think it is useful). What do you think?
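For reference, a hypothetical GenericOp with an ignored input, of the kind populateEraseUnnecessaryInputsPatterns is meant to clean up (names are illustrative):

    #map = affine_map<(d0) -> (d0)>
    func.func @unused_input(%a: memref<8xf32>, %unused: memref<8xf32>,
                            %out: memref<8xf32>) {
      linalg.generic {indexing_maps = [#map, #map, #map],
                      iterator_types = ["parallel"]}
          ins(%a, %unused : memref<8xf32>, memref<8xf32>)
          outs(%out : memref<8xf32>) {
      ^bb0(%x: f32, %y: f32, %o: f32):
        // %y is never read, yet as an input it would still be copied in;
        // erasing the dead input first avoids that copy.
        linalg.yield %x : f32
      }
      return
    }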

@@ -54,7 +54,8 @@ func.func @matmul_f32(%A: memref<?xi8>, %M: index, %N: index, %K: index) {

 // CHECK: linalg.copy ins(%[[vA]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialA]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
 // CHECK: linalg.copy ins(%[[vB]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialB]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
-// CHECK: linalg.copy ins(%[[vC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
+// CHECK-NOT: linalg.copy ins(%[[vC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
Contributor

This seems wrong to me... if you don't copy into the local buffer, the local buffer contains garbage and you run linalg.matmul on it; the conditions seem off somewhere.
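For context, promotion normally emits a copy-in/compute/copy-out sequence around the promoted buffer; a simplified sketch with a fixed tile size (names and shapes hypothetical):

    func.func @promoted_c_tile(%vC: memref<4x4xf32, strided<[?, 1], offset: ?>>,
                               %partialA: memref<4x4xf32>,
                               %partialB: memref<4x4xf32>) {
      %alloc = memref.alloc() : memref<4x4xf32>
      // Copy-in: seeds the local buffer with the current values of C. If it
      // is skipped, matmul accumulates into uninitialized memory.
      memref.copy %vC, %alloc
          : memref<4x4xf32, strided<[?, 1], offset: ?>> to memref<4x4xf32>
      linalg.matmul ins(%partialA, %partialB : memref<4x4xf32>, memref<4x4xf32>)
                    outs(%alloc : memref<4x4xf32>)
      // Copy-out: writes the computed tile back to the original subview.
      memref.copy %alloc, %vC
          : memref<4x4xf32> to memref<4x4xf32, strided<[?, 1], offset: ?>>
      memref.dealloc %alloc : memref<4x4xf32>
      return
    }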

Contributor Author

Why do we need to promote the output? Does matmul use the output in its calculation? If so, I believe we want it to be an init tensor.
