
[mlir][Linalg]: Optimize any structured linalg operation in transform::PromoteOp to avoid unnecessary copies #69876


Open · wants to merge 1 commit into main

Conversation

Contributor

@AviadCo AviadCo commented Oct 22, 2023

Before promotion, there is no need to copy outputs that are not considered init tensors.

Member

llvmbot commented Oct 22, 2023

@llvm/pr-subscribers-mlir-linalg

@llvm/pr-subscribers-mlir

Author: Aviad Cohen (AviadCo)

Changes

Before promotion, there is no need to copy outputs that are not considered init tensors.
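As an illustration (not part of the patch), consider a linalg.generic whose output block argument is never read in the payload. Such an output is not an init tensor, so its contents before the operation are irrelevant and copying them into the promoted buffer is wasted work:

    #map = affine_map<(d0) -> (d0)>
    // The output %out is fully overwritten: its block argument %b is unused,
    // so there is no need to copy %out into the promoted buffer beforehand.
    func.func @overwrite(%in: memref<128xf32>, %out: memref<128xf32>) {
      linalg.generic {indexing_maps = [#map, #map],
                      iterator_types = ["parallel"]}
          ins(%in : memref<128xf32>) outs(%out : memref<128xf32>) {
      ^bb0(%a: f32, %b: f32):
        linalg.yield %a : f32
      }
      return
    }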


Full diff: https://github.com/llvm/llvm-project/pull/69876.diff

4 Files Affected:

  • (modified) mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp (+2-4)
  • (modified) mlir/test/Dialect/Linalg/promote.mlir (+3-3)
  • (modified) mlir/test/Dialect/Linalg/promotion_options.mlir (+1-1)
  • (modified) mlir/test/Dialect/Linalg/transform-promotion.mlir (+1-1)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp b/mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp
index a131f3097666197..34f8bdf844b3fef 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp
@@ -177,10 +177,8 @@ LinalgOpInstancePromotionOptions::LinalgOpInstancePromotionOptions(
     Operation *op = opOperand.get().getDefiningOp();
     if (auto sv = dyn_cast_or_null<memref::SubViewOp>(op)) {
       subViews[operandNumber] = sv;
-      // In case of linalg generic, copy in only if subview is used in linalg
-      // payload.
-      if (!isa<linalg::GenericOp>(linalgOp) ||
-          linalgOp.payloadUsesValueFromOperand(&opOperand))
+      // Copy in only if subview is being used by the linalg operation.
+      if (linalgOp.isDpsInput(&opOperand) || !linalgOp.isInitTensor(&opOperand))
         operandsNumbersToCopyIn.insert(operandNumber);
       useFullTileBuffers[sv] = vUseFullTileBuffers[operandNumber];
     }
diff --git a/mlir/test/Dialect/Linalg/promote.mlir b/mlir/test/Dialect/Linalg/promote.mlir
index 31b29c0e105d99d..e97a2ca9cf8cb48 100644
--- a/mlir/test/Dialect/Linalg/promote.mlir
+++ b/mlir/test/Dialect/Linalg/promote.mlir
@@ -54,7 +54,7 @@ func.func @matmul_f32(%A: memref<?xi8>, %M: index, %N: index, %K: index) {
 
 //       CHECK:         memref.copy %[[vA]], %[[partialA]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
 //       CHECK:         memref.copy %[[vB]], %[[partialB]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
-//       CHECK:         memref.copy %[[vC]], %[[partialC]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
+//       CHECK-NOT:     memref.copy %[[vC]], %[[partialC]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
 //
 //       CHECK:         linalg.matmul ins(%[[partialA]], %[[partialB]]{{.*}} outs(%[[partialC]]
 //
@@ -124,7 +124,7 @@ func.func @matmul_f64(%A: memref<?xi8>, %M: index, %N: index, %K: index) {
 
 //       CHECK:         memref.copy %[[vA_f64]], %[[partialA_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>> to memref<?x?xf64, strided<[?, 1], offset: ?>>
 //       CHECK:         memref.copy %[[vB_f64]], %[[partialB_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>> to memref<?x?xf64, strided<[?, 1], offset: ?>>
-//       CHECK:         memref.copy %[[vC_f64]], %[[partialC_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>> to memref<?x?xf64, strided<[?, 1], offset: ?>>
+//       CHECK-NOT:     memref.copy %[[vC_f64]], %[[partialC_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>> to memref<?x?xf64, strided<[?, 1], offset: ?>>
 //
 //       CHECK:         linalg.matmul ins(%[[partialA_f64]], %[[partialB_f64]]{{.*}} outs(%[[partialC_f64]]
 //
@@ -255,7 +255,7 @@ func.func @promote_rank_reducing_subviews(%arg0:  memref<?x?x?x64xf32, strided<[
   // CHECK: %[[c_view:.+]] = memref.view
   // CHECK: %[[c_pro_subview:.+]] = memref.subview %[[c_view]]
 
-  // CHECK-COUNT-3: memref.copy
+  // CHECK-COUNT-2: memref.copy
   // CHECK: linalg.generic
   // CHECK-SAME: ins(%[[a_pro_subview]], %[[b_pro_subview]]
   // CHECK-SAME: outs(%[[c_pro_subview]]
diff --git a/mlir/test/Dialect/Linalg/promotion_options.mlir b/mlir/test/Dialect/Linalg/promotion_options.mlir
index a6daa9af2f37cec..9cce028df48bcb2 100644
--- a/mlir/test/Dialect/Linalg/promotion_options.mlir
+++ b/mlir/test/Dialect/Linalg/promotion_options.mlir
@@ -28,7 +28,7 @@ func.func @gemm(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?x?xf32>
 //      CHECK:       %[[svCC:.+]] = memref.subview %[[VC]]
 
 //      CHECK:       memref.copy %[[svA]], %[[svAA]]
-//      CHECK:       memref.copy %[[svC]], %[[svCC]]
+//      CHECK-NOT:   memref.copy %[[svC]], %[[svCC]]
 //      CHECK:       linalg.matmul ins(%[[VA]], %[[svB]]{{.*}} outs(%[[VC]]
 //      CHECK:       memref.copy %[[svCC]], %[[svC]]
 //      CHECK:       memref.dealloc %[[tmpA]]
diff --git a/mlir/test/Dialect/Linalg/transform-promotion.mlir b/mlir/test/Dialect/Linalg/transform-promotion.mlir
index 2f98e394fe05198..cd9d40863bc1a4d 100644
--- a/mlir/test/Dialect/Linalg/transform-promotion.mlir
+++ b/mlir/test/Dialect/Linalg/transform-promotion.mlir
@@ -53,7 +53,7 @@ func.func @promote_subview_matmul(%arg0: memref<?x?xf32, strided<[?, 1], offset:
 // CHECK-SAME:            memref<?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
 // CHECK:               memref.copy %[[s0]], %[[l0]] : memref<?x?xf32, strided{{.*}}> to memref<?x?xf32, strided{{.*}}>
 // CHECK:               memref.copy %[[s1]], %[[l1]] : memref<?x?xf32, strided{{.*}}> to memref<?x?xf32, strided{{.*}}>
-// CHECK:               memref.copy %[[s2]], %[[l2]] : memref<?x?xf32, strided{{.*}}> to memref<?x?xf32, strided{{.*}}>
+// CHECK-NOT:           memref.copy %[[s2]], %[[l2]] : memref<?x?xf32, strided{{.*}}> to memref<?x?xf32, strided{{.*}}>
 // CHECK:               linalg.matmul
 // CHECK-SAME:                 ins(%[[v0]], %[[v1]] : memref<?x?xf32>, memref<?x?xf32>)
 // CHECK-SAME:                outs(%[[v2]] : memref<?x?xf32>)

Contributor Author

AviadCo commented Oct 22, 2023

@nicolasvasilache @chelini
This patch continues #68555

I went over the Linalg structured operations and the LinalgOp interface. I concluded that we only need to copy in operands that are inputs, or outputs that are considered init tensors, where isInitTensor is defined as:

    InterfaceMethod<
      /*desc=*/[{
        Return true if `opOperand` is an init tensor. This is true when it is
        an output tensor operand whose value is used in the payload region.
      }],
      /*retTy=*/"bool",
      /*methodName=*/"isInitTensor",
      /*args=*/(ins "OpOperand *":$opOperand),
      /*methodBody=*/"",
      /*defaultImplementation=*/[{
        if (!$_op.isDpsInit(opOperand))
          return false;
        return payloadUsesValueFromOperand(opOperand);
      }]
    >,
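By that definition, the output of a matmul-style operation is an init tensor, because the payload reads the previous value of the output to accumulate into it. A minimal sketch (written as a linalg.generic for illustration; not taken from the patch):

    #mapA = affine_map<(d0, d1, d2) -> (d0, d2)>
    #mapB = affine_map<(d0, d1, d2) -> (d2, d1)>
    #mapC = affine_map<(d0, d1, d2) -> (d0, d1)>
    func.func @matmul_like(%A: memref<?x?xf32>, %B: memref<?x?xf32>,
                           %C: memref<?x?xf32>) {
      linalg.generic {indexing_maps = [#mapA, #mapB, #mapC],
                      iterator_types = ["parallel", "parallel", "reduction"]}
          ins(%A, %B : memref<?x?xf32>, memref<?x?xf32>)
          outs(%C : memref<?x?xf32>) {
      ^bb0(%a: f32, %b: f32, %c: f32):
        // %c is read here, so C is an init tensor: its current contents must
        // be copied into any promoted buffer before the computation runs.
        %0 = arith.mulf %a, %b : f32
        %1 = arith.addf %c, %0 : f32
        linalg.yield %1 : f32
      }
      return
    }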

@AviadCo AviadCo requested a review from ftynse October 24, 2023 08:25
…::PromoteOp to avoid unnecessary copies

Before promotion, there is no need to copy outputs that are not
considered init tensors.
@AviadCo AviadCo force-pushed the linalg/is-read-write-op branch from ac98c7a to 6dcc864 on October 27, 2023 08:47
Contributor Author

AviadCo commented Oct 27, 2023

@nicolasvasilache @chelini ping

Contributor

@nicolasvasilache nicolasvasilache left a comment


I am not completely clear about this PR:

-      if (!isa<linalg::GenericOp>(linalgOp) ||
-          linalgOp.payloadUsesValueFromOperand(&opOperand))
+      // Copy in only if subview is being used by the linalg operation.
+      if (linalgOp.isDpsInput(&opOperand) || !linalgOp.isInitTensor(&opOperand))
Contributor

I think isDpsInput does not check payloadUsesValueFromOperand, AFAIR.
To achieve what you seem to be reaching for here, you probably want a condition that does both: payloadUsesValueFromOperand for inputs and !linalgOp.isInitTensor(&opOperand) for the other operands.

In any case, you should also have a test where one of the inputs is ignored, I think.

Contributor Author

@nicolasvasilache thanks for the response.
Unfortunately, we cannot rely on payloadUsesValueFromOperand for inputs: in the general case (as with CopyOp), the input does not appear in the payload, but we still need to promote it.
I believe that, as things stand, we must promote every input and every output that is considered an init tensor.
For an unused input on a GenericOp, the user may first apply populateEraseUnnecessaryInputsPatterns (maybe worth a transform? I can add one if you think it is useful). What do you think?
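For reference, a hypothetical GenericOp with an ignored input, of the kind populateEraseUnnecessaryInputsPatterns is meant to clean up (names are illustrative):

    #map = affine_map<(d0) -> (d0)>
    func.func @unused_input(%a: memref<8xf32>, %unused: memref<8xf32>,
                            %out: memref<8xf32>) {
      linalg.generic {indexing_maps = [#map, #map, #map],
                      iterator_types = ["parallel"]}
          ins(%a, %unused : memref<8xf32>, memref<8xf32>)
          outs(%out : memref<8xf32>) {
      ^bb0(%x: f32, %y: f32, %o: f32):
        // %y is never read, yet as an input it would still be copied in;
        // erasing the dead input first avoids that copy.
        linalg.yield %x : f32
      }
      return
    }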

@@ -54,7 +54,8 @@ func.func @matmul_f32(%A: memref<?xi8>, %M: index, %N: index, %K: index) {

 // CHECK: linalg.copy ins(%[[vA]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialA]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
 // CHECK: linalg.copy ins(%[[vB]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialB]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
-// CHECK: linalg.copy ins(%[[vC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
+// CHECK-NOT: linalg.copy ins(%[[vC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
Contributor

This seems wrong to me... if you don't copy into the local buffer, the local buffer contains garbage and you run linalg.matmul on it; the conditions seem off somewhere.
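For context, promotion normally emits a copy-in/compute/copy-out sequence around the promoted buffer; a simplified sketch with a fixed tile size (names and shapes hypothetical):

    func.func @promoted_c_tile(%vC: memref<4x4xf32, strided<[?, 1], offset: ?>>,
                               %partialA: memref<4x4xf32>,
                               %partialB: memref<4x4xf32>) {
      %alloc = memref.alloc() : memref<4x4xf32>
      // Copy-in: seeds the local buffer with the current values of C. If it
      // is skipped, matmul accumulates into uninitialized memory.
      memref.copy %vC, %alloc
          : memref<4x4xf32, strided<[?, 1], offset: ?>> to memref<4x4xf32>
      linalg.matmul ins(%partialA, %partialB : memref<4x4xf32>, memref<4x4xf32>)
                    outs(%alloc : memref<4x4xf32>)
      // Copy-out: writes the computed tile back to the original subview.
      memref.copy %alloc, %vC
          : memref<4x4xf32> to memref<4x4xf32, strided<[?, 1], offset: ?>>
      memref.dealloc %alloc : memref<4x4xf32>
      return
    }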

Contributor Author

Why do we need to promote the output? Does matmul use the output in its calculation? If so, I believe we want it to be an init tensor.
