[mlir][Linalg]: Optimize any structured linalg operation in transform::PromoteOp to avoid unnecessary copies #69876
Conversation
@llvm/pr-subscribers-mlir-linalg @llvm/pr-subscribers-mlir
Author: Aviad Cohen (AviadCo)
Changes: Before promotion, there is no need to copy outputs that are not considered init tensors.
Full diff: https://github.com/llvm/llvm-project/pull/69876.diff
4 Files Affected:
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp b/mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp
index a131f3097666197..34f8bdf844b3fef 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Promotion.cpp
@@ -177,10 +177,8 @@ LinalgOpInstancePromotionOptions::LinalgOpInstancePromotionOptions(
Operation *op = opOperand.get().getDefiningOp();
if (auto sv = dyn_cast_or_null<memref::SubViewOp>(op)) {
subViews[operandNumber] = sv;
- // In case of linalg generic, copy in only if subview is used in linalg
- // payload.
- if (!isa<linalg::GenericOp>(linalgOp) ||
- linalgOp.payloadUsesValueFromOperand(&opOperand))
+ // Copy in only if subview is being used by the linalg operation.
+ if (linalgOp.isDpsInput(&opOperand) || !linalgOp.isInitTensor(&opOperand))
operandsNumbersToCopyIn.insert(operandNumber);
useFullTileBuffers[sv] = vUseFullTileBuffers[operandNumber];
}
diff --git a/mlir/test/Dialect/Linalg/promote.mlir b/mlir/test/Dialect/Linalg/promote.mlir
index 31b29c0e105d99d..e97a2ca9cf8cb48 100644
--- a/mlir/test/Dialect/Linalg/promote.mlir
+++ b/mlir/test/Dialect/Linalg/promote.mlir
@@ -54,7 +54,7 @@ func.func @matmul_f32(%A: memref<?xi8>, %M: index, %N: index, %K: index) {
// CHECK: memref.copy %[[vA]], %[[partialA]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
// CHECK: memref.copy %[[vB]], %[[partialB]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
-// CHECK: memref.copy %[[vC]], %[[partialC]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
+// CHECK-NOT: memref.copy %[[vC]], %[[partialC]] : memref<?x?xf32, strided<[?, 1], offset: ?>> to memref<?x?xf32, strided<[?, 1], offset: ?>>
//
// CHECK: linalg.matmul ins(%[[partialA]], %[[partialB]]{{.*}} outs(%[[partialC]]
//
@@ -124,7 +124,7 @@ func.func @matmul_f64(%A: memref<?xi8>, %M: index, %N: index, %K: index) {
// CHECK: memref.copy %[[vA_f64]], %[[partialA_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>> to memref<?x?xf64, strided<[?, 1], offset: ?>>
// CHECK: memref.copy %[[vB_f64]], %[[partialB_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>> to memref<?x?xf64, strided<[?, 1], offset: ?>>
-// CHECK: memref.copy %[[vC_f64]], %[[partialC_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>> to memref<?x?xf64, strided<[?, 1], offset: ?>>
+// CHECK-NOT: memref.copy %[[vC_f64]], %[[partialC_f64]] : memref<?x?xf64, strided<[?, 1], offset: ?>> to memref<?x?xf64, strided<[?, 1], offset: ?>>
//
// CHECK: linalg.matmul ins(%[[partialA_f64]], %[[partialB_f64]]{{.*}} outs(%[[partialC_f64]]
//
@@ -255,7 +255,7 @@ func.func @promote_rank_reducing_subviews(%arg0: memref<?x?x?x64xf32, strided<[
// CHECK: %[[c_view:.+]] = memref.view
// CHECK: %[[c_pro_subview:.+]] = memref.subview %[[c_view]]
- // CHECK-COUNT-3: memref.copy
+ // CHECK-COUNT-2: memref.copy
// CHECK: linalg.generic
// CHECK-SAME: ins(%[[a_pro_subview]], %[[b_pro_subview]]
// CHECK-SAME: outs(%[[c_pro_subview]]
diff --git a/mlir/test/Dialect/Linalg/promotion_options.mlir b/mlir/test/Dialect/Linalg/promotion_options.mlir
index a6daa9af2f37cec..9cce028df48bcb2 100644
--- a/mlir/test/Dialect/Linalg/promotion_options.mlir
+++ b/mlir/test/Dialect/Linalg/promotion_options.mlir
@@ -28,7 +28,7 @@ func.func @gemm(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?x?xf32>
// CHECK: %[[svCC:.+]] = memref.subview %[[VC]]
// CHECK: memref.copy %[[svA]], %[[svAA]]
-// CHECK: memref.copy %[[svC]], %[[svCC]]
+// CHECK-NOT: memref.copy %[[svC]], %[[svCC]]
// CHECK: linalg.matmul ins(%[[VA]], %[[svB]]{{.*}} outs(%[[VC]]
// CHECK: memref.copy %[[svCC]], %[[svC]]
// CHECK: memref.dealloc %[[tmpA]]
diff --git a/mlir/test/Dialect/Linalg/transform-promotion.mlir b/mlir/test/Dialect/Linalg/transform-promotion.mlir
index 2f98e394fe05198..cd9d40863bc1a4d 100644
--- a/mlir/test/Dialect/Linalg/transform-promotion.mlir
+++ b/mlir/test/Dialect/Linalg/transform-promotion.mlir
@@ -53,7 +53,7 @@ func.func @promote_subview_matmul(%arg0: memref<?x?xf32, strided<[?, 1], offset:
// CHECK-SAME: memref<?x?xf32> to memref<?x?xf32, strided<[?, 1], offset: ?>>
// CHECK: memref.copy %[[s0]], %[[l0]] : memref<?x?xf32, strided{{.*}}> to memref<?x?xf32, strided{{.*}}>
// CHECK: memref.copy %[[s1]], %[[l1]] : memref<?x?xf32, strided{{.*}}> to memref<?x?xf32, strided{{.*}}>
-// CHECK: memref.copy %[[s2]], %[[l2]] : memref<?x?xf32, strided{{.*}}> to memref<?x?xf32, strided{{.*}}>
+// CHECK-NOT: memref.copy %[[s2]], %[[l2]] : memref<?x?xf32, strided{{.*}}> to memref<?x?xf32, strided{{.*}}>
// CHECK: linalg.matmul
// CHECK-SAME: ins(%[[v0]], %[[v1]] : memref<?x?xf32>, memref<?x?xf32>)
// CHECK-SAME: outs(%[[v2]] : memref<?x?xf32>)
@nicolasvasilache @chelini I went over the linalg structured operations and, in particular, over the following InterfaceMethod:

InterfaceMethod<
/*desc=*/[{
Return true if `opOperand` is an init tensor. This is true when it is
an output tensor operand whose value is used in the payload region.
}],
/*retTy=*/"bool",
/*methodName=*/"isInitTensor",
/*args=*/(ins "OpOperand *":$opOperand),
/*methodBody=*/"",
/*defaultImplementation=*/[{
if (!$_op.isDpsInit(opOperand))
return false;
return payloadUsesValueFromOperand(opOperand);
}]
>,
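To make the distinction concrete, here is a minimal sketch (op name, map, and shapes are invented for illustration) of an output operand that would not be an init tensor, because its block argument is never read in the payload:

#map = affine_map<(d0) -> (d0)>
func.func @square(%in: memref<8xf32>, %out: memref<8xf32>) {
  linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel"]}
      ins(%in : memref<8xf32>) outs(%out : memref<8xf32>) {
  ^bb0(%i: f32, %o: f32):
    // %o is never used in the payload, so payloadUsesValueFromOperand is
    // false for the output operand and isInitTensor returns false; per the
    // PR description, such an output would not need a copy-in before the op,
    // only the copy-out afterwards.
    %m = arith.mulf %i, %i : f32
    linalg.yield %m : f32
  }
  return
}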
…::PromoteOp to avoid unnecessary copies
Before promotion, there is no need to copy outputs that are not considered init tensors.
(force-pushed from ac98c7a to 6dcc864)
I am not completely clear about this PR:
if (!isa<linalg::GenericOp>(linalgOp) ||
    linalgOp.payloadUsesValueFromOperand(&opOperand))
// Copy in only if subview is being used by the linalg operation.
if (linalgOp.isDpsInput(&opOperand) || !linalgOp.isInitTensor(&opOperand))
I think isDpsInput does not check payloadUsesValueFromOperand, AFAIR.
To achieve what you seem to be reaching for here, you probably want a condition that does both: payloadUsesValueFromOperand for inputs and !linalgOp.isInitTensor(&opOperand) for other operands.
In any case, you should also have a test where one of the inputs is ignored, I think.
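For reference, such a test could look roughly like the sketch below (names and shapes are invented); the second input's block argument is never read, so payloadUsesValueFromOperand would be false for it:

#map = affine_map<(d0) -> (d0)>
func.func @ignored_input(%a: memref<8xf32>, %ignored: memref<8xf32>, %out: memref<8xf32>) {
  linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]}
      ins(%a, %ignored : memref<8xf32>, memref<8xf32>) outs(%out : memref<8xf32>) {
  ^bb0(%x: f32, %y: f32, %o: f32):
    // %y (from %ignored) is unused in the payload.
    linalg.yield %x : f32
  }
  return
}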
@nicolasvasilache thanks for the response.
Unfortunately, we can't rely on payloadUsesValueFromOperand for inputs.
In the general case (as in CopyOp), the input doesn't appear in the payload, but we still need to promote it.
I believe that, as things stand, we must promote every input and all outputs that are considered InitTensors.
For an unused input on a GenericOp, the user may apply populateEraseUnnecessaryInputsPatterns (maybe worth a transform? I can add one if you think it is useful). What do you think?
@@ -54,7 +54,8 @@ func.func @matmul_f32(%A: memref<?xi8>, %M: index, %N: index, %K: index) {
// CHECK: linalg.copy ins(%[[vA]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialA]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
// CHECK: linalg.copy ins(%[[vB]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialB]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
// CHECK: linalg.copy ins(%[[vC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
// CHECK-NOT: linalg.copy ins(%[[vC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>) outs(%[[partialC]] : memref<?x?xf32, strided<[?, 1], offset: ?>>)
This seems wrong to me... if you don't copy into the local buffer, the local buffer contains garbage when you run linalg.matmul on it; the conditions seem off somewhere.
Why do we need to promote the output? Does matmul use the output in its calculation? If so, I believe we want it to be an InitTensor.
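For context, linalg.matmul does read its output operand: its payload accumulates into it, roughly like the region sketched below (shown for illustration only). So for matmul the output qualifies as an init tensor, and the copy-in is still needed for correctness.

^bb0(%a: f32, %b: f32, %c: f32):
  %p = arith.mulf %a, %b : f32
  // %c, the value loaded from the output, is read here, so
  // payloadUsesValueFromOperand is true for the output operand.
  %s = arith.addf %c, %p : f32
  linalg.yield %s : f32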