[mlir][vector] Hoist transfer pairs when the source is AssumeAlignmentOp #144809
Conversation
llvm@ffb9bbf made memref::AssumeAlignmentOp a ViewLikeOp, which disables the hoisting support when AssumeAlignmentOp is present. Previously this was not an issue because the op did not have a result. Now that the op has a result, hoisting no longer works when transfer ops operate on AssumeAlignmentOp. Signed-off-by: hanhanW <[email protected]>
@llvm/pr-subscribers-mlir-linalg @llvm/pr-subscribers-mlir Author: Han-Chung Wang (hanhanW) Full diff: https://github.com/llvm/llvm-project/pull/144809.diff 3 Files Affected:
diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h
index 4edf432d9d97d..39c3451b1369f 100644
--- a/mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h
+++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h
@@ -27,8 +27,8 @@ namespace linalg {
/// dominated by the transfer_write (i.e. no aliasing between the write and
/// the read across the loop)
/// 4. The source operands for vector.transfer_{read|write} do not originate
-/// from Ops implementing ViewLikeOpInterface (to reduce the risk of
-/// aliasing).
+/// from ops implementing ViewLikeOpInterface (to reduce the risk of
+/// aliasing), except memref::AssumeAlignmentOp.
/// 5. If `verifyNonZeroTrip` is true, then the lower bound of the loop must
/// be statically smaller than the upper bound of the loop, guaranteeing that
/// the loop body will execute at least once.
@@ -39,8 +39,8 @@ namespace linalg {
///
/// TODO: To further improve hoisting opportunities, fold aliasing memref
/// operations into respective vector.transfer{read|write} operations and
-/// avoid using ops implementing ViewLikeOpInterface as the source for transfer
-/// Ops.
+/// avoid using ops implementing ViewLikeOpInterface, except
+/// memref::AssumeAlignmentOp, as the source for transfer ops.
///
/// WARNING: This hoisting does not model parallelism and is generally incorrect
/// when used on distributed loops with memref semantics!
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp b/mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp
index 707b63ff9335b..d473a1cff08cc 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp
@@ -21,6 +21,7 @@
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
+#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/Dialect/SCF/IR/SCF.h"
#include "mlir/Dialect/SCF/Utils/Utils.h"
#include "mlir/Dialect/Tensor/IR/Tensor.h"
@@ -303,7 +304,8 @@ void mlir::linalg::hoistRedundantVectorTransfers(Operation *root,
// 1. indices, vector type and permutation map are the same (i.e., the
// transfer_read/transfer_write ops are matching),
// 2. source operands for transfer.{read|write} do not originate from
- // Ops implementing ViewLikeOpInterface.
+ // ops implementing ViewLikeOpInterface, except
+ // memref::AssumeAlignmentOp.
// 3. no other operations in the loop access the same memref except
// for transfer_read/transfer_write accessing statically disjoint
// slices.
@@ -313,11 +315,13 @@ void mlir::linalg::hoistRedundantVectorTransfers(Operation *root,
return WalkResult::advance();
auto *source = transferRead.getBase().getDefiningOp();
- if (source && isa_and_nonnull<ViewLikeOpInterface>(source))
+ if (source && isa_and_nonnull<ViewLikeOpInterface>(source) &&
+ !isa<memref::AssumeAlignmentOp>(source))
return WalkResult::advance();
source = transferWrite.getBase().getDefiningOp();
- if (source && isa_and_nonnull<ViewLikeOpInterface>(source))
+ if (source && isa_and_nonnull<ViewLikeOpInterface>(source) &&
+ !isa<memref::AssumeAlignmentOp>(source))
return WalkResult::advance();
// TODO: may want to memoize this information for performance but it
diff --git a/mlir/test/Dialect/Linalg/hoisting.mlir b/mlir/test/Dialect/Linalg/hoisting.mlir
index 318edca73cce1..a71bd044d98c1 100644
--- a/mlir/test/Dialect/Linalg/hoisting.mlir
+++ b/mlir/test/Dialect/Linalg/hoisting.mlir
@@ -18,11 +18,13 @@ func.func @hoist_vector_transfer_pairs(
%val: index, %lb : index, %ub : index, %step: index, %cmp: i1) {
%c0 = arith.constant 0 : index
%cst = arith.constant 0.0 : f32
+ %assume_align = memref.assume_alignment %memref0, 64 : memref<?x?xf32>
// CHECK: vector.transfer_read %{{.*}} : memref<?x?xf32>, vector<1xf32>
-// CHECK: scf.for %[[I:.*]] = %[[LB]] to %[[UB]] step %[[STEP]] iter_args({{.*}}) -> (vector<1xf32>) {
+// CHECK: vector.transfer_read %{{.*}} : memref<?x?xf32>, vector<1xf32>
+// CHECK: scf.for %[[I:.*]] = %[[LB]] to %[[UB]] step %[[STEP]] iter_args({{.*}}) -> (vector<1xf32>, vector<1xf32>) {
// CHECK: vector.transfer_read %{{.*}} : memref<?x?xf32>, vector<2xf32>
-// CHECK: scf.for %[[J:.*]] = %[[LB]] to %[[UB]] step %[[STEP]] iter_args({{.*}}) -> (vector<1xf32>, vector<2xf32>) {
+// CHECK: scf.for %[[J:.*]] = %[[LB]] to %[[UB]] step %[[STEP]] iter_args({{.*}}) -> (vector<1xf32>, vector<2xf32>, vector<1xf32>) {
// CHECK: vector.transfer_read %{{.*}} : memref<?x?xf32>, vector<3xf32>
// CHECK: vector.transfer_read %{{.*}} : memref<?x?xf32>, vector<4xf32>
// CHECK: "some_crippling_use"(%[[MEMREF4]]) : (memref<?x?xf32>) -> ()
@@ -43,6 +45,7 @@ func.func @hoist_vector_transfer_pairs(
// CHECK: scf.yield {{.*}} : vector<1xf32>
// CHECK: }
// CHECK: vector.transfer_write %{{.*}} : vector<1xf32>, memref<?x?xf32>
+// CHECK: vector.transfer_write %{{.*}} : vector<1xf32>, memref<?x?xf32>
// CHECK: "unrelated_use"(%[[MEMREF1]]) : (memref<?x?xf32>) -> ()
scf.for %i = %lb to %ub step %step {
scf.for %j = %lb to %ub step %step {
@@ -53,6 +56,7 @@ func.func @hoist_vector_transfer_pairs(
"some_crippling_use"(%memref4) : (memref<?x?xf32>) -> ()
%r4 = vector.transfer_read %memref4[%c0, %c0], %cst: memref<?x?xf32>, vector<5xf32>
%r5 = vector.transfer_read %memref5[%c0, %c0], %cst: memref<?x?xf32>, vector<6xf32>
+ %r6 = vector.transfer_read %assume_align[%c0, %c0], %cst: memref<?x?xf32>, vector<1xf32>
"some_crippling_use"(%memref5) : (memref<?x?xf32>) -> ()
%u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>
%u1 = "some_use"(%r1) : (vector<2xf32>) -> vector<2xf32>
@@ -60,12 +64,14 @@ func.func @hoist_vector_transfer_pairs(
%u3 = "some_use"(%r3) : (vector<4xf32>) -> vector<4xf32>
%u4 = "some_use"(%r4) : (vector<5xf32>) -> vector<5xf32>
%u5 = "some_use"(%r5) : (vector<6xf32>) -> vector<6xf32>
+ %u6 = "some_use"(%r6) : (vector<1xf32>) -> vector<1xf32>
vector.transfer_write %u0, %memref1[%c0, %c0] : vector<1xf32>, memref<?x?xf32>
vector.transfer_write %u1, %memref0[%i, %i] : vector<2xf32>, memref<?x?xf32>
vector.transfer_write %u2, %memref2[%c0, %c0] : vector<3xf32>, memref<?x?xf32>
vector.transfer_write %u3, %memref3[%c0, %c0] : vector<4xf32>, memref<?x?xf32>
vector.transfer_write %u4, %memref4[%c0, %c0] : vector<5xf32>, memref<?x?xf32>
vector.transfer_write %u5, %memref5[%c0, %c0] : vector<6xf32>, memref<?x?xf32>
+ vector.transfer_write %u6, %assume_align[%c0, %c0] : vector<1xf32>, memref<?x?xf32>
"some_crippling_use"(%memref3) : (memref<?x?xf32>) -> ()
}
"unrelated_use"(%memref0) : (memref<?x?xf32>) -> ()
Isn't the new behaviour correct? I might be mistaken, but to me this creates an alias: %assume_align = memref.assume_alignment %memref0, 64 : memref<?x?xf32> Without proper alias analysis, the logic updated in this PR should behave conservatively, i.e. block hoisting. To me the lack of hoisting feels like quite an unfortunate side-effect of ffb9bbf. WDYT?
What alias-analysis is needed for doing the hoisting?
I'm not sure what the proper fix is, and suggestions are welcome. Let me unpack more context: ffb9bbf breaks a set of tests in our downstream project, because it disables hoisting on memrefs when memref.assume_alignment is involved. It means that we'll never be able to hoist vector transfers on memrefs without a fix. If the vectorization and hoisting happen at the tensor level, there are no issues. However, some compilers are not tensor-based; they can decide to do vectorization and hoisting on memrefs. Hoisting is an important optimization, so I think we need to support memref.assume_alignment somehow. The intention of memref.assume_alignment is to allow the compiler to generate more efficient code based on the alignment assumption; it returns the same buffer with the alignment assumption attached. I think it is a special ViewLikeOp. Should I check whether the source of assume_alignment implements ViewLikeOpInterface instead? The author mentioned that it's fine to remove ViewLikeOpInterface from the op, but I agree with you that it creates an alias. So maybe we want to keep the interface? #139521 (comment)
(waving hands here a bit…)
I've been experimenting with some simpler repros (the existing ones in "hoisting.mlir" are a bit too dense), and honestly, I'm seeing some confusing results. I've added a repro below with comments inline. Even with the original code, I'm puzzled by the read/write hoisting behavior:
func.func @example(
%mem: memref<?x?xf32>,
%lb : index, %ub : index, %step: index, %in: vector<1xf32>) {
%c0 = arith.constant 0 : index
%cst = arith.constant 0.0 : f32
%sv = memref.subview %mem[0, 0][1, 1][1, 1] : memref<?x?xf32> to memref<1x1xf32, strided<[?, 1]>>
scf.for %i = %lb to %ub step %step {
%r0 = vector.transfer_read %mem[%c0, %c0], %cst: memref<?x?xf32>, vector<1xf32>
%r1 = vector.transfer_read %sv[%c0, %c0], %cst: memref<1x1xf32, strided<[?, 1]>>, vector<1xf32>
%u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>
%u1 = "some_use"(%r1) : (vector<1xf32>) -> vector<1xf32>
vector.transfer_write %u0, %mem[%c0, %c0] : vector<1xf32>, memref<?x?xf32>
vector.transfer_write %u1, %sv[%c0, %c0] : vector<1xf32>, memref<1x1xf32, strided<[?, 1]>>
// NOTE: Using %in - some input argument.
// 1. Hoisting still happens with this uncommented
// vector.transfer_write %in, %sv[%c0, %c0] : vector<1xf32>, memref<1x1xf32, strided<[?, 1]>>
// 2. Hoisting does not happen with this uncommented
// vector.transfer_write %in, %mem[%c0, %c0] : vector<1xf32>, memref<?x?xf32>
}
"unrelated_use"(%mem) : (memref<?x?xf32>) -> ()
return
}
module attributes {transform.with_named_sequence} {
transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
%0 = transform.structured.match ops{["func.func"]} in %arg1
: (!transform.any_op) -> !transform.any_op
transform.structured.hoist_redundant_vector_transfers %0
: (!transform.any_op) -> !transform.any_op
transform.yield
}
}
And this is without even introducing memref.assume_alignment.
Totally agree, it feels like the consequences of that change weren't fully evaluated before merging.
To me the interface makes sense, yes. At least after adding a return value.
Right, but aren't we missing some kind of explicit "assume" mechanism here? Diego raised something similar (though in the context of vector masking). That's a topic I'm actively exploring, though it'll take time, and it may or may not translate cleanly to this case.
In general, I don't see an obvious fix. For context, this isn't really my area of deep expertise, and I've only skimmed through the original PR. That said, happy to keep digging and discussing!
I've posted two PRs to help move this forward. Let me know if anything here seems off; hopefully these changes make sense and help clarify the path forward.
@@ -313,11 +315,13 @@ void mlir::linalg::hoistRedundantVectorTransfers(Operation *root,
       return WalkResult::advance();

       auto *source = transferRead.getBase().getDefiningOp();
-      if (source && isa_and_nonnull<ViewLikeOpInterface>(source))
+      if (source && isa_and_nonnull<ViewLikeOpInterface>(source) &&
+          !isa<memref::AssumeAlignmentOp>(source))
The mem source of AssumeAlignmentOp may also be a view-like operation, so we should also check AssumeAlignmentOp's memref operand. I gave a similar PR at 144843 (and also a problem case in it).
Yes, we can do so. What I meant in #144809 (comment) is a similar idea. I did not make the change because we don't have an agreement yet.
I'm beginning to understand why we skipped view-like ops before: to avoid memref aliasing. But it seems that lines 340 and 347 are trying to solve this alias problem:
It looks like a bug. It's been a long time since I last looked at the code, so I'm quite outdated in this area.
I can't translate the idea to the
Reverting the change could be a pain, since I already landed some improvements upstream (#142425, #142358) and in downstream projects (iree-org/iree#20913, iree-org/iree#20984, iree-org/iree#20973, iree-org/iree#20925).
My repro is here, just in case it is helpful: iree-org/iree#20912 (comment) It is not IREE specific; you can run
Many thanks for spending your time here! I added you as a reviewer because the transformation hasn't been updated in a while, and you're one of the most recent contributors to this file. I don't have an obvious fix either. Can we add an option for now? E.g., we pass
I think this is a little different. IIUC, it is mainly checking whether there is overlap between reads and writes? It may be the place where we inject alias-analysis checks, though. We may start with a simple analysis that only allows assume_alignment, and improve it when there are other needs. So I think there are two approaches:
I'd like to go with (1) and can help with (2), e.g., by providing guidance. Note that the issue has been there for ~1 month, and I only work on these fixes in my 20% time. It'd take more time if we go with (2) directly, while I'd like to re-enable tests in our downstream projects. Our cooperative matrix support on the SPIR-V side has been disabled for a while whenever the assume_alignment feature is requested; it is only enabled without the assume_alignment feature.
I suggest going with (2) directly: inject a basic alias analysis and use it in the isDisjointTransferSet checks (or add another check function).