Skip to content

[mlir][vector] Hoist transfer pairs when the source is AssumeAlignmentOp #144809

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hanhanW
Copy link
Contributor

@hanhanW hanhanW commented Jun 18, 2025

ffb9bbf makes memref::AssumeAlignmentOp be ViewLikeOp, which disables the hoisting support when AssumeAlignmentOp is present. In the past, it is not an issue because the op does not have a result. After the op has a result, the hoisting is not working if transfer ops operate on AssumeAlignmentOp.

llvm@ffb9bbf
makes memref::AssumeAlignmentOp be ViewLikeOp, which disables the
hoisting support when AssumeAlignmentOp is present. In the past, it is
not an issue because the op does not have a result. After the op has a
result, the hoisting is not working if transfer ops operate on
AssumeAlignmentOp.

Signed-off-by: hanhanW <[email protected]>
@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-mlir-linalg

@llvm/pr-subscribers-mlir

Author: Han-Chung Wang (hanhanW)

Changes

ffb9bbf makes memref::AssumeAlignmentOp be ViewLikeOp, which disables the hoisting support when AssumeAlignmentOp is present. In the past, it is not an issue because the op does not have a result. After the op has a result, the hoisting is not working if transfer ops operate on AssumeAlignmentOp.


Full diff: https://github.com/llvm/llvm-project/pull/144809.diff

3 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h (+4-4)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp (+7-3)
  • (modified) mlir/test/Dialect/Linalg/hoisting.mlir (+8-2)
diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h
index 4edf432d9d97d..39c3451b1369f 100644
--- a/mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h
+++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h
@@ -27,8 +27,8 @@ namespace linalg {
 ///   dominated by the transfer_write (i.e. no aliasing between the write and
 ///   the read across the loop)
 ///   4. The source operands for vector.transfer_{read|write} do not originate
-///   from Ops implementing ViewLikeOpInterface (to reduce the risk of
-///   aliasing).
+///   from ops implementing ViewLikeOpInterface (to reduce the risk of
+///   aliasing), except memref::AssumeAlignmentOp.
 ///   5. If `verifyNonZeroTrip` is true, then the lower bound of the loop must
 ///   be statically smaller than the upper bound of the loop, guaranteeing that
 ///   the loop body will execute at least once.
@@ -39,8 +39,8 @@ namespace linalg {
 ///
 /// TODO: To further improve hoisting opportunities, fold aliasing memref
 /// operations into respective vector.transfer{read|write} operations and
-/// avoid using ops implementing ViewLikeOpInterface as the source for transfer
-/// Ops.
+/// avoid using ops implementing ViewLikeOpInterface, except
+/// memref::AssumeAlignmentOp, as the source for transfer ops.
 ///
 /// WARNING: This hoisting does not model parallelism and is generally incorrect
 /// when used on distributed loops with memref semantics!
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp b/mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp
index 707b63ff9335b..d473a1cff08cc 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp
@@ -21,6 +21,7 @@
 #include "mlir/Dialect/Func/IR/FuncOps.h"
 #include "mlir/Dialect/Linalg/IR/Linalg.h"
 #include "mlir/Dialect/Linalg/Transforms/Transforms.h"
+#include "mlir/Dialect/MemRef/IR/MemRef.h"
 #include "mlir/Dialect/SCF/IR/SCF.h"
 #include "mlir/Dialect/SCF/Utils/Utils.h"
 #include "mlir/Dialect/Tensor/IR/Tensor.h"
@@ -303,7 +304,8 @@ void mlir::linalg::hoistRedundantVectorTransfers(Operation *root,
       //   1. indices, vector type and permutation map are the same (i.e., the
       //      transfer_read/transfer_write ops are matching),
       //   2. source operands for transfer.{read|write} do not originate from
-      //      Ops implementing ViewLikeOpInterface.
+      //      ops implementing ViewLikeOpInterface, except
+      //      memref::AssumeAlingmentOp.
       //   3. no other operations in the loop access the same memref except
       //      for transfer_read/transfer_write accessing statically disjoint
       //      slices.
@@ -313,11 +315,13 @@ void mlir::linalg::hoistRedundantVectorTransfers(Operation *root,
         return WalkResult::advance();
 
       auto *source = transferRead.getBase().getDefiningOp();
-      if (source && isa_and_nonnull<ViewLikeOpInterface>(source))
+      if (source && isa_and_nonnull<ViewLikeOpInterface>(source) &&
+          !isa<memref::AssumeAlignmentOp>(source))
         return WalkResult::advance();
 
       source = transferWrite.getBase().getDefiningOp();
-      if (source && isa_and_nonnull<ViewLikeOpInterface>(source))
+      if (source && isa_and_nonnull<ViewLikeOpInterface>(source) &&
+          !isa<memref::AssumeAlignmentOp>(source))
         return WalkResult::advance();
 
       // TODO: may want to memoize this information for performance but it
diff --git a/mlir/test/Dialect/Linalg/hoisting.mlir b/mlir/test/Dialect/Linalg/hoisting.mlir
index 318edca73cce1..a71bd044d98c1 100644
--- a/mlir/test/Dialect/Linalg/hoisting.mlir
+++ b/mlir/test/Dialect/Linalg/hoisting.mlir
@@ -18,11 +18,13 @@ func.func @hoist_vector_transfer_pairs(
     %val: index, %lb : index, %ub : index, %step: index, %cmp: i1) {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.0 : f32
+  %assume_align = memref.assume_alignment %memref0, 64 : memref<?x?xf32>
 
 // CHECK: vector.transfer_read %{{.*}} : memref<?x?xf32>, vector<1xf32>
-// CHECK: scf.for %[[I:.*]] = %[[LB]] to %[[UB]] step %[[STEP]] iter_args({{.*}}) -> (vector<1xf32>) {
+// CHECK: vector.transfer_read %{{.*}} : memref<?x?xf32>, vector<1xf32>
+// CHECK: scf.for %[[I:.*]] = %[[LB]] to %[[UB]] step %[[STEP]] iter_args({{.*}}) -> (vector<1xf32>, vector<1xf32>) {
 // CHECK:   vector.transfer_read %{{.*}} : memref<?x?xf32>, vector<2xf32>
-// CHECK:   scf.for %[[J:.*]] = %[[LB]] to %[[UB]] step %[[STEP]] iter_args({{.*}}) -> (vector<1xf32>, vector<2xf32>) {
+// CHECK:   scf.for %[[J:.*]] = %[[LB]] to %[[UB]] step %[[STEP]] iter_args({{.*}}) -> (vector<1xf32>, vector<2xf32>, vector<1xf32>) {
 // CHECK:     vector.transfer_read %{{.*}} : memref<?x?xf32>, vector<3xf32>
 // CHECK:     vector.transfer_read %{{.*}} : memref<?x?xf32>, vector<4xf32>
 // CHECK:     "some_crippling_use"(%[[MEMREF4]]) : (memref<?x?xf32>) -> ()
@@ -43,6 +45,7 @@ func.func @hoist_vector_transfer_pairs(
 // CHECK:   scf.yield {{.*}} : vector<1xf32>
 // CHECK: }
 // CHECK: vector.transfer_write %{{.*}} : vector<1xf32>, memref<?x?xf32>
+// CHECK: vector.transfer_write %{{.*}} : vector<1xf32>, memref<?x?xf32>
 // CHECK: "unrelated_use"(%[[MEMREF1]]) : (memref<?x?xf32>) -> ()
   scf.for %i = %lb to %ub step %step {
     scf.for %j = %lb to %ub step %step {
@@ -53,6 +56,7 @@ func.func @hoist_vector_transfer_pairs(
       "some_crippling_use"(%memref4) : (memref<?x?xf32>) -> ()
       %r4 = vector.transfer_read %memref4[%c0, %c0], %cst: memref<?x?xf32>, vector<5xf32>
       %r5 = vector.transfer_read %memref5[%c0, %c0], %cst: memref<?x?xf32>, vector<6xf32>
+      %r6 = vector.transfer_read %assume_align[%c0, %c0], %cst: memref<?x?xf32>, vector<1xf32>
       "some_crippling_use"(%memref5) : (memref<?x?xf32>) -> ()
       %u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>
       %u1 = "some_use"(%r1) : (vector<2xf32>) -> vector<2xf32>
@@ -60,12 +64,14 @@ func.func @hoist_vector_transfer_pairs(
       %u3 = "some_use"(%r3) : (vector<4xf32>) -> vector<4xf32>
       %u4 = "some_use"(%r4) : (vector<5xf32>) -> vector<5xf32>
       %u5 = "some_use"(%r5) : (vector<6xf32>) -> vector<6xf32>
+      %u6 = "some_use"(%r6) : (vector<1xf32>) -> vector<1xf32>
       vector.transfer_write %u0, %memref1[%c0, %c0] : vector<1xf32>, memref<?x?xf32>
       vector.transfer_write %u1, %memref0[%i, %i] : vector<2xf32>, memref<?x?xf32>
       vector.transfer_write %u2, %memref2[%c0, %c0] : vector<3xf32>, memref<?x?xf32>
       vector.transfer_write %u3, %memref3[%c0, %c0] : vector<4xf32>, memref<?x?xf32>
       vector.transfer_write %u4, %memref4[%c0, %c0] : vector<5xf32>, memref<?x?xf32>
       vector.transfer_write %u5, %memref5[%c0, %c0] : vector<6xf32>, memref<?x?xf32>
+      vector.transfer_write %u6, %assume_align[%c0, %c0] : vector<1xf32>, memref<?x?xf32>
       "some_crippling_use"(%memref3) : (memref<?x?xf32>) -> ()
     }
     "unrelated_use"(%memref0) : (memref<?x?xf32>) -> ()

@banach-space
Copy link
Contributor

After the op has a result, the hoisting is not working if transfer ops operate on AssumeAlignmentOp.

Isn't the new behaviour correct? I might be mistaken, but to me this creates an alias:

  %assume_align = memref.assume_alignment %memref0, 64 : memref<?x?xf32>

Without proper alias-analysis, the logic updated in this PR should behave conservatively, i.e. block hoisting. When memref.assume_alignment didn't return a value, there was no alias and hoisting was safe. But that's no longer the case.

To me the lack of hoisting feels like quite an unfortunate side-effect of ffb9bbf.

WDYT?

@hanhanW
Copy link
Contributor Author

hanhanW commented Jun 19, 2025

Isn't the new behaviour correct? I might be mistaken, but to me this creates an alias:

  %assume_align = memref.assume_alignment %memref0, 64 : memref<?x?xf32>

Without proper alias-analysis, the logic updated in this PR should behave conservatively, i.e. block hoisting. When memref.assume_alignment didn't return a value, there was no alias and hoisting was safe. But that's no longer the case.

What alias-analysis is needed for doing the hoisting?

To me the lack of hoisting feels like quite an unfortunate side-effect of ffb9bbf.

WDYT?

I'm not sure what the proper fix is, and suggestions are welcome. Let me unpack more context, ffb9bbf breaks a set of tests in our downstream project. The reason is that it disables hoisting on memrefs when memref.assume_alignment is involved. It means that we'll never be able to hoist vectors on memrefs without a fix.

If the vectorization and hoisting happen at tensor level, there are no issues. However, some compilers are not tensor based compiler. They can decide to do vectorization and hoisting on memrefs. Hoisting is an important feature in optimization. I think we need to support memref.assume_alignment somehow.

The intention of memref.assume_alignment is allowing the compiler to generate more efficient code based on the alignment assumption; it returns the same buffer with the alignment assumption. I think it is a special ViewLikeOp. Should I check the source of assume_alignment is ViewLikeOpInterface instead?

The author mentioned that it's fine to remove ViewLikeOpInterface from the op, but I agree with you that it creates an alias. So maybe we want to keep the interface? #139521 (comment)

@banach-space
Copy link
Contributor

banach-space commented Jun 20, 2025

What alias-analysis is needed for doing the hoisting?

(waving hands here a bit…)

  • If there are no aliases, it's safe to analyse the underlying memrefs in isolation.
  • But with aliasing (e.g., via memref.subview), all "references" to a given memref need to be considered together.

I've been experimenting with some simpler repros (the existing ones in "hoisting.mlir" are a bit too dense), and honestly, I'm seeing some confusing results. I’ve added a repro below with comments inline.

Even with the original code, I'm puzzled: reading/writing %mem is hoisted, but reading/writing %sv is not - I would have expected neither to be hoisted:

func.func @example(
    %mem: memref<?x?xf32>,
    %lb : index, %ub : index, %step: index, %in: vector<1xf32>) {
  %c0 = arith.constant 0 : index
  %cst = arith.constant 0.0 : f32
  %sv = memref.subview %mem[0, 0][1, 1][1, 1] : memref<?x?xf32> to memref<1x1xf32, strided<[?, 1]>>


  scf.for %i = %lb to %ub step %step {
      %r0 = vector.transfer_read %mem[%c0, %c0], %cst: memref<?x?xf32>, vector<1xf32>
      %r1 = vector.transfer_read %sv[%c0, %c0], %cst: memref<1x1xf32, strided<[?, 1]>>, vector<1xf32>

      %u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>
      %u1 = "some_use"(%r1) : (vector<1xf32>) -> vector<1xf32>

      vector.transfer_write %u0, %mem[%c0, %c0] : vector<1xf32>, memref<?x?xf32>
      vector.transfer_write %u1, %sv[%c0, %c0] : vector<1xf32>, memref<1x1xf32, strided<[?, 1]>>

      // NOTE: Using %in - some input argument. 
      // 1. Hoisting still happens with this uncommented
      // vector.transfer_write %in, %sv[%c0, %c0] : vector<1xf32>, memref<1x1xf32, strided<[?, 1]>>
      // 2. Hoisting does not happen with this uncommented
      // vector.transfer_write %in, %mem[%c0, %c0] : vector<1xf32>, memref<?x?xf32>
  }
  "unrelated_use"(%mem) : (memref<?x?xf32>) -> ()
  return
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["func.func"]} in %arg1
      : (!transform.any_op) -> !transform.any_op
    transform.structured.hoist_redundant_vector_transfers %0
      : (!transform.any_op) -> !transform.any_op
    transform.yield
  }
}

And this is without even introducing memref.assume_alignment. So yes - maybe it's just me being slow on a Friday, but I think something isn't adding up here.


(...) ffb9bbf breaks a set of tests in our downstream project. The reason is that it disables hoisting on memrefs when memref.assume_alignment is involved. It means that we'll never be able to hoist vectors on memrefs without a fix.

Totally agree - it feels like the consequences of that change weren’t fully evaluated before merging.

In short:

  • Before that change: no alias was created.
  • After: an alias is created — and that introduces knock-on effects, just like we’re seeing here.

The author mentioned that it's fine to remove ViewLikeOpInterface from the op, but I agree with you that it creates an alias. So maybe we want to keep the interface? #139521 (comment)

To me the interface makes sense, yes. At least after adding a return value.

The intention of memref.assume_alignment is allowing the compiler to generate more efficient code based on the alignment assumption; i

Right —-but aren't we missing some kind of explicit “assume” mechanism here?

Diego raised something similar (though in the context of vector masking):

That’s a topic I’m actively exploring, though it'll take time - and it may or may not translate cleanly to the memref case.


My overall feeling is this:

  • The change in PR #139521 seems to have introduced a real perf regression. We should consider reverting and re-evaluating the design.
  • Before reverting, though, it would be good to have minimal repros. I’ve tried creating some (see above), but I’m still a bit stuck.

In general, I don't see an obvious fix - and it feels like memref.assume_alignment shouldn’t cause this kind of trouble.

For context, this isn’t really my area of deep expertise, and I’ve only skimmed through the original PR. That said, happy to keep digging and discussing!

@banach-space
Copy link
Contributor

I've posted two PRs to help move this forward:

Let me know if anything here seems off - hopefully these changes make sense and help clarify the path forward.

@xiangzh1 xiangzh1 self-requested a review June 23, 2025 00:43
@@ -313,11 +315,13 @@ void mlir::linalg::hoistRedundantVectorTransfers(Operation *root,
return WalkResult::advance();

auto *source = transferRead.getBase().getDefiningOp();
if (source && isa_and_nonnull<ViewLikeOpInterface>(source))
if (source && isa_and_nonnull<ViewLikeOpInterface>(source) &&
!isa<memref::AssumeAlignmentOp>(source))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mem souce of AssumeAlignmentOp maybe also an viewlike operation, we should also check AssumeAlignmentOp’s memref operand. I give an similar PR at 144843, (and also a probem case in it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can do so. What I meant in #144809 (comment) is the similar idea. I did not make the change because we don't have an agreement yet.

@xiangzh1
Copy link
Contributor

xiangzh1 commented Jun 23, 2025

I begin to understand why we skip viewlikeOp before ---- to skip mem alias.

But it seems the line 340 and 347 are tring to slove this alias problem:
Line340
if (!vector::isDisjointTransferSet( cast<VectorTransferOpInterface>(*transferWrite), cast<VectorTransferOpInterface>(*transferWriteUse), /*testDynamicValueUsingBounds=*/true))
maybe we should enhance the isDisjointTransferSet for which have viewlikeOp source.

@hanhanW
Copy link
Contributor Author

hanhanW commented Jun 24, 2025

Even with the original code, I'm puzzled: reading/writing %mem is hoisted, but reading/writing %sv is not - I would have expected neither to be hoisted:

func.func @example(
    %mem: memref<?x?xf32>,
    %lb : index, %ub : index, %step: index, %in: vector<1xf32>) {
  %c0 = arith.constant 0 : index
  %cst = arith.constant 0.0 : f32
  %sv = memref.subview %mem[0, 0][1, 1][1, 1] : memref<?x?xf32> to memref<1x1xf32, strided<[?, 1]>>


  scf.for %i = %lb to %ub step %step {
      %r0 = vector.transfer_read %mem[%c0, %c0], %cst: memref<?x?xf32>, vector<1xf32>
      %r1 = vector.transfer_read %sv[%c0, %c0], %cst: memref<1x1xf32, strided<[?, 1]>>, vector<1xf32>

      %u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>
      %u1 = "some_use"(%r1) : (vector<1xf32>) -> vector<1xf32>

      vector.transfer_write %u0, %mem[%c0, %c0] : vector<1xf32>, memref<?x?xf32>
      vector.transfer_write %u1, %sv[%c0, %c0] : vector<1xf32>, memref<1x1xf32, strided<[?, 1]>>

      // NOTE: Using %in - some input argument. 
      // 1. Hoisting still happens with this uncommented
      // vector.transfer_write %in, %sv[%c0, %c0] : vector<1xf32>, memref<1x1xf32, strided<[?, 1]>>
      // 2. Hoisting does not happen with this uncommented
      // vector.transfer_write %in, %mem[%c0, %c0] : vector<1xf32>, memref<?x?xf32>
  }
  "unrelated_use"(%mem) : (memref<?x?xf32>) -> ()
  return
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["func.func"]} in %arg1
      : (!transform.any_op) -> !transform.any_op
    transform.structured.hoist_redundant_vector_transfers %0
      : (!transform.any_op) -> !transform.any_op
    transform.yield
  }
}

And this is without even introducing memref.assume_alignment. So yes - maybe it's just me being slow on a Friday, but I think something isn't adding up here.

It looks like a bug. It's been a long time since the last time I looked at the code, so I'm quite outdated in this area.

The intention of memref.assume_alignment is allowing the compiler to generate more efficient code based on the alignment assumption; i

Right —-but aren't we missing some kind of explicit “assume” mechanism here?

Diego raised something similar (though in the context of vector masking):

That’s a topic I’m actively exploring, though it'll take time - and it may or may not translate cleanly to the memref case.

I can't translate the idea to the memref case easily. I think they are slightly different. My first thought about your topic is more about integer range analysis; if you go with vector.assume, you may hit issues like this when people make it have return values.

My overall feeling is this:

  • The change in PR #139521 seems to have introduced a real perf regression. We should consider reverting and re-evaluating the design.

Reverting the change could be a pain, since I already landed some improvements in upstream: #142425 #142358

and downstream projects: iree-org/iree#20913 iree-org/iree#20984 iree-org/iree#20973 iree-org/iree#20925

  • Before reverting, though, it would be good to have minimal repros. I’ve tried creating some (see above), but I’m still a bit stuck.

My repro is here, just in case if it is helpful: iree-org/iree#20912 (comment) It is not IREE specific, you can run mlir-opt -transform-interpreter -canonicalize --split-input-file --allow-unregistered-dialect repro.mlir.

In general, I don't see an obvious fix - and it feels like memref.assume_alignment shouldn’t cause this kind of trouble.

For context, this isn’t really my area of deep expertise, and I’ve only skimmed through the original PR. That said, happy to keep digging and discussing!

So many thanks for spending your time here! I added you as a reviewer because the transformation is not updated for a while, and you're one of the most recent contributors in terms of this file. I don't have an obvious fix either. Can we add an option for now? E.g., we pass bool allowAssumeAlignmentAlias and we only perform the checks when it is true. The default value is false, so it'd work as expected.

But it seems the line 340 and 347 are tring to slove this alias problem:
Line340
if (!vector::isDisjointTransferSet( cast(*transferWrite), cast(*transferWriteUse), /testDynamicValueUsingBounds=/true))
maybe we should enhance the isDisjointTransferSet for which have viewlikeOp source.

I think this is a little different. IIUC, it is mainly for checking whether there are overlapping between read and writes? It may be the place where we inject alias analysis checks though. We may use simple analysis that only allows assume_alignment as a start, and improve it when there are other needs.


So I think there are two approaches:

  1. Introduce an option and only apply the check when it is on.
  2. Inject basic alias and use it in isDisjointTransferSet checks -- I don't have direct prototype, so I'm not sure if it solves all the problem or not.

I'd like to go with (1) and can help with (2), like providing guidance. Note that the issue has been there for ~1 month, and I only work on these fixes in my 20% time. It'd take more time if we want to go with (2) directly, while I'd like to re-enable tests in our downstream projects. Our cooperative matrix support on SPIR-V side is disabled for a while if assume_alignment feature is requested. It is only enabled without assume_alignment feature.

@xiangzh1
Copy link
Contributor

xiangzh1 commented Jun 25, 2025

I suggest to go with (2) directly, Inject basic alias and use it in isDisjointTransferSet checks (or add other check function).
Adding more trivial options is not an good idea, and what's more, this is an very small/detailed point, which is not worth to add a option in mlir. Even adding an option, most users will not understand it if they not deeply research on this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants