[mlir][Bufferization] Do not have read semantics for destination of `tensor.parallel_insert_slice`. #134169

MaheshRavishankar · 2025-04-02T22:50:03Z

tensor.insert_slice needs to have read semantics on its destination operand. Since it has a return value, its semantics are

Copy dest to result
Copy source to subview of destination.

tensor.parallel_insert_slice though has no result. So it does not need to have read semantics. The op description here also says that it is expected to lower to a memref.subview, that does not have read semantics on the destination (its just a view).

This patch drops the read semantics for destination of tensor.parallel_insert_slice but also makes the shared_outs operands of scf.forall have read semantics. Earlier it would rely indirectly on read semantics of destination operand of tensor.parallel_insert_slice to propagate the read semantics for shared_outs. Now that is specified more directly.

Fixes #133964

…tensor.parallel_insert_slice`. Signed-off-by: MaheshRavishankar <[email protected]>

llvmbot · 2025-04-02T22:50:37Z

@llvm/pr-subscribers-mlir-scf

@llvm/pr-subscribers-mlir

Author: None (MaheshRavishankar)

Changes

tensor.insert_slice needs to have read semantics on its destination operand. Since it has a return value, its semantics are

Copy dest to result
Copy source to subview of destination.

tensor.parallel_insert_slice though has no result. So it does not need to have read semantics. The op description here also says that it is expected to lower to a memref.subview, that does not have read semantics on the destination (its just a view).

This patch drops the read semantics for destination of tensor.parallel_insert_slice but also makes the shared_outs operands of scf.forall have read semantics. Earlier it would rely indirectly on read semantics of destination operand of tensor.parallel_insert_slice to propagate the read semantics for shared_outs. Now that is specified more directly.

Full diff: https://github.com/llvm/llvm-project/pull/134169.diff

3 Files Affected:

(modified) mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp (+5-11)
(modified) mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp (+1-2)
(modified) mlir/test/Dialect/SCF/one-shot-bufferize.mlir (+35)

diff --git a/mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp b/mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp
index f48d2a2df9c3c..f83435521e708 100644
--- a/mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp
@@ -1207,17 +1207,11 @@ struct ForallOpInterface
                                                     ForallOp> {
   bool bufferizesToMemoryRead(Operation *op, OpOperand &opOperand,
                               const AnalysisState &state) const {
-    auto forallOp = cast<ForallOp>(op);
-
-    // If the loop has zero iterations, the results of the op are their
-    // corresponding shared_outs, meaning that the shared_outs bufferize to a
-    // read.
-    if (mayHaveZeroIterations(forallOp))
-      return true;
-
-    // scf::ForallOp alone doesn't bufferize to a memory read, one of the
-    // uses of its matching bbArg may.
-    return state.isValueRead(forallOp.getTiedBlockArgument(&opOperand));
+    // All tensor operands to `scf.forall` are `shared_outs` and all
+    // shared outs are assumed to be read by the loop. This does not
+    // account for the case where the entire value is over-written,
+    // but being conservative here.
+    return true;
   }
 
   bool bufferizesToMemoryWrite(Operation *op, OpOperand &opOperand,
diff --git a/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp b/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
index 4ac6eca586961..31014172a9555 100644
--- a/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
+++ b/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
@@ -930,8 +930,7 @@ struct ParallelInsertSliceOpInterface
 
   bool bufferizesToMemoryRead(Operation *op, OpOperand &opOperand,
                               const AnalysisState &state) const {
-    return insertSliceOpRequiresRead(cast<tensor::ParallelInsertSliceOp>(op),
-                                     opOperand);
+    return opOperand == cast<ParallelInsertSliceOp>(op).getSourceMutable();
   }
 
   bool bufferizesToMemoryWrite(Operation *op, OpOperand &opOperand,
diff --git a/mlir/test/Dialect/SCF/one-shot-bufferize.mlir b/mlir/test/Dialect/SCF/one-shot-bufferize.mlir
index bb9f7dfdba83f..a1067ec3ba05f 100644
--- a/mlir/test/Dialect/SCF/one-shot-bufferize.mlir
+++ b/mlir/test/Dialect/SCF/one-shot-bufferize.mlir
@@ -946,3 +946,38 @@ func.func @index_switch(%pred: index, %b: tensor<5xf32>, %c: tensor<5xf32>) -> t
   // CHECK: return %[[r]]
   return %0 : tensor<5xf32>
 }
+
+// -----
+
+// See Issue https://github.com/llvm/llvm-project/issues/133964 . Checks that
+// tensor.parallel_insert_slice dest operand does not have read semantics.
+func.func @check_scfforall_inplace_bufferizer(%arg0 : tensor<?x?xf32>,
+    %arg1 : tensor<?x?xf32>,
+    %arg2 : tensor<?xf32> {bufferization.writable = true}) ->  tensor<?xf32> {
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %d0 = tensor.dim %arg2, %c0 : tensor<?xf32>
+  %d1 = tensor.dim %arg1, %c1 : tensor<?x?xf32>
+  %0 = scf.forall (%arg3) in (%c1) shared_outs(%arg4 = %arg2) -> (tensor<?xf32>) {
+    %1 = tensor.extract_slice %arg0[0, 0][%d0, %d1][1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
+    %2 = tensor.extract_slice %arg1[0, 0][%d0, %d1][1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
+    %3 = linalg.generic {
+        indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
+                         affine_map<(d0, d1) -> (d0, d1)>,
+                         affine_map<(d0, d1) -> (d0)>],
+        iterator_types = ["parallel", "reduction"]}
+        ins(%1, %2 : tensor<?x?xf32>, tensor<?x?xf32>)
+        outs(%arg4 : tensor<?xf32>) {
+      ^bb0(%b0 : f32, %b1: f32, %b2 : f32):
+        %4 = arith.mulf %b0, %b1 : f32
+        %5 = arith.addf %4, %b2 : f32
+        linalg.yield %5 : f32
+    } -> tensor<?xf32>
+    scf.forall.in_parallel {
+      tensor.parallel_insert_slice %3 into %arg4[0] [%d0] [1] : tensor<?xf32> into tensor<?xf32>
+    }
+  }
+  return %0 : tensor<?xf32>
+}
+// CHECK-LABEL: func @check_scfforall_inplace_bufferizer
+//   CHECK-NOT:   memref.alloc

llvmbot · 2025-04-02T22:50:37Z

@llvm/pr-subscribers-mlir-tensor

Author: None (MaheshRavishankar)

Changes

tensor.insert_slice needs to have read semantics on its destination operand. Since it has a return value, its semantics are

Copy dest to result
Copy source to subview of destination.

tensor.parallel_insert_slice though has no result. So it does not need to have read semantics. The op description here also says that it is expected to lower to a memref.subview, that does not have read semantics on the destination (its just a view).

This patch drops the read semantics for destination of tensor.parallel_insert_slice but also makes the shared_outs operands of scf.forall have read semantics. Earlier it would rely indirectly on read semantics of destination operand of tensor.parallel_insert_slice to propagate the read semantics for shared_outs. Now that is specified more directly.

Full diff: https://github.com/llvm/llvm-project/pull/134169.diff

3 Files Affected:

(modified) mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp (+5-11)
(modified) mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp (+1-2)
(modified) mlir/test/Dialect/SCF/one-shot-bufferize.mlir (+35)

diff --git a/mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp b/mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp
index f48d2a2df9c3c..f83435521e708 100644
--- a/mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp
@@ -1207,17 +1207,11 @@ struct ForallOpInterface
                                                     ForallOp> {
   bool bufferizesToMemoryRead(Operation *op, OpOperand &opOperand,
                               const AnalysisState &state) const {
-    auto forallOp = cast<ForallOp>(op);
-
-    // If the loop has zero iterations, the results of the op are their
-    // corresponding shared_outs, meaning that the shared_outs bufferize to a
-    // read.
-    if (mayHaveZeroIterations(forallOp))
-      return true;
-
-    // scf::ForallOp alone doesn't bufferize to a memory read, one of the
-    // uses of its matching bbArg may.
-    return state.isValueRead(forallOp.getTiedBlockArgument(&opOperand));
+    // All tensor operands to `scf.forall` are `shared_outs` and all
+    // shared outs are assumed to be read by the loop. This does not
+    // account for the case where the entire value is over-written,
+    // but being conservative here.
+    return true;
   }
 
   bool bufferizesToMemoryWrite(Operation *op, OpOperand &opOperand,
diff --git a/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp b/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
index 4ac6eca586961..31014172a9555 100644
--- a/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
+++ b/mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
@@ -930,8 +930,7 @@ struct ParallelInsertSliceOpInterface
 
   bool bufferizesToMemoryRead(Operation *op, OpOperand &opOperand,
                               const AnalysisState &state) const {
-    return insertSliceOpRequiresRead(cast<tensor::ParallelInsertSliceOp>(op),
-                                     opOperand);
+    return opOperand == cast<ParallelInsertSliceOp>(op).getSourceMutable();
   }
 
   bool bufferizesToMemoryWrite(Operation *op, OpOperand &opOperand,
diff --git a/mlir/test/Dialect/SCF/one-shot-bufferize.mlir b/mlir/test/Dialect/SCF/one-shot-bufferize.mlir
index bb9f7dfdba83f..a1067ec3ba05f 100644
--- a/mlir/test/Dialect/SCF/one-shot-bufferize.mlir
+++ b/mlir/test/Dialect/SCF/one-shot-bufferize.mlir
@@ -946,3 +946,38 @@ func.func @index_switch(%pred: index, %b: tensor<5xf32>, %c: tensor<5xf32>) -> t
   // CHECK: return %[[r]]
   return %0 : tensor<5xf32>
 }
+
+// -----
+
+// See Issue https://github.com/llvm/llvm-project/issues/133964 . Checks that
+// tensor.parallel_insert_slice dest operand does not have read semantics.
+func.func @check_scfforall_inplace_bufferizer(%arg0 : tensor<?x?xf32>,
+    %arg1 : tensor<?x?xf32>,
+    %arg2 : tensor<?xf32> {bufferization.writable = true}) ->  tensor<?xf32> {
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %d0 = tensor.dim %arg2, %c0 : tensor<?xf32>
+  %d1 = tensor.dim %arg1, %c1 : tensor<?x?xf32>
+  %0 = scf.forall (%arg3) in (%c1) shared_outs(%arg4 = %arg2) -> (tensor<?xf32>) {
+    %1 = tensor.extract_slice %arg0[0, 0][%d0, %d1][1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
+    %2 = tensor.extract_slice %arg1[0, 0][%d0, %d1][1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
+    %3 = linalg.generic {
+        indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
+                         affine_map<(d0, d1) -> (d0, d1)>,
+                         affine_map<(d0, d1) -> (d0)>],
+        iterator_types = ["parallel", "reduction"]}
+        ins(%1, %2 : tensor<?x?xf32>, tensor<?x?xf32>)
+        outs(%arg4 : tensor<?xf32>) {
+      ^bb0(%b0 : f32, %b1: f32, %b2 : f32):
+        %4 = arith.mulf %b0, %b1 : f32
+        %5 = arith.addf %4, %b2 : f32
+        linalg.yield %5 : f32
+    } -> tensor<?xf32>
+    scf.forall.in_parallel {
+      tensor.parallel_insert_slice %3 into %arg4[0] [%d0] [1] : tensor<?xf32> into tensor<?xf32>
+    }
+  }
+  return %0 : tensor<?xf32>
+}
+// CHECK-LABEL: func @check_scfforall_inplace_bufferizer
+//   CHECK-NOT:   memref.alloc

Signed-off-by: MaheshRavishankar <[email protected]>

The major changes come from the following PRs: * llvm/llvm-project#131226 * llvm/llvm-project#134264 * llvm/llvm-project#134169 * llvm/llvm-project#134517 --------- Signed-off-by: MaheshRavishankar <[email protected]> Signed-off-by: Jakub Kuderski <[email protected]> Co-authored-by: MaheshRavishankar <[email protected]>

[mlir][Bufferization] Do not have read semantics for destination of `…

e167d94

…tensor.parallel_insert_slice`. Signed-off-by: MaheshRavishankar <[email protected]>

MaheshRavishankar requested review from hanhanW, nicolasvasilache and matthias-springer as code owners April 2, 2025 22:50

llvmbot added mlir mlir:tensor mlir:scf labels Apr 2, 2025

MaheshRavishankar added 2 commits April 2, 2025 15:52

Make shared outs of scf.forall have read semantics.

2cd1361

Signed-off-by: MaheshRavishankar <[email protected]>

Add test.

c61a1c2

Signed-off-by: MaheshRavishankar <[email protected]>

MaheshRavishankar force-pushed the users/MaheshRavishankar/fixBufferizationIssue branch from b2e739e to c61a1c2 Compare April 2, 2025 22:53

matthias-springer approved these changes Apr 2, 2025

View reviewed changes

MaheshRavishankar added a commit to MaheshRavishankar/iree that referenced this pull request Apr 3, 2025

Pick up llvm/llvm-project#134169

0a186cf

Signed-off-by: MaheshRavishankar <[email protected]>

MaheshRavishankar mentioned this pull request Apr 3, 2025

[Do Not Submit] Fix for https://github.com/iree-org/iree/issues/20400 iree-org/iree#20440

Closed

MaheshRavishankar merged commit a1bc979 into llvm:main Apr 3, 2025
11 checks passed

kuhar mentioned this pull request Apr 8, 2025

Integrate llvm-project@bafa2f4442bc iree-org/iree#20492

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[mlir][Bufferization] Do not have read semantics for destination of `tensor.parallel_insert_slice`. #134169

[mlir][Bufferization] Do not have read semantics for destination of `tensor.parallel_insert_slice`. #134169

Uh oh!

MaheshRavishankar commented Apr 2, 2025 •

edited

Loading

Uh oh!

llvmbot commented Apr 2, 2025 •

edited

Loading

Uh oh!

llvmbot commented Apr 2, 2025

Uh oh!

Uh oh!

Uh oh!

[mlir][Bufferization] Do not have read semantics for destination of tensor.parallel_insert_slice. #134169

[mlir][Bufferization] Do not have read semantics for destination of tensor.parallel_insert_slice. #134169

Uh oh!

Conversation

MaheshRavishankar commented Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Apr 2, 2025

Uh oh!

Uh oh!

Uh oh!

[mlir][Bufferization] Do not have read semantics for destination of `tensor.parallel_insert_slice`. #134169

[mlir][Bufferization] Do not have read semantics for destination of `tensor.parallel_insert_slice`. #134169

MaheshRavishankar commented Apr 2, 2025 •

edited

Loading

llvmbot commented Apr 2, 2025 •

edited

Loading