[mlir][vector] Relax the requirements on broadcast dims #99341

Merged
banach-space merged 3 commits into llvm:main from andrzej/in_bounds_bcast on Oct 4, 2024

Conversation

@banach-space
Contributor

banach-space commented Jul 17, 2024

NOTE: This is a follow-up for #97049 in which the in_bounds attribute
was made mandatory.

This PR updates the semantics of the in_bounds attribute so that
broadcast dimensions are no longer required to be "in bounds".
Specifically, these xfer_read/xfer_write Ops become valid after this
change:

  %read = vector.transfer_read %A[%base1, %base2], %pad
      {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
      : memref<?x?xf32>, vector<9xf32>

  vector.transfer_write %vec, %A[%base1, %base2]
      {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
      : vector<9xf32>, memref<?x?xf32>

Note that the value false merely means "may run out-of-bounds", i.e.,
the corresponding access can still be "in bounds". In fact, the folder
for xfer Ops is also updated (*) and will update the attribute value
corresponding to broadcast dims to true if all non-broadcast dims
are marked as "in bounds".
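
For illustration, a minimal hand-written sketch of that fold (not taken from the patch's tests; assumes %A, %c0 and %pad are defined):

```mlir
// Before folding: the broadcast dim (result 0 of the permutation map) is
// conservatively marked "false", but the only non-broadcast dim (d1) is
// statically in bounds: 4 elements are read at offset 0 from a dim of size 4.
%0 = vector.transfer_read %A[%c0, %c0], %pad
    {in_bounds = [false, true], permutation_map = affine_map<(d0, d1) -> (0, d1)>}
    : memref<4x4xf32>, vector<9x4xf32>

// After folding: the broadcast dim is promoted to "true".
%0 = vector.transfer_read %A[%c0, %c0], %pad
    {in_bounds = [true, true], permutation_map = affine_map<(d0, d1) -> (0, d1)>}
    : memref<4x4xf32>, vector<9x4xf32>
```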

Note that this PR doesn't change any of the lowerings. The changes in
"SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple
reverts of recent changes in #97049. Those were only meant to facilitate
making in_bounds mandatory and to work around the extra requirements
for broadcast dims (those requirements were removed in this PR). All
changes in tests are also reverts of changes from #97049.

For context, here's the PR in which "broadcast" dims were forced to
always be "in-bounds":
  * https://reviews.llvm.org/D102566

(*) See foldTransferInBoundsAttribute.

@llvmbot
Member

llvmbot commented Jul 17, 2024

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-linalg
@llvm/pr-subscribers-mlir-vector
@llvm/pr-subscribers-mlir-core

@llvm/pr-subscribers-mlir-affine

Author: Andrzej Warzyński (banach-space)

Changes

Patch is 27.80 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/99341.diff

20 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Vector/IR/VectorOps.td (+12-12)
  • (modified) mlir/include/mlir/IR/AffineMap.h (-8)
  • (modified) mlir/include/mlir/Interfaces/VectorInterfaces.td (+2-5)
  • (modified) mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp (+1-12)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp (+1-10)
  • (modified) mlir/lib/Dialect/Vector/IR/VectorOps.cpp (+13-10)
  • (modified) mlir/lib/IR/AffineMap.cpp (-13)
  • (modified) mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir (+3-3)
  • (modified) mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir (+3-3)
  • (modified) mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d.mlir (+2-2)
  • (modified) mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir (+1-1)
  • (modified) mlir/test/Dialect/Linalg/hoisting.mlir (+1-1)
  • (modified) mlir/test/Dialect/Linalg/vectorization.mlir (+1-1)
  • (modified) mlir/test/Dialect/Vector/invalid.mlir (-10)
  • (modified) mlir/test/Dialect/Vector/ops.mlir (+1-1)
  • (modified) mlir/test/Dialect/Vector/vector-transfer-permutation-lowering.mlir (+1-1)
  • (modified) mlir/test/Dialect/Vector/vector-transfer-unroll.mlir (+2-2)
  • (modified) mlir/test/Integration/Dialect/Vector/CPU/transfer-read-1d.mlir (+1-1)
  • (modified) mlir/test/Integration/Dialect/Vector/CPU/transfer-read-2d.mlir (+3-3)
  • (modified) mlir/test/Integration/Dialect/Vector/CPU/transfer-read-3d.mlir (+2-2)
diff --git a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
index 2019eb5a9fd7f..f8345c77b3eed 100644
--- a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+++ b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
@@ -1405,12 +1405,12 @@ def Vector_TransferReadOp :
     specifies if the transfer is guaranteed to be within the source bounds. If
     set to "false", accesses (including the starting point) may run
     out-of-bounds along the respective vector dimension as the index increases.
-    Non-vector and broadcast dimensions *must* always be in-bounds. The
-    `in_bounds` array length has to be equal to the vector rank. This attribute
-    has a default value: `false` (i.e. "out-of-bounds"). When skipped in the
-    textual IR, the default value is assumed. Similarly, the OP printer will
-    omit this attribute when all dimensions are out-of-bounds (i.e. the default
-    value is used).
+    Non-vector dimensions *must* always be in-bounds. The `in_bounds` array
+    length has to be equal to the vector rank. This attribute has a default
+    value: `false` (i.e. "out-of-bounds"). When skipped in the textual IR, the
+    default value is assumed. Similarly, the OP printer will omit this
+    attribute when all dimensions are out-of-bounds (i.e. the default value is
+    used).
 
     A `vector.transfer_read` can be lowered to a simple load if all dimensions
     are specified to be within bounds and no `mask` was specified.
@@ -1650,12 +1650,12 @@ def Vector_TransferWriteOp :
     specifies if the transfer is guaranteed to be within the source bounds. If
     set to "false", accesses (including the starting point) may run
     out-of-bounds along the respective vector dimension as the index increases.
-    Non-vector and broadcast dimensions *must* always be in-bounds. The
-    `in_bounds` array length has to be equal to the vector rank. This attribute
-    has a default value: `false` (i.e. "out-of-bounds"). When skipped in the
-    textual IR, the default value is assumed. Similarly, the OP printer will
-    omit this attribute when all dimensions are out-of-bounds (i.e. the default
-    value is used).
+    Non-vector dimensions *must* always be in-bounds. The `in_bounds` array
+    length has to be equal to the vector rank. This attribute has a default
+    value: `false` (i.e. "out-of-bounds"). When skipped in the textual IR, the
+    default value is assumed. Similarly, the OP printer will omit this
+    attribute when all dimensions are out-of-bounds (i.e. the default value is
+    used).
 
      A `vector.transfer_write` can be lowered to a simple store if all
      dimensions are specified to be within bounds and no `mask` was specified.
diff --git a/mlir/include/mlir/IR/AffineMap.h b/mlir/include/mlir/IR/AffineMap.h
index 676da6d176497..264c1c8308e78 100644
--- a/mlir/include/mlir/IR/AffineMap.h
+++ b/mlir/include/mlir/IR/AffineMap.h
@@ -146,14 +146,6 @@ class AffineMap {
   /// affine map (d0, ..., dn) -> (dp, ..., dn) on the most minor dimensions.
   bool isMinorIdentity() const;
 
-  /// Returns the list of broadcast dimensions (i.e. dims indicated by value 0
-  /// in the result).
-  /// Ex:
-  ///  * (d0, d1, d2) -> (0, d1) gives [0]
-  ///  * (d0, d1, d2) -> (d2, d1) gives []
-  ///  * (d0, d1, d2, d4) -> (d0, 0, d1, 0) gives [1, 3]
-  SmallVector<unsigned> getBroadcastDims() const;
-
   /// Returns true if this affine map is a minor identity up to broadcasted
   /// dimensions which are indicated by value 0 in the result. If
   /// `broadcastedDims` is not null, it will be populated with the indices of
diff --git a/mlir/include/mlir/Interfaces/VectorInterfaces.td b/mlir/include/mlir/Interfaces/VectorInterfaces.td
index 7ea62c2ae2ab1..be939bad14b7b 100644
--- a/mlir/include/mlir/Interfaces/VectorInterfaces.td
+++ b/mlir/include/mlir/Interfaces/VectorInterfaces.td
@@ -234,12 +234,9 @@ def VectorTransferOpInterface : OpInterface<"VectorTransferOpInterface"> {
       return constExpr && constExpr.getValue() == 0;
     }
 
-    /// Return "true" if the vector transfer dimension `dim` is in-bounds. Also
-    /// return "true" if the dimension is a broadcast dimension. Return "false"
-    /// otherwise.
+    /// Return "true" if the vector transfer dimension `dim` is in-bounds.
+    /// Return "false" otherwise.
     bool isDimInBounds(unsigned dim) {
-      if ($_op.isBroadcastDim(dim))
-        return true;
       auto inBounds = $_op.getInBounds();
       return ::llvm::cast<::mlir::BoolAttr>(inBounds[dim]).getValue();
     }
diff --git a/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp b/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp
index 6bb8dfecba0ec..71e9648a5e00f 100644
--- a/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp
+++ b/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp
@@ -1223,19 +1223,8 @@ static Operation *vectorizeAffineLoad(AffineLoadOp loadOp,
   LLVM_DEBUG(dbgs() << "\n[early-vect]+++++ permutationMap: ");
   LLVM_DEBUG(permutationMap.print(dbgs()));
 
-  // Make sure that the in_bounds attribute corresponding to a broadcast dim
-  // is set to `true` - that's required by the xfer Op.
-  // FIXME: We're not veryfying whether the corresponding access is in bounds.
-  // TODO: Use masking instead.
-  SmallVector<unsigned> broadcastedDims = permutationMap.getBroadcastDims();
-  SmallVector<bool> inBounds(vectorType.getRank(), false);
-
-  for (auto idx : broadcastedDims)
-    inBounds[idx] = true;
-
   auto transfer = state.builder.create<vector::TransferReadOp>(
-      loadOp.getLoc(), vectorType, loadOp.getMemRef(), indices, permutationMap,
-      inBounds);
+      loadOp.getLoc(), vectorType, loadOp.getMemRef(), indices, permutationMap);
 
   // Register replacement for future uses in the scope.
   state.registerOpVectorReplacement(loadOp, transfer);
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 68ee915cca3f4..4b07cc3e9186c 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -1343,17 +1343,8 @@ vectorizeAsLinalgGeneric(RewriterBase &rewriter, VectorizationState &state,
 
     SmallVector<Value> indices(linalgOp.getShape(opOperand).size(), zero);
 
-    // Make sure that the in_bounds attribute corresponding to a broadcast dim
-    // is `true`
-    SmallVector<unsigned> broadcastedDims = readMap.getBroadcastDims();
-    SmallVector<bool> inBounds(readType.getRank(), false);
-
-    for (auto idx : broadcastedDims)
-      inBounds[idx] = true;
-
     Operation *read = rewriter.create<vector::TransferReadOp>(
-        loc, readType, opOperand->get(), indices, readMap,
-        ArrayRef<bool>(inBounds));
+        loc, readType, opOperand->get(), indices, readMap);
     read = state.maskOperation(rewriter, read, linalgOp, maskingMap);
     Value readValue = read->getResult(0);
 
diff --git a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
index df3a59ed80ad4..cb59cc333e936 100644
--- a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+++ b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
@@ -3958,10 +3958,6 @@ verifyTransferOp(VectorTransferOpInterface op, ShapedType shapedType,
                            "as permutation_map results: ")
            << AffineMapAttr::get(permutationMap)
            << " vs inBounds of size: " << inBounds.size();
-  for (unsigned int i = 0, e = permutationMap.getNumResults(); i < e; ++i)
-    if (isa<AffineConstantExpr>(permutationMap.getResult(i)) &&
-        !llvm::cast<BoolAttr>(inBounds.getValue()[i]).getValue())
-      return op->emitOpError("requires broadcast dimensions to be in-bounds");
 
   return success();
 }
@@ -4150,17 +4146,24 @@ static LogicalResult foldTransferInBoundsAttribute(TransferOp op) {
   SmallVector<bool, 4> newInBounds;
   newInBounds.reserve(op.getTransferRank());
   for (unsigned i = 0; i < op.getTransferRank(); ++i) {
-    // Already marked as in-bounds, nothing to see here.
+    // 1. Already marked as in-bounds, nothing to see here.
     if (op.isDimInBounds(i)) {
       newInBounds.push_back(true);
       continue;
     }
-    // Currently out-of-bounds, check whether we can statically determine it is
-    // inBounds.
+    // 2. Currently out-of-bounds, check whether we can statically determine it
+    // is inBounds.
+    bool inBounds = false;
     auto dimExpr = dyn_cast<AffineDimExpr>(permutationMap.getResult(i));
-    assert(dimExpr && "Broadcast dims must be in-bounds");
-    auto inBounds =
-        isInBounds(op, /*resultIdx=*/i, /*indicesIdx=*/dimExpr.getPosition());
+    if (dimExpr) {
+      // 2.a Non-broadcast dim
+      inBounds = isInBounds(op, /*resultIdx=*/i,
+                            /*indicesIdx=*/dimExpr.getPosition());
+    } else {
+      // 2.b Broadcast dim
+      inBounds = true;
+    }
+
     newInBounds.push_back(inBounds);
     // We commit the pattern if it is "more inbounds".
     changed |= inBounds;
diff --git a/mlir/lib/IR/AffineMap.cpp b/mlir/lib/IR/AffineMap.cpp
index 859fb8ebc10e8..62f595299afe2 100644
--- a/mlir/lib/IR/AffineMap.cpp
+++ b/mlir/lib/IR/AffineMap.cpp
@@ -158,19 +158,6 @@ bool AffineMap::isMinorIdentity() const {
              getMinorIdentityMap(getNumDims(), getNumResults(), getContext());
 }
 
-SmallVector<unsigned> AffineMap::getBroadcastDims() const {
-  SmallVector<unsigned> broadcastedDims;
-  for (const auto &[resIdx, expr] : llvm::enumerate(getResults())) {
-    if (auto constExpr = dyn_cast<AffineConstantExpr>(expr)) {
-      if (constExpr.getValue() != 0)
-        continue;
-      broadcastedDims.push_back(resIdx);
-    }
-  }
-
-  return broadcastedDims;
-}
-
 /// Returns true if this affine map is a minor identity up to broadcasted
 /// dimensions which are indicated by value 0 in the result.
 bool AffineMap::isMinorIdentityWithBroadcasting(
diff --git a/mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir b/mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
index 3f4e70a6835af..e1babdd2f1f63 100644
--- a/mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
+++ b/mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
@@ -133,7 +133,7 @@ func.func @materialize_read(%M: index, %N: index, %O: index, %P: index) {
     affine.for %i1 = 0 to %N {
       affine.for %i2 = 0 to %O {
         affine.for %i3 = 0 to %P step 5 {
-          %f = vector.transfer_read %A[%i0, %i1, %i2, %i3], %f0 {in_bounds = [false, true, false], permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, 0, d0)>} : memref<?x?x?x?xf32>, vector<5x4x3xf32>
+          %f = vector.transfer_read %A[%i0, %i1, %i2, %i3], %f0 {permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, 0, d0)>} : memref<?x?x?x?xf32>, vector<5x4x3xf32>
           // Add a dummy use to prevent dead code elimination from removing
           // transfer read ops.
           "dummy_use"(%f) : (vector<5x4x3xf32>) -> ()
@@ -507,7 +507,7 @@ func.func @transfer_read_with_tensor(%arg: tensor<f32>) -> vector<1xf32> {
     // CHECK-NEXT: %[[RESULT:.*]] = vector.broadcast %[[EXTRACTED]] : f32 to vector<1xf32>
     // CHECK-NEXT: return %[[RESULT]] : vector<1xf32>
     %f0 = arith.constant 0.0 : f32
-    %0 = vector.transfer_read %arg[], %f0 {in_bounds = [true], permutation_map = affine_map<()->(0)>} :
+    %0 = vector.transfer_read %arg[], %f0 {permutation_map = affine_map<()->(0)>} :
       tensor<f32>, vector<1xf32>
     return %0: vector<1xf32>
 }
@@ -746,7 +746,7 @@ func.func @cannot_lower_transfer_read_with_leading_scalable(%arg0: memref<?x4xf3
 func.func @does_not_crash_on_unpack_one_dim(%subview:  memref<1x1x1x1xi32>, %mask: vector<1x1xi1>) -> vector<1x1x1x1xi32> {
   %c0 = arith.constant 0 : index
   %c0_i32 = arith.constant 0 : i32
-  %3 = vector.transfer_read %subview[%c0, %c0, %c0, %c0], %c0_i32, %mask {in_bounds = [false, true, true, false], permutation_map = #map1}
+  %3 = vector.transfer_read %subview[%c0, %c0, %c0, %c0], %c0_i32, %mask {permutation_map = #map1}
           : memref<1x1x1x1xi32>, vector<1x1x1x1xi32>
   return %3 : vector<1x1x1x1xi32>
 }
diff --git a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir
index 0a077624d18f8..9244604128cb7 100644
--- a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir
+++ b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir
@@ -22,7 +22,7 @@ func.func @vec1d_1(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
 // CHECK-NEXT: %{{.*}} = affine.apply #[[$map_id1]](%[[C0]])
 // CHECK-NEXT: %{{.*}} = affine.apply #[[$map_id1]](%[[C0]])
 // CHECK-NEXT: %{{.*}} = arith.constant 0.0{{.*}}: f32
-// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true], permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
+// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
    affine.for %i0 = 0 to %M { // vectorized due to scalar -> vector
      %a0 = affine.load %A[%c0, %c0] : memref<?x?xf32>
    }
@@ -425,7 +425,7 @@ func.func @vec_rejected_8(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
 // CHECK:     %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})
 // CHECK:     %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})
 // CHECK:     %{{.*}} = arith.constant 0.0{{.*}}: f32
-// CHECK:     {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true], permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
+// CHECK:     {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
    affine.for %i17 = 0 to %M { // not vectorized, the 1-D pattern that matched %{{.*}} in DFS post-order prevents vectorizing %{{.*}}
      affine.for %i18 = 0 to %M { // vectorized due to scalar -> vector
        %a18 = affine.load %A[%c0, %c0] : memref<?x?xf32>
@@ -459,7 +459,7 @@ func.func @vec_rejected_9(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
 // CHECK:      %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})
 // CHECK-NEXT: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})
 // CHECK-NEXT: %{{.*}} = arith.constant 0.0{{.*}}: f32
-// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true], permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
+// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
    affine.for %i17 = 0 to %M { // not vectorized, the 1-D pattern that matched %i18 in DFS post-order prevents vectorizing %{{.*}}
      affine.for %i18 = 0 to %M { // vectorized due to scalar -> vector
        %a18 = affine.load %A[%c0, %c0] : memref<?x?xf32>
diff --git a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d.mlir b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d.mlir
index eb5120a49e3d4..83916e755363b 100644
--- a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d.mlir
+++ b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d.mlir
@@ -123,8 +123,8 @@ func.func @vectorize_matmul(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg
   //      VECT:  affine.for %[[I2:.*]] = #[[$map_id1]](%[[C0]]) to #[[$map_id1]](%[[M]]) step 4 {
   // VECT-NEXT:    affine.for %[[I3:.*]] = #[[$map_id1]](%[[C0]]) to #[[$map_id1]](%[[N]]) step 8 {
   // VECT-NEXT:      affine.for %[[I4:.*]] = #[[$map_id1]](%[[C0]]) to #[[$map_id1]](%[[K]]) {
-  //      VECT:        %[[A:.*]] = vector.transfer_read %{{.*}}[%[[I4]], %[[I3]]], %{{.*}} {in_bounds = [true, false], permutation_map = #[[$map_proj_d0d1_zerod1]]} : memref<?x?xf32>, vector<4x8xf32>
-  //      VECT:        %[[B:.*]] = vector.transfer_read %{{.*}}[%[[I2]], %[[I4]]], %{{.*}} {in_bounds = [false, true], permutation_map = #[[$map_proj_d0d1_d0zero]]} : memref<?x?xf32>, vector<4x8xf32>
+  //      VECT:        %[[A:.*]] = vector.transfer_read %{{.*}}[%[[I4]], %[[I3]]], %{{.*}} {permutation_map = #[[$map_proj_d0d1_zerod1]]} : memref<?x?xf32>, vector<4x8xf32>
+  //      VECT:        %[[B:.*]] = vector.transfer_read %{{.*}}[%[[I2]], %[[I4]]], %{{.*}} {permutation_map = #[[$map_proj_d0d1_d0zero]]} : memref<?x?xf32>, vector<4x8xf32>
   // VECT-NEXT:        %[[C:.*]] = arith.mulf %[[B]], %[[A]] : vector<4x8xf32>
   //      VECT:        %[[D:.*]] = vector.transfer_read %{{.*}}[%[[I2]], %[[I3]]], %{{.*}} : memref<?x?xf32>, vector<4x8xf32>
   // VECT-NEXT:        %[[E:.*]] = arith.addf %[[D]], %[[C]] : vector<4x8xf32>
diff --git a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir
index 16ade6455d697..15a7133cf0f65 100644
--- a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir
+++ b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir
@@ -141,7 +141,7 @@ func.func @affine_map_with_expr_2(%arg0: memref<8x12x16xf32>, %arg1: memref<8x24
 // CHECK-NEXT:       %[[S1:.*]] = affine.apply #[[$MAP_ID4]](%[[ARG3]], %[[ARG4]], %[[I0]])
 // CHECK-NEXT:       %[[S2:.*]] = affine.apply #[[$MAP_ID5]](%[[ARG3]], %[[ARG4]], %[[I0]])
 // CHECK-NEXT:       %[[CST:.*]] = arith.constant 0.000000e+00 : f32
-// CHECK-NEXT:       %[[S3:.*]] = vector.transfer_read %[[ARG0]][%[[S0]], %[[S1]], %[[S2]]], %[[CST]] {in_bounds = [true], permutation_map = #[[$MAP_ID6]]} : memref<8x12x16xf32>, vector<8xf32>
+// CHECK-NEXT:       %[[S3:.*]] = vector.transfer_read %[[ARG0]][%[[S0]], %[[S1]], %[[S2]]], %[[CST]] {permutation_map = #[[$MAP_ID6]]} : memref<8x12x16xf32>, vector<8xf32>
 // CHECK-NEXT:       vector.transfer_write %[[S3]], %[[ARG1]][%[[ARG3]], %[[ARG4]], %[[ARG5]]] : vector<8xf32>, memref<8x24x48xf32>
 // CHECK-NEXT:     }
 // CHECK-NEXT:   }
diff --git a/mlir/test/Dialect/Linalg/hoisting.mlir b/mlir/test/Dialect/Linalg/hoisting.mlir
index 44c15c272bb3e..241b8a486c012 100644
--- a/mlir/test/Dialect/Linalg/hoisting.mlir
+++ b/mlir/test/Dialect/Linalg/hoisting.mlir
@@ -200,7 +200,7 @@ func.func @hoist_vector_transfer_pairs_in_affine_loops(%memref0: memref<64x64xi3
   affine.for %arg3 = 0 to 64 {
     affine.for %arg4 = 0 to 64 step 16 {
       affine.for %arg5 = 0 to 64 {
-        %0 = vector.transfer_read %memref0[%arg3, %arg5], %c0_i32 {in_bounds = [true], permutation_map = affine_map<(d0, d1) -> (0)>} : memref<64x64xi32>, vector<16xi32>
+        %0 = vector.transfer_read %memref0[%arg3, %arg5], %c0_i32 {permutation_map = affine_map<(d0, d1) -> (0)>} : memref<64x64xi32>, vector<16xi32>
         %1 = vector.transfer_read %memref1[%arg5, %arg4], %c0_i32 : memref<64x64xi32>, vector<16xi32>
         %2 = vector.transfer_read %memref2[%arg3, %arg4], %c0_i32 : memref<64x64xi32>, vector<16xi32>
         %3 = arith.muli %0, %1 : vector<16xi32>
diff --git a/mlir/test/Dialect/Linalg/vectorization.mlir b/mlir/test/Dialect/Linalg/vectorization.mlir
index 783149971f0d6..bbeccc7fecd68 100644
--- a/mlir/test/Dialect/Linalg/vectorization.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization.mlir
@@ -130,7 +130,7 @@ func.func @vectorize_dynamic_1d_broadcast(%arg0: tensor<?xf32>,
 // CHECK-LABEL:   @vectorize_dynamic_1d_broadcast
 // CHECK:           %[[VAL_3:.*]] = arith.constant 0 : index
 // CHECK:           %[[VAL_4:.*]] = tensor.dim %{{.*}}, %[[VAL_3]] : tensor<?xf32>
-// CHECK:           %[[VAL_7:.*]] = vector.transfer_read %{{.*}} {in_bounds = {{.*}}, permutation_map = #{{.*}}} : tensor<?xf32>, vector<4xf32>
+// CHECK:           %[[VAL_7:.*]] = vector.transfer_read %{{.*}} {permutation_map = #{{.*}}} : tensor<?xf32>, vector<4xf32>
 // CHECK:           %[[VAL_9:.*]] = vector.create_mask %[[VAL_4]] : vector<4xi1>
 // CHECK:           %[[VAL_10:.*]] = vector.mask %[[VAL_9]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>
 // CHECK:           %[[VAL_12:.*]] = vector.mask %[[VAL_9]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>
diff --git a/mlir/test/Dialect/Vector/invalid.mlir b/mlir/test/Dialect/Vector/invalid.mlir
index 208982a3e0e7b..abd8769e62e6b 100644
--- a/mlir/test/Dialect/Vector/invalid.mlir
+++ b/mlir/test/Dialect/Vector/invalid.mlir
@@ -484,16 +484,6 @@ func.func @test_vector.transfer_read(%arg0: memref<?x?xvector<2x3xf32>>) {
 
 // -----
 
-func.func @test_vector....
[truncated]

@dcaballe
Contributor

In fact, the folder
for xfer Ops is also updated (*) and will update the attribute value
corresponding to broadcast dims to true. Indeed, such dims would
never be out-of-bounds in practice.

Could you please elaborate on this? I'm not sure I follow

@banach-space
Contributor Author

Could you please elaborate on this? I'm not sure I follow

I "spoke" too soon :)

Looking at this example (output from a vectorizer test), we can infer that all broadcast dims are "in bounds" as all other dims are in bounds:

#map = affine_map<(d0, d1) -> (d0, 0, 0, d1)>
#map1 = affine_map<(d0) -> (d0, 0, 0, 0)>
#map2 = affine_map<(d0) -> (0, 0, d0, 0)>
#map3 = affine_map<(d0, d1) -> (d1, 0, d0, 0)>
module {
  func.func @generic_vectorize_broadcast_transpose(%arg0: memref<4xf32>, %arg1: memref<4x4xf32>, %arg2: memref<4x4x4x4xf32>) {
    %c0 = arith.constant 0 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = vector.transfer_read %arg1[%c0, %c0], %cst {in_bounds = [true, false, false, true], permutation_map = #map} : memref<4x4xf32>, vector<4x4x4x4xf32>
    %1 = vector.transfer_read %arg0[%c0], %cst {in_bounds = [true, false, false, false], permutation_map = #map1} : memref<4xf32>, vector<4x4x4x4xf32>
    %2 = vector.transfer_read %arg0[%c0], %cst {in_bounds = [false, false, true, false], permutation_map = #map2} : memref<4xf32>, vector<4x4x4x4xf32>
    %3 = vector.transfer_read %arg1[%c0, %c0], %cst {in_bounds = [true, false, true, false], permutation_map = #map3} : memref<4x4xf32>, vector<4x4x4x4xf32>
    %4 = arith.subf %0, %1 : vector<4x4x4x4xf32>
    %5 = arith.addf %2, %4 : vector<4x4x4x4xf32>
    %6 = arith.addf %3, %5 : vector<4x4x4x4xf32>
    vector.transfer_write %6, %arg2[%c0, %c0, %c0, %c0] {in_bounds = [true, true, true, true]} : vector<4x4x4x4xf32>, memref<4x4x4x4xf32>
    return
  }
}

In cases where non-broadcast dims are not "in bounds", I think that we should conservatively stick to "out of bounds". I will be sending an update for foldTransferInBoundsAttribute shortly.
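
To illustrate (a hand-written sketch, not from the test suite): with a dynamic source, the d1 access cannot be proven in bounds, so the broadcast dim is left as "false" as well:

```mlir
// Dim 1 (non-broadcast) may run out of bounds for a small enough source, so
// it stays "false", and the folder conservatively keeps the broadcast dim 0
// as "false" too.
%0 = vector.transfer_read %A[%c0, %c0], %pad
    {in_bounds = [false, false], permutation_map = affine_map<(d0, d1) -> (0, d1)>}
    : memref<?x?xf32>, vector<9x4xf32>
```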

@banach-space
Contributor Author

Hey @dcaballe, shall we try finalising this one? Could you take another look?

@dcaballe
Contributor

dcaballe commented Oct 3, 2024

Thanks, and sorry for the delay. LGTM

In cases where non-broadcast dims are not "in bounds", I think that we should conservatively stick to "out of bounds". I will be sending an update for foldTransferInBoundsAttribute shortly.

This is conservative, so it should be good enough.

/*indicesIdx=*/dimExpr.getPosition());
// 2.b Broadcast dims are handled after processing non-bcast dims
// FIXME: constant expr != 0 are not broadcasts - should such
// constants be allowed at all?
Contributor

I don't think so. Any value != 0 would be incorrect.

Contributor Author

Thanks! Looks like the Op verifier would reject such cases before we get here (so all good). I'll add relevant "invalid" cases to test/Dialect/Vector/invalid.mlir
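
Something along these lines (a sketch of such an invalid case; the exact diagnostic text is whatever the verifier emits and is not reproduced here):

```mlir
func.func @non_zero_constant_in_map(%arg0: memref<?x?xf32>, %pad: f32) {
  %c0 = arith.constant 0 : index
  // A constant other than 0 in a permutation_map result is not a broadcast;
  // the verifier should reject this op.
  %0 = vector.transfer_read %arg0[%c0, %c0], %pad
      {permutation_map = affine_map<(d0, d1) -> (1)>}
      : memref<?x?xf32>, vector<9xf32>
  return
}
```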

Comment on lines 4168 to 4170
llvm::for_each(permutationMap.getBroadcastDims(), [&](unsigned idx) {
changed |= !newInBounds[idx];
newInBounds[idx] = true;
Member

nit: why not a plain for loop?

Contributor Author

Updated.
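
For reference, the plain-loop form of the snippet quoted above would look roughly like this (a sketch, reusing the names from the quoted hunk):

```cpp
// Promote broadcast dims to "in bounds"; `changed` records whether the fold
// made progress so the pattern is only committed when it did.
for (unsigned idx : permutationMap.getBroadcastDims()) {
  changed |= !newInBounds[idx];
  newInBounds[idx] = true;
}
```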

Only mark bcast dims as "in bounds" when all non-bcast dims are "in bounds".
Address PR suggestions, update comments, added new tests.

banach-space force-pushed the andrzej/in_bounds_bcast branch from cfec2c7 to cd0f43b on October 3, 2024 10:12
@banach-space
Contributor Author

@kuhar, do let me know if you have any further comments, otherwise I will land this tomorrow. Thanks for taking a look 🙏🏻

@kuhar
Member

kuhar commented Oct 3, 2024

@kuhar, do let me know if you have any further comments, otherwise I will land this tomorrow. Thanks for taking a look 🙏🏻

I'm not an expert on broadcasting semantics, no need to wait for me

banach-space merged commit 56d6b56 into llvm:main on Oct 4, 2024
8 checks passed
banach-space deleted the andrzej/in_bounds_bcast branch on October 4, 2024 06:44