-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[mlir][vector] Relax the requirements on broadcast dims #99341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-affine Author: Andrzej Warzyński (banach-space) ChangesNOTE: This is a follow-up for #97049 in which the This PR updates the semantics of the %read = vector.transfer_read %A[%base1, %base2], %pad
{in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
{permutation_map = affine_map<(d0, d1) -> (0)>}
: memref<?x?xf32>, vector<9xf32>
vector.transfer_write %vec, %A[%base1, %base2],
{in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
{permutation_map = affine_map<(d0, d1) -> (0)>}
: vector<9xf32>, memref<?x?xf32> Note that the value Note that this PR doesn't change any of the lowerings. The changes in For context, here's a PR in which "broadcast" dims where forced to (*) See Patch is 27.80 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/99341.diff 20 Files Affected:
diff --git a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
index 2019eb5a9fd7f..f8345c77b3eed 100644
--- a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+++ b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
@@ -1405,12 +1405,12 @@ def Vector_TransferReadOp :
specifies if the transfer is guaranteed to be within the source bounds. If
set to "false", accesses (including the starting point) may run
out-of-bounds along the respective vector dimension as the index increases.
- Non-vector and broadcast dimensions *must* always be in-bounds. The
- `in_bounds` array length has to be equal to the vector rank. This attribute
- has a default value: `false` (i.e. "out-of-bounds"). When skipped in the
- textual IR, the default value is assumed. Similarly, the OP printer will
- omit this attribute when all dimensions are out-of-bounds (i.e. the default
- value is used).
+ Non-vector dimensions *must* always be in-bounds. The `in_bounds` array
+ length has to be equal to the vector rank. This attribute has a default
+ value: `false` (i.e. "out-of-bounds"). When skipped in the textual IR, the
+ default value is assumed. Similarly, the OP printer will omit this
+ attribute when all dimensions are out-of-bounds (i.e. the default value is
+ used).
A `vector.transfer_read` can be lowered to a simple load if all dimensions
are specified to be within bounds and no `mask` was specified.
@@ -1650,12 +1650,12 @@ def Vector_TransferWriteOp :
specifies if the transfer is guaranteed to be within the source bounds. If
set to "false", accesses (including the starting point) may run
out-of-bounds along the respective vector dimension as the index increases.
- Non-vector and broadcast dimensions *must* always be in-bounds. The
- `in_bounds` array length has to be equal to the vector rank. This attribute
- has a default value: `false` (i.e. "out-of-bounds"). When skipped in the
- textual IR, the default value is assumed. Similarly, the OP printer will
- omit this attribute when all dimensions are out-of-bounds (i.e. the default
- value is used).
+ Non-vector dimensions *must* always be in-bounds. The `in_bounds` array
+ length has to be equal to the vector rank. This attribute has a default
+ value: `false` (i.e. "out-of-bounds"). When skipped in the textual IR, the
+ default value is assumed. Similarly, the OP printer will omit this
+ attribute when all dimensions are out-of-bounds (i.e. the default value is
+ used).
A `vector.transfer_write` can be lowered to a simple store if all
dimensions are specified to be within bounds and no `mask` was specified.
diff --git a/mlir/include/mlir/IR/AffineMap.h b/mlir/include/mlir/IR/AffineMap.h
index 676da6d176497..264c1c8308e78 100644
--- a/mlir/include/mlir/IR/AffineMap.h
+++ b/mlir/include/mlir/IR/AffineMap.h
@@ -146,14 +146,6 @@ class AffineMap {
/// affine map (d0, ..., dn) -> (dp, ..., dn) on the most minor dimensions.
bool isMinorIdentity() const;
- /// Returns the list of broadcast dimensions (i.e. dims indicated by value 0
- /// in the result).
- /// Ex:
- /// * (d0, d1, d2) -> (0, d1) gives [0]
- /// * (d0, d1, d2) -> (d2, d1) gives []
- /// * (d0, d1, d2, d4) -> (d0, 0, d1, 0) gives [1, 3]
- SmallVector<unsigned> getBroadcastDims() const;
-
/// Returns true if this affine map is a minor identity up to broadcasted
/// dimensions which are indicated by value 0 in the result. If
/// `broadcastedDims` is not null, it will be populated with the indices of
diff --git a/mlir/include/mlir/Interfaces/VectorInterfaces.td b/mlir/include/mlir/Interfaces/VectorInterfaces.td
index 7ea62c2ae2ab1..be939bad14b7b 100644
--- a/mlir/include/mlir/Interfaces/VectorInterfaces.td
+++ b/mlir/include/mlir/Interfaces/VectorInterfaces.td
@@ -234,12 +234,9 @@ def VectorTransferOpInterface : OpInterface<"VectorTransferOpInterface"> {
return constExpr && constExpr.getValue() == 0;
}
- /// Return "true" if the vector transfer dimension `dim` is in-bounds. Also
- /// return "true" if the dimension is a broadcast dimension. Return "false"
- /// otherwise.
+ /// Return "true" if the vector transfer dimension `dim` is in-bounds.
+ /// Return "false" otherwise.
bool isDimInBounds(unsigned dim) {
- if ($_op.isBroadcastDim(dim))
- return true;
auto inBounds = $_op.getInBounds();
return ::llvm::cast<::mlir::BoolAttr>(inBounds[dim]).getValue();
}
diff --git a/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp b/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp
index 6bb8dfecba0ec..71e9648a5e00f 100644
--- a/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp
+++ b/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp
@@ -1223,19 +1223,8 @@ static Operation *vectorizeAffineLoad(AffineLoadOp loadOp,
LLVM_DEBUG(dbgs() << "\n[early-vect]+++++ permutationMap: ");
LLVM_DEBUG(permutationMap.print(dbgs()));
- // Make sure that the in_bounds attribute corresponding to a broadcast dim
- // is set to `true` - that's required by the xfer Op.
- // FIXME: We're not veryfying whether the corresponding access is in bounds.
- // TODO: Use masking instead.
- SmallVector<unsigned> broadcastedDims = permutationMap.getBroadcastDims();
- SmallVector<bool> inBounds(vectorType.getRank(), false);
-
- for (auto idx : broadcastedDims)
- inBounds[idx] = true;
-
auto transfer = state.builder.create<vector::TransferReadOp>(
- loadOp.getLoc(), vectorType, loadOp.getMemRef(), indices, permutationMap,
- inBounds);
+ loadOp.getLoc(), vectorType, loadOp.getMemRef(), indices, permutationMap);
// Register replacement for future uses in the scope.
state.registerOpVectorReplacement(loadOp, transfer);
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 68ee915cca3f4..4b07cc3e9186c 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -1343,17 +1343,8 @@ vectorizeAsLinalgGeneric(RewriterBase &rewriter, VectorizationState &state,
SmallVector<Value> indices(linalgOp.getShape(opOperand).size(), zero);
- // Make sure that the in_bounds attribute corresponding to a broadcast dim
- // is `true`
- SmallVector<unsigned> broadcastedDims = readMap.getBroadcastDims();
- SmallVector<bool> inBounds(readType.getRank(), false);
-
- for (auto idx : broadcastedDims)
- inBounds[idx] = true;
-
Operation *read = rewriter.create<vector::TransferReadOp>(
- loc, readType, opOperand->get(), indices, readMap,
- ArrayRef<bool>(inBounds));
+ loc, readType, opOperand->get(), indices, readMap);
read = state.maskOperation(rewriter, read, linalgOp, maskingMap);
Value readValue = read->getResult(0);
diff --git a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
index df3a59ed80ad4..cb59cc333e936 100644
--- a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+++ b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
@@ -3958,10 +3958,6 @@ verifyTransferOp(VectorTransferOpInterface op, ShapedType shapedType,
"as permutation_map results: ")
<< AffineMapAttr::get(permutationMap)
<< " vs inBounds of size: " << inBounds.size();
- for (unsigned int i = 0, e = permutationMap.getNumResults(); i < e; ++i)
- if (isa<AffineConstantExpr>(permutationMap.getResult(i)) &&
- !llvm::cast<BoolAttr>(inBounds.getValue()[i]).getValue())
- return op->emitOpError("requires broadcast dimensions to be in-bounds");
return success();
}
@@ -4150,17 +4146,24 @@ static LogicalResult foldTransferInBoundsAttribute(TransferOp op) {
SmallVector<bool, 4> newInBounds;
newInBounds.reserve(op.getTransferRank());
for (unsigned i = 0; i < op.getTransferRank(); ++i) {
- // Already marked as in-bounds, nothing to see here.
+ // 1. Already marked as in-bounds, nothing to see here.
if (op.isDimInBounds(i)) {
newInBounds.push_back(true);
continue;
}
- // Currently out-of-bounds, check whether we can statically determine it is
- // inBounds.
+ // 2. Currently out-of-bounds, check whether we can statically determine it
+ // is inBounds.
+ bool inBounds = false;
auto dimExpr = dyn_cast<AffineDimExpr>(permutationMap.getResult(i));
- assert(dimExpr && "Broadcast dims must be in-bounds");
- auto inBounds =
- isInBounds(op, /*resultIdx=*/i, /*indicesIdx=*/dimExpr.getPosition());
+ if (dimExpr) {
+ // 2.a Non-broadcast dim
+ inBounds = isInBounds(op, /*resultIdx=*/i,
+ /*indicesIdx=*/dimExpr.getPosition());
+ } else {
+ // 2.b Broadcast dim
+ inBounds = true;
+ }
+
newInBounds.push_back(inBounds);
// We commit the pattern if it is "more inbounds".
changed |= inBounds;
diff --git a/mlir/lib/IR/AffineMap.cpp b/mlir/lib/IR/AffineMap.cpp
index 859fb8ebc10e8..62f595299afe2 100644
--- a/mlir/lib/IR/AffineMap.cpp
+++ b/mlir/lib/IR/AffineMap.cpp
@@ -158,19 +158,6 @@ bool AffineMap::isMinorIdentity() const {
getMinorIdentityMap(getNumDims(), getNumResults(), getContext());
}
-SmallVector<unsigned> AffineMap::getBroadcastDims() const {
- SmallVector<unsigned> broadcastedDims;
- for (const auto &[resIdx, expr] : llvm::enumerate(getResults())) {
- if (auto constExpr = dyn_cast<AffineConstantExpr>(expr)) {
- if (constExpr.getValue() != 0)
- continue;
- broadcastedDims.push_back(resIdx);
- }
- }
-
- return broadcastedDims;
-}
-
/// Returns true if this affine map is a minor identity up to broadcasted
/// dimensions which are indicated by value 0 in the result.
bool AffineMap::isMinorIdentityWithBroadcasting(
diff --git a/mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir b/mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
index 3f4e70a6835af..e1babdd2f1f63 100644
--- a/mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
+++ b/mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
@@ -133,7 +133,7 @@ func.func @materialize_read(%M: index, %N: index, %O: index, %P: index) {
affine.for %i1 = 0 to %N {
affine.for %i2 = 0 to %O {
affine.for %i3 = 0 to %P step 5 {
- %f = vector.transfer_read %A[%i0, %i1, %i2, %i3], %f0 {in_bounds = [false, true, false], permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, 0, d0)>} : memref<?x?x?x?xf32>, vector<5x4x3xf32>
+ %f = vector.transfer_read %A[%i0, %i1, %i2, %i3], %f0 {permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, 0, d0)>} : memref<?x?x?x?xf32>, vector<5x4x3xf32>
// Add a dummy use to prevent dead code elimination from removing
// transfer read ops.
"dummy_use"(%f) : (vector<5x4x3xf32>) -> ()
@@ -507,7 +507,7 @@ func.func @transfer_read_with_tensor(%arg: tensor<f32>) -> vector<1xf32> {
// CHECK-NEXT: %[[RESULT:.*]] = vector.broadcast %[[EXTRACTED]] : f32 to vector<1xf32>
// CHECK-NEXT: return %[[RESULT]] : vector<1xf32>
%f0 = arith.constant 0.0 : f32
- %0 = vector.transfer_read %arg[], %f0 {in_bounds = [true], permutation_map = affine_map<()->(0)>} :
+ %0 = vector.transfer_read %arg[], %f0 {permutation_map = affine_map<()->(0)>} :
tensor<f32>, vector<1xf32>
return %0: vector<1xf32>
}
@@ -746,7 +746,7 @@ func.func @cannot_lower_transfer_read_with_leading_scalable(%arg0: memref<?x4xf3
func.func @does_not_crash_on_unpack_one_dim(%subview: memref<1x1x1x1xi32>, %mask: vector<1x1xi1>) -> vector<1x1x1x1xi32> {
%c0 = arith.constant 0 : index
%c0_i32 = arith.constant 0 : i32
- %3 = vector.transfer_read %subview[%c0, %c0, %c0, %c0], %c0_i32, %mask {in_bounds = [false, true, true, false], permutation_map = #map1}
+ %3 = vector.transfer_read %subview[%c0, %c0, %c0, %c0], %c0_i32, %mask {permutation_map = #map1}
: memref<1x1x1x1xi32>, vector<1x1x1x1xi32>
return %3 : vector<1x1x1x1xi32>
}
diff --git a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir
index 0a077624d18f8..9244604128cb7 100644
--- a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir
+++ b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir
@@ -22,7 +22,7 @@ func.func @vec1d_1(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// CHECK-NEXT: %{{.*}} = affine.apply #[[$map_id1]](%[[C0]])
// CHECK-NEXT: %{{.*}} = affine.apply #[[$map_id1]](%[[C0]])
// CHECK-NEXT: %{{.*}} = arith.constant 0.0{{.*}}: f32
-// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true], permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
+// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
affine.for %i0 = 0 to %M { // vectorized due to scalar -> vector
%a0 = affine.load %A[%c0, %c0] : memref<?x?xf32>
}
@@ -425,7 +425,7 @@ func.func @vec_rejected_8(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// CHECK: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})
// CHECK: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})
// CHECK: %{{.*}} = arith.constant 0.0{{.*}}: f32
-// CHECK: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true], permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
+// CHECK: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
affine.for %i17 = 0 to %M { // not vectorized, the 1-D pattern that matched %{{.*}} in DFS post-order prevents vectorizing %{{.*}}
affine.for %i18 = 0 to %M { // vectorized due to scalar -> vector
%a18 = affine.load %A[%c0, %c0] : memref<?x?xf32>
@@ -459,7 +459,7 @@ func.func @vec_rejected_9(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// CHECK: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})
// CHECK-NEXT: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})
// CHECK-NEXT: %{{.*}} = arith.constant 0.0{{.*}}: f32
-// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true], permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
+// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
affine.for %i17 = 0 to %M { // not vectorized, the 1-D pattern that matched %i18 in DFS post-order prevents vectorizing %{{.*}}
affine.for %i18 = 0 to %M { // vectorized due to scalar -> vector
%a18 = affine.load %A[%c0, %c0] : memref<?x?xf32>
diff --git a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d.mlir b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d.mlir
index eb5120a49e3d4..83916e755363b 100644
--- a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d.mlir
+++ b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d.mlir
@@ -123,8 +123,8 @@ func.func @vectorize_matmul(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg
// VECT: affine.for %[[I2:.*]] = #[[$map_id1]](%[[C0]]) to #[[$map_id1]](%[[M]]) step 4 {
// VECT-NEXT: affine.for %[[I3:.*]] = #[[$map_id1]](%[[C0]]) to #[[$map_id1]](%[[N]]) step 8 {
// VECT-NEXT: affine.for %[[I4:.*]] = #[[$map_id1]](%[[C0]]) to #[[$map_id1]](%[[K]]) {
- // VECT: %[[A:.*]] = vector.transfer_read %{{.*}}[%[[I4]], %[[I3]]], %{{.*}} {in_bounds = [true, false], permutation_map = #[[$map_proj_d0d1_zerod1]]} : memref<?x?xf32>, vector<4x8xf32>
- // VECT: %[[B:.*]] = vector.transfer_read %{{.*}}[%[[I2]], %[[I4]]], %{{.*}} {in_bounds = [false, true], permutation_map = #[[$map_proj_d0d1_d0zero]]} : memref<?x?xf32>, vector<4x8xf32>
+ // VECT: %[[A:.*]] = vector.transfer_read %{{.*}}[%[[I4]], %[[I3]]], %{{.*}} {permutation_map = #[[$map_proj_d0d1_zerod1]]} : memref<?x?xf32>, vector<4x8xf32>
+ // VECT: %[[B:.*]] = vector.transfer_read %{{.*}}[%[[I2]], %[[I4]]], %{{.*}} {permutation_map = #[[$map_proj_d0d1_d0zero]]} : memref<?x?xf32>, vector<4x8xf32>
// VECT-NEXT: %[[C:.*]] = arith.mulf %[[B]], %[[A]] : vector<4x8xf32>
// VECT: %[[D:.*]] = vector.transfer_read %{{.*}}[%[[I2]], %[[I3]]], %{{.*}} : memref<?x?xf32>, vector<4x8xf32>
// VECT-NEXT: %[[E:.*]] = arith.addf %[[D]], %[[C]] : vector<4x8xf32>
diff --git a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir
index 16ade6455d697..15a7133cf0f65 100644
--- a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir
+++ b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir
@@ -141,7 +141,7 @@ func.func @affine_map_with_expr_2(%arg0: memref<8x12x16xf32>, %arg1: memref<8x24
// CHECK-NEXT: %[[S1:.*]] = affine.apply #[[$MAP_ID4]](%[[ARG3]], %[[ARG4]], %[[I0]])
// CHECK-NEXT: %[[S2:.*]] = affine.apply #[[$MAP_ID5]](%[[ARG3]], %[[ARG4]], %[[I0]])
// CHECK-NEXT: %[[CST:.*]] = arith.constant 0.000000e+00 : f32
-// CHECK-NEXT: %[[S3:.*]] = vector.transfer_read %[[ARG0]][%[[S0]], %[[S1]], %[[S2]]], %[[CST]] {in_bounds = [true], permutation_map = #[[$MAP_ID6]]} : memref<8x12x16xf32>, vector<8xf32>
+// CHECK-NEXT: %[[S3:.*]] = vector.transfer_read %[[ARG0]][%[[S0]], %[[S1]], %[[S2]]], %[[CST]] {permutation_map = #[[$MAP_ID6]]} : memref<8x12x16xf32>, vector<8xf32>
// CHECK-NEXT: vector.transfer_write %[[S3]], %[[ARG1]][%[[ARG3]], %[[ARG4]], %[[ARG5]]] : vector<8xf32>, memref<8x24x48xf32>
// CHECK-NEXT: }
// CHECK-NEXT: }
diff --git a/mlir/test/Dialect/Linalg/hoisting.mlir b/mlir/test/Dialect/Linalg/hoisting.mlir
index 44c15c272bb3e..241b8a486c012 100644
--- a/mlir/test/Dialect/Linalg/hoisting.mlir
+++ b/mlir/test/Dialect/Linalg/hoisting.mlir
@@ -200,7 +200,7 @@ func.func @hoist_vector_transfer_pairs_in_affine_loops(%memref0: memref<64x64xi3
affine.for %arg3 = 0 to 64 {
affine.for %arg4 = 0 to 64 step 16 {
affine.for %arg5 = 0 to 64 {
- %0 = vector.transfer_read %memref0[%arg3, %arg5], %c0_i32 {in_bounds = [true], permutation_map = affine_map<(d0, d1) -> (0)>} : memref<64x64xi32>, vector<16xi32>
+ %0 = vector.transfer_read %memref0[%arg3, %arg5], %c0_i32 {permutation_map = affine_map<(d0, d1) -> (0)>} : memref<64x64xi32>, vector<16xi32>
%1 = vector.transfer_read %memref1[%arg5, %arg4], %c0_i32 : memref<64x64xi32>, vector<16xi32>
%2 = vector.transfer_read %memref2[%arg3, %arg4], %c0_i32 : memref<64x64xi32>, vector<16xi32>
%3 = arith.muli %0, %1 : vector<16xi32>
diff --git a/mlir/test/Dialect/Linalg/vectorization.mlir b/mlir/test/Dialect/Linalg/vectorization.mlir
index 783149971f0d6..bbeccc7fecd68 100644
--- a/mlir/test/Dialect/Linalg/vectorization.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization.mlir
@@ -130,7 +130,7 @@ func.func @vectorize_dynamic_1d_broadcast(%arg0: tensor<?xf32>,
// CHECK-LABEL: @vectorize_dynamic_1d_broadcast
// CHECK: %[[VAL_3:.*]] = arith.constant 0 : index
// CHECK: %[[VAL_4:.*]] = tensor.dim %{{.*}}, %[[VAL_3]] : tensor<?xf32>
-// CHECK: %[[VAL_7:.*]] = vector.transfer_read %{{.*}} {in_bounds = {{.*}}, permutation_map = #{{.*}}} : tensor<?xf32>, vector<4xf32>
+// CHECK: %[[VAL_7:.*]] = vector.transfer_read %{{.*}} {permutation_map = #{{.*}}} : tensor<?xf32>, vector<4xf32>
// CHECK: %[[VAL_9:.*]] = vector.create_mask %[[VAL_4]] : vector<4xi1>
// CHECK: %[[VAL_10:.*]] = vector.mask %[[VAL_9]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>
// CHECK: %[[VAL_12:.*]] = vector.mask %[[VAL_9]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>
diff --git a/mlir/test/Dialect/Vector/invalid.mlir b/mlir/test/Dialect/Vector/invalid.mlir
index 208982a3e0e7b..abd8769e62e6b 100644
--- a/mlir/test/Dialect/Vector/invalid.mlir
+++ b/mlir/test/Dialect/Vector/invalid.mlir
@@ -484,16 +484,6 @@ func.func @test_vector.transfer_read(%arg0: memref<?x?xvector<2x3xf32>>) {
// -----
-func.func @test_vector....
[truncated]
|
Could you please elaborate on this? I'm not sure I follow |
I "spoke" too soon :) Looking at this example (output from a vectorizer test), we can infer that all broadcast dims are "in bounds" as all other dims are in bounds: #map = affine_map<(d0, d1) -> (d0, 0, 0, d1)>
#map1 = affine_map<(d0) -> (d0, 0, 0, 0)>
#map2 = affine_map<(d0) -> (0, 0, d0, 0)>
#map3 = affine_map<(d0, d1) -> (d1, 0, d0, 0)>
module {
func.func @generic_vectorize_broadcast_transpose(%arg0: memref<4xf32>, %arg1: memref<4x4xf32>, %arg2: memref<4x4x4x4xf32>) {
%c0 = arith.constant 0 : index
%cst = arith.constant 0.000000e+00 : f32
%0 = vector.transfer_read %arg1[%c0, %c0], %cst {in_bounds = [true, false, false, true], permutation_map = #map} : memref<4x4xf32>, vector<4x4x4x4xf32>
%1 = vector.transfer_read %arg0[%c0], %cst {in_bounds = [true, false, false, false], permutation_map = #map1} : memref<4xf32>, vector<4x4x4x4xf32>
%2 = vector.transfer_read %arg0[%c0], %cst {in_bounds = [false, false, true, false], permutation_map = #map2} : memref<4xf32>, vector<4x4x4x4xf32>
%3 = vector.transfer_read %arg1[%c0, %c0], %cst {in_bounds = [true, false, true, false], permutation_map = #map3} : memref<4x4xf32>, vector<4x4x4x4xf32>
%4 = arith.subf %0, %1 : vector<4x4x4x4xf32>
%5 = arith.addf %2, %4 : vector<4x4x4x4xf32>
%6 = arith.addf %3, %5 : vector<4x4x4x4xf32>
vector.transfer_write %6, %arg2[%c0, %c0, %c0, %c0] {in_bounds = [true, true, true, true]} : vector<4x4x4x4xf32>, memref<4x4x4x4xf32>
return
} In cases where non-broadcast dims are not "in bounds", I think that we should conservatively stick to "out of bounds". I will be sending an update for |
b09adbb
to
cfec2c7
Compare
Hey @dcaballe , shall we try finalising this one? Could you take another look? |
Thanks, and sorry for the delay. LGTM
This is conservative that it should be good enough. |
/*indicesIdx=*/dimExpr.getPosition()); | ||
// 2.b Broadcast dims are handled after processing non-bcast dims | ||
// FIXME: constant expr != 0 are not broadcasts - should such | ||
// constants be allowed at all? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Any value != 0 would be incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks like the Op verifier would reject such cases before we get here (so all good). I'll add relevant "invalid" cases to test/Dialect/Vector/invalid.mlir
llvm::for_each(permutationMap.getBroadcastDims(), [&](unsigned idx) { | ||
changed |= !newInBounds[idx]; | ||
newInBounds[idx] = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why not a plain for loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
NOTE: This is a follow-up for llvm#97049 in which the `in_bounds` attribute was made mandatory. This PR updates the semantics of the `in_bounds` attribute so that broadcast dimensions are no longer required to be "in bounds". Specifically, these xfer_read/xfer_write Ops become valid after this change: ```mlir %read = vector.transfer_read %A[%base1, %base2], %pad {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>} {permutation_map = affine_map<(d0, d1) -> (0)>} : memref<?x?xf32>, vector<9xf32> vector.transfer_write %vec, %A[%base1, %base2], {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>} {permutation_map = affine_map<(d0, d1) -> (0)>} : vector<9xf32>, memref<?x?xf32> ``` Note that the value `false` merely means "may run out-of-bounds", i.e., the corresponding access can still be "in bounds". In fact, the folder for xfer Ops is also updated (*) and will update the attribute value corresponding to broadcast dims to `true`. Indeed, such dims would never be out-of-bounds in practice. Still, there's no need to require Op "users" to always set the corresponding `in_bounds` flag to `true. Note that this PR doesn't change any of the lowerings. The changes in "SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple reverts of recent changes in llvm#97049. Those were only meant to facilitate making `in_bounds` mandatory and to work around the extra requirements for broadcast dims (those requirements ere removed in this PR). All changes in tests are also reverts of changes from llvm#97049. For context, here's a PR in which "broadcast" dims where forced to always be "in-bounds": * https://reviews.llvm.org/D102566 (*) See `foldTransferInBoundsAttribute`.
Only mark bcast dims as "in bounds" when all non-bcast dims are "in bounds".
Address PR suggestions, update comments, added new tests
cfec2c7
to
cd0f43b
Compare
@kuhar, do let me know if you have any further comments, otherwise I will land this tomorrow. Thanks for taking a look 🙏🏻 |
I'm not an expert on broadcasting semantics, no need to wait for me |
NOTE: This is a follow-up for #97049 in which the
in_bounds
attributewas made mandatory.
This PR updates the semantics of the
in_bounds
attribute so thatbroadcast dimensions are no longer required to be "in bounds".
Specifically, these xfer_read/xfer_write Ops become valid after this
change:
Note that the value
false
merely means "may run out-of-bounds", i.e.,the corresponding access can still be "in bounds". In fact, the folder
for xfer Ops is also updated (*) and will update the attribute value
corresponding to broadcast dims to
true
if all non-broadcast dimsare marked as "in bounds".
Note that this PR doesn't change any of the lowerings. The changes in
"SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple
reverts of recent changes in #97049. Those were only meant to facilitate
making
in_bounds
mandatory and to work around the extra requirementsfor broadcast dims (those requirements ere removed in this PR). All
changes in tests are also reverts of changes from #97049.
For context, here's a PR in which "broadcast" dims where forced to
always be "in-bounds":
(*) See
foldTransferInBoundsAttribute
.