[mlir][vector] Improve flattening vector.transfer_write ops. #94051
Conversation
We can flatten transfer ops even when the collapsed indices are not all zero, because the collapsed index can be computed. This is already supported for vector.transfer_read; this revision refactors that logic and reuses it for vector.transfer_write.
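As a rough illustration, here is a sketch of the rewrite this enables on the write side, assembled from the updated test below. The value names, the %i index, and the in_bounds attributes are illustrative assumptions rather than lines copied from the patch; the authoritative output is in the test diff.

// Before: the trailing, collapsible dims carry a non-zero index %i, which the
// old transfer_write pattern rejected.
%c0 = arith.constant 0 : index
vector.transfer_write %v, %subview[%c0, %c0, %i, %c0] {in_bounds = [true, true]}
    : vector<2x2xf32>, memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>>

// After: the inner dims are collapsed and %i is linearized into an offset
// (suffix-product stride 2 for the collapsed [3, 2] group).
%collapsed = memref.collapse_shape %subview [[0], [1], [2, 3]]
    : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>>
      into memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>
%offset = affine.apply affine_map<()[s0] -> (s0 * 2)>()[%i]
%flat = vector.shape_cast %v : vector<2x2xf32> to vector<4xf32>
vector.transfer_write %flat, %collapsed[%c0, %c0, %offset] {in_bounds = [true]}
    : vector<4xf32>, memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>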
@llvm/pr-subscribers-mlir-vector
Author: Han-Chung Wang (hanhanW)
Changes: We can flatten transfer ops even when the collapsed indices are not all zero, because the collapsed index can be computed. This is already supported for vector.transfer_read; the revision refactors that logic and reuses it for vector.transfer_write.
Full diff: https://github.com/llvm/llvm-project/pull/94051.diff
2 Files Affected:
diff --git a/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp b/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp
index 997b56a1ce142..ae47d7fc28811 100644
--- a/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp
+++ b/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp
@@ -510,20 +510,59 @@ static Value collapseInnerDims(PatternRewriter &rewriter, mlir::Location loc,
/// the truncated indices where `firstDimToCollapse` is now the innermost dim.
/// TODO: Extract the logic that writes to outIndices so that this method
/// simply checks one pre-condition.
-static LogicalResult
-checkAndCollapseInnerZeroIndices(ValueRange indices, int64_t firstDimToCollapse,
- SmallVector<Value> &outIndices) {
- int64_t rank = indices.size();
- if (firstDimToCollapse >= rank)
- return failure();
- for (int64_t i = firstDimToCollapse; i < rank; ++i) {
- std::optional<int64_t> cst = getConstantIntValue(indices[i]);
- if (!cst || cst.value() != 0)
- return failure();
+static SmallVector<Value> getCollapsedIndices(RewriterBase &rewriter,
+ Location loc,
+ ArrayRef<int64_t> shape,
+ ValueRange indices,
+ int64_t firstDimToCollapse) {
+ assert(firstDimToCollapse < static_cast<int64_t>(indices.size()));
+
+ // If all the collapsed indices are zero then no extra logic is needed.
+ // Otherwise, a new offset/index has to be computed.
+ SmallVector<Value> collapsedIndices(indices.begin(),
+ indices.begin() + firstDimToCollapse);
+ SmallVector<Value> indicesToCollapse(indices.begin() + firstDimToCollapse,
+ indices.end());
+ if (llvm::all_of(indicesToCollapse, isZeroIndex)) {
+ collapsedIndices.push_back(indicesToCollapse[0]);
+ return collapsedIndices;
+ }
+
+ // Compute the remaining trailing index/offset required for reading from
+ // the collapsed memref:
+ //
+ // offset = 0
+ // for (i = firstDimToCollapse; i < outputRank; ++i)
+ // offset += sourceType.getDimSize(i) * transferReadOp.indices[i]
+ //
+ // For this example:
+ // %2 = vector.transfer_read/write %arg4[%c0, %arg0, %c0] (...) :
+ // memref<1x43x2xi32>, vector<1x2xi32>
+ // which would be collapsed to:
+ // %1 = vector.transfer_read/write %collapse_shape[%c0, %offset] (...) :
+ // memref<1x86xi32>, vector<2xi32>
+ // one would get the following offset:
+ // %offset = %arg0 * 43
+ OpFoldResult collapsedOffset =
+ rewriter.create<arith::ConstantIndexOp>(loc, 0).getResult();
+
+ auto collapsedStrides = computeSuffixProduct(
+ ArrayRef<int64_t>(shape.begin() + firstDimToCollapse, shape.end()));
+
+ // Compute the collapsed offset.
+ auto &&[collapsedExpr, collapsedVals] =
+ computeLinearIndex(collapsedOffset, collapsedStrides, indicesToCollapse);
+ collapsedOffset = affine::makeComposedFoldedAffineApply(
+ rewriter, loc, collapsedExpr, collapsedVals);
+
+ if (collapsedOffset.is<Value>()) {
+ collapsedIndices.push_back(collapsedOffset.get<Value>());
+ } else {
+ collapsedIndices.push_back(rewriter.create<arith::ConstantIndexOp>(
+ loc, *getConstantIntValue(collapsedOffset)));
}
- outIndices = indices;
- outIndices.resize(firstDimToCollapse + 1);
- return success();
+
+ return collapsedIndices;
}
namespace {
@@ -594,54 +633,9 @@ class FlattenContiguousRowMajorTransferReadPattern
AffineMap::get(collapsedRank, 0, dimExprs, rewriter.getContext());
// 2.2 New indices
- // If all the collapsed indices are zero then no extra logic is needed.
- // Otherwise, a new offset/index has to be computed.
- SmallVector<Value> collapsedIndices;
- if (failed(checkAndCollapseInnerZeroIndices(transferReadOp.getIndices(),
- firstDimToCollapse,
- collapsedIndices))) {
- // Copy all the leading indices.
- SmallVector<Value> indices = transferReadOp.getIndices();
- collapsedIndices.append(indices.begin(),
- indices.begin() + firstDimToCollapse);
-
- // Compute the remaining trailing index/offset required for reading from
- // the collapsed memref:
- //
- // offset = 0
- // for (i = firstDimToCollapse; i < outputRank; ++i)
- // offset += sourceType.getDimSize(i) * transferReadOp.indices[i]
- //
- // For this example:
- // %2 = vector.transfer_read %arg4[%c0, %arg0, %c0] (...) :
- // memref<1x43x2xi32>, vector<1x2xi32>
- // which would be collapsed to:
- // %1 = vector.transfer_read %collapse_shape[%c0, %offset] (...) :
- // memref<1x86xi32>, vector<2xi32>
- // one would get the following offset:
- // %offset = %arg0 * 43
- OpFoldResult collapsedOffset =
- rewriter.create<arith::ConstantIndexOp>(loc, 0).getResult();
-
- auto sourceShape = sourceType.getShape();
- auto collapsedStrides = computeSuffixProduct(ArrayRef<int64_t>(
- sourceShape.begin() + firstDimToCollapse, sourceShape.end()));
-
- // Compute the collapsed offset.
- ArrayRef<Value> indicesToCollapse(indices.begin() + firstDimToCollapse,
- indices.end());
- auto &&[collapsedExpr, collapsedVals] = computeLinearIndex(
- collapsedOffset, collapsedStrides, indicesToCollapse);
- collapsedOffset = affine::makeComposedFoldedAffineApply(
- rewriter, loc, collapsedExpr, collapsedVals);
-
- if (collapsedOffset.is<Value>()) {
- collapsedIndices.push_back(collapsedOffset.get<Value>());
- } else {
- collapsedIndices.push_back(rewriter.create<arith::ConstantIndexOp>(
- loc, *getConstantIntValue(collapsedOffset)));
- }
- }
+ SmallVector<Value> collapsedIndices =
+ getCollapsedIndices(rewriter, loc, sourceType.getShape(),
+ transferReadOp.getIndices(), firstDimToCollapse);
// 3. Create new vector.transfer_read that reads from the collapsed memref
VectorType flatVectorType = VectorType::get({vectorType.getNumElements()},
@@ -697,8 +691,7 @@ class FlattenContiguousRowMajorTransferWritePattern
return failure();
if (!vector::isContiguousSlice(sourceType, vectorType))
return failure();
- int64_t firstContiguousInnerDim =
- sourceType.getRank() - vectorType.getRank();
+ int64_t firstDimToCollapse = sourceType.getRank() - vectorType.getRank();
// TODO: generalize this pattern, relax the requirements here.
if (transferWriteOp.hasOutOfBoundsDim())
return failure();
@@ -706,22 +699,23 @@ class FlattenContiguousRowMajorTransferWritePattern
return failure();
if (transferWriteOp.getMask())
return failure();
- SmallVector<Value> collapsedIndices;
- if (failed(checkAndCollapseInnerZeroIndices(transferWriteOp.getIndices(),
- firstContiguousInnerDim,
- collapsedIndices)))
- return failure();
+
+ SmallVector<Value> collapsedIndices =
+ getCollapsedIndices(rewriter, loc, sourceType.getShape(),
+ transferWriteOp.getIndices(), firstDimToCollapse);
Value collapsedSource =
- collapseInnerDims(rewriter, loc, source, firstContiguousInnerDim);
+ collapseInnerDims(rewriter, loc, source, firstDimToCollapse);
MemRefType collapsedSourceType =
cast<MemRefType>(collapsedSource.getType());
int64_t collapsedRank = collapsedSourceType.getRank();
- assert(collapsedRank == firstContiguousInnerDim + 1);
+ assert(collapsedRank == firstDimToCollapse + 1);
+
SmallVector<AffineExpr, 1> dimExprs{
- getAffineDimExpr(firstContiguousInnerDim, rewriter.getContext())};
+ getAffineDimExpr(firstDimToCollapse, rewriter.getContext())};
auto collapsedMap =
AffineMap::get(collapsedRank, 0, dimExprs, rewriter.getContext());
+
VectorType flatVectorType = VectorType::get({vectorType.getNumElements()},
vectorType.getElementType());
Value flatVector =
diff --git a/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir b/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
index 788ae9ac044ed..b5f29b2ac958b 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
@@ -471,16 +471,16 @@ func.func @regression_non_contiguous_dim_read(%subview : memref<1x3x3x2xf32, str
}
// CHECK: #[[$MAP:.+]] = affine_map<()[s0] -> (s0 * 2)>
-// CHECK-LABEL: func.func @regression_non_contiguous_dim_read(
-// CHECK: %[[COLLAPSE:.+]] = memref.collapse_shape %{{.*}} {{\[}}[0], [1], [2, 3]] : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> into memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>
-// CHECK: %[[APPLY:.*]] = affine.apply #[[$MAP]]()
+// CHECK-LABEL: func.func @regression_non_contiguous_dim_read(
+// CHECK: %[[COLLAPSE:.+]] = memref.collapse_shape %{{.*}} {{\[}}[0], [1], [2, 3]] : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> into memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>
+// CHECK: %[[APPLY:.*]] = affine.apply #[[$MAP]]()
// CHECK-128B-LABEL: func @regression_non_contiguous_dim_read(
// CHECK-128B: memref.collapse_shape
// -----
-func.func @unsupported_non_contiguous_dim_write(%value : vector<2x2xf32>,
+func.func @regression_non_contiguous_dim_write(%value : vector<2x2xf32>,
%subview : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>>,
%idx0 : index, %idx1 : index) {
%c0 = arith.constant 0 : index
@@ -488,8 +488,10 @@ func.func @unsupported_non_contiguous_dim_write(%value : vector<2x2xf32>,
return
}
-// CHECK-LABEL: func.func @unsupported_non_contiguous_dim_write(
-// CHECK-NOT: memref.collapse_shape
+// CHECK: #[[$MAP:.+]] = affine_map<()[s0] -> (s0 * 2)>
+// CHECK-LABEL: func.func @regression_non_contiguous_dim_write(
+// CHECK: %[[APPLY:.*]] = affine.apply #[[$MAP]]()
+// CHECK: %[[COLLAPSE:.+]] = memref.collapse_shape %{{.*}} {{\[}}[0], [1], [2, 3]] : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> into memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>
-// CHECK-128B-LABEL: func @unsupported_non_contiguous_dim_write(
-// CHECK-128B-NOT: memref.collapse_shape
+// CHECK-128B-LABEL: func @regression_non_contiguous_dim_write(
+// CHECK-128B: memref.collapse_shape
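To make the index linearization in getCollapsedIndices concrete, here is the formula the code implements, together with one worked instance based on the read test above (the trailing indices (%idx0, %c0) are an assumption implied by the checked map, not lines visible in the diff):

$$\text{offset} \;=\; \sum_{i \,\ge\, \text{firstDimToCollapse}} \text{stride}_i \cdot \text{idx}_i, \qquad \text{stride}_i \;=\; \prod_{j > i} \text{shape}_j$$

For @regression_non_contiguous_dim_read, the collapsed group of memref<1x3x3x2xf32> has sizes [3, 2], so computeSuffixProduct returns strides [2, 1]; assuming trailing indices (%idx0, %c0), the folded offset is 2 * %idx0 + 0, which matches the checked affine_map<()[s0] -> (s0 * 2)>.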
Thanks, mostly LG. Looks like this is missing a test?
/// the truncated indices where `firstDimToCollapse` is now the innermost dim.
/// TODO: Extract the logic that writes to outIndices so that this method
/// simply checks one pre-condition.
This comment should be updated, right?
good catch, thanks!
SmallVector<Value> collapsedIndices(indices.begin(),
                                    indices.begin() + firstDimToCollapse);
collapsedIndices is a bit misleading - and I might be the one who invented that, sorry 😅 The leading indices are actually preserved and only the trailing indices are "collapsed". So this is more like "indicesAfterCollapsingSrc", right?
There is no src in the local function. How about renaming it to indicesAfterCollapsing?
How about renaming it to indicesAfterCollapsing?
👍🏻
-func.func @unsupported_non_contiguous_dim_write(%value : vector<2x2xf32>,
+func.func @regression_non_contiguous_dim_write(%value : vector<2x2xf32>,
This turns an existing testcase that couldn't be collapsed into one that can be collapsed. Is any other testcase left testing remaining cases that more fundamentally can't be collapsed?
Also, is any testcase testing in_bounds = false, and specifically any edge case where the ability to collapse depends on in_bounds?
Is any other testcase left testing remaining cases that more fundamentally can't be collapsed?
Many cases are covered by CHECK-128B; e.g., there is a CHECK-128B: memref.collapse_shape in this and other testcases. Fundamentally, we cannot collapse the three inner dimensions because the strides are [x, 10, 2, 1] rather than [x, 6, 2, 1], i.e., the slice is not contiguous. Do you want me to add a test that can't be collapsed for every dimension?
Also, is any testcase testing in_bounds = false and specifically any edge case where the ability to collapse depends on in_bounds?
There are no negative tests for that case. I can add one to exercise it.
E.g., there is a CHECK-128B: memref.collapse_shape in this and other testcases.
But CHECK-128B tests a specific configuration of the pass rather than the generic version, right?
Fundamentally, we cannot collapse the three inner dimensions because the strides are [x, 10, 2, 1] rather than [x, 6, 2, 1]
Just to double check - this is referring to regression_non_contiguous_dim_write, in which collapsing does happen?
In general, I think that it would be good to identify and document all edge cases through tests (most of the stuff is already here). I would start by renaming this test: @regression_non_contiguous_dim_write. "Regression" doesn't really help differentiate this test from other tests. It's a missed opportunity to document what makes this test case unique - i.e. non-zero indices :) (*) So, perhaps: @transfer_write_non_contigous_dims_mismatch_non_zero_idx.
Next, you could move it near @transfer_write_dims_mismatch_non_contiguous - both tests seem to check something very similar, two edge cases for vector.transfer_write. I find it very helpful when such edge cases are clustered together.
Also, is any testcase testing in_bounds = false and specifically any edge case where the ability to collapse depends on in_bounds?
There are no negative tests for that case. I can add one to exercise it.
I've been actually discussing this with someone recently :) We don't need it for this PR, but it would be good to add a TODO somewhere (I can look into this).
(*) In general, I'm not a fan of regression tests, but appreciate that others find them valuable.
Fundamentally, we cannot collapse the three inner dimensions because the strides are [x, 10, 2, 1] rather than [x, 6, 2, 1]
Just to double check - this is referring to regression_non_contiguous_dim_write, in which collapsing does happen?
Yes, it is referring to that test. However, my earlier statement was wrong because it is a 2-D vector. I will add a 3-D vector case and check that it is not collapsed.
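A sketch of what such a 3-D case could look like (hypothetical; the test name and in_bounds values are assumptions, not the test that eventually landed). With the vector spanning dims [1, 2, 3], collapsing them would require stride(dim 1) == 3 * 2 = 6, but the stride is 10, so the pattern should not fire and no memref.collapse_shape should appear:

func.func @negative_write_non_contiguous_3d_vector(%value : vector<2x3x2xf32>,
    %subview : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>>) {
  %c0 = arith.constant 0 : index
  vector.transfer_write %value, %subview[%c0, %c0, %c0, %c0] {in_bounds = [true, true, true]}
      : vector<2x3x2xf32>, memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>>
  return
}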
"Regression" doesn't really help differentiate this test from other tests. It's a missed opportunity to document what makes this test case unique - i.e. non-zero indices :) (*) So, perhaps:
@transfer_write_non_contigous_dims_mismatch_non_zero_idx
Next, you could move it near @transfer_write_dims_mismatch_non_contiguous - both tests seem to check sth very similar, two edge cases for vector.transfer_write. I find it very helpful when such edge cases are clustered together.
I also find "regression" weird, but I wanted to follow the naming style of the xfer_read test above. The new name is great, and I'll send a follow-up about the naming and structure of the file.
I've been actually discussing this with someone recently :) We don't need it for this PR, but it would be good to add a TODO somewhere (I can look into this).
UPDATE: I already added the tests. They are negative_out_of_bound_transfer_read and negative_out_of_bound_transfer_write.
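For reference, a minimal sketch of the shape such a negative test could take (an assumption about the added tests, not a copy of them): the pattern bails out as soon as hasOutOfBoundsDim() is true, so no memref.collapse_shape should be produced here.

func.func @negative_out_of_bound_transfer_write(
    %mem : memref<?x8x4x2xi8>, %vec : vector<1x8x4x2xi8>) {
  %c0 = arith.constant 0 : index
  vector.transfer_write %vec, %mem[%c0, %c0, %c0, %c0] {in_bounds = [false, true, true, true]}
      : vector<1x8x4x2xi8>, memref<?x8x4x2xi8>
  return
}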
I just found that we already have a negative test. It does not matter whether the last index is zero or not. So I'm going to merge the change and send a follow-up for the naming and structure.
llvm-project/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir, lines 215 to 221 (at 0e743ec):

func.func @transfer_write_dims_mismatch_non_contiguous(
    %arg : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>, %vec : vector<2x1x2x2xi8>) {
  %c0 = arith.constant 0 : index
  vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] :
    vector<2x1x2x2xi8>, memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>
  return
}
hmm, I forgot to submit the comments..
I think it does not miss a test. It adds the support and makes one of the negative tests work now, so we do have a test for it.
Not sure how I missed that, sorry :)
A couple of non-blocking comments - let me know what you think and thanks for improving this 🙏🏻
I'll update the PR soon.