
[mlir][vector] Improve flattening vector.transfer_write ops. #94051


Merged
hanhanW merged 2 commits into llvm:main from hanhanW:flatten-xfer-write on Jun 5, 2024

Conversation

hanhanW (Contributor) commented on May 31, 2024

We can flatten the transfer ops even when the collapsed indices are not all zero; the collapsed offset can be computed. This is already supported for vector.transfer_read. This revision refactors that logic and reuses it for transfer_write.
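
For illustration, a rough before/after sketch of the rewrite this enables (the shapes follow the example in the patch's code comments; the shape_cast, affine.apply, and attribute details are assumptions about the emitted IR, not copied output):

// Before (sketch): a non-zero index %i on a dimension that will be collapsed.
func.func @example_before(%vec : vector<1x2xi32>, %mem : memref<1x43x2xi32>, %i : index) {
  %c0 = arith.constant 0 : index
  vector.transfer_write %vec, %mem[%c0, %i, %c0] {in_bounds = [true, true]}
      : vector<1x2xi32>, memref<1x43x2xi32>
  return
}

// After flattening (roughly): the inner dims [43, 2] collapse into 86, and the
// trailing index becomes a linearized offset computed from the suffix-product
// strides of the collapsed dims ([43, 2] -> [2, 1]), i.e. %i * 2 + 0.
func.func @example_after(%vec : vector<1x2xi32>, %mem : memref<1x43x2xi32>, %i : index) {
  %c0 = arith.constant 0 : index
  %collapsed = memref.collapse_shape %mem [[0], [1, 2]]
      : memref<1x43x2xi32> into memref<1x86xi32>
  %offset = affine.apply affine_map<()[s0] -> (s0 * 2)>()[%i]
  %flat = vector.shape_cast %vec : vector<1x2xi32> to vector<2xi32>
  vector.transfer_write %flat, %collapsed[%c0, %offset] {in_bounds = [true]}
      : vector<2xi32>, memref<1x86xi32>
  return
}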

llvmbot (Member) commented on May 31, 2024

@llvm/pr-subscribers-mlir-vector

Author: Han-Chung Wang (hanhanW)

Changes

We can flatten the transfer ops even when the collapsed indices are not all zero; the collapsed offset can be computed. This is already supported for vector.transfer_read. This revision refactors that logic and reuses it for transfer_write.


Full diff: https://github.com/llvm/llvm-project/pull/94051.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp (+65-71)
  • (modified) mlir/test/Dialect/Vector/vector-transfer-flatten.mlir (+10-8)
diff --git a/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp b/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp
index 997b56a1ce142..ae47d7fc28811 100644
--- a/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp
+++ b/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp
@@ -510,20 +510,59 @@ static Value collapseInnerDims(PatternRewriter &rewriter, mlir::Location loc,
 /// the truncated indices where `firstDimToCollapse` is now the innermost dim.
 /// TODO: Extract the logic that writes to outIndices so that this method
 /// simply checks one pre-condition.
-static LogicalResult
-checkAndCollapseInnerZeroIndices(ValueRange indices, int64_t firstDimToCollapse,
-                                 SmallVector<Value> &outIndices) {
-  int64_t rank = indices.size();
-  if (firstDimToCollapse >= rank)
-    return failure();
-  for (int64_t i = firstDimToCollapse; i < rank; ++i) {
-    std::optional<int64_t> cst = getConstantIntValue(indices[i]);
-    if (!cst || cst.value() != 0)
-      return failure();
+static SmallVector<Value> getCollapsedIndices(RewriterBase &rewriter,
+                                              Location loc,
+                                              ArrayRef<int64_t> shape,
+                                              ValueRange indices,
+                                              int64_t firstDimToCollapse) {
+  assert(firstDimToCollapse < static_cast<int64_t>(indices.size()));
+
+  // If all the collapsed indices are zero then no extra logic is needed.
+  // Otherwise, a new offset/index has to be computed.
+  SmallVector<Value> collapsedIndices(indices.begin(),
+                                      indices.begin() + firstDimToCollapse);
+  SmallVector<Value> indicesToCollapse(indices.begin() + firstDimToCollapse,
+                                       indices.end());
+  if (llvm::all_of(indicesToCollapse, isZeroIndex)) {
+    collapsedIndices.push_back(indicesToCollapse[0]);
+    return collapsedIndices;
+  }
+
+  // Compute the remaining trailing index/offset required for reading from
+  // the collapsed memref:
+  //
+  //    offset = 0
+  //    for (i = firstDimToCollapse; i < outputRank; ++i)
+  //      offset += sourceType.getDimSize(i) * transferReadOp.indices[i]
+  //
+  // For this example:
+  //   %2 = vector.transfer_read/write %arg4[%c0, %arg0, %c0] (...) :
+  //      memref<1x43x2xi32>, vector<1x2xi32>
+  // which would be collapsed to:
+  //   %1 = vector.transfer_read/write %collapse_shape[%c0, %offset] (...) :
+  //      memref<1x86xi32>, vector<2xi32>
+  // one would get the following offset:
+  //    %offset = %arg0 * 43
+  OpFoldResult collapsedOffset =
+      rewriter.create<arith::ConstantIndexOp>(loc, 0).getResult();
+
+  auto collapsedStrides = computeSuffixProduct(
+      ArrayRef<int64_t>(shape.begin() + firstDimToCollapse, shape.end()));
+
+  // Compute the collapsed offset.
+  auto &&[collapsedExpr, collapsedVals] =
+      computeLinearIndex(collapsedOffset, collapsedStrides, indicesToCollapse);
+  collapsedOffset = affine::makeComposedFoldedAffineApply(
+      rewriter, loc, collapsedExpr, collapsedVals);
+
+  if (collapsedOffset.is<Value>()) {
+    collapsedIndices.push_back(collapsedOffset.get<Value>());
+  } else {
+    collapsedIndices.push_back(rewriter.create<arith::ConstantIndexOp>(
+        loc, *getConstantIntValue(collapsedOffset)));
   }
-  outIndices = indices;
-  outIndices.resize(firstDimToCollapse + 1);
-  return success();
+
+  return collapsedIndices;
 }
 
 namespace {
@@ -594,54 +633,9 @@ class FlattenContiguousRowMajorTransferReadPattern
         AffineMap::get(collapsedRank, 0, dimExprs, rewriter.getContext());
 
     // 2.2 New indices
-    // If all the collapsed indices are zero then no extra logic is needed.
-    // Otherwise, a new offset/index has to be computed.
-    SmallVector<Value> collapsedIndices;
-    if (failed(checkAndCollapseInnerZeroIndices(transferReadOp.getIndices(),
-                                                firstDimToCollapse,
-                                                collapsedIndices))) {
-      // Copy all the leading indices.
-      SmallVector<Value> indices = transferReadOp.getIndices();
-      collapsedIndices.append(indices.begin(),
-                              indices.begin() + firstDimToCollapse);
-
-      // Compute the remaining trailing index/offset required for reading from
-      // the collapsed memref:
-      //
-      //    offset = 0
-      //    for (i = firstDimToCollapse; i < outputRank; ++i)
-      //      offset += sourceType.getDimSize(i) * transferReadOp.indices[i]
-      //
-      // For this example:
-      //   %2 = vector.transfer_read %arg4[%c0, %arg0, %c0] (...) :
-      //      memref<1x43x2xi32>, vector<1x2xi32>
-      // which would be collapsed to:
-      //   %1 = vector.transfer_read %collapse_shape[%c0, %offset] (...) :
-      //      memref<1x86xi32>, vector<2xi32>
-      // one would get the following offset:
-      //    %offset = %arg0 * 43
-      OpFoldResult collapsedOffset =
-          rewriter.create<arith::ConstantIndexOp>(loc, 0).getResult();
-
-      auto sourceShape = sourceType.getShape();
-      auto collapsedStrides = computeSuffixProduct(ArrayRef<int64_t>(
-          sourceShape.begin() + firstDimToCollapse, sourceShape.end()));
-
-      // Compute the collapsed offset.
-      ArrayRef<Value> indicesToCollapse(indices.begin() + firstDimToCollapse,
-                                        indices.end());
-      auto &&[collapsedExpr, collapsedVals] = computeLinearIndex(
-          collapsedOffset, collapsedStrides, indicesToCollapse);
-      collapsedOffset = affine::makeComposedFoldedAffineApply(
-          rewriter, loc, collapsedExpr, collapsedVals);
-
-      if (collapsedOffset.is<Value>()) {
-        collapsedIndices.push_back(collapsedOffset.get<Value>());
-      } else {
-        collapsedIndices.push_back(rewriter.create<arith::ConstantIndexOp>(
-            loc, *getConstantIntValue(collapsedOffset)));
-      }
-    }
+    SmallVector<Value> collapsedIndices =
+        getCollapsedIndices(rewriter, loc, sourceType.getShape(),
+                            transferReadOp.getIndices(), firstDimToCollapse);
 
     // 3. Create new vector.transfer_read that reads from the collapsed memref
     VectorType flatVectorType = VectorType::get({vectorType.getNumElements()},
@@ -697,8 +691,7 @@ class FlattenContiguousRowMajorTransferWritePattern
       return failure();
     if (!vector::isContiguousSlice(sourceType, vectorType))
       return failure();
-    int64_t firstContiguousInnerDim =
-        sourceType.getRank() - vectorType.getRank();
+    int64_t firstDimToCollapse = sourceType.getRank() - vectorType.getRank();
     // TODO: generalize this pattern, relax the requirements here.
     if (transferWriteOp.hasOutOfBoundsDim())
       return failure();
@@ -706,22 +699,23 @@ class FlattenContiguousRowMajorTransferWritePattern
       return failure();
     if (transferWriteOp.getMask())
       return failure();
-    SmallVector<Value> collapsedIndices;
-    if (failed(checkAndCollapseInnerZeroIndices(transferWriteOp.getIndices(),
-                                                firstContiguousInnerDim,
-                                                collapsedIndices)))
-      return failure();
+
+    SmallVector<Value> collapsedIndices =
+        getCollapsedIndices(rewriter, loc, sourceType.getShape(),
+                            transferWriteOp.getIndices(), firstDimToCollapse);
 
     Value collapsedSource =
-        collapseInnerDims(rewriter, loc, source, firstContiguousInnerDim);
+        collapseInnerDims(rewriter, loc, source, firstDimToCollapse);
     MemRefType collapsedSourceType =
         cast<MemRefType>(collapsedSource.getType());
     int64_t collapsedRank = collapsedSourceType.getRank();
-    assert(collapsedRank == firstContiguousInnerDim + 1);
+    assert(collapsedRank == firstDimToCollapse + 1);
+
     SmallVector<AffineExpr, 1> dimExprs{
-        getAffineDimExpr(firstContiguousInnerDim, rewriter.getContext())};
+        getAffineDimExpr(firstDimToCollapse, rewriter.getContext())};
     auto collapsedMap =
         AffineMap::get(collapsedRank, 0, dimExprs, rewriter.getContext());
+
     VectorType flatVectorType = VectorType::get({vectorType.getNumElements()},
                                                 vectorType.getElementType());
     Value flatVector =
diff --git a/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir b/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
index 788ae9ac044ed..b5f29b2ac958b 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
@@ -471,16 +471,16 @@ func.func @regression_non_contiguous_dim_read(%subview : memref<1x3x3x2xf32, str
 }
 
 //       CHECK:  #[[$MAP:.+]] = affine_map<()[s0] -> (s0 * 2)>
-// CHECK-LABEL:    func.func @regression_non_contiguous_dim_read(
-//       CHECK:      %[[COLLAPSE:.+]] = memref.collapse_shape %{{.*}} {{\[}}[0], [1], [2, 3]] : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> into memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>
-//       CHECK:     %[[APPLY:.*]] = affine.apply #[[$MAP]]()
+// CHECK-LABEL:  func.func @regression_non_contiguous_dim_read(
+//       CHECK:    %[[COLLAPSE:.+]] = memref.collapse_shape %{{.*}} {{\[}}[0], [1], [2, 3]] : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> into memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>
+//       CHECK:    %[[APPLY:.*]] = affine.apply #[[$MAP]]()
 
 // CHECK-128B-LABEL: func @regression_non_contiguous_dim_read(
 //       CHECK-128B:   memref.collapse_shape
 
 // -----
 
-func.func @unsupported_non_contiguous_dim_write(%value : vector<2x2xf32>,
+func.func @regression_non_contiguous_dim_write(%value : vector<2x2xf32>,
                                                 %subview : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>>,
                                                 %idx0 : index, %idx1 : index) {
   %c0 = arith.constant 0 : index
@@ -488,8 +488,10 @@ func.func @unsupported_non_contiguous_dim_write(%value : vector<2x2xf32>,
   return
 }
 
-// CHECK-LABEL:  func.func @unsupported_non_contiguous_dim_write(
-//   CHECK-NOT:    memref.collapse_shape
+//       CHECK:  #[[$MAP:.+]] = affine_map<()[s0] -> (s0 * 2)>
+// CHECK-LABEL:  func.func @regression_non_contiguous_dim_write(
+//       CHECK:    %[[APPLY:.*]] = affine.apply #[[$MAP]]()
+//       CHECK:    %[[COLLAPSE:.+]] = memref.collapse_shape %{{.*}} {{\[}}[0], [1], [2, 3]] : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> into memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>
 
-// CHECK-128B-LABEL: func @unsupported_non_contiguous_dim_write(
-//   CHECK-128B-NOT:   memref.collapse_shape
+// CHECK-128B-LABEL: func @regression_non_contiguous_dim_write(
+//       CHECK-128B:   memref.collapse_shape

llvmbot (Member) commented on May 31, 2024

@llvm/pr-subscribers-mlir

Author: Han-Chung Wang (hanhanW)

Changes

We can flatten the transfer ops even when the collapsed indices are not all zero; the collapsed offset can be computed. This is already supported for vector.transfer_read. This revision refactors that logic and reuses it for transfer_write.


Full diff: https://github.com/llvm/llvm-project/pull/94051.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp (+65-71)
  • (modified) mlir/test/Dialect/Vector/vector-transfer-flatten.mlir (+10-8)
banach-space (Contributor) left a comment

Thanks, mostly LG. Looks like this is missing a test?

Comment on lines 510 to 512
/// the truncated indices where `firstDimToCollapse` is now the innermost dim.
/// TODO: Extract the logic that writes to outIndices so that this method
/// simply checks one pre-condition.
Contributor:

This comment should be updated, right?

Contributor Author:

good catch, thanks!

Comment on lines 522 to 523
SmallVector<Value> collapsedIndices(indices.begin(),
indices.begin() + firstDimToCollapse);
Contributor:

collapsedIndices is a bit misleading - and I might be the one who invented that, sorry 😅

The leading indices are actually preserved and only the trailing indices are "collapsed". So this is more like "indicesAfterCollapsingSrc", right?

Contributor Author:

There is no src in the local function. How about renaming it to indicesAfterCollapsing?

Contributor:

How about renaming it to indicesAfterCollapsing?

👍🏻


// CHECK-128B-LABEL: func @regression_non_contiguous_dim_read(
// CHECK-128B: memref.collapse_shape

// -----

func.func @unsupported_non_contiguous_dim_write(%value : vector<2x2xf32>,
func.func @regression_non_contiguous_dim_write(%value : vector<2x2xf32>,
Contributor:

This turns an existing testcase that couldn't be collapsed into one that can be collapsed. Is any other testcase left testing remaining cases that more fundamentally can't be collapsed?

Also, is any testcase testing in_bounds = false, and specifically any edge case where the ability to collapse depends on in_bounds?

Contributor Author:

Is any other testcase left testing remaining cases that more fundamentally can't be collapsed?

There are many cases covered by CHECK-128B. E.g., there is a CHECK-128B: memref.collapse_shape in this and other testcases. Fundamentally, we cannot collapse the three inner dimensions because the strides are [x, 10, 2, 1], not [x, 6, 2, 1]. I.e., it is not contiguous. Do you want me to add a test that can't be collapsed for every dimension?
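
To spell out the stride argument with the numbers from the test: collapsing the inner three dims of memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> would require dim 1 to have stride 3 * 2 = 6, but its actual stride is 10, so only the inner two dims (strides [2, 1]) form a contiguous slice.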

Also, is any testcase testing in_bounds = false and specifically any edge case where the ability to collapse depends on in_bounds ?

There are no negative tests for that case. I can add one to iterate on it.

Contributor:

E.g., there is a CHECK-128B: memref.collapse_shape in this and other testcases.

But CHECK-128B tests a specific configuration of the pass rather than the generic version, right?

Fundamentally, we can not collapse three inner dimensions because the strides is [x, 10, 2, 1] but not [x, 6, 2, 1]

Just to double check - this is referring to regression_non_contiguous_dim_write, in which collapsing does happen?

In general, I think that it would be good to identify and document all edge cases through tests (most of the stuff is already here). I would start by renaming this test:

@regression_non_contiguous_dim_write

"Regression" doesn't really help differentiate this test from other tests. It's a missed opportunity to document what makes this test case unique - i.e. non-zero indices :) (*) So, perhaps:

  • @transfer_write_non_contigous_dims_mismatch_non_zero_idx

Next, you could move it near @transfer_write_dims_mismatch_non_contiguous - both tests seem to check something very similar, two edge cases for vector.transfer_write. I find it very helpful when such edge cases are clustered together.

Also, is any testcase testing in_bounds = false and specifically any edge case where the ability to collapse depends on in_bounds ?

There are no negative tests for the case. I can add one to iterate it.

I've been actually discussing this with someone recently :) We don't need it for this PR, but it would be good to add a TODO somewhere (I can look into this).

(*) In general, I'm not a fan of regression tests, but I appreciate that others find them valuable.

hanhanW (Contributor Author) commented on Jun 5, 2024:

Fundamentally, we can not collapse three inner dimensions because the strides is [x, 10, 2, 1] but not [x, 6, 2, 1]

Just to double check - this is referring to regression_non_contiguous_dim_write? In which collapsing does happen?

Yes, it is referring to that test. However, I was wrong in that statement because it is a 2-D vector. I will add a 3-D vector case and check that it is not collapsed.

"Regression" doesn't really help differentiate this test from other tests. It's a missed opportunity to document what makes this test case unique - i.e. non-zero indices :) (*) So, perhaps:
@transfer_write_non_contigous_dims_mismatch_non_zero_idx
Next, you could move it near @transfer_write_dims_mismatch_non_contiguous - both tests seem to check sth very similar, two edge cases for vector.transfer_write. I find it very helpful when such edge cases are clustered together.

I also found "regression" weird, but I wanted to follow the naming style of the xfer_read test above. The new name is great; I'll send a follow-up about the naming and structure of the file.

I've been actually discussing this with someone recently :) We don't need it for this PR, but it would be good to add a TODO somewhere (I can look into this).

UPDATE: I already added the tests. They are negative_out_of_bound_transfer_read and negative_out_of_bound_transfer_write.
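
For reference, a hypothetical sketch of what such an out-of-bounds negative test for the write pattern could look like (the function name, shapes, and CHECK lines below are assumptions modeled on the existing tests in this file, not copied from the patch):

func.func @negative_out_of_bound_transfer_write(
    %mem : memref<?x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>, %vec : vector<1x1x3x2xi8>) {
  %c0 = arith.constant 0 : index
  vector.transfer_write %vec, %mem[%c0, %c0, %c0, %c0] {in_bounds = [false, true, true, true]} :
    vector<1x1x3x2xi8>, memref<?x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>
  return
}
// CHECK-LABEL: func.func @negative_out_of_bound_transfer_write(
//   CHECK-NOT:   memref.collapse_shape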

Contributor Author:

I just found that we already have a negative test. It does not matter whether the last index is zero or not. So I'm going to merge the change and send a follow-up for the naming and structure.

func.func @transfer_write_dims_mismatch_non_contiguous(
    %arg : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>, %vec : vector<2x1x2x2xi8>) {
  %c0 = arith.constant 0 : index
  vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] :
    vector<2x1x2x2xi8>, memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>
  return
}

hanhanW requested review from bjacob and banach-space on June 3, 2024, 21:37
hanhanW (Contributor Author) left a comment

hmm, I forgot to submit the comments..


hanhanW (Contributor Author) commented on Jun 4, 2024

Thanks, mostly LG. Looks like this is missing a test?

I don't think a test is missing. It adds the support and makes one of the negative tests work now, so we do have a test for it.

banach-space (Contributor):

Thanks, mostly LG. Looks like this is missing a test?

I think it does not miss a test. It adds the support and makes one of negative test work now. So we do have a test for it.

Not sure how I missed that, sorry :)

banach-space (Contributor) left a comment

A couple of non-blocking comments - let me know what you think and thanks for improving this 🙏🏻


hanhanW (Contributor Author) left a comment

I'll update the PR soon.



hanhanW merged commit 53ddc87 into llvm:main on Jun 5, 2024
7 checks passed
hanhanW deleted the flatten-xfer-write branch on June 5, 2024, 22:07