
[mlir][vector] Improve flattening vector.transfer_write ops. #94051


Merged
hanhanW merged 2 commits into llvm:main from hanhanW:flatten-xfer-write on Jun 5, 2024

Conversation

hanhanW (Contributor) commented on May 31, 2024

We can flatten the transfer ops even when the collapsed indices are not all zero; the collapsed offset can be computed. This is already supported for vector.transfer_read. This revision refactors that logic and reuses it for transfer_write.
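
For illustration, a rough before/after sketch of the rewrite this enables (the shapes follow the example in the patch's code comments; the shape_cast, affine.apply, and attribute details are assumptions about the emitted IR, not copied output):

// Before (sketch): a non-zero index %i on a dimension that will be collapsed.
func.func @example_before(%vec : vector<1x2xi32>, %mem : memref<1x43x2xi32>, %i : index) {
  %c0 = arith.constant 0 : index
  vector.transfer_write %vec, %mem[%c0, %i, %c0] {in_bounds = [true, true]}
      : vector<1x2xi32>, memref<1x43x2xi32>
  return
}

// After flattening (roughly): the inner dims [43, 2] collapse into 86, and the
// trailing index becomes a linearized offset computed from the suffix-product
// strides of the collapsed dims ([43, 2] -> [2, 1]), i.e. %i * 2 + 0.
func.func @example_after(%vec : vector<1x2xi32>, %mem : memref<1x43x2xi32>, %i : index) {
  %c0 = arith.constant 0 : index
  %collapsed = memref.collapse_shape %mem [[0], [1, 2]]
      : memref<1x43x2xi32> into memref<1x86xi32>
  %offset = affine.apply affine_map<()[s0] -> (s0 * 2)>()[%i]
  %flat = vector.shape_cast %vec : vector<1x2xi32> to vector<2xi32>
  vector.transfer_write %flat, %collapsed[%c0, %offset] {in_bounds = [true]}
      : vector<2xi32>, memref<1x86xi32>
  return
}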

llvmbot (Member) commented on May 31, 2024

@llvm/pr-subscribers-mlir-vector

Author: Han-Chung Wang (hanhanW)

Changes

We can flatten the transfer ops even when the collapsed indices are not all zero; the collapsed offset can be computed. This is already supported for vector.transfer_read. This revision refactors that logic and reuses it for transfer_write.


Full diff: https://github.com/llvm/llvm-project/pull/94051.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp (+65-71)
  • (modified) mlir/test/Dialect/Vector/vector-transfer-flatten.mlir (+10-8)
diff --git a/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp b/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp
index 997b56a1ce142..ae47d7fc28811 100644
--- a/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp
+++ b/mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp
@@ -510,20 +510,59 @@ static Value collapseInnerDims(PatternRewriter &rewriter, mlir::Location loc,
 /// the truncated indices where `firstDimToCollapse` is now the innermost dim.
 /// TODO: Extract the logic that writes to outIndices so that this method
 /// simply checks one pre-condition.
-static LogicalResult
-checkAndCollapseInnerZeroIndices(ValueRange indices, int64_t firstDimToCollapse,
-                                 SmallVector<Value> &outIndices) {
-  int64_t rank = indices.size();
-  if (firstDimToCollapse >= rank)
-    return failure();
-  for (int64_t i = firstDimToCollapse; i < rank; ++i) {
-    std::optional<int64_t> cst = getConstantIntValue(indices[i]);
-    if (!cst || cst.value() != 0)
-      return failure();
+static SmallVector<Value> getCollapsedIndices(RewriterBase &rewriter,
+                                              Location loc,
+                                              ArrayRef<int64_t> shape,
+                                              ValueRange indices,
+                                              int64_t firstDimToCollapse) {
+  assert(firstDimToCollapse < static_cast<int64_t>(indices.size()));
+
+  // If all the collapsed indices are zero then no extra logic is needed.
+  // Otherwise, a new offset/index has to be computed.
+  SmallVector<Value> collapsedIndices(indices.begin(),
+                                      indices.begin() + firstDimToCollapse);
+  SmallVector<Value> indicesToCollapse(indices.begin() + firstDimToCollapse,
+                                       indices.end());
+  if (llvm::all_of(indicesToCollapse, isZeroIndex)) {
+    collapsedIndices.push_back(indicesToCollapse[0]);
+    return collapsedIndices;
+  }
+
+  // Compute the remaining trailing index/offset required for reading from
+  // the collapsed memref:
+  //
+  //    offset = 0
+  //    for (i = firstDimToCollapse; i < outputRank; ++i)
+  //      offset += sourceType.getDimSize(i) * transferReadOp.indices[i]
+  //
+  // For this example:
+  //   %2 = vector.transfer_read/write %arg4[%c0, %arg0, %c0] (...) :
+  //      memref<1x43x2xi32>, vector<1x2xi32>
+  // which would be collapsed to:
+  //   %1 = vector.transfer_read/write %collapse_shape[%c0, %offset] (...) :
+  //      memref<1x86xi32>, vector<2xi32>
+  // one would get the following offset:
+  //    %offset = %arg0 * 43
+  OpFoldResult collapsedOffset =
+      rewriter.create<arith::ConstantIndexOp>(loc, 0).getResult();
+
+  auto collapsedStrides = computeSuffixProduct(
+      ArrayRef<int64_t>(shape.begin() + firstDimToCollapse, shape.end()));
+
+  // Compute the collapsed offset.
+  auto &&[collapsedExpr, collapsedVals] =
+      computeLinearIndex(collapsedOffset, collapsedStrides, indicesToCollapse);
+  collapsedOffset = affine::makeComposedFoldedAffineApply(
+      rewriter, loc, collapsedExpr, collapsedVals);
+
+  if (collapsedOffset.is<Value>()) {
+    collapsedIndices.push_back(collapsedOffset.get<Value>());
+  } else {
+    collapsedIndices.push_back(rewriter.create<arith::ConstantIndexOp>(
+        loc, *getConstantIntValue(collapsedOffset)));
   }
-  outIndices = indices;
-  outIndices.resize(firstDimToCollapse + 1);
-  return success();
+
+  return collapsedIndices;
 }
 
 namespace {
@@ -594,54 +633,9 @@ class FlattenContiguousRowMajorTransferReadPattern
         AffineMap::get(collapsedRank, 0, dimExprs, rewriter.getContext());
 
     // 2.2 New indices
-    // If all the collapsed indices are zero then no extra logic is needed.
-    // Otherwise, a new offset/index has to be computed.
-    SmallVector<Value> collapsedIndices;
-    if (failed(checkAndCollapseInnerZeroIndices(transferReadOp.getIndices(),
-                                                firstDimToCollapse,
-                                                collapsedIndices))) {
-      // Copy all the leading indices.
-      SmallVector<Value> indices = transferReadOp.getIndices();
-      collapsedIndices.append(indices.begin(),
-                              indices.begin() + firstDimToCollapse);
-
-      // Compute the remaining trailing index/offset required for reading from
-      // the collapsed memref:
-      //
-      //    offset = 0
-      //    for (i = firstDimToCollapse; i < outputRank; ++i)
-      //      offset += sourceType.getDimSize(i) * transferReadOp.indices[i]
-      //
-      // For this example:
-      //   %2 = vector.transfer_read %arg4[%c0, %arg0, %c0] (...) :
-      //      memref<1x43x2xi32>, vector<1x2xi32>
-      // which would be collapsed to:
-      //   %1 = vector.transfer_read %collapse_shape[%c0, %offset] (...) :
-      //      memref<1x86xi32>, vector<2xi32>
-      // one would get the following offset:
-      //    %offset = %arg0 * 43
-      OpFoldResult collapsedOffset =
-          rewriter.create<arith::ConstantIndexOp>(loc, 0).getResult();
-
-      auto sourceShape = sourceType.getShape();
-      auto collapsedStrides = computeSuffixProduct(ArrayRef<int64_t>(
-          sourceShape.begin() + firstDimToCollapse, sourceShape.end()));
-
-      // Compute the collapsed offset.
-      ArrayRef<Value> indicesToCollapse(indices.begin() + firstDimToCollapse,
-                                        indices.end());
-      auto &&[collapsedExpr, collapsedVals] = computeLinearIndex(
-          collapsedOffset, collapsedStrides, indicesToCollapse);
-      collapsedOffset = affine::makeComposedFoldedAffineApply(
-          rewriter, loc, collapsedExpr, collapsedVals);
-
-      if (collapsedOffset.is<Value>()) {
-        collapsedIndices.push_back(collapsedOffset.get<Value>());
-      } else {
-        collapsedIndices.push_back(rewriter.create<arith::ConstantIndexOp>(
-            loc, *getConstantIntValue(collapsedOffset)));
-      }
-    }
+    SmallVector<Value> collapsedIndices =
+        getCollapsedIndices(rewriter, loc, sourceType.getShape(),
+                            transferReadOp.getIndices(), firstDimToCollapse);
 
     // 3. Create new vector.transfer_read that reads from the collapsed memref
     VectorType flatVectorType = VectorType::get({vectorType.getNumElements()},
@@ -697,8 +691,7 @@ class FlattenContiguousRowMajorTransferWritePattern
       return failure();
     if (!vector::isContiguousSlice(sourceType, vectorType))
       return failure();
-    int64_t firstContiguousInnerDim =
-        sourceType.getRank() - vectorType.getRank();
+    int64_t firstDimToCollapse = sourceType.getRank() - vectorType.getRank();
     // TODO: generalize this pattern, relax the requirements here.
     if (transferWriteOp.hasOutOfBoundsDim())
       return failure();
@@ -706,22 +699,23 @@ class FlattenContiguousRowMajorTransferWritePattern
       return failure();
     if (transferWriteOp.getMask())
       return failure();
-    SmallVector<Value> collapsedIndices;
-    if (failed(checkAndCollapseInnerZeroIndices(transferWriteOp.getIndices(),
-                                                firstContiguousInnerDim,
-                                                collapsedIndices)))
-      return failure();
+
+    SmallVector<Value> collapsedIndices =
+        getCollapsedIndices(rewriter, loc, sourceType.getShape(),
+                            transferWriteOp.getIndices(), firstDimToCollapse);
 
     Value collapsedSource =
-        collapseInnerDims(rewriter, loc, source, firstContiguousInnerDim);
+        collapseInnerDims(rewriter, loc, source, firstDimToCollapse);
     MemRefType collapsedSourceType =
         cast<MemRefType>(collapsedSource.getType());
     int64_t collapsedRank = collapsedSourceType.getRank();
-    assert(collapsedRank == firstContiguousInnerDim + 1);
+    assert(collapsedRank == firstDimToCollapse + 1);
+
     SmallVector<AffineExpr, 1> dimExprs{
-        getAffineDimExpr(firstContiguousInnerDim, rewriter.getContext())};
+        getAffineDimExpr(firstDimToCollapse, rewriter.getContext())};
     auto collapsedMap =
         AffineMap::get(collapsedRank, 0, dimExprs, rewriter.getContext());
+
     VectorType flatVectorType = VectorType::get({vectorType.getNumElements()},
                                                 vectorType.getElementType());
     Value flatVector =
diff --git a/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir b/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
index 788ae9ac044ed..b5f29b2ac958b 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
@@ -471,16 +471,16 @@ func.func @regression_non_contiguous_dim_read(%subview : memref<1x3x3x2xf32, str
 }
 
 //       CHECK:  #[[$MAP:.+]] = affine_map<()[s0] -> (s0 * 2)>
-// CHECK-LABEL:    func.func @regression_non_contiguous_dim_read(
-//       CHECK:      %[[COLLAPSE:.+]] = memref.collapse_shape %{{.*}} {{\[}}[0], [1], [2, 3]] : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> into memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>
-//       CHECK:     %[[APPLY:.*]] = affine.apply #[[$MAP]]()
+// CHECK-LABEL:  func.func @regression_non_contiguous_dim_read(
+//       CHECK:    %[[COLLAPSE:.+]] = memref.collapse_shape %{{.*}} {{\[}}[0], [1], [2, 3]] : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> into memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>
+//       CHECK:    %[[APPLY:.*]] = affine.apply #[[$MAP]]()
 
 // CHECK-128B-LABEL: func @regression_non_contiguous_dim_read(
 //       CHECK-128B:   memref.collapse_shape
 
 // -----
 
-func.func @unsupported_non_contiguous_dim_write(%value : vector<2x2xf32>,
+func.func @regression_non_contiguous_dim_write(%value : vector<2x2xf32>,
                                                 %subview : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>>,
                                                 %idx0 : index, %idx1 : index) {
   %c0 = arith.constant 0 : index
@@ -488,8 +488,10 @@ func.func @unsupported_non_contiguous_dim_write(%value : vector<2x2xf32>,
   return
 }
 
-// CHECK-LABEL:  func.func @unsupported_non_contiguous_dim_write(
-//   CHECK-NOT:    memref.collapse_shape
+//       CHECK:  #[[$MAP:.+]] = affine_map<()[s0] -> (s0 * 2)>
+// CHECK-LABEL:  func.func @regression_non_contiguous_dim_write(
+//       CHECK:    %[[APPLY:.*]] = affine.apply #[[$MAP]]()
+//       CHECK:    %[[COLLAPSE:.+]] = memref.collapse_shape %{{.*}} {{\[}}[0], [1], [2, 3]] : memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> into memref<1x3x6xf32, strided<[40, 10, 1], offset: ?>>
 
-// CHECK-128B-LABEL: func @unsupported_non_contiguous_dim_write(
-//   CHECK-128B-NOT:   memref.collapse_shape
+// CHECK-128B-LABEL: func @regression_non_contiguous_dim_write(
+//       CHECK-128B:   memref.collapse_shape

llvmbot (Member) commented on May 31, 2024

@llvm/pr-subscribers-mlir

Author: Han-Chung Wang (hanhanW)

Changes

We can flatten the transfer ops even when the collapsed indices are not all zero; the collapsed offset can be computed. This is already supported for vector.transfer_read. This revision refactors that logic and reuses it for transfer_write.


Full diff: https://github.com/llvm/llvm-project/pull/94051.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp (+65-71)
  • (modified) mlir/test/Dialect/Vector/vector-transfer-flatten.mlir (+10-8)
banach-space (Contributor) left a comment

Thanks, mostly LG. Looks like this is missing a test?

Comment on lines 510 to 512
/// the truncated indices where `firstDimToCollapse` is now the innermost dim.
/// TODO: Extract the logic that writes to outIndices so that this method
/// simply checks one pre-condition.
Contributor:

This comment should be updated, right?

Contributor Author:

good catch, thanks!

Comment on lines 522 to 523
SmallVector<Value> collapsedIndices(indices.begin(),
indices.begin() + firstDimToCollapse);
Contributor:

collapsedIndices is a bit misleading - and I might be the one who invented that, sorry 😅

The leading indices are actually preserved and only the trailing indices are "collapsed". So this is more like "indicesAfterCollapsingSrc", right?

Contributor Author:

There is no src in the local function. How about renaming it to indicesAfterCollapsing?

Contributor:

How about renaming it to indicesAfterCollapsing?

👍🏻


// CHECK-128B-LABEL: func @regression_non_contiguous_dim_read(
// CHECK-128B: memref.collapse_shape

// -----

func.func @unsupported_non_contiguous_dim_write(%value : vector<2x2xf32>,
func.func @regression_non_contiguous_dim_write(%value : vector<2x2xf32>,
Contributor:

This turns an existing testcase that couldn't be collapsed into one that can be collapsed. Is any other testcase left testing remaining cases that more fundamentally can't be collapsed?

Also, is any testcase testing in_bounds = false, and specifically any edge case where the ability to collapse depends on in_bounds?

Contributor Author:

Is any other testcase left testing remaining cases that more fundamentally can't be collapsed?

There are many cases covered by CHECK-128B. E.g., there is a CHECK-128B: memref.collapse_shape in this and other testcases. Fundamentally, we cannot collapse the three inner dimensions because the strides are [x, 10, 2, 1], not [x, 6, 2, 1]. I.e., it is not contiguous. Do you want me to add a test that can't be collapsed for every dimension?
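
To spell out the stride argument with the numbers from the test: collapsing the inner three dims of memref<1x3x3x2xf32, strided<[40, 10, 2, 1], offset: ?>> would require dim 1 to have stride 3 * 2 = 6, but its actual stride is 10, so only the inner two dims (strides [2, 1]) form a contiguous slice.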

Also, is any testcase testing in_bounds = false and specifically any edge case where the ability to collapse depends on in_bounds ?

There are no negative tests for that case. I can add one to iterate on it.

Contributor:

E.g., there is a CHECK-128B: memref.collapse_shape in this and other testcases.

But CHECK-128B tests a specific configuration of the pass rather than the generic version, right?

Fundamentally, we can not collapse three inner dimensions because the strides is [x, 10, 2, 1] but not [x, 6, 2, 1]

Just to double check - this is referring to regression_non_contiguous_dim_write, in which collapsing does happen?

In general, I think that it would be good to identify and document all edge cases through tests (most of the stuff is already here). I would start by renaming this test:

@regression_non_contiguous_dim_write

"Regression" doesn't really help differentiate this test from other tests. It's a missed opportunity to document what makes this test case unique - i.e. non-zero indices :) (*) So, perhaps:

  • @transfer_write_non_contigous_dims_mismatch_non_zero_idx

Next, you could move it near @transfer_write_dims_mismatch_non_contiguous - both tests seem to check something very similar, two edge cases for vector.transfer_write. I find it very helpful when such edge cases are clustered together.

Also, is any testcase testing in_bounds = false and specifically any edge case where the ability to collapse depends on in_bounds ?

There are no negative tests for the case. I can add one to iterate it.

I've been actually discussing this with someone recently :) We don't need it for this PR, but it would be good to add a TODO somewhere (I can look into this).

(*) In general, I'm not a fan of regression tests, but I appreciate that others find them valuable.

hanhanW (Contributor Author) commented on Jun 5, 2024:

Fundamentally, we can not collapse three inner dimensions because the strides is [x, 10, 2, 1] but not [x, 6, 2, 1]

Just to double check - this is referring to regression_non_contiguous_dim_write? In which collapsing does happen?

Yes, it is referring to that test. However, I was wrong in that statement because it is a 2-D vector. I will add a 3-D vector case and check that it is not collapsed.

"Regression" doesn't really help differentiate this test from other tests. It's a missed opportunity to document what makes this test case unique - i.e. non-zero indices :) (*) So, perhaps:
@transfer_write_non_contigous_dims_mismatch_non_zero_idx
Next, you could move it near @transfer_write_dims_mismatch_non_contiguous - both tests seem to check sth very similar, two edge cases for vector.transfer_write. I find it very helpful when such edge cases are clustered together.

I also found "regression" weird, but I wanted to follow the naming style of the xfer_read test above. The new name is great; I'll send a follow-up about the naming and structure of the file.

I've been actually discussing this with someone recently :) We don't need it for this PR, but it would be good to add a TODO somewhere (I can look into this).

UPDATE: I already added the tests. They are negative_out_of_bound_transfer_read and negative_out_of_bound_transfer_write.
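
For reference, a hypothetical sketch of what such an out-of-bounds negative test for the write pattern could look like (the function name, shapes, and CHECK lines below are assumptions modeled on the existing tests in this file, not copied from the patch):

func.func @negative_out_of_bound_transfer_write(
    %mem : memref<?x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>, %vec : vector<1x1x3x2xi8>) {
  %c0 = arith.constant 0 : index
  vector.transfer_write %vec, %mem[%c0, %c0, %c0, %c0] {in_bounds = [false, true, true, true]} :
    vector<1x1x3x2xi8>, memref<?x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>
  return
}
// CHECK-LABEL: func.func @negative_out_of_bound_transfer_write(
//   CHECK-NOT:   memref.collapse_shape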

Contributor Author:

I just found that we already have a negative test. It does not matter whether the last index is zero or not. So I'm going to merge the change and send a follow-up for the naming and structure.

func.func @transfer_write_dims_mismatch_non_contiguous(
    %arg : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>, %vec : vector<2x1x2x2xi8>) {
  %c0 = arith.constant 0 : index
  vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] :
    vector<2x1x2x2xi8>, memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>
  return
}

hanhanW requested review from bjacob and banach-space on June 3, 2024, 21:37
hanhanW (Contributor Author) left a comment

hmm, I forgot to submit the comments..


hanhanW (Contributor Author) commented on Jun 4, 2024

Thanks, mostly LG. Looks like this is missing a test?

I don't think a test is missing. It adds the support and makes one of the negative tests work now, so we do have a test for it.

banach-space (Contributor):

Thanks, mostly LG. Looks like this is missing a test?

I think it does not miss a test. It adds the support and makes one of negative test work now. So we do have a test for it.

Not sure how I missed that, sorry :)

banach-space (Contributor) left a comment

A couple of non-blocking comments - let me know what you think and thanks for improving this 🙏🏻


hanhanW (Contributor Author) left a comment

I'll update the PR soon.



hanhanW merged commit 53ddc87 into llvm:main on Jun 5, 2024
7 checks passed
hanhanW deleted the flatten-xfer-write branch on June 5, 2024, 22:07