-
Notifications
You must be signed in to change notification settings - Fork 13.9k
[mlir][vector] Update tests for collapse 3/n (nfc) #94906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mlir][vector] Update tests for collapse 3/n (nfc) #94906
Conversation
@llvm/pr-subscribers-mlir-vector Author: Andrzej Warzyński (banach-space) Changes
Patch is 24.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94906.diff 2 Files Affected:
diff --git a/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp b/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
index f29eba90c3ceb..caf1506e0db93 100644
--- a/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
+++ b/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
@@ -1293,6 +1293,21 @@ class DropInnerMostUnitDimsTransferRead
if (dimsToDrop == 0)
return failure();
+ // Make sure that the indixes to be dropped are equal 0.
+ // TODO: Deal with cases when the indices are not 0.
+ auto isZeroIdx = [](Value idx) {
+ Attribute attr;
+ APInt value;
+ if (!matchPattern(idx, m_Constant(&attr)))
+ return false;
+ if (matchPattern(attr, m_ConstantInt(&value)))
+ if (!value.isZero())
+ return false;
+ return true;
+ };
+ if (!llvm::all_of(readOp.getIndices().take_back(dimsToDrop), isZeroIdx))
+ return failure();
+
auto resultTargetVecType =
VectorType::get(targetType.getShape().drop_back(dimsToDrop),
targetType.getElementType(),
diff --git a/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir b/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
index b4cb640108bae..4f8aab3729de8 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
@@ -1,12 +1,17 @@
// RUN: mlir-opt %s -test-vector-transfer-collapse-inner-most-dims -split-input-file | FileCheck %s
-func.func @contiguous_inner_most_view(%in: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x1xf32>{
+//-----------------------------------------------------------------------------
+// 1. vector.transfer_read
+//-----------------------------------------------------------------------------
+
+func.func @contiguous_inner_most(%in: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x1xf32>{
%c0 = arith.constant 0 : index
%cst = arith.constant 0.0 : f32
%0 = vector.transfer_read %in[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x8x1xf32>
return %0 : vector<1x8x1xf32>
}
-// CHECK: func @contiguous_inner_most_view(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>
+
+// CHECK: func @contiguous_inner_most(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>
// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
// CHECK-SAME: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>> to memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>
// CHECK: %[[VEC:.+]] = vector.transfer_read %[[SRC_0]]
@@ -14,15 +19,61 @@ func.func @contiguous_inner_most_view(%in: memref<1x1x8x1xf32, strided<[3072, 8,
// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[VEC]]
// CHECK: return %[[RESULT]]
+// Same as the top example within this split, but with the inner vector
+// dim scalable. Note that this example only makes sense when "8 = [8]" (i.e.
+// vscale = 1). This is assumed (implicitly) via the `in_bounds` attribute.
+
+func.func @contiguous_inner_most_scalable_inner_dim(%in: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x[8]x1xf32>{
+ %c0 = arith.constant 0 : index
+ %cst = arith.constant 0.0 : f32
+ %0 = vector.transfer_read %in[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x[8]x1xf32>
+ return %0 : vector<1x[8]x1xf32>
+}
+
+// CHECK: func @contiguous_inner_most_scalable_inner_dim(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>
+// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
+// CHECK-SAME: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>> to memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>
+// CHECK: %[[VEC:.+]] = vector.transfer_read %[[SRC_0]]
+// CHECK-SAME: memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>, vector<1x[8]xf32>
+// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[VEC]]
+// CHECK: return %[[RESULT]]
+
+// Same as the top example within this split, but the trailing unit dim was
+// replaced with a dyn dim - not supported
+
+func.func @non_unit_trailing_dim(%in: memref<1x1x8x?xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x1xf32>{
+ %c0 = arith.constant 0 : index
+ %cst = arith.constant 0.0 : f32
+ %0 = vector.transfer_read %in[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<1x1x8x?xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x8x1xf32>
+ return %0 : vector<1x8x1xf32>
+}
+
+// CHECK-LABEL: func @non_unit_trailing_dim
+// CHECK-NOT: memref.subview
+// CHECK-NOT: vector.shape_cast
+
+// Same as the top example within this split, but with a scalable unit dim in
+// the output vector - not supported (scalable 1 is _not_ a unit dimension).
+
+func.func @negative_scalable_unit_dim(%in: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x[1]xf32>{
+ %c0 = arith.constant 0 : index
+ %cst = arith.constant 0.0 : f32
+ %0 = vector.transfer_read %in[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x8x[1]xf32>
+ return %0 : vector<1x8x[1]xf32>
+}
+// CHECK-LABEL: func @negative_scalable_unit_dim
+// CHECK-NOT: memref.subview
+// CHECK-NOT: vector.shape_cast
+
// -----
-func.func @contiguous_outer_dyn_inner_most_view(%a: index, %b: index, %memref: memref<?x?x8x1xf32>) -> vector<8x1xf32> {
+func.func @contiguous_inner_most_dynamic_outer(%a: index, %b: index, %memref: memref<?x?x8x1xf32>) -> vector<8x1xf32> {
%c0 = arith.constant 0 : index
%pad = arith.constant 0.0 : f32
%v = vector.transfer_read %memref[%a, %b, %c0, %c0], %pad {in_bounds = [true, true]} : memref<?x?x8x1xf32>, vector<8x1xf32>
return %v : vector<8x1xf32>
}
-// CHECK: func.func @contiguous_outer_dyn_inner_most_view(
+// CHECK: func.func @contiguous_inner_most_dynamic_outer
// CHECK-SAME: %[[IDX0:[a-zA-Z0-9]+]]
// CHECK-SAME: %[[IDX1:[a-zA-Z0-9]+]]
// CHECK-SAME: %[[SRC:[a-zA-Z0-9]+]]
@@ -38,79 +89,179 @@ func.func @contiguous_outer_dyn_inner_most_view(%a: index, %b: index, %memref: m
// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[VEC]]
// CHECK: return %[[RESULT]]
+// Same as the top example within this split, but with the outer vector
+// dim scalable. Note that this example only makes sense when "8 = [8]" (i.e.
+// vscale = 1). This is assumed (implicitly) via the `in_bounds` attribute.
+
+func.func @contiguous_inner_most_outer_dim_dyn_scalable_inner_dim(%a: index, %b: index, %memref: memref<?x?x8x1xf32>) -> vector<[8]x1xf32> {
+ %c0 = arith.constant 0 : index
+ %pad = arith.constant 0.0 : f32
+ %v = vector.transfer_read %memref[%a, %b, %c0, %c0], %pad {in_bounds = [true, true]} : memref<?x?x8x1xf32>, vector<[8]x1xf32>
+ return %v : vector<[8]x1xf32>
+}
+// CHECK-LABEL: func @contiguous_inner_most_outer_dim_dyn_scalable_inner_dim
+// CHECK-SAME: %[[IDX0:[a-zA-Z0-9]+]]
+// CHECK-SAME: %[[IDX1:[a-zA-Z0-9]+]]
+// CHECK-SAME: %[[SRC:[a-zA-Z0-9]+]]
+// CHECK: %[[VIEW:.+]] = memref.subview %[[SRC]]{{.*}} memref<?x?x8x1xf32> to memref<?x?x8xf32, strided<[?, 8, 1], offset: ?>>
+// CHECK: %[[VEC_READ:.+]] = vector.transfer_read %[[VIEW]]
+// CHECK-SAME: {in_bounds = [true]}
+// CHECK-SAME: memref<?x?x8xf32, strided<[?, 8, 1], offset: ?>>, vector<[8]xf32>
+// CHECK: vector.shape_cast %[[VEC_READ]]
+
// -----
-func.func @contiguous_inner_most_dim(%A: memref<16x1xf32>, %i:index, %j:index) -> (vector<8x1xf32>) {
+func.func @contiguous_inner_most_dim_non_zero_idx(%A: memref<16x1xf32>, %i:index) -> (vector<8x1xf32>) {
%c0 = arith.constant 0 : index
%f0 = arith.constant 0.0 : f32
- %1 = vector.transfer_read %A[%i, %j], %f0 : memref<16x1xf32>, vector<8x1xf32>
+ %1 = vector.transfer_read %A[%i, %c0], %f0 : memref<16x1xf32>, vector<8x1xf32>
return %1 : vector<8x1xf32>
}
-// CHECK: func @contiguous_inner_most_dim(%[[SRC:.+]]: memref<16x1xf32>, %[[I:.+]]: index, %[[J:.+]]: index) -> vector<8x1xf32>
+// CHECK: func @contiguous_inner_most_dim_non_zero_idx(%[[SRC:.+]]: memref<16x1xf32>, %[[I:.+]]: index) -> vector<8x1xf32>
// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
// CHECK-SAME: memref<16x1xf32> to memref<16xf32, strided<[1]>>
// CHECK: %[[V:.+]] = vector.transfer_read %[[SRC_0]]
-// CHECK: %[[RESULT]] = vector.shape_cast %[[V]] : vector<8xf32> to vector<8x1xf32>
+// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[V]] : vector<8xf32> to vector<8x1xf32>
// CHECK: return %[[RESULT]]
+// The index to be dropped is != 0 - this is currently not supported.
+func.func @negative_contiguous_inner_most_dim_non_zero_idxs(%A: memref<16x1xf32>, %i:index) -> (vector<8x1xf32>) {
+ %f0 = arith.constant 0.0 : f32
+ %1 = vector.transfer_read %A[%i, %i], %f0 : memref<16x1xf32>, vector<8x1xf32>
+ return %1 : vector<8x1xf32>
+}
+// CHECK-LABEL: func @negative_contiguous_inner_most_dim_non_zero_idxs
+// CHECK-NOT: memref.subview
+// CHECK: vector.transfer_read
+
+// Same as the top example within this split, but with the outer vector
+// dim scalable. Note that this example only makes sense when "8 = [8]" (i.e.
+// vscale = 1). This is assumed (implicitly) via the `in_bounds` attribute.
+
+func.func @contiguous_inner_most_dim_non_zero_idx_scalable_inner_dim(%A: memref<16x1xf32>, %i:index) -> (vector<[8]x1xf32>) {
+ %c0 = arith.constant 0 : index
+ %f0 = arith.constant 0.0 : f32
+ %1 = vector.transfer_read %A[%i, %c0], %f0 : memref<16x1xf32>, vector<[8]x1xf32>
+ return %1 : vector<[8]x1xf32>
+}
+// CHECK-LABEL: func @contiguous_inner_most_dim_non_zero_idx_scalable_inner_dim(
+// CHECK-SAME: %[[SRC:.+]]: memref<16x1xf32>, %[[I:.+]]: index) -> vector<[8]x1xf32>
+// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
+// CHECK-SAME: memref<16x1xf32> to memref<16xf32, strided<[1]>>
+// CHECK: %[[V:.+]] = vector.transfer_read %[[SRC_0]]
+// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[V]] : vector<[8]xf32> to vector<[8]x1xf32>
+// CHECK: return %[[RESULT]]
+
// -----
-func.func @contiguous_inner_most_dim_bounds(%A: memref<1000x1xf32>, %i:index, %ii:index) -> (vector<4x1xf32>) {
+func.func @contiguous_inner_most_dim_with_subview(%A: memref<1000x1xf32>, %i:index, %ii:index) -> (vector<4x1xf32>) {
%c0 = arith.constant 0 : index
%cst = arith.constant 0.0 : f32
%0 = memref.subview %A[%i, 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1], offset: ?>>
%1 = vector.transfer_read %0[%ii, %c0], %cst {in_bounds = [true, true]} : memref<40x1xf32, strided<[1, 1], offset: ?>>, vector<4x1xf32>
return %1 : vector<4x1xf32>
}
-// CHECK: func @contiguous_inner_most_dim_bounds(%[[SRC:.+]]: memref<1000x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<4x1xf32>
+// CHECK: func @contiguous_inner_most_dim_with_subview(%[[SRC:.+]]: memref<1000x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<4x1xf32>
// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
// CHECK: %[[SRC_1:.+]] = memref.subview %[[SRC_0]]
// CHECK: %[[V:.+]] = vector.transfer_read %[[SRC_1]]
// CHECK-SAME: {in_bounds = [true]}
// CHECK-SAME: vector<4xf32>
+// Same as the top example within this split, but with the outer vector
+// dim scalable. Note that this example only makes sense when "4 = [4]" (i.e.
+// vscale = 1). This is assumed (implicitly) via the `in_bounds` attribute.
+
+func.func @contiguous_inner_most_dim_with_subview_scalable_inner_dim(%A: memref<1000x1xf32>, %i:index, %ii:index) -> (vector<[4]x1xf32>) {
+ %c0 = arith.constant 0 : index
+ %cst = arith.constant 0.0 : f32
+ %0 = memref.subview %A[%i, 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1], offset: ?>>
+ %1 = vector.transfer_read %0[%ii, %c0], %cst {in_bounds = [true, true]} : memref<40x1xf32, strided<[1, 1], offset: ?>>, vector<[4]x1xf32>
+ return %1 : vector<[4]x1xf32>
+}
+// CHECK-LABEL: func @contiguous_inner_most_dim_with_subview_scalable_inner_dim
+// CHECK-SAME: %[[SRC:.+]]: memref<1000x1xf32>
+// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
+// CHECK: %[[V:.+]] = vector.transfer_read %[[SRC_0]]
+// CHECK-SAME: {in_bounds = [true]}
+// CHECK-SAME: vector<[4]xf32>
+
// -----
-func.func @contiguous_inner_most_dim_bounds_2d(%A: memref<1000x1x1xf32>, %i:index, %ii:index) -> (vector<4x1x1xf32>) {
+func.func @contiguous_inner_most_dim_with_subview_2d(%A: memref<1000x1x1xf32>, %i:index, %ii:index) -> (vector<4x1x1xf32>) {
%c0 = arith.constant 0 : index
%cst = arith.constant 0.0 : f32
%0 = memref.subview %A[%i, 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>
%1 = vector.transfer_read %0[%ii, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>, vector<4x1x1xf32>
return %1 : vector<4x1x1xf32>
}
-// CHECK: func @contiguous_inner_most_dim_bounds_2d(%[[SRC:.+]]: memref<1000x1x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<4x1x1xf32>
+// CHECK: func @contiguous_inner_most_dim_with_subview_2d(%[[SRC:.+]]: memref<1000x1x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<4x1x1xf32>
// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
// CHECK: %[[SRC_1:.+]] = memref.subview %[[SRC_0]]
// CHECK: %[[V:.+]] = vector.transfer_read %[[SRC_1]]
// CHECK-SAME: {in_bounds = [true]}
// CHECK-SAME: vector<4xf32>
+// Same as the top example within this split, but with the outer vector
+// dim scalable. Note that this example only makes sense when "4 = [4]" (i.e.
+// vscale = 1). This is assumed (implicitly) via the `in_bounds` attribute.
+
+func.func @contiguous_inner_most_dim_with_subview_2d_scalable_inner_dim(%A: memref<1000x1x1xf32>, %i:index, %ii:index) -> (vector<[4]x1x1xf32>) {
+ %c0 = arith.constant 0 : index
+ %cst = arith.constant 0.0 : f32
+ %0 = memref.subview %A[%i, 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>
+ %1 = vector.transfer_read %0[%ii, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>, vector<[4]x1x1xf32>
+ return %1 : vector<[4]x1x1xf32>
+}
+// CHECK-LABEL: func @contiguous_inner_most_dim_with_subview_2d_scalable_inner_dim(
+// CHECK-SAME: %[[SRC:.+]]: memref<1000x1x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<[4]x1x1xf32>
+// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
+// CHECK: %[[SRC_1:.+]] = memref.subview %[[SRC_0]]
+// CHECK: %[[V:.+]] = vector.transfer_read %[[SRC_1]]
+// CHECK-SAME: {in_bounds = [true]}
+// CHECK-SAME: vector<[4]xf32>
+// CHECK: vector.shape_cast %[[V]]
+
// -----
-func.func @contiguous_inner_most_dim_out_of_bounds_2d(%arg0: memref<1x1xf32>) -> vector<4x8xf32> {
+// NOTE: This is an out-of-bounds access.
+
+func.func @negative_non_unit_inner_vec_dim(%arg0: memref<4x1xf32>) -> vector<4x8xf32> {
%c0 = arith.constant 0 : index
%cst = arith.constant 0.000000e+00 : f32
- %0 = vector.transfer_read %arg0[%c0, %c0], %cst : memref<1x1xf32>, vector<4x8xf32>
+ %0 = vector.transfer_read %arg0[%c0, %c0], %cst : memref<4x1xf32>, vector<4x8xf32>
return %0 : vector<4x8xf32>
}
-// The inner most unit dim can not be dropped. In this context, we do not
-// generate rank-reduced memref.subview ops.
-// CHECK: func.func @contiguous_inner_most_dim_out_of_bounds_2d
-// CHECK-SAME: %[[SRC:[a-zA-Z0-9]+]]
+// CHECK: func.func @negative_non_unit_inner_vec_dim
// CHECK-NOT: memref.subview
-// CHECK: %[[READ:.+]] = vector.transfer_read %[[SRC]]
-// CHECK: return %[[READ]] : vector<4x8xf32>
+// CHECK: vector.transfer_read
// -----
-func.func @drop_two_inner_most_dim_for_transfer_write(%arg0: memref<1x512x16x1x1xf32>, %arg1: vector<1x16x16x1x1xf32>, %arg2: index) {
+func.func @negative_non_unit_inner_memref_dim(%arg0: memref<4x8xf32>) -> vector<4x1xf32> {
+ %c0 = arith.constant 0 : index
+ %cst = arith.constant 0.000000e+00 : f32
+ %0 = vector.transfer_read %arg0[%c0, %c0], %cst : memref<4x8xf32>, vector<4x1xf32>
+ return %0 : vector<4x1xf32>
+}
+// CHECK: func.func @negative_non_unit_inner_memref_dim
+// CHECK-NOT: memref.subview
+// CHECK: vector.transfer_read
+
+// -----
+
+//-----------------------------------------------------------------------------
+// 2. vector.transfer_write
+//-----------------------------------------------------------------------------
+
+func.func @drop_two_inner_most_dim(%arg0: memref<1x512x16x1x1xf32>, %arg1: vector<1x16x16x1x1xf32>, %arg2: index) {
%c0 = arith.constant 0 : index
vector.transfer_write %arg1, %arg0[%c0, %arg2, %c0, %c0, %c0]
{in_bounds = [true, true, true, true, true]}
: vector<1x16x16x1x1xf32>, memref<1x512x16x1x1xf32>
return
}
-// CHECK: func.func @drop_two_inner_most_dim_for_transfer_write
+// CHECK: func.func @drop_two_inner_most_dim
// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]
// CHECK-SAME: %[[VEC:[a-zA-Z0-9]+]]
// CHECK-SAME: %[[IDX:[a-zA-Z0-9]+]]
@@ -121,16 +272,67 @@ func.func @drop_two_inner_most_dim_for_transfer_write(%arg0: memref<1x512x16x1x1
// CHECK: vector.transfer_write %[[CAST]], %[[SUBVIEW]]
// CHECK-SAME: [%[[C0]], %[[IDX]], %[[C0]]]
+// Same as the top example within this split, but with the inner vector
+// dim scalable. Note that this example only makes sense when "16 = [16]" (i.e.
+// vscale = 1). This is assumed (implicitly) via the `in_bounds` attribute.
+
+func.func @drop_two_inner_most_dim_scalable_inner_dim(%arg0: memref<1x512x16x1x1xf32>, %arg1: vector<1x16x[16]x1x1xf32>, %arg2: index) {
+ %c0 = arith.constant 0 : index
+ vector.transfer_write %arg1, %arg0[%c0, %arg2, %c0, %c0, %c0]
+ {in_bounds = [true, true, true, true, true]}
+ : vector<1x16x[16]x1x1xf32>, memref<1x512x16x1x1xf32>
+ return
+}
+// CHECK: func.func @drop_two_inner_most_dim_scalable_inner_dim
+// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]
+// CHECK-SAME: %[[VEC:[a-zA-Z0-9]+]]
+// CHECK-SAME: %[[IDX:[a-zA-Z0-9]+]]
+// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
+// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[DEST]]
+// CHECK-SAME: memref<1x512x16x1x1xf32> to memref<1x512x16xf32, strided<[8192, 16, 1]>>
+// CHECK: %[[CAST:.+]] = vector.shape_cast %[[VEC]] : vector<1x16x[16]x1x1xf32> to vector<1x16x[16]xf32>
+// CHECK: vector.transfer_write %[[CAST]], %[[SUBVIEW]]
+// CHECK-SAME: [%[[C0]], %[[IDX]], %[[C0]]]
+
+// Same as the top example within this split, but the trailing unit dim was
+// replaced with a dyn dim - not supported
+
+func.func @negative_non_unit_trailing_dim(%arg0: memref<1x512x16x1x?xf32>, %arg1: vector<1x16x16x1x1xf32>, %arg2: index) {
+ %c0 = arith.constant 0 : index
+ vector.transfer_write %arg1, %arg0[%c0, %arg2, %c0, %c0, %c0]
+ {in_bounds = [true, true, true, true, true]}
+ : vector<1x16x16x1x1xf32>, memref<1x512x16x1x?xf32>
+ return
+}
+// CHECK: func.func @negative_non_unit_trailing_dim
+// CHECK-NOT: memref.subview
+// CHECK-NOT: vector.shape_cast
+
+// Same as the top example within this split, but with a scalable unit dim in
+// the output vector - not supported
+
+func.func @negative_scalable_unit_dim(%arg0: memref<1x512x16x1x1xf32>, %arg1: vector<1x16x16x1x[1]xf32>, %arg2: index) {
+ %c0 = arith.constant 0 : index
+ vector.transfer_write %arg1, %arg0[%c0, %arg2, %c0, %c0, %c0]
+ {in_bounds = [true, true, true, true, true]}
+ : vector<1x16x16x1x[1]xf32>, memref<1x512x16x1x1xf32>
+ return
+}
+
+// CHECK: func.func @negative_scalable_unit_dim
+// CHECK-NOT: memref.subview
+// CHECK-NOT: vector.shape_cast
+
// -----
-func.func @drop_inner_most_dim_for_transfer_write(%arg0: memref<1x512x16x1xf32, strided<[8192, 16, 1, 1], offset: ?>>, %arg1: vector<1x16x16x1xf32>, %arg2: index) {
+func.func @drop_inner_most_dim(%arg0: memref<1x512x16x1xf32, strided<[8192, 16, 1, 1], offset: ?>>, %arg1: vector<1x16x16x1xf32>, %arg2: index) {
%c0 = arith.constant 0 : index
vector.transfer_write %arg1, %arg0[%c0, %arg2, %c0, %c0]
{in_bounds = [tr...
[truncated]
|
@llvm/pr-subscribers-mlir Author: Andrzej Warzyński (banach-space) Changes
Patch is 24.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94906.diff 2 Files Affected:
diff --git a/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp b/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
index f29eba90c3ceb..caf1506e0db93 100644
--- a/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
+++ b/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
@@ -1293,6 +1293,21 @@ class DropInnerMostUnitDimsTransferRead
if (dimsToDrop == 0)
return failure();
+ // Make sure that the indixes to be dropped are equal 0.
+ // TODO: Deal with cases when the indices are not 0.
+ auto isZeroIdx = [](Value idx) {
+ Attribute attr;
+ APInt value;
+ if (!matchPattern(idx, m_Constant(&attr)))
+ return false;
+ if (matchPattern(attr, m_ConstantInt(&value)))
+ if (!value.isZero())
+ return false;
+ return true;
+ };
+ if (!llvm::all_of(readOp.getIndices().take_back(dimsToDrop), isZeroIdx))
+ return failure();
+
auto resultTargetVecType =
VectorType::get(targetType.getShape().drop_back(dimsToDrop),
targetType.getElementType(),
diff --git a/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir b/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
index b4cb640108bae..4f8aab3729de8 100644
--- a/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
+++ b/mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
@@ -1,12 +1,17 @@
// RUN: mlir-opt %s -test-vector-transfer-collapse-inner-most-dims -split-input-file | FileCheck %s
-func.func @contiguous_inner_most_view(%in: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x1xf32>{
+//-----------------------------------------------------------------------------
+// 1. vector.transfer_read
+//-----------------------------------------------------------------------------
+
+func.func @contiguous_inner_most(%in: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x1xf32>{
%c0 = arith.constant 0 : index
%cst = arith.constant 0.0 : f32
%0 = vector.transfer_read %in[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x8x1xf32>
return %0 : vector<1x8x1xf32>
}
-// CHECK: func @contiguous_inner_most_view(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>
+
+// CHECK: func @contiguous_inner_most(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>
// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
// CHECK-SAME: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>> to memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>
// CHECK: %[[VEC:.+]] = vector.transfer_read %[[SRC_0]]
@@ -14,15 +19,61 @@ func.func @contiguous_inner_most_view(%in: memref<1x1x8x1xf32, strided<[3072, 8,
// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[VEC]]
// CHECK: return %[[RESULT]]
+// Same as the top example within this split, but with the inner vector
+// dim scalable. Note that this example only makes sense when "8 = [8]" (i.e.
+// vscale = 1). This is assumed (implicitly) via the `in_bounds` attribute.
+
+func.func @contiguous_inner_most_scalable_inner_dim(%in: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x[8]x1xf32>{
+ %c0 = arith.constant 0 : index
+ %cst = arith.constant 0.0 : f32
+ %0 = vector.transfer_read %in[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x[8]x1xf32>
+ return %0 : vector<1x[8]x1xf32>
+}
+
+// CHECK: func @contiguous_inner_most_scalable_inner_dim(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>
+// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
+// CHECK-SAME: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>> to memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>
+// CHECK: %[[VEC:.+]] = vector.transfer_read %[[SRC_0]]
+// CHECK-SAME: memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>, vector<1x[8]xf32>
+// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[VEC]]
+// CHECK: return %[[RESULT]]
+
+// Same as the top example within this split, but the trailing unit dim was
+// replaced with a dyn dim - not supported
+
+func.func @non_unit_trailing_dim(%in: memref<1x1x8x?xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x1xf32>{
+ %c0 = arith.constant 0 : index
+ %cst = arith.constant 0.0 : f32
+ %0 = vector.transfer_read %in[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<1x1x8x?xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x8x1xf32>
+ return %0 : vector<1x8x1xf32>
+}
+
+// CHECK-LABEL: func @non_unit_trailing_dim
+// CHECK-NOT: memref.subview
+// CHECK-NOT: vector.shape_cast
+
+// Same as the top example within this split, but with a scalable unit dim in
+// the output vector - not supported (scalable 1 is _not_ a unit dimension).
+
+func.func @negative_scalable_unit_dim(%in: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x[1]xf32>{
+ %c0 = arith.constant 0 : index
+ %cst = arith.constant 0.0 : f32
+ %0 = vector.transfer_read %in[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x8x[1]xf32>
+ return %0 : vector<1x8x[1]xf32>
+}
+// CHECK-LABEL: func @negative_scalable_unit_dim
+// CHECK-NOT: memref.subview
+// CHECK-NOT: vector.shape_cast
+
// -----
-func.func @contiguous_outer_dyn_inner_most_view(%a: index, %b: index, %memref: memref<?x?x8x1xf32>) -> vector<8x1xf32> {
+func.func @contiguous_inner_most_dynamic_outer(%a: index, %b: index, %memref: memref<?x?x8x1xf32>) -> vector<8x1xf32> {
%c0 = arith.constant 0 : index
%pad = arith.constant 0.0 : f32
%v = vector.transfer_read %memref[%a, %b, %c0, %c0], %pad {in_bounds = [true, true]} : memref<?x?x8x1xf32>, vector<8x1xf32>
return %v : vector<8x1xf32>
}
-// CHECK: func.func @contiguous_outer_dyn_inner_most_view(
+// CHECK: func.func @contiguous_inner_most_dynamic_outer
// CHECK-SAME: %[[IDX0:[a-zA-Z0-9]+]]
// CHECK-SAME: %[[IDX1:[a-zA-Z0-9]+]]
// CHECK-SAME: %[[SRC:[a-zA-Z0-9]+]]
@@ -38,79 +89,179 @@ func.func @contiguous_outer_dyn_inner_most_view(%a: index, %b: index, %memref: m
// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[VEC]]
// CHECK: return %[[RESULT]]
+// Same as the top example within this split, but with the outer vector
+// dim scalable. Note that this example only makes sense when "8 = [8]" (i.e.
+// vscale = 1). This is assumed (implicitly) via the `in_bounds` attribute.
+
+func.func @contiguous_inner_most_outer_dim_dyn_scalable_inner_dim(%a: index, %b: index, %memref: memref<?x?x8x1xf32>) -> vector<[8]x1xf32> {
+ %c0 = arith.constant 0 : index
+ %pad = arith.constant 0.0 : f32
+ %v = vector.transfer_read %memref[%a, %b, %c0, %c0], %pad {in_bounds = [true, true]} : memref<?x?x8x1xf32>, vector<[8]x1xf32>
+ return %v : vector<[8]x1xf32>
+}
+// CHECK-LABEL: func @contiguous_inner_most_outer_dim_dyn_scalable_inner_dim
+// CHECK-SAME: %[[IDX0:[a-zA-Z0-9]+]]
+// CHECK-SAME: %[[IDX1:[a-zA-Z0-9]+]]
+// CHECK-SAME: %[[SRC:[a-zA-Z0-9]+]]
+// CHECK: %[[VIEW:.+]] = memref.subview %[[SRC]]{{.*}} memref<?x?x8x1xf32> to memref<?x?x8xf32, strided<[?, 8, 1], offset: ?>>
+// CHECK: %[[VEC_READ:.+]] = vector.transfer_read %[[VIEW]]
+// CHECK-SAME: {in_bounds = [true]}
+// CHECK-SAME: memref<?x?x8xf32, strided<[?, 8, 1], offset: ?>>, vector<[8]xf32>
+// CHECK: vector.shape_cast %[[VEC_READ]]
+
// -----
-func.func @contiguous_inner_most_dim(%A: memref<16x1xf32>, %i:index, %j:index) -> (vector<8x1xf32>) {
+func.func @contiguous_inner_most_dim_non_zero_idx(%A: memref<16x1xf32>, %i:index) -> (vector<8x1xf32>) {
%c0 = arith.constant 0 : index
%f0 = arith.constant 0.0 : f32
- %1 = vector.transfer_read %A[%i, %j], %f0 : memref<16x1xf32>, vector<8x1xf32>
+ %1 = vector.transfer_read %A[%i, %c0], %f0 : memref<16x1xf32>, vector<8x1xf32>
return %1 : vector<8x1xf32>
}
-// CHECK: func @contiguous_inner_most_dim(%[[SRC:.+]]: memref<16x1xf32>, %[[I:.+]]: index, %[[J:.+]]: index) -> vector<8x1xf32>
+// CHECK: func @contiguous_inner_most_dim_non_zero_idx(%[[SRC:.+]]: memref<16x1xf32>, %[[I:.+]]: index) -> vector<8x1xf32>
// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
// CHECK-SAME: memref<16x1xf32> to memref<16xf32, strided<[1]>>
// CHECK: %[[V:.+]] = vector.transfer_read %[[SRC_0]]
-// CHECK: %[[RESULT]] = vector.shape_cast %[[V]] : vector<8xf32> to vector<8x1xf32>
+// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[V]] : vector<8xf32> to vector<8x1xf32>
// CHECK: return %[[RESULT]]
+// The index to be dropped is != 0 - this is currently not supported.
+func.func @negative_contiguous_inner_most_dim_non_zero_idxs(%A: memref<16x1xf32>, %i:index) -> (vector<8x1xf32>) {
+ %f0 = arith.constant 0.0 : f32
+ %1 = vector.transfer_read %A[%i, %i], %f0 : memref<16x1xf32>, vector<8x1xf32>
+ return %1 : vector<8x1xf32>
+}
+// CHECK-LABEL: func @negative_contiguous_inner_most_dim_non_zero_idxs
+// CHECK-NOT: memref.subview
+// CHECK: vector.transfer_read
+
+// Same as the top example within this split, but with the outer vector
+// dim scalable. Note that this example only makes sense when "8 = [8]" (i.e.
+// vscale = 1). This is assumed (implicitly) via the `in_bounds` attribute.
+
+func.func @contiguous_inner_most_dim_non_zero_idx_scalable_inner_dim(%A: memref<16x1xf32>, %i:index) -> (vector<[8]x1xf32>) {
+ %c0 = arith.constant 0 : index
+ %f0 = arith.constant 0.0 : f32
+ %1 = vector.transfer_read %A[%i, %c0], %f0 : memref<16x1xf32>, vector<[8]x1xf32>
+ return %1 : vector<[8]x1xf32>
+}
+// CHECK-LABEL: func @contiguous_inner_most_dim_non_zero_idx_scalable_inner_dim(
+// CHECK-SAME: %[[SRC:.+]]: memref<16x1xf32>, %[[I:.+]]: index) -> vector<[8]x1xf32>
+// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
+// CHECK-SAME: memref<16x1xf32> to memref<16xf32, strided<[1]>>
+// CHECK: %[[V:.+]] = vector.transfer_read %[[SRC_0]]
+// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[V]] : vector<[8]xf32> to vector<[8]x1xf32>
+// CHECK: return %[[RESULT]]
+
// -----
-func.func @contiguous_inner_most_dim_bounds(%A: memref<1000x1xf32>, %i:index, %ii:index) -> (vector<4x1xf32>) {
+func.func @contiguous_inner_most_dim_with_subview(%A: memref<1000x1xf32>, %i:index, %ii:index) -> (vector<4x1xf32>) {
%c0 = arith.constant 0 : index
%cst = arith.constant 0.0 : f32
%0 = memref.subview %A[%i, 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1], offset: ?>>
%1 = vector.transfer_read %0[%ii, %c0], %cst {in_bounds = [true, true]} : memref<40x1xf32, strided<[1, 1], offset: ?>>, vector<4x1xf32>
return %1 : vector<4x1xf32>
}
-// CHECK: func @contiguous_inner_most_dim_bounds(%[[SRC:.+]]: memref<1000x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<4x1xf32>
+// CHECK: func @contiguous_inner_most_dim_with_subview(%[[SRC:.+]]: memref<1000x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<4x1xf32>
// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
// CHECK: %[[SRC_1:.+]] = memref.subview %[[SRC_0]]
// CHECK: %[[V:.+]] = vector.transfer_read %[[SRC_1]]
// CHECK-SAME: {in_bounds = [true]}
// CHECK-SAME: vector<4xf32>
+// Same as the top example within this split, but with the outer vector
+// dim scalable. Note that this example only makes sense when "4 = [4]" (i.e.
+// vscale = 1). This is assumed (implicitly) via the `in_bounds` attribute.
+
+func.func @contiguous_inner_most_dim_with_subview_scalable_inner_dim(%A: memref<1000x1xf32>, %i:index, %ii:index) -> (vector<[4]x1xf32>) {
+ %c0 = arith.constant 0 : index
+ %cst = arith.constant 0.0 : f32
+ %0 = memref.subview %A[%i, 0] [40, 1] [1, 1] : memref<1000x1xf32> to memref<40x1xf32, strided<[1, 1], offset: ?>>
+ %1 = vector.transfer_read %0[%ii, %c0], %cst {in_bounds = [true, true]} : memref<40x1xf32, strided<[1, 1], offset: ?>>, vector<[4]x1xf32>
+ return %1 : vector<[4]x1xf32>
+}
+// CHECK-LABEL: func @contiguous_inner_most_dim_with_subview_scalable_inner_dim
+// CHECK-SAME: %[[SRC:.+]]: memref<1000x1xf32>
+// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
+// CHECK: %[[V:.+]] = vector.transfer_read %[[SRC_0]]
+// CHECK-SAME: {in_bounds = [true]}
+// CHECK-SAME: vector<[4]xf32>
+
// -----
-func.func @contiguous_inner_most_dim_bounds_2d(%A: memref<1000x1x1xf32>, %i:index, %ii:index) -> (vector<4x1x1xf32>) {
+func.func @contiguous_inner_most_dim_with_subview_2d(%A: memref<1000x1x1xf32>, %i:index, %ii:index) -> (vector<4x1x1xf32>) {
%c0 = arith.constant 0 : index
%cst = arith.constant 0.0 : f32
%0 = memref.subview %A[%i, 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>
%1 = vector.transfer_read %0[%ii, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>, vector<4x1x1xf32>
return %1 : vector<4x1x1xf32>
}
-// CHECK: func @contiguous_inner_most_dim_bounds_2d(%[[SRC:.+]]: memref<1000x1x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<4x1x1xf32>
+// CHECK: func @contiguous_inner_most_dim_with_subview_2d(%[[SRC:.+]]: memref<1000x1x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<4x1x1xf32>
// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
// CHECK: %[[SRC_1:.+]] = memref.subview %[[SRC_0]]
// CHECK: %[[V:.+]] = vector.transfer_read %[[SRC_1]]
// CHECK-SAME: {in_bounds = [true]}
// CHECK-SAME: vector<4xf32>
+// Same as the top example within this split, but with the outer vector
+// dim scalable. Note that this example only makes sense when "4 = [4]" (i.e.
+// vscale = 1). This is assumed (implicitly) via the `in_bounds` attribute.
+
+func.func @contiguous_inner_most_dim_with_subview_2d_scalable_inner_dim(%A: memref<1000x1x1xf32>, %i:index, %ii:index) -> (vector<[4]x1x1xf32>) {
+ %c0 = arith.constant 0 : index
+ %cst = arith.constant 0.0 : f32
+ %0 = memref.subview %A[%i, 0, 0] [40, 1, 1] [1, 1, 1] : memref<1000x1x1xf32> to memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>
+ %1 = vector.transfer_read %0[%ii, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<40x1x1xf32, strided<[1, 1, 1], offset: ?>>, vector<[4]x1x1xf32>
+ return %1 : vector<[4]x1x1xf32>
+}
+// CHECK-LABEL: func @contiguous_inner_most_dim_with_subview_2d_scalable_inner_dim(
+// CHECK-SAME: %[[SRC:.+]]: memref<1000x1x1xf32>, %[[II:.+]]: index, %[[J:.+]]: index) -> vector<[4]x1x1xf32>
+// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
+// CHECK: %[[SRC_1:.+]] = memref.subview %[[SRC_0]]
+// CHECK: %[[V:.+]] = vector.transfer_read %[[SRC_1]]
+// CHECK-SAME: {in_bounds = [true]}
+// CHECK-SAME: vector<[4]xf32>
+// CHECK: vector.shape_cast %[[V]]
+
// -----
-func.func @contiguous_inner_most_dim_out_of_bounds_2d(%arg0: memref<1x1xf32>) -> vector<4x8xf32> {
+// NOTE: This is an out-of-bounds access.
+
+func.func @negative_non_unit_inner_vec_dim(%arg0: memref<4x1xf32>) -> vector<4x8xf32> {
%c0 = arith.constant 0 : index
%cst = arith.constant 0.000000e+00 : f32
- %0 = vector.transfer_read %arg0[%c0, %c0], %cst : memref<1x1xf32>, vector<4x8xf32>
+ %0 = vector.transfer_read %arg0[%c0, %c0], %cst : memref<4x1xf32>, vector<4x8xf32>
return %0 : vector<4x8xf32>
}
-// The inner most unit dim can not be dropped. In this context, we do not
-// generate rank-reduced memref.subview ops.
-// CHECK: func.func @contiguous_inner_most_dim_out_of_bounds_2d
-// CHECK-SAME: %[[SRC:[a-zA-Z0-9]+]]
+// CHECK: func.func @negative_non_unit_inner_vec_dim
// CHECK-NOT: memref.subview
-// CHECK: %[[READ:.+]] = vector.transfer_read %[[SRC]]
-// CHECK: return %[[READ]] : vector<4x8xf32>
+// CHECK: vector.transfer_read
// -----
-func.func @drop_two_inner_most_dim_for_transfer_write(%arg0: memref<1x512x16x1x1xf32>, %arg1: vector<1x16x16x1x1xf32>, %arg2: index) {
+func.func @negative_non_unit_inner_memref_dim(%arg0: memref<4x8xf32>) -> vector<4x1xf32> {
+ %c0 = arith.constant 0 : index
+ %cst = arith.constant 0.000000e+00 : f32
+ %0 = vector.transfer_read %arg0[%c0, %c0], %cst : memref<4x8xf32>, vector<4x1xf32>
+ return %0 : vector<4x1xf32>
+}
+// CHECK: func.func @negative_non_unit_inner_memref_dim
+// CHECK-NOT: memref.subview
+// CHECK: vector.transfer_read
+
+// -----
+
+//-----------------------------------------------------------------------------
+// 2. vector.transfer_write
+//-----------------------------------------------------------------------------
+
+func.func @drop_two_inner_most_dim(%arg0: memref<1x512x16x1x1xf32>, %arg1: vector<1x16x16x1x1xf32>, %arg2: index) {
%c0 = arith.constant 0 : index
vector.transfer_write %arg1, %arg0[%c0, %arg2, %c0, %c0, %c0]
{in_bounds = [true, true, true, true, true]}
: vector<1x16x16x1x1xf32>, memref<1x512x16x1x1xf32>
return
}
-// CHECK: func.func @drop_two_inner_most_dim_for_transfer_write
+// CHECK: func.func @drop_two_inner_most_dim
// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]
// CHECK-SAME: %[[VEC:[a-zA-Z0-9]+]]
// CHECK-SAME: %[[IDX:[a-zA-Z0-9]+]]
@@ -121,16 +272,67 @@ func.func @drop_two_inner_most_dim_for_transfer_write(%arg0: memref<1x512x16x1x1
// CHECK: vector.transfer_write %[[CAST]], %[[SUBVIEW]]
// CHECK-SAME: [%[[C0]], %[[IDX]], %[[C0]]]
+// Same as the top example within this split, but with the inner vector
+// dim scalable. Note that this example only makes sense when "16 = [16]" (i.e.
+// vscale = 1). This is assumed (implicitly) via the `in_bounds` attribute.
+
+func.func @drop_two_inner_most_dim_scalable_inner_dim(%arg0: memref<1x512x16x1x1xf32>, %arg1: vector<1x16x[16]x1x1xf32>, %arg2: index) {
+ %c0 = arith.constant 0 : index
+ vector.transfer_write %arg1, %arg0[%c0, %arg2, %c0, %c0, %c0]
+ {in_bounds = [true, true, true, true, true]}
+ : vector<1x16x[16]x1x1xf32>, memref<1x512x16x1x1xf32>
+ return
+}
+// CHECK: func.func @drop_two_inner_most_dim_scalable_inner_dim
+// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]
+// CHECK-SAME: %[[VEC:[a-zA-Z0-9]+]]
+// CHECK-SAME: %[[IDX:[a-zA-Z0-9]+]]
+// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
+// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[DEST]]
+// CHECK-SAME: memref<1x512x16x1x1xf32> to memref<1x512x16xf32, strided<[8192, 16, 1]>>
+// CHECK: %[[CAST:.+]] = vector.shape_cast %[[VEC]] : vector<1x16x[16]x1x1xf32> to vector<1x16x[16]xf32>
+// CHECK: vector.transfer_write %[[CAST]], %[[SUBVIEW]]
+// CHECK-SAME: [%[[C0]], %[[IDX]], %[[C0]]]
+
+// Same as the top example within this split, but the trailing unit dim was
+// replaced with a dyn dim - not supported
+
+func.func @negative_non_unit_trailing_dim(%arg0: memref<1x512x16x1x?xf32>, %arg1: vector<1x16x16x1x1xf32>, %arg2: index) {
+ %c0 = arith.constant 0 : index
+ vector.transfer_write %arg1, %arg0[%c0, %arg2, %c0, %c0, %c0]
+ {in_bounds = [true, true, true, true, true]}
+ : vector<1x16x16x1x1xf32>, memref<1x512x16x1x?xf32>
+ return
+}
+// CHECK: func.func @negative_non_unit_trailing_dim
+// CHECK-NOT: memref.subview
+// CHECK-NOT: vector.shape_cast
+
+// Same as the top example within this split, but with a scalable unit dim in
+// the output vector - not supported
+
+func.func @negative_scalable_unit_dim(%arg0: memref<1x512x16x1x1xf32>, %arg1: vector<1x16x16x1x[1]xf32>, %arg2: index) {
+ %c0 = arith.constant 0 : index
+ vector.transfer_write %arg1, %arg0[%c0, %arg2, %c0, %c0, %c0]
+ {in_bounds = [true, true, true, true, true]}
+ : vector<1x16x16x1x[1]xf32>, memref<1x512x16x1x1xf32>
+ return
+}
+
+// CHECK: func.func @negative_scalable_unit_dim
+// CHECK-NOT: memref.subview
+// CHECK-NOT: vector.shape_cast
+
// -----
-func.func @drop_inner_most_dim_for_transfer_write(%arg0: memref<1x512x16x1xf32, strided<[8192, 16, 1, 1], offset: ?>>, %arg1: vector<1x16x16x1xf32>, %arg2: index) {
+func.func @drop_inner_most_dim(%arg0: memref<1x512x16x1xf32, strided<[8192, 16, 1, 1], offset: ?>>, %arg1: vector<1x16x16x1xf32>, %arg2: index) {
%c0 = arith.constant 0 : index
vector.transfer_write %arg1, %arg0[%c0, %arg2, %c0, %c0]
{in_bounds = [tr...
[truncated]
|
16bc9b9
to
aecd754
Compare
// Same as the top example within this split, but with a scalable unit dim in | ||
// the output vector - not supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Same as the top example within this split, but with a scalable unit dim in | |
// the output vector - not supported | |
// Same as the top example within this split, but with a scalable one dim in | |
// the output vector - not allowed |
There's no such thing as a scalable unit dim for AArch64 [1]
can contain up to 16 elements!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no such thing as a scalable unit dim for AArch64 [1] can contain up to 16 elements!
This test is target-agnostic - the limitations of any specific target are irrelevant here. If we want to allow/disallow sth, then we need formalise and justify it (i.e. document it). I don't think that's necessary.
As for "scalable unit" vs "scalable one" for [1]
, I don't follow. The very presence of "scalable" in the name highlights that this is something special. By using "unit" we are consistent with the other tests and the terminology used in the pattern definition. What's wrong with "unit"?
If we do advocate for inconsistency, then please formalise the difference between "scalable one" and "scalable unit" through documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unit = an individual thing
But a scalable one dim is not an individual thing, depending on vscale it can be up to 16 elements for AArch64 (but for any scalable target it won't always be one individual element).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I prefer "scalable one dim" as "scalable unit dim" feels like an oxymoron.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unit = an individual thing
Indeed, and the meaning of "thing" is up for interpretation - yours is different to mine.
While there's no documented definition of "scalable one"/"scalable unit", this is all about our individual preferences ...
So, I prefer "scalable one dim"
Anyway, I've renamed these tests and added comments in "VectorTransforms.cpp".
// Same as the top example within this split, but with a scalable unit dim in | ||
// the output vector - not supported | ||
|
||
func.func @negative_scalable_unit_dim(%arg0: memref<1x512x16x1x1xf32>, %arg1: vector<1x16x16x1x[1]xf32>, %arg2: index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func.func @negative_scalable_unit_dim(%arg0: memref<1x512x16x1x1xf32>, %arg1: vector<1x16x16x1x[1]xf32>, %arg2: index) { | |
func.func @negative_scalable_one_dim(%arg0: memref<1x512x16x1x1xf32>, %arg1: vector<1x16x16x1x[1]xf32>, %arg2: index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"unit" is consistent with what's used in the pattern definition and the goal here is to remain consistent
llvm-project/mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
Lines 1338 to 1354 in 77db8b0
/// Drop inner most contiguous unit dimensions from transfer_write operand. | |
/// E.g., | |
/// vector.transfer_write %arg1, %arg0[%c0, %arg2, %c0, %c0, %c0] | |
/// {in_bounds = [true, true, true, true, true]} | |
/// : vector<1x16x16x1x1xf32>, memref<1x512x16x1x1xf32> | |
/// | |
/// will be replaced with | |
/// | |
/// %subview = memref.subview %arg0 | |
/// [0, 0, 0, 0, 0] [1, 512, 16, 1, 1] [1, 1, 1, 1, 1] | |
/// : memref<1x512x16x1x1xf32> to memref<1x512x16xf32> | |
/// %0 = vector.shape_cast %arg1 : vector<1x16x16x1x1xf32> | |
/// to vector<1x16x16xf32> | |
/// vector.transfer_write %0, %subview[%c0, %arg2, %c0] | |
/// {in_bounds = [true, true, true]} | |
/// : vector<1x16x16xf32>, memref<1x512x16xf32> | |
class DropInnerMostUnitDimsTransferWrite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern is removing unit dims yes, but [1]
is not a unit dim (it's something else the pattern should fail one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's something else the pattern should fail one
I will update the comments to better document the behaviour. For consistency with the docs and the existing names, I am keeping "scalable unit" (happy to keep in quotes to highlight that it's a special 🌷 ) . As pointed out in my other comment, "scalable unit" vs "scalable one" is a matter of individual preference unless we formalise the definition.
The main goal of this PR (and subsequent PRs), is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There's quite a few cases to consider, hence this is split into multiple PRs. In this PR, the very first test for `vector.transfer_write` is complemented with all the possible combinations: * scalable (rather than fixed) unit trailing dim, * dynamic (rather than static) trailing dim in the source memref. To this end, the following tests: * `@leading_scalable_dimension_transfer_write` `@trailing_scalable_one_dim_transfer_write` are replaced with: * `@drop_two_inner_most_dim_scalable_inner_dim` and `@negative_scalable_unit_dim`, respectively. In addition: * "_for_transfer_write" is removed from function names (to reduce noise). This is a follow-up for: llvm#94490, llvm#94604 NOTE: This PR is limited to tests for `vector.transfer_write`.
Rename tests, expand docs
aecd754
to
6aadc8a
Compare
Rename xfer_write rather than xfer_read tests. In the previous PR I renamed xfer_read tests instead. For consistency, I'll keep both changes and update the summary accordingly.
The main goal of this PR (and subsequent PRs), is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There's quite a few cases to consider, hence this is split into multiple PRs. In this PR, `@outer_dyn_drop_inner_most_dim` is replaced with: * `@contiguous_inner_most_dynamic_outer` I am also adding a similar test for scalable vectors. In addition, * `@drop_two_inner_most_dim` and `@drop_two_inner_most_dim_scalable_inner_dim`, are renamed as `@contiguous_inner_most_scalable_inner_dim` to match their counterpart for xfer_read. NOTE: This PR is limited to tests for `vector.transfer_write` This is a follow-up for: llvm#94490, llvm#94604, llvm#94906
The main goal of this PR (and subsequent PRs), is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There's quite a few cases to consider, hence this is split into multiple PRs. In this PR, I am simply adding more tests for `vector.transfer_write` so that for every test for `xfer_read`, there's a corresponding test for `xfer_write`. This is a follow-up for: llvm#94490, llvm#94604, llvm#94906, llvm#96214
The main goal of this PR (and subsequent PRs), is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There's quite a few cases to consider, hence this is split into multiple PRs. In this PR, `@outer_dyn_drop_inner_most_dim` is replaced with: * `@contiguous_inner_most_dynamic_outer` I am also adding a similar test for scalable vectors. In addition, * `@drop_two_inner_most_dim` and `@drop_two_inner_most_dim_scalable_inner_dim`, are renamed as `@contiguous_inner_most` and `@contiguous_inner_most_scalable_inner_dim`, respectively, to match their counterpart for `xfer_read`. NOTE: This PR is limited to tests for `vector.transfer_write` This is a follow-up for: #94490, #94604, #94906
The main goal of this PR (and subsequent PRs), is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There's quite a few cases to consider, hence this is split into multiple PRs. In this PR, the very first test for `vector.transfer_write` is complemented with all the possible combinations: * scalable (rather than fixed) unit trailing dim, * dynamic (rather than static) trailing dim in the source memref. To this end, the following tests: * `@leading_scalable_dimension_transfer_write` `@trailing_scalable_one_dim_transfer_write` are replaced with: * `@drop_two_inner_most_dim_scalable_inner_dim` and `@negative_scalable_unit_dim`, respectively. In addition: * "_for_transfer_write" is removed from function names (to reduce noise). In addition, to maintain consistency between the tests for `xfer_read` and `xfer_write`, 2 negative tests for `xfer_read` are also renamed. This is to follow the suggestion made during the review of this PR. Extra comments in "VectorTransforms.cpp" are added to better document the limitations related to scalable vectors and which tests added here excercise. This is a follow-up for: llvm#94490 and llvm#94604 NOTE: This PR is limited to tests for `vector.transfer_write`.
The main goal of this PR (and subsequent PRs), is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There's quite a few cases to consider, hence this is split into multiple PRs. In this PR, `@outer_dyn_drop_inner_most_dim` is replaced with: * `@contiguous_inner_most_dynamic_outer` I am also adding a similar test for scalable vectors. In addition, * `@drop_two_inner_most_dim` and `@drop_two_inner_most_dim_scalable_inner_dim`, are renamed as `@contiguous_inner_most` and `@contiguous_inner_most_scalable_inner_dim`, respectively, to match their counterpart for `xfer_read`. NOTE: This PR is limited to tests for `vector.transfer_write` This is a follow-up for: llvm#94490, llvm#94604, llvm#94906
The main goal of this PR (and subsequent PRs), is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There's quite a few cases to consider, hence this is split into multiple PRs. In this PR, I am simply adding more tests for `vector.transfer_write` so that for every test for `xfer_read`, there's a corresponding test for `xfer_write`. This is a follow-up for: llvm#94490, llvm#94604, llvm#94906, llvm#96214
The main goal of this PR (and subsequent PRs), is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There's quite a few cases to consider, hence this is split into multiple PRs. In this PR, I am simply adding more tests for `vector.transfer_write` so that for every test for `xfer_read`, there's a corresponding test for `xfer_write`. This is a follow-up for: #94490, #94604, #94906, #96214
The main goal of this PR (and subsequent PRs), is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There's quite a few cases to consider, hence this is split into multiple PRs. In this PR, I am making the following changes: * All input memrefs for `xfer_read` are are renamed as `%src`. * All input memrefs for `xfer_write` are are renamed as `%dest`. * All variables representing pad values for `xfer_read` are renamed as `%pad`. * All vector variables (for `xfer_read` and `xfer_write`) are renamed as `%v`. * Add `@contiguous_inner_most_non_zero_idx_in_bounds_scalable` for `xfer_read` (similar test already exists for `xfer_write`) * All indiex variables are renamed as `%i` (1st index) and `%ii` (2nd index). The above were marked as TODOs in the test file - these are not resolved. In addition (to avoid sending another PR): * `@drop_inner_most_dim` is deleted - it duplicates `@contiguous_inner_most` for xfer_write * For consistency with other negative tests, renamed `@non_unit_strides` as `@negative_non_unit_strides` and added a similar test for `xfer_read` This is a follow-up for: llvm#94490, llvm#94604, llvm#94906, llvm#96214, llvm#96227
The main goal of this PR (and subsequent PRs), is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There's quite a few cases to consider, hence this is split into multiple PRs. In this PR, I am making the following changes: * All input memrefs for `xfer_read` are are renamed as `%src`. * All input memrefs for `xfer_write` are are renamed as `%dest`. * All variables representing pad values for `xfer_read` are renamed as `%pad`. * All vector variables (for `xfer_read` and `xfer_write`) are renamed as `%v`. * Add `@contiguous_inner_most_non_zero_idx_in_bounds_scalable` for `xfer_read` (similar test already exists for `xfer_write`) * All index variables are renamed as `%i` (1st index) and `%ii` (2nd index). The above were marked as TODOs in the test file - these are not resolved. In addition (to avoid sending another PR): * `@drop_inner_most_dim` is deleted - it duplicates `@contiguous_inner_most` for xfer_write * For consistency with other negative tests, renamed `@non_unit_strides` as `@negative_non_unit_strides` and added a similar test for `xfer_read` * `@non_unit_strides` is renamed as `@negative_non_unit_strides` and a similar test is added for `xfer_read`. This is a follow-up for: #94490, #94604, #94906, #96214, #96227
Summary: The main goal of this PR (and subsequent PRs), is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There's quite a few cases to consider, hence this is split into multiple PRs. In this PR, I am making the following changes: * All input memrefs for `xfer_read` are are renamed as `%src`. * All input memrefs for `xfer_write` are are renamed as `%dest`. * All variables representing pad values for `xfer_read` are renamed as `%pad`. * All vector variables (for `xfer_read` and `xfer_write`) are renamed as `%v`. * Add `@contiguous_inner_most_non_zero_idx_in_bounds_scalable` for `xfer_read` (similar test already exists for `xfer_write`) * All index variables are renamed as `%i` (1st index) and `%ii` (2nd index). The above were marked as TODOs in the test file - these are not resolved. In addition (to avoid sending another PR): * `@drop_inner_most_dim` is deleted - it duplicates `@contiguous_inner_most` for xfer_write * For consistency with other negative tests, renamed `@non_unit_strides` as `@negative_non_unit_strides` and added a similar test for `xfer_read` * `@non_unit_strides` is renamed as `@negative_non_unit_strides` and a similar test is added for `xfer_read`. This is a follow-up for: #94490, #94604, #94906, #96214, #96227 Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251592
The main goal of this PR (and subsequent PRs), is to add more tests with
scalable vectors to:
There's quite a few cases to consider, hence this is split into multiple
PRs. In this PR, the very first test for
vector.transfer_write
iscomplemented with all the possible combinations:
To this end, the following tests:
@leading_scalable_dimension_transfer_write
@trailing_scalable_one_dim_transfer_write
are replaced with:
@drop_two_inner_most_dim_scalable_inner_dim
and@negative_scalable_unit_dim
,respectively. In addition:
noise).
In addition, to maintain consistency between the tests for
xfer_read
and
xfer_write
, 2 negative tests forxfer_read
are also renamed.This is to follow the suggestion made during the review of this PR.
This is a follow-up for: #94490 and #94604
NOTE: This PR is limited to tests for
vector.transfer_write
.