[mlir][linalg] Fix for bias handling for Winograd #110331
Conversation
@llvm/pr-subscribers-mlir-linalg @llvm/pr-subscribers-mlir

Author: Dmitriy Smirnov (d-smirnov)

Changes

PR adds handling of bias to the Winograd output transform op decomposition.

Full diff: https://github.com/llvm/llvm-project/pull/110331.diff

3 Files Affected:
diff --git a/mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp b/mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp
index 80edf4a32c6df8..adfbb331ec0987 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp
@@ -837,9 +837,25 @@ Value outputTransform(RewriterBase &rewriter, Location loc, Value value,
Value widthOffset =
builder.create<affine::AffineApplyOp>(loc, affineMap, tileWIter);
+ // Handling bias.
+ Value prevVal =
+ extract2DDataFrom4D(builder, loc, args[0], NIter, FIter, heightOffset,
+ widthOffset, retRows, retCols,
+ /*loopNorFIdx=*/0,
+ /*loopCorFIdx=*/3, /*heightIdx=*/1,
+ /*widthIdx=*/2);
+ Value biasedVal =
+ builder
+ .create<linalg::AddOp>(
+ loc, prevVal.getType(), ValueRange{matmulRetValue, prevVal},
+ ValueRange{builder.create<tensor::EmptyOp>(
+ loc, llvm::cast<ShapedType>(prevVal.getType()).getShape(),
+ elementType)})
+ .getResult(0);
+
// Insert (H, W) to (N, H, W, F).
Value combinedVal =
- insert2DDataTo4D(builder, loc, matmulRetValue, args[0], NIter, FIter,
+ insert2DDataTo4D(builder, loc, biasedVal, args[0], NIter, FIter,
heightOffset, widthOffset, retRows, retCols,
/*loopNorFIdx=*/0,
/*loopCorFIdx=*/3, /*heightIdx=*/1,
diff --git a/mlir/test/Dialect/Linalg/transform-tile-and-winograd-rewrite.mlir b/mlir/test/Dialect/Linalg/transform-tile-and-winograd-rewrite.mlir
index c5760acf94a88a..01c0d0a826c999 100644
--- a/mlir/test/Dialect/Linalg/transform-tile-and-winograd-rewrite.mlir
+++ b/mlir/test/Dialect/Linalg/transform-tile-and-winograd-rewrite.mlir
@@ -109,7 +109,10 @@ module attributes {transform.with_named_sequence} {
// CHECK: linalg.yield %[[IN]] : f32
// CHECK: } -> tensor<4x4xf32>
// CHECK: %[[S24:.*]] = linalg.mul ins(%[[S23]], %[[S21]] : tensor<4x4xf32>, tensor<4x4xf32>) outs(%[[S22]] : tensor<4x4xf32>) -> tensor<4x4xf32>
-// CHECK: %[[INSERTED_SLICE_9:.*]] = tensor.insert_slice %[[S24]] into %[[ARG10]][%[[ARG7]], 0, 0, %[[ARG9]]] [1, 4, 4, 1] [1, 1, 1, 1]
+// CHECK: %[[S25:.*]] = tensor.extract_slice %[[ARG10]][%[[ARG7]], 0, 0, %[[ARG9]]] [1, 4, 4, 1] [1, 1, 1, 1]
+// CHECK: %[[S26:.*]] = tensor.empty() : tensor<4x4xf32>
+// CHECK: %[[S27:.*]] = linalg.add ins(%[[S24]], %[[S25]] : tensor<4x4xf32>, tensor<4x4xf32>) outs(%[[S26]] : tensor<4x4xf32>) -> tensor<4x4xf32>
+// CHECK: %[[INSERTED_SLICE_9:.*]] = tensor.insert_slice %[[S27]] into %[[ARG10]][%[[ARG7]], 0, 0, %[[ARG9]]] [1, 4, 4, 1] [1, 1, 1, 1]
// CHECK: scf.yield %[[INSERTED_SLICE_9]]
// CHECK: scf.yield %[[S15]]
// CHECK: %[[S13:.*]] = affine.apply #[[$MAP0]](%[[ARG3]])
@@ -243,7 +246,10 @@ module attributes {transform.with_named_sequence} {
// CHECK: linalg.yield %[[IN]] : f32
// CHECK: } -> tensor<4x4xf32>
// CHECK: %[[S25:.*]] = linalg.mul ins(%[[S24]], %[[S22]] : tensor<4x4xf32>, tensor<4x4xf32>) outs(%[[S23]] : tensor<4x4xf32>) -> tensor<4x4xf32>
-// CHECK: %[[INSERTED_SLICE_12:.*]] = tensor.insert_slice %[[S25]] into %[[ARG11]][%[[ARG8]], 0, 0, %[[ARG10]]] [1, 4, 4, 1] [1, 1, 1, 1]
+// CHECK: %[[S26:.*]] = tensor.extract_slice %[[ARG11]][%[[ARG8]], 0, 0, %[[ARG10]]] [1, 4, 4, 1] [1, 1, 1, 1]
+// CHECK: %[[S27:.*]] = tensor.empty() : tensor<4x4xf32>
+// CHECK: %[[S28:.*]] = linalg.add ins(%[[S25]], %[[S26]] : tensor<4x4xf32>, tensor<4x4xf32>) outs(%[[S27]] : tensor<4x4xf32>) -> tensor<4x4xf32>
+// CHECK: %[[INSERTED_SLICE_12:.*]] = tensor.insert_slice %[[S28]] into %[[ARG11]][%[[ARG8]], 0, 0, %[[ARG10]]] [1, 4, 4, 1] [1, 1, 1, 1]
// CHECK: scf.yield %[[INSERTED_SLICE_12]]
// CHECK: scf.yield %[[S15]] : tensor<2x4x4x2xf32>
// CHECK: %[[S13:.*]] = affine.apply #[[$MAP0]](%[[ARG4]])
@@ -339,7 +345,10 @@ module attributes {transform.with_named_sequence} {
// CHECK: linalg.yield %[[IN]] : f32
// CHECK: } -> tensor<4x1xf32>
// CHECK: %[[S14:.*]] = linalg.mul ins(%[[S13]], %[[S11]] : tensor<4x1xf32>, tensor<4x1xf32>) outs(%[[S12]] : tensor<4x1xf32>) -> tensor<4x1xf32>
-// CHECK: %[[INSERTED_SLICE:.*]] = tensor.insert_slice %[[S14]] into %[[ARG6]][%[[ARG3]], 0, 0, %[[ARG5]]] [1, 4, 1, 1] [1, 1, 1, 1]
+// CHECK: %[[S15:.*]] = tensor.extract_slice %[[ARG6]][%[[ARG3]], 0, 0, %[[ARG5]]] [1, 4, 1, 1] [1, 1, 1, 1]
+// CHECK: %[[S16:.*]] = tensor.empty() : tensor<4x1xf32>
+// CHECK: %[[S17:.*]] = linalg.add ins(%[[S14]], %[[S15]] : tensor<4x1xf32>, tensor<4x1xf32>) outs(%[[S16]] : tensor<4x1xf32>) -> tensor<4x1xf32>
+// CHECK: %[[INSERTED_SLICE:.*]] = tensor.insert_slice %[[S17]] into %[[ARG6]][%[[ARG3]], 0, 0, %[[ARG5]]] [1, 4, 1, 1] [1, 1, 1, 1]
// CHECK: scf.yield %[[INSERTED_SLICE]]
// CHECK: scf.yield %[[S7]]
// CHECK: return %[[S6]]
diff --git a/mlir/test/Dialect/Linalg/winograd-conv2d-rewrite.mlir b/mlir/test/Dialect/Linalg/winograd-conv2d-rewrite.mlir
index 4369f5f1eab4ca..b24a93bc6c27ee 100644
--- a/mlir/test/Dialect/Linalg/winograd-conv2d-rewrite.mlir
+++ b/mlir/test/Dialect/Linalg/winograd-conv2d-rewrite.mlir
@@ -114,7 +114,10 @@ func.func @conv2d(%arg0: tensor<2x11x11x5xf32>, %arg1: tensor<2x3x3x5xf32>, %arg
// CHECK-NEXT: %[[S19:.*]] = linalg.mul ins(%[[S18]], %[[S16]] : tensor<4x4xf32>, tensor<4x4xf32>) outs(%[[S17]] : tensor<4x4xf32>) -> tensor<4x4xf32>
// CHECK-NEXT: %[[S20:.*]] = affine.apply #[[$MAP0]](%[[ARG3]])
// CHECK-NEXT: %[[S21:.*]] = affine.apply #[[$MAP0]](%[[ARG5]])
-// CHECK-NEXT: %[[INSERTED_SLICE:.*]] = tensor.insert_slice %[[S19]] into %[[ARG10]][%[[ARG7]], %[[S20]], %[[S21]], %[[ARG9]]] [1, 4, 4, 1] [1, 1, 1, 1] : tensor<4x4xf32> into tensor<2x12x12x2xf32>
+// CHECK-NEXT: %[[S22:.*]] = tensor.extract_slice %[[ARG10]][%[[ARG7]], %[[S20]], %[[S21]], %[[ARG9]]] [1, 4, 4, 1] [1, 1, 1, 1] : tensor<2x12x12x2xf32> to tensor<4x4xf32>
+// CHECK-NEXT: %[[S23:.*]] = tensor.empty() : tensor<4x4xf32>
+// CHECK-NEXT: %[[S24:.*]] = linalg.add ins(%[[S19]], %[[S22]] : tensor<4x4xf32>, tensor<4x4xf32>) outs(%[[S23]] : tensor<4x4xf32>) -> tensor<4x4xf32>
+// CHECK-NEXT: %[[INSERTED_SLICE:.*]] = tensor.insert_slice %[[S24]] into %[[ARG10]][%[[ARG7]], %[[S20]], %[[S21]], %[[ARG9]]] [1, 4, 4, 1] [1, 1, 1, 1] : tensor<4x4xf32> into tensor<2x12x12x2xf32>
// CHECK-NEXT: scf.yield %[[INSERTED_SLICE]] : tensor<2x12x12x2xf32>
// CHECK-NEXT: }
// CHECK-NEXT: scf.yield %[[S9]] : tensor<2x12x12x2xf32>
I am happy to be corrected, but I am not sure adding bias to an op like this is warranted.
@@ -837,9 +837,25 @@ Value outputTransform(RewriterBase &rewriter, Location loc, Value value,
     Value widthOffset =
         builder.create<affine::AffineApplyOp>(loc, affineMap, tileWIter);

+    // Handling bias.
Sorry if I'm missing something, but what bias are you actually handling here?
The output tensor is padded, pre-initialized with bias values, and supplied to the output transform op as an output parameter. However, the pre-initialized values are discarded later when the output transformation actually happens. The patch fixes the issue.
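To make the discussion concrete, here is a minimal hand-written sketch of the per-tile extract/add/insert sequence the patch emits (the function name, fixed offsets, and shapes are illustrative only, mirroring the CHECK lines above; this is not code from the PR):

```mlir
func.func @accumulate_tile(%tile: tensor<4x4xf32>,
                           %out: tensor<2x12x12x2xf32>) -> tensor<2x12x12x2xf32> {
  // Read back the slice of the destination that was pre-initialized
  // (e.g. with bias) so its values are not lost.
  %prev = tensor.extract_slice %out[0, 0, 0, 0] [1, 4, 4, 1] [1, 1, 1, 1]
      : tensor<2x12x12x2xf32> to tensor<4x4xf32>
  %empty = tensor.empty() : tensor<4x4xf32>
  // Accumulate the transformed tile onto the pre-existing values.
  %sum = linalg.add ins(%tile, %prev : tensor<4x4xf32>, tensor<4x4xf32>)
      outs(%empty : tensor<4x4xf32>) -> tensor<4x4xf32>
  // Write the accumulated tile back into the destination.
  %res = tensor.insert_slice %sum into %out[0, 0, 0, 0] [1, 4, 4, 1] [1, 1, 1, 1]
      : tensor<4x4xf32> into tensor<2x12x12x2xf32>
  return %res : tensor<2x12x12x2xf32>
}
```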
Ok, so this is output init value handling from what I gather, and not "bias" per se. Practically, of course, someone can init the output with the bias values, etc.
I would suggest rephrasing it a bit.
When lowering tosa.conv2d to linalg, the output tensor will be filled with bias values. The implementation here assumes the values inside the output tensor are meaningful. If we do not add them back inside the output transform, we will need another round to add the bias values back to the transformed results after the output transform op.
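As an illustration of that lowering pattern, a minimal hand-written sketch (all names and shapes hypothetical) of how the conv output arrives pre-filled with bias, which is exactly the data the output transform must not discard:

```mlir
func.func @conv_with_bias_init(%img: tensor<2x14x14x5xf32>,
                               %filter: tensor<2x3x3x5xf32>,
                               %bias: tensor<2xf32>) -> tensor<2x12x12x2xf32> {
  %empty = tensor.empty() : tensor<2x12x12x2xf32>
  // The per-channel bias is broadcast into the conv output tensor...
  %init = linalg.broadcast ins(%bias : tensor<2xf32>)
      outs(%empty : tensor<2x12x12x2xf32>) dimensions = [0, 1, 2]
  // ...and the convolution accumulates on top of it, so the init
  // values are meaningful and must survive the Winograd rewrite.
  %conv = linalg.conv_2d_nhwc_fhwc
      {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>}
      ins(%img, %filter : tensor<2x14x14x5xf32>, tensor<2x3x3x5xf32>)
      outs(%init : tensor<2x12x12x2xf32>) -> tensor<2x12x12x2xf32>
  return %conv : tensor<2x12x12x2xf32>
}
```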
Sure, but this itself is not per se coupled with bias. It is just using and operating over the initial values that were injected into the operator (unless I am missing something). So I would suggest dropping anything that refers to bias here in the code. Other than that, I think it is fine.
Ok, this makes sense to me. I think it would be good to make it more clear that this is simply trying to respect the original init value of the op. I think it's more clear to keep the init slice as an init operand of the resulting decomposed ops, and it keeps the DPS nature of the sequence. Also, there are quite a few ops being generated by this decomposition now. It would be good to combine some of them into a single op.
Patch adds handling of bias to Winograd output transform op decomposition.

Signed-off-by: Dmitriy Smirnov <[email protected]>
Tagged winograd.output_transform op with DestinationStyleOpInterface
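For context, destination-style semantics tie the op's result to its outs operand, so pre-initialized destination data stays visible through the decomposition. A hand-written example of the op in that form (m/r values and shapes follow the upstream winograd tests from memory; treat it as a sketch rather than verbatim syntax):

```mlir
func.func @dps_output_transform(%transformed: tensor<6x6x1x1x2x2xf32>,
                                %out: tensor<2x4x4x2xf32>) -> tensor<2x4x4x2xf32> {
  // The result is tied 1:1 to the destination (outs) operand, so any
  // pre-initialized data in %out (e.g. bias) is visible to the rewrite.
  %res = linalg.winograd_output_transform m(4) r(3)
      ins(%transformed : tensor<6x6x1x1x2x2xf32>)
      outs(%out : tensor<2x4x4x2xf32>) -> tensor<2x4x4x2xf32>
  return %res : tensor<2x4x4x2xf32>
}
```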
Thanks for addressing the comments, this lgtm now!
Great @d-smirnov! Thanks for addressing the comments
PR makes winograd.output_transform op a destination style op and fixes handling of pre-existing data in its output argument (i.e. possibly pre-initialized with bias, which was discarded before).

---------

Signed-off-by: Dmitriy Smirnov <[email protected]>