[mlir][linalg] Fix for bias handling for Winograd #110331

Merged: 3 commits merged into llvm:main on Oct 11, 2024

Conversation

Contributor

@d-smirnov d-smirnov commented Sep 27, 2024

This PR makes the winograd.output_transform op a destination-style op and fixes the handling of pre-existing data in its output argument (i.e. data possibly pre-initialized with a bias, which was previously discarded).
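In short, the decomposition now reads the pre-existing slice of the destination, adds it to the transformed tile, and writes the sum back instead of overwriting the destination. A condensed sketch of the new step (helper and variable names are taken from the patch below; shapes and loop indices are elided):

// Read the pre-existing (e.g. bias-initialized) slice of the destination.
Value prevVal = extract2DDataFrom4D(builder, loc, args[0], NIter, FIter,
                                    heightOffset, widthOffset, retRows,
                                    retCols, /*loopNorFIdx=*/0,
                                    /*loopCorFIdx=*/3, /*heightIdx=*/1,
                                    /*widthIdx=*/2);
// Add it to the transformed tile instead of discarding it.
Value biasedVal =
    builder
        .create<linalg::AddOp>(
            loc, prevVal.getType(), ValueRange{matmulRetValue, prevVal},
            ValueRange{builder.create<tensor::EmptyOp>(
                loc, llvm::cast<ShapedType>(prevVal.getType()).getShape(),
                elementType)})
        .getResult(0);
// Write the combined tile back into the (N, H, W, F) destination tensor.
Value combinedVal =
    insert2DDataTo4D(builder, loc, biasedVal, args[0], NIter, FIter,
                     heightOffset, widthOffset, retRows, retCols,
                     /*loopNorFIdx=*/0, /*loopCorFIdx=*/3, /*heightIdx=*/1,
                     /*widthIdx=*/2);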

@d-smirnov
Contributor Author

@Hsiangkai

Member

llvmbot commented Sep 27, 2024

@llvm/pr-subscribers-mlir-linalg

@llvm/pr-subscribers-mlir

Author: Dmitriy Smirnov (d-smirnov)

Changes

This PR adds handling of bias to the Winograd output transform op decomposition.


Full diff: https://github.com/llvm/llvm-project/pull/110331.diff

3 Files Affected:

  • (modified) mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp (+17-1)
  • (modified) mlir/test/Dialect/Linalg/transform-tile-and-winograd-rewrite.mlir (+12-3)
  • (modified) mlir/test/Dialect/Linalg/winograd-conv2d-rewrite.mlir (+4-1)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp b/mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp
index 80edf4a32c6df8..adfbb331ec0987 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp
@@ -837,9 +837,25 @@ Value outputTransform(RewriterBase &rewriter, Location loc, Value value,
     Value widthOffset =
         builder.create<affine::AffineApplyOp>(loc, affineMap, tileWIter);
 
+    // Handling bias.
+    Value prevVal =
+        extract2DDataFrom4D(builder, loc, args[0], NIter, FIter, heightOffset,
+                            widthOffset, retRows, retCols,
+                            /*loopNorFIdx=*/0,
+                            /*loopCorFIdx=*/3, /*heightIdx=*/1,
+                            /*widthIdx=*/2);
+    Value biasedVal =
+        builder
+            .create<linalg::AddOp>(
+                loc, prevVal.getType(), ValueRange{matmulRetValue, prevVal},
+                ValueRange{builder.create<tensor::EmptyOp>(
+                    loc, llvm::cast<ShapedType>(prevVal.getType()).getShape(),
+                    elementType)})
+            .getResult(0);
+
     // Insert (H, W) to (N, H, W, F).
     Value combinedVal =
-        insert2DDataTo4D(builder, loc, matmulRetValue, args[0], NIter, FIter,
+        insert2DDataTo4D(builder, loc, biasedVal, args[0], NIter, FIter,
                          heightOffset, widthOffset, retRows, retCols,
                          /*loopNorFIdx=*/0,
                          /*loopCorFIdx=*/3, /*heightIdx=*/1,
diff --git a/mlir/test/Dialect/Linalg/transform-tile-and-winograd-rewrite.mlir b/mlir/test/Dialect/Linalg/transform-tile-and-winograd-rewrite.mlir
index c5760acf94a88a..01c0d0a826c999 100644
--- a/mlir/test/Dialect/Linalg/transform-tile-and-winograd-rewrite.mlir
+++ b/mlir/test/Dialect/Linalg/transform-tile-and-winograd-rewrite.mlir
@@ -109,7 +109,10 @@ module attributes {transform.with_named_sequence} {
 // CHECK:            linalg.yield %[[IN]] : f32
 // CHECK:          } -> tensor<4x4xf32>
 // CHECK:          %[[S24:.*]] = linalg.mul ins(%[[S23]], %[[S21]] : tensor<4x4xf32>, tensor<4x4xf32>) outs(%[[S22]] : tensor<4x4xf32>) -> tensor<4x4xf32>
-// CHECK:          %[[INSERTED_SLICE_9:.*]] = tensor.insert_slice %[[S24]] into %[[ARG10]][%[[ARG7]], 0, 0, %[[ARG9]]] [1, 4, 4, 1] [1, 1, 1, 1]
+// CHECK:          %[[S25:.*]] = tensor.extract_slice %[[ARG10]][%[[ARG7]], 0, 0, %[[ARG9]]] [1, 4, 4, 1] [1, 1, 1, 1]
+// CHECK:          %[[S26:.*]] = tensor.empty() : tensor<4x4xf32>
+// CHECK:          %[[S27:.*]] = linalg.add ins(%[[S24]], %[[S25]] : tensor<4x4xf32>, tensor<4x4xf32>) outs(%[[S26]] : tensor<4x4xf32>) -> tensor<4x4xf32>
+// CHECK:          %[[INSERTED_SLICE_9:.*]] = tensor.insert_slice %[[S27]] into %[[ARG10]][%[[ARG7]], 0, 0, %[[ARG9]]] [1, 4, 4, 1] [1, 1, 1, 1]
 // CHECK:          scf.yield %[[INSERTED_SLICE_9]]
 // CHECK:        scf.yield %[[S15]]
 // CHECK:      %[[S13:.*]] = affine.apply #[[$MAP0]](%[[ARG3]])
@@ -243,7 +246,10 @@ module attributes {transform.with_named_sequence} {
 // CHECK:            linalg.yield %[[IN]] : f32
 // CHECK:          } -> tensor<4x4xf32>
 // CHECK:          %[[S25:.*]] = linalg.mul ins(%[[S24]], %[[S22]] : tensor<4x4xf32>, tensor<4x4xf32>) outs(%[[S23]] : tensor<4x4xf32>) -> tensor<4x4xf32>
-// CHECK:          %[[INSERTED_SLICE_12:.*]] = tensor.insert_slice %[[S25]] into %[[ARG11]][%[[ARG8]], 0, 0, %[[ARG10]]] [1, 4, 4, 1] [1, 1, 1, 1]
+// CHECK:          %[[S26:.*]] = tensor.extract_slice %[[ARG11]][%[[ARG8]], 0, 0, %[[ARG10]]] [1, 4, 4, 1] [1, 1, 1, 1]
+// CHECK:          %[[S27:.*]] = tensor.empty() : tensor<4x4xf32>
+// CHECK:          %[[S28:.*]] = linalg.add ins(%[[S25]], %[[S26]] : tensor<4x4xf32>, tensor<4x4xf32>) outs(%[[S27]] : tensor<4x4xf32>) -> tensor<4x4xf32>
+// CHECK:          %[[INSERTED_SLICE_12:.*]] = tensor.insert_slice %[[S28]] into %[[ARG11]][%[[ARG8]], 0, 0, %[[ARG10]]] [1, 4, 4, 1] [1, 1, 1, 1]
 // CHECK:          scf.yield %[[INSERTED_SLICE_12]]
 // CHECK:        scf.yield %[[S15]] : tensor<2x4x4x2xf32>
 // CHECK:      %[[S13:.*]] = affine.apply #[[$MAP0]](%[[ARG4]])
@@ -339,7 +345,10 @@ module attributes {transform.with_named_sequence} {
 // CHECK:         linalg.yield %[[IN]] : f32
 // CHECK:       } -> tensor<4x1xf32>
 // CHECK:       %[[S14:.*]] = linalg.mul ins(%[[S13]], %[[S11]] : tensor<4x1xf32>, tensor<4x1xf32>) outs(%[[S12]] : tensor<4x1xf32>) -> tensor<4x1xf32>
-// CHECK:       %[[INSERTED_SLICE:.*]] = tensor.insert_slice %[[S14]] into %[[ARG6]][%[[ARG3]], 0, 0, %[[ARG5]]] [1, 4, 1, 1] [1, 1, 1, 1]
+// CHECK:       %[[S15:.*]] = tensor.extract_slice %[[ARG6]][%[[ARG3]], 0, 0, %[[ARG5]]] [1, 4, 1, 1] [1, 1, 1, 1]
+// CHECK:       %[[S16:.*]] = tensor.empty() : tensor<4x1xf32>
+// CHECK:       %[[S17:.*]] = linalg.add ins(%[[S14]], %[[S15]] : tensor<4x1xf32>, tensor<4x1xf32>) outs(%[[S16]] : tensor<4x1xf32>) -> tensor<4x1xf32>
+// CHECK:       %[[INSERTED_SLICE:.*]] = tensor.insert_slice %[[S17]] into %[[ARG6]][%[[ARG3]], 0, 0, %[[ARG5]]] [1, 4, 1, 1] [1, 1, 1, 1]
 // CHECK:       scf.yield %[[INSERTED_SLICE]]
 // CHECK:     scf.yield %[[S7]]
 // CHECK:   return %[[S6]]
diff --git a/mlir/test/Dialect/Linalg/winograd-conv2d-rewrite.mlir b/mlir/test/Dialect/Linalg/winograd-conv2d-rewrite.mlir
index 4369f5f1eab4ca..b24a93bc6c27ee 100644
--- a/mlir/test/Dialect/Linalg/winograd-conv2d-rewrite.mlir
+++ b/mlir/test/Dialect/Linalg/winograd-conv2d-rewrite.mlir
@@ -114,7 +114,10 @@ func.func @conv2d(%arg0: tensor<2x11x11x5xf32>, %arg1: tensor<2x3x3x5xf32>, %arg
 // CHECK-NEXT:           %[[S19:.*]] = linalg.mul ins(%[[S18]], %[[S16]] : tensor<4x4xf32>, tensor<4x4xf32>) outs(%[[S17]] : tensor<4x4xf32>) -> tensor<4x4xf32>
 // CHECK-NEXT:           %[[S20:.*]] = affine.apply #[[$MAP0]](%[[ARG3]])
 // CHECK-NEXT:           %[[S21:.*]] = affine.apply #[[$MAP0]](%[[ARG5]])
-// CHECK-NEXT:           %[[INSERTED_SLICE:.*]] = tensor.insert_slice %[[S19]] into %[[ARG10]][%[[ARG7]], %[[S20]], %[[S21]], %[[ARG9]]] [1, 4, 4, 1] [1, 1, 1, 1] : tensor<4x4xf32> into tensor<2x12x12x2xf32>
+// CHECK-NEXT:           %[[S22:.*]] = tensor.extract_slice %[[ARG10]][%[[ARG7]], %[[S20]], %[[S21]], %[[ARG9]]] [1, 4, 4, 1] [1, 1, 1, 1] : tensor<2x12x12x2xf32> to tensor<4x4xf32>
+// CHECK-NEXT:           %[[S23:.*]] = tensor.empty() : tensor<4x4xf32>
+// CHECK-NEXT:           %[[S24:.*]] = linalg.add ins(%[[S19]], %[[S22]] : tensor<4x4xf32>, tensor<4x4xf32>) outs(%[[S23]] : tensor<4x4xf32>) -> tensor<4x4xf32>
+// CHECK-NEXT:           %[[INSERTED_SLICE:.*]] = tensor.insert_slice %[[S24]] into %[[ARG10]][%[[ARG7]], %[[S20]], %[[S21]], %[[ARG9]]] [1, 4, 4, 1] [1, 1, 1, 1] : tensor<4x4xf32> into tensor<2x12x12x2xf32>
 // CHECK-NEXT:           scf.yield %[[INSERTED_SLICE]] : tensor<2x12x12x2xf32>
 // CHECK-NEXT:         }
 // CHECK-NEXT:         scf.yield %[[S9]] : tensor<2x12x12x2xf32>

Contributor

@MaheshRavishankar MaheshRavishankar left a comment

I am happy to be corrected but I am not sure adding bias to an op like this is warranted.

@@ -837,9 +837,25 @@ Value outputTransform(RewriterBase &rewriter, Location loc, Value value,
    Value widthOffset =
        builder.create<affine::AffineApplyOp>(loc, affineMap, tileWIter);

// Handling bias.
Contributor

Sorry if I'm missing something, but what bias are you actually handling here?

Contributor Author

The output tensor is padded, pre-initialized with bias values, and supplied to the output transform op as its output parameter. However, the pre-initialized values are discarded when the output transformation actually happens. This patch fixes that.
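For concreteness, a hypothetical caller-side sketch of that use case (dest, outputShape, and biasScalar are illustrative names, not code from this patch):

// Create the destination tensor and pre-fill it with a (here scalar) bias.
Value dest = builder.create<tensor::EmptyOp>(loc, outputShape, elementType);
Value biasedDest =
    builder
        .create<linalg::FillOp>(loc, ValueRange{biasScalar}, ValueRange{dest})
        .getResult(0);
// biasedDest is then passed as the output argument of the Winograd output
// transform op. Before this patch the decomposition overwrote its contents;
// with this patch they are added to the transformed result.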

Contributor

@GeorgeARM GeorgeARM Sep 30, 2024

OK, so from what I gather this is handling of the output's init values, and not "bias" per se. In practice, of course, someone can initialize the output with the bias values, etc.
I would suggest rephrasing it a bit.

@Hsiangkai
Contributor

I am happy to be corrected but I am not sure adding bias to an op like this is warranted.

When lowering tosa.conv2d to linalg, the output tensor will be filled with bias values. The implementation here assumes the values inside the output tensor are meaningful. If we do not add them back inside the output transform, we will need another round to add the bias values back to the transformed results after the output transform op.
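As an illustration of that lowering pattern, a per-output-channel bias could be seeded into the convolution's destination roughly as follows (a sketch only; bias1D, the shapes, and the use of linalg.broadcast are assumptions, not the actual tosa-to-linalg code):

// Broadcast a 1-D bias (one value per output channel F) across N, H and W of
// an NHWF init tensor; the result then becomes the `outs` of the convolution
// and, after the Winograd rewrite, of the output transform op.
Value init = builder.create<tensor::EmptyOp>(
    loc, ArrayRef<int64_t>{2, 12, 12, 2}, elementType);
Value biasedInit =
    builder
        .create<linalg::BroadcastOp>(loc, bias1D, init,
                                     ArrayRef<int64_t>{0, 1, 2})
        ->getResult(0);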

Contributor

GeorgeARM commented Sep 30, 2024

I am happy to be corrected but I am not sure adding bias to an op like this is warranted.

When lowering tosa.conv2d to linalg, the output tensor will be filled with bias values. The implementation here assumes the values inside the output tensor are meaningful. If we do not add them back inside the output transform, we will need another round to add the bias values back to the transformed results after the output transform op.

Sure, but this itself is not per se coupled with bias. It is just using and operating over the initial values that were injected into the operator (unless I am missing something). So I would suggest dropping anything in the code here that refers to bias. Other than that, I think it is fine.

Contributor

@Max191 Max191 left a comment

Ok, this makes sense to me. I think it would be good to make it clearer that this is simply trying to respect the original init value of the op. I think it is cleaner to keep the init slice as an init operand of the resulting decomposed ops, and that keeps the DPS nature of the sequence. Also, there are quite a few ops being generated by this decomposition now. It would be good to combine some of them into a single op.
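For illustration, one way to realize both suggestions (a sketch only, not necessarily the shape of the final revision; AValue is an illustrative name for the transform matrix of the last multiplication): since linalg.matmul accumulates into its outs operand (C += A * B), the slice extracted from the destination can serve directly as the init of the final matmul, which removes the separate tensor.empty and linalg.add.

// Reuse the destination slice (which may carry the init/bias values) as the
// accumulator of the final matmul, keeping the sequence in destination style.
Value outInitVal = extract2DDataFrom4D(builder, loc, args[0], NIter, FIter,
                                       heightOffset, widthOffset, retRows,
                                       retCols, /*loopNorFIdx=*/0,
                                       /*loopCorFIdx=*/3, /*heightIdx=*/1,
                                       /*widthIdx=*/2);
// linalg.matmul computes C += A * B, so accumulating into outInitVal both
// finishes the output transform and preserves the pre-existing values.
Value combined =
    builder
        .create<linalg::MatmulOp>(loc, outInitVal.getType(),
                                  ValueRange{matmulRetValue, AValue},
                                  ValueRange{outInitVal})
        .getResult(0);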

Patch adds handling of bias to Winograd output transform op decomposition

Signed-off-by: Dmitriy Smirnov <[email protected]>
Tagged winograd.output_transform op with destinationStyleOpInterface
Contributor

@Max191 Max191 left a comment

Thanks for addressing the comments, this lgtm now!

Contributor

@GeorgeARM GeorgeARM left a comment

Great @d-smirnov! Thanks for addressing the comments

@d-smirnov
Contributor Author

@MaheshRavishankar

@GeorgeARM GeorgeARM merged commit bb4696c into llvm:main Oct 11, 2024
9 checks passed
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
This PR makes the winograd.output_transform op a destination-style op and fixes the handling of pre-existing data in its output argument (i.e. data possibly pre-initialized with bias, which was previously discarded).

---------

Signed-off-by: Dmitriy Smirnov <[email protected]>