
[linalg] Add quantized version of conv_3d_ncdhw_fcdhw #113953


Closed
ubfx wants to merge 3 commits

Conversation

ubfx
Member

@ubfx ubfx commented Oct 28, 2024

This patch adds the quantized 3d convolution operator conv_3d_ncdhw_fcdhw_q. This is the "channel-first" dimension ordering used by PyTorch and others.
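For reference, this is the per-output-element update the new op performs, restated from the OpDSL definition in the diff below (strides $s_d, s_h, s_w$, dilations $\delta_d, \delta_h, \delta_w$; `cast` is the signed cast to the accumulator type `U`):

$$
O[n, f, od, oh, ow] \mathrel{+}= \sum_{c,\,kd,\,kh,\,kw}
\bigl(\mathrm{cast}(I[n, c,\; od \cdot s_d + kd \cdot \delta_d,\; oh \cdot s_h + kh \cdot \delta_h,\; ow \cdot s_w + kw \cdot \delta_w]) - \mathrm{cast}(IZp)\bigr)
\cdot
\bigl(\mathrm{cast}(K[f, c, kd, kh, kw]) - \mathrm{cast}(KZp)\bigr)
$$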

@llvmbot
Member

llvmbot commented Oct 28, 2024

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-linalg

Author: Felix Schneider (ubfx)

Changes

This patch adds the quantized 3d convolution operator conv_3d_ncdhw_fcdhw_q. This is the "channel-first" dimension ordering used by PyTorch and others.


Full diff: https://github.com/llvm/llvm-project/pull/113953.diff

3 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml (+139)
  • (modified) mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py (+40)
  • (modified) mlir/test/Dialect/Linalg/roundtrip.mlir (+15)
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml b/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml
index bf2f26de26e9ed..4e3ef937d7d48f 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml
@@ -4024,6 +4024,145 @@ structured_op: !LinalgStructuredOpConfig
                 - !ScalarExpression
                   scalar_arg: K
 --- !LinalgOpConfig
+metadata: !LinalgOpMetadata
+  name: conv_3d_ncdhw_fcdhw_q
+  cpp_class_name: Conv3DNcdhwFcdhwQOp
+  doc: |-
+    Performs 3-D convolution with zero point offsets.
+
+    Numeric casting is performed on the operands to the inner multiply, promoting
+    them to the same data type as the accumulator/output. This includes the zero
+    point offsets common to quantized operations.
+  implements:
+  - LinalgConvolutionOpInterface
+structured_op: !LinalgStructuredOpConfig
+  args:
+  - !LinalgOperandDefConfig
+    name: I
+    kind: input_tensor
+    type_var: T1
+    shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
+      s13, s14] -> (s0, s1, s2 * s3 + s4 * s5, s6 * s7 + s8 * s9, s10 * s11 + s12
+      * s13)>
+  - !LinalgOperandDefConfig
+    name: K
+    kind: input_tensor
+    type_var: T2
+    shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
+      s13, s14] -> (s14, s1, s4, s8, s12)>
+  - !LinalgOperandDefConfig
+    name: IZp
+    kind: scalar
+    type_var: I32
+  - !LinalgOperandDefConfig
+    name: KZp
+    kind: scalar
+    type_var: I32
+  - !LinalgOperandDefConfig
+    name: O
+    kind: output_tensor
+    type_var: U
+    shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
+      s13, s14] -> (s0, s14, s2, s6, s10)>
+  - !LinalgOperandDefConfig
+    name: strides
+    kind: index_attr
+    index_attr_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+      s12, s13, s14] -> (s3, s7, s11)>
+    default_indices:
+    - 1
+    - 1
+    - 1
+  - !LinalgOperandDefConfig
+    name: dilations
+    kind: index_attr
+    index_attr_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+      s12, s13, s14] -> (s5, s9, s13)>
+    default_indices:
+    - 1
+    - 1
+    - 1
+  indexing_maps: !LinalgIndexingMapsConfig
+    static_indexing_maps:
+    - affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8)[s0, s1, s2, s3, s4, s5, s6,
+      s7, s8, s9, s10, s11, s12, s13, s14] -> (d0, d8, d1 * s3 + d5 * s5, d2 * s7
+      + d6 * s9, d3 * s11 + d7 * s13)>
+    - affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8)[s0, s1, s2, s3, s4, s5, s6,
+      s7, s8, s9, s10, s11, s12, s13, s14] -> (d4, d8, d5, d6, d7)>
+    - affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8)[s0, s1, s2, s3, s4, s5, s6,
+      s7, s8, s9, s10, s11, s12, s13, s14] -> ()>
+    - affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8)[s0, s1, s2, s3, s4, s5, s6,
+      s7, s8, s9, s10, s11, s12, s13, s14] -> ()>
+    - affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8)[s0, s1, s2, s3, s4, s5, s6,
+      s7, s8, s9, s10, s11, s12, s13, s14] -> (d0, d4, d1, d2, d3)>
+  iterator_types:
+  - parallel
+  - parallel
+  - parallel
+  - parallel
+  - parallel
+  - reduction
+  - reduction
+  - reduction
+  - reduction
+  assignments:
+  - !ScalarAssign
+    arg: O
+    value: !ScalarExpression
+      scalar_fn:
+        kind: binary
+        fn_name: add
+        operands:
+        - !ScalarExpression
+          scalar_arg: O
+        - !ScalarExpression
+          scalar_fn:
+            kind: binary
+            fn_name: mul
+            operands:
+            - !ScalarExpression
+              scalar_fn:
+                kind: binary
+                fn_name: sub
+                operands:
+                - !ScalarExpression
+                  scalar_fn:
+                    kind: type
+                    fn_name: cast_signed
+                    type_var: U
+                    operands:
+                    - !ScalarExpression
+                      scalar_arg: I
+                - !ScalarExpression
+                  scalar_fn:
+                    kind: type
+                    fn_name: cast_signed
+                    type_var: U
+                    operands:
+                    - !ScalarExpression
+                      scalar_arg: IZp
+            - !ScalarExpression
+              scalar_fn:
+                kind: binary
+                fn_name: sub
+                operands:
+                - !ScalarExpression
+                  scalar_fn:
+                    kind: type
+                    fn_name: cast_signed
+                    type_var: U
+                    operands:
+                    - !ScalarExpression
+                      scalar_arg: K
+                - !ScalarExpression
+                  scalar_fn:
+                    kind: type
+                    fn_name: cast_signed
+                    type_var: U
+                    operands:
+                    - !ScalarExpression
+                      scalar_arg: KZp
+--- !LinalgOpConfig
 metadata: !LinalgOpMetadata
   name: depthwise_conv_1d_nwc_wc
   cpp_class_name: DepthwiseConv1DNwcWcOp
diff --git a/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py b/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py
index b45fecd0ee1457..4c7efc8d808767 100644
--- a/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py
+++ b/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py
@@ -1126,6 +1126,46 @@ def conv_3d_ncdhw_fcdhw(
         ],
     ) * TypeFn.cast_signed(U, K[D.f, D.c, D.kd, D.kh, D.kw])
 
+@linalg_structured_op
+def conv_3d_ncdhw_fcdhw_q(
+    I=TensorDef(
+        T1,
+        S.N,
+        S.C,
+        S.OD * S.SD + S.KD * S.DD,
+        S.OH * S.SH + S.KH * S.DH,
+        S.OW * S.SW + S.KW * S.DW,
+    ),
+    K=TensorDef(T2, S.F, S.C, S.KD, S.KH, S.KW),
+    IZp=ScalarDef(I32),
+    KZp=ScalarDef(I32),
+    O=TensorDef(U, S.N, S.F, S.OD, S.OH, S.OW, output=True),
+    strides=IndexAttrDef(S.SD, S.SH, S.SW, default=[1, 1, 1]),
+    dilations=IndexAttrDef(S.DD, S.DH, S.DW, default=[1, 1, 1]),
+):
+    """Performs 3-D convolution with zero point offsets.
+
+    Numeric casting is performed on the operands to the inner multiply, promoting
+    them to the same data type as the accumulator/output. This includes the zero
+    point offsets common to quantized operations.
+    """
+    implements(ConvolutionOpInterface)
+    domain(D.n, D.od, D.oh, D.ow, D.f, D.kd, D.kh, D.kw, D.c)
+    O[D.n, D.f, D.od, D.oh, D.ow] += (
+            TypeFn.cast_signed(
+            U,
+            I[
+                D.n,
+                D.c,
+                D.od * S.SD + D.kd * S.DD,
+                D.oh * S.SH + D.kh * S.DH,
+                D.ow * S.SW + D.kw * S.DW,
+            ],
+        ) - TypeFn.cast_signed(U, IZp)
+    ) * (
+        TypeFn.cast_signed(U, K[D.f, D.c, D.kd, D.kh, D.kw])
+        - TypeFn.cast_signed(U, KZp)
+    )
 
 @linalg_structured_op
 def depthwise_conv_1d_nwc_wc(
diff --git a/mlir/test/Dialect/Linalg/roundtrip.mlir b/mlir/test/Dialect/Linalg/roundtrip.mlir
index 1b8969bd115595..6e5adf007f58d7 100644
--- a/mlir/test/Dialect/Linalg/roundtrip.mlir
+++ b/mlir/test/Dialect/Linalg/roundtrip.mlir
@@ -694,3 +694,18 @@ func.func @conv2d_channel_first_q_promote(%img: tensor<100x3x224x224xi8>, %filt:
 // CHECK-LABEL: func @conv2d_channel_first_q_promote(
 // CHECK:   %[[arg0:[a-zA-z0-9]*]]: tensor<100x3x224x224xi8>, %[[arg1:[a-zA-z0-9]*]]: tensor<64x3x5x5xi8>, %[[arg2:[a-zA-z0-9]*]]: i8, %[[arg3:[a-zA-z0-9]*]]: i8)
 // CHECK:         linalg.conv_2d_nchw_fchw_q {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%[[arg0]], %[[arg1]], %[[arg2]], %[[arg3]] : tensor<100x3x224x224xi8>, tensor<64x3x5x5xi8>, i8, i8) outs(%{{.*}} : tensor<100x64x220x220xi32>) -> tensor<100x64x220x220xi32>
+
+// -----
+
+func.func @conv3d_channel_first_q(%img: tensor<1x27x49x48x47xi8>, %filt: tensor<28x27x3x4x5xi8>, %a: i32, %b: i32) -> tensor<1x28x47x45x43xi32> {
+  %init = arith.constant dense<0> : tensor<1x28x47x45x43xi32>
+  %1 = linalg.conv_3d_ncdhw_fcdhw_q  {dilations = dense<1> : tensor<3xi64>,
+      strides = dense<1> : tensor<3xi64>}
+    ins(%img, %filt, %a, %b : tensor<1x27x49x48x47xi8>, tensor<28x27x3x4x5xi8>, i32, i32)
+    outs(%init : tensor<1x28x47x45x43xi32>) -> tensor<1x28x47x45x43xi32>
+  return %1 : tensor<1x28x47x45x43xi32>
+}
+
+// CHECK-LABEL: func @conv3d_channel_first_q(
+// CHECK:   %[[arg0:[a-zA-z0-9]*]]: tensor<1x27x49x48x47xi8>, %[[arg1:[a-zA-z0-9]*]]: tensor<28x27x3x4x5xi8>, %[[arg2:[a-zA-z0-9]*]]: i32, %[[arg3:[a-zA-z0-9]*]]: i32)
+// CHECK:         linalg.conv_3d_ncdhw_fcdhw_q {dilations = dense<1> : tensor<3xi64>, strides = dense<1> : tensor<3xi64>} ins(%[[arg0]], %[[arg1]], %[[arg2]], %[[arg3]] : tensor<1x27x49x48x47xi8>, tensor<28x27x3x4x5xi8>, i32, i32) outs(%{{.*}} : tensor<1x28x47x45x43xi32>) -> tensor<1x28x47x45x43xi32>


github-actions bot commented Oct 28, 2024

✅ With the latest revision this PR passed the Python code formatter.

@MaheshRavishankar
Contributor

I don't know where these uses are coming from, but I'd like to just say that the addition of new convolution ops is kind of getting out of hand. We have a lot and they don't cover all use cases, but it seems like it might be time to move convolutions out of the ODS-generated named ops.

@javedabsar something you are interested in?

@ubfx
Member Author

ubfx commented Oct 29, 2024

I don't know where these uses are coming from

I can't speak for the other ops, but the "channel first" ones I've been adding are there to support lowering Torch IR to linalg in torch-mlir without additional transpositions.
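For illustration, here is a rough sketch (not taken from this PR) of the kind of IR a frontend would otherwise have to emit to stay channel-first with only a channel-last quantized op available, reusing the shapes from the roundtrip test in the diff above and assuming the existing `linalg.conv_3d_ndhwc_dhwcf_q` plus explicit `linalg.transpose` ops:

```mlir
// Hypothetical lowering via the channel-last op: transpose NCDHW -> NDHWC,
// convolve, then transpose the result back to NCDHW.
func.func @conv3d_channel_first_via_transposes(%img: tensor<1x27x49x48x47xi8>,
    %filt: tensor<28x27x3x4x5xi8>, %izp: i32, %kzp: i32) -> tensor<1x28x47x45x43xi32> {
  %e0 = tensor.empty() : tensor<1x49x48x47x27xi8>
  %img_t = linalg.transpose ins(%img : tensor<1x27x49x48x47xi8>)
             outs(%e0 : tensor<1x49x48x47x27xi8>) permutation = [0, 2, 3, 4, 1]
  %e1 = tensor.empty() : tensor<3x4x5x27x28xi8>
  %filt_t = linalg.transpose ins(%filt : tensor<28x27x3x4x5xi8>)
              outs(%e1 : tensor<3x4x5x27x28xi8>) permutation = [2, 3, 4, 1, 0]
  %acc = arith.constant dense<0> : tensor<1x47x45x43x28xi32>
  %conv = linalg.conv_3d_ndhwc_dhwcf_q {dilations = dense<1> : tensor<3xi64>,
                                        strides = dense<1> : tensor<3xi64>}
    ins(%img_t, %filt_t, %izp, %kzp
        : tensor<1x49x48x47x27xi8>, tensor<3x4x5x27x28xi8>, i32, i32)
    outs(%acc : tensor<1x47x45x43x28xi32>) -> tensor<1x47x45x43x28xi32>
  %e2 = tensor.empty() : tensor<1x28x47x45x43xi32>
  %res = linalg.transpose ins(%conv : tensor<1x47x45x43x28xi32>)
           outs(%e2 : tensor<1x28x47x45x43xi32>) permutation = [0, 4, 1, 2, 3]
  return %res : tensor<1x28x47x45x43xi32>
}
```

The new named op collapses this into a single `linalg.conv_3d_ncdhw_fcdhw_q`, as in the roundtrip test.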

@rengolin
Member

I don't know where these uses are coming from, but I'd like to just say that the addition of new convolution ops is kind of getting out of hand. We have a lot and they don't cover all use cases, but it seems like it might be time to move convolutions out of the ODS-generated named ops.

I second that. While we're not directly interested in convolution ops (so we won't have time to move them to ODS ourselves), I strongly suggest those interested do so instead of adding yet another op to the large soup of slightly different semantics.

There has to be a way to unify them all using ODS + C++ code in a way that OpDSL just couldn't. This is the reason we moved matmul to ODS, and we're much better for it.

For the record, the medium-term plan is to move all relevant Linalg operations to ODS and remove OpDSL for good, so the code merged in this PR will not live long. If you care about these operations, please come up with a reasonable (parametrized) implementation in ODS and move the rest there, too.

@MaheshRavishankar
Contributor

I don't know where these uses are coming from, but I'd like to just say that the addition of new convolution ops is kind of getting out of hand. We have a lot and they don't cover all use cases, but it seems like it might be time to move convolutions out of the ODS-generated named ops.

I second that. While we're not directly interested in convolution ops (so we won't have time to move them to ODS ourselves), I strongly suggest those interested do so instead of adding yet another op to the large soup of slightly different semantics.

There has to be a way to unify them all using ODS + C++ code in a way that OpDSL just couldn't. This is the reason we moved matmul to ODS, and we're much better for it.

For the record, the medium-term plan is to move all relevant Linalg operations to ODS and remove OpDSL for good, so the code merged in this PR will not live long. If you care about these operations, please come up with a reasonable (parametrized) implementation in ODS and move the rest there, too.

Yeah. That said, I really cannot in good faith put this on people trying to connect it to Torch; this is really just broken in Linalg. We need to fix this "properly", and that will take time, but in the meantime I think we need to provide some way forward. I'm not sure how to balance that without adding to the problem. @ubfx is there a way around adding a new operator?

@banach-space
Contributor

I can't speak for the other ops, but the "channel first" ones I've been adding are there to support lowering Torch IR to linalg in torch-mlir without additional transpositions.

Avoiding unnecessary transposition is good, but will this convolution get vectorized? If not, does this make any difference? Without vectorization, things will be slow with and without transposing.

Just wanted to draw people's attention to the fact that adding a new named op, especially a convolution, is just one step towards a good lowering 😅 and that there are some non-trivial gaps in how convolutions are vectorized.
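To make that concrete, here is a minimal, untested transform-dialect probe (the sequence name and matching are just placeholders) that one could run through `mlir-opt -transform-interpreter` on the roundtrip test to see whether the new op vectorizes at all:

```mlir
// Hypothetical probe: match the new quantized conv and try to vectorize it.
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%root: !transform.any_op {transform.readonly}) {
    %convs = transform.structured.match ops{["linalg.conv_3d_ncdhw_fcdhw_q"]} in %root
               : (!transform.any_op) -> !transform.any_op
    // This reports a failure if the op hits one of the vectorization gaps
    // mentioned above (e.g. if it first needs to be generalized or decomposed).
    transform.structured.vectorize %convs : !transform.any_op
    transform.yield
  }
}
```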

Contributor

@MaheshRavishankar MaheshRavishankar left a comment


Just waiting for a few more eyes on this before we land this.

@ubfx
Member Author

ubfx commented Oct 30, 2024

@ubfx is there a way around adding a new operator?

We can of course live with the transposition arrangement for a while; the 3d convolution in particular is probably less important. But we also have the 2d depthwise convolution, which currently has no quantized channel-first named op available and which we would ideally like to have a named representation of.

Just to be clear: I would also prefer having a single quantized_conv3d op (or similar) which captures the whole set of dimension orderings. It's just that the linalg convolution Ops are probably among the Ops with the most downstream uses, so such an op would likely need much more careful and elaborate design than I could provide. Of course, if such an op or concept existed, I'd be happy to port our uses in torch-mlir accordingly.

Avoiding unnecessary transposition is good, but will this convolution get vectorized? If not, does this make any difference? Without vectorization, things will be slow with and without transposing.

I'm not sure if or how well vectorization is performed in the CPU backend. Personally, I use Linalg graphs lowered from Torch IR as a starting point for compilation to a custom dataflow accelerator. Graph partitioning becomes increasingly difficult when having to consider additional transposition ops at the inputs and outputs of the conv ops. In quantized graphs, DQ/Q ops might also sneak in between the convolution and the transpositions, which makes pattern matching even more annoying and less reliable. And if an unfused transposition actually makes it into the final partitioned graph, that's a huge cost on my hardware.
I'm not trying to hijack linalg for my specific use case, but it does feel like the abstract goal of "avoiding unnecessary transpositions" is a general good, even though some backends and lowering paths might only benefit minimally.

@rengolin
Member

It's just that the linalg convolution Ops are probably among the Ops with the most downstream uses, so such an op would likely need much more careful and elaborate design than I could provide.

This doesn't scale. If no downstream user is "able to provide" a reasonable implementation upstream, and everyone just keeps dumping variations of the same op to suit their needs, the dialect will become bloated and unusable.

@rengolin
Member

but it does feel like the abstract goal of "avoiding unnecessary transpositions" is a general good, even though some backends and lowering paths might only benefit minimally.

This is a false correlation. Yes, avoiding unnecessary transposition is a good thing, but the way you're achieving it is by bloating the namespace beyond usefulness. This is the same problem we have with matmul_transpose_a, matmul_transpose_b, etc. It does not actually solve the problem, just moves it somewhere else.

See #104783 for a path to a proper solution.

@banach-space
Contributor

I share Renato's concerns and agree with his perspective.

As I understand it, we're aligned on the need to reduce the number of named operations in Linalg, and there's already ongoing work in this direction. This PR, while aiming to support a specific downstream target, seems to move us away from that goal and, therefore, feels counterproductive.

Supporting downstream targets is certainly valuable, and I recognize the benefit of this approach for your project:

I use Linalg graphs lowered from Torch IR as a starting point for compilation to a custom dataflow accelerator.

However, from a pragmatic and long-term perspective, we should carefully consider the potential consequences. The risk is that downstream targets might get discontinued, or future downstream projects may have different needs. This could lead to even more named Ops, which would be challenging to maintain and reconcile.

I appreciate the work you've put into this PR, and I don’t want to discourage contributions, but I believe it would be beneficial to explore alternative approaches that align with our broader goals.

@MaheshRavishankar
Contributor

I think we need to make a decision here.

  1. We freeze any additions to Linalg named ops.
  2. We allow named ops of this form to come in, and then build a "generalized convolution op" in Linalg that accounts for all convolution named ops.

I am not really comfortable with (1) because we don't have a plan to fix it for downstream users. Without a solid plan and someone signed up to work on it, I would rather go with (2). I know I blocked this PR to kick-start a discussion, but that isn't really fair to the contributor here (it is on us as de facto maintainers to provide a better solution).
If we are on board with (2), we can let this one go in and add it to the list of "ops we need to deprecate".

@stellaraccident
Contributor

stellaraccident commented Oct 30, 2024

Couple of comments:

  1. I think the most widely used ops were probably the matmuls, and we just got those reconverged (still some cleanup, but the main thrust is resolved).
  2. Convolution naturally has more variants and practical decisions need to be made. I assume the answer is not one op. But neither is it ten.
  3. It is a historic anomaly that channels-last got priority when the most dominant framework operates channels-first. We can't block some amount of progress on fixing that bug, regardless of the POR. It would be good to treat these as high-cost exceptions and only do it with a strongly expressed need, though, since it adds to the work needed to fix the bigger issue.
  4. We should publish a plan of record on where convolution needs to get to, informed by the matmul work. It's unfair to just tell people no on such an important case without so much as a written plan on what to do about it.

So yes, I vote for Mahesh's plan 2. And we should put in writing a strong position on where we think all of this needs to go.

@stellaraccident
Contributor

We joined this PR with the other conv PR and moved discussion to the forum: https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863

@javedabsar1 javedabsar1 self-requested a review October 31, 2024 18:01
@ubfx ubfx closed this Nov 26, 2024