[linalg] Add quantized version of conv_3d_ncdhw_fcdhw
#113953
Conversation
This patch adds the quantized 3d convolution operator `conv_3d_ncdhw_fcdhw_q`. This is the "channel-first" dimension ordering used by PyTorch and others.
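For reference, the computation the new op performs, written out as a formula from the OpDSL body in the diff below (all operands are cast to the accumulator type `U` first; `SD`, `SH`, `SW` are the strides and `DD`, `DH`, `DW` the dilations):

$$
O_{n,f,od,oh,ow} \mathrel{+}= \sum_{c,kd,kh,kw} \bigl(I_{n,c,\,od \cdot SD + kd \cdot DD,\; oh \cdot SH + kh \cdot DH,\; ow \cdot SW + kw \cdot DW} - IZp\bigr)\,\bigl(K_{f,c,kd,kh,kw} - KZp\bigr)
$$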
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-linalg

Author: Felix Schneider (ubfx)

Changes: This patch adds the quantized 3d convolution operator `conv_3d_ncdhw_fcdhw_q`.

Full diff: https://github.com/llvm/llvm-project/pull/113953.diff

3 Files Affected:
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml b/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml
index bf2f26de26e9ed..4e3ef937d7d48f 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yaml
@@ -4024,6 +4024,145 @@ structured_op: !LinalgStructuredOpConfig
- !ScalarExpression
scalar_arg: K
--- !LinalgOpConfig
+metadata: !LinalgOpMetadata
+ name: conv_3d_ncdhw_fcdhw_q
+ cpp_class_name: Conv3DNcdhwFcdhwQOp
+ doc: |-
+ Performs 3-D convolution with zero point offsets.
+
+ Numeric casting is performed on the operands to the inner multiply, promoting
+ them to the same data type as the accumulator/output. This includes the zero
+ point offsets common to quantized operations.
+ implements:
+ - LinalgConvolutionOpInterface
+structured_op: !LinalgStructuredOpConfig
+ args:
+ - !LinalgOperandDefConfig
+ name: I
+ kind: input_tensor
+ type_var: T1
+ shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
+ s13, s14] -> (s0, s1, s2 * s3 + s4 * s5, s6 * s7 + s8 * s9, s10 * s11 + s12
+ * s13)>
+ - !LinalgOperandDefConfig
+ name: K
+ kind: input_tensor
+ type_var: T2
+ shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
+ s13, s14] -> (s14, s1, s4, s8, s12)>
+ - !LinalgOperandDefConfig
+ name: IZp
+ kind: scalar
+ type_var: I32
+ - !LinalgOperandDefConfig
+ name: KZp
+ kind: scalar
+ type_var: I32
+ - !LinalgOperandDefConfig
+ name: O
+ kind: output_tensor
+ type_var: U
+ shape_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
+ s13, s14] -> (s0, s14, s2, s6, s10)>
+ - !LinalgOperandDefConfig
+ name: strides
+ kind: index_attr
+ index_attr_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+ s12, s13, s14] -> (s3, s7, s11)>
+ default_indices:
+ - 1
+ - 1
+ - 1
+ - !LinalgOperandDefConfig
+ name: dilations
+ kind: index_attr
+ index_attr_map: affine_map<()[s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+ s12, s13, s14] -> (s5, s9, s13)>
+ default_indices:
+ - 1
+ - 1
+ - 1
+ indexing_maps: !LinalgIndexingMapsConfig
+ static_indexing_maps:
+ - affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8)[s0, s1, s2, s3, s4, s5, s6,
+ s7, s8, s9, s10, s11, s12, s13, s14] -> (d0, d8, d1 * s3 + d5 * s5, d2 * s7
+ + d6 * s9, d3 * s11 + d7 * s13)>
+ - affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8)[s0, s1, s2, s3, s4, s5, s6,
+ s7, s8, s9, s10, s11, s12, s13, s14] -> (d4, d8, d5, d6, d7)>
+ - affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8)[s0, s1, s2, s3, s4, s5, s6,
+ s7, s8, s9, s10, s11, s12, s13, s14] -> ()>
+ - affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8)[s0, s1, s2, s3, s4, s5, s6,
+ s7, s8, s9, s10, s11, s12, s13, s14] -> ()>
+ - affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8)[s0, s1, s2, s3, s4, s5, s6,
+ s7, s8, s9, s10, s11, s12, s13, s14] -> (d0, d4, d1, d2, d3)>
+ iterator_types:
+ - parallel
+ - parallel
+ - parallel
+ - parallel
+ - parallel
+ - reduction
+ - reduction
+ - reduction
+ - reduction
+ assignments:
+ - !ScalarAssign
+ arg: O
+ value: !ScalarExpression
+ scalar_fn:
+ kind: binary
+ fn_name: add
+ operands:
+ - !ScalarExpression
+ scalar_arg: O
+ - !ScalarExpression
+ scalar_fn:
+ kind: binary
+ fn_name: mul
+ operands:
+ - !ScalarExpression
+ scalar_fn:
+ kind: binary
+ fn_name: sub
+ operands:
+ - !ScalarExpression
+ scalar_fn:
+ kind: type
+ fn_name: cast_signed
+ type_var: U
+ operands:
+ - !ScalarExpression
+ scalar_arg: I
+ - !ScalarExpression
+ scalar_fn:
+ kind: type
+ fn_name: cast_signed
+ type_var: U
+ operands:
+ - !ScalarExpression
+ scalar_arg: IZp
+ - !ScalarExpression
+ scalar_fn:
+ kind: binary
+ fn_name: sub
+ operands:
+ - !ScalarExpression
+ scalar_fn:
+ kind: type
+ fn_name: cast_signed
+ type_var: U
+ operands:
+ - !ScalarExpression
+ scalar_arg: K
+ - !ScalarExpression
+ scalar_fn:
+ kind: type
+ fn_name: cast_signed
+ type_var: U
+ operands:
+ - !ScalarExpression
+ scalar_arg: KZp
+--- !LinalgOpConfig
metadata: !LinalgOpMetadata
name: depthwise_conv_1d_nwc_wc
cpp_class_name: DepthwiseConv1DNwcWcOp
diff --git a/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py b/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py
index b45fecd0ee1457..4c7efc8d808767 100644
--- a/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py
+++ b/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py
@@ -1126,6 +1126,46 @@ def conv_3d_ncdhw_fcdhw(
],
) * TypeFn.cast_signed(U, K[D.f, D.c, D.kd, D.kh, D.kw])
+@linalg_structured_op
+def conv_3d_ncdhw_fcdhw_q(
+ I=TensorDef(
+ T1,
+ S.N,
+ S.C,
+ S.OD * S.SD + S.KD * S.DD,
+ S.OH * S.SH + S.KH * S.DH,
+ S.OW * S.SW + S.KW * S.DW,
+ ),
+ K=TensorDef(T2, S.F, S.C, S.KD, S.KH, S.KW),
+ IZp=ScalarDef(I32),
+ KZp=ScalarDef(I32),
+ O=TensorDef(U, S.N, S.F, S.OD, S.OH, S.OW, output=True),
+ strides=IndexAttrDef(S.SD, S.SH, S.SW, default=[1, 1, 1]),
+ dilations=IndexAttrDef(S.DD, S.DH, S.DW, default=[1, 1, 1]),
+):
+ """Performs 3-D convolution with zero point offsets.
+
+ Numeric casting is performed on the operands to the inner multiply, promoting
+ them to the same data type as the accumulator/output. This includes the zero
+ point offsets common to quantized operations.
+ """
+ implements(ConvolutionOpInterface)
+ domain(D.n, D.od, D.oh, D.ow, D.f, D.kd, D.kh, D.kw, D.c)
+ O[D.n, D.f, D.od, D.oh, D.ow] += (
+ TypeFn.cast_signed(
+ U,
+ I[
+ D.n,
+ D.c,
+ D.od * S.SD + D.kd * S.DD,
+ D.oh * S.SH + D.kh * S.DH,
+ D.ow * S.SW + D.kw * S.DW,
+ ],
+ ) - TypeFn.cast_signed(U, IZp)
+ ) * (
+ TypeFn.cast_signed(U, K[D.f, D.c, D.kd, D.kh, D.kw])
+ - TypeFn.cast_signed(U, KZp)
+ )
@linalg_structured_op
def depthwise_conv_1d_nwc_wc(
diff --git a/mlir/test/Dialect/Linalg/roundtrip.mlir b/mlir/test/Dialect/Linalg/roundtrip.mlir
index 1b8969bd115595..6e5adf007f58d7 100644
--- a/mlir/test/Dialect/Linalg/roundtrip.mlir
+++ b/mlir/test/Dialect/Linalg/roundtrip.mlir
@@ -694,3 +694,18 @@ func.func @conv2d_channel_first_q_promote(%img: tensor<100x3x224x224xi8>, %filt:
// CHECK-LABEL: func @conv2d_channel_first_q_promote(
// CHECK: %[[arg0:[a-zA-z0-9]*]]: tensor<100x3x224x224xi8>, %[[arg1:[a-zA-z0-9]*]]: tensor<64x3x5x5xi8>, %[[arg2:[a-zA-z0-9]*]]: i8, %[[arg3:[a-zA-z0-9]*]]: i8)
// CHECK: linalg.conv_2d_nchw_fchw_q {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%[[arg0]], %[[arg1]], %[[arg2]], %[[arg3]] : tensor<100x3x224x224xi8>, tensor<64x3x5x5xi8>, i8, i8) outs(%{{.*}} : tensor<100x64x220x220xi32>) -> tensor<100x64x220x220xi32>
+
+// -----
+
+func.func @conv3d_channel_first_q(%img: tensor<1x27x49x48x47xi8>, %filt: tensor<28x27x3x4x5xi8>, %a: i32, %b: i32) -> tensor<1x28x47x45x43xi32> {
+ %init = arith.constant dense<0> : tensor<1x28x47x45x43xi32>
+ %1 = linalg.conv_3d_ncdhw_fcdhw_q {dilations = dense<1> : tensor<3xi64>,
+ strides = dense<1> : tensor<3xi64>}
+ ins(%img, %filt, %a, %b : tensor<1x27x49x48x47xi8>, tensor<28x27x3x4x5xi8>, i32, i32)
+ outs(%init : tensor<1x28x47x45x43xi32>) -> tensor<1x28x47x45x43xi32>
+ return %1 : tensor<1x28x47x45x43xi32>
+}
+
+// CHECK-LABEL: func @conv3d_channel_first_q(
+// CHECK: %[[arg0:[a-zA-z0-9]*]]: tensor<1x27x49x48x47xi8>, %[[arg1:[a-zA-z0-9]*]]: tensor<28x27x3x4x5xi8>, %[[arg2:[a-zA-z0-9]*]]: i32, %[[arg3:[a-zA-z0-9]*]]: i32)
+// CHECK: linalg.conv_3d_ncdhw_fcdhw_q {dilations = dense<1> : tensor<3xi64>, strides = dense<1> : tensor<3xi64>} ins(%[[arg0]], %[[arg1]], %[[arg2]], %[[arg3]] : tensor<1x27x49x48x47xi8>, tensor<28x27x3x4x5xi8>, i32, i32) outs(%{{.*}} : tensor<1x28x47x45x43xi32>) -> tensor<1x28x47x45x43xi32>
✅ With the latest revision this PR passed the Python code formatter.
I don't know where these uses are coming from, but I'd like to just say that the addition of new convolution ops is getting out of hand. We have a lot, and it doesn't cover all use cases; it seems like it might be time to move convolutions out of OpDSL-generated named ops. @javedabsar, something you are interested in?
I can't speak for the other ops, but the "channel-first" ones I've been adding are there to support lowering torch IR to linalg without additional transpositions in torch-mlir.
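For comparison, here is a hypothetical sketch (not part of this patch) of what the same quantized convolution looks like without a channel-first named op, routed through the existing channel-last `linalg.conv_3d_ndhwc_dhwcf_q` and reusing the shapes and operands of the roundtrip test in the diff; the three `linalg.transpose` ops are the overhead the new op avoids:

```mlir
// Hypothetical channel-last lowering of the same quantized 3d convolution,
// assuming the existing linalg.conv_3d_ndhwc_dhwcf_q and linalg.transpose ops.
// NCDHW -> NDHWC on the image.
%e0 = tensor.empty() : tensor<1x49x48x47x27xi8>
%img_t = linalg.transpose ins(%img : tensor<1x27x49x48x47xi8>)
    outs(%e0 : tensor<1x49x48x47x27xi8>) permutation = [0, 2, 3, 4, 1]
// FCDHW -> DHWCF on the filter.
%e1 = tensor.empty() : tensor<3x4x5x27x28xi8>
%filt_t = linalg.transpose ins(%filt : tensor<28x27x3x4x5xi8>)
    outs(%e1 : tensor<3x4x5x27x28xi8>) permutation = [2, 3, 4, 1, 0]
// Channel-last quantized convolution into a zero-filled accumulator.
%acc = arith.constant dense<0> : tensor<1x47x45x43x28xi32>
%conv = linalg.conv_3d_ndhwc_dhwcf_q
    {dilations = dense<1> : tensor<3xi64>, strides = dense<1> : tensor<3xi64>}
    ins(%img_t, %filt_t, %a, %b
        : tensor<1x49x48x47x27xi8>, tensor<3x4x5x27x28xi8>, i32, i32)
    outs(%acc : tensor<1x47x45x43x28xi32>) -> tensor<1x47x45x43x28xi32>
// NDHWC -> NCDHW on the result.
%e2 = tensor.empty() : tensor<1x28x47x45x43xi32>
%res = linalg.transpose ins(%conv : tensor<1x47x45x43x28xi32>)
    outs(%e2 : tensor<1x28x47x45x43xi32>) permutation = [0, 4, 1, 2, 3]
```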
I second that. While we're not directly interested in convolution ops (so won't have time to move them to ODS ourselves), I strongly suggest those interested do so instead of adding yet another op to the large soup of slightly different semantics. There has to be a way to common them all up using ODS + C++ code in a way that OpDSL just couldn't. This is the reason we moved `matmul`.

For the record, the medium-term plan is to move all relevant Linalg operations to ODS and remove OpDSL for good; the code merged in this PR will not live long. So if you care about these operations, please come up with a reasonable (parametrized) implementation in ODS and move the rest there, too.
Yeah, that said, I really cannot in good faith put this on people trying to connect it to Torch. This is really just broken in Linalg. We need to fix this "properly", and that will take time, but in the meantime I think we need to provide some way forward. I'm not sure how to balance that without adding to the problem. @ubfx is there a way around adding a new operator?
Avoiding unnecessary transposition is good, but will this convolution get vectorized? If not, does this make any difference? Without vectorization, things will be slow with and without transposing. I just wanted to draw people's attention to the fact that adding a new named Op, especially a convolution, is just one step towards a good lowering 😅, and that there are some non-trivial gaps in how convolutions are vectorized.
Just waiting for a few more eyes on this before we land this.
We can of course live with the transposition arrangement for a while; in particular, the 3d convolution is probably less important. But we also have the 2d depthwise convolution, which currently has no quantized channel-first named Op available and which we would ideally like to have a named representation of. Just to be clear: I would also prefer having a single, parametrized convolution op.
I'm not sure if/how well vectorization is performed on the CPU backend. Personally, I use linalg graphs lowered from torch IR as a starting point for compilation to a custom dataflow accelerator. Graph partitioning becomes increasingly difficult when having to consider additional transposition ops at the inputs and outputs of the conv ops. In quantized graphs, DQ/Q ops might also sneak in between the convolution and the transpositions, which makes pattern matching even more annoying and less reliable. And if an unfused transposition actually makes it into the final partitioned graph, that's a huge cost on my hardware.
This doesn't scale. If no downstream user is "able to provide" a reasonable implementation upstream and they all just start dumping variations of the same op to suit their needs, the dialect will be bloated and unusable.
This is a false correlation. Yes, avoiding unnecessary transposition is a good thing, but the way you're achieving it is by bloating the namespace beyond usefulness. This is the same problem we have with `matmul`. See #104783 for a path to a proper solution.
I share Renato's concerns and agree with his perspective. As I understand it, we're aligned on the need to reduce the number of named operations in Linalg, and there's already ongoing work in this direction. This PR, while aiming to support a specific downstream target, seems to move us away from that goal and therefore feels counterproductive. Supporting downstream targets is certainly valuable, and I recognize the benefit of this approach for your project.
However, from a pragmatic and long-term perspective, we should carefully consider the potential consequences. The risk is that downstream targets might get discontinued, or future downstream projects may have different needs. This could lead to even more named Ops, which would be challenging to maintain and reconcile. I appreciate the work you've put into this PR, and I don’t want to discourage contributions, but I believe it would be beneficial to explore alternative approaches that align with our broader goals.
I think we need to make a decision here.
I am not really comfortable with (1) because we don't have a plan to fix it for downstream users. Without a solid plan and someone signed up to work on it, I would rather go with (2). I know I blocked this PR to kick-start a discussion, but that isn't really fair to the contributor here (it is on us as de facto maintainers to provide a better solution).
A couple of comments: so yes, I vote for Mahesh's plan (2). And we should put in writing a strong position on where we think all of this needs to go.
We joined this PR with the other conv PR and moved discussion to the forum: https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863