[RISCV] Enable sub(max, min) lowering for ABDS and ABDU #86592
Conversation
We have the ISD nodes for representing signed and unsigned absolute difference. For RISCV, we have vector min/max in the base vector extension, so we can expand to the sub(max,min) lowering.

We could almost use the default expansion, but since fixed-length min/max are custom (not legal), the default expansion doesn't cover the fixed-vector cases. The expansion here is just a copy of the generic code, specialized to allow the custom min/max nodes to be created so they can in turn be legalized to the _vl variants.

Existing DAG combines handle recognizing absolute-difference idioms and converting them into the respective ISD::ABDS and ISD::ABDU nodes.

This change does have the net effect of potentially pushing a free-floating zero/sign extend after the expansion, and we don't do a great job of folding that into later expressions. However, since narrowing can in general reduce the required work (by reducing LMUL), this seems like the right tradeoff.
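For illustration (not part of the patch), the kind of IR idiom the existing DAG combines recognize as a signed absolute difference follows the pattern used by the tests in abd.ll: widen, subtract, take the absolute value, then truncate back to the narrow type. A minimal sketch, with a hypothetical function name and the standard llvm.abs intrinsic:

define <vscale x 16 x i8> @sabd_idiom(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
  ; sign-extend both operands to the wider element type
  %a.sext = sext <vscale x 16 x i8> %a to <vscale x 16 x i16>
  %b.sext = sext <vscale x 16 x i8> %b to <vscale x 16 x i16>
  ; wide subtract, absolute value, then narrow back to i8
  %sub = sub <vscale x 16 x i16> %a.sext, %b.sext
  %abs = call <vscale x 16 x i16> @llvm.abs.nxv16i16(<vscale x 16 x i16> %sub, i1 true)
  %trunc = trunc <vscale x 16 x i16> %abs to <vscale x 16 x i8>
  ret <vscale x 16 x i8> %trunc
}
declare <vscale x 16 x i16> @llvm.abs.nxv16i16(<vscale x 16 x i16>, i1)

With this patch, such an idiom is combined into ISD::ABDS and lowered to a vmin.vv/vmax.vv/vsub.vv sequence at the narrow element width instead of the widening vwsub.vv sequence, as the updated sabd_b test below shows.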
@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes: (same as the PR description above)

Patch is 49.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/86592.diff

4 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index c3f8a924a1da70..e6814c5f71a09b 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -819,6 +819,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction({ISD::SMIN, ISD::SMAX, ISD::UMIN, ISD::UMAX}, VT,
Legal);
+ setOperationAction({ISD::ABDS, ISD::ABDU}, VT, Custom);
+
// Custom-lower extensions and truncations from/to mask types.
setOperationAction({ISD::ANY_EXTEND, ISD::SIGN_EXTEND, ISD::ZERO_EXTEND},
VT, Custom);
@@ -1203,6 +1205,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction(
{ISD::SMIN, ISD::SMAX, ISD::UMIN, ISD::UMAX, ISD::ABS}, VT, Custom);
+ setOperationAction({ISD::ABDS, ISD::ABDU}, VT, Custom);
+
// vXi64 MULHS/MULHU requires the V extension instead of Zve64*.
if (VT.getVectorElementType() != MVT::i64 || Subtarget.hasStdExtV())
setOperationAction({ISD::MULHS, ISD::MULHU}, VT, Custom);
@@ -6785,6 +6789,22 @@ SDValue RISCVTargetLowering::LowerOperation(SDValue Op,
if (!Op.getValueType().isVector())
return lowerSADDSAT_SSUBSAT(Op, DAG);
return lowerToScalableOp(Op, DAG);
+ case ISD::ABDS:
+ case ISD::ABDU: {
+ SDLoc dl(Op);
+ EVT VT = Op->getValueType(0);
+ SDValue LHS = DAG.getFreeze(Op->getOperand(0));
+ SDValue RHS = DAG.getFreeze(Op->getOperand(1));
+ bool IsSigned = Op->getOpcode() == ISD::ABDS;
+
+ // abds(lhs, rhs) -> sub(smax(lhs,rhs), smin(lhs,rhs))
+ // abdu(lhs, rhs) -> sub(umax(lhs,rhs), umin(lhs,rhs))
+ unsigned MaxOpc = IsSigned ? ISD::SMAX : ISD::UMAX;
+ unsigned MinOpc = IsSigned ? ISD::SMIN : ISD::UMIN;
+ SDValue Max = DAG.getNode(MaxOpc, dl, VT, LHS, RHS);
+ SDValue Min = DAG.getNode(MinOpc, dl, VT, LHS, RHS);
+ return DAG.getNode(ISD::SUB, dl, VT, Max, Min);
+ }
case ISD::ABS:
case ISD::VP_ABS:
return lowerABS(Op, DAG);
diff --git a/llvm/test/CodeGen/RISCV/rvv/abd.ll b/llvm/test/CodeGen/RISCV/rvv/abd.ll
index 7c0dc868860238..ddbfbd0b59fa4b 100644
--- a/llvm/test/CodeGen/RISCV/rvv/abd.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/abd.ll
@@ -10,12 +10,9 @@ define <vscale x 16 x i8> @sabd_b(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
; CHECK-LABEL: sabd_b:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e8, m2, ta, ma
-; CHECK-NEXT: vwsub.vv v12, v8, v10
-; CHECK-NEXT: vsetvli zero, zero, e16, m4, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v12, 0
-; CHECK-NEXT: vmax.vv v12, v12, v8
-; CHECK-NEXT: vsetvli zero, zero, e8, m2, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v12, 0
+; CHECK-NEXT: vmin.vv v12, v8, v10
+; CHECK-NEXT: vmax.vv v8, v8, v10
+; CHECK-NEXT: vsub.vv v8, v8, v12
; CHECK-NEXT: ret
%a.sext = sext <vscale x 16 x i8> %a to <vscale x 16 x i16>
%b.sext = sext <vscale x 16 x i8> %b to <vscale x 16 x i16>
@@ -33,9 +30,9 @@ define <vscale x 16 x i8> @sabd_b_promoted_ops(<vscale x 16 x i1> %a, <vscale x
; CHECK-NEXT: vmerge.vim v12, v10, -1, v0
; CHECK-NEXT: vmv1r.v v0, v8
; CHECK-NEXT: vmerge.vim v8, v10, -1, v0
-; CHECK-NEXT: vsub.vv v8, v12, v8
-; CHECK-NEXT: vrsub.vi v10, v8, 0
-; CHECK-NEXT: vmax.vv v8, v8, v10
+; CHECK-NEXT: vmin.vv v10, v12, v8
+; CHECK-NEXT: vmax.vv v8, v12, v8
+; CHECK-NEXT: vsub.vv v8, v8, v10
; CHECK-NEXT: ret
%a.sext = sext <vscale x 16 x i1> %a to <vscale x 16 x i8>
%b.sext = sext <vscale x 16 x i1> %b to <vscale x 16 x i8>
@@ -48,12 +45,9 @@ define <vscale x 8 x i16> @sabd_h(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
; CHECK-LABEL: sabd_h:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, ma
-; CHECK-NEXT: vwsub.vv v12, v8, v10
-; CHECK-NEXT: vsetvli zero, zero, e32, m4, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v12, 0
-; CHECK-NEXT: vmax.vv v12, v12, v8
-; CHECK-NEXT: vsetvli zero, zero, e16, m2, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v12, 0
+; CHECK-NEXT: vmin.vv v12, v8, v10
+; CHECK-NEXT: vmax.vv v8, v8, v10
+; CHECK-NEXT: vsub.vv v8, v8, v12
; CHECK-NEXT: ret
%a.sext = sext <vscale x 8 x i16> %a to <vscale x 8 x i32>
%b.sext = sext <vscale x 8 x i16> %b to <vscale x 8 x i32>
@@ -67,10 +61,11 @@ define <vscale x 8 x i16> @sabd_h_promoted_ops(<vscale x 8 x i8> %a, <vscale x 8
; CHECK-LABEL: sabd_h_promoted_ops:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e8, m1, ta, ma
-; CHECK-NEXT: vwsub.vv v10, v8, v9
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v10, v8, v10
; CHECK-NEXT: vsetvli zero, zero, e16, m2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v8, v10, v8
+; CHECK-NEXT: vzext.vf2 v8, v10
; CHECK-NEXT: ret
%a.sext = sext <vscale x 8 x i8> %a to <vscale x 8 x i16>
%b.sext = sext <vscale x 8 x i8> %b to <vscale x 8 x i16>
@@ -83,12 +78,9 @@ define <vscale x 4 x i32> @sabd_s(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
; CHECK-LABEL: sabd_s:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e32, m2, ta, ma
-; CHECK-NEXT: vwsub.vv v12, v8, v10
-; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v12, 0
-; CHECK-NEXT: vmax.vv v12, v12, v8
-; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v12, 0
+; CHECK-NEXT: vmin.vv v12, v8, v10
+; CHECK-NEXT: vmax.vv v8, v8, v10
+; CHECK-NEXT: vsub.vv v8, v8, v12
; CHECK-NEXT: ret
%a.sext = sext <vscale x 4 x i32> %a to <vscale x 4 x i64>
%b.sext = sext <vscale x 4 x i32> %b to <vscale x 4 x i64>
@@ -102,10 +94,11 @@ define <vscale x 4 x i32> @sabd_s_promoted_ops(<vscale x 4 x i16> %a, <vscale x
; CHECK-LABEL: sabd_s_promoted_ops:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e16, m1, ta, ma
-; CHECK-NEXT: vwsub.vv v10, v8, v9
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v10, v8, v10
; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v8, v10, v8
+; CHECK-NEXT: vzext.vf2 v8, v10
; CHECK-NEXT: ret
%a.sext = sext <vscale x 4 x i16> %a to <vscale x 4 x i32>
%b.sext = sext <vscale x 4 x i16> %b to <vscale x 4 x i32>
@@ -128,10 +121,11 @@ define <vscale x 2 x i64> @sabd_d_promoted_ops(<vscale x 2 x i32> %a, <vscale x
; CHECK-LABEL: sabd_d_promoted_ops:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e32, m1, ta, ma
-; CHECK-NEXT: vwsub.vv v10, v8, v9
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v10, v8, v10
; CHECK-NEXT: vsetvli zero, zero, e64, m2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v8, v10, v8
+; CHECK-NEXT: vzext.vf2 v8, v10
; CHECK-NEXT: ret
%a.sext = sext <vscale x 2 x i32> %a to <vscale x 2 x i64>
%b.sext = sext <vscale x 2 x i32> %b to <vscale x 2 x i64>
@@ -148,12 +142,9 @@ define <vscale x 16 x i8> @uabd_b(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
; CHECK-LABEL: uabd_b:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e8, m2, ta, ma
-; CHECK-NEXT: vwsubu.vv v12, v8, v10
-; CHECK-NEXT: vsetvli zero, zero, e16, m4, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v12, 0
-; CHECK-NEXT: vmax.vv v12, v12, v8
-; CHECK-NEXT: vsetvli zero, zero, e8, m2, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v12, 0
+; CHECK-NEXT: vminu.vv v12, v8, v10
+; CHECK-NEXT: vmaxu.vv v8, v8, v10
+; CHECK-NEXT: vsub.vv v8, v8, v12
; CHECK-NEXT: ret
%a.zext = zext <vscale x 16 x i8> %a to <vscale x 16 x i16>
%b.zext = zext <vscale x 16 x i8> %b to <vscale x 16 x i16>
@@ -171,9 +162,9 @@ define <vscale x 16 x i8> @uabd_b_promoted_ops(<vscale x 16 x i1> %a, <vscale x
; CHECK-NEXT: vmerge.vim v12, v10, 1, v0
; CHECK-NEXT: vmv1r.v v0, v8
; CHECK-NEXT: vmerge.vim v8, v10, 1, v0
-; CHECK-NEXT: vsub.vv v8, v12, v8
-; CHECK-NEXT: vrsub.vi v10, v8, 0
-; CHECK-NEXT: vmax.vv v8, v8, v10
+; CHECK-NEXT: vminu.vv v10, v12, v8
+; CHECK-NEXT: vmaxu.vv v8, v12, v8
+; CHECK-NEXT: vsub.vv v8, v8, v10
; CHECK-NEXT: ret
%a.zext = zext <vscale x 16 x i1> %a to <vscale x 16 x i8>
%b.zext = zext <vscale x 16 x i1> %b to <vscale x 16 x i8>
@@ -186,12 +177,9 @@ define <vscale x 8 x i16> @uabd_h(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
; CHECK-LABEL: uabd_h:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, ma
-; CHECK-NEXT: vwsubu.vv v12, v8, v10
-; CHECK-NEXT: vsetvli zero, zero, e32, m4, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v12, 0
-; CHECK-NEXT: vmax.vv v12, v12, v8
-; CHECK-NEXT: vsetvli zero, zero, e16, m2, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v12, 0
+; CHECK-NEXT: vminu.vv v12, v8, v10
+; CHECK-NEXT: vmaxu.vv v8, v8, v10
+; CHECK-NEXT: vsub.vv v8, v8, v12
; CHECK-NEXT: ret
%a.zext = zext <vscale x 8 x i16> %a to <vscale x 8 x i32>
%b.zext = zext <vscale x 8 x i16> %b to <vscale x 8 x i32>
@@ -205,10 +193,11 @@ define <vscale x 8 x i16> @uabd_h_promoted_ops(<vscale x 8 x i8> %a, <vscale x 8
; CHECK-LABEL: uabd_h_promoted_ops:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e8, m1, ta, ma
-; CHECK-NEXT: vwsubu.vv v10, v8, v9
+; CHECK-NEXT: vminu.vv v10, v8, v9
+; CHECK-NEXT: vmaxu.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v10, v8, v10
; CHECK-NEXT: vsetvli zero, zero, e16, m2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v8, v10, v8
+; CHECK-NEXT: vzext.vf2 v8, v10
; CHECK-NEXT: ret
%a.zext = zext <vscale x 8 x i8> %a to <vscale x 8 x i16>
%b.zext = zext <vscale x 8 x i8> %b to <vscale x 8 x i16>
@@ -221,12 +210,9 @@ define <vscale x 4 x i32> @uabd_s(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
; CHECK-LABEL: uabd_s:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e32, m2, ta, ma
-; CHECK-NEXT: vwsubu.vv v12, v8, v10
-; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v12, 0
-; CHECK-NEXT: vmax.vv v12, v12, v8
-; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v12, 0
+; CHECK-NEXT: vminu.vv v12, v8, v10
+; CHECK-NEXT: vmaxu.vv v8, v8, v10
+; CHECK-NEXT: vsub.vv v8, v8, v12
; CHECK-NEXT: ret
%a.zext = zext <vscale x 4 x i32> %a to <vscale x 4 x i64>
%b.zext = zext <vscale x 4 x i32> %b to <vscale x 4 x i64>
@@ -240,10 +226,11 @@ define <vscale x 4 x i32> @uabd_s_promoted_ops(<vscale x 4 x i16> %a, <vscale x
; CHECK-LABEL: uabd_s_promoted_ops:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e16, m1, ta, ma
-; CHECK-NEXT: vwsubu.vv v10, v8, v9
+; CHECK-NEXT: vminu.vv v10, v8, v9
+; CHECK-NEXT: vmaxu.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v10, v8, v10
; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v8, v10, v8
+; CHECK-NEXT: vzext.vf2 v8, v10
; CHECK-NEXT: ret
%a.zext = zext <vscale x 4 x i16> %a to <vscale x 4 x i32>
%b.zext = zext <vscale x 4 x i16> %b to <vscale x 4 x i32>
@@ -266,10 +253,11 @@ define <vscale x 2 x i64> @uabd_d_promoted_ops(<vscale x 2 x i32> %a, <vscale x
; CHECK-LABEL: uabd_d_promoted_ops:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e32, m1, ta, ma
-; CHECK-NEXT: vwsubu.vv v10, v8, v9
+; CHECK-NEXT: vminu.vv v10, v8, v9
+; CHECK-NEXT: vmaxu.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v10, v8, v10
; CHECK-NEXT: vsetvli zero, zero, e64, m2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v8, v10, v8
+; CHECK-NEXT: vzext.vf2 v8, v10
; CHECK-NEXT: ret
%a.zext = zext <vscale x 2 x i32> %a to <vscale x 2 x i64>
%b.zext = zext <vscale x 2 x i32> %b to <vscale x 2 x i64>
@@ -285,12 +273,9 @@ define <vscale x 4 x i32> @uabd_non_matching_extension(<vscale x 4 x i32> %a, <v
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e32, m2, ta, ma
; CHECK-NEXT: vzext.vf4 v12, v10
-; CHECK-NEXT: vwsubu.vv v16, v8, v12
-; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v16, 0
-; CHECK-NEXT: vmax.vv v12, v16, v8
-; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v12, 0
+; CHECK-NEXT: vminu.vv v10, v8, v12
+; CHECK-NEXT: vmaxu.vv v8, v8, v12
+; CHECK-NEXT: vsub.vv v8, v8, v10
; CHECK-NEXT: ret
%a.zext = zext <vscale x 4 x i32> %a to <vscale x 4 x i64>
%b.zext = zext <vscale x 4 x i8> %b to <vscale x 4 x i64>
@@ -307,10 +292,11 @@ define <vscale x 4 x i32> @uabd_non_matching_promoted_ops(<vscale x 4 x i8> %a,
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e16, m1, ta, ma
; CHECK-NEXT: vzext.vf2 v10, v8
-; CHECK-NEXT: vwsubu.vv v12, v10, v9
+; CHECK-NEXT: vminu.vv v8, v10, v9
+; CHECK-NEXT: vmaxu.vv v9, v10, v9
+; CHECK-NEXT: vsub.vv v10, v9, v8
; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v12, 0
-; CHECK-NEXT: vmax.vv v8, v12, v8
+; CHECK-NEXT: vzext.vf2 v8, v10
; CHECK-NEXT: ret
%a.zext = zext <vscale x 4 x i8> %a to <vscale x 4 x i32>
%b.zext = zext <vscale x 4 x i16> %b to <vscale x 4 x i32>
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abd.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abd.ll
index 79c0857f90eccf..bd1209a17b5345 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abd.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abd.ll
@@ -10,12 +10,9 @@ define <8 x i8> @sabd_8b_as_16b(<8 x i8> %a, <8 x i8> %b) {
; CHECK-LABEL: sabd_8b_as_16b:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT: vwsub.vv v10, v8, v9
-; CHECK-NEXT: vsetvli zero, zero, e16, m1, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v8, v10, v8
-; CHECK-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v8, 0
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v8, v8, v10
; CHECK-NEXT: ret
%a.sext = sext <8 x i8> %a to <8 x i16>
%b.sext = sext <8 x i8> %b to <8 x i16>
@@ -29,17 +26,10 @@ define <8 x i8> @sabd_8b_as_32b(<8 x i8> %a, <8 x i8> %b) {
;
; CHECK-LABEL: sabd_8b_as_32b:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; CHECK-NEXT: vsext.vf2 v10, v8
-; CHECK-NEXT: vsext.vf2 v8, v9
-; CHECK-NEXT: vwsub.vv v12, v10, v8
-; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v12, 0
-; CHECK-NEXT: vmax.vv v8, v12, v8
-; CHECK-NEXT: vsetvli zero, zero, e16, m1, ta, ma
-; CHECK-NEXT: vnsrl.wi v10, v8, 0
-; CHECK-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v10, 0
+; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v8, v8, v10
; CHECK-NEXT: ret
%a.sext = sext <8 x i8> %a to <8 x i32>
%b.sext = sext <8 x i8> %b to <8 x i32>
@@ -54,12 +44,9 @@ define <16 x i8> @sabd_16b(<16 x i8> %a, <16 x i8> %b) {
; CHECK-LABEL: sabd_16b:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 16, e8, m1, ta, ma
-; CHECK-NEXT: vwsub.vv v10, v8, v9
-; CHECK-NEXT: vsetvli zero, zero, e16, m2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v10, v10, v8
-; CHECK-NEXT: vsetvli zero, zero, e8, m1, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v10, 0
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v8, v8, v10
; CHECK-NEXT: ret
%a.sext = sext <16 x i8> %a to <16 x i16>
%b.sext = sext <16 x i8> %b to <16 x i16>
@@ -74,12 +61,9 @@ define <4 x i16> @sabd_4h(<4 x i16> %a, <4 x i16> %b) {
; CHECK-LABEL: sabd_4h:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
-; CHECK-NEXT: vwsub.vv v10, v8, v9
-; CHECK-NEXT: vsetvli zero, zero, e32, m1, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v8, v10, v8
-; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v8, 0
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v8, v8, v10
; CHECK-NEXT: ret
%a.sext = sext <4 x i16> %a to <4 x i32>
%b.sext = sext <4 x i16> %b to <4 x i32>
@@ -94,10 +78,11 @@ define <4 x i16> @sabd_4h_promoted_ops(<4 x i8> %a, <4 x i8> %b) {
; CHECK-LABEL: sabd_4h_promoted_ops:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 4, e8, mf4, ta, ma
-; CHECK-NEXT: vwsub.vv v10, v8, v9
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v9, v8, v10
; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v8, v10, v8
+; CHECK-NEXT: vzext.vf2 v8, v9
; CHECK-NEXT: ret
%a.sext = sext <4 x i8> %a to <4 x i16>
%b.sext = sext <4 x i8> %b to <4 x i16>
@@ -111,12 +96,9 @@ define <8 x i16> @sabd_8h(<8 x i16> %a, <8 x i16> %b) {
; CHECK-LABEL: sabd_8h:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; CHECK-NEXT: vwsub.vv v10, v8, v9
-; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v10, v10, v8
-; CHECK-NEXT: vsetvli zero, zero, e16, m1, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v10, 0
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v8, v8, v10
; CHECK-NEXT: ret
%a.sext = sext <8 x i16> %a to <8 x i32>
%b.sext = sext <8 x i16> %b to <8 x i32>
@@ -131,10 +113,11 @@ define <8 x i16> @sabd_8h_promoted_ops(<8 x i8> %a, <8 x i8> %b) {
; CHECK-LABEL: sabd_8h_promoted_ops:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT: vwsub.vv v10, v8, v9
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v9, v8, v10
; CHECK-NEXT: vsetvli zero, zero, e16, m1, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v8, v10, v8
+; CHECK-NEXT: vzext.vf2 v8, v9
; CHECK-NEXT: ret
%a.sext = sext <8 x i8> %a to <8 x i16>
%b.sext = sext <8 x i8> %b to <8 x i16>
@@ -148,12 +131,9 @@ define <2 x i32> @sabd_2s(<2 x i32> %a, <2 x i32> %b) {
; CHECK-LABEL: sabd_2s:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT: vwsub.vv v10, v8, v9
-; CHECK-NEXT: vsetvli zero, zero, e64, m1, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v8, v10, v8
-; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v8, 0
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v8, v8, v10
; CHECK-NEXT: ret
%a.sext = sext <2 x i32> %a to <2 x i64>
%b.sext = sext <2 x i32> %b to <2 x i64>
@@ -168,10 +148,11 @@ define <2 x i32> @sabd_2s_promoted_ops(<2 x i16> %a, <2 x i16> %b) {
; CHECK-LABEL: sabd_2s_promoted_ops:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
-; CHECK-NEXT: vwsub.vv v10, v8, v9
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v9, v8, v10
; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v8, v10, v8
+; CHECK-NEXT: vzext.vf2 v8, v9
; CHECK-NEXT: ret
%a.sext = sext <2 x i16> %a to <2 x i32>
%b.sext = sext <2 x i16> %b to <2 x i32>
@@ -185,12 +166,9 @@ define <4 x i32> @sabd_4s(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: sabd_4s:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
-; CHECK-NEXT: vwsub.vv v10, v8, v9
-; CHECK-NEXT: vsetvli zero, zero, e64, m2, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v10, 0
-; CHECK-NEXT: vmax.vv v10, v10, v8
-; CHECK-NEXT: vsetvli zero, zero, e32, m1, ta, ma
-; CHECK-NEXT: vnsrl.wi v8, v10, 0
+; CHECK-NEXT: vmin.vv v10, v8, v9
+; CHECK-NEXT: vmax.vv v8, v8, v9
+; CHECK-NEXT: vsub.vv v8, v8, v10
; CHECK-NEXT: ret
%a.sext = sext <4 x i32> %a to <4 x i64>
%b.sext = sext <4 x i32> %b to <4 x i64>
@@ -205,10 +183,11 @@ define <4 x i32> @s...
[truncated]
LGTM

LGTM