-
Notifications
You must be signed in to change notification settings - Fork 14.3k
PowerPC/VSX: Select FMINNUM_IEEE and FMAXNUM_IEEE #112195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PowerPC with VSX has vector instructions: XVMAXSP/XVMINSP/XVMAXDP/XVMINDP which follow the semantics of minNUM/maxNUM of IEEE754-2008; and scaler instructions XSMINDP/XSMAXDP which also follow semantics of minNUM/maxNUM of IEEE754-2008. Let's use them to define FMAXNUM_IEEE and FMINNUM_IEEE. Currently, some Pat<(f64 (fminnum_ieee (fcanonicalize .. are defined. They are not correct. Let's remove them. In the future patch, we will define fcanonicalize for PowerPC/VSX, then `fminimunnum/fmaximumnum` will be usable.
@llvm/pr-subscribers-backend-powerpc Author: YunQiang Su (wzssyqa) ChangesPowerPC with VSX has vector instructions: and scaler instructions Let's use them to define FMAXNUM_IEEE and FMINNUM_IEEE. Currently, some Full diff: https://github.com/llvm/llvm-project/pull/112195.diff 4 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 911d92f0c4846b..ef33f5ac527aa4 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -775,6 +775,10 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
setOperationAction(ISD::FMAXNUM_IEEE, MVT::f32, Legal);
setOperationAction(ISD::FMINNUM_IEEE, MVT::f64, Legal);
setOperationAction(ISD::FMINNUM_IEEE, MVT::f32, Legal);
+ setOperationAction(ISD::FMAXNUM, MVT::f64, Legal);
+ setOperationAction(ISD::FMAXNUM, MVT::f32, Legal);
+ setOperationAction(ISD::FMINNUM, MVT::f64, Legal);
+ setOperationAction(ISD::FMINNUM, MVT::f32, Legal);
}
if (Subtarget.hasAltivec()) {
@@ -809,6 +813,8 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
if (Subtarget.hasVSX()) {
setOperationAction(ISD::FMAXNUM, VT, Legal);
setOperationAction(ISD::FMINNUM, VT, Legal);
+ setOperationAction(ISD::FMAXNUM_IEEE, VT, Legal);
+ setOperationAction(ISD::FMINNUM_IEEE, VT, Legal);
}
// Vector instructions introduced in P8
diff --git a/llvm/lib/Target/PowerPC/PPCInstrVSX.td b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
index dd07892794d599..0ac9194c7a5312 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
+++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
@@ -2722,6 +2722,15 @@ def : Pat<(v2f64 (any_fmaxnum v2f64:$src1, v2f64:$src2)),
def : Pat<(v2f64 (any_fminnum v2f64:$src1, v2f64:$src2)),
(v2f64 (XVMINDP $src1, $src2))>;
+def : Pat<(v4f32 (fmaxnum_ieee v4f32:$src1, v4f32:$src2)),
+ (v4f32 (XVMAXSP $src1, $src2))>;
+def : Pat<(v4f32 (fminnum_ieee v4f32:$src1, v4f32:$src2)),
+ (v4f32 (XVMINSP $src1, $src2))>;
+def : Pat<(v2f64 (fmaxnum_ieee v2f64:$src1, v2f64:$src2)),
+ (v2f64 (XVMAXDP $src1, $src2))>;
+def : Pat<(v2f64 (fminnum_ieee v2f64:$src1, v2f64:$src2)),
+ (v2f64 (XVMINDP $src1, $src2))>;
+
// f32 abs
def : Pat<(f32 (fabs f32:$S)),
(f32 (COPY_TO_REGCLASS (XSABSDP
@@ -2735,39 +2744,23 @@ def : Pat<(f32 (fneg (fabs f32:$S))),
// f32 Min.
def : Pat<(f32 (fminnum_ieee f32:$A, f32:$B)),
(f32 FpMinMax.F32Min)>;
-def : Pat<(f32 (fminnum_ieee (fcanonicalize f32:$A), f32:$B)),
- (f32 FpMinMax.F32Min)>;
-def : Pat<(f32 (fminnum_ieee f32:$A, (fcanonicalize f32:$B))),
- (f32 FpMinMax.F32Min)>;
-def : Pat<(f32 (fminnum_ieee (fcanonicalize f32:$A), (fcanonicalize f32:$B))),
+def : Pat<(f32 (fminnum f32:$A, f32:$B)),
(f32 FpMinMax.F32Min)>;
// F32 Max.
def : Pat<(f32 (fmaxnum_ieee f32:$A, f32:$B)),
(f32 FpMinMax.F32Max)>;
-def : Pat<(f32 (fmaxnum_ieee (fcanonicalize f32:$A), f32:$B)),
- (f32 FpMinMax.F32Max)>;
-def : Pat<(f32 (fmaxnum_ieee f32:$A, (fcanonicalize f32:$B))),
- (f32 FpMinMax.F32Max)>;
-def : Pat<(f32 (fmaxnum_ieee (fcanonicalize f32:$A), (fcanonicalize f32:$B))),
+def : Pat<(f32 (fmaxnum f32:$A, f32:$B)),
(f32 FpMinMax.F32Max)>;
// f64 Min.
def : Pat<(f64 (fminnum_ieee f64:$A, f64:$B)),
(f64 (XSMINDP $A, $B))>;
-def : Pat<(f64 (fminnum_ieee (fcanonicalize f64:$A), f64:$B)),
- (f64 (XSMINDP $A, $B))>;
-def : Pat<(f64 (fminnum_ieee f64:$A, (fcanonicalize f64:$B))),
- (f64 (XSMINDP $A, $B))>;
-def : Pat<(f64 (fminnum_ieee (fcanonicalize f64:$A), (fcanonicalize f64:$B))),
+def : Pat<(f64 (fminnum f64:$A, f64:$B)),
(f64 (XSMINDP $A, $B))>;
// f64 Max.
def : Pat<(f64 (fmaxnum_ieee f64:$A, f64:$B)),
(f64 (XSMAXDP $A, $B))>;
-def : Pat<(f64 (fmaxnum_ieee (fcanonicalize f64:$A), f64:$B)),
- (f64 (XSMAXDP $A, $B))>;
-def : Pat<(f64 (fmaxnum_ieee f64:$A, (fcanonicalize f64:$B))),
- (f64 (XSMAXDP $A, $B))>;
-def : Pat<(f64 (fmaxnum_ieee (fcanonicalize f64:$A), (fcanonicalize f64:$B))),
+def : Pat<(f64 (fmaxnum f64:$A, f64:$B)),
(f64 (XSMAXDP $A, $B))>;
def : Pat<(int_ppc_vsx_stxvd2x_be v2f64:$rS, ForceXForm:$dst),
diff --git a/llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll b/llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll
index a99c25a4e44799..39cf136e10d77a 100644
--- a/llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll
+++ b/llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll
@@ -301,22 +301,13 @@ define <4 x float> @v4f32_minimum(<4 x float> %a, <4 x float> %b) {
; VSX-NEXT: xvcmpeqsp 1, 35, 35
; VSX-NEXT: xvcmpeqsp 2, 34, 34
; VSX-NEXT: addis 3, 2, .LCPI4_0@toc@ha
-; VSX-NEXT: xxleqv 36, 36, 36
-; VSX-NEXT: xvminsp 0, 34, 35
-; VSX-NEXT: vslw 4, 4, 4
; VSX-NEXT: addi 3, 3, .LCPI4_0@toc@l
; VSX-NEXT: xxlnor 1, 1, 1
; VSX-NEXT: xxlnor 2, 2, 2
-; VSX-NEXT: vcmpequw 5, 2, 4
+; VSX-NEXT: xvminsp 0, 34, 35
; VSX-NEXT: xxlor 1, 2, 1
; VSX-NEXT: lxvd2x 2, 0, 3
-; VSX-NEXT: xxsel 0, 0, 2, 1
-; VSX-NEXT: xxlxor 2, 2, 2
-; VSX-NEXT: xvcmpeqsp 2, 0, 2
-; VSX-NEXT: xxsel 1, 0, 34, 37
-; VSX-NEXT: vcmpequw 2, 3, 4
-; VSX-NEXT: xxsel 1, 1, 35, 34
-; VSX-NEXT: xxsel 34, 0, 1, 2
+; VSX-NEXT: xxsel 34, 0, 2, 1
; VSX-NEXT: blr
;
; AIX-LABEL: v4f32_minimum:
@@ -324,21 +315,12 @@ define <4 x float> @v4f32_minimum(<4 x float> %a, <4 x float> %b) {
; AIX-NEXT: xvcmpeqsp 1, 35, 35
; AIX-NEXT: xvcmpeqsp 2, 34, 34
; AIX-NEXT: ld 3, L..C4(2) # %const.0
-; AIX-NEXT: xxleqv 36, 36, 36
; AIX-NEXT: xvminsp 0, 34, 35
-; AIX-NEXT: vslw 4, 4, 4
; AIX-NEXT: xxlnor 1, 1, 1
; AIX-NEXT: xxlnor 2, 2, 2
-; AIX-NEXT: vcmpequw 5, 2, 4
; AIX-NEXT: xxlor 1, 2, 1
; AIX-NEXT: lxvw4x 2, 0, 3
-; AIX-NEXT: xxsel 0, 0, 2, 1
-; AIX-NEXT: xxlxor 2, 2, 2
-; AIX-NEXT: xvcmpeqsp 2, 0, 2
-; AIX-NEXT: xxsel 1, 0, 34, 37
-; AIX-NEXT: vcmpequw 2, 3, 4
-; AIX-NEXT: xxsel 1, 1, 35, 34
-; AIX-NEXT: xxsel 34, 0, 1, 2
+; AIX-NEXT: xxsel 34, 0, 2, 1
; AIX-NEXT: blr
entry:
%m = call <4 x float> @llvm.minimum.v4f32(<4 x float> %a, <4 x float> %b)
@@ -377,16 +359,9 @@ define <4 x float> @v4f32_maximum(<4 x float> %a, <4 x float> %b) {
; VSX-NEXT: xxlnor 1, 1, 1
; VSX-NEXT: xxlnor 2, 2, 2
; VSX-NEXT: xvmaxsp 0, 34, 35
-; VSX-NEXT: xxlxor 36, 36, 36
-; VSX-NEXT: vcmpequw 5, 2, 4
; VSX-NEXT: xxlor 1, 2, 1
; VSX-NEXT: lxvd2x 2, 0, 3
-; VSX-NEXT: xxsel 0, 0, 2, 1
-; VSX-NEXT: xvcmpeqsp 2, 0, 36
-; VSX-NEXT: xxsel 1, 0, 34, 37
-; VSX-NEXT: vcmpequw 2, 3, 4
-; VSX-NEXT: xxsel 1, 1, 35, 34
-; VSX-NEXT: xxsel 34, 0, 1, 2
+; VSX-NEXT: xxsel 34, 0, 2, 1
; VSX-NEXT: blr
;
; AIX-LABEL: v4f32_maximum:
@@ -395,18 +370,11 @@ define <4 x float> @v4f32_maximum(<4 x float> %a, <4 x float> %b) {
; AIX-NEXT: xvcmpeqsp 2, 34, 34
; AIX-NEXT: ld 3, L..C5(2) # %const.0
; AIX-NEXT: xvmaxsp 0, 34, 35
-; AIX-NEXT: xxlxor 36, 36, 36
; AIX-NEXT: xxlnor 1, 1, 1
; AIX-NEXT: xxlnor 2, 2, 2
-; AIX-NEXT: vcmpequw 5, 2, 4
; AIX-NEXT: xxlor 1, 2, 1
; AIX-NEXT: lxvw4x 2, 0, 3
-; AIX-NEXT: xxsel 0, 0, 2, 1
-; AIX-NEXT: xvcmpeqsp 2, 0, 36
-; AIX-NEXT: xxsel 1, 0, 34, 37
-; AIX-NEXT: vcmpequw 2, 3, 4
-; AIX-NEXT: xxsel 1, 1, 35, 34
-; AIX-NEXT: xxsel 34, 0, 1, 2
+; AIX-NEXT: xxsel 34, 0, 2, 1
; AIX-NEXT: blr
entry:
%m = call <4 x float> @llvm.maximum.v4f32(<4 x float> %a, <4 x float> %b)
@@ -493,47 +461,28 @@ define <2 x double> @v2f64_minimum(<2 x double> %a, <2 x double> %b) {
; VSX-LABEL: v2f64_minimum:
; VSX: # %bb.0: # %entry
; VSX-NEXT: addis 3, 2, .LCPI6_0@toc@ha
-; VSX-NEXT: xvcmpeqdp 36, 35, 35
-; VSX-NEXT: xvcmpeqdp 37, 34, 34
-; VSX-NEXT: addi 3, 3, .LCPI6_0@toc@l
-; VSX-NEXT: xxlnor 36, 36, 36
-; VSX-NEXT: xxlnor 37, 37, 37
; VSX-NEXT: xvmindp 0, 34, 35
+; VSX-NEXT: xvcmpeqdp 35, 35, 35
+; VSX-NEXT: addi 3, 3, .LCPI6_0@toc@l
+; VSX-NEXT: xvcmpeqdp 34, 34, 34
+; VSX-NEXT: xxlnor 35, 35, 35
+; VSX-NEXT: xxlnor 34, 34, 34
; VSX-NEXT: lxvd2x 2, 0, 3
-; VSX-NEXT: addis 3, 2, .LCPI6_1@toc@ha
-; VSX-NEXT: xxlor 1, 37, 36
-; VSX-NEXT: addi 3, 3, .LCPI6_1@toc@l
-; VSX-NEXT: lxvd2x 36, 0, 3
-; VSX-NEXT: vcmpequd 5, 2, 4
-; VSX-NEXT: xxsel 0, 0, 2, 1
-; VSX-NEXT: xxlxor 2, 2, 2
-; VSX-NEXT: xxsel 1, 0, 34, 37
-; VSX-NEXT: vcmpequd 2, 3, 4
-; VSX-NEXT: xxsel 1, 1, 35, 34
-; VSX-NEXT: xvcmpeqdp 34, 0, 2
-; VSX-NEXT: xxsel 34, 0, 1, 34
+; VSX-NEXT: xxlor 1, 34, 35
+; VSX-NEXT: xxsel 34, 0, 2, 1
; VSX-NEXT: blr
;
; AIX-LABEL: v2f64_minimum:
; AIX: # %bb.0: # %entry
; AIX-NEXT: ld 3, L..C6(2) # %const.0
-; AIX-NEXT: xvcmpeqdp 36, 35, 35
-; AIX-NEXT: xvcmpeqdp 37, 34, 34
-; AIX-NEXT: lxvd2x 2, 0, 3
-; AIX-NEXT: ld 3, L..C7(2) # %const.1
-; AIX-NEXT: xxlnor 36, 36, 36
-; AIX-NEXT: xxlnor 37, 37, 37
; AIX-NEXT: xvmindp 0, 34, 35
-; AIX-NEXT: xxlor 1, 37, 36
-; AIX-NEXT: lxvd2x 36, 0, 3
-; AIX-NEXT: vcmpequd 5, 2, 4
-; AIX-NEXT: xxsel 0, 0, 2, 1
-; AIX-NEXT: xxlxor 2, 2, 2
-; AIX-NEXT: xxsel 1, 0, 34, 37
-; AIX-NEXT: vcmpequd 2, 3, 4
-; AIX-NEXT: xxsel 1, 1, 35, 34
-; AIX-NEXT: xvcmpeqdp 34, 0, 2
-; AIX-NEXT: xxsel 34, 0, 1, 34
+; AIX-NEXT: xvcmpeqdp 35, 35, 35
+; AIX-NEXT: lxvd2x 2, 0, 3
+; AIX-NEXT: xvcmpeqdp 34, 34, 34
+; AIX-NEXT: xxlnor 35, 35, 35
+; AIX-NEXT: xxlnor 34, 34, 34
+; AIX-NEXT: xxlor 1, 34, 35
+; AIX-NEXT: xxsel 34, 0, 2, 1
; AIX-NEXT: blr
entry:
%m = call <2 x double> @llvm.minimum.v2f64(<2 x double> %a, <2 x double> %b)
@@ -618,42 +567,28 @@ define <2 x double> @v2f64_maximum(<2 x double> %a, <2 x double> %b) {
; VSX-LABEL: v2f64_maximum:
; VSX: # %bb.0: # %entry
; VSX-NEXT: addis 3, 2, .LCPI7_0@toc@ha
-; VSX-NEXT: xvcmpeqdp 36, 35, 35
-; VSX-NEXT: xvcmpeqdp 37, 34, 34
-; VSX-NEXT: addi 3, 3, .LCPI7_0@toc@l
-; VSX-NEXT: xxlnor 36, 36, 36
-; VSX-NEXT: xxlnor 37, 37, 37
; VSX-NEXT: xvmaxdp 0, 34, 35
+; VSX-NEXT: xvcmpeqdp 35, 35, 35
+; VSX-NEXT: addi 3, 3, .LCPI7_0@toc@l
+; VSX-NEXT: xvcmpeqdp 34, 34, 34
+; VSX-NEXT: xxlnor 35, 35, 35
+; VSX-NEXT: xxlnor 34, 34, 34
; VSX-NEXT: lxvd2x 2, 0, 3
-; VSX-NEXT: xxlor 1, 37, 36
-; VSX-NEXT: xxlxor 36, 36, 36
-; VSX-NEXT: vcmpequd 5, 2, 4
-; VSX-NEXT: xxsel 0, 0, 2, 1
-; VSX-NEXT: xxsel 1, 0, 34, 37
-; VSX-NEXT: vcmpequd 2, 3, 4
-; VSX-NEXT: xxsel 1, 1, 35, 34
-; VSX-NEXT: xvcmpeqdp 34, 0, 36
-; VSX-NEXT: xxsel 34, 0, 1, 34
+; VSX-NEXT: xxlor 1, 34, 35
+; VSX-NEXT: xxsel 34, 0, 2, 1
; VSX-NEXT: blr
;
; AIX-LABEL: v2f64_maximum:
; AIX: # %bb.0: # %entry
-; AIX-NEXT: ld 3, L..C8(2) # %const.0
-; AIX-NEXT: xvcmpeqdp 36, 35, 35
-; AIX-NEXT: xvcmpeqdp 37, 34, 34
-; AIX-NEXT: lxvd2x 2, 0, 3
-; AIX-NEXT: xxlnor 36, 36, 36
-; AIX-NEXT: xxlnor 37, 37, 37
+; AIX-NEXT: ld 3, L..C7(2) # %const.0
; AIX-NEXT: xvmaxdp 0, 34, 35
-; AIX-NEXT: xxlor 1, 37, 36
-; AIX-NEXT: xxlxor 36, 36, 36
-; AIX-NEXT: vcmpequd 5, 2, 4
-; AIX-NEXT: xxsel 0, 0, 2, 1
-; AIX-NEXT: xxsel 1, 0, 34, 37
-; AIX-NEXT: vcmpequd 2, 3, 4
-; AIX-NEXT: xxsel 1, 1, 35, 34
-; AIX-NEXT: xvcmpeqdp 34, 0, 36
-; AIX-NEXT: xxsel 34, 0, 1, 34
+; AIX-NEXT: xvcmpeqdp 35, 35, 35
+; AIX-NEXT: lxvd2x 2, 0, 3
+; AIX-NEXT: xvcmpeqdp 34, 34, 34
+; AIX-NEXT: xxlnor 35, 35, 35
+; AIX-NEXT: xxlnor 34, 34, 34
+; AIX-NEXT: xxlor 1, 34, 35
+; AIX-NEXT: xxsel 34, 0, 2, 1
; AIX-NEXT: blr
entry:
%m = call <2 x double> @llvm.maximum.v2f64(<2 x double> %a, <2 x double> %b)
diff --git a/llvm/test/CodeGen/PowerPC/scalar-min-max.ll b/llvm/test/CodeGen/PowerPC/scalar-min-max.ll
index 216d498e85411c..f6ea0d9cc23289 100644
--- a/llvm/test/CodeGen/PowerPC/scalar-min-max.ll
+++ b/llvm/test/CodeGen/PowerPC/scalar-min-max.ll
@@ -117,13 +117,12 @@ define dso_local float @testfmax_fast(float %a, float %b) local_unnamed_addr {
;
; NO-FAST-P9-LABEL: testfmax_fast:
; NO-FAST-P9: # %bb.0: # %entry
-; NO-FAST-P9-NEXT: xsmaxcdp f1, f1, f2
+; NO-FAST-P9-NEXT: xsmaxdp f1, f1, f2
; NO-FAST-P9-NEXT: blr
;
; NO-FAST-P8-LABEL: testfmax_fast:
; NO-FAST-P8: # %bb.0: # %entry
-; NO-FAST-P8-NEXT: xssubsp f0, f2, f1
-; NO-FAST-P8-NEXT: fsel f1, f0, f2, f1
+; NO-FAST-P8-NEXT: xsmaxdp f1, f1, f2
; NO-FAST-P8-NEXT: blr
entry:
%cmp = fcmp nnan ninf ogt float %a, %b
@@ -138,13 +137,12 @@ define dso_local double @testdmax_fast(double %a, double %b) local_unnamed_addr
;
; NO-FAST-P9-LABEL: testdmax_fast:
; NO-FAST-P9: # %bb.0: # %entry
-; NO-FAST-P9-NEXT: xsmaxcdp f1, f1, f2
+; NO-FAST-P9-NEXT: xsmaxdp f1, f1, f2
; NO-FAST-P9-NEXT: blr
;
; NO-FAST-P8-LABEL: testdmax_fast:
; NO-FAST-P8: # %bb.0: # %entry
-; NO-FAST-P8-NEXT: xssubdp f0, f2, f1
-; NO-FAST-P8-NEXT: fsel f1, f0, f2, f1
+; NO-FAST-P8-NEXT: xsmaxdp f1, f1, f2
; NO-FAST-P8-NEXT: blr
entry:
%cmp = fcmp nnan ninf ogt double %a, %b
@@ -159,13 +157,12 @@ define dso_local float @testfmin_fast(float %a, float %b) local_unnamed_addr {
;
; NO-FAST-P9-LABEL: testfmin_fast:
; NO-FAST-P9: # %bb.0: # %entry
-; NO-FAST-P9-NEXT: xsmincdp f1, f1, f2
+; NO-FAST-P9-NEXT: xsmindp f1, f1, f2
; NO-FAST-P9-NEXT: blr
;
; NO-FAST-P8-LABEL: testfmin_fast:
; NO-FAST-P8: # %bb.0: # %entry
-; NO-FAST-P8-NEXT: xssubsp f0, f1, f2
-; NO-FAST-P8-NEXT: fsel f1, f0, f2, f1
+; NO-FAST-P8-NEXT: xsmindp f1, f1, f2
; NO-FAST-P8-NEXT: blr
entry:
%cmp = fcmp nnan ninf olt float %a, %b
@@ -180,13 +177,12 @@ define dso_local double @testdmin_fast(double %a, double %b) local_unnamed_addr
;
; NO-FAST-P9-LABEL: testdmin_fast:
; NO-FAST-P9: # %bb.0: # %entry
-; NO-FAST-P9-NEXT: xsmincdp f1, f1, f2
+; NO-FAST-P9-NEXT: xsmindp f1, f1, f2
; NO-FAST-P9-NEXT: blr
;
; NO-FAST-P8-LABEL: testdmin_fast:
; NO-FAST-P8: # %bb.0: # %entry
-; NO-FAST-P8-NEXT: xssubdp f0, f1, f2
-; NO-FAST-P8-NEXT: fsel f1, f0, f2, f1
+; NO-FAST-P8-NEXT: xsmindp f1, f1, f2
; NO-FAST-P8-NEXT: blr
entry:
%cmp = fcmp nnan ninf olt double %a, %b
|
@lei137 @stefanp-ibm ping. |
def : Pat<(f32 (fminnum_ieee (fcanonicalize f32:$A), f32:$B)), | ||
(f32 FpMinMax.F32Min)>; | ||
def : Pat<(f32 (fminnum_ieee f32:$A, (fcanonicalize f32:$B))), | ||
(f32 FpMinMax.F32Min)>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't understand these patterns folding out a canonicalize on each individual operand. If FpMinMax.F32Min has the IEEE snan behavior, both need to be guaranteed quiet. Can these be removed as a precommit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just remove them will fail some current test cases.
setOperationAction(ISD::FMAXNUM, MVT::f64, Legal); | ||
setOperationAction(ISD::FMAXNUM, MVT::f32, Legal); | ||
setOperationAction(ISD::FMINNUM, MVT::f64, Legal); | ||
setOperationAction(ISD::FMINNUM, MVT::f32, Legal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is beyond what the title suggests. Do all subtargets have the ignore signaling behavior, and only vsx has the ieee behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is beyond what the title suggests.
It's a bootstrap problem:
If we fix FMAXNUM_IEEE
only, some of test cases will fail.
The problem is that currently FMINNUM
is expanded to
FCANONICALIZE + FCANONICALIZE + FMINNUM_IEEE
while FCANONICALIZE is not defined by here: I will submit it in a future patch.
Do all subtargets have the ignore signaling behavior, and only vsx has the ieee behavior?
It seems that PowerPC doesn't have min/max operation in its scale part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we fix expandFMINNUMFMAXNUM
first, some PowerPC test cases will fails with complains that we cannot legalize FMINNUM_IEEE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my plan to break this bootstrap problem is:
- mark
Legal
to ISD::FMINNUM for PowerPC/VSX.
add FMINNUM_IEEE to PowerPC/VSX. - add
FCANONICALIZE
support to PowerPC/VSX. - Fix expandFMINNUMFMAXNUM to drop
FCANONICALIZE
ifFMINNUM_IEEE
is available. - Add test cases of
fmininumnum
to PowerPC/VSX. - Drop
setOperationAction(ISD::FMINNUM, MVT::f32, Legal);
from PowerPC/VSX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern with the break down here is that it will leave things in an invalid state. Shouldn't step 1-4 be done together for the compiler behaviour to be valid? At the minimun, step 4, test casess should be added with the corresponding code change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's ask @arsenm. I was asked to split these changes when I was working on other architectures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arsenm ping
PowerPC with VSX has vector instructions:
XVMAXSP/XVMINSP/XVMAXDP/XVMINDP
which follow the semantics of minNUM/maxNUM of IEEE754-2008;
and scaler instructions
XSMINDP/XSMAXDP
which also follow semantics of minNUM/maxNUM of IEEE754-2008.
Let's use them to define FMAXNUM_IEEE and FMINNUM_IEEE.
Currently, some
Pat<(f64 (fminnum_ieee (fcanonicalize ..
are defined. They are not correct. Let's remove them.
In the future patch, we will define fcanonicalize for PowerPC/VSX,
then
fminimunnum/fmaximumnum
will be usable.