Skip to content

PowerPC/VSX: Select FMINNUM_IEEE and FMAXNUM_IEEE #112195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

wzssyqa
Copy link
Contributor

@wzssyqa wzssyqa commented Oct 14, 2024

PowerPC with VSX has vector instructions:
XVMAXSP/XVMINSP/XVMAXDP/XVMINDP
which follow the semantics of minNUM/maxNUM of IEEE754-2008;

and scaler instructions
XSMINDP/XSMAXDP
which also follow semantics of minNUM/maxNUM of IEEE754-2008.

Let's use them to define FMAXNUM_IEEE and FMINNUM_IEEE.

Currently, some
Pat<(f64 (fminnum_ieee (fcanonicalize ..
are defined. They are not correct. Let's remove them.
In the future patch, we will define fcanonicalize for PowerPC/VSX,
then fminimunnum/fmaximumnum will be usable.

PowerPC with VSX has vector instructions:
   XVMAXSP/XVMINSP/XVMAXDP/XVMINDP
which follow the semantics of minNUM/maxNUM of IEEE754-2008;

and scaler instructions
   XSMINDP/XSMAXDP
which also follow semantics of minNUM/maxNUM of IEEE754-2008.

Let's use them to define FMAXNUM_IEEE and FMINNUM_IEEE.

Currently, some
   Pat<(f64 (fminnum_ieee (fcanonicalize ..
are defined. They are not correct. Let's remove them.
In the future patch, we will define fcanonicalize for PowerPC/VSX,
then `fminimunnum/fmaximumnum` will be usable.
@llvmbot
Copy link
Member

llvmbot commented Oct 14, 2024

@llvm/pr-subscribers-backend-powerpc

Author: YunQiang Su (wzssyqa)

Changes

PowerPC with VSX has vector instructions:
XVMAXSP/XVMINSP/XVMAXDP/XVMINDP
which follow the semantics of minNUM/maxNUM of IEEE754-2008;

and scaler instructions
XSMINDP/XSMAXDP
which also follow semantics of minNUM/maxNUM of IEEE754-2008.

Let's use them to define FMAXNUM_IEEE and FMINNUM_IEEE.

Currently, some
Pat<(f64 (fminnum_ieee (fcanonicalize ..
are defined. They are not correct. Let's remove them. In the future patch, we will define fcanonicalize for PowerPC/VSX, then fminimunnum/fmaximumnum will be usable.


Full diff: https://github.com/llvm/llvm-project/pull/112195.diff

4 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCISelLowering.cpp (+6)
  • (modified) llvm/lib/Target/PowerPC/PPCInstrVSX.td (+13-20)
  • (modified) llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll (+34-99)
  • (modified) llvm/test/CodeGen/PowerPC/scalar-min-max.ll (+8-12)
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 911d92f0c4846b..ef33f5ac527aa4 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -775,6 +775,10 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
     setOperationAction(ISD::FMAXNUM_IEEE, MVT::f32, Legal);
     setOperationAction(ISD::FMINNUM_IEEE, MVT::f64, Legal);
     setOperationAction(ISD::FMINNUM_IEEE, MVT::f32, Legal);
+    setOperationAction(ISD::FMAXNUM, MVT::f64, Legal);
+    setOperationAction(ISD::FMAXNUM, MVT::f32, Legal);
+    setOperationAction(ISD::FMINNUM, MVT::f64, Legal);
+    setOperationAction(ISD::FMINNUM, MVT::f32, Legal);
   }
 
   if (Subtarget.hasAltivec()) {
@@ -809,6 +813,8 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
       if (Subtarget.hasVSX()) {
         setOperationAction(ISD::FMAXNUM, VT, Legal);
         setOperationAction(ISD::FMINNUM, VT, Legal);
+        setOperationAction(ISD::FMAXNUM_IEEE, VT, Legal);
+        setOperationAction(ISD::FMINNUM_IEEE, VT, Legal);
       }
 
       // Vector instructions introduced in P8
diff --git a/llvm/lib/Target/PowerPC/PPCInstrVSX.td b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
index dd07892794d599..0ac9194c7a5312 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
+++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
@@ -2722,6 +2722,15 @@ def : Pat<(v2f64 (any_fmaxnum v2f64:$src1, v2f64:$src2)),
 def : Pat<(v2f64 (any_fminnum v2f64:$src1, v2f64:$src2)),
           (v2f64 (XVMINDP $src1, $src2))>;
 
+def : Pat<(v4f32 (fmaxnum_ieee v4f32:$src1, v4f32:$src2)),
+          (v4f32 (XVMAXSP $src1, $src2))>;
+def : Pat<(v4f32 (fminnum_ieee v4f32:$src1, v4f32:$src2)),
+          (v4f32 (XVMINSP $src1, $src2))>;
+def : Pat<(v2f64 (fmaxnum_ieee v2f64:$src1, v2f64:$src2)),
+          (v2f64 (XVMAXDP $src1, $src2))>;
+def : Pat<(v2f64 (fminnum_ieee v2f64:$src1, v2f64:$src2)),
+          (v2f64 (XVMINDP $src1, $src2))>;
+
 // f32 abs
 def : Pat<(f32 (fabs f32:$S)),
           (f32 (COPY_TO_REGCLASS (XSABSDP
@@ -2735,39 +2744,23 @@ def : Pat<(f32 (fneg (fabs f32:$S))),
 // f32 Min.
 def : Pat<(f32 (fminnum_ieee f32:$A, f32:$B)),
           (f32 FpMinMax.F32Min)>;
-def : Pat<(f32 (fminnum_ieee (fcanonicalize f32:$A), f32:$B)),
-          (f32 FpMinMax.F32Min)>;
-def : Pat<(f32 (fminnum_ieee f32:$A, (fcanonicalize f32:$B))),
-          (f32 FpMinMax.F32Min)>;
-def : Pat<(f32 (fminnum_ieee (fcanonicalize f32:$A), (fcanonicalize f32:$B))),
+def : Pat<(f32 (fminnum f32:$A, f32:$B)),
           (f32 FpMinMax.F32Min)>;
 // F32 Max.
 def : Pat<(f32 (fmaxnum_ieee f32:$A, f32:$B)),
           (f32 FpMinMax.F32Max)>;
-def : Pat<(f32 (fmaxnum_ieee (fcanonicalize f32:$A), f32:$B)),
-          (f32 FpMinMax.F32Max)>;
-def : Pat<(f32 (fmaxnum_ieee f32:$A, (fcanonicalize f32:$B))),
-          (f32 FpMinMax.F32Max)>;
-def : Pat<(f32 (fmaxnum_ieee (fcanonicalize f32:$A), (fcanonicalize f32:$B))),
+def : Pat<(f32 (fmaxnum f32:$A, f32:$B)),
           (f32 FpMinMax.F32Max)>;
 
 // f64 Min.
 def : Pat<(f64 (fminnum_ieee f64:$A, f64:$B)),
           (f64 (XSMINDP $A, $B))>;
-def : Pat<(f64 (fminnum_ieee (fcanonicalize f64:$A), f64:$B)),
-          (f64 (XSMINDP $A, $B))>;
-def : Pat<(f64 (fminnum_ieee f64:$A, (fcanonicalize f64:$B))),
-          (f64 (XSMINDP $A, $B))>;
-def : Pat<(f64 (fminnum_ieee (fcanonicalize f64:$A), (fcanonicalize f64:$B))),
+def : Pat<(f64 (fminnum f64:$A, f64:$B)),
           (f64 (XSMINDP $A, $B))>;
 // f64 Max.
 def : Pat<(f64 (fmaxnum_ieee f64:$A, f64:$B)),
           (f64 (XSMAXDP $A, $B))>;
-def : Pat<(f64 (fmaxnum_ieee (fcanonicalize f64:$A), f64:$B)),
-          (f64 (XSMAXDP $A, $B))>;
-def : Pat<(f64 (fmaxnum_ieee f64:$A, (fcanonicalize f64:$B))),
-          (f64 (XSMAXDP $A, $B))>;
-def : Pat<(f64 (fmaxnum_ieee (fcanonicalize f64:$A), (fcanonicalize f64:$B))),
+def : Pat<(f64 (fmaxnum f64:$A, f64:$B)),
           (f64 (XSMAXDP $A, $B))>;
 
 def : Pat<(int_ppc_vsx_stxvd2x_be v2f64:$rS, ForceXForm:$dst),
diff --git a/llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll b/llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll
index a99c25a4e44799..39cf136e10d77a 100644
--- a/llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll
+++ b/llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll
@@ -301,22 +301,13 @@ define <4 x float> @v4f32_minimum(<4 x float> %a, <4 x float> %b) {
 ; VSX-NEXT:    xvcmpeqsp 1, 35, 35
 ; VSX-NEXT:    xvcmpeqsp 2, 34, 34
 ; VSX-NEXT:    addis 3, 2, .LCPI4_0@toc@ha
-; VSX-NEXT:    xxleqv 36, 36, 36
-; VSX-NEXT:    xvminsp 0, 34, 35
-; VSX-NEXT:    vslw 4, 4, 4
 ; VSX-NEXT:    addi 3, 3, .LCPI4_0@toc@l
 ; VSX-NEXT:    xxlnor 1, 1, 1
 ; VSX-NEXT:    xxlnor 2, 2, 2
-; VSX-NEXT:    vcmpequw 5, 2, 4
+; VSX-NEXT:    xvminsp 0, 34, 35
 ; VSX-NEXT:    xxlor 1, 2, 1
 ; VSX-NEXT:    lxvd2x 2, 0, 3
-; VSX-NEXT:    xxsel 0, 0, 2, 1
-; VSX-NEXT:    xxlxor 2, 2, 2
-; VSX-NEXT:    xvcmpeqsp 2, 0, 2
-; VSX-NEXT:    xxsel 1, 0, 34, 37
-; VSX-NEXT:    vcmpequw 2, 3, 4
-; VSX-NEXT:    xxsel 1, 1, 35, 34
-; VSX-NEXT:    xxsel 34, 0, 1, 2
+; VSX-NEXT:    xxsel 34, 0, 2, 1
 ; VSX-NEXT:    blr
 ;
 ; AIX-LABEL: v4f32_minimum:
@@ -324,21 +315,12 @@ define <4 x float> @v4f32_minimum(<4 x float> %a, <4 x float> %b) {
 ; AIX-NEXT:    xvcmpeqsp 1, 35, 35
 ; AIX-NEXT:    xvcmpeqsp 2, 34, 34
 ; AIX-NEXT:    ld 3, L..C4(2) # %const.0
-; AIX-NEXT:    xxleqv 36, 36, 36
 ; AIX-NEXT:    xvminsp 0, 34, 35
-; AIX-NEXT:    vslw 4, 4, 4
 ; AIX-NEXT:    xxlnor 1, 1, 1
 ; AIX-NEXT:    xxlnor 2, 2, 2
-; AIX-NEXT:    vcmpequw 5, 2, 4
 ; AIX-NEXT:    xxlor 1, 2, 1
 ; AIX-NEXT:    lxvw4x 2, 0, 3
-; AIX-NEXT:    xxsel 0, 0, 2, 1
-; AIX-NEXT:    xxlxor 2, 2, 2
-; AIX-NEXT:    xvcmpeqsp 2, 0, 2
-; AIX-NEXT:    xxsel 1, 0, 34, 37
-; AIX-NEXT:    vcmpequw 2, 3, 4
-; AIX-NEXT:    xxsel 1, 1, 35, 34
-; AIX-NEXT:    xxsel 34, 0, 1, 2
+; AIX-NEXT:    xxsel 34, 0, 2, 1
 ; AIX-NEXT:    blr
 entry:
   %m = call <4 x float> @llvm.minimum.v4f32(<4 x float> %a, <4 x float> %b)
@@ -377,16 +359,9 @@ define <4 x float> @v4f32_maximum(<4 x float> %a, <4 x float> %b) {
 ; VSX-NEXT:    xxlnor 1, 1, 1
 ; VSX-NEXT:    xxlnor 2, 2, 2
 ; VSX-NEXT:    xvmaxsp 0, 34, 35
-; VSX-NEXT:    xxlxor 36, 36, 36
-; VSX-NEXT:    vcmpequw 5, 2, 4
 ; VSX-NEXT:    xxlor 1, 2, 1
 ; VSX-NEXT:    lxvd2x 2, 0, 3
-; VSX-NEXT:    xxsel 0, 0, 2, 1
-; VSX-NEXT:    xvcmpeqsp 2, 0, 36
-; VSX-NEXT:    xxsel 1, 0, 34, 37
-; VSX-NEXT:    vcmpequw 2, 3, 4
-; VSX-NEXT:    xxsel 1, 1, 35, 34
-; VSX-NEXT:    xxsel 34, 0, 1, 2
+; VSX-NEXT:    xxsel 34, 0, 2, 1
 ; VSX-NEXT:    blr
 ;
 ; AIX-LABEL: v4f32_maximum:
@@ -395,18 +370,11 @@ define <4 x float> @v4f32_maximum(<4 x float> %a, <4 x float> %b) {
 ; AIX-NEXT:    xvcmpeqsp 2, 34, 34
 ; AIX-NEXT:    ld 3, L..C5(2) # %const.0
 ; AIX-NEXT:    xvmaxsp 0, 34, 35
-; AIX-NEXT:    xxlxor 36, 36, 36
 ; AIX-NEXT:    xxlnor 1, 1, 1
 ; AIX-NEXT:    xxlnor 2, 2, 2
-; AIX-NEXT:    vcmpequw 5, 2, 4
 ; AIX-NEXT:    xxlor 1, 2, 1
 ; AIX-NEXT:    lxvw4x 2, 0, 3
-; AIX-NEXT:    xxsel 0, 0, 2, 1
-; AIX-NEXT:    xvcmpeqsp 2, 0, 36
-; AIX-NEXT:    xxsel 1, 0, 34, 37
-; AIX-NEXT:    vcmpequw 2, 3, 4
-; AIX-NEXT:    xxsel 1, 1, 35, 34
-; AIX-NEXT:    xxsel 34, 0, 1, 2
+; AIX-NEXT:    xxsel 34, 0, 2, 1
 ; AIX-NEXT:    blr
 entry:
   %m = call <4 x float> @llvm.maximum.v4f32(<4 x float> %a, <4 x float> %b)
@@ -493,47 +461,28 @@ define <2 x double> @v2f64_minimum(<2 x double> %a, <2 x double> %b) {
 ; VSX-LABEL: v2f64_minimum:
 ; VSX:       # %bb.0: # %entry
 ; VSX-NEXT:    addis 3, 2, .LCPI6_0@toc@ha
-; VSX-NEXT:    xvcmpeqdp 36, 35, 35
-; VSX-NEXT:    xvcmpeqdp 37, 34, 34
-; VSX-NEXT:    addi 3, 3, .LCPI6_0@toc@l
-; VSX-NEXT:    xxlnor 36, 36, 36
-; VSX-NEXT:    xxlnor 37, 37, 37
 ; VSX-NEXT:    xvmindp 0, 34, 35
+; VSX-NEXT:    xvcmpeqdp 35, 35, 35
+; VSX-NEXT:    addi 3, 3, .LCPI6_0@toc@l
+; VSX-NEXT:    xvcmpeqdp 34, 34, 34
+; VSX-NEXT:    xxlnor 35, 35, 35
+; VSX-NEXT:    xxlnor 34, 34, 34
 ; VSX-NEXT:    lxvd2x 2, 0, 3
-; VSX-NEXT:    addis 3, 2, .LCPI6_1@toc@ha
-; VSX-NEXT:    xxlor 1, 37, 36
-; VSX-NEXT:    addi 3, 3, .LCPI6_1@toc@l
-; VSX-NEXT:    lxvd2x 36, 0, 3
-; VSX-NEXT:    vcmpequd 5, 2, 4
-; VSX-NEXT:    xxsel 0, 0, 2, 1
-; VSX-NEXT:    xxlxor 2, 2, 2
-; VSX-NEXT:    xxsel 1, 0, 34, 37
-; VSX-NEXT:    vcmpequd 2, 3, 4
-; VSX-NEXT:    xxsel 1, 1, 35, 34
-; VSX-NEXT:    xvcmpeqdp 34, 0, 2
-; VSX-NEXT:    xxsel 34, 0, 1, 34
+; VSX-NEXT:    xxlor 1, 34, 35
+; VSX-NEXT:    xxsel 34, 0, 2, 1
 ; VSX-NEXT:    blr
 ;
 ; AIX-LABEL: v2f64_minimum:
 ; AIX:       # %bb.0: # %entry
 ; AIX-NEXT:    ld 3, L..C6(2) # %const.0
-; AIX-NEXT:    xvcmpeqdp 36, 35, 35
-; AIX-NEXT:    xvcmpeqdp 37, 34, 34
-; AIX-NEXT:    lxvd2x 2, 0, 3
-; AIX-NEXT:    ld 3, L..C7(2) # %const.1
-; AIX-NEXT:    xxlnor 36, 36, 36
-; AIX-NEXT:    xxlnor 37, 37, 37
 ; AIX-NEXT:    xvmindp 0, 34, 35
-; AIX-NEXT:    xxlor 1, 37, 36
-; AIX-NEXT:    lxvd2x 36, 0, 3
-; AIX-NEXT:    vcmpequd 5, 2, 4
-; AIX-NEXT:    xxsel 0, 0, 2, 1
-; AIX-NEXT:    xxlxor 2, 2, 2
-; AIX-NEXT:    xxsel 1, 0, 34, 37
-; AIX-NEXT:    vcmpequd 2, 3, 4
-; AIX-NEXT:    xxsel 1, 1, 35, 34
-; AIX-NEXT:    xvcmpeqdp 34, 0, 2
-; AIX-NEXT:    xxsel 34, 0, 1, 34
+; AIX-NEXT:    xvcmpeqdp 35, 35, 35
+; AIX-NEXT:    lxvd2x 2, 0, 3
+; AIX-NEXT:    xvcmpeqdp 34, 34, 34
+; AIX-NEXT:    xxlnor 35, 35, 35
+; AIX-NEXT:    xxlnor 34, 34, 34
+; AIX-NEXT:    xxlor 1, 34, 35
+; AIX-NEXT:    xxsel 34, 0, 2, 1
 ; AIX-NEXT:    blr
 entry:
   %m = call <2 x double> @llvm.minimum.v2f64(<2 x double> %a, <2 x double> %b)
@@ -618,42 +567,28 @@ define <2 x double> @v2f64_maximum(<2 x double> %a, <2 x double> %b) {
 ; VSX-LABEL: v2f64_maximum:
 ; VSX:       # %bb.0: # %entry
 ; VSX-NEXT:    addis 3, 2, .LCPI7_0@toc@ha
-; VSX-NEXT:    xvcmpeqdp 36, 35, 35
-; VSX-NEXT:    xvcmpeqdp 37, 34, 34
-; VSX-NEXT:    addi 3, 3, .LCPI7_0@toc@l
-; VSX-NEXT:    xxlnor 36, 36, 36
-; VSX-NEXT:    xxlnor 37, 37, 37
 ; VSX-NEXT:    xvmaxdp 0, 34, 35
+; VSX-NEXT:    xvcmpeqdp 35, 35, 35
+; VSX-NEXT:    addi 3, 3, .LCPI7_0@toc@l
+; VSX-NEXT:    xvcmpeqdp 34, 34, 34
+; VSX-NEXT:    xxlnor 35, 35, 35
+; VSX-NEXT:    xxlnor 34, 34, 34
 ; VSX-NEXT:    lxvd2x 2, 0, 3
-; VSX-NEXT:    xxlor 1, 37, 36
-; VSX-NEXT:    xxlxor 36, 36, 36
-; VSX-NEXT:    vcmpequd 5, 2, 4
-; VSX-NEXT:    xxsel 0, 0, 2, 1
-; VSX-NEXT:    xxsel 1, 0, 34, 37
-; VSX-NEXT:    vcmpequd 2, 3, 4
-; VSX-NEXT:    xxsel 1, 1, 35, 34
-; VSX-NEXT:    xvcmpeqdp 34, 0, 36
-; VSX-NEXT:    xxsel 34, 0, 1, 34
+; VSX-NEXT:    xxlor 1, 34, 35
+; VSX-NEXT:    xxsel 34, 0, 2, 1
 ; VSX-NEXT:    blr
 ;
 ; AIX-LABEL: v2f64_maximum:
 ; AIX:       # %bb.0: # %entry
-; AIX-NEXT:    ld 3, L..C8(2) # %const.0
-; AIX-NEXT:    xvcmpeqdp 36, 35, 35
-; AIX-NEXT:    xvcmpeqdp 37, 34, 34
-; AIX-NEXT:    lxvd2x 2, 0, 3
-; AIX-NEXT:    xxlnor 36, 36, 36
-; AIX-NEXT:    xxlnor 37, 37, 37
+; AIX-NEXT:    ld 3, L..C7(2) # %const.0
 ; AIX-NEXT:    xvmaxdp 0, 34, 35
-; AIX-NEXT:    xxlor 1, 37, 36
-; AIX-NEXT:    xxlxor 36, 36, 36
-; AIX-NEXT:    vcmpequd 5, 2, 4
-; AIX-NEXT:    xxsel 0, 0, 2, 1
-; AIX-NEXT:    xxsel 1, 0, 34, 37
-; AIX-NEXT:    vcmpequd 2, 3, 4
-; AIX-NEXT:    xxsel 1, 1, 35, 34
-; AIX-NEXT:    xvcmpeqdp 34, 0, 36
-; AIX-NEXT:    xxsel 34, 0, 1, 34
+; AIX-NEXT:    xvcmpeqdp 35, 35, 35
+; AIX-NEXT:    lxvd2x 2, 0, 3
+; AIX-NEXT:    xvcmpeqdp 34, 34, 34
+; AIX-NEXT:    xxlnor 35, 35, 35
+; AIX-NEXT:    xxlnor 34, 34, 34
+; AIX-NEXT:    xxlor 1, 34, 35
+; AIX-NEXT:    xxsel 34, 0, 2, 1
 ; AIX-NEXT:    blr
 entry:
   %m = call <2 x double> @llvm.maximum.v2f64(<2 x double> %a, <2 x double> %b)
diff --git a/llvm/test/CodeGen/PowerPC/scalar-min-max.ll b/llvm/test/CodeGen/PowerPC/scalar-min-max.ll
index 216d498e85411c..f6ea0d9cc23289 100644
--- a/llvm/test/CodeGen/PowerPC/scalar-min-max.ll
+++ b/llvm/test/CodeGen/PowerPC/scalar-min-max.ll
@@ -117,13 +117,12 @@ define dso_local float @testfmax_fast(float %a, float %b) local_unnamed_addr {
 ;
 ; NO-FAST-P9-LABEL: testfmax_fast:
 ; NO-FAST-P9:       # %bb.0: # %entry
-; NO-FAST-P9-NEXT:    xsmaxcdp f1, f1, f2
+; NO-FAST-P9-NEXT:    xsmaxdp f1, f1, f2
 ; NO-FAST-P9-NEXT:    blr
 ;
 ; NO-FAST-P8-LABEL: testfmax_fast:
 ; NO-FAST-P8:       # %bb.0: # %entry
-; NO-FAST-P8-NEXT:    xssubsp f0, f2, f1
-; NO-FAST-P8-NEXT:    fsel f1, f0, f2, f1
+; NO-FAST-P8-NEXT:    xsmaxdp f1, f1, f2
 ; NO-FAST-P8-NEXT:    blr
 entry:
   %cmp = fcmp nnan ninf ogt float %a, %b
@@ -138,13 +137,12 @@ define dso_local double @testdmax_fast(double %a, double %b) local_unnamed_addr
 ;
 ; NO-FAST-P9-LABEL: testdmax_fast:
 ; NO-FAST-P9:       # %bb.0: # %entry
-; NO-FAST-P9-NEXT:    xsmaxcdp f1, f1, f2
+; NO-FAST-P9-NEXT:    xsmaxdp f1, f1, f2
 ; NO-FAST-P9-NEXT:    blr
 ;
 ; NO-FAST-P8-LABEL: testdmax_fast:
 ; NO-FAST-P8:       # %bb.0: # %entry
-; NO-FAST-P8-NEXT:    xssubdp f0, f2, f1
-; NO-FAST-P8-NEXT:    fsel f1, f0, f2, f1
+; NO-FAST-P8-NEXT:    xsmaxdp f1, f1, f2
 ; NO-FAST-P8-NEXT:    blr
 entry:
   %cmp = fcmp nnan ninf ogt double %a, %b
@@ -159,13 +157,12 @@ define dso_local float @testfmin_fast(float %a, float %b) local_unnamed_addr {
 ;
 ; NO-FAST-P9-LABEL: testfmin_fast:
 ; NO-FAST-P9:       # %bb.0: # %entry
-; NO-FAST-P9-NEXT:    xsmincdp f1, f1, f2
+; NO-FAST-P9-NEXT:    xsmindp f1, f1, f2
 ; NO-FAST-P9-NEXT:    blr
 ;
 ; NO-FAST-P8-LABEL: testfmin_fast:
 ; NO-FAST-P8:       # %bb.0: # %entry
-; NO-FAST-P8-NEXT:    xssubsp f0, f1, f2
-; NO-FAST-P8-NEXT:    fsel f1, f0, f2, f1
+; NO-FAST-P8-NEXT:    xsmindp f1, f1, f2
 ; NO-FAST-P8-NEXT:    blr
 entry:
   %cmp = fcmp nnan ninf olt float %a, %b
@@ -180,13 +177,12 @@ define dso_local double @testdmin_fast(double %a, double %b) local_unnamed_addr
 ;
 ; NO-FAST-P9-LABEL: testdmin_fast:
 ; NO-FAST-P9:       # %bb.0: # %entry
-; NO-FAST-P9-NEXT:    xsmincdp f1, f1, f2
+; NO-FAST-P9-NEXT:    xsmindp f1, f1, f2
 ; NO-FAST-P9-NEXT:    blr
 ;
 ; NO-FAST-P8-LABEL: testdmin_fast:
 ; NO-FAST-P8:       # %bb.0: # %entry
-; NO-FAST-P8-NEXT:    xssubdp f0, f1, f2
-; NO-FAST-P8-NEXT:    fsel f1, f0, f2, f1
+; NO-FAST-P8-NEXT:    xsmindp f1, f1, f2
 ; NO-FAST-P8-NEXT:    blr
 entry:
   %cmp = fcmp nnan ninf olt double %a, %b

@wzssyqa
Copy link
Contributor Author

wzssyqa commented Oct 21, 2024

@lei137 @stefanp-ibm ping.

Comment on lines -2738 to -2741
def : Pat<(f32 (fminnum_ieee (fcanonicalize f32:$A), f32:$B)),
(f32 FpMinMax.F32Min)>;
def : Pat<(f32 (fminnum_ieee f32:$A, (fcanonicalize f32:$B))),
(f32 FpMinMax.F32Min)>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't understand these patterns folding out a canonicalize on each individual operand. If FpMinMax.F32Min has the IEEE snan behavior, both need to be guaranteed quiet. Can these be removed as a precommit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remove them will fail some current test cases.

Comment on lines +778 to +781
setOperationAction(ISD::FMAXNUM, MVT::f64, Legal);
setOperationAction(ISD::FMAXNUM, MVT::f32, Legal);
setOperationAction(ISD::FMINNUM, MVT::f64, Legal);
setOperationAction(ISD::FMINNUM, MVT::f32, Legal);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beyond what the title suggests. Do all subtargets have the ignore signaling behavior, and only vsx has the ieee behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beyond what the title suggests.

It's a bootstrap problem:
If we fix FMAXNUM_IEEE only, some of test cases will fail.
The problem is that currently FMINNUM is expanded to
FCANONICALIZE + FCANONICALIZE + FMINNUM_IEEE
while FCANONICALIZE is not defined by here: I will submit it in a future patch.

Do all subtargets have the ignore signaling behavior, and only vsx has the ieee behavior?

It seems that PowerPC doesn't have min/max operation in its scale part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we fix expandFMINNUMFMAXNUM first, some PowerPC test cases will fails with complains that we cannot legalize FMINNUM_IEEE.

Copy link
Contributor Author

@wzssyqa wzssyqa Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my plan to break this bootstrap problem is:

  1. mark Legal to ISD::FMINNUM for PowerPC/VSX.
    add FMINNUM_IEEE to PowerPC/VSX.
  2. add FCANONICALIZE support to PowerPC/VSX.
  3. Fix expandFMINNUMFMAXNUM to drop FCANONICALIZE if FMINNUM_IEEE is available.
  4. Add test cases of fmininumnum to PowerPC/VSX.
  5. Drop setOperationAction(ISD::FMINNUM, MVT::f32, Legal); from PowerPC/VSX.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern with the break down here is that it will leave things in an invalid state. Shouldn't step 1-4 be done together for the compiler behaviour to be valid? At the minimun, step 4, test casess should be added with the corresponding code change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's ask @arsenm. I was asked to split these changes when I was working on other architectures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arsenm ping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants