Skip to content

[Legalizer] Expand fmaximum and fminimum #67301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 29, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions llvm/include/llvm/CodeGen/TargetLowering.h
Original file line number Diff line number Diff line change
Expand Up @@ -5220,6 +5220,9 @@ class TargetLowering : public TargetLoweringBase {
/// Expand fminnum/fmaxnum into fminnum_ieee/fmaxnum_ieee with quieted inputs.
SDValue expandFMINNUM_FMAXNUM(SDNode *N, SelectionDAG &DAG) const;

/// Expand fminimum/fmaximum into multiple comparison with selects.
SDValue expandFMINIMUM_FMAXIMUM(SDNode *N, SelectionDAG &DAG) const;

/// Expand FP_TO_[US]INT_SAT into FP_TO_[US]INT and selects or min/max.
/// \param N Node to expand
/// \returns The expansion result
Expand Down
6 changes: 6 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3556,6 +3556,12 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
Results.push_back(Expanded);
break;
}
case ISD::FMINIMUM:
case ISD::FMAXIMUM: {
if (SDValue Expanded = TLI.expandFMINIMUM_FMAXIMUM(Node, DAG))
Results.push_back(Expanded);
break;
}
case ISD::FSIN:
case ISD::FCOS: {
EVT VT = Node->getValueType(0);
Expand Down
7 changes: 7 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1048,6 +1048,13 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
return;
}
break;
case ISD::FMINIMUM:
case ISD::FMAXIMUM:
if (SDValue Expanded = TLI.expandFMINIMUM_FMAXIMUM(Node, DAG)) {
Results.push_back(Expanded);
return;
}
break;
case ISD::SMIN:
case ISD::SMAX:
case ISD::UMIN:
Expand Down
58 changes: 58 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8366,6 +8366,64 @@ SDValue TargetLowering::expandFMINNUM_FMAXNUM(SDNode *Node,
return SDValue();
}

SDValue TargetLowering::expandFMINIMUM_FMAXIMUM(SDNode *N,
SelectionDAG &DAG) const {
SDLoc DL(N);
SDValue LHS = N->getOperand(0);
SDValue RHS = N->getOperand(1);
unsigned Opc = N->getOpcode();
EVT VT = N->getValueType(0);
EVT CCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);
bool IsMax = Opc == ISD::FMAXIMUM;

if (VT.isVector() &&
isOperationLegalOrCustomOrPromote(Opc, VT.getScalarType()))
return SDValue();

// First, implement comparison not propagating NaN. If no native fmin or fmax
// available, use plain select with setcc instead.
SDValue MinMax;
unsigned CompOpcIeee = IsMax ? ISD::FMAXNUM_IEEE : ISD::FMINNUM_IEEE;
unsigned CompOpc = IsMax ? ISD::FMAXNUM : ISD::FMINNUM;
if (isOperationLegalOrCustom(CompOpcIeee, VT)) {
MinMax = DAG.getNode(CompOpcIeee, DL, VT, LHS, RHS);
} else if (isOperationLegalOrCustom(CompOpc, VT)) {
MinMax = DAG.getNode(CompOpc, DL, VT, LHS, RHS);
} else {
// NaN (if exists) will be propagated later, so orderness doesn't matter.
SDValue Compare =
DAG.getSetCC(DL, CCVT, LHS, RHS, IsMax ? ISD::SETGT : ISD::SETLT);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SETGT/SETLT leave it up to the target whether it returns true or false for NaN. Is that intentional? If so it's probably worth a comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes here it assumes no NaN exists (if either operand is NaN, it will be propagated in following code)

MinMax = DAG.getSelect(DL, VT, Compare, LHS, RHS);
}

// Propagate any NaN of both operands
if (!N->getFlags().hasNoNaNs() &&
(!DAG.isKnownNeverNaN(RHS) || !DAG.isKnownNeverNaN(LHS))) {
ConstantFP *FPNaN = ConstantFP::get(
*DAG.getContext(), APFloat::getNaN(DAG.EVTToAPFloatSemantics(VT)));
MinMax = DAG.getSelect(DL, VT, DAG.getSetCC(DL, CCVT, LHS, RHS, ISD::SETUO),
DAG.getConstantFP(*FPNaN, DL, VT), MinMax);
}

// fminimum/fmaximum requires -0.0 less than +0.0
if (!N->getFlags().hasNoSignedZeros() && !DAG.isKnownNeverZeroFloat(RHS) &&
!DAG.isKnownNeverZeroFloat(LHS)) {
SDValue IsZero = DAG.getSetCC(DL, CCVT, MinMax,
DAG.getConstantFP(0.0, DL, VT), ISD::SETEQ);
SDValue TestZero =
DAG.getTargetConstant(IsMax ? fcPosZero : fcNegZero, DL, MVT::i32);
SDValue LCmp = DAG.getSelect(
DL, VT, DAG.getNode(ISD::IS_FPCLASS, DL, CCVT, LHS, TestZero), LHS,
MinMax);
SDValue RCmp = DAG.getSelect(
DL, VT, DAG.getNode(ISD::IS_FPCLASS, DL, CCVT, RHS, TestZero), RHS,
LCmp);
MinMax = DAG.getSelect(DL, VT, IsZero, RCmp, MinMax);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random thought - can this be done better with FCOPYSIGN?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need isfpclass to know which is negative/positive zero and whether both are zeros, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should try to avoid using is_fpclass here. Additionally, I think we have under-defined the internally used IEEE nodes. As currently defined, minnum_ieee/maxnum_ieee have unspecified signed 0 order. However for AMDGPU at least, the actual hardware instructions have always appropriately ordered 0s. We could either refine the _IEEE node definitions to be IEEE -2019 and require ordered 0 behavior which doesn't require this fixup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PowerPC implements fmaxnum_ieee into xsmaxdp which repsects zero signs. I did not see AMDGPU backend implements it.

While LoongArch (using fmax.d) ISA just says following the specification of maxNum(x,y) operation in the IEEE 754-2008 standard. I think it's unspecified. (maybe @llvm/pr-subscribers-loongarch )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #85195 to just fix the definitions here. If LoongArch doesn't behave correctly, we can lower it to preserve the sign there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#85195 is landed, so we should just assume the _IEEE flavors preserve the sign

}

return MinMax;
}

/// Returns a true value if if this FPClassTest can be performed with an ordered
/// fcmp to 0, and a false value if it's an unordered fcmp to 0. Returns
/// std::nullopt if it cannot be performed as a compare with 0.
Expand Down
14 changes: 5 additions & 9 deletions llvm/lib/Target/ARM/ARMISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1555,15 +1555,11 @@ ARMTargetLowering::ARMTargetLowering(const TargetMachine &TM,

if (Subtarget->hasNEON()) {
// vmin and vmax aren't available in a scalar form, so we can use
// a NEON instruction with an undef lane instead. This has a performance
// penalty on some cores, so we don't do this unless we have been
// asked to by the core tuning model.
if (Subtarget->useNEONForSinglePrecisionFP()) {
setOperationAction(ISD::FMINIMUM, MVT::f32, Legal);
setOperationAction(ISD::FMAXIMUM, MVT::f32, Legal);
setOperationAction(ISD::FMINIMUM, MVT::f16, Legal);
setOperationAction(ISD::FMAXIMUM, MVT::f16, Legal);
}
// a NEON instruction with an undef lane instead.
setOperationAction(ISD::FMINIMUM, MVT::f32, Legal);
setOperationAction(ISD::FMAXIMUM, MVT::f32, Legal);
setOperationAction(ISD::FMINIMUM, MVT::f16, Legal);
setOperationAction(ISD::FMAXIMUM, MVT::f16, Legal);
setOperationAction(ISD::FMINIMUM, MVT::v2f32, Legal);
setOperationAction(ISD::FMAXIMUM, MVT::v2f32, Legal);
setOperationAction(ISD::FMINIMUM, MVT::v4f32, Legal);
Expand Down
11 changes: 7 additions & 4 deletions llvm/lib/Target/RISCV/RISCVISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -524,8 +524,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
if (Subtarget.is64Bit())
setOperationAction(ISD::FPOWI, MVT::i32, Custom);

if (!Subtarget.hasStdExtZfa())
setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f16, Custom);
setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f16,
Subtarget.hasStdExtZfa() ? Legal : Custom);
}

if (Subtarget.hasStdExtFOrZfinx()) {
Expand All @@ -548,10 +548,12 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::FP_TO_FP16, MVT::f32, Custom);
setOperationAction(ISD::FP16_TO_FP, MVT::f32, Custom);

if (Subtarget.hasStdExtZfa())
if (Subtarget.hasStdExtZfa()) {
setOperationAction(ISD::FNEARBYINT, MVT::f32, Legal);
else
setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f32, Legal);
} else {
setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f32, Custom);
}
}

if (Subtarget.hasStdExtFOrZfinx() && Subtarget.is64Bit())
Expand All @@ -566,6 +568,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
if (Subtarget.hasStdExtZfa()) {
setOperationAction(FPRndMode, MVT::f64, Legal);
setOperationAction(ISD::FNEARBYINT, MVT::f64, Legal);
setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f64, Legal);
} else {
if (Subtarget.is64Bit())
setOperationAction(FPRndMode, MVT::f64, Custom);
Expand Down
28 changes: 10 additions & 18 deletions llvm/test/CodeGen/ARM/minnum-maxnum-intrinsics.ll
Original file line number Diff line number Diff line change
Expand Up @@ -46,12 +46,10 @@ define float @fminnum32_intrinsic(float %x, float %y) {
define float @fminnum32_nsz_intrinsic(float %x, float %y) {
; ARMV7-LABEL: fminnum32_nsz_intrinsic:
; ARMV7: @ %bb.0:
; ARMV7-NEXT: vmov s0, r0
; ARMV7-NEXT: vmov s2, r1
; ARMV7-NEXT: vcmp.f32 s0, s2
; ARMV7-NEXT: vmrs APSR_nzcv, fpscr
; ARMV7-NEXT: vmovlt.f32 s2, s0
; ARMV7-NEXT: vmov r0, s2
; ARMV7-NEXT: vmov s0, r1
; ARMV7-NEXT: vmov s2, r0
; ARMV7-NEXT: vmin.f32 d0, d1, d0
; ARMV7-NEXT: vmov r0, s0
; ARMV7-NEXT: bx lr
;
; ARMV8-LABEL: fminnum32_nsz_intrinsic:
Expand All @@ -78,9 +76,7 @@ define float @fminnum32_non_zero_intrinsic(float %x) {
; ARMV7: @ %bb.0:
; ARMV7-NEXT: vmov.f32 s0, #-1.000000e+00
; ARMV7-NEXT: vmov s2, r0
; ARMV7-NEXT: vcmp.f32 s2, s0
; ARMV7-NEXT: vmrs APSR_nzcv, fpscr
; ARMV7-NEXT: vmovlt.f32 s0, s2
; ARMV7-NEXT: vmin.f32 d0, d1, d0
; ARMV7-NEXT: vmov r0, s0
; ARMV7-NEXT: bx lr
;
Expand Down Expand Up @@ -136,12 +132,10 @@ define float @fmaxnum32_intrinsic(float %x, float %y) {
define float @fmaxnum32_nsz_intrinsic(float %x, float %y) {
; ARMV7-LABEL: fmaxnum32_nsz_intrinsic:
; ARMV7: @ %bb.0:
; ARMV7-NEXT: vmov s0, r0
; ARMV7-NEXT: vmov s2, r1
; ARMV7-NEXT: vcmp.f32 s0, s2
; ARMV7-NEXT: vmrs APSR_nzcv, fpscr
; ARMV7-NEXT: vmovgt.f32 s2, s0
; ARMV7-NEXT: vmov r0, s2
; ARMV7-NEXT: vmov s0, r1
; ARMV7-NEXT: vmov s2, r0
; ARMV7-NEXT: vmax.f32 d0, d1, d0
; ARMV7-NEXT: vmov r0, s0
; ARMV7-NEXT: bx lr
;
; ARMV8-LABEL: fmaxnum32_nsz_intrinsic:
Expand Down Expand Up @@ -210,9 +204,7 @@ define float @fmaxnum32_non_zero_intrinsic(float %x) {
; ARMV7: @ %bb.0:
; ARMV7-NEXT: vmov.f32 s0, #1.000000e+00
; ARMV7-NEXT: vmov s2, r0
; ARMV7-NEXT: vcmp.f32 s2, s0
; ARMV7-NEXT: vmrs APSR_nzcv, fpscr
; ARMV7-NEXT: vmovgt.f32 s0, s2
; ARMV7-NEXT: vmax.f32 d0, d1, d0
; ARMV7-NEXT: vmov r0, s0
; ARMV7-NEXT: bx lr
;
Expand Down
97 changes: 97 additions & 0 deletions llvm/test/CodeGen/PowerPC/fminimum-fmaximum-f128.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
; RUN: llc -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr9 < %s | FileCheck %s

define fp128 @f128_minimum(fp128 %a, fp128 %b) {
; CHECK-LABEL: f128_minimum:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xscmpuqp 0, 2, 3
; CHECK-NEXT: vmr 4, 2
; CHECK-NEXT: bge 0, .LBB0_8
; CHECK-NEXT: # %bb.1: # %entry
; CHECK-NEXT: bun 0, .LBB0_9
; CHECK-NEXT: .LBB0_2: # %entry
; CHECK-NEXT: xststdcqp 0, 2, 4
; CHECK-NEXT: bc 4, 2, .LBB0_10
; CHECK-NEXT: .LBB0_3: # %entry
; CHECK-NEXT: xststdcqp 0, 3, 4
; CHECK-NEXT: bc 12, 2, .LBB0_5
; CHECK-NEXT: .LBB0_4: # %entry
; CHECK-NEXT: vmr 3, 2
; CHECK-NEXT: .LBB0_5: # %entry
; CHECK-NEXT: addis 3, 2, .LCPI0_1@toc@ha
; CHECK-NEXT: addi 3, 3, .LCPI0_1@toc@l
; CHECK-NEXT: lxv 34, 0(3)
; CHECK-NEXT: xscmpuqp 0, 4, 2
; CHECK-NEXT: beq 0, .LBB0_7
; CHECK-NEXT: # %bb.6: # %entry
; CHECK-NEXT: vmr 3, 4
; CHECK-NEXT: .LBB0_7: # %entry
; CHECK-NEXT: vmr 2, 3
; CHECK-NEXT: blr
; CHECK-NEXT: .LBB0_8: # %entry
; CHECK-NEXT: vmr 4, 3
; CHECK-NEXT: bnu 0, .LBB0_2
; CHECK-NEXT: .LBB0_9:
; CHECK-NEXT: addis 3, 2, .LCPI0_0@toc@ha
; CHECK-NEXT: addi 3, 3, .LCPI0_0@toc@l
; CHECK-NEXT: lxv 36, 0(3)
; CHECK-NEXT: xststdcqp 0, 2, 4
; CHECK-NEXT: bc 12, 2, .LBB0_3
; CHECK-NEXT: .LBB0_10: # %entry
; CHECK-NEXT: vmr 2, 4
; CHECK-NEXT: xststdcqp 0, 3, 4
; CHECK-NEXT: bc 4, 2, .LBB0_4
; CHECK-NEXT: b .LBB0_5
entry:
%m = call fp128 @llvm.minimum.f128(fp128 %a, fp128 %b)
ret fp128 %m
}

define fp128 @f128_maximum(fp128 %a, fp128 %b) {
; CHECK-LABEL: f128_maximum:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xscmpuqp 0, 2, 3
; CHECK-NEXT: vmr 4, 2
; CHECK-NEXT: ble 0, .LBB1_8
; CHECK-NEXT: # %bb.1: # %entry
; CHECK-NEXT: bun 0, .LBB1_9
; CHECK-NEXT: .LBB1_2: # %entry
; CHECK-NEXT: xststdcqp 0, 2, 8
; CHECK-NEXT: bc 4, 2, .LBB1_10
; CHECK-NEXT: .LBB1_3: # %entry
; CHECK-NEXT: xststdcqp 0, 3, 8
; CHECK-NEXT: bc 12, 2, .LBB1_5
; CHECK-NEXT: .LBB1_4: # %entry
; CHECK-NEXT: vmr 3, 2
; CHECK-NEXT: .LBB1_5: # %entry
; CHECK-NEXT: addis 3, 2, .LCPI1_1@toc@ha
; CHECK-NEXT: addi 3, 3, .LCPI1_1@toc@l
; CHECK-NEXT: lxv 34, 0(3)
; CHECK-NEXT: xscmpuqp 0, 4, 2
; CHECK-NEXT: beq 0, .LBB1_7
; CHECK-NEXT: # %bb.6: # %entry
; CHECK-NEXT: vmr 3, 4
; CHECK-NEXT: .LBB1_7: # %entry
; CHECK-NEXT: vmr 2, 3
; CHECK-NEXT: blr
; CHECK-NEXT: .LBB1_8: # %entry
; CHECK-NEXT: vmr 4, 3
; CHECK-NEXT: bnu 0, .LBB1_2
; CHECK-NEXT: .LBB1_9:
; CHECK-NEXT: addis 3, 2, .LCPI1_0@toc@ha
; CHECK-NEXT: addi 3, 3, .LCPI1_0@toc@l
; CHECK-NEXT: lxv 36, 0(3)
; CHECK-NEXT: xststdcqp 0, 2, 8
; CHECK-NEXT: bc 12, 2, .LBB1_3
; CHECK-NEXT: .LBB1_10: # %entry
; CHECK-NEXT: vmr 2, 4
; CHECK-NEXT: xststdcqp 0, 3, 8
; CHECK-NEXT: bc 4, 2, .LBB1_4
; CHECK-NEXT: b .LBB1_5
entry:
%m = call fp128 @llvm.maximum.f128(fp128 %a, fp128 %b)
ret fp128 %m
}

declare fp128 @llvm.minimum.f128(fp128, fp128)
declare fp128 @llvm.maximum.f128(fp128, fp128)
Loading