-
Notifications
You must be signed in to change notification settings - Fork 14.3k
AArch64: Add FMINNUM_IEEE and FMAXNUM_IEEE support #107855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-aarch64 Author: YunQiang Su (wzssyqa) ChangesFMINNM/FMAXNM instructions of AArch64 follow IEEE754-2008. We can use them to canonicalize a floating point number. And FMINNUM_IEEE/FMAXNUM_IEEE is used by something like expanding FMINIMUMNUM/FMAXIMUMNUM, so let's define them. Update combine_andor_with_cmps.ll. Full diff: https://github.com/llvm/llvm-project/pull/107855.diff 4 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index d1ddbfa300846b..cea16a257ef450 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -860,12 +860,14 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::FP_ROUND, MVT::v4bf16, Custom);
// AArch64 has implementations of a lot of rounding-like FP operations.
+ // clang-format off
for (auto Op :
{ISD::FFLOOR, ISD::FNEARBYINT, ISD::FCEIL,
ISD::FRINT, ISD::FTRUNC, ISD::FROUND,
ISD::FROUNDEVEN, ISD::FMINNUM, ISD::FMAXNUM,
ISD::FMINIMUM, ISD::FMAXIMUM, ISD::LROUND,
ISD::LLROUND, ISD::LRINT, ISD::LLRINT,
+ ISD::FMINNUM_IEEE, ISD::FMAXNUM_IEEE,
ISD::STRICT_FFLOOR, ISD::STRICT_FCEIL, ISD::STRICT_FNEARBYINT,
ISD::STRICT_FRINT, ISD::STRICT_FTRUNC, ISD::STRICT_FROUNDEVEN,
ISD::STRICT_FROUND, ISD::STRICT_FMINNUM, ISD::STRICT_FMAXNUM,
@@ -876,6 +878,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
if (Subtarget->hasFullFP16())
setOperationAction(Op, MVT::f16, Legal);
}
+ // clang-format on
// Basic strict FP operations are legal
for (auto Op : {ISD::STRICT_FADD, ISD::STRICT_FSUB, ISD::STRICT_FMUL,
@@ -1201,6 +1204,10 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
ISD::STRICT_FMINIMUM, ISD::STRICT_FMAXIMUM})
setOperationAction(Op, MVT::v1f64, Expand);
// clang-format on
+
+ for (auto Op : {ISD::FMAXNUM_IEEE, ISD::FMINNUM_IEEE})
+ setOperationAction(Op, MVT::v1f64, Legal);
+
for (auto Op :
{ISD::FP_TO_SINT, ISD::FP_TO_UINT, ISD::SINT_TO_FP, ISD::UINT_TO_FP,
ISD::FP_ROUND, ISD::FP_TO_SINT_SAT, ISD::FP_TO_UINT_SAT, ISD::MUL,
@@ -1345,11 +1352,13 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
}
// AArch64 has implementations of a lot of rounding-like FP operations.
+ // And the same for FMAXNUM_IEEE and FMINNUM_IEEE.
for (auto Op :
{ISD::FFLOOR, ISD::FNEARBYINT, ISD::FCEIL, ISD::FRINT, ISD::FTRUNC,
- ISD::FROUND, ISD::FROUNDEVEN, ISD::STRICT_FFLOOR,
- ISD::STRICT_FNEARBYINT, ISD::STRICT_FCEIL, ISD::STRICT_FRINT,
- ISD::STRICT_FTRUNC, ISD::STRICT_FROUND, ISD::STRICT_FROUNDEVEN}) {
+ ISD::FROUND, ISD::FROUNDEVEN, ISD::STRICT_FFLOOR, ISD::FMAXNUM_IEEE,
+ ISD::FMINNUM_IEEE, ISD::STRICT_FNEARBYINT, ISD::STRICT_FCEIL,
+ ISD::STRICT_FRINT, ISD::STRICT_FTRUNC, ISD::STRICT_FROUND,
+ ISD::STRICT_FROUNDEVEN}) {
for (MVT Ty : {MVT::v2f32, MVT::v4f32, MVT::v2f64})
setOperationAction(Op, Ty, Legal);
if (Subtarget->hasFullFP16())
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index ccef85bfaa8afc..4e75c4f9e94285 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -5066,6 +5066,23 @@ def : Pat<(v1f64 (fmaxnum (v1f64 FPR64:$Rn), (v1f64 FPR64:$Rm))),
def : Pat<(v1f64 (fminnum (v1f64 FPR64:$Rn), (v1f64 FPR64:$Rm))),
(FMINNMDrr FPR64:$Rn, FPR64:$Rm)>;
+def : Pat<(v1f64 (fmaxnum_ieee (v1f64 FPR64:$Rn), (v1f64 FPR64:$Rm))),
+ (FMAXNMDrr FPR64:$Rn, FPR64:$Rm)>;
+def : Pat<(v1f64 (fminnum_ieee (v1f64 FPR64:$Rn), (v1f64 FPR64:$Rm))),
+ (FMINNMDrr FPR64:$Rn, FPR64:$Rm)>;
+def : Pat<(fminnum_ieee (f64 FPR64:$a), (f64 FPR64:$b)),
+ (FMINNMDrr FPR64:$a, FPR64:$b)>;
+def : Pat<(fminnum_ieee (f32 FPR32:$a), (f32 FPR32:$b)),
+ (FMINNMSrr FPR32:$a, FPR32:$b)>;
+def : Pat<(fminnum_ieee (f16 FPR16:$a), (f16 FPR16:$b)),
+ (FMINNMHrr FPR16:$a, FPR16:$b)>;
+def : Pat<(fmaxnum_ieee (f64 FPR64:$a), (f64 FPR64:$b)),
+ (FMAXNMDrr FPR64:$a, FPR64:$b)>;
+def : Pat<(fmaxnum_ieee (f32 FPR32:$a), (f32 FPR32:$b)),
+ (FMAXNMSrr FPR32:$a, FPR32:$b)>;
+def : Pat<(fmaxnum_ieee (f16 FPR16:$a), (f16 FPR16:$b)),
+ (FMAXNMHrr FPR16:$a, FPR16:$b)>;
+
//===----------------------------------------------------------------------===//
// Floating point three operand instructions.
//===----------------------------------------------------------------------===//
@@ -5558,6 +5575,27 @@ defm FMINNM : SIMDThreeSameVectorFP<0,1,0b000,"fminnm", any_fminnum>;
defm FMINP : SIMDThreeSameVectorFP<1,1,0b110,"fminp", int_aarch64_neon_fminp>;
defm FMIN : SIMDThreeSameVectorFP<0,1,0b110,"fmin", any_fminimum>;
+def : Pat<(v2f64 (fminnum_ieee (v2f64 V128:$Rn), (v2f64 V128:$Rm))),
+ (v2f64 (FMINNMv2f64 (v2f64 V128:$Rn), (v2f64 V128:$Rm)))>;
+def : Pat<(v4f32 (fminnum_ieee (v4f32 V128:$Rn), (v4f32 V128:$Rm))),
+ (v4f32 (FMINNMv4f32 (v4f32 V128:$Rn), (v4f32 V128:$Rm)))>;
+def : Pat<(v8f16 (fminnum_ieee (v8f16 V128:$Rn), (v8f16 V128:$Rm))),
+ (v8f16 (FMINNMv8f16 (v8f16 V128:$Rn), (v8f16 V128:$Rm)))>;
+def : Pat<(v2f32 (fminnum_ieee (v2f32 V64:$Rn), (v2f32 V64:$Rm))),
+ (v2f32 (FMINNMv2f32 (v2f32 V64:$Rn), (v2f32 V64:$Rm)))>;
+def : Pat<(v4f16 (fminnum_ieee (v4f16 V64:$Rn), (v4f16 V64:$Rm))),
+ (v4f16 (FMINNMv4f16 (v4f16 V64:$Rn), (v4f16 V64:$Rm)))>;
+def : Pat<(v2f64 (fmaxnum_ieee (v2f64 V128:$Rn), (v2f64 V128:$Rm))),
+ (v2f64 (FMAXNMv2f64 (v2f64 V128:$Rn), (v2f64 V128:$Rm)))>;
+def : Pat<(v4f32 (fmaxnum_ieee (v4f32 V128:$Rn), (v4f32 V128:$Rm))),
+ (v4f32 (FMAXNMv4f32 (v4f32 V128:$Rn), (v4f32 V128:$Rm)))>;
+def : Pat<(v8f16 (fmaxnum_ieee (v8f16 V128:$Rn), (v8f16 V128:$Rm))),
+ (v8f16 (FMAXNMv8f16 (v8f16 V128:$Rn), (v8f16 V128:$Rm)))>;
+def : Pat<(v2f32 (fmaxnum_ieee (v2f32 V64:$Rn), (v2f32 V64:$Rm))),
+ (v2f32 (FMAXNMv2f32 (v2f32 V64:$Rn), (v2f32 V64:$Rm)))>;
+def : Pat<(v4f16 (fmaxnum_ieee (v4f16 V64:$Rn), (v4f16 V64:$Rm))),
+ (v4f16 (FMAXNMv4f16 (v4f16 V64:$Rn), (v4f16 V64:$Rm)))>;
+
// NOTE: The operands of the PatFrag are reordered on FMLA/FMLS because the
// instruction expects the addend first, while the fma intrinsic puts it last.
defm FMLA : SIMDThreeSameVectorFPTied<0, 0, 0b001, "fmla",
diff --git a/llvm/test/CodeGen/AArch64/combine_andor_with_cmps.ll b/llvm/test/CodeGen/AArch64/combine_andor_with_cmps.ll
index 783683cf7e8442..89cb25f3d9d756 100644
--- a/llvm/test/CodeGen/AArch64/combine_andor_with_cmps.ll
+++ b/llvm/test/CodeGen/AArch64/combine_andor_with_cmps.ll
@@ -31,9 +31,6 @@ define i1 @test2(double %arg1, double %arg2, double %arg3) #0 {
ret i1 %or1
}
-; It is illegal to apply the optimization in the following two test cases
-; because FMINNUM_IEEE and FMAXNUM_IEEE are not supported.
-
define i1 @test3(float %arg1, float %arg2, float %arg3) {
; CHECK-LABEL: test3:
; CHECK: // %bb.0:
@@ -41,8 +38,8 @@ define i1 @test3(float %arg1, float %arg2, float %arg3) {
; CHECK-NEXT: fadd s0, s0, s3
; CHECK-NEXT: fmov s3, #2.00000000
; CHECK-NEXT: fadd s1, s1, s3
-; CHECK-NEXT: fcmp s1, s2
-; CHECK-NEXT: fccmp s0, s2, #0, lt
+; CHECK-NEXT: fmaxnm s0, s0, s1
+; CHECK-NEXT: fcmp s0, s2
; CHECK-NEXT: cset w0, lt
; CHECK-NEXT: ret
%add1 = fadd nnan float %arg1, 1.0
@@ -60,8 +57,8 @@ define i1 @test4(float %arg1, float %arg2, float %arg3) {
; CHECK-NEXT: fadd s0, s0, s3
; CHECK-NEXT: fmov s3, #2.00000000
; CHECK-NEXT: fadd s1, s1, s3
-; CHECK-NEXT: fcmp s1, s2
-; CHECK-NEXT: fccmp s0, s2, #4, gt
+; CHECK-NEXT: fminnm s0, s0, s1
+; CHECK-NEXT: fcmp s0, s2
; CHECK-NEXT: cset w0, gt
; CHECK-NEXT: ret
%add1 = fadd nnan float %arg1, 1.0
diff --git a/llvm/test/CodeGen/AArch64/fp-maximumnum-minimumnum.ll b/llvm/test/CodeGen/AArch64/fp-maximumnum-minimumnum.ll
new file mode 100644
index 00000000000000..686192615ddebe
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/fp-maximumnum-minimumnum.ll
@@ -0,0 +1,145 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc --mtriple=aarch64 --mattr=+fullfp16 < %s | FileCheck %s --check-prefix=AARCH64
+
+define <2 x double> @max_v2f64(<2 x double> %a, <2 x double> %b) {
+; AARCH64-LABEL: max_v2f64:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fmaxnm v0.2d, v0.2d, v1.2d
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan <2 x double> @llvm.maximumnum.v2f64(<2 x double> %a, <2 x double> %b)
+ ret <2 x double> %c
+}
+
+define <4 x float> @max_v4f32(<4 x float> %a, <4 x float> %b) {
+; AARCH64-LABEL: max_v4f32:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fmaxnm v0.4s, v0.4s, v1.4s
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan <4 x float> @llvm.maximumnum.v2f64(<4 x float> %a, <4 x float> %b)
+ ret <4 x float> %c
+}
+
+
+define <8 x half> @max_v8f16(<8 x half> %a, <8 x half> %b) {
+; AARCH64-LABEL: max_v8f16:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fmaxnm v0.8h, v0.8h, v1.8h
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan <8 x half> @llvm.maximumnum.v4f16(<8 x half> %a, <8 x half> %b)
+ ret <8 x half> %c
+}
+
+define <1 x double> @max_v1f64(<1 x double> %a, <1 x double> %b) {
+; AARCH64-LABEL: max_v1f64:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fmaxnm d0, d0, d1
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan <1 x double> @llvm.maximumnum.v1f64(<1 x double> %a, <1 x double> %b)
+ ret <1 x double> %c
+}
+
+define double @max_f64(double %a, double %b) {
+; AARCH64-LABEL: max_f64:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fmaxnm d0, d0, d1
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan double @llvm.maximumnum.f64(double %a, double %b)
+ ret double %c
+}
+
+define float @max_f32(float %a, float %b) {
+; AARCH64-LABEL: max_f32:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fmaxnm s0, s0, s1
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan float @llvm.maximumnum.f32(float %a, float %b)
+ ret float %c
+}
+
+define half @max_f16(half %a, half %b) {
+; AARCH64-LABEL: max_f16:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fmaxnm h0, h0, h1
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan half @llvm.maximumnum.f16(half %a, half %b)
+ ret half %c
+}
+
+define <2 x double> @min_v2f64(<2 x double> %a, <2 x double> %b) {
+; AARCH64-LABEL: min_v2f64:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fminnm v0.2d, v0.2d, v1.2d
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan <2 x double> @llvm.minimumnum.v2f64(<2 x double> %a, <2 x double> %b)
+ ret <2 x double> %c
+}
+
+define <4 x float> @min_v4f32(<4 x float> %a, <4 x float> %b) {
+; AARCH64-LABEL: min_v4f32:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fminnm v0.4s, v0.4s, v1.4s
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan <4 x float> @llvm.minimumnum.v2f64(<4 x float> %a, <4 x float> %b)
+ ret <4 x float> %c
+}
+
+
+define <8 x half> @min_v8f16(<8 x half> %a, <8 x half> %b) {
+; AARCH64-LABEL: min_v8f16:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fminnm v0.8h, v0.8h, v1.8h
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan <8 x half> @llvm.minimumnum.v4f16(<8 x half> %a, <8 x half> %b)
+ ret <8 x half> %c
+}
+
+define <1 x double> @min_v1f64(<1 x double> %a, <1 x double> %b) {
+; AARCH64-LABEL: min_v1f64:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fminnm d0, d0, d1
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan <1 x double> @llvm.minimumnum.v1f64(<1 x double> %a, <1 x double> %b)
+ ret <1 x double> %c
+}
+
+define double @min_f64(double %a, double %b) {
+; AARCH64-LABEL: min_f64:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fminnm d0, d0, d1
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan double @llvm.minimumnum.f64(double %a, double %b)
+ ret double %c
+}
+
+define float @min_f32(float %a, float %b) {
+; AARCH64-LABEL: min_f32:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fminnm s0, s0, s1
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan float @llvm.minimumnum.f32(float %a, float %b)
+ ret float %c
+}
+
+define half @min_f16(half %a, half %b) {
+; AARCH64-LABEL: min_f16:
+; AARCH64: // %bb.0: // %entry
+; AARCH64-NEXT: fminnm h0, h0, h1
+; AARCH64-NEXT: ret
+entry:
+ %c = call nnan half @llvm.minimumnum.f16(half %a, half %b)
+ ret half %c
+}
+
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
9d47437
to
63a9197
Compare
63a9197
to
5f19028
Compare
@arsenm ping. |
5f19028
to
60cccde
Compare
60cccde
to
1eb6901
Compare
1eb6901
to
b7d2651
Compare
FMINNM/FMAXNM instructions of AArch64 follow IEEE754-2008. We can use them to canonicalize a floating point number. And FMINNUM_IEEE/FMAXNUM_IEEE is used by something like expanding FMINIMUMNUM/FMAXIMUMNUM, so let's define them. Update combine_andor_with_cmps.ll. Add fp-maximumnum-minimumnum.ll, with nnan testcases only. Known problem: V1F64 is not supported yet. If we set v1f64 as legal, FMINNUM/FMAXNUM will have some problem: both of them use if (isOperationLegalOrCustom(FMAXNUM_IEEE, VT)). AArch64 depends on expandFMINNUM_FMAXNUM returning SDValue() for FMAXNUM and FMINNUM. We should fix this problem, while it will be in future patch.
b7d2651
to
21fe931
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. Is the plan to change how fmax() is lowered? Otherwise this seems backwards.
(FMINNMDrr FPR64:$a, FPR64:$b)>; | ||
def : Pat<(fminnum_ieee (f32 FPR32:$a), (f32 FPR32:$b)), | ||
(FMINNMSrr FPR32:$a, FPR32:$b)>; | ||
def : Pat<(fminnum_ieee (f16 FPR16:$a), (f16 FPR16:$b)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fp16 patterns require HasFullFP16 predicates, otherwise they could select instructions that are not present.
FMINNM/FMAXNM instructions of AArch64 follow IEEE754-2008. We can use them to canonicalize a floating point number. And FMINNUM_IEEE/FMAXNUM_IEEE is used by something like expanding FMINIMUMNUM/FMAXIMUMNUM, so let's define them. Update combine_andor_with_cmps.ll. Add fp-maximumnum-minimumnum.ll, with nnan testcases only. V1F64 is not supported yet. If we set v1f64 as legal, FMINNUM/FMAXNUM will have some problem: both of them use `if (isOperationLegalOrCustom(FMAXNUM_IEEE, VT))`. AArch64 depends on `expandFMINNUM_FMAXNUM` returning `SDValue()` for FMAXNUM and FMINNUM. We should fix this problem, while it will be in future patch.
FMINNM/FMAXNM instructions of AArch64 follow IEEE754-2008. We can use them to canonicalize a floating point number. And FMINNUM_IEEE/FMAXNUM_IEEE is used by something like expanding FMINIMUMNUM/FMAXIMUMNUM, so let's define them.
Update combine_andor_with_cmps.ll.
Add fp-maximumnum-minimumnum.ll, with nnan testcases only.
V1F64 is not supported yet.
If we set v1f64 as legal, FMINNUM/FMAXNUM will have some problem:
both of them use
if (isOperationLegalOrCustom(FMAXNUM_IEEE, VT))
.AArch64 depends on
expandFMINNUM_FMAXNUM
returningSDValue()
for FMAXNUM and FMINNUM.
We should fix this problem, while it will be in future patch.