Commit 28b573d

[TargetLowering] Fix another potential FPE in expandFP_TO_UINT
D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen.

As discussed in D53794, the compiler now generates this sequence:

  // Sel = Src < 0x8000000000000000
  // Val = select Sel, Src, Src - 0x8000000000000000
  // Ofs = select Sel, 0, 0x8000000000000000
  // Result = fp_to_sint(Val) ^ Ofs

The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)

Instead, I'd suggest using the following sequence:

  // Sel = Src < 0x8000000000000000
  // FltOfs = select Sel, 0, 0x8000000000000000
  // IntOfs = select Sel, 0, 0x8000000000000000
  // Result = fp_to_sint(Src - FltOfs) ^ IntOfs

In the case where the value is already in range of FP_TO_SINT, we now simply compute Src - 0, which definitely cannot trap (unless Src is a NaN, in which case we'd want to trap anyway).

In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Src is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.

There is a slight complication in the case where Src is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)

Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper.

Differential Revision: https://reviews.llvm.org/D67105
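
To make the arithmetic concrete, here is a minimal scalar sketch in C++ of the revised sequence for a double-to-u64 conversion. It is not the SelectionDAG code from this patch: `fpToUintViaSint`, `subIsInexact`, and the `kSignMask*` constants are hypothetical names chosen for illustration, and the `volatile` trick used to observe FE_INEXACT assumes the compiler does not move floating-point operations across the `<cfenv>` calls.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdint>

// 2^63 as a double: the smallest value that no longer fits a signed i64.
constexpr double kSignMaskFP = 9223372036854775808.0;
constexpr std::uint64_t kSignMaskInt = 0x8000000000000000ULL;

// Scalar model of the revised expansion (illustrative names, not LLVM API).
std::uint64_t fpToUintViaSint(double src) {
  // The conversion is only meaningful for inputs that truncate into [0, 2^64).
  assert(src > -1.0 && src < 18446744073709551616.0);
  bool sel = src < kSignMaskFP;                    // Sel
  double fltOfs = sel ? 0.0 : kSignMaskFP;         // FltOfs
  std::uint64_t intOfs = sel ? 0 : kSignMaskInt;   // IntOfs
  // Src - FltOfs is either Src - 0 or clears the 2^63 bit; both are exact.
  auto sint = static_cast<std::int64_t>(src - fltOfs);
  return static_cast<std::uint64_t>(sint) ^ intOfs;
}

// Reports whether computing src - ofs at run time raises FE_INEXACT.
bool subIsInexact(double src, double ofs) {
  volatile double a = src, b = ofs;  // volatile keeps the sub from being folded
  std::feclearexcept(FE_INEXACT);
  volatile double sink = a - b;      // rounding here sets the flag if inexact
  (void)sink;
  return std::fetestexcept(FE_INEXACT) != 0;
}
```

With these helpers, `subIsInexact(1.0, kSignMaskFP)` models the old unconditional `Src - 0x8000000000000000` and should report an inexact result on IEEE-754 double hardware, while `subIsInexact(1.0, 0.0)` models the new `Src - FltOfs` for an in-range input and should not.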
1 parent 040c39d commit 28b573d

8 files changed: +435 -344 lines changed


llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Lines changed: 15 additions & 15 deletions
@@ -6064,28 +6064,28 @@ bool TargetLowering::expandFP_TO_UINT(SDNode *Node, SDValue &Result,
     // Expand based on maximum range of FP_TO_SINT, if the value exceeds the
     // signmask then offset (the result of which should be fully representable).
     // Sel = Src < 0x8000000000000000
-    // Val = select Sel, Src, Src - 0x8000000000000000
-    // Ofs = select Sel, 0, 0x8000000000000000
-    // Result = fp_to_sint(Val) ^ Ofs
+    // FltOfs = select Sel, 0, 0x8000000000000000
+    // IntOfs = select Sel, 0, 0x8000000000000000
+    // Result = fp_to_sint(Src - FltOfs) ^ IntOfs
 
     // TODO: Should any fast-math-flags be set for the FSUB?
-    SDValue SrcBiased;
-    if (Node->isStrictFPOpcode())
-      SrcBiased = DAG.getNode(ISD::STRICT_FSUB, dl, { SrcVT, MVT::Other },
-                              { Node->getOperand(0), Src, Cst });
-    else
-      SrcBiased = DAG.getNode(ISD::FSUB, dl, SrcVT, Src, Cst);
-    SDValue Val = DAG.getSelect(dl, SrcVT, Sel, Src, SrcBiased);
-    SDValue Ofs = DAG.getSelect(dl, DstVT, Sel, DAG.getConstant(0, dl, DstVT),
-                                DAG.getConstant(SignMask, dl, DstVT));
+    SDValue FltOfs = DAG.getSelect(dl, SrcVT, Sel,
+                                   DAG.getConstantFP(0.0, dl, SrcVT), Cst);
+    SDValue IntOfs = DAG.getSelect(dl, DstVT, Sel,
+                                   DAG.getConstant(0, dl, DstVT),
+                                   DAG.getConstant(SignMask, dl, DstVT));
     SDValue SInt;
     if (Node->isStrictFPOpcode()) {
+      SDValue Val = DAG.getNode(ISD::STRICT_FSUB, dl, { SrcVT, MVT::Other },
+                                { Node->getOperand(0), Src, FltOfs });
       SInt = DAG.getNode(ISD::STRICT_FP_TO_SINT, dl, { DstVT, MVT::Other },
-                         { SrcBiased.getValue(1), Val });
+                         { Val.getValue(1), Val });
       Chain = SInt.getValue(1);
-    } else
+    } else {
+      SDValue Val = DAG.getNode(ISD::FSUB, dl, SrcVT, Src, FltOfs);
       SInt = DAG.getNode(ISD::FP_TO_SINT, dl, DstVT, Val);
-    Result = DAG.getNode(ISD::XOR, dl, DstVT, SInt, Ofs);
+    }
+    Result = DAG.getNode(ISD::XOR, dl, DstVT, SInt, IntOfs);
   } else {
     // Expand based on maximum range of FP_TO_SINT:
     // True = fp_to_sint(Src)

llvm/lib/Target/X86/X86ISelLowering.cpp

Lines changed: 11 additions & 13 deletions
@@ -19047,16 +19047,16 @@ X86TargetLowering::FP_TO_INTHelper(SDValue Op, SelectionDAG &DAG,
     // of a signed i64. Let Thresh be the FP equivalent of
     // 0x8000000000000000ULL.
     //
-    // Adjust i32 = (Value < Thresh) ? 0 : 0x80000000;
-    // FistSrc = (Value < Thresh) ? Value : (Value - Thresh);
+    // Adjust = (Value < Thresh) ? 0 : 0x80000000;
+    // FltOfs = (Value < Thresh) ? 0 : 0x80000000;
+    // FistSrc = (Value - FltOfs);
     // Fist-to-mem64 FistSrc
     // Add 0 or 0x800...0ULL to the 64-bit result, which is equivalent
     // to XOR'ing the high 32 bits with Adjust.
     //
     // Being a power of 2, Thresh is exactly representable in all FP formats.
     // For X87 we'd like to use the smallest FP type for this constant, but
     // for DAG type consistency we have to match the FP operand type.
-    // FIXME: This code generates a spurious inexact exception for 1.0.
 
     APFloat Thresh(APFloat::IEEEsingle(), APInt(32, 0x5f000000));
     LLVM_ATTRIBUTE_UNUSED APFloat::opStatus Status = APFloat::opOK;
@@ -19082,18 +19082,16 @@ X86TargetLowering::FP_TO_INTHelper(SDValue Op, SelectionDAG &DAG,
                            DAG.getConstant(0, DL, MVT::i64),
                            DAG.getConstant(APInt::getSignMask(64),
                                            DL, MVT::i64));
-    SDValue Sub;
+    SDValue FltOfs = DAG.getSelect(DL, TheVT, Cmp,
+                                   DAG.getConstantFP(0.0, DL, TheVT),
+                                   ThreshVal);
+
     if (IsStrict) {
-      Sub = DAG.getNode(ISD::STRICT_FSUB, DL, { TheVT, MVT::Other},
-                        { Chain, Value, ThreshVal });
-      Chain = Sub.getValue(1);
+      Value = DAG.getNode(ISD::STRICT_FSUB, DL, { TheVT, MVT::Other},
+                          { Chain, Value, FltOfs });
+      Chain = Value.getValue(1);
     } else
-      Sub = DAG.getNode(ISD::FSUB, DL, TheVT, Value, ThreshVal);
-
-    Cmp = DAG.getSetCC(DL, getSetCCResultType(DAG.getDataLayout(),
-                                              *DAG.getContext(), TheVT),
-                       Value, ThreshVal, ISD::SETLT);
-    Value = DAG.getSelect(DL, TheVT, Cmp, Value, Sub);
+      Value = DAG.getNode(ISD::FSUB, DL, TheVT, Value, FltOfs);
   }
 
   MachinePointerInfo MPI = MachinePointerInfo::getFixedStack(MF, SSFI);
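
The X86 code above builds `Thresh` from the IEEE single bit pattern `0x5f000000` and notes that, being a power of 2, it is exactly representable in all FP formats. The following standalone C++ sketch decodes that pattern to check it really is 2^63. The helper name `floatFromBits` is hypothetical and the code assumes `float` is a 32-bit IEEE-754 binary32 type; it does not use any LLVM API.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets a 32-bit pattern as a float (assumes IEEE-754 binary32).
float floatFromBits(std::uint32_t bits) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "this sketch expects a 32-bit float");
  float f;
  std::memcpy(&f, &bits, sizeof f);  // well-defined way to copy the bits
  return f;
}

// 0x5f000000: sign 0, biased exponent 0xBE (190 - 127 = 63), mantissa 0,
// so the value is exactly 2^63.
```

Widening the decoded value to `double` leaves it unchanged, which matches the comment's point that the threshold is exact regardless of the operand type.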

llvm/test/CodeGen/SystemZ/fp-strict-conv-10.ll

Lines changed: 21 additions & 12 deletions
@@ -20,12 +20,15 @@ define i32 @f1(float %f) #0 {
 ; CHECK-NEXT: larl %r1, .LCPI0_0
 ; CHECK-NEXT: le %f1, 0(%r1)
 ; CHECK-NEXT: cebr %f0, %f1
-; CHECK-NEXT: lhi %r0, 0
-; CHECK-NEXT: jl .LBB0_2
+; CHECK-NEXT: jnl .LBB0_2
 ; CHECK-NEXT: # %bb.1:
-; CHECK-NEXT: sebr %f0, %f1
-; CHECK-NEXT: llilh %r0, 32768
+; CHECK-NEXT: lhi %r0, 0
+; CHECK-NEXT: lzer %f1
+; CHECK-NEXT: j .LBB0_3
 ; CHECK-NEXT: .LBB0_2:
+; CHECK-NEXT: llilh %r0, 32768
+; CHECK-NEXT: .LBB0_3:
+; CHECK-NEXT: sebr %f0, %f1
 ; CHECK-NEXT: cfebr %r2, 5, %f0
 ; CHECK-NEXT: xr %r2, %r0
 ; CHECK-NEXT: br %r14
@@ -41,12 +44,15 @@ define i32 @f2(double %f) #0 {
 ; CHECK-NEXT: larl %r1, .LCPI1_0
 ; CHECK-NEXT: ldeb %f1, 0(%r1)
 ; CHECK-NEXT: cdbr %f0, %f1
-; CHECK-NEXT: lhi %r0, 0
-; CHECK-NEXT: jl .LBB1_2
+; CHECK-NEXT: jnl .LBB1_2
 ; CHECK-NEXT: # %bb.1:
-; CHECK-NEXT: sdbr %f0, %f1
-; CHECK-NEXT: llilh %r0, 32768
+; CHECK-NEXT: lhi %r0, 0
+; CHECK-NEXT: lzdr %f1
+; CHECK-NEXT: j .LBB1_3
 ; CHECK-NEXT: .LBB1_2:
+; CHECK-NEXT: llilh %r0, 32768
+; CHECK-NEXT: .LBB1_3:
+; CHECK-NEXT: sdbr %f0, %f1
 ; CHECK-NEXT: cfdbr %r2, 5, %f0
 ; CHECK-NEXT: xr %r2, %r0
 ; CHECK-NEXT: br %r14
@@ -64,12 +70,15 @@ define i32 @f3(fp128 *%src) #0 {
 ; CHECK-NEXT: larl %r1, .LCPI2_0
 ; CHECK-NEXT: lxeb %f1, 0(%r1)
 ; CHECK-NEXT: cxbr %f0, %f1
-; CHECK-NEXT: lhi %r0, 0
-; CHECK-NEXT: jl .LBB2_2
+; CHECK-NEXT: jnl .LBB2_2
 ; CHECK-NEXT: # %bb.1:
-; CHECK-NEXT: sxbr %f0, %f1
-; CHECK-NEXT: llilh %r0, 32768
+; CHECK-NEXT: lhi %r0, 0
+; CHECK-NEXT: lzxr %f1
+; CHECK-NEXT: j .LBB2_3
 ; CHECK-NEXT: .LBB2_2:
+; CHECK-NEXT: llilh %r0, 32768
+; CHECK-NEXT: .LBB2_3:
+; CHECK-NEXT: sxbr %f0, %f1
 ; CHECK-NEXT: cfxbr %r2, 5, %f0
 ; CHECK-NEXT: xr %r2, %r0
 ; CHECK-NEXT: br %r14

llvm/test/CodeGen/SystemZ/fp-strict-conv-12.ll

Lines changed: 21 additions & 12 deletions
@@ -19,12 +19,15 @@ define i64 @f1(float %f) #0 {
 ; CHECK-NEXT: larl %r1, .LCPI0_0
 ; CHECK-NEXT: le %f1, 0(%r1)
 ; CHECK-NEXT: cebr %f0, %f1
-; CHECK-NEXT: lghi %r0, 0
-; CHECK-NEXT: jl .LBB0_2
+; CHECK-NEXT: jnl .LBB0_2
 ; CHECK-NEXT: # %bb.1:
-; CHECK-NEXT: sebr %f0, %f1
-; CHECK-NEXT: llihh %r0, 32768
+; CHECK-NEXT: lghi %r0, 0
+; CHECK-NEXT: lzer %f1
+; CHECK-NEXT: j .LBB0_3
 ; CHECK-NEXT: .LBB0_2:
+; CHECK-NEXT: llihh %r0, 32768
+; CHECK-NEXT: .LBB0_3:
+; CHECK-NEXT: sebr %f0, %f1
 ; CHECK-NEXT: cgebr %r2, 5, %f0
 ; CHECK-NEXT: xgr %r2, %r0
 ; CHECK-NEXT: br %r14
@@ -40,12 +43,15 @@ define i64 @f2(double %f) #0 {
 ; CHECK-NEXT: larl %r1, .LCPI1_0
 ; CHECK-NEXT: ldeb %f1, 0(%r1)
 ; CHECK-NEXT: cdbr %f0, %f1
-; CHECK-NEXT: lghi %r0, 0
-; CHECK-NEXT: jl .LBB1_2
+; CHECK-NEXT: jnl .LBB1_2
 ; CHECK-NEXT: # %bb.1:
-; CHECK-NEXT: sdbr %f0, %f1
-; CHECK-NEXT: llihh %r0, 32768
+; CHECK-NEXT: lghi %r0, 0
+; CHECK-NEXT: lzdr %f1
+; CHECK-NEXT: j .LBB1_3
 ; CHECK-NEXT: .LBB1_2:
+; CHECK-NEXT: llihh %r0, 32768
+; CHECK-NEXT: .LBB1_3:
+; CHECK-NEXT: sdbr %f0, %f1
 ; CHECK-NEXT: cgdbr %r2, 5, %f0
 ; CHECK-NEXT: xgr %r2, %r0
 ; CHECK-NEXT: br %r14
@@ -63,12 +69,15 @@ define i64 @f3(fp128 *%src) #0 {
 ; CHECK-NEXT: larl %r1, .LCPI2_0
 ; CHECK-NEXT: lxeb %f1, 0(%r1)
 ; CHECK-NEXT: cxbr %f0, %f1
-; CHECK-NEXT: lghi %r0, 0
-; CHECK-NEXT: jl .LBB2_2
+; CHECK-NEXT: jnl .LBB2_2
 ; CHECK-NEXT: # %bb.1:
-; CHECK-NEXT: sxbr %f0, %f1
-; CHECK-NEXT: llihh %r0, 32768
+; CHECK-NEXT: lghi %r0, 0
+; CHECK-NEXT: lzxr %f1
+; CHECK-NEXT: j .LBB2_3
 ; CHECK-NEXT: .LBB2_2:
+; CHECK-NEXT: llihh %r0, 32768
+; CHECK-NEXT: .LBB2_3:
+; CHECK-NEXT: sxbr %f0, %f1
 ; CHECK-NEXT: cgxbr %r2, 5, %f0
 ; CHECK-NEXT: xgr %r2, %r0
 ; CHECK-NEXT: br %r14

llvm/test/CodeGen/X86/fp-cvt.ll

Lines changed: 32 additions & 30 deletions
@@ -444,28 +444,29 @@ define i64 @fptoui_i64_fp80(x86_fp80 %a0) nounwind {
 ; X86-NEXT: subl $16, %esp
 ; X86-NEXT: fldt 8(%ebp)
 ; X86-NEXT: flds {{\.LCPI.*}}
-; X86-NEXT: fld %st(1)
-; X86-NEXT: fsub %st(1), %st
-; X86-NEXT: fxch %st(1)
-; X86-NEXT: fucomp %st(2)
+; X86-NEXT: fucom %st(1)
 ; X86-NEXT: fnstsw %ax
+; X86-NEXT: xorl %edx, %edx
 ; X86-NEXT: # kill: def $ah killed $ah killed $ax
 ; X86-NEXT: sahf
+; X86-NEXT: setbe %al
+; X86-NEXT: fldz
 ; X86-NEXT: ja .LBB10_2
 ; X86-NEXT: # %bb.1:
-; X86-NEXT: fstp %st(1)
+; X86-NEXT: fstp %st(0)
 ; X86-NEXT: fldz
+; X86-NEXT: fxch %st(1)
 ; X86-NEXT: .LBB10_2:
-; X86-NEXT: fstp %st(0)
-; X86-NEXT: setbe %al
+; X86-NEXT: fstp %st(1)
+; X86-NEXT: fsubrp %st, %st(1)
 ; X86-NEXT: fnstcw {{[0-9]+}}(%esp)
 ; X86-NEXT: movzwl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT: orl $3072, %ecx # imm = 0xC00
 ; X86-NEXT: movw %cx, {{[0-9]+}}(%esp)
 ; X86-NEXT: fldcw {{[0-9]+}}(%esp)
 ; X86-NEXT: fistpll {{[0-9]+}}(%esp)
 ; X86-NEXT: fldcw {{[0-9]+}}(%esp)
-; X86-NEXT: movzbl %al, %edx
+; X86-NEXT: movb %al, %dl
 ; X86-NEXT: shll $31, %edx
 ; X86-NEXT: xorl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
@@ -477,14 +478,14 @@ define i64 @fptoui_i64_fp80(x86_fp80 %a0) nounwind {
 ; X64-X87: # %bb.0:
 ; X64-X87-NEXT: fldt {{[0-9]+}}(%rsp)
 ; X64-X87-NEXT: flds {{.*}}(%rip)
-; X64-X87-NEXT: fld %st(1)
-; X64-X87-NEXT: fsub %st(1), %st
 ; X64-X87-NEXT: xorl %eax, %eax
+; X64-X87-NEXT: fucomi %st(1), %st
+; X64-X87-NEXT: setbe %al
+; X64-X87-NEXT: fldz
 ; X64-X87-NEXT: fxch %st(1)
-; X64-X87-NEXT: fucompi %st(2), %st
 ; X64-X87-NEXT: fcmovnbe %st(1), %st
 ; X64-X87-NEXT: fstp %st(1)
-; X64-X87-NEXT: setbe %al
+; X64-X87-NEXT: fsubrp %st, %st(1)
 ; X64-X87-NEXT: fnstcw -{{[0-9]+}}(%rsp)
 ; X64-X87-NEXT: movzwl -{{[0-9]+}}(%rsp), %ecx
 ; X64-X87-NEXT: orl $3072, %ecx # imm = 0xC00
@@ -500,13 +501,13 @@ define i64 @fptoui_i64_fp80(x86_fp80 %a0) nounwind {
 ; X64-SSSE3: # %bb.0:
 ; X64-SSSE3-NEXT: fldt {{[0-9]+}}(%rsp)
 ; X64-SSSE3-NEXT: flds {{.*}}(%rip)
-; X64-SSSE3-NEXT: fld %st(1)
-; X64-SSSE3-NEXT: fsub %st(1), %st
 ; X64-SSSE3-NEXT: xorl %eax, %eax
+; X64-SSSE3-NEXT: fucomi %st(1), %st
+; X64-SSSE3-NEXT: fldz
 ; X64-SSSE3-NEXT: fxch %st(1)
-; X64-SSSE3-NEXT: fucompi %st(2), %st
 ; X64-SSSE3-NEXT: fcmovnbe %st(1), %st
 ; X64-SSSE3-NEXT: fstp %st(1)
+; X64-SSSE3-NEXT: fsubrp %st, %st(1)
 ; X64-SSSE3-NEXT: fisttpll -{{[0-9]+}}(%rsp)
 ; X64-SSSE3-NEXT: setbe %al
 ; X64-SSSE3-NEXT: shlq $63, %rax
@@ -526,28 +527,29 @@ define i64 @fptoui_i64_fp80_ld(x86_fp80 *%a0) nounwind {
 ; X86-NEXT: movl 8(%ebp), %eax
 ; X86-NEXT: fldt (%eax)
 ; X86-NEXT: flds {{\.LCPI.*}}
-; X86-NEXT: fld %st(1)
-; X86-NEXT: fsub %st(1), %st
-; X86-NEXT: fxch %st(1)
-; X86-NEXT: fucomp %st(2)
+; X86-NEXT: fucom %st(1)
 ; X86-NEXT: fnstsw %ax
+; X86-NEXT: xorl %edx, %edx
 ; X86-NEXT: # kill: def $ah killed $ah killed $ax
 ; X86-NEXT: sahf
+; X86-NEXT: setbe %al
+; X86-NEXT: fldz
 ; X86-NEXT: ja .LBB11_2
 ; X86-NEXT: # %bb.1:
-; X86-NEXT: fstp %st(1)
+; X86-NEXT: fstp %st(0)
 ; X86-NEXT: fldz
+; X86-NEXT: fxch %st(1)
 ; X86-NEXT: .LBB11_2:
-; X86-NEXT: fstp %st(0)
-; X86-NEXT: setbe %al
+; X86-NEXT: fstp %st(1)
+; X86-NEXT: fsubrp %st, %st(1)
 ; X86-NEXT: fnstcw {{[0-9]+}}(%esp)
 ; X86-NEXT: movzwl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT: orl $3072, %ecx # imm = 0xC00
 ; X86-NEXT: movw %cx, {{[0-9]+}}(%esp)
 ; X86-NEXT: fldcw {{[0-9]+}}(%esp)
 ; X86-NEXT: fistpll {{[0-9]+}}(%esp)
 ; X86-NEXT: fldcw {{[0-9]+}}(%esp)
-; X86-NEXT: movzbl %al, %edx
+; X86-NEXT: movb %al, %dl
 ; X86-NEXT: shll $31, %edx
 ; X86-NEXT: xorl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
@@ -559,14 +561,14 @@ define i64 @fptoui_i64_fp80_ld(x86_fp80 *%a0) nounwind {
 ; X64-X87: # %bb.0:
 ; X64-X87-NEXT: fldt (%rdi)
 ; X64-X87-NEXT: flds {{.*}}(%rip)
-; X64-X87-NEXT: fld %st(1)
-; X64-X87-NEXT: fsub %st(1), %st
 ; X64-X87-NEXT: xorl %eax, %eax
+; X64-X87-NEXT: fucomi %st(1), %st
+; X64-X87-NEXT: setbe %al
+; X64-X87-NEXT: fldz
 ; X64-X87-NEXT: fxch %st(1)
-; X64-X87-NEXT: fucompi %st(2), %st
 ; X64-X87-NEXT: fcmovnbe %st(1), %st
 ; X64-X87-NEXT: fstp %st(1)
-; X64-X87-NEXT: setbe %al
+; X64-X87-NEXT: fsubrp %st, %st(1)
 ; X64-X87-NEXT: fnstcw -{{[0-9]+}}(%rsp)
 ; X64-X87-NEXT: movzwl -{{[0-9]+}}(%rsp), %ecx
 ; X64-X87-NEXT: orl $3072, %ecx # imm = 0xC00
@@ -582,13 +584,13 @@ define i64 @fptoui_i64_fp80_ld(x86_fp80 *%a0) nounwind {
 ; X64-SSSE3: # %bb.0:
 ; X64-SSSE3-NEXT: fldt (%rdi)
 ; X64-SSSE3-NEXT: flds {{.*}}(%rip)
-; X64-SSSE3-NEXT: fld %st(1)
-; X64-SSSE3-NEXT: fsub %st(1), %st
 ; X64-SSSE3-NEXT: xorl %eax, %eax
+; X64-SSSE3-NEXT: fucomi %st(1), %st
+; X64-SSSE3-NEXT: fldz
 ; X64-SSSE3-NEXT: fxch %st(1)
-; X64-SSSE3-NEXT: fucompi %st(2), %st
 ; X64-SSSE3-NEXT: fcmovnbe %st(1), %st
 ; X64-SSSE3-NEXT: fstp %st(1)
+; X64-SSSE3-NEXT: fsubrp %st, %st(1)
 ; X64-SSSE3-NEXT: fisttpll -{{[0-9]+}}(%rsp)
 ; X64-SSSE3-NEXT: setbe %al
 ; X64-SSSE3-NEXT: shlq $63, %rax
