Commit 924907b

[DAG] Prefer 0.0 over -0.0 as neutral value for FADD w/NoSignedZero (#106616)
When getting a neutral value, we can prefer a positive zero over a negative zero if nsz is set on the FADD (or reduction). A positive zero should be cheaper to materialize on basically all targets.

Arguably, we should be doing this kind of canonicalization in DAGCombine, but we don't do that for any of the other reduction variants, so this seems like the path of least resistance.

This does mean that we can only do this for "fast" reductions. Just nsz isn't enough, as that goes through the SEQ_FADD path where the IR-level start value isn't folded away.

If folks think this is too RISCV-specific, let me know. There's a trivial RISCV-specific implementation. I went with the generic one as I thought this might benefit other targets.
1 parent a3816b5 commit 924907b

2 files changed

+4
-3
lines changed

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Lines changed: 3 additions & 1 deletion
@@ -13267,7 +13267,9 @@ SDValue SelectionDAG::getNeutralElement(unsigned Opcode, const SDLoc &DL,
   case ISD::SMIN:
     return getConstant(APInt::getSignedMaxValue(VT.getSizeInBits()), DL, VT);
   case ISD::FADD:
-    return getConstantFP(-0.0, DL, VT);
+    // If flags allow, prefer positive zero since it's generally cheaper
+    // to materialize on most targets.
+    return getConstantFP(Flags.hasNoSignedZeros() ? 0.0 : -0.0, DL, VT);
   case ISD::FMUL:
     return getConstantFP(1.0, DL, VT);
   case ISD::FMINNUM:
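
As a rough illustration of why -0.0 is the strict neutral element for FADD and why +0.0 is only acceptable under nsz, the sketch below reproduces the selection logic in standalone C++. The helpers `neutralForFAdd` and `addWithStart` are hypothetical names used only for this example and are not part of LLVM; the code assumes IEEE 754 arithmetic in the default rounding mode and a build without fast-math flags.

```cpp
#include <cmath>

// Hypothetical helper (not an LLVM API): mirrors the ternary in the hunk above.
// With no-signed-zeros semantics, +0.0 is allowed; otherwise -0.0 is required.
double neutralForFAdd(bool noSignedZeros) {
  return noSignedZeros ? 0.0 : -0.0;
}

// Hypothetical helper: folds a start value into a single input, the way a
// reduction combines its neutral start value with an element.
double addWithStart(double start, double x) {
  return start + x;
}

// True if adding `start` leaves the sign bit of `x` untouched.
bool preservesSign(double start, double x) {
  return std::signbit(addWithStart(start, x)) == std::signbit(x);
}
```

Under these assumptions, `preservesSign(neutralForFAdd(false), -0.0)` holds, because (-0.0) + (-0.0) is -0.0. `preservesSign(neutralForFAdd(true), -0.0)` does not, because (+0.0) + (-0.0) is +0.0. That sign difference is exactly what nsz permits the compiler to ignore.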

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll

Lines changed: 1 addition & 2 deletions
@@ -524,8 +524,7 @@ define float @vreduce_fadd_v7f32_neutralstart_fast(ptr %x) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 7, e32, m2, ta, ma
 ; CHECK-NEXT:    vle32.v v8, (a0)
-; CHECK-NEXT:    lui a0, 524288
-; CHECK-NEXT:    vmv.s.x v10, a0
+; CHECK-NEXT:    vmv.s.x v10, zero
 ; CHECK-NEXT:    vfredusum.vs v8, v8, v10
 ; CHECK-NEXT:    vfmv.f.s fa0, v8
 ; CHECK-NEXT:    ret
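
The removed `lui a0, 524288` can be read as materializing the bit pattern of -0.0f, which is why switching to +0.0 lets the test use the `zero` register directly. The sketch below checks that arithmetic. `luiLow32` and `bitsOf` are hypothetical helpers for this example only, and the code assumes `float` is IEEE 754 binary32.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: low 32 bits produced by `lui rd, imm20`, which places
// the 20-bit immediate in bits 31:12 and clears the low 12 bits.
std::uint32_t luiLow32(std::uint32_t imm20) {
  return imm20 << 12;
}

// Hypothetical helper: raw bit pattern of a float (assumes IEEE 754 binary32).
std::uint32_t bitsOf(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}
```

Here `luiLow32(524288)` is 0x80000000, which matches `bitsOf(-0.0f)` (sign bit set, all other bits clear), while `bitsOf(0.0f)` is 0 and so needs no extra instruction to materialize.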
