Commit b7052fa

[DAGCombiner] Do not fold fadd (fmul x, y), (fmul x, y) -> fma x, y, (fmul x, y)
Differential Revision: https://reviews.llvm.org/D151890
1 parent 5952664 commit b7052fa

2 files changed: +8 -3 lines changed

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Lines changed: 7 additions & 0 deletions
@@ -15233,6 +15233,13 @@ SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
   if (!AllowFusionGlobally && !N->getFlags().hasAllowContract())
     return SDValue();
 
+  // Folding fadd (fmul x, y), (fmul x, y) -> fma x, y, (fmul x, y) is never
+  // beneficial. It does not reduce latency. It increases register pressure. It
+  // replaces an fadd with an fma which is a more complex instruction, so is
+  // likely to have a larger encoding, use more functional units, etc.
+  if (N0 == N1)
+    return SDValue();
+
   if (TLI.generateFMAsInMachineCombiner(VT, OptLevel))
     return SDValue();
 

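The reasoning in the new comment can be illustrated with a small C++ sketch. The function names `unfolded` and `folded` are hypothetical and are not part of LLVM; they only mirror the two expression shapes named in the comment. The point is that the folded form still needs the multiply, because its result is the addend of the fma, so the fold only swaps an add for an fma.

```cpp
#include <cmath>

// Unfolded form: one multiply whose result feeds both operands of one add.
float unfolded(float x, float y) {
  float m = x * y;  // fmul x, y
  return m + m;     // fadd (fmul x, y), (fmul x, y)
}

// Folded form: the multiply is still required as the addend, so the fma
// replaces only the add and no instruction is saved.
float folded(float x, float y) {
  float m = x * y;           // fmul x, y (still needed)
  return std::fma(x, y, m);  // fma x, y, (fmul x, y)
}
```

Both functions perform two floating-point operations. When the product is exactly representable they return the same value; in general the fma rounds `x * y + m` once, so the two results can differ in the last bit.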
llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll

Lines changed: 1 addition & 3 deletions
@@ -277,9 +277,7 @@ define amdgpu_ps float @fma_vs_output_modifier(float %x, i32 %n) #0 {
 define amdgpu_ps float @fma_vs_output_modifier_2(float %x) #0 {
 ; GCN-LABEL: fma_vs_output_modifier_2:
 ; GCN: ; %bb.0:
-; GCN-NEXT: v_mul_f32_e32 v1, v0, v0
-; GCN-NEXT: v_fmac_f32_e32 v1, v0, v0
-; GCN-NEXT: v_mov_b32_e32 v0, v1
+; GCN-NEXT: v_mul_f32_e64 v0, v0, v0 mul:2
 ; GCN-NEXT: ; return to shader part epilog
   %m = fmul contract float %x, %x
   %a = fadd nsz contract float %m, %m
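The updated check line replaces three instructions with a single multiply carrying a `mul:2` modifier. A plausible reading, stated here as an assumption rather than something the commit spells out, is that once the fadd of `%m` with itself is left alone, the backend can treat it as a doubling of the multiply result. The sketch below uses hypothetical helper names to show the arithmetic fact that makes this possible: adding a float to itself gives the same value as multiplying it by two.

```cpp
// Adding a value to itself, as in: fadd %m, %m
float add_self(float m) { return m + m; }

// Doubling the same value with a multiply by 2.
float times_two(float m) { return 2.0f * m; }
```

In binary floating point, doubling only changes the exponent, so both helpers agree even for inputs such as `0.1f` that are not exactly representable.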
