Skip to content

[DAGCombiner] Allow freeze to sink through fmul by adding it to AllowMultipleMaybePoisonOperands #142250

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 8, 2025

Conversation

harrisonGPU
Copy link
Contributor

@harrisonGPU harrisonGPU commented May 31, 2025

Allow freeze to sink through fmul by treating it as a non-poison-generating op
when operands are not poison.

Adding ISD::FMUL to AllowMultipleMaybePoisonOperands lets DAG combine
push freeze through fmul. This helps expose patterns like fmul+fadd for FMA fusion.

When rebuilding the node, we drop flags like nnan/ninf/nsz that imply poison,
but keep contract, reassoc, afn, and arcp.

Closes: #141622

@llvmbot
Copy link
Member

llvmbot commented May 31, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-selectiondag

Author: Harrison Hao (harrisonGPU)

Changes

Avoid blocking FMA formation on freeze(fmul x, y). Enable combining patterns like:

  • freeze(x * y) + z → fma(x, y, z)

  • z + freeze(x * y) → fma(x, y, z)

  • freeze(x * y) - z → fma(x, y, -z)

  • z - freeze(x * y) → fma(-x, y, z)

This improves precision and performance in common numerical code.

Closes: #141622


Full diff: https://github.com/llvm/llvm-project/pull/142250.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+46)
  • (added) llvm/test/CodeGen/AMDGPU/fold-freeze-fmul-to-fma.ll (+54)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index be2209a2f8faf..ea5b44da9e48b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -16729,6 +16729,28 @@ SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
     }
   }
 
+  // fold (fadd (freeze (fmul x, y)), z) -> (fma x, y, z).
+  if ((Options.UnsafeFPMath || N->getFlags().hasAllowContract()) &&
+      N0.getOpcode() == ISD::FREEZE) {
+    SDValue FrozenMul = N0.getOperand(0);
+    if (matcher.match(FrozenMul, ISD::FMUL) && isContractableFMUL(FrozenMul)) {
+      SDValue X = FrozenMul.getOperand(0);
+      SDValue Y = FrozenMul.getOperand(1);
+      return matcher.getNode(PreferredFusedOpcode, SL, VT, X, Y, N1);
+    }
+  }
+
+  // fold (fadd x, (freeze (fmul y, z))) -> (fma y, z, x)
+  if ((Options.UnsafeFPMath || N->getFlags().hasAllowContract()) &&
+      N1.getOpcode() == ISD::FREEZE) {
+    SDValue FrozenMul = N1.getOperand(0);
+    if (matcher.match(FrozenMul, ISD::FMUL) && isContractableFMUL(FrozenMul)) {
+      SDValue X = FrozenMul.getOperand(0);
+      SDValue Y = FrozenMul.getOperand(1);
+      return matcher.getNode(PreferredFusedOpcode, SL, VT, X, Y, N0);
+    }
+  }
+
   // More folding opportunities when target permits.
   if (Aggressive) {
     // fold (fadd (fma x, y, (fpext (fmul u, v))), z)
@@ -17006,6 +17028,30 @@ SDValue DAGCombiner::visitFSUBForFMACombine(SDNode *N) {
     }
   }
 
+  // fold (fsub (freeze (fmul x, y)), z) -> (fma x, y, (fneg z))
+  if ((Options.UnsafeFPMath || N->getFlags().hasAllowContract()) &&
+      N0.getOpcode() == ISD::FREEZE) {
+    SDValue FrozenMul = N0.getOperand(0);
+    if (matcher.match(FrozenMul, ISD::FMUL) && isContractableFMUL(FrozenMul)) {
+      SDValue X = FrozenMul.getOperand(0);
+      SDValue Y = FrozenMul.getOperand(1);
+      SDValue NegZ = matcher.getNode(ISD::FNEG, SL, VT, N1);
+      return matcher.getNode(PreferredFusedOpcode, SL, VT, X, Y, NegZ);
+    }
+  }
+
+  // fold (fsub z, (freeze(fmul x, y))) -> (fma (fneg x), y, z)
+  if ((Options.UnsafeFPMath || N->getFlags().hasAllowContract()) &&
+      N1.getOpcode() == ISD::FREEZE) {
+    SDValue FrozenMul = N1.getOperand(0);
+    if (matcher.match(FrozenMul, ISD::FMUL) && isContractableFMUL(FrozenMul)) {
+      SDValue X = FrozenMul.getOperand(0);
+      SDValue Y = FrozenMul.getOperand(1);
+      SDValue NegX = matcher.getNode(ISD::FNEG, SL, VT, X);
+      return matcher.getNode(PreferredFusedOpcode, SL, VT, NegX, Y, N0);
+    }
+  }
+
   auto isReassociable = [&Options](SDNode *N) {
     return Options.UnsafeFPMath || N->getFlags().hasAllowReassociation();
   };
diff --git a/llvm/test/CodeGen/AMDGPU/fold-freeze-fmul-to-fma.ll b/llvm/test/CodeGen/AMDGPU/fold-freeze-fmul-to-fma.ll
new file mode 100644
index 0000000000000..75fe67e743c03
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fold-freeze-fmul-to-fma.ll
@@ -0,0 +1,54 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s -check-prefix GFX11
+
+define float @fma_from_freeze_mul_add_left(float %x, float %y) {
+; GFX11-LABEL: fma_from_freeze_mul_add_left:
+; GFX11:       ; %bb.0: ; %bb
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_fma_f32 v0, v0, v1, 1.0
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+bb:
+  %mul = fmul contract float %x, %y
+  %mul.fr = freeze float %mul
+  %add = fadd contract float %mul.fr, 1.000000e+00
+  ret float %add
+}
+
+define float @fma_from_freeze_mul_add_right(float %x, float %y) {
+; GFX11-LABEL: fma_from_freeze_mul_add_right:
+; GFX11:       ; %bb.0: ; %bb
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_fma_f32 v0, v0, v1, 1.0
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+bb:
+  %mul = fmul contract float %x, %y
+  %mul.fr = freeze float %mul
+  %add = fadd contract float 1.000000e+00, %mul.fr
+  ret float %add
+}
+
+define float @fma_from_freeze_mul_sub_left(float %x, float %y) {
+; GFX11-LABEL: fma_from_freeze_mul_sub_left:
+; GFX11:       ; %bb.0: ; %bb
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_fma_f32 v0, v0, v1, -1.0
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+bb:
+  %mul = fmul contract float %x, %y
+  %mul.fr = freeze float %mul
+  %sub = fsub contract float %mul.fr, 1.000000e+00
+  ret float %sub
+}
+
+define float @fma_from_freeze_mul_sub_right(float %x, float %y) {
+; GFX11-LABEL: fma_from_freeze_mul_sub_right:
+; GFX11:       ; %bb.0: ; %bb
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_fma_f32 v0, -v0, v1, 1.0
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+bb:
+  %mul = fmul contract float %x, %y
+  %mul.fr = freeze float %mul
+  %sub = fsub contract float 1.000000e+00, %mul.fr
+  ret float %sub
+}

@RKSimon
Copy link
Collaborator

RKSimon commented May 31, 2025

By the looks of it SelectionDAG::canCreateUndefOrPoison doesn't handle fp ops at all - should we start with fixing those to more closely match ValueTracking llvm::canCreateUndefOrPoison?

@harrisonGPU
Copy link
Contributor Author

By the looks of it SelectionDAG::canCreateUndefOrPoison doesn't handle fp ops at all - should we start with fixing those to more closely match ValueTracking llvm::canCreateUndefOrPoison?

Thanks for the pointer. After rereading the LLVM Language Reference Manual I agree that
SelectionDAG::canCreateUndefOrPoison should return false for plain FP ops (fadd, fmul, …):
by default they propagate poison but never create it.
The only time they can generate new poison is when the instruction carries the nnan or ninf
(or the umbrella fast) fast-math flags, because those flags require a NaN/Inf result to become poison.

The LangRef notes for every FP op include:

“This instruction can also take any number of
fast-math flags, which are optimization hints to enable otherwise
unsafe floating-point optimizations.”

What do you think of this?

References

@RKSimon
Copy link
Collaborator

RKSimon commented Jun 1, 2025

SelectionDAG::canCreateUndefOrPoison checks for hasPoisonGeneratingFlags - which seems to cover nonans/noinfs - but we'd definitely need to have test coverage for this.

@harrisonGPU
Copy link
Contributor Author

SelectionDAG::canCreateUndefOrPoison checks for hasPoisonGeneratingFlags - which seems to cover nonans/noinfs - but we'd definitely need to have test coverage for this.

Thanks a lot! I’ll add a test case with nnan / ninf to verify that canCreateUndefOrPoison correctly returns true when required.
I'm also preparing a new PR with lit tests ensuring that SelectionDAG::canCreateUndefOrPoison returns false for plain FP ops without poison-generating flags.

@harrisonGPU harrisonGPU changed the title [DAGCombiner] Fold freeze(fmul) + fadd/fsub into FMA combine [DAGCombiner] Fold freeze(fmul) + fadd/fsub into FMA when 'contract' is present Jun 4, 2025
@harrisonGPU harrisonGPU changed the title [DAGCombiner] Fold freeze(fmul) + fadd/fsub into FMA when 'contract' is present [DAGCombiner] Allow freeze(fmul) + fadd/fsub → FMA when using contract Jun 4, 2025
@harrisonGPU harrisonGPU requested a review from nikic June 4, 2025 04:59
@harrisonGPU
Copy link
Contributor Author

harrisonGPU commented Jun 4, 2025

Hi, I want to implement support for folding freeze(fmul) + fadd/fsub into FMA when using the contract flag. This is important for graphics GPU compilers because we set nnan by default. Could you please give me some suggestions? I have already updated the PR and lit tests. @jayfoad @nikic @RKSimon @arsenm :-)

@RKSimon
Copy link
Collaborator

RKSimon commented Jun 4, 2025

DAGCombiner::visitFREEZE has a list of "AllowMultipleMaybePoisonOperands" opcodes where it will try to force the freeze through the op and onto all its operandss - I've never been sure of the requirements for this list (or why we don't perform it for all non-poison-generating ops) - but if its permissible to add FMUL to it then that might be a much simpler solution.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct as implemented, because you're losing the freeze. I think @RKSimon's suggestion is the correct way to approach this.

@harrisonGPU
Copy link
Contributor Author

harrisonGPU commented Jun 4, 2025

Thank you for your suggestions ! @RKSimon @nikic , I really appreciate it, I will try to implement!

@harrisonGPU harrisonGPU changed the title [DAGCombiner] Allow freeze(fmul) + fadd/fsub → FMA when using contract [DAGCombiner] Allow freeze to sink through fmul by adding it to AllowMultipleMaybePoisonOperands Jun 7, 2025
@harrisonGPU
Copy link
Contributor Author

harrisonGPU commented Jun 7, 2025

Hi @RKSimon @nikic @jayfoad @arsenm , I've added FMUL to AllowMultipleMaybePoisonOperands.
When you have time, could you please take a look and let me know if you have any suggestions?

@harrisonGPU harrisonGPU requested a review from nikic June 7, 2025 10:44
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

The flag preservation matches the middle-end behavior (https://llvm.godbolt.org/z/a97Yr1Eh5). This assumes that rewrite-based FMF is transparent to freeze.

@harrisonGPU
Copy link
Contributor Author

Thanks !

@harrisonGPU harrisonGPU merged commit 102dfa8 into llvm:main Jun 8, 2025
7 checks passed
@harrisonGPU harrisonGPU deleted the dag-fma branch June 8, 2025 11:39
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 8, 2025

LLVM Buildbot has detected a new failure on builder lldb-arm-ubuntu running on linaro-lldb-arm-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/17180

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: tools/lldb-dap/repl-mode/TestDAP_repl_mode_detection.py (1202 of 3238)
UNSUPPORTED: lldb-api :: tools/lldb-dap/restart/TestDAP_restart_runInTerminal.py (1203 of 3238)
UNSUPPORTED: lldb-api :: tools/lldb-dap/runInTerminal/TestDAP_runInTerminal.py (1204 of 3238)
UNSUPPORTED: lldb-api :: tools/lldb-dap/save-core/TestDAP_save_core.py (1205 of 3238)
PASS: lldb-api :: tools/lldb-dap/progress/TestDAP_Progress.py (1206 of 3238)
PASS: lldb-api :: tools/lldb-dap/send-event/TestDAP_sendEvent.py (1207 of 3238)
PASS: lldb-api :: tools/lldb-dap/source/TestDAP_source.py (1208 of 3238)
UNSUPPORTED: lldb-api :: tools/lldb-dap/stackTrace-x86/TestDAP_source_x86.py (1209 of 3238)
PASS: lldb-api :: tools/lldb-dap/restart/TestDAP_restart.py (1210 of 3238)
UNSUPPORTED: lldb-api :: tools/lldb-dap/stackTrace/subtleFrames/TestDAP_subtleFrames.py (1211 of 3238)
FAIL: lldb-api :: tools/lldb-dap/server/TestDAP_server.py (1212 of 3238)
******************** TEST 'lldb-api :: tools/lldb-dap/server/TestDAP_server.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --arch armv8l --build-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/tools/lldb-dap/server -p TestDAP_server.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision 102dfa8a487a1c44a95a225825039d282be712a0)
  clang revision 102dfa8a487a1c44a95a225825039d282be712a0
  llvm revision 102dfa8a487a1c44a95a225825039d282be712a0
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
========= DEBUG ADAPTER PROTOCOL LOGS =========
no log file available
========= END =========
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_server_interrupt (TestDAP_server.TestDAP_server)
========= DEBUG ADAPTER PROTOCOL LOGS =========
no log file available
========= END =========
========= DEBUG ADAPTER PROTOCOL LOGS =========
no log file available
========= END =========
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_server_port (TestDAP_server.TestDAP_server)
DAP session (client_1) error: Bad file descriptor
========= DEBUG ADAPTER PROTOCOL LOGS =========
no log file available
========= END =========
========= DEBUG ADAPTER PROTOCOL LOGS =========
no log file available
========= END =========
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_server_unix_socket (TestDAP_server.TestDAP_server)
======================================================================
FAIL: test_server_interrupt (TestDAP_server.TestDAP_server)

omkar-mohanty pushed a commit to omkar-mohanty/llvm-project that referenced this pull request Jun 8, 2025
…MultipleMaybePoisonOperands (llvm#142250)

Allow freeze to sink through fmul by treating it as a
non-poison-generating op
when operands are not poison.

Adding `ISD::FMUL` to `AllowMultipleMaybePoisonOperands` lets DAG
combine
push freeze through fmul. This helps expose patterns like `fmul+fadd`
for `FMA` fusion.

When rebuilding the node, we drop flags like nnan/ninf/nsz that imply
poison,
but keep contract, reassoc, afn, and arcp.


Closes: llvm#141622
rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
…MultipleMaybePoisonOperands (llvm#142250)

Allow freeze to sink through fmul by treating it as a
non-poison-generating op
when operands are not poison.

Adding `ISD::FMUL` to `AllowMultipleMaybePoisonOperands` lets DAG
combine
push freeze through fmul. This helps expose patterns like `fmul+fadd`
for `FMA` fusion.

When rebuilding the node, we drop flags like nnan/ninf/nsz that imply
poison,
but keep contract, reassoc, afn, and arcp.


Closes: llvm#141622
DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this pull request Jun 12, 2025
…MultipleMaybePoisonOperands (llvm#142250)

Allow freeze to sink through fmul by treating it as a
non-poison-generating op
when operands are not poison.

Adding `ISD::FMUL` to `AllowMultipleMaybePoisonOperands` lets DAG
combine
push freeze through fmul. This helps expose patterns like `fmul+fadd`
for `FMA` fusion.

When rebuilding the node, we drop flags like nnan/ninf/nsz that imply
poison,
but keep contract, reassoc, afn, and arcp.


Closes: llvm#141622
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…MultipleMaybePoisonOperands (llvm#142250)

Allow freeze to sink through fmul by treating it as a
non-poison-generating op
when operands are not poison.

Adding `ISD::FMUL` to `AllowMultipleMaybePoisonOperands` lets DAG
combine
push freeze through fmul. This helps expose patterns like `fmul+fadd`
for `FMA` fusion.

When rebuilding the node, we drop flags like nnan/ninf/nsz that imply
poison,
but keep contract, reassoc, afn, and arcp.


Closes: llvm#141622
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AMDGPU] InstCombine moving freeze instructions breaks FMA formation
5 participants