-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[DAGCombiner] Allow freeze to sink through fmul by adding it to AllowMultipleMaybePoisonOperands #142250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-llvm-selectiondag Author: Harrison Hao (harrisonGPU) ChangesAvoid blocking FMA formation on freeze(fmul x, y). Enable combining patterns like:
This improves precision and performance in common numerical code. Closes: #141622 Full diff: https://github.com/llvm/llvm-project/pull/142250.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index be2209a2f8faf..ea5b44da9e48b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -16729,6 +16729,28 @@ SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
}
}
+ // fold (fadd (freeze (fmul x, y)), z) -> (fma x, y, z).
+ if ((Options.UnsafeFPMath || N->getFlags().hasAllowContract()) &&
+ N0.getOpcode() == ISD::FREEZE) {
+ SDValue FrozenMul = N0.getOperand(0);
+ if (matcher.match(FrozenMul, ISD::FMUL) && isContractableFMUL(FrozenMul)) {
+ SDValue X = FrozenMul.getOperand(0);
+ SDValue Y = FrozenMul.getOperand(1);
+ return matcher.getNode(PreferredFusedOpcode, SL, VT, X, Y, N1);
+ }
+ }
+
+ // fold (fadd x, (freeze (fmul y, z))) -> (fma y, z, x)
+ if ((Options.UnsafeFPMath || N->getFlags().hasAllowContract()) &&
+ N1.getOpcode() == ISD::FREEZE) {
+ SDValue FrozenMul = N1.getOperand(0);
+ if (matcher.match(FrozenMul, ISD::FMUL) && isContractableFMUL(FrozenMul)) {
+ SDValue X = FrozenMul.getOperand(0);
+ SDValue Y = FrozenMul.getOperand(1);
+ return matcher.getNode(PreferredFusedOpcode, SL, VT, X, Y, N0);
+ }
+ }
+
// More folding opportunities when target permits.
if (Aggressive) {
// fold (fadd (fma x, y, (fpext (fmul u, v))), z)
@@ -17006,6 +17028,30 @@ SDValue DAGCombiner::visitFSUBForFMACombine(SDNode *N) {
}
}
+ // fold (fsub (freeze (fmul x, y)), z) -> (fma x, y, (fneg z))
+ if ((Options.UnsafeFPMath || N->getFlags().hasAllowContract()) &&
+ N0.getOpcode() == ISD::FREEZE) {
+ SDValue FrozenMul = N0.getOperand(0);
+ if (matcher.match(FrozenMul, ISD::FMUL) && isContractableFMUL(FrozenMul)) {
+ SDValue X = FrozenMul.getOperand(0);
+ SDValue Y = FrozenMul.getOperand(1);
+ SDValue NegZ = matcher.getNode(ISD::FNEG, SL, VT, N1);
+ return matcher.getNode(PreferredFusedOpcode, SL, VT, X, Y, NegZ);
+ }
+ }
+
+ // fold (fsub z, (freeze(fmul x, y))) -> (fma (fneg x), y, z)
+ if ((Options.UnsafeFPMath || N->getFlags().hasAllowContract()) &&
+ N1.getOpcode() == ISD::FREEZE) {
+ SDValue FrozenMul = N1.getOperand(0);
+ if (matcher.match(FrozenMul, ISD::FMUL) && isContractableFMUL(FrozenMul)) {
+ SDValue X = FrozenMul.getOperand(0);
+ SDValue Y = FrozenMul.getOperand(1);
+ SDValue NegX = matcher.getNode(ISD::FNEG, SL, VT, X);
+ return matcher.getNode(PreferredFusedOpcode, SL, VT, NegX, Y, N0);
+ }
+ }
+
auto isReassociable = [&Options](SDNode *N) {
return Options.UnsafeFPMath || N->getFlags().hasAllowReassociation();
};
diff --git a/llvm/test/CodeGen/AMDGPU/fold-freeze-fmul-to-fma.ll b/llvm/test/CodeGen/AMDGPU/fold-freeze-fmul-to-fma.ll
new file mode 100644
index 0000000000000..75fe67e743c03
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fold-freeze-fmul-to-fma.ll
@@ -0,0 +1,54 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s -check-prefix GFX11
+
+define float @fma_from_freeze_mul_add_left(float %x, float %y) {
+; GFX11-LABEL: fma_from_freeze_mul_add_left:
+; GFX11: ; %bb.0: ; %bb
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_fma_f32 v0, v0, v1, 1.0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+bb:
+ %mul = fmul contract float %x, %y
+ %mul.fr = freeze float %mul
+ %add = fadd contract float %mul.fr, 1.000000e+00
+ ret float %add
+}
+
+define float @fma_from_freeze_mul_add_right(float %x, float %y) {
+; GFX11-LABEL: fma_from_freeze_mul_add_right:
+; GFX11: ; %bb.0: ; %bb
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_fma_f32 v0, v0, v1, 1.0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+bb:
+ %mul = fmul contract float %x, %y
+ %mul.fr = freeze float %mul
+ %add = fadd contract float 1.000000e+00, %mul.fr
+ ret float %add
+}
+
+define float @fma_from_freeze_mul_sub_left(float %x, float %y) {
+; GFX11-LABEL: fma_from_freeze_mul_sub_left:
+; GFX11: ; %bb.0: ; %bb
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_fma_f32 v0, v0, v1, -1.0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+bb:
+ %mul = fmul contract float %x, %y
+ %mul.fr = freeze float %mul
+ %sub = fsub contract float %mul.fr, 1.000000e+00
+ ret float %sub
+}
+
+define float @fma_from_freeze_mul_sub_right(float %x, float %y) {
+; GFX11-LABEL: fma_from_freeze_mul_sub_right:
+; GFX11: ; %bb.0: ; %bb
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_fma_f32 v0, -v0, v1, 1.0
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+bb:
+ %mul = fmul contract float %x, %y
+ %mul.fr = freeze float %mul
+ %sub = fsub contract float 1.000000e+00, %mul.fr
+ ret float %sub
+}
|
By the looks of it SelectionDAG::canCreateUndefOrPoison doesn't handle fp ops at all - should we start with fixing those to more closely match ValueTracking llvm::canCreateUndefOrPoison? |
Thanks for the pointer. After rereading the LLVM Language Reference Manual I agree that The LangRef notes for every FP op include:
What do you think of this? References |
SelectionDAG::canCreateUndefOrPoison checks for hasPoisonGeneratingFlags - which seems to cover nonans/noinfs - but we'd definitely need to have test coverage for this. |
Thanks a lot! I’ll add a test case with nnan / ninf to verify that canCreateUndefOrPoison correctly returns true when required. |
Hi, I want to implement support for folding freeze(fmul) + fadd/fsub into FMA when using the |
DAGCombiner::visitFREEZE has a list of "AllowMultipleMaybePoisonOperands" opcodes where it will try to force the freeze through the op and onto all its operandss - I've never been sure of the requirements for this list (or why we don't perform it for all non-poison-generating ops) - but if its permissible to add FMUL to it then that might be a much simpler solution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct as implemented, because you're losing the freeze. I think @RKSimon's suggestion is the correct way to approach this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The flag preservation matches the middle-end behavior (https://llvm.godbolt.org/z/a97Yr1Eh5). This assumes that rewrite-based FMF is transparent to freeze.
Thanks ! |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/17180 Here is the relevant piece of the build log for the reference
|
…MultipleMaybePoisonOperands (llvm#142250) Allow freeze to sink through fmul by treating it as a non-poison-generating op when operands are not poison. Adding `ISD::FMUL` to `AllowMultipleMaybePoisonOperands` lets DAG combine push freeze through fmul. This helps expose patterns like `fmul+fadd` for `FMA` fusion. When rebuilding the node, we drop flags like nnan/ninf/nsz that imply poison, but keep contract, reassoc, afn, and arcp. Closes: llvm#141622
…MultipleMaybePoisonOperands (llvm#142250) Allow freeze to sink through fmul by treating it as a non-poison-generating op when operands are not poison. Adding `ISD::FMUL` to `AllowMultipleMaybePoisonOperands` lets DAG combine push freeze through fmul. This helps expose patterns like `fmul+fadd` for `FMA` fusion. When rebuilding the node, we drop flags like nnan/ninf/nsz that imply poison, but keep contract, reassoc, afn, and arcp. Closes: llvm#141622
…MultipleMaybePoisonOperands (llvm#142250) Allow freeze to sink through fmul by treating it as a non-poison-generating op when operands are not poison. Adding `ISD::FMUL` to `AllowMultipleMaybePoisonOperands` lets DAG combine push freeze through fmul. This helps expose patterns like `fmul+fadd` for `FMA` fusion. When rebuilding the node, we drop flags like nnan/ninf/nsz that imply poison, but keep contract, reassoc, afn, and arcp. Closes: llvm#141622
…MultipleMaybePoisonOperands (llvm#142250) Allow freeze to sink through fmul by treating it as a non-poison-generating op when operands are not poison. Adding `ISD::FMUL` to `AllowMultipleMaybePoisonOperands` lets DAG combine push freeze through fmul. This helps expose patterns like `fmul+fadd` for `FMA` fusion. When rebuilding the node, we drop flags like nnan/ninf/nsz that imply poison, but keep contract, reassoc, afn, and arcp. Closes: llvm#141622
Allow freeze to sink through fmul by treating it as a non-poison-generating op
when operands are not poison.
Adding
ISD::FMUL
toAllowMultipleMaybePoisonOperands
lets DAG combinepush freeze through fmul. This helps expose patterns like
fmul+fadd
forFMA
fusion.When rebuilding the node, we drop flags like nnan/ninf/nsz that imply poison,
but keep contract, reassoc, afn, and arcp.
Closes: #141622