[AArch64] Remove redundant FDIV Combine. #91924

sdesmalen-arm · 2024-05-13T07:08:50Z

The target combine is no longer required because InstCombine will
transform the DIV by a power of 2 into a multiply, so in practice
this case will never trigger.

Additionally, the generated code would have been incorrect for
streaming(-compatible) functions, because it assumed NEON was available.

llvmbot · 2024-05-13T07:09:24Z

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

NEON fixed-point SCVTF instruction is not available in Streaming-SVE
mode. There is no equivalent SVE instruction, so we simply expand to a
regular FDIV operation.

Full diff: https://github.com/llvm/llvm-project/pull/91924.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+1-1)
(modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-arith.ll (+18)

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7344387ffe552..a76dc7b140b07 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -17912,7 +17912,7 @@ static SDValue performFpToIntCombine(SDNode *N, SelectionDAG &DAG,
 static SDValue performFDivCombine(SDNode *N, SelectionDAG &DAG,
                                   TargetLowering::DAGCombinerInfo &DCI,
                                   const AArch64Subtarget *Subtarget) {
-  if (!Subtarget->hasNEON())
+  if (!Subtarget->isNeonAvailable())
     return SDValue();
 
   SDValue Op = N->getOperand(0);
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-arith.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-arith.ll
index c2d6ed4e9ccf9..c20e8521375a9 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-arith.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-arith.ll
@@ -214,6 +214,24 @@ define <2 x float> @fdiv_v2f32(<2 x float> %op1, <2 x float> %op2) {
   ret <2 x float> %res
 }
 
+; Test that we don't optimise this using a NEON instruction, when
+; NEON is not available.
+define <2 x float> @fdiv_v232_pow2(<2 x i32> %in) {
+; CHECK-LABEL: fdiv_v232_pow2:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ptrue p0.s, vl2
+; CHECK-NEXT:    // kill: def $d0 killed $d0 def $z0
+; CHECK-NEXT:    fmov z1.s, #16.00000000
+; CHECK-NEXT:    scvtf z0.s, p0/m, z0.s
+; CHECK-NEXT:    fdiv z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $z0
+; CHECK-NEXT:    ret
+entry:
+  %vcvt.i = sitofp <2 x i32> %in to <2 x float>
+  %div.i = fdiv <2 x float> %vcvt.i, <float 16.0, float 16.0>
+  ret <2 x float> %div.i
+}
+
 define <4 x float> @fdiv_v4f32(<4 x float> %op1, <4 x float> %op2) {
 ; CHECK-LABEL: fdiv_v4f32:
 ; CHECK:       // %bb.0:

efriedma-quic

Can we just delete this transform? It's specifically checking for a divide by a power of two... but instcombine always transforms that to a multiply, so this will basically never trigger.

The target combine is no longer required because InstCombine will transform the DIV by a power of 2 into a multiply, so in practice this case will never trigger. Additionally, the generated code would have been incorrect for streaming(-compatible) functions, because it assumed NEON was available.

sdesmalen-arm · 2024-05-13T10:20:33Z

Can we just delete this transform? It's specifically checking for a divide by a power of two... but instcombine always transforms that to a multiply, so this will basically never trigger.

Good point! I've repurposed the PR to instead remove the combine.

paulwalker-arm

I cannot see aarch64_neon_vcvtfxs2fp used anywhere else so I guess this means we're missing the equivalent fmul combine, which seems unfortunate but not your fault.

…x), 0.5) -> scvtf(x, 2) This commit reintroduces the optimization in InstCombine that was previously removed due to limited applicability. See: llvm#91924 This update targets `fmul(sitofp(x), C)` where `C` is a constant reciprocal of a power of two. For both scalar and vector inputs, if we have `sitofp(X) * C` (where `C` is `1/2^N`), this can be optimized to `scvtf(X, 2^N)`. This eliminates the floating-point multiply by directly converting the integer to a scaled floating-point value.

…tf(x, 2) This commit reintroduces the optimization in InstCombine that was previously removed due to limited applicability. See: llvm#91924 This update targets `fmul(sitofp(x), C)` where `C` is a constant reciprocal of a power of two. For both scalar and vector inputs, if we have `sitofp(X) * C` (where `C` is `1/2^N`), this can be optimized to `scvtf(X, 2^N)`. This eliminates the floating-point multiply by directly converting the integer to a scaled floating-point value. fix formating

…tf(x, 2) This commit reintroduces the optimization in InstCombine that was previously removed due to limited applicability. See: llvm#91924 This update targets `fmul(sitofp(x), C)` where `C` is a constant reciprocal of a power of two. For both scalar and vector inputs, if we have `sitofp(X) * C` (where `C` is `1/2^N`), this can be optimized to `scvtf(X, 2^N)`. This eliminates the floating-point multiply by directly converting the integer to a scaled floating-point value.

…tf(x, 2) This commit reintroduces the optimization in InstCombine that was previously removed due to limited applicability. See: llvm#91924 This update targets `fmul(sitofp(x), C)` where `C` is a constant reciprocal of a power of two. For both scalar and vector inputs, if we have `sitofp(X) * C` (where `C` is `1/2^N`), this can be optimized to `scvtf(X, 2^N)`. This eliminates the floating-point multiply by directly converting the integer to a scaled floating-point value. [AArch64] Fix llvm#94909: Optimize vector fmul(sitofp(x), 0.5) -> scvtf(x, 2) This commit reintroduces the optimization in InstCombine that was previously removed due to limited applicability. See: llvm#91924 This update targets `fmul(sitofp(x), C)` where `C` is a constant reciprocal of a power of two. For both scalar and vector inputs, if we have `sitofp(X) * C` (where `C` is `1/2^N`), this can be optimized to `scvtf(X, 2^N)`. This eliminates the floating-point multiply by directly converting the integer to a scaled floating-point value.

…tf(x, 2) This commit reintroduces the optimization in InstCombine that was previously removed due to limited applicability. See: llvm#91924 This update targets `fmul(sitofp(x), C)` where `C` is a constant reciprocal of a power of two. For both scalar and vector inputs, if we have `sitofp(X) * C` (where `C` is `1/2^N`), this can be optimized to `scvtf(X, 2^N)`. This eliminates the floating-point multiply by directly converting the integer to a scaled floating-point value. [AArch64] Fix llvm#94909: Optimize vector fmul(sitofp(x), 0.5) -> scvtf(x, 2) This commit reintroduces the optimization in InstCombine that was previously removed due to limited applicability. See: llvm#91924 This update targets `fmul(sitofp(x), C)` where `C` is a constant reciprocal of a power of two. For both scalar and vector inputs, if we have `sitofp(X) * C` (where `C` is `1/2^N`), this can be optimized to `scvtf(X, 2^N)`. This eliminates the floating-point multiply by directly converting the integer to a scaled floating-point value. revert orig try

…tf(x, 2) This commit reintroduces the optimization in InstCombine that was previously removed due to limited applicability. See: llvm#91924 This update targets `fmul(sitofp(x), C)` where `C` is a constant reciprocal of a power of two. For both scalar and vector inputs, if we have `sitofp(X) * C` (where `C` is `1/2^N`), this can be optimized to `scvtf(X, 2^N)`. This eliminates the floating-point multiply by directly converting the integer to a scaled floating-point value.

sdesmalen-arm requested review from aemerson, hassnaaHamdi and paulwalker-arm May 13, 2024 07:08

llvmbot added the backend:AArch64 label May 13, 2024

efriedma-quic reviewed May 13, 2024

View reviewed changes

sdesmalen-arm force-pushed the sme-fix-neon-scvtf-in-streaming-mode branch from 9d4307a to f744db3 Compare May 13, 2024 10:19

sdesmalen-arm changed the title ~~[AArch64] Avoid NEON fixed-point SCVTF in Streaming-SVE mode.~~ [AArch64] Remove redundant FDIV Combine. May 13, 2024

paulwalker-arm approved these changes May 13, 2024

View reviewed changes

sdesmalen-arm merged commit 3d6f18d into llvm:main May 14, 2024
4 checks passed

efriedma-quic mentioned this pull request Jun 9, 2024

[AArch64] Missed optimization converting vector fmul(sitofp(x), 0.5) -> scvtf(x, 2) #94909

Open

jph-13 mentioned this pull request May 26, 2025

[AArch64] Fix #94909: Optimize vector fmul(sitofp(x), 0.5) -> scvtf(x, 2) #141480

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AArch64] Remove redundant FDIV Combine. #91924

[AArch64] Remove redundant FDIV Combine. #91924

Uh oh!

sdesmalen-arm commented May 13, 2024 •

edited

Loading

Uh oh!

llvmbot commented May 13, 2024

Uh oh!

efriedma-quic left a comment

Uh oh!

sdesmalen-arm commented May 13, 2024

Uh oh!

paulwalker-arm left a comment

Uh oh!

Uh oh!

Uh oh!

[AArch64] Remove redundant FDIV Combine. #91924

[AArch64] Remove redundant FDIV Combine. #91924

Uh oh!

Conversation

sdesmalen-arm commented May 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented May 13, 2024

Uh oh!

efriedma-quic left a comment

Choose a reason for hiding this comment

Uh oh!

sdesmalen-arm commented May 13, 2024

Uh oh!

paulwalker-arm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sdesmalen-arm commented May 13, 2024 •

edited

Loading