[CodeGen][AArch64] Sink splat operands of FMul instructions #116222

hazzlim · 2024-11-14T13:05:21Z

Sink shuffle operands of FMul instructions if these are splats, as we can generate lane-indexed variants for these.

In the `fmul` test in CodeGen/AArch64/sinksplat.ll both operands to the fmul are loop-invariant and so the operation is hoisted out of the loop, meaning we fail to test what we are trying to test (sinking of splats). Fix this by making one of the operands not loop-invariant.

Sink shuffle operands of FMul instructions if these are splats, as we can generate lane-indexed variants for these.

llvmbot · 2024-11-14T13:05:54Z

@llvm/pr-subscribers-backend-aarch64

Author: Hari Limaye (hazzlim)

Changes

Sink shuffle operands of FMul instructions if these are splats, as we can generate lane-indexed variants for these.

Full diff: https://github.com/llvm/llvm-project/pull/116222.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+8)
(modified) llvm/test/CodeGen/AArch64/sinksplat.ll (+15-11)

diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index a97b0d3b1db92a..eb3b9609f697ca 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -5239,6 +5239,14 @@ bool AArch64TTIImpl::isProfitableToSinkOperands(
     // Is it profitable to sink if we found two of the same type of extends.
     return !Ops.empty() && (NumSExts == 2 || NumZExts == 2);
   }
+  case Instruction::FMul: {
+    // Sink splats for index lane variants
+    if (isSplatShuffle(I->getOperand(0)))
+      Ops.push_back(&I->getOperandUse(0));
+    if (isSplatShuffle(I->getOperand(1)))
+      Ops.push_back(&I->getOperandUse(1));
+    return !Ops.empty();
+  }
   default:
     return false;
   }
diff --git a/llvm/test/CodeGen/AArch64/sinksplat.ll b/llvm/test/CodeGen/AArch64/sinksplat.ll
index d156ec079ae941..c94d5bf2a208ff 100644
--- a/llvm/test/CodeGen/AArch64/sinksplat.ll
+++ b/llvm/test/CodeGen/AArch64/sinksplat.ll
@@ -230,29 +230,34 @@ l2:
   ret <4 x i32> %c
 }
 
-define <4 x float> @fmul(<4 x float> %x, ptr %y) {
+define <4 x float> @fmul(ptr %x, ptr %y) {
 ; CHECK-LABEL: fmul:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    mov v1.16b, v0.16b
-; CHECK-NEXT:    ldr q2, [x0]
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
-; CHECK-NEXT:    mov w8, #1 // =0x1
-; CHECK-NEXT:    fmul v1.4s, v2.4s, v1.s[3]
+; CHECK-NEXT:    ldr s1, [x0]
+; CHECK-NEXT:    mov x8, xzr
 ; CHECK-NEXT:  .LBB7_1: // %l1
 ; CHECK-NEXT:    // =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    fadd v0.4s, v1.4s, v0.4s
-; CHECK-NEXT:    subs w8, w8, #1
+; CHECK-NEXT:    ldr q2, [x1, x8]
+; CHECK-NEXT:    add x8, x8, #16
+; CHECK-NEXT:    cmp w8, #16
+; CHECK-NEXT:    fmul v2.4s, v2.4s, v1.s[0]
+; CHECK-NEXT:    fadd v0.4s, v2.4s, v0.4s
 ; CHECK-NEXT:    b.eq .LBB7_1
 ; CHECK-NEXT:  // %bb.2: // %l2
 ; CHECK-NEXT:    ret
 entry:
-  %a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
+  %x.val = load float, ptr %x
+  %x.ins = insertelement <4 x float> poison, float %x.val, i64 0
+  %a = shufflevector <4 x float> %x.ins, <4 x float> undef, <4 x i32> zeroinitializer
   br label %l1
 
 l1:
   %p = phi i32 [ 0, %entry ], [ %pa, %l1 ]
   %q = phi <4 x float> [ zeroinitializer, %entry ], [ %c, %l1 ]
-  %l = load <4 x float>, ptr %y
+  %idx.y = mul nuw nsw i32 %p, 4
+  %ptr.y = getelementptr float, ptr %y, i32 %idx.y
+  %l = load <4 x float>, ptr %ptr.y
   %b = fmul <4 x float> %l, %a
   %c = fadd <4 x float> %b, %q
   %pa = add i32 %p, 1
@@ -270,10 +275,9 @@ define <4 x float> @fmuladd(<4 x float> %x, ptr %y) {
 ; CHECK-NEXT:    movi v0.2d, #0000000000000000
 ; CHECK-NEXT:    ldr q2, [x0]
 ; CHECK-NEXT:    mov w8, #1 // =0x1
-; CHECK-NEXT:    dup v1.4s, v1.s[3]
 ; CHECK-NEXT:  .LBB8_1: // %l1
 ; CHECK-NEXT:    // =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    fmla v0.4s, v1.4s, v2.4s
+; CHECK-NEXT:    fmla v0.4s, v2.4s, v1.s[3]
 ; CHECK-NEXT:    subs w8, w8, #1
 ; CHECK-NEXT:    b.eq .LBB8_1
 ; CHECK-NEXT:  // %bb.2: // %l2

hazzlim · 2024-11-14T13:14:53Z

The first commit here updates one of the tests, as currently the fmul is hoisted out of the loop anyway so we don't test sinking the splat. I changed it to a splat(insertelement(scalar)) so that it doesn't get matched by one of the MachineCombiner patterns in:

llvm-project/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

Line 6789 in 5c3befb

static bool getFMULPatterns(MachineInstr &Root,

Also to note - it looks like we don't have a MachineCombiner pattern for FMLA(DUPLANE) -> FMLA_indexed (we only have the FMUL pattern).

hazzlim · 2024-11-14T17:45:09Z

Updated to avoid sinking for scalable vector types (thanks to @georges-arm for pointing this out!)

- Add tests for half element types, and only sink operands when subtargtet has fullfp16 - Refactor scalable test to use target-features attribute, rather than -mattr on the RUN line

davemgreen

Looks sensible to me. LGTM

hazzlim added 2 commits November 14, 2024 13:00

[CodeGen][AArch64] Sink splat operands of FMul instructions

3686f68

Sink shuffle operands of FMul instructions if these are splats, as we can generate lane-indexed variants for these.

llvmbot added the backend:AArch64 label Nov 14, 2024

hazzlim requested review from jrbyrnes, davemgreen and paulwalker-arm November 14, 2024 13:15

hazzlim added 2 commits November 14, 2024 16:07

Add scalable test

6b9c2e2

Don't sink splat operands when VT is scalable

4ab2719

Improve handling of half element types

698c06b

- Add tests for half element types, and only sink operands when subtargtet has fullfp16 - Refactor scalable test to use target-features attribute, rather than -mattr on the RUN line

hazzlim force-pushed the sink-shuffle-fmul branch from 894b4c5 to 698c06b Compare November 15, 2024 09:32

davemgreen approved these changes Nov 18, 2024

View reviewed changes

hazzlim merged commit 4f0403f into llvm:main Nov 19, 2024
6 of 8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CodeGen][AArch64] Sink splat operands of FMul instructions #116222

[CodeGen][AArch64] Sink splat operands of FMul instructions #116222

Uh oh!

hazzlim commented Nov 14, 2024

Uh oh!

llvmbot commented Nov 14, 2024

Uh oh!

hazzlim commented Nov 14, 2024

Uh oh!

hazzlim commented Nov 14, 2024

Uh oh!

davemgreen left a comment

Uh oh!

Uh oh!

Uh oh!

[CodeGen][AArch64] Sink splat operands of FMul instructions #116222

[CodeGen][AArch64] Sink splat operands of FMul instructions #116222

Uh oh!

Conversation

hazzlim commented Nov 14, 2024

Uh oh!

llvmbot commented Nov 14, 2024

Uh oh!

hazzlim commented Nov 14, 2024

Uh oh!

hazzlim commented Nov 14, 2024

Uh oh!

davemgreen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!