[ARM] Prefer MUL to MULS on some implementations #112540

VladiKrapp-Arm · 2024-10-16T13:17:57Z

MULS adversely affects performance on many implementations. Where this is the case, we prefer not to shrink MUL to MULS.

llvmbot · 2024-10-16T13:18:35Z

@llvm/pr-subscribers-backend-arm

Author: None (VladiKrapp-Arm)

Changes

MULS adversely affects performance on many implementations. Where this is the case, we prefer not to shrink MUL to MULS.

Full diff: https://github.com/llvm/llvm-project/pull/112540.diff

4 Files Affected:

(modified) llvm/lib/Target/ARM/ARMFeatures.td (+7)
(modified) llvm/lib/Target/ARM/ARMProcessors.td (+1)
(modified) llvm/lib/Target/ARM/Thumb2SizeReduction.cpp (+3)
(modified) llvm/test/CodeGen/Thumb2/avoidmuls.mir (+2-2)

diff --git a/llvm/lib/Target/ARM/ARMFeatures.td b/llvm/lib/Target/ARM/ARMFeatures.td
index 3a2188adbec33b..bb437698296ce8 100644
--- a/llvm/lib/Target/ARM/ARMFeatures.td
+++ b/llvm/lib/Target/ARM/ARMFeatures.td
@@ -398,6 +398,13 @@ def FeatureAvoidPartialCPSR : SubtargetFeature<"avoid-partial-cpsr",
                                                "AvoidCPSRPartialUpdate", "true",
                                  "Avoid CPSR partial update for OOO execution">;
 
+/// FeatureAvoidMULS - If true, codegen would avoid using the MULS instruction,
+/// prefering the thumb2 MUL which doesn't set flags.
+def FeatureAvoidMULS : SubtargetFeature<"avoid-muls",
+                                        "AvoidMULS", "true",
+                                 "Avoid MULS instructions for M class cores">;
+
+
 /// Disable +1 predication cost for instructions updating CPSR.
 /// Enabled for Cortex-A57.
 /// True if disable +1 predication cost for instructions updating CPSR. Enabled for Cortex-A57.
diff --git a/llvm/lib/Target/ARM/ARMProcessors.td b/llvm/lib/Target/ARM/ARMProcessors.td
index 08f62d12f4a9f1..b94a5fc1614697 100644
--- a/llvm/lib/Target/ARM/ARMProcessors.td
+++ b/llvm/lib/Target/ARM/ARMProcessors.td
@@ -360,6 +360,7 @@ def : ProcessorModel<"cortex-m33", CortexM4Model,       [ARMv8mMainline,
                                                          FeatureHasSlowFPVFMx,
                                                          FeatureUseMISched,
                                                          FeatureHasNoBranchPredictor,
+                                                         FeatureAvoidMULS,
                                                          FeatureFixCMSE_CVE_2021_35465]>;
 
 def : ProcessorModel<"star-mc1", CortexM4Model,         [ARMv8mMainline,
diff --git a/llvm/lib/Target/ARM/Thumb2SizeReduction.cpp b/llvm/lib/Target/ARM/Thumb2SizeReduction.cpp
index f572af98600738..f4a9915a78b99d 100644
--- a/llvm/lib/Target/ARM/Thumb2SizeReduction.cpp
+++ b/llvm/lib/Target/ARM/Thumb2SizeReduction.cpp
@@ -755,6 +755,9 @@ Thumb2SizeReduce::ReduceTo2Addr(MachineBasicBlock &MBB, MachineInstr *MI,
   Register Reg1 = MI->getOperand(1).getReg();
   // t2MUL is "special". The tied source operand is second, not first.
   if (MI->getOpcode() == ARM::t2MUL) {
+    // MULS can be slower than MUL
+    if (!MinimizeSize && STI->avoidMULS())
+      return false;
     Register Reg2 = MI->getOperand(2).getReg();
     // Early exit if the regs aren't all low regs.
     if (!isARMLowRegister(Reg0) || !isARMLowRegister(Reg1)
diff --git a/llvm/test/CodeGen/Thumb2/avoidmuls.mir b/llvm/test/CodeGen/Thumb2/avoidmuls.mir
index 8d5567482d5cd7..09b7e62bee04e3 100644
--- a/llvm/test/CodeGen/Thumb2/avoidmuls.mir
+++ b/llvm/test/CodeGen/Thumb2/avoidmuls.mir
@@ -63,5 +63,5 @@ body:             |
 
 ...
 # CHECK-LABEL: test
-# CHECK: tMUL
-# CHECK-NOT: t2MUL
+# CHECK: t2MUL
+# CHECK-NOT: tMUL

llvm/test/CodeGen/Thumb2/avoidmuls.mir

davemgreen · 2024-10-16T15:29:28Z

Thanks. LGTM

stuij

LGTM too. Thanks!

AlbertHuang-CPU · 2024-10-23T10:14:25Z

If the mul instruction is in the IT block, there won't be MULS. Is this useful as a condition to help judging shrink or not?
Why only m33 added with this feature? Do we need to add it to the ProcessorModel of other other Armv8-M CPUs?

davemgreen · 2024-10-23T13:03:49Z

Hello. I believe that it is only the M33 where muls is slower than the mul. The Cortex-M52 for example lists them as having the same latency, and the T1 variable can potentially dual-issue (I believe), if it is in the right slot. If you do have other CPUs where this is helpful the feature could be added there too.

I'm not sure I fully understand the first sentence. I believe that for mul in IT blocks we will not attempt to do the shrinking from T2->T1, so we don't have to prevent it on CPUs where the muls is slower than mul.

AlbertHuang-CPU · 2025-01-08T06:58:28Z

Thanks @davemgreen
As checked/discussed offline, in some scenario mul is as slower as muls on M33, and STAR-MC1 is the same as M33 in terms of these behavior.
So I will add the feature also.

PR #112540 as the reference.

PR llvm#112540 as the reference.

[ARM] Prefer MUL to MULS on some implementations

064f507

MULS adversely affects performance on many implementations. Where this is the case, we prefer not to shrink MUL to MULS.

llvmbot added the backend:ARM label Oct 16, 2024

davemgreen reviewed Oct 16, 2024

View reviewed changes

llvm/test/CodeGen/Thumb2/avoidmuls.mir Show resolved Hide resolved

fixup! [ARM] Prefer MUL to MULS on some implementations

475683b

davemgreen approved these changes Oct 16, 2024

View reviewed changes

davemgreen requested a review from stuij October 16, 2024 15:31

stuij approved these changes Oct 16, 2024

View reviewed changes

stuij merged commit ea796e5 into llvm:main Oct 17, 2024
8 checks passed

VladiKrapp-Arm mentioned this pull request Dec 6, 2024

Request Commit Access For VladiKrapp-Arm #118998

Closed

AlbertHuang-CPU mentioned this pull request Jan 21, 2025

Add "avoidmuls" to STAR-MC1 also #123706

Merged

davemgreen pushed a commit that referenced this pull request Feb 5, 2025

[ARM] Add "avoidmuls" to STAR-MC1 also (#123706)

409fa78

PR #112540 as the reference.

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025

[ARM] Add "avoidmuls" to STAR-MC1 also (llvm#123706)

b15f60a

PR llvm#112540 as the reference.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ARM] Prefer MUL to MULS on some implementations #112540

[ARM] Prefer MUL to MULS on some implementations #112540

Uh oh!

VladiKrapp-Arm commented Oct 16, 2024

Uh oh!

llvmbot commented Oct 16, 2024

Uh oh!

Uh oh!

davemgreen commented Oct 16, 2024

Uh oh!

stuij left a comment

Uh oh!

Uh oh!

AlbertHuang-CPU commented Oct 23, 2024

Uh oh!

davemgreen commented Oct 23, 2024

Uh oh!

AlbertHuang-CPU commented Jan 8, 2025

Uh oh!

Uh oh!

[ARM] Prefer MUL to MULS on some implementations #112540

[ARM] Prefer MUL to MULS on some implementations #112540

Uh oh!

Conversation

VladiKrapp-Arm commented Oct 16, 2024

Uh oh!

llvmbot commented Oct 16, 2024

Uh oh!

Uh oh!

davemgreen commented Oct 16, 2024

Uh oh!

stuij left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

AlbertHuang-CPU commented Oct 23, 2024

Uh oh!

davemgreen commented Oct 23, 2024

Uh oh!

AlbertHuang-CPU commented Jan 8, 2025

Uh oh!

Uh oh!