Skip to content

[ARM] Prefer MUL to MULS on some implementations #112540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 17, 2024

Conversation

VladiKrapp-Arm
Copy link
Contributor

MULS adversely affects performance on many implementations. Where this is the case, we prefer not to shrink MUL to MULS.

MULS adversely affects performance on many implementations.
Where this is the case, we prefer not to shrink MUL to MULS.
@llvmbot
Copy link
Member

llvmbot commented Oct 16, 2024

@llvm/pr-subscribers-backend-arm

Author: None (VladiKrapp-Arm)

Changes

MULS adversely affects performance on many implementations. Where this is the case, we prefer not to shrink MUL to MULS.


Full diff: https://github.com/llvm/llvm-project/pull/112540.diff

4 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMFeatures.td (+7)
  • (modified) llvm/lib/Target/ARM/ARMProcessors.td (+1)
  • (modified) llvm/lib/Target/ARM/Thumb2SizeReduction.cpp (+3)
  • (modified) llvm/test/CodeGen/Thumb2/avoidmuls.mir (+2-2)
diff --git a/llvm/lib/Target/ARM/ARMFeatures.td b/llvm/lib/Target/ARM/ARMFeatures.td
index 3a2188adbec33b..bb437698296ce8 100644
--- a/llvm/lib/Target/ARM/ARMFeatures.td
+++ b/llvm/lib/Target/ARM/ARMFeatures.td
@@ -398,6 +398,13 @@ def FeatureAvoidPartialCPSR : SubtargetFeature<"avoid-partial-cpsr",
                                                "AvoidCPSRPartialUpdate", "true",
                                  "Avoid CPSR partial update for OOO execution">;
 
+/// FeatureAvoidMULS - If true, codegen would avoid using the MULS instruction,
+/// prefering the thumb2 MUL which doesn't set flags.
+def FeatureAvoidMULS : SubtargetFeature<"avoid-muls",
+                                        "AvoidMULS", "true",
+                                 "Avoid MULS instructions for M class cores">;
+
+
 /// Disable +1 predication cost for instructions updating CPSR.
 /// Enabled for Cortex-A57.
 /// True if disable +1 predication cost for instructions updating CPSR. Enabled for Cortex-A57.
diff --git a/llvm/lib/Target/ARM/ARMProcessors.td b/llvm/lib/Target/ARM/ARMProcessors.td
index 08f62d12f4a9f1..b94a5fc1614697 100644
--- a/llvm/lib/Target/ARM/ARMProcessors.td
+++ b/llvm/lib/Target/ARM/ARMProcessors.td
@@ -360,6 +360,7 @@ def : ProcessorModel<"cortex-m33", CortexM4Model,       [ARMv8mMainline,
                                                          FeatureHasSlowFPVFMx,
                                                          FeatureUseMISched,
                                                          FeatureHasNoBranchPredictor,
+                                                         FeatureAvoidMULS,
                                                          FeatureFixCMSE_CVE_2021_35465]>;
 
 def : ProcessorModel<"star-mc1", CortexM4Model,         [ARMv8mMainline,
diff --git a/llvm/lib/Target/ARM/Thumb2SizeReduction.cpp b/llvm/lib/Target/ARM/Thumb2SizeReduction.cpp
index f572af98600738..f4a9915a78b99d 100644
--- a/llvm/lib/Target/ARM/Thumb2SizeReduction.cpp
+++ b/llvm/lib/Target/ARM/Thumb2SizeReduction.cpp
@@ -755,6 +755,9 @@ Thumb2SizeReduce::ReduceTo2Addr(MachineBasicBlock &MBB, MachineInstr *MI,
   Register Reg1 = MI->getOperand(1).getReg();
   // t2MUL is "special". The tied source operand is second, not first.
   if (MI->getOpcode() == ARM::t2MUL) {
+    // MULS can be slower than MUL
+    if (!MinimizeSize && STI->avoidMULS())
+      return false;
     Register Reg2 = MI->getOperand(2).getReg();
     // Early exit if the regs aren't all low regs.
     if (!isARMLowRegister(Reg0) || !isARMLowRegister(Reg1)
diff --git a/llvm/test/CodeGen/Thumb2/avoidmuls.mir b/llvm/test/CodeGen/Thumb2/avoidmuls.mir
index 8d5567482d5cd7..09b7e62bee04e3 100644
--- a/llvm/test/CodeGen/Thumb2/avoidmuls.mir
+++ b/llvm/test/CodeGen/Thumb2/avoidmuls.mir
@@ -63,5 +63,5 @@ body:             |
 
 ...
 # CHECK-LABEL: test
-# CHECK: tMUL
-# CHECK-NOT: t2MUL
+# CHECK: t2MUL
+# CHECK-NOT: tMUL

@davemgreen
Copy link
Collaborator

Thanks. LGTM

@davemgreen davemgreen requested a review from stuij October 16, 2024 15:31
Copy link
Member

@stuij stuij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too. Thanks!

@stuij stuij merged commit ea796e5 into llvm:main Oct 17, 2024
8 checks passed
@AlbertHuang-CPU
Copy link
Contributor

If the mul instruction is in the IT block, there won't be MULS. Is this useful as a condition to help judging shrink or not?
Why only m33 added with this feature? Do we need to add it to the ProcessorModel of other other Armv8-M CPUs?

@davemgreen
Copy link
Collaborator

Hello. I believe that it is only the M33 where muls is slower than the mul. The Cortex-M52 for example lists them as having the same latency, and the T1 variable can potentially dual-issue (I believe), if it is in the right slot. If you do have other CPUs where this is helpful the feature could be added there too.

I'm not sure I fully understand the first sentence. I believe that for mul in IT blocks we will not attempt to do the shrinking from T2->T1, so we don't have to prevent it on CPUs where the muls is slower than mul.

@AlbertHuang-CPU
Copy link
Contributor

Thanks @davemgreen
As checked/discussed offline, in some scenario mul is as slower as muls on M33, and STAR-MC1 is the same as M33 in terms of these behavior.
So I will add the feature also.

davemgreen pushed a commit that referenced this pull request Feb 5, 2025
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants