-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[ARM] Prefer MUL to MULS on some implementations #112540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
MULS adversely affects performance on many implementations. Where this is the case, we prefer not to shrink MUL to MULS.
@llvm/pr-subscribers-backend-arm Author: None (VladiKrapp-Arm) ChangesMULS adversely affects performance on many implementations. Where this is the case, we prefer not to shrink MUL to MULS. Full diff: https://github.com/llvm/llvm-project/pull/112540.diff 4 Files Affected:
diff --git a/llvm/lib/Target/ARM/ARMFeatures.td b/llvm/lib/Target/ARM/ARMFeatures.td
index 3a2188adbec33b..bb437698296ce8 100644
--- a/llvm/lib/Target/ARM/ARMFeatures.td
+++ b/llvm/lib/Target/ARM/ARMFeatures.td
@@ -398,6 +398,13 @@ def FeatureAvoidPartialCPSR : SubtargetFeature<"avoid-partial-cpsr",
"AvoidCPSRPartialUpdate", "true",
"Avoid CPSR partial update for OOO execution">;
+/// FeatureAvoidMULS - If true, codegen would avoid using the MULS instruction,
+/// prefering the thumb2 MUL which doesn't set flags.
+def FeatureAvoidMULS : SubtargetFeature<"avoid-muls",
+ "AvoidMULS", "true",
+ "Avoid MULS instructions for M class cores">;
+
+
/// Disable +1 predication cost for instructions updating CPSR.
/// Enabled for Cortex-A57.
/// True if disable +1 predication cost for instructions updating CPSR. Enabled for Cortex-A57.
diff --git a/llvm/lib/Target/ARM/ARMProcessors.td b/llvm/lib/Target/ARM/ARMProcessors.td
index 08f62d12f4a9f1..b94a5fc1614697 100644
--- a/llvm/lib/Target/ARM/ARMProcessors.td
+++ b/llvm/lib/Target/ARM/ARMProcessors.td
@@ -360,6 +360,7 @@ def : ProcessorModel<"cortex-m33", CortexM4Model, [ARMv8mMainline,
FeatureHasSlowFPVFMx,
FeatureUseMISched,
FeatureHasNoBranchPredictor,
+ FeatureAvoidMULS,
FeatureFixCMSE_CVE_2021_35465]>;
def : ProcessorModel<"star-mc1", CortexM4Model, [ARMv8mMainline,
diff --git a/llvm/lib/Target/ARM/Thumb2SizeReduction.cpp b/llvm/lib/Target/ARM/Thumb2SizeReduction.cpp
index f572af98600738..f4a9915a78b99d 100644
--- a/llvm/lib/Target/ARM/Thumb2SizeReduction.cpp
+++ b/llvm/lib/Target/ARM/Thumb2SizeReduction.cpp
@@ -755,6 +755,9 @@ Thumb2SizeReduce::ReduceTo2Addr(MachineBasicBlock &MBB, MachineInstr *MI,
Register Reg1 = MI->getOperand(1).getReg();
// t2MUL is "special". The tied source operand is second, not first.
if (MI->getOpcode() == ARM::t2MUL) {
+ // MULS can be slower than MUL
+ if (!MinimizeSize && STI->avoidMULS())
+ return false;
Register Reg2 = MI->getOperand(2).getReg();
// Early exit if the regs aren't all low regs.
if (!isARMLowRegister(Reg0) || !isARMLowRegister(Reg1)
diff --git a/llvm/test/CodeGen/Thumb2/avoidmuls.mir b/llvm/test/CodeGen/Thumb2/avoidmuls.mir
index 8d5567482d5cd7..09b7e62bee04e3 100644
--- a/llvm/test/CodeGen/Thumb2/avoidmuls.mir
+++ b/llvm/test/CodeGen/Thumb2/avoidmuls.mir
@@ -63,5 +63,5 @@ body: |
...
# CHECK-LABEL: test
-# CHECK: tMUL
-# CHECK-NOT: t2MUL
+# CHECK: t2MUL
+# CHECK-NOT: tMUL
|
Thanks. LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM too. Thanks!
If the mul instruction is in the IT block, there won't be MULS. Is this useful as a condition to help judging shrink or not? |
Hello. I believe that it is only the M33 where muls is slower than the mul. The Cortex-M52 for example lists them as having the same latency, and the T1 variable can potentially dual-issue (I believe), if it is in the right slot. If you do have other CPUs where this is helpful the feature could be added there too. I'm not sure I fully understand the first sentence. I believe that for mul in IT blocks we will not attempt to do the shrinking from T2->T1, so we don't have to prevent it on CPUs where the muls is slower than mul. |
Thanks @davemgreen |
PR #112540 as the reference.
PR llvm#112540 as the reference.
MULS adversely affects performance on many implementations. Where this is the case, we prefer not to shrink MUL to MULS.