-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LLVM][ARM][CodeGen]Define branch instruction alignment for m85 and m7 #109647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Branch instruction alignments were not defined for cortex-m85 and cortex-m7 which misses an optimisation opportunity. With this patch we see a performance improvements as high as 5% on some benchmarks with most around 1%. Change-Id: I6c7c51f1e3fdff3b53b9c10f6ed87ec5b19fdcf3
@llvm/pr-subscribers-backend-arm Author: Nashe Mncube (nasherm) ChangesBranch instruction alignments were not defined for cortex-m85 and cortex-m7 which misses an optimisation opportunity. With this patch we see a performance improvements as high as 5% on some benchmarks with most around 1%. Full diff: https://github.com/llvm/llvm-project/pull/109647.diff 3 Files Affected:
diff --git a/llvm/lib/Target/ARM/ARMFeatures.td b/llvm/lib/Target/ARM/ARMFeatures.td
index 8b0ade54b46d3c..dc0e86c696f63a 100644
--- a/llvm/lib/Target/ARM/ARMFeatures.td
+++ b/llvm/lib/Target/ARM/ARMFeatures.td
@@ -375,6 +375,9 @@ def FeaturePref32BitThumb : SubtargetFeature<"32bit", "Prefers32BitThumb", "true
def FeaturePrefLoopAlign32 : SubtargetFeature<"loop-align", "PrefLoopLogAlignment","2",
"Prefer 32-bit alignment for loops">;
+def FeaturePrefLoopAlign64 : SubtargetFeature<"loop-align-64", "PrefLoopLogAlignment","3",
+ "Prefer 64-bit alignment for loops">;
+
def FeatureMVEVectorCostFactor1 : SubtargetFeature<"mve1beat", "MVEVectorCostFactor", "4",
"Model MVE instructions as a 1 beat per tick architecture">;
diff --git a/llvm/lib/Target/ARM/ARMProcessors.td b/llvm/lib/Target/ARM/ARMProcessors.td
index e4e122a0d1339b..a66a2c0b1981d8 100644
--- a/llvm/lib/Target/ARM/ARMProcessors.td
+++ b/llvm/lib/Target/ARM/ARMProcessors.td
@@ -344,6 +344,7 @@ def : ProcessorModel<"cortex-m4", CortexM4Model, [ARMv7em,
def : ProcessorModel<"cortex-m7", CortexM7Model, [ARMv7em,
ProcM7,
FeatureFPARMv8_D16,
+ FeaturePrefLoopAlign64,
FeatureUseMIPipeliner,
FeatureUseMISched]>;
@@ -385,6 +386,7 @@ def : ProcessorModel<"cortex-m85", CortexM85Model, [ARMv81mMainline,
FeatureDSP,
FeatureFPARMv8_D16,
FeaturePACBTI,
+ FeaturePrefLoopAlign64,
FeatureUseMISched,
HasMVEFloatOps]>;
diff --git a/llvm/test/CodeGen/ARM/preferred-function-alignment.ll b/llvm/test/CodeGen/ARM/preferred-function-alignment.ll
index afe64a22c5e808..f3a227c4765eb8 100644
--- a/llvm/test/CodeGen/ARM/preferred-function-alignment.ll
+++ b/llvm/test/CodeGen/ARM/preferred-function-alignment.ll
@@ -1,14 +1,15 @@
-; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m85 < %s | FileCheck --check-prefixes=CHECK,ALIGN-16,ALIGN-CS-16 %s
+; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m85 < %s | FileCheck --check-prefixes=CHECK,ALIGN-64,ALIGN-CS-16 %s
; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m23 < %s | FileCheck --check-prefixes=CHECK,ALIGN-16,ALIGN-CS-16 %s
; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-a5 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32,ALIGN-CS-32 %s
; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m33 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32,ALIGN-CS-16 %s
; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m55 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32,ALIGN-CS-16 %s
-
+; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m7 < %s | FileCheck --check-prefixes=CHECK,ALIGN-64,ALIGN-CS-16 %s
; CHECK-LABEL: test
; ALIGN-16: .p2align 1
; ALIGN-32: .p2align 2
+; ALIGN-64: .p2align 3
define void @test() {
ret void
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this sounds OK. LGTM
(We should maybe be renaming FeaturePrefLoopAlign if it aligns both functions and loops).
Agreed. This should be quick. I'll open a separate PR to do this this week :) |
Branch instruction alignments were not defined for cortex-m85 and cortex-m7 which misses an optimisation opportunity. With this patch we see performance improvements as high as 5% on some benchmarks with most around 1%.