-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AArch64] Update the scheduling model for Cortex-X1/2/3/4 #118826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
These Neoverse-V scheduling models more closely match the Cortex-X series cpus with 4 vector pipelines, even if they do not match exactly.
@llvm/pr-subscribers-backend-aarch64 Author: David Green (davemgreen) ChangesThese Neoverse-V scheduling models more closely match the Cortex-X series cpus with 4 vector pipelines, even if they do not match exactly. Full diff: https://github.com/llvm/llvm-project/pull/118826.diff 5 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index 6886df5392565d..af9554085cacde 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -1113,15 +1113,15 @@ def : ProcessorModel<"cortex-r82", CortexA55Model, ProcessorFeatures.R82,
[TuneR82]>;
def : ProcessorModel<"cortex-r82ae", CortexA55Model, ProcessorFeatures.R82AE,
[TuneR82AE]>;
-def : ProcessorModel<"cortex-x1", CortexA57Model, ProcessorFeatures.X1,
+def : ProcessorModel<"cortex-x1", NeoverseV1Model, ProcessorFeatures.X1,
[TuneX1]>;
-def : ProcessorModel<"cortex-x1c", CortexA57Model, ProcessorFeatures.X1C,
+def : ProcessorModel<"cortex-x1c", NeoverseV1Model, ProcessorFeatures.X1C,
[TuneX1]>;
-def : ProcessorModel<"cortex-x2", NeoverseN2Model, ProcessorFeatures.X2,
+def : ProcessorModel<"cortex-x2", NeoverseV2Model, ProcessorFeatures.X2,
[TuneX2]>;
-def : ProcessorModel<"cortex-x3", NeoverseN2Model, ProcessorFeatures.X3,
+def : ProcessorModel<"cortex-x3", NeoverseV2Model, ProcessorFeatures.X3,
[TuneX3]>;
-def : ProcessorModel<"cortex-x4", NeoverseN2Model, ProcessorFeatures.X4,
+def : ProcessorModel<"cortex-x4", NeoverseV2Model, ProcessorFeatures.X4,
[TuneX4]>;
def : ProcessorModel<"cortex-x925", NeoverseV2Model, ProcessorFeatures.X925,
[TuneX925]>;
diff --git a/llvm/test/tools/llvm-mca/AArch64/Cortex/X1-neon-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Cortex/X1-neon-instructions.s
new file mode 100644
index 00000000000000..dc1bb486aeef7d
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/AArch64/Cortex/X1-neon-instructions.s
@@ -0,0 +1,45 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-x1 -instruction-tables < %s | FileCheck %s
+
+# Check the Neoverse V1 model is used.
+
+add v0.16b, v1.16b, v31.16b
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 1 2 0.25 add v0.16b, v1.16b, v31.16b
+
+# CHECK: Resources:
+# CHECK-NEXT: [0.0] - V1UnitB
+# CHECK-NEXT: [0.1] - V1UnitB
+# CHECK-NEXT: [1.0] - V1UnitD
+# CHECK-NEXT: [1.1] - V1UnitD
+# CHECK-NEXT: [2.0] - V1UnitFlg
+# CHECK-NEXT: [2.1] - V1UnitFlg
+# CHECK-NEXT: [2.2] - V1UnitFlg
+# CHECK-NEXT: [3] - V1UnitL2
+# CHECK-NEXT: [4.0] - V1UnitL01
+# CHECK-NEXT: [4.1] - V1UnitL01
+# CHECK-NEXT: [5] - V1UnitM0
+# CHECK-NEXT: [6] - V1UnitM1
+# CHECK-NEXT: [7.0] - V1UnitS
+# CHECK-NEXT: [7.1] - V1UnitS
+# CHECK-NEXT: [8] - V1UnitV0
+# CHECK-NEXT: [9] - V1UnitV1
+# CHECK-NEXT: [10] - V1UnitV2
+# CHECK-NEXT: [11] - V1UnitV3
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2.0] [2.1] [2.2] [3] [4.0] [4.1] [5] [6] [7.0] [7.1] [8] [9] [10] [11]
+# CHECK-NEXT: - - - - - - - - - - - - - - 0.25 0.25 0.25 0.25
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2.0] [2.1] [2.2] [3] [4.0] [4.1] [5] [6] [7.0] [7.1] [8] [9] [10] [11] Instructions:
+# CHECK-NEXT: - - - - - - - - - - - - - - 0.25 0.25 0.25 0.25 add v0.16b, v1.16b, v31.16b
diff --git a/llvm/test/tools/llvm-mca/AArch64/Cortex/X2-sve-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Cortex/X2-sve-instructions.s
index 2912ea35f1ee88..6497860ecfbacb 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Cortex/X2-sve-instructions.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Cortex/X2-sve-instructions.s
@@ -1,7 +1,7 @@
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-x2 -instruction-tables < %s | FileCheck %s
-# Check the Neoverse N2 model is used.
+# Check the Neoverse V2 model is used.
addhnb z0.b, z1.h, z31.h
@@ -14,27 +14,34 @@ addhnb z0.b, z1.h, z31.h
# CHECK-NEXT: [6]: HasSideEffects (U)
# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
-# CHECK-NEXT: 1 2 0.50 addhnb z0.b, z1.h, z31.h
+# CHECK-NEXT: 1 2 0.25 addhnb z0.b, z1.h, z31.h
# CHECK: Resources:
-# CHECK-NEXT: [0.0] - N2UnitB
-# CHECK-NEXT: [0.1] - N2UnitB
-# CHECK-NEXT: [1.0] - N2UnitD
-# CHECK-NEXT: [1.1] - N2UnitD
-# CHECK-NEXT: [2] - N2UnitL2
-# CHECK-NEXT: [3.0] - N2UnitL01
-# CHECK-NEXT: [3.1] - N2UnitL01
-# CHECK-NEXT: [4] - N2UnitM0
-# CHECK-NEXT: [5] - N2UnitM1
-# CHECK-NEXT: [6.0] - N2UnitS
-# CHECK-NEXT: [6.1] - N2UnitS
-# CHECK-NEXT: [7] - N2UnitV0
-# CHECK-NEXT: [8] - N2UnitV1
+# CHECK-NEXT: [0.0] - V2UnitB
+# CHECK-NEXT: [0.1] - V2UnitB
+# CHECK-NEXT: [1.0] - V2UnitD
+# CHECK-NEXT: [1.1] - V2UnitD
+# CHECK-NEXT: [2.0] - V2UnitFlg
+# CHECK-NEXT: [2.1] - V2UnitFlg
+# CHECK-NEXT: [2.2] - V2UnitFlg
+# CHECK-NEXT: [3] - V2UnitL2
+# CHECK-NEXT: [4.0] - V2UnitL01
+# CHECK-NEXT: [4.1] - V2UnitL01
+# CHECK-NEXT: [5] - V2UnitM0
+# CHECK-NEXT: [6] - V2UnitM1
+# CHECK-NEXT: [7] - V2UnitS0
+# CHECK-NEXT: [8] - V2UnitS1
+# CHECK-NEXT: [9] - V2UnitS2
+# CHECK-NEXT: [10] - V2UnitS3
+# CHECK-NEXT: [11] - V2UnitV0
+# CHECK-NEXT: [12] - V2UnitV1
+# CHECK-NEXT: [13] - V2UnitV2
+# CHECK-NEXT: [14] - V2UnitV3
# CHECK: Resource pressure per iteration:
-# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2] [3.0] [3.1] [4] [5] [6.0] [6.1] [7] [8]
-# CHECK-NEXT: - - - - - - - - - - - 0.50 0.50
+# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2.0] [2.1] [2.2] [3] [4.0] [4.1] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14]
+# CHECK-NEXT: - - - - - - - - - - - - - - - - 0.25 0.25 0.25 0.25
# CHECK: Resource pressure by instruction:
-# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2] [3.0] [3.1] [4] [5] [6.0] [6.1] [7] [8] Instructions:
-# CHECK-NEXT: - - - - - - - - - - - 0.50 0.50 addhnb z0.b, z1.h, z31.h
+# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2.0] [2.1] [2.2] [3] [4.0] [4.1] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] Instructions:
+# CHECK-NEXT: - - - - - - - - - - - - - - - - 0.25 0.25 0.25 0.25 addhnb z0.b, z1.h, z31.h
diff --git a/llvm/test/tools/llvm-mca/AArch64/Cortex/X3-sve-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Cortex/X3-sve-instructions.s
new file mode 100644
index 00000000000000..042e621f9a03d6
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/AArch64/Cortex/X3-sve-instructions.s
@@ -0,0 +1,47 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-x3 -instruction-tables < %s | FileCheck %s
+
+# Check the Neoverse V2 model is used.
+
+addhnb z0.b, z1.h, z31.h
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 1 2 0.25 addhnb z0.b, z1.h, z31.h
+
+# CHECK: Resources:
+# CHECK-NEXT: [0.0] - V2UnitB
+# CHECK-NEXT: [0.1] - V2UnitB
+# CHECK-NEXT: [1.0] - V2UnitD
+# CHECK-NEXT: [1.1] - V2UnitD
+# CHECK-NEXT: [2.0] - V2UnitFlg
+# CHECK-NEXT: [2.1] - V2UnitFlg
+# CHECK-NEXT: [2.2] - V2UnitFlg
+# CHECK-NEXT: [3] - V2UnitL2
+# CHECK-NEXT: [4.0] - V2UnitL01
+# CHECK-NEXT: [4.1] - V2UnitL01
+# CHECK-NEXT: [5] - V2UnitM0
+# CHECK-NEXT: [6] - V2UnitM1
+# CHECK-NEXT: [7] - V2UnitS0
+# CHECK-NEXT: [8] - V2UnitS1
+# CHECK-NEXT: [9] - V2UnitS2
+# CHECK-NEXT: [10] - V2UnitS3
+# CHECK-NEXT: [11] - V2UnitV0
+# CHECK-NEXT: [12] - V2UnitV1
+# CHECK-NEXT: [13] - V2UnitV2
+# CHECK-NEXT: [14] - V2UnitV3
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2.0] [2.1] [2.2] [3] [4.0] [4.1] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14]
+# CHECK-NEXT: - - - - - - - - - - - - - - - - 0.25 0.25 0.25 0.25
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2.0] [2.1] [2.2] [3] [4.0] [4.1] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] Instructions:
+# CHECK-NEXT: - - - - - - - - - - - - - - - - 0.25 0.25 0.25 0.25 addhnb z0.b, z1.h, z31.h
diff --git a/llvm/test/tools/llvm-mca/AArch64/Cortex/X4-sve-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Cortex/X4-sve-instructions.s
new file mode 100644
index 00000000000000..19fba62ea30c6b
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/AArch64/Cortex/X4-sve-instructions.s
@@ -0,0 +1,47 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-x4 -instruction-tables < %s | FileCheck %s
+
+# Check the Neoverse V2 model is used.
+
+addhnb z0.b, z1.h, z31.h
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 1 2 0.25 addhnb z0.b, z1.h, z31.h
+
+# CHECK: Resources:
+# CHECK-NEXT: [0.0] - V2UnitB
+# CHECK-NEXT: [0.1] - V2UnitB
+# CHECK-NEXT: [1.0] - V2UnitD
+# CHECK-NEXT: [1.1] - V2UnitD
+# CHECK-NEXT: [2.0] - V2UnitFlg
+# CHECK-NEXT: [2.1] - V2UnitFlg
+# CHECK-NEXT: [2.2] - V2UnitFlg
+# CHECK-NEXT: [3] - V2UnitL2
+# CHECK-NEXT: [4.0] - V2UnitL01
+# CHECK-NEXT: [4.1] - V2UnitL01
+# CHECK-NEXT: [5] - V2UnitM0
+# CHECK-NEXT: [6] - V2UnitM1
+# CHECK-NEXT: [7] - V2UnitS0
+# CHECK-NEXT: [8] - V2UnitS1
+# CHECK-NEXT: [9] - V2UnitS2
+# CHECK-NEXT: [10] - V2UnitS3
+# CHECK-NEXT: [11] - V2UnitV0
+# CHECK-NEXT: [12] - V2UnitV1
+# CHECK-NEXT: [13] - V2UnitV2
+# CHECK-NEXT: [14] - V2UnitV3
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2.0] [2.1] [2.2] [3] [4.0] [4.1] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14]
+# CHECK-NEXT: - - - - - - - - - - - - - - - - 0.25 0.25 0.25 0.25
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2.0] [2.1] [2.2] [3] [4.0] [4.1] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] Instructions:
+# CHECK-NEXT: - - - - - - - - - - - - - - - - 0.25 0.25 0.25 0.25 addhnb z0.b, z1.h, z31.h
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice. How much extra performance (roughly) does this provide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM cheers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, makes sense, thanks.
Thanks
Scheduling on OoO machines doesn't usually give a lot of performance, when taken as an aggregate. The OoO machines can do a lot of scheduling themselves so long as we get it close enough, and my tests it was not very much. This is closer in terms of the correct number of pipelines though, which would help if we start to use them more. |
These Neoverse-V scheduling models more closely match the Cortex-X series cpus with 4 vector pipelines, even if they do not match exactly.