Skip to content

[AArch64] Update the scheduling model for Cortex-X1/2/3/4 #118826

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 6, 2024

Conversation

davemgreen
Copy link
Collaborator

These Neoverse-V scheduling models more closely match the Cortex-X series cpus with 4 vector pipelines, even if they do not match exactly.

These Neoverse-V scheduling models more closely match the Cortex-X series cpus
with 4 vector pipelines, even if they do not match exactly.
@llvmbot
Copy link
Member

llvmbot commented Dec 5, 2024

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

These Neoverse-V scheduling models more closely match the Cortex-X series cpus with 4 vector pipelines, even if they do not match exactly.


Full diff: https://github.com/llvm/llvm-project/pull/118826.diff

5 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64Processors.td (+5-5)
  • (added) llvm/test/tools/llvm-mca/AArch64/Cortex/X1-neon-instructions.s (+45)
  • (modified) llvm/test/tools/llvm-mca/AArch64/Cortex/X2-sve-instructions.s (+26-19)
  • (added) llvm/test/tools/llvm-mca/AArch64/Cortex/X3-sve-instructions.s (+47)
  • (added) llvm/test/tools/llvm-mca/AArch64/Cortex/X4-sve-instructions.s (+47)
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index 6886df5392565d..af9554085cacde 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -1113,15 +1113,15 @@ def : ProcessorModel<"cortex-r82", CortexA55Model, ProcessorFeatures.R82,
                      [TuneR82]>;
 def : ProcessorModel<"cortex-r82ae", CortexA55Model, ProcessorFeatures.R82AE,
                      [TuneR82AE]>;
-def : ProcessorModel<"cortex-x1", CortexA57Model, ProcessorFeatures.X1,
+def : ProcessorModel<"cortex-x1", NeoverseV1Model, ProcessorFeatures.X1,
                      [TuneX1]>;
-def : ProcessorModel<"cortex-x1c", CortexA57Model, ProcessorFeatures.X1C,
+def : ProcessorModel<"cortex-x1c", NeoverseV1Model, ProcessorFeatures.X1C,
                      [TuneX1]>;
-def : ProcessorModel<"cortex-x2", NeoverseN2Model, ProcessorFeatures.X2,
+def : ProcessorModel<"cortex-x2", NeoverseV2Model, ProcessorFeatures.X2,
                      [TuneX2]>;
-def : ProcessorModel<"cortex-x3", NeoverseN2Model, ProcessorFeatures.X3,
+def : ProcessorModel<"cortex-x3", NeoverseV2Model, ProcessorFeatures.X3,
                      [TuneX3]>;
-def : ProcessorModel<"cortex-x4", NeoverseN2Model, ProcessorFeatures.X4,
+def : ProcessorModel<"cortex-x4", NeoverseV2Model, ProcessorFeatures.X4,
                      [TuneX4]>;
 def : ProcessorModel<"cortex-x925", NeoverseV2Model, ProcessorFeatures.X925,
                      [TuneX925]>;
diff --git a/llvm/test/tools/llvm-mca/AArch64/Cortex/X1-neon-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Cortex/X1-neon-instructions.s
new file mode 100644
index 00000000000000..dc1bb486aeef7d
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/AArch64/Cortex/X1-neon-instructions.s
@@ -0,0 +1,45 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-x1 -instruction-tables < %s | FileCheck %s
+
+# Check the Neoverse V1 model is used.
+
+add	v0.16b, v1.16b, v31.16b
+
+# CHECK:      Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK:      [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
+# CHECK-NEXT:  1      2     0.25                        add	v0.16b, v1.16b, v31.16b
+
+# CHECK:      Resources:
+# CHECK-NEXT: [0.0] - V1UnitB
+# CHECK-NEXT: [0.1] - V1UnitB
+# CHECK-NEXT: [1.0] - V1UnitD
+# CHECK-NEXT: [1.1] - V1UnitD
+# CHECK-NEXT: [2.0] - V1UnitFlg
+# CHECK-NEXT: [2.1] - V1UnitFlg
+# CHECK-NEXT: [2.2] - V1UnitFlg
+# CHECK-NEXT: [3]   - V1UnitL2
+# CHECK-NEXT: [4.0] - V1UnitL01
+# CHECK-NEXT: [4.1] - V1UnitL01
+# CHECK-NEXT: [5]   - V1UnitM0
+# CHECK-NEXT: [6]   - V1UnitM1
+# CHECK-NEXT: [7.0] - V1UnitS
+# CHECK-NEXT: [7.1] - V1UnitS
+# CHECK-NEXT: [8]   - V1UnitV0
+# CHECK-NEXT: [9]   - V1UnitV1
+# CHECK-NEXT: [10]  - V1UnitV2
+# CHECK-NEXT: [11]  - V1UnitV3
+
+# CHECK:      Resource pressure per iteration:
+# CHECK-NEXT: [0.0]  [0.1]  [1.0]  [1.1]  [2.0]  [2.1]  [2.2]  [3]    [4.0]  [4.1]  [5]    [6]    [7.0]  [7.1]  [8]    [9]    [10]   [11]
+# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25
+
+# CHECK:      Resource pressure by instruction:
+# CHECK-NEXT: [0.0]  [0.1]  [1.0]  [1.1]  [2.0]  [2.1]  [2.2]  [3]    [4.0]  [4.1]  [5]    [6]    [7.0]  [7.1]  [8]    [9]    [10]   [11]   Instructions:
+# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25   add	v0.16b, v1.16b, v31.16b
diff --git a/llvm/test/tools/llvm-mca/AArch64/Cortex/X2-sve-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Cortex/X2-sve-instructions.s
index 2912ea35f1ee88..6497860ecfbacb 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Cortex/X2-sve-instructions.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Cortex/X2-sve-instructions.s
@@ -1,7 +1,7 @@
 # NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
 # RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-x2 -instruction-tables < %s | FileCheck %s
 
-# Check the Neoverse N2 model is used.
+# Check the Neoverse V2 model is used.
 
 addhnb	z0.b, z1.h, z31.h
 
@@ -14,27 +14,34 @@ addhnb	z0.b, z1.h, z31.h
 # CHECK-NEXT: [6]: HasSideEffects (U)
 
 # CHECK:      [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
-# CHECK-NEXT:  1      2     0.50                        addhnb	z0.b, z1.h, z31.h
+# CHECK-NEXT:  1      2     0.25                        addhnb	z0.b, z1.h, z31.h
 
 # CHECK:      Resources:
-# CHECK-NEXT: [0.0] - N2UnitB
-# CHECK-NEXT: [0.1] - N2UnitB
-# CHECK-NEXT: [1.0] - N2UnitD
-# CHECK-NEXT: [1.1] - N2UnitD
-# CHECK-NEXT: [2]   - N2UnitL2
-# CHECK-NEXT: [3.0] - N2UnitL01
-# CHECK-NEXT: [3.1] - N2UnitL01
-# CHECK-NEXT: [4]   - N2UnitM0
-# CHECK-NEXT: [5]   - N2UnitM1
-# CHECK-NEXT: [6.0] - N2UnitS
-# CHECK-NEXT: [6.1] - N2UnitS
-# CHECK-NEXT: [7]   - N2UnitV0
-# CHECK-NEXT: [8]   - N2UnitV1
+# CHECK-NEXT: [0.0] - V2UnitB
+# CHECK-NEXT: [0.1] - V2UnitB
+# CHECK-NEXT: [1.0] - V2UnitD
+# CHECK-NEXT: [1.1] - V2UnitD
+# CHECK-NEXT: [2.0] - V2UnitFlg
+# CHECK-NEXT: [2.1] - V2UnitFlg
+# CHECK-NEXT: [2.2] - V2UnitFlg
+# CHECK-NEXT: [3]   - V2UnitL2
+# CHECK-NEXT: [4.0] - V2UnitL01
+# CHECK-NEXT: [4.1] - V2UnitL01
+# CHECK-NEXT: [5]   - V2UnitM0
+# CHECK-NEXT: [6]   - V2UnitM1
+# CHECK-NEXT: [7]   - V2UnitS0
+# CHECK-NEXT: [8]   - V2UnitS1
+# CHECK-NEXT: [9]   - V2UnitS2
+# CHECK-NEXT: [10]  - V2UnitS3
+# CHECK-NEXT: [11]  - V2UnitV0
+# CHECK-NEXT: [12]  - V2UnitV1
+# CHECK-NEXT: [13]  - V2UnitV2
+# CHECK-NEXT: [14]  - V2UnitV3
 
 # CHECK:      Resource pressure per iteration:
-# CHECK-NEXT: [0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6.0]  [6.1]  [7]    [8]
-# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -      -     0.50   0.50
+# CHECK-NEXT: [0.0]  [0.1]  [1.0]  [1.1]  [2.0]  [2.1]  [2.2]  [3]    [4.0]  [4.1]  [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   [14]
+# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25
 
 # CHECK:      Resource pressure by instruction:
-# CHECK-NEXT: [0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6.0]  [6.1]  [7]    [8]    Instructions:
-# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -      -     0.50   0.50   addhnb	z0.b, z1.h, z31.h
+# CHECK-NEXT: [0.0]  [0.1]  [1.0]  [1.1]  [2.0]  [2.1]  [2.2]  [3]    [4.0]  [4.1]  [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   [14]   Instructions:
+# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25   addhnb	z0.b, z1.h, z31.h
diff --git a/llvm/test/tools/llvm-mca/AArch64/Cortex/X3-sve-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Cortex/X3-sve-instructions.s
new file mode 100644
index 00000000000000..042e621f9a03d6
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/AArch64/Cortex/X3-sve-instructions.s
@@ -0,0 +1,47 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-x3 -instruction-tables < %s | FileCheck %s
+
+# Check the Neoverse V2 model is used.
+
+addhnb	z0.b, z1.h, z31.h
+
+# CHECK:      Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK:      [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
+# CHECK-NEXT:  1      2     0.25                        addhnb	z0.b, z1.h, z31.h
+
+# CHECK:      Resources:
+# CHECK-NEXT: [0.0] - V2UnitB
+# CHECK-NEXT: [0.1] - V2UnitB
+# CHECK-NEXT: [1.0] - V2UnitD
+# CHECK-NEXT: [1.1] - V2UnitD
+# CHECK-NEXT: [2.0] - V2UnitFlg
+# CHECK-NEXT: [2.1] - V2UnitFlg
+# CHECK-NEXT: [2.2] - V2UnitFlg
+# CHECK-NEXT: [3]   - V2UnitL2
+# CHECK-NEXT: [4.0] - V2UnitL01
+# CHECK-NEXT: [4.1] - V2UnitL01
+# CHECK-NEXT: [5]   - V2UnitM0
+# CHECK-NEXT: [6]   - V2UnitM1
+# CHECK-NEXT: [7]   - V2UnitS0
+# CHECK-NEXT: [8]   - V2UnitS1
+# CHECK-NEXT: [9]   - V2UnitS2
+# CHECK-NEXT: [10]  - V2UnitS3
+# CHECK-NEXT: [11]  - V2UnitV0
+# CHECK-NEXT: [12]  - V2UnitV1
+# CHECK-NEXT: [13]  - V2UnitV2
+# CHECK-NEXT: [14]  - V2UnitV3
+
+# CHECK:      Resource pressure per iteration:
+# CHECK-NEXT: [0.0]  [0.1]  [1.0]  [1.1]  [2.0]  [2.1]  [2.2]  [3]    [4.0]  [4.1]  [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   [14]
+# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25
+
+# CHECK:      Resource pressure by instruction:
+# CHECK-NEXT: [0.0]  [0.1]  [1.0]  [1.1]  [2.0]  [2.1]  [2.2]  [3]    [4.0]  [4.1]  [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   [14]   Instructions:
+# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25   addhnb	z0.b, z1.h, z31.h
diff --git a/llvm/test/tools/llvm-mca/AArch64/Cortex/X4-sve-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Cortex/X4-sve-instructions.s
new file mode 100644
index 00000000000000..19fba62ea30c6b
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/AArch64/Cortex/X4-sve-instructions.s
@@ -0,0 +1,47 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-x4 -instruction-tables < %s | FileCheck %s
+
+# Check the Neoverse V2 model is used.
+
+addhnb	z0.b, z1.h, z31.h
+
+# CHECK:      Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK:      [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
+# CHECK-NEXT:  1      2     0.25                        addhnb	z0.b, z1.h, z31.h
+
+# CHECK:      Resources:
+# CHECK-NEXT: [0.0] - V2UnitB
+# CHECK-NEXT: [0.1] - V2UnitB
+# CHECK-NEXT: [1.0] - V2UnitD
+# CHECK-NEXT: [1.1] - V2UnitD
+# CHECK-NEXT: [2.0] - V2UnitFlg
+# CHECK-NEXT: [2.1] - V2UnitFlg
+# CHECK-NEXT: [2.2] - V2UnitFlg
+# CHECK-NEXT: [3]   - V2UnitL2
+# CHECK-NEXT: [4.0] - V2UnitL01
+# CHECK-NEXT: [4.1] - V2UnitL01
+# CHECK-NEXT: [5]   - V2UnitM0
+# CHECK-NEXT: [6]   - V2UnitM1
+# CHECK-NEXT: [7]   - V2UnitS0
+# CHECK-NEXT: [8]   - V2UnitS1
+# CHECK-NEXT: [9]   - V2UnitS2
+# CHECK-NEXT: [10]  - V2UnitS3
+# CHECK-NEXT: [11]  - V2UnitV0
+# CHECK-NEXT: [12]  - V2UnitV1
+# CHECK-NEXT: [13]  - V2UnitV2
+# CHECK-NEXT: [14]  - V2UnitV3
+
+# CHECK:      Resource pressure per iteration:
+# CHECK-NEXT: [0.0]  [0.1]  [1.0]  [1.1]  [2.0]  [2.1]  [2.2]  [3]    [4.0]  [4.1]  [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   [14]
+# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25
+
+# CHECK:      Resource pressure by instruction:
+# CHECK-NEXT: [0.0]  [0.1]  [1.0]  [1.1]  [2.0]  [2.1]  [2.2]  [3]    [4.0]  [4.1]  [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   [14]   Instructions:
+# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -     0.25   0.25   0.25   0.25   addhnb	z0.b, z1.h, z31.h

Copy link
Contributor

@jthackray jthackray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice. How much extra performance (roughly) does this provide?

Copy link
Collaborator

@c-rhodes c-rhodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM cheers

Copy link

@Asher8118 Asher8118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, makes sense, thanks.

@davemgreen
Copy link
Collaborator Author

Thanks

LGTM, nice. How much extra performance (roughly) does this provide?

Scheduling on OoO machines doesn't usually give a lot of performance, when taken as an aggregate. The OoO machines can do a lot of scheduling themselves so long as we get it close enough, and my tests it was not very much. This is closer in terms of the correct number of pipelines though, which would help if we start to use them more.

@davemgreen davemgreen merged commit 2a4c74c into llvm:main Dec 6, 2024
10 checks passed
@davemgreen davemgreen deleted the gh-a64-xscheds branch December 6, 2024 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants