[AArch64] Disable Pre-RA Scheduler for Neoverse V2 #127784
Conversation
@llvm/pr-subscribers-backend-aarch64

Author: Sjoerd Meijer (sjoerdmeijer)

Changes

We would like to disable the pre-RA machine scheduler for the Neoverse V2 because we have a key workload that massively benefits from this (25% uplift). Despite the machine scheduler being register pressure aware, it results in spills for this workload. Disabling the scheduler seems a lot more attractive than trying to tweak regalloc heuristics:
- We see no benefit of scheduling anyway on this big core, and have never seen this. I.e., when we added the V2 scheduling model, this wasn't for perf reasons, only to enable LLVM-MCA.
- Scheduling can consume significant compile time without resulting in any perf gains. This is a bad deal.

FWIW: the GCC folks realised the same not that long ago, and did exactly the same, see also: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/667074.html

I guess other big cores could benefit from this too, but I would like to leave that decision to folks with more experience on those cores, so that's why I propose to change this for the V2 here only.

Numbers:
- We know the Eigen library is somewhat sensitive to scheduling, but I found one kernel that regresses by ~2% and another that improves by ~2%. They cancel each other out, and overall the result is neutral.
- SPEC FP and INT seem totally unaffected.
- LLVM test-suite: a little bit up and down, all within noise levels I think, so neutral.
- Compile-time numbers: I see a geomean 3% improvement for the LLVM test-suite, and a very decent one for the sqlite amalgamation version.

I haven't looked at the post-RA scheduling, maybe that's interesting as a follow up.

Patch is 31.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/127784.diff

6 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index 357f526d5e308..13c1a386149b7 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -669,6 +669,9 @@ def FeatureExynosCheapAsMoveHandling : SubtargetFeature<"exynos-cheap-as-move",
"HasExynosCheapAsMoveHandling", "true",
"Use Exynos specific handling of cheap instructions">;
+def FeatureDisablePreRAScheduler : SubtargetFeature<"use-prera-scheduler",
+ "DisablePreRAScheduler", "true", "Disable scheduling before register allocation">;
+
def FeaturePostRAScheduler : SubtargetFeature<"use-postra-scheduler",
"UsePostRAScheduler", "true", "Schedule again after register allocation">;
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index b977b6aaaf619..401a2637fa9f3 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -540,6 +540,7 @@ def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2
FeatureCmpBccFusion,
FeatureFuseAdrpAdd,
FeatureALULSLFast,
+ FeatureDisablePreRAScheduler,
FeaturePostRAScheduler,
FeatureEnableSelectOptimize,
FeatureUseFixedOverScalableIfEqualCost,
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index c6eb77e3bc3ba..9e8c50e376272 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -156,7 +156,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
const LegalizerInfo *getLegalizerInfo() const override;
const RegisterBankInfo *getRegBankInfo() const override;
const Triple &getTargetTriple() const { return TargetTriple; }
- bool enableMachineScheduler() const override { return true; }
+ bool enableMachineScheduler() const override { return !disablePreRAScheduler(); }
bool enablePostRAScheduler() const override { return usePostRAScheduler(); }
bool enableSubRegLiveness() const override { return EnableSubregLiveness; }
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll
index 02ff12c27fcda..d29fdd23863a6 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll
@@ -48,12 +48,12 @@ define i8 @load_atomic_i8_aligned_acquire(ptr %ptr) {
; GISEL: add x8, x0, #4
; GISEL: ldaprb w0, [x8]
;
-; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i8_aligned_acquire:
-; SDAG-NOAVOIDLDAPUR: ldapurb w0, [x0, #4]
-;
; SDAG-AVOIDLDAPUR-LABEL: load_atomic_i8_aligned_acquire:
; SDAG-AVOIDLDAPUR: add x8, x0, #4
; SDAG-AVOIDLDAPUR: ldaprb w0, [x8]
+;
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i8_aligned_acquire:
+; SDAG-NOAVOIDLDAPUR: ldapurb w0, [x0, #4]
%gep = getelementptr inbounds i8, ptr %ptr, i32 4
%r = load atomic i8, ptr %gep acquire, align 1
ret i8 %r
@@ -64,12 +64,12 @@ define i8 @load_atomic_i8_aligned_acquire_const(ptr readonly %ptr) {
; GISEL: add x8, x0, #4
; GISEL: ldaprb w0, [x8]
;
-; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i8_aligned_acquire_const:
-; SDAG-NOAVOIDLDAPUR: ldapurb w0, [x0, #4]
-;
; SDAG-AVOIDLDAPUR-LABEL: load_atomic_i8_aligned_acquire_const:
; SDAG-AVOIDLDAPUR: add x8, x0, #4
; SDAG-AVOIDLDAPUR: ldaprb w0, [x8]
+;
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i8_aligned_acquire_const:
+; SDAG-NOAVOIDLDAPUR: ldapurb w0, [x0, #4]
%gep = getelementptr inbounds i8, ptr %ptr, i32 4
%r = load atomic i8, ptr %gep acquire, align 1
ret i8 %r
@@ -130,12 +130,12 @@ define i16 @load_atomic_i16_aligned_acquire(ptr %ptr) {
; GISEL: add x8, x0, #8
; GISEL: ldaprh w0, [x8]
;
-; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i16_aligned_acquire:
-; SDAG-NOAVOIDLDAPUR: ldapurh w0, [x0, #8]
-;
; SDAG-AVOIDLDAPUR-LABEL: load_atomic_i16_aligned_acquire:
; SDAG-AVOIDLDAPUR: add x8, x0, #8
; SDAG-AVOIDLDAPUR: ldaprh w0, [x8]
+;
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i16_aligned_acquire:
+; SDAG-NOAVOIDLDAPUR: ldapurh w0, [x0, #8]
%gep = getelementptr inbounds i16, ptr %ptr, i32 4
%r = load atomic i16, ptr %gep acquire, align 2
ret i16 %r
@@ -146,12 +146,12 @@ define i16 @load_atomic_i16_aligned_acquire_const(ptr readonly %ptr) {
; GISEL: add x8, x0, #8
; GISEL: ldaprh w0, [x8]
;
-; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i16_aligned_acquire_const:
-; SDAG-NOAVOIDLDAPUR: ldapurh w0, [x0, #8]
-;
; SDAG-AVOIDLDAPUR-LABEL: load_atomic_i16_aligned_acquire_const:
; SDAG-AVOIDLDAPUR: add x8, x0, #8
; SDAG-AVOIDLDAPUR: ldaprh w0, [x8]
+;
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i16_aligned_acquire_const:
+; SDAG-NOAVOIDLDAPUR: ldapurh w0, [x0, #8]
%gep = getelementptr inbounds i16, ptr %ptr, i32 4
%r = load atomic i16, ptr %gep acquire, align 2
ret i16 %r
@@ -211,12 +211,12 @@ define i32 @load_atomic_i32_aligned_acquire(ptr %ptr) {
; GISEL-LABEL: load_atomic_i32_aligned_acquire:
; GISEL: ldapur w0, [x0, #16]
;
-; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i32_aligned_acquire:
-; SDAG-NOAVOIDLDAPUR: ldapur w0, [x0, #16]
-;
; SDAG-AVOIDLDAPUR-LABEL: load_atomic_i32_aligned_acquire:
; SDAG-AVOIDLDAPUR: add x8, x0, #16
; SDAG-AVOIDLDAPUR: ldapr w0, [x8]
+;
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i32_aligned_acquire:
+; SDAG-NOAVOIDLDAPUR: ldapur w0, [x0, #16]
%gep = getelementptr inbounds i32, ptr %ptr, i32 4
%r = load atomic i32, ptr %gep acquire, align 4
ret i32 %r
@@ -226,12 +226,12 @@ define i32 @load_atomic_i32_aligned_acquire_const(ptr readonly %ptr) {
; GISEL-LABEL: load_atomic_i32_aligned_acquire_const:
; GISEL: ldapur w0, [x0, #16]
;
-; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i32_aligned_acquire_const:
-; SDAG-NOAVOIDLDAPUR: ldapur w0, [x0, #16]
-;
; SDAG-AVOIDLDAPUR-LABEL: load_atomic_i32_aligned_acquire_const:
; SDAG-AVOIDLDAPUR: add x8, x0, #16
; SDAG-AVOIDLDAPUR: ldapr w0, [x8]
+;
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i32_aligned_acquire_const:
+; SDAG-NOAVOIDLDAPUR: ldapur w0, [x0, #16]
%gep = getelementptr inbounds i32, ptr %ptr, i32 4
%r = load atomic i32, ptr %gep acquire, align 4
ret i32 %r
@@ -291,12 +291,12 @@ define i64 @load_atomic_i64_aligned_acquire(ptr %ptr) {
; GISEL-LABEL: load_atomic_i64_aligned_acquire:
; GISEL: ldapur x0, [x0, #32]
;
-; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i64_aligned_acquire:
-; SDAG-NOAVOIDLDAPUR: ldapur x0, [x0, #32]
-;
; SDAG-AVOIDLDAPUR-LABEL: load_atomic_i64_aligned_acquire:
; SDAG-AVOIDLDAPUR: add x8, x0, #32
; SDAG-AVOIDLDAPUR: ldapr x0, [x8]
+;
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i64_aligned_acquire:
+; SDAG-NOAVOIDLDAPUR: ldapur x0, [x0, #32]
%gep = getelementptr inbounds i64, ptr %ptr, i32 4
%r = load atomic i64, ptr %gep acquire, align 8
ret i64 %r
@@ -306,12 +306,12 @@ define i64 @load_atomic_i64_aligned_acquire_const(ptr readonly %ptr) {
; GISEL-LABEL: load_atomic_i64_aligned_acquire_const:
; GISEL: ldapur x0, [x0, #32]
;
-; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i64_aligned_acquire_const:
-; SDAG-NOAVOIDLDAPUR: ldapur x0, [x0, #32]
-;
; SDAG-AVOIDLDAPUR-LABEL: load_atomic_i64_aligned_acquire_const:
; SDAG-AVOIDLDAPUR: add x8, x0, #32
; SDAG-AVOIDLDAPUR: ldapr x0, [x8]
+;
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i64_aligned_acquire_const:
+; SDAG-NOAVOIDLDAPUR: ldapur x0, [x0, #32]
%gep = getelementptr inbounds i64, ptr %ptr, i32 4
%r = load atomic i64, ptr %gep acquire, align 8
ret i64 %r
@@ -440,12 +440,12 @@ define i8 @load_atomic_i8_unaligned_acquire(ptr %ptr) {
; GISEL: add x8, x0, #4
; GISEL: ldaprb w0, [x8]
;
-; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i8_unaligned_acquire:
-; SDAG-NOAVOIDLDAPUR: ldapurb w0, [x0, #4]
-;
; SDAG-AVOIDLDAPUR-LABEL: load_atomic_i8_unaligned_acquire:
; SDAG-AVOIDLDAPUR: add x8, x0, #4
; SDAG-AVOIDLDAPUR: ldaprb w0, [x8]
+;
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i8_unaligned_acquire:
+; SDAG-NOAVOIDLDAPUR: ldapurb w0, [x0, #4]
%gep = getelementptr inbounds i8, ptr %ptr, i32 4
%r = load atomic i8, ptr %gep acquire, align 1
ret i8 %r
@@ -456,12 +456,12 @@ define i8 @load_atomic_i8_unaligned_acquire_const(ptr readonly %ptr) {
; GISEL: add x8, x0, #4
; GISEL: ldaprb w0, [x8]
;
-; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i8_unaligned_acquire_const:
-; SDAG-NOAVOIDLDAPUR: ldapurb w0, [x0, #4]
-;
; SDAG-AVOIDLDAPUR-LABEL: load_atomic_i8_unaligned_acquire_const:
; SDAG-AVOIDLDAPUR: add x8, x0, #4
; SDAG-AVOIDLDAPUR: ldaprb w0, [x8]
+;
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i8_unaligned_acquire_const:
+; SDAG-NOAVOIDLDAPUR: ldapurb w0, [x0, #4]
%gep = getelementptr inbounds i8, ptr %ptr, i32 4
%r = load atomic i8, ptr %gep acquire, align 1
ret i8 %r
@@ -490,9 +490,9 @@ define i16 @load_atomic_i16_unaligned_unordered(ptr %ptr) {
; GISEL: add x1, x8, #4
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i16_unaligned_unordered:
-; SDAG: add x1, x0, #4
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i16_unaligned_unordered:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #4
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i8, ptr %ptr, i32 4
%r = load atomic i16, ptr %gep unordered, align 1
ret i16 %r
@@ -503,9 +503,9 @@ define i16 @load_atomic_i16_unaligned_unordered_const(ptr readonly %ptr) {
; GISEL: add x1, x8, #4
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i16_unaligned_unordered_const:
-; SDAG: add x1, x0, #4
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i16_unaligned_unordered_const:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #4
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i8, ptr %ptr, i32 4
%r = load atomic i16, ptr %gep unordered, align 1
ret i16 %r
@@ -516,9 +516,9 @@ define i16 @load_atomic_i16_unaligned_monotonic(ptr %ptr) {
; GISEL: add x1, x8, #8
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i16_unaligned_monotonic:
-; SDAG: add x1, x0, #8
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i16_unaligned_monotonic:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #8
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i16, ptr %ptr, i32 4
%r = load atomic i16, ptr %gep monotonic, align 1
ret i16 %r
@@ -529,9 +529,9 @@ define i16 @load_atomic_i16_unaligned_monotonic_const(ptr readonly %ptr) {
; GISEL: add x1, x8, #8
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i16_unaligned_monotonic_const:
-; SDAG: add x1, x0, #8
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i16_unaligned_monotonic_const:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #8
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i16, ptr %ptr, i32 4
%r = load atomic i16, ptr %gep monotonic, align 1
ret i16 %r
@@ -542,9 +542,9 @@ define i16 @load_atomic_i16_unaligned_acquire(ptr %ptr) {
; GISEL: add x1, x8, #8
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i16_unaligned_acquire:
-; SDAG: add x1, x0, #8
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i16_unaligned_acquire:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #8
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i16, ptr %ptr, i32 4
%r = load atomic i16, ptr %gep acquire, align 1
ret i16 %r
@@ -555,9 +555,9 @@ define i16 @load_atomic_i16_unaligned_acquire_const(ptr readonly %ptr) {
; GISEL: add x1, x8, #8
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i16_unaligned_acquire_const:
-; SDAG: add x1, x0, #8
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i16_unaligned_acquire_const:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #8
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i16, ptr %ptr, i32 4
%r = load atomic i16, ptr %gep acquire, align 1
ret i16 %r
@@ -568,9 +568,9 @@ define i16 @load_atomic_i16_unaligned_seq_cst(ptr %ptr) {
; GISEL: add x1, x8, #8
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i16_unaligned_seq_cst:
-; SDAG: add x1, x0, #8
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i16_unaligned_seq_cst:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #8
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i16, ptr %ptr, i32 4
%r = load atomic i16, ptr %gep seq_cst, align 1
ret i16 %r
@@ -581,9 +581,9 @@ define i16 @load_atomic_i16_unaligned_seq_cst_const(ptr readonly %ptr) {
; GISEL: add x1, x8, #8
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i16_unaligned_seq_cst_const:
-; SDAG: add x1, x0, #8
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i16_unaligned_seq_cst_const:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #8
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i16, ptr %ptr, i32 4
%r = load atomic i16, ptr %gep seq_cst, align 1
ret i16 %r
@@ -594,9 +594,9 @@ define i32 @load_atomic_i32_unaligned_unordered(ptr %ptr) {
; GISEL: add x1, x8, #16
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i32_unaligned_unordered:
-; SDAG: add x1, x0, #16
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i32_unaligned_unordered:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #16
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i32, ptr %ptr, i32 4
%r = load atomic i32, ptr %gep unordered, align 1
ret i32 %r
@@ -607,9 +607,9 @@ define i32 @load_atomic_i32_unaligned_unordered_const(ptr readonly %ptr) {
; GISEL: add x1, x8, #16
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i32_unaligned_unordered_const:
-; SDAG: add x1, x0, #16
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i32_unaligned_unordered_const:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #16
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i32, ptr %ptr, i32 4
%r = load atomic i32, ptr %gep unordered, align 1
ret i32 %r
@@ -620,9 +620,9 @@ define i32 @load_atomic_i32_unaligned_monotonic(ptr %ptr) {
; GISEL: add x1, x8, #16
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i32_unaligned_monotonic:
-; SDAG: add x1, x0, #16
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i32_unaligned_monotonic:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #16
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i32, ptr %ptr, i32 4
%r = load atomic i32, ptr %gep monotonic, align 1
ret i32 %r
@@ -633,9 +633,9 @@ define i32 @load_atomic_i32_unaligned_monotonic_const(ptr readonly %ptr) {
; GISEL: add x1, x8, #16
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i32_unaligned_monotonic_const:
-; SDAG: add x1, x0, #16
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i32_unaligned_monotonic_const:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #16
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i32, ptr %ptr, i32 4
%r = load atomic i32, ptr %gep monotonic, align 1
ret i32 %r
@@ -646,9 +646,9 @@ define i32 @load_atomic_i32_unaligned_acquire(ptr %ptr) {
; GISEL: add x1, x8, #16
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i32_unaligned_acquire:
-; SDAG: add x1, x0, #16
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i32_unaligned_acquire:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #16
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i32, ptr %ptr, i32 4
%r = load atomic i32, ptr %gep acquire, align 1
ret i32 %r
@@ -659,9 +659,9 @@ define i32 @load_atomic_i32_unaligned_acquire_const(ptr readonly %ptr) {
; GISEL: add x1, x8, #16
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i32_unaligned_acquire_const:
-; SDAG: add x1, x0, #16
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i32_unaligned_acquire_const:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #16
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i32, ptr %ptr, i32 4
%r = load atomic i32, ptr %gep acquire, align 1
ret i32 %r
@@ -672,9 +672,9 @@ define i32 @load_atomic_i32_unaligned_seq_cst(ptr %ptr) {
; GISEL: add x1, x8, #16
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i32_unaligned_seq_cst:
-; SDAG: add x1, x0, #16
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i32_unaligned_seq_cst:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #16
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i32, ptr %ptr, i32 4
%r = load atomic i32, ptr %gep seq_cst, align 1
ret i32 %r
@@ -685,9 +685,9 @@ define i32 @load_atomic_i32_unaligned_seq_cst_const(ptr readonly %ptr) {
; GISEL: add x1, x8, #16
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i32_unaligned_seq_cst_const:
-; SDAG: add x1, x0, #16
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i32_unaligned_seq_cst_const:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #16
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i32, ptr %ptr, i32 4
%r = load atomic i32, ptr %gep seq_cst, align 1
ret i32 %r
@@ -698,9 +698,9 @@ define i64 @load_atomic_i64_unaligned_unordered(ptr %ptr) {
; GISEL: add x1, x8, #32
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i64_unaligned_unordered:
-; SDAG: add x1, x0, #32
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i64_unaligned_unordered:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #32
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i64, ptr %ptr, i32 4
%r = load atomic i64, ptr %gep unordered, align 1
ret i64 %r
@@ -711,9 +711,9 @@ define i64 @load_atomic_i64_unaligned_unordered_const(ptr readonly %ptr) {
; GISEL: add x1, x8, #32
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i64_unaligned_unordered_const:
-; SDAG: add x1, x0, #32
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i64_unaligned_unordered_const:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #32
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i64, ptr %ptr, i32 4
%r = load atomic i64, ptr %gep unordered, align 1
ret i64 %r
@@ -724,9 +724,9 @@ define i64 @load_atomic_i64_unaligned_monotonic(ptr %ptr) {
; GISEL: add x1, x8, #32
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i64_unaligned_monotonic:
-; SDAG: add x1, x0, #32
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i64_unaligned_monotonic:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #32
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i64, ptr %ptr, i32 4
%r = load atomic i64, ptr %gep monotonic, align 1
ret i64 %r
@@ -737,9 +737,9 @@ define i64 @load_atomic_i64_unaligned_monotonic_const(ptr readonly %ptr) {
; GISEL: add x1, x8, #32
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i64_unaligned_monotonic_const:
-; SDAG: add x1, x0, #32
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i64_unaligned_monotonic_const:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #32
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i64, ptr %ptr, i32 4
%r = load atomic i64, ptr %gep monotonic, align 1
ret i64 %r
@@ -750,9 +750,9 @@ define i64 @load_atomic_i64_unaligned_acquire(ptr %ptr) {
; GISEL: add x1, x8, #32
; GISEL: bl __atomic_load
;
-; SDAG-LABEL: load_atomic_i64_unaligned_acquire:
-; SDAG: add x1, x0, #32
-; SDAG: bl __atomic_load
+; SDAG-NOAVOIDLDAPUR-LABEL: load_atomic_i64_unaligned_acquire:
+; SDAG-NOAVOIDLDAPUR: add x1, x0, #32
+; SDAG-NOAVOIDLDAPUR: bl __atomic_load
%gep = getelementptr inbounds i64, ptr %ptr, i32 4
%r = load a...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
Ah, now that #127620 was merged, I need to set this for Grace too.
Ignore this; it isn't necessary given the way Grace is defined.
Force-pushed from 3c9df63 to d38b98f
This doesn't feel to me like the right approach - in that if it was randomly right or wrong before, it will still be randomly right or wrong, not always right (or more likely to be right, heuristics being what they are). I agree that scheduling for register pressure is the most important thing to consider, but there are times when scheduling on an OoO core can be helpful (coming out of a mispredict, for example, where you need to get instructions through the pipeline as efficiently as possible).
But it appears this might be running into problems at the moment. Disabling enableMachineScheduler() starts to enable some scheduling out of SDAG, and that might be hitting crashes now. (You could say that if it was scheduling out of SDAG for register pressure then we wouldn't need to alter the scheduling later on; it looked like it was trying Hybrid at the moment. But this wouldn't help with GISel.) There is a chance this works differently from specifying the option, so can you give it another test?
The best way to test perf might be to check the codesize (at -O3). It is the number of dynamically executed spills that is likely the most important, but codesize can be a proxy for the number of spills. Is it possible to add a test case for the problem you saw where no scheduling was better than with it? We have seen a case recently where in-order scheduling was accidentally better than OoO, but I don't believe that the default order worked well there either.
Could you clarify what you mean by "might be hitting crashes"?
But we don't have the evidence for this. At least, not in the workloads that we see. Now, we might find one such case, but is that then representative? The whole hypothesis is that scheduling isn't a win overall and is just eating up compile time (this idea is shared by the GCC folks, FWIW).
I also don't know what this crash is.
Sorry, the experiment is unclear to me, what is it exactly?
I am afraid it will be difficult to reduce and come up with something small, but I can give it a try.
I was running the llvm-test-suite (+ SPECs) with... I think
Sorry, I just meant test that it doesn't crash. I was guessing that you might have been testing with
Ohhh, I see, thanks.
We would like to disable the pre-RA machine scheduler for the Neoverse V2 because we have a key workload that massively benefits from this (25% uplift). Despite the machine scheduler being register pressure aware, it results in spills for this workload. Disabling the scheduler seems a lot more attractive than trying to tweak regalloc heuristics: - We see no benefit of scheduling anyway on this big core, and have never seen this. I.e., when we added the V2 scheduling model, this wasn't for perf reasons, only to enable LLVM-MCA. - Scheduling can consume significant compile-time, not resulting in any perf gains. This is a bad deal. FWIW: the GCC folks realised the same not that long ago, and did exactly the same, see also: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/667074.html I guess other big cores could benefit from this too, but I would like to leave that decision to folks with more experience on those cores, so that's why I propose to change this for the V2 here only. Numbers: * We know the Eigen library is somewhat sensitive to scheduling, but I found one kernel to regress with ~2%, and another to improve with ~2%. They cancel each other out, and overall the result is neutral. * SPEC FP and INT seem totally unaffected. * LLVM test-suite: a little bit up and down, all within noise levels I think, so is neutral. * Compile-time numbers: I see a geomean 3% improvement for the LLVM test-suite, and a very decent one for the sqlite amalgamation version. I haven't looked at the post-RA scheduling, maybe that's interesting as a follow up.
Force-pushed from d38b98f to f641df8
This should fix the crash. There were some interesting things going on: despite the machine scheduler being disabled, there was scheduling for register pressure going on in the SelectionDAG. The reason was that the default scheduling policy was set to Hybrid, which causes this. If the MachineScheduler is disabled, this now changes the default sched policy to Source. This is the behaviour that we want. There is something to fix for the combination "disabled MIScheduler + Hybrid SelectionDAG scheduling", but I can't trigger that (with options); I do have a fix, but can't add a test. Anyway, with this fix, I will take these next steps:
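For context, a minimal sketch of the kind of change described above, written against the generic TargetLowering API; the exact hook and its placement in the patch are assumed here, not copied from it:

```cpp
// Sketch, assuming the SelectionDAG preference is chosen in the AArch64
// lowering setup: fall back from the register-pressure-aware Hybrid list
// scheduler to plain Source order when the MachineScheduler is disabled
// for the subtarget.
if (Subtarget->enableMachineScheduler())
  setSchedulingPreference(Sched::Hybrid);
else
  setSchedulingPreference(Sched::Source);
```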
I am going to abandon this because this is not going to work. I still haven't seen any evidence that instruction scheduling by itself is beneficial, but it turns out it is the other things that the scheduler is also doing that are making a difference:
- copy (MOV) elimination, and
- load/store clustering.

We found regressions when disabling the scheduler, and the reason I have seen so far is MOV instructions that are no longer eliminated. I am not entirely sure yet whether the loads/stores influence performance, but it is a difference. One idea therefore, for some lightweight scheduling, is to separate out these optimisations from the other scheduling business. To address the regressions that we are seeing, I will now pursue this direction: skip the pre-RA scheduler only for large, hand-written vector intrinsic code, and only for the Neoverse V2 (see the follow-up referenced below).
@davemgreen: let me know what you think or if you have any objections.
Hi - Another of the things that is possibly quite important is to schedule for register pressure. That might be related to the copy elimination you mention; I wasn't sure, it might be different. It's one of those things that could happen to be right from the original order of the instructions (people tend to write code which uses variables close together), but if it is needed then something should be improving the order if it can. @c-rhodes has been looking at a couple of cases recently where we have found scheduling has not been performing as well as it could. Either the in-order scheduling model or no scheduling was better for specific cases, but when we tried that on a larger selection of benchmarks the results ended up as a wash I believe, with some things getting better and some worse in about equal proportions. The GCC team recently tried this (they have very old scheduling models too), but ended up going back on it I believe. It still worries me to remove it entirely, but at the end of the day it is the data that matters if we can show it reliably.
… vector intrinsic codes

Skip the pre-RA MachineScheduler for large hand-written vector intrinsic code when targeting the Neoverse V2. The motivation to skip the scheduler is the same as this abandoned patch: llvm#127784. But this reimplementation is much more focused and fine-grained, and based on the following heuristic:
- only skip the pre-RA machine scheduler for large (hand-written) vector intrinsic code,
- do this only for the Neoverse V2 (a wide micro-architecture).

The intuition of this patch is that:
- scheduling based on instruction latency isn't useful for a very wide micro-architecture (which is why GCC also partly stopped doing this),
- however, the machine scheduler also performs some optimisations: i) load/store clustering, and ii) copy elimination. These are useful optimisations, and that's why disabling the machine scheduler in general isn't a good idea, i.e. it results in some regressions,
- but the function where the machine scheduler and register allocator are not working well together is large, hand-written vector code. Thus, one could argue that scheduling this kind of code is against the programmer's intent, so let's not do that, which avoids complications further down in the optimisation pipeline.

The heuristic tries to recognise large hand-written intrinsic code by calculating the percentage of vector instructions versus other instructions in a function, and skips the machine scheduler if certain threshold values are exceeded. I.e., if a function is more than 70% vector code, contains more than 2800 IR instructions and 425 intrinsics, don't schedule this function. This obviously is a heuristic, but it is hopefully narrow enough not to cause regressions (I haven't found any). The alternative is to look into regalloc, which is where the problems occur with the placement of spill/reload code. However, there will be heuristics involved there too, so this seems like a valid heuristic, and looking into regalloc is an orthogonal exercise.
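As a rough illustration of the thresholds mentioned above, here is a standalone sketch of such a per-function check; the function and parameter names are invented for illustration and are not the actual implementation:

```cpp
// Hypothetical helper: returns true if a function looks like large,
// hand-written vector intrinsic code, per the thresholds quoted in the
// commit message (>70% vector instructions, >2800 IR instructions,
// >425 intrinsic calls).
static bool looksLikeLargeHandWrittenVectorCode(unsigned NumInstrs,
                                                unsigned NumVectorInstrs,
                                                unsigned NumIntrinsics) {
  if (NumInstrs == 0)
    return false;
  double VectorFraction = double(NumVectorInstrs) / double(NumInstrs);
  return VectorFraction > 0.70 && NumInstrs > 2800 && NumIntrinsics > 425;
}
```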
This adds FeatureDisableLatencySchedHeuristic to the Neoverse V2 core tuning description. This gives us a 20% improvement on a key workload, some other minor improvements here and there, and no real regressions; nothing outside the noise levels. Earlier attempts to solve this problem included disabling the MI scheduler entirely (llvm#127784), and llvm#139557 was about a heuristic to not schedule hand-written vector code. This solution is preferred because it avoids another heuristic and achieves what we want, and for what it's worth, there is a lot of precedent for setting this feature. Thanks to:
- Ricardo Jesus for pointing out this subtarget feature, and
- Cameron McInally for the extensive performance testing.
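A minimal sketch of how a tuning flag like this typically takes effect, via the generic MachineSchedPolicy override hook; this is an assumption based on how other subtargets wire up similar features, not a copy of the actual change, and the member name is assumed to be generated from the subtarget feature:

```cpp
// Sketch (assumed member DisableLatencySchedHeuristic, generated from the
// "disable-latency-sched-heuristic" subtarget feature): tell the generic
// MachineScheduler to drop its latency heuristic for this core.
void AArch64Subtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
                                           unsigned NumRegionInstrs) const {
  Policy.DisableLatencyHeuristic = DisableLatencySchedHeuristic;
}
```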