[RISCV] Enable bidirectional scheduling and tracking register pressure #115445

wangpc-pp · 2024-11-08T09:16:51Z

This is based on other targets like PPC/AArch64 and some experiments.

This PR will only enable bidirectional scheduling and tracking register
pressure.

Disclaimer: I haven't tested it on many cores; maybe some of these options
should be made features. I believe downstream projects must have tried
this before, so feedback is welcome.
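The stated scope can be pictured with a minimal sketch. `SchedPolicy` is a hypothetical stand-in that mirrors only the `MachineSchedPolicy` fields discussed in this PR, so the snippet compiles without LLVM headers. It is not the merged code. Note that the diff attached below is an earlier revision that also set ShouldTrackLaneMasks and DisableLatencyHeuristic; those two were later deferred to follow-ups.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::MachineSchedPolicy: only the fields
// discussed in this PR are mirrored, all defaulting to false.
struct SchedPolicy {
  bool ShouldTrackPressure = false;
  bool ShouldTrackLaneMasks = false;
  bool OnlyTopDown = false;
  bool OnlyBottomUp = false;
  bool DisableLatencyHeuristic = false;
};

// Sketch of an override limited to bidirectional scheduling and
// register-pressure tracking.
void overrideSchedPolicy(SchedPolicy &Policy, unsigned /*NumRegionInstrs*/) {
  // Force neither direction, so the scheduler may pick from both ends.
  Policy.OnlyTopDown = false;
  Policy.OnlyBottomUp = false;
  // Spills are costly, so expose register pressure to the heuristics.
  Policy.ShouldTrackPressure = true;
  // ShouldTrackLaneMasks and DisableLatencyHeuristic are left at their
  // incoming values (deferred to follow-up patches).
}

// True when the scheduler is free to work from both region boundaries.
bool isBidirectional(const SchedPolicy &Policy) {
  // Requesting both one-directional modes at once would be contradictory.
  assert(!(Policy.OnlyTopDown && Policy.OnlyBottomUp));
  return !Policy.OnlyTopDown && !Policy.OnlyBottomUp;
}
```

Even if a caller had requested top-down only, the override above clears it and turns on pressure tracking while leaving the other two flags alone.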

llvmbot · 2024-11-08T09:17:29Z

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-risc-v

Author: Pengcheng Wang (wangpc-pp)

Changes

This is based on other targets like PPC/AArch64 and some experiments.

Disclaimer: I haven't tested it on many cores; maybe some of these options
should be made features. I believe downstream projects must have tried
this before, so feedback is welcome.

Full diff: https://github.com/llvm/llvm-project/pull/115445.diff

2 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVSubtarget.cpp (+23)
(modified) llvm/lib/Target/RISCV/RISCVSubtarget.h (+3)

diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index e7db1ededf383b..f43c520422f13d 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -16,6 +16,7 @@
 #include "RISCV.h"
 #include "RISCVFrameLowering.h"
 #include "RISCVTargetMachine.h"
+#include "llvm/CodeGen/MachineScheduler.h"
 #include "llvm/CodeGen/MacroFusion.h"
 #include "llvm/CodeGen/ScheduleDAGMutation.h"
 #include "llvm/MC/TargetRegistry.h"
@@ -199,3 +200,25 @@ unsigned RISCVSubtarget::getMinimumJumpTableEntries() const {
              ? RISCVMinimumJumpTableEntries
              : TuneInfo->MinimumJumpTableEntries;
 }
+
+void RISCVSubtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
+                                         unsigned NumRegionInstrs) const {
+  // Do bidirectional scheduling since it provides a more balanced scheduling
+  // leading to better performance. This will increase compile time.
+  Policy.OnlyTopDown = false;
+  Policy.OnlyBottomUp = false;
+
+  // Enabling or Disabling the latency heuristic is a close call: It seems to
+  // help nearly no benchmark on out-of-order architectures, on the other hand
+  // it regresses register pressure on a few benchmarking.
+  // FIXME: This is from AArch64, but we haven't evaluated it on RISC-V.
+  Policy.DisableLatencyHeuristic = true;
+
+  // Spilling is generally expensive on all RISC-V cores, so always enable
+  // register-pressure tracking. This will increase compile time.
+  Policy.ShouldTrackPressure = true;
+
+  // Enabling ShouldTrackLaneMasks when vector instructions are supported.
+  // TODO: Add extensions that need register pairs as well?
+  Policy.ShouldTrackLaneMasks = hasVInstructions();
+}
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h
index f59a3737ae76f9..f2c0a3d85c998a 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.h
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h
@@ -327,6 +327,9 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
   unsigned getTailDupAggressiveThreshold() const {
     return TuneInfo->TailDupAggressiveThreshold;
   }
+
+  void overrideSchedPolicy(MachineSchedPolicy &Policy,
+                           unsigned NumRegionInstrs) const override;
 };
 } // End llvm namespace
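Setting both `OnlyTopDown` and `OnlyBottomUp` to false lets the scheduler work from both ends of a region. The toy sketch below illustrates the idea: at each step it takes the best ready node at the top (all predecessors placed) and the best ready node at the bottom (all successors placed), then commits whichever looks better. The `Node` type and its single `Cost` score are hypothetical simplifications; GenericScheduler compares the two boundary candidates with its full heuristic list rather than one number.

```cpp
#include <cassert>
#include <vector>

// Hypothetical DAG node: successor indices plus one heuristic score
// (lower is better) standing in for the full candidate comparison.
struct Node {
  std::vector<int> Succs;
  int Cost;
};

// Toy bidirectional list scheduler; returns a valid topological order.
std::vector<int> scheduleBidirectional(const std::vector<Node> &G) {
  const int N = static_cast<int>(G.size());
  std::vector<std::vector<int>> Preds(N);
  std::vector<int> PredsLeft(N, 0), SuccsLeft(N, 0);
  for (int I = 0; I < N; ++I)
    for (int S : G[I].Succs) {
      Preds[S].push_back(I);
      ++PredsLeft[S];
      ++SuccsLeft[I];
    }

  std::vector<bool> Done(N, false);
  std::vector<int> Top, Bot; // Bot is filled from the end of the region
  for (int Step = 0; Step < N; ++Step) {
    int BestTop = -1, BestBot = -1;
    for (int I = 0; I < N; ++I) {
      if (Done[I])
        continue;
      if (PredsLeft[I] == 0 && (BestTop < 0 || G[I].Cost < G[BestTop].Cost))
        BestTop = I;
      if (SuccsLeft[I] == 0 && (BestBot < 0 || G[I].Cost < G[BestBot].Cost))
        BestBot = I;
    }
    // The unscheduled part of a DAG always has a source and a sink.
    assert(BestTop >= 0 && BestBot >= 0);
    if (G[BestTop].Cost <= G[BestBot].Cost) {
      Done[BestTop] = true;
      Top.push_back(BestTop);
      for (int S : G[BestTop].Succs)
        --PredsLeft[S];
    } else {
      Done[BestBot] = true;
      Bot.push_back(BestBot);
      for (int P : Preds[BestBot])
        --SuccsLeft[P];
    }
  }
  // Final order: top zone, then bottom zone read back to front.
  Top.insert(Top.end(), Bot.rbegin(), Bot.rend());
  return Top;
}
```

For a diamond 0 -> {1, 2} -> 3 with costs 5, 1, 3, 2, the bottom boundary wins three times before node 0 is placed at the top, giving the order 0, 2, 1, 3. The extra per-step work on both boundaries is one reason the diff comment warns about compile time.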

wangpc-pp · 2024-11-08T10:28:16Z

This causes more spills in some cases, which may be fixed by #113675.

preames

LGTM - This looks very clearly like a good idea - to the point where I am mildly shocked we haven't done this before. I think, in fact, that this probably fixes a bug I was chasing just yesterday. :)

I am not too worried about the noted increase in spills as any perturbation to scheduling is going to cause changes in register allocation. Looking through the test diffs, I see both improvements and regressions. I don't see evidence of systematic regressions.

It'd be nice if we had perf data (and/or stats) from the BP3 (@lukel97 , @mikhailramalho ), but I don't consider that blocking as this is so clearly a good idea from first principles.

Please do wait a day or so in case @topperc , @asb , @lukel97 or others have comments.

topperc · 2024-11-08T18:01:07Z

llvm/lib/Target/RISCV/RISCVSubtarget.cpp

+  // help nearly no benchmark on out-of-order architectures, on the other hand
+  // it regresses register pressure on a few benchmarking.
+  // FIXME: This is from AArch64, but we haven't evaluated it on RISC-V.
+  Policy.DisableLatencyHeuristic = true;


What does this do on in-order cores?

mshockwave

GenericScheduler picks the next SU by inspecting heuristics (not an exhaustive list) in the following order:

1. register pressure
2. acyclic critical path in a loop
3. load/store cluster
4. resource pressure (probably the most important for out-of-order cores)
5. latency (critical path)
6. program order

DisableLatencyHeuristic basically turns off (5), which means that at that point we're relying on program order. I don't think using program order will be more favorable than reducing the critical path, especially for in-order cores. Disabling it won't save compile time either; I believe turning on the bidirectional scheduler has more impact on compile time.
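The fall-through behaviour described above can be sketched as a comparison cascade. This is a hypothetical simplification: `Candidate`, `preferTry` and `pickNode` are illustrative names (GenericScheduler's real `tryCandidate` works on `SchedCandidate` objects and checks more criteria), and heuristics (2) and (3) are omitted. It shows that once (5) is disabled, a tie on (1) and (4) is broken purely by program order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate summary; each later field is consulted only
// when all earlier ones tie.
struct Candidate {
  int NodeNum;         // (6) original program order
  int PressureExcess;  // (1) register units over the limit if picked
  int ResourceCost;    // (4) extra pressure on the busiest resource
  int CritPathLatency; // (5) critical-path length through this node
};

enum class Reason { Pressure, Resource, Latency, Order };

// Returns true if Try should replace Cand; Why records the deciding rule.
bool preferTry(const Candidate &Cand, const Candidate &Try,
               bool DisableLatencyHeuristic, Reason &Why) {
  if (Try.PressureExcess != Cand.PressureExcess) {
    Why = Reason::Pressure;
    return Try.PressureExcess < Cand.PressureExcess;
  }
  if (Try.ResourceCost != Cand.ResourceCost) {
    Why = Reason::Resource;
    return Try.ResourceCost < Cand.ResourceCost;
  }
  // (5) is skipped entirely when the latency heuristic is disabled.
  if (!DisableLatencyHeuristic && Try.CritPathLatency != Cand.CritPathLatency) {
    Why = Reason::Latency;
    return Try.CritPathLatency > Cand.CritPathLatency; // longer path is more urgent
  }
  // (6) Top-down view: the earlier instruction wins.
  Why = Reason::Order;
  return Try.NodeNum < Cand.NodeNum;
}

// Returns the NodeNum of the best candidate in a non-empty ready list.
int pickNode(const std::vector<Candidate> &Ready, bool DisableLatencyHeuristic) {
  assert(!Ready.empty());
  std::size_t Best = 0;
  for (std::size_t I = 1; I < Ready.size(); ++I) {
    Reason Why;
    if (preferTry(Ready[Best], Ready[I], DisableLatencyHeuristic, Why))
      Best = I;
  }
  return Ready[Best].NodeNum;
}
```

With `Ready = {{0, 0, 1, 2}, {1, 0, 1, 7}}`, `pickNode(Ready, false)` selects node 1 because of its longer critical path, while `pickNode(Ready, true)` falls through to program order and selects node 0.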

wangpc-pp

Maybe we should enable this for in-order cores?

mshockwave

Personally I would not disable it at all, because I still don't think program order is better than reducing the critical path, regardless of whether the core is in-order or out-of-order.

That said, I'm fine with enabling it only for in-order cores for now and seeing how it goes.

mshockwave · 2024-11-08T18:18:55Z

llvm/lib/Target/RISCV/RISCVSubtarget.cpp

+  // help nearly no benchmark on out-of-order architectures, on the other hand
+  // it regresses register pressure on a few benchmarking.
+  // FIXME: This is from AArch64, but we haven't evaluated it on RISC-V.
+  Policy.DisableLatencyHeuristic = true;



llvm/lib/Target/RISCV/RISCVSubtarget.cpp

asb · 2024-11-11T16:43:33Z

Good spot! I just wonder about ShouldTrackLaneMasks and whether that might make sense as a follow-up rather than being enabled at first. It looks like, of the in-tree targets, only AMDGPU enables it right now. Not a strong view though, so if you've delved into how it's used and put a reasonable amount of code through it, I wouldn't object to leaving it as-is.
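To make the lane-mask question concrete, here is a toy model. The assumption, based on our reading of how the scheduler's DAG builder uses this flag, is that with lane masks tracked, writes to disjoint subregister lanes of one virtual register (for example the two halves of an LMUL=2 vector register group) need not be ordered against each other, while without them any two defs of the same register are. `SubRegDef` and `hasOutputDep` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Toy lane mask: one bit per subregister lane of a virtual register.
using LaneMask = std::uint32_t;

// A (possibly partial) definition of a virtual register.
struct SubRegDef {
  unsigned Reg;   // virtual register number
  LaneMask Lanes; // lanes written by this def
};

// Must Later stay after Earlier (an output dependence)? Without lane
// tracking, any two defs of the same register are treated as overlapping.
bool hasOutputDep(const SubRegDef &Earlier, const SubRegDef &Later,
                  bool TrackLaneMasks) {
  assert(Earlier.Lanes != 0 && Later.Lanes != 0 && "a def writes some lane");
  if (Earlier.Reg != Later.Reg)
    return false;
  if (!TrackLaneMasks)
    return true; // whole-register view
  return (Earlier.Lanes & Later.Lanes) != 0;
}
```

In this model, tracking lane masks frees the scheduler to reorder the two half-writes; as we understand it, the same flag also makes pressure tracking account for partially live registers, which is where the extra compile time comes from.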

mshockwave

LGTM

wangpc-pp · 2024-11-12T09:01:13Z

This PR will only enable bidirectional scheduling and tracking register pressure. And ShouldTrackLaneMasks/DisableLatencyHeuristic will be follow-ups (need more inputs).

I will land this in a few days if there is no further discussion.

lukel97 · 2024-11-12T09:22:06Z

Non-blocking: can you check whether #107532 still spills? I have a feeling we still need to set microOpBufferSize; otherwise latency always overrides register pressure.

wangpc-pp · 2024-11-12T09:42:24Z

Non-blocking: can you check whether #107532 still spills? I have a feeling we still need to set microOpBufferSize; otherwise latency always overrides register pressure.

Yeah, it still spills. I think we can add generic models, as I commented before. :-)
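lukel97's point can be illustrated with a toy model. It assumes, per our reading of the `MCSchedModel` documentation, that a `MicroOpBufferSize` of 0 (the in-order setting, which we believe is also the default when a core has no scheduling model) keeps nodes that are not ready in the current cycle out of consideration, so the register-pressure heuristic never gets to pick them. `SU`, `PressureDelta` and `pickLowestPressure` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical scheduling-unit summary for this toy.
struct SU {
  int Id;
  unsigned ReadyCycle; // first cycle in which all operands are available
  int PressureDelta;   // change in live registers if scheduled now
};

// Picks the node that best reduces register pressure among those the
// scheduler is willing to consider in CurrCycle. Returns -1 if none is
// eligible (a real scheduler would advance the cycle instead).
int pickLowestPressure(const std::vector<SU> &Avail, unsigned CurrCycle,
                       unsigned MicroOpBufferSize) {
  assert(!Avail.empty());
  int Best = -1;
  int BestDelta = 0;
  for (const SU &S : Avail) {
    // In-order model (buffer size 0): a node that is not ready yet is
    // parked in a pending queue, so pressure cannot select it.
    if (MicroOpBufferSize == 0 && S.ReadyCycle > CurrCycle)
      continue;
    if (Best < 0 || S.PressureDelta < BestDelta) {
      Best = S.Id;
      BestDelta = S.PressureDelta;
    }
  }
  return Best;
}
```

With a pressure-increasing node ready now and a pressure-reducing node ready only at cycle 3, the in-order setting picks the former at cycle 0, while a non-zero buffer size lets pressure choose the latter.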

mshockwave · 2024-11-12T18:31:25Z

This PR will only enable bidirectional scheduling and tracking register pressure. And ShouldTrackLaneMasks/DisableLatencyHeuristic will be follow-ups (need more inputs).

Agree. Could you update the patch title to reflect this change?

lukel97 · 2024-11-14T07:04:53Z

I've finished a run comparing this PR with tip of tree @ e887f82, and there are some really nice performance gains: 11% on 538.imagick_r and 10% on 508.namd_r! On the Banana Pi F3 with -mcpu=spacemit-x60 -O3 -flto.

https://lnt.lukelau.me/db_default/v4/nts/34?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=35&submit=Update

I'm re-running imagick and namd just to double check, but otherwise this is great!

EDIT: I can reproduce the imagick and namd results on 9358905 vs fef37f2

wangpc-pp · 2024-11-15T03:14:10Z

I will land this at the end of today (18:00 UTC+8, Friday) if there is no objection.

This is based on other targets like PPC/AArch64 and some experiments. According to the evaluation, we can see a 2-3% performance gain on spacemit-x60. Disclaimer: I haven't tested it on many cores; maybe some of these options should be made features. I believe downstream projects must have tried this before, so feedback is welcome.

llvmbot added the backend:RISC-V label Nov 8, 2024

wangpc-pp requested review from asb, preames, lukel97, kito-cheng, michaelmaitland, dtcxzyw, topperc, zixuan-wu and mshockwave November 8, 2024 09:18

wangpc-pp force-pushed the main-riscv-override-sched-policy branch from 074c3c3 to 6017bfc Compare November 8, 2024 09:52

llvmbot added the llvm:globalisel label Nov 8, 2024

preames approved these changes Nov 8, 2024

topperc reviewed Nov 8, 2024

mshockwave reviewed Nov 8, 2024

mshockwave approved these changes Nov 11, 2024

wangpc-pp force-pushed the main-riscv-override-sched-policy branch from 7d36dea to fef37f2 Compare November 12, 2024 08:44

This was referenced Nov 12, 2024

[RISCV] Enable ShouldTrackLaneMasks when having vector instructions #115843

Open

[RISCV] Add TuneDisableLatencySchedHeuristic #115858

Merged

wangpc-pp changed the title ~~[RISCV] Override default sched policy~~ [RISCV] Enable bidirectional scheduling and tracking register pressure Nov 15, 2024

wangpc-pp force-pushed the main-riscv-override-sched-policy branch from fef37f2 to 0f8ed88 Compare November 15, 2024 09:34

wangpc-pp merged commit 9122c52 into llvm:main Nov 15, 2024
5 of 7 checks passed

wangpc-pp deleted the main-riscv-override-sched-policy branch November 15, 2024 09:53

lukel97 mentioned this pull request Dec 9, 2024

Increased spilling on rva22u64 after bidirectional scheduling/register pressure tracking #119222

Open

sihuan mentioned this pull request Jun 5, 2025

[RISCV] 507.cactuBSSN_r regression after bidirectional scheduling/register pressure tracking #143005

Open
