[RISCV] Add TuneDisableLatencySchedHeuristic #115858
Conversation
Created using spr 1.3.6-beta.1 [skip ci]
Created using spr 1.3.6-beta.1
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-risc-v

Author: Pengcheng Wang (wangpc-pp)

Changes: This helps reduce register pressure in some cases.

Patch is 7.18 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/115858.diff

465 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index 3eae2b9774203f..ac81d8980fd3e0 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -208,6 +208,13 @@ void RISCVSubtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
Policy.OnlyTopDown = false;
Policy.OnlyBottomUp = false;
+ // Enabling or disabling the latency heuristic is a close call: it seems to
+ // help nearly no benchmark on out-of-order architectures; on the other hand,
+ // it regresses register pressure on a few benchmarks.
+ // FIXME: This is from AArch64, but we haven't evaluated it on RISC-V.
+ // TODO: We may disable it for out-of-order architectures only.
+ Policy.DisableLatencyHeuristic = true;
+
// Spilling is generally expensive on all RISC-V cores, so always enable
// register-pressure tracking. This will increase compile time.
Policy.ShouldTrackPressure = true;
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll b/llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll
index ee414992a5245c..330f8b16065f13 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll
@@ -25,8 +25,8 @@ define i32 @add_i8_signext_i32(i8 %a, i8 %b) {
; RV32IM-LABEL: add_i8_signext_i32:
; RV32IM: # %bb.0: # %entry
; RV32IM-NEXT: slli a0, a0, 24
-; RV32IM-NEXT: slli a1, a1, 24
; RV32IM-NEXT: srai a0, a0, 24
+; RV32IM-NEXT: slli a1, a1, 24
; RV32IM-NEXT: srai a1, a1, 24
; RV32IM-NEXT: add a0, a0, a1
; RV32IM-NEXT: ret
@@ -34,8 +34,8 @@ define i32 @add_i8_signext_i32(i8 %a, i8 %b) {
; RV64IM-LABEL: add_i8_signext_i32:
; RV64IM: # %bb.0: # %entry
; RV64IM-NEXT: slli a0, a0, 56
-; RV64IM-NEXT: slli a1, a1, 56
; RV64IM-NEXT: srai a0, a0, 56
+; RV64IM-NEXT: slli a1, a1, 56
; RV64IM-NEXT: srai a1, a1, 56
; RV64IM-NEXT: add a0, a0, a1
; RV64IM-NEXT: ret
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll b/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
index bce6dfacf8e82c..f33ba1d7a302ef 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
@@ -6,8 +6,8 @@ define i2 @bitreverse_i2(i2 %x) {
; RV32-LABEL: bitreverse_i2:
; RV32: # %bb.0:
; RV32-NEXT: slli a1, a0, 1
-; RV32-NEXT: andi a0, a0, 3
; RV32-NEXT: andi a1, a1, 2
+; RV32-NEXT: andi a0, a0, 3
; RV32-NEXT: srli a0, a0, 1
; RV32-NEXT: or a0, a1, a0
; RV32-NEXT: ret
@@ -15,8 +15,8 @@ define i2 @bitreverse_i2(i2 %x) {
; RV64-LABEL: bitreverse_i2:
; RV64: # %bb.0:
; RV64-NEXT: slli a1, a0, 1
-; RV64-NEXT: andi a0, a0, 3
; RV64-NEXT: andi a1, a1, 2
+; RV64-NEXT: andi a0, a0, 3
; RV64-NEXT: srli a0, a0, 1
; RV64-NEXT: or a0, a1, a0
; RV64-NEXT: ret
@@ -28,8 +28,8 @@ define i3 @bitreverse_i3(i3 %x) {
; RV32-LABEL: bitreverse_i3:
; RV32: # %bb.0:
; RV32-NEXT: slli a1, a0, 2
-; RV32-NEXT: andi a0, a0, 7
; RV32-NEXT: andi a1, a1, 4
+; RV32-NEXT: andi a0, a0, 7
; RV32-NEXT: andi a2, a0, 2
; RV32-NEXT: or a1, a1, a2
; RV32-NEXT: srli a0, a0, 2
@@ -39,8 +39,8 @@ define i3 @bitreverse_i3(i3 %x) {
; RV64-LABEL: bitreverse_i3:
; RV64: # %bb.0:
; RV64-NEXT: slli a1, a0, 2
-; RV64-NEXT: andi a0, a0, 7
; RV64-NEXT: andi a1, a1, 4
+; RV64-NEXT: andi a0, a0, 7
; RV64-NEXT: andi a2, a0, 2
; RV64-NEXT: or a1, a1, a2
; RV64-NEXT: srli a0, a0, 2
@@ -54,11 +54,11 @@ define i4 @bitreverse_i4(i4 %x) {
; RV32-LABEL: bitreverse_i4:
; RV32: # %bb.0:
; RV32-NEXT: slli a1, a0, 3
-; RV32-NEXT: slli a2, a0, 1
-; RV32-NEXT: andi a0, a0, 15
; RV32-NEXT: andi a1, a1, 8
+; RV32-NEXT: slli a2, a0, 1
; RV32-NEXT: andi a2, a2, 4
; RV32-NEXT: or a1, a1, a2
+; RV32-NEXT: andi a0, a0, 15
; RV32-NEXT: srli a2, a0, 1
; RV32-NEXT: andi a2, a2, 2
; RV32-NEXT: or a1, a1, a2
@@ -69,11 +69,11 @@ define i4 @bitreverse_i4(i4 %x) {
; RV64-LABEL: bitreverse_i4:
; RV64: # %bb.0:
; RV64-NEXT: slli a1, a0, 3
-; RV64-NEXT: slli a2, a0, 1
-; RV64-NEXT: andi a0, a0, 15
; RV64-NEXT: andi a1, a1, 8
+; RV64-NEXT: slli a2, a0, 1
; RV64-NEXT: andi a2, a2, 4
; RV64-NEXT: or a1, a1, a2
+; RV64-NEXT: andi a0, a0, 15
; RV64-NEXT: srli a2, a0, 1
; RV64-NEXT: andi a2, a2, 2
; RV64-NEXT: or a1, a1, a2
@@ -88,21 +88,21 @@ define i7 @bitreverse_i7(i7 %x) {
; RV32-LABEL: bitreverse_i7:
; RV32: # %bb.0:
; RV32-NEXT: slli a1, a0, 6
-; RV32-NEXT: slli a2, a0, 4
-; RV32-NEXT: slli a3, a0, 2
-; RV32-NEXT: andi a0, a0, 127
; RV32-NEXT: andi a1, a1, 64
+; RV32-NEXT: slli a2, a0, 4
; RV32-NEXT: andi a2, a2, 32
-; RV32-NEXT: andi a3, a3, 16
; RV32-NEXT: or a1, a1, a2
-; RV32-NEXT: andi a2, a0, 8
-; RV32-NEXT: or a2, a3, a2
-; RV32-NEXT: srli a3, a0, 2
+; RV32-NEXT: slli a2, a0, 2
+; RV32-NEXT: andi a2, a2, 16
+; RV32-NEXT: andi a0, a0, 127
+; RV32-NEXT: andi a3, a0, 8
+; RV32-NEXT: or a2, a2, a3
; RV32-NEXT: or a1, a1, a2
-; RV32-NEXT: srli a2, a0, 4
-; RV32-NEXT: andi a3, a3, 4
-; RV32-NEXT: andi a2, a2, 2
-; RV32-NEXT: or a2, a3, a2
+; RV32-NEXT: srli a2, a0, 2
+; RV32-NEXT: andi a2, a2, 4
+; RV32-NEXT: srli a3, a0, 4
+; RV32-NEXT: andi a3, a3, 2
+; RV32-NEXT: or a2, a2, a3
; RV32-NEXT: or a1, a1, a2
; RV32-NEXT: srli a0, a0, 6
; RV32-NEXT: or a0, a1, a0
@@ -111,21 +111,21 @@ define i7 @bitreverse_i7(i7 %x) {
; RV64-LABEL: bitreverse_i7:
; RV64: # %bb.0:
; RV64-NEXT: slli a1, a0, 6
-; RV64-NEXT: slli a2, a0, 4
-; RV64-NEXT: slli a3, a0, 2
-; RV64-NEXT: andi a0, a0, 127
; RV64-NEXT: andi a1, a1, 64
+; RV64-NEXT: slli a2, a0, 4
; RV64-NEXT: andi a2, a2, 32
-; RV64-NEXT: andi a3, a3, 16
; RV64-NEXT: or a1, a1, a2
-; RV64-NEXT: andi a2, a0, 8
-; RV64-NEXT: or a2, a3, a2
-; RV64-NEXT: srli a3, a0, 2
+; RV64-NEXT: slli a2, a0, 2
+; RV64-NEXT: andi a2, a2, 16
+; RV64-NEXT: andi a0, a0, 127
+; RV64-NEXT: andi a3, a0, 8
+; RV64-NEXT: or a2, a2, a3
; RV64-NEXT: or a1, a1, a2
-; RV64-NEXT: srli a2, a0, 4
-; RV64-NEXT: andi a3, a3, 4
-; RV64-NEXT: andi a2, a2, 2
-; RV64-NEXT: or a2, a3, a2
+; RV64-NEXT: srli a2, a0, 2
+; RV64-NEXT: andi a2, a2, 4
+; RV64-NEXT: srli a3, a0, 4
+; RV64-NEXT: andi a3, a3, 2
+; RV64-NEXT: or a2, a2, a3
; RV64-NEXT: or a1, a1, a2
; RV64-NEXT: srli a0, a0, 6
; RV64-NEXT: or a0, a1, a0
@@ -139,33 +139,33 @@ define i24 @bitreverse_i24(i24 %x) {
; RV32: # %bb.0:
; RV32-NEXT: slli a1, a0, 16
; RV32-NEXT: lui a2, 4096
-; RV32-NEXT: lui a3, 1048335
; RV32-NEXT: addi a2, a2, -1
-; RV32-NEXT: addi a3, a3, 240
; RV32-NEXT: and a0, a0, a2
; RV32-NEXT: srli a0, a0, 16
; RV32-NEXT: or a0, a0, a1
-; RV32-NEXT: and a1, a3, a2
-; RV32-NEXT: and a1, a0, a1
+; RV32-NEXT: lui a1, 1048335
+; RV32-NEXT: addi a1, a1, 240
+; RV32-NEXT: and a3, a1, a2
+; RV32-NEXT: and a3, a0, a3
+; RV32-NEXT: srli a3, a3, 4
; RV32-NEXT: slli a0, a0, 4
-; RV32-NEXT: and a0, a0, a3
-; RV32-NEXT: lui a3, 1047757
-; RV32-NEXT: addi a3, a3, -820
-; RV32-NEXT: srli a1, a1, 4
-; RV32-NEXT: or a0, a1, a0
-; RV32-NEXT: and a1, a3, a2
-; RV32-NEXT: and a1, a0, a1
+; RV32-NEXT: and a0, a0, a1
+; RV32-NEXT: or a0, a3, a0
+; RV32-NEXT: lui a1, 1047757
+; RV32-NEXT: addi a1, a1, -820
+; RV32-NEXT: and a3, a1, a2
+; RV32-NEXT: and a3, a0, a3
+; RV32-NEXT: srli a3, a3, 2
; RV32-NEXT: slli a0, a0, 2
-; RV32-NEXT: and a0, a0, a3
-; RV32-NEXT: lui a3, 1047211
-; RV32-NEXT: addi a3, a3, -1366
-; RV32-NEXT: and a2, a3, a2
-; RV32-NEXT: srli a1, a1, 2
-; RV32-NEXT: or a0, a1, a0
+; RV32-NEXT: and a0, a0, a1
+; RV32-NEXT: or a0, a3, a0
+; RV32-NEXT: lui a1, 1047211
+; RV32-NEXT: addi a1, a1, -1366
+; RV32-NEXT: and a2, a1, a2
; RV32-NEXT: and a2, a0, a2
-; RV32-NEXT: slli a0, a0, 1
; RV32-NEXT: srli a2, a2, 1
-; RV32-NEXT: and a0, a0, a3
+; RV32-NEXT: slli a0, a0, 1
+; RV32-NEXT: and a0, a0, a1
; RV32-NEXT: or a0, a2, a0
; RV32-NEXT: ret
;
@@ -173,33 +173,33 @@ define i24 @bitreverse_i24(i24 %x) {
; RV64: # %bb.0:
; RV64-NEXT: slli a1, a0, 16
; RV64-NEXT: lui a2, 4096
-; RV64-NEXT: lui a3, 1048335
; RV64-NEXT: addiw a2, a2, -1
-; RV64-NEXT: addiw a3, a3, 240
; RV64-NEXT: and a0, a0, a2
; RV64-NEXT: srli a0, a0, 16
; RV64-NEXT: or a0, a0, a1
-; RV64-NEXT: and a1, a3, a2
-; RV64-NEXT: and a1, a0, a1
+; RV64-NEXT: lui a1, 1048335
+; RV64-NEXT: addiw a1, a1, 240
+; RV64-NEXT: and a3, a1, a2
+; RV64-NEXT: and a3, a0, a3
+; RV64-NEXT: srli a3, a3, 4
; RV64-NEXT: slli a0, a0, 4
-; RV64-NEXT: and a0, a0, a3
-; RV64-NEXT: lui a3, 1047757
-; RV64-NEXT: addiw a3, a3, -820
-; RV64-NEXT: srli a1, a1, 4
-; RV64-NEXT: or a0, a1, a0
-; RV64-NEXT: and a1, a3, a2
-; RV64-NEXT: and a1, a0, a1
+; RV64-NEXT: and a0, a0, a1
+; RV64-NEXT: or a0, a3, a0
+; RV64-NEXT: lui a1, 1047757
+; RV64-NEXT: addiw a1, a1, -820
+; RV64-NEXT: and a3, a1, a2
+; RV64-NEXT: and a3, a0, a3
+; RV64-NEXT: srli a3, a3, 2
; RV64-NEXT: slli a0, a0, 2
-; RV64-NEXT: and a0, a0, a3
-; RV64-NEXT: lui a3, 1047211
-; RV64-NEXT: addiw a3, a3, -1366
-; RV64-NEXT: and a2, a3, a2
-; RV64-NEXT: srli a1, a1, 2
-; RV64-NEXT: or a0, a1, a0
+; RV64-NEXT: and a0, a0, a1
+; RV64-NEXT: or a0, a3, a0
+; RV64-NEXT: lui a1, 1047211
+; RV64-NEXT: addiw a1, a1, -1366
+; RV64-NEXT: and a2, a1, a2
; RV64-NEXT: and a2, a0, a2
-; RV64-NEXT: slli a0, a0, 1
; RV64-NEXT: srli a2, a2, 1
-; RV64-NEXT: and a0, a0, a3
+; RV64-NEXT: slli a0, a0, 1
+; RV64-NEXT: and a0, a0, a1
; RV64-NEXT: or a0, a2, a0
; RV64-NEXT: ret
%rev = call i24 @llvm.bitreverse.i24(i24 %x)
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
index cf7cef83bcc135..70d1b25309c844 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
@@ -21,34 +21,34 @@ define void @constant_fold_barrier_i128(ptr %p) {
; RV32-LABEL: constant_fold_barrier_i128:
; RV32: # %bb.0: # %entry
; RV32-NEXT: li a1, 1
+; RV32-NEXT: slli a1, a1, 11
; RV32-NEXT: lw a2, 0(a0)
; RV32-NEXT: lw a3, 4(a0)
; RV32-NEXT: lw a4, 8(a0)
; RV32-NEXT: lw a5, 12(a0)
-; RV32-NEXT: slli a1, a1, 11
; RV32-NEXT: and a2, a2, a1
; RV32-NEXT: and a3, a3, zero
; RV32-NEXT: and a4, a4, zero
; RV32-NEXT: and a5, a5, zero
; RV32-NEXT: add a2, a2, a1
-; RV32-NEXT: add a6, a3, zero
; RV32-NEXT: sltu a1, a2, a1
+; RV32-NEXT: add a6, a3, zero
; RV32-NEXT: sltu a3, a6, a3
; RV32-NEXT: add a6, a6, a1
; RV32-NEXT: seqz a7, a6
; RV32-NEXT: and a1, a7, a1
-; RV32-NEXT: add a7, a4, zero
-; RV32-NEXT: add a5, a5, zero
-; RV32-NEXT: sltu a4, a7, a4
; RV32-NEXT: or a1, a3, a1
-; RV32-NEXT: add a7, a7, a1
-; RV32-NEXT: seqz a3, a7
-; RV32-NEXT: and a1, a3, a1
+; RV32-NEXT: add a3, a4, zero
+; RV32-NEXT: sltu a4, a3, a4
+; RV32-NEXT: add a3, a3, a1
+; RV32-NEXT: seqz a7, a3
+; RV32-NEXT: and a1, a7, a1
; RV32-NEXT: or a1, a4, a1
+; RV32-NEXT: add a5, a5, zero
; RV32-NEXT: add a1, a5, a1
; RV32-NEXT: sw a2, 0(a0)
; RV32-NEXT: sw a6, 4(a0)
-; RV32-NEXT: sw a7, 8(a0)
+; RV32-NEXT: sw a3, 8(a0)
; RV32-NEXT: sw a1, 12(a0)
; RV32-NEXT: ret
entry:
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
index 2c3e3faddc3916..51e8b6da39d099 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
@@ -21,9 +21,9 @@ define i128 @constant_fold_barrier_i128(i128 %x) {
; RV64-LABEL: constant_fold_barrier_i128:
; RV64: # %bb.0: # %entry
; RV64-NEXT: li a2, 1
-; RV64-NEXT: and a1, a1, zero
; RV64-NEXT: slli a2, a2, 11
; RV64-NEXT: and a0, a0, a2
+; RV64-NEXT: and a1, a1, zero
; RV64-NEXT: add a0, a0, a2
; RV64-NEXT: sltu a2, a0, a2
; RV64-NEXT: add a1, a1, zero
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll b/llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll
index 1156edffe91943..05989c310541b8 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll
@@ -117,8 +117,8 @@ define i64 @abs64(i64 %x) {
; RV32I: # %bb.0:
; RV32I-NEXT: srai a2, a1, 31
; RV32I-NEXT: add a0, a0, a2
-; RV32I-NEXT: add a1, a1, a2
; RV32I-NEXT: sltu a3, a0, a2
+; RV32I-NEXT: add a1, a1, a2
; RV32I-NEXT: add a1, a1, a3
; RV32I-NEXT: xor a0, a0, a2
; RV32I-NEXT: xor a1, a1, a2
@@ -128,8 +128,8 @@ define i64 @abs64(i64 %x) {
; RV32ZBB: # %bb.0:
; RV32ZBB-NEXT: srai a2, a1, 31
; RV32ZBB-NEXT: add a0, a0, a2
-; RV32ZBB-NEXT: add a1, a1, a2
; RV32ZBB-NEXT: sltu a3, a0, a2
+; RV32ZBB-NEXT: add a1, a1, a2
; RV32ZBB-NEXT: add a1, a1, a3
; RV32ZBB-NEXT: xor a0, a0, a2
; RV32ZBB-NEXT: xor a1, a1, a2
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb-zbkb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb-zbkb.ll
index 68bf9240ccd1df..c558639fda424e 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb-zbkb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb-zbkb.ll
@@ -302,8 +302,8 @@ define i64 @rori_i64(i64 %a) nounwind {
; CHECK-NEXT: slli a2, a0, 31
; CHECK-NEXT: srli a0, a0, 1
; CHECK-NEXT: slli a3, a1, 31
-; CHECK-NEXT: srli a1, a1, 1
; CHECK-NEXT: or a0, a0, a3
+; CHECK-NEXT: srli a1, a1, 1
; CHECK-NEXT: or a1, a2, a1
; CHECK-NEXT: ret
%1 = tail call i64 @llvm.fshl.i64(i64 %a, i64 %a, i64 63)
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll
index 7f22127ad3536c..1184905c17edea 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll
@@ -12,31 +12,31 @@ define i32 @ctlz_i32(i32 %a) nounwind {
; RV32I-NEXT: beqz a0, .LBB0_2
; RV32I-NEXT: # %bb.1: # %cond.false
; RV32I-NEXT: srli a1, a0, 1
-; RV32I-NEXT: lui a2, 349525
; RV32I-NEXT: or a0, a0, a1
-; RV32I-NEXT: addi a1, a2, 1365
-; RV32I-NEXT: srli a2, a0, 2
-; RV32I-NEXT: or a0, a0, a2
-; RV32I-NEXT: srli a2, a0, 4
-; RV32I-NEXT: or a0, a0, a2
-; RV32I-NEXT: srli a2, a0, 8
-; RV32I-NEXT: or a0, a0, a2
-; RV32I-NEXT: srli a2, a0, 16
-; RV32I-NEXT: or a0, a0, a2
-; RV32I-NEXT: srli a2, a0, 1
-; RV32I-NEXT: and a1, a2, a1
-; RV32I-NEXT: lui a2, 209715
-; RV32I-NEXT: addi a2, a2, 819
+; RV32I-NEXT: srli a1, a0, 2
+; RV32I-NEXT: or a0, a0, a1
+; RV32I-NEXT: srli a1, a0, 4
+; RV32I-NEXT: or a0, a0, a1
+; RV32I-NEXT: srli a1, a0, 8
+; RV32I-NEXT: or a0, a0, a1
+; RV32I-NEXT: srli a1, a0, 16
+; RV32I-NEXT: or a0, a0, a1
+; RV32I-NEXT: srli a1, a0, 1
+; RV32I-NEXT: lui a2, 349525
+; RV32I-NEXT: addi a2, a2, 1365
+; RV32I-NEXT: and a1, a1, a2
; RV32I-NEXT: sub a0, a0, a1
; RV32I-NEXT: srli a1, a0, 2
-; RV32I-NEXT: and a0, a0, a2
+; RV32I-NEXT: lui a2, 209715
+; RV32I-NEXT: addi a2, a2, 819
; RV32I-NEXT: and a1, a1, a2
-; RV32I-NEXT: lui a2, 61681
-; RV32I-NEXT: addi a2, a2, -241
+; RV32I-NEXT: and a0, a0, a2
; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: add a0, a1, a0
-; RV32I-NEXT: and a0, a0, a2
+; RV32I-NEXT: lui a1, 61681
+; RV32I-NEXT: addi a1, a1, -241
+; RV32I-NEXT: and a0, a0, a1
; RV32I-NEXT: slli a1, a0, 8
; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: slli a1, a0, 16
@@ -63,11 +63,11 @@ define i64 @ctlz_i64(i64 %a) nounwind {
; RV32I-LABEL: ctlz_i64:
; RV32I: # %bb.0:
; RV32I-NEXT: lui a2, 349525
-; RV32I-NEXT: lui a3, 209715
-; RV32I-NEXT: lui a6, 61681
; RV32I-NEXT: addi a5, a2, 1365
-; RV32I-NEXT: addi a4, a3, 819
-; RV32I-NEXT: addi a3, a6, -241
+; RV32I-NEXT: lui a2, 209715
+; RV32I-NEXT: addi a4, a2, 819
+; RV32I-NEXT: lui a2, 61681
+; RV32I-NEXT: addi a3, a2, -241
; RV32I-NEXT: li a2, 32
; RV32I-NEXT: beqz a1, .LBB1_2
; RV32I-NEXT: # %bb.1:
@@ -155,22 +155,22 @@ define i32 @cttz_i32(i32 %a) nounwind {
; RV32I-NEXT: # %bb.1: # %cond.false
; RV32I-NEXT: not a1, a0
; RV32I-NEXT: addi a0, a0, -1
-; RV32I-NEXT: lui a2, 349525
; RV32I-NEXT: and a0, a1, a0
-; RV32I-NEXT: addi a1, a2, 1365
-; RV32I-NEXT: srli a2, a0, 1
-; RV32I-NEXT: and a1, a2, a1
-; RV32I-NEXT: lui a2, 209715
-; RV32I-NEXT: addi a2, a2, 819
+; RV32I-NEXT: srli a1, a0, 1
+; RV32I-NEXT: lui a2, 349525
+; RV32I-NEXT: addi a2, a2, 1365
+; RV32I-NEXT: and a1, a1, a2
; RV32I-NEXT: sub a0, a0, a1
; RV32I-NEXT: srli a1, a0, 2
-; RV32I-NEXT: and a0, a0, a2
+; RV32I-NEXT: lui a2, 209715
+; RV32I-NEXT: addi a2, a2, 819
; RV32I-NEXT: and a1, a1, a2
-; RV32I-NEXT: lui a2, 61681
+; RV32I-NEXT: and a0, a0, a2
; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: add a0, a1, a0
-; RV32I-NEXT: addi a1, a2, -241
+; RV32I-NEXT: lui a1, 61681
+; RV32I-NEXT: addi a1, a1, -241
; RV32I-NEXT: and a0, a0, a1
; RV32I-NEXT: slli a1, a0, 8
; RV32I-NEXT: add a0, a0, a1
@@ -196,11 +196,11 @@ define i64 @cttz_i64(i64 %a) nounwind {
; RV32I-LABEL: cttz_i64:
; RV32I: # %bb.0:
; RV32I-NEXT: lui a2, 349525
-; RV32I-NEXT: lui a3, 209715
-; RV32I-NEXT: lui a5, 61681
; RV32I-NEXT: addi a4, a2, 1365
-; RV32I-NEXT: addi a3, a3, 819
-; RV32I-NEXT: addi a2, a5, -241
+; RV32I-NEXT: lui a2, 209715
+; RV32I-NEXT: addi a3, a2, 819
+; RV32I-NEXT: lui a2, 61681
+; RV32I-NEXT: addi a2, a2, -241
; RV32I-NEXT: beqz a0, .LBB3_2
; RV32I-NEXT: # %bb.1:
; RV32I-NEXT: not a1, a0
@@ -271,17 +271,17 @@ define i32 @ctpop_i32(i32 %a) nounwind {
; RV32I-NEXT: lui a2, 349525
; RV32I-NEXT: addi a2, a2, 1365
; RV32I-NEXT: and a1, a1, a2
-; RV32I-NEXT: lui a2, 209715
-; RV32I-NEXT: addi a2, a2, 819
; RV32I-NEXT: sub a0, a0, a1
; RV32I-NEXT: srli a1, a0, 2
-; RV32I-NEXT: and a0, a0, a2
+; RV32I-NEXT: lui a2, 209715
+; RV32I-NEXT: addi a2, a2, 819
; RV32I-NEXT: and a1, a1, a2
-; RV32I-NEXT: lui a2, 61681
+; RV32I-NEXT: and a0, a0, a2
; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: add a0, a1, a0
-; RV32I-NEXT: addi a1, a2, -241
+; RV32I-NEXT: lui a1, 61681
+; RV32I-NEXT: addi a1, a1, -241
; RV32I-NEXT: and a0, a0, a1
; RV32I-NEXT: slli a1, a0, 8
; RV32I-NEXT: add a0, a0, a1
@@ -305,39 +305,39 @@ define i64 @ctpop_i64(i64 %a) nounwind {
; RV32I: # %bb.0:
; RV32I-NEXT: srli a2, a0, 1
; RV32I-NEXT: lui a3, 349525
-; RV32I-NEXT: lui a4, 209715
-; RV32I-NEXT: srli a5, a1, 1
; RV32I-NEXT: addi a3, a3, 1365
; RV32I-NEXT: and a2, a2, a3
-; RV32I-NEXT: and a3, a5, a3
-; RV32I-NEXT: lui a5, 61681
-; RV32I-NEXT: addi a4, a4, 819
-; RV32I-NEXT: addi a5, a5, -241
; RV32I-NEXT: sub a0, a0, a2
-; RV32I-NEXT: sub a1, a1, a3
; RV32I-NEXT: srli a2, a0, 2
+; RV32I-NEXT: lui a4, 209715
+; RV32I-NEXT: addi a4, a4, 819
+; RV32I-NEXT: and a2, a2, a4
; RV32I-NEXT: and a0, a0, a4
+; RV32I-NEXT: add a0, a2, a0
+; RV32I-NEXT: srli a2, a0, 4
+; RV32I-NEXT: add a0, a2, a0
+; RV32I-NEXT: lui a2, 61681
+; RV32I-NEXT: addi a2, a2, -241
+; RV32I-NEXT: and a0, a0, a2
+; RV32I-NEXT: slli a5, a0, 8
+; RV32I-NEXT: add a0, a0, a5
+; RV32I-NEXT: slli a5, a0, 16
+; RV32I-NEXT: add a0, a0, a5
+; RV32I-NEXT: srli a0, a0, 24
+; RV32I-NEXT: srli a5, a1, 1
+; RV32I-NEXT: and a3, a5, a3
+; RV32I-NEXT: sub a1, a1, a3
; RV32I-NEXT: srli a3, a1, 2
-; RV32I-NEXT: and a1, a1, a4
-; RV32I-NEXT: and a2, a2, a4
; RV32I-NEXT: and a3, a3, a4
-; RV32I-NEXT: add a0, a2, a0
+; RV32I-NEXT: and a1, a1, a4
; RV32I-NEXT: add a1, a3, a1
-; RV32I-NEXT: srli a2, a0, 4
; RV32I-NEXT: srli a3, a1, 4
-; RV32I-NEXT: add a0, a2, a0
; RV32I-NEXT: add a1, a3, a1
-; RV32I-NEXT: and a0...
[truncated]
I suggest we let this sit for 2-3 days after the prior patch has landed. As I said on the previous review, I think this is reasonable, but a) we had a bunch of discussion on this point and b) it'd be good to have some staging between the individual commits to simplify regression analysis.
For the recent scheduler patches, the common theme is that we saw another target do something and brought that functionality to RISC-V. How do we know that these changes are sensible defaults for RISC-V cores? Are you making measurements on any cores? Are they in-order, out-of-order, or both? In my experience tuning for different cores, there is often a difference between out-of-order and in-order cores.
Is it possible to provide some numbers to back this up? Preferably using some well-known benchmarks like SPEC and/or the llvm-test-suite.
I added two experimental options:
We can see that both options reduce the mean number of spills/reloads. I didn't run these tests on real hardware, so the data may not be entirely convincing. I'd appreciate it if you could evaluate this on some platforms; that would be helpful. If you find this common setting is not suitable for your microarchitectures, please let me know and we can make it a tune feature. All I want is to unify the common sched policy and turn part of the policy into tune features.
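For context, experimental knobs like the ones mentioned above are usually exposed as hidden cl::opt flags that feed overrideSchedPolicy. The sketch below is hypothetical: the flag name is illustrative and is not one of the actual options referenced above.

// Hypothetical sketch in the style of RISCVSubtarget.cpp; the flag name is
// illustrative only and does not correspond to the options mentioned above.
#include "llvm/Support/CommandLine.h"
using namespace llvm;

static cl::opt<bool> UseLatencySchedHeuristic(
    "riscv-use-latency-sched-heuristic", cl::Hidden,
    cl::desc("Keep the machine scheduler's latency heuristic enabled"),
    cl::init(false));

// Inside RISCVSubtarget::overrideSchedPolicy():
//   Policy.DisableLatencyHeuristic = !UseLatencySchedHeuristic;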
Created using spr 1.3.6-beta.1 [skip ci]
LGTM. I do have some minor questions, but they're not blocking.
; CHECK-NEXT: vse32.v v8, (a5)
; CHECK-NEXT: vse32.v v9, (a6)
; CHECK-NEXT: ret
; RV32-LABEL: buildvec_vid_step1o2_v4i32:
Why do we have different scheduling between RV32 and RV64?
; CHECK-NEXT: slli a1, a1, 4
; CHECK-NEXT: add a1, sp, a1
; CHECK-NEXT: addi a1, a1, 16
; CHECK-NEXT: vs8r.v v16, (a1) # Unknown-size Folded Spill
Do we have more spills/reloads in this function?
I measured this patch on SPEC for an in-order and an out-of-order core. For the out-of-order core, 557.xz_r saw a regression. I saw no other significant improvements or regressions. Given these findings, combined with the fact that the latency heuristic is so low on the heuristic list (above only program order), I'm not sure I see a strong argument for setting this to true by default on either in-order or out-of-order cores.
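For readers less familiar with the scheduler internals, the outline below is a rough paraphrase (not the upstream code) of the tie-breaker order in the generic machine scheduler's candidate comparison, which is where Policy.DisableLatencyHeuristic takes effect:

// Rough paraphrase of the generic scheduler's candidate tie-breakers,
// highest priority first (summary only, not verbatim):
//   1. register-pressure heuristics (avoid exceeding limits, critical sets)
//   2. memory-operation clustering and weak DAG edges
//   3. processor-resource balance
//   4. latency reduction, the step guarded by Policy.DisableLatencyHeuristic
//   5. original program order (node order) as the final fallback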
Given @michaelmaitland's data, @wangpc-pp, the burden shifts to you to clearly justify in which cases this is profitable and to figure out how to selectively enable it only in those cases. I agree with @michaelmaitland's conclusion that this should not move forward otherwise. @michaelmaitland Can you say anything about the magnitude of the regression in either case? I assume they were statistically significant given that you mention them, but are these small regressions or largish ones?
Thanks for evaluating this! The data is very helpful! @michaelmaitland
I don't have any data other than the spill/reload data above. I don't know how to dynamically determine whether a SchedDAG region will benefit from disabling it, because we can only know the impact after the region has been scheduled. Again, if the conclusion is that we shouldn't make it true by default, I can make it a tune feature. All I want is to make the scheduling infrastructure easy to tune for downstreams. :-)
Created using spr 1.3.6-beta.1
LGTM
Created using spr 1.3.6-beta.1
LGTM
Created using spr 1.3.6-beta.1 [skip ci]
This tune feature disables the latency scheduling heuristic. This can
reduce the number of spills/reloads but may cause regressions on some
cores. CPUs may add this tune feature if they find it profitable.
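A minimal sketch of how this plugs in, assuming the feature is a SubtargetFeature named TuneDisableLatencySchedHeuristic in RISCVFeatures.td that sets a DisableLatencySchedHeuristic subtarget member (the exact names in the landed patch may differ):

// Sketch only: consuming the assumed tune feature in RISCVSubtarget.cpp.
void RISCVSubtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
                                         unsigned NumRegionInstrs) const {
  Policy.OnlyTopDown = false;
  Policy.OnlyBottomUp = false;
  // Only cores that opt in via the tune feature drop the latency heuristic.
  Policy.DisableLatencyHeuristic = DisableLatencySchedHeuristic;
  // Spilling is generally expensive on all RISC-V cores, so keep
  // register-pressure tracking enabled.
  Policy.ShouldTrackPressure = true;
}

A core that finds this profitable would then list the feature among its tune features in RISCVProcessors.td.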