
[RISCV] Add TuneDisableLatencySchedHeuristic #115858


Merged

Conversation

wangpc-pp (Contributor) commented Nov 12, 2024

This tune feature disables the latency scheduling heuristic.

This can reduce the number of spills/reloads, but it causes regressions
on some cores.

A CPU may add this tune feature if it is found to be profitable.
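As a rough sketch of the intent (a simplified model, not the actual LLVM classes — the field names mirror LLVM's `MachineSchedPolicy`, but the `Subtarget` type and the `TuneDisableLatencySchedHeuristic` member here are illustrative only), a per-CPU tune feature drives the scheduler policy roughly like this:

```cpp
#include <cassert>

// Simplified stand-in for llvm::MachineSchedPolicy: only the two
// fields relevant to this patch are modeled.
struct SchedPolicy {
  bool ShouldTrackPressure = false;
  bool DisableLatencyHeuristic = false;
};

// Simplified stand-in for RISCVSubtarget. Real code would derive the
// flag from the CPU's tune features in the .td files.
struct Subtarget {
  bool TuneDisableLatencySchedHeuristic = false;

  void overrideSchedPolicy(SchedPolicy &Policy) const {
    // Spilling is generally expensive on RISC-V cores, so always
    // enable register-pressure tracking.
    Policy.ShouldTrackPressure = true;
    // Only opt out of the latency heuristic when the CPU asks for it.
    Policy.DisableLatencyHeuristic = TuneDisableLatencySchedHeuristic;
  }
};
```

A CPU that opts in simply sets the tune flag; all others keep the default latency-aware scheduling.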

Created using spr 1.3.6-beta.1

@llvmbot (Member) commented Nov 12, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-risc-v

Author: Pengcheng Wang (wangpc-pp)

Changes

This helps reduce register pressure in some cases.


Patch is 7.18 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/115858.diff

465 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVSubtarget.cpp (+7)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll (+68-68)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb-zbkb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll (+181-180)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv32zbkb.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll (+261-261)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/abds-neg.ll (+262-262)
  • (modified) llvm/test/CodeGen/RISCV/abds.ll (+254-254)
  • (modified) llvm/test/CodeGen/RISCV/abdu-neg.ll (+358-358)
  • (modified) llvm/test/CodeGen/RISCV/abdu.ll (+244-244)
  • (modified) llvm/test/CodeGen/RISCV/add-before-shl.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/add-imm.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/addcarry.ll (+11-11)
  • (modified) llvm/test/CodeGen/RISCV/addimm-mulimm.ll (+63-63)
  • (modified) llvm/test/CodeGen/RISCV/alu16.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/alu8.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/and.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/atomic-cmpxchg-branch-on-result.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/atomic-cmpxchg.ll (+192-192)
  • (modified) llvm/test/CodeGen/RISCV/atomic-rmw.ll (+1343-1343)
  • (modified) llvm/test/CodeGen/RISCV/atomic-signext.ll (+122-122)
  • (modified) llvm/test/CodeGen/RISCV/atomicrmw-cond-sub-clamp.ll (+36-36)
  • (modified) llvm/test/CodeGen/RISCV/atomicrmw-uinc-udec-wrap.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/avgceils.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/avgceilu.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/avgfloors.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/avgflooru.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/bf16-promote.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-arith.ll (+57-57)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-br-fcmp.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-convert.ll (+91-91)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-fcmp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-mem.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/bfloat.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/bitextract-mac.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/branch-relaxation.ll (+95-95)
  • (modified) llvm/test/CodeGen/RISCV/bswap-bitreverse.ll (+528-524)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-half.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-common.ll (+48-52)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-ilp32d-common.ll (+126-126)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32d.ll (+34-34)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32e.ll (+308-308)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32f-ilp32d-common.ll (+24-24)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-lp64-lp64f-lp64d-common.ll (+58-58)
  • (modified) llvm/test/CodeGen/RISCV/cmov-branch-opt.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/compress.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/condbinops.ll (+40-40)
  • (modified) llvm/test/CodeGen/RISCV/condops.ll (+450-424)
  • (modified) llvm/test/CodeGen/RISCV/copysign-casts.ll (+48-48)
  • (modified) llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll (+583-583)
  • (modified) llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll (+31-31)
  • (modified) llvm/test/CodeGen/RISCV/div-by-constant.ll (+31-31)
  • (modified) llvm/test/CodeGen/RISCV/div-pow2.ll (+32-32)
  • (modified) llvm/test/CodeGen/RISCV/div.ll (+28-28)
  • (modified) llvm/test/CodeGen/RISCV/double-arith.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/double-bitmanip-dagcombines.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/double-calling-conv.ll (+32-32)
  • (modified) llvm/test/CodeGen/RISCV/double-convert.ll (+43-43)
  • (modified) llvm/test/CodeGen/RISCV/double-imm.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/double-intrinsics.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/double-previous-failure.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/double-round-conv-sat.ll (+216-216)
  • (modified) llvm/test/CodeGen/RISCV/double_reduct.ll (+15-15)
  • (modified) llvm/test/CodeGen/RISCV/early-clobber-tied-def-subreg-liveness.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/float-arith.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/float-bitmanip-dagcombines.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/float-convert.ll (+52-52)
  • (modified) llvm/test/CodeGen/RISCV/float-intrinsics.ll (+49-49)
  • (modified) llvm/test/CodeGen/RISCV/float-round-conv-sat.ll (+144-144)
  • (modified) llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/fold-binop-into-select.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/forced-atomics.ll (+27-27)
  • (modified) llvm/test/CodeGen/RISCV/fp128.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/fpclamptosat.ll (+19-19)
  • (modified) llvm/test/CodeGen/RISCV/fpenv.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/ghccc-rv32.ll (+40-40)
  • (modified) llvm/test/CodeGen/RISCV/ghccc-rv64.ll (+40-40)
  • (modified) llvm/test/CodeGen/RISCV/ghccc-without-f-reg.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/global-merge.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/half-arith-strict.ll (+46-46)
  • (modified) llvm/test/CodeGen/RISCV/half-arith.ll (+91-91)
  • (modified) llvm/test/CodeGen/RISCV/half-bitmanip-dagcombines.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/half-br-fcmp.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/half-convert.ll (+469-469)
  • (modified) llvm/test/CodeGen/RISCV/half-fcmp-strict.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/half-fcmp.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/half-intrinsics.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/half-mem.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/half-round-conv-sat.ll (+294-294)
  • (modified) llvm/test/CodeGen/RISCV/half-select-fcmp.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/iabs.ll (+52-52)
  • (modified) llvm/test/CodeGen/RISCV/imm.ll (+11-11)
  • (modified) llvm/test/CodeGen/RISCV/inline-asm-d-constraint-f.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/inline-asm-d-modifier-N.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/interrupt-attr-nocall.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/intrinsic-cttz-elts-vscale.ll (+29-29)
  • (modified) llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/loop-strength-reduce-loop-invar.ll (+22-22)
  • (modified) llvm/test/CodeGen/RISCV/lsr-legaladdimm.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/machinelicm-constant-phys-reg.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/memcmp-optsize.ll (+131-131)
  • (modified) llvm/test/CodeGen/RISCV/memcmp.ll (+291-291)
  • (modified) llvm/test/CodeGen/RISCV/memcpy.ll (+73-73)
  • (modified) llvm/test/CodeGen/RISCV/mul.ll (+268-263)
  • (modified) llvm/test/CodeGen/RISCV/neg-abs.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/or-is-add.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/overflow-intrinsics.ll (+19-17)
  • (modified) llvm/test/CodeGen/RISCV/pr51206.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/pr56457.ll (+25-25)
  • (modified) llvm/test/CodeGen/RISCV/pr58511.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/pr65025.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/pr68855.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/pr69586.ll (+1150-1498)
  • (modified) llvm/test/CodeGen/RISCV/pr84653_pr85190.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/pr95271.ll (+25-25)
  • (modified) llvm/test/CodeGen/RISCV/regalloc-last-chance-recoloring-failure.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rem.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/riscv-codegenprepare-asm.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/riscv-shifted-extend.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/rotl-rotr.ll (+219-219)
  • (modified) llvm/test/CodeGen/RISCV/rv32xtheadbb.ll (+38-37)
  • (modified) llvm/test/CodeGen/RISCV/rv32zbb-zbkb.ll (+29-26)
  • (modified) llvm/test/CodeGen/RISCV/rv32zbb.ll (+225-224)
  • (modified) llvm/test/CodeGen/RISCV/rv32zbs.ll (+22-22)
  • (modified) llvm/test/CodeGen/RISCV/rv64-double-convert.ll (+19-19)
  • (modified) llvm/test/CodeGen/RISCV/rv64-float-convert.ll (+15-15)
  • (modified) llvm/test/CodeGen/RISCV/rv64-half-convert.ll (+19-19)
  • (modified) llvm/test/CodeGen/RISCV/rv64-trampoline.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rv64i-shift-sext.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rv64i-w-insts-legalization.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rv64xtheadbb.ll (+150-150)
  • (modified) llvm/test/CodeGen/RISCV/rv64zba.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbb-intrinsic.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbb-zbkb.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbb.ll (+259-259)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbkb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/65704-illegal-instruction.ll (+3-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/abs-vp.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll (+57-52)
  • (modified) llvm/test/CodeGen/RISCV/rvv/alloca-load-store-scalable-array.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/alloca-load-store-scalable-struct.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/alloca-load-store-vector-tuple.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/allocate-lmul-2-4-8.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/bitreverse-sdnode.ll (+481-488)
  • (modified) llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll (+1188-1229)
  • (modified) llvm/test/CodeGen/RISCV/rvv/bswap-sdnode.ll (+170-177)
  • (modified) llvm/test/CodeGen/RISCV/rvv/bswap-vp.ll (+518-564)
  • (modified) llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll (+148-170)
  • (modified) llvm/test/CodeGen/RISCV/rvv/calling-conv.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll (+198-153)
  • (modified) llvm/test/CodeGen/RISCV/rvv/compressstore.ll (+87-85)
  • (modified) llvm/test/CodeGen/RISCV/rvv/constant-folding-crash.ll (+15-19)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll (+539-539)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ctlz-vp.ll (+137-137)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ctpop-sdnode.ll (+175-175)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll (+609-598)
  • (modified) llvm/test/CodeGen/RISCV/rvv/cttz-sdnode.ll (+702-702)
  • (modified) llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll (+801-806)
  • (modified) llvm/test/CodeGen/RISCV/rvv/dont-sink-splat-operands.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/expand-no-v.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/expandload.ll (+2193-2140)
  • (modified) llvm/test/CodeGen/RISCV/rvv/extract-subvector.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/extractelt-fp.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/extractelt-i1.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv32.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv64.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fceil-constrained-sdnode.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fceil-sdnode.ll (+28-32)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ffloor-constrained-sdnode.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ffloor-sdnode.ll (+28-32)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll (+54-54)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-binop-splats.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bitreverse-vp.ll (+1150-1197)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll (+210-210)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap-vp.ll (+467-513)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll (+86-86)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-buildvec-of-binop.ll (+36-37)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-calling-conv-fastcc.ll (+26-30)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-calling-conv.ll (+22-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll (+61-74)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz-vp.ll (+1522-1646)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll (+266-266)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctpop-vp.ll (+643-643)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctpop.ll (+86-86)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz-vp.ll (+1598-1482)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll (+276-280)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll (+19-22)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-elen.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-i1.ll (+48-48)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll (+62-62)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fceil-constrained-sdnode.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ffloor-constrained-sdnode.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-floor-vp.ll (+61-74)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fmaximum-vp.ll (+12-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fmaximum.ll (+13-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fminimum-vp.ll (+12-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fminimum.ll (+13-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fnearbyint-constrained-sdnode.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll (+283-267)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-interleave.ll (+17-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-setcc.ll (+26-22)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll (+94-94)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp2i.ll (+84-84)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fptrunc-vp.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fround-constrained-sdnode.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fround.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-froundeven-constrained-sdnode.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-froundeven.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ftrunc-constrained-sdnode.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-i1.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll (+15-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll (+794-763)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll (+332-346)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-interleave.ll (+21-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll (+24-25)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll (+216-212)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access-zve32x.ll (+21-25)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll (+340-279)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-llrint.ll (+114-111)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-load.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-lrint.ll (+338-327)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-mask-buildvec.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+494-494)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-load-fp.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-load-int.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll (+200-216)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-store-fp.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-store-int.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-nearbyint-vp.ll (+16-35)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-formation.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll (+214-126)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int-vp.ll (+21-21)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int.ll (+211-141)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-rint-vp.ll (+10-11)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-round-vp.ll (+61-74)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-roundeven-vp.ll (+61-74)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-roundtozero-vp.ll (+61-74)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-sad.ll (+19-19)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-scalarized.ll (+34-30)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-setcc-fp-vp.ll (+1371-1883)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-setcc-int-vp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-changes-length.ll (+75-78)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-concat.ll (+20-13)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-reverse.ll (+258-269)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-rotate.ll (+54-54)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shufflevector-vnsrl.ll (+15-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-store.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll (+64-64)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-vpload.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-trunc-vp.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-unaligned.ll (+8-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfcmp-constrained-sdnode.ll (+216-216)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfcmps-constrained-sdnode.ll (+33-33)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfma-vp.ll (+71-92)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmuladd-vp.ll (+27-48)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfw-web-simplification.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpgather.ll (+43-43)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpload.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpscatter.ll (+49-49)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpstore.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vrol.ll (+83-83)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vror.ll (+125-125)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vsadd-vp.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vselect-vp.ll (+23-13)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vselect.ll (+84-84)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vssub-vp.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vssubu-vp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/floor-vp.ll (+198-153)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fmaximum-sdnode.ll (+81-200)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fmaximum-vp.ll (+267-288)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fminimum-sdnode.ll (+81-200)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fminimum-vp.ll (+267-288)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fnearbyint-constrained-sdnode.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fnearbyint-sdnode.ll (+30-30)
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index 3eae2b9774203f..ac81d8980fd3e0 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -208,6 +208,13 @@ void RISCVSubtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
   Policy.OnlyTopDown = false;
   Policy.OnlyBottomUp = false;
 
+  // Enabling or Disabling the latency heuristic is a close call: It seems to
+  // help nearly no benchmark on out-of-order architectures, on the other hand
+  // it regresses register pressure on a few benchmarking.
+  // FIXME: This is from AArch64, but we haven't evaluated it on RISC-V.
+  // TODO: We may disable it for out-of-order architectures only.
+  Policy.DisableLatencyHeuristic = true;
+
   // Spilling is generally expensive on all RISC-V cores, so always enable
   // register-pressure tracking. This will increase compile time.
   Policy.ShouldTrackPressure = true;
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll b/llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll
index ee414992a5245c..330f8b16065f13 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll
@@ -25,8 +25,8 @@ define i32 @add_i8_signext_i32(i8 %a, i8 %b) {
 ; RV32IM-LABEL: add_i8_signext_i32:
 ; RV32IM:       # %bb.0: # %entry
 ; RV32IM-NEXT:    slli a0, a0, 24
-; RV32IM-NEXT:    slli a1, a1, 24
 ; RV32IM-NEXT:    srai a0, a0, 24
+; RV32IM-NEXT:    slli a1, a1, 24
 ; RV32IM-NEXT:    srai a1, a1, 24
 ; RV32IM-NEXT:    add a0, a0, a1
 ; RV32IM-NEXT:    ret
@@ -34,8 +34,8 @@ define i32 @add_i8_signext_i32(i8 %a, i8 %b) {
 ; RV64IM-LABEL: add_i8_signext_i32:
 ; RV64IM:       # %bb.0: # %entry
 ; RV64IM-NEXT:    slli a0, a0, 56
-; RV64IM-NEXT:    slli a1, a1, 56
 ; RV64IM-NEXT:    srai a0, a0, 56
+; RV64IM-NEXT:    slli a1, a1, 56
 ; RV64IM-NEXT:    srai a1, a1, 56
 ; RV64IM-NEXT:    add a0, a0, a1
 ; RV64IM-NEXT:    ret
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll b/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
index bce6dfacf8e82c..f33ba1d7a302ef 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
@@ -6,8 +6,8 @@ define i2 @bitreverse_i2(i2 %x) {
 ; RV32-LABEL: bitreverse_i2:
 ; RV32:       # %bb.0:
 ; RV32-NEXT:    slli a1, a0, 1
-; RV32-NEXT:    andi a0, a0, 3
 ; RV32-NEXT:    andi a1, a1, 2
+; RV32-NEXT:    andi a0, a0, 3
 ; RV32-NEXT:    srli a0, a0, 1
 ; RV32-NEXT:    or a0, a1, a0
 ; RV32-NEXT:    ret
@@ -15,8 +15,8 @@ define i2 @bitreverse_i2(i2 %x) {
 ; RV64-LABEL: bitreverse_i2:
 ; RV64:       # %bb.0:
 ; RV64-NEXT:    slli a1, a0, 1
-; RV64-NEXT:    andi a0, a0, 3
 ; RV64-NEXT:    andi a1, a1, 2
+; RV64-NEXT:    andi a0, a0, 3
 ; RV64-NEXT:    srli a0, a0, 1
 ; RV64-NEXT:    or a0, a1, a0
 ; RV64-NEXT:    ret
@@ -28,8 +28,8 @@ define i3 @bitreverse_i3(i3 %x) {
 ; RV32-LABEL: bitreverse_i3:
 ; RV32:       # %bb.0:
 ; RV32-NEXT:    slli a1, a0, 2
-; RV32-NEXT:    andi a0, a0, 7
 ; RV32-NEXT:    andi a1, a1, 4
+; RV32-NEXT:    andi a0, a0, 7
 ; RV32-NEXT:    andi a2, a0, 2
 ; RV32-NEXT:    or a1, a1, a2
 ; RV32-NEXT:    srli a0, a0, 2
@@ -39,8 +39,8 @@ define i3 @bitreverse_i3(i3 %x) {
 ; RV64-LABEL: bitreverse_i3:
 ; RV64:       # %bb.0:
 ; RV64-NEXT:    slli a1, a0, 2
-; RV64-NEXT:    andi a0, a0, 7
 ; RV64-NEXT:    andi a1, a1, 4
+; RV64-NEXT:    andi a0, a0, 7
 ; RV64-NEXT:    andi a2, a0, 2
 ; RV64-NEXT:    or a1, a1, a2
 ; RV64-NEXT:    srli a0, a0, 2
@@ -54,11 +54,11 @@ define i4 @bitreverse_i4(i4 %x) {
 ; RV32-LABEL: bitreverse_i4:
 ; RV32:       # %bb.0:
 ; RV32-NEXT:    slli a1, a0, 3
-; RV32-NEXT:    slli a2, a0, 1
-; RV32-NEXT:    andi a0, a0, 15
 ; RV32-NEXT:    andi a1, a1, 8
+; RV32-NEXT:    slli a2, a0, 1
 ; RV32-NEXT:    andi a2, a2, 4
 ; RV32-NEXT:    or a1, a1, a2
+; RV32-NEXT:    andi a0, a0, 15
 ; RV32-NEXT:    srli a2, a0, 1
 ; RV32-NEXT:    andi a2, a2, 2
 ; RV32-NEXT:    or a1, a1, a2
@@ -69,11 +69,11 @@ define i4 @bitreverse_i4(i4 %x) {
 ; RV64-LABEL: bitreverse_i4:
 ; RV64:       # %bb.0:
 ; RV64-NEXT:    slli a1, a0, 3
-; RV64-NEXT:    slli a2, a0, 1
-; RV64-NEXT:    andi a0, a0, 15
 ; RV64-NEXT:    andi a1, a1, 8
+; RV64-NEXT:    slli a2, a0, 1
 ; RV64-NEXT:    andi a2, a2, 4
 ; RV64-NEXT:    or a1, a1, a2
+; RV64-NEXT:    andi a0, a0, 15
 ; RV64-NEXT:    srli a2, a0, 1
 ; RV64-NEXT:    andi a2, a2, 2
 ; RV64-NEXT:    or a1, a1, a2
@@ -88,21 +88,21 @@ define i7 @bitreverse_i7(i7 %x) {
 ; RV32-LABEL: bitreverse_i7:
 ; RV32:       # %bb.0:
 ; RV32-NEXT:    slli a1, a0, 6
-; RV32-NEXT:    slli a2, a0, 4
-; RV32-NEXT:    slli a3, a0, 2
-; RV32-NEXT:    andi a0, a0, 127
 ; RV32-NEXT:    andi a1, a1, 64
+; RV32-NEXT:    slli a2, a0, 4
 ; RV32-NEXT:    andi a2, a2, 32
-; RV32-NEXT:    andi a3, a3, 16
 ; RV32-NEXT:    or a1, a1, a2
-; RV32-NEXT:    andi a2, a0, 8
-; RV32-NEXT:    or a2, a3, a2
-; RV32-NEXT:    srli a3, a0, 2
+; RV32-NEXT:    slli a2, a0, 2
+; RV32-NEXT:    andi a2, a2, 16
+; RV32-NEXT:    andi a0, a0, 127
+; RV32-NEXT:    andi a3, a0, 8
+; RV32-NEXT:    or a2, a2, a3
 ; RV32-NEXT:    or a1, a1, a2
-; RV32-NEXT:    srli a2, a0, 4
-; RV32-NEXT:    andi a3, a3, 4
-; RV32-NEXT:    andi a2, a2, 2
-; RV32-NEXT:    or a2, a3, a2
+; RV32-NEXT:    srli a2, a0, 2
+; RV32-NEXT:    andi a2, a2, 4
+; RV32-NEXT:    srli a3, a0, 4
+; RV32-NEXT:    andi a3, a3, 2
+; RV32-NEXT:    or a2, a2, a3
 ; RV32-NEXT:    or a1, a1, a2
 ; RV32-NEXT:    srli a0, a0, 6
 ; RV32-NEXT:    or a0, a1, a0
@@ -111,21 +111,21 @@ define i7 @bitreverse_i7(i7 %x) {
 ; RV64-LABEL: bitreverse_i7:
 ; RV64:       # %bb.0:
 ; RV64-NEXT:    slli a1, a0, 6
-; RV64-NEXT:    slli a2, a0, 4
-; RV64-NEXT:    slli a3, a0, 2
-; RV64-NEXT:    andi a0, a0, 127
 ; RV64-NEXT:    andi a1, a1, 64
+; RV64-NEXT:    slli a2, a0, 4
 ; RV64-NEXT:    andi a2, a2, 32
-; RV64-NEXT:    andi a3, a3, 16
 ; RV64-NEXT:    or a1, a1, a2
-; RV64-NEXT:    andi a2, a0, 8
-; RV64-NEXT:    or a2, a3, a2
-; RV64-NEXT:    srli a3, a0, 2
+; RV64-NEXT:    slli a2, a0, 2
+; RV64-NEXT:    andi a2, a2, 16
+; RV64-NEXT:    andi a0, a0, 127
+; RV64-NEXT:    andi a3, a0, 8
+; RV64-NEXT:    or a2, a2, a3
 ; RV64-NEXT:    or a1, a1, a2
-; RV64-NEXT:    srli a2, a0, 4
-; RV64-NEXT:    andi a3, a3, 4
-; RV64-NEXT:    andi a2, a2, 2
-; RV64-NEXT:    or a2, a3, a2
+; RV64-NEXT:    srli a2, a0, 2
+; RV64-NEXT:    andi a2, a2, 4
+; RV64-NEXT:    srli a3, a0, 4
+; RV64-NEXT:    andi a3, a3, 2
+; RV64-NEXT:    or a2, a2, a3
 ; RV64-NEXT:    or a1, a1, a2
 ; RV64-NEXT:    srli a0, a0, 6
 ; RV64-NEXT:    or a0, a1, a0
@@ -139,33 +139,33 @@ define i24 @bitreverse_i24(i24 %x) {
 ; RV32:       # %bb.0:
 ; RV32-NEXT:    slli a1, a0, 16
 ; RV32-NEXT:    lui a2, 4096
-; RV32-NEXT:    lui a3, 1048335
 ; RV32-NEXT:    addi a2, a2, -1
-; RV32-NEXT:    addi a3, a3, 240
 ; RV32-NEXT:    and a0, a0, a2
 ; RV32-NEXT:    srli a0, a0, 16
 ; RV32-NEXT:    or a0, a0, a1
-; RV32-NEXT:    and a1, a3, a2
-; RV32-NEXT:    and a1, a0, a1
+; RV32-NEXT:    lui a1, 1048335
+; RV32-NEXT:    addi a1, a1, 240
+; RV32-NEXT:    and a3, a1, a2
+; RV32-NEXT:    and a3, a0, a3
+; RV32-NEXT:    srli a3, a3, 4
 ; RV32-NEXT:    slli a0, a0, 4
-; RV32-NEXT:    and a0, a0, a3
-; RV32-NEXT:    lui a3, 1047757
-; RV32-NEXT:    addi a3, a3, -820
-; RV32-NEXT:    srli a1, a1, 4
-; RV32-NEXT:    or a0, a1, a0
-; RV32-NEXT:    and a1, a3, a2
-; RV32-NEXT:    and a1, a0, a1
+; RV32-NEXT:    and a0, a0, a1
+; RV32-NEXT:    or a0, a3, a0
+; RV32-NEXT:    lui a1, 1047757
+; RV32-NEXT:    addi a1, a1, -820
+; RV32-NEXT:    and a3, a1, a2
+; RV32-NEXT:    and a3, a0, a3
+; RV32-NEXT:    srli a3, a3, 2
 ; RV32-NEXT:    slli a0, a0, 2
-; RV32-NEXT:    and a0, a0, a3
-; RV32-NEXT:    lui a3, 1047211
-; RV32-NEXT:    addi a3, a3, -1366
-; RV32-NEXT:    and a2, a3, a2
-; RV32-NEXT:    srli a1, a1, 2
-; RV32-NEXT:    or a0, a1, a0
+; RV32-NEXT:    and a0, a0, a1
+; RV32-NEXT:    or a0, a3, a0
+; RV32-NEXT:    lui a1, 1047211
+; RV32-NEXT:    addi a1, a1, -1366
+; RV32-NEXT:    and a2, a1, a2
 ; RV32-NEXT:    and a2, a0, a2
-; RV32-NEXT:    slli a0, a0, 1
 ; RV32-NEXT:    srli a2, a2, 1
-; RV32-NEXT:    and a0, a0, a3
+; RV32-NEXT:    slli a0, a0, 1
+; RV32-NEXT:    and a0, a0, a1
 ; RV32-NEXT:    or a0, a2, a0
 ; RV32-NEXT:    ret
 ;
@@ -173,33 +173,33 @@ define i24 @bitreverse_i24(i24 %x) {
 ; RV64:       # %bb.0:
 ; RV64-NEXT:    slli a1, a0, 16
 ; RV64-NEXT:    lui a2, 4096
-; RV64-NEXT:    lui a3, 1048335
 ; RV64-NEXT:    addiw a2, a2, -1
-; RV64-NEXT:    addiw a3, a3, 240
 ; RV64-NEXT:    and a0, a0, a2
 ; RV64-NEXT:    srli a0, a0, 16
 ; RV64-NEXT:    or a0, a0, a1
-; RV64-NEXT:    and a1, a3, a2
-; RV64-NEXT:    and a1, a0, a1
+; RV64-NEXT:    lui a1, 1048335
+; RV64-NEXT:    addiw a1, a1, 240
+; RV64-NEXT:    and a3, a1, a2
+; RV64-NEXT:    and a3, a0, a3
+; RV64-NEXT:    srli a3, a3, 4
 ; RV64-NEXT:    slli a0, a0, 4
-; RV64-NEXT:    and a0, a0, a3
-; RV64-NEXT:    lui a3, 1047757
-; RV64-NEXT:    addiw a3, a3, -820
-; RV64-NEXT:    srli a1, a1, 4
-; RV64-NEXT:    or a0, a1, a0
-; RV64-NEXT:    and a1, a3, a2
-; RV64-NEXT:    and a1, a0, a1
+; RV64-NEXT:    and a0, a0, a1
+; RV64-NEXT:    or a0, a3, a0
+; RV64-NEXT:    lui a1, 1047757
+; RV64-NEXT:    addiw a1, a1, -820
+; RV64-NEXT:    and a3, a1, a2
+; RV64-NEXT:    and a3, a0, a3
+; RV64-NEXT:    srli a3, a3, 2
 ; RV64-NEXT:    slli a0, a0, 2
-; RV64-NEXT:    and a0, a0, a3
-; RV64-NEXT:    lui a3, 1047211
-; RV64-NEXT:    addiw a3, a3, -1366
-; RV64-NEXT:    and a2, a3, a2
-; RV64-NEXT:    srli a1, a1, 2
-; RV64-NEXT:    or a0, a1, a0
+; RV64-NEXT:    and a0, a0, a1
+; RV64-NEXT:    or a0, a3, a0
+; RV64-NEXT:    lui a1, 1047211
+; RV64-NEXT:    addiw a1, a1, -1366
+; RV64-NEXT:    and a2, a1, a2
 ; RV64-NEXT:    and a2, a0, a2
-; RV64-NEXT:    slli a0, a0, 1
 ; RV64-NEXT:    srli a2, a2, 1
-; RV64-NEXT:    and a0, a0, a3
+; RV64-NEXT:    slli a0, a0, 1
+; RV64-NEXT:    and a0, a0, a1
 ; RV64-NEXT:    or a0, a2, a0
 ; RV64-NEXT:    ret
   %rev = call i24 @llvm.bitreverse.i24(i24 %x)
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
index cf7cef83bcc135..70d1b25309c844 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
@@ -21,34 +21,34 @@ define void @constant_fold_barrier_i128(ptr %p) {
 ; RV32-LABEL: constant_fold_barrier_i128:
 ; RV32:       # %bb.0: # %entry
 ; RV32-NEXT:    li a1, 1
+; RV32-NEXT:    slli a1, a1, 11
 ; RV32-NEXT:    lw a2, 0(a0)
 ; RV32-NEXT:    lw a3, 4(a0)
 ; RV32-NEXT:    lw a4, 8(a0)
 ; RV32-NEXT:    lw a5, 12(a0)
-; RV32-NEXT:    slli a1, a1, 11
 ; RV32-NEXT:    and a2, a2, a1
 ; RV32-NEXT:    and a3, a3, zero
 ; RV32-NEXT:    and a4, a4, zero
 ; RV32-NEXT:    and a5, a5, zero
 ; RV32-NEXT:    add a2, a2, a1
-; RV32-NEXT:    add a6, a3, zero
 ; RV32-NEXT:    sltu a1, a2, a1
+; RV32-NEXT:    add a6, a3, zero
 ; RV32-NEXT:    sltu a3, a6, a3
 ; RV32-NEXT:    add a6, a6, a1
 ; RV32-NEXT:    seqz a7, a6
 ; RV32-NEXT:    and a1, a7, a1
-; RV32-NEXT:    add a7, a4, zero
-; RV32-NEXT:    add a5, a5, zero
-; RV32-NEXT:    sltu a4, a7, a4
 ; RV32-NEXT:    or a1, a3, a1
-; RV32-NEXT:    add a7, a7, a1
-; RV32-NEXT:    seqz a3, a7
-; RV32-NEXT:    and a1, a3, a1
+; RV32-NEXT:    add a3, a4, zero
+; RV32-NEXT:    sltu a4, a3, a4
+; RV32-NEXT:    add a3, a3, a1
+; RV32-NEXT:    seqz a7, a3
+; RV32-NEXT:    and a1, a7, a1
 ; RV32-NEXT:    or a1, a4, a1
+; RV32-NEXT:    add a5, a5, zero
 ; RV32-NEXT:    add a1, a5, a1
 ; RV32-NEXT:    sw a2, 0(a0)
 ; RV32-NEXT:    sw a6, 4(a0)
-; RV32-NEXT:    sw a7, 8(a0)
+; RV32-NEXT:    sw a3, 8(a0)
 ; RV32-NEXT:    sw a1, 12(a0)
 ; RV32-NEXT:    ret
 entry:
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
index 2c3e3faddc3916..51e8b6da39d099 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
@@ -21,9 +21,9 @@ define i128 @constant_fold_barrier_i128(i128 %x) {
 ; RV64-LABEL: constant_fold_barrier_i128:
 ; RV64:       # %bb.0: # %entry
 ; RV64-NEXT:    li a2, 1
-; RV64-NEXT:    and a1, a1, zero
 ; RV64-NEXT:    slli a2, a2, 11
 ; RV64-NEXT:    and a0, a0, a2
+; RV64-NEXT:    and a1, a1, zero
 ; RV64-NEXT:    add a0, a0, a2
 ; RV64-NEXT:    sltu a2, a0, a2
 ; RV64-NEXT:    add a1, a1, zero
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll b/llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll
index 1156edffe91943..05989c310541b8 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll
@@ -117,8 +117,8 @@ define i64 @abs64(i64 %x) {
 ; RV32I:       # %bb.0:
 ; RV32I-NEXT:    srai a2, a1, 31
 ; RV32I-NEXT:    add a0, a0, a2
-; RV32I-NEXT:    add a1, a1, a2
 ; RV32I-NEXT:    sltu a3, a0, a2
+; RV32I-NEXT:    add a1, a1, a2
 ; RV32I-NEXT:    add a1, a1, a3
 ; RV32I-NEXT:    xor a0, a0, a2
 ; RV32I-NEXT:    xor a1, a1, a2
@@ -128,8 +128,8 @@ define i64 @abs64(i64 %x) {
 ; RV32ZBB:       # %bb.0:
 ; RV32ZBB-NEXT:    srai a2, a1, 31
 ; RV32ZBB-NEXT:    add a0, a0, a2
-; RV32ZBB-NEXT:    add a1, a1, a2
 ; RV32ZBB-NEXT:    sltu a3, a0, a2
+; RV32ZBB-NEXT:    add a1, a1, a2
 ; RV32ZBB-NEXT:    add a1, a1, a3
 ; RV32ZBB-NEXT:    xor a0, a0, a2
 ; RV32ZBB-NEXT:    xor a1, a1, a2
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb-zbkb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb-zbkb.ll
index 68bf9240ccd1df..c558639fda424e 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb-zbkb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb-zbkb.ll
@@ -302,8 +302,8 @@ define i64 @rori_i64(i64 %a) nounwind {
 ; CHECK-NEXT:    slli a2, a0, 31
 ; CHECK-NEXT:    srli a0, a0, 1
 ; CHECK-NEXT:    slli a3, a1, 31
-; CHECK-NEXT:    srli a1, a1, 1
 ; CHECK-NEXT:    or a0, a0, a3
+; CHECK-NEXT:    srli a1, a1, 1
 ; CHECK-NEXT:    or a1, a2, a1
 ; CHECK-NEXT:    ret
   %1 = tail call i64 @llvm.fshl.i64(i64 %a, i64 %a, i64 63)
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll
index 7f22127ad3536c..1184905c17edea 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll
@@ -12,31 +12,31 @@ define i32 @ctlz_i32(i32 %a) nounwind {
 ; RV32I-NEXT:    beqz a0, .LBB0_2
 ; RV32I-NEXT:  # %bb.1: # %cond.false
 ; RV32I-NEXT:    srli a1, a0, 1
-; RV32I-NEXT:    lui a2, 349525
 ; RV32I-NEXT:    or a0, a0, a1
-; RV32I-NEXT:    addi a1, a2, 1365
-; RV32I-NEXT:    srli a2, a0, 2
-; RV32I-NEXT:    or a0, a0, a2
-; RV32I-NEXT:    srli a2, a0, 4
-; RV32I-NEXT:    or a0, a0, a2
-; RV32I-NEXT:    srli a2, a0, 8
-; RV32I-NEXT:    or a0, a0, a2
-; RV32I-NEXT:    srli a2, a0, 16
-; RV32I-NEXT:    or a0, a0, a2
-; RV32I-NEXT:    srli a2, a0, 1
-; RV32I-NEXT:    and a1, a2, a1
-; RV32I-NEXT:    lui a2, 209715
-; RV32I-NEXT:    addi a2, a2, 819
+; RV32I-NEXT:    srli a1, a0, 2
+; RV32I-NEXT:    or a0, a0, a1
+; RV32I-NEXT:    srli a1, a0, 4
+; RV32I-NEXT:    or a0, a0, a1
+; RV32I-NEXT:    srli a1, a0, 8
+; RV32I-NEXT:    or a0, a0, a1
+; RV32I-NEXT:    srli a1, a0, 16
+; RV32I-NEXT:    or a0, a0, a1
+; RV32I-NEXT:    srli a1, a0, 1
+; RV32I-NEXT:    lui a2, 349525
+; RV32I-NEXT:    addi a2, a2, 1365
+; RV32I-NEXT:    and a1, a1, a2
 ; RV32I-NEXT:    sub a0, a0, a1
 ; RV32I-NEXT:    srli a1, a0, 2
-; RV32I-NEXT:    and a0, a0, a2
+; RV32I-NEXT:    lui a2, 209715
+; RV32I-NEXT:    addi a2, a2, 819
 ; RV32I-NEXT:    and a1, a1, a2
-; RV32I-NEXT:    lui a2, 61681
-; RV32I-NEXT:    addi a2, a2, -241
+; RV32I-NEXT:    and a0, a0, a2
 ; RV32I-NEXT:    add a0, a1, a0
 ; RV32I-NEXT:    srli a1, a0, 4
 ; RV32I-NEXT:    add a0, a1, a0
-; RV32I-NEXT:    and a0, a0, a2
+; RV32I-NEXT:    lui a1, 61681
+; RV32I-NEXT:    addi a1, a1, -241
+; RV32I-NEXT:    and a0, a0, a1
 ; RV32I-NEXT:    slli a1, a0, 8
 ; RV32I-NEXT:    add a0, a0, a1
 ; RV32I-NEXT:    slli a1, a0, 16
@@ -63,11 +63,11 @@ define i64 @ctlz_i64(i64 %a) nounwind {
 ; RV32I-LABEL: ctlz_i64:
 ; RV32I:       # %bb.0:
 ; RV32I-NEXT:    lui a2, 349525
-; RV32I-NEXT:    lui a3, 209715
-; RV32I-NEXT:    lui a6, 61681
 ; RV32I-NEXT:    addi a5, a2, 1365
-; RV32I-NEXT:    addi a4, a3, 819
-; RV32I-NEXT:    addi a3, a6, -241
+; RV32I-NEXT:    lui a2, 209715
+; RV32I-NEXT:    addi a4, a2, 819
+; RV32I-NEXT:    lui a2, 61681
+; RV32I-NEXT:    addi a3, a2, -241
 ; RV32I-NEXT:    li a2, 32
 ; RV32I-NEXT:    beqz a1, .LBB1_2
 ; RV32I-NEXT:  # %bb.1:
@@ -155,22 +155,22 @@ define i32 @cttz_i32(i32 %a) nounwind {
 ; RV32I-NEXT:  # %bb.1: # %cond.false
 ; RV32I-NEXT:    not a1, a0
 ; RV32I-NEXT:    addi a0, a0, -1
-; RV32I-NEXT:    lui a2, 349525
 ; RV32I-NEXT:    and a0, a1, a0
-; RV32I-NEXT:    addi a1, a2, 1365
-; RV32I-NEXT:    srli a2, a0, 1
-; RV32I-NEXT:    and a1, a2, a1
-; RV32I-NEXT:    lui a2, 209715
-; RV32I-NEXT:    addi a2, a2, 819
+; RV32I-NEXT:    srli a1, a0, 1
+; RV32I-NEXT:    lui a2, 349525
+; RV32I-NEXT:    addi a2, a2, 1365
+; RV32I-NEXT:    and a1, a1, a2
 ; RV32I-NEXT:    sub a0, a0, a1
 ; RV32I-NEXT:    srli a1, a0, 2
-; RV32I-NEXT:    and a0, a0, a2
+; RV32I-NEXT:    lui a2, 209715
+; RV32I-NEXT:    addi a2, a2, 819
 ; RV32I-NEXT:    and a1, a1, a2
-; RV32I-NEXT:    lui a2, 61681
+; RV32I-NEXT:    and a0, a0, a2
 ; RV32I-NEXT:    add a0, a1, a0
 ; RV32I-NEXT:    srli a1, a0, 4
 ; RV32I-NEXT:    add a0, a1, a0
-; RV32I-NEXT:    addi a1, a2, -241
+; RV32I-NEXT:    lui a1, 61681
+; RV32I-NEXT:    addi a1, a1, -241
 ; RV32I-NEXT:    and a0, a0, a1
 ; RV32I-NEXT:    slli a1, a0, 8
 ; RV32I-NEXT:    add a0, a0, a1
@@ -196,11 +196,11 @@ define i64 @cttz_i64(i64 %a) nounwind {
 ; RV32I-LABEL: cttz_i64:
 ; RV32I:       # %bb.0:
 ; RV32I-NEXT:    lui a2, 349525
-; RV32I-NEXT:    lui a3, 209715
-; RV32I-NEXT:    lui a5, 61681
 ; RV32I-NEXT:    addi a4, a2, 1365
-; RV32I-NEXT:    addi a3, a3, 819
-; RV32I-NEXT:    addi a2, a5, -241
+; RV32I-NEXT:    lui a2, 209715
+; RV32I-NEXT:    addi a3, a2, 819
+; RV32I-NEXT:    lui a2, 61681
+; RV32I-NEXT:    addi a2, a2, -241
 ; RV32I-NEXT:    beqz a0, .LBB3_2
 ; RV32I-NEXT:  # %bb.1:
 ; RV32I-NEXT:    not a1, a0
@@ -271,17 +271,17 @@ define i32 @ctpop_i32(i32 %a) nounwind {
 ; RV32I-NEXT:    lui a2, 349525
 ; RV32I-NEXT:    addi a2, a2, 1365
 ; RV32I-NEXT:    and a1, a1, a2
-; RV32I-NEXT:    lui a2, 209715
-; RV32I-NEXT:    addi a2, a2, 819
 ; RV32I-NEXT:    sub a0, a0, a1
 ; RV32I-NEXT:    srli a1, a0, 2
-; RV32I-NEXT:    and a0, a0, a2
+; RV32I-NEXT:    lui a2, 209715
+; RV32I-NEXT:    addi a2, a2, 819
 ; RV32I-NEXT:    and a1, a1, a2
-; RV32I-NEXT:    lui a2, 61681
+; RV32I-NEXT:    and a0, a0, a2
 ; RV32I-NEXT:    add a0, a1, a0
 ; RV32I-NEXT:    srli a1, a0, 4
 ; RV32I-NEXT:    add a0, a1, a0
-; RV32I-NEXT:    addi a1, a2, -241
+; RV32I-NEXT:    lui a1, 61681
+; RV32I-NEXT:    addi a1, a1, -241
 ; RV32I-NEXT:    and a0, a0, a1
 ; RV32I-NEXT:    slli a1, a0, 8
 ; RV32I-NEXT:    add a0, a0, a1
@@ -305,39 +305,39 @@ define i64 @ctpop_i64(i64 %a) nounwind {
 ; RV32I:       # %bb.0:
 ; RV32I-NEXT:    srli a2, a0, 1
 ; RV32I-NEXT:    lui a3, 349525
-; RV32I-NEXT:    lui a4, 209715
-; RV32I-NEXT:    srli a5, a1, 1
 ; RV32I-NEXT:    addi a3, a3, 1365
 ; RV32I-NEXT:    and a2, a2, a3
-; RV32I-NEXT:    and a3, a5, a3
-; RV32I-NEXT:    lui a5, 61681
-; RV32I-NEXT:    addi a4, a4, 819
-; RV32I-NEXT:    addi a5, a5, -241
 ; RV32I-NEXT:    sub a0, a0, a2
-; RV32I-NEXT:    sub a1, a1, a3
 ; RV32I-NEXT:    srli a2, a0, 2
+; RV32I-NEXT:    lui a4, 209715
+; RV32I-NEXT:    addi a4, a4, 819
+; RV32I-NEXT:    and a2, a2, a4
 ; RV32I-NEXT:    and a0, a0, a4
+; RV32I-NEXT:    add a0, a2, a0
+; RV32I-NEXT:    srli a2, a0, 4
+; RV32I-NEXT:    add a0, a2, a0
+; RV32I-NEXT:    lui a2, 61681
+; RV32I-NEXT:    addi a2, a2, -241
+; RV32I-NEXT:    and a0, a0, a2
+; RV32I-NEXT:    slli a5, a0, 8
+; RV32I-NEXT:    add a0, a0, a5
+; RV32I-NEXT:    slli a5, a0, 16
+; RV32I-NEXT:    add a0, a0, a5
+; RV32I-NEXT:    srli a0, a0, 24
+; RV32I-NEXT:    srli a5, a1, 1
+; RV32I-NEXT:    and a3, a5, a3
+; RV32I-NEXT:    sub a1, a1, a3
 ; RV32I-NEXT:    srli a3, a1, 2
-; RV32I-NEXT:    and a1, a1, a4
-; RV32I-NEXT:    and a2, a2, a4
 ; RV32I-NEXT:    and a3, a3, a4
-; RV32I-NEXT:    add a0, a2, a0
+; RV32I-NEXT:    and a1, a1, a4
 ; RV32I-NEXT:    add a1, a3, a1
-; RV32I-NEXT:    srli a2, a0, 4
 ; RV32I-NEXT:    srli a3, a1, 4
-; RV32I-NEXT:    add a0, a2, a0
 ; RV32I-NEXT:    add a1, a3, a1
-; RV32I-NEXT:    and a0...
[truncated]

@preames
Collaborator

preames commented Nov 12, 2024

I suggest we let this sit for 2-3 days after the prior patch has landed. As I said on the previous review, I think this is reasonable, but a) we had a bunch of discussion on this point and b) it'd be good to have some staging between the individual commits to simplify regression analysis.

@michaelmaitland
Contributor

For the recent scheduler patches, the common theme is that we saw another target do something and brought that functionality to RISC-V. How do we know that these changes are sensible defaults for RISC-V cores? Are you making measurements on any cores? Are they in-order, out-of-order, or both? In my experience tuning for different cores, there is often a difference between OOO and in-order cores.

@mshockwave
Member

This helps reduce register pressure for some cases.

Is it possible to provide some numbers to back this up? Preferably using some well-known benchmarks like SPEC and/or the llvm-test-suite.

@wangpc-pp
Contributor Author

I added two experimental options: -riscv-disable-latency-heuristic and -riscv-should-track-lane-masks and evaluated the statistics (regalloc.NumSpills/regalloc.NumReloads) on llvm-test-suite (option: -O3 -march=rva23u64):

  1. -riscv-disable-latency-heuristic=true and -riscv-should-track-lane-masks=false:
Program                                       regalloc.NumSpills                   regalloc.NumReloads                  
                                              00                 10       diff     00                  10       diff    
SingleSour...ce/UnitTests/matrix-types-spec    8823.00            6166.00 -2657.00 15603.00            13403.00 -2200.00
External/S...rate/510.parest_r/510.parest_r   43817.00           43262.00  -555.00 87058.00            87033.00   -25.00
External/S...017speed/625.x264_s/625.x264_s    2373.00            1991.00  -382.00  4808.00             4287.00  -521.00
External/S...2017rate/525.x264_r/525.x264_r    2373.00            1991.00  -382.00  4808.00             4287.00  -521.00
MultiSourc...ks/ASCI_Purple/SMG2000/smg2000    2684.00            2334.00  -350.00  4820.00             4349.00  -471.00
MultiSourc...nchmarks/FreeBench/pifft/pifft     442.00             126.00  -316.00   595.00              281.00  -314.00
MultiSourc.../Applications/JM/ldecod/ldecod    1335.00            1131.00  -204.00  2311.00             2142.00  -169.00
External/S...00.perlbench_s/600.perlbench_s    4354.00            4154.00  -200.00  9615.00             9435.00  -180.00
External/S...00.perlbench_r/500.perlbench_r    4354.00            4154.00  -200.00  9615.00             9435.00  -180.00
MultiSourc.../Applications/JM/lencod/lencod    3368.00            3172.00  -196.00  7261.00             7069.00  -192.00
External/S...te/538.imagick_r/538.imagick_r    4163.00            4000.00  -163.00 10354.00             9964.00  -390.00
MultiSourc...ch/consumer-lame/consumer-lame     722.00             559.00  -163.00  1098.00              994.00  -104.00
External/S...ed/638.imagick_s/638.imagick_s    4163.00            4000.00  -163.00 10354.00             9964.00  -390.00
MultiSource/Applications/oggenc/oggenc          970.00             817.00  -153.00  2327.00             2120.00  -207.00
MultiSourc...e/Applications/ClamAV/clamscan    2072.00            1937.00  -135.00  4836.00             4648.00  -188.00
      regalloc.NumSpills                            regalloc.NumReloads                           
run                   00            10         diff                  00            10         diff
mean   87.747460          84.068699    -3.678761     1371.475285         1357.146154  -3.792937   
  2. -riscv-disable-latency-heuristic=false and -riscv-should-track-lane-masks=true:
Program                                       regalloc.NumSpills                 regalloc.NumReloads                 
                                              00                 01      diff    00                  01       diff   
SingleSour...ce/UnitTests/matrix-types-spec   8823.00            8233.00 -590.00 15603.00            15020.00 -583.00
MultiSourc...ch/consumer-lame/consumer-lame    722.00             689.00  -33.00  1098.00             1065.00  -33.00
MultiSourc...s/Prolangs-C/football/football    248.00             250.00    2.00   349.00              350.00    1.00
MultiSourc...ench/telecomm-gsm/telecomm-gsm    182.00             181.00   -1.00   196.00              195.00   -1.00
MultiSourc...Benchmarks/7zip/7zip-benchmark   1272.00            1273.00    1.00  2436.00             2437.00    1.00
MicroBench...arks/ImageProcessing/Blur/blur    114.00             113.00   -1.00   136.00              136.00    0.00
MultiSourc...rks/mediabench/gsm/toast/toast    182.00             181.00   -1.00   196.00              195.00   -1.00
MultiSourc...gs-C/TimberWolfMC/timberwolfmc   1196.00            1195.00   -1.00  2036.00             2029.00   -7.00
SingleSour.../execute/GCC-C-execute-pr36321      0.00               0.00    0.00                                 0.00
SingleSour.../execute/GCC-C-execute-pr36077      0.00               0.00    0.00                                 0.00
SingleSour...xecute/GCC-C-execute-pr33779-1      0.00               0.00    0.00                                 0.00
SingleSour.../execute/GCC-C-execute-pr33669      0.00               0.00    0.00                                 0.00
SingleSour.../execute/GCC-C-execute-pr33631      0.00               0.00    0.00                                 0.00
SingleSour.../execute/GCC-C-execute-pr33382      0.00               0.00    0.00                                 0.00
SingleSour.../execute/GCC-C-execute-pr37102      0.00               0.00    0.00                                 0.00
      regalloc.NumSpills                            regalloc.NumReloads                           
run                   00            01         diff                  00            01         diff
mean   87.747460          87.445573    -0.301887     1371.475285         1369.091255  -0.303338   
  3. -riscv-disable-latency-heuristic=true and -riscv-should-track-lane-masks=true:
Program                                       regalloc.NumSpills                   regalloc.NumReloads                  
                                              00                 11       diff     00                  11       diff    
SingleSour...ce/UnitTests/matrix-types-spec    8823.00            6320.00 -2503.00 15603.00            13544.00 -2059.00
External/S...rate/510.parest_r/510.parest_r   43817.00           43262.00  -555.00 87058.00            87033.00   -25.00
External/S...017speed/625.x264_s/625.x264_s    2373.00            1991.00  -382.00  4808.00             4287.00  -521.00
External/S...2017rate/525.x264_r/525.x264_r    2373.00            1991.00  -382.00  4808.00             4287.00  -521.00
MultiSourc...ks/ASCI_Purple/SMG2000/smg2000    2684.00            2334.00  -350.00  4820.00             4349.00  -471.00
MultiSourc...nchmarks/FreeBench/pifft/pifft     442.00             126.00  -316.00   595.00              281.00  -314.00
MultiSourc.../Applications/JM/ldecod/ldecod    1335.00            1131.00  -204.00  2311.00             2142.00  -169.00
External/S...00.perlbench_s/600.perlbench_s    4354.00            4154.00  -200.00  9615.00             9435.00  -180.00
External/S...00.perlbench_r/500.perlbench_r    4354.00            4154.00  -200.00  9615.00             9435.00  -180.00
MultiSourc.../Applications/JM/lencod/lencod    3368.00            3172.00  -196.00  7261.00             7069.00  -192.00
External/S...te/538.imagick_r/538.imagick_r    4163.00            4000.00  -163.00 10354.00             9964.00  -390.00
MultiSourc...ch/consumer-lame/consumer-lame     722.00             559.00  -163.00  1098.00              994.00  -104.00
External/S...ed/638.imagick_s/638.imagick_s    4163.00            4000.00  -163.00 10354.00             9964.00  -390.00
MultiSource/Applications/oggenc/oggenc          970.00             817.00  -153.00  2327.00             2120.00  -207.00
MultiSourc...e/Applications/ClamAV/clamscan    2072.00            1937.00  -135.00  4836.00             4648.00  -188.00
      regalloc.NumSpills                            regalloc.NumReloads                           
run                   00            11         diff                  00            11         diff
mean   87.747460          84.142235    -3.605225     1371.475285         1357.692308  -3.724238   

We can see that both options can reduce the mean number of spills/reloads. ShouldTrackLaneMasks has a smaller influence because only vector registers (with sub-registers) can benefit from it.

I didn't run these tests on real hardware, so the data may not be fully convincing. I'd appreciate it if you could evaluate this on some platforms; that would be helpful. If you find this common setting is not suitable for your microarchitectures, please let me know and we can make it a tune feature. All I want is to unify the common sched policy and make parts of the policy tune features.

Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
Member

@mshockwave mshockwave left a comment


LGTM . I do have some minor questions but they're not blocking.

; CHECK-NEXT: vse32.v v8, (a5)
; CHECK-NEXT: vse32.v v9, (a6)
; CHECK-NEXT: ret
; RV32-LABEL: buildvec_vid_step1o2_v4i32:

why do we have different scheduling between RV32 and RV64?

; CHECK-NEXT: slli a1, a1, 4
; CHECK-NEXT: add a1, sp, a1
; CHECK-NEXT: addi a1, a1, 16
; CHECK-NEXT: vs8r.v v16, (a1) # Unknown-size Folded Spill

do we have more spills / reloads in this function?

@michaelmaitland
Contributor

I measured this patch on spec for an in-order and an out-of-order core.

For the out-of-order core, 557.xz_r saw a regression.
For the in-order core, I saw regressions on 456.hmmer and 458.sjeng.

I saw no other significant improvements or regressions.

Given these findings, combined with the fact that the latency heuristic is so low on the heuristic list (above only program order), I don't see a strong argument to set this to true by default for either in-order or out-of-order cores.
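The position of the latency heuristic in the scheduler's decision list can be sketched as a candidate comparison: register-pressure criteria are consulted first, and latency only just before falling back to source order. The following is a self-contained illustrative model, not LLVM's actual GenericScheduler code; the struct and field names here are assumptions for the sketch.

```cpp
#include <cassert>

// Illustrative scheduling candidate; the fields loosely mirror what a
// list scheduler compares, but the names are not LLVM's real API.
struct Cand {
  int RegPressureDelta; // lower is better
  int CriticalLatency;  // lower is better
  int NodeOrder;        // source order, the final tie-breaker
};

// Pick the better candidate. Latency is consulted only after register
// pressure and just above the source-order fallback, so disabling it
// only affects candidates that tie on every earlier criterion.
const Cand &pickBetter(const Cand &A, const Cand &B,
                       bool DisableLatencyHeuristic) {
  if (A.RegPressureDelta != B.RegPressureDelta)
    return A.RegPressureDelta < B.RegPressureDelta ? A : B;
  if (!DisableLatencyHeuristic && A.CriticalLatency != B.CriticalLatency)
    return A.CriticalLatency < B.CriticalLatency ? A : B;
  return A.NodeOrder < B.NodeOrder ? A : B;
}
```

In this model, two pressure-equal candidates keep their source order when the heuristic is disabled instead of being swapped for latency, which is why the toggle trades latency-aware ordering for fewer spills/reloads on some inputs while regressing others.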

@preames
Collaborator

preames commented Nov 26, 2024

Given @michaelmaitland's data, @wangpc-pp the burden shifts to you to clearly justify in which cases this is profitable and to figure out how to selectively enable it only in those cases. I agree with @michaelmaitland's conclusion that this should not move forward otherwise.

@michaelmaitland Can you say anything about the magnitude of regression in either case? I assume they were statistically significant given you mention them, but are these small regressions or largish ones?

@michaelmaitland
Contributor

Given @michaelmaitland's data, @wangpc-pp the burden shifts to you to clearly justify in which cases this is profitable and to figure out how to selectively enable it only in those cases. I agree with @michaelmaitland's conclusion that this should not move forward otherwise.

@michaelmaitland Can you say anything about the magnitude of regression in either case? I assume they were statistically significant given you mention them, but are these small regressions or largish ones?

  • sjeng: 1.86% regression
  • 557.xz_r: 1.14% regression
  • 456.hmmer: 1.03% regression
  • All other x in results were -1 < x < 1

@wangpc-pp
Contributor Author

Thanks for evaluating this! The data is very helpful! @michaelmaitland

Given @michaelmaitland's data, @wangpc-pp the burden shifts to you to clearly justify in which cases this is profitable and to figure out how to selectively enable it only in those cases. I agree with @michaelmaitland's conclusion that this should not move forward otherwise.

I don't have any data other than the spill/reload data above. I don't know how to dynamically determine whether a SchedDAG region will benefit from disabling it, because we only know NumRegionInstrs (we may change the function signature and pass the DAG directly in the future so that we can analyse the region). AArch64 is the only target that disables it, and almost all of Apple's CPUs have this feature on (I don't know whether it is profitable there or whether these are just inertial copies made when defining new processors).

Again, if the conclusion is that we shouldn't make it true by default, I can make it a tune feature. All I want is to make the scheduling infrastructure easy to tune for downstreams. :-)
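The direction the thread converges on — an opt-in tune feature rather than a new default — follows a common subtarget pattern: the generic policy stays unchanged, and a CPU that has measured a win flips a feature bit. A minimal stand-alone sketch of that pattern follows; the struct and hook names imitate, but are not, the real RISCVSubtarget/MachineSchedPolicy code.

```cpp
#include <cassert>

// Stand-in for the scheduling policy; only the field discussed in this
// thread is modeled.
struct SchedPolicy {
  bool DisableLatencyHeuristic = false;
};

// Stand-in subtarget: a CPU opts into the behavior via a tune-feature
// bit instead of the toggle being flipped for every RISC-V core.
struct SubtargetModel {
  bool HasDisableLatencySchedHeuristic = false; // would be set per-CPU

  // Mirrors the shape of an overrideSchedPolicy-style hook.
  void overrideSchedPolicy(SchedPolicy &Policy) const {
    if (HasDisableLatencySchedHeuristic)
      Policy.DisableLatencyHeuristic = true;
  }
};
```

A core that measures fewer spills/reloads with no regressions would set the feature bit; every other core keeps the generic default, which matches the reviewers' conclusion above.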

Created using spr 1.3.6-beta.1
@wangpc-pp wangpc-pp changed the title [RISCV] Set DisableLatencyHeuristic to true [RISCV] Add FeatureDisableLatencySchedHeuristic Nov 27, 2024
Contributor

@michaelmaitland michaelmaitland left a comment


LGTM

Created using spr 1.3.6-beta.1
@wangpc-pp wangpc-pp changed the title [RISCV] Add FeatureDisableLatencySchedHeuristic [RISCV] Add TuneDisableLatencySchedHeuristic Nov 28, 2024
Collaborator

@topperc topperc left a comment


LGTM

Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
@wangpc-pp wangpc-pp changed the base branch from users/wangpc-pp/spr/main.riscv-set-disablelatencyheuristic-to-true to main November 28, 2024 07:16
@wangpc-pp wangpc-pp merged commit 93f7398 into main Nov 28, 2024
7 of 11 checks passed
@wangpc-pp wangpc-pp deleted the users/wangpc-pp/spr/riscv-set-disablelatencyheuristic-to-true branch November 28, 2024 07:16