Skip to content

[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default #89927

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -398,6 +398,8 @@ class RISCVTTIImpl : public BasicTTIImplBase<RISCVTTIImpl> {
return true;
}

bool shouldDropLSRSolutionIfLessProfitable() const { return true; }

std::optional<unsigned> getMinPageSize() const { return 4096; }
};

Expand Down
137 changes: 67 additions & 70 deletions llvm/test/CodeGen/RISCV/rvv/dont-sink-splat-operands.ll
Original file line number Diff line number Diff line change
Expand Up @@ -86,30 +86,29 @@ declare i64 @llvm.vscale.i64()
define void @sink_splat_add_scalable(ptr nocapture %a, i32 signext %x) {
; NO-SINK-LABEL: sink_splat_add_scalable:
; NO-SINK: # %bb.0: # %entry
; NO-SINK-NEXT: csrr a5, vlenb
; NO-SINK-NEXT: srli a2, a5, 1
; NO-SINK-NEXT: csrr a2, vlenb
; NO-SINK-NEXT: srli a2, a2, 1
; NO-SINK-NEXT: li a3, 1024
; NO-SINK-NEXT: bgeu a3, a2, .LBB1_2
; NO-SINK-NEXT: # %bb.1:
; NO-SINK-NEXT: li a3, 0
; NO-SINK-NEXT: j .LBB1_5
; NO-SINK-NEXT: .LBB1_2: # %vector.ph
; NO-SINK-NEXT: li a5, 0
; NO-SINK-NEXT: addi a3, a2, -1
; NO-SINK-NEXT: andi a4, a3, 1024
; NO-SINK-NEXT: xori a3, a4, 1024
; NO-SINK-NEXT: vsetvli a6, zero, e32, m2, ta, ma
; NO-SINK-NEXT: vmv.v.x v8, a1
; NO-SINK-NEXT: slli a5, a5, 1
; NO-SINK-NEXT: mv a6, a0
; NO-SINK-NEXT: mv a7, a3
; NO-SINK-NEXT: .LBB1_3: # %vector.body
; NO-SINK-NEXT: # =>This Inner Loop Header: Depth=1
; NO-SINK-NEXT: slli a6, a5, 2
; NO-SINK-NEXT: add a6, a0, a6
; NO-SINK-NEXT: vl2re32.v v10, (a6)
; NO-SINK-NEXT: vadd.vv v10, v10, v8
; NO-SINK-NEXT: add a5, a5, a2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These diffs seem likely unprofitable, but more than that, they seems weird. The solution being chosen should "clearly" be higher cost than the one being rejected here. But more than that, the previous solution was also sub-optimal.

At first, I thought this might relate to the shifted add and went looking for a case where we forgot to guard a shNadd instruction on Zba. However, that turned out to be a red herring.

However, when doing that investigation, I noticed something. We have two different implementations of isLegalAddressing mode. One on TLI and one on TTI. We have overridden the former, but don't appear to have done the later.

The default implementation on TTIImpl assumes reg+reg addressing is legal. For RISCV vector, this is not a legal addressing mode. I think what we're seeing here is that LSR is computing a bad cost for basically all addresses used by vector loads and stores, and that the actual effect of this change is basically noise.

I suspect that if we implement TTI::isLegalAddressing, we'll see significant improvements in LSR solutions. Once we do that, we can revisit this patch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like I got myself confused here. BasicTTI does perform the proxy to TLI, so these implementations should be the same.

; NO-SINK-NEXT: vs2r.v v10, (a6)
; NO-SINK-NEXT: sub a7, a7, a2
; NO-SINK-NEXT: add a6, a6, a5
; NO-SINK-NEXT: bnez a7, .LBB1_3
; NO-SINK-NEXT: bne a5, a3, .LBB1_3
; NO-SINK-NEXT: # %bb.4: # %middle.block
; NO-SINK-NEXT: beqz a4, .LBB1_7
; NO-SINK-NEXT: .LBB1_5: # %for.body.preheader
Expand All @@ -129,29 +128,28 @@ define void @sink_splat_add_scalable(ptr nocapture %a, i32 signext %x) {
;
; SINK-LABEL: sink_splat_add_scalable:
; SINK: # %bb.0: # %entry
; SINK-NEXT: csrr a5, vlenb
; SINK-NEXT: srli a2, a5, 1
; SINK-NEXT: csrr a2, vlenb
; SINK-NEXT: srli a2, a2, 1
; SINK-NEXT: li a3, 1024
; SINK-NEXT: bgeu a3, a2, .LBB1_2
; SINK-NEXT: # %bb.1:
; SINK-NEXT: li a3, 0
; SINK-NEXT: j .LBB1_5
; SINK-NEXT: .LBB1_2: # %vector.ph
; SINK-NEXT: li a5, 0
; SINK-NEXT: addi a3, a2, -1
; SINK-NEXT: andi a4, a3, 1024
; SINK-NEXT: xori a3, a4, 1024
; SINK-NEXT: slli a5, a5, 1
; SINK-NEXT: mv a6, a0
; SINK-NEXT: mv a7, a3
; SINK-NEXT: vsetvli t0, zero, e32, m2, ta, ma
; SINK-NEXT: vsetvli a6, zero, e32, m2, ta, ma
; SINK-NEXT: .LBB1_3: # %vector.body
; SINK-NEXT: # =>This Inner Loop Header: Depth=1
; SINK-NEXT: slli a6, a5, 2
; SINK-NEXT: add a6, a0, a6
; SINK-NEXT: vl2re32.v v8, (a6)
; SINK-NEXT: vadd.vx v8, v8, a1
; SINK-NEXT: add a5, a5, a2
; SINK-NEXT: vs2r.v v8, (a6)
; SINK-NEXT: sub a7, a7, a2
; SINK-NEXT: add a6, a6, a5
; SINK-NEXT: bnez a7, .LBB1_3
; SINK-NEXT: bne a5, a3, .LBB1_3
; SINK-NEXT: # %bb.4: # %middle.block
; SINK-NEXT: beqz a4, .LBB1_7
; SINK-NEXT: .LBB1_5: # %for.body.preheader
Expand All @@ -171,29 +169,28 @@ define void @sink_splat_add_scalable(ptr nocapture %a, i32 signext %x) {
;
; DEFAULT-LABEL: sink_splat_add_scalable:
; DEFAULT: # %bb.0: # %entry
; DEFAULT-NEXT: csrr a5, vlenb
; DEFAULT-NEXT: srli a2, a5, 1
; DEFAULT-NEXT: csrr a2, vlenb
; DEFAULT-NEXT: srli a2, a2, 1
; DEFAULT-NEXT: li a3, 1024
; DEFAULT-NEXT: bgeu a3, a2, .LBB1_2
; DEFAULT-NEXT: # %bb.1:
; DEFAULT-NEXT: li a3, 0
; DEFAULT-NEXT: j .LBB1_5
; DEFAULT-NEXT: .LBB1_2: # %vector.ph
; DEFAULT-NEXT: li a5, 0
; DEFAULT-NEXT: addi a3, a2, -1
; DEFAULT-NEXT: andi a4, a3, 1024
; DEFAULT-NEXT: xori a3, a4, 1024
; DEFAULT-NEXT: slli a5, a5, 1
; DEFAULT-NEXT: mv a6, a0
; DEFAULT-NEXT: mv a7, a3
; DEFAULT-NEXT: vsetvli t0, zero, e32, m2, ta, ma
; DEFAULT-NEXT: vsetvli a6, zero, e32, m2, ta, ma
; DEFAULT-NEXT: .LBB1_3: # %vector.body
; DEFAULT-NEXT: # =>This Inner Loop Header: Depth=1
; DEFAULT-NEXT: slli a6, a5, 2
; DEFAULT-NEXT: add a6, a0, a6
; DEFAULT-NEXT: vl2re32.v v8, (a6)
; DEFAULT-NEXT: vadd.vx v8, v8, a1
; DEFAULT-NEXT: add a5, a5, a2
; DEFAULT-NEXT: vs2r.v v8, (a6)
; DEFAULT-NEXT: sub a7, a7, a2
; DEFAULT-NEXT: add a6, a6, a5
; DEFAULT-NEXT: bnez a7, .LBB1_3
; DEFAULT-NEXT: bne a5, a3, .LBB1_3
; DEFAULT-NEXT: # %bb.4: # %middle.block
; DEFAULT-NEXT: beqz a4, .LBB1_7
; DEFAULT-NEXT: .LBB1_5: # %for.body.preheader
Expand Down Expand Up @@ -407,32 +404,32 @@ define void @sink_splat_fadd_scalable(ptr nocapture %a, float %x) {
; NO-SINK-LABEL: sink_splat_fadd_scalable:
; NO-SINK: # %bb.0: # %entry
; NO-SINK-NEXT: csrr a1, vlenb
; NO-SINK-NEXT: srli a2, a1, 2
; NO-SINK-NEXT: li a3, 1024
; NO-SINK-NEXT: bgeu a3, a2, .LBB4_2
; NO-SINK-NEXT: srli a1, a1, 2
; NO-SINK-NEXT: li a2, 1024
; NO-SINK-NEXT: bgeu a2, a1, .LBB4_2
; NO-SINK-NEXT: # %bb.1:
; NO-SINK-NEXT: li a3, 0
; NO-SINK-NEXT: li a2, 0
; NO-SINK-NEXT: j .LBB4_5
; NO-SINK-NEXT: .LBB4_2: # %vector.ph
; NO-SINK-NEXT: addi a3, a2, -1
; NO-SINK-NEXT: andi a4, a3, 1024
; NO-SINK-NEXT: xori a3, a4, 1024
; NO-SINK-NEXT: li a4, 0
; NO-SINK-NEXT: addi a2, a1, -1
; NO-SINK-NEXT: andi a3, a2, 1024
; NO-SINK-NEXT: xori a2, a3, 1024
; NO-SINK-NEXT: vsetvli a5, zero, e32, m1, ta, ma
; NO-SINK-NEXT: vfmv.v.f v8, fa0
; NO-SINK-NEXT: mv a5, a0
; NO-SINK-NEXT: mv a6, a3
; NO-SINK-NEXT: .LBB4_3: # %vector.body
; NO-SINK-NEXT: # =>This Inner Loop Header: Depth=1
; NO-SINK-NEXT: slli a5, a4, 2
; NO-SINK-NEXT: add a5, a0, a5
; NO-SINK-NEXT: vl1re32.v v9, (a5)
; NO-SINK-NEXT: vfadd.vv v9, v9, v8
; NO-SINK-NEXT: add a4, a4, a1
; NO-SINK-NEXT: vs1r.v v9, (a5)
; NO-SINK-NEXT: sub a6, a6, a2
; NO-SINK-NEXT: add a5, a5, a1
; NO-SINK-NEXT: bnez a6, .LBB4_3
; NO-SINK-NEXT: bne a4, a2, .LBB4_3
; NO-SINK-NEXT: # %bb.4: # %middle.block
; NO-SINK-NEXT: beqz a4, .LBB4_7
; NO-SINK-NEXT: beqz a3, .LBB4_7
; NO-SINK-NEXT: .LBB4_5: # %for.body.preheader
; NO-SINK-NEXT: slli a1, a3, 2
; NO-SINK-NEXT: slli a1, a2, 2
; NO-SINK-NEXT: add a1, a0, a1
; NO-SINK-NEXT: lui a2, 1
; NO-SINK-NEXT: add a0, a0, a2
Expand All @@ -449,31 +446,31 @@ define void @sink_splat_fadd_scalable(ptr nocapture %a, float %x) {
; SINK-LABEL: sink_splat_fadd_scalable:
; SINK: # %bb.0: # %entry
; SINK-NEXT: csrr a1, vlenb
; SINK-NEXT: srli a2, a1, 2
; SINK-NEXT: li a3, 1024
; SINK-NEXT: bgeu a3, a2, .LBB4_2
; SINK-NEXT: srli a1, a1, 2
; SINK-NEXT: li a2, 1024
; SINK-NEXT: bgeu a2, a1, .LBB4_2
; SINK-NEXT: # %bb.1:
; SINK-NEXT: li a3, 0
; SINK-NEXT: li a2, 0
; SINK-NEXT: j .LBB4_5
; SINK-NEXT: .LBB4_2: # %vector.ph
; SINK-NEXT: addi a3, a2, -1
; SINK-NEXT: andi a4, a3, 1024
; SINK-NEXT: xori a3, a4, 1024
; SINK-NEXT: mv a5, a0
; SINK-NEXT: mv a6, a3
; SINK-NEXT: vsetvli a7, zero, e32, m1, ta, ma
; SINK-NEXT: li a4, 0
; SINK-NEXT: addi a2, a1, -1
; SINK-NEXT: andi a3, a2, 1024
; SINK-NEXT: xori a2, a3, 1024
; SINK-NEXT: vsetvli a5, zero, e32, m1, ta, ma
; SINK-NEXT: .LBB4_3: # %vector.body
; SINK-NEXT: # =>This Inner Loop Header: Depth=1
; SINK-NEXT: slli a5, a4, 2
; SINK-NEXT: add a5, a0, a5
; SINK-NEXT: vl1re32.v v8, (a5)
; SINK-NEXT: vfadd.vf v8, v8, fa0
; SINK-NEXT: add a4, a4, a1
; SINK-NEXT: vs1r.v v8, (a5)
; SINK-NEXT: sub a6, a6, a2
; SINK-NEXT: add a5, a5, a1
; SINK-NEXT: bnez a6, .LBB4_3
; SINK-NEXT: bne a4, a2, .LBB4_3
; SINK-NEXT: # %bb.4: # %middle.block
; SINK-NEXT: beqz a4, .LBB4_7
; SINK-NEXT: beqz a3, .LBB4_7
; SINK-NEXT: .LBB4_5: # %for.body.preheader
; SINK-NEXT: slli a1, a3, 2
; SINK-NEXT: slli a1, a2, 2
; SINK-NEXT: add a1, a0, a1
; SINK-NEXT: lui a2, 1
; SINK-NEXT: add a0, a0, a2
Expand All @@ -490,31 +487,31 @@ define void @sink_splat_fadd_scalable(ptr nocapture %a, float %x) {
; DEFAULT-LABEL: sink_splat_fadd_scalable:
; DEFAULT: # %bb.0: # %entry
; DEFAULT-NEXT: csrr a1, vlenb
; DEFAULT-NEXT: srli a2, a1, 2
; DEFAULT-NEXT: li a3, 1024
; DEFAULT-NEXT: bgeu a3, a2, .LBB4_2
; DEFAULT-NEXT: srli a1, a1, 2
; DEFAULT-NEXT: li a2, 1024
; DEFAULT-NEXT: bgeu a2, a1, .LBB4_2
; DEFAULT-NEXT: # %bb.1:
; DEFAULT-NEXT: li a3, 0
; DEFAULT-NEXT: li a2, 0
; DEFAULT-NEXT: j .LBB4_5
; DEFAULT-NEXT: .LBB4_2: # %vector.ph
; DEFAULT-NEXT: addi a3, a2, -1
; DEFAULT-NEXT: andi a4, a3, 1024
; DEFAULT-NEXT: xori a3, a4, 1024
; DEFAULT-NEXT: mv a5, a0
; DEFAULT-NEXT: mv a6, a3
; DEFAULT-NEXT: vsetvli a7, zero, e32, m1, ta, ma
; DEFAULT-NEXT: li a4, 0
; DEFAULT-NEXT: addi a2, a1, -1
; DEFAULT-NEXT: andi a3, a2, 1024
; DEFAULT-NEXT: xori a2, a3, 1024
; DEFAULT-NEXT: vsetvli a5, zero, e32, m1, ta, ma
; DEFAULT-NEXT: .LBB4_3: # %vector.body
; DEFAULT-NEXT: # =>This Inner Loop Header: Depth=1
; DEFAULT-NEXT: slli a5, a4, 2
; DEFAULT-NEXT: add a5, a0, a5
; DEFAULT-NEXT: vl1re32.v v8, (a5)
; DEFAULT-NEXT: vfadd.vf v8, v8, fa0
; DEFAULT-NEXT: add a4, a4, a1
; DEFAULT-NEXT: vs1r.v v8, (a5)
; DEFAULT-NEXT: sub a6, a6, a2
; DEFAULT-NEXT: add a5, a5, a1
; DEFAULT-NEXT: bnez a6, .LBB4_3
; DEFAULT-NEXT: bne a4, a2, .LBB4_3
; DEFAULT-NEXT: # %bb.4: # %middle.block
; DEFAULT-NEXT: beqz a4, .LBB4_7
; DEFAULT-NEXT: beqz a3, .LBB4_7
; DEFAULT-NEXT: .LBB4_5: # %for.body.preheader
; DEFAULT-NEXT: slli a1, a3, 2
; DEFAULT-NEXT: slli a1, a2, 2
; DEFAULT-NEXT: add a1, a0, a1
; DEFAULT-NEXT: lui a2, 1
; DEFAULT-NEXT: add a0, a0, a2
Expand Down
Loading
Loading