[RISCV] Improve lowering of spread(2) shuffles #118658
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -242,33 +242,27 @@ define <64 x float> @interleave_v32f32(<32 x float> %x, <32 x float> %y) { | |
; V128-NEXT: slli a0, a0, 3 | ||
; V128-NEXT: sub sp, sp, a0 | ||
; V128-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 8 * vlenb | ||
; V128-NEXT: vmv8r.v v24, v16 | ||
; V128-NEXT: vmv8r.v v16, v8 | ||
; V128-NEXT: vmv8r.v v8, v24 | ||
; V128-NEXT: addi a0, sp, 16 | ||
; V128-NEXT: vs8r.v v24, (a0) # Unknown-size Folded Spill | ||
; V128-NEXT: vs8r.v v8, (a0) # Unknown-size Folded Spill | ||
; V128-NEXT: vsetivli zero, 16, e32, m8, ta, ma | ||
; V128-NEXT: vslidedown.vi v0, v24, 16 | ||
; V128-NEXT: li a0, -1 | ||
; V128-NEXT: vsetivli zero, 16, e32, m4, ta, ma | ||
; V128-NEXT: vwaddu.vv v24, v8, v0 | ||
; V128-NEXT: vwmaccu.vx v24, a0, v0 | ||
; V128-NEXT: vsetivli zero, 16, e32, m8, ta, ma | ||
; V128-NEXT: vslidedown.vi v0, v16, 16 | ||
; V128-NEXT: vslidedown.vi v24, v16, 16 | ||
; V128-NEXT: li a0, 32 | ||
; V128-NEXT: vslidedown.vi v0, v8, 16 | ||
; V128-NEXT: lui a1, 699051 | ||
; V128-NEXT: li a2, 32 | ||
; V128-NEXT: vsetivli zero, 16, e32, m4, ta, ma | ||
; V128-NEXT: vwaddu.vv v8, v0, v16 | ||
; V128-NEXT: vsetivli zero, 16, e64, m8, ta, ma | ||
; V128-NEXT: vzext.vf2 v8, v24 | ||
; V128-NEXT: vzext.vf2 v24, v0 | ||
; V128-NEXT: addi a1, a1, -1366 | ||
; V128-NEXT: vmv.s.x v0, a1 | ||
; V128-NEXT: vwmaccu.vx v8, a0, v16 | ||
; V128-NEXT: vsetvli zero, a2, e32, m8, ta, ma | ||
; V128-NEXT: vmerge.vvm v24, v8, v24, v0 | ||
; V128-NEXT: addi a1, sp, 16 | ||
; V128-NEXT: vl8r.v v8, (a1) # Unknown-size Folded Reload | ||
; V128-NEXT: vsll.vx v8, v8, a0 | ||
; V128-NEXT: vsetvli zero, a0, e32, m8, ta, ma | ||
; V128-NEXT: vmerge.vvm v24, v24, v8, v0 | ||
The odd sequence of two spread(2) shuffles followed by a merge, which could be a single interleave, comes from lowering this: This fails our current restrictions in the definition of isInterleaveShuffle, but we could probably relax that. I'm going to glance at that separately. I did glance at this, and decided not to pursue it for now. The change itself is simple, but it exposes two unrelated issues I don't have cycles to fix: 1) isShuffleMaskLegal causes unprofitable transforms somewhere in generic DAG, and 2) MachineScheduling creates a truly adversarial schedule, and the register allocator is forced to greatly increase spilling. |
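For readers following the thread, here is a lane-level sketch of why two spread(2) shuffles plus a merge are equivalent to one interleave. This is an illustrative model in Python, not LLVM code: the helper names `spread2`, `merge`, and `interleave` are hypothetical, `None` stands in for an undef lane, and the operand order of `merge` is an assumption chosen for readability. The mask is computed from the `lui a1, 699051` / `addi a1, a1, -1366` pair in the diff above.

```python
# Lane-level model only; None represents an undef lane.
# All helper names here are hypothetical and exist only for this sketch.

def spread2(vec, offset):
    """Place vec[i] at lane 2*i + offset; every other lane is undef."""
    out = [None] * (2 * len(vec))
    for i, elt in enumerate(vec):
        out[2 * i + offset] = elt
    return out

def merge(mask_bits, if_set, if_clear):
    """Per-lane select: take if_set[i] where bit i of mask_bits is 1."""
    return [s if (mask_bits >> i) & 1 else c
            for i, (s, c) in enumerate(zip(if_set, if_clear))]

def interleave(even, odd):
    """Result lanes alternate even[0], odd[0], even[1], odd[1], ..."""
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out

# Mask constant from the diff: lui a1, 699051 ; addi a1, a1, -1366.
# Only the low 32 bits matter for a 32-lane mask.
mask = ((699051 << 12) - 1366) & 0xFFFFFFFF

x = list(range(16))          # stand-in for 16 lanes of one source
y = list(range(100, 116))    # stand-in for 16 lanes of the other source

# Odd lanes (mask bit set) come from the offset-1 spread of y,
# even lanes from the offset-0 spread of x.
two_spreads_and_merge = merge(mask, spread2(y, 1), spread2(x, 0))
```

The mask works out to the alternating pattern `0xAAAAAAAA`, so the merge picks every odd lane from one spread and every even lane from the other, which is the same result `interleave(x, y)` produces directly.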
||
; V128-NEXT: addi a0, sp, 16 | ||
; V128-NEXT: vl8r.v v8, (a0) # Unknown-size Folded Reload | ||
; V128-NEXT: vsetivli zero, 16, e32, m4, ta, ma | ||
; V128-NEXT: vwaddu.vv v0, v16, v8 | ||
; V128-NEXT: vwmaccu.vx v0, a0, v8 | ||
; V128-NEXT: vwaddu.vv v0, v8, v16 | ||
; V128-NEXT: li a0, -1 | ||
; V128-NEXT: vwmaccu.vx v0, a0, v16 | ||
; V128-NEXT: vmv8r.v v8, v0 | ||
; V128-NEXT: vmv8r.v v16, v24 | ||
; V128-NEXT: csrr a0, vlenb | ||
|
Note that we already have the required undef handling in getWideningInterleave because it's used by the deinterleave2 intrinsic lowering.
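As a rough illustration of the arithmetic behind the instruction sequences in this diff, the sketch below models one 64-bit result element built from two 32-bit lanes. It is a bit-level model in Python, not the code in getWideningInterleave; the function names are hypothetical, and treating an undef half as "skip that operand" is an assumption made for the example, since the actual undef handling is not shown in this excerpt.

```python
# Bit-level model of one widened element (SEW = 32, result width = 64).
# Function names are hypothetical; this is not LLVM's implementation.
SEW = 32
LANE_MAX = (1 << SEW) - 1          # scalar -1 read as an unsigned SEW-bit value
WIDE_MASK = (1 << (2 * SEW)) - 1   # results wrap at 2*SEW bits

def widening_interleave(even, odd):
    """Model of vwaddu.vv followed by vwmaccu.vx with the scalar set to -1."""
    wide = (even + odd) & WIDE_MASK             # vwaddu.vv: zext(even) + zext(odd)
    wide = (wide + LANE_MAX * odd) & WIDE_MASK  # vwmaccu.vx: += (2^SEW - 1) * odd
    return wide                                 # equals even + (odd << SEW)

def spread_even_only(even):
    """Odd half undef (assumed skipped): a plain zero-extend, as in vzext.vf2."""
    return even & LANE_MAX

def spread_odd_only(odd):
    """Even half undef (assumed skipped): zero-extend, then vsll.vx by SEW."""
    return ((odd & LANE_MAX) << SEW) & WIDE_MASK

samples = [(0, 0), (1, 2), (0xDEADBEEF, 0x12345678), (LANE_MAX, LANE_MAX)]
packed = [widening_interleave(e, o) for e, o in samples]
```

Under these assumptions, the add plus multiply-accumulate pair packs the even lane into the low half and the odd lane into the high half of each wide element, and when one half is undef the result only needs the zero-extend (and, for the odd half, the shift), which matches the `vzext.vf2` and `vsll.vx` forms that appear in the updated check lines.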