-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU][True16][CodeGen] update wwm reg sorting check condition #135053
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py | ||
# RUN: llc -mtriple=amdgcn-- -mcpu=gfx1100 -mattr=+real-true16 -run-pass=prologepilog %s -o - | FileCheck -check-prefix=GCN %s | ||
|
||
--- | ||
name: wwm_reg_skip_sort_16bit | ||
tracksRegLiveness: true | ||
machineFunctionInfo: | ||
isEntryFunction: true | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There won't be any prolog epilog CSR spills inserted for entry functions. This test isn't relevant for this patch, especially the shifting won't happen for the entry functions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi Christ. Then this sounds like a bug. The shifting is happening before the entryfunction check Should this entryFunction check being moved to early stage at line 1646? or even earlier to 1619? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah my bad. The shift-back is needed to ensure the wwm-regs are finally in the lowest range. This is needed for both entry functions and device functions. But their CSR spills/restores will be ignored for entry functions. |
||
body: | | ||
bb.0: | ||
; GCN-LABEL: name: wwm_reg_skip_sort_16bit | ||
; GCN: renamable $sgpr0 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec | ||
; GCN-NEXT: $vgpr0 = IMPLICIT_DEF | ||
; GCN-NEXT: renamable $sgpr1 = V_READLANE_B32 $vgpr0, 31 | ||
; GCN-NEXT: renamable $sgpr2 = S_MOV_B32 0 | ||
; GCN-NEXT: undef $vgpr0_lo16 = V_CNDMASK_B16_t16_e64 0, 0, 0, killed $sgpr1, killed $sgpr2, 0, implicit $exec | ||
; GCN-NEXT: $exec_lo = EXIT_STRICT_WWM killed renamable $sgpr0 | ||
; GCN-NEXT: early-clobber renamable $vgpr1 = V_MOV_B32_e32 $vgpr0, implicit $exec | ||
; GCN-NEXT: S_ENDPGM 0, implicit killed renamable $vgpr1 | ||
renamable $sgpr0 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec | ||
$vgpr0 = IMPLICIT_DEF | ||
renamable $sgpr1 = V_READLANE_B32 $vgpr0, 31 | ||
renamable $sgpr2 = S_MOV_B32 0 | ||
undef $vgpr0_lo16 = V_CNDMASK_B16_t16_e64 0, 0, 0, killed $sgpr1, killed $sgpr2, 0, implicit $exec | ||
$exec_lo = EXIT_STRICT_WWM killed renamable $sgpr0 | ||
early-clobber renamable $vgpr1 = V_MOV_B32_e32 $vgpr0, implicit $exec | ||
S_ENDPGM 0, implicit killed renamable $vgpr1 | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why this code needs to filter anything.
I don't understand this comment:
It can, but why? This should only track the 32-bit element registers
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Matt. I am not quite familiar with this. I think maybe @cdevadas can answer this question.
If we want to change something here I think we probably should do a seperate patch. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIPreallocateWWMRegs
pass handpicks VGPRs from the lower end. They might allocate VGPR tuples as well. However, the wwm-regalloc pass gets registers from the tail-end (we reserve them from the higher end to ensure the per-lane VGPR tuple allocation gets sufficient free contiguous registers from the initial scratch range.). This code inserted during PEI is trying to shift them down to the lowest range. Remember, the shifting is only required for those allocated during wwm-regalloc pass. Since we have the unified setWWMReservedRegs
that holds all sort of wwm-regs, this loop here is trying to identify only 32-bit registers that are allocated during wwm-regalloc pass (the VGPRs used for SGPR spilling, at the moment). We may also pick the 32-bit regs custom allocated during the SIPreallcoate pass. But that's ok. This filter avoids any VGPR tuple allocated for wwm-operands during the custom allocate pass. The shift-down logic currently Inserted here considers only 32-bit regclasses.The 16-bit registers weren't enabled earlier. So, it makes sense to change the condition to exactly match 32-bit classes.