Skip to content

[AMDGPU] Remove implicit definition of register group when restoring the last sub-register after a spill #133986

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 5, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1928,7 +1928,9 @@ void SIRegisterInfo::buildSpillLoadStore(
MIB->setAsmPrinterFlag(MachineInstr::ReloadReuse);
}

if (NeedSuperRegImpOperand && (IsFirstSubReg || IsLastSubReg))
bool IsSrcDstDef = SrcDstRegState & RegState::Define;
if (NeedSuperRegImpOperand &&
(IsFirstSubReg || (IsLastSubReg && !IsSrcDstDef)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the second condition is doing here, why isn't this just IsFirstSubReg && isDef?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the IsLastSubReg is needed for other flags, namely RegState::Kill. Removing it, the following change is seen in the test I wrote:

-    ; GFX942-NEXT: SCRATCH_STORE_DWORDX2_SADDR killed $vgpr12_vgpr13, $sgpr32, 48, 0, implicit $exec, implicit $flat_scr, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 :: (store (s64) into %stack.0 + 48, align 4, addrspace 5)
+    ; GFX942-NEXT: SCRATCH_STORE_DWORDX2_SADDR killed $vgpr12_vgpr13, $sgpr32, 48, 0, implicit $exec, implicit $flat_scr :: (store (s64) into %stack.0 + 48, align 4, addrspace 5)

This seems to be the analog of implicit def on the first sub-register.

MIB.addReg(ValueReg, RegState::Implicit | SrcDstRegState);

// The epilog restore of a wwm-scratch register can cause undesired
Expand Down
28 changes: 14 additions & 14 deletions llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
Original file line number Diff line number Diff line change
Expand Up @@ -1050,7 +1050,7 @@ body: |
; GFX908-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)
; GFX908-NEXT: $agpr0 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec, implicit-def $agpr0_agpr1
; GFX908-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 4, addrspace 5)
; GFX908-NEXT: $agpr1 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec, implicit-def $agpr0_agpr1
; GFX908-NEXT: $agpr1 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; GFX908-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
; GFX908-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX908-NEXT: {{ $}}
Expand Down Expand Up @@ -1297,7 +1297,7 @@ body: |
; GFX90A-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 704, 0, 0, implicit $exec :: (store (s32) into %stack.226, addrspace 5)
; GFX90A-NEXT: $vgpr40 = V_MOV_B32_e32 8904, implicit $exec
; GFX90A-NEXT: $agpr0 = BUFFER_LOAD_DWORD_OFFEN $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit-def $agpr0_agpr1 :: (load (s32) from %stack.1, addrspace 5)
; GFX90A-NEXT: $agpr1 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec, implicit-def $agpr0_agpr1 :: (load (s32) from %stack.1 + 4, addrspace 5)
; GFX90A-NEXT: $agpr1 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 4, addrspace 5)
; GFX90A-NEXT: $vgpr40 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 704, 0, 0, implicit $exec :: (load (s32) from %stack.226, addrspace 5)
; GFX90A-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX90A-NEXT: {{ $}}
Expand Down Expand Up @@ -1548,7 +1548,7 @@ body: |
; GFX908-FLATSCR-NEXT: $vgpr0 = SCRATCH_LOAD_DWORD $vgpr1, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.1, addrspace 5)
; GFX908-FLATSCR-NEXT: $agpr0 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec, implicit-def $agpr0_agpr1
; GFX908-FLATSCR-NEXT: $vgpr0 = SCRATCH_LOAD_DWORD $vgpr1, 4, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.1 + 4, addrspace 5)
; GFX908-FLATSCR-NEXT: $agpr1 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec, implicit-def $agpr0_agpr1
; GFX908-FLATSCR-NEXT: $agpr1 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr1 = SCRATCH_LOAD_DWORD_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.2, addrspace 5)
; GFX908-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX908-FLATSCR-NEXT: {{ $}}
Expand Down Expand Up @@ -2075,7 +2075,7 @@ body: |
; GFX908-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 4, addrspace 5)
; GFX908-NEXT: $agpr1 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; GFX908-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 8, addrspace 5)
; GFX908-NEXT: $agpr2 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2
; GFX908-NEXT: $agpr2 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; GFX908-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
; GFX908-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX908-NEXT: {{ $}}
Expand Down Expand Up @@ -2323,7 +2323,7 @@ body: |
; GFX90A-NEXT: $vgpr40 = V_MOV_B32_e32 8904, implicit $exec
; GFX90A-NEXT: $agpr0 = BUFFER_LOAD_DWORD_OFFEN $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit-def $agpr0_agpr1_agpr2 :: (load (s32) from %stack.1, addrspace 5)
; GFX90A-NEXT: $agpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 4, addrspace 5)
; GFX90A-NEXT: $agpr2 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec, implicit-def $agpr0_agpr1_agpr2 :: (load (s32) from %stack.1 + 8, addrspace 5)
; GFX90A-NEXT: $agpr2 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 8, addrspace 5)
; GFX90A-NEXT: $vgpr40 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 704, 0, 0, implicit $exec :: (load (s32) from %stack.226, addrspace 5)
; GFX90A-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX90A-NEXT: {{ $}}
Expand Down Expand Up @@ -2576,7 +2576,7 @@ body: |
; GFX908-FLATSCR-NEXT: $vgpr0 = SCRATCH_LOAD_DWORD $vgpr1, 4, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.1 + 4, addrspace 5)
; GFX908-FLATSCR-NEXT: $agpr1 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr0 = SCRATCH_LOAD_DWORD $vgpr1, 8, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.1 + 8, addrspace 5)
; GFX908-FLATSCR-NEXT: $agpr2 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2
; GFX908-FLATSCR-NEXT: $agpr2 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr1 = SCRATCH_LOAD_DWORD_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.2, addrspace 5)
; GFX908-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX908-FLATSCR-NEXT: {{ $}}
Expand Down Expand Up @@ -4115,9 +4115,9 @@ body: |
; GFX908-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)
; GFX908-NEXT: $vgpr1 = V_MOV_B32_e32 8200, implicit $exec
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1, implicit $agpr0_agpr1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a side effect where more implicit register reads are now included, unclear to me why this occurs. I will dig further to try to understand unless someone here has any thoughts.

; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit $agpr0_agpr1 :: (store (s32) into %stack.1, addrspace 5)
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec, implicit $agpr0_agpr1
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec, implicit $agpr0_agpr1 :: (store (s32) into %stack.1 + 4, addrspace 5)
; GFX908-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
; GFX908-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
Expand Down Expand Up @@ -4613,9 +4613,9 @@ body: |
; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR killed $vgpr1, $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.2, addrspace 5)
; GFX908-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 $sgpr32, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr1 = V_ADD_U32_e32 8200, $vgpr1, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1
; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1, implicit $agpr0_agpr1
; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD $vgpr0, $vgpr1, 0, 0, implicit $exec, implicit $flat_scr, implicit $agpr0_agpr1 :: (store (s32) into %stack.1, addrspace 5)
; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec, implicit $agpr0_agpr1
; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD $vgpr0, $vgpr1, 4, 0, implicit $exec, implicit $flat_scr, implicit $agpr0_agpr1 :: (store (s32) into %stack.1 + 4, addrspace 5)
; GFX908-FLATSCR-NEXT: $vgpr1 = SCRATCH_LOAD_DWORD_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.2, addrspace 5)
; GFX908-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
Expand Down Expand Up @@ -5136,11 +5136,11 @@ body: |
; GFX908-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)
; GFX908-NEXT: $vgpr1 = V_MOV_B32_e32 8200, implicit $exec
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2, implicit $agpr0_agpr1_agpr2
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.1, addrspace 5)
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (store (s32) into %stack.1 + 4, addrspace 5)
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.1 + 8, addrspace 5)
; GFX908-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
; GFX908-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
Expand Down Expand Up @@ -5637,11 +5637,11 @@ body: |
; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR killed $vgpr1, $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.2, addrspace 5)
; GFX908-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 $sgpr32, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr1 = V_ADD_U32_e32 8200, $vgpr1, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2
; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2, implicit $agpr0_agpr1_agpr2
; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD $vgpr0, $vgpr1, 0, 0, implicit $exec, implicit $flat_scr, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.1, addrspace 5)
; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD $vgpr0, $vgpr1, 4, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.1 + 4, addrspace 5)
; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2
; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD $vgpr0, $vgpr1, 8, 0, implicit $exec, implicit $flat_scr, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.1 + 8, addrspace 5)
; GFX908-FLATSCR-NEXT: $vgpr1 = SCRATCH_LOAD_DWORD_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.2, addrspace 5)
; GFX908-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
Expand Down
43 changes: 43 additions & 0 deletions llvm/test/CodeGen/AMDGPU/agpr-spill-copy.mir
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
# RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx942 -run-pass prologepilog,machine-cp -o - %s | FileCheck -check-prefix=GFX942 %s

--- |
define amdgpu_kernel void @agpr_spill_copy() #0 { ret void }

attributes #0 = { "amdgpu-waves-per-eu"="8,8" }
---
name: agpr_spill_copy
tracksRegLiveness: true
stack:
- { id: 0, name: '', type: spill-slot, offset: 0, size: 64, alignment: 4 }
machineFunctionInfo:
stackPtrOffsetReg: '$sgpr32'
hasSpilledVGPRs: true
body: |
bb.0:
; GFX942-LABEL: name: agpr_spill_copy
; GFX942: liveins: $agpr30, $agpr31
; GFX942-NEXT: {{ $}}
; GFX942-NEXT: renamable $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27 = IMPLICIT_DEF
; GFX942-NEXT: renamable $agpr28_agpr29 = IMPLICIT_DEF
; GFX942-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 = IMPLICIT_DEF
; GFX942-NEXT: SCRATCH_STORE_DWORDX4_SADDR killed $vgpr0_vgpr1_vgpr2_vgpr3, $sgpr32, 0, 0, implicit $exec, implicit $flat_scr, implicit-def $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, implicit $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 :: (store (s128) into %stack.0, align 4, addrspace 5)
; GFX942-NEXT: SCRATCH_STORE_DWORDX4_SADDR killed $vgpr4_vgpr5_vgpr6_vgpr7, $sgpr32, 16, 0, implicit $exec, implicit $flat_scr :: (store (s128) into %stack.0 + 16, align 4, addrspace 5)
; GFX942-NEXT: SCRATCH_STORE_DWORDX4_SADDR killed $vgpr8_vgpr9_vgpr10_vgpr11, $sgpr32, 32, 0, implicit $exec, implicit $flat_scr :: (store (s128) into %stack.0 + 32, align 4, addrspace 5)
; GFX942-NEXT: $agpr30 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr15, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
; GFX942-NEXT: $agpr31 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr14, implicit $exec, implicit $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
; GFX942-NEXT: SCRATCH_STORE_DWORDX2_SADDR killed $vgpr12_vgpr13, $sgpr32, 48, 0, implicit $exec, implicit $flat_scr, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 :: (store (s64) into %stack.0 + 48, align 4, addrspace 5)
; GFX942-NEXT: $agpr0_agpr1_agpr2_agpr3 = SCRATCH_LOAD_DWORDX4_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr, implicit-def $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15 :: (load (s128) from %stack.0, align 4, addrspace 5)
; GFX942-NEXT: $agpr4_agpr5_agpr6_agpr7 = SCRATCH_LOAD_DWORDX4_SADDR $sgpr32, 16, 0, implicit $exec, implicit $flat_scr :: (load (s128) from %stack.0 + 16, align 4, addrspace 5)
; GFX942-NEXT: $agpr8_agpr9_agpr10_agpr11 = SCRATCH_LOAD_DWORDX4_SADDR $sgpr32, 32, 0, implicit $exec, implicit $flat_scr :: (load (s128) from %stack.0 + 32, align 4, addrspace 5)
; GFX942-NEXT: $agpr15 = COPY $agpr30, implicit $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
; GFX942-NEXT: $agpr14 = COPY $agpr31, implicit $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
; GFX942-NEXT: $agpr12_agpr13 = SCRATCH_LOAD_DWORDX2_SADDR $sgpr32, 48, 0, implicit $exec, implicit $flat_scr :: (load (s64) from %stack.0 + 48, align 4, addrspace 5)
; GFX942-NEXT: S_ENDPGM 0, implicit $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
renamable $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27 = IMPLICIT_DEF
renamable $agpr28_agpr29 = IMPLICIT_DEF
renamable $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 = IMPLICIT_DEF
SI_SPILL_AV512_SAVE killed $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, %stack.0, $sgpr32, 0, implicit $exec :: (store (s512) into %stack.0, align 4, addrspace 5)
renamable $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15 = SI_SPILL_AV512_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s512) from %stack.0, align 4, addrspace 5)
S_ENDPGM 0, implicit $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
...
Loading