-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] Remove implicit definition of register group when restoring the last sub-register after a spill #133986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@@ -4111,10 +4111,10 @@ body: | | |||
; GFX908-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc | |||
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5) | |||
; GFX908-NEXT: $vgpr1 = V_MOV_B32_e32 8200, implicit $exec | |||
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1 | |||
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1, implicit $agpr0_agpr1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a side effect where more implicit register reads are now included, unclear to me why this occurs. I will dig further to try to understand unless someone here has any thoughts.
If there is any interest, here's the emulator I wrote to help with debug: https://github.com/bababuck/AMDGPU |
Ping Adding @arsenm, apologies is this is out of your area, but you seem to be involved in the AMDGPU backend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a straightforward bugfix to me but I don't understand the backend well enough to approve it
Do you know who would be an appropriate reviewer? |
Pinging the people Github suggested |
@llvm/pr-subscribers-backend-amdgpu Author: Ryan Buchner (bababuck) ChangesExtension of https://reviews.llvm.org/D141101 to remove the define flag from the last stack memory access. Fixes case where COPY instructions are used for some of the stack restoration, but the copies get optimized away during the machine-cp pass. Prior to this change, was possible to produce the following code: where Fixes #131386. Made the simple fix, but I'm not completely comfortable with this change since the reason for the previous inclusion of @krzysz00 Patch is 372.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133986.diff 9 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index c1ac9491b2363..da373a2a18df5 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -1910,8 +1910,10 @@ void SIRegisterInfo::buildSpillLoadStore(
MIB->setAsmPrinterFlag(MachineInstr::ReloadReuse);
}
- if (NeedSuperRegImpOperand && (IsFirstSubReg || IsLastSubReg))
- MIB.addReg(ValueReg, RegState::Implicit | SrcDstRegState);
+ if (NeedSuperRegImpOperand && IsFirstSubReg) {
+ unsigned State = RegState::Implicit | SrcDstRegState;
+ MIB.addReg(ValueReg, State);
+ }
// The epilog restore of a wwm-scratch register can cause undesired
// optimization during machine-cp post PrologEpilogInserter if the same
diff --git a/llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir b/llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
index c91b686697b9d..a99f44776eda9 100644
--- a/llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+++ b/llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
@@ -1046,7 +1046,7 @@ body: |
; GFX908-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)
; GFX908-NEXT: $agpr0 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec, implicit-def $agpr0_agpr1
; GFX908-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 4, addrspace 5)
- ; GFX908-NEXT: $agpr1 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec, implicit-def $agpr0_agpr1
+ ; GFX908-NEXT: $agpr1 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; GFX908-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
; GFX908-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX908-NEXT: {{ $}}
@@ -1293,7 +1293,7 @@ body: |
; GFX90A-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 704, 0, 0, implicit $exec :: (store (s32) into %stack.226, addrspace 5)
; GFX90A-NEXT: $vgpr40 = V_MOV_B32_e32 8904, implicit $exec
; GFX90A-NEXT: $agpr0 = BUFFER_LOAD_DWORD_OFFEN $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit-def $agpr0_agpr1 :: (load (s32) from %stack.1, addrspace 5)
- ; GFX90A-NEXT: $agpr1 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec, implicit-def $agpr0_agpr1 :: (load (s32) from %stack.1 + 4, addrspace 5)
+ ; GFX90A-NEXT: $agpr1 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 4, addrspace 5)
; GFX90A-NEXT: $vgpr40 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 704, 0, 0, implicit $exec :: (load (s32) from %stack.226, addrspace 5)
; GFX90A-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX90A-NEXT: {{ $}}
@@ -1544,7 +1544,7 @@ body: |
; GFX908-FLATSCR-NEXT: $vgpr0 = SCRATCH_LOAD_DWORD $vgpr1, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.1, addrspace 5)
; GFX908-FLATSCR-NEXT: $agpr0 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec, implicit-def $agpr0_agpr1
; GFX908-FLATSCR-NEXT: $vgpr0 = SCRATCH_LOAD_DWORD $vgpr1, 4, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.1 + 4, addrspace 5)
- ; GFX908-FLATSCR-NEXT: $agpr1 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec, implicit-def $agpr0_agpr1
+ ; GFX908-FLATSCR-NEXT: $agpr1 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr1 = SCRATCH_LOAD_DWORD_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.2, addrspace 5)
; GFX908-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX908-FLATSCR-NEXT: {{ $}}
@@ -2071,7 +2071,7 @@ body: |
; GFX908-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 4, addrspace 5)
; GFX908-NEXT: $agpr1 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; GFX908-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFEN $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 8, addrspace 5)
- ; GFX908-NEXT: $agpr2 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2
+ ; GFX908-NEXT: $agpr2 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; GFX908-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
; GFX908-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX908-NEXT: {{ $}}
@@ -2319,7 +2319,7 @@ body: |
; GFX90A-NEXT: $vgpr40 = V_MOV_B32_e32 8904, implicit $exec
; GFX90A-NEXT: $agpr0 = BUFFER_LOAD_DWORD_OFFEN $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit-def $agpr0_agpr1_agpr2 :: (load (s32) from %stack.1, addrspace 5)
; GFX90A-NEXT: $agpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 4, addrspace 5)
- ; GFX90A-NEXT: $agpr2 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec, implicit-def $agpr0_agpr1_agpr2 :: (load (s32) from %stack.1 + 8, addrspace 5)
+ ; GFX90A-NEXT: $agpr2 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 8, addrspace 5)
; GFX90A-NEXT: $vgpr40 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 704, 0, 0, implicit $exec :: (load (s32) from %stack.226, addrspace 5)
; GFX90A-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX90A-NEXT: {{ $}}
@@ -2572,7 +2572,7 @@ body: |
; GFX908-FLATSCR-NEXT: $vgpr0 = SCRATCH_LOAD_DWORD $vgpr1, 4, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.1 + 4, addrspace 5)
; GFX908-FLATSCR-NEXT: $agpr1 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr0 = SCRATCH_LOAD_DWORD $vgpr1, 8, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.1 + 8, addrspace 5)
- ; GFX908-FLATSCR-NEXT: $agpr2 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2
+ ; GFX908-FLATSCR-NEXT: $agpr2 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr0, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr1 = SCRATCH_LOAD_DWORD_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.2, addrspace 5)
; GFX908-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX908-FLATSCR-NEXT: {{ $}}
@@ -4111,10 +4111,10 @@ body: |
; GFX908-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)
; GFX908-NEXT: $vgpr1 = V_MOV_B32_e32 8200, implicit $exec
- ; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1
+ ; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1, implicit $agpr0_agpr1
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit $agpr0_agpr1 :: (store (s32) into %stack.1, addrspace 5)
- ; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
- ; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec, implicit $agpr0_agpr1 :: (store (s32) into %stack.1 + 4, addrspace 5)
+ ; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec, implicit $agpr0_agpr1
+ ; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (store (s32) into %stack.1 + 4, addrspace 5)
; GFX908-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
; GFX908-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX908-NEXT: {{ $}}
@@ -4361,7 +4361,7 @@ body: |
; GFX90A-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 704, 0, 0, implicit $exec :: (store (s32) into %stack.226, addrspace 5)
; GFX90A-NEXT: $vgpr40 = V_MOV_B32_e32 8904, implicit $exec
; GFX90A-NEXT: BUFFER_STORE_DWORD_OFFEN $agpr0, $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit-def $agpr0_agpr1, implicit $agpr0_agpr1 :: (store (s32) into %stack.1, addrspace 5)
- ; GFX90A-NEXT: BUFFER_STORE_DWORD_OFFEN $agpr1, killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec, implicit $agpr0_agpr1 :: (store (s32) into %stack.1 + 4, addrspace 5)
+ ; GFX90A-NEXT: BUFFER_STORE_DWORD_OFFEN $agpr1, killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (store (s32) into %stack.1 + 4, addrspace 5)
; GFX90A-NEXT: $vgpr40 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 704, 0, 0, implicit $exec :: (load (s32) from %stack.226, addrspace 5)
; GFX90A-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX90A-NEXT: {{ $}}
@@ -4609,10 +4609,10 @@ body: |
; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR killed $vgpr1, $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.2, addrspace 5)
; GFX908-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 $sgpr32, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr1 = V_ADD_U32_e32 8200, $vgpr1, implicit $exec
- ; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1
+ ; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1, implicit $agpr0_agpr1
; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD $vgpr0, $vgpr1, 0, 0, implicit $exec, implicit $flat_scr, implicit $agpr0_agpr1 :: (store (s32) into %stack.1, addrspace 5)
- ; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
- ; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD $vgpr0, $vgpr1, 4, 0, implicit $exec, implicit $flat_scr, implicit $agpr0_agpr1 :: (store (s32) into %stack.1 + 4, addrspace 5)
+ ; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec, implicit $agpr0_agpr1
+ ; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD $vgpr0, $vgpr1, 4, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.1 + 4, addrspace 5)
; GFX908-FLATSCR-NEXT: $vgpr1 = SCRATCH_LOAD_DWORD_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.2, addrspace 5)
; GFX908-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX908-FLATSCR-NEXT: {{ $}}
@@ -5132,12 +5132,12 @@ body: |
; GFX908-NEXT: S_CMP_EQ_U32 0, 0, implicit-def $scc
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)
; GFX908-NEXT: $vgpr1 = V_MOV_B32_e32 8200, implicit $exec
- ; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2
+ ; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2, implicit $agpr0_agpr1_agpr2
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.1, addrspace 5)
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (store (s32) into %stack.1 + 4, addrspace 5)
- ; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec
- ; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.1 + 8, addrspace 5)
+ ; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2
+ ; GFX908-NEXT: BUFFER_STORE_DWORD_OFFEN $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec :: (store (s32) into %stack.1 + 8, addrspace 5)
; GFX908-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
; GFX908-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX908-NEXT: {{ $}}
@@ -5385,7 +5385,7 @@ body: |
; GFX90A-NEXT: $vgpr40 = V_MOV_B32_e32 8904, implicit $exec
; GFX90A-NEXT: BUFFER_STORE_DWORD_OFFEN $agpr0, $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit-def $agpr0_agpr1_agpr2, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.1, addrspace 5)
; GFX90A-NEXT: BUFFER_STORE_DWORD_OFFEN $agpr1, $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (store (s32) into %stack.1 + 4, addrspace 5)
- ; GFX90A-NEXT: BUFFER_STORE_DWORD_OFFEN $agpr2, killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.1 + 8, addrspace 5)
+ ; GFX90A-NEXT: BUFFER_STORE_DWORD_OFFEN $agpr2, killed $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, implicit $exec :: (store (s32) into %stack.1 + 8, addrspace 5)
; GFX90A-NEXT: $vgpr40 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 704, 0, 0, implicit $exec :: (load (s32) from %stack.226, addrspace 5)
; GFX90A-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX90A-NEXT: {{ $}}
@@ -5633,12 +5633,12 @@ body: |
; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR killed $vgpr1, $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.2, addrspace 5)
; GFX908-FLATSCR-NEXT: $vgpr1 = V_MOV_B32_e32 $sgpr32, implicit $exec
; GFX908-FLATSCR-NEXT: $vgpr1 = V_ADD_U32_e32 8200, $vgpr1, implicit $exec
- ; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2
+ ; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2, implicit $agpr0_agpr1_agpr2
; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD $vgpr0, $vgpr1, 0, 0, implicit $exec, implicit $flat_scr, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.1, addrspace 5)
; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD $vgpr0, $vgpr1, 4, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.1 + 4, addrspace 5)
- ; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec
- ; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD $vgpr0, $vgpr1, 8, 0, implicit $exec, implicit $flat_scr, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.1 + 8, addrspace 5)
+ ; GFX908-FLATSCR-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2
+ ; GFX908-FLATSCR-NEXT: SCRATCH_STORE_DWORD $vgpr0, $vgpr1, 8, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.1 + 8, addrspace 5)
; GFX908-FLATSCR-NEXT: $vgpr1 = SCRATCH_LOAD_DWORD_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.2, addrspace 5)
; GFX908-FLATSCR-NEXT: S_CBRANCH_SCC1 %bb.2, implicit $scc
; GFX908-FLATSCR-NEXT: {{ $}}
diff --git a/llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir b/llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir
index 3228962ed01f7..63c7eb6e7c83b 100644
--- a/llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir
+++ b/llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir
@@ -65,16 +65,16 @@ body: |
; GCN-NEXT: $vgpr40 = SI_SPILL_S32_TO_VGPR $sgpr30, 0, $vgpr40, implicit-def $sgpr30_sgpr31, implicit $sgpr30_sgpr31
; GCN-NEXT: $vgpr40 = SI_SPILL_S32_TO_VGPR $sgpr31, 1, $vgpr40, implicit $sgpr30_sgpr31
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr14, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 52, 0, 0, implicit $exec, implicit-def $vgpr14_vgpr15, implicit $vgpr14_vgpr15 :: (store (s32) into %stack.1, addrspace 5)
- ; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr15, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 56, 0, 0, implicit $exec, implicit killed $vgpr14_vgpr15 :: (store (s32) into %stack.1 + 4, addrspace 5)
+ ; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr15, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 56, 0, 0, implicit $exec :: (store (s32) into %stack.1 + 4, addrspace 5)
; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr10, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 60, 0, 0, implicit $exec, implicit-def $vgpr10_vgpr11, implicit $vgpr10_vgpr11 :: (store (s32) into %stack.2, addrspace 5)
- ; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr11, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 64, 0, 0, implicit $exec, implicit killed $vgpr10_vgpr11 :: (store (s32) into %stack.2 + 4, addrspace 5)
+ ; GCN-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr11, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 64, 0, 0, implicit $exec :: (store (s32) into %stack.2 + 4, addrspace 5)
; GCN-NEXT: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr16_sgpr17, 0, csr_amdgpu, implicit-def dead $vgpr0
; GCN-NEXT: $vgpr14 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 52, 0, 0, implicit $exec, implicit-def $vgpr14_vgpr15 :: (load (s32) from %stack.1, addrspace 5)
- ; GCN-NEXT: $vgpr15 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 56, 0, 0, implicit $exec, implicit-def $vgpr14_vgpr15 :: (load (s32) from %stack.1 + 4, addrspace 5)
+ ; GCN-NEXT: $vgpr15 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 56, 0, 0, implicit $exec :: (load (s32) from %stack.1 + 4, addrspace 5)
; GCN-NEXT: renamable $vgpr0_vgpr1 = nofpexcept V_FMA_F64_e64 0, killed $vgpr45_vgpr46, 0, killed $vgpr41_vgpr42, 0, killed $vgpr60_vgpr61, 0, 0, implicit $mode, implicit $exec
; GCN-NEXT: FLAT_STORE_DWORDX2 killed renamable $vgpr58_vgpr59, killed renamable $vgpr0_vgpr1, 0, 0, implicit $exec, implicit $flat_scr :: (store (s64))
; GCN-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 60, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.2, addrspace 5)
- ; GCN-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 64, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.2 + 4, addrspace 5)
+ ; GCN-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 64, 0, 0, implicit $exec :: (load (s32) from %stack.2 + 4, addrspace 5)
; GCN-NEXT: FLAT_STORE_DWORDX2 killed renamable $vgpr0_vgpr1, killed renamable $vgpr56_vgpr57, 0, 0, implicit $exec, implicit $flat_scr :: (store (s64))
; GCN-NEXT: {{ $}}
; GCN-NEXT: bb.2:
diff --git a/llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir b/llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
index 9ce2464e0968a..45820091d59ce 100644
--- a/llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+++ b/llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
@@ -95,9 +95,9 @@ body: |
; MUBUF-LABEL: name: test_spill_av_v2
; MUBUF: $vgpr0_vgpr1 = IMPLICIT_DEF
; MUBUF-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $vgpr0_vgpr1 :: (store (s32) into %stack.0, addrspace 5)
- ; MUBUF-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec, implicit killed $vgpr0_vgpr1 :: (store (s32) into %stack.0 + 4, addrspace 5)
+ ; MUBUF-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (store (s32) into %stack.0 + 4, addrspace 5)
; MUBUF-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.0, addrspace 5)
- ; MUBUF-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, ...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reduced version of the failing situation with the lowered spill + machine-cp would be good
if (NeedSuperRegImpOperand && IsFirstSubReg) { | ||
unsigned State = RegState::Implicit | SrcDstRegState; | ||
MIB.addReg(ValueReg, State); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (NeedSuperRegImpOperand && IsFirstSubReg) { | |
unsigned State = RegState::Implicit | SrcDstRegState; | |
MIB.addReg(ValueReg, State); | |
} | |
if (NeedSuperRegImpOperand && IsFirstSubReg) | |
MIB.addReg(ValueReg, RegState::Implicit | SrcDstRegState); |
Can leave the second line alone, this is only dropping the IsLastSubReg part
Updated with the following changes:
|
5f0ea4f
to
2559b66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First line description should only contain direct description, not the issue number
--- | | ||
define amdgpu_kernel void @agpr_spill_copy() #0 { ret void } | ||
|
||
attributes #0 = { "amdgpu-num-vgpr"="32" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid using amdgpu-num-vgpr. Use amdgpu-waves-per-eu (does this even need a budget restriction? There's no actual allocation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a budget, the registers will spill from vgpr
into available agpr
registers rather than onto the stack. The bug only occurs when there is enough register pressure such that some of the registers in the register group spill onto the stack, and some spill into the agprs
. Technically, I could remove the budget restriction, but that would require adding code to the test to put all of the agprs
into use (that seems much messier to me).
I made the change to using amdgpu-waves-per-eu
in this latest push, just curious as to why this is preferred? It seems like a more indirect way of getting the behavior that I want (which is limiting the number of vgpr
's to 32
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amdgpu-num-vgpr is hacky and introduces a lot of edge cases that don't actually work, and I want to remove it. It's not future proof when the number of registers change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't using amdgpu-waves-per-eu
also not robust to if the number of registers change in the context of this test? Because I'm relying on the fact that there are 256 registers so with amdgpu-waves-per-eu=8
, 32 registers are allocated for each wave, which is the condition I need for the failing case to occur. Is there a better way to write the test to avoid this dependency; my end goal is to have exactly 2 unused agpr
's when the register spill occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future proof for future targets. This test is fixed to a specific target. You should use the equivalent amdgpu-waves-per-eu
@@ -0,0 +1,43 @@ | |||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 | |||
# RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx942 -verify-machineinstrs -run-pass prologepilog,machine-cp -o - %s | FileCheck -check-prefix=GFX942 %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx942 -verify-machineinstrs -run-pass prologepilog,machine-cp -o - %s | FileCheck -check-prefix=GFX942 %s | |
# RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx942 -run-pass prologepilog,machine-cp -o - %s | FileCheck -check-prefix=GFX942 %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is updated with the latest push.
2559b66
to
99b3a44
Compare
Still should fix description |
--- | | ||
define amdgpu_kernel void @agpr_spill_copy() #0 { ret void } | ||
|
||
attributes #0 = { "amdgpu-waves-per-eu"="8" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attributes #0 = { "amdgpu-waves-per-eu"="8" } | |
attributes #0 = { "amdgpu-waves-per-eu"="8,8" } |
…-reg's are saved to agpr's Replicates llvm#131386. Currently, this test "fails", note the missing copy to $agpr14.
…the last sub-register after a spill Extension of https://reviews.llvm.org/D141101 to remove the define flag from the last stack memory access. Fixes case where COPY instructions are used for some of the stack restoration, but the copies get optimized away during the machine-cp pass. Prior to this change, was possible to produce the following code: $agpr16_agpr17_agpr18_agpr19 = SCRATCH_LOAD_DWORDX4_ST 64, 0, implicit $exec, implicit $flat_scr, implicit-def $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27_agpr28_agpr29_agpr30_agpr31 :: (load (s128) from %stack.17, align 4, addrspace 5) $agpr20_agpr21_agpr22_agpr23 = SCRATCH_LOAD_DWORDX4_ST 80, 0, implicit $exec, implicit $flat_scr :: (load (s128) from %stack.17 + 16, align 4, addrspace 5) $agpr24_agpr25_agpr26_agpr27 = SCRATCH_LOAD_DWORDX4_ST 96, 0, implicit $exec, implicit $flat_scr :: (load (s128) from %stack.17 + 32, align 4, addrspace 5) $agpr31 = COPY $agpr112, implicit $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27_agpr28_agpr29_agpr30_agpr31 $agpr30 = COPY $agpr208, implicit $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27_agpr28_agpr29_agpr30_agpr31 $agpr28_agpr29 = SCRATCH_LOAD_DWORDX2_ST 112, 0, implicit $exec, implicit $flat_scr, implicit-def $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27_agpr28_agpr29_agpr30_agpr31 :: (load (s64) from %stack.17 + 48, align 4, addrspace 5) where `$agpr30 = COPY $agpr208` would be optimized away by `machine-cp` pass. Instead, change to: $agpr28_agpr29 = SCRATCH_LOAD_DWORDX2_ST 112, 0, implicit $exec, implicit $flat_scr :: (load (s64) from %stack.17 + 48, align 4, addrspace 5) Fixes llvm#131386.
Fixed:
|
if (NeedSuperRegImpOperand && (IsFirstSubReg || IsLastSubReg)) | ||
bool IsSrcDstDef = SrcDstRegState & RegState::Define; | ||
if (NeedSuperRegImpOperand && | ||
(IsFirstSubReg || (IsLastSubReg && !IsSrcDstDef))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the second condition is doing here, why isn't this just IsFirstSubReg && isDef?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the IsLastSubReg is needed for other flags, namely RegState::Kill
. Removing it, the following change is seen in the test I wrote:
- ; GFX942-NEXT: SCRATCH_STORE_DWORDX2_SADDR killed $vgpr12_vgpr13, $sgpr32, 48, 0, implicit $exec, implicit $flat_scr, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 :: (store (s64) into %stack.0 + 48, align 4, addrspace 5)
+ ; GFX942-NEXT: SCRATCH_STORE_DWORDX2_SADDR killed $vgpr12_vgpr13, $sgpr32, 48, 0, implicit $exec, implicit $flat_scr :: (store (s64) into %stack.0 + 48, align 4, addrspace 5)
This seems to be the analog of implicit def
on the first sub-register.
@bababuck Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
…the last sub-register after a spill (llvm#133986) Extension of https://reviews.llvm.org/D141101 to remove the define flag from the last stack memory access. Fixes case where COPY instructions are used for some of the stack restoration, but the copies get optimized away during the machine-cp pass. Prior to this change, was possible to produce the following code: $agpr16_agpr17_agpr18_agpr19 = SCRATCH_LOAD_DWORDX4_ST 64, 0, implicit $exec, implicit $flat_scr, implicit-def $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27_agpr28_agpr29_agpr30_agpr31 :: (load (s128) from %stack.17, align 4, addrspace 5) $agpr20_agpr21_agpr22_agpr23 = SCRATCH_LOAD_DWORDX4_ST 80, 0, implicit $exec, implicit $flat_scr :: (load (s128) from %stack.17 + 16, align 4, addrspace 5) $agpr24_agpr25_agpr26_agpr27 = SCRATCH_LOAD_DWORDX4_ST 96, 0, implicit $exec, implicit $flat_scr :: (load (s128) from %stack.17 + 32, align 4, addrspace 5) $agpr31 = COPY $agpr112, implicit $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27_agpr28_agpr29_agpr30_agpr31 $agpr30 = COPY $agpr208, implicit $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27_agpr28_agpr29_agpr30_agpr31 $agpr28_agpr29 = SCRATCH_LOAD_DWORDX2_ST 112, 0, implicit $exec, implicit $flat_scr, implicit-def $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27_agpr28_agpr29_agpr30_agpr31 :: (load (s64) from %stack.17 + 48, align 4, addrspace 5) where `$agpr30 = COPY $agpr208` would be optimized away by `machine-cp` pass. Instead, change to: $agpr28_agpr29 = SCRATCH_LOAD_DWORDX2_ST 112, 0, implicit $exec, implicit $flat_scr :: (load (s64) from %stack.17 + 48, align 4, addrspace 5) Fixes llvm#131386. Made the simple fix, but I'm not completely comfortable with this change since the reason for the previous inclusion of `IsLastSubReg` is unclear to me. @krzysz00
Extension of https://reviews.llvm.org/D141101 to remove the define flag from the last stack memory access. Fixes case where COPY instructions are used for some of the stack restoration, but the copies get optimized away during the machine-cp pass.
Prior to this change, was possible to produce the following code:
$agpr16_agpr17_agpr18_agpr19 = SCRATCH_LOAD_DWORDX4_ST 64, 0, implicit $exec, implicit $flat_scr, implicit-def $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27_agpr28_agpr29_agpr30_agpr31 :: (load (s128) from %stack.17, align 4, addrspace 5)
$agpr20_agpr21_agpr22_agpr23 = SCRATCH_LOAD_DWORDX4_ST 80, 0, implicit $exec, implicit $flat_scr :: (load (s128) from %stack.17 + 16, align 4, addrspace 5)
$agpr24_agpr25_agpr26_agpr27 = SCRATCH_LOAD_DWORDX4_ST 96, 0, implicit $exec, implicit $flat_scr :: (load (s128) from %stack.17 + 32, align 4, addrspace 5)
$agpr31 = COPY $agpr112, implicit $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27_agpr28_agpr29_agpr30_agpr31
$agpr30 = COPY $agpr208, implicit $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27_agpr28_agpr29_agpr30_agpr31
$agpr28_agpr29 = SCRATCH_LOAD_DWORDX2_ST 112, 0, implicit $exec, implicit $flat_scr, implicit-def $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23_agpr24_agpr25_agpr26_agpr27_agpr28_agpr29_agpr30_agpr31 :: (load (s64) from %stack.17 + 48, align 4, addrspace 5)
where
$agpr30 = COPY $agpr208
would be optimized away bymachine-cp
pass. Instead, change to:$agpr28_agpr29 = SCRATCH_LOAD_DWORDX2_ST 112, 0, implicit $exec, implicit $flat_scr :: (load (s64) from %stack.17 + 48, align 4, addrspace 5)
Fixes #131386.
Made the simple fix, but I'm not completely comfortable with this change since the reason for the previous inclusion of
IsLastSubReg
is unclear to me.@krzysz00