-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LLVM] Make s_getpc_b64 rematerializable #71823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: None (Acim-Maravic) ChangesFull diff: https://github.com/llvm/llvm-project/pull/71823.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index 0a16a07cb5ec35b..ffcbf48461aa531 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -292,6 +292,7 @@ def S_BITSET0_B64 : SOP1_64_32 <"s_bitset0_b64", [], 1>;
def S_BITSET1_B32 : SOP1_32 <"s_bitset1_b32", [], 1>;
def S_BITSET1_B64 : SOP1_64_32 <"s_bitset1_b64", [], 1>;
+let isReMaterializable = 1 in
def S_GETPC_B64 : SOP1_64_0 <"s_getpc_b64",
[(set i64:$sdst, (int_amdgcn_s_getpc))]
>;
diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir
index 649f0d7f7799637..f3caa1a588e32be 100644
--- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir
+++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir
@@ -573,3 +573,25 @@ body: |
S_NOP 0, implicit %2
S_ENDPGM 0
...
+
+---
+name: test_remat_s_getpc_b64
+tracksRegLiveness: true
+body: |
+ bb.0:
+ ; GCN-LABEL: name: test_remat_s_getpc_b64
+ ; GCN: renamable $sgpr0_sgpr1 = S_GETPC_B64
+ ; GCN-NEXT: renamable $sgpr2_sgpr3 = S_GETPC_B64
+ ; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0_sgpr1
+ ; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr2_sgpr3
+ ; GCN-NEXT: renamable $sgpr0_sgpr1 = S_GETPC_B64
+ ; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0_sgpr1
+ ; GCN-NEXT: S_ENDPGM 0
+ %0:sgpr_64 = S_GETPC_B64
+ %1:sgpr_64 = S_GETPC_B64
+ %2:sgpr_64 = S_GETPC_B64
+ S_NOP 0, implicit %0
+ S_NOP 0, implicit %1
+ S_NOP 0, implicit %2
+ S_ENDPGM 0
+...
|
I found two places that can code-generate an S_GETPC_B64, one in function SIInstrInfo::insertIndirectBranch() and one in function SIInstrInfo::expandPostRAPseudo(). I think that both of these are happening after register allocation, so at that point rematerialization can not be done. |
Right. Rematerializing getpc is a weird case, because the rematerialized instruction will definitely not have the same result as the original instruction. But I guess the only reasonable use of getpc (pre-RA) is calling the intrinsic llvm.amdgc.s.getpc and then extracting the high 32 bits of the result, which should be safe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test case doesn't actually show the S_GETPC_B64
s as getting merged or rematerialized. I suspect we want a test structure like:
%x = s_getpc_b64
inline asm that uses approximately all SGPRs
use %x here
as a pre-commit which should spill %x
without your change. And then your change causes it to instead rematerialize the s_getpc_b64.
Keep the pre-commit in the same PR, i.e. your PR should have two commits:
- Add the new test with the CHECK lines corresponding to behavior before your change
- The change to isRematerializable plus changes to the test caused by it
The option |
280606d
to
12a7f7d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining. Acim, in general it would still be good to split this into two commits, but I suppose for this one it can go in as-is.
S_NOP 0, implicit %1 | ||
S_NOP 0, implicit %2 | ||
S_ENDPGM 0 | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this also work if it's in the bundle sequence with the following add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the SI_PC_ADD_REL_OFFSET sequence? That's not even expanded to a bundle until after RA.
12a7f7d
to
45c5f46
Compare
No description provided.