Skip to content

[LLVM] Make s_getpc_b64 rematerializable #71823

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 23, 2023

Conversation

Acim-Maravic
Copy link
Contributor

No description provided.

@llvmbot
Copy link
Member

llvmbot commented Nov 9, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: None (Acim-Maravic)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/71823.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SOPInstructions.td (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/remat-sop.mir (+22)
diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index 0a16a07cb5ec35b..ffcbf48461aa531 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -292,6 +292,7 @@ def S_BITSET0_B64 : SOP1_64_32 <"s_bitset0_b64", [], 1>;
 def S_BITSET1_B32 : SOP1_32    <"s_bitset1_b32", [], 1>;
 def S_BITSET1_B64 : SOP1_64_32 <"s_bitset1_b64", [], 1>;
 
+let isReMaterializable = 1 in
 def S_GETPC_B64 : SOP1_64_0  <"s_getpc_b64",
   [(set i64:$sdst, (int_amdgcn_s_getpc))]
 >;
diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir
index 649f0d7f7799637..f3caa1a588e32be 100644
--- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir
+++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir
@@ -573,3 +573,25 @@ body:             |
     S_NOP 0, implicit %2
     S_ENDPGM 0
 ...
+
+---
+name:            test_remat_s_getpc_b64
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    ; GCN-LABEL: name: test_remat_s_getpc_b64
+    ; GCN: renamable $sgpr0_sgpr1 = S_GETPC_B64
+    ; GCN-NEXT: renamable $sgpr2_sgpr3 = S_GETPC_B64
+    ; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0_sgpr1
+    ; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr2_sgpr3
+    ; GCN-NEXT: renamable $sgpr0_sgpr1 = S_GETPC_B64
+    ; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0_sgpr1
+    ; GCN-NEXT: S_ENDPGM 0
+    %0:sgpr_64 = S_GETPC_B64
+    %1:sgpr_64 = S_GETPC_B64
+    %2:sgpr_64 = S_GETPC_B64
+    S_NOP 0, implicit %0
+    S_NOP 0, implicit %1
+    S_NOP 0, implicit %2
+    S_ENDPGM 0
+...

@Acim-Maravic
Copy link
Contributor Author

I found two places that can code-generate an S_GETPC_B64, one in function SIInstrInfo::insertIndirectBranch() and one in function SIInstrInfo::expandPostRAPseudo(). I think that both of these are happening after register allocation, so at that point rematerialization can not be done.

@jayfoad
Copy link
Contributor

jayfoad commented Nov 9, 2023

I found two places that can code-generate an S_GETPC_B64, one in function SIInstrInfo::insertIndirectBranch() and one in function SIInstrInfo::expandPostRAPseudo(). I think that both of these are happening after register allocation, so at that point rematerialization can not be done.

Right. Rematerializing getpc is a weird case, because the rematerialized instruction will definitely not have the same result as the original instruction. But I guess the only reasonable use of getpc (pre-RA) is calling the intrinsic llvm.amdgc.s.getpc and then extracting the high 32 bits of the result, which should be safe.

Copy link
Collaborator

@nhaehnle nhaehnle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case doesn't actually show the S_GETPC_B64s as getting merged or rematerialized. I suspect we want a test structure like:

  %x = s_getpc_b64
  inline asm that uses approximately all SGPRs
  use %x here

as a pre-commit which should spill %x without your change. And then your change causes it to instead rematerialize the s_getpc_b64.

Keep the pre-commit in the same PR, i.e. your PR should have two commits:

  1. Add the new test with the CHECK lines corresponding to behavior before your change
  2. The change to isRematerializable plus changes to the test caused by it

@piotrAMD
Copy link
Collaborator

The option --stress-regalloc=2 passed in the test has the effect of limiting the number of available registers already.
Separating out the test as a pre-commit, as Nicolai said, will demonstrate this (hopefully!).

@Acim-Maravic Acim-Maravic force-pushed the rematerializable branch 2 times, most recently from 280606d to 12a7f7d Compare November 13, 2023 18:00
Copy link
Collaborator

@nhaehnle nhaehnle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining. Acim, in general it would still be good to split this into two commits, but I suppose for this one it can go in as-is.

S_NOP 0, implicit %1
S_NOP 0, implicit %2
S_ENDPGM 0
...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also work if it's in the bundle sequence with the following add?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the SI_PC_ADD_REL_OFFSET sequence? That's not even expanded to a bundle until after RA.

@mbrkusanin mbrkusanin merged commit 376b22a into llvm:main Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants