Skip to content

[AMDGPU] Detect kills in register sets when trying to form V_CMPX instructions. #68293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 2, 2023

Conversation

tsymalla
Copy link
Contributor

@tsymalla tsymalla commented Oct 5, 2023

During the SIOptimizeExecMasking pass, we try to form V_CMPX instructions by detecting S_AND_SAVEEXEC and V_MOV instructions. Generally, we require the input operand of the V_MOV, which is the input operand to the to-be-formed V_CMPX, to be alive. This is forced by clearing the kill flags on the operand after V_CMPX has been generated.

However, if we have a kill of a register set that contains said register, this will not be detected by clearKillFlags.
With this change, possible additional kill-flag candidates will be detected during the final call to findInstrBackwards and then, the kill flag will be removed to keep all registers in the set alive.

@llvmbot
Copy link
Member

llvmbot commented Oct 5, 2023

@llvm/pr-subscribers-backend-amdgpu

Changes

During the SIOptimizeExecMasking pass, we try to form V_CMPX instructions by detecting S_AND_SAVEEXEC and V_MOV instructions. Generally, we require the input operand of the V_MOV, which is the input operand to the to-be-formed V_CMPX, to be alive. This is forced by clearing the kill flags on the operand after V_CMPX has been generated.

However, if we have a kill of a renamable register that contains said register, this will not be detected by clearKillFlags. Since clearing the kill flags will possibly cause the other VGPRs to stay alive as well, we skip forming a V_CMPX instruction in that case.


Full diff: https://github.com/llvm/llvm-project/pull/68293.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp (+16-2)
  • (added) llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx-renamable-kill.mir (+25)
diff --git a/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp b/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
index 04c9a6457944c5f..e3082f7d8aa1742 100644
--- a/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
+++ b/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
@@ -49,6 +49,7 @@ class SIOptimizeExecMasking : public MachineFunctionPass {
   MachineInstr *findInstrBackwards(MachineInstr &Origin,
                                    std::function<bool(MachineInstr *)> Pred,
                                    ArrayRef<MCRegister> NonModifiableRegs,
+                                   MachineInstr *Terminator = nullptr,
                                    unsigned MaxInstructions = 20) const;
   bool optimizeExecSequence();
   void tryRecordVCmpxAndSaveexecSequence(MachineInstr &MI);
@@ -329,7 +330,8 @@ static bool isLiveOut(const MachineBasicBlock &MBB, unsigned Reg) {
 // registers given in NonModifiableRegs is modified by the current instruction.
 MachineInstr *SIOptimizeExecMasking::findInstrBackwards(
     MachineInstr &Origin, std::function<bool(MachineInstr *)> Pred,
-    ArrayRef<MCRegister> NonModifiableRegs, unsigned MaxInstructions) const {
+    ArrayRef<MCRegister> NonModifiableRegs, MachineInstr *Terminator,
+    unsigned MaxInstructions) const {
   MachineBasicBlock::reverse_iterator A = Origin.getReverseIterator(),
                                       E = Origin.getParent()->rend();
   unsigned CurrentIteration = 0;
@@ -344,6 +346,17 @@ MachineInstr *SIOptimizeExecMasking::findInstrBackwards(
     for (MCRegister Reg : NonModifiableRegs) {
       if (A->modifiesRegister(Reg, TRI))
         return nullptr;
+
+      // Check for kills that appear after the terminator instruction, that
+      // would not be detected by clearKillFlags, since they will cause the
+      // register to be dead at a later place, causing the verifier to fail.
+      if (Terminator && A != Terminator && A->killsRegister(Reg, TRI)) {
+        for (MachineOperand &MO : A->operands()) {
+          if (MO.isReg() && MO.isKill() && MO.isRenamable() &&
+              TRI->regsOverlap(MO.getReg(), Reg))
+            return nullptr;
+        }
+      }
     }
 
     ++CurrentIteration;
@@ -690,7 +703,8 @@ void SIOptimizeExecMasking::tryRecordVCmpxAndSaveexecSequence(
     NonDefRegs.push_back(Src1->getReg());
 
   if (!findInstrBackwards(
-          MI, [&](MachineInstr *Check) { return Check == VCmp; }, NonDefRegs))
+          MI, [&](MachineInstr *Check) { return Check == VCmp; }, NonDefRegs,
+          VCmp))
     return;
 
   if (VCmp)
diff --git a/llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx-renamable-kill.mir b/llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx-renamable-kill.mir
new file mode 100644
index 000000000000000..d4d0cb12f217544
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx-renamable-kill.mir
@@ -0,0 +1,25 @@
+# RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=si-optimize-exec-masking -verify-machineinstrs %s -o - | FileCheck -check-prefix=GFX1100 %s
+
+---
+
+# GFX1100-LABEL: name: vcmp_saveexec_to_vcmpx_kill_between
+# GFX1100-NOT: V_CMPX
+name: vcmp_saveexec_to_vcmpx_kill_between
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr43, $sgpr44, $sgpr45, $sgpr55, $sgpr62, $sgpr63, $sgpr64, $sgpr65, $sgpr66, $sgpr67, $vgpr40, $vgpr41, $vgpr76, $vgpr77, $vgpr78, $vgpr95, $vgpr109, $vgpr110, $vgpr111, $sgpr48_sgpr49_sgpr50_sgpr51:0x000000000000000C, $sgpr52_sgpr53_sgpr54_sgpr55:0x0000000000000003, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr56_sgpr57, $sgpr58_sgpr59, $sgpr60_sgpr61, $vgpr92_vgpr93_vgpr94_vgpr95:0x000000000000003F, $vgpr104_vgpr105_vgpr106_vgpr107:0x000000000000003F, $vgpr46_vgpr47:0x000000000000000F, $vgpr60_vgpr61:0x000000000000000F, $vgpr62_vgpr63:0x000000000000000C, $vgpr72_vgpr73:0x000000000000000F, $vgpr74_vgpr75:0x000000000000000F, $vgpr88_vgpr89:0x000000000000000C, $vgpr90_vgpr91:0x0000000000000003, $vgpr124_vgpr125:0x000000000000000F, $vgpr126_vgpr127:0x000000000000000F
+
+  liveins: $vgpr0
+  renamable $vgpr1 = COPY $vgpr0, implicit $exec
+  renamable $sgpr0 = S_MOV_B32 0
+  renamable $sgpr1 = COPY renamable $sgpr0
+  renamable $vgpr0 = V_MOV_B32_e32 0, implicit $exec
+  renamable $vcc_lo = V_CMP_EQ_U32_e64 0, $vgpr1, implicit $exec
+  renamable $sgpr2 = COPY renamable $sgpr0
+  renamable $sgpr3 = COPY renamable $sgpr0
+  BUFFER_STORE_DWORDX2_OFFSET_exact killed renamable $vgpr0_vgpr1, killed renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, 344, 1, 0, implicit $exec :: (dereferenceable store (s64), align 1, addrspace 8)
+  $sgpr0 = S_MOV_B32 $exec_lo
+  $sgpr0 = S_AND_SAVEEXEC_B32 $vcc_lo, implicit-def $exec, implicit-def $scc, implicit $exec
+  $exec_lo = S_MOV_B32_term killed renamable $sgpr0
+...

@jayfoad
Copy link
Contributor

jayfoad commented Oct 5, 2023

I don't think this has anything to do with "renamable" does it? The test case should still provoke the bug even if you remove all the "renamable" flags.

@tsymalla
Copy link
Contributor Author

tsymalla commented Oct 5, 2023

I don't think this has anything to do with "renamable" does it? The test case should still provoke the bug even if you remove all the "renamable" flags.

It rather has to do with the register being contained in the set of the registers to be killed, right. I'll adjust the check.

@tsymalla tsymalla force-pushed the fix_kill_vcmpx branch 2 times, most recently from fabb658 to de7de92 Compare October 5, 2023 09:57
@tsymalla tsymalla changed the title [AMDGPU] Detect renamable kills when trying to form V_CMPX instructions. [AMDGPU] Detect kills in register sets when trying to form V_CMPX instructions. Oct 5, 2023
@tsymalla
Copy link
Contributor Author

tsymalla commented Oct 5, 2023

Do some additional bookkeeping to store the MachineOperands that are potentially left "killed" and remove the kill flag on them at the end.

@github-actions
Copy link

github-actions bot commented Oct 5, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

…tructions.

During the SIOptimizeExecMasking pass, we try to form V_CMPX instructions by
detecting S_AND_SAVEEXEC and V_MOV instructions. Generally, we require the input
operand of the V_MOV, which is the input operand to the to-be-formed V_CMPX, to
be alive. This is forced by clearing the kill flags on the operand after V_CMPX
has been generated.

However, if we have a kill of a register set that contains said register, this
will not be detected by clearKillFlags.
With this change, possible additional kill-flag candidates will be
detected during the final call to findInstrBackwards and then, the kill
flag will be removed to keep all registers in the set alive.
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just drop kill flags without prejudice? We shouldn't really need them anymore at this point

@tsymalla
Copy link
Contributor Author

tsymalla commented Oct 5, 2023

Can you just drop kill flags without prejudice? We shouldn't really need them anymore at this point

I am not sure about that.

@perlfu
Copy link
Contributor

perlfu commented Oct 6, 2023

Can you just drop kill flags without prejudice? We shouldn't really need them anymore at this point

I am not sure about that.

In the general case we still use kill flags after this point in SIShrinkInstructions and SIPreEmitPeephole.
More specifically can we drop kill flags from anything used by V_CMPX?
I would say "yes", but I think that is basically what this patch is trying to do?

Perhaps Matt is suggesting that we permute all the sub and super registers of the registers which are passed to clearKillFlags, and call clearKillFlags on those as well.

@tsymalla
Copy link
Contributor Author

tsymalla commented Oct 6, 2023

Can you just drop kill flags without prejudice? We shouldn't really need them anymore at this point

I am not sure about that.

In the general case we still use kill flags after this point in SIShrinkInstructions and SIPreEmitPeephole. More specifically can we drop kill flags from anything used by V_CMPX? I would say "yes", but I think that is basically what this patch is trying to do?

Yes, this is what this patch does (erasing the kill flags on everything that could block the V_CMPX operand from being live).

Perhaps Matt is suggesting that we permute all the sub and super registers of the registers which are passed to clearKillFlags, and call clearKillFlags on those as well.

Makes sense, but it seems more destructive than just dropping the flag from the relevant instructions (speaking of passes like SIShrinkInstructions and SIPreEmitPeephole).

@jayfoad
Copy link
Contributor

jayfoad commented Oct 10, 2023

Can you just drop kill flags without prejudice? We shouldn't really need them anymore at this point

I am not sure about that.

In the general case we still use kill flags after this point in SIShrinkInstructions and SIPreEmitPeephole. More specifically can we drop kill flags from anything used by V_CMPX? I would say "yes", but I think that is basically what this patch is trying to do?

Perhaps Matt is suggesting that we permute all the sub and super registers of the registers which are passed to clearKillFlags, and call clearKillFlags on those as well.

I tried that in #68675.

@tsymalla
Copy link
Contributor Author

Ping (for @jayfoad s change as well)

@tsymalla tsymalla requested a review from nhaehnle October 23, 2023 16:45
Copy link
Collaborator

@nhaehnle nhaehnle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me.

@tsymalla tsymalla merged commit 18839ae into llvm:main Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants