[MachineLateInstrsCleanup] Handle multiple kills for a preceding definition. #119132

JonPsson1 · 2024-12-08T16:55:40Z

When removing a redundant definition in order to reuse an earlier identical one it is necessary to remove any earlier kill flag as well.

Previously, the assumption has been that any register that kills the defined Reg is enough to handle for this purpose, but this is actually not quite enough. A kill of a super-register does not necessarily imply that all of its subregs (including Reg) is defined at that point: a partial definition of a register is legal. This means Reg may have been killed earlier and is not live at that point.

This patch changes the tracking of kill flags to allow for multiple flags to be removed: instead of remembering just the single / latest kill flag, a vector is now used to track and remove them all. TinyPtrVector seems ideal for this as there are only very rarely more than one kill flag, and it doesn't seem to give much difference in compile time.

The kill flags handling here is making this pass much more complicated than it would have to be. This pass does not depend on kill flags for its own use, so an interesting alternative to all this handling would be in case there is no real use of them at this late stage, maybe just remove them all. If there actually is a serious user, maybe that pass could instead recompute them and also be able to trust them fully.

Also adding an assertion which is unrelated to kill flags, but it seems to make sense (according to liberal assertion policy), to verify that the preceding definition is in fact identical in clearKillsForDef(). This is a little border-line as this requires having an extra argument to that function, but my guess is that this will not matter. Could possibly wait with this as a a separate patch, or even keep skipping it.

Fixes #117783

@arsenm @nikic @RKSimon @jayfoad @topperc @uweigand

llvmbot · 2024-12-08T16:56:11Z

@llvm/pr-subscribers-backend-systemz

Author: Jonas Paulsson (JonPsson1)

Changes

When removing a redundant definition in order to reuse an earlier identical one it is necessary to remove any earlier kill flag as well.

Previously, the assumption has been that any register that kills the defined Reg is enough to handle for this purpose, but this is actually not quite enough. A kill of a super-register does not necessarily imply that all of its subregs (including Reg) is defined at that point: a partial definition of a register is legal. This means Reg may have been killed earlier and is not live at that point.

This patch changes the tracking of kill flags to allow for multiple flags to be removed: instead of remembering just the single / latest kill flag, a vector is now used to track and remove them all. TinyPtrVector seems ideal for this as there are only very rarely more than one kill flag, and it doesn't seem to give much difference in compile time.

The kill flags handling here is making this pass much more complicated than it would have to be. This pass does not depend on kill flags for its own use, so an interesting alternative to all this handling would be in case there is no real use of them at this late stage, maybe just remove them all. If there actually is a serious user, maybe that pass could instead recompute them and also be able to trust them fully.

Also adding an assertion which is unrelated to kill flags, but it seems to make sense (according to liberal assertion policy), to verify that the preceding definition is in fact identical in clearKillsForDef(). This is a little border-line as this requires having an extra argument to that function, but my guess is that this will not matter. Could possibly wait with this as a a separate patch, or even keep skipping it.

Fixes #117783

@arsenm @nikic @RKSimon @jayfoad @topperc

Full diff: https://github.com/llvm/llvm-project/pull/119132.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp (+24-18)
(added) llvm/test/CodeGen/SystemZ/machine-latecleanup-kills.mir (+52)

diff --git a/llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp b/llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
index 6399e8a9523685..5fefa2d8fbbab9 100644
--- a/llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
+++ b/llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
@@ -47,9 +47,9 @@ class MachineLateInstrsCleanup : public MachineFunctionPass {
       return MI && MI->isIdenticalTo(*ArgMI);
     }
   };
-
+  typedef SmallDenseMap<Register, TinyPtrVector<MachineInstr *>> Reg2MIVecMap;
   std::vector<Reg2MIMap> RegDefs;
-  std::vector<Reg2MIMap> RegKills;
+  std::vector<Reg2MIVecMap> RegKills;
 
   // Walk through the instructions in MBB and remove any redundant
   // instructions.
@@ -57,7 +57,7 @@ class MachineLateInstrsCleanup : public MachineFunctionPass {
 
   void removeRedundantDef(MachineInstr *MI);
   void clearKillsForDef(Register Reg, MachineBasicBlock *MBB,
-                        BitVector &VisitedPreds);
+                        BitVector &VisitedPreds, MachineInstr *ToRemoveMI);
 
 public:
   static char ID; // Pass identification, replacement for typeid
@@ -113,19 +113,25 @@ bool MachineLateInstrsCleanup::runOnMachineFunction(MachineFunction &MF) {
 // definition.
 void MachineLateInstrsCleanup::clearKillsForDef(Register Reg,
                                                 MachineBasicBlock *MBB,
-                                                BitVector &VisitedPreds) {
+                                                BitVector &VisitedPreds,
+                                                MachineInstr *ToRemoveMI) {
   VisitedPreds.set(MBB->getNumber());
 
-  // Kill flag in MBB
-  if (MachineInstr *KillMI = RegKills[MBB->getNumber()].lookup(Reg)) {
-    KillMI->clearRegisterKills(Reg, TRI);
-    return;
-  }
+  // Clear kill flag(s) in MBB, that have been seen after the preceding
+  // definition. If Reg or one of its subregs was killed, it would actually
+  // be ok to stop after removing that (and any other) kill-flag, but it
+  // doesn't seem noticeably faster while it would be a bit more complicated.
+  Reg2MIVecMap &MBBKills = RegKills[MBB->getNumber()];
+  if (MBBKills.contains(Reg))
+    for (auto *KillMI : MBBKills[Reg])
+      KillMI->clearRegisterKills(Reg, TRI);
 
-  // Def in MBB (missing kill flag)
-  if (MachineInstr *DefMI = RegDefs[MBB->getNumber()].lookup(Reg))
-    if (DefMI->getParent() == MBB)
-      return;
+  // Definition in current MBB: done.
+  Reg2MIMap &MBBDefs = RegDefs[MBB->getNumber()];
+  MachineInstr *DefMI = MBBDefs[Reg];
+  assert(DefMI->isIdenticalTo(*ToRemoveMI) && "Previous def not identical?");
+  if (DefMI->getParent() == MBB)
+    return;
 
   // If an earlier def is not in MBB, continue in predecessors.
   if (!MBB->isLiveIn(Reg))
@@ -133,13 +139,13 @@ void MachineLateInstrsCleanup::clearKillsForDef(Register Reg,
   assert(!MBB->pred_empty() && "Predecessor def not found!");
   for (MachineBasicBlock *Pred : MBB->predecessors())
     if (!VisitedPreds.test(Pred->getNumber()))
-      clearKillsForDef(Reg, Pred, VisitedPreds);
+      clearKillsForDef(Reg, Pred, VisitedPreds, ToRemoveMI);
 }
 
 void MachineLateInstrsCleanup::removeRedundantDef(MachineInstr *MI) {
   Register Reg = MI->getOperand(0).getReg();
   BitVector VisitedPreds(MI->getMF()->getNumBlockIDs());
-  clearKillsForDef(Reg, MI->getParent(), VisitedPreds);
+  clearKillsForDef(Reg, MI->getParent(), VisitedPreds, MI);
   MI->eraseFromParent();
   ++NumRemoved;
 }
@@ -175,7 +181,7 @@ static bool isCandidate(const MachineInstr *MI, Register &DefedReg,
 bool MachineLateInstrsCleanup::processBlock(MachineBasicBlock *MBB) {
   bool Changed = false;
   Reg2MIMap &MBBDefs = RegDefs[MBB->getNumber()];
-  Reg2MIMap &MBBKills = RegKills[MBB->getNumber()];
+  Reg2MIVecMap &MBBKills = RegKills[MBB->getNumber()];
 
   // Find reusable definitions in the predecessor(s).
   if (!MBB->pred_empty() && !MBB->isEHPad() &&
@@ -225,8 +231,8 @@ bool MachineLateInstrsCleanup::processBlock(MachineBasicBlock *MBB) {
         MBBDefs.erase(Reg);
         MBBKills.erase(Reg);
       } else if (MI.findRegisterUseOperandIdx(Reg, TRI, true /*isKill*/) != -1)
-        // Keep track of register kills.
-        MBBKills[Reg] = &MI;
+        // Keep track of all instructions that fully or partially kills Reg.
+        MBBKills[Reg].push_back(&MI);
     }
 
     // Record this MI for potential later reuse.
diff --git a/llvm/test/CodeGen/SystemZ/machine-latecleanup-kills.mir b/llvm/test/CodeGen/SystemZ/machine-latecleanup-kills.mir
new file mode 100644
index 00000000000000..51ae602fcd517c
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/machine-latecleanup-kills.mir
@@ -0,0 +1,52 @@
+# RUN: llc -mtriple=s390x-linux-gnu -run-pass=machine-latecleanup %s -o - -mcpu=z16 \
+# RUN:   -verify-machineinstrs 2>&1 | FileCheck %s
+
+# Kill flag of $r0q (super-reg) needs to be removed, and $r0l needs to be added as live-in.
+# CHECK-LABEL: name: fun0
+# CHECK-LABEL: bb.0:
+# CHECK:         renamable $r0l = LHIMux -1
+# CHECK-NEXT:    J %bb.1
+# CHECK-LABEL: bb.1:
+# CHECK-NEXT:    liveins: $r0l
+# CHECK:         renamable $r1d = LGFI 0
+# CHECK-NEXT:    ST128 renamable $r0q, $r15d, 168, $noreg
+# CHECK-NEXT:    ST killed renamable $r0l, $r15d, 160, $noreg
+# CHECK-NEXT:    Return
+---
+name:            fun0
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    renamable $r0l = LHIMux -1
+    J %bb.1
+
+  bb.1:
+    renamable $r1d = LGFI 0
+    ST128 killed renamable $r0q, $r15d, 168, $noreg
+    renamable $r0l = LHIMux -1
+    ST killed renamable $r0l, $r15d, 160, $noreg
+    Return
+...
+
+# Kill flags of both $r1d and $r0q (super-reg) need to be removed.
+# CHECK-LABEL: name: fun1
+# CHECK-LABEL: bb.0:
+# CHECK-NEXT:    renamable $r1d = LLILL 1
+# CHECK-NEXT:    STG renamable $r1d, killed $r15d, 8, $noreg
+# CHECK-NEXT:    renamable $r0d = LLILL 0
+# CHECK-NEXT:    ST128 renamable $r0q, $r15d, 0, $noreg
+# CHECK-NEXT:    STG killed renamable $r1d, killed $r15d, 8, $noreg
+# CHECK-NEXT:    Return
+---
+name:            fun1
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    renamable $r1d = LLILL 1
+    STG killed renamable $r1d, killed $r15d, 8, $noreg
+    renamable $r0d = LLILL 0
+    ST128 killed renamable $r0q, $r15d, 0, $noreg
+    renamable $r1d = LLILL 1
+    STG killed renamable $r1d, killed $r15d, 8, $noreg
+    Return
+...

arsenm · 2024-12-08T20:22:53Z

If there actually is a serious user, maybe that pass could instead recompute them and also be able to trust them fully.

No, there should be no users. All users should be changed to not rely on kill flags

arsenm · 2024-12-08T20:26:22Z

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp

+  if (MBBKills.contains(Reg))
+    for (auto *KillMI : MBBKills[Reg])
+      KillMI->clearRegisterKills(Reg, TRI);


Double map lookup

Doesn't the [] operator add an entry to the map if its missing? In this case a mapping of Reg to an empty vector.

Maybe:

if (auto Kills = MBBKills.find(Reg); Kills != MBBKills.end()) for (auto *KillMI : *Kills)

ok - tried this now, although I think it's a bit less readable.

llvm/test/CodeGen/SystemZ/machine-latecleanup-kills.mir

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp

JonPsson1 · 2024-12-08T21:45:43Z

If there actually is a serious user, maybe that pass could instead recompute them and also be able to trust them fully.

No, there should be no users. All users should be changed to not rely on kill flags

There seems to be users in the Target backends - which could include late things like expandPostRAPseudo(), perhaps. There are also some use of isKill() in BranchFolding and probably other places. Do you think I could try to just remove kill flags a bit more carelessly ("ahead of time", even in the many cases where there is no following identical definition), and go ahead with this if no tests fail? Or are you saying there is some work left to be done in places that we should aim to do first?

JonPsson1 · 2024-12-10T15:42:32Z

You could also just do MRI.clearKillFlags any time it could matter

I tried this now, and found that

Still need to update live-in lists recursively: At first I just added Reg as live-in to the current MBB which the verifier accepted (it would have been better if it could report a missing definition for a live-out register in the predecessor). It however became clear that with the CFG optimizations running later on, all the predecessors up the CFG need to have the live-in list updated.
It's still a nice simplification to the algorithm (to take a look, see HEAD^ on the branch / dc4ff43).
Practically NFC on SPEC / SystemZ: only 7 files change without any real impact. This seemed due to BranchFolding.cpp mostly, and also MachineCopyPropagation in one case.
Only two CodeGen tests fail, however CodeGen/Thumb/frame-access.ll crashes with an assertion, which was unexpected:

MachineOperand::isKill() const: Assertion `isReg() && "Wrong MachineOperand accessor"'

bool ARMLoadStoreOpt::CombineMovBx(MachineBasicBlock &MBB) {
  MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
  if (MBBI == MBB.begin() || MBBI == MBB.end() ||
      MBBI->getOpcode() != ARM::tBX_RET)
    return false;

  MachineBasicBlock::iterator Prev = MBBI;
  --Prev;
  if (Prev->getOpcode() != ARM::tMOVr ||
      !Prev->definesRegister(ARM::LR, /*TRI=*/nullptr))
    return false;

  for (auto Use : Prev->uses())
=>  if (Use.isKill()) {

(gdb) p Prev->dump()
  $lr = frame-destroy tMOVr $r1, 14, $noreg
 (gdb) p Use.dump()
14

It seems there is some assumption there that some register operand must be a kill.

There is also ReachingDefAnalysis.cpp that checks for kills, which is used in BreakFalseDeps.cpp (only used by ARM / X86 at the moment).

Due to the above and in particular the crash in ARMLoadStoreOpt (which has a lot of checks for kills), I reverted back to the tracking of kill flags and clearing them carefully as per before. I would agree that there is room for improvement here, and at least on SystemZ it seems completely fine to remove any and all kill flags after regalloc.

JonPsson1 · 2025-01-08T18:12:12Z

Ping!

Maybe at some point there could be something like a LivenessAnalysis pass that could be required by late passes that actually depend on correct livein-lists and kill flags? That way passes would not be required to preserve this info - which is much more cumbersome than (re)computing it in one place - and users would get all the kill flags including the ones conservatively removed.

For now, it seems that MachineLateInstrsCleanup has to continue to update this info as best as it can, with this patch fixing the case of a superregister kill.

JonPsson1 · 2025-01-22T22:41:41Z

Ping!

JonPsson1 · 2025-01-28T18:40:25Z

Ping!

Adding some more potential reviewers:
@vpykhtin
@nickdesaulniers
@efriedma-quic

JonPsson1 · 2025-01-31T20:24:52Z

updated per review

JonPsson1 · 2025-02-11T16:18:33Z

Ping! Does this look ok now?

JonPsson1 · 2025-02-20T15:02:24Z

PING!

Any objections to this?

JonPsson1 · 2025-03-03T22:51:54Z

Ping. Given that this was a problem reported by the MachineVerifier and that the patch is an improvement which handles cases of super-reg kills, I think I need to push this soon, if no one can see anything wrong with it.

The discussion around kill flags remains: I have tried your suggestions to simply remove them liberally, but that hasn't really worked. Tests breaks and even assertions trigger. So I think this is unfortunately a topic that would need a separate discussion: Should we remove the burden on optimizers to update kill flags, and if so how? Find lost optimizations acceptable, or recompute kill flags for passes that really depend on them?

@vpykhtin @nickdesaulniers

JonPsson1 · 2025-03-10T21:39:25Z

Adding more reviewers with the hopes that someone can take a look at my handling of kill flags here, thanks.

@fhahn @MatzeB @jayfoad @topperc @bevin-hansson @serguei-katkov @qcolombet @preames @sdesmalen-arm @perlfu

perlfu · 2025-03-11T11:05:44Z

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp

+  // Definition in current MBB: done.
+  Reg2MIMap &MBBDefs = RegDefs[MBB->getNumber()];
+  MachineInstr *DefMI = MBBDefs[Reg];
+  assert(DefMI->isIdenticalTo(*ToRemoveMI) && "Previous def not identical?");


Can you give more detail on the addition of this assert? Specifically because ToRemoveMI seems to exist only to support this?

Yeah, I guess maybe that assertion could be a separate patch. The whole idea of the pass is to remove a redundant definition because there is an identical dominating one. Given the CFG traversal involved here (DefMI may be in a predecessor), it seems fair to do the assertion. Should I remove it from this patch now, you think?

I don't disagree with the assertion, just wonder if there had been some scenario that triggered it.

perlfu

I think the substance of the change is good. Sub/super register kills definitely need to be handled in this case. This patch seems like a reasonable minimal change to address that need.

Can you add one further test that checks when the defs are super register and a kill is only for a sub register? I don't think existing tests cover this.

perlfu · 2025-03-12T04:40:48Z

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp

+  // Definition in current MBB: done.
+  Reg2MIMap &MBBDefs = RegDefs[MBB->getNumber()];
+  MachineInstr *DefMI = MBBDefs[Reg];
+  assert(DefMI->isIdenticalTo(*ToRemoveMI) && "Previous def not identical?");


I don't disagree with the assertion, just wonder if there had been some scenario that triggered it.

JonPsson1 · 2025-03-12T22:47:55Z

Can you add one further test that checks when the defs are super register and a kill is only for a sub register? I don't think existing tests cover this.

done.

JonPsson1 · 2025-03-13T14:51:36Z

@perlfu thanks for review!

…nition. (llvm#119132) When removing a redundant definition in order to reuse an earlier identical one it is necessary to remove any earlier kill flag as well. Previously, the assumption has been that any register that kills the defined Reg is enough to handle for this purpose, but this is actually not quite enough. A kill of a super-register does not necessarily imply that all of its subregs (including Reg) is defined at that point: a partial definition of a register is legal. This means Reg may have been killed earlier and is not live at that point. This patch changes the tracking of kill flags to allow for multiple flags to be removed: instead of remembering just the single / latest kill flag, a vector is now used to track and remove them all. TinyPtrVector seems ideal for this as there are only very rarely more than one kill flag, and it doesn't seem to give much difference in compile time. The kill flags handling here is making this pass much more complicated than it would have to be. This pass does not depend on kill flags for its own use, so an interesting alternative to all this handling would be to just remove them all. If there actually is a serious user, maybe that pass could instead recompute them. Also adding an assertion which is unrelated to kill flags, but it seems to make sense (according to liberal assertion policy), to verify that the preceding definition is in fact identical in clearKillsForDef(). Fixes llvm#117783

llvmbot added the backend:SystemZ label Dec 8, 2024

JonPsson1 added llvm:codegen and removed backend:SystemZ labels Dec 8, 2024

arsenm reviewed Dec 8, 2024

View reviewed changes

JonPsson1 force-pushed the MLIC branch from eff2a2f to c4e2f71 Compare December 10, 2024 15:22

llvmbot added the backend:SystemZ label Dec 10, 2024

JonPsson1 removed the backend:SystemZ label Dec 11, 2024

JonPsson1 force-pushed the MLIC branch from c4e2f71 to 287f418 Compare January 31, 2025 20:23

llvmbot added the backend:SystemZ label Jan 31, 2025

perlfu reviewed Mar 11, 2025

View reviewed changes

perlfu reviewed Mar 12, 2025

View reviewed changes

JonPsson1 added 5 commits March 12, 2025 11:39

Handle multiple kills for a preceding definition.

c01ebe8

Updates per review.

d07f3ff

Go back to clearing of specific kill-flags.

1a55770

Updated per review

fca3dd6

Add test

6231d1c

JonPsson1 force-pushed the MLIC branch from 287f418 to 6231d1c Compare March 12, 2025 22:45

perlfu approved these changes Mar 13, 2025

View reviewed changes

JonPsson1 merged commit 85318ba into llvm:main Mar 13, 2025
8 of 11 checks passed

JonPsson1 deleted the MLIC branch March 13, 2025 14:51

[MachineLateInstrsCleanup] Handle multiple kills for a preceding definition. #119132

[MachineLateInstrsCleanup] Handle multiple kills for a preceding definition. #119132

Uh oh!

Conversation

JonPsson1 commented Dec 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Dec 8, 2024

Uh oh!

arsenm commented Dec 8, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

JonPsson1 commented Dec 8, 2024

Uh oh!

JonPsson1 commented Dec 10, 2024

Uh oh!

JonPsson1 commented Jan 8, 2025

Uh oh!

JonPsson1 commented Jan 22, 2025

Uh oh!

JonPsson1 commented Jan 28, 2025

Uh oh!

JonPsson1 commented Jan 31, 2025

Uh oh!

JonPsson1 commented Feb 11, 2025

Uh oh!

JonPsson1 commented Feb 20, 2025

Uh oh!

JonPsson1 commented Mar 3, 2025

Uh oh!

JonPsson1 commented Mar 10, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

perlfu left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JonPsson1 commented Mar 12, 2025

Uh oh!

Uh oh!

JonPsson1 commented Mar 13, 2025

Uh oh!

Uh oh!

JonPsson1 commented Dec 8, 2024 •

edited

Loading