Skip to content

[MCP] Enhance MCP copy Instruction removal for special case #70778

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Nov 22, 2023

Conversation

LWenH
Copy link
Member

@LWenH LWenH commented Oct 31, 2023

  • Machine Copy Propagation Pass may lose some opportunities to further remove
    the redundant copy instruction during the ForwardCopyPropagateBlock procedure,
    consider the following MI sequence:
L1: r0 = COPY r9     <- TrackMI
L2: r0 = COPY r8     <- TrackMI
L3: use r0           <- Remove L2 from MaybeDeadCopies
L4: early-clobber r9 <- Invalid L2 from Tracker
L5: r0 = COPY r8     <- Miss remove chance
L6: use r0           <- Miss remove L5 chance
  • During the ForwardCopyPropagateBlock procedure, we need to invalidate all the ”early-clobber register“ relative
    instructions from CopyTracker, because of the find method from DenseMap, we might invalidate the "wrong" instruction
    in the CopyTracker. As a result, there is no way to eliminate subsequent redundant instructions (Like L5 instruction in above example).

  • Please see the C++ test case generated code for vector.body in MIR: https://gcc.godbolt.org/z/nK4oMaWv5

@llvmbot
Copy link
Member

llvmbot commented Oct 31, 2023

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-regalloc

Author: Vettel (LWenH)

Changes
  • Machine Copy Propagation Pass may lost some opportunities to further remove
    the redundant copy instruction during the ForwardCopyPropagateBlock procedure,
    consider the following MI sequence:
L1: r0 = COPY r9     &lt;- TrackMI
L2: r0 = COPY r8     &lt;- TrackMI
L3: use r0           &lt;- Remove L2 from MaybeDeadCopies
L4: early-clobber r9 &lt;- Invalid L2 from Tracker
L5: r0 = COPY r8     &lt;- Miss remove chance
L6: use r0           &lt;- Miss remove L5 chance
  • During the ForwardCopyPropagateBlock procedure, we need to invalidate all the early-clobber register relative
    instructions from CopyTracker, because of the find method from DenseMap, we might invalidate the "wrong" instruction
    in the CopyTracker. As a result, there is no way to eliminate subsequent redundant instructions (Like L5 in the instruction).

  • Please see the C++ test case generated code for vector.body in MIR: https://gcc.godbolt.org/z/nK4oMaWv5


Full diff: https://github.com/llvm/llvm-project/pull/70778.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineCopyPropagation.cpp (+25)
  • (modified) llvm/test/CodeGen/RISCV/machine-cp.mir (+32)
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index a032b31a1fc7c62..b0640b48121febd 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -719,6 +719,7 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
   LLVM_DEBUG(dbgs() << "MCP: ForwardCopyPropagateBlock " << MBB.getName()
                     << "\n");
 
+  const MachineInstr *LastMI = nullptr;
   for (MachineInstr &MI : llvm::make_early_inc_range(MBB)) {
     // Analyze copies (which don't overlap themselves).
     std::optional<DestSourcePair> CopyOperands =
@@ -735,6 +736,27 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
         MCRegister Def = RegDef.asMCReg();
         MCRegister Src = RegSrc.asMCReg();
 
+        // Target may lost some opportunity to further remove the redundant
+        // copy instruction, consider the following sequence:
+        // L1: r0 = COPY r9     <- TrackMI
+        // L2: r0 = COPY r8     <- TrackMI
+        // L3: use r0           <- Remove L2 from MaybeDeadCopies
+        // L4: early-clobber r9 <- Invalid L2 from Tracker
+        // L5: r0 = COPY r8     <- Miss remove chance
+        // L6: use r0           <- Miss remove L5 chance
+        if (LastMI) {
+          std::optional<DestSourcePair> PrevCopyOperands =
+              isCopyInstr(*LastMI, *TII, UseCopyInstr);
+          if (PrevCopyOperands) {
+            Register PrevRegDef = PrevCopyOperands->Destination->getReg();
+            // We could remove the previous copy from tracker directly.
+            if (TRI->isSubRegisterEq(RegDef, PrevRegDef)) {
+              Tracker.invalidateRegister(PrevRegDef.asMCReg(), *TRI, *TII,
+                                         UseCopyInstr);
+            }
+          }
+        }
+
         // The two copies cancel out and the source of the first copy
         // hasn't been overridden, eliminate the second one. e.g.
         //  %ecx = COPY %eax
@@ -795,6 +817,7 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
         }
 
         Tracker.trackCopy(&MI, *TRI, *TII, UseCopyInstr);
+        LastMI = &MI;
 
         continue;
       }
@@ -874,6 +897,8 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
     // Any previous copy definition or reading the Defs is no longer available.
     for (MCRegister Reg : Defs)
       Tracker.clobberRegister(Reg, *TRI, *TII, UseCopyInstr);
+
+    LastMI = &MI;
   }
 
   // If MBB doesn't have successors, delete the copies whose defs are not used.
diff --git a/llvm/test/CodeGen/RISCV/machine-cp.mir b/llvm/test/CodeGen/RISCV/machine-cp.mir
index f3674f89cd918b4..7523332a23c6839 100644
--- a/llvm/test/CodeGen/RISCV/machine-cp.mir
+++ b/llvm/test/CodeGen/RISCV/machine-cp.mir
@@ -9,6 +9,10 @@
   entry:
     ret void
   }
+  define void @bar() {
+  entry:
+    ret void
+  }
 ...
 ---
 name:            foo
@@ -21,6 +25,7 @@ body:             |
     ; RV32-NEXT: renamable $v4_v5_v6_v7_v8_v9_v10_v11 = COPY killed renamable $v0_v1_v2_v3_v4_v5_v6_v7
     ; RV32-NEXT: renamable $v28 = COPY renamable $v8, implicit killed $v28_v29_v30, implicit-def $v28_v29_v30
     ; RV32-NEXT: PseudoRET implicit $v28
+    ;
     ; RV64-LABEL: name: foo
     ; RV64: liveins: $v28_v29_v30, $v8_v9, $v1
     ; RV64-NEXT: {{  $}}
@@ -32,3 +37,30 @@ body:             |
     renamable $v28 = COPY renamable $v8, implicit killed $v28_v29_v30, implicit-def $v28_v29_v30
     PseudoRET implicit $v28
 ...
+---
+name:            bar
+body:             |
+  bb.0.entry:
+    liveins: $x5, $x6, $x7, $x8, $x9, $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17, $x28, $x29, $x30, $x31
+    ; RV32-LABEL: name: bar
+    ; RV32: liveins: $x5, $x6, $x7, $x8, $x9, $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17, $x28, $x29, $x30, $x31
+    ; RV32-NEXT: {{  $}}
+    ; RV32-NEXT: $v0 = COPY renamable $v8
+    ; RV32-NEXT: renamable $v14m2 = PseudoVLE32_V_M2_MASK undef renamable $v14m2, renamable $x15, $v0, -1, 5 /* e32 */, 1 /* ta, mu */, implicit $vl, implicit $vtype
+    ; RV32-NEXT: early-clobber renamable $v9 = PseudoVMSLE_VI_M2 killed renamable $v10m2, -1, -1, 5 /* e32 */, implicit $vl, implicit $vtype
+    ; RV32-NEXT: PseudoVSE32_V_M2_MASK killed renamable $v14m2, renamable $x9, $v0, -1, 5 /* e32 */, implicit $vl, implicit $vtype
+    ;
+    ; RV64-LABEL: name: bar
+    ; RV64: liveins: $x5, $x6, $x7, $x8, $x9, $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17, $x28, $x29, $x30, $x31
+    ; RV64-NEXT: {{  $}}
+    ; RV64-NEXT: $v0 = COPY renamable $v8
+    ; RV64-NEXT: renamable $v14m2 = PseudoVLE32_V_M2_MASK undef renamable $v14m2, renamable $x15, $v0, -1, 5 /* e32 */, 1 /* ta, mu */, implicit $vl, implicit $vtype
+    ; RV64-NEXT: early-clobber renamable $v9 = PseudoVMSLE_VI_M2 killed renamable $v10m2, -1, -1, 5 /* e32 */, implicit $vl, implicit $vtype
+    ; RV64-NEXT: PseudoVSE32_V_M2_MASK killed renamable $v14m2, renamable $x9, $v0, -1, 5 /* e32 */, implicit $vl, implicit $vtype
+    $v0 = COPY killed renamable $v9; example.cpp:14:22
+    $v0 = COPY renamable $v8; example.cpp:12:25
+    renamable $v14m2 = PseudoVLE32_V_M2_MASK undef renamable $v14m2, renamable $x15, $v0, -1, 5, 1, implicit $vl, implicit $vtype; example.cpp:12:25
+    early-clobber renamable $v9 = PseudoVMSLE_VI_M2 killed renamable $v10m2, -1, -1, 5, implicit $vl, implicit $vtype; example.cpp:9:22
+    $v0 = COPY killed renamable $v8; example.cpp:12:22
+    PseudoVSE32_V_M2_MASK killed renamable $v14m2, renamable $x9, $v0, -1, 5, implicit $vl, implicit $vtype; example.cpp:12:22
+...

@LWenH LWenH changed the title [MCP] Enhance MCP copy reovemal for special case [MCP] Enhance MCP copy Instruction reovemal for special case Oct 31, 2023
@arsenm arsenm changed the title [MCP] Enhance MCP copy Instruction reovemal for special case [MCP] Enhance MCP copy Instruction removal for special case Oct 31, 2023
@@ -735,6 +736,27 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
MCRegister Def = RegDef.asMCReg();
MCRegister Src = RegSrc.asMCReg();

// Target may lost some opportunity to further remove the redundant
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo s/lost/lose/

if (PrevCopyOperands) {
Register PrevRegDef = PrevCopyOperands->Destination->getReg();
// We could remove the previous copy from tracker directly.
if (TRI->isSubRegisterEq(RegDef, PrevRegDef)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sub register check is suspicious. The underlying tracker is already operating in terms of regunits, so I don't expect you to need to look for overlapping defs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comment. I think it might be a better way to remove the related record(the copy def record) in the tracker when we reach another new copy with the same def.

// Target may lost some opportunity to further remove the redundant
// copy instruction, consider the following sequence:
// L1: r0 = COPY r9 <- TrackMI
// L2: r0 = COPY r8 <- TrackMI
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we forget about L1 when we get to L2?

I think the underlying issue is that when we reach L4, L2 shouldn't be in the tracker copy anyway.

Now, regarding the fix itself, just glancing at the code, I believe it is not general enough. For instance, what happens if I have an unrelated COPY between L1 and L2?

Copy link
Member Author

@LWenH LWenH Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with you. This's still not a very general method to fix this issue here. But I think it's a good place to discuss a more robust way to fix this issue here.

The root cause of this issue is that the tracker will invalidate L2 rather than L1 in tracker when we reach L4. When we reach L2, the current mechanism in ForwardCopyPropagateBlock will not remove L1 in tracker anyway, so if we unrelated L1 and L2, I think we could remove the record of def in the tracker(when we reach L2) to fix this problem.

Copy link

github-actions bot commented Nov 3, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@LWenH
Copy link
Member Author

LWenH commented Nov 3, 2023

Address comments, I think a more general solution might be to remove the MI pair record from the tracker when encountering a COPY instruction. This addresses the root cause of the problem, which is an outdated src record in the tracker causing the incorrect invalidation of the other COPY instructions.

@LWenH LWenH marked this pull request as draft November 3, 2023 13:04
@LWenH LWenH marked this pull request as ready for review November 4, 2023 16:06

// At this point, we need to locate the record in copy maps that use
// Src to define Def, and remove them from Tracker.
for (MCRegUnit SrcUnit : TRI.regunits(Src)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to always do that loop in clobberRegister instead of having two, very similar implementations?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could put this "clean-up" behind a flag in clobberRegister and set that flag on the call site (i.e., pass it as an argument.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, very great idea for the clean up flag rather than reimplement the clobber register function, I will implement it later.

auto SrcCopy = Copies.find(SrcUnit);
if (SrcCopy != Copies.end() && SrcCopy->second.LastSeenUseInCopy) {
// If Src define multiple values, we only need
// to erase the such record in DefRegs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "If SrcCopy defines[...] to erase the record for Def in DefRegs"

SrcCopy->second.DefRegs.erase(itr);
// If DefReg becomes empty after removal, we can directly
// remove SrcCopy from the tracker's copy maps.
if (!SrcCopy->second.DefRegs.size()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: SrcCopy->second.DefRegs.empty()

@LWenH
Copy link
Member Author

LWenH commented Nov 6, 2023

Address @qcolombet 's comments, I add a default CleanUp parameter in the clobberRegister function to reuse this function.

Although we might be able to do more "CleanUp" optimization when we calling the clobberRegister function. I think a more cautious solution currently might be only handle this case during the ForwardCopyPropagateBlock procedure, as we encountering another COPY instruction with the same Def.

@bzEq
Copy link
Collaborator

bzEq commented Nov 9, 2023

Shouldn't we forget about L1 when we get to L2?

I agree with we should forget about L1 when we get to L2. The semantic of Copies[Unit].DefRegs should be Registers defined by Unit in COPY instructions, we want to keep this invariant during tracking. I think it's appropriate to keep this property during trackCopy. After L2, Copies[<r9's units>].DefRegs still containing r0 is violating this property.

@LWenH
Copy link
Member Author

LWenH commented Nov 15, 2023

Resolve conflict test file:

 /llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

@@ -163,7 +163,8 @@ class CopyTracker {

/// Clobber a single register, removing it from the tracker's copy maps.
void clobberRegister(MCRegister Reg, const TargetRegisterInfo &TRI,
const TargetInstrInfo &TII, bool UseCopyInstr) {
const TargetInstrInfo &TII, bool UseCopyInstr,
bool CleanUp = false) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment for the CleanUp argument?

In particular, we need to make it clear why and when this argument needs to be set to true.
Without any explanation, I would expect that we need to actually set it to true all the time.

// L3: use r0 <- Remove L2 from MaybeDeadCopies
// L4: early-clobber r9 <- Invalid L2 from Tracker
// L5: r0 = COPY r8 <- Miss remove chance
// L6: use r0 <- Miss remove L5 chance
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrite this comment to explain what will happen now.
Not what was happening before this patch.

@LWenH
Copy link
Member Author

LWenH commented Nov 16, 2023

Shouldn't we forget about L1 when we get to L2?

I agree with we should forget about L1 when we get to L2. The semantic of Copies[Unit].DefRegs should be Registers defined by Unit in COPY instructions, we want to keep this invariant during tracking. I think it's appropriate to keep this property during trackCopy. After L2, Copies[<r9's units>].DefRegs still containing r0 is violating this property.

Address @qcolombet and @bzEq 's comments, after more carefully reconsideration, I think we could always do this DefRegs record elimination when we inivoke the clobberRegister routine.

When we clobber a single register(Def), the semantic of Src's DefRegs to contain Def is no longer effectual. Because the DefRegs is always used in findCopyDefViaUnit during the BackwardCopyPropagateBlock function. After we clobbering a single register, the Src is no longer propagatable either. In other word, we don't need to keep this record in the tracker after we clobbering a Def register.

@LWenH LWenH requested a review from qcolombet November 20, 2023 04:17
Copy link
Collaborator

@qcolombet qcolombet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience iterating through this.

LGTM

@LWenH
Copy link
Member Author

LWenH commented Nov 22, 2023

Thank you also for your careful suggestions too.

@LWenH LWenH merged commit cae46f6 into llvm:main Nov 22, 2023
@bjope
Copy link
Collaborator

bjope commented Nov 27, 2023

Tracked down some miscompiles found downstream that points at this commit.
Also wrote a new issue here about it: #73512

bjope added a commit that referenced this pull request Nov 27, 2023
LWenH added a commit that referenced this pull request Dec 26, 2023
…74239)

Machine Copy Propagation Pass may lose some opportunities to further
remove the redundant copy instructions during the ForwardCopyPropagateBlock
procedure. When we Clobber a "Def" register, we also need to remove the record 
from the copy maps that indicates "Src" defined "Def" to ensure the correct semantics
of the ClobberRegister function.  This patch reapplies #70778 and addresses the corner 
case bug  #73512 specific to the AMDGPU backend. Additionally, it refines the criteria 
for removing empty records from the copy maps, thereby enhancing overall safety.

For more information, please see the C++ test case generated code in 
"vector.body" after the MCP Pass: https://gcc.godbolt.org/z/nK4oMaWv5.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants