-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[MCP] Enhance MCP copy Instruction removal for special case #70778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-llvm-regalloc Author: Vettel (LWenH) Changes
L1: r0 = COPY r9 <- TrackMI
L2: r0 = COPY r8 <- TrackMI
L3: use r0 <- Remove L2 from MaybeDeadCopies
L4: early-clobber r9 <- Invalid L2 from Tracker
L5: r0 = COPY r8 <- Miss remove chance
L6: use r0 <- Miss remove L5 chance
Full diff: https://github.com/llvm/llvm-project/pull/70778.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index a032b31a1fc7c62..b0640b48121febd 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -719,6 +719,7 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
LLVM_DEBUG(dbgs() << "MCP: ForwardCopyPropagateBlock " << MBB.getName()
<< "\n");
+ const MachineInstr *LastMI = nullptr;
for (MachineInstr &MI : llvm::make_early_inc_range(MBB)) {
// Analyze copies (which don't overlap themselves).
std::optional<DestSourcePair> CopyOperands =
@@ -735,6 +736,27 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
MCRegister Def = RegDef.asMCReg();
MCRegister Src = RegSrc.asMCReg();
+ // Target may lost some opportunity to further remove the redundant
+ // copy instruction, consider the following sequence:
+ // L1: r0 = COPY r9 <- TrackMI
+ // L2: r0 = COPY r8 <- TrackMI
+ // L3: use r0 <- Remove L2 from MaybeDeadCopies
+ // L4: early-clobber r9 <- Invalid L2 from Tracker
+ // L5: r0 = COPY r8 <- Miss remove chance
+ // L6: use r0 <- Miss remove L5 chance
+ if (LastMI) {
+ std::optional<DestSourcePair> PrevCopyOperands =
+ isCopyInstr(*LastMI, *TII, UseCopyInstr);
+ if (PrevCopyOperands) {
+ Register PrevRegDef = PrevCopyOperands->Destination->getReg();
+ // We could remove the previous copy from tracker directly.
+ if (TRI->isSubRegisterEq(RegDef, PrevRegDef)) {
+ Tracker.invalidateRegister(PrevRegDef.asMCReg(), *TRI, *TII,
+ UseCopyInstr);
+ }
+ }
+ }
+
// The two copies cancel out and the source of the first copy
// hasn't been overridden, eliminate the second one. e.g.
// %ecx = COPY %eax
@@ -795,6 +817,7 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
}
Tracker.trackCopy(&MI, *TRI, *TII, UseCopyInstr);
+ LastMI = &MI;
continue;
}
@@ -874,6 +897,8 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
// Any previous copy definition or reading the Defs is no longer available.
for (MCRegister Reg : Defs)
Tracker.clobberRegister(Reg, *TRI, *TII, UseCopyInstr);
+
+ LastMI = &MI;
}
// If MBB doesn't have successors, delete the copies whose defs are not used.
diff --git a/llvm/test/CodeGen/RISCV/machine-cp.mir b/llvm/test/CodeGen/RISCV/machine-cp.mir
index f3674f89cd918b4..7523332a23c6839 100644
--- a/llvm/test/CodeGen/RISCV/machine-cp.mir
+++ b/llvm/test/CodeGen/RISCV/machine-cp.mir
@@ -9,6 +9,10 @@
entry:
ret void
}
+ define void @bar() {
+ entry:
+ ret void
+ }
...
---
name: foo
@@ -21,6 +25,7 @@ body: |
; RV32-NEXT: renamable $v4_v5_v6_v7_v8_v9_v10_v11 = COPY killed renamable $v0_v1_v2_v3_v4_v5_v6_v7
; RV32-NEXT: renamable $v28 = COPY renamable $v8, implicit killed $v28_v29_v30, implicit-def $v28_v29_v30
; RV32-NEXT: PseudoRET implicit $v28
+ ;
; RV64-LABEL: name: foo
; RV64: liveins: $v28_v29_v30, $v8_v9, $v1
; RV64-NEXT: {{ $}}
@@ -32,3 +37,30 @@ body: |
renamable $v28 = COPY renamable $v8, implicit killed $v28_v29_v30, implicit-def $v28_v29_v30
PseudoRET implicit $v28
...
+---
+name: bar
+body: |
+ bb.0.entry:
+ liveins: $x5, $x6, $x7, $x8, $x9, $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17, $x28, $x29, $x30, $x31
+ ; RV32-LABEL: name: bar
+ ; RV32: liveins: $x5, $x6, $x7, $x8, $x9, $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17, $x28, $x29, $x30, $x31
+ ; RV32-NEXT: {{ $}}
+ ; RV32-NEXT: $v0 = COPY renamable $v8
+ ; RV32-NEXT: renamable $v14m2 = PseudoVLE32_V_M2_MASK undef renamable $v14m2, renamable $x15, $v0, -1, 5 /* e32 */, 1 /* ta, mu */, implicit $vl, implicit $vtype
+ ; RV32-NEXT: early-clobber renamable $v9 = PseudoVMSLE_VI_M2 killed renamable $v10m2, -1, -1, 5 /* e32 */, implicit $vl, implicit $vtype
+ ; RV32-NEXT: PseudoVSE32_V_M2_MASK killed renamable $v14m2, renamable $x9, $v0, -1, 5 /* e32 */, implicit $vl, implicit $vtype
+ ;
+ ; RV64-LABEL: name: bar
+ ; RV64: liveins: $x5, $x6, $x7, $x8, $x9, $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17, $x28, $x29, $x30, $x31
+ ; RV64-NEXT: {{ $}}
+ ; RV64-NEXT: $v0 = COPY renamable $v8
+ ; RV64-NEXT: renamable $v14m2 = PseudoVLE32_V_M2_MASK undef renamable $v14m2, renamable $x15, $v0, -1, 5 /* e32 */, 1 /* ta, mu */, implicit $vl, implicit $vtype
+ ; RV64-NEXT: early-clobber renamable $v9 = PseudoVMSLE_VI_M2 killed renamable $v10m2, -1, -1, 5 /* e32 */, implicit $vl, implicit $vtype
+ ; RV64-NEXT: PseudoVSE32_V_M2_MASK killed renamable $v14m2, renamable $x9, $v0, -1, 5 /* e32 */, implicit $vl, implicit $vtype
+ $v0 = COPY killed renamable $v9; example.cpp:14:22
+ $v0 = COPY renamable $v8; example.cpp:12:25
+ renamable $v14m2 = PseudoVLE32_V_M2_MASK undef renamable $v14m2, renamable $x15, $v0, -1, 5, 1, implicit $vl, implicit $vtype; example.cpp:12:25
+ early-clobber renamable $v9 = PseudoVMSLE_VI_M2 killed renamable $v10m2, -1, -1, 5, implicit $vl, implicit $vtype; example.cpp:9:22
+ $v0 = COPY killed renamable $v8; example.cpp:12:22
+ PseudoVSE32_V_M2_MASK killed renamable $v14m2, renamable $x9, $v0, -1, 5, implicit $vl, implicit $vtype; example.cpp:12:22
+...
|
@@ -735,6 +736,27 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) { | |||
MCRegister Def = RegDef.asMCReg(); | |||
MCRegister Src = RegSrc.asMCReg(); | |||
|
|||
// Target may lost some opportunity to further remove the redundant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo s/lost/lose/
if (PrevCopyOperands) { | ||
Register PrevRegDef = PrevCopyOperands->Destination->getReg(); | ||
// We could remove the previous copy from tracker directly. | ||
if (TRI->isSubRegisterEq(RegDef, PrevRegDef)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sub register check is suspicious. The underlying tracker is already operating in terms of regunits, so I don't expect you to need to look for overlapping defs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your comment. I think it might be a better way to remove the related record(the copy def record) in the tracker when we reach another new copy with the same def.
// Target may lost some opportunity to further remove the redundant | ||
// copy instruction, consider the following sequence: | ||
// L1: r0 = COPY r9 <- TrackMI | ||
// L2: r0 = COPY r8 <- TrackMI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we forget about L1 when we get to L2?
I think the underlying issue is that when we reach L4, L2 shouldn't be in the tracker copy anyway.
Now, regarding the fix itself, just glancing at the code, I believe it is not general enough. For instance, what happens if I have an unrelated COPY
between L1 and L2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree with you. This's still not a very general method to fix this issue here. But I think it's a good place to discuss a more robust way to fix this issue here.
The root cause of this issue is that the tracker will invalidate L2 rather than L1 in tracker when we reach L4. When we reach L2, the current mechanism in ForwardCopyPropagateBlock will not remove L1 in tracker anyway, so if we unrelated L1 and L2, I think we could remove the record of def in the tracker(when we reach L2) to fix this problem.
✅ With the latest revision this PR passed the C/C++ code formatter. |
Address comments, I think a more general solution might be to remove the MI pair record from the tracker when encountering a COPY instruction. This addresses the root cause of the problem, which is an outdated src record in the tracker causing the incorrect invalidation of the other COPY instructions. |
|
||
// At this point, we need to locate the record in copy maps that use | ||
// Src to define Def, and remove them from Tracker. | ||
for (MCRegUnit SrcUnit : TRI.regunits(Src)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to always do that loop in clobberRegister
instead of having two, very similar implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could put this "clean-up" behind a flag in clobberRegister
and set that flag on the call site (i.e., pass it as an argument.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, very great idea for the clean up flag rather than reimplement the clobber register function, I will implement it later.
auto SrcCopy = Copies.find(SrcUnit); | ||
if (SrcCopy != Copies.end() && SrcCopy->second.LastSeenUseInCopy) { | ||
// If Src define multiple values, we only need | ||
// to erase the such record in DefRegs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "If SrcCopy defines[...] to erase the record for Def in DefRegs"
SrcCopy->second.DefRegs.erase(itr); | ||
// If DefReg becomes empty after removal, we can directly | ||
// remove SrcCopy from the tracker's copy maps. | ||
if (!SrcCopy->second.DefRegs.size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: SrcCopy->second.DefRegs.empty()
Address @qcolombet 's comments, I add a default CleanUp parameter in the clobberRegister function to reuse this function. Although we might be able to do more "CleanUp" optimization when we calling the clobberRegister function. I think a more cautious solution currently might be only handle this case during the ForwardCopyPropagateBlock procedure, as we encountering another COPY instruction with the same Def. |
I agree with we should forget about L1 when we get to L2. The semantic of |
Resolve conflict test file:
|
@@ -163,7 +163,8 @@ class CopyTracker { | |||
|
|||
/// Clobber a single register, removing it from the tracker's copy maps. | |||
void clobberRegister(MCRegister Reg, const TargetRegisterInfo &TRI, | |||
const TargetInstrInfo &TII, bool UseCopyInstr) { | |||
const TargetInstrInfo &TII, bool UseCopyInstr, | |||
bool CleanUp = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment for the CleanUp
argument?
In particular, we need to make it clear why and when this argument needs to be set to true.
Without any explanation, I would expect that we need to actually set it to true all the time.
// L3: use r0 <- Remove L2 from MaybeDeadCopies | ||
// L4: early-clobber r9 <- Invalid L2 from Tracker | ||
// L5: r0 = COPY r8 <- Miss remove chance | ||
// L6: use r0 <- Miss remove L5 chance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrite this comment to explain what will happen now.
Not what was happening before this patch.
Address @qcolombet and @bzEq 's comments, after more carefully reconsideration, I think we could always do this DefRegs record elimination when we inivoke the clobberRegister routine. When we clobber a single register(Def), the semantic of Src's DefRegs to contain Def is no longer effectual. Because the DefRegs is always used in findCopyDefViaUnit during the BackwardCopyPropagateBlock function. After we clobbering a single register, the Src is no longer propagatable either. In other word, we don't need to keep this record in the tracker after we clobbering a Def register. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience iterating through this.
LGTM
Thank you also for your careful suggestions too. |
Tracked down some miscompiles found downstream that points at this commit. |
…74239) Machine Copy Propagation Pass may lose some opportunities to further remove the redundant copy instructions during the ForwardCopyPropagateBlock procedure. When we Clobber a "Def" register, we also need to remove the record from the copy maps that indicates "Src" defined "Def" to ensure the correct semantics of the ClobberRegister function. This patch reapplies #70778 and addresses the corner case bug #73512 specific to the AMDGPU backend. Additionally, it refines the criteria for removing empty records from the copy maps, thereby enhancing overall safety. For more information, please see the C++ test case generated code in "vector.body" after the MCP Pass: https://gcc.godbolt.org/z/nK4oMaWv5.
the redundant copy instruction during the ForwardCopyPropagateBlock procedure,
consider the following MI sequence:
During the ForwardCopyPropagateBlock procedure, we need to invalidate all the ”early-clobber register“ relative
instructions from CopyTracker, because of the find method from DenseMap, we might invalidate the "wrong" instruction
in the CopyTracker. As a result, there is no way to eliminate subsequent redundant instructions (Like L5 instruction in above example).
Please see the C++ test case generated code for vector.body in MIR: https://gcc.godbolt.org/z/nK4oMaWv5