-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[X86] Avoid useless DomTree in flags copy lowering #97628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Currently, flags copy lowering does two expensive things: - It traverses the CFG in RPO, and - It requires a dominator tree that is not preserved. Most notably, it is the only machine dominator tree user at -O0. Many functions have no flag copies to begin with, therefore, add an early exit if EFLAGS has no COPY def. The legacy pass manager has no way to dynamically decide whether an analysis is required. Therefore, if there's a copy, get the dominator tree from the pass manager, if it has one, otherwise, compute it. These changes should make the pass very cheap for the common case.
@llvm/pr-subscribers-backend-x86 Author: Alexis Engelke (aengelke) ChangesCurrently, flags copy lowering does two expensive things:
Most notably, it is the only machine dominator tree user at -O0. Many functions have no flag copies to begin with, therefore, add an early exit if EFLAGS has no COPY def. The legacy pass manager has no way to dynamically decide whether an analysis is required. Therefore, if there's a copy, get the dominator tree from the pass manager, if it has one, otherwise, compute it. These changes should make the pass very cheap for the common case. Full diff: https://github.com/llvm/llvm-project/pull/97628.diff 1 Files Affected:
diff --git a/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp b/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
index 394947bc65c89..d9ed1334d7376 100644
--- a/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
+++ b/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
@@ -128,7 +128,7 @@ FunctionPass *llvm::createX86FlagsCopyLoweringPass() {
char X86FlagsCopyLoweringPass::ID = 0;
void X86FlagsCopyLoweringPass::getAnalysisUsage(AnalysisUsage &AU) const {
- AU.addRequired<MachineDominatorTreeWrapperPass>();
+ AU.addUsedIfAvailable<MachineDominatorTreeWrapperPass>();
MachineFunctionPass::getAnalysisUsage(AU);
}
@@ -258,13 +258,38 @@ bool X86FlagsCopyLoweringPass::runOnMachineFunction(MachineFunction &MF) {
MRI = &MF.getRegInfo();
TII = Subtarget->getInstrInfo();
TRI = Subtarget->getRegisterInfo();
- MDT = &getAnalysis<MachineDominatorTreeWrapperPass>().getDomTree();
PromoteRC = &X86::GR8RegClass;
if (MF.empty())
// Nothing to do for a degenerate empty function...
return false;
+ bool HasCopies = false;
+ for (const MachineInstr &DefInst : MRI->def_instructions(X86::EFLAGS)) {
+ if (DefInst.getOpcode() == TargetOpcode::COPY) {
+ HasCopies = true;
+ break;
+ }
+ }
+
+ if (!HasCopies)
+ return false;
+
+ // We change the code, so we don't preserve the dominator tree anyway. If we
+ // got a valid MDT from the pass manager, use that, otherwise construct one
+ // now. This is an optimization that avoids unnecessary MDT construction for
+ // functions that have no flag copies.
+
+ auto MDTWrapper = getAnalysisIfAvailable<MachineDominatorTreeWrapperPass>();
+ std::unique_ptr<MachineDominatorTree> OwnedMDT;
+ if (MDTWrapper) {
+ MDT = &MDTWrapper->getDomTree();
+ } else {
+ OwnedMDT = std::make_unique<MachineDominatorTree>();
+ OwnedMDT->getBase().recalculate(MF);
+ MDT = OwnedMDT.get();
+ }
+
// Collect the copies in RPO so that when there are chains where a copy is in
// turn copied again we visit the first one first. This ensures we can find
// viable locations for testing the original EFLAGS that dominate all the
@@ -688,7 +713,7 @@ bool X86FlagsCopyLoweringPass::runOnMachineFunction(MachineFunction &MF) {
}
#endif
- return true;
+ return !Copies.empty();
}
/// Collect any conditions that have already been set in registers so that we
|
Shouldn't the O0-pipeline.ll test fail since the dominator tree pass is no longer needed at -O0? |
As a matter of fact they do, and I didn't notice, because these tests only runs in an assert build (do you know why it's not run in release builds?). Fixed now. |
|
It just happened to work for me and it also works with Fedora's build of LLVM (I'm certain they build without assertions). |
PromoteRC = &X86::GR8RegClass; | ||
|
||
if (MF.empty()) | ||
// Nothing to do for a degenerate empty function... | ||
return false; | ||
|
||
if (none_of(MRI->def_instructions(X86::EFLAGS), [](const MachineInstr &MI) { | ||
return MI.getOpcode() == TargetOpcode::COPY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't check MI.getOperand(0).getReg() == X86::EFLAGS
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's necessary (or beneficial): if there's a def of eflags from a copy instruction, then the first operand must be eflags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Currently, flags copy lowering does two expensive things: - It traverses the CFG in RPO, and - It requires a dominator tree that is not preserved. Most notably, it is the only machine dominator tree user at -O0. Many functions have no flag copies to begin with, therefore, add an early exit if EFLAGS has no COPY def. The legacy pass manager has no way to dynamically decide whether an analysis is required. Therefore, if there's a copy, get the dominator tree from the pass manager, if it has one, otherwise, compute it. These changes should make the pass very cheap for the common case.
Currently, flags copy lowering does two expensive things:
Most notably, it is the only machine dominator tree user at -O0.
Many functions have no flag copies to begin with, therefore, add an early exit if EFLAGS has no COPY def.
The legacy pass manager has no way to dynamically decide whether an analysis is required. Therefore, if there's a copy, get the dominator tree from the pass manager, if it has one, otherwise, compute it.
These changes should make the pass very cheap for the common case.