Skip to content

[X86] Avoid useless DomTree in flags copy lowering #97628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 4, 2024

Conversation

aengelke
Copy link
Contributor

@aengelke aengelke commented Jul 3, 2024

Currently, flags copy lowering does two expensive things:

  • It traverses the CFG in RPO, and
  • It requires a dominator tree that is not preserved.

Most notably, it is the only machine dominator tree user at -O0.

Many functions have no flag copies to begin with, therefore, add an early exit if EFLAGS has no COPY def.

The legacy pass manager has no way to dynamically decide whether an analysis is required. Therefore, if there's a copy, get the dominator tree from the pass manager, if it has one, otherwise, compute it.

These changes should make the pass very cheap for the common case.

Currently, flags copy lowering does two expensive things:

- It traverses the CFG in RPO, and
- It requires a dominator tree that is not preserved.

Most notably, it is the only machine dominator tree user at -O0.

Many functions have no flag copies to begin with, therefore, add an
early exit if EFLAGS has no COPY def.

The legacy pass manager has no way to dynamically decide whether an
analysis is required. Therefore, if there's a copy, get the dominator
tree from the pass manager, if it has one, otherwise, compute it.

These changes should make the pass very cheap for the common case.
@llvmbot
Copy link
Member

llvmbot commented Jul 3, 2024

@llvm/pr-subscribers-backend-x86

Author: Alexis Engelke (aengelke)

Changes

Currently, flags copy lowering does two expensive things:

  • It traverses the CFG in RPO, and
  • It requires a dominator tree that is not preserved.

Most notably, it is the only machine dominator tree user at -O0.

Many functions have no flag copies to begin with, therefore, add an early exit if EFLAGS has no COPY def.

The legacy pass manager has no way to dynamically decide whether an analysis is required. Therefore, if there's a copy, get the dominator tree from the pass manager, if it has one, otherwise, compute it.

These changes should make the pass very cheap for the common case.


Full diff: https://github.com/llvm/llvm-project/pull/97628.diff

1 Files Affected:

  • (modified) llvm/lib/Target/X86/X86FlagsCopyLowering.cpp (+28-3)
diff --git a/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp b/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
index 394947bc65c89..d9ed1334d7376 100644
--- a/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
+++ b/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
@@ -128,7 +128,7 @@ FunctionPass *llvm::createX86FlagsCopyLoweringPass() {
 char X86FlagsCopyLoweringPass::ID = 0;
 
 void X86FlagsCopyLoweringPass::getAnalysisUsage(AnalysisUsage &AU) const {
-  AU.addRequired<MachineDominatorTreeWrapperPass>();
+  AU.addUsedIfAvailable<MachineDominatorTreeWrapperPass>();
   MachineFunctionPass::getAnalysisUsage(AU);
 }
 
@@ -258,13 +258,38 @@ bool X86FlagsCopyLoweringPass::runOnMachineFunction(MachineFunction &MF) {
   MRI = &MF.getRegInfo();
   TII = Subtarget->getInstrInfo();
   TRI = Subtarget->getRegisterInfo();
-  MDT = &getAnalysis<MachineDominatorTreeWrapperPass>().getDomTree();
   PromoteRC = &X86::GR8RegClass;
 
   if (MF.empty())
     // Nothing to do for a degenerate empty function...
     return false;
 
+  bool HasCopies = false;
+  for (const MachineInstr &DefInst : MRI->def_instructions(X86::EFLAGS)) {
+    if (DefInst.getOpcode() == TargetOpcode::COPY) {
+      HasCopies = true;
+      break;
+    }
+  }
+
+  if (!HasCopies)
+    return false;
+
+  // We change the code, so we don't preserve the dominator tree anyway. If we
+  // got a valid MDT from the pass manager, use that, otherwise construct one
+  // now. This is an optimization that avoids unnecessary MDT construction for
+  // functions that have no flag copies.
+
+  auto MDTWrapper = getAnalysisIfAvailable<MachineDominatorTreeWrapperPass>();
+  std::unique_ptr<MachineDominatorTree> OwnedMDT;
+  if (MDTWrapper) {
+    MDT = &MDTWrapper->getDomTree();
+  } else {
+    OwnedMDT = std::make_unique<MachineDominatorTree>();
+    OwnedMDT->getBase().recalculate(MF);
+    MDT = OwnedMDT.get();
+  }
+
   // Collect the copies in RPO so that when there are chains where a copy is in
   // turn copied again we visit the first one first. This ensures we can find
   // viable locations for testing the original EFLAGS that dominate all the
@@ -688,7 +713,7 @@ bool X86FlagsCopyLoweringPass::runOnMachineFunction(MachineFunction &MF) {
       }
 #endif
 
-  return true;
+  return !Copies.empty();
 }
 
 /// Collect any conditions that have already been set in registers so that we

@topperc
Copy link
Collaborator

topperc commented Jul 3, 2024

Shouldn't the O0-pipeline.ll test fail since the dominator tree pass is no longer needed at -O0?

@aengelke
Copy link
Contributor Author

aengelke commented Jul 4, 2024

Shouldn't the O0-pipeline.ll test fail since the dominator tree pass is no longer needed at -O0?

As a matter of fact they do, and I didn't notice, because these tests only runs in an assert build (do you know why it's not run in release builds?). Fixed now.

@KanRobert
Copy link
Contributor

Shouldn't the O0-pipeline.ll test fail since the dominator tree pass is no longer needed at -O0?

As a matter of fact they do, and I didn't notice, because these tests only runs in an assert build (do you know why it's not run in release builds?). Fixed now.

--debug-pass does not work when LLVM_ENABLE_ASSERTIONS is off.

@aengelke
Copy link
Contributor Author

aengelke commented Jul 4, 2024

--debug-pass does not work when LLVM_ENABLE_ASSERTIONS is off.

It just happened to work for me and it also works with Fedora's build of LLVM (I'm certain they build without assertions).

PromoteRC = &X86::GR8RegClass;

if (MF.empty())
// Nothing to do for a degenerate empty function...
return false;

if (none_of(MRI->def_instructions(X86::EFLAGS), [](const MachineInstr &MI) {
return MI.getOpcode() == TargetOpcode::COPY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't check MI.getOperand(0).getReg() == X86::EFLAGS as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's necessary (or beneficial): if there's a def of eflags from a copy instruction, then the first operand must be eflags.

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@aengelke aengelke merged commit 5db6eac into llvm:main Jul 4, 2024
5 of 6 checks passed
@aengelke aengelke deleted the faster-flag-copy-lowering branch July 4, 2024 14:41
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
Currently, flags copy lowering does two expensive things:

- It traverses the CFG in RPO, and
- It requires a dominator tree that is not preserved.

Most notably, it is the only machine dominator tree user at -O0.

Many functions have no flag copies to begin with, therefore, add an
early exit if EFLAGS has no COPY def.

The legacy pass manager has no way to dynamically decide whether an
analysis is required. Therefore, if there's a copy, get the dominator
tree from the pass manager, if it has one, otherwise, compute it.

These changes should make the pass very cheap for the common case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants