-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[InstCombine] Extend folding of aggregate construction to cases when source aggregates are partially available #100828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
efada13
e7a60fc
b15c315
f63b726
d9fe299
8db6d10
aca50cd
0737620
954f6c8
087f907
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -963,6 +963,9 @@ Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse( | |
return AggregateDescription::Found; | ||
}; | ||
|
||
// If an aggregate element is defined in UseBB, we can't use it in PredBB. | ||
bool EltDefinedInUseBB = false; | ||
|
||
// Given the value \p Elt that was being inserted into element \p EltIdx of an | ||
// aggregate AggTy, see if \p Elt was originally defined by an | ||
// appropriate extractvalue (same element index, same aggregate type). | ||
|
@@ -972,8 +975,11 @@ Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse( | |
[&](Instruction *Elt, unsigned EltIdx, std::optional<BasicBlock *> UseBB, | ||
std::optional<BasicBlock *> PredBB) -> std::optional<Value *> { | ||
// For now(?), only deal with, at most, a single level of PHI indirection. | ||
if (UseBB && PredBB) | ||
if (UseBB && PredBB) { | ||
Elt = dyn_cast<Instruction>(Elt->DoPHITranslation(*UseBB, *PredBB)); | ||
if (Elt && Elt->getParent() == *UseBB) | ||
EltDefinedInUseBB = true; | ||
} | ||
// FIXME: deal with multiple levels of PHI indirection? | ||
|
||
// Did we find an extraction? | ||
|
@@ -1106,6 +1112,7 @@ Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse( | |
// from which all the elements were originally extracted from? | ||
// Note that we want for the map to have stable iteration order! | ||
SmallDenseMap<BasicBlock *, Value *, 4> SourceAggregates; | ||
bool FoundSrcAgg = false; | ||
for (BasicBlock *Pred : Preds) { | ||
std::pair<decltype(SourceAggregates)::iterator, bool> IV = | ||
SourceAggregates.insert({Pred, nullptr}); | ||
|
@@ -1117,9 +1124,68 @@ Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse( | |
// aggregate produced by OrigIVI must have been originally extracted from | ||
// the same aggregate. Is that so? Can we find said original aggregate? | ||
SourceAggregate = FindCommonSourceAggregate(UseBB, Pred); | ||
if (Describe(SourceAggregate) != AggregateDescription::Found) | ||
return nullptr; // Give up. | ||
IV.first->second = *SourceAggregate; | ||
if (Describe(SourceAggregate) == AggregateDescription::Found) { | ||
FoundSrcAgg = true; | ||
IV.first->second = *SourceAggregate; | ||
} else { | ||
// If UseBB is the single successor of Pred, we can add InsertValue to | ||
// Pred. | ||
auto *BI = dyn_cast<BranchInst>(Pred->getTerminator()); | ||
if (!BI || !BI->isUnconditional()) | ||
return nullptr; | ||
} | ||
} | ||
|
||
if (!FoundSrcAgg) | ||
return nullptr; | ||
|
||
// Do some sanity check if we need to add insertvalue into predecessors. | ||
auto OrigBB = OrigIVI.getParent(); | ||
for (auto &It : SourceAggregates) { | ||
if (Describe(It.second) == AggregateDescription::Found) | ||
continue; | ||
|
||
// Element is defined in UseBB, so it can't be used in predecessors. | ||
if (EltDefinedInUseBB) | ||
return nullptr; | ||
|
||
// Do this transformation cross loop boundary may cause dead loop. So we | ||
// should avoid this situation. But LoopInfo is not generally available, we | ||
// must be conservative here. | ||
// If OrigIVI is in UseBB and it's the only successor of PredBB, PredBB | ||
// can't be in inner loop. | ||
if (UseBB != OrigBB) | ||
return nullptr; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have a hard time understanding why this check is sufficient to prevent cycles in all cases, or even how the UseBB != OrigBB condition is related. I guess this is supposed to check the case where OrigBB is a loop header with phis and UseBB is an exit block with inserts -- but couldn't it fail on non-sunk inserts? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It limits the insertvalue can only be inserted into immediate predecessors. When combining the previous condition that UseBB is the only successor of PredBB, the insertvalue can never be moved across any BB with multiple successors, such as a loop exit node. So it prevents moving insertvalue
|
||
|
||
// Avoid constructing constant aggregate because constant value may expose | ||
// more optimizations. | ||
bool ConstAgg = true; | ||
for (auto Val : AggElts) { | ||
Value *Elt = (*Val)->DoPHITranslation(UseBB, It.first); | ||
if (!isa<Constant>(Elt)) { | ||
ConstAgg = false; | ||
break; | ||
} | ||
} | ||
if (ConstAgg) | ||
return nullptr; | ||
} | ||
|
||
// For predecessors without appropriate source aggregate, create one in the | ||
// predecessor. | ||
for (auto &It : SourceAggregates) { | ||
if (Describe(It.second) == AggregateDescription::Found) | ||
continue; | ||
|
||
BasicBlock *Pred = It.first; | ||
Builder.SetInsertPoint(Pred->getTerminator()); | ||
Value *V = PoisonValue::get(AggTy); | ||
for (auto [Idx, Val] : enumerate(AggElts)) { | ||
Value *Elt = (*Val)->DoPHITranslation(UseBB, Pred); | ||
V = Builder.CreateInsertValue(V, Elt, Idx); | ||
} | ||
|
||
It.second = V; | ||
} | ||
|
||
// All good! Now we just need to thread the source aggregates here. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding correctly, this currently checks that at least one predecessor folds, rather than that all but one folds, so we could end up inserting extra insertvalues in many predecessors. Shouldn't we be limiting this such that we only insert into at most one predecessor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, here is a slight variant of test2 with the current patch:
We remove extractvalues in one branch, but trade this with insertvalues in two branches. This doesn't look profitable, especially once we extend this to even more branches that don't fold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter one is better: https://godbolt.org/z/Yfhjhd3Yn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we limits the insertvalue to be moved to immediate predecessors and UseBB is the only successor of PredBB, the total execution count of UseBB is the sum of all predecessors. So once we have found some source aggregate in a PredBB, add insertvalue to all other predecessors still results in less dynamic count of insertvalue instructions.