Skip to content

[InstCombine] Extend folding of aggregate construction to cases when source aggregates are partially available #100828

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 21, 2024
74 changes: 70 additions & 4 deletions llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -963,6 +963,9 @@ Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse(
return AggregateDescription::Found;
};

// If an aggregate element is defined in UseBB, we can't use it in PredBB.
bool EltDefinedInUseBB = false;

// Given the value \p Elt that was being inserted into element \p EltIdx of an
// aggregate AggTy, see if \p Elt was originally defined by an
// appropriate extractvalue (same element index, same aggregate type).
Expand All @@ -972,8 +975,11 @@ Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse(
[&](Instruction *Elt, unsigned EltIdx, std::optional<BasicBlock *> UseBB,
std::optional<BasicBlock *> PredBB) -> std::optional<Value *> {
// For now(?), only deal with, at most, a single level of PHI indirection.
if (UseBB && PredBB)
if (UseBB && PredBB) {
Elt = dyn_cast<Instruction>(Elt->DoPHITranslation(*UseBB, *PredBB));
if (Elt && Elt->getParent() == *UseBB)
EltDefinedInUseBB = true;
}
// FIXME: deal with multiple levels of PHI indirection?

// Did we find an extraction?
Expand Down Expand Up @@ -1106,6 +1112,7 @@ Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse(
// from which all the elements were originally extracted from?
// Note that we want for the map to have stable iteration order!
SmallDenseMap<BasicBlock *, Value *, 4> SourceAggregates;
bool FoundSrcAgg = false;
for (BasicBlock *Pred : Preds) {
std::pair<decltype(SourceAggregates)::iterator, bool> IV =
SourceAggregates.insert({Pred, nullptr});
Expand All @@ -1117,9 +1124,68 @@ Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse(
// aggregate produced by OrigIVI must have been originally extracted from
// the same aggregate. Is that so? Can we find said original aggregate?
SourceAggregate = FindCommonSourceAggregate(UseBB, Pred);
if (Describe(SourceAggregate) != AggregateDescription::Found)
return nullptr; // Give up.
IV.first->second = *SourceAggregate;
if (Describe(SourceAggregate) == AggregateDescription::Found) {
FoundSrcAgg = true;
IV.first->second = *SourceAggregate;
} else {
// If UseBB is the single successor of Pred, we can add InsertValue to
// Pred.
auto *BI = dyn_cast<BranchInst>(Pred->getTerminator());
if (!BI || !BI->isUnconditional())
return nullptr;
}
}

if (!FoundSrcAgg)
return nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, this currently checks that at least one predecessor folds, rather than that all but one folds, so we could end up inserting extra insertvalues in many predecessors. Shouldn't we be limiting this such that we only insert into at most one predecessor?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, here is a slight variant of test2 with the current patch:

define {ptr, i64} @test2(i1 %cond1, i1 %cond2, ptr %p1, ptr %p2) {
; CHECK-LABEL: define { ptr, i64 } @test2(
; CHECK-SAME: i1 [[COND1:%.*]], i1 [[COND2:%.*]], ptr [[P1:%.*]], ptr [[P2:%.*]]) {
; CHECK-NEXT:    br i1 [[COND1]], label %[[BBB1:.*]], label %[[BBB4:.*]]
; CHECK:       [[BBB1]]:
; CHECK-NEXT:    br i1 [[COND2]], label %[[BBB2:.*]], label %[[BBB3:.*]]
; CHECK:       [[BBB2]]:
; CHECK-NEXT:    [[CALL1:%.*]] = call { ptr, i64 } @bar(i64 0)
; CHECK-NEXT:    br label %[[EXIT:.*]]
; CHECK:       [[BBB3]]:
; CHECK-NEXT:    [[VAL31:%.*]] = load ptr, ptr [[P1]], align 8
; CHECK-NEXT:    [[VAL32:%.*]] = load i64, ptr [[P2]], align 4
; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr, i64 } poison, ptr [[VAL31]], 0
; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr, i64 } [[TMP1]], i64 [[VAL32]], 1
; CHECK-NEXT:    br label %[[EXIT]]
; CHECK:       [[BBB4]]:
; CHECK-NEXT:    [[VAL33:%.*]] = load ptr, ptr [[P1]], align 8
; CHECK-NEXT:    [[VAL34:%.*]] = load i64, ptr [[P2]], align 4
; CHECK-NEXT:    [[TMP3:%.*]] = insertvalue { ptr, i64 } poison, ptr [[VAL33]], 0
; CHECK-NEXT:    [[TMP4:%.*]] = insertvalue { ptr, i64 } [[TMP3]], i64 [[VAL34]], 1
; CHECK-NEXT:    br label %[[EXIT]]
; CHECK:       [[EXIT]]:
; CHECK-NEXT:    [[RES_MERGED:%.*]] = phi { ptr, i64 } [ [[CALL1]], %[[BBB2]] ], [ [[TMP2]], %[[BBB3]] ], [ [[TMP4]], %[[BBB4]] ]
; CHECK-NEXT:    ret { ptr, i64 } [[RES_MERGED]]
;
  br i1 %cond1, label %bbb1, label %bbb4

bbb1:
  br i1 %cond2, label %bbb2, label %bbb3

bbb2:
  %call1 = call {ptr, i64} @bar(i64 0)
  %val11 = extractvalue { ptr, i64 } %call1, 0
  %val12 = extractvalue { ptr, i64 } %call1, 1
  br label %exit

bbb3:
  %val21 = load ptr, ptr %p1
  %val22 = load i64, ptr %p2
  br label %exit

bbb4:
  %val31 = load ptr, ptr %p1
  %val32 = load i64, ptr %p2
  br label %exit

exit:
  %val1 = phi ptr [%val11, %bbb2], [%val21, %bbb3], [%val31, %bbb4]
  %val2 = phi i64 [%val12, %bbb2], [%val22, %bbb3], [%val32, %bbb4]
  %tmp = insertvalue { ptr, i64 } poison, ptr %val1, 0
  %res = insertvalue { ptr, i64 } %tmp, i64 %val2, 1
  ret {ptr, i64} %res
}

We remove extractvalues in one branch, but trade this with insertvalues in two branches. This doesn't look profitable, especially once we extend this to even more branches that don't fold.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look profitable, especially once we extend this to even more branches that don't fold.

The latter one is better: https://godbolt.org/z/Yfhjhd3Yn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we limits the insertvalue to be moved to immediate predecessors and UseBB is the only successor of PredBB, the total execution count of UseBB is the sum of all predecessors. So once we have found some source aggregate in a PredBB, add insertvalue to all other predecessors still results in less dynamic count of insertvalue instructions.


// Do some sanity check if we need to add insertvalue into predecessors.
auto OrigBB = OrigIVI.getParent();
for (auto &It : SourceAggregates) {
if (Describe(It.second) == AggregateDescription::Found)
continue;

// Element is defined in UseBB, so it can't be used in predecessors.
if (EltDefinedInUseBB)
return nullptr;

// Do this transformation cross loop boundary may cause dead loop. So we
// should avoid this situation. But LoopInfo is not generally available, we
// must be conservative here.
// If OrigIVI is in UseBB and it's the only successor of PredBB, PredBB
// can't be in inner loop.
if (UseBB != OrigBB)
return nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a hard time understanding why this check is sufficient to prevent cycles in all cases, or even how the UseBB != OrigBB condition is related. I guess this is supposed to check the case where OrigBB is a loop header with phis and UseBB is an exit block with inserts -- but couldn't it fail on non-sunk inserts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It limits the insertvalue can only be inserted into immediate predecessors. When combining the previous condition that UseBB is the only successor of PredBB, the insertvalue can never be moved across any BB with multiple successors, such as a loop exit node. So it prevents moving insertvalue

  • across loop iterations
  • to inner loop from outside


// Avoid constructing constant aggregate because constant value may expose
// more optimizations.
bool ConstAgg = true;
for (auto Val : AggElts) {
Value *Elt = (*Val)->DoPHITranslation(UseBB, It.first);
if (!isa<Constant>(Elt)) {
ConstAgg = false;
break;
}
}
if (ConstAgg)
return nullptr;
}

// For predecessors without appropriate source aggregate, create one in the
// predecessor.
for (auto &It : SourceAggregates) {
if (Describe(It.second) == AggregateDescription::Found)
continue;

BasicBlock *Pred = It.first;
Builder.SetInsertPoint(Pred->getTerminator());
Value *V = PoisonValue::get(AggTy);
for (auto [Idx, Val] : enumerate(AggElts)) {
Value *Elt = (*Val)->DoPHITranslation(UseBB, Pred);
V = Builder.CreateInsertValue(V, Elt, Idx);
}

It.second = V;
}

// All good! Now we just need to thread the source aggregates here.
Expand Down
Loading