Skip to content

Commit ae7f088

Browse files
committed
[InstCombine] Aggregate reconstruction simplification (PR47060)
This pattern happens in clang C++ exception lowering code, on unwind branch. We end up having a `landingpad` block after each `invoke`, where RAII cleanup is performed, and the elements of an aggregate `{i8*, i32}` holding exception info are `extractvalue`'d, and we then branch to common block that takes extracted `i8*` and `i32` elements (via `phi` nodes), form a new aggregate, and finally `resume`'s the exception. The problem is that, if the cleanup block is effectively empty, it shouldn't be there, there shouldn't be that `landingpad` and `resume`, said `invoke` should be a `call`. Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`. But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft, while it is pointless, does not look like "empty cleanup block". So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception is has higher cost than it could have on unwind branch :S This doesn't happen *that* often, but it will basically happen once per C++ function with complex CFG that called more than one other function that isn't known to be `nounwind`. I think, this is a missing fold in InstCombine, so i've implemented it. I think, the algorithm/implementation is rather self-explanatory: 1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate. 2. For each element, try to find from which aggregate it was extracted. If it was extracted from the aggregate with identical type, from identical element index, great. 3. If all elements were found to have been extracted from the same aggregate, then we can just use said original source aggregate directly, instead of re-creating it. 4. If we fail to find said aggregate when looking only in the current block, we need be PHI-aware - we might have different source aggregate when coming from each predecessor. I'm not sure if this already handles everything, and there are some FIXME's, i'll deal with all that later in followups. I'd be fine with going with post-commit review here code-wise, but just in case there are thoughts, i'm posting this. On RawSpeed, for example, this has the following effect: ``` | statistic name | baseline | proposed | Δ | % | abs(%) | |---------------------------------------------------|---------:|---------:|------:|--------:|-------:| | instcombine.NumAggregateReconstructionsSimplified | 0 | 1253 | 1253 | 0.00% | 0.00% | | simplifycfg.NumInvokes | 948 | 1355 | 407 | 42.93% | 42.93% | | instcount.NumInsertValueInst | 4382 | 3210 | -1172 | -26.75% | 26.75% | | simplifycfg.NumSinkCommonCode | 574 | 458 | -116 | -20.21% | 20.21% | | simplifycfg.NumSinkCommonInstrs | 1154 | 921 | -233 | -20.19% | 20.19% | | instcount.NumExtractValueInst | 29017 | 26397 | -2620 | -9.03% | 9.03% | | instcombine.NumDeadInst | 166618 | 174705 | 8087 | 4.85% | 4.85% | | instcount.NumPHIInst | 51526 | 50678 | -848 | -1.65% | 1.65% | | instcount.NumLandingPadInst | 20865 | 20609 | -256 | -1.23% | 1.23% | | instcount.NumInvokeInst | 34023 | 33675 | -348 | -1.02% | 1.02% | | simplifycfg.NumSimpl | 113634 | 114708 | 1074 | 0.95% | 0.95% | | instcombine.NumSunkInst | 15030 | 14930 | -100 | -0.67% | 0.67% | | instcount.TotalBlocks | 219544 | 219024 | -520 | -0.24% | 0.24% | | instcombine.NumCombined | 644562 | 645805 | 1243 | 0.19% | 0.19% | | instcount.TotalInsts | 2139506 | 2135377 | -4129 | -0.19% | 0.19% | | instcount.NumBrInst | 156988 | 156821 | -167 | -0.11% | 0.11% | | instcount.NumCallInst | 1206144 | 1207076 | 932 | 0.08% | 0.08% | | instcount.NumResumeInst | 5193 | 5190 | -3 | -0.06% | 0.06% | | asm-printer.EmittedInsts | 948580 | 948299 | -281 | -0.03% | 0.03% | | instcount.TotalFuncs | 11509 | 11507 | -2 | -0.02% | 0.02% | | inline.NumDeleted | 97595 | 97597 | 2 | 0.00% | 0.00% | | inline.NumInlined | 210514 | 210522 | 8 | 0.00% | 0.00% | ``` So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half, and there is a very apparent decrease in instruction and basic block count. On vanilla llvm-test-suite: ``` | statistic name | baseline | proposed | Δ | % | abs(%) | |---------------------------------------------------|---------:|---------:|------:|--------:|-------:| | instcombine.NumAggregateReconstructionsSimplified | 0 | 744 | 744 | 0.00% | 0.00% | | instcount.NumInsertValueInst | 2705 | 2053 | -652 | -24.10% | 24.10% | | simplifycfg.NumInvokes | 1212 | 1424 | 212 | 17.49% | 17.49% | | instcount.NumExtractValueInst | 21681 | 20139 | -1542 | -7.11% | 7.11% | | simplifycfg.NumSinkCommonInstrs | 14575 | 14361 | -214 | -1.47% | 1.47% | | simplifycfg.NumSinkCommonCode | 6815 | 6743 | -72 | -1.06% | 1.06% | | instcount.NumLandingPadInst | 14851 | 14712 | -139 | -0.94% | 0.94% | | instcount.NumInvokeInst | 27510 | 27332 | -178 | -0.65% | 0.65% | | instcombine.NumDeadInst | 1438173 | 1443371 | 5198 | 0.36% | 0.36% | | instcount.NumResumeInst | 2880 | 2872 | -8 | -0.28% | 0.28% | | instcombine.NumSunkInst | 55187 | 55076 | -111 | -0.20% | 0.20% | | instcount.NumPHIInst | 321366 | 320916 | -450 | -0.14% | 0.14% | | instcount.TotalBlocks | 886816 | 886493 | -323 | -0.04% | 0.04% | | instcount.TotalInsts | 7663845 | 7661108 | -2737 | -0.04% | 0.04% | | simplifycfg.NumSimpl | 886791 | 887171 | 380 | 0.04% | 0.04% | | instcount.NumCallInst | 553552 | 553733 | 181 | 0.03% | 0.03% | | instcombine.NumCombined | 3200512 | 3201202 | 690 | 0.02% | 0.02% | | instcount.NumBrInst | 741794 | 741656 | -138 | -0.02% | 0.02% | | simplifycfg.NumHoistCommonInstrs | 14443 | 14445 | 2 | 0.01% | 0.01% | | asm-printer.EmittedInsts | 7978085 | 7977916 | -169 | 0.00% | 0.00% | | inline.NumDeleted | 73188 | 73189 | 1 | 0.00% | 0.00% | | inline.NumInlined | 291959 | 291968 | 9 | 0.00% | 0.00% | ``` Roughly similar effect, less instructions and blocks total. See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e. Compile-time wise, this appears to be roughly geomean-neutral: http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions And this is a win size-wize in general: http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text See https://bugs.llvm.org/show_bug.cgi?id=47060 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D85787
1 parent 5272d29 commit ae7f088

File tree

6 files changed

+267
-54
lines changed

6 files changed

+267
-54
lines changed

clang/test/CodeGenCXX/nrvo.cpp

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -85,8 +85,8 @@ X test2(bool B) {
8585
// %lpad: landing pad for ctor of 'y', dtor of 'y'
8686
// CHECK-EH: [[CAUGHTVAL:%.*]] = landingpad { i8*, i32 }
8787
// CHECK-EH-NEXT: cleanup
88-
// CHECK-EH-NEXT: extractvalue { i8*, i32 } [[CAUGHTVAL]], 0
89-
// CHECK-EH-NEXT: extractvalue { i8*, i32 } [[CAUGHTVAL]], 1
88+
// CHECK-EH-03-NEXT: extractvalue { i8*, i32 } [[CAUGHTVAL]], 0
89+
// CHECK-EH-03-NEXT: extractvalue { i8*, i32 } [[CAUGHTVAL]], 1
9090
// CHECK-EH-NEXT: br label
9191
// -> %eh.cleanup
9292

llvm/lib/Transforms/InstCombine/InstCombineInternal.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -158,6 +158,8 @@ class LLVM_LIBRARY_VISIBILITY InstCombinerImpl final
158158
Instruction *visitFenceInst(FenceInst &FI);
159159
Instruction *visitSwitchInst(SwitchInst &SI);
160160
Instruction *visitReturnInst(ReturnInst &RI);
161+
Instruction *
162+
foldAggregateConstructionIntoAggregateReuse(InsertValueInst &OrigIVI);
161163
Instruction *visitInsertValueInst(InsertValueInst &IV);
162164
Instruction *visitInsertElementInst(InsertElementInst &IE);
163165
Instruction *visitExtractElementInst(ExtractElementInst &EI);

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

Lines changed: 246 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@
1818
#include "llvm/ADT/STLExtras.h"
1919
#include "llvm/ADT/SmallBitVector.h"
2020
#include "llvm/ADT/SmallVector.h"
21+
#include "llvm/ADT/Statistic.h"
2122
#include "llvm/Analysis/InstructionSimplify.h"
2223
#include "llvm/Analysis/VectorUtils.h"
2324
#include "llvm/IR/BasicBlock.h"
@@ -46,6 +47,10 @@ using namespace PatternMatch;
4647

4748
#define DEBUG_TYPE "instcombine"
4849

50+
STATISTIC(NumAggregateReconstructionsSimplified,
51+
"Number of aggregate reconstructions turned into reuse of the "
52+
"original aggregate");
53+
4954
/// Return true if the value is cheaper to scalarize than it is to leave as a
5055
/// vector operation. IsConstantExtractIndex indicates whether we are extracting
5156
/// one known element from a vector constant.
@@ -694,6 +699,243 @@ static ShuffleOps collectShuffleElements(Value *V, SmallVectorImpl<int> &Mask,
694699
return std::make_pair(V, nullptr);
695700
}
696701

702+
/// Look for chain of insertvalue's that fully define an aggregate, and trace
703+
/// back the values inserted, see if they are all were extractvalue'd from
704+
/// the same source aggregate from the exact same element indexes.
705+
/// If they were, just reuse the source aggregate.
706+
/// This potentially deals with PHI indirections.
707+
Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse(
708+
InsertValueInst &OrigIVI) {
709+
BasicBlock *UseBB = OrigIVI.getParent();
710+
Type *AggTy = OrigIVI.getType();
711+
unsigned NumAggElts;
712+
switch (AggTy->getTypeID()) {
713+
case Type::StructTyID:
714+
NumAggElts = AggTy->getStructNumElements();
715+
break;
716+
case Type::ArrayTyID:
717+
NumAggElts = AggTy->getArrayNumElements();
718+
break;
719+
default:
720+
llvm_unreachable("Unhandled aggregate type?");
721+
}
722+
723+
// Arbitrary aggregate size cut-off. Motivation for limit of 2 is to be able
724+
// to handle clang C++ exception struct (which is hardcoded as {i8*, i32}),
725+
// FIXME: any interesting patterns to be caught with larger limit?
726+
assert(NumAggElts > 0 && "Aggregate should have elements.");
727+
if (NumAggElts > 2)
728+
return nullptr;
729+
730+
static constexpr auto NotFound = None;
731+
static constexpr auto FoundMismatch = nullptr;
732+
733+
// Try to find a value of each element of an aggregate.
734+
// FIXME: deal with more complex, not one-dimensional, aggregate types
735+
SmallVector<Optional<Value *>, 2> AggElts(NumAggElts, NotFound);
736+
737+
// Do we know values for each element of the aggregate?
738+
auto KnowAllElts = [&AggElts]() {
739+
return all_of(AggElts,
740+
[](Optional<Value *> Elt) { return Elt != NotFound; });
741+
};
742+
743+
int Depth = 0;
744+
745+
// Arbitrary `insertvalue` visitation depth limit. Let's be okay with
746+
// every element being overwritten twice, which should never happen.
747+
static const int DepthLimit = 2 * NumAggElts;
748+
749+
// Recurse up the chain of `insertvalue` aggregate operands until either we've
750+
// reconstructed full initializer or can't visit any more `insertvalue`'s.
751+
for (InsertValueInst *CurrIVI = &OrigIVI;
752+
Depth < DepthLimit && CurrIVI && !KnowAllElts();
753+
CurrIVI = dyn_cast<InsertValueInst>(CurrIVI->getAggregateOperand()),
754+
++Depth) {
755+
Value *InsertedValue = CurrIVI->getInsertedValueOperand();
756+
ArrayRef<unsigned int> Indices = CurrIVI->getIndices();
757+
758+
// Don't bother with more than single-level aggregates.
759+
if (Indices.size() != 1)
760+
return nullptr; // FIXME: deal with more complex aggregates?
761+
762+
// Now, we may have already previously recorded the value for this element
763+
// of an aggregate. If we did, that means the CurrIVI will later be
764+
// overwritten with the already-recorded value. But if not, let's record it!
765+
Optional<Value *> &Elt = AggElts[Indices.front()];
766+
Elt = Elt.getValueOr(InsertedValue);
767+
768+
// FIXME: should we handle chain-terminating undef base operand?
769+
}
770+
771+
// Was that sufficient to deduce the full initializer for the aggregate?
772+
if (!KnowAllElts())
773+
return nullptr; // Give up then.
774+
775+
// We now want to find the source[s] of the aggregate elements we've found.
776+
// And with "source" we mean the original aggregate[s] from which
777+
// the inserted elements were extracted. This may require PHI translation.
778+
779+
enum class SourceAggregate {
780+
/// When analyzing the value that was inserted into an aggregate, we did
781+
/// not manage to find defining `extractvalue` instruction to analyze.
782+
NotFound,
783+
/// When analyzing the value that was inserted into an aggregate, we did
784+
/// manage to find defining `extractvalue` instruction[s], and everything
785+
/// matched perfectly - aggregate type, element insertion/extraction index.
786+
Found,
787+
/// When analyzing the value that was inserted into an aggregate, we did
788+
/// manage to find defining `extractvalue` instruction, but there was
789+
/// a mismatch: either the source type from which the extraction was didn't
790+
/// match the aggregate type into which the insertion was,
791+
/// or the extraction/insertion channels mismatched,
792+
/// or different elements had different source aggregates.
793+
FoundMismatch
794+
};
795+
auto Describe = [](Optional<Value *> SourceAggregate) {
796+
if (SourceAggregate == NotFound)
797+
return SourceAggregate::NotFound;
798+
if (*SourceAggregate == FoundMismatch)
799+
return SourceAggregate::FoundMismatch;
800+
return SourceAggregate::Found;
801+
};
802+
803+
// Given the value \p Elt that was being inserted into element \p EltIdx of an
804+
// aggregate AggTy, see if \p Elt was originally defined by an
805+
// appropriate extractvalue (same element index, same aggregate type).
806+
// If found, return the source aggregate from which the extraction was.
807+
// If \p PredBB is provided, does PHI translation of an \p Elt first.
808+
auto FindSourceAggregate =
809+
[&](Value *Elt, unsigned EltIdx,
810+
Optional<BasicBlock *> PredBB) -> Optional<Value *> {
811+
// For now(?), only deal with, at most, a single level of PHI indirection.
812+
if (PredBB)
813+
Elt = Elt->DoPHITranslation(UseBB, *PredBB);
814+
// FIXME: deal with multiple levels of PHI indirection?
815+
816+
// Did we find an extraction?
817+
auto *EVI = dyn_cast<ExtractValueInst>(Elt);
818+
if (!EVI)
819+
return NotFound;
820+
821+
Value *SourceAggregate = EVI->getAggregateOperand();
822+
823+
// Is the extraction from the same type into which the insertion was?
824+
if (SourceAggregate->getType() != AggTy)
825+
return FoundMismatch;
826+
// And the element index doesn't change between extraction and insertion?
827+
if (EVI->getNumIndices() != 1 || EltIdx != EVI->getIndices().front())
828+
return FoundMismatch;
829+
830+
return SourceAggregate; // SourceAggregate::Found
831+
};
832+
833+
// Given elements AggElts that were constructing an aggregate OrigIVI,
834+
// see if we can find appropriate source aggregate for each of the elements,
835+
// and see it's the same aggregate for each element. If so, return it.
836+
auto FindCommonSourceAggregate =
837+
[&](Optional<BasicBlock *> PredBB) -> Optional<Value *> {
838+
Optional<Value *> SourceAggregate;
839+
840+
for (auto I : enumerate(AggElts)) {
841+
assert(Describe(SourceAggregate) != SourceAggregate::FoundMismatch &&
842+
"We don't store nullptr in SourceAggregate!");
843+
assert((Describe(SourceAggregate) == SourceAggregate::Found) ==
844+
(I.index() != 0) &&
845+
"SourceAggregate should be valid after the the first element,");
846+
847+
// For this element, is there a plausible source aggregate?
848+
// FIXME: we could special-case undef element, IFF we know that in the
849+
// source aggregate said element isn't poison.
850+
Optional<Value *> SourceAggregateForElement =
851+
FindSourceAggregate(*I.value(), I.index(), PredBB);
852+
853+
// Okay, what have we found? Does that correlate with previous findings?
854+
855+
// Regardless of whether or not we have previously found source
856+
// aggregate for previous elements (if any), if we didn't find one for
857+
// this element, passthrough whatever we have just found.
858+
if (Describe(SourceAggregateForElement) != SourceAggregate::Found)
859+
return SourceAggregateForElement;
860+
861+
// Okay, we have found source aggregate for this element.
862+
// Let's see what we already know from previous elements, if any.
863+
switch (Describe(SourceAggregate)) {
864+
case SourceAggregate::NotFound:
865+
// This is apparently the first element that we have examined.
866+
SourceAggregate = SourceAggregateForElement; // Record the aggregate!
867+
continue; // Great, now look at next element.
868+
case SourceAggregate::Found:
869+
// We have previously already successfully examined other elements.
870+
// Is this the same source aggregate we've found for other elements?
871+
if (*SourceAggregateForElement != *SourceAggregate)
872+
return FoundMismatch;
873+
continue; // Still the same aggregate, look at next element.
874+
case SourceAggregate::FoundMismatch:
875+
llvm_unreachable("Can't happen. We would have early-exited then.");
876+
};
877+
}
878+
879+
assert(Describe(SourceAggregate) == SourceAggregate::Found &&
880+
"Must be a valid Value");
881+
return *SourceAggregate;
882+
};
883+
884+
Optional<Value *> SourceAggregate;
885+
886+
// Can we find the source aggregate without looking at predecessors?
887+
SourceAggregate = FindCommonSourceAggregate(/*PredBB=*/None);
888+
if (Describe(SourceAggregate) != SourceAggregate::NotFound) {
889+
if (Describe(SourceAggregate) == SourceAggregate::FoundMismatch)
890+
return nullptr; // Conflicting source aggregates!
891+
++NumAggregateReconstructionsSimplified;
892+
return replaceInstUsesWith(OrigIVI, *SourceAggregate);
893+
}
894+
895+
// If we didn't manage to find source aggregate without looking at
896+
// predecessors, and there are no predecessors to look at, then we're done.
897+
if (pred_empty(UseBB))
898+
return nullptr;
899+
900+
// Okay, apparently we need to look at predecessors.
901+
902+
// Arbitrary predecessor count limit.
903+
static const int PredCountLimit = 64;
904+
// Don't bother if there are too many predecessors.
905+
if (UseBB->hasNPredecessorsOrMore(PredCountLimit + 1))
906+
return nullptr;
907+
908+
// For each predecessor, what is the source aggregate,
909+
// from which all the elements were originally extracted from?
910+
// Note that we want for the map to have stable iteration order!
911+
SmallMapVector<BasicBlock *, Value *, 4> SourceAggregates;
912+
for (BasicBlock *Pred : predecessors(UseBB)) {
913+
std::pair<decltype(SourceAggregates)::iterator, bool> IV =
914+
SourceAggregates.insert({Pred, nullptr});
915+
// Did we already evaluate this predecessor?
916+
if (!IV.second)
917+
continue;
918+
919+
// Let's hope that when coming from predecessor Pred, all elements of the
920+
// aggregate produced by OrigIVI must have been originally extracted from
921+
// the same aggregate. Is that so? Can we find said original aggregate?
922+
SourceAggregate = FindCommonSourceAggregate(Pred);
923+
if (Describe(SourceAggregate) != SourceAggregate::Found)
924+
return nullptr; // Give up.
925+
IV.first->second = *SourceAggregate;
926+
}
927+
928+
// All good! Now we just need to thread the source aggregates here.
929+
auto *PHI = PHINode::Create(AggTy, SourceAggregates.size(),
930+
OrigIVI.getName() + ".merged");
931+
for (const std::pair<BasicBlock *, Value *> &SourceAggregate :
932+
SourceAggregates)
933+
PHI->addIncoming(SourceAggregate.second, SourceAggregate.first);
934+
935+
++NumAggregateReconstructionsSimplified;
936+
return PHI;
937+
};
938+
697939
/// Try to find redundant insertvalue instructions, like the following ones:
698940
/// %0 = insertvalue { i8, i32 } undef, i8 %x, 0
699941
/// %1 = insertvalue { i8, i32 } %0, i8 %y, 0
@@ -726,6 +968,10 @@ Instruction *InstCombinerImpl::visitInsertValueInst(InsertValueInst &I) {
726968

727969
if (IsRedundant)
728970
return replaceInstUsesWith(I, I.getOperand(0));
971+
972+
if (Instruction *NewI = foldAggregateConstructionIntoAggregateReuse(I))
973+
return NewI;
974+
729975
return nullptr;
730976
}
731977

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -3647,10 +3647,14 @@ bool InstCombinerImpl::run() {
36473647
BasicBlock *InstParent = I->getParent();
36483648
BasicBlock::iterator InsertPos = I->getIterator();
36493649

3650-
// If we replace a PHI with something that isn't a PHI, fix up the
3651-
// insertion point.
3652-
if (!isa<PHINode>(Result) && isa<PHINode>(InsertPos))
3653-
InsertPos = InstParent->getFirstInsertionPt();
3650+
// Are we replace a PHI with something that isn't a PHI, or vice versa?
3651+
if (isa<PHINode>(Result) != isa<PHINode>(I)) {
3652+
// We need to fix up the insertion point.
3653+
if (isa<PHINode>(I)) // PHI -> Non-PHI
3654+
InsertPos = InstParent->getFirstInsertionPt();
3655+
else // Non-PHI -> PHI
3656+
InsertPos = InstParent->getFirstNonPHI()->getIterator();
3657+
}
36543658

36553659
InstParent->getInstList().insert(InsertPos, Result);
36563660

0 commit comments

Comments
 (0)