[IR] Use block numbers in PredIteratorCache #101885
Conversation
Avoid using a DenseMap of BasicBlocks when a vector is sufficient. This only has a few users, so gains are low, but so is the effort. There's no longer a separate map which only holds sizes. All users that used size() also use get() afterwards, so it's always beneficial to compute both.
@llvm/pr-subscribers-llvm-ir
Author: Alexis Engelke (aengelke)
Changes
Avoid using a DenseMap of BasicBlocks when a vector is sufficient. This only has a few users, so gains are low, but so is the effort. There's no longer a separate map which only holds sizes. All users that used size() also use get() afterwards, so it's always beneficial to compute both.
c-t-t: -0.08% in stage2-O3
Full diff: https://github.com/llvm/llvm-project/pull/101885.diff
1 Files Affected:
diff --git a/llvm/include/llvm/IR/PredIteratorCache.h b/llvm/include/llvm/IR/PredIteratorCache.h
index fc8cf20e9f759..c9603c870820b 100644
--- a/llvm/include/llvm/IR/PredIteratorCache.h
+++ b/llvm/include/llvm/IR/PredIteratorCache.h
@@ -25,52 +25,42 @@ namespace llvm {
/// predecessor iterator queries. This is useful for code that repeatedly
/// wants the predecessor list for the same blocks.
class PredIteratorCache {
- /// BlockToPredsMap - Pointer to null-terminated list.
- mutable DenseMap<BasicBlock *, BasicBlock **> BlockToPredsMap;
- mutable DenseMap<BasicBlock *, unsigned> BlockToPredCountMap;
+ /// Storage, indexed by block number.
+ SmallVector<ArrayRef<BasicBlock *>> Storage;
+ /// Block number epoch to guard against renumberings.
+ unsigned BlockNumberEpoch;
/// Memory - This is the space that holds cached preds.
BumpPtrAllocator Memory;
-private:
- /// GetPreds - Get a cached list for the null-terminated predecessor list of
- /// the specified block. This can be used in a loop like this:
- /// for (BasicBlock **PI = PredCache->GetPreds(BB); *PI; ++PI)
- /// use(*PI);
- /// instead of:
- /// for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI)
- BasicBlock **GetPreds(BasicBlock *BB) {
- BasicBlock **&Entry = BlockToPredsMap[BB];
- if (Entry)
- return Entry;
-
- SmallVector<BasicBlock *, 32> PredCache(predecessors(BB));
- PredCache.push_back(nullptr); // null terminator.
+public:
+ size_t size(BasicBlock *BB) { return get(BB).size(); }
+ ArrayRef<BasicBlock *> get(BasicBlock *BB) {
+#ifndef NDEBUG
+ // In debug builds, verify that no renumbering has occured.
+ if (Storage.empty())
+ BlockNumberEpoch = BB->getParent()->getBlockNumberEpoch();
+ else
+ assert(BlockNumberEpoch == BB->getParent()->getBlockNumberEpoch() &&
+ "Blocks renumbered during lifetime of PredIteratorCache");
+#endif
- BlockToPredCountMap[BB] = PredCache.size() - 1;
+ if (LLVM_LIKELY(BB->getNumber() < Storage.size()))
+ if (auto Res = Storage[BB->getNumber()]; Res.data())
+ return Res;
- Entry = Memory.Allocate<BasicBlock *>(PredCache.size());
- std::copy(PredCache.begin(), PredCache.end(), Entry);
- return Entry;
- }
+ if (BB->getNumber() >= Storage.size())
+ Storage.resize(BB->getParent()->getMaxBlockNumber());
- unsigned GetNumPreds(BasicBlock *BB) const {
- auto Result = BlockToPredCountMap.find(BB);
- if (Result != BlockToPredCountMap.end())
- return Result->second;
- return BlockToPredCountMap[BB] = pred_size(BB);
- }
-
-public:
- size_t size(BasicBlock *BB) const { return GetNumPreds(BB); }
- ArrayRef<BasicBlock *> get(BasicBlock *BB) {
- return ArrayRef(GetPreds(BB), GetNumPreds(BB));
+ SmallVector<BasicBlock *, 32> PredCache(predecessors(BB));
+ BasicBlock **Data = Memory.Allocate<BasicBlock *>(PredCache.size());
+ std::copy(PredCache.begin(), PredCache.end(), Data);
+ return Storage[BB->getNumber()] = ArrayRef(Data, PredCache.size());
}
/// clear - Remove all information.
void clear() {
- BlockToPredsMap.clear();
- BlockToPredCountMap.clear();
+ Storage.clear();
Memory.Reset();
}
};
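For illustration, a minimal caller sketch (a hypothetical helper, not taken from the patch) showing the API after this change: size() forwards to get(), so asking for both costs a single cache fill, and repeated queries for the same block index straight into the vector slot for BB->getNumber().

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/PredIteratorCache.h"

using namespace llvm;

// Hypothetical helper: visit the predecessors of each block in Blocks,
// reusing the cache so repeated blocks are served from the slot at
// Storage[BB->getNumber()] instead of a DenseMap lookup.
static void forEachCachedPred(ArrayRef<BasicBlock *> Blocks,
                              PredIteratorCache &PIC,
                              function_ref<void(BasicBlock *)> Fn) {
  for (BasicBlock *BB : Blocks) {
    if (PIC.size(BB) == 0) // first query fills the slot
      continue;            // e.g. the entry block has no predecessors
    for (BasicBlock *Pred : PIC.get(BB)) // second query is a cache hit
      Fn(Pred);
  }
}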
Somewhat skeptical about this one. It means that PredIteratorCache should not be used in cases where only a small fraction of blocks will be queried.
The change to store ArrayRef instead of having two separate maps looks good to me though.
This boils down to the question: when is a vector faster than a DenseMap? This depends on:
I made a small benchmark (dmvsvec.txt --
For PredIteratorCache, there are likely multiple probes, so the vector is preferable in even more cases. (E.g., I=5 and 5 probes instead of one => N~100.) I have no statistics about the PIC users, but I think it's a somewhat reasonable assumption that they probe >5% of the basic blocks of a function?
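This is not the dmvsvec.txt benchmark itself (it isn't reproduced here), only a rough stand-in for the kind of comparison being discussed; the parameters below (64k slots, roughly 5% of them probed, plain unsigned values instead of predecessor lists) are illustrative assumptions, not measurements from this thread.

#include "llvm/ADT/DenseMap.h"
#include <chrono>
#include <cstdio>
#include <vector>

using namespace llvm;

int main() {
  constexpr unsigned N = 1 << 16;  // assumed number of "blocks"
  constexpr unsigned Stride = 20;  // probe roughly 5% of the indices
  constexpr unsigned Reps = 2000;

  auto Time = [](auto &&Fn) {
    auto Start = std::chrono::steady_clock::now();
    Fn();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() -
                                         Start).count();
  };

  unsigned Sink = 0;
  // DenseMap: no up-front cost, but hashing and probing on every access.
  double MapTime = Time([&] {
    for (unsigned R = 0; R != Reps; ++R) {
      DenseMap<unsigned, unsigned> Map;
      for (unsigned I = 0; I < N; I += Stride)
        Sink += Map.try_emplace(I, I).first->second;
    }
  });
  // Vector: pays for zero-initializing all N slots, then direct indexing.
  double VecTime = Time([&] {
    for (unsigned R = 0; R != Reps; ++R) {
      std::vector<unsigned> Vec(N);
      for (unsigned I = 0; I < N; I += Stride)
        Sink += (Vec[I] = I);
    }
  });
  printf("densemap: %.3fs  vector: %.3fs  (sink=%u)\n", MapTime, VecTime, Sink);
}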
Not really. For example, LCSSA formation uses PredIterCache for exit blocks, so it probably usually only has one entry when forming LCSSA for a single loop? It could be pushed to a higher level and be used for e.g. full recursive LCSSA formation, but that's how it is right now. The use in scalar promotion is for all loop blocks using a pointer, which may also be only a tiny fraction of all blocks. The data does suggest that it does work out fine on average, but I'm concerned about degenerate cases.
(Meh, didn't intend to push my experiment to this branch... just wanted to get performance results (perf/...) for a two-level scheme with a BitVector to reduce memset overhead.) Edit: removed commit, memset is cheaper than BitVector lookup (c-t-t).
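The removed experiment isn't available, so the following is only a guess at its shape, with invented names (GuardedPredStorage, lookup, insert): a per-block-number valid bit in front of the cached slots, so that clear() resets bits instead of re-zeroing the slot array. As noted above, the extra bit test on every lookup apparently cost more than the memset it avoided.

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/BasicBlock.h"

using namespace llvm;

// Hypothetical sketch, not the removed commit: a "valid" bit per block number
// in front of the cached slots, so clear() only has to reset the bits.
struct GuardedPredStorage {
  SmallVector<ArrayRef<BasicBlock *>> Slots;
  BitVector Valid;

  const ArrayRef<BasicBlock *> *lookup(unsigned BlockNumber) const {
    // The extra bit test on every query is the cost that reportedly made this
    // scheme lose against simply zero-filling the slot array.
    if (BlockNumber < Valid.size() && Valid.test(BlockNumber))
      return &Slots[BlockNumber];
    return nullptr;
  }

  void insert(unsigned BlockNumber, ArrayRef<BasicBlock *> Preds) {
    if (BlockNumber >= Slots.size()) {
      Slots.resize(BlockNumber + 1);
      Valid.resize(BlockNumber + 1);
    }
    Slots[BlockNumber] = Preds;
    Valid.set(BlockNumber);
  }

  void clear() { Valid.reset(); } // stale Slots entries are never read
};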
Branch force-pushed from 032ce73 to ad71aea.
Use a single map storing ArrayRef instead of two separate maps for the base pointer and size. This saves one map lookup per cache access. This is an alternative to #101885 with a similar compile-time improvement.
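A rough sketch of what that alternative shape could look like (illustration only, not the referenced commit; it reuses the BumpPtrAllocator pattern from the existing header, and the class name SingleMapPredCache is invented):

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"
#include "llvm/Support/Allocator.h"
#include <algorithm>

using namespace llvm;

// Illustrative only: one DenseMap whose value carries both pointer and size,
// so each query costs a single hash lookup instead of two.
class SingleMapPredCache {
  DenseMap<BasicBlock *, ArrayRef<BasicBlock *>> Map;
  BumpPtrAllocator Memory;

public:
  ArrayRef<BasicBlock *> get(BasicBlock *BB) {
    ArrayRef<BasicBlock *> &Entry = Map[BB];
    if (Entry.data()) // cache hit: data pointer already set
      return Entry;
    SmallVector<BasicBlock *, 32> Preds(predecessors(BB));
    BasicBlock **Data = Memory.Allocate<BasicBlock *>(Preds.size());
    std::copy(Preds.begin(), Preds.end(), Data);
    return Entry = ArrayRef(Data, Preds.size());
  }
  size_t size(BasicBlock *BB) { return get(BB).size(); }
  void clear() {
    Map.clear();
    Memory.Reset();
  }
};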