[IR] Use block numbers in PredIteratorCache #101885

Open: wants to merge 1 commit into main
Conversation

Contributor

@aengelke aengelke commented Aug 4, 2024

Avoid using a DenseMap of BasicBlocks when a vector is sufficient. This only has a few users, so gains are low, but so is the effort.

There's no longer a separate map which only holds sizes. All users that used size() also use get() afterwards, so it's always beneficial to compute both.

c-t-t -0.08% in stage2-O3
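The core idea can be sketched outside LLVM with simplified stand-in types (a minimal sketch: `Block`, `PredCache`, and the `MaxBlockNumber` parameter below are hypothetical analogues, not the actual LLVM classes). Predecessor lists are cached in a vector indexed by a dense per-function block number, with an "uncached" sentinel per slot instead of a pointer-keyed map lookup; the real patch uses an `ArrayRef` with null data as that sentinel, modeled here with `std::optional`:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for llvm::BasicBlock: a dense per-function number
// plus the list of predecessor blocks.
struct Block {
  unsigned Number;
  std::vector<Block *> Preds;
};

class PredCache {
  // One slot per block number; nullopt means "not cached yet" (the actual
  // patch marks this with an ArrayRef whose data pointer is null).
  std::vector<std::optional<std::vector<Block *>>> Storage;

public:
  const std::vector<Block *> &get(const Block &BB, unsigned MaxBlockNumber) {
    // Grow lazily to the function's block-number space on first out-of-range
    // query, mirroring Storage.resize(getMaxBlockNumber()) in the patch.
    if (BB.Number >= Storage.size())
      Storage.resize(MaxBlockNumber);
    auto &Slot = Storage[BB.Number];
    if (!Slot)
      Slot = BB.Preds; // fill on first query; later queries hit the cache
    return *Slot;
  }

  // size() is computed from the cached list, so there is no separate
  // count map -- callers of size() use get() afterwards anyway.
  std::size_t size(const Block &BB, unsigned MaxBlockNumber) {
    return get(BB, MaxBlockNumber).size();
  }

  void clear() { Storage.clear(); }
};
```

Compared to the `DenseMap` version, a cache hit is one bounds check plus one indexed load, at the cost of allocating one slot per block in the function even if only a few blocks are ever queried.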

@aengelke aengelke requested a review from nikic August 4, 2024 12:31
@llvmbot llvmbot added the llvm:ir label Aug 4, 2024
@llvmbot
Member

llvmbot commented Aug 4, 2024

@llvm/pr-subscribers-llvm-ir

Author: Alexis Engelke (aengelke)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/101885.diff

1 file affected:

  • (modified) llvm/include/llvm/IR/PredIteratorCache.h (+25-35)
diff --git a/llvm/include/llvm/IR/PredIteratorCache.h b/llvm/include/llvm/IR/PredIteratorCache.h
index fc8cf20e9f759..c9603c870820b 100644
--- a/llvm/include/llvm/IR/PredIteratorCache.h
+++ b/llvm/include/llvm/IR/PredIteratorCache.h
@@ -25,52 +25,42 @@ namespace llvm {
 /// predecessor iterator queries.  This is useful for code that repeatedly
 /// wants the predecessor list for the same blocks.
 class PredIteratorCache {
-  /// BlockToPredsMap - Pointer to null-terminated list.
-  mutable DenseMap<BasicBlock *, BasicBlock **> BlockToPredsMap;
-  mutable DenseMap<BasicBlock *, unsigned> BlockToPredCountMap;
+  /// Storage, indexed by block number.
+  SmallVector<ArrayRef<BasicBlock *>> Storage;
+  /// Block number epoch to guard against renumberings.
+  unsigned BlockNumberEpoch;
 
   /// Memory - This is the space that holds cached preds.
   BumpPtrAllocator Memory;
 
-private:
-  /// GetPreds - Get a cached list for the null-terminated predecessor list of
-  /// the specified block.  This can be used in a loop like this:
-  ///   for (BasicBlock **PI = PredCache->GetPreds(BB); *PI; ++PI)
-  ///      use(*PI);
-  /// instead of:
-  /// for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI)
-  BasicBlock **GetPreds(BasicBlock *BB) {
-    BasicBlock **&Entry = BlockToPredsMap[BB];
-    if (Entry)
-      return Entry;
-
-    SmallVector<BasicBlock *, 32> PredCache(predecessors(BB));
-    PredCache.push_back(nullptr); // null terminator.
+public:
+  size_t size(BasicBlock *BB) { return get(BB).size(); }
+  ArrayRef<BasicBlock *> get(BasicBlock *BB) {
+#ifndef NDEBUG
+    // In debug builds, verify that no renumbering has occurred.
+    if (Storage.empty())
+      BlockNumberEpoch = BB->getParent()->getBlockNumberEpoch();
+    else
+      assert(BlockNumberEpoch == BB->getParent()->getBlockNumberEpoch() &&
+             "Blocks renumbered during lifetime of PredIteratorCache");
+#endif
 
-    BlockToPredCountMap[BB] = PredCache.size() - 1;
+    if (LLVM_LIKELY(BB->getNumber() < Storage.size()))
+      if (auto Res = Storage[BB->getNumber()]; Res.data())
+        return Res;
 
-    Entry = Memory.Allocate<BasicBlock *>(PredCache.size());
-    std::copy(PredCache.begin(), PredCache.end(), Entry);
-    return Entry;
-  }
+    if (BB->getNumber() >= Storage.size())
+      Storage.resize(BB->getParent()->getMaxBlockNumber());
 
-  unsigned GetNumPreds(BasicBlock *BB) const {
-    auto Result = BlockToPredCountMap.find(BB);
-    if (Result != BlockToPredCountMap.end())
-      return Result->second;
-    return BlockToPredCountMap[BB] = pred_size(BB);
-  }
-
-public:
-  size_t size(BasicBlock *BB) const { return GetNumPreds(BB); }
-  ArrayRef<BasicBlock *> get(BasicBlock *BB) {
-    return ArrayRef(GetPreds(BB), GetNumPreds(BB));
+    SmallVector<BasicBlock *, 32> PredCache(predecessors(BB));
+    BasicBlock **Data = Memory.Allocate<BasicBlock *>(PredCache.size());
+    std::copy(PredCache.begin(), PredCache.end(), Data);
+    return Storage[BB->getNumber()] = ArrayRef(Data, PredCache.size());
   }
 
   /// clear - Remove all information.
   void clear() {
-    BlockToPredsMap.clear();
-    BlockToPredCountMap.clear();
+    Storage.clear();
     Memory.Reset();
   }
 };

Contributor

@nikic nikic left a comment


Somewhat skeptical about this one. It means that PredIteratorCache should not be used in cases where only a small fraction of blocks will be queried.

The change to store ArrayRef instead of having two separate maps looks good to me though.

@aengelke
Contributor Author

aengelke commented Aug 4, 2024

where only a small fraction of blocks will be queried.

This boils down to the question: when is a vector faster than a DenseMap? This depends on:

  • number of inserted elements (I)
  • total element space (vector only) (N)
  • element size (16 bytes in this case)

I made a small benchmark (dmvsvec.txt -- clang++ -O3 -DNDEBUG -fno-exceptions -fno-rtti) that inserts I elements (with random numbers in 0..N-1) into a map/vector (vector preallocated to size N) and then probes all I elements. Not a rigorous benchmark -- single machine only, just to get an intuition of where the boundaries are. Key results:

  • For I=1, the vector is better than the map for N<70 (2%, so the map is better if less than 2% of the elements are inserted+probed)
  • For I=5, the break even is at N~80 (6%)
  • For I=10, the break even is at N~110 (9%)
  • For I=20, the break even is at N~160 (12%)
  • For I=40, the break even is at N~270 (15%)
  • For I=150, the break even is at N~2000 (8%) -- with many insertions, once the DenseMap has to grow to larger sizes, the vector becomes more beneficial again.

For PredIteratorCache, there are likely multiple probes, so the vector is preferable in even more cases. (E.g., I=5, and 5 probes instead of one => N~100).

I have no statistics about the PIC users, but I think that assuming they probe >5% of the basic blocks of a function is somewhat reasonable?
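The insert+probe pattern described above can be sketched as follows (a sketch under assumptions: `std::unordered_map` stands in for `DenseMap`, since the actual dmvsvec.txt benchmark is not shown, and the timing loops are elided -- only the access pattern and a consistency check remain):

```cpp
#include <cassert>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// ~16 bytes per element, matching the element size discussed in the PR.
struct Entry {
  void *Ptr;
  unsigned Size;
};

// Insert I elements with random keys in 0..N-1 into both containers, then
// probe every inserted key. Returns true iff the map and the preallocated
// vector agree on all probes. (A real benchmark would time the two phases
// per container separately.)
bool insertAndProbe(unsigned N, unsigned I) {
  std::unordered_map<unsigned, Entry> Map;       // stand-in for DenseMap
  std::vector<Entry> Vec(N, Entry{nullptr, 0});  // preallocated to size N

  std::vector<unsigned> Keys;
  for (unsigned i = 0; i < I; ++i) {
    unsigned K = std::rand() % N;  // random key in 0..N-1
    Keys.push_back(K);
    Entry E{&Vec, K};
    Map[K] = E;  // map insert: hash + probe + possible rehash on growth
    Vec[K] = E;  // vector insert: a single indexed store
  }

  // Probe phase: both containers must return the same entries.
  for (unsigned K : Keys)
    if (Vec[K].Ptr == nullptr || Map.at(K).Size != Vec[K].Size)
      return false;
  return true;
}
```

The trade-off being measured: the vector pays O(N) up-front initialization but each access is a direct index, while the map pays per-access hashing and occasional growth but only touches O(I) memory.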

@nikic
Contributor

nikic commented Aug 4, 2024

I have no statistics about the PIC users, but I think that they probe >5% of the basic blocks of a function is a somewhat reasonable assumption?

Not really. For example, LCSSA formation uses PredIteratorCache for exit blocks, so it probably has only one entry when forming LCSSA for a single loop? It could be pushed to a higher level and be used for e.g. full recursive LCSSA formation, but that's how it is right now. The use in scalar promotion is for all loop blocks using a pointer, which may also be only a tiny fraction of all blocks.

The data does suggest that it does work out fine on average, but I'm concerned about degenerate cases.

@aengelke
Contributor Author

aengelke commented Aug 4, 2024

(Meh, didn't intend to push my experiment to this branch... just wanted to get performance results (perf/...) for a two-level scheme with a BitVector to reduce memset overhead.)

Edit: removed commit, memset is cheaper than BitVector lookup (c-t-t).

nikic added a commit to nikic/llvm-project that referenced this pull request Aug 5, 2024
Use a single map storing ArrayRef instead of two separate maps
for the base pointer and size. This saves one map lookup per
cache access.

This is an alternative to llvm#101885.
nikic added a commit that referenced this pull request Aug 5, 2024
Use a single map storing ArrayRef instead of two separate maps for the
base pointer and size. This saves one map lookup per cache access.

This is an alternative to
#101885 with a similar
compile-time improvement.
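That single-map alternative can be sketched with simplified standalone types (a sketch under assumptions: `std::unordered_map` and a hand-rolled `BlockArrayRef` stand in for LLVM's `DenseMap` and `ArrayRef`; all names below are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for llvm::BasicBlock.
struct Block {
  std::vector<Block *> Preds;
};

// Minimal ArrayRef-like non-owning view: pointer + length.
struct BlockArrayRef {
  Block *const *Data = nullptr;
  std::size_t Size = 0;
};

class PredCacheSingleMap {
  // One map from block to (pointer, size) view -- replacing the two maps
  // for base pointer and count, so each query does one lookup, not two.
  std::unordered_map<const Block *, BlockArrayRef> Cache;
  // Owns the copied predecessor lists; a deque never relocates existing
  // elements, so the Data pointers in Cache stay valid as it grows.
  std::deque<std::vector<Block *>> OwnedStorage;

public:
  BlockArrayRef get(const Block &BB) {
    auto [It, Inserted] = Cache.try_emplace(&BB);
    if (Inserted) {
      OwnedStorage.push_back(BB.Preds);  // copy preds into owned storage
      It->second = {OwnedStorage.back().data(), OwnedStorage.back().size()};
    }
    return It->second;
  }

  std::size_t size(const Block &BB) { return get(BB).Size; }
};
```

This keeps the map's O(queried blocks) memory behavior -- avoiding the degenerate case raised above where only a few of many blocks are queried -- while still getting the single-lookup improvement from storing the view as one value.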