
[MLIR][Mem2Reg] Improve performance by avoiding recomputations #91444


Merged

Conversation

Dinistro
Contributor

@Dinistro Dinistro commented May 8, 2024

This commit ensures that Mem2Reg reuses the DominanceInfo as well as block index maps to avoid expensive recomputations. Due to the recent migration to OpBuilder, the promotion of a slot no longer replaces blocks. Having stable blocks makes the DominanceInfo preservable and additionally allows caching block index maps across different promotions.

Performance measurements on very large functions show up to a 4x speedup from these changes.
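The core of the change is a get-or-create cache keyed by region. As a rough standalone illustration of that pattern (not the actual MLIR code: `Region`, `Block`, and the `DenseMap`-based types are replaced here by hypothetical minimal stand-ins, and a plain linear walk stands in for the topological block order):

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical minimal stand-ins for MLIR's Region and Block.
struct Block {};
struct Region {
  std::vector<Block *> blocks;
};

using BlockIndexMap = std::unordered_map<Block *, std::size_t>;
using BlockIndexCache = std::unordered_map<Region *, BlockIndexMap>;

// Mirrors the get-or-create pattern of the patch: compute the block index
// map for a region once and serve all later queries from the cache.
inline const BlockIndexMap &
getOrCreateBlockIndices(BlockIndexCache &cache, Region *region) {
  auto [it, inserted] = cache.try_emplace(region);
  if (!inserted)
    return it->second; // Cache hit: reuse the previously computed map.

  // Cache miss: fill the freshly inserted (empty) map. The real pass
  // numbers blocks in topological order; a linear walk stands in here.
  BlockIndexMap &indices = it->second;
  for (std::size_t i = 0; i < region->blocks.size(); ++i)
    indices[region->blocks[i]] = i;
  return indices;
}
```

Because the promotion no longer replaces blocks, the cached map stays valid across promotions within the same call, so only the first query per region pays the computation cost.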

@Dinistro Dinistro requested a review from gysit May 8, 2024 08:12
@Dinistro Dinistro requested a review from Moxinilian as a code owner May 8, 2024 08:12
@llvmbot llvmbot added mlir:core MLIR Core Infrastructure mlir labels May 8, 2024
@llvmbot
Member

llvmbot commented May 8, 2024

@llvm/pr-subscribers-mlir-core

@llvm/pr-subscribers-mlir

Author: Christian Ulmann (Dinistro)



Full diff: https://github.com/llvm/llvm-project/pull/91444.diff

2 Files Affected:

  • (modified) mlir/include/mlir/Transforms/Mem2Reg.h (+1)
  • (modified) mlir/lib/Transforms/Mem2Reg.cpp (+45-19)
diff --git a/mlir/include/mlir/Transforms/Mem2Reg.h b/mlir/include/mlir/Transforms/Mem2Reg.h
index b4f939d654142..fee7fb312750c 100644
--- a/mlir/include/mlir/Transforms/Mem2Reg.h
+++ b/mlir/include/mlir/Transforms/Mem2Reg.h
@@ -28,6 +28,7 @@ struct Mem2RegStatistics {
 LogicalResult
 tryToPromoteMemorySlots(ArrayRef<PromotableAllocationOpInterface> allocators,
                         OpBuilder &builder, const DataLayout &dataLayout,
+                        DominanceInfo &dominance,
                         Mem2RegStatistics statistics = {});
 
 } // namespace mlir
diff --git a/mlir/lib/Transforms/Mem2Reg.cpp b/mlir/lib/Transforms/Mem2Reg.cpp
index 1d7ba4ca4f83e..74e6697e87734 100644
--- a/mlir/lib/Transforms/Mem2Reg.cpp
+++ b/mlir/lib/Transforms/Mem2Reg.cpp
@@ -18,7 +18,6 @@
 #include "mlir/Transforms/Passes.h"
 #include "mlir/Transforms/RegionUtils.h"
 #include "llvm/ADT/STLExtras.h"
-#include "llvm/Support/Casting.h"
 #include "llvm/Support/GenericIteratedDominanceFrontier.h"
 
 namespace mlir {
@@ -158,6 +157,8 @@ class MemorySlotPromotionAnalyzer {
   const DataLayout &dataLayout;
 };
 
+using BlockIndexCache = DenseMap<Region *, DenseMap<Block *, size_t>>;
+
 /// The MemorySlotPromoter handles the state of promoting a memory slot. It
 /// wraps a slot and its associated allocator. This will perform the mutation of
 /// IR.
@@ -166,7 +167,8 @@ class MemorySlotPromoter {
   MemorySlotPromoter(MemorySlot slot, PromotableAllocationOpInterface allocator,
                      OpBuilder &builder, DominanceInfo &dominance,
                      const DataLayout &dataLayout, MemorySlotPromotionInfo info,
-                     const Mem2RegStatistics &statistics);
+                     const Mem2RegStatistics &statistics,
+                     BlockIndexCache &blockIndexCache);
 
   /// Actually promotes the slot by mutating IR. Promoting a slot DOES
   /// invalidate the MemorySlotPromotionInfo of other slots. Preparation of
@@ -207,6 +209,9 @@ class MemorySlotPromoter {
   const DataLayout &dataLayout;
   MemorySlotPromotionInfo info;
   const Mem2RegStatistics &statistics;
+
+  /// Shared cache of block indices of specific regions.
+  BlockIndexCache &blockIndexCache;
 };
 
 } // namespace
@@ -214,9 +219,11 @@ class MemorySlotPromoter {
 MemorySlotPromoter::MemorySlotPromoter(
     MemorySlot slot, PromotableAllocationOpInterface allocator,
     OpBuilder &builder, DominanceInfo &dominance, const DataLayout &dataLayout,
-    MemorySlotPromotionInfo info, const Mem2RegStatistics &statistics)
+    MemorySlotPromotionInfo info, const Mem2RegStatistics &statistics,
+    BlockIndexCache &blockIndexCache)
     : slot(slot), allocator(allocator), builder(builder), dominance(dominance),
-      dataLayout(dataLayout), info(std::move(info)), statistics(statistics) {
+      dataLayout(dataLayout), info(std::move(info)), statistics(statistics),
+      blockIndexCache(blockIndexCache) {
 #ifndef NDEBUG
   auto isResultOrNewBlockArgument = [&]() {
     if (BlockArgument arg = dyn_cast<BlockArgument>(slot.ptr))
@@ -500,15 +507,29 @@ void MemorySlotPromoter::computeReachingDefInRegion(Region *region,
   }
 }
 
+/// Gets or creates a block index mapping for `region`.
+static const DenseMap<Block *, size_t> &
+getOrCreateBlockIndices(BlockIndexCache &blockIndexCache, Region *region) {
+  auto [it, created] = blockIndexCache.try_emplace(region);
+  if (!created)
+    return it->second;
+
+  DenseMap<Block *, size_t> &blockIndices = it->second;
+  SetVector<Block *> topologicalOrder = getTopologicallySortedBlocks(*region);
+  for (auto [index, block] : llvm::enumerate(topologicalOrder))
+    blockIndices[block] = index;
+  return blockIndices;
+}
+
 /// Sorts `ops` according to dominance. Relies on the topological order of basic
-/// blocks to get a deterministic ordering.
-static void dominanceSort(SmallVector<Operation *> &ops, Region &region) {
+/// blocks to get a deterministic ordering. Uses `blockIndexCache` to avoid the
+/// potentially expensive recomputation of a block index map.
+static void dominanceSort(SmallVector<Operation *> &ops, Region &region,
+                          BlockIndexCache &blockIndexCache) {
   // Produce a topological block order and construct a map to lookup the indices
   // of blocks.
-  DenseMap<Block *, size_t> topoBlockIndices;
-  SetVector<Block *> topologicalOrder = getTopologicallySortedBlocks(region);
-  for (auto [index, block] : llvm::enumerate(topologicalOrder))
-    topoBlockIndices[block] = index;
+  const DenseMap<Block *, size_t> &topoBlockIndices =
+      getOrCreateBlockIndices(blockIndexCache, &region);
 
   // Combining the topological order of the basic blocks together with block
   // internal operation order guarantees a deterministic, dominance respecting
@@ -527,7 +548,8 @@ void MemorySlotPromoter::removeBlockingUses() {
       llvm::make_first_range(info.userToBlockingUses));
 
   // Sort according to dominance.
-  dominanceSort(usersToRemoveUses, *slot.ptr.getParentBlock()->getParent());
+  dominanceSort(usersToRemoveUses, *slot.ptr.getParentBlock()->getParent(),
+                blockIndexCache);
 
   llvm::SmallVector<Operation *> toErase;
   // List of all replaced values in the slot.
@@ -605,20 +627,24 @@ void MemorySlotPromoter::promoteSlot() {
 
 LogicalResult mlir::tryToPromoteMemorySlots(
     ArrayRef<PromotableAllocationOpInterface> allocators, OpBuilder &builder,
-    const DataLayout &dataLayout, Mem2RegStatistics statistics) {
+    const DataLayout &dataLayout, DominanceInfo &dominance,
+    Mem2RegStatistics statistics) {
   bool promotedAny = false;
 
+  // Cache for block index maps. This is required to avoid expensive
+  // recomputations.
+  BlockIndexCache blockIndexCache;
+
   for (PromotableAllocationOpInterface allocator : allocators) {
     for (MemorySlot slot : allocator.getPromotableSlots()) {
       if (slot.ptr.use_empty())
         continue;
 
-      DominanceInfo dominance;
       MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout);
       std::optional<MemorySlotPromotionInfo> info = analyzer.computeInfo();
       if (info) {
         MemorySlotPromoter(slot, allocator, builder, dominance, dataLayout,
-                           std::move(*info), statistics)
+                           std::move(*info), statistics, blockIndexCache)
             .promoteSlot();
         promotedAny = true;
       }
@@ -640,6 +666,10 @@ struct Mem2Reg : impl::Mem2RegBase<Mem2Reg> {
 
     bool changed = false;
 
+    auto &dataLayoutAnalysis = getAnalysis<DataLayoutAnalysis>();
+    const DataLayout &dataLayout = dataLayoutAnalysis.getAtOrAbove(scopeOp);
+    auto &dominance = getAnalysis<DominanceInfo>();
+
     for (Region &region : scopeOp->getRegions()) {
       if (region.getBlocks().empty())
         continue;
@@ -655,16 +685,12 @@ struct Mem2Reg : impl::Mem2RegBase<Mem2Reg> {
           allocators.emplace_back(allocator);
         });
 
-        auto &dataLayoutAnalysis = getAnalysis<DataLayoutAnalysis>();
-        const DataLayout &dataLayout = dataLayoutAnalysis.getAtOrAbove(scopeOp);
-
         // Attempt promoting until no promotion succeeds.
         if (failed(tryToPromoteMemorySlots(allocators, builder, dataLayout,
-                                           statistics)))
+                                           dominance, statistics)))
           break;
 
         changed = true;
-        getAnalysisManager().invalidate({});
       }
     }
     if (!changed)
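For context on why the cached indices matter: the dominance sort keys every operation by the topological index of its block and, within a block, by operation order. A rough standalone sketch of that comparator, with hypothetical stand-in `Block`/`Operation` types rather than MLIR's:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical minimal stand-ins for MLIR blocks and operations.
struct Block {};
struct Operation {
  Block *block;    // Owning block.
  std::size_t pos; // Position within the block's operation list.
};

// Sorts `ops` by (topological block index, intra-block position). Given a
// topological order of blocks, this yields a deterministic, dominance
// respecting ordering, mirroring what dominanceSort does with the cached
// block index map.
inline void
dominanceSort(std::vector<Operation *> &ops,
              const std::unordered_map<Block *, std::size_t> &blockIndex) {
  std::stable_sort(ops.begin(), ops.end(),
                   [&](Operation *lhs, Operation *rhs) {
                     std::size_t l = blockIndex.at(lhs->block);
                     std::size_t r = blockIndex.at(rhs->block);
                     return l != r ? l < r : lhs->pos < rhs->pos;
                   });
}
```

Building the `blockIndex` map requires a topological traversal of the region, which is exactly the work the new cache avoids repeating on every promotion.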

// Cache for block index maps. This is required to avoid expensive
// recomputations.
BlockIndexCache blockIndexCache;

for (PromotableAllocationOpInterface allocator : allocators) {
Dinistro (Contributor, author) commented on this change:
On a slightly unrelated note: would it make sense to move the repeated promotion retry loop to this place? I guess that keeping some of the caches around for longer would be beneficial. Additionally, we could avoid re-walking the entire region to gather all the allocators again.

Reviewer reply:

This sounds like a reasonable follow-up step to me. The advantage is that the block indices could be cached for longer; the possible disadvantage is that tryToPromoteMemorySlots would always promote all allocas of the region (which we do today anyway)? Are there other trade-offs?

Dinistro (Contributor, author) replied:

It would only attempt promotion on the allocas that were originally passed in. Just retrying on that set, minus the already promoted ones, seems to be the most efficient and expected behavior.

gysit (Contributor) left a review:

LGTM modulo nits!


@Dinistro Dinistro merged commit c6efcc9 into main May 8, 2024
@Dinistro Dinistro deleted the users/dinistro/change-memory-slot-interface-to-builders branch May 8, 2024 13:53