[scudo] Add partial chunk heuristic to retrieval algorithm. #105009
Conversation
@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Joshua Baehring (JoshuaMBa)

Changes

Previously the secondary cache retrieval algorithm would not allow retrievals of memory chunks where the number of unused bytes would be greater than `MaxUnusedCachePages * PageSize` bytes. This meant that even if a memory chunk satisfied the requirements of the optimal fit algorithm, it may not be returned. This remains true if memory tagging is enabled. However, if memory tagging is disabled, a new heuristic has been put in place: if a memory chunk is a non-optimal fit, the cache retrieval algorithm will attempt to release the excess memory to force a cache hit while keeping RSS down.

In the event that a memory chunk is a non-optimal fit, the retrieval algorithm will release excess memory as long as the amount of memory to be released is less than or equal to 4 pages. If the amount of memory to be released exceeds 4 pages, the retrieval algorithm will not consider that cached memory chunk valid for retrieval.

Full diff: https://github.com/llvm/llvm-project/pull/105009.diff

2 Files Affected:

- compiler-rt/lib/scudo/standalone/secondary.h
- compiler-rt/lib/scudo/standalone/tests/secondary_test.cpp
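For orientation before the full diff, here is a condensed sketch of how a caller might derive the fragmentation budget (`MaxAllowedFragmentedBytes`) that the new `retrieve()` signature expects. The constants, the hard-coded page size, and the helper name below are simplified assumptions for illustration, not the exact Scudo identifiers:

```cpp
// Sketch only: simplified stand-ins, not the actual Scudo implementation.
#include <cstdint>

using uptr = std::uintptr_t;

// Assumed values for illustration; in Scudo these are platform dependent.
constexpr uptr PageSize = 4096;
constexpr uptr MaxUnusedCachePages = 4;     // unused bytes tolerated in place
constexpr uptr MaxCachePagesToRelease = 4;  // extra pages that may be released

// Fragmentation budget handed to retrieve(): with memory tagging the old
// limit still applies; without it, up to MaxCachePagesToRelease additional
// pages of waste are tolerated because they can be released back to the OS.
constexpr uptr maxAllowedFragmentedBytes(bool MemoryTaggingEnabled) {
  const uptr MaxUnusedCacheBytes = MaxUnusedCachePages * PageSize;
  return MemoryTaggingEnabled
             ? MaxUnusedCacheBytes
             : MaxUnusedCacheBytes + MaxCachePagesToRelease * PageSize;
}
```

This mirrors the change to `MapAllocator<Config>::tryAllocateFromCache` in the diff that follows.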
diff --git a/compiler-rt/lib/scudo/standalone/secondary.h b/compiler-rt/lib/scudo/standalone/secondary.h
index 27f8697db7838f..2adea34c13b426 100644
--- a/compiler-rt/lib/scudo/standalone/secondary.h
+++ b/compiler-rt/lib/scudo/standalone/secondary.h
@@ -72,6 +72,14 @@ namespace {
struct CachedBlock {
static constexpr u16 CacheIndexMax = UINT16_MAX;
static constexpr u16 InvalidEntry = CacheIndexMax;
+ // * MaxCachePagesToRelease default is currently 4
+ // - We arrived at this value after noticing that mapping
+ // in larger memory regions performs better than releasing
+ // memory and forcing a cache hit. The data suggests that
+ // beyond 16KB, the release execution time is longer than
+ // the map execution time, so the appropriate default is
+ // platform dependent.
+ static constexpr uptr MaxCachePagesToRelease = 4U;
uptr CommitBase = 0;
uptr CommitSize = 0;
@@ -90,8 +98,9 @@ struct CachedBlock {
template <typename Config> class MapAllocatorNoCache {
public:
void init(UNUSED s32 ReleaseToOsInterval) {}
- CachedBlock retrieve(UNUSED uptr Size, UNUSED uptr Alignment,
- UNUSED uptr HeadersSize, UNUSED uptr &EntryHeaderPos) {
+ CachedBlock retrieve(UNUSED uptr MaxAllowedFragmentedBytes, UNUSED uptr Size,
+ UNUSED uptr Alignment, UNUSED uptr HeadersSize,
+ UNUSED uptr &EntryHeaderPos) {
return {};
}
void store(UNUSED Options Options, UNUSED uptr CommitBase,
@@ -334,13 +343,14 @@ class MapAllocatorCache {
}
}
- CachedBlock retrieve(uptr Size, uptr Alignment, uptr HeadersSize,
- uptr &EntryHeaderPos) EXCLUDES(Mutex) {
+ CachedBlock retrieve(uptr MaxAllowedFragmentedBytes, uptr Size,
+ uptr Alignment, uptr HeadersSize, uptr &EntryHeaderPos)
+ EXCLUDES(Mutex) {
const uptr PageSize = getPageSizeCached();
// 10% of the requested size proved to be the optimal choice for
// retrieving cached blocks after testing several options.
constexpr u32 FragmentedBytesDivisor = 10;
- bool Found = false;
+ bool FoundOptimalFit = false;
CachedBlock Entry;
EntryHeaderPos = 0;
{
@@ -348,47 +358,90 @@ class MapAllocatorCache {
CallsToRetrieve++;
if (EntriesCount == 0)
return {};
- u32 OptimalFitIndex = 0;
+ u16 RetrievedIndex = CachedBlock::InvalidEntry;
uptr MinDiff = UINTPTR_MAX;
- for (u32 I = LRUHead; I != CachedBlock::InvalidEntry;
+
+ // Since allocation sizes don't always match cached memory chunk sizes
+ // we allow some memory to be unused (called fragmented bytes). The
+ // amount of unused bytes is exactly EntryHeaderPos - CommitBase.
+ //
+ // CommitBase CommitBase + CommitSize
+ // V V
+ // +---+------------+-----------------+---+
+ // | | | | |
+ // +---+------------+-----------------+---+
+ // ^ ^ ^
+ // Guard EntryHeaderPos Guard-page-end
+ // page-begin
+ //
+ // [EntryHeaderPos, CommitBase + CommitSize) contains the user data as
+ // well as the header metadata. If EntryHeaderPos - CommitBase exceeds
+ // MaxAllowedFragmentedBytes, the cached memory chunk is not considered
+ // valid for retrieval.
+ for (u16 I = LRUHead; I != CachedBlock::InvalidEntry;
I = Entries[I].Next) {
const uptr CommitBase = Entries[I].CommitBase;
const uptr CommitSize = Entries[I].CommitSize;
const uptr AllocPos =
roundDown(CommitBase + CommitSize - Size, Alignment);
const uptr HeaderPos = AllocPos - HeadersSize;
- if (HeaderPos > CommitBase + CommitSize)
+ if (HeaderPos > CommitBase + CommitSize || HeaderPos < CommitBase)
continue;
- if (HeaderPos < CommitBase ||
- AllocPos > CommitBase + PageSize * MaxUnusedCachePages) {
+
+ const uptr Diff = roundDown(HeaderPos, PageSize) - CommitBase;
+
+ if (Diff > MaxAllowedFragmentedBytes || Diff >= MinDiff)
continue;
- }
- Found = true;
- const uptr Diff = HeaderPos - CommitBase;
- // immediately use a cached block if it's size is close enough to the
- // requested size.
- const uptr MaxAllowedFragmentedBytes =
+
+ MinDiff = Diff;
+ RetrievedIndex = I;
+ EntryHeaderPos = HeaderPos;
+
+ const uptr OptimalFitThesholdBytes =
(CommitBase + CommitSize - HeaderPos) / FragmentedBytesDivisor;
- if (Diff <= MaxAllowedFragmentedBytes) {
- OptimalFitIndex = I;
- EntryHeaderPos = HeaderPos;
+ if (Diff <= OptimalFitThesholdBytes) {
+ FoundOptimalFit = true;
break;
}
- // keep track of the smallest cached block
- // that is greater than (AllocSize + HeaderSize)
- if (Diff > MinDiff)
- continue;
- OptimalFitIndex = I;
- MinDiff = Diff;
- EntryHeaderPos = HeaderPos;
}
- if (Found) {
- Entry = Entries[OptimalFitIndex];
- remove(OptimalFitIndex);
+ if (RetrievedIndex != CachedBlock::InvalidEntry) {
+ Entry = Entries[RetrievedIndex];
+ remove(RetrievedIndex);
SuccessfulRetrieves++;
}
}
+ // The difference between the retrieved memory chunk and the request
+ // size is at most MaxAllowedFragmentedBytes
+ //
+ // / MaxAllowedFragmentedBytes \
+ // +--------------------------+-----------+
+ // | | |
+ // +--------------------------+-----------+
+ // \ Bytes to be released / ^
+ // |
+ // (may or may not be committed)
+ //
+ // The maximum number of bytes released to the OS is capped by
+ // MaxCachePagesToRelease
+ //
+ // TODO : Consider making MaxCachePagesToRelease configurable since
+ // the release to OS API can vary across systems.
+ if (!FoundOptimalFit && Entry.Time != 0) {
+ const uptr FragmentedBytes =
+ roundDown(EntryHeaderPos, PageSize) - Entry.CommitBase;
+ const uptr MaxUnusedCacheBytes = MaxUnusedCachePages * PageSize;
+ if (FragmentedBytes > MaxUnusedCacheBytes) {
+ const uptr MaxCacheBytesToRelease =
+ CachedBlock::MaxCachePagesToRelease * PageSize;
+ uptr BytesToRelease =
+ roundUp(Min<uptr>(MaxCacheBytesToRelease,
+ FragmentedBytes - MaxUnusedCacheBytes),
+ PageSize);
+ Entry.MemMap.releaseAndZeroPagesToOS(Entry.CommitBase, BytesToRelease);
+ }
+ }
+
return Entry;
}
@@ -659,8 +712,20 @@ MapAllocator<Config>::tryAllocateFromCache(const Options &Options, uptr Size,
FillContentsMode FillContents) {
CachedBlock Entry;
uptr EntryHeaderPos;
+ uptr MaxAllowedFragmentedBytes;
+ const uptr PageSize = getPageSizeCached();
+
+ if (LIKELY(!useMemoryTagging<Config>(Options))) {
+ const uptr MaxUnusedCacheBytes = MaxUnusedCachePages * PageSize;
+ const uptr MaxCacheBytesToRelease =
+ CachedBlock::MaxCachePagesToRelease * PageSize;
+ MaxAllowedFragmentedBytes = MaxUnusedCacheBytes + MaxCacheBytesToRelease;
+ } else {
+ MaxAllowedFragmentedBytes = MaxUnusedCachePages * PageSize;
+ }
- Entry = Cache.retrieve(Size, Alignment, getHeadersSize(), EntryHeaderPos);
+ Entry = Cache.retrieve(MaxAllowedFragmentedBytes, Size, Alignment,
+ getHeadersSize(), EntryHeaderPos);
if (!Entry.isValid())
return nullptr;
diff --git a/compiler-rt/lib/scudo/standalone/tests/secondary_test.cpp b/compiler-rt/lib/scudo/standalone/tests/secondary_test.cpp
index e85b6abdb36d22..1f4cab26550cce 100644
--- a/compiler-rt/lib/scudo/standalone/tests/secondary_test.cpp
+++ b/compiler-rt/lib/scudo/standalone/tests/secondary_test.cpp
@@ -281,8 +281,8 @@ struct MapAllocatorCacheTest : public Test {
std::unique_ptr<CacheT> Cache = std::make_unique<CacheT>();
const scudo::uptr PageSize = scudo::getPageSizeCached();
- // The current test allocation size is set to the minimum size
- // needed for the scudo allocator to fall back to the secondary allocator
+ // The current test allocation size is set to the maximum
+ // cache entry size
static constexpr scudo::uptr TestAllocSize =
CacheConfig::getDefaultMaxEntrySize();
@@ -327,7 +327,8 @@ TEST_F(MapAllocatorCacheTest, CacheOrder) {
for (scudo::uptr I = CacheConfig::getEntriesArraySize(); I > 0; I--) {
scudo::uptr EntryHeaderPos;
scudo::CachedBlock Entry =
- Cache->retrieve(TestAllocSize, PageSize, 0, EntryHeaderPos);
+ Cache->retrieve(scudo::MaxUnusedCachePages * PageSize, TestAllocSize,
+ PageSize, 0, EntryHeaderPos);
EXPECT_EQ(Entry.MemMap.getBase(), MemMaps[I - 1].getBase());
}
@@ -336,6 +337,34 @@ TEST_F(MapAllocatorCacheTest, CacheOrder) {
MemMap.unmap();
}
+TEST_F(MapAllocatorCacheTest, PartialChunkHeuristicRetrievalTest) {
+ const scudo::uptr MaxUnusedCacheBytes = PageSize;
+ const scudo::uptr MaxCacheBytesToRelease =
+ scudo::CachedBlock::MaxCachePagesToRelease * PageSize;
+ const scudo::uptr FragmentedBytes =
+ MaxUnusedCacheBytes + MaxCacheBytesToRelease;
+
+ scudo::uptr EntryHeaderPos;
+ scudo::CachedBlock Entry;
+ scudo::MemMapT MemMap = allocate(PageSize + FragmentedBytes);
+ Cache->store(Options, MemMap.getBase(), MemMap.getCapacity(),
+ MemMap.getBase(), MemMap);
+
+ // FragmentedBytes > MaxAllowedFragmentedBytes so PageSize
+ // cannot be retrieved from the cache
+ Entry = Cache->retrieve(/*MaxAllowedFragmentedBytes=*/0, PageSize, PageSize,
+ 0, EntryHeaderPos);
+ EXPECT_FALSE(Entry.isValid());
+
+ // FragmentedBytes == MaxAllowedFragmentedBytes so PageSize
+ // can be retrieved from the cache
+ Entry =
+ Cache->retrieve(FragmentedBytes, PageSize, PageSize, 0, EntryHeaderPos);
+ EXPECT_TRUE(Entry.isValid());
+
+ MemMap.unmap();
+}
+
TEST_F(MapAllocatorCacheTest, MemoryLeakTest) {
std::vector<scudo::MemMapT> MemMaps;
// Fill the cache above MaxEntriesCount to force an eviction
@@ -351,7 +380,8 @@ TEST_F(MapAllocatorCacheTest, MemoryLeakTest) {
for (scudo::uptr I = CacheConfig::getDefaultMaxEntriesCount(); I > 0; I--) {
scudo::uptr EntryHeaderPos;
RetrievedEntries.push_back(
- Cache->retrieve(TestAllocSize, PageSize, 0, EntryHeaderPos));
+ Cache->retrieve(scudo::MaxUnusedCachePages * PageSize, TestAllocSize,
+ PageSize, 0, EntryHeaderPos));
EXPECT_EQ(MemMaps[I].getBase(), RetrievedEntries.back().MemMap.getBase());
}
Force-pushed from 665d3a0 to 9ed7b15.
One small nit.
Force-pushed from 9ed7b15 to e15b639.
LGTM.
One minor comment, let's change the …
Force-pushed from e15b639 to c3a2be9.
Force-pushed from 3abd248 to 043de74.
Force-pushed from 043de74 to 7604ca6.
Previously the secondary cache retrieval algorithm would not allow retrievals of memory chunks where the number of unused bytes would be greater than `MaxUnusedCachePages * PageSize` bytes. This meant that even if a memory chunk satisfied the requirements of the optimal fit algorithm, it may not be returned. This remains true if memory tagging is enabled. However, if memory tagging is disabled, a new heuristic has been put in place: if a memory chunk is a non-optimal fit, the cache retrieval algorithm will attempt to release the excess memory to force a cache hit while keeping RSS down. In the event that a memory chunk is a non-optimal fit, the retrieval algorithm will release excess memory as long as the amount of memory to be released is less than or equal to 4 pages. If the amount of memory to be released exceeds 4 pages, the retrieval algorithm will not consider that cached memory chunk valid for retrieval.
Force-pushed from 7604ca6 to 07dd35d.
@JoshuaMBa, I think we are ready to change the …
Sure, I have opened a pull request for this at #106466.
`MaxReleasedCachePages` has been set to 4. Initially, in #105009 , we set `MaxReleasedCachePages` to 0 so that the partial chunk heuristic could be introduced incrementally as we observed its impact on retrieval order and more generally, performance. Co-authored-by: Joshua Baehring <[email protected]>
Previously the secondary cache retrieval algorithm would not allow retrievals of memory chunks where the number of unused bytes would be greater than `MaxUnreleasedCachePages * PageSize` bytes. This meant that even if a memory chunk satisfied the requirements of the optimal fit algorithm, it may not be returned. This remains true if memory tagging is enabled. However, if memory tagging is disabled, a new heuristic has been put in place: if a memory chunk is a non-optimal fit, the cache retrieval algorithm will attempt to release the excess memory to force a cache hit while keeping RSS down.

In the event that a memory chunk is a non-optimal fit, the retrieval algorithm will release excess memory as long as the amount of memory to be released is less than or equal to 4 pages. If the amount of memory to be released exceeds 4 pages, the retrieval algorithm will not consider that cached memory chunk valid for retrieval.
This change also addresses an alignment issue in a test case submitted in #104807.
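As a self-contained illustration of the heuristic described above, the sketch below checks whether a cached chunk can serve a request and, when the fit is usable but not optimal, releases the excess back to the OS, capped at `MaxReleasedCachePages` pages. The names, constants, and the `releaseToOS` stub are simplified assumptions rather than the actual Scudo code, and the 10% optimal-fit shortcut from the real retrieval loop is omitted for brevity:

```cpp
// Sketch only: simplified stand-ins for the Scudo secondary-cache heuristic.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <optional>

using uptr = std::uintptr_t;

constexpr uptr PageSize = 4096;              // assumed; platform dependent
constexpr uptr MaxUnreleasedCachePages = 4;  // waste tolerated without releasing
constexpr uptr MaxReleasedCachePages = 4;    // cap on pages released per retrieval

struct CachedChunk {
  uptr CommitBase;  // page-aligned start of the committed region
  uptr CommitSize;  // size of the committed region in bytes
};

constexpr uptr roundDown(uptr X, uptr B) { return X & ~(B - 1); }
constexpr uptr roundUp(uptr X, uptr B) { return roundDown(X + B - 1, B); }

// Stub standing in for MemMap.releaseAndZeroPagesToOS().
void releaseToOS(uptr From, uptr Bytes) {
  std::printf("release %zu bytes at %#zx\n", (size_t)Bytes, (size_t)From);
}

// Returns the header position if the chunk can serve the request, releasing
// excess pages when the fit wastes more than the "unreleased" budget allows.
std::optional<uptr> tryRetrieve(const CachedChunk &C, uptr Size, uptr Alignment,
                                uptr HeadersSize, bool MemoryTaggingEnabled) {
  const uptr MaxUnreleasedBytes = MaxUnreleasedCachePages * PageSize;
  const uptr MaxAllowedFragmentedBytes =
      MemoryTaggingEnabled
          ? MaxUnreleasedBytes
          : MaxUnreleasedBytes + MaxReleasedCachePages * PageSize;

  const uptr CommitEnd = C.CommitBase + C.CommitSize;
  const uptr AllocPos = roundDown(CommitEnd - Size, Alignment);
  const uptr HeaderPos = AllocPos - HeadersSize;
  if (HeaderPos > CommitEnd || HeaderPos < C.CommitBase)
    return std::nullopt;  // the request does not fit in this chunk

  // Bytes below the header page that the request would leave unused.
  const uptr FragmentedBytes = roundDown(HeaderPos, PageSize) - C.CommitBase;
  if (FragmentedBytes > MaxAllowedFragmentedBytes)
    return std::nullopt;  // too much waste: not valid for retrieval

  // Acceptable but non-optimal fit without tagging: give back the excess,
  // capped at MaxReleasedCachePages pages, forcing a cache hit while
  // keeping RSS down.
  if (!MemoryTaggingEnabled && FragmentedBytes > MaxUnreleasedBytes) {
    const uptr BytesToRelease =
        roundUp(std::min(MaxReleasedCachePages * PageSize,
                         FragmentedBytes - MaxUnreleasedBytes),
                PageSize);
    releaseToOS(C.CommitBase, BytesToRelease);
  }
  return HeaderPos;
}

int main() {
  // A 6-page cached chunk serving a 1-page request: 5 pages would sit unused,
  // so the 1 page above the 4-page "unreleased" budget is released.
  CachedChunk C{/*CommitBase=*/0x10000, /*CommitSize=*/6 * PageSize};
  if (auto HeaderPos = tryRetrieve(C, PageSize, PageSize, /*HeadersSize=*/0,
                                   /*MemoryTaggingEnabled=*/false))
    std::printf("retrieved, header at %#zx\n", (size_t)*HeaderPos);
}
```

Under these assumptions the request is honored and exactly one page is released; with memory tagging enabled, the same chunk would be rejected outright because the waste exceeds `MaxUnreleasedCachePages * PageSize`.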