[lld-macho] Reduce memory usage of printing thunks in map file #122785

alx32 · 2025-01-13T20:32:37Z

This commit improves the memory efficiency of the lld-macho linker by optimizing how thunks are printed in the map file. Previously, merging vectors of input sections required creating a temporary vector, which increased memory usage and in some cases caused the linker to run out of memory as reported in comments on #120496. The new approach interleaves the printing of two arrays of ConcatInputSection in sorted order without allocating additional memory for a merged array.

llvmbot · 2025-01-13T20:34:02Z

@llvm/pr-subscribers-lld-macho

@llvm/pr-subscribers-lld

Author: None (alx32)

Changes

This commit improves the memory efficiency of the lld-macho linker by optimizing how thunks are printed in the map file. Previously, merging vectors of input sections required creating a temporary vector, which increased memory usage and in some cases caused the linker to run out of memory. The new approach interleaves the printing of two arrays of ConcatInputSection in sorted order without allocating additional memory for a merged array.

Full diff: https://github.com/llvm/llvm-project/pull/122785.diff

2 Files Affected:

(modified) lld/MachO/ConcatOutputSection.h (+1-1)
(modified) lld/MachO/MapFile.cpp (+33-22)

diff --git a/lld/MachO/ConcatOutputSection.h b/lld/MachO/ConcatOutputSection.h
index 8131c48d31113e..c2449e82b73ad1 100644
--- a/lld/MachO/ConcatOutputSection.h
+++ b/lld/MachO/ConcatOutputSection.h
@@ -72,7 +72,7 @@ class TextOutputSection : public ConcatOutputSection {
   void finalizeContents() override {}
   void finalize() override;
   bool needsThunks() const;
-  ArrayRef<ConcatInputSection *> getThunks() const { return thunks; }
+  const std::vector<ConcatInputSection *> &getThunks() const { return thunks; }
   void writeTo(uint8_t *buf) const override;
 
   static bool classof(const OutputSection *sec) {
diff --git a/lld/MachO/MapFile.cpp b/lld/MachO/MapFile.cpp
index 12417df8cecb8c..2aa91af9cc7112 100644
--- a/lld/MachO/MapFile.cpp
+++ b/lld/MachO/MapFile.cpp
@@ -161,20 +161,6 @@ static uint64_t getSymSizeForMap(Defined *sym) {
   return sym->size;
 }
 
-// Merges two vectors of input sections in order of their outSecOff values.
-// This approach creates a new (temporary) vector which is not ideal but the
-// ideal approach leads to a lot of code duplication.
-static std::vector<ConcatInputSection *>
-mergeOrderedInputs(ArrayRef<ConcatInputSection *> inputs1,
-                   ArrayRef<ConcatInputSection *> inputs2) {
-  std::vector<ConcatInputSection *> vec(inputs1.size() + inputs2.size());
-  std::merge(inputs1.begin(), inputs1.end(), inputs2.begin(), inputs2.end(),
-             vec.begin(), [](ConcatInputSection *a, ConcatInputSection *b) {
-               return a->outSecOff < b->outSecOff;
-             });
-  return vec;
-}
-
 void macho::writeMapFile() {
   if (config->mapFile.empty())
     return;
@@ -217,15 +203,42 @@ void macho::writeMapFile() {
                    seg->name.str().c_str(), osec->name.str().c_str());
     }
 
-  // Shared function to print an array of symbols.
-  auto printIsecArrSyms = [&](const std::vector<ConcatInputSection *> &arr) {
-    for (const ConcatInputSection *isec : arr) {
+  // Shared function to print one or two arrays of ConcatInputSection in
+  // ascending outSecOff order. The second array is optional; if provided, we
+  // interleave the printing in sorted order without allocating a merged temp
+  // array.
+  auto printIsecArrSyms = [&](const std::vector<ConcatInputSection *> &arr1,
+                              const std::vector<ConcatInputSection *> *arr2 =
+                                  nullptr) {
+    // Helper lambda that prints all symbols from one ConcatInputSection.
+    auto printOne = [&](const ConcatInputSection *isec) {
       for (Defined *sym : isec->symbols) {
-        if (!(isPrivateLabel(sym->getName()) && getSymSizeForMap(sym) == 0))
+        if (!(isPrivateLabel(sym->getName()) && getSymSizeForMap(sym) == 0)) {
           os << format("0x%08llX\t0x%08llX\t[%3u] %s\n", sym->getVA(),
                        getSymSizeForMap(sym),
-                       readerToFileOrdinal[sym->getFile()],
+                       readerToFileOrdinal.lookup(sym->getFile()),
                        sym->getName().str().data());
+        }
+      }
+    };
+
+    // If there is only one array, print all symbols from it and return.
+    // This simplifies the logic for the merge case below.
+    if (!arr2) {
+      for (const ConcatInputSection *isec : arr1)
+        printOne(isec);
+      return;
+    }
+
+    size_t i = 0, j = 0;
+    size_t size1 = arr1.size();
+    size_t size2 = arr2->size();
+    while (i < size1 || j < size2) {
+      if (i < size1 &&
+          (j >= size2 || arr1[i]->outSecOff <= (*arr2)[j]->outSecOff)) {
+        printOne(arr1[i++]);
+      } else if (j < size2) {
+        printOne((*arr2)[j++]);
       }
     }
   };
@@ -235,9 +248,7 @@ void macho::writeMapFile() {
   for (const OutputSegment *seg : outputSegments) {
     for (const OutputSection *osec : seg->getSections()) {
       if (auto *textOsec = dyn_cast<TextOutputSection>(osec)) {
-        auto inputsAndThunks =
-            mergeOrderedInputs(textOsec->inputs, textOsec->getThunks());
-        printIsecArrSyms(inputsAndThunks);
+        printIsecArrSyms(textOsec->inputs, &textOsec->getThunks());
       } else if (auto *concatOsec = dyn_cast<ConcatOutputSection>(osec)) {
         printIsecArrSyms(concatOsec->inputs);
       } else if (osec == in.cStringSection || osec == in.objcMethnameSection) {

tstellar · 2025-01-13T21:34:05Z

I don't see any changes in memory usage with this PR. Were you able to see a difference when running: /usr/bin/time -v ./lld-build/bin/llvm-lit -v lld/test/MachO/arm64-thunks.s 2>&1| grep -e "Testing Time" -e "Maximum resident set size"

alx32 · 2025-01-13T21:39:39Z

I ran those commands from your previous comment and didn't see memory changes and also got results very different than your previous results - so assumed that it's platform / Linux distro specific and that this feature doesn't work for my setup.

This does eliminate the extra temporary array so there is less memory being used for sure.
But perhaps the actual extra memory is coming from a different place. I'll look into more ...

tstellar · 2025-01-13T21:46:27Z

These are the CMake arguments I used to build LLVM if it helps: cmake -G Ninja -B lld-build -S llvm/ -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;AArch64" -DLLVM_ENABLE_PROJECTS=lld

alx32 · 2025-01-14T00:26:48Z

Latest commit should address the other (main) reason for high mem usage - the __cstring section being composed of lots of `\0' characters - which lead to each char being interpreted as its own string.

lld/MachO/MapFile.cpp

DataCorrupted · 2025-01-14T00:42:12Z

lld/MachO/MapFile.cpp

+    size_t size2 = arr2->size();
+    while (i < size1 || j < size2) {
+      if (i < size1 &&
+          (j >= size2 || arr1[i]->outSecOff <= (*arr2)[j]->outSecOff)) {


I agree with the general logic, but (*arr2)[j] is not so please to look at, why not ArrayRef? IIRC you can have ArrayRef pointing to nullptrs and it would be prettier.

Nice, changing it to ArrayRef simplified the code / logic a lot.

tstellar · 2025-01-14T01:26:47Z

@alx32 Thank you! This is a massive improvement compared even to before your original change:

/usr/bin/time -v ./lld-build/bin/llvm-lit -v lld/test/MachO/arm64-thunks.s 2>&1| grep -e "Testing Time" -e "Maximum resident set size"

Before 162814a:

Testing Time: 4.56s
        Maximum resident set size (kbytes): 1666472

After 162814a:

Testing Time: 17.61s
        Maximum resident set size (kbytes): 2984488

This PR:

Testing Time: 1.92s
        Maximum resident set size (kbytes): 617964`

lld/MachO/MapFile.cpp

llvm-ci · 2025-01-16T07:08:29Z

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building lld at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/11172

Here is the relevant piece of the build log for the reference

Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentDelayedCrashWithBreakpointSignal.py (617 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentDelayedCrashWithBreakpointWatchpoint.py (618 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentManyBreakpoints.py (619 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalBreak.py (620 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentNWatchNBreak.py (621 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalDelayBreak.py (622 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentManyCrash.py (623 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentManySignals.py (624 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentManyWatchpoints.py (625 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalWatch.py (626 of 2066)
FAIL: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalWatchBreak.py (627 of 2066)
******************** TEST 'lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalWatchBreak.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/functionalities/thread/concurrent_events -p TestConcurrentSignalWatchBreak.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 20.0.0git (https://github.com/llvm/llvm-project.git revision 95d21f6015241f1fbf36e495f101080bdcee8cd4)
  clang revision 95d21f6015241f1fbf36e495f101080bdcee8cd4
  llvm revision 95d21f6015241f1fbf36e495f101080bdcee8cd4
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

Watchpoint 1 hit:
old value: 0
new value: 1

--
Command Output (stderr):
--
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test (TestConcurrentSignalWatchBreak.ConcurrentSignalWatchBreak)
======================================================================
FAIL: test (TestConcurrentSignalWatchBreak.ConcurrentSignalWatchBreak)
   Test a signal/watchpoint/breakpoint in multiple threads.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 148, in wrapper
    return func(*args, **kwargs)
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/functionalities/thread/concurrent_events/TestConcurrentSignalWatchBreak.py", line 15, in test
    self.do_thread_actions(
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/concurrent_base.py", line 333, in do_thread_actions
    self.assertEqual(
AssertionError: 1 != 2 : Expected 1 stops due to signal delivery, but got 2
Config=aarch64-/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang
----------------------------------------------------------------------
Ran 1 test in 0.663s

alx32 requested review from tstellar, ellishg, DataCorrupted and kyulee-com January 13, 2025 20:33

alx32 marked this pull request as ready for review January 13, 2025 20:33

llvmbot added lld lld:MachO labels Jan 13, 2025

alx32 mentioned this pull request Jan 13, 2025

[lld-macho] Include branch extension thunks in linker map #120496

Merged

DataCorrupted reviewed Jan 14, 2025

View reviewed changes

alx32 and others added 3 commits January 13, 2025 16:47

[lld-macho] Reduce memory usage of prining thunks in map file

c6f0916

Fix high memory usage due to strings

b24f4fa

Address feedback Nr.1

8f4b2cd

alx32 force-pushed the 14_map_thunks_mem branch from adb4e0d to 8f4b2cd Compare January 14, 2025 01:04

ellishg reviewed Jan 14, 2025

View reviewed changes

lld/MachO/MapFile.cpp Outdated Show resolved Hide resolved

Address Feedback Nr.2

9ffe3e2

kyulee-com approved these changes Jan 16, 2025

View reviewed changes

alx32 merged commit 95d21f6 into llvm:main Jan 16, 2025
8 checks passed

nikic mentioned this pull request Jan 27, 2025

Snapshot for 20250111, v20.0.0, 9248428 (big-merge) fedora-llvm-team/llvm-snapshots#1005

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lld-macho] Reduce memory usage of printing thunks in map file #122785

[lld-macho] Reduce memory usage of printing thunks in map file #122785

Uh oh!

alx32 commented Jan 13, 2025 •

edited

Loading

Uh oh!

llvmbot commented Jan 13, 2025 •

edited

Loading

Uh oh!

tstellar commented Jan 13, 2025 •

edited

Loading

Uh oh!

alx32 commented Jan 13, 2025 •

edited

Loading

Uh oh!

tstellar commented Jan 13, 2025

Uh oh!

alx32 commented Jan 14, 2025

Uh oh!

Uh oh!

Uh oh!

DataCorrupted Jan 14, 2025

Uh oh!

alx32 Jan 14, 2025

Uh oh!

tstellar commented Jan 14, 2025

Uh oh!

Uh oh!

Uh oh!

llvm-ci commented Jan 16, 2025

Uh oh!

Uh oh!

[lld-macho] Reduce memory usage of printing thunks in map file #122785

[lld-macho] Reduce memory usage of printing thunks in map file #122785

Uh oh!

Conversation

alx32 commented Jan 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tstellar commented Jan 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alx32 commented Jan 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tstellar commented Jan 13, 2025

Uh oh!

alx32 commented Jan 14, 2025

Uh oh!

Uh oh!

Uh oh!

DataCorrupted Jan 14, 2025

Choose a reason for hiding this comment

Uh oh!

alx32 Jan 14, 2025

Choose a reason for hiding this comment

Uh oh!

tstellar commented Jan 14, 2025

Uh oh!

Uh oh!

Uh oh!

llvm-ci commented Jan 16, 2025

Uh oh!

Uh oh!

alx32 commented Jan 13, 2025 •

edited

Loading

llvmbot commented Jan 13, 2025 •

edited

Loading

tstellar commented Jan 13, 2025 •

edited

Loading

alx32 commented Jan 13, 2025 •

edited

Loading