Skip to content

[lld-macho] Reduce memory usage of printing thunks in map file #122785

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 16, 2025

Conversation

alx32
Copy link
Contributor

@alx32 alx32 commented Jan 13, 2025

This commit improves the memory efficiency of the lld-macho linker by optimizing how thunks are printed in the map file. Previously, merging vectors of input sections required creating a temporary vector, which increased memory usage and in some cases caused the linker to run out of memory as reported in comments on #120496. The new approach interleaves the printing of two arrays of ConcatInputSection in sorted order without allocating additional memory for a merged array.

@alx32 alx32 marked this pull request as ready for review January 13, 2025 20:33
@llvmbot
Copy link
Member

llvmbot commented Jan 13, 2025

@llvm/pr-subscribers-lld-macho

@llvm/pr-subscribers-lld

Author: None (alx32)

Changes

This commit improves the memory efficiency of the lld-macho linker by optimizing how thunks are printed in the map file. Previously, merging vectors of input sections required creating a temporary vector, which increased memory usage and in some cases caused the linker to run out of memory. The new approach interleaves the printing of two arrays of ConcatInputSection in sorted order without allocating additional memory for a merged array.


Full diff: https://github.com/llvm/llvm-project/pull/122785.diff

2 Files Affected:

  • (modified) lld/MachO/ConcatOutputSection.h (+1-1)
  • (modified) lld/MachO/MapFile.cpp (+33-22)
diff --git a/lld/MachO/ConcatOutputSection.h b/lld/MachO/ConcatOutputSection.h
index 8131c48d31113e..c2449e82b73ad1 100644
--- a/lld/MachO/ConcatOutputSection.h
+++ b/lld/MachO/ConcatOutputSection.h
@@ -72,7 +72,7 @@ class TextOutputSection : public ConcatOutputSection {
   void finalizeContents() override {}
   void finalize() override;
   bool needsThunks() const;
-  ArrayRef<ConcatInputSection *> getThunks() const { return thunks; }
+  const std::vector<ConcatInputSection *> &getThunks() const { return thunks; }
   void writeTo(uint8_t *buf) const override;
 
   static bool classof(const OutputSection *sec) {
diff --git a/lld/MachO/MapFile.cpp b/lld/MachO/MapFile.cpp
index 12417df8cecb8c..2aa91af9cc7112 100644
--- a/lld/MachO/MapFile.cpp
+++ b/lld/MachO/MapFile.cpp
@@ -161,20 +161,6 @@ static uint64_t getSymSizeForMap(Defined *sym) {
   return sym->size;
 }
 
-// Merges two vectors of input sections in order of their outSecOff values.
-// This approach creates a new (temporary) vector which is not ideal but the
-// ideal approach leads to a lot of code duplication.
-static std::vector<ConcatInputSection *>
-mergeOrderedInputs(ArrayRef<ConcatInputSection *> inputs1,
-                   ArrayRef<ConcatInputSection *> inputs2) {
-  std::vector<ConcatInputSection *> vec(inputs1.size() + inputs2.size());
-  std::merge(inputs1.begin(), inputs1.end(), inputs2.begin(), inputs2.end(),
-             vec.begin(), [](ConcatInputSection *a, ConcatInputSection *b) {
-               return a->outSecOff < b->outSecOff;
-             });
-  return vec;
-}
-
 void macho::writeMapFile() {
   if (config->mapFile.empty())
     return;
@@ -217,15 +203,42 @@ void macho::writeMapFile() {
                    seg->name.str().c_str(), osec->name.str().c_str());
     }
 
-  // Shared function to print an array of symbols.
-  auto printIsecArrSyms = [&](const std::vector<ConcatInputSection *> &arr) {
-    for (const ConcatInputSection *isec : arr) {
+  // Shared function to print one or two arrays of ConcatInputSection in
+  // ascending outSecOff order. The second array is optional; if provided, we
+  // interleave the printing in sorted order without allocating a merged temp
+  // array.
+  auto printIsecArrSyms = [&](const std::vector<ConcatInputSection *> &arr1,
+                              const std::vector<ConcatInputSection *> *arr2 =
+                                  nullptr) {
+    // Helper lambda that prints all symbols from one ConcatInputSection.
+    auto printOne = [&](const ConcatInputSection *isec) {
       for (Defined *sym : isec->symbols) {
-        if (!(isPrivateLabel(sym->getName()) && getSymSizeForMap(sym) == 0))
+        if (!(isPrivateLabel(sym->getName()) && getSymSizeForMap(sym) == 0)) {
           os << format("0x%08llX\t0x%08llX\t[%3u] %s\n", sym->getVA(),
                        getSymSizeForMap(sym),
-                       readerToFileOrdinal[sym->getFile()],
+                       readerToFileOrdinal.lookup(sym->getFile()),
                        sym->getName().str().data());
+        }
+      }
+    };
+
+    // If there is only one array, print all symbols from it and return.
+    // This simplifies the logic for the merge case below.
+    if (!arr2) {
+      for (const ConcatInputSection *isec : arr1)
+        printOne(isec);
+      return;
+    }
+
+    size_t i = 0, j = 0;
+    size_t size1 = arr1.size();
+    size_t size2 = arr2->size();
+    while (i < size1 || j < size2) {
+      if (i < size1 &&
+          (j >= size2 || arr1[i]->outSecOff <= (*arr2)[j]->outSecOff)) {
+        printOne(arr1[i++]);
+      } else if (j < size2) {
+        printOne((*arr2)[j++]);
       }
     }
   };
@@ -235,9 +248,7 @@ void macho::writeMapFile() {
   for (const OutputSegment *seg : outputSegments) {
     for (const OutputSection *osec : seg->getSections()) {
       if (auto *textOsec = dyn_cast<TextOutputSection>(osec)) {
-        auto inputsAndThunks =
-            mergeOrderedInputs(textOsec->inputs, textOsec->getThunks());
-        printIsecArrSyms(inputsAndThunks);
+        printIsecArrSyms(textOsec->inputs, &textOsec->getThunks());
       } else if (auto *concatOsec = dyn_cast<ConcatOutputSection>(osec)) {
         printIsecArrSyms(concatOsec->inputs);
       } else if (osec == in.cStringSection || osec == in.objcMethnameSection) {

@tstellar
Copy link
Collaborator

tstellar commented Jan 13, 2025

I don't see any changes in memory usage with this PR. Were you able to see a difference when running: /usr/bin/time -v ./lld-build/bin/llvm-lit -v lld/test/MachO/arm64-thunks.s 2>&1| grep -e "Testing Time" -e "Maximum resident set size"

@alx32
Copy link
Contributor Author

alx32 commented Jan 13, 2025

I ran those commands from your previous comment and didn't see memory changes and also got results very different than your previous results - so assumed that it's platform / Linux distro specific and that this feature doesn't work for my setup.

This does eliminate the extra temporary array so there is less memory being used for sure.
But perhaps the actual extra memory is coming from a different place. I'll look into more ...

@tstellar
Copy link
Collaborator

These are the CMake arguments I used to build LLVM if it helps: cmake -G Ninja -B lld-build -S llvm/ -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;AArch64" -DLLVM_ENABLE_PROJECTS=lld

@alx32
Copy link
Contributor Author

alx32 commented Jan 14, 2025

Latest commit should address the other (main) reason for high mem usage - the __cstring section being composed of lots of `\0' characters - which lead to each char being interpreted as its own string.

size_t size2 = arr2->size();
while (i < size1 || j < size2) {
if (i < size1 &&
(j >= size2 || arr1[i]->outSecOff <= (*arr2)[j]->outSecOff)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the general logic, but (*arr2)[j] is not so please to look at, why not ArrayRef? IIRC you can have ArrayRef pointing to nullptrs and it would be prettier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, changing it to ArrayRef simplified the code / logic a lot.

@alx32 alx32 force-pushed the 14_map_thunks_mem branch from adb4e0d to 8f4b2cd Compare January 14, 2025 01:04
@tstellar
Copy link
Collaborator

@alx32 Thank you! This is a massive improvement compared even to before your original change:

/usr/bin/time -v ./lld-build/bin/llvm-lit -v lld/test/MachO/arm64-thunks.s 2>&1| grep -e "Testing Time" -e "Maximum resident set size"

Before 162814a:

Testing Time: 4.56s
        Maximum resident set size (kbytes): 1666472

After 162814a:

Testing Time: 17.61s
        Maximum resident set size (kbytes): 2984488

This PR:

Testing Time: 1.92s
        Maximum resident set size (kbytes): 617964`

@alx32 alx32 merged commit 95d21f6 into llvm:main Jan 16, 2025
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 16, 2025

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building lld at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/11172

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentDelayedCrashWithBreakpointSignal.py (617 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentDelayedCrashWithBreakpointWatchpoint.py (618 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentManyBreakpoints.py (619 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalBreak.py (620 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentNWatchNBreak.py (621 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalDelayBreak.py (622 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentManyCrash.py (623 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentManySignals.py (624 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentManyWatchpoints.py (625 of 2066)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalWatch.py (626 of 2066)
FAIL: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalWatchBreak.py (627 of 2066)
******************** TEST 'lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalWatchBreak.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/functionalities/thread/concurrent_events -p TestConcurrentSignalWatchBreak.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 20.0.0git (https://github.com/llvm/llvm-project.git revision 95d21f6015241f1fbf36e495f101080bdcee8cd4)
  clang revision 95d21f6015241f1fbf36e495f101080bdcee8cd4
  llvm revision 95d21f6015241f1fbf36e495f101080bdcee8cd4
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

Watchpoint 1 hit:
old value: 0
new value: 1

--
Command Output (stderr):
--
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test (TestConcurrentSignalWatchBreak.ConcurrentSignalWatchBreak)
======================================================================
FAIL: test (TestConcurrentSignalWatchBreak.ConcurrentSignalWatchBreak)
   Test a signal/watchpoint/breakpoint in multiple threads.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 148, in wrapper
    return func(*args, **kwargs)
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/functionalities/thread/concurrent_events/TestConcurrentSignalWatchBreak.py", line 15, in test
    self.do_thread_actions(
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/concurrent_base.py", line 333, in do_thread_actions
    self.assertEqual(
AssertionError: 1 != 2 : Expected 1 stops due to signal delivery, but got 2
Config=aarch64-/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang
----------------------------------------------------------------------
Ran 1 test in 0.663s


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants