[lld-macho] Use parallel algorithms in favor of ThreadPool #99471

Merged
merged 1 commit into from
Jul 22, 2024

Conversation

BertalanD
Member

In https://reviews.llvm.org/D115416, it was decided that an explicit thread pool should be used instead of the simpler fork-join model of the parallelFor* family of functions. Since then, more parallelism has been added to LLD, but those changes always used the latter strategy, as the other ports of LLD do.

This meant that we ended up spawning twice the requested number of threads: one set for the llvm/Support/Parallel.h executor, and one for the thread pool.

Since that decision, 3b4d800 has landed, which allows us to explicitly enqueue jobs on the executor pool of the parallel algorithms; this is enough to achieve sharded output writing and parallelized input file parsing. The only remaining task that should run concurrently with the other linking steps is the construction of the map file, so this commit spawns a dedicated worker thread for it.

@llvmbot
Member

llvmbot commented Jul 18, 2024

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-macho

Author: Daniel Bertalan (BertalanD)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/99471.diff

1 file affected:

  • (modified) lld/MachO/Writer.cpp (+17-18)
diff --git a/lld/MachO/Writer.cpp b/lld/MachO/Writer.cpp
index e6b80c1d42d9e..0eb809282af28 100644
--- a/lld/MachO/Writer.cpp
+++ b/lld/MachO/Writer.cpp
@@ -28,8 +28,8 @@
 #include "llvm/Support/LEB128.h"
 #include "llvm/Support/Parallel.h"
 #include "llvm/Support/Path.h"
-#include "llvm/Support/ThreadPool.h"
 #include "llvm/Support/TimeProfiler.h"
+#include "llvm/Support/thread.h"
 #include "llvm/Support/xxhash.h"
 
 #include <algorithm>
@@ -66,7 +66,6 @@ class Writer {
 
   template <class LP> void run();
 
-  DefaultThreadPool threadPool;
   std::unique_ptr<FileOutputBuffer> &buffer;
   uint64_t addr = 0;
   uint64_t fileOff = 0;
@@ -1121,14 +1120,12 @@ void Writer::finalizeLinkEditSegment() {
       symtabSection,     indirectSymtabSection,
       dataInCodeSection, functionStartsSection,
   };
-  SmallVector<std::shared_future<void>> threadFutures;
-  threadFutures.reserve(linkEditSections.size());
-  for (LinkEditSection *osec : linkEditSections)
-    if (osec)
-      threadFutures.emplace_back(threadPool.async(
-          [](LinkEditSection *osec) { osec->finalizeContents(); }, osec));
-  for (std::shared_future<void> &future : threadFutures)
-    future.wait();
+
+  parallelForEach(linkEditSections.begin(), linkEditSections.end(),
+                  [](LinkEditSection *osec) {
+                    if (osec)
+                      osec->finalizeContents();
+                  });
 
   // Now that __LINKEDIT is filled out, do a proper calculation of its
   // addresses and offsets.
@@ -1170,6 +1167,8 @@ void Writer::openFile() {
 }
 
 void Writer::writeSections() {
+  TimeTraceScope timeScope("Write output sections");
+
   uint8_t *buf = buffer->getBufferStart();
   std::vector<const OutputSection *> osecs;
   for (const OutputSegment *seg : outputSegments)
@@ -1200,18 +1199,15 @@ void Writer::writeUuid() {
 
   ArrayRef<uint8_t> data{buffer->getBufferStart(), buffer->getBufferEnd()};
   std::vector<ArrayRef<uint8_t>> chunks = split(data, 1024 * 1024);
+
   // Leave one slot for filename
   std::vector<uint64_t> hashes(chunks.size() + 1);
-  SmallVector<std::shared_future<void>> threadFutures;
-  threadFutures.reserve(chunks.size());
-  for (size_t i = 0; i < chunks.size(); ++i)
-    threadFutures.emplace_back(threadPool.async(
-        [&](size_t j) { hashes[j] = xxh3_64bits(chunks[j]); }, i));
-  for (std::shared_future<void> &future : threadFutures)
-    future.wait();
+  parallelFor(0, chunks.size(),
+              [&](size_t i) { hashes[i] = xxh3_64bits(chunks[i]); });
   // Append the output filename so that identical binaries with different names
   // don't get the same UUID.
   hashes[chunks.size()] = xxh3_64bits(sys::path::filename(config->finalOutput));
+
   uint64_t digest = xxh3_64bits({reinterpret_cast<uint8_t *>(hashes.data()),
                                  hashes.size() * sizeof(uint64_t)});
   uuidCommand->writeUuid(digest);
@@ -1330,15 +1326,18 @@ template <class LP> void Writer::run() {
   sortSegmentsAndSections();
   createLoadCommands<LP>();
   finalizeAddresses();
-  threadPool.async([&] {
+
+  llvm::thread mapFileWriter([&] {
     if (LLVM_ENABLE_THREADS && config->timeTraceEnabled)
       timeTraceProfilerInitialize(config->timeTraceGranularity, "writeMapFile");
     writeMapFile();
     if (LLVM_ENABLE_THREADS && config->timeTraceEnabled)
       timeTraceProfilerFinishThread();
   });
+
   finalizeLinkEditSegment();
   writeOutputFile();
+  mapFileWriter.join();
 }
 
 template <class LP> void macho::writeResult() { Writer().run<LP>(); }

Contributor

@nico nico left a comment


This resolves https://reviews.llvm.org/D137368#3987994, which is great.

I'm a big fan of this patch, thanks!

@BertalanD BertalanD merged commit f18fd6e into llvm:main Jul 22, 2024
10 checks passed
@BertalanD BertalanD deleted the thread-pool branch July 22, 2024 06:13
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Differential Revision: https://phabricator.intern.facebook.com/D60251328
4 participants