Make WriteIndexesThinBackend multi threaded #109847
Conversation
Cool! I'm interested in reviewing this change when it is ready.
Some LLDB tests timed out, but I'm relatively certain I haven't caused that.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-lto

Author: Nuri Amari (NuriAmari)

Changes

We've noticed that for large builds, executing the thin-link can take on the order of tens of minutes. We are only using a single thread to write the sharded indices and import files for each input bitcode file. While we need to ensure the index file produced lists modules in a deterministic order, that doesn't prevent us from executing the rest of the work in parallel. In this change we use a thread pool to execute as much of the backend's work as possible in parallel. In local testing on a machine with 80 cores, this change makes a thin-link for ~100,000 input files run in ~2 minutes. Without this change it takes upwards of 10 minutes.

Full diff: https://github.com/llvm/llvm-project/pull/109847.diff

1 file affected: llvm/lib/LTO/LTO.cpp (modified)
diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index a88124dacfaefd..78084c7aedcd91 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -1395,11 +1395,12 @@ class lto::ThinBackendProc {
MapVector<StringRef, BitcodeModule> &ModuleMap) = 0;
virtual Error wait() = 0;
virtual unsigned getThreadCount() = 0;
+ virtual bool isSensitiveToInputOrder() { return false; }
// Write sharded indices and (optionally) imports to disk
Error emitFiles(const FunctionImporter::ImportMapTy &ImportList,
llvm::StringRef ModulePath,
- const std::string &NewModulePath) {
+ const std::string &NewModulePath) const {
ModuleToSummariesForIndexTy ModuleToSummariesForIndex;
GVSummaryPtrSet DeclarationSummaries;
@@ -1613,6 +1614,10 @@ namespace {
class WriteIndexesThinBackend : public ThinBackendProc {
std::string OldPrefix, NewPrefix, NativeObjectPrefix;
raw_fd_ostream *LinkedObjectsFile;
+ DefaultThreadPool BackendThreadPool;
+ std::optional<Error> Err;
+ std::mutex ErrMu;
+ std::mutex OnWriteMu;
public:
WriteIndexesThinBackend(
@@ -1634,8 +1639,6 @@ class WriteIndexesThinBackend : public ThinBackendProc {
const std::map<GlobalValue::GUID, GlobalValue::LinkageTypes> &ResolvedODR,
MapVector<StringRef, BitcodeModule> &ModuleMap) override {
StringRef ModulePath = BM.getModuleIdentifier();
- std::string NewModulePath =
- getThinLTOOutputFile(ModulePath, OldPrefix, NewPrefix);
if (LinkedObjectsFile) {
std::string ObjectPrefix =
@@ -1645,19 +1648,48 @@ class WriteIndexesThinBackend : public ThinBackendProc {
*LinkedObjectsFile << LinkedObjectsFilePath << '\n';
}
- if (auto E = emitFiles(ImportList, ModulePath, NewModulePath))
- return E;
+ BackendThreadPool.async(
+ [this](const StringRef ModulePath,
+ const FunctionImporter::ImportMapTy &ImportList,
+ const std::string &OldPrefix, const std::string &NewPrefix) {
+ std::string NewModulePath =
+ getThinLTOOutputFile(ModulePath, OldPrefix, NewPrefix);
+ auto E = emitFiles(ImportList, ModulePath, NewModulePath);
+ if (E) {
+ std::unique_lock<std::mutex> L(ErrMu);
+ if (Err)
+ Err = joinErrors(std::move(*Err), std::move(E));
+ else
+ Err = std::move(E);
+ return;
+ }
+ if (OnWrite) {
+ // Serialize calls to the on write callback in case it is not thread
+ // safe
+ std::unique_lock<std::mutex> L(OnWriteMu);
+ OnWrite(std::string(ModulePath));
+ }
+ },
+ ModulePath, ImportList, OldPrefix, NewPrefix);
+ return Error::success();
+ }
- if (OnWrite)
- OnWrite(std::string(ModulePath));
+ Error wait() override {
+ BackendThreadPool.wait();
+ if (Err)
+ return std::move(*Err);
return Error::success();
}
- Error wait() override { return Error::success(); }
+ unsigned getThreadCount() override {
+ return BackendThreadPool.getMaxConcurrency();
+ }
- // WriteIndexesThinBackend should always return 1 to prevent module
- // re-ordering and avoid non-determinism in the final link.
- unsigned getThreadCount() override { return 1; }
+ bool isSensitiveToInputOrder() override {
+ // The order which modules are written to LinkedObjectsFile should be
+ // deterministic and match the order they are passed on the command line.
+ return true;
+ }
};
} // end anonymous namespace
@@ -1856,7 +1888,8 @@ Error LTO::runThinLTO(AddStreamFn AddStream, FileCache Cache,
ThinLTO.ModuleMap);
};
- if (BackendProc->getThreadCount() == 1) {
+ if (BackendProc->getThreadCount() == 1 ||
+ BackendProc->isSensitiveToInputOrder()) {
// Process the modules in the order they were provided on the command-line.
// It is important for this codepath to be used for WriteIndexesThinBackend,
// to ensure the emitted LinkedObjectsFile lists ThinLTO objects in the same
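For readers unfamiliar with the pattern the diff leans on, here is a minimal, self-contained sketch of LLVM's DefaultThreadPool combined with joinErrors under a mutex. The file names and the doPerModuleWork function are placeholders for illustration; this is not the backend code itself.

```cpp
// Sketch only: the DefaultThreadPool + joinErrors pattern used by the patch,
// with placeholder work instead of emitFiles().
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/raw_ostream.h"
#include <mutex>
#include <optional>

using namespace llvm;

// Placeholder for the per-module work (emitFiles() in the real backend).
static Error doPerModuleWork(StringRef ModulePath) {
  (void)ModulePath;
  return Error::success();
}

int main() {
  DefaultThreadPool Pool;   // default strategy, as in the patch
  std::optional<Error> Err; // combined failure from any worker, if any
  std::mutex ErrMu;         // serializes access to Err across workers

  for (StringRef Path : {"a.bc", "b.bc", "c.bc"}) {
    Pool.async(
        [&Err, &ErrMu](StringRef ModulePath) {
          if (Error E = doPerModuleWork(ModulePath)) {
            std::unique_lock<std::mutex> L(ErrMu);
            if (Err)
              Err = joinErrors(std::move(*Err), std::move(E));
            else
              Err = std::move(E);
          }
        },
        Path);
  }

  Pool.wait(); // the backend does the same in wait()
  if (Err) {
    logAllUnhandledErrors(std::move(*Err), errs(), "thin-link sketch: ");
    return 1;
  }
  return 0;
}
```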
Thanks for this patch. I have a few suggestions below. Have you measured the compile time when the max concurrency is 1 or a small value? I'm wondering if there is overhead.
I haven't, but I'll give it a try.
These measurements were taken with a non-release build.

Without this patch, altogether I measured:

With this patch, but with the thread pool hard-coded to max concurrency = 1:

With this patch, but with the thread pool hard-coded to max concurrency = 2:

I imagine the overhead would be reduced with a release build, and we still see a benefit from 2 threads onwards.
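For reference, the "hard-coded max concurrency" runs above presumably amount to constructing the pool with an explicit ThreadPoolStrategy instead of the default. A minimal sketch of what that temporary edit could look like (variable names are illustrative):

```cpp
#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/Threading.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

void concurrencySketch() {
  // What the patch declares: a pool with the default strategy.
  DefaultThreadPool DefaultPool;

  // "Max concurrency = 1" measurement: pin the pool to a single worker.
  DefaultThreadPool SingleWorker(hardware_concurrency(1));

  // "Max concurrency = 2" measurement.
  DefaultThreadPool TwoWorkers(hardware_concurrency(2));

  // getMaxConcurrency() is what the backend reports from getThreadCount().
  outs() << SingleWorker.getMaxConcurrency() << " "
         << TwoWorkers.getMaxConcurrency() << " "
         << DefaultPool.getMaxConcurrency() << "\n";
}
```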
I've been meaning to test this in our build system; I'll try to get to that tomorrow. That's a noticeable enough overhead that I want to be sure some parallelism will kick in and behave properly.
I did some testing with a large thin link and the results look really good! A couple of comments below; mostly we need to make sure the parallelism is adjustable, as it is for in-process ThinLTO.
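One way to make the parallelism adjustable, mirroring how createInProcessThinBackend already takes an llvm::ThreadPoolStrategy, would be to plumb a strategy into the backend's pool. A hypothetical sketch; the class, its Parallelism parameter, and the plumbing are assumptions for illustration, not part of this PR:

```cpp
#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/Threading.h"

namespace {

// Hypothetical: a write-indexes-style backend whose pool size comes from a
// caller-provided ThreadPoolStrategy, as the in-process backend's does.
class SizedWriteIndexesBackendSketch {
  llvm::DefaultThreadPool BackendThreadPool;

public:
  explicit SizedWriteIndexesBackendSketch(llvm::ThreadPoolStrategy Parallelism)
      : BackendThreadPool(Parallelism) {}

  unsigned getThreadCount() { return BackendThreadPool.getMaxConcurrency(); }
};

} // end anonymous namespace

// Usage: cap the backend at, e.g., 8 worker threads.
void exampleUsage() {
  SizedWriteIndexesBackendSketch Backend(
      llvm::heavyweight_hardware_concurrency(8));
  (void)Backend.getThreadCount();
}
```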
✅ With the latest revision this PR passed the C/C++ code formatter.
lgtm
Thanks for the review, @teresajohnson!
llvm#109847 inadvertently introduced compile errors to the gold plugin. This PR fixes the issue.
This seems to have been overlooked in #109847, probably because most bots don't build w/ gold enabled.