[ThinLTO] Asynchronous caching #8236
Conversation
Just noting that there are still some parts that need more polish. I'm looking for high-level comments and discussion on whether this level of additional complexity makes sense.
Prior to this patch, the implementation of `RemoteModuleCacheEntry` assumed `tryLoadingBuffer()` always finishes before `write()` gets called. This assumption was leveraged to perform determinism checks in the `write()` function. In the future, these two functions will run concurrently, and the assumption will no longer hold. This patch extracts the determinism check into a new function, `areLoadedAndWrittenResultsIdentical()`, that is guaranteed to be called only from a context where both of those functions have already returned.

This patch also takes care to avoid concurrent writes into a single file from those two functions. This is done by manipulating three files:

* The ".downloaded.tmp" file, where we download the cached buffer.
* The ".computed.tmp" file, where we stream out the computed buffer.
* The file at `OutputPath`, where we atomically move either of the above. This file also acts as the backing buffer for the "UseBufferAPI" mode.

Note that there's no synchronization between the atomic moves, so it's possible for `tryLoadingBuffer()` and `getMappedBuffer()` to return the buffer written out by `write()`. In practice, we expect these buffers to be identical (and we verify that with the determinism checks), so this should be fine.
This is supposed to be an NFC refactoring that makes the following commit more obvious.
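For illustration, the two-temp-file scheme described above could look roughly like this. This is a hedged sketch, not the patch's actual code; `OutputPath` matches the description above, while the helper names are hypothetical:

```cpp
// Sketch only: each task streams into its own temporary file and then
// atomically renames it over OutputPath, so readers never observe a
// partially written file.
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/FileSystem.h"

// Hypothetical helper: finalize the buffer downloaded from the cache.
std::error_code finalizeDownloadedBuffer(llvm::StringRef OutputPath,
                                         llvm::StringRef DownloadedTmp) {
  // rename() is atomic on POSIX, so a concurrent reader of OutputPath
  // sees either the old contents or the new ones, never a mix.
  return llvm::sys::fs::rename(DownloadedTmp, OutputPath);
}

// Hypothetical helper: finalize the locally computed buffer.
std::error_code finalizeComputedBuffer(llvm::StringRef OutputPath,
                                       llvm::StringRef ComputedTmp) {
  // The same pattern; whichever rename lands last "wins", which is fine
  // because the buffers are expected to be bit-identical (verified by
  // the determinism checks).
  return llvm::sys::fs::rename(ComputedTmp, OutputPath);
}
```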
This patch makes the ThinLTO steps asynchronous, adds async APIs to `ModuleCacheEntry`, and actually implements `tryLoadingBuffer()` asynchronously. For now, `write()` remains synchronous.
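Not the actual patch, but a minimal sketch of what a callback-based async lookup API could look like (all names beyond `ModuleCacheEntry`, `tryLoadingBuffer`, and `write` are hypothetical, and the real interface may differ):

```cpp
// Hypothetical shape of an asynchronous cache-lookup API.
#include <functional>
#include <memory>
#include "llvm/Support/Error.h"
#include "llvm/Support/MemoryBuffer.h"

class ModuleCacheEntry {
public:
  virtual ~ModuleCacheEntry() = default;

  // Kicks off the lookup and returns immediately; Callback fires on an
  // unspecified thread once the cached buffer is available (or the
  // lookup fails / misses).
  virtual void tryLoadingBufferAsync(
      std::function<void(llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>)>
          Callback) = 0;

  // Writing the computed buffer stays synchronous for now.
  virtual void write(const llvm::MemoryBuffer &Buffer) = 0;
};
```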
It assumed the cache is always written before the object file.
Force-pushed from e68c156 to 8fc1687 (compare)
Both @cachemeifyoucan and @akyrtzi wanted to avoid diverging even more than we currently do from the upstream ThinLTO implementation. The newest commit makes the changes to …
Some small nitpicks. The diff is still quite large but seems more manageable.
```cpp
});
}

// Parallel optimizer + codegen
```
Is this block the same as the one below?
No, this block is pretty different from the existing logic. Only the chunk of code between lines 1960 and 1986 is identical, and I think it's still better to duplicate that in order to make the downstream diff more self-contained.
```cpp
@@ -1364,6 +1526,48 @@ ThinLTOCodeGenerator::writeGeneratedObject(StringRef OutputPath,
  return std::string(OutputPath);
}

template <class T>
```
Nit: Move to a local header in lib/LTO
It's only used in this .cpp file, so I'd prefer to keep it here to reduce the spread of our downstream diff.
[ThinLTO] Asynchronous caching (cherry picked from commit 03af59c)
This patch makes ThinLTO caching asynchronous, solving the problem of I/O-bound cache operations blocking the thread pool for CPU-bound codegen.
We spawn asynchronous cache-lookup queries for all modules at the start and then proactively start performing codegen locally. Each module thus has these two tasks racing. In the normal mode, we pick the winner to be written out as the final object file. With determinism checks on, we wait for both tasks to finish and compare their results.
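As an illustration of the racing scheme, here is a plain standard-library sketch (not the patch's actual code; all names are made up):

```cpp
// Sketch of racing a cache lookup against local codegen: the first task
// to finish publishes its result. With determinism checks we'd instead
// inspect both results and compare them.
#include <atomic>
#include <functional>
#include <future>
#include <optional>
#include <string>
#include <utility>

struct ModuleJob {
  std::atomic<bool> Resolved{false};
  std::promise<std::string> Winner; // holds the final object-file buffer

  void publish(std::string Buffer) {
    // Only the first finisher gets to set the promise.
    if (!Resolved.exchange(true))
      Winner.set_value(std::move(Buffer));
  }
};

std::string
raceLookupAndCodegen(ModuleJob &Job,
                     std::function<std::optional<std::string>()> CacheLookup,
                     std::function<std::string()> Codegen) {
  auto Result = Job.Winner.get_future();
  auto Lookup = std::async(std::launch::async, [&] {
    if (auto Hit = CacheLookup()) // a miss simply never publishes
      Job.publish(std::move(*Hit));
  });
  auto Local = std::async(std::launch::async, [&] { Job.publish(Codegen()); });
  // In the real patch the loser keeps running (or gets cancelled) in the
  // background; std::async's blocking destructors make this sketch join
  // both tasks before returning, which is what determinism checks need
  // anyway.
  return Result.get();
}
```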
Our current implementation of the remote caching service doesn't handle concurrent read and write requests well, so we defer cache writes until all reads have finished.
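The "writes wait for reads" barrier could be as simple as a counter plus a condition variable. Again a sketch with illustrative names, not code from the patch:

```cpp
// Sketch of a read barrier: cache writes block until every outstanding
// cache read has completed.
#include <condition_variable>
#include <mutex>

class CacheReadBarrier {
  std::mutex M;
  std::condition_variable AllReadsDone;
  unsigned PendingReads = 0;

public:
  void readStarted() {
    std::lock_guard<std::mutex> L(M);
    ++PendingReads;
  }
  void readFinished() {
    std::lock_guard<std::mutex> L(M);
    if (--PendingReads == 0)
      AllReadsDone.notify_all();
  }
  // Called before issuing any cache write.
  void waitForReads() {
    std::unique_lock<std::mutex> L(M);
    AllReadsDone.wait(L, [&] { return PendingReads == 0; });
  }
};
```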
We have a cancellation mechanism in place so that even with a particularly slow remote cache, we're not much slower than with no cache at all.
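The cancellation bound could amount to a timed wait on the lookup; the 5-second constant below mirrors the hard-coded timeout mentioned in this description, but the function itself is hypothetical:

```cpp
// Sketch: give a slow remote lookup a bounded wait; on timeout, treat
// it as a cache miss and let the locally computed object win.
#include <chrono>
#include <future>
#include <optional>
#include <string>

std::optional<std::string>
lookupWithTimeout(std::future<std::optional<std::string>> &Lookup,
                  std::chrono::seconds Timeout = std::chrono::seconds(5)) {
  if (Lookup.wait_for(Timeout) == std::future_status::ready)
    return Lookup.get();
  return std::nullopt; // Too slow: fall back to local codegen's result.
}
```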
Below are my wall-time measurements of the parallel part of `ThinLTOCodeGenerator::run()`, optimizing 3019 modules using our remote caching service.

Before this patch, using a particularly slow cache would result in linear, unbounded growth of the wall time, while the worst case after this patch is essentially an empty local cache plus some timeout constant (currently hard-coded to 5s).