

[ThinLTO] Asynchronous caching #8236

Merged 12 commits from jan_svoboda/async-thinlto into next on Apr 19, 2024

Conversation


@jansvoboda11 commented on Feb 20, 2024:

This patch makes ThinLTO caching asynchronous, solving the problem of IO-bound cache operations blocking the thread pool for CPU-bound codegen.

We spawn asynchronous cache lookup queries for all modules up front and then proactively start performing codegen locally. For each module, these two tasks race. In the normal mode, the winner's output is written as the final object file. With determinism checks on, we wait for both tasks to finish and compare their results.
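A minimal sketch of that racing scheme, using plain `std::thread`/`std::promise` rather than the LTO thread pool; `lookupCache()` and `runCodegen()` are hypothetical placeholders, not the APIs added by this patch:

```cpp
#include <atomic>
#include <future>
#include <optional>
#include <string>
#include <thread>

std::optional<std::string> lookupCache(unsigned ModuleID); // hypothetical
std::string runCodegen(unsigned ModuleID);                 // hypothetical

std::string raceForObject(unsigned ModuleID) {
  std::promise<std::string> Winner;
  std::atomic<bool> Claimed{false};

  // Only the first task to call Claim() fulfils the promise.
  auto Claim = [&](std::string Obj) {
    bool Expected = false;
    if (Claimed.compare_exchange_strong(Expected, true))
      Winner.set_value(std::move(Obj));
  };

  std::thread CacheTask([&] {
    if (auto Hit = lookupCache(ModuleID)) // a cache miss never claims the win
      Claim(std::move(*Hit));
  });
  std::thread CodegenTask([&] { Claim(runCodegen(ModuleID)); });

  std::string Object = Winner.get_future().get(); // block until a winner exists
  CacheTask.join();
  CodegenTask.join();
  return Object;
}
```

In the actual thread-pool setting, the losing task's result is simply discarded; the joins here only keep the sketch self-contained.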

Our current implementation of the remote caching service doesn't handle concurrent read and write requests well, so we defer cache writes until all reads have finished.
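That gating could look like the counter-and-condition-variable sketch below (illustrative only; the PR's actual synchronization lives in ThinLTOCodeGenerator.cpp and may be structured differently):

```cpp
#include <condition_variable>
#include <mutex>

class ReadGate {
  std::mutex M;
  std::condition_variable CV;
  unsigned PendingReads;

public:
  explicit ReadGate(unsigned NumReads) : PendingReads(NumReads) {}

  // Each cache lookup calls this when it completes (hit, miss, or timeout).
  void readFinished() {
    std::lock_guard<std::mutex> Lock(M);
    if (--PendingReads == 0)
      CV.notify_all();
  }

  // Write paths block here until every read has finished.
  void waitForAllReads() {
    std::unique_lock<std::mutex> Lock(M);
    CV.wait(Lock, [&] { return PendingReads == 0; });
  }
};
```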

We have a cancellation mechanism in place so that even with a particularly slow remote cache, we're not much slower than with no cache at all.

Below are my wall-time measurements of the parallel part of ThinLTOCodeGenerator::run(), optimizing 3019 modules using our remote caching service:

| Cache Miss/Hit | Before | After |
| -------------- | ------ | ----- |
| File Miss      | 74.3s  | 75.4s |
| Remote Miss    | 83.3s  | 78.7s |
| Remote Hit     | 31.1s  | 22.4s |

Before this patch, a particularly slow cache would make the wall-time grow without bound as cache latency increased, while the worst case after this patch is essentially an empty local cache plus a timeout constant (currently hard-coded to 5s).
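As an illustration of how such a deadline can bound a slow lookup: the 5s value matches the hard-coded constant mentioned above, but `remoteLookup()` and the surrounding helper are assumptions, not this patch's code.

```cpp
#include <chrono>
#include <future>
#include <memory>
#include <optional>
#include <string>
#include <thread>

std::optional<std::string> remoteLookup(unsigned ModuleID); // hypothetical

std::optional<std::string> lookupWithDeadline(unsigned ModuleID) {
  auto Result = std::make_shared<std::promise<std::optional<std::string>>>();
  std::future<std::optional<std::string>> Pending = Result->get_future();

  // Run the lookup detached so a slow remote cache cannot hold up the link
  // past the deadline; its eventual result is simply ignored.
  std::thread([Result, ModuleID] { Result->set_value(remoteLookup(ModuleID)); })
      .detach();

  if (Pending.wait_for(std::chrono::seconds(5)) == std::future_status::ready)
    return Pending.get();
  return std::nullopt; // deadline hit: treat it as a miss and keep codegen's output
}
```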

@jansvoboda11 (Author) commented:

Just noting that there are still some parts that need more polish. I'm looking for high-level comments and discussion on whether this level of additional complexity makes sense.

@akyrtzi requested a review from benlangmuir on February 20, 2024 22:04
Prior to this patch, the implementation of `RemoteModuleCacheEntry` assumed `tryLoadingBuffer()` always finishes before `write()` gets called. This assumption was leveraged to perform determinism checks in the `write()` function. In the future, these two functions will run concurrently and the assumption will no longer hold. This patch extracts the determinism check into a new function, `areLoadedAndWrittenResultsIdentical()`, which is guaranteed to be called only from a context where both of those functions have already returned.

This patch also takes care to avoid concurrent writes into a single file from those two functions. This is done by manipulating these three files:
* The ".downloaded.tmp" file, where we're downloading the cached buffer.
* The ".computed.tmp" file, where we're streaming out the computed buffer.
* The file at `OutputPath`, where we atomically move either of the above. This file also acts as the backing buffer for the "UseBufferAPI" mode.

Note that there's no synchronization between the atomic moves, so it's possible for `tryLoadingBuffer()` and `getMappedBuffer()` to return the buffer written out by `write()`. In practice, we expect these buffers to be identical (and we verify that with determinism checks), so this should be fine.
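A sketch of that write-to-temp-then-rename scheme using `std::filesystem`; the file suffixes mirror the description above, but the helper itself is illustrative, not the code in `RemoteModuleCacheEntry`:

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Stream a buffer into a producer-specific temporary file, then move it over
// OutputPath in one step so readers never observe a partially written file.
void publishBuffer(const std::string &OutputPath, const std::string &Buffer,
                   bool FromRemoteCache) {
  std::string Tmp = OutputPath + (FromRemoteCache ? ".downloaded.tmp"
                                                  : ".computed.tmp");
  {
    std::ofstream OS(Tmp, std::ios::binary);
    OS.write(Buffer.data(), static_cast<std::streamsize>(Buffer.size()));
  }
  // rename() is atomic on POSIX when source and destination are on the same
  // filesystem, so the last writer simply wins.
  std::filesystem::rename(Tmp, OutputPath);
}
```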
This is supposed to be an NFC refactoring that makes the following commit more obvious.
This patch makes the ThinLTO steps asynchronous, adds async APIs to `ModuleCacheEntry`, and implements `tryLoadingBuffer()` asynchronously. For now, `write()` remains synchronous.
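Roughly, the split could look like the interface below; the signatures are assumptions for illustration, not the actual `ModuleCacheEntry` declarations in this PR:

```cpp
#include <future>
#include <optional>
#include <string>

class ModuleCacheEntrySketch {
public:
  virtual ~ModuleCacheEntrySketch() = default;

  // Spawned up front for every module; resolves with the cached object file
  // on a hit, or std::nullopt on a miss (or once the lookup is cancelled).
  virtual std::future<std::optional<std::string>> tryLoadingBufferAsync() = 0;

  // Still synchronous in this commit: invoked with the locally generated
  // buffer once it is safe to write to the cache.
  virtual void write(const std::string &Buffer) = 0;
};
```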
It assumed the cache is always written before the object file.
@jansvoboda11 force-pushed the jan_svoboda/async-thinlto branch from e68c156 to 8fc1687 on April 1, 2024 23:17
@jansvoboda11 (Author) commented:

Both @cachemeifyoucan and @akyrtzi wanted to avoid diverging even more than we currently do from the upstream ThinLTO implementation. The newest commit makes the changes to ThinLTOCodeGenerator.cpp less spread out. This should also make it easier to remove the existing diff from upstream for file-based caching, leaving mostly additive changes to support the async use case.

@cachemeifyoucan left a comment:


Some small nitpicks. The diff is still quite large but seems more manageable.

});
}

// Parallel optimizer + codegen


Is this block the same as the one below?

@jansvoboda11 (Author) replied:

No, this block is quite different from the existing logic. Only the chunk of code between lines 1960 and 1986 is identical, and I think it's still better to duplicate that in order to make the downstream diff more self-contained.

@@ -1364,6 +1526,48 @@ ThinLTOCodeGenerator::writeGeneratedObject(StringRef OutputPath,
return std::string(OutputPath);
}

template <class T>


Nit: Move to a local header in lib/LTO

@jansvoboda11 (Author) replied:

It's only used in this .cpp file, so I'd prefer to keep it here to reduce the spread of our downstream diff.

@jansvoboda11 merged commit 03af59c into next on Apr 19, 2024
@jansvoboda11 deleted the jan_svoboda/async-thinlto branch on April 19, 2024 22:09
jansvoboda11 added a commit that referenced this pull request Apr 19, 2024
[ThinLTO] Asynchronous caching

(cherry picked from commit 03af59c)