Skip to content

[SwiftScan] Add SwiftScan APIs to replay the cache output #68660

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

cachemeifyoucan
Copy link
Contributor

Add new C APIs to libSwiftScan for cache querying and cache replays.

@cachemeifyoucan
Copy link
Contributor Author

cachemeifyoucan commented Sep 21, 2023

I was trying to abstract out the compilation to avoid re-parsing command line, compute input/output and create instance in replay logics, but it turns out a lots of information needs to be passed, including:

  • the diagnostics needed
  • how to print diagnostics (color, format, etc.)
  • (added) prefix map for cache replay
  • inputs (to compute how to construct file specific diagnostic consumer)
  • outputs (to replay)

So stick with replaying from command-line for now.

@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

@akyrtzi
Copy link
Contributor

akyrtzi commented Sep 21, 2023

The libclang API has this kind of interaction with the build system:

  1. key query API call: if the key exists it returns an abstract object that represents the outputs (the outputs are not downloaded from this call)
  2. outputs materialize API call: downloads outputs locally
  3. replay API call: receives arguments and the abstract object for outputs

Separating (1) from (2) provides flexibility on the client side for treating them as separate actions that can be recorded in the build log with their own timing info and executed potentially with different priority/scheduling. I would prefer to preserve that for swift as well instead of swiftscan_cache_query bundling both actions into the same API call.

And once an opaque object for outputs is introduced you can pass it to swiftscan_cache_replay_compilation to formalize that the outputs must already be found and potentially simplify the work that the replay function needs to do.

@cachemeifyoucan
Copy link
Contributor Author

Separating (1) from (2) provides flexibility on the client side for treating them as separate actions that can be recorded in the build log with their own timing info and executed potentially with different priority/scheduling. I would prefer to preserve that for swift as well instead of swiftscan_cache_query bundling both actions into the same API call.

Maybe. But isn't that more round trip to remote that just adds latency? Maybe query can return faster without download, but what do you do if query succeed but materialize failed? Isn't that the same as a miss?

@akyrtzi
Copy link
Contributor

akyrtzi commented Sep 22, 2023

But isn't that more round trip to remote that just adds latency?

Key/value queries are separated from CAS actions, both in the LLVMCAS APIs and the network protocols, there's no saving of latency in putting them behind one C API function instead of two.

Maybe query can return faster without download, but what do you do if query succeed but materialize failed? Isn't that the same as a miss?

Yes, the client should treat it as a miss (or not, it is up to the client, maybe in a more strict mode it prefers to fail if the CAS download was not found, that's the benefit of the additional flexibility and control) and it will be more transparent to the client what exact action failed.

@cachemeifyoucan
Copy link
Contributor Author

Split queryCacheKey and loadObject into different API. Add a reserved field in replay function in case we want to add in the cache keys in the future.

@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please test

@benlangmuir
Copy link
Contributor

I'm a bit unclear on the intended design for the replay function. Currently it works by triggering a performFrontend that internally does the caching. IIUC that will replay all the cached compilations associated with that frontend job in the case of batch mode. But if we have multiple keys for that and you have added one key parameter to the function for future use, won't you then need to make multiple calls to the same function?

I wonder if we should have a list of keys instead? I assume that even after you stop using performFrontend, you need to setup some state from the compiler arguments to do the replay. That seems like something we want to avoid repeating multiple times for the same invocation if possible.

@cachemeifyoucan
Copy link
Contributor Author

I'm a bit unclear on the intended design for the replay function. Currently it works by triggering a performFrontend that internally does the caching. IIUC that will replay all the cached compilations associated with that frontend job in the case of batch mode. But if we have multiple keys for that and you have added one key parameter to the function for future use, won't you then need to make multiple calls to the same function?

I wonder if we should have a list of keys instead? I assume that even after you stop using performFrontend, you need to setup some state from the compiler arguments to do the replay. That seems like something we want to avoid repeating multiple times for the same invocation if possible.

See my comment in the beginning. #68660 (comment)

The ideal situation is to replay from each key but the number of API required will grow significantly and might not be very stable. I was thinking if there are better way to communicate (like setup replay state from a JSON or a CAS object).

@cachemeifyoucan
Copy link
Contributor Author

After offline discussion, we are going to revise the cache key structure and this patch will be updated with the new design in mind later.

@cachemeifyoucan cachemeifyoucan changed the title [SwiftScan] Add SwiftScan APIs to replay the cache output [Do Not Merge/Review][SwiftScan] Add SwiftScan APIs to replay the cache output Sep 27, 2023
@cachemeifyoucan cachemeifyoucan force-pushed the eng/PR-cache-replay-api branch from 0d56a00 to 9177399 Compare October 25, 2023 18:50
@cachemeifyoucan cachemeifyoucan changed the title [Do Not Merge/Review][SwiftScan] Add SwiftScan APIs to replay the cache output [SwiftScan] Add SwiftScan APIs to replay the cache output Oct 25, 2023
@cachemeifyoucan
Copy link
Contributor Author

Update the patch to using the new cache key schema and new C APIs.

Add a tool swift-scan-test that can be used to test libswiftscan C APIs.

@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

@cachemeifyoucan cachemeifyoucan force-pushed the eng/PR-cache-replay-api branch from 9177399 to 9565794 Compare October 26, 2023 17:51
@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

@cachemeifyoucan cachemeifyoucan force-pushed the eng/PR-cache-replay-api branch from 9565794 to 0d1c792 Compare October 26, 2023 18:33
@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

@cachemeifyoucan cachemeifyoucan force-pushed the eng/PR-cache-replay-api branch from eb0eeb8 to e640e14 Compare October 30, 2023 22:24
@cachemeifyoucan
Copy link
Contributor Author

Also rewrite the patch to split Caching related code from DependencyScanning related code in libSwiftScan implementation so the file gets smaller and easier to read.

@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

Change how cached diagnostics are stored inside the CAS. It used to be
stored as a standalone entry for a frontend invocation in the cache and
now it is switched to be associated with input files, stored together
with other outputs like object files, etc.

This enables cleaner Cache Replay APIs and future cached diagnostics
that can be splitted up by file contribution.
@cachemeifyoucan cachemeifyoucan force-pushed the eng/PR-cache-replay-api branch from e640e14 to ced0922 Compare October 30, 2023 23:03
@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

@cachemeifyoucan cachemeifyoucan force-pushed the eng/PR-cache-replay-api branch from ced0922 to b8f5226 Compare November 1, 2023 23:20
@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

@cachemeifyoucan cachemeifyoucan force-pushed the eng/PR-cache-replay-api branch from b8f5226 to ef24432 Compare November 2, 2023 16:15
@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

Copy link
Contributor

@akyrtzi akyrtzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a swiftscan_cached_output_get_casidstring function, it will be useful for the client for logging purposes.

I also foresee a need for something like swiftscan_cache_compilation_is_uncacheable, for the case that the compiler decided that the compilation was uncacheable, so the client can distinguish this case. I'd suggest to add the function as a stub (always returning false) so clients can call it, and implement it properly in a future PR once we find a case for marking a key as 'uncacheable'.

@cachemeifyoucan cachemeifyoucan force-pushed the eng/PR-cache-replay-api branch from ef24432 to ad78812 Compare November 3, 2023 17:06
@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

@cachemeifyoucan cachemeifyoucan force-pushed the eng/PR-cache-replay-api branch from ad78812 to e5076cb Compare November 3, 2023 18:17
@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

@cachemeifyoucan cachemeifyoucan force-pushed the eng/PR-cache-replay-api branch from e5076cb to 79118aa Compare November 3, 2023 22:13
@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

Add new APIs libSwiftScan that can be used for cache query and cache
replay. This enables swift-driver or build system to query the cache and
replay the compilation results without invocation swift-frontend for
better scheduling.
@cachemeifyoucan cachemeifyoucan force-pushed the eng/PR-cache-replay-api branch from 79118aa to 034c15c Compare November 3, 2023 23:00
@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test

Copy link
Contributor

@akyrtzi akyrtzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing all the feedback!

@cachemeifyoucan
Copy link
Contributor Author

@swift-ci please smoke test windows platform

@cachemeifyoucan cachemeifyoucan merged commit 9291c25 into swiftlang:main Nov 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants