rdar://81632946 [libclang][deps] Accept only driver invocations, don't modify them #3168

jansvoboda11 · 2021-08-13T11:41:36Z

The way libclang dependency scanner handles command-lines differs from what clang-scan-deps does.

The clang-scan-deps command-line tool only accepts driver arguments (from a compilation database), while libclang supports both driver and cc1 command-lines. This PR removes the support for cc1 command-lines from libclang (the getFileDependencies function in CDependencies.cpp), unifying the API. There are no clients that rely on cc1 support. (Relying on cc1 invocations would be pretty dangerous in general, since it has breaking changes between releases.)

This change to getFileDependencies means DependencyScanningWorker::computeDependenciesForClangInvocation now receives driver command-line, making it possible to just forward to DependencyScanningWorker::computeDependencies.

There's a subtle aspect of this patch that solves rdar://81632946. The code that was used to convert the driver invocation to cc1 invocation in getFileDependencies injected the -fsyntax-only flag, which can override flags like -emit-pch. This means the information about PCH compilations didn't reach the dependency scanner library, breaking PCH support that was tested to work with clang-scan-deps.

The upstream cherry-pick will be removed once the commit makes it through the automerger.

…t modify them

dexonsmith

This change looks correct (and overall an improvement), so I'm approving it, but:

I'm not seeing the logic related to -fsyntax-only. Can you point it out to me? And should that be fixed independently?
I have high-level concerns about the direction. As I said, I'm fine with this patch, but see below.

I have two high-level concerns.

Firstly, it seems valuable for clients to be able to separate the driver-to-cc1 computation from the cc1-to-dependencies computation. I think that's a valid thing to do, and in the future if there's a specific use-case for it, the library should evolve to support that somehow. E.g., a client could decompose a driver command, then ask questions about the -cc1 it got back, such as "what are the outputs?", "what are the explicit inputs?", "what are all the inputs?", "what's the build graph for an explicit build?", and so forth. The current APIs are focused on the last couple of questions, but the other questions are valid and useful as well. I agree with your point that clients have no business crafting -cc1 command-lines (or reusing them across compilers), but that's not the only use case for accepting a -cc1 command-line. The -cc1 command-line is the serialization format for a CompilerInvocation, and clients should be able to receive them / cache them / send them back in (as long as the clients treat them opaquely).

That said, the entry points for -cc1 (opaque, serialized compiler-invocation) and driver commands (user commands) should be distinct. I'm totally fine stripping down the current entry point to be driver-only. We can add a separate entry point for taking a -cc1 command-line in the future when it's needed. (It'd be good to note somewhere (maybe in this review) how to do it correctly, without the -fsyntax-only problem.)

Secondly, this commit feels a bit like it reinforces the limiting assumption that a clang driver command corresponds to a single compilation. That's usually true for clang invocations for the compile steps that come from Xcode's build system (one exception is if users add -save-temps), but not true for clang driver commands in general: they may correspond to multiple compilations (e.g., multiple arches or multiple source files), compilation(s) plus other actions, or just other actions (link job).

To be useful for clients in that world, I see two possible directions for this API:

Learn to compute dependencies and/or build graphs for general driver commands: passing the full build graph / deps back to the client (instead of returning a build graph that only handles the first compilation (and ignores the rest)).
Expect clients to decompose the driver command ahead of time (as suggested earlier), and take a single, serialized compilation here.

This patch is not going in either of those two directions. However, in practice we don't have clients like that right now and we can address this later.

jansvoboda11 · 2021-08-13T20:07:13Z

I'm not seeing the logic related to -fsyntax-only. Can you point it out to me? And should that be fixed independently?

It's here. This function basically duplicates the function I've been working on in this patch. Honestly, I don't know why we have two copies of the same function, it would be nice to unify them.

In general, I think things would be simpler if the tooling libraries stopped doing the ad-hoc parsing and tweaking of the raw command-line arguments and instead worked with CompilerInvocation directly if necessary. I'm not sure how much work that would be, but it's on my back burner. My concern is that removing the implicit -fsyntax-only from this function could cause some tools to start outputting object files for example, so this would need to be done carefully.

Firstly, it seems valuable for clients to be able to separate the driver-to-cc1 computation from the cc1-to-dependencies computation. I think that's a valid thing to do, and in the future if there's a specific use-case for it, the library should evolve to support that somehow. E.g., a client could decompose a driver command, then ask questions about the -cc1 it got back, such as "what are the outputs?", "what are the explicit inputs?", "what are all the inputs?", "what's the build graph for an explicit build?", and so forth. The current APIs are focused on the last couple of questions, but the other questions are valid and useful as well. I agree with your point that clients have no business crafting -cc1 command-lines (or reusing them across compilers), but that's not the only use case for accepting a -cc1 command-line. The -cc1 command-line is the serialization format for a CompilerInvocation, and clients should be able to receive them / cache them / send them back in (as long as the clients treat them opaquely).

Yeah, we already have a libclang API that can expand the driver command-line into sub-commands. I can imagine use cases where it would be useful for the dependency scanner to handle cc1 command-lines as well, but I also see value in having the API as minimal as possible for now and carefully expanding only when the need arises.

That said, the entry points for -cc1 (opaque, serialized compiler-invocation) and driver commands (user commands) should be distinct. I'm totally fine stripping down the current entry point to be driver-only. We can add a separate entry point for taking a -cc1 command-line in the future when it's needed. (It'd be good to note somewhere (maybe in this review) how to do it correctly, without the -fsyntax-only problem.)

The correct way would be to use the function I worked on in the patch mentioned above. It expands the driver invocation without appending -fsyntax-only and filters out the jobs that don't read an actual source file.

Secondly, this commit feels a bit like it reinforces the limiting assumption that a clang driver command corresponds to a single compilation. That's usually true for clang invocations for the compile steps that come from Xcode's build system (one exception is if users add -save-temps), but not true for clang driver commands in general: they may correspond to multiple compilations (e.g., multiple arches or multiple source files), compilation(s) plus other actions, or just other actions (link job).

To be useful for clients in that world, I see two possible directions for this API:

Learn to compute dependencies and/or build graphs for general driver commands: passing the full build graph / deps back to the client (instead of returning a build graph that only handles the first compilation (and ignores the rest)).

Expect clients to decompose the driver command ahead of time (as suggested earlier), and take a single, serialized compilation here.

Yeah, I think that it would be nice to eventually have an API that accepts a driver invocation and returns a set of CXFileDependencies - one for each expanded source-reading cc1 invocation. For example, this would make it very easy to take a compilation database with driver invocations (potentially with multiple architectures, multiple source files), send it to the API and get the correct results out without expanding each Clang invocation manually. Basically your 1st direction.

This patch is not going in either of those two directions. However, in practice we don't have clients like that right now and we can address this later.

Agreed. Thanks for the review!

dexonsmith · 2021-08-13T21:39:07Z

I'm not seeing the logic related to -fsyntax-only. Can you point it out to me? And should that be fixed independently?

It's here. This function basically duplicates the function I've been working on in this patch. Honestly, I don't know why we have two copies of the same function, it would be nice to unify them.

Thanks, I had only been looking through scan-deps-related code!

Might be nice to expand the header doc to explain this doesn't return the same -cc1 the driver would give. (I agree the two copies should be unified somehow to share the common bits.)

The correct way would be to use the function I worked on in the patch mentioned above. It expands the driver invocation without appending -fsyntax-only and filters out the jobs that don't read an actual source file.

Right, this makes sense. Seems like I was a bit confused about the problem (I thought the -cc1 version was broken, but of course it had no direct clients; it was the driver-badly-converted-to-cc1 that was broken).

jansvoboda11 added 2 commits August 13, 2021 13:23

Upstream cherry-pick (10c8f78)

3e27b4e

rdar://81632946 [libclang][deps] Accept only driver invocations, don'…

7c44674

…t modify them

jansvoboda11 requested review from Bigcheese, hyp and dexonsmith August 13, 2021 11:41

dexonsmith approved these changes Aug 13, 2021

View reviewed changes

jansvoboda11 merged commit bc7e7c7 into next Aug 13, 2021

jansvoboda11 deleted the jan_svoboda/libclang-deps-command-line branch August 13, 2021 20:50

jansvoboda11 mentioned this pull request Aug 16, 2021

Add an API for clang dependency scanner to perform module lookup by name alone #3127

Merged

edymtt mentioned this pull request Aug 19, 2021

[20210726] rdar://81632946 [libclang][deps] Accept only driver invocations, don't modify them #3183

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

rdar://81632946 [libclang][deps] Accept only driver invocations, don't modify them #3168

rdar://81632946 [libclang][deps] Accept only driver invocations, don't modify them #3168

Uh oh!

jansvoboda11 commented Aug 13, 2021

Uh oh!

dexonsmith left a comment

Uh oh!

jansvoboda11 commented Aug 13, 2021

Uh oh!

dexonsmith commented Aug 13, 2021

Uh oh!

Uh oh!

rdar://81632946 [libclang][deps] Accept only driver invocations, don't modify them #3168

rdar://81632946 [libclang][deps] Accept only driver invocations, don't modify them #3168

Uh oh!

Conversation

jansvoboda11 commented Aug 13, 2021

Uh oh!

dexonsmith left a comment

Choose a reason for hiding this comment

Uh oh!

jansvoboda11 commented Aug 13, 2021

Uh oh!

dexonsmith commented Aug 13, 2021

Uh oh!

Uh oh!