Skip to content

rdar://81632946 [libclang][deps] Accept only driver invocations, don't modify them #3168

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 13, 2021

Conversation

jansvoboda11
Copy link

The way libclang dependency scanner handles command-lines differs from what clang-scan-deps does.

The clang-scan-deps command-line tool only accepts driver arguments (from a compilation database), while libclang supports both driver and cc1 command-lines. This PR removes the support for cc1 command-lines from libclang (the getFileDependencies function in CDependencies.cpp), unifying the API. There are no clients that rely on cc1 support. (Relying on cc1 invocations would be pretty dangerous in general, since it has breaking changes between releases.)

This change to getFileDependencies means DependencyScanningWorker::computeDependenciesForClangInvocation now receives driver command-line, making it possible to just forward to DependencyScanningWorker::computeDependencies.

There's a subtle aspect of this patch that solves rdar://81632946. The code that was used to convert the driver invocation to cc1 invocation in getFileDependencies injected the -fsyntax-only flag, which can override flags like -emit-pch. This means the information about PCH compilations didn't reach the dependency scanner library, breaking PCH support that was tested to work with clang-scan-deps.

The upstream cherry-pick will be removed once the commit makes it through the automerger.

Copy link

@dexonsmith dexonsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks correct (and overall an improvement), so I'm approving it, but:

  • I'm not seeing the logic related to -fsyntax-only. Can you point it out to me? And should that be fixed independently?
  • I have high-level concerns about the direction. As I said, I'm fine with this patch, but see below.

I have two high-level concerns.

Firstly, it seems valuable for clients to be able to separate the driver-to-cc1 computation from the cc1-to-dependencies computation. I think that's a valid thing to do, and in the future if there's a specific use-case for it, the library should evolve to support that somehow. E.g., a client could decompose a driver command, then ask questions about the -cc1 it got back, such as "what are the outputs?", "what are the explicit inputs?", "what are all the inputs?", "what's the build graph for an explicit build?", and so forth. The current APIs are focused on the last couple of questions, but the other questions are valid and useful as well. I agree with your point that clients have no business crafting -cc1 command-lines (or reusing them across compilers), but that's not the only use case for accepting a -cc1 command-line. The -cc1 command-line is the serialization format for a CompilerInvocation, and clients should be able to receive them / cache them / send them back in (as long as the clients treat them opaquely).

That said, the entry points for -cc1 (opaque, serialized compiler-invocation) and driver commands (user commands) should be distinct. I'm totally fine stripping down the current entry point to be driver-only. We can add a separate entry point for taking a -cc1 command-line in the future when it's needed. (It'd be good to note somewhere (maybe in this review) how to do it correctly, without the -fsyntax-only problem.)

Secondly, this commit feels a bit like it reinforces the limiting assumption that a clang driver command corresponds to a single compilation. That's usually true for clang invocations for the compile steps that come from Xcode's build system (one exception is if users add -save-temps), but not true for clang driver commands in general: they may correspond to multiple compilations (e.g., multiple arches or multiple source files), compilation(s) plus other actions, or just other actions (link job).

To be useful for clients in that world, I see two possible directions for this API:

  1. Learn to compute dependencies and/or build graphs for general driver commands: passing the full build graph / deps back to the client (instead of returning a build graph that only handles the first compilation (and ignores the rest)).
  2. Expect clients to decompose the driver command ahead of time (as suggested earlier), and take a single, serialized compilation here.

This patch is not going in either of those two directions. However, in practice we don't have clients like that right now and we can address this later.

@jansvoboda11
Copy link
Author

  • I'm not seeing the logic related to -fsyntax-only. Can you point it out to me? And should that be fixed independently?

It's here. This function basically duplicates the function I've been working on in this patch. Honestly, I don't know why we have two copies of the same function, it would be nice to unify them.

In general, I think things would be simpler if the tooling libraries stopped doing the ad-hoc parsing and tweaking of the raw command-line arguments and instead worked with CompilerInvocation directly if necessary. I'm not sure how much work that would be, but it's on my back burner. My concern is that removing the implicit -fsyntax-only from this function could cause some tools to start outputting object files for example, so this would need to be done carefully.

Firstly, it seems valuable for clients to be able to separate the driver-to-cc1 computation from the cc1-to-dependencies computation. I think that's a valid thing to do, and in the future if there's a specific use-case for it, the library should evolve to support that somehow. E.g., a client could decompose a driver command, then ask questions about the -cc1 it got back, such as "what are the outputs?", "what are the explicit inputs?", "what are all the inputs?", "what's the build graph for an explicit build?", and so forth. The current APIs are focused on the last couple of questions, but the other questions are valid and useful as well. I agree with your point that clients have no business crafting -cc1 command-lines (or reusing them across compilers), but that's not the only use case for accepting a -cc1 command-line. The -cc1 command-line is the serialization format for a CompilerInvocation, and clients should be able to receive them / cache them / send them back in (as long as the clients treat them opaquely).

Yeah, we already have a libclang API that can expand the driver command-line into sub-commands. I can imagine use cases where it would be useful for the dependency scanner to handle cc1 command-lines as well, but I also see value in having the API as minimal as possible for now and carefully expanding only when the need arises.

That said, the entry points for -cc1 (opaque, serialized compiler-invocation) and driver commands (user commands) should be distinct. I'm totally fine stripping down the current entry point to be driver-only. We can add a separate entry point for taking a -cc1 command-line in the future when it's needed. (It'd be good to note somewhere (maybe in this review) how to do it correctly, without the -fsyntax-only problem.)

The correct way would be to use the function I worked on in the patch mentioned above. It expands the driver invocation without appending -fsyntax-only and filters out the jobs that don't read an actual source file.

Secondly, this commit feels a bit like it reinforces the limiting assumption that a clang driver command corresponds to a single compilation. That's usually true for clang invocations for the compile steps that come from Xcode's build system (one exception is if users add -save-temps), but not true for clang driver commands in general: they may correspond to multiple compilations (e.g., multiple arches or multiple source files), compilation(s) plus other actions, or just other actions (link job).

To be useful for clients in that world, I see two possible directions for this API:

  1. Learn to compute dependencies and/or build graphs for general driver commands: passing the full build graph / deps back to the client (instead of returning a build graph that only handles the first compilation (and ignores the rest)).
  2. Expect clients to decompose the driver command ahead of time (as suggested earlier), and take a single, serialized compilation here.

Yeah, I think that it would be nice to eventually have an API that accepts a driver invocation and returns a set of CXFileDependencies - one for each expanded source-reading cc1 invocation. For example, this would make it very easy to take a compilation database with driver invocations (potentially with multiple architectures, multiple source files), send it to the API and get the correct results out without expanding each Clang invocation manually. Basically your 1st direction.

This patch is not going in either of those two directions. However, in practice we don't have clients like that right now and we can address this later.

Agreed. Thanks for the review!

@jansvoboda11 jansvoboda11 merged commit bc7e7c7 into next Aug 13, 2021
@jansvoboda11 jansvoboda11 deleted the jan_svoboda/libclang-deps-command-line branch August 13, 2021 20:50
@dexonsmith
Copy link

  • I'm not seeing the logic related to -fsyntax-only. Can you point it out to me? And should that be fixed independently?

It's here. This function basically duplicates the function I've been working on in this patch. Honestly, I don't know why we have two copies of the same function, it would be nice to unify them.

Thanks, I had only been looking through scan-deps-related code!

Might be nice to expand the header doc to explain this doesn't return the same -cc1 the driver would give. (I agree the two copies should be unified somehow to share the common bits.)

The correct way would be to use the function I worked on in the patch mentioned above. It expands the driver invocation without appending -fsyntax-only and filters out the jobs that don't read an actual source file.

Right, this makes sense. Seems like I was a bit confused about the problem (I thought the -cc1 version was broken, but of course it had no direct clients; it was the driver-badly-converted-to-cc1 that was broken).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants