-
Notifications
You must be signed in to change notification settings - Fork 14.3k
Pass LangOpts from CompilerInstance to DependencyScanningWorker #93753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-clang Author: Nishith Kumar M Shah (nishithshah2211) ChangesThis commit fixes #88896 by passing LangOpts from the CompilerInstance to Full diff: https://github.com/llvm/llvm-project/pull/93753.diff 6 Files Affected:
diff --git a/clang/include/clang/Lex/DependencyDirectivesScanner.h b/clang/include/clang/Lex/DependencyDirectivesScanner.h
index 0e115906fbfe5..2f8354dec939f 100644
--- a/clang/include/clang/Lex/DependencyDirectivesScanner.h
+++ b/clang/include/clang/Lex/DependencyDirectivesScanner.h
@@ -17,6 +17,7 @@
#ifndef LLVM_CLANG_LEX_DEPENDENCYDIRECTIVESSCANNER_H
#define LLVM_CLANG_LEX_DEPENDENCYDIRECTIVESSCANNER_H
+#include "clang/Basic/LangOptions.h"
#include "clang/Basic/SourceLocation.h"
#include "llvm/ADT/ArrayRef.h"
@@ -117,7 +118,7 @@ struct Directive {
bool scanSourceForDependencyDirectives(
StringRef Input, SmallVectorImpl<dependency_directives_scan::Token> &Tokens,
SmallVectorImpl<dependency_directives_scan::Directive> &Directives,
- DiagnosticsEngine *Diags = nullptr,
+ const LangOptions &LangOpts, DiagnosticsEngine *Diags = nullptr,
SourceLocation InputSourceLoc = SourceLocation());
/// Print the previously scanned dependency directives as minimized source text.
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
index f7b4510d7f7be..9dc20065a09a3 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
@@ -363,7 +363,8 @@ class DependencyScanningWorkerFilesystem
///
/// Returns true if the directive tokens are populated for this file entry,
/// false if not (i.e. this entry is not a file or its scan fails).
- bool ensureDirectiveTokensArePopulated(EntryRef Entry);
+ bool ensureDirectiveTokensArePopulated(EntryRef Entry,
+ const LangOptions &LangOpts);
/// Check whether \p Path exists. By default checks cached result of \c
/// status(), and falls back on FS if unable to do so.
diff --git a/clang/lib/Frontend/FrontendActions.cpp b/clang/lib/Frontend/FrontendActions.cpp
index 454653a31534c..eddb2ac0c0834 100644
--- a/clang/lib/Frontend/FrontendActions.cpp
+++ b/clang/lib/Frontend/FrontendActions.cpp
@@ -1168,8 +1168,8 @@ void PrintDependencyDirectivesSourceMinimizerAction::ExecuteAction() {
llvm::SmallVector<dependency_directives_scan::Token, 16> Tokens;
llvm::SmallVector<dependency_directives_scan::Directive, 32> Directives;
if (scanSourceForDependencyDirectives(
- FromFile.getBuffer(), Tokens, Directives, &CI.getDiagnostics(),
- SM.getLocForStartOfFile(SM.getMainFileID()))) {
+ FromFile.getBuffer(), Tokens, Directives, CI.getLangOpts(),
+ &CI.getDiagnostics(), SM.getLocForStartOfFile(SM.getMainFileID()))) {
assert(CI.getDiagnostics().hasErrorOccurred() &&
"no errors reported for failure");
diff --git a/clang/lib/Lex/DependencyDirectivesScanner.cpp b/clang/lib/Lex/DependencyDirectivesScanner.cpp
index 0971daa1f3666..f7462aefa4f87 100644
--- a/clang/lib/Lex/DependencyDirectivesScanner.cpp
+++ b/clang/lib/Lex/DependencyDirectivesScanner.cpp
@@ -62,14 +62,17 @@ struct DirectiveWithTokens {
struct Scanner {
Scanner(StringRef Input,
SmallVectorImpl<dependency_directives_scan::Token> &Tokens,
- DiagnosticsEngine *Diags, SourceLocation InputSourceLoc)
+ DiagnosticsEngine *Diags, SourceLocation InputSourceLoc,
+ const LangOptions &LangOpts)
: Input(Input), Tokens(Tokens), Diags(Diags),
- InputSourceLoc(InputSourceLoc), LangOpts(getLangOptsForDepScanning()),
+ InputSourceLoc(InputSourceLoc),
+ LangOpts(getLangOptsForDepScanning(LangOpts)),
TheLexer(InputSourceLoc, LangOpts, Input.begin(), Input.begin(),
Input.end()) {}
- static LangOptions getLangOptsForDepScanning() {
- LangOptions LangOpts;
+ static LangOptions
+ getLangOptsForDepScanning(const LangOptions &invocationLangOpts) {
+ LangOptions LangOpts(invocationLangOpts);
// Set the lexer to use 'tok::at' for '@', instead of 'tok::unknown'.
LangOpts.ObjC = true;
LangOpts.LineComment = true;
@@ -700,7 +703,7 @@ bool Scanner::lex_Pragma(const char *&First, const char *const End) {
SmallVector<dependency_directives_scan::Token> DiscardTokens;
const char *Begin = Buffer.c_str();
Scanner PragmaScanner{StringRef(Begin, Buffer.size()), DiscardTokens, Diags,
- InputSourceLoc};
+ InputSourceLoc, LangOptions()};
PragmaScanner.TheLexer.setParsingPreprocessorDirective(true);
if (PragmaScanner.lexPragma(Begin, Buffer.end()))
@@ -950,9 +953,10 @@ bool Scanner::scan(SmallVectorImpl<Directive> &Directives) {
bool clang::scanSourceForDependencyDirectives(
StringRef Input, SmallVectorImpl<dependency_directives_scan::Token> &Tokens,
- SmallVectorImpl<Directive> &Directives, DiagnosticsEngine *Diags,
- SourceLocation InputSourceLoc) {
- return Scanner(Input, Tokens, Diags, InputSourceLoc).scan(Directives);
+ SmallVectorImpl<Directive> &Directives, const LangOptions &LangOpts,
+ DiagnosticsEngine *Diags, SourceLocation InputSourceLoc) {
+ return Scanner(Input, Tokens, Diags, InputSourceLoc, LangOpts)
+ .scan(Directives);
}
void clang::printDependencyDirectivesAsSource(
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
index 0cab17a342440..66a2f6e0acb63 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
@@ -42,7 +42,7 @@ DependencyScanningWorkerFilesystem::readFile(StringRef Filename) {
}
bool DependencyScanningWorkerFilesystem::ensureDirectiveTokensArePopulated(
- EntryRef Ref) {
+ EntryRef Ref, const LangOptions &LangOpts) {
auto &Entry = Ref.Entry;
if (Entry.isError() || Entry.isDirectory())
@@ -66,7 +66,7 @@ bool DependencyScanningWorkerFilesystem::ensureDirectiveTokensArePopulated(
// dependencies.
if (scanSourceForDependencyDirectives(Contents->Original->getBuffer(),
Contents->DepDirectiveTokens,
- Directives)) {
+ Directives, LangOpts)) {
Contents->DepDirectiveTokens.clear();
// FIXME: Propagate the diagnostic if desired by the client.
Contents->DepDirectives.store(new std::optional<DependencyDirectivesTy>());
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
index 0c047b6c5da2f..c3d63c3f890e8 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
@@ -366,11 +366,12 @@ class DependencyScanningAction : public tooling::ToolAction {
// Use the dependency scanning optimized file system if requested to do so.
if (DepFS)
ScanInstance.getPreprocessorOpts().DependencyDirectivesForFile =
- [LocalDepFS = DepFS](FileEntryRef File)
+ [LocalDepFS = DepFS,
+ &LangOpts = ScanInstance.getLangOpts()](FileEntryRef File)
-> std::optional<ArrayRef<dependency_directives_scan::Directive>> {
if (llvm::ErrorOr<EntryRef> Entry =
LocalDepFS->getOrCreateFileSystemEntry(File.getName()))
- if (LocalDepFS->ensureDirectiveTokensArePopulated(*Entry))
+ if (LocalDepFS->ensureDirectiveTokensArePopulated(*Entry, LangOpts))
return Entry->getDirectiveTokens();
return std::nullopt;
};
|
@cor3ntin @Sirraide I put up a draft PR to get some feedback and see if the changes here make sense or not. Also tagging @pogo59 since I received good feedback/pointers here: https://discourse.llvm.org/t/looking-for-help-with-accessing-langopts-from-the-actual-compiler-invocation/79228. My main concern is mostly around introducing I'll add new tests and fix existing tests in the next commit on this PR if the source changes seem reasonable. Build will fail until then. |
Thanks for working on that. Your change is pretty much what i expected yes, it's great! |
@cor3ntin Thanks for reviewing. I updated the PR to get the tests/build passing. Please take a look when you get a chance. |
Can you add a test for #88896 ? |
This commit fixes llvm#88896 by passing LangOpts from the CompilerInstance to DependencyScanningWorker so that the original LangOpts are preserved/respected. This makes for more accurate parsing/lexing when certain language versions or features specific to versions are to be used.
Yes, sorry I missed committing that change. Let me know if there is a better way to test out the changes or add any additional changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense.
Thanks a lot for the fix
Thanks for the pointers. What would be the process to merge this in? |
@nishithshah2211 Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested Please check whether problems have been caused by your change specifically, as How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
I don't think this is correct. If you concurrently scan the same file under two language standards with the same scanning service, it becomes non-deterministic which one gets cached in the filesystem cache. For subsequent FS queries the cache might return wrong results, ignoring language options. You need to make the language standard a part of the cache key. An alternative solution (that I prefer) is to set up the scanner in a way that always accepts |
Thanks for the comments @jansvoboda11 . I am new to all these different moving parts and want to understand better. I have a few questions.
That is true. But until your comment, I did not even know that it is possible (and supported) to be able to invoke the same scanning service on the same file under two language options (say c++14 and c++11). How would someone do that? Asking so I can test this out locally and try to come up with a better solution. (Also, why would someone do that?)
This was kind of one of my concerns that I had called out here: https://discourse.llvm.org/t/looking-for-help-with-accessing-langopts-from-the-actual-compiler-invocation/79228/3. Specifically:
Given that
Would this be considered "hacky"? Because if we go this way, the Scanner would technically be operating in a different language mode for integers, potentially overriding the language mode arg that was passed in during invocation. I am not opposed to it - just trying to understand the implications better. We do turn on specific llvm-project/clang/lib/Lex/DependencyDirectivesScanner.cpp Lines 71 to 79 in 8364659
I guess the general question is - is it acceptable to have the Scanner operating in a language standard different than the passed in language mode and different than the compiler language standard? |
You can have a project that has both C and C++ implementation files that end up including the same header files from the C standard library. One can be compiled under C11 (without separator support), the other under C++14 (with separator support).
Scanning is often the choke point in builds, so any change that slows it down needs to make up for it in correctness. Adding language options (or just the standard) to the cache key could trigger multiple scans for a single file, which would change the performance quite a bit. Since we have an alternative solution that's still correct, I'd prefer that.
I think that is acceptable. It is kinda hacky, but the lexer and preprocessor are largely independent of the language and the standard. When they do depend on those settings, taking the union of the features and letting the compiler trim it down is still a perfectly sound thing to do. |
Thanks. I had considered this very lightly, and in my mind, I thought that this would result in two separate passes of scanning - and so, two separate scanning services. Maybe I am wrong? Thank you for the additional context and bits of knowledge, super helpful. I'll put up a separate PR that does the following:
Let me know if it is preferable for history etc. to have two separate PRs - one for the revert and one for the lexer to relax the CPP14/C23 constraint during the preprocessing phase. |
Single scanning service will be used across the entire build, across build targets, regardless of language and the standard of the compilation.
Yes, splitting this into two PRs would be great. |
…er (llvm#93753)" This reverts commit 9862080.
Here is the revert PR: #94488 |
You can certainly construct cases where the different lexing rules in different language modes allow you to detect which language you're in from within the preprocessor (1 2 3) or where enabling more language mode flags may reject valid code. It may be good enough for what the scanner is trying to do, but I think it's a stretch to say that it's sound. |
…cessing When trying to lex numeric constants that use single quotes as separators, if the lexer is parsing a preprocessor directive, we can relax the language options check. These checks will be enforced in a later phase after preprocessing/scanning. This commit is a second attempt to fix llvm#88896. The first attempt to fix this is here: llvm#93753, which had to be reverted because of the discussions in that same PR.
+1; it's definitely not sound to go this approach, though it may work well enough in practice to be worth the tradeoff. That said, I think the behavior of the dependency scanner ignoring the language options used is actually kind of problematic, unless I'm misunderstanding something. This means command line options that impact dependency scanning may fail (for example, the user might enable SYCL or OpenMP on the command line and that may have different dependencies, trigraph support may be important for things like |
Ping @jansvoboda11 |
I'll defer this to @Bigcheese, I don't think we've hit any of these yet.
This layer of the dependency scanner only lexes the source into a list of tokens. Preprocessing and dependency discovery happen later and do respect the command-line options. Good point on trigraphs. I think that's similar to the digit separator issue: we can unconditionally enable this feature on this layer of the dependency scanner and let the build fail later during compilation itself. |
But the list of tokens depends on things like what features are enabled, right? e.g.,
Seems reasonable to me. |
I touched on this in #95798 (comment). We could represent these cases in the token list such that when the scanner feeds it to the |
This commit fixes #88896 by passing LangOpts from the CompilerInstance to
DependencyScanningWorker so that the original LangOpts are preserved/respected.
This makes for more accurate parsing/lexing when certain language versions or features specific to versions are to be used.