-
Notifications
You must be signed in to change notification settings - Fork 14.3k
clang: Relax LangOpts checks when lexing quoted numbers during preprocessing #95798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
nishithshah2211
wants to merge
1
commit into
llvm:main
Choose a base branch
from
nishithshah2211:better-fix-gh-88896
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is
ParsingPreprocessorDirective
used here?I think I would prefer introducing new
LangOptions::AllowLiteralDigitSeparator
member that would be set to true wheneverCPlusPlus14 || C23
evaluates totrue
and that could be set totrue
explicitly from the scanner. WDYT?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not certain that's a pattern we should get into -- we don't want a
LangOption
per language feature shared between languages unless it's something the user can enable or disable themselves via the command line.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue that
LangOptions
are options internal to the compiler and not necessarily exposed via-cc1
(let alone the driver). The tooling use case is relevant too, IMO.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's not a one-to-one mapping of
LangOptions
to frontend of driver options, yes. But I think it's a stretch to argue that the standard mode tracked byLangOptions
is not exposed via flags.I don't think per-feature
LangOptions
is viable. There's digit separators, hex float support, binary literal support, different literal suffixes, various keywords, etc and that doesn't seem likely to scale well. Also, it makes it seem like we support a la carte language features when we don't -- we intentionally don't want to let users disable standard features (in general; exceptions exist) and so that approach gives the impression contributors need to check both language mode and feature availability which is a maintenance concern.I'm still wondering about the question I asked here: #93753 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand. I'm suggesting that in the future, parts of the lexer behavior could be controlled in a way that's hidden from users. And I'm saying that we are already hiding existing
LangOptions
fields from users (e.g.LangOptions::CompilingPCH
).This doesn't need to scale. We'd only do this for options that control lexing. FWIW, code that says "literal digit separators are enabled in C++14+, C23+, or when explicitly allowed" seems more logical to me than checking
ParsingPreprocessorDirective
, which doesn't really explain why we're suddenly allowing digit separators.I think documenting the cases where we introduce this extra knob should pretty much clear up any confusion and prevent contributors cargo culting that approach to options that don't need it.
Are there any comments that weren't addressed by my reply?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I still see value in adding orthogonal flags for small features. We don't have to expose them to users via command line flags, they can be purely an implementation detail for implementing the dependency scanner. My argument is still that most of the Clang codebase doesn't need to know about dependency scanning and that having one flag for the dependency scanner makes it way too easy fracture the codebase just for this one Clang-based tool.
Case in point, I recently made a series of changes to the PCM file format in order to remove some needless deserialization overhead the dependency scanner uncovered. If we had flag that signifies a "scanner mode", I would've special-cased my changes for that use case, since it'd be easier to implement and test. But by not having that flag (and by not wanting to introduce it), I was forced to figure out how to make the changes generic, and now Clang itself and all Clang-based tools benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but we don't have to expose that language extension to users, it can remain internal to the compiler.
The scanner works in two steps:
[if, ident, gte, ident, then, include, string, else, include, string, endif]
.LanguageOptions
and runs full preprocessor on those tokens. This means that the condition should evaluate correctly.Having (1) be context-free is important for performance.
On high level, the goal of #93753 is to respect
LanguageOptions
in the scanner. My argument is that we should be able to do so by representing tokens in the stream produced by (1) such that we're still able to emit context-aware,LanguageOptions
-driven diagnostics in (2).That solves the issue outlined by #93753 without impacting the performance.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that’s fine. What I meant was if we were to just use
ParsingPreprocessorDirective
, we would be exposing it to users, so that’s probably not the way to go imo.I see; in that case being more permissive in what we accept during step 1 makes perfect sense imo. I don’t think there are that many language features that affect this step either since that’s basically just preprocessor directives from what I gather, so adding language options for whatever features we need to enable for this seems reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still doesn't address my concerns from earlier about how it impacts maintenance of Clang itself:
How do we avoid this situation, as that's my primary concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I still favor a "DependencyScanning" option over feature specific flags, it seems much simpler to maintain and document in the long run, for exactly the same reason as @AaronBallman