-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[clang][Preprocessor] Add peekNextPPToken, makes look ahead next token without side-effects #143898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: yronglin <[email protected]>
@llvm/pr-subscribers-clang Author: None (yronglin) ChangesThis PR introduce a new function It's also the 1st part of #107168 and it was used to look ahead next token then determine whether current lexing pp directive is one of pp-import or pp-module directive. At the start of phase 4 an import or module token is treated as starting a directive and are converted to their respective keywords iff:
Full diff: https://github.com/llvm/llvm-project/pull/143898.diff 6 Files Affected:
diff --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h
index bb65ae010cffa..a595cda1eaa77 100644
--- a/clang/include/clang/Lex/Lexer.h
+++ b/clang/include/clang/Lex/Lexer.h
@@ -124,7 +124,7 @@ class Lexer : public PreprocessorLexer {
//===--------------------------------------------------------------------===//
// Context that changes as the file is lexed.
// NOTE: any state that mutates when in raw mode must have save/restore code
- // in Lexer::isNextPPTokenLParen.
+ // in Lexer::peekNextPPToken.
// BufferPtr - Current pointer into the buffer. This is the next character
// to be lexed.
@@ -642,10 +642,10 @@ class Lexer : public PreprocessorLexer {
BufferPtr = TokEnd;
}
- /// isNextPPTokenLParen - Return 1 if the next unexpanded token will return a
- /// tok::l_paren token, 0 if it is something else and 2 if there are no more
- /// tokens in the buffer controlled by this lexer.
- unsigned isNextPPTokenLParen();
+ /// peekNextPPToken - Return std::nullopt if there are no more tokens in the
+ /// buffer controlled by this lexer, otherwise return the next unexpanded
+ /// token.
+ std::optional<Token> peekNextPPToken();
//===--------------------------------------------------------------------===//
// Lexer character reading interfaces.
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index 78be2bd64d61c..4fe7a79393afc 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -2288,7 +2288,9 @@ class Preprocessor {
/// Determine whether the next preprocessor token to be
/// lexed is a '('. If so, consume the token and return true, if not, this
/// method should have no observable side-effect on the lexed tokens.
- bool isNextPPTokenLParen();
+ bool isNextPPTokenLParen() {
+ return peekNextPPToken().value_or(Token{}).is(tok::l_paren);
+ }
private:
/// Identifiers used for SEH handling in Borland. These are only
@@ -2667,6 +2669,10 @@ class Preprocessor {
void removeCachedMacroExpandedTokensOfLastLexer();
+ /// Peek the next token. If so, return the token, if not, this
+ /// method should have no observable side-effect on the lexed tokens.
+ std::optional<Token> peekNextPPToken();
+
/// After reading "MACRO(", this method is invoked to read all of the formal
/// arguments specified for the macro invocation. Returns null on error.
MacroArgs *ReadMacroCallArgumentList(Token &MacroName, MacroInfo *MI,
diff --git a/clang/include/clang/Lex/TokenLexer.h b/clang/include/clang/Lex/TokenLexer.h
index 4d229ae610674..777b4e6266c71 100644
--- a/clang/include/clang/Lex/TokenLexer.h
+++ b/clang/include/clang/Lex/TokenLexer.h
@@ -139,10 +139,9 @@ class TokenLexer {
void Init(const Token *TokArray, unsigned NumToks, bool DisableMacroExpansion,
bool OwnsTokens, bool IsReinject);
- /// If the next token lexed will pop this macro off the
- /// expansion stack, return 2. If the next unexpanded token is a '(', return
- /// 1, otherwise return 0.
- unsigned isNextTokenLParen() const;
+ /// If the next token lexed will pop this macro off the expansion stack,
+ /// return std::nullopt, otherwise return the next unexpanded token.
+ std::optional<Token> peekNextPPToken() const;
/// Lex and return a token from this macro stream.
bool Lex(Token &Tok);
diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index 93200458f04b4..8e977eac8c983 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -3200,18 +3200,19 @@ bool Lexer::LexEndOfFile(Token &Result, const char *CurPtr) {
return PP->HandleEndOfFile(Result, isPragmaLexer());
}
-/// isNextPPTokenLParen - Return 1 if the next unexpanded token lexed from
-/// the specified lexer will return a tok::l_paren token, 0 if it is something
-/// else and 2 if there are no more tokens in the buffer controlled by the
-/// lexer.
-unsigned Lexer::isNextPPTokenLParen() {
+/// peekNextPPToken - Return std::nullopt if there are no more tokens in the
+/// buffer controlled by this lexer, otherwise return the next unexpanded
+/// token.
+std::optional<Token> Lexer::peekNextPPToken() {
assert(!LexingRawMode && "How can we expand a macro from a skipping buffer?");
if (isDependencyDirectivesLexer()) {
if (NextDepDirectiveTokenIndex == DepDirectives.front().Tokens.size())
- return 2;
- return DepDirectives.front().Tokens[NextDepDirectiveTokenIndex].is(
- tok::l_paren);
+ return std::nullopt;
+ Token Result;
+ (void)convertDependencyDirectiveToken(
+ DepDirectives.front().Tokens[NextDepDirectiveTokenIndex], Result);
+ return Result;
}
// Switch to 'skipping' mode. This will ensure that we can lex a token
@@ -3240,8 +3241,8 @@ unsigned Lexer::isNextPPTokenLParen() {
LexingRawMode = false;
if (Tok.is(tok::eof))
- return 2;
- return Tok.is(tok::l_paren);
+ return std::nullopt;
+ return Tok;
}
/// Find the end of a version control conflict marker.
diff --git a/clang/lib/Lex/PPMacroExpansion.cpp b/clang/lib/Lex/PPMacroExpansion.cpp
index 37ac1bf07e9c0..585990f60c98a 100644
--- a/clang/lib/Lex/PPMacroExpansion.cpp
+++ b/clang/lib/Lex/PPMacroExpansion.cpp
@@ -418,42 +418,40 @@ static bool isTrivialSingleTokenExpansion(const MacroInfo *MI,
return !llvm::is_contained(MI->params(), II);
}
-/// isNextPPTokenLParen - Determine whether the next preprocessor token to be
-/// lexed is a '('. If so, consume the token and return true, if not, this
-/// method should have no observable side-effect on the lexed tokens.
-bool Preprocessor::isNextPPTokenLParen() {
+/// isNextPPTokenLParen - Peek the next token. If so, return the token, if not,
+/// this method should have no observable side-effect on the lexed tokens.
+std::optional<Token> Preprocessor::peekNextPPToken() {
// Do some quick tests for rejection cases.
- unsigned Val;
+ std::optional<Token> Val;
if (CurLexer)
- Val = CurLexer->isNextPPTokenLParen();
+ Val = CurLexer->peekNextPPToken();
else
- Val = CurTokenLexer->isNextTokenLParen();
+ Val = CurTokenLexer->peekNextPPToken();
- if (Val == 2) {
+ if (!Val) {
// We have run off the end. If it's a source file we don't
// examine enclosing ones (C99 5.1.1.2p4). Otherwise walk up the
// macro stack.
if (CurPPLexer)
- return false;
+ return std::nullopt;
for (const IncludeStackInfo &Entry : llvm::reverse(IncludeMacroStack)) {
if (Entry.TheLexer)
- Val = Entry.TheLexer->isNextPPTokenLParen();
+ Val = Entry.TheLexer->peekNextPPToken();
else
- Val = Entry.TheTokenLexer->isNextTokenLParen();
+ Val = Entry.TheTokenLexer->peekNextPPToken();
- if (Val != 2)
+ if (Val)
break;
// Ran off the end of a source file?
if (Entry.ThePPLexer)
- return false;
+ return std::nullopt;
}
}
- // Okay, if we know that the token is a '(', lex it and return. Otherwise we
- // have found something that isn't a '(' or we found the end of the
- // translation unit. In either case, return false.
- return Val == 1;
+ // Okay, we found the token and return. Otherwise we found the end of the
+ // translation unit.
+ return Val;
}
/// HandleMacroExpandedIdentifier - If an identifier token is read that is to be
diff --git a/clang/lib/Lex/TokenLexer.cpp b/clang/lib/Lex/TokenLexer.cpp
index 6e93416e01c0c..fbb8c4262d6da 100644
--- a/clang/lib/Lex/TokenLexer.cpp
+++ b/clang/lib/Lex/TokenLexer.cpp
@@ -921,13 +921,13 @@ bool TokenLexer::pasteTokens(Token &LHSTok, ArrayRef<Token> TokenStream,
}
/// isNextTokenLParen - If the next token lexed will pop this macro off the
-/// expansion stack, return 2. If the next unexpanded token is a '(', return
-/// 1, otherwise return 0.
-unsigned TokenLexer::isNextTokenLParen() const {
+/// expansion stack, return std::nullopt, otherwise return the next unexpanded
+/// token.
+std::optional<Token> TokenLexer::peekNextPPToken() const {
// Out of tokens?
if (isAtEnd())
- return 2;
- return Tokens[CurTokenIdx].is(tok::l_paren);
+ return std::nullopt;
+ return Tokens[CurTokenIdx];
}
/// isParsingPreprocessorDirective - Return true if we are in the middle of a
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like @AaronBallman and @Bigcheese to look at this but it looks reasonable
@@ -2288,7 +2288,9 @@ class Preprocessor { | |||
/// Determine whether the next preprocessor token to be | |||
/// lexed is a '('. If so, consume the token and return true, if not, this | |||
/// method should have no observable side-effect on the lexed tokens. | |||
bool isNextPPTokenLParen(); | |||
bool isNextPPTokenLParen() { | |||
return peekNextPPToken().value_or(Token{}).is(tok::l_paren); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that we are creating a new token everytime - I'm not sure that it matters though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lexer is almost always on the hot path but Token
s are small... I think it may be worth verifying that this has no performance difference from an approach without value_or
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use an if-stmt to avoid create a dummy Token{}. WDYT? (I'm interested in this performance overhead, give me some time to test it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I was thinking of as an alternative, but Erich's suggestion also seems like a reasonable idea worth exploring. It's a bummer to go back to foo == tok::blah || foo == tok::yada
instead of using IsOneOf()
, but if that gets too frustrating, we could add a helper function like isTokenKindOneOf()
that takes a pack of token kinds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An additional alternative, is to just have
template<TokenKind TK1, TokenKind...TKs>
isNextPPTokenOneOf();
Then just ask the question we REALLY care about.
So uses of THIS function become:
isNextPPTokenOneOf<tok::l_paren>()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! This looks good to me, let me give a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find myself wondering based on the patch: DO we actually care about the token itself, or just the kind? Could we instead have peekNextPPToken
instead be peekNextPPTokenKind
and just return the next token kind?
That way we don't need the optional, and can just do a direct comparison. We could also just return the eof
as well rather than treating that as a nullopt.
WDYT?
Thanks a lot for the review!
Can we use an if-stmt to avoid create a dummy
Yes, currently we just care about token kind. The reason for using auto NextTok = peekNextPPToken().value_or(Token{});
CurPPLexer->ParsingPreprocessorDirective = SavedParsingPreprocessorDirective;
if (Result.getIdentifierInfo()->isModulesImport() &&
NextTok.isOneOf(tok::raw_identifier, tok::less, tok::string_literal,
tok::colon)) {
Result.setKind(tok::kw_import);
ModuleImportLoc = Result.getLocation();
IsAtImport = false;
return true;
}
if (Result.getIdentifierInfo()->isModulesDeclaration() &&
NextTok.isOneOf(tok::raw_identifier, tok::colon, tok::semi)) {
Result.setKind(tok::kw_module);
ModuleDeclLoc = Result.getLocation();
return true;
}
// Ok, it's an identifier.
return false;
The original |
Signed-off-by: yronglin <[email protected]>
Do we also need to update
We depent std::nullopt to walk up in macro stack in |
friendly ping~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Many thanks for your review! |
This PR introduce a new function
peekNextPPToken
. It's an extension ofisNextPPTokenLParen
and can makes look ahead one token in preprocessor without side-effects.It's also the 1st part of #107168 and it was used to look ahead next token then determine whether current lexing pp directive is one of pp-import or pp-module directive.
At the start of phase 4 an import or module token is treated as starting a directive and are converted to their respective keywords iff:
Otherwise the token is treated as an identifier.