[Lexer] Improve lexing of BOM trivia #35917

Merged: 1 commit, Feb 12, 2021

include/swift/Parse/Lexer.h: 8 changes (7 additions, 1 deletion)

@@ -553,7 +553,13 @@ class Lexer {
   void lexOperatorIdentifier();
   void lexHexNumber();
   void lexNumber();
-  StringRef lexTrivia(bool IsForTrailingTrivia);
+
+  /// Skip over trivia and return characters that were skipped over in a \c
+  /// StringRef. \p AllTriviaStart determines the start of the trivia. In nearly
+  /// all cases, this should be \c CurPtr. If other trivia has already been
+  /// skipped over (like a BOM), this can be used to point to the start of the
+  /// BOM. The returned \c StringRef will always start at \p AllTriviaStart.
+  StringRef lexTrivia(bool IsForTrailingTrivia, const char *AllTriviaStart);
   static unsigned lexUnicodeEscape(const char *&CurPtr, Lexer *Diags);
 
   unsigned lexCharacter(const char *&CurPtr, char StopQuote,
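
The contract in the new doc comment on lexTrivia can be illustrated with a small model. This is a minimal sketch, not Swift's implementation: ToyLexer is a hypothetical stand-in for swift::Lexer, std::string_view replaces llvm::StringRef, only spaces, tabs and newlines count as trivia (the real lexer also handles comments and more), and stopping trailing trivia before a newline is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical, heavily simplified stand-in for swift::Lexer.
struct ToyLexer {
  const char *CurPtr;    // next character to lex
  const char *BufferEnd; // one past the last character

  // Skips trivia starting at CurPtr and returns [AllTriviaStart, CurPtr).
  // AllTriviaStart is normally CurPtr, but may point earlier if the caller
  // already skipped something (such as a BOM) that belongs to the trivia.
  std::string_view lexTrivia(bool IsForTrailingTrivia,
                             const char *AllTriviaStart) {
    assert(AllTriviaStart <= CurPtr && "trivia cannot start after CurPtr");
    while (CurPtr != BufferEnd) {
      char C = *CurPtr;
      bool IsNewline = C == '\n' || C == '\r';
      // Assumption of this sketch: trailing trivia ends before a newline;
      // newlines are only consumed as leading trivia.
      if (C == ' ' || C == '\t' || (IsNewline && !IsForTrailingTrivia)) {
        ++CurPtr;
        continue;
      }
      break;
    }
    return std::string_view(
        AllTriviaStart, static_cast<std::size_t>(CurPtr - AllTriviaStart));
  }
};
```

In the ordinary case the caller passes CurPtr, as formToken now does for trailing trivia. Passing an earlier pointer only widens the returned range to the left; where lexing resumes is unchanged.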

lib/Parse/Lexer.cpp: 13 changes (5 additions, 8 deletions)

@@ -291,7 +291,7 @@ void Lexer::formToken(tok Kind, const char *TokStart) {
   StringRef TokenText { TokStart, static_cast<size_t>(CurPtr - TokStart) };
 
   if (TriviaRetention == TriviaRetentionMode::WithTrivia && Kind != tok::eof) {
-    TrailingTrivia = lexTrivia(/*IsForTrailingTrivia=*/true);
+    TrailingTrivia = lexTrivia(/*IsForTrailingTrivia=*/true, CurPtr);
   } else {
     TrailingTrivia = StringRef();
   }
@@ -2344,11 +2344,8 @@ void Lexer::lexImpl() {
     NextToken.setAtStartOfLine(false);
   }
 
-  // Advance CurPtr to the end of the first trivia in the source file and form
-  // the leading trivia including the BOM
-  lexTrivia(/*IsForTrailingTrivia=*/false);
-  LeadingTrivia = StringRef(LeadingTriviaStart, CurPtr - LeadingTriviaStart);
-
+  LeadingTrivia = lexTrivia(/*IsForTrailingTrivia=*/false, LeadingTriviaStart);
+
   // Remember the start of the token so we can form the text range.
   const char *TokStart = CurPtr;
 
@@ -2525,8 +2522,8 @@ Token Lexer::getTokenAtLocation(const SourceManager &SM, SourceLoc Loc,
   return L.peekNextToken();
 }
 
-StringRef Lexer::lexTrivia(bool IsForTrailingTrivia) {
-  const char *AllTriviaStart = CurPtr;
+StringRef Lexer::lexTrivia(bool IsForTrailingTrivia,
+                           const char *AllTriviaStart) {
   CommentStart = nullptr;
 
 Restart:
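
In lexImpl, the old code called lexTrivia only for its effect on CurPtr and then rebuilt LeadingTrivia from LeadingTriviaStart. The new code passes LeadingTriviaStart in and uses the returned range directly. Judging from the diff, both should yield the same range. The sketch below models the two versions side by side so they can be compared. It assumes, for illustration only, that a UTF-8 BOM is skipped by advancing CurPtr three bytes past LeadingTriviaStart before trivia lexing starts. startsWithUTF8BOM, leadingTriviaOld and leadingTriviaNew are hypothetical helpers, std::string_view stands in for StringRef, and trivia is reduced to spaces and newlines.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: does the buffer begin with the UTF-8 BOM EF BB BF?
static bool startsWithUTF8BOM(std::string_view Buf) {
  return Buf.size() >= 3 && static_cast<unsigned char>(Buf[0]) == 0xEF &&
         static_cast<unsigned char>(Buf[1]) == 0xBB &&
         static_cast<unsigned char>(Buf[2]) == 0xBF;
}

// Hypothetical trivia skipper with the new two-argument contract: advances
// CurPtr past spaces and newlines and returns [AllTriviaStart, CurPtr).
static std::string_view lexTrivia(const char *&CurPtr, const char *End,
                                  const char *AllTriviaStart) {
  assert(AllTriviaStart <= CurPtr);
  while (CurPtr != End && (*CurPtr == ' ' || *CurPtr == '\n'))
    ++CurPtr;
  return std::string_view(AllTriviaStart,
                          static_cast<std::size_t>(CurPtr - AllTriviaStart));
}

// Old shape: trivia starts at CurPtr (after the BOM), the result is
// discarded, and the leading trivia is rebuilt from LeadingTriviaStart.
static std::string_view leadingTriviaOld(std::string_view Buf) {
  const char *CurPtr = Buf.data();
  const char *LeadingTriviaStart = CurPtr;
  if (startsWithUTF8BOM(Buf))
    CurPtr += 3; // assumption: BOM skipped before trivia lexing
  lexTrivia(CurPtr, Buf.data() + Buf.size(), /*AllTriviaStart=*/CurPtr);
  return std::string_view(
      LeadingTriviaStart,
      static_cast<std::size_t>(CurPtr - LeadingTriviaStart));
}

// New shape: pass LeadingTriviaStart and use the returned range as-is.
static std::string_view leadingTriviaNew(std::string_view Buf) {
  const char *CurPtr = Buf.data();
  const char *LeadingTriviaStart = CurPtr;
  if (startsWithUTF8BOM(Buf))
    CurPtr += 3; // same assumption as above
  return lexTrivia(CurPtr, Buf.data() + Buf.size(), LeadingTriviaStart);
}
```

The BOM bytes lie between LeadingTriviaStart and the point where trivia lexing resumes. Passing the start pointer explicitly is what lets a single lexTrivia call return a range that includes them.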