Lex string literal segments in the lexer and compose them in the parser #1250
Instead of parsing the entire string literal as a single string literal token, directly produce the lexemes that will become tokens in the parser.
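As a rough illustration of the idea (not the actual swift-syntax implementation), the sketch below lexes a single-line string literal into separate quote, segment, backslash, and parenthesis tokens using a small state machine. The names `TokenKind`, `Token`, `lexStringLiteral`, and the three states are hypothetical and exist only for this example; raw-string delimiters, escapes, and nested literals are deliberately ignored.

```swift
// Hypothetical token model for this sketch; not swift-syntax's real types.
enum TokenKind: Equatable {
    case stringQuote, stringSegment, backslash, leftParen, rightParen, identifier
}

struct Token: Equatable {
    let kind: TokenKind
    let text: String
}

/// Lexes a simple single-line string literal into its constituent tokens
/// instead of returning one big string literal token.
func lexStringLiteral(_ source: String) -> [Token] {
    enum State { case normal, inStringLiteral, inInterpolation }
    let chars = Array(source)
    var tokens: [Token] = []
    var state = State.normal
    var i = 0
    while i < chars.count {
        switch state {
        case .normal:
            // Only an opening quote is recognized outside of a literal.
            if chars[i] == "\"" {
                tokens.append(Token(kind: .stringQuote, text: "\""))
                state = .inStringLiteral
            }
            i += 1
        case .inStringLiteral:
            // Consume literal text up to a closing quote or the `\(` anchor.
            var segment = ""
            while i < chars.count, chars[i] != "\"",
                  !(chars[i] == "\\" && i + 1 < chars.count && chars[i + 1] == "(") {
                segment.append(chars[i])
                i += 1
            }
            tokens.append(Token(kind: .stringSegment, text: segment))
            if i < chars.count {
                if chars[i] == "\"" {
                    tokens.append(Token(kind: .stringQuote, text: "\""))
                    state = .normal
                    i += 1
                } else {
                    // At `\(`: emit the backslash and the paren as separate tokens.
                    tokens.append(Token(kind: .backslash, text: "\\"))
                    tokens.append(Token(kind: .leftParen, text: "("))
                    state = .inInterpolation
                    i += 2
                }
            }
        case .inInterpolation:
            if chars[i] == ")" {
                tokens.append(Token(kind: .rightParen, text: ")"))
                state = .inStringLiteral
                i += 1
            } else if chars[i].isLetter {
                var name = ""
                while i < chars.count, chars[i].isLetter {
                    name.append(chars[i])
                    i += 1
                }
                tokens.append(Token(kind: .identifier, text: name))
            } else {
                i += 1 // Skip whitespace and anything this sketch does not model.
            }
        }
    }
    return tokens
}

let tokens = lexStringLiteral(#""a \(b) c""#)
print(tokens.map { $0.kind })
```

With tokens of this shape, a parser can assemble the string literal node from the segments and parse the interpolated expression with its ordinary expression machinery.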
@swift-ci Please test
case UInt8(ascii: "\\"):
  if self.isAtStringInterpolationAnchor(delimiterLength: delimiterLength) {
    // Finish the current string segment. The next time
    // `lexStringLiteralContents` is called, it will consume the backslash
lexInStringLiteral? lexStringLiteralContents() is removed in this PR.
private mutating func lexAfterBackslashOfStringInterpolation(stringLiteralKind: StringLiteralKind) -> Lexer.Result {
  switch self.peek() {
  case UInt8(ascii: "#"):
    self.advance(while: { $0 == Unicode.Scalar("#") })
Could you add a comment that the correctness of the number of # is guaranteed by isAtStringInterpolationAnchor() in lexInStringLiteral()?
case .afterStringLiteral(isRawString: _):
  result = lexAfterStringLiteral()
case .afterClosingStringQuote:
  result = lexAfterClosingStringQuote()
case .afterBackslashOfStringInterpolation(stringLiteralKind: let stringLiteralKind):
  result = lexAfterBackslashOfStringInterpolation(stringLiteralKind: stringLiteralKind)
Instead of "after" the backslash, I would make a state inStringInterpolationStart that lexes \, ##, and ( so we can remove the redundant isAtStringInterpolationAnchor() call in lexInStringLiteral().
Good idea 👍 Thanks.
} else if let backslash = self.consume(if: .backslash) {
  let (unexpectedBeforeDelimiter, delimiter) = self.parseStringDelimiter(openDelimiter: openDelimiter)
  let (unexpectedBeforeLeftParen, leftParen) = self.expect(.leftParen)
  let expressions = RawTupleExprElementListSyntax(elements: self.parseArgumentListElements(pattern: .none), arena: self.arena)
How are newlines in the string interpolation handled? E.g.
let a = "test \(label:
foo()
This should produce a missing expression after label:, a missing ), and a missing ". foo() should be a whole new CodeBlockItem syntax. But I think foo() is parsed as the expression to label:, no?
We are leaving the inStringInterpolationState state when hitting a newline because of
I added additional test cases for this.
// This allows us to skip over extraneous identifiers etc. in an unterminated string interpolation.
var unexpectedBeforeRightParen: [RawTokenSyntax] = []
var unexpectedProgress = LoopProgressCondition()
while !self.at(any: [.rightParen, .stringSegment, .backslash, openQuote.tokenKind, .eof]) && unexpectedProgress.evaluate(self.currentToken) {
In single-line string literals, this should stop at a newline IMO.
The lexer automatically produces a stringSegment at the end of a single-line string literal, even if it is not terminated (see the new test testNewlineInInterpolationOfSingleLineString in LexerTest.swift), so we will stop at that.
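To make the recovery behaviour concrete, here is a minimal sketch of a skip loop that collects unexpected tokens until it reaches one of a set of stop kinds. `Kind`, `skipUnexpected`, and the sample token stream are hypothetical stand-ins for illustration, not the parser's real API; the stream assumes, as described above, that the lexer emits a stringSegment where the unterminated single-line literal ends.

```swift
// Hypothetical token kinds; enums without payloads are Hashable automatically.
enum Kind {
    case identifier, rightParen, stringSegment, backslash, stringQuote, eof
}

/// Collects the tokens from `start` up to (not including) the first token
/// whose kind is in `stopKinds`, and returns them with the stop position.
func skipUnexpected(
    _ tokens: [Kind],
    from start: Int,
    stopKinds: Set<Kind>
) -> (unexpected: [Kind], next: Int) {
    var index = start
    var unexpected: [Kind] = []
    while index < tokens.count, !stopKinds.contains(tokens[index]) {
        unexpected.append(tokens[index])
        index += 1
    }
    return (unexpected, index)
}

// Assumed token stream inside an unterminated interpolation: two stray
// identifiers, then the segment emitted at the end of the line, then the
// tokens of the following statement.
let stream: [Kind] = [.identifier, .identifier, .stringSegment, .identifier, .eof]
let recovered = skipUnexpected(
    stream,
    from: 0,
    stopKinds: [.rightParen, .stringSegment, .backslash, .stringQuote, .eof]
)
print(recovered.unexpected.count, recovered.next)
```

Because the segment token is in the stop set, the loop never runs past the end of the line, so the statement on the next line is left for the regular parser.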
@swift-ci Please test