
Lex string literal segments in the lexer and compose them in the parser #1250


Merged: 5 commits merged on Jan 20, 2023


@ahoppen (Member) commented Jan 19, 2023

Instead of lexing the entire string literal as a single string literal token that later has to be re-lexed, have the lexer directly produce the lexemes that become tokens in the parser.
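To make the description concrete, here is a minimal sketch (not swift-syntax code) of a lexer that splits a single-line string literal into separate lexemes instead of one opaque token. The enum StringToken and the function lexStringLiteral are hypothetical; the case names only echo the token kinds mentioned in this review (stringSegment, backslash, leftParen, rightParen). The sketch assumes no raw-string delimiters, no nested parentheses, and an identifier-only interpolation.

```swift
// Hypothetical token kinds for illustration; not swift-syntax's RawTokenKind.
enum StringToken: Equatable {
    case stringQuote
    case stringSegment(String)
    case backslash
    case leftParen
    case identifier(String)
    case rightParen
}

// Splits a single-line string literal such as "a\(b)c" into lexemes.
// Assumptions: no raw delimiters (#), no escapes other than "\(",
// no nested parentheses, and only an identifier inside each interpolation.
func lexStringLiteral(_ source: String) -> [StringToken] {
    let chars = Array(source)
    var tokens: [StringToken] = []
    guard chars.first == "\"" else { return tokens }
    tokens.append(.stringQuote)
    var i = 1
    var segment = ""

    // Emits the text collected so far as one stringSegment lexeme.
    func flushSegment() {
        if !segment.isEmpty {
            tokens.append(.stringSegment(segment))
            segment = ""
        }
    }

    while i < chars.count {
        let c = chars[i]
        if c == "\"" {
            // The closing quote ends the literal.
            flushSegment()
            tokens.append(.stringQuote)
            return tokens
        } else if c == "\\", i + 1 < chars.count, chars[i + 1] == "(" {
            // Interpolation: finish the current segment, then emit "\" and "(".
            flushSegment()
            tokens.append(.backslash)
            tokens.append(.leftParen)
            i += 2
            var name = ""
            while i < chars.count, chars[i] != ")" {
                name.append(chars[i])
                i += 1
            }
            if !name.isEmpty { tokens.append(.identifier(name)) }
            if i < chars.count {
                tokens.append(.rightParen)
                i += 1
            }
        } else {
            segment.append(c)
            i += 1
        }
    }
    // Unterminated literal: keep whatever text was collected.
    flushSegment()
    return tokens
}
```

With lexemes like these, the parser can compose the string literal's syntax tree directly from the token stream rather than re-lexing the literal's text.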

@ahoppen requested review from rintaro and CodaFi January 19, 2023 11:55
@ahoppen (Member, Author) commented Jan 19, 2023

@swift-ci Please test

case UInt8(ascii: "\\"):
if self.isAtStringInterpolationAnchor(delimiterLength: delimiterLength) {
// Finish the current string segment. The next time
// `lexStringLiteralContents` is called, it will consume the backslash

Reviewer (Member):

Should this say lexInStringLiteral? lexStringLiteralContents() is removed in this PR.

private mutating func lexAfterBackslashOfStringInterpolation(stringLiteralKind: StringLiteralKind) -> Lexer.Result {
switch self.peek() {
case UInt8(ascii: "#"):
self.advance(while: { $0 == Unicode.Scalar("#") })

Reviewer (Member):

Could you add a comment noting that the correctness of the number of # is guaranteed by isAtStringInterpolationAnchor() in lexInStringLiteral()?
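A sketch of the invariant the reviewer asks to document, assuming the anchor check means "a backslash followed by exactly delimiterLength # characters and then (". Both helpers are hypothetical stand-ins, not the actual isAtStringInterpolationAnchor(delimiterLength:) implementation: once the check has passed, the code that consumes the # run can skip it without recounting.

```swift
// Hypothetical stand-in for isAtStringInterpolationAnchor(delimiterLength:):
// true if `bytes[index]` is a backslash followed by exactly `delimiterLength`
// "#" characters and then "(".
func isAtInterpolationAnchor(_ bytes: [UInt8], at index: Int, delimiterLength: Int) -> Bool {
    guard index < bytes.count, bytes[index] == UInt8(ascii: "\\") else { return false }
    var i = index + 1
    var pounds = 0
    while i < bytes.count, bytes[i] == UInt8(ascii: "#") {
        pounds += 1
        i += 1
    }
    return pounds == delimiterLength && i < bytes.count && bytes[i] == UInt8(ascii: "(")
}

// Hypothetical consumer, called only after the anchor check succeeded: the
// number of "#" is already known to equal `delimiterLength`, so it is not recounted.
func indexOfInterpolationParen(_ bytes: [UInt8], backslashAt index: Int, delimiterLength: Int) -> Int {
    // Debug-only check of the invariant; release builds skip it.
    assert(isAtInterpolationAnchor(bytes, at: index, delimiterLength: delimiterLength))
    return index + 1 + delimiterLength
}
```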

case .afterStringLiteral(isRawString: _):
result = lexAfterStringLiteral()
case .afterClosingStringQuote:
result = lexAfterClosingStringQuote()
case .afterBackslashOfStringInterpolation(stringLiteralKind: let stringLiteralKind):
result = lexAfterBackslashOfStringInterpolation(stringLiteralKind: stringLiteralKind)

Reviewer (Member):

Instead of lexing "after" the backslash, I would add a state inStringInterpolationStart that lexes \, ##, and ( so that we can remove the redundant isAtStringInterpolationAnchor() call in lexInStringLiteral().

@ahoppen (Member, Author):

Good idea 👍 Thanks.
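One way to picture the suggested inStringInterpolationStart state, as a rough sketch: the anchor check runs once while lexing segment text, and the start state then emits the backslash, the delimiter, and the ( as separate lexemes without checking again. All names below (LexState, Lexeme, InterpolationStartLexer) are hypothetical, and lexing of the interpolated expression itself is left out.

```swift
// Hypothetical lexer states, loosely modeled on the suggestion above;
// not swift-syntax's actual state type.
enum LexState: Equatable {
    case inStringLiteral(delimiterLength: Int)
    case inStringInterpolationStart(delimiterLength: Int)
    case inStringInterpolation
}

// Hypothetical lexemes produced by this sketch.
enum Lexeme: Equatable {
    case stringSegment(String)
    case backslash
    case rawStringDelimiter(Int)
    case leftParen
}

struct InterpolationStartLexer {
    let bytes: [UInt8]
    var index = 0
    var state: LexState

    init(_ text: String, delimiterLength: Int) {
        bytes = Array(text.utf8)
        state = .inStringLiteral(delimiterLength: delimiterLength)
    }

    // Backslash, exactly `n` "#", then "(" starting at `i` (requires i < bytes.count).
    func isAnchor(at i: Int, delimiterLength n: Int) -> Bool {
        guard bytes[i] == UInt8(ascii: "\\") else { return false }
        var j = i + 1
        var pounds = 0
        while j < bytes.count, bytes[j] == UInt8(ascii: "#") {
            pounds += 1
            j += 1
        }
        return pounds == n && j < bytes.count && bytes[j] == UInt8(ascii: "(")
    }

    mutating func next() -> Lexeme? {
        guard index < bytes.count else { return nil }
        switch state {
        case .inStringLiteral(delimiterLength: let n):
            // The only anchor check: scan segment text up to an interpolation start.
            let start = index
            while index < bytes.count && !isAnchor(at: index, delimiterLength: n) {
                index += 1
            }
            if index < bytes.count {
                state = .inStringInterpolationStart(delimiterLength: n)
            }
            if index == start { return next() }
            return .stringSegment(String(decoding: bytes[start..<index], as: UTF8.self))
        case .inStringInterpolationStart(delimiterLength: let n):
            // No re-check: the previous state already validated "\", the "#" count, and "(".
            if bytes[index] == UInt8(ascii: "\\") {
                index += 1
                return .backslash
            }
            if n > 0 && bytes[index] == UInt8(ascii: "#") {
                index += n
                return .rawStringDelimiter(n)
            }
            index += 1
            state = .inStringInterpolation
            return .leftParen
        case .inStringInterpolation:
            // Lexing the interpolated expression is out of scope for this sketch.
            return nil
        }
    }
}
```

In a raw string with one #, "\(" is not an anchor, so it stays part of the segment text; only "\#(" switches to the start state.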

} else if let backslash = self.consume(if: .backslash) {
let (unexpectedBeforeDelimiter, delimiter) = self.parseStringDelimiter(openDelimiter: openDelimiter)
let (unexpectedBeforeLeftParen, leftParen) = self.expect(.leftParen)
let expressions = RawTupleExprElementListSyntax(elements: self.parseArgumentListElements(pattern: .none), arena: self.arena)

@rintaro (Member) commented Jan 19, 2023:

How are newlines in the string interpolation handled? For example:

let a = "test \(label:
foo()

This should produce a missing expression after label:, a missing ), and a missing closing quote, and foo() should be parsed as a whole new CodeBlockItem syntax. But I think foo() is parsed as the expression for label:, no?

@ahoppen (Member, Author):

We leave the inStringInterpolationState state when we hit a newline; see:

https://github.com/apple/swift-syntax/pull/1250/files#diff-253b01bc981faa185173c1697784a41bf1b586e91ec7b4de806dfd2646db52c3R135

I added additional test cases for this.
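The behavior described in the reply can be approximated as follows: while scanning an interpolation, a newline ends it if the enclosing literal is single-line, so foo() on the next line is lexed as ordinary code. interpolationEnd is a hypothetical helper that only tracks parenthesis depth and ignores nested strings and comments; it is not the lexer's actual logic.

```swift
// Hypothetical helper: scans the contents of a string interpolation starting
// just after its "(" and returns the index where the interpolation ends:
// the matching ")" or, in a single-line literal, the first newline.
// Assumes no nested string literals or comments inside the interpolation.
func interpolationEnd(_ bytes: [UInt8], from start: Int, isMultiline: Bool) -> Int {
    var depth = 0
    var i = start
    while i < bytes.count {
        switch bytes[i] {
        case UInt8(ascii: "("):
            depth += 1
        case UInt8(ascii: ")"):
            if depth == 0 { return i }
            depth -= 1
        case UInt8(ascii: "\n") where !isMultiline:
            // A single-line literal cannot continue onto the next line, so the
            // interpolation is unterminated; stop here so the next line is new code.
            return i
        default:
            break
        }
        i += 1
    }
    return i
}
```

For the example above, a single-line literal stops at the newline after label:, whereas a multiline literal would keep scanning into foo().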

// This allows us to skip over extraneous identifiers etc. in an unterminated string interpolation.
var unexpectedBeforeRightParen: [RawTokenSyntax] = []
var unexpectedProgress = LoopProgressCondition()
while !self.at(any: [.rightParen, .stringSegment, .backslash, openQuote.tokenKind, .eof]) && unexpectedProgress.evaluate(self.currentToken) {

Reviewer (Member):

In single-line string literals, this should stop at a newline, IMO.

@ahoppen (Member, Author):

The lexer automatically produces a stringSegment token at the end of a single-line string literal, even if the literal is unterminated (see the new test testNewlineInInterpolationOfSingleLineString in LexerTest.swift), so this loop stops there.
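A simplified model of the recovery loop discussed in this thread: the parser collects unexpected tokens until it reaches one it can resynchronize on. Kind and skipUnexpected are hypothetical; stringQuote stands in for openQuote.tokenKind, and advancing the index on every iteration plays the role that LoopProgressCondition plays in the reviewed code.

```swift
// Hypothetical token kinds for this sketch.
enum Kind: Hashable {
    case identifier, rightParen, stringSegment, backslash, stringQuote, eof
}

// Collects tokens that do not belong before the ")" of an interpolation and
// stops at the first token the parser can resynchronize on.
func skipUnexpected(_ tokens: [Kind], from start: Int) -> (unexpected: [Kind], next: Int) {
    // .stringQuote stands in for openQuote.tokenKind in the reviewed code.
    let stopKinds: Set<Kind> = [.rightParen, .stringSegment, .backslash, .stringQuote, .eof]
    var unexpected: [Kind] = []
    var i = start
    // Every iteration advances `i`, so the loop always terminates.
    while i < tokens.count && !stopKinds.contains(tokens[i]) {
        unexpected.append(tokens[i])
        i += 1
    }
    return (unexpected, i)
}
```

Given a sequence such as identifier, identifier, stringSegment, identifier, the loop collects the two identifiers and stops at the stringSegment that the lexer emits at the end of the line, so nothing from the following line is swallowed.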

@ahoppen (Member, Author) commented Jan 20, 2023

@swift-ci Please test

@ahoppen merged commit 6d7d66b into swiftlang:main Jan 20, 2023
@ahoppen deleted the ahoppen/dont-relex-string-literals branch January 20, 2023 15:36

2 participants