Lex string literal segments in the lexer and compose them in the parser #1250

ahoppen · 2023-01-19T11:55:38Z

Instead of parsing the entire string literal as a single string literal token, directly produce the lexemes that will become tokens in the parser.
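To illustrate the idea, here is a minimal sketch of a lexer that emits the pieces of a string literal as separate lexemes. It is not the swift-syntax implementation: `ToyToken` and `toyLex` are hypothetical names, the interpolation body is kept as one opaque token, and raw-string delimiters and escapes other than `\(` are assumed not to occur.

```swift
// Toy token kinds; the names loosely mirror the PR but are NOT the swift-syntax API.
enum ToyToken: Equatable {
  case stringQuote
  case stringSegment(String)
  case backslash
  case leftParen
  case interpolationBody(String)  // stand-in for the expression tokens
  case rightParen
}

/// Splits a simple single-line literal such as "a \(b) c" into lexemes.
/// Assumption: no raw-string delimiters and no escapes other than `\(`.
func toyLex(_ literal: String) -> [ToyToken] {
  let chars = Array(literal)
  var tokens: [ToyToken] = []
  var segment = ""
  var i = 0
  while i < chars.count {
    let c = chars[i]
    if c == "\"" {
      // The opening quote has no pending segment; the closing quote flushes one.
      if !tokens.isEmpty {
        tokens.append(.stringSegment(segment))
        segment = ""
      }
      tokens.append(.stringQuote)
      i += 1
    } else if c == "\\", i + 1 < chars.count, chars[i + 1] == "(" {
      // Finish the current segment, then emit the interpolation start.
      tokens.append(.stringSegment(segment))
      segment = ""
      tokens.append(.backslash)
      tokens.append(.leftParen)
      i += 2
      // Collect everything up to the matching `)`.
      var depth = 1
      var body = ""
      while i < chars.count {
        if chars[i] == "(" { depth += 1 }
        if chars[i] == ")" {
          depth -= 1
          if depth == 0 { break }
        }
        body.append(chars[i])
        i += 1
      }
      tokens.append(.interpolationBody(body))
      if i < chars.count {
        tokens.append(.rightParen)
        i += 1
      }
    } else {
      segment.append(c)
      i += 1
    }
  }
  // An unterminated literal still yields its trailing segment.
  if !segment.isEmpty { tokens.append(.stringSegment(segment)) }
  return tokens
}
```

With tokens in this shape, the parser only has to compose segments and interpolations; it does not need to re-lex the literal's contents.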

ahoppen · 2023-01-19T11:56:40Z

@swift-ci Please test

rintaro · 2023-01-19T19:23:25Z

Sources/SwiftParser/Lexer/Cursor.swift

+      case UInt8(ascii: "\\"):
+        if self.isAtStringInterpolationAnchor(delimiterLength: delimiterLength) {
+          // Finish the current string segment. The next time
+          // `lexStringLiteralContents` is called, it will consume the backslash


Should this say lexInStringLiteral? lexStringLiteralContents() is removed in this PR.

rintaro · 2023-01-19T19:30:37Z

Sources/SwiftParser/Lexer/Cursor.swift

+  private mutating func lexAfterBackslashOfStringInterpolation(stringLiteralKind: StringLiteralKind) -> Lexer.Result {
+    switch self.peek() {
+    case UInt8(ascii: "#"):
+      self.advance(while: { $0 == Unicode.Scalar("#") })


Could you add a comment noting that the correctness of the number of # is guaranteed by isAtStringInterpolationAnchor() in lexInStringLiteral()?
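For readers unfamiliar with the invariant, the following is a rough stand-alone sketch of what such an anchor check might look like. `isInterpolationAnchor` is a hypothetical free function over a byte array, not the actual `isAtStringInterpolationAnchor` method on the cursor, and the exact rule (backslash, then exactly `delimiterLength` `#` characters, then `(`) is an assumption based on how raw string literals behave.

```swift
/// Hypothetical stand-alone anchor check (not the swift-syntax method).
/// Returns true when `bytes[index...]` starts with a backslash, followed by
/// exactly `delimiterLength` `#` characters, followed by `(`.
func isInterpolationAnchor(_ bytes: [UInt8], at index: Int, delimiterLength: Int) -> Bool {
  var i = index
  guard i < bytes.count, bytes[i] == UInt8(ascii: "\\") else { return false }
  i += 1
  for _ in 0..<delimiterLength {
    guard i < bytes.count, bytes[i] == UInt8(ascii: "#") else { return false }
    i += 1
  }
  // Any extra `#` lands here and fails the `(` comparison.
  return i < bytes.count && bytes[i] == UInt8(ascii: "(")
}
```

If the check has already succeeded, a later `advance(while:)` over `#` consumes exactly `delimiterLength` characters, because the character after them is known to be `(`.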

rintaro · 2023-01-19T19:34:58Z

Sources/SwiftParser/Lexer/Cursor.swift

    case .afterStringLiteral(isRawString: _):
      result = lexAfterStringLiteral()
    case .afterClosingStringQuote:
      result = lexAfterClosingStringQuote()
+    case .afterBackslashOfStringInterpolation(stringLiteralKind: let stringLiteralKind):
+      result = lexAfterBackslashOfStringInterpolation(stringLiteralKind: stringLiteralKind)


Instead of "after" the backslash, I would make a state inStringInterpolationStart that lexes \, ##, and (, so we can remove the redundant isAtStringInterpolationAnchor() call in lexInStringLiteral().

Good idea 👍 Thanks.

rintaro · 2023-01-19T19:58:27Z

Sources/SwiftParser/StringLiterals.swift

+      } else if let backslash = self.consume(if: .backslash) {
+        let (unexpectedBeforeDelimiter, delimiter) = self.parseStringDelimiter(openDelimiter: openDelimiter)
+        let (unexpectedBeforeLeftParen, leftParen) = self.expect(.leftParen)
+        let expressions = RawTupleExprElementListSyntax(elements: self.parseArgumentListElements(pattern: .none), arena: self.arena)


How are newlines in the string interpolation handled? E.g.

let a = "test \(label: foo()

This should be diagnosed as a missing expression after label:, a missing ), and a missing ". foo() should then be a whole new CodeBlockItem syntax node. But I think foo() is parsed as the expression for label:, no?

We are leaving the inStringInterpolationState state when hitting a newline because of

https://github.com/apple/swift-syntax/pull/1250/files#diff-253b01bc981faa185173c1697784a41bf1b586e91ec7b4de806dfd2646db52c3R135

I added additional test cases for this.
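As a rough illustration of the behavior described in this thread, the sketch below models the lexer state as a stack and pops the interpolation state when a newline is seen inside a single-line literal. `ToyState` and `ToyStateStack` are hypothetical names, and the exact states and transitions in the PR may differ.

```swift
// Hypothetical lexer states; the real state enum in the PR may look different.
enum ToyState: Equatable {
  case normal
  case inStringLiteral(multiline: Bool)
  case inStringInterpolation(multiline: Bool)
}

struct ToyStateStack {
  var states: [ToyState] = [.normal]

  var current: ToyState { states[states.count - 1] }

  mutating func push(_ state: ToyState) { states.append(state) }

  mutating func pop() {
    // Never pop the bottom-most state.
    if states.count > 1 { states.removeLast() }
  }

  /// Called when the lexer encounters a newline.
  /// An interpolation inside a single-line literal cannot span lines,
  /// so the interpolation state is abandoned.
  mutating func sawNewline() {
    if case .inStringInterpolation(multiline: false) = current {
      pop()
    }
  }
}
```

After the pop, the lexer is back in the enclosing string literal state, so the tokens on the next line are no longer treated as part of the interpolation.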

rintaro · 2023-01-19T19:59:31Z

Sources/SwiftParser/StringLiterals.swift

+        // This allows us to skip over extraneous identifiers etc. in an unterminated string interpolation.
+        var unexpectedBeforeRightParen: [RawTokenSyntax] = []
+        var unexpectedProgress = LoopProgressCondition()
+        while !self.at(any: [.rightParen, .stringSegment, .backslash, openQuote.tokenKind, .eof]) && unexpectedProgress.evaluate(self.currentToken) {


In single-line string literals, this should stop at the newline IMO.

The lexer automatically produces a stringSegment at the end of a single-line string literal, even if it is not terminated (see the new test testNewlineInInterpolationOfSingleLineString in LexerTest.swift), so we will stop at that.
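A simplified model of that recovery loop, assuming a flat array of token kinds rather than the real parser: `Kind` and `skipUnexpected` are hypothetical stand-ins, and the stop set mirrors the kinds listed in the diff above.

```swift
// Toy token kinds standing in for the parser's token kinds.
enum Kind: Equatable {
  case identifier, rightParen, stringSegment, backslash, stringQuote, eof
}

/// Collects tokens as "unexpected" until one of the stop kinds is reached.
/// Returns the skipped tokens and the index of the token that stopped the loop.
func skipUnexpected(_ tokens: [Kind], from start: Int) -> (unexpected: [Kind], next: Int) {
  let stopKinds: [Kind] = [.rightParen, .stringSegment, .backslash, .stringQuote, .eof]
  var unexpected: [Kind] = []
  var i = start
  // The index advances on every iteration, so the loop always makes progress.
  while i < tokens.count, !stopKinds.contains(tokens[i]) {
    unexpected.append(tokens[i])
    i += 1
  }
  return (unexpected, i)
}
```

Because the lexer emits a stringSegment at the end of the line for an unterminated single-line literal, the loop stops there instead of running into the next line.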

ahoppen · 2023-01-20T13:33:45Z

@swift-ci Please test

ahoppen added 4 commits January 19, 2023 12:53

Model state of the lexer as a stack

3b652ad

Lex string literal segments in the lexer and compose them in the parser

82c0a48

Instead of parsing the entire string literal as a single string literal token, directly produce the lexemes that will become tokens in the parser.

Add parser knowledge to kick lexer out of string interpolation mode

e34473f

Merge lexInStringLiteral and lexStringLiteralContents

8d5f89e

ahoppen requested review from rintaro and CodaFi January 19, 2023 11:55

rintaro reviewed Jan 19, 2023

Address review comments

0e44ec7

ahoppen merged commit 6d7d66b into swiftlang:main Jan 20, 2023

ahoppen deleted the ahoppen/dont-relex-string-literals branch January 20, 2023 15:36
