Produce separate tokens for raw string delimiters and string quotes in the lexer #1192

ahoppen · 2023-01-05T16:48:02Z

The eventual goal of this change is that we no longer need to re-lex string literals from the parser to separate them into their components. Instead, the lexer should just produce the lexemes that will later be put into the syntax tree as tokens.

The downside of this is that the lexer now needs to carry state and know whether it is lexing a string literal. On the upside, the string literal parser could be significantly simplified and the diagnostics got better without any further changes.
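
To illustrate the idea of a lexer that carries state and emits the string delimiters, quotes, and contents as separate tokens, here is a minimal sketch. It is not the SwiftParser implementation: `TokenKind`, `LexerState`, `Lexer`, and `allTokens` are hypothetical names made up for this example, and escapes, interpolation, and multi-line strings are deliberately ignored.

```swift
// Hypothetical, heavily simplified token kinds (assumption: not the real SwiftSyntax set).
enum TokenKind: Equatable {
  case rawStringDelimiter      // a run of `#` that opens or closes a raw string
  case stringQuote             // `"`
  case stringLiteralContents   // the text between the quotes
  case other                   // anything outside a string literal
}

struct Token: Equatable {
  let kind: TokenKind
  let text: String
}

// Hypothetical lexer state. `pounds` remembers the length of the opening delimiter
// so that the matching closing delimiter can be recognized.
enum LexerState: Equatable {
  case normal
  case beforeOpeningQuote(pounds: Int)
  case inStringLiteral(pounds: Int)
  case beforeClosingDelimiter(pounds: Int)
}

struct Lexer {
  private let bytes: [UInt8]
  private var index = 0
  private var state = LexerState.normal

  init(_ source: String) { bytes = Array(source.utf8) }

  // Number of consecutive `#` bytes starting at `start` (0 at or past the end).
  private func pounds(at start: Int) -> Int {
    var end = start
    while end < bytes.count, bytes[end] == UInt8(ascii: "#") { end += 1 }
    return end - start
  }

  // Consumes `length` bytes, switches to `next`, and returns the token.
  private mutating func emit(_ kind: TokenKind, length: Int, next: LexerState) -> Token {
    let text = String(decoding: bytes[index..<index + length], as: UTF8.self)
    index += length
    state = next
    return Token(kind: kind, text: text)
  }

  // Returns the next token, or nil at the end of the input.
  mutating func nextToken() -> Token? {
    guard index < bytes.count else { return nil }
    let quote = UInt8(ascii: "\"")
    switch state {
    case .normal:
      let count = pounds(at: index)
      if count > 0 {
        // Pounds only form a raw string delimiter if a quote follows them.
        let opensString = index + count < bytes.count && bytes[index + count] == quote
        return emit(opensString ? .rawStringDelimiter : .other, length: count,
                    next: opensString ? .beforeOpeningQuote(pounds: count) : .normal)
      }
      if bytes[index] == quote {
        return emit(.stringQuote, length: 1, next: .inStringLiteral(pounds: 0))
      }
      var end = index
      while end < bytes.count, bytes[end] != quote, bytes[end] != UInt8(ascii: "#") { end += 1 }
      return emit(.other, length: end - index, next: .normal)
    case .beforeOpeningQuote(let count):
      return emit(.stringQuote, length: 1, next: .inStringLiteral(pounds: count))
    case .inStringLiteral(let count):
      // The literal ends at a quote followed by at least `count` pounds.
      var end = index
      while end < bytes.count, !(bytes[end] == quote && pounds(at: end + 1) >= count) { end += 1 }
      if end == index {
        return emit(.stringQuote, length: 1,
                    next: count > 0 ? .beforeClosingDelimiter(pounds: count) : .normal)
      }
      // An unterminated literal still yields its contents, then falls back to `.normal`.
      return emit(.stringLiteralContents, length: end - index,
                  next: end == bytes.count ? .normal : .inStringLiteral(pounds: count))
    case .beforeClosingDelimiter(let count):
      return emit(.rawStringDelimiter, length: count, next: .normal)
    }
  }
}

// Convenience helper that drains the lexer.
func allTokens(_ source: String) -> [Token] {
  var lexer = Lexer(source)
  var tokens: [Token] = []
  while let token = lexer.nextToken() { tokens.append(token) }
  return tokens
}
```

In this sketch, lexing `#"a"b"#` yields a raw string delimiter, a quote, the contents `a"b`, a quote, and a closing raw string delimiter, so a parser consuming these tokens would not need to re-lex the literal. An unterminated literal produces a contents token that runs to the end of the input.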

CodaFi · 2023-01-05T17:41:48Z

Sources/SwiftParser/Lexer.swift

          continue
        } else {
          //          diagnose(TokStart, diag::lex_unterminated_string)
-          return (.unknown, [])
+          return LexerResult(.stringLiteralContents, newMode: .normal)


This change is dangerous without corresponding changes to string literal contents parsing. We need to make sure that its scanning and indexing logic can tolerate an early EOF and strings like "\(xyz \v".

It seems to be working… At least the existing test cases didn't catch anything. I haven't tested it harder than that, though.

bnbarham · 2023-01-10T00:58:19Z

Sources/SwiftParser/Lexer.swift

@@ -456,7 +493,7 @@ extension Lexer.Cursor {
      return isDelimeter
    })

-    guard clone.advance(matching: UInt8(ascii: #"""#)) != nil else {
+    guard !clone.isAtEndOfFile && clone.peek() == UInt8(ascii: #"""#) else {


Probably not for this PR, but it'd be nice if we had a safePeek or something similar. Soooo many isAtEndOfFile && peek checks
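
A rough sketch of what such a helper could look like is below. `Cursor` here is a hypothetical stand-in backed by an array, not the real `Lexer.Cursor`, and `safePeek(at:)` and `isAt(_:)` are assumed names taken from the suggestion above rather than existing API.

```swift
// Hypothetical stand-in for a lexer cursor (assumption: the real cursor wraps a raw buffer).
struct Cursor {
  private let bytes: [UInt8]
  private var position = 0

  init(_ source: String) { bytes = Array(source.utf8) }

  var isAtEndOfFile: Bool { position >= bytes.count }

  /// Returns the byte `offset` positions ahead, or nil if that is outside the input.
  func safePeek(at offset: Int = 0) -> UInt8? {
    let target = position + offset
    return bytes.indices.contains(target) ? bytes[target] : nil
  }

  /// Folds the end-of-file check and the comparison into a single call.
  func isAt(_ byte: UInt8) -> Bool { safePeek() == byte }

  /// Consumes and returns the next byte, or returns nil at the end of the input.
  @discardableResult
  mutating func advance() -> UInt8? {
    guard let byte = safePeek() else { return nil }
    position += 1
    return byte
  }
}

/// Example call site: replaces `!cursor.isAtEndOfFile && cursor.peek() == quote`.
func startsWithQuote(_ cursor: Cursor) -> Bool {
  cursor.isAt(UInt8(ascii: "\""))
}
```

Returning an optional means the end-of-file case cannot be forgotten at the call site, at the cost of an extra optional comparison per peek.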

bnbarham · 2023-01-10T00:59:15Z

Sources/SwiftParser/Lexer.swift

@@ -492,6 +529,58 @@ extension Lexer.Cursor {
    return false
  }

+  mutating func lexStringQuote() -> LexerResult {
+    func newMode(currentMode: LexerCursorMode, kind: StringLiteralKind) -> LexerCursorMode {


I personally prefer nextState/state over mode, but I don't really care. state just makes it super clear that it's a state machine :P.

Good idea 👍 I renamed it to state.

bnbarham · 2023-01-10T01:10:39Z

Tests/SwiftParserTest/translated/PoundAssertTests.swift

+        DiagnosticSpec(message: "expected string literal in '#assert' directive"),
+        DiagnosticSpec(message: "unexpected code '123' in '#assert' directive"),


Two diagnostics here is a little weird. Maybe we should just mark the next unexpected after a missing token and then have the fix-it remove the unexpected? I haven't checked whether that would make other cases worse though.

bnbarham · 2023-01-10T01:12:55Z

Tests/SwiftParserTest/translated/RawStringErrorsTests.swift

@@ -73,16 +67,13 @@ final class RawStringErrorsTests: XCTestCase {
  func testRawStringErrors5() {
    AssertParse(
      #####"""
-      let _ = ###"invalid"###1️⃣#2️⃣#3️⃣#4️⃣
+      let _ = ###"invalid"###1️⃣###


This is a duplicate of 3, isn't it?

bnbarham · 2023-01-10T01:26:17Z

Tests/SwiftParserTest/translated/StringLiteralEofTests.swift

+        DiagnosticSpec(locationMarker: "1️⃣", message: #"expected '"' to end string literal"#),
+        DiagnosticSpec(locationMarker: "1️⃣", message: "unexpected code in string literal"),
+        DiagnosticSpec(locationMarker: "2️⃣", message: "expected ')' in string literal"),
+        DiagnosticSpec(locationMarker: "3️⃣", message: #"expected '"""' to end string literal"#),


Bit of a weird case I suppose, but it's a little strange that we don't treat this as bar being part of the string within the interpolation. Though having a string within an interpolation seems fairly unusual anyway, so... maybe doesn't actually matter.

I think ideally we'd have just two diagnostics: one for the missing ") (or two, whatever) and another for the missing """.

bnbarham · 2023-01-10T01:28:35Z

Tests/SwiftParserTest/translated/UnclosedStringInterpolationTests.swift

+        DiagnosticSpec(locationMarker: "1️⃣", message: #"unexpected code '"' in string literal"#),
+        DiagnosticSpec(locationMarker: "2️⃣", message: "expected ')' in string literal"),
+        DiagnosticSpec(locationMarker: "2️⃣", message: #"expected '"' to end string literal"#),


It'd be nice to handle this one better, i.e. don't eat the " as unexpected. That makes 7 worse, but IMO this would be by far the more common case.

bnbarham · 2023-01-10T01:29:13Z

Tests/SwiftParserTest/translated/UnclosedStringInterpolationTests.swift

+        DiagnosticSpec(message: #"expected '"' to end string literal"#),
+        DiagnosticSpec(message: "expected ')' in string literal"),
+        DiagnosticSpec(message: #"expected '"' to end string literal"#),


Same as 2, though honestly I'm not sure it's worth keeping both. That's true for most of the testUnclosedStringInterpolation* tests, TBH.

ahoppen · 2023-01-11T18:52:21Z

@swift-ci Please test

ahoppen · 2023-01-12T16:18:53Z

@swift-ci Please test

ahoppen · 2023-01-13T14:32:47Z

@swift-ci Please test

ahoppen · 2023-01-14T07:36:33Z

@swift-ci Please test

ahoppen · 2023-01-14T08:30:22Z

@swift-ci Please test macOS

ahoppen · 2023-01-14T11:57:27Z

@swift-ci Please test

ahoppen · 2023-01-16T08:44:27Z

@swift-ci Please test

ahoppen requested review from rintaro, DougGregor, bnbarham and CodaFi January 5, 2023 16:48

CodaFi reviewed Jan 5, 2023

bnbarham reviewed Jan 10, 2023

ahoppen changed the title ~~Produce separate tokens for raw string delimiters and string quotes in the lexer~~ Produce separate tokens for raw string delimiters and string quotes in the lexer 🚥 #1191 Jan 10, 2023

ahoppen force-pushed the ahoppen/lex-string-delimiteres branch 2 times, most recently from 5735bd2 to 5a6c6bd Compare January 11, 2023 18:51

ahoppen changed the title ~~Produce separate tokens for raw string delimiters and string quotes in the lexer 🚥 #1191~~ Produce separate tokens for raw string delimiters and string quotes in the lexer Jan 11, 2023

ahoppen changed the title ~~Produce separate tokens for raw string delimiters and string quotes in the lexer~~ Produce separate tokens for raw string delimiters and string quotes in the lexer 🚥 #1176 Jan 12, 2023

ahoppen force-pushed the ahoppen/lex-string-delimiteres branch from 5a6c6bd to df7fad7 Compare January 12, 2023 13:42

ahoppen changed the title ~~Produce separate tokens for raw string delimiters and string quotes in the lexer 🚥 #1176~~ Produce separate tokens for raw string delimiters and string quotes in the lexer Jan 12, 2023

ahoppen force-pushed the ahoppen/lex-string-delimiteres branch from df7fad7 to f126e6c Compare January 12, 2023 16:18

ahoppen force-pushed the ahoppen/lex-string-delimiteres branch 2 times, most recently from cdf6e52 to ecf42cf Compare January 13, 2023 14:32

ahoppen mentioned this pull request Jan 13, 2023

Refactor the lexer to make it easier to understand, maintain and more swifty #1227

Merged

ahoppen force-pushed the ahoppen/lex-string-delimiteres branch from ecf42cf to 4530a3c Compare January 13, 2023 18:01

bnbarham approved these changes Jan 13, 2023

rintaro self-assigned this Jan 13, 2023

ahoppen force-pushed the ahoppen/lex-string-delimiteres branch 2 times, most recently from b853853 to df7c412 Compare January 14, 2023 11:57

ahoppen mentioned this pull request Jan 16, 2023

[WIP] Support parsing @_package attribute #1233

Draft

Import raw_string.swift test case from the compiler test suite

9345009

ahoppen added 2 commits January 16, 2023 09:44

When eating unexpected tokens in string interpolation, match opened '(' when looking for the closing ')'

41807be

ahoppen force-pushed the ahoppen/lex-string-delimiteres branch from df7c412 to 41807be Compare January 16, 2023 08:44

ahoppen merged commit b931534 into swiftlang:main Jan 16, 2023

ahoppen deleted the ahoppen/lex-string-delimiteres branch January 16, 2023 11:18

ahoppen mentioned this pull request Jan 26, 2023

Unterminated string literal causes mis-parse of subsequent nodes #778

Closed
