-
Notifications
You must be signed in to change notification settings - Fork 441
Produce separate tokens for raw string delimiters and string quotes in the lexer #1192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Produce separate tokens for raw string delimiters and string quotes in the lexer #1192
Conversation
Sources/SwiftParser/Lexer.swift
Outdated
continue | ||
} else { | ||
// diagnose(TokStart, diag::lex_unterminated_string) | ||
return (.unknown, []) | ||
return LexerResult(.stringLiteralContents, newMode: .normal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is dangerous without possibly corresponding changes to string literal contents parsing. We need to make sure that its scanning and indexing logic can tolerate an early eof and strings like "\(xyz \v"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be working… At least the existing test cases didn't catch anything. I haven't banged it harder than that though
@@ -456,7 +493,7 @@ extension Lexer.Cursor { | |||
return isDelimeter | |||
}) | |||
|
|||
guard clone.advance(matching: UInt8(ascii: #"""#)) != nil else { | |||
guard !clone.isAtEndOfFile && clone.peek() == UInt8(ascii: #"""#) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not for this PR, but it'd be nice if we had a safePeek
or something similar. Soooo many isAtEndOfFile && peek
checks
Sources/SwiftParser/Lexer.swift
Outdated
@@ -492,6 +529,58 @@ extension Lexer.Cursor { | |||
return false | |||
} | |||
|
|||
mutating func lexStringQuote() -> LexerResult { | |||
func newMode(currentMode: LexerCursorMode, kind: StringLiteralKind) -> LexerCursorMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer nextState
/state
over mode
, but I don't really care. state
just makes it super clear that it's a state machine :P.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea 👍 I renamed it to state
.
DiagnosticSpec(message: "expected string literal in '#assert' directive"), | ||
DiagnosticSpec(message: "unexpected code '123' in '#assert' directive"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two diagnostics here is a little weird. Maybe we should just mark the next unexpected after a missing token and then have the fix-it remove the unexpected? I haven't checked whether that would make other cases worse though.
@@ -73,16 +67,13 @@ final class RawStringErrorsTests: XCTestCase { | |||
func testRawStringErrors5() { | |||
AssertParse( | |||
#####""" | |||
let _ = ###"invalid"###1️⃣#2️⃣#3️⃣#4️⃣ | |||
let _ = ###"invalid"###1️⃣### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a duplicate of 3 isn't it?
DiagnosticSpec(locationMarker: "1️⃣", message: #"expected '"' to end string literal"#), | ||
DiagnosticSpec(locationMarker: "1️⃣", message: "unexpected code in string literal"), | ||
DiagnosticSpec(locationMarker: "2️⃣", message: "expected ')' in string literal"), | ||
DiagnosticSpec(locationMarker: "3️⃣", message: #"expected '"""' to end string literal"#), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit of a weird case I suppose, but it's a little strange that we don't treat this as bar
being part of the string within the interpolation. Though having a string within an interpolation seems fairly unusual anyway, so... maybe doesn't actually matter.
I think ideally we'd have just two diagnostics: one for the missing ")
(or two, whatever) and another for the missing """
.
DiagnosticSpec(locationMarker: "1️⃣", message: #"unexpected code '"' in string literal"#), | ||
DiagnosticSpec(locationMarker: "2️⃣", message: "expected ')' in string literal"), | ||
DiagnosticSpec(locationMarker: "2️⃣", message: #"expected '"' to end string literal"#), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to handle this one better, ie. don't eat the "
as unexpected. That makes 7 worse, but IMO this would be by far the more common case.
DiagnosticSpec(message: #"expected '"' to end string literal"#), | ||
DiagnosticSpec(message: "expected ')' in string literal"), | ||
DiagnosticSpec(message: #"expected '"' to end string literal"#), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as 2, though honestly I'm not sure it's worth keeping both. That's true for most of the testUncloseStringInterpolation*
TBH.
5735bd2
to
5a6c6bd
Compare
@swift-ci Please test |
5a6c6bd
to
df7fad7
Compare
df7fad7
to
f126e6c
Compare
@swift-ci Please test |
cdf6e52
to
ecf42cf
Compare
@swift-ci Please test |
ecf42cf
to
4530a3c
Compare
@swift-ci Please test |
@swift-ci Please test macOS |
b853853
to
df7c412
Compare
@swift-ci Please test |
…n the lexer The eventual goal of this change is that we no longer need to re-lex string literals from the parser to separate them into their components. Instead, the lexer should just produce the lexemes that will later be put into the syntax tree as tokens. The downside of this is that the lexer now needs to carry state and know whether it is lexing a string literal. On the upside, the string literal parser could be significantly simplified and the diagnostics got better without any further changes.
…(' when looking for the closing ')'
df7c412
to
41807be
Compare
@swift-ci Please test |
The eventual goal of this change is that we no longer need to re-lex string literals from the parser to separate them into their components. Instead, the lexer should just produce the lexemes that will later be put into the syntax tree as tokens.
The downside of this is that the lexer now needs to carry state and know whether it is lexing a string literal. On the upside, the string literal parser could be significantly simplified and the diagnostics got better without any further changes.