Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler #1191

ahoppen · 2023-01-05T13:01:40Z

Instead of marking a token as having an error and relying on re-lexing to emit a diagnostic, store the type of error and the offset at which it occurred in the token itself.

This is a more sound design IMO and will continue to work even if we introduce state into the lexer (e.g. whether we are currently in a string literal or inside a conflict marker range).
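A minimal sketch of the idea described above, not the actual swift-syntax implementation. All names here (`SketchToken`, `SketchLexerErrorKind`, `lexDecimalInteger`, `diagnostics(for:)`) are hypothetical and exist only for illustration. The point it tries to show is that diagnostics can be produced from data already stored on the token, so nothing needs to be lexed a second time.

```swift
// Hypothetical names throughout; this is not the swift-syntax API.
enum SketchLexerErrorKind: Equatable {
  case invalidDigitInIntegerLiteral
}

struct SketchToken {
  let text: String
  /// Kind of lexical error, or `nil` if the token lexed cleanly.
  let errorKind: SketchLexerErrorKind?
  /// Byte offset of the error from the start of `text`.
  /// Only meaningful when `errorKind` is non-nil.
  let errorByteOffset: UInt16
}

/// Lexes `text` as a decimal integer literal and records the first
/// non-digit byte, if any, directly in the returned token.
func lexDecimalInteger(_ text: String) -> SketchToken {
  for (offset, byte) in text.utf8.enumerated() {
    let isDigit = (UInt8(ascii: "0")...UInt8(ascii: "9")).contains(byte)
    if !isDigit {
      return SketchToken(
        text: text,
        errorKind: .invalidDigitInIntegerLiteral,
        errorByteOffset: UInt16(clamping: offset)
      )
    }
  }
  return SketchToken(text: text, errorKind: nil, errorByteOffset: 0)
}

/// Builds diagnostics purely from stored token data; no re-lexing required.
func diagnostics(for tokens: [SketchToken]) -> [String] {
  return tokens.compactMap { token -> String? in
    guard let kind = token.errorKind else { return nil }
    return "\(kind) at byte \(token.errorByteOffset) of '\(token.text)'"
  }
}
```

Because the consumer never calls back into the lexer, it does not matter what state the lexer was in when the token was produced.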

ahoppen · 2023-01-05T13:01:51Z

@swift-ci Please test

CodaFi · 2023-01-05T20:17:29Z

Sources/SwiftParser/Lexer.swift

      self.advance(while: { $0.isValidIdentifierContinuationCodePoint })
-      return (.integerLiteral, [.isErroneous])
+      return LexerResult(.integerLiteral)


This appears to accept the literal 0x

This one is still an issue too

Since this is not a regression, I just opened #1214 so we can fix it in a follow-up PR.
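For illustration only, here is one possible shape of a check that rejects a bare `0x`. This is an assumption about what such a check could look like, not what #1214 or the swift-syntax lexer actually does. The helper `firstInvalidHexOffset(in:)` is hypothetical, and it deliberately ignores `_` separators and hexadecimal floating-point literals.

```swift
/// Hypothetical helper: returns the byte offset at which a hex integer
/// literal becomes invalid, or `nil` if `text` is `0x` followed only by
/// hex digits. Simplified: `_` separators and hex floats are not handled.
func firstInvalidHexOffset(in text: String) -> Int? {
  let bytes = Array(text.utf8)
  guard bytes.count >= 2,
        bytes[0] == UInt8(ascii: "0"),
        bytes[1] == UInt8(ascii: "x")
  else {
    // Not a hex literal at all; report the error at the start.
    return 0
  }
  // `0x` with nothing after it: the error sits right after the prefix.
  if bytes.count == 2 {
    return 2
  }
  for offset in 2..<bytes.count {
    let isHexDigit = Character(UnicodeScalar(bytes[offset])).isHexDigit
    if !isHexDigit {
      return offset
    }
  }
  return nil
}
```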


bnbarham · 2023-01-10T00:16:54Z

Sources/SwiftSyntax/LexerError.swift

+/// If the token has a lexical error, this defines the type of the error.
+/// `lexerErrorOffset` in the token will specify at which offset the error
+/// occurred.
+public enum LexerError {


Any thoughts on something like:

    public struct LexerError {
      public enum Kind {
        case ...
      }
      public let kind: Kind
      public let byteOffset: UInt16 // No need for alias, it's only defined in one place now
      public init(...) { ... }
    }

And then just having the one public var lexerError: LexerError field and a

public var hasLexerError: Bool { return lexerError != nil }

?

This would also simplify all the comments that mention "if there's an error, lexerErrorOffset will specify the offset".

As an aside, I'd personally be fine with UInt8 as the offset type. Do people really have more than 255 characters for a token?

That’s a much better idea. No idea why I modeled it the way I did…

> As an aside, I'd personally be fine with UInt8 as the offset type. Do people really have more than 255 characters for a token?

String literals are currently single tokens and those can reasonably be bigger than 255 characters, I think.

> String literals are currently single tokens and those can reasonably be bigger than 255 characters, I think.

Oh, unfortunate.
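A small sketch of why the offset width matters. The value 300 is an arbitrary example standing in for an error that occurs 300 bytes into a long string-literal token; it is not taken from the PR.

```swift
// Arbitrary example: an error 300 bytes into a long string-literal token.
let offset = 300

// `init(exactly:)` returns nil when the value does not fit in the type.
let asUInt8 = UInt8(exactly: offset)    // UInt8 tops out at 255
let asUInt16 = UInt16(exactly: offset)  // UInt16 tops out at 65535

print(asUInt8 as Any)   // nil
print(asUInt16 as Any)  // Optional(300)
```

With `UInt8`, any error past byte 255 of a token could not be represented, which is the concern raised about string literals.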

bnbarham · 2023-01-10T17:37:43Z

Sources/SwiftSyntax/SyntaxOtherNodes.swift

@@ -169,6 +169,11 @@ public struct TokenSyntax: SyntaxProtocol, SyntaxHashable {
  public func childNameForDiagnostics(_ index: SyntaxChildrenIndex) -> String? {
    return nil
  }
+
+  /// If the token has a lexial error, the type of the error.


Suggested change

-  /// If the token has a lexial error, the type of the error.
+  /// If the token has a lexical error, the type of the error.

bnbarham · 2023-01-10T17:38:18Z

Sources/SwiftSyntax/Raw/RawSyntax.swift

-      if tokenView.hasLexerError || tokenView.presence == .missing {
+      if tokenView.lexerError != nil || tokenView.presence == .missing {


IMO it wouldn't hurt to add a hasLexerError, but up to you.

I prefer to remove it.

ahoppen · 2023-01-10T19:47:58Z

@swift-ci Please test


ahoppen · 2023-01-10T21:15:09Z

@swift-ci Please test

ahoppen requested review from rintaro, DougGregor, bnbarham and CodaFi January 5, 2023 13:01

CodaFi reviewed Jan 5, 2023


bnbarham reviewed Jan 10, 2023


ahoppen force-pushed the ahoppen/lexer-errors-in-token branch from 80cd97b to f80753d Compare January 10, 2023 13:29

ahoppen mentioned this pull request Jan 10, 2023

Produce separate tokens for raw string delimiters and string quotes in the lexer #1192

Merged

bnbarham reviewed Jan 10, 2023


ahoppen force-pushed the ahoppen/lexer-errors-in-token branch from f80753d to 58554b0 Compare January 10, 2023 17:55

ahoppen changed the title ~~Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler~~ Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler 🚥 #1191 Jan 10, 2023

ahoppen changed the title ~~Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler 🚥 #1191~~ Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler 🚥 #1213 Jan 10, 2023

bnbarham approved these changes Jan 10, 2023


ahoppen changed the title ~~Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler 🚥 #1213~~ Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler Jan 10, 2023

ahoppen force-pushed the ahoppen/lexer-errors-in-token branch from 58554b0 to 13e60bb Compare January 10, 2023 19:47

ahoppen force-pushed the ahoppen/lexer-errors-in-token branch from 13e60bb to b72c418 Compare January 10, 2023 21:14

ahoppen force-pushed the ahoppen/lexer-errors-in-token branch from b72c418 to 8345eac Compare January 10, 2023 21:15

ahoppen merged commit f5f6ae2 into swiftlang:main Jan 11, 2023

ahoppen deleted the ahoppen/lexer-errors-in-token branch January 11, 2023 06:28
