Skip to content

Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler #1191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 11, 2023

Conversation

ahoppen
Copy link
Member

@ahoppen ahoppen commented Jan 5, 2023

Instead of marking a token as having an error and relying on re-lexing to emit a diagnostic, store the type of error and the offset at which it occurred in the token itself.

This is a more sound design IMO and will continue to work even if we introduce state into the lexer (e.g. whether we are currently in a string literal or inside a conflict marker range).

@ahoppen
Copy link
Member Author

ahoppen commented Jan 5, 2023

@swift-ci Please test

self.advance(while: { $0.isValidIdentifierContinuationCodePoint })
return (.integerLiteral, [.isErroneous])
return LexerResult(.integerLiteral)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to accept the literal 0x

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is still an issue too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is not a regression, I just opened #1214 so we can fix it in a follow-up PR.

/// If the token has a lexical error, this defines the type of the error.
/// `lexerErrorOffset` in the token will specify at which offset the error
/// occurred.
public enum LexerError {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts on something like:

public struct LexerError {
  public enum Kind {
    case ...
  }

  public let kind: Kind
  public let byteOffset: UInt16 // No need for alias, it's only defined in one place now

  public init(...) { ... }
}

And then just having the one public var lexerError: LexerError field and a

public var hasLexerError: Bool {
  return lexerError != nil
}

?

This would also simplify all the comments that mention "if there's an error, lexerErrorOffset will specify the offset".

As an aside, I'd personally be fine with UInt8 as the offset type. Do people really have more than 255 characters for a token?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s a much better idea. No idea why I modeled it the way I did…

As an aside, I'd personally be fine with UInt8 as the offset type. Do people really have more than 255 characters for a token?

String literals are currently single tokens and those can reasonably be bigger than 255 characters, I think.

Copy link
Contributor

@bnbarham bnbarham Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String literals are currently single tokens and those can reasonably be bigger than 255 characters, I think.

Oh, unfortunate.

@@ -169,6 +169,11 @@ public struct TokenSyntax: SyntaxProtocol, SyntaxHashable {
public func childNameForDiagnostics(_ index: SyntaxChildrenIndex) -> String? {
return nil
}

/// If the token has a lexial error, the type of the error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If the token has a lexial error, the type of the error.
/// If the token has a lexical error, the type of the error.

Comment on lines -163 to +164
if tokenView.hasLexerError || tokenView.presence == .missing {
if tokenView.lexerError != nil || tokenView.presence == .missing {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it wouldn't hurt to add a hasLexerError, but up to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to remove it.

@ahoppen ahoppen force-pushed the ahoppen/lexer-errors-in-token branch from f80753d to 58554b0 Compare January 10, 2023 17:55
@ahoppen ahoppen changed the title Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler 🚥 #1191 Jan 10, 2023
@ahoppen ahoppen changed the title Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler 🚥 #1191 Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler 🚥 #1213 Jan 10, 2023
@ahoppen ahoppen changed the title Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler 🚥 #1213 Refactor the lexer to store lexer errors in the tokens instead of having a diagnosticsHandler Jan 10, 2023
@ahoppen ahoppen force-pushed the ahoppen/lexer-errors-in-token branch from 58554b0 to 13e60bb Compare January 10, 2023 19:47
@ahoppen
Copy link
Member Author

ahoppen commented Jan 10, 2023

@swift-ci Please test

@ahoppen ahoppen force-pushed the ahoppen/lexer-errors-in-token branch from 13e60bb to b72c418 Compare January 10, 2023 21:14
…ing a diagnosticsHandler

Instead of marking a token as having an error and relying on re-lexing to emit a diagnostic, store the type of error and the offset at which it occurred in the token itself.

This is a more sound design IMO and will continue to work even if we introduce state into the lexer (e.g. whether we are currently in a string literal or inside a conflict marker range).
@ahoppen ahoppen force-pushed the ahoppen/lexer-errors-in-token branch from b72c418 to 8345eac Compare January 10, 2023 21:15
@ahoppen
Copy link
Member Author

ahoppen commented Jan 10, 2023

@swift-ci Please test

@ahoppen ahoppen merged commit f5f6ae2 into swiftlang:main Jan 11, 2023
@ahoppen ahoppen deleted the ahoppen/lexer-errors-in-token branch January 11, 2023 06:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants