Skip to content

Refactor the lexer to make it easier to understand, maintain and more swifty #1227

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jan 17, 2023

Conversation

ahoppen
Copy link
Member

@ahoppen ahoppen commented Jan 13, 2023

Honestly, I don’t know how best to review this. Essentially, I’ve gone through the entire lexer (without string lexing because I want to refactor that anyway) and made the following changes

  • peek is now safe by default and returns nil if we reached the end of the file. There is peek(matches:) and peek(doesntMatch:) to check if the current token is one of the expected
  • Split the lexer into multiple files
  • Re-arranges some methods to be grouped more logically
  • All the lex* methods now expect to be placed at the first character they are lexing (previously they expected the first character to already be lexed). This means we could get rid of the tokStart parameter. Also I think it’s more natural this way
  • Added a few more assertions
  • Made all variables lowercase
  • Added some documentation
  • Enabled argument labels
  • Plus probably a few miscellaneous more that I forgot

There should be no functionality change in this PR.

@ahoppen
Copy link
Member Author

ahoppen commented Jan 13, 2023

@swift-ci Please test

/// If the current character is in `matching`, advance the cursor and return `true`.
/// Otherwise, this is a no-op and returns `false`.
@_disfavoredOverload // favor the stamped out copies
mutating func advance(matching characters: CharacterByte...) -> Bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the max number of characters we pass here (and similar for the peek case)? I'd prefer to just not have this one at all, I don't think we ever want to pay the cost of the ARC traffic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s three. I just stamped out three copies.

extension Lexer.Cursor {
func peek(at offset: Int = 0) -> UInt8? {
assert(offset >= 0)
guard offset < self.input.count else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to check the performance cost of running this on every single peek, ie. do we still need an unsafe version for when we know we're not at EOF?

Copy link
Member Author

@ahoppen ahoppen Jan 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance measurements of testSelfParsePerformance (release builds):

I think this is acceptable given the added safety this gives us.

Raw data
main

Test Case '-[SwiftParserTest.ParserTests testSelfParsePerformance]' measured [Time, seconds] average: 6.062, relative standard deviation: 1.030%, values: [6.188462, 6.127593, 6.119602, 6.032848, 6.075716, 6.015036, 5.986559, 6.063117, 6.017368, 5.995290], performanceMetricID:com.apple.XCTPerformanceMetric_WallClockTime, baselineName: "", baselineAverage: , polarity: prefers smaller, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.100, maxStandardDeviation: 0.100
Test Case '-[SwiftParserTest.ParserTests testSelfParsePerformance]' measured [Time, seconds] average: 6.061, relative standard deviation: 0.619%, values: [6.136058, 6.019346, 6.036355, 6.052679, 6.034555, 6.046183, 6.101317, 6.109356, 6.043739, 6.031751], performanceMetricID:com.apple.XCTPerformanceMetric_WallClockTime, baselineName: "", baselineAverage: , polarity: prefers smaller, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.100, maxStandardDeviation: 0.100


lex-string-delimiters

Test Case '-[SwiftParserTest.ParserTests testSelfParsePerformance]' measured [Time, seconds] average: 6.298, relative standard deviation: 0.360%, values: [6.314035, 6.311459, 6.294189, 6.284121, 6.263693, 6.275958, 6.329505, 6.316271, 6.323488, 6.268774], performanceMetricID:com.apple.XCTPerformanceMetric_WallClockTime, baselineName: "", baselineAverage: , polarity: prefers smaller, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.100, maxStandardDeviation: 0.100
Test Case '-[SwiftParserTest.ParserTests testSelfParsePerformance]' measured [Time, seconds] average: 6.272, relative standard deviation: 0.664%, values: [6.369508, 6.263280, 6.288378, 6.267686, 6.229722, 6.224330, 6.228591, 6.259549, 6.308582, 6.278303], performanceMetricID:com.apple.XCTPerformanceMetric_WallClockTime, baselineName: "", baselineAverage: , polarity: prefers smaller, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.100, maxStandardDeviation: 0.100


This PR

Test Case '-[SwiftParserTest.ParserTests testSelfParsePerformance]' measured [Time, seconds] average: 6.340, relative standard deviation: 0.672%, values: [6.456374, 6.327926, 6.361954, 6.321446, 6.342133, 6.335581, 6.328410, 6.287928, 6.318708, 6.322288], performanceMetricID:com.apple.XCTPerformanceMetric_WallClockTime, baselineName: "", baselineAverage
Test Case '-[SwiftParserTest.ParserTests testSelfParsePerformance]' measured [Time, seconds] average: 6.319, relative standard deviation: 0.740%, values: [6.443643, 6.314616, 6.300575, 6.312995, 6.347714, 6.276697, 6.307492, 6.329889, 6.280654, 6.279295], performanceMetricID:com.apple.XCTPerformanceMetric_WallClockTime, baselineName: "", baselineAverage: , polarity: prefers smaller, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.100, maxStandardDeviation: 0.100

Copy link
Contributor

@bnbarham bnbarham Jan 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat surprised by the ~4% from the delimiters split. Any idea why that's so high?

EDIT: I assume just the check on state for every token

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just measured again using Instruments (and a command line tool that contains the code from testSelfParsePerformance instead of testSelfParsePerformance itself) and couldn’t find a significant difference between any of the branches. Not sure what introduced the variation before.


/// Stamped out copy of `peek(at:matches:)` in case we are matching a single
/// character to avoid construction of an array.
func peek(at offset: Int = 0, matches character: PeekMatch) -> Bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

advance has matching which IMO reads better. You could also make doesntMatch unless instead. It sort of sounds like a condition, but not as much as if and when would for the positive case 🤷.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline to rename this to is(at:)

@ahoppen ahoppen changed the title WIP: Refactor the lexer to make it easier to understand, maintain and more swifty 🚥 #1192 Refactor the lexer to make it easier to understand, maintain and more swifty Jan 16, 2023
@ahoppen ahoppen force-pushed the ahoppen/refactor-lexer branch from 0a99f1d to ec2b1b9 Compare January 16, 2023 11:20
@ahoppen
Copy link
Member Author

ahoppen commented Jan 16, 2023

@swift-ci Please test

@ahoppen ahoppen force-pushed the ahoppen/refactor-lexer branch from ec2b1b9 to 8addb9a Compare January 16, 2023 11:48
@ahoppen
Copy link
Member Author

ahoppen commented Jan 16, 2023

@swift-ci Please test

@ahoppen ahoppen marked this pull request as ready for review January 17, 2023 15:06
@ahoppen ahoppen merged commit 759043e into swiftlang:main Jan 17, 2023
@ahoppen ahoppen deleted the ahoppen/refactor-lexer branch January 17, 2023 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants