Use TokenSpec (née RawTokenKindMatch) in consume functions #1328
Conversation
@swift-ci Please test
@swift-ci Please test macOS
This allows us to simplify the consume methods a little since they no longer need to take parameters for recovery precedence, a remapped `RawTokenKind`, or `allowTokenAtStartOfLine`. It also removes `RawTokenKind` from the API surface of all the consumption functions, which makes it easier to make `RawTokenKind` a trivial enum again.

This also renames `RawTokenKindSubset` to `TokenSpecSet` because that’s what it is now: a set of `TokenSpec`s.

I measured and there is no performance regression associated with this change.
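For orientation, here is a minimal sketch of the shape that description implies. The field names and the `matches` helper are my own illustrative assumptions, not the exact swift-syntax declarations:

```swift
// Minimal stand-ins so the sketch compiles on its own; the real types
// in swift-syntax are richer.
enum RawTokenKind { case identifier, keyword, leftBrace }
enum TokenPrecedence: Comparable { case low, high }

struct Lexeme {
  let rawTokenKind: RawTokenKind
  let isAtStartOfLine: Bool
}

/// Bundles everything a consume call previously took as separate
/// parameters: the kind to match, its recovery precedence, an optional
/// remapped kind, and whether a token at the start of a line counts.
struct TokenSpec {
  let rawTokenKind: RawTokenKind
  let recoveryPrecedence: TokenPrecedence
  let remappedKind: RawTokenKind?
  let allowAtStartOfLine: Bool

  /// A lexeme matches if its kind agrees and, unless explicitly
  /// allowed, it does not sit at the start of a new line.
  func matches(_ lexeme: Lexeme) -> Bool {
    lexeme.rawTokenKind == rawTokenKind
      && (allowAtStartOfLine || !lexeme.isAtStartOfLine)
  }
}
```

A `TokenSpecSet` is then just a collection of such specs, which is what the rename acknowledges.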
Force-pushed from 0f028a8 to b439add
@swift-ci Please test
I still feel weird about how all this works though. In my head it seems like it would be so much simpler to just have the lexer convert all tokens to known keywords. That can be super optimized as much as we want. Then the subsets would be e.g. just an OptionSet that could be compared first, with a spec check after that (if needed)... I have to think it through more.

Anyway, regardless of any of that, we still need to make RawTokenKind a trivial enum again. So I'm happy with this + what I assume your next PR is going to be. If all of this turns out to be too slow we can revisit later.
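To make the OptionSet idea concrete, here is a rough sketch under my own assumptions; none of these names exist in swift-syntax:

```swift
// Each subset keeps a bitmask of raw kinds so membership is a single
// mask test; the full spec comparison runs only when the bit matches.
struct TokenKindSet: OptionSet {
  let rawValue: UInt64
  static let identifier = TokenKindSet(rawValue: 1 << 0)
  static let keyword    = TokenKindSet(rawValue: 1 << 1)
  static let leftBrace  = TokenKindSet(rawValue: 1 << 2)
}

struct CheapLexeme {
  let kindBit: TokenKindSet
  let text: String
}

/// Cheap bit test first; the (potentially expensive) spec closure runs
/// only on tokens that pass the mask.
func matches(
  _ token: CheapLexeme,
  subset: TokenKindSet,
  spec: (CheapLexeme) -> Bool
) -> Bool {
  guard subset.contains(token.kindBit) else { return false }
  return spec(token)
}
```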
My plan is that eventually all the identifier -> keyword conversions are done in the parser and the lexer doesn’t even know about keywords. But last time I checked, making that switch will not be trivial if we don’t want to start accepting input that we currently reject.
My problem with that is that it just seems harder to optimize, i.e. harder to not end up with "if else if else if" for each at/consume/etc. Whereas if there's just one function that does this in the lexer, it can start with a switch on count, add in some lookup tables for cases where we can convert the keyword to e.g. 8 bytes, and so on.

Having said all that, we could totally have that function regardless and just run it as needed in the parser (i.e. if we only need an identifier, don't bother). If we really wanted we could even do it for each subset, it'd just be way more code.

... So maybe my issue is actually with the pattern matching in the subsets 😅. But this PR doesn't change that, and we could still do either while taking in a TokenSpec. And I also don't think this changes the trivial RawTokenKind update either. It's really just optimizing the subset matching.
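A toy version of that single lookup function, under my own naming (nothing here is actual swift-syntax or lexer code), might branch on byte count first and then compare short keywords as packed integers:

```swift
enum ExampleKeyword { case ifKeyword, inKeyword, elseKeyword, funcKeyword }

/// Packs up to 8 ASCII bytes into a UInt64 (little-endian) so short
/// keywords compare with a single integer equality.
func pack(_ bytes: [UInt8]) -> UInt64 {
  var value: UInt64 = 0
  for (i, b) in bytes.enumerated() { value |= UInt64(b) << (8 * i) }
  return value
}

/// Branch on length first; each arm then only compares packed
/// integers, avoiding a long if/else chain over string contents.
func keyword(for bytes: [UInt8]) -> ExampleKeyword? {
  switch bytes.count {
  case 2:
    switch pack(bytes) {
    case pack(Array("if".utf8)): return .ifKeyword
    case pack(Array("in".utf8)): return .inKeyword
    default: return nil
    }
  case 4:
    switch pack(bytes) {
    case pack(Array("else".utf8)): return .elseKeyword
    case pack(Array("func".utf8)): return .funcKeyword
    default: return nil
    }
  default:
    return nil
  }
}
```

In this sketch, `keyword(for: Array("func".utf8))` returns `.funcKeyword` after one length branch and one integer comparison.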
This might be just what you are asking for:
…um-prep Preparation to switch RawTokenKind to be a trivial enum again 🚥 #1328
This allows us to simplify the consume methods a little since they no longer need to take parameters for recovery precedence, remapping `RawTokenKind`, and `allowTokenAtStartOfLine`.

It also removes `RawTokenKind` from the API surface of all the consumption functions, which makes it easier to make `RawTokenKind` a trivial enum again.

And it renames `RawTokenKindSubset` to `TokenSpecSet` because that’s what it is now: a set of `TokenSpec`s.

I measured and there is no performance regression associated with this change, maybe even a performance increase because of the added `@inline(__always)`, but it’s too narrow to tell for certain.
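As a hypothetical illustration of where such an annotation pays off (the actual annotation site in this PR may differ), forcing inlining on a tiny matcher that sits on every consume call's hot path removes per-call overhead without much code growth:

```swift
// Hypothetical stand-ins, not swift-syntax declarations.
struct SpecLexeme {
  let rawKind: Int
  let isAtStartOfLine: Bool
}

struct SpecMatcher {
  let rawKind: Int
  let allowAtStartOfLine: Bool

  // The matcher body is a couple of comparisons, so forcing the
  // optimizer to inline it avoids a function call per consumed token.
  @inline(__always)
  func matches(_ lexeme: SpecLexeme) -> Bool {
    rawKind == lexeme.rawKind
      && (allowAtStartOfLine || !lexeme.isAtStartOfLine)
  }
}
```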