-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[Parser, NameLookup, ASTScope] Parser changes for lazy ASTScopes #26768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci please smoke test os x platform |
@swift-ci please test source compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes break the invariant that the EndLocs of AST nodes are token-based. My recommendation is to preserve token-based EndLocs, and have AST scopes record their CharSourceRanges
and do their binary lookup using character ranges.
lib/Parse/ParseExpr.cpp
Outdated
// of the literal so that it doesn't go past where a missing close brace | ||
// for an enclosing IterableTypeDecl will be added. | ||
// (See \c parseMatchingToken.) | ||
SourceLoc EndLoc = Loc.getAdvancedLoc(Tok.getLength() - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This EndLoc
that is used for the "closing brace" is character-based, it should actually be token based (it was character-based before as well, before this change, but we should correct it).
It's not clear to me if the implicit BraceStmt
that is created for the TapExpr
is supposed to be the SourceRange
of the whole string literal or just the range of the expression segments, but if it is supposed to be the SourceRange of the whole string literal then it should be matching the string literal's SourceRange, whose start and end locations are the token location of where the literal string begins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good; I'll try that. As you saw, before this change it was one character past the end of the token; neither fish nor fowl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will work better to point at the last token in the AppendingExpression. Going to try that.
lib/Parse/Parser.cpp
Outdated
// middle. | ||
return PreviousTok.getKind() != tok::string_literal | ||
? PreviousLoc | ||
: PreviousLoc.getAdvancedLoc(PreviousTok.getLength() - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue here, this is returning a character-based location. I think this should always return PreviousLoc
, which AFAICT is the location of the last parsed token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do this, and try the change suggested to ASTScopes (using character locations). And I'm happy to give it a whirl. But I'm still bothered because the notion of "last parsed token" seems ill-defined to me, as if we're building on sand.
Consider "\(foo)"
. You could say that the last parsed token is the whole interpolated string literal, except that the last parsed token is also the closing parenthesis, or maybe the foo
.
Before I added the save-and-restore line for PreviousLoc
that value did point to the foo
IIRC. Part of the confusion is in the word "last", I suppose. Is that the most-recently-parsed? (in which case the ambiguity is whether it's the most-recently-started-to-parse or the most-recently-finished-to-parse). Or is it the last token that has been parsed in the file? (I.e., the token furthest from the start of the file that has been parsed?)
If you help me understand this, I'll try to add some comments, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing that complicates the interpolated string literal is that it is a "tokens within a token" situation, but a way to reason about it is that there are these set of tokens:
- the string literal token
"\(foo)"
- The interpolated tokens
'(', 'foo', ')'
Then the question is what kind of source range you want. If you want the range of the whole string literal then you should use the literal token range. If you want the range of the interpolated segment then you should use the interpolated tokens.
But you cannot have the EndLoc
of the SourceRange be the closing quote, because the closing quote is not a token, and the SourceRange you'd form with it would not be token-based.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that helps. Given that, if I want to keep ASTScope token-like SourceRange based, the key would be to look inside the string literal, at the tokens of the expression within, and take the location of the last of those tokens, the ')'. Or as you say, change ASTScopes to be character-location based.
The other question is where should the EndLoc of a StructDecl (etc.) be when the close brace is missing? Given the rule that SourceLocs refer to tokens, and that sub-nodes are enclosed (perhaps improperly) by super-nodes, one could put that EndLoc either at the SourceLoc of the string literal (i.e. the open '"'), or at the SourceLoc of the ')'. Do you have a sense of which of these would be more fitting? What's tough here is that we're talking about illegal code.
In the latter case, I think I saw other things break, so I suspect the closing brace belongs with the open quote of the string literal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to try using the SourceLoc of the string literal, the open quote.
Thanks a lot, @akyrtzi , for getting to this so quickly. I'll try out the directions you suggest. |
@akyrtzi I'll make the changes I think will work for the invariants, and do some testing tomorrow. Will push when it looks like it will work, and let you know so you can take a look. Thanks again! |
@akyrtzi Update: I pushed, but haven't tested or proofread yet. |
4c880d6
to
79c7cbd
Compare
79c7cbd
to
a40b694
Compare
@swift-ci please test source code compatibility |
@swift-ci please test |
Build failed |
Build failed |
@akyrtzi I think this PR is now ready for your perusal. I haven't tested it yet, nor changed ASTScope to work with it. I'm thinking that a lazy IterableTypeDecl scope can ask the lexer for the token at the EndLoc of an IterableTypeDecl. If it's an InterpolatedStringLiteral, it can lex it to find the end. And, as you suggest, the ASTScope system can use the charSourceLocations. For expressions in general, come to think of it, the same thing ought to work: lex the EndLoc of the expression, and use the charSourceLoc of the end. Will that tomorrow. |
lib/Parse/ParseExpr.cpp
Outdated
auto Body = BraceStmt::create(Context, Loc, Stmts, EndLoc, | ||
// At this point, PreviousLoc points to the last token parsed within | ||
// the body, so use that for the brace statement location. | ||
auto Body = BraceStmt::create(Context, Loc, Stmts, PreviousLoc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The source range of this BraceStmt
is rather strange because, according to the comment, the start loc is the start of the literal and the end loc is at a token inside it, like this:
" blah \(some) blah \(thing) blah "
^ ^
I'd recommend that it should either be the full range of the literal ([Loc, Loc]), or the range of the statements ([Stmts.first.StartLoc, Stmts.last.EndLoc]), or this range if you can manage it:
" blah \(some) blah \(thing) blah "
^ ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the Stmts.{first,last} idea. I'll try that.
…xpressions and their subexpressions
@swift-ci please smoke test |
@swift-ci please test source compatibility |
@akyrtzi Thanks!! |
getErrorOrMissingLocForLazyASTScopes
.AppendingExpr
into account when computing the end location of aTapExpr
.InterpolatedStringLiteralExpr
back one character so it coincides with the closing quote, in order to avoid a nesting violation.PreviousTokenLoc
pointing to the start of anInterpolatedStringLiteralExpr
instead of somewhere in the middle, and uses that to locate a missing closing brace at the close quote.