Skip to content

[Parser, NameLookup, ASTScope] Parser changes for lazy ASTScopes #26768

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 23, 2019

Conversation

davidungar
Copy link
Contributor

  • Factors locations for missing close braces (etc.) and error into two functions: 'getConfabulatedMatchingTokenLoc' and getErrorOrMissingLocForLazyASTScopes.
  • Also takes AppendingExpr into account when computing the end location of aTapExpr.
  • Adds a check in ASTVerifier for the end locations.
  • Moves the fictional closing brace in the synthesized expression for an InterpolatedStringLiteralExpr back one character so it coincides with the closing quote, in order to avoid a nesting violation.
  • Leaves the PreviousTokenLoc pointing to the start of an InterpolatedStringLiteralExpr instead of somewhere in the middle, and uses that to locate a missing closing brace at the close quote.

@davidungar davidungar changed the title Parser changes for lazy ASTScopes [WIP, DNM, Parser, NameLookup, ASTScope] Parser changes for lazy ASTScopes Aug 21, 2019
@davidungar davidungar changed the title [WIP, DNM, Parser, NameLookup, ASTScope] Parser changes for lazy ASTScopes [Parser, NameLookup, ASTScope] Parser changes for lazy ASTScopes Aug 21, 2019
@davidungar
Copy link
Contributor Author

@swift-ci please smoke test os x platform

@davidungar davidungar requested review from beccadax and akyrtzi August 21, 2019 20:45
@davidungar
Copy link
Contributor Author

@swift-ci please test source compatibility

Copy link
Contributor

@akyrtzi akyrtzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes break the invariant that the EndLocs of AST nodes are token-based. My recommendation is to preserve token-based EndLocs, and have AST scopes record their CharSourceRanges and do their binary lookup using character ranges.

// of the literal so that it doesn't go past where a missing close brace
// for an enclosing IterableTypeDecl will be added.
// (See \c parseMatchingToken.)
SourceLoc EndLoc = Loc.getAdvancedLoc(Tok.getLength() - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This EndLoc that is used for the "closing brace" is character-based, it should actually be token based (it was character-based before as well, before this change, but we should correct it).

It's not clear to me if the implicit BraceStmt that is created for the TapExpr is supposed to be the SourceRange of the whole string literal or just the range of the expression segments, but if it is supposed to be the SourceRange of the whole string literal then it should be matching the string literal's SourceRange, whose start and end locations are the token location of where the literal string begins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good; I'll try that. As you saw, before this change it was one character past the end of the token; neither fish nor fowl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will work better to point at the last token in the AppendingExpression. Going to try that.

// middle.
return PreviousTok.getKind() != tok::string_literal
? PreviousLoc
: PreviousLoc.getAdvancedLoc(PreviousTok.getLength() - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here, this is returning a character-based location. I think this should always return PreviousLoc, which AFAICT is the location of the last parsed token.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do this, and try the change suggested to ASTScopes (using character locations). And I'm happy to give it a whirl. But I'm still bothered because the notion of "last parsed token" seems ill-defined to me, as if we're building on sand.

Consider "\(foo)". You could say that the last parsed token is the whole interpolated string literal, except that the last parsed token is also the closing parenthesis, or maybe the foo.
Before I added the save-and-restore line for PreviousLoc that value did point to the foo IIRC. Part of the confusion is in the word "last", I suppose. Is that the most-recently-parsed? (in which case the ambiguity is whether it's the most-recently-started-to-parse or the most-recently-finished-to-parse). Or is it the last token that has been parsed in the file? (I.e., the token furthest from the start of the file that has been parsed?)

If you help me understand this, I'll try to add some comments, too.

Copy link
Contributor

@akyrtzi akyrtzi Aug 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing that complicates the interpolated string literal is that it is a "tokens within a token" situation, but a way to reason about it is that there are these set of tokens:

  • the string literal token "\(foo)"
  • The interpolated tokens '(', 'foo', ')'

Then the question is what kind of source range you want. If you want the range of the whole string literal then you should use the literal token range. If you want the range of the interpolated segment then you should use the interpolated tokens.

But you cannot have the EndLoc of the SourceRange be the closing quote, because the closing quote is not a token, and the SourceRange you'd form with it would not be token-based.

Copy link
Contributor Author

@davidungar davidungar Aug 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that helps. Given that, if I want to keep ASTScope token-like SourceRange based, the key would be to look inside the string literal, at the tokens of the expression within, and take the location of the last of those tokens, the ')'. Or as you say, change ASTScopes to be character-location based.

The other question is where should the EndLoc of a StructDecl (etc.) be when the close brace is missing? Given the rule that SourceLocs refer to tokens, and that sub-nodes are enclosed (perhaps improperly) by super-nodes, one could put that EndLoc either at the SourceLoc of the string literal (i.e. the open '"'), or at the SourceLoc of the ')'. Do you have a sense of which of these would be more fitting? What's tough here is that we're talking about illegal code.

In the latter case, I think I saw other things break, so I suspect the closing brace belongs with the open quote of the string literal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to try using the SourceLoc of the string literal, the open quote.

@davidungar
Copy link
Contributor Author

Thanks a lot, @akyrtzi , for getting to this so quickly. I'll try out the directions you suggest.

@davidungar
Copy link
Contributor Author

@akyrtzi I'll make the changes I think will work for the invariants, and do some testing tomorrow. Will push when it looks like it will work, and let you know so you can take a look. Thanks again!

@davidungar
Copy link
Contributor Author

@akyrtzi Update: I pushed, but haven't tested or proofread yet.

@davidungar davidungar force-pushed the A-8-21-parserPR branch 2 times, most recently from 4c880d6 to 79c7cbd Compare August 22, 2019 05:33
@davidungar
Copy link
Contributor Author

@swift-ci please test source code compatibility

@davidungar
Copy link
Contributor Author

@swift-ci please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - d84d541

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - d84d541

@davidungar
Copy link
Contributor Author

@akyrtzi I think this PR is now ready for your perusal. I haven't tested it yet, nor changed ASTScope to work with it. I'm thinking that a lazy IterableTypeDecl scope can ask the lexer for the token at the EndLoc of an IterableTypeDecl. If it's an InterpolatedStringLiteral, it can lex it to find the end. And, as you suggest, the ASTScope system can use the charSourceLocations. For expressions in general, come to think of it, the same thing ought to work: lex the EndLoc of the expression, and use the charSourceLoc of the end. Will that tomorrow.

auto Body = BraceStmt::create(Context, Loc, Stmts, EndLoc,
// At this point, PreviousLoc points to the last token parsed within
// the body, so use that for the brace statement location.
auto Body = BraceStmt::create(Context, Loc, Stmts, PreviousLoc,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source range of this BraceStmt is rather strange because, according to the comment, the start loc is the start of the literal and the end loc is at a token inside it, like this:

" blah \(some) blah \(thing) blah "
^                          ^

I'd recommend that it should either be the full range of the literal ([Loc, Loc]), or the range of the statements ([Stmts.first.StartLoc, Stmts.last.EndLoc]), or this range if you can manage it:

" blah \(some) blah \(thing) blah "
        ^                  ^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the Stmts.{first,last} idea. I'll try that.

@davidungar
Copy link
Contributor Author

@swift-ci please smoke test

@davidungar
Copy link
Contributor Author

@swift-ci please test source compatibility

@davidungar
Copy link
Contributor Author

@akyrtzi Thanks!!

@davidungar davidungar merged commit 72b1e27 into swiftlang:master Aug 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants