-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[libSyntax] Allow adding garbage nodes in between any two children of a syntax node #60211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci Please smoke test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before proceeding this, I think we should define the semantics of "garbage" nodes for traversal. Like, should SyntaxVisitor
visit the contents of garbages, should TokenSequence
includes tokens in garbages? How about next/previousSibling
, first/lastToken
etc.?
@@ -51,4 +51,10 @@ | |||
Child('RightBrace', kind='RightBraceToken', | |||
requires_leading_newline=True), | |||
]), | |||
|
|||
Node('GarbageNodes', kind='SyntaxCollection', element='Syntax', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about introducing a new syntax kind for this. I feel like visitation (SyntaxVisitor
) should ignore "garbage" element, and walk directly into the contents of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this is a bigger topic about how traversal should handle garbage and missing nodes. I believe there are two distinctly use cases for the SwiftSyntax tree
- For use cases like a formatter, you want a fully source accurate representation of the syntax tree that includes all garbage but skips over missing nodes
- If you are doing lightweight structural analysis of a Swift file, you want the parser to “fix-up” the syntax tree as much as possible using it’s heuristics, giving you a view of the tree that’s at least valid with respect to SwiftSyntax’s grammar. This is done by ignoring garbage nodes but walking missing tokens as if they were present.
I think these are both valid use case and sufficiently different that we shouldn’t make a decision which view should be the default. I implemented the two different views in swiftlang/swift-syntax#522.
Regarding the need for the GarbageNodes
type: Theoretically, we could just use SyntaxCollection
here, but IIRC the entire code generation pipeline is not properly set up to instantiate nodes of these base types. The GarbageNodes
type also comes in handy for the fixed up view because it allows us to skip over all children in GarbageSyntax
.
utils/gyb_syntax_support/Node.py
Outdated
# Add implicitly generated GarbageNodes children in between any two | ||
# defined children | ||
if kind != 'SyntaxCollection': | ||
for i in range(2 * len(children) + 1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a concern with this 2n+1 which causes ambiguity for the garbage placements in trees.
For example, when there were consecutive nodes [foo]
and [bar]
and some garbage between them, there are 3 places the garbage possibly be in.
[root]
+- [garbage root 1]
+- [foo]
| +- [garbage foo 1]
| +- [content]
| `- [garbage foo 2]
+- [garbage root 2]
+- [bar]
| +- [garbage bar 1]
| +- [content]
| `- [garbage bar 2]
`- [garbage root 3]
[garbage foo 2]
, [garbage root 2]
and [garbage bar 1]
. That might not matter in real world scenario. And being able to have "leading"/"trailing" garbage might be convenient, but at least, I feel we need some rule/guidance how to place the garbage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the trailing garbage is of very little value because most likely the current node doesn’t know until which node skip garbage tokens. I reduced the number of nodes to 2n.
Note that the placement of garbage nodes is still ambiguous because placement of nodes in [garbage root 2]
and [garbage bar 1]
results in the same syntax tree. I doubt this will be a problem in practice though. Whichever node knows that it wants to skip over tokens should place these in its garbage.
self.children = [] | ||
# Add implicitly generated GarbageNodes children in between any two | ||
# defined children | ||
if kind != 'SyntaxCollection': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about SyntaxCollection
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea was that the elements of the collection contain the garbage. Most likely the element parsing functions will also be the ones that know how much garbage to skip.
9a8749c
to
cf615b2
Compare
1abf0ec
to
43ebf73
Compare
43ebf73
to
9c8f754
Compare
@swift-ci Please smoke test |
… a syntax node When the source code is invalid, this allows us to represent tokens that could not be used to form a valid syntax tree with more fidelity. This commit does not start using GarbageNodes yet, it just sets everything up for them.
9c8f754
to
c2695f0
Compare
@swift-ci Please smoke test |
When the source code is invalid, this allows us to represent tokens that could not be used to form a valid syntax tree with more fidelity.
This commit does not start using GarbageNodes yet, it just sets everything up for them.