Skip to content

[libSyntax] Allow adding garbage nodes in between any two children of a syntax node #60211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 4, 2022

Conversation

ahoppen
Copy link
Member

@ahoppen ahoppen commented Jul 24, 2022

When the source code is invalid, this allows us to represent tokens that could not be used to form a valid syntax tree with more fidelity.

This commit does not start using GarbageNodes yet, it just sets everything up for them.

@ahoppen
Copy link
Member Author

ahoppen commented Jul 25, 2022

swiftlang/swift-syntax#514

@swift-ci Please smoke test

Copy link
Member

@rintaro rintaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before proceeding this, I think we should define the semantics of "garbage" nodes for traversal. Like, should SyntaxVisitor visit the contents of garbages, should TokenSequence includes tokens in garbages? How about next/previousSibling, first/lastToken etc.?

@@ -51,4 +51,10 @@
Child('RightBrace', kind='RightBraceToken',
requires_leading_newline=True),
]),

Node('GarbageNodes', kind='SyntaxCollection', element='Syntax',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about introducing a new syntax kind for this. I feel like visitation (SyntaxVisitor) should ignore "garbage" element, and walk directly into the contents of it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is a bigger topic about how traversal should handle garbage and missing nodes. I believe there are two distinctly use cases for the SwiftSyntax tree

  1. For use cases like a formatter, you want a fully source accurate representation of the syntax tree that includes all garbage but skips over missing nodes
  2. If you are doing lightweight structural analysis of a Swift file, you want the parser to “fix-up” the syntax tree as much as possible using it’s heuristics, giving you a view of the tree that’s at least valid with respect to SwiftSyntax’s grammar. This is done by ignoring garbage nodes but walking missing tokens as if they were present.

I think these are both valid use case and sufficiently different that we shouldn’t make a decision which view should be the default. I implemented the two different views in swiftlang/swift-syntax#522.

Regarding the need for the GarbageNodes type: Theoretically, we could just use SyntaxCollection here, but IIRC the entire code generation pipeline is not properly set up to instantiate nodes of these base types. The GarbageNodes type also comes in handy for the fixed up view because it allows us to skip over all children in GarbageSyntax.

# Add implicitly generated GarbageNodes children in between any two
# defined children
if kind != 'SyntaxCollection':
for i in range(2 * len(children) + 1):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a concern with this 2n+1 which causes ambiguity for the garbage placements in trees.
For example, when there were consecutive nodes [foo] and [bar] and some garbage between them, there are 3 places the garbage possibly be in.

[root]
  +- [garbage root 1]
  +- [foo]
  |   +- [garbage foo 1]
  |   +- [content]
  |   `- [garbage foo 2]
  +- [garbage root 2]
  +- [bar]
  |   +- [garbage bar 1]
  |   +- [content]
  |   `- [garbage bar 2]
  `- [garbage root 3]

[garbage foo 2], [garbage root 2] and [garbage bar 1]. That might not matter in real world scenario. And being able to have "leading"/"trailing" garbage might be convenient, but at least, I feel we need some rule/guidance how to place the garbage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the trailing garbage is of very little value because most likely the current node doesn’t know until which node skip garbage tokens. I reduced the number of nodes to 2n.

Note that the placement of garbage nodes is still ambiguous because placement of nodes in [garbage root 2] and [garbage bar 1] results in the same syntax tree. I doubt this will be a problem in practice though. Whichever node knows that it wants to skip over tokens should place these in its garbage.

self.children = []
# Add implicitly generated GarbageNodes children in between any two
# defined children
if kind != 'SyntaxCollection':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about SyntaxCollection?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was that the elements of the collection contain the garbage. Most likely the element parsing functions will also be the ones that know how much garbage to skip.

@ahoppen ahoppen force-pushed the pr/garbage-syntax branch 3 times, most recently from 9a8749c to cf615b2 Compare July 29, 2022 18:28
@ahoppen ahoppen requested review from CodaFi and bnbarham August 1, 2022 12:51
@ahoppen ahoppen force-pushed the pr/garbage-syntax branch 2 times, most recently from 1abf0ec to 43ebf73 Compare August 2, 2022 18:40
@ahoppen ahoppen force-pushed the pr/garbage-syntax branch from 43ebf73 to 9c8f754 Compare August 3, 2022 06:02
@ahoppen
Copy link
Member Author

ahoppen commented Aug 3, 2022

swiftlang/swift-syntax#514

@swift-ci Please smoke test

… a syntax node

When the source code is invalid, this allows us to represent tokens that could not be used to form a valid syntax tree with more fidelity.

This commit does not start using GarbageNodes yet, it just sets everything up for them.
@ahoppen ahoppen force-pushed the pr/garbage-syntax branch from 9c8f754 to c2695f0 Compare August 4, 2022 07:21
@ahoppen
Copy link
Member Author

ahoppen commented Aug 4, 2022

swiftlang/swift-syntax#514

@swift-ci Please smoke test

@ahoppen ahoppen merged commit df2d938 into swiftlang:main Aug 4, 2022
@ahoppen ahoppen deleted the pr/garbage-syntax branch August 4, 2022 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants