Skip to content

Implement semantic highlighting for Swift #414

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 28, 2021

Conversation

fwcd
Copy link
Member

@fwcd fwcd commented Jul 3, 2021

This is a reboot of @prostakm's implementation of semantic highlighting in #279, using the LSP structures from #388 and the suggestions from #279 (comment).

Both lexical and semantic tokens are now stored in Documents and updated whenever a document is updated or changed. Lexical tokens are updated incrementally in openDocument and changeDocument and semantic tokens are updated all at once in handleDocumentUpdate.

image

Implementation Progress

  • Parse syntax map and annotations from SourceKit
  • Update lexical tokens on document open and change events from SourceKit
    • Remove and shift existing lexical tokens correctly on edit events in DocumentManager
    • Handle edge cases where lexical tokens aren't removed correctly, e.g. when typing let document = ... the do stays highlighted as a keyword and isn't removed, since the edit range never overlaps with it (this is solved by checking whether an edit that is directly adjacent to a token uses a whitespace character at the adjacent position)
  • Update semantic tokens on document update events from SourceKit
  • Implement (full) semantic token requests
  • Implement ranged semantic token requests
  • Implement semantic token modifiers
  • Provide semantic tokens for declarations
  • Add test case for lexical tokens
  • Add test case for semantic tokens

Future Work

  • Add LSP config options so users can selectively enable/disable the different kinds of tokens (semantic, lexical)
    • This could also be useful for testing
  • Implement semantic token delta responses
  • Parse syntactic tokens from the edit response's keys.substructure (i.e. revert 1e65730) and address the performance issues in big files (see this thread)
    • A possible approach would be to parse the substructure asynchronously into syntactic tokens, this should be possible since they aren't updated incrementally like the lexical tokens

@ahoppen ahoppen self-assigned this Jul 6, 2021
@fwcd fwcd force-pushed the semantic-tokens branch 2 times, most recently from 3740394 to 4bddd07 Compare July 7, 2021 00:11
Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I’ve left a few™ comments inline. Don’t be afraid by the amount. Overall, I think the PR is really good and I don’t think it’s anything too major.

@fwcd fwcd force-pushed the semantic-tokens branch from 5893807 to ead2e73 Compare July 10, 2021 16:36
@fwcd fwcd marked this pull request as ready for review July 10, 2021 21:36
@fwcd fwcd requested a review from benlangmuir as a code owner July 10, 2021 21:36
Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve added some more comments inline.

@fwcd fwcd force-pushed the semantic-tokens branch from b5dfffe to 46da752 Compare July 18, 2021 11:43
@fwcd fwcd force-pushed the semantic-tokens branch from d99b56c to 34e6fb1 Compare July 20, 2021 13:46
Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for continuing to push this through. 😍

I’ve got a few more comments inline, but I think we’re nearly there.

@fwcd fwcd force-pushed the semantic-tokens branch from 5fbdd28 to a3fbc2d Compare July 20, 2021 15:34
@fwcd fwcd force-pushed the semantic-tokens branch from e073057 to 685944d Compare July 26, 2021 13:06
@fwcd fwcd force-pushed the semantic-tokens branch from 92261cd to c715e79 Compare August 4, 2021 14:20
@fwcd
Copy link
Member Author

fwcd commented Aug 10, 2021

I have removed the syntactic tokens for now, which should resolve the performance issues in large files.

Also the language server now parses lexical syntaxtype_identifier tokens into a new semantic token type identifier. The reason for not using one of the standard LSP token types is that none of them provides a clear mapping for 'plain' identifiers that could be either a variable, function, module, parameter or something else. By using a custom token type, clients will ignore these by default and can choose a more specific highlighting from their own grammar instead (e.g. as in the case of VSCode), while others may choose to find these tokens useful.

@fwcd fwcd requested a review from ahoppen August 10, 2021 16:33
Copy link
Contributor

@benlangmuir benlangmuir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying this out in vscode, and had a couple of questions:

  • Is defaultLibrary not working, or is vscode just not using that from its builtin themes? I don't see any differentiation.
  • It seems like in "public func" the theme wants to emphasize the "public" (modifier) instead of the "func" (keyword). I would have expected the opposite, or that they would look the same. Do you know what other language servers are doing here?

This doesn't need to block the PR though.


Other than my comment about the timeout, I think this is probably ready to go, although @ahoppen should probably sign off on it as well. Thanks for your hard work @fwcd !

@benlangmuir
Copy link
Contributor

@swift-ci please test

@DavidGoldman
Copy link
Contributor

Is defaultLibrary not working, or is vscode just not using that from its builtin themes? I don't see any differentiation.

The default theme doesn't use it, but you can confirm it's there by using the Developer: Inspect Editor Tokens and Scopes command and clicking on one of the library symbols

@fwcd
Copy link
Member Author

fwcd commented Aug 10, 2021

I've bisected the PR and found that 4bbff3d broke (at least) the document symbol test, since it changes the semantics of the SKDResponseDictionary accessors slightly. Since returning nil in the absence of a value seems to be the 'right thing' to do, I will update the tests accordingly.

@benlangmuir
Copy link
Contributor

@swift-ci please test

Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just went through this again and have a couple more inline comments. Once they’re taken care of, I’m go for merge 🚀.

@ahoppen
Copy link
Member

ahoppen commented Aug 12, 2021

I also filed SR-15058 with some areas for improvement that are out of scope for this PR, to make sure we don’t loose track of them.

fwcd added a commit to fwcd/sourcekit-lsp that referenced this pull request Aug 12, 2021
- Make makeToken a fileprivate initializer instead
- Fix doc comments on withMutableTokensOfEachKind
- Rename `previousLineCount` to `replacedLineCount`
- Fix token shifting logic
- Add another test case for inserting tokens
- Return empty document tokens if we don't get an edit response
- Update DocumentSymbol.children to be non-optional
- Rename DocumentSymbolTest to DocumentSymbolTests
  ...to be consistent with the file name and the other test modules.
- Fix indentation in DocumentSymbolTests
- Fix doc comment on Modifier.lspName
- Remove no-longer-needed useName from token parser
  These were only needed for syntactic tokens, which we decided not to
  include in swiftlang#414 for now.
- Add test case for identifiers only backticked on one side
- Parse tokens into inout array
@benlangmuir
Copy link
Contributor

@swift-ci please etst

@DavidGoldman
Copy link
Contributor

@swift-ci please test

Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve got two minor follow-up comments to my last review.

Could you also squash your commits?

@fwcd
Copy link
Member Author

fwcd commented Aug 24, 2021

Sure, should I squash the entire PR into a single commit or do you think that a more granular splitting would be better, given that the PR has become pretty large already?

@ahoppen
Copy link
Member

ahoppen commented Aug 24, 2021

I can’t really think of any way the changes can be split up into commits that are at least somewhat independent, so I think one big commit is fine.

@fwcd
Copy link
Member Author

fwcd commented Aug 24, 2021

Okay, I'd vote to keep 4bbff3d separate though, since it is a (slight) change to the accessors' semantics that isn't directly related to semantic highlighting per se.

@ahoppen
Copy link
Member

ahoppen commented Aug 24, 2021

Sounds good. 👍 If you have more of these commits, I think keeping them separate would be good. All I tried to say is that I don’t think it’s worth the effort to artificially split up the PR into smaller commits.

fwcd added 2 commits August 24, 2021 16:49
This is an implementation of LSP's semantic tokens for Swift. Both
lexical and semantic tokens are provided by using the syntaxmap and the
semantic annotations provided as part of SourceKit's open responses and
document update notifications.

While lexical tokens are parsed and stored in the DocumentManager
synchronously, semantic tokens are provided asynchronously. If an edit
occurs, tokens are automatically shifted by the DocumentManager. This is
especially relevant for lexical tokens, which are updated in deltas.

In addition, unit tests are added that assert that both lexical and
semantic tokens are provided and shifted correctly upon edits.
@fwcd fwcd force-pushed the semantic-tokens branch from e4fe3d4 to eaa8c66 Compare August 24, 2021 14:59
@ahoppen
Copy link
Member

ahoppen commented Aug 24, 2021

@swift-ci Please test

@ahoppen
Copy link
Member

ahoppen commented Aug 25, 2021

LanguageServerProtocolTests.CodingTests testValueCoding failed

Test Case '-[LanguageServerProtocolTests.CodingTests testValueCoding]' started.
/Users/buildnode/jenkins/workspace/swift-sourcekit-lsp-PR-macOS/branch-main/sourcekit-lsp/Tests/LanguageServerProtocolTests/CodingTests.swift:333: error: -[LanguageServerProtocolTests.CodingTests testValueCoding] : XCTAssertEqual failed: ("[
  {
    "kind" : 12,
    "name" : "mySymbol",
    "range" : {
      "end" : {
        "character" : 0,
        "line" : 6
      },
      "start" : {
        "character" : 23,
        "line" : 5
      }
    },
    "selectionRange" : {
      "end" : {
        "character" : 0,
        "line" : 6
      },
      "start" : {
        "character" : 23,
        "line" : 5
      }
    }
  }
]") is not equal to ("[
  {
    "children" : [

    ],
    "kind" : 12,
    "name" : "mySymbol",
    "range" : {
      "end" : {
        "character" : 0,
        "line" : 6
      },
      "start" : {
        "character" : 23,
        "line" : 5
      }
    },
    "selectionRange" : {
      "end" : {
        "character" : 0,
        "line" : 6
      },
      "start" : {
        "character" : 23,
        "line" : 5
      }
    }
  }
]")
Test Case '-[LanguageServerProtocolTests.CodingTests testValueCoding]' failed (1.950 seconds).

The coding of document symbols now correctly expects an empty children
array.
@fwcd
Copy link
Member Author

fwcd commented Aug 27, 2021

Good catch, should be fixed now.

@DavidGoldman
Copy link
Contributor

@swift-ci Please test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants