[swiftSyntax] Swift side syntax classifier #18251
Merged
2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
// RUN: %utils/gyb --line-directive '' %S/Inputs/TokenKindList.txt.gyb | sort > %T/python_kinds.txt | ||
// RUN: %swift-syntax-test --dump-all-syntax-tokens | sort > %T/def_kinds.txt | ||
// RUN: diff %T/def_kinds.txt %T/python_kinds.txt | ||
|
||
// Check that all token kinds listed in TokenKinds.def are also in | ||
// gyb_syntax_support/Token.py |
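
In effect, the RUN lines above sort both token-kind lists and require them to be identical. A minimal Swift sketch of the same comparison, assuming the two lists have already been read into arrays. The function name `mismatchedKinds` and the sample kind names are hypothetical, and unlike `diff` this set-based version ignores duplicates:

```swift
/// Returns the kinds that appear in exactly one of the two lists, sorted.
/// Hypothetical helper for illustration; the actual test shells out to `diff`.
func mismatchedKinds(defKinds: [String], pythonKinds: [String]) -> [String] {
  return Set(defKinds).symmetricDifference(Set(pythonKinds)).sorted()
}

// Sample data only; these are not the real contents of either list.
let onlyInOneList = mismatchedKinds(
  defKinds: ["kw_func", "l_paren", "identifier"],
  pythonKinds: ["identifier", "kw_func"])

let listsAgree = mismatchedKinds(
  defKinds: ["identifier", "kw_func"],
  pythonKinds: ["kw_func", "identifier"]).isEmpty
```

An empty result corresponds to `diff` exiting successfully; any entry names a kind that one side is missing.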
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
%{ | ||
from gyb_syntax_support import * | ||
}% | ||
% for token in SYNTAX_TOKENS: | ||
${token.kind} | ||
% end | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,124 @@ | ||
%{ | ||
from gyb_syntax_support import * | ||
# -*- mode: Swift -*- | ||
# Ignore the following admonition it applies to the resulting .swift file only | ||
}% | ||
//// Automatically Generated From SyntaxClassifier.swift.gyb. | ||
//// Do Not Edit Directly! | ||
//===------------ SyntaxClassifier.swift.gyb - Syntax Collection ----------===// | ||
// | ||
// This source file is part of the Swift.org open source project | ||
// | ||
// Copyright (c) 2014 - 2018 Apple Inc. and the Swift project authors | ||
// Licensed under Apache License v2.0 with Runtime Library Exception | ||
// | ||
// See https://swift.org/LICENSE.txt for license information | ||
// See https://swift.org/CONTRIBUTORS.txt for the list of Swift project authors | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
public enum SyntaxClassification { | ||
% for classification in SYNTAX_CLASSIFICATIONS: | ||
% for line in dedented_lines(classification.description): | ||
% if line: | ||
/// ${line} | ||
% end | ||
% end | ||
case ${classification.swift_name} | ||
% end | ||
} | ||
|
||
class _SyntaxClassifier: SyntaxVisitor { | ||
|
||
private var contextStack: [(classification: SyntaxClassification, force: Bool)] = | ||
[(classification: .none, force: false)] | ||
|
||
var classifications: [TokenSyntax: SyntaxClassification] = [:] | ||
|
||
private func visit( | ||
_ node: Syntax, | ||
classification: SyntaxClassification, | ||
force: Bool = false | ||
) { | ||
contextStack.append((classification: classification, force: force)) | ||
visit(node) | ||
contextStack.removeLast() | ||
} | ||
|
||
private func getContextFreeClassificationForTokenKind(_ tokenKind: TokenKind) | ||
-> SyntaxClassification? { | ||
switch (tokenKind) { | ||
% for token in SYNTAX_TOKENS: | ||
case .${token.swift_kind()}: | ||
% if token.classification: | ||
return SyntaxClassification.${token.classification.swift_name} | ||
% else: | ||
return nil | ||
% end | ||
% end | ||
case .eof: | ||
return SyntaxClassification.none | ||
} | ||
} | ||
|
||
override func visit(_ token: TokenSyntax) { | ||
var classification = contextStack.last!.classification | ||
if !contextStack.last!.force { | ||
if let contextFreeClassification = | ||
getContextFreeClassificationForTokenKind(token.tokenKind) { | ||
classification = contextFreeClassification | ||
} | ||
if case .unknown = token.tokenKind, token.text.starts(with: "\"") { | ||
classification = .stringLiteral | ||
} else if case .identifier = token.tokenKind, | ||
token.text.starts(with: "<#") && token.text.hasSuffix("#>") { | ||
classification = .editorPlaceholder | ||
} | ||
} | ||
assert(classifications[token] == nil) | ||
classifications[token] = classification | ||
} | ||
|
||
% for node in SYNTAX_NODES: | ||
% if is_visitable(node): | ||
override func visit(_ node: ${node.name}) { | ||
% if node.is_unknown() or node.is_syntax_collection(): | ||
super.visit(node) | ||
% else: | ||
% for child in node.children: | ||
% if child.is_optional: | ||
if let ${child.swift_name} = node.${child.swift_name} { | ||
% if child.classification: | ||
visit(${child.swift_name}, | ||
classification: .${child.classification.swift_name}, | ||
force: ${"true" if child.force_classification else "false"}) | ||
% else: | ||
visit(${child.swift_name}) | ||
% end | ||
} | ||
% else: | ||
% if child.classification: | ||
visit(node.${child.swift_name}, | ||
classification: .${child.classification.swift_name}, | ||
force: ${"true" if child.force_classification else "false"}) | ||
% else: | ||
visit(node.${child.swift_name}) | ||
% end | ||
% end | ||
% end | ||
% end | ||
|
||
} | ||
% end | ||
% end | ||
} | ||
|
||
public enum SyntaxClassifier { | ||
/// Classify all tokens in the given syntax tree for syntax highlighting | ||
public static func classifyTokensInTree(_ syntaxTree: SourceFileSyntax) | ||
-> [TokenSyntax: SyntaxClassification] { | ||
let classifier = _SyntaxClassifier() | ||
classifier.visit(syntaxTree) | ||
return classifier.classifications | ||
} | ||
} |
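
The interplay between the context stack and the `force` flag in `_SyntaxClassifier` can be sketched in isolation. This is a simplified model, not the generated code: `Classification`, `Token`, `MiniClassifier`, and `withContext` are hypothetical stand-ins, and the string-literal and editor-placeholder special cases are left out.

```swift
// Hypothetical, reduced set of classifications for illustration.
enum Classification { case none, keyword, typeIdentifier, attribute }

// A token carries its text and an optional context-free classification,
// mirroring what getContextFreeClassificationForTokenKind returns.
struct Token {
  let text: String
  let contextFree: Classification?
}

final class MiniClassifier {
  private var contextStack: [(classification: Classification, force: Bool)] =
    [(classification: .none, force: false)]
  private(set) var result: [(text: String, classification: Classification)] = []

  /// Pushes a context for the duration of `body`, then pops it again.
  func withContext(_ classification: Classification, force: Bool = false,
                   _ body: () -> Void) {
    contextStack.append((classification: classification, force: force))
    defer { contextStack.removeLast() }
    body()
  }

  func visit(_ token: Token) {
    let context = contextStack.last!
    var classification = context.classification
    // A non-forced context only acts as a default; the token's own
    // context-free classification wins when it has one.
    if !context.force, let own = token.contextFree {
      classification = own
    }
    result.append((text: token.text, classification: classification))
  }
}

let classifier = MiniClassifier()
classifier.withContext(.typeIdentifier) {
  classifier.visit(Token(text: "Int", contextFree: nil))        // inherits context
  classifier.visit(Token(text: "self", contextFree: .keyword))  // own classification wins
}
classifier.withContext(.attribute, force: true) {
  classifier.visit(Token(text: "public", contextFree: .keyword)) // context is forced
}
```

In this model the three tokens end up as `typeIdentifier`, `keyword`, and `attribute` respectively, which is the behaviour the `force` parameter exists to distinguish.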
107 changes: 107 additions & 0 deletions
tools/swift-swiftsyntax-test/ClassifiedSyntaxTreePrinter.swift
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
import SwiftSyntax | ||
import Foundation | ||
|
||
class ClassifiedSyntaxTreePrinter: SyntaxVisitor { | ||
private let classifications: [TokenSyntax: SyntaxClassification] | ||
private var currentTag = "" | ||
private var skipNextNewline = false | ||
private var result = "" | ||
|
||
// MARK: Public interface | ||
|
||
init(classifications: [TokenSyntax: SyntaxClassification]) { | ||
self.classifications = classifications | ||
} | ||
|
||
func print(tree: SourceFileSyntax) -> String { | ||
result = "" | ||
visit(tree) | ||
// Emit the last closing tag | ||
recordCurrentTag("") | ||
return result | ||
} | ||
|
||
// MARK: Implementation | ||
|
||
/// Closes the current tag if it is different from the previous one and opens | ||
/// a tag with the specified ID. | ||
private func recordCurrentTag(_ tag: String) { | ||
if currentTag != tag { | ||
if !currentTag.isEmpty { | ||
result += "</" + currentTag + ">" | ||
} | ||
if !tag.isEmpty { | ||
result += "<" + tag + ">" | ||
} | ||
} | ||
currentTag = tag | ||
} | ||
|
||
private func visit(_ piece: TriviaPiece) { | ||
let tag: String | ||
switch piece { | ||
case .spaces, .tabs, .verticalTabs, .formfeeds: | ||
tag = "" | ||
case .newlines, .carriageReturns, .carriageReturnLineFeeds: | ||
if skipNextNewline { | ||
skipNextNewline = false | ||
return | ||
} | ||
tag = "" | ||
case .backticks: | ||
tag = "" | ||
case .lineComment(let text): | ||
// Don't print CHECK lines | ||
if text.hasPrefix("// CHECK") { | ||
skipNextNewline = true | ||
return | ||
} | ||
tag = "comment-line" | ||
case .blockComment: | ||
tag = "comment-block" | ||
case .docLineComment: | ||
tag = "doc-comment-line" | ||
case .docBlockComment: | ||
tag = "doc-comment-block" | ||
case .garbageText: | ||
tag = "" | ||
} | ||
recordCurrentTag(tag) | ||
piece.write(to: &result) | ||
} | ||
|
||
private func visit(_ trivia: Trivia) { | ||
for piece in trivia { | ||
visit(piece) | ||
} | ||
} | ||
|
||
private func getTagForSyntaxClassification( | ||
_ classification: SyntaxClassification | ||
) -> String { | ||
switch (classification) { | ||
case .none: return "" | ||
case .keyword: return "kw" | ||
case .identifier: return "" | ||
case .typeIdentifier: return "type" | ||
case .dollarIdentifier: return "dollar" | ||
case .integerLiteral: return "int" | ||
case .floatingLiteral: return "float" | ||
case .stringLiteral: return "str" | ||
case .stringInterpolationAnchor: return "anchor" | ||
case .poundDirectiveKeyword: return "#kw" | ||
case .buildConfigId: return "#id" | ||
case .attribute: return "attr-builtin" | ||
case .objectLiteral: return "object-literal" | ||
case .editorPlaceholder: return "placeholder" | ||
} | ||
} | ||
|
||
override func visit(_ node: TokenSyntax) { | ||
visit(node.leadingTrivia) | ||
let classification = classifications[node] ?? SyntaxClassification.none | ||
recordCurrentTag(getTagForSyntaxClassification(classification)) | ||
result += node.text | ||
visit(node.trailingTrivia) | ||
} | ||
} |
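
The tag handling in `recordCurrentTag` merges adjacent pieces that share a tag into a single element. A minimal standalone sketch of that idea follows; `TagWriter` and its methods are hypothetical names, and the trivia handling and CHECK-line skipping of the real printer are omitted.

```swift
struct TagWriter {
  private(set) var output = ""
  private var currentTag = ""

  /// Appends `text`, closing the previous tag and opening a new one
  /// only when the tag actually changes. An empty tag means "untagged".
  mutating func write(_ text: String, tag: String) {
    if currentTag != tag {
      if !currentTag.isEmpty { output += "</\(currentTag)>" }
      if !tag.isEmpty { output += "<\(tag)>" }
      currentTag = tag
    }
    output += text
  }

  /// Closes any tag that is still open and returns the result.
  mutating func finish() -> String {
    write("", tag: "")
    return output
  }
}

var writer = TagWriter()
writer.write("let", tag: "kw")
writer.write(" ", tag: "")
writer.write("x", tag: "")
writer.write(" = ", tag: "")
writer.write("4", tag: "int")
writer.write("2", tag: "int")   // same tag as before: no new element is opened
let rendered = writer.finish()
// rendered == "<kw>let</kw> x = <int>42</int>"
```

Calling `finish()` plays the role of the final `recordCurrentTag("")` in `print(tree:)`, which emits the last closing tag.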
Add an action `print-token-kind` in `swift-swiftsyntax-test` for this output.

I actually think that this way is better. This is not really Swift-specific, since we're just checking the list of declarations in the Python declaration of `gyb_syntax_support`.

If we were to implement this in `swift-swiftsyntax-test`, we would need to:

a) Make `swift-swiftsyntax-test` a `gyb` tool (I don't like this at all).
b) Check that all the token kinds get generated in the `TokenKind` enum, but for that we'd need a list of all kinds defined in `TokenKind`, and we cannot use the autogenerated `allCases` property from `CaseIterable` since `TokenKind` has associated values.

And after all, this file is not a real tool that needs to be compiled into a standalone binary; it is just an input to `gyb`.
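
To illustrate point b): the compiler only synthesizes `CaseIterable.allCases` for enums without associated values, so an enum shaped like `TokenKind` would need a hand-maintained list. A small sketch, assuming a hypothetical `MiniTokenKind` that stands in for the real enum (its cases and the `kindName` property are made up for illustration):

```swift
// Hypothetical stand-in for TokenKind; not the real SwiftSyntax enum.
// Declaring it as `enum MiniTokenKind: CaseIterable` would not compile
// without a manual `allCases`, because `identifier` carries a payload.
enum MiniTokenKind {
  case funcKeyword
  case leftParen
  case identifier(String)
  case eof

  /// Name of the case without its payload.
  var kindName: String {
    switch self {
    case .funcKeyword: return "funcKeyword"
    case .leftParen: return "leftParen"
    case .identifier: return "identifier"
    case .eof: return "eof"
    }
  }
}

// The list has to be written out by hand, with a dummy payload where needed,
// and nothing forces it to stay in sync when a new case is added.
let allKinds: [MiniTokenKind] = [.funcKeyword, .leftParen, .identifier(""), .eof]
let allKindNames = allKinds.map { $0.kindName }.sorted()
```

Keeping such a list in sync by hand is exactly the kind of duplication the test is trying to detect, which is why generating the list from `gyb_syntax_support` is simpler.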

@nkcsgexi and I discussed this in person and decided that it's probably the easiest and cleanest way to test this. In the future we might want to consider generating `TokenKinds.def` from `gyb`, which would make the entire test obsolete.