
[libSyntax] Syntax coloring using libSyntax #16636


Closed
wants to merge 25 commits into from
25 commits
275821a
[incrParse] Add a stable id to the syntax nodes
ahoppen May 11, 2018
9e0195b
[libSyntax] Add syntax coloring based on the syntax tree
ahoppen May 14, 2018
3505596
[libSyntax] Disable caching of token nodes
ahoppen May 14, 2018
6833540
[libSyntax] Classify unterminated string literals as string literals
ahoppen May 14, 2018
6b03383
[libSyntax] Classify the opening parenthesis of a string interpolatio…
ahoppen May 15, 2018
4670478
[libSyntax] Enable tests for libSyntax based syntax coloring
ahoppen May 14, 2018
82f7e20
[libSyntax] Add syntax coloring based on the syntax tree to SourceKit
ahoppen May 15, 2018
7c9368e
[libSyntax] Enable incremental parsing for syntax tree based syntax c…
ahoppen May 16, 2018
5b596b9
[libSyntax] Allow syntax cache reuse info to be passed back via Sourc…
ahoppen May 16, 2018
7b3580d
[libSyntax] Copy the text of a syntax node
ahoppen May 18, 2018
cabb5d5
[libSyntax] Add a json::Output to print a syntax tree without nodes
ahoppen May 18, 2018
21b438b
[incrParse] Test utility: Compare syntax trees without IDs
ahoppen May 18, 2018
a2ff54d
[incrParse] Add validation of incremental parsing
ahoppen May 18, 2018
7eb0ec8
[incrParse] Refactor logging of syntax reuse regions
ahoppen May 18, 2018
07e9b96
[libSyntax] Adjust some failing tests
ahoppen May 18, 2018
bce813a
[incrParse] Perform a full reparse of the file if needed for formatting
ahoppen May 21, 2018
7691715
[incrParse] Add test case for nested initializers
ahoppen May 21, 2018
bb7919f
[libSyntax] Adjust SyntaxClassifier for visitable SyntaxCollections
ahoppen May 24, 2018
a3a84f8
[libSyntax] By default use the old syntax classifier
ahoppen Jun 18, 2018
a3fb15b
Fix test case for OwnedSyntax because it now by default copies the st…
ahoppen Jun 19, 2018
c9f215c
[libSyntax] Teach SyntaxToSyntaxMapConverter to convert trivia
ahoppen Jun 25, 2018
0500e65
[libSyntax] Classify identifier tokens as identifiers if no other inf…
ahoppen Jun 26, 2018
e7fe537
[libSyntax] Classify pound directives as build config keywords
ahoppen Jun 26, 2018
4b506b2
[libSyntax] Add test variants for building the syntax map via libSyntax
ahoppen Jun 26, 2018
8b037fc
[SourceKit] Move all options into a common options struct
ahoppen Jun 27, 2018
27 changes: 24 additions & 3 deletions include/swift/Basic/JSONSerialization.h
Expand Up @@ -23,6 +23,7 @@
#include "swift/Basic/LLVM.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/Regex.h"
#include "llvm/Support/raw_ostream.h"
Expand Down Expand Up @@ -341,6 +342,17 @@ struct unvalidatedObjectTraits : public std::integral_constant<bool,
&& !has_ObjectValidateTraits<T>::value> {};

class Output {
public:
/// Enum of all Output subclasses to dynamically cast the output object and
/// thus pass serialization options specific to the serialization of a
/// specific object, e.g. to omit nodes from a syntax tree that have been
/// reused since the last serialization.
enum OutputKind {
Normal,
SyntaxTree
Contributor

It's still kind of weird that the serialization util knows about the syntax tree :(. Can we define a 32-bit field in Output whose meaning can be interpreted while serializing? For instance, we can use the first bit to indicate shouldIncludeNodeIds.

};

private:
enum State {
ArrayFirstValue,
ArrayOtherValue,
Expand All @@ -353,13 +365,22 @@ class Output {
bool PrettyPrint;
bool NeedBitValueComma;
bool EnumerationMatchFound;
OutputKind Kind;

protected:
Output(llvm::raw_ostream &os, bool PrettyPrint, OutputKind Kind)
: Stream(os), PrettyPrint(PrettyPrint), NeedBitValueComma(false),
EnumerationMatchFound(false), Kind(Kind) {}

public:
Output(llvm::raw_ostream &os, bool PrettyPrint = true) : Stream(os),
PrettyPrint(PrettyPrint), NeedBitValueComma(false),
EnumerationMatchFound(false) {}
Output(llvm::raw_ostream &os, bool PrettyPrint = true)
: Output(os, PrettyPrint, OutputKind::Normal) {}
virtual ~Output() = default;

static bool classof(const Output *out) { return out->Kind == Normal; }

OutputKind getKind() const { return Kind; }

unsigned beginArray();
bool preflightElement(unsigned, void *&);
void postflightElement(void*);
Expand Down
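For discussion, here is a minimal sketch of one way to read the review comment above: Output carries an opaque 32-bit field and the RawSyntax traits test a bit in it, so the serialization utility no longer names the syntax tree. `OutputOption`, `UserInfo`, `hasOption` and `shouldEmitNodeId` are hypothetical names, not part of the PR, and the `Output` class here is a simplified stand-in rather than the real `json::Output`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical option bits; only the "first bit" idea comes from the review
// comment, the names are made up for this sketch.
enum OutputOption : std::uint32_t {
  IncludeNodeIds = 1u << 0, // first bit, as suggested in the comment
};

// Simplified stand-in for json::Output that carries an opaque 32-bit field.
class Output {
  std::uint32_t UserInfo;

public:
  explicit Output(std::uint32_t UserInfo = 0) : UserInfo(UserInfo) {}

  // Returns true if the given single-bit option is set.
  bool hasOption(OutputOption Opt) const {
    assert(Opt != 0 && (Opt & (Opt - 1)) == 0 && "expected exactly one bit");
    return (UserInfo & Opt) != 0;
  }
};

// What the RawSyntax traits could ask instead of dyn_cast<SyntaxTreeOutput>.
bool shouldEmitNodeId(const Output &Out) {
  return Out.hasOption(IncludeNodeIds);
}
```

Note that in this sketch a default-constructed Output omits IDs, whereas the diff emits IDs for a plain Output. Inverting the meaning of the bit (set means "omit") would keep the behavior in the diff.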
4 changes: 2 additions & 2 deletions include/swift/Basic/OwnedString.h
Expand Up @@ -71,7 +71,7 @@ class OwnedString {
OwnedString(): OwnedString(nullptr, 0, StringOwnership::Unowned) {}

OwnedString(const char *Data, size_t Length):
OwnedString(Data, Length, StringOwnership::Unowned) {}
OwnedString(Data, Length, StringOwnership::Copied) {}

OwnedString(StringRef Str) : OwnedString(Str.data(), Str.size()) {}

Expand Down Expand Up @@ -106,7 +106,7 @@ class OwnedString {
return *this;
}

OwnedString copy() {
OwnedString copy() const {
return OwnedString(Data, Length, StringOwnership::Copied);
}

Expand Down
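To illustrate why the default ownership matters, the following sketch contrasts a copied string with an unowned view of the same buffer. `OwnedText` is a hypothetical, heavily simplified stand-in for `OwnedString` (no reference counting, not copyable); only the `StringOwnership::Unowned` / `StringOwnership::Copied` names are taken from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

enum class StringOwnership { Unowned, Copied };

// Hypothetical simplification of OwnedString for illustration only.
class OwnedText {
  const char *Data;
  std::size_t Length;
  StringOwnership Ownership;

public:
  OwnedText(const char *Str, std::size_t Len, StringOwnership Own)
      : Data(Str), Length(Len), Ownership(Own) {
    assert((Str || Len == 0) && "null data requires zero length");
    if (Own == StringOwnership::Copied) {
      // Take a private copy so later changes to Str do not affect us.
      char *Buf = new char[Len];
      if (Len > 0)
        std::memcpy(Buf, Str, Len);
      Data = Buf;
    }
  }
  OwnedText(const OwnedText &) = delete;
  OwnedText &operator=(const OwnedText &) = delete;
  ~OwnedText() {
    if (Ownership == StringOwnership::Copied)
      delete[] Data;
  }

  std::string str() const { return std::string(Data, Length); }
  // True if this object still points into the caller's buffer.
  bool aliases(const char *P) const { return Data == P; }
};
```

With `Copied`, the text survives edits to (or destruction of) the source buffer, which is presumably what the change of default is after. An `Unowned` instance keeps pointing at the caller's memory.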
4 changes: 3 additions & 1 deletion include/swift/Subsystems.h
Expand Up @@ -60,6 +60,7 @@ namespace swift {
class SILParserTUState;
class SourceFile;
class SourceManager;
class SyntaxParsingCache;
class Token;
class TopLevelContext;
struct TypeLoc;
Expand Down Expand Up @@ -326,7 +327,8 @@ namespace swift {
class ParserUnit {
public:
ParserUnit(SourceManager &SM, unsigned BufferID,
const LangOptions &LangOpts, StringRef ModuleName);
const LangOptions &LangOpts, StringRef ModuleName,
SyntaxParsingCache *SyntaxCache = nullptr);
ParserUnit(SourceManager &SM, unsigned BufferID);
ParserUnit(SourceManager &SM, unsigned BufferID,
unsigned Offset, unsigned EndOffset);
Expand Down
1 change: 1 addition & 0 deletions include/swift/Syntax/CMakeLists.txt
Expand Up @@ -8,6 +8,7 @@ set(generated_include_sources
SyntaxKind.h.gyb
SyntaxNodes.h.gyb
SyntaxBuilders.h.gyb
SyntaxClassifier.h.gyb
SyntaxFactory.h.gyb
SyntaxVisitor.h.gyb
Trivia.h.gyb)
Expand Down
24 changes: 20 additions & 4 deletions include/swift/Syntax/RawSyntax.h
Expand Up @@ -221,6 +221,13 @@ class RawSyntax final
TriviaPiece> {
friend TrailingObjects;

/// The ID that shall be used for the next node that is created and does not
/// have a manually specified id
static unsigned NextFreeNodeId;

/// An ID of this node that is stable across incremental parses
unsigned NodeId;

union {
uint64_t Clear;
struct {
Expand Down Expand Up @@ -270,13 +277,18 @@ class RawSyntax final
}

/// Constructor for creating layout nodes
/// If \p NodeId is \c None, the next free NodeId is used. If it is passed,
/// the caller needs to ensure that the node ID has not been used yet.
RawSyntax(SyntaxKind Kind, ArrayRef<RC<RawSyntax>> Layout,
SourcePresence Presence, bool ManualMemory);
SourcePresence Presence, bool ManualMemory,
llvm::Optional<unsigned> NodeId);
/// Constructor for creating token nodes
/// If \p NodeId is 0, the next free NodeId is used. If it is passed, the
/// caller needs to ensure that the NodeId has not been used yet.
RawSyntax(tok TokKind, OwnedString Text,
ArrayRef<TriviaPiece> LeadingTrivia,
ArrayRef<TriviaPiece> TrailingTrivia,
SourcePresence Presence, bool ManualMemory);
SourcePresence Presence, bool ManualMemory, unsigned NodeId);

public:
~RawSyntax();
Expand All @@ -298,14 +310,15 @@ class RawSyntax final
/// Make a raw "layout" syntax node.
static RC<RawSyntax> make(SyntaxKind Kind, ArrayRef<RC<RawSyntax>> Layout,
SourcePresence Presence,
SyntaxArena *Arena = nullptr);
SyntaxArena *Arena = nullptr,
llvm::Optional<unsigned> NodeId = llvm::None);

/// Make a raw "token" syntax node.
static RC<RawSyntax> make(tok TokKind, OwnedString Text,
ArrayRef<TriviaPiece> LeadingTrivia,
ArrayRef<TriviaPiece> TrailingTrivia,
SourcePresence Presence,
SyntaxArena *Arena = nullptr);
SyntaxArena *Arena = nullptr, unsigned NodeId = 0);

/// Make a missing raw "layout" syntax node.
static RC<RawSyntax> missing(SyntaxKind Kind, SyntaxArena *Arena = nullptr) {
Expand All @@ -331,6 +344,9 @@ class RawSyntax final

SyntaxKind getKind() const { return static_cast<SyntaxKind>(Bits.Kind); }

/// Get an ID for this node that is stable across incremental parses
unsigned getId() const { return NodeId; }

/// Returns true if the node is "missing" in the source (i.e. it was
/// expected (or optional) but not written.
bool isMissing() const { return getPresence() == SourcePresence::Missing; }
Expand Down
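The ID scheme declared in this header (a static `NextFreeNodeId` counter plus an optional manually specified ID) could work roughly as sketched below. The allocation logic lives in RawSyntax.cpp, which is not shown in this part of the diff, so this is a guess at the shape rather than the actual implementation. `Node` is a hypothetical stand-in for `RawSyntax`. Starting the counter at 1 and bumping it past manual IDs are assumptions, and 0 is used as the "no ID given" sentinel as in the token constructor.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for RawSyntax, reduced to the ID handling.
class Node {
  // The ID handed to the next node created without a manual ID.
  static unsigned NextFreeNodeId;
  // An ID intended to stay stable across incremental parses.
  unsigned NodeId;

public:
  // ManualId == 0 means "pick the next free ID" (sentinel as in the token
  // constructor of the diff).
  explicit Node(unsigned ManualId = 0) {
    if (ManualId != 0) {
      NodeId = ManualId;
      // Assumption: keep the counter ahead of manual IDs to avoid clashes.
      NextFreeNodeId = std::max(NextFreeNodeId, ManualId + 1);
    } else {
      NodeId = NextFreeNodeId++;
    }
    assert(NodeId != 0 && "0 is reserved as the sentinel");
  }

  unsigned getId() const { return NodeId; }
};

// Assumption: start at 1 so that 0 is never a real ID.
unsigned Node::NextFreeNodeId = 1;
```

Deserialized nodes would pass their stored ID as `ManualId`, while freshly parsed nodes would take the next free one.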
20 changes: 17 additions & 3 deletions include/swift/Syntax/Serialization/SyntaxDeserialization.h
Expand Up @@ -156,16 +156,30 @@ template <> struct MappingTraits<swift::RC<swift::RawSyntax>> {
in.mapRequired("trailingTrivia", trailingTrivia);
swift::SourcePresence presence;
in.mapRequired("presence", presence);
value = swift::RawSyntax::make(tokenKind, text, leadingTrivia,
trailingTrivia, presence, nullptr);
/// FIXME: This is a workaround for an existing bug in the LLVM YAML parser,
/// which raises an error when deserializing a number with a trailing
/// character like "1\n". See https://bugs.llvm.org/show_bug.cgi?id=15505
StringRef nodeIdString;
in.mapRequired("id", nodeIdString);
unsigned nodeId = std::atoi(nodeIdString.data());
value =
swift::RawSyntax::make(tokenKind, text, leadingTrivia, trailingTrivia,
presence, /*Arena*/ nullptr, nodeId);
} else {
swift::SyntaxKind kind;
in.mapRequired("kind", kind);
std::vector<swift::RC<swift::RawSyntax>> layout;
in.mapRequired("layout", layout);
swift::SourcePresence presence;
in.mapRequired("presence", presence);
value = swift::RawSyntax::make(kind, layout, presence, nullptr);
/// FIXME: This is a workaround for an existing bug in the LLVM YAML parser,
/// which raises an error when deserializing a number with a trailing
/// character like "1\n". See https://bugs.llvm.org/show_bug.cgi?id=15505
StringRef nodeIdString;
in.mapRequired("id", nodeIdString);
unsigned nodeId = std::atoi(nodeIdString.data());
value = swift::RawSyntax::make(kind, layout, presence, /*Arena*/ nullptr,
nodeId);
}
}
};
Expand Down
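A note on the `std::atoi(nodeIdString.data())` workaround: a `StringRef` is a pointer plus a length and is not guaranteed to be null-terminated, so a length-bounded parse may be safer. The sketch below is one possible alternative, not what the PR does. `parseNodeId` is a hypothetical helper that reads only the leading decimal digits (so `"1\n"` still parses, matching the intent of the FIXME), and overflow is not handled beyond unsigned wraparound.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: parse the leading decimal digits of a length-bounded
// buffer. Returns false if the buffer does not start with a digit.
bool parseNodeId(const char *Data, std::size_t Length, unsigned &Result) {
  assert((Data || Length == 0) && "null data requires zero length");
  unsigned Value = 0;
  std::size_t I = 0;
  // Never read past Length, even if the buffer is not null-terminated.
  for (; I < Length && Data[I] >= '0' && Data[I] <= '9'; ++I)
    Value = Value * 10 + static_cast<unsigned>(Data[I] - '0');
  if (I == 0)
    return false; // no digits at all
  Result = Value;
  return true;
}
```

In the deserializer this would be called as `parseNodeId(nodeIdString.data(), nodeIdString.size(), nodeId)`, with the caller deciding how to report a `false` result.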
26 changes: 26 additions & 0 deletions include/swift/Syntax/Serialization/SyntaxSerialization.h
Expand Up @@ -27,6 +27,22 @@
namespace swift {
namespace json {

class SyntaxTreeOutput : public Output {
bool IncludeNodeIds;

public:
SyntaxTreeOutput(llvm::raw_ostream &OS, bool IncludeNodeIds,
bool PrettyPrint = true)
: Output(OS, PrettyPrint, Output::OutputKind::SyntaxTree),
IncludeNodeIds(IncludeNodeIds) {}

static bool classof(const Output *out) {
return out->getKind() == SyntaxTree;
}

bool shouldIncludeNodeIds() const { return IncludeNodeIds; }
};

/// Serialization traits for SourcePresence.
template <>
struct ScalarEnumerationTraits<syntax::SourcePresence> {
Expand Down Expand Up @@ -141,6 +157,16 @@ struct ObjectTraits<syntax::RawSyntax> {
}
auto presence = value.getPresence();
out.mapRequired("presence", presence);

bool includeNodeId = true;
if (auto syntaxTreeOutput = dyn_cast<SyntaxTreeOutput>(&out)) {
includeNodeId = syntaxTreeOutput->shouldIncludeNodeIds();
}

if (includeNodeId) {
auto nodeId = value.getId();
out.mapRequired("id", nodeId);
}
}
};

Expand Down
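The kind-tag pattern used by `Output` and `SyntaxTreeOutput` can be sketched without LLVM as follows. The classes are simplified stand-ins (no stream, no pretty printing), and `dynCast` is a hypothetical minimal replacement for `llvm::dyn_cast` that relies only on the `classof` convention. In this sketch the base-class `classof` accepts every kind, which differs from the diff, where it checks for `Normal`.

```cpp
#include <cassert>

// Simplified stand-ins for the classes in the diff; not the real Swift API.
class Output {
public:
  enum OutputKind { Normal, SyntaxTree };

private:
  OutputKind Kind;

protected:
  explicit Output(OutputKind Kind) : Kind(Kind) {}

public:
  Output() : Output(Normal) {}
  virtual ~Output() = default;

  OutputKind getKind() const { return Kind; }
  // Every Output is an Output, whatever its kind.
  static bool classof(const Output *) { return true; }
};

class SyntaxTreeOutput : public Output {
  bool IncludeNodeIds;

public:
  explicit SyntaxTreeOutput(bool IncludeNodeIds)
      : Output(SyntaxTree), IncludeNodeIds(IncludeNodeIds) {}

  static bool classof(const Output *Out) {
    return Out->getKind() == SyntaxTree;
  }
  bool shouldIncludeNodeIds() const { return IncludeNodeIds; }
};

// Hypothetical minimal replacement for llvm::dyn_cast.
template <typename To> const To *dynCast(const Output *Out) {
  assert(Out && "dynCast on a null pointer");
  return To::classof(Out) ? static_cast<const To *>(Out) : nullptr;
}

// Mirrors the decision made in ObjectTraits<RawSyntax>: IDs are emitted
// unless a SyntaxTreeOutput explicitly opts out.
bool shouldEmitId(const Output &Out) {
  if (const auto *Tree = dynCast<SyntaxTreeOutput>(&Out))
    return Tree->shouldIncludeNodeIds();
  return true;
}
```

The upside of this design is that plain `Output` users are unaffected. The downside, as the review comment on JSONSerialization.h points out, is that the generic `OutputKind` enum has to list `SyntaxTree`.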
3 changes: 3 additions & 0 deletions include/swift/Syntax/Syntax.h
Expand Up @@ -84,6 +84,9 @@ class Syntax {
/// Get the shared raw syntax.
RC<RawSyntax> getRaw() const;

/// Get an ID for this node that is stable across incremental parses
unsigned getId() const { return getRaw()->getId(); }

/// Get the number of child nodes in this piece of syntax, not including
/// tokens.
size_t getNumChildren() const;
Expand Down
114 changes: 114 additions & 0 deletions include/swift/Syntax/SyntaxClassifier.h.gyb
@@ -0,0 +1,114 @@
%{
# -*- mode: C++ -*-
from gyb_syntax_support import *
NODE_MAP = create_node_map()
# Ignore the following admonition; it applies to the resulting .h file only
}%
//// Automatically Generated From SyntaxClassifier.h.gyb.
//// Do Not Edit Directly!
//===----------- SyntaxClassifier.h - SyntaxClassifier definitions --------===//
//
// This source file is part of the Swift.org open source project
//
// Copyright (c) 2014 - 2018 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
// See https://swift.org/CONTRIBUTORS.txt for the list of Swift project authors
//
//===----------------------------------------------------------------------===//

#ifndef SWIFT_SYNTAX_CLASSIFIER_H
#define SWIFT_SYNTAX_CLASSIFIER_H

#include "swift/AST/Identifier.h"
#include "swift/Syntax/SyntaxVisitor.h"
#include <stack>

namespace swift {
namespace syntax {


/// A classification that determines the color a token receives in syntax
/// coloring.
enum class SyntaxClassification {
None,
Keyword,
Identifier,
DollarIdentifier,
IntegerLiteral,
FloatingLiteral,
StringLiteral,
/// Marks the parens for a string interpolation.
StringInterpolationAnchor,
TypeIdentifier,
/// #if/#else/#endif occurrence.
BuildConfigKeyword,
/// An identifier in a #if condition.
BuildConfigId,
/// #-keywords like #warning, #sourceLocation
PoundDirectiveKeyword,
/// Any occurrence of '@<attribute-name>' anywhere.
Attribute,
/// An editor placeholder string <#like this#>.
EditorPlaceholder,
ObjectLiteral
};


class SyntaxClassifier: public SyntaxVisitor {
struct ContextStackEntry {
/// The classification all identifiers shall inherit
SyntaxClassification Classification;
/// If set to \c true, all tokens will be forced to receive the above
/// classification, overriding their context-free classification
bool ForceClassification;
};

std::map<unsigned, SyntaxClassification> ClassifiedTokens;
/// The top classification of this stack determines the color of identifiers
std::stack<ContextStackEntry, llvm::SmallVector<ContextStackEntry, 4>> ContextStack;

template<typename T>
void visit(T Node, SyntaxClassification Classification,
bool ForceClassification) {
ContextStack.push({Classification, ForceClassification});
visit(Node);
ContextStack.pop();
}

template<typename T>
void visit(llvm::Optional<T> OptNode) {
if (OptNode.hasValue()) {
static_cast<SyntaxVisitor *>(this)->visit(OptNode.getValue());
}
}

virtual void visit(TokenSyntax TokenNode) override;

virtual void visit(Syntax Node) override {
SyntaxVisitor::visit(Node);
}

% for node in SYNTAX_NODES:
% if is_visitable(node):
virtual void visit(${node.name} Node) override;
% end
% end

public:
std::map<unsigned, SyntaxClassification> classify(Syntax Node) {
// Clean up the environment
ContextStack = std::stack<ContextStackEntry, llvm::SmallVector<ContextStackEntry, 4>>();
ContextStack.push({SyntaxClassification::None, false});
ClassifiedTokens.clear();

Node.accept(*this);

return ClassifiedTokens;
}
};
} // namespace syntax
} // namespace swift

#endif // SWIFT_SYNTAX_CLASSIFIER_H
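The context stack in `SyntaxClassifier` could behave roughly as in the sketch below. The token visitor lives in SyntaxClassifier.cpp.gyb, which is not shown here, so the rule in `visitToken` is an assumption derived from the comments on `ContextStackEntry`: a forced context overrides everything, and otherwise only identifiers inherit a non-`None` context. `Token`, `Classifier`, `visitInContext` and the reduced `Classification` enum are hypothetical simplifications.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Reduced version of SyntaxClassification for this sketch.
enum class Classification { None, Keyword, Identifier, TypeIdentifier, StringLiteral };

// Hypothetical token: a stable node ID plus its context-free classification.
struct Token {
  unsigned Id;
  bool IsIdentifier;
  Classification ContextFree;
};

class Classifier {
  struct ContextEntry {
    Classification Class; // classification identifiers inherit
    bool Force;           // if true, every token receives Class
  };
  std::vector<ContextEntry> ContextStack;
  std::map<unsigned, Classification> Classified;

public:
  Classifier() { ContextStack.push_back({Classification::None, false}); }

  // Assumed rule: forced context wins; identifiers inherit a non-None
  // context; everything else keeps its context-free classification.
  void visitToken(const Token &Tok) {
    assert(!ContextStack.empty() && "context stack must never be empty");
    const ContextEntry &Top = ContextStack.back();
    Classification C = Tok.ContextFree;
    if (Top.Force || (Tok.IsIdentifier && Top.Class != Classification::None))
      C = Top.Class;
    Classified[Tok.Id] = C;
  }

  // Counterpart of the three-argument visit(): push, visit, pop.
  void visitInContext(const std::vector<Token> &Toks, Classification Class,
                      bool Force) {
    ContextStack.push_back({Class, Force});
    for (const Token &T : Toks)
      visitToken(T);
    ContextStack.pop_back();
  }

  const std::map<unsigned, Classification> &result() const { return Classified; }
};
```

Keying the result by node ID is what ties this to the stable IDs added in RawSyntax.h: a reused node keeps its ID, so its classification can be looked up again after an incremental parse.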
2 changes: 1 addition & 1 deletion include/swift/Syntax/SyntaxVisitor.h.gyb
Expand Up @@ -42,7 +42,7 @@ struct SyntaxVisitor {

virtual void visitPre(Syntax node) {}
virtual void visitPost(Syntax node) {}
void visit(Syntax node);
virtual void visit(Syntax node);

void visitChildren(Syntax node) {
for (unsigned i = 0, e = node.getNumChildren(); i != e; ++i) {
Expand Down
5 changes: 5 additions & 0 deletions lib/AST/Module.cpp
Expand Up @@ -1370,6 +1370,11 @@ bool SourceFile::shouldCollectToken() const {
switch (Kind) {
case SourceFileKind::Library:
case SourceFileKind::Main:
if (SyntaxParsingCache) {
// When reusing parts of the syntax tree from a SyntaxParsingCache, not
// all tokens are visited and thus token collection is invalid.
return false;
}
return (bool)AllCorrectedTokens;
case SourceFileKind::REPL:
case SourceFileKind::SIL:
Expand Down
6 changes: 4 additions & 2 deletions lib/Parse/Parser.cpp
Expand Up @@ -1018,9 +1018,11 @@ ParserUnit::ParserUnit(SourceManager &SM, unsigned BufferID)
}

ParserUnit::ParserUnit(SourceManager &SM, unsigned BufferID,
const LangOptions &LangOpts, StringRef ModuleName)
: Impl(*new Implementation(SM, BufferID, LangOpts, ModuleName)) {
const LangOptions &LangOpts, StringRef ModuleName,
SyntaxParsingCache *SyntaxCache)
: Impl(*new Implementation(SM, BufferID, LangOpts, ModuleName)) {

Impl.SF->SyntaxParsingCache = SyntaxCache;
Impl.TheParser.reset(new Parser(BufferID, *Impl.SF, nullptr));
}

Expand Down
1 change: 1 addition & 0 deletions lib/Syntax/CMakeLists.txt
Expand Up @@ -14,5 +14,6 @@ add_swift_library(swiftSyntax STATIC
RawSyntax.cpp
Syntax.cpp
SyntaxArena.cpp
SyntaxClassifier.cpp.gyb
SyntaxData.cpp
UnknownSyntax.cpp)