-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Add a flag to dump the AST as JSON. (Second version) #78463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
FYI I am slowly but surely working my way through reviewing this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! I definitely prefer this approach over the previous one
if (Writer.isParsable()) { | ||
printComponents(); | ||
} else { | ||
// `printList` in default mode simply indents, so we need to preserve the | ||
// additional level of `(components ...)` for that mode. | ||
printRecArbitrary([&](Label label) { | ||
printHead("components", ExprColor, label); | ||
printComponents(); | ||
printFoot(); | ||
}, Label::optional("components")); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this use printRecRange
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but I'd like to remove printRecRange
in a follow-up, so my low-risk option for this PR was to retain the existing behavior as closely as possible.
8b46a57
to
2211680
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough feedback! I believe I've addressed everything (aside from some minor things that I'd like to hold as follow-ups because they might cause more churn, like removing printRecRange
and changing interface type
to interface_type
).
if (type->hasArchetype()) { | ||
// We can't generate USRs for types that contain archetypes. Replace them | ||
// with their interface types. | ||
type = type.transformRec([&](TypeBase *t) -> std::optional<Type> { | ||
if (auto AT = dyn_cast<ArchetypeType>(t)) { | ||
return AT->getInterfaceType(); | ||
} | ||
return std::nullopt; | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely hit some limitations during type mangling/USR generation where some nested packs would cause a segfault (I have a FIXME marked in the no-crash test), which sounds a lot like what you're describing. I'm definitely interested in fixing those to make sure we don't hit them in production use; thanks for the pointer, I can try to take a closer look as a follow-up.
lib/AST/ASTDumper.cpp
Outdated
if (Writer.isParsable() && FD->hasInterfaceType()) { | ||
printTypeField(FD->getResultInterfaceType(), Label::always("result")); | ||
if (auto opaque = FD->getOpaqueResultTypeDecl()) { | ||
printRec(opaque, "opaque_result_decl"); | ||
printRec(opaque, Label::always("opaque_result_decl")); | ||
} | ||
} else if (FD->getResultTypeRepr()) { | ||
printRec(FD->getResultTypeRepr(), Label::always("result")); | ||
if (auto opaque = FD->getOpaqueResultTypeDecl()) { | ||
printRec(opaque, Label::always("opaque_result_decl")); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Added the getCached*
functions for these and a helper function in the dumper to print either the Type
or TypeRepr
based on which is present.
2211680
to
64463c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
ASTDumper allows nodes/values to be printed without labels, which works fine for the default output but won't work for JSON when every value needs to have a valid key. To balance these needs, we create a `Label` abstraction that can be created as either `always` or `optional`. All the current labels are treated as `always`, and all other values have had `optional` labels associated with them, which won't be printed in the default AST dump. This change also adds a `printList` function that replaces anywhere that `for` loops with `printRec` calls in their bodies. This will be used to provide the necessary array structuring for JSON output later. (There are some places where `for` loops call `printFlag` which will need to be dealt with later.)
Availability specs are currently dumped by writing a pre-formatted string directly to the output stream instead of using the structured primitives in `PrintBase`. This needs to be fixed before we can introduce the generalized writers. The format here is meant to be the same as the format printed by the original methods, but some colors may differ slightly when dumping to the terminal.
This makes the low-level `print*` methods in `PrintBase` simply forward to the writer, which does the actual work. Right now, there's only the abstract base class for the writer and the default (S-expression-like) writer. A later commit will introduce the JSON writer. Also update the `printField*` methods to take `Label`s instead of `StringRef`s.
This only takes the existing AST information and writes it as JSON instead of S-expressions. Since many of these fields are stringified, they're not ideal for the kind of analysis clients of the JSON format would want to do. A future commit will update these values to use a more structured representation.
Specifically, this means `-dump-ast-format json` is incompatible with `-dump-parse`. This is because the JSON format is meant to export more details about the AST that require type checking to have been performed. I'm open to lifting this restriction in the future.
Also improve the output for thrown error destinations in parsable modes.
64463c9
to
465918b
Compare
@swift-ci please smoke test |
465918b
to
d2fd347
Compare
@swift-ci please smoke test |
This is a second attempt (first one is at #77321), based on feedback I was given that introducing a second distinct AST dumper would make evolution of the compiler's AST internals more difficult (I can't disagree with that).
So this version takes the existing ASTDumper and refactors it as needed to support emitting structured JSON, while otherwise being an NFC for the default format. This was tricky in some places—the primary issue being that the default format allows value labels to be elided for simplicity, so I had to introduce a notion of "always vs. optional labels" that the default formatter could ignore but that JSON always had to print.
I've tried to use the commit history here to separate the refactoring changes from the other functional changes to make it easier to review.
Original PR description
The new
-dump-ast-format
flag can be supplied when invoking the compiler in-dump-ast
mode. Valid values for this flag aredefault
(the default pseudo-S-expression format),json
, andjson-zlib
(compression is actually quite significant for these; I've measured anywhere between 4–10x savings depending on the content).Motivation
This is meant to be used by clients who want to do large-scale semantic analysis of type-checked Swift code. It is written with the understanding that there are no guarantees of stability for the semantic AST in the compiler, so clients must read the version information encoded in the JSON output and adjust their behavior accordingly.
Why not use other solutions, like indexstore? We do! But some of the information we're interested in (e.g., type information, generic substitutions at declref usage sites, etc.) is not available in indexstore, nor is everything we need appropriate to add to indexstore in the first place. Some type information is available from SourceKit, but not often in the format that we need, and since SourceKit is intended for IDE use cases, it is not optimized for making large numbers of large requests. Having a JSON AST dump post-build frees us up to do analysis anywhere (e.g., on a machine that doesn't even have SourceKit or the same exact version of Xcode).