Add a flag to dump the AST as JSON. (Second version) #78463

allevato · 2025-01-07T13:53:41Z

This is a second attempt (first one is at #77321), based on feedback I was given that introducing a second distinct AST dumper would make evolution of the compiler's AST internals more difficult (I can't disagree with that).

So this version takes the existing ASTDumper and refactors it as needed to support emitting structured JSON, while otherwise being an NFC for the default format. This was tricky in some places—the primary issue being that the default format allows value labels to be elided for simplicity, so I had to introduce a notion of "always vs. optional labels" that the default formatter could ignore but that JSON always had to print.

I've tried to use the commit history here to separate the refactoring changes from the other functional changes to make it easier to review.

Original PR description

The new -dump-ast-format flag can be supplied when invoking the compiler in -dump-ast mode. Valid values for this flag are default (the default pseudo-S-expression format), json, and json-zlib (compression is actually quite significant for these; I've measured anywhere between 4–10x savings depending on the content).

Motivation

This is meant to be used by clients who want to do large-scale semantic analysis of type-checked Swift code. It is written with the understanding that there are no guarantees of stability for the semantic AST in the compiler, so clients must read the version information encoded in the JSON output and adjust their behavior accordingly.

Why not use other solutions, like indexstore? We do! But some of the information we're interested in (e.g., type information, generic substitutions at declref usage sites, etc.) is not available in indexstore, nor is everything we need appropriate to add to indexstore in the first place. Some type information is available from SourceKit, but not often in the format that we need, and since SourceKit is intended for IDE use cases, it is not optimized for making large numbers of large requests. Having a JSON AST dump post-build frees us up to do analysis anywhere (e.g., on a machine that doesn't even have SourceKit or the same exact version of Xcode).

hamishknight · 2025-01-16T15:14:44Z

FYI I am slowly but surely working my way through reviewing this

hamishknight

Thanks for working on this! I definitely prefer this approach over the previous one

include/swift/AST/DiagnosticsFrontend.def

lib/Frontend/ArgsToFrontendOptionsConverter.cpp

lib/AST/ASTDumper.cpp

hamishknight · 2025-01-17T14:06:16Z

lib/AST/ASTDumper.cpp

+    if (Writer.isParsable()) {
+      printComponents();
+    } else {
+      // `printList` in default mode simply indents, so we need to preserve the
+      // additional level of `(components ...)` for that mode.
+      printRecArbitrary([&](Label label) {
+        printHead("components", ExprColor, label);
+        printComponents();
+        printFoot();
+      }, Label::optional("components"));
+    }


Could this use printRecRange?

Maybe, but I'd like to remove printRecRange in a follow-up, so my low-risk option for this PR was to retain the existing behavior as closely as possible.

lib/AST/ASTDumper.cpp

lib/Driver/ToolChains.cpp

test/Frontend/ast-dump-json.swift

lib/AST/ASTDumper.cpp

allevato

Thanks for the thorough feedback! I believe I've addressed everything (aside from some minor things that I'd like to hold as follow-ups because they might cause more churn, like removing printRecRange and changing interface type to interface_type).

include/swift/AST/DiagnosticsFrontend.def

allevato · 2025-01-17T20:57:48Z

lib/AST/ASTDumper.cpp

+  if (type->hasArchetype()) {
+    // We can't generate USRs for types that contain archetypes. Replace them
+    // with their interface types.
+    type = type.transformRec([&](TypeBase *t) -> std::optional<Type> {
+      if (auto AT = dyn_cast<ArchetypeType>(t)) {
+        return AT->getInterfaceType();
+      }
+      return std::nullopt;
+    });
+  }


I definitely hit some limitations during type mangling/USR generation where some nested packs would cause a segfault (I have a FIXME marked in the no-crash test), which sounds a lot like what you're describing. I'm definitely interested in fixing those to make sure we don't hit them in production use; thanks for the pointer, I can try to take a closer look as a follow-up.

lib/AST/ASTDumper.cpp

allevato · 2025-01-21T14:29:37Z

lib/AST/ASTDumper.cpp

+        if (Writer.isParsable() && FD->hasInterfaceType()) {
+          printTypeField(FD->getResultInterfaceType(), Label::always("result"));
          if (auto opaque = FD->getOpaqueResultTypeDecl()) {
-            printRec(opaque, "opaque_result_decl");
+            printRec(opaque, Label::always("opaque_result_decl"));
+          }
+        } else if (FD->getResultTypeRepr()) {
+          printRec(FD->getResultTypeRepr(), Label::always("result"));
+          if (auto opaque = FD->getOpaqueResultTypeDecl()) {
+            printRec(opaque, Label::always("opaque_result_decl"));
          }


Done. Added the getCached* functions for these and a helper function in the dumper to print either the Type or TypeRepr based on which is present.

lib/AST/ASTDumper.cpp

hamishknight

Thanks!

lib/AST/ASTDumper.cpp

include/swift/AST/Attr.h

ASTDumper allows nodes/values to be printed without labels, which works fine for the default output but won't work for JSON when every value needs to have a valid key. To balance these needs, we create a `Label` abstraction that can be created as either `always` or `optional`. All the current labels are treated as `always`, and all other values have had `optional` labels associated with them, which won't be printed in the default AST dump. This change also adds a `printList` function that replaces anywhere that `for` loops with `printRec` calls in their bodies. This will be used to provide the necessary array structuring for JSON output later. (There are some places where `for` loops call `printFlag` which will need to be dealt with later.)

Availability specs are currently dumped by writing a pre-formatted string directly to the output stream instead of using the structured primitives in `PrintBase`. This needs to be fixed before we can introduce the generalized writers. The format here is meant to be the same as the format printed by the original methods, but some colors may differ slightly when dumping to the terminal.

This makes the low-level `print*` methods in `PrintBase` simply forward to the writer, which does the actual work. Right now, there's only the abstract base class for the writer and the default (S-expression-like) writer. A later commit will introduce the JSON writer. Also update the `printField*` methods to take `Label`s instead of `StringRef`s.

This only takes the existing AST information and writes it as JSON instead of S-expressions. Since many of these fields are stringified, they're not ideal for the kind of analysis clients of the JSON format would want to do. A future commit will update these values to use a more structured representation.

Specifically, this means `-dump-ast-format json` is incompatible with `-dump-parse`. This is because the JSON format is meant to export more details about the AST that require type checking to have been performed. I'm open to lifting this restriction in the future.

Also improve the output for thrown error destinations in parsable modes.

allevato · 2025-01-22T20:49:23Z

@swift-ci please smoke test

allevato · 2025-01-24T19:12:21Z

@swift-ci please smoke test

allevato requested review from artemcm, tshortli, hborla, slavapestov and xedin as code owners January 7, 2025 13:53

allevato mentioned this pull request Jan 7, 2025

Add a flag to dump the AST as JSON. #77321

Closed

allevato force-pushed the json-ast-new branch from 087045f to 8b46a57 Compare January 8, 2025 17:36

tshortli mentioned this pull request Jan 9, 2025

Remove outdated warning about swift interfaces without library evolution #78342

Open

hamishknight reviewed Jan 17, 2025

View reviewed changes

allevato force-pushed the json-ast-new branch from 8b46a57 to 2211680 Compare January 21, 2025 17:30

allevato commented Jan 21, 2025

View reviewed changes

allevato force-pushed the json-ast-new branch from 2211680 to 64463c9 Compare January 21, 2025 18:04

hamishknight approved these changes Jan 22, 2025

View reviewed changes

lib/AST/ASTDumper.cpp Outdated Show resolved Hide resolved

lib/AST/ASTDumper.cpp Outdated Show resolved Hide resolved

lib/AST/ASTDumper.cpp Show resolved Hide resolved

include/swift/AST/Attr.h Show resolved Hide resolved

allevato and others added 10 commits January 22, 2025 14:23

[AST] Improve JSON representation of various AST nodes.

52c009a

Address most of the simple review feedback.

a4c64b0

Print the buffer ID for non-main buffers and test via macro expansions.

87cabb0

Address more reviewer feedback.

8df48e5

Use cached evaluator results when looking up function types.

f249db3

Also improve the output for thrown error destinations in parsable modes.

allevato force-pushed the json-ast-new branch from 64463c9 to 465918b Compare January 22, 2025 20:48

allevato mentioned this pull request Jan 24, 2025

Pass the -dump-ast-format flag down to the frontend. swiftlang/swift-driver#1722

Merged

Additional post-review fixes.

d2fd347

allevato force-pushed the json-ast-new branch from 465918b to d2fd347 Compare January 24, 2025 19:11

allevato enabled auto-merge January 24, 2025 19:12

allevato merged commit 9907afa into swiftlang:main Jan 24, 2025
3 checks passed

allevato deleted the json-ast-new branch January 25, 2025 13:43

allevato mentioned this pull request Jan 27, 2025

🍒[6.1] Add a flag to dump the AST as JSON. #78955

Merged

Add a flag to dump the AST as JSON. (Second version) #78463

Add a flag to dump the AST as JSON. (Second version) #78463

Uh oh!

Conversation

allevato commented Jan 7, 2025

Original PR description

Motivation

Uh oh!

hamishknight commented Jan 16, 2025

Uh oh!

hamishknight left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hamishknight Jan 17, 2025

Choose a reason for hiding this comment

Uh oh!

allevato Jan 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

allevato left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

allevato Jan 17, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

allevato Jan 21, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hamishknight left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

allevato commented Jan 22, 2025

Uh oh!

allevato commented Jan 24, 2025

Uh oh!

Uh oh!

Uh oh!