Skip to content

Rework ASTDumper #68438

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Sep 12, 2023
Merged

Rework ASTDumper #68438

merged 25 commits into from
Sep 12, 2023

Conversation

beccadax
Copy link
Contributor

@beccadax beccadax commented Sep 11, 2023

This PR refactors the ASTDumper to make it more structured, less mistake-prone, and more amenable to future changes. For example:

  // Before:
  void visitUnresolvedDotExpr(UnresolvedDotExpr *E) {
    printCommon(E, "unresolved_dot_expr")
      << " field '" << E->getName() << "'";
    PrintWithColorRAII(OS, ExprModifierColor)
      << " function_ref=" << getFunctionRefKindStr(E->getFunctionRefKind());
    if (E->getBase()) {
      OS << '\n';
      printRec(E->getBase());
    }
    PrintWithColorRAII(OS, ParenthesisColor) << ')';
  }

  // After:
  void visitUnresolvedDotExpr(UnresolvedDotExpr *E, StringRef label) {
    printCommon(E, "unresolved_dot_expr", label);

    printFieldQuoted(E->getName(), "field");
    printField(E->getFunctionRefKind(), "function_ref", ExprModifierColor);

    if (E->getBase()) {
      printRec(E->getBase());
    }

    printFoot();
  }
  • Values are printed through calls to base class methods, rather than direct access to the underlying raw_ostream.
    • These methods tend to reduce the chances of bugs like missing/extra spaces or newlines, too much/too little indentation, etc.
    • More values are quoted, and unprintable/non-ASCII characters in quoted values are escaped before printing.
  • Infrastructure to label child nodes now exists.
    • Some weird breaks from the normal "style", like PatternBindingDecl's original and processed initializers, have been brought into line.
  • Some types that previously used ad-hoc dumping functions, like conformances and substitution maps, are now structured similarly to the dumper classes.
  • I've fixed the odd dumping bug along the way. For example, distributed actors were only marked actor, not distributed actor.

This PR doesn't change the overall style of AST dumps; they're still pseudo-S-expressions. But the logic that implements this style is now isolated into a relatively small base class, making it feasible to introduce e.g. JSON dumping in the future.

@beccadax
Copy link
Contributor Author

@swift-ci please smoke test

@beccadax
Copy link
Contributor Author

Changes:

  • Corrected some commit messages
  • Fixed one comment change where an apostrophe was incorrectly changed to a double-quote

ASTDumper now uses new printHead() and printFoot() helpers for this.
This does an incredible amount of work to clean up the dumper.
Also changes the signature to be more compatibile with other printRec() overloads.
New helper for printing a name-value pair.
Formalizes the printing of unlabeled flags in a node. Also changes the spelling of a few flags, and corrects a bug where distributed actors were always just flagged as `actor`.
Attempts to remove direct access to PrintBase::OS from its subclasses, so that all printing is mediated by the print*() methods.
Only PrintBase can access the raw_ostream and indentation directly, so all direct I/O is now isolated to that class. I think this will help us to eventually support multiple output formats (e.g. JSON).
printField and several related helpers put the key before the value, while printRec and the visit methods always put the value before the key. Standardize on the latter since we can’t change the visit methods.
Allows us to more easily audit these later for e.g. JSON support.
Except for simple keyword fields and a few other things like capture lists, we now put values in double quotes. We are also correctly escaping these values using functionality built into llvm::raw_ostream.

As part of this change, printField()’s value must now be something that can be handled by a getDumpString() overload. Arbitrary strings should go through printFieldQuoted() or, in a pinch, printFieldRaw().
Make dumps for Requirement, ProtocolConformance, SubstitionMap, etc. use PrintBase and its helpers.
@beccadax
Copy link
Contributor Author

  • More commit message corrections

@beccadax
Copy link
Contributor Author

Updates SILOptimizer/sil_combine_concrete_existential.sil to match updated conformance dumping.
Fixes Concurrency/where_clause_main_resolution.swift.
@beccadax
Copy link
Contributor Author

@swift-ci please test

Copy link
Contributor

@ktoso ktoso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS nice improvements 👍

@beccadax beccadax merged commit 8770c7f into swiftlang:main Sep 12, 2023
mininny pushed a commit to mininny/swift that referenced this pull request Sep 14, 2023
This PR refactors the ASTDumper to make it more structured, less mistake-prone, and more amenable to future changes. For example:

```cpp
  // Before:
  void visitUnresolvedDotExpr(UnresolvedDotExpr *E) {
    printCommon(E, "unresolved_dot_expr")
      << " field '" << E->getName() << "'";
    PrintWithColorRAII(OS, ExprModifierColor)
      << " function_ref=" << getFunctionRefKindStr(E->getFunctionRefKind());
    if (E->getBase()) {
      OS << '\n';
      printRec(E->getBase());
    }
    PrintWithColorRAII(OS, ParenthesisColor) << ')';
  }

  // After:
  void visitUnresolvedDotExpr(UnresolvedDotExpr *E, StringRef label) {
    printCommon(E, "unresolved_dot_expr", label);

    printFieldQuoted(E->getName(), "field");
    printField(E->getFunctionRefKind(), "function_ref", ExprModifierColor);

    if (E->getBase()) {
      printRec(E->getBase());
    }

    printFoot();
  }
```

* Values are printed through calls to base class methods, rather than direct access to the underlying `raw_ostream`.
    * These methods tend to reduce the chances of bugs like missing/extra spaces or newlines, too much/too little indentation, etc.
    * More values are quoted, and unprintable/non-ASCII characters in quoted values are escaped before printing.
* Infrastructure to label child nodes now exists.
    * Some weird breaks from the normal "style", like `PatternBindingDecl`'s original and processed initializers, have been brought into line.
* Some types that previously used ad-hoc dumping functions, like conformances and substitution maps, are now structured similarly to the dumper classes.
* I've fixed the odd dumping bug along the way. For example, distributed actors were only marked `actor`, not `distributed actor`.

This PR doesn't change the overall style of AST dumps; they're still pseudo-S-expressions. But the logic that implements this style is now isolated into a relatively small base class, making it feasible to introduce e.g. JSON dumping in the future.

private:
virtual void write_impl(const char *Ptr, size_t Size) override {
base_os.write_escaped(StringRef(Ptr, Size), /*UseHexEscapes=*/true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, this made the tau in generic parameter names print as escape sequences:

(primary_archetype_type address=true name="\xCF\x84_0_0"
  (interface_type=generic_type_param_type depth=0 index=0))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Using write_escaped happened to be very convenient, but I wouldn't be averse to a patch that excepted the tau character, or some larger set of known-printable characters (as long as it'll still escape various newlines, quotes, control characters, etc.).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants