Skip to content

Implement -dump-minimization-hints flag. #133910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 11, 2025
Merged

Implement -dump-minimization-hints flag. #133910

merged 11 commits into from
Apr 11, 2025

Conversation

VitaNuo
Copy link
Contributor

@VitaNuo VitaNuo commented Apr 1, 2025

This PR implements a CC1 flag -dump-deserialized-declaration-ranges. The flag allows to specify a file path to dump ranges of deserialized declarations in ASTReader. Example usage:

clang -Xclang=-dump-deserialized-declaration-ranges=/tmp/decls -c file.cc -o file.o

Example output:

// /tmp/decls
{
  "required_ranges": [
    {
      "file": "foo.h",
      "range": [
        {
          "from": {
            "line": 26,
            "column": 1
          },
          "to": {
            "line": 27,
            "column": 77
          }
        }
      ]
    },
    {
      "file": "bar.h",
      "range": [
        {
          "from": {
            "line": 30,
            "column": 1
          },
          "to": {
            "line": 35,
            "column": 1
          }
        },
        {
          "from": {
            "line": 92,
            "column": 1
          },
          "to": {
            "line": 95,
            "column": 1
          }
        }
      ]
    }
  ]
}

Specifying the flag creates an instance of DeserializedDeclsLineRangePrinter, which dumps ranges of deserialized declarations to aid debugging and bug minimization.

Required ranges are computed from source ranges of Decls. TranslationUnitDecl, LinkageSpecDecl and NamespaceDecl are ignored for the sake of this PR.

Technical details:

  • DeserializedDeclsLineRangePrinter implements ASTConsumer and ASTDeserializationListener, so that an object of DeserializedDeclsLineRangePrinter registers as its own listener.
  • ASTDeserializationListener interface provides the DeclRead callback that we use to collect the deserialized Decls.
    Printing or otherwise processing them as this point is dangerous, since that could trigger additional deserialization and crash compilation.
  • The collected Decls are processed in HandleTranslationUnit method of ASTConsumer. This is a safe point, since we know that by this point all the Decls needed by the compiler frontend have been deserialized.
  • In case our processing causes further deserialization, DeclRead from the listener might be called again. However, at that point we don't accept any more Decls for processing.

Copy link

github-actions bot commented Apr 1, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@ilya-biryukov ilya-biryukov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks like a good start! I have left a few comments and also feel there are a few things missing:

  • basic tests
  • the code registering deserialization listener
  • suggestion: a commit message could benefit from a description of how we are going to use this flag (heuristics for minimization tools), describe the output format and mention that the output stability is not guaranteed (it's also another reason why to put it behind a -cc1 flag)

@VitaNuo VitaNuo changed the title [WIP] Implement print-deserialized-declarations flag to dump source… [WIP] Implement -dump-deserialized-declaration-ranges flag. Apr 3, 2025
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Apr 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 4, 2025

@llvm/pr-subscribers-clang

Author: Viktoriia Bakalova (VitaNuo)

Changes

This PR implements a CC1 flag -dump-deserialized-declaration-ranges. The flag allows to specify a file path to dump ranges of deserialized declarations in ASTReader. Example usage:

clang -Xclang=-dump-deserialized-declaration-ranges=/tmp/decls -c file.cc -o file.o

Example output:

// /tmp/decls
{
  "required_ranges": [
    {
      "file": "foo.h",
      "range": [
        {
          "from": {
            "line": 26,
            "column": 1
          },
          "to": {
            "line": 27,
            "column": 77
          }
        }
      ]
    },
    {
      "file": "bar.h",
      "range": [
        {
          "from": {
            "line": 30,
            "column": 1
          },
          "to": {
            "line": 35,
            "column": 1
          }
        },
        {
          "from": {
            "line": 92,
            "column": 1
          },
          "to": {
            "line": 95,
            "column": 1
          }
        }
      ]
    }
  ]
}

Specifying the flag creates an instance of DeserializedDeclsLineRangePrinter, which dumps ranges of deserialized declarations to aid debugging and bug minimization.

Required ranges are computed from source ranges of Decls. TranslationUnitDecl, LinkageSpecDecl and NamespaceDecl are ignored for the sake of this PR.

Technical details:

  • DeserializedDeclsLineRangePrinter implements ASTConsumer and ASTDeserializationListener, so that an object of DeserializedDeclsLineRangePrinter registers as its own listener.
  • ASTDeserializationListener interface provides the DeclRead callback that we use to collect the deserialized Decls.
    Printing or otherwise processing them as this point is dangerous, since that could trigger additional deserialization and crash compilation.
  • The collected Decls are processed in HandleTranslationUnit method of ASTConsumer. This is a safe point, since we know that by this point all the Decls needed by the compiler frontend have been deserialized.
  • In case our processing causes further deserialization, DeclRead from the listener might be called again. However, at that point we don't accept any more Decls for processing.

Full diff: https://github.com/llvm/llvm-project/pull/133910.diff

4 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+4)
  • (modified) clang/include/clang/Frontend/FrontendOptions.h (+3)
  • (modified) clang/lib/Frontend/FrontendAction.cpp (+156-5)
  • (added) clang/test/Frontend/dump-deserialized-declaration-ranges.cpp (+80)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 3af072242d039..1737e40b776e1 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -7968,6 +7968,10 @@ def print_dependency_directives_minimized_source : Flag<["-"],
   "print-dependency-directives-minimized-source">,
   HelpText<"Print the output of the dependency directives source minimizer">;
 }
+def dump_deserialized_declaration_ranges : Joined<["-"],
+  "dump-deserialized-declaration-ranges=">,
+  HelpText<"Dump ranges of deserialized declarations to aid debugging and minimization">,
+  MarshallingInfoString<FrontendOpts<"DumpDeserializedDeclarationRangesPath">>;
 
 defm emit_llvm_uselists : BoolOption<"", "emit-llvm-uselists",
   CodeGenOpts<"EmitLLVMUseLists">, DefaultFalse,
diff --git a/clang/include/clang/Frontend/FrontendOptions.h b/clang/include/clang/Frontend/FrontendOptions.h
index a9c9849ff52ab..8ef9ce9db8783 100644
--- a/clang/include/clang/Frontend/FrontendOptions.h
+++ b/clang/include/clang/Frontend/FrontendOptions.h
@@ -530,6 +530,9 @@ class FrontendOptions {
   /// Output Path for module output file.
   std::string ModuleOutputPath;
 
+  /// Output path to dump ranges of deserialized declarations.
+  std::string DumpDeserializedDeclarationRangesPath;
+
 public:
   FrontendOptions()
       : DisableFree(false), RelocatablePCH(false), ShowHelp(false),
diff --git a/clang/lib/Frontend/FrontendAction.cpp b/clang/lib/Frontend/FrontendAction.cpp
index 2d77f06be7446..f98aa5ab1fe51 100644
--- a/clang/lib/Frontend/FrontendAction.cpp
+++ b/clang/lib/Frontend/FrontendAction.cpp
@@ -15,6 +15,7 @@
 #include "clang/Basic/FileEntry.h"
 #include "clang/Basic/LangStandard.h"
 #include "clang/Basic/Sarif.h"
+#include "clang/Basic/SourceManager.h"
 #include "clang/Basic/Stack.h"
 #include "clang/Frontend/ASTUnit.h"
 #include "clang/Frontend/CompilerInstance.h"
@@ -35,6 +36,7 @@
 #include "clang/Serialization/ASTReader.h"
 #include "clang/Serialization/GlobalModuleIndex.h"
 #include "llvm/ADT/ScopeExit.h"
+#include "llvm/ADT/StringRef.h"
 #include "llvm/Support/BuryPointer.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/FileSystem.h"
@@ -49,6 +51,144 @@ LLVM_INSTANTIATE_REGISTRY(FrontendPluginRegistry)
 
 namespace {
 
+/// DeserializedDeclsLineRangePrinter dumps ranges of deserialized declarations to aid debugging and bug minimization.
+/// It implements ASTConsumer and ASTDeserializationListener, so that an object of DeserializedDeclsLineRangePrinter registers
+/// as its own listener.
+/// The ASTDeserializationListener interface provides the DeclRead callback that we use to collect the deserialized Decls.
+/// Note that printing or otherwise processing them as this point is dangerous, since that could trigger additional
+/// deserialization and crash compilation.
+/// Therefore, we process the collected Decls in HandleTranslationUnit method of ASTConsumer.
+/// This is a safe point, since we know that by this point all the Decls needed by the compiler frontend have been
+/// deserialized. In case our processing causes further deserialization, DeclRead from the listener might be called again.
+/// However, at that point we don't accept any more Decls for processing.
+class DeserializedDeclsLineRangePrinter : public ASTDeserializationListener, public ASTConsumer {
+public:
+  explicit DeserializedDeclsLineRangePrinter(SourceManager &SM, std::unique_ptr<llvm::raw_fd_ostream> OS)
+      : ASTDeserializationListener(), SM(SM), OS(std::move(OS)) {}
+
+  void DeclRead(GlobalDeclID ID, const Decl *D) override {
+    if (!IsCollectingDecls) {
+      return;
+    }
+    if (!D || isa<TranslationUnitDecl>(D) || isa<LinkageSpecDecl>(D) ||
+          isa<NamespaceDecl>(D))
+      return;
+    if (auto *DC = D->getDeclContext(); !DC || !DC->isFileContext())
+      return;
+    PendingDecls.push_back(D);
+    ASTDeserializationListener::DeclRead(ID, D);
+  }
+
+  using Position = std::pair<unsigned, unsigned>;
+  struct RequiredRanges {
+      StringRef Filename;
+      std::vector<std::pair<Position, Position>> FromTo;
+  };
+  void HandleTranslationUnit(ASTContext &Context) override {
+    IsCollectingDecls = false;
+    std::vector<const Decl *> Decls = std::move(PendingDecls);
+    if (!PendingDecls.empty()) {
+      llvm::errs() << "Deserialized more decls while printing, total of "
+                    << PendingDecls.size() << "\n";
+      PendingDecls.clear();
+    }
+
+    // Merge ranges in each of the files. For simplicity, track lines and hope
+    // they do not break things.
+    struct FileData {
+      std::vector<std::pair<Position, Position>> FromTo;
+      std::vector<std::pair<unsigned, unsigned>> Columns;
+      OptionalFileEntryRef Ref;
+    };
+    llvm::DenseMap<const FileEntry *, FileData> FileToLines;
+    for (const Decl *D : Decls) {
+      CharSourceRange R = SM.getExpansionRange(D->getSourceRange());
+      if (!R.isValid())
+        continue;
+
+      auto *F = SM.getFileEntryForID(SM.getFileID(R.getBegin()));
+      if (F != SM.getFileEntryForID(SM.getFileID(R.getEnd())))
+        continue;
+
+      auto &Data = FileToLines[F];
+      if (!Data.Ref)
+        Data.Ref =
+            SM.getFileEntryRefForID(SM.getFileID(R.getBegin()));
+      Data.FromTo.push_back({{SM.getSpellingLineNumber(R.getBegin()), SM.getSpellingColumnNumber(R.getBegin())},
+                            {SM.getSpellingLineNumber(R.getEnd()), SM.getSpellingColumnNumber(R.getEnd())}});
+    }
+
+    std::vector<RequiredRanges> Result;
+    for (auto &[F, Data] : FileToLines) {
+      auto& FromTo = Data.FromTo;
+      assert(!FromTo.empty());
+
+      if (!Data.Ref) continue;
+
+      llvm::sort(FromTo);
+
+      std::vector<std::pair<Position, Position>> MergedLines;
+      MergedLines.push_back(FromTo.front());
+      for (auto It = FromTo.begin() + 1; It < FromTo.end(); ++It) {
+        if (MergedLines.back().second < It->first) {
+          MergedLines.push_back(*It);
+          continue;
+        }
+        if (MergedLines.back().second < It->second)
+          MergedLines.back().second = It->second;
+      }
+      Result.push_back({Data.Ref->getName(), MergedLines});
+    }
+    printJson(Result);
+  }
+
+  void printJson(const std::vector<RequiredRanges>& Result) {
+    *OS << "{\n";
+    *OS << "  \"required_ranges\": [\n";
+    for (size_t i = 0; i < Result.size(); ++i) {
+      auto &F = Result[i].Filename;
+      auto &MergedLines = Result[i].FromTo;
+      *OS << "    {\n";
+      *OS << "      \"file\": \"" << F << "\",\n";
+      *OS << "      \"range\": [\n";
+      for (size_t j = 0; j < MergedLines.size(); ++j) {
+        auto &From = MergedLines[j].first;
+        auto &To = MergedLines[j].second;
+        *OS << "        {\n";
+        *OS << "          \"from\": {\n";
+        *OS << "            \"line\": " << From.first << ",\n";
+        *OS << "            \"column\": " << From.second << "\n          },\n";
+        *OS << "          \"to\": {\n";
+        *OS << "            \"line\": " << To.first << ",\n";
+        *OS << "            \"column\": " << To.second << "\n          }\n";
+        *OS << "        }";
+        if (j < MergedLines.size() - 1) {
+          *OS << ",";
+        }
+        *OS << "\n";
+      }
+      *OS << "      ]\n    }";
+      if (i < Result.size() - 1) {
+        *OS << ",";
+      }
+      *OS << "\n";
+    }
+    *OS << "  ]\n";
+    *OS << "}\n";
+  }
+
+  ASTDeserializationListener *GetASTDeserializationListener() override {
+    return this;
+  }
+
+private:
+std::vector<const Decl *> PendingDecls;
+bool IsCollectingDecls = true;
+const SourceManager &SM;
+std::unique_ptr<llvm::raw_ostream> OS;
+};
+
+
 /// Dumps deserialized declarations.
 class DeserializedDeclsDumper : public DelegatingDeserializationListener {
 public:
@@ -121,6 +261,19 @@ FrontendAction::CreateWrappedASTConsumer(CompilerInstance &CI,
   if (!Consumer)
     return nullptr;
 
+  std::vector<std::unique_ptr<ASTConsumer>> Consumers;
+  llvm::StringRef DumpDeserializedDeclarationRangesPath = CI.getFrontendOpts().DumpDeserializedDeclarationRangesPath;
+  if (!DumpDeserializedDeclarationRangesPath.empty()) {
+    std::error_code ErrorCode;
+    auto FileStream = std::make_unique<llvm::raw_fd_ostream>(DumpDeserializedDeclarationRangesPath, ErrorCode, llvm::sys::fs::OF_None);
+    if (!ErrorCode) {
+      auto Printer = std::make_unique<DeserializedDeclsLineRangePrinter>(CI.getSourceManager(), std::move(FileStream));
+      Consumers.push_back(std::move(Printer));
+    } else {
+      llvm::errs() << "Failed to create output file for -dump-deserialized-declaration-ranges flag, file path: " << DumpDeserializedDeclarationRangesPath << ", error: " << ErrorCode.message() << "\n";
+    }
+  }
+
   // Validate -add-plugin args.
   bool FoundAllPlugins = true;
   for (const std::string &Arg : CI.getFrontendOpts().AddPluginActions) {
@@ -138,17 +291,12 @@ FrontendAction::CreateWrappedASTConsumer(CompilerInstance &CI,
   if (!FoundAllPlugins)
     return nullptr;
 
-  // If there are no registered plugins we don't need to wrap the consumer
-  if (FrontendPluginRegistry::begin() == FrontendPluginRegistry::end())
-    return Consumer;
-
   // If this is a code completion run, avoid invoking the plugin consumers
   if (CI.hasCodeCompletionConsumer())
     return Consumer;
 
   // Collect the list of plugins that go before the main action (in Consumers)
   // or after it (in AfterConsumers)
-  std::vector<std::unique_ptr<ASTConsumer>> Consumers;
   std::vector<std::unique_ptr<ASTConsumer>> AfterConsumers;
   for (const FrontendPluginRegistry::entry &Plugin :
        FrontendPluginRegistry::entries()) {
@@ -191,6 +339,9 @@ FrontendAction::CreateWrappedASTConsumer(CompilerInstance &CI,
       Consumers.push_back(std::move(C));
   }
 
+  assert(Consumers.size() >= 1 && "should have added the main consumer");
+  if (Consumers.size() == 1) 
+    return std::move(Consumers.front());
   return std::make_unique<MultiplexConsumer>(std::move(Consumers));
 }
 
diff --git a/clang/test/Frontend/dump-deserialized-declaration-ranges.cpp b/clang/test/Frontend/dump-deserialized-declaration-ranges.cpp
new file mode 100644
index 0000000000000..bb43cb7c40e77
--- /dev/null
+++ b/clang/test/Frontend/dump-deserialized-declaration-ranges.cpp
@@ -0,0 +1,80 @@
+// RUN: rm -rf %t
+// RUN: mkdir -p %t
+// RUN: split-file %s %t
+// RUN: %clang_cc1 -xc++ -fmodules -fmodule-name=foo -fmodule-map-file=%t/foo.cppmap -emit-module %t/foo.cppmap -o %t/foo.pcm
+// RUN: %clang_cc1 -xc++ -fmodules -dump-deserialized-declaration-ranges=%t/decls -fmodule-file=%t/foo.pcm %t/foo.cpp -o %t/foo.o
+// RUN: cat %t/decls
+// RUN: echo '{ \
+// RUN:  "required_ranges": [\
+// RUN:    {\
+// RUN:      "file": "/usr/local/google/home/bakalova/llvm-project/build/tools/clang/test/Frontend/Output/dump-deserialized-declaration-ranges.cpp.tmp/foo.h",\
+// RUN:      "range": [\
+// RUN:        {\
+// RUN:          "from": {\
+// RUN:            "line": 1,\
+// RUN:            "column": 1\
+// RUN:          },\
+// RUN:          "to": {\
+// RUN:            "line": 9,\
+// RUN:            "column": 1\
+// RUN:          }\
+// RUN:        },\
+// RUN:        {\
+// RUN:          "from": {\
+// RUN:            "line": 11,\
+// RUN:            "column": 1\
+// RUN:          },\
+// RUN:          "to": {\
+// RUN:            "line": 11,\
+// RUN:            "column": 12\
+// RUN:          }\
+// RUN:        },\
+// RUN:        {\
+// RUN:          "from": {\
+// RUN:            "line": 13,\
+// RUN:            "column": 1\
+// RUN:          },\
+// RUN:          "to": {\
+// RUN:            "line": 15,\
+// RUN:            "column": 1\
+// RUN:          }\
+// RUN:        }\
+// RUN:      ]\
+// RUN:    }\
+// RUN:  ]\
+// RUN:}' > %t/expected_decls
+// RUN: jq '.' %t/expected_decls > %t/expected_decls_formatted
+// RUN: diff %t/decls %t/expected_decls_formatted
+
+//--- foo.cppmap
+module foo {
+  header "foo.h"
+  export *
+}
+
+//--- foo.h
+class MyData {
+public:
+    MyData(int val): value_(val) {}
+    int getValue() const {
+        return 5;
+    }
+private:
+    int value_;
+};
+
+extern int global_value;
+
+int multiply(int a, int b) {
+    return a * b;
+}
+
+//--- foo.cpp
+#include "foo.h"
+int global_value = 5;
+int main() {
+  MyData data(5);
+  int current_value = data.getValue();
+  int doubled_value = multiply(current_value, 2);
+  int final_result = doubled_value + global_value;
+}

Copy link
Contributor

@ilya-biryukov ilya-biryukov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the implementation mostly looks good. I've left a lot of comments, but they are mostly NITs. One major thing that we need to change is the way we test this, see the comment about relying on jq.

Another thought that crossed my mind is that we are actually heavily optimizing for minimization, but use a name that creates an impression we might care about something else. I don't have great ideas there, but maybe -dump-minimization-hints would be appropriate? Any thoughts on this?

It would be great to link to the code consuming those hints in cvise too to add some context on why we are doing this.

@ilya-biryukov
Copy link
Contributor

Could you also update the PR description and change [WIP] to [Clang] in the title so that we don't accidentally forget this before comitting?

…ges`. The flag allows to specify a file path to dump ranges of deserialized declarations in `ASTReader`. Example usage:

```
clang -Xclang=-dump-deserialized-declaration-ranges=/tmp/decls -c file.cc -o file.o
```

Example output:
```
// /tmp/decls
{
  "required_ranges": [
    {
      "file": "foo.h",
      "range": [
        {
          "from": {
            "line": 26,
            "column": 1
          },
          "to": {
            "line": 27,
            "column": 77
          }
        }
      ]
    },
    {
      "file": "bar.h",
      "range": [
        {
          "from": {
            "line": 30,
            "column": 1
          },
          "to": {
            "line": 35,
            "column": 1
          }
        },
        {
          "from": {
            "line": 92,
            "column": 1
          },
          "to": {
            "line": 95,
            "column": 1
          }
        }
      ]
    }
  ]
}

```
Specifying the flag creates an instance of `DeserializedDeclsLineRangePrinter`, which dumps ranges of deserialized declarations to aid debugging and bug minimization.

Required ranges are computed from source ranges of Decls. `TranslationUnitDecl`, `LinkageSpecDecl` and `NamespaceDecl` are ignored for the sake of this PR.

Technical details:
* `DeserializedDeclsLineRangePrinter` implements `ASTConsumer` and `ASTDeserializationListener`, so that an object of `DeserializedDeclsLineRangePrinter` registers as its own listener.
* `ASTDeserializationListener` interface provides the `DeclRead` callback that we use to collect the deserialized Decls.
Printing or otherwise processing them as this point is dangerous, since that could trigger additional deserialization and crash compilation.
* The collected Decls are processed in `HandleTranslationUnit` method of `ASTConsumer`. This is a safe point, since we know that by this point all the Decls needed by the compiler frontend have been deserialized.
*  In case our processing causes further deserialization, `DeclRead` from the listener might be called again. However, at that point we don't accept any more Decls for processing.
@VitaNuo
Copy link
Contributor Author

VitaNuo commented Apr 4, 2025

Thanks, the implementation mostly looks good. I've left a lot of comments, but they are mostly NITs. One major thing that we need to change is the way we test this, see the comment about relying on jq.

Sure, changed to FileCheck.

Another thought that crossed my mind is that we are actually heavily optimizing for minimization, but use a name that creates an impression we might care about something else. I don't have great ideas there, but maybe -dump-minimization-hints would be appropriate? Any thoughts on this?

TBH I would prefer not to do that. Although we're doing this for minimization, there's nothing particular about the deserialized ranges or the JSON format that we produce that limits its use to minimization. If someone else find this useful, I would only welcome that. So in this sense, I'd prefer not to have a very specific name.

It would be great to link to the code consuming those hints in cvise too to add some context on why we are doing this.

We should do this as a followup, I think. The C-Vise code is undergoing too much change to link to a place that is likely to be outdated very soon.

VitaNuo added 5 commits April 8, 2025 09:26
…range specifies the start of the last token). In this case, compute the source location just past the end of the token at this source location. Fix the test. The end column is not exclusive.
… range. If it's a semicolon, advance the location by one token.
@VitaNuo VitaNuo changed the title [WIP] Implement -dump-deserialized-declaration-ranges flag. Implement -dump-deserialized-declaration-ranges flag. Apr 8, 2025
@VitaNuo VitaNuo requested review from emaxx-google and usx95 April 8, 2025 10:09
@usx95
Copy link
Contributor

usx95 commented Apr 8, 2025

Another thought that crossed my mind is that we are actually heavily optimizing for minimization, but use a name that creates an impression we might care about something else. I don't have great ideas there, but maybe -dump-minimization-hints would be appropriate? Any thoughts on this?

TBH I would prefer not to do that. Although we're doing this for minimization, there's nothing particular about the deserialized ranges or the JSON format that we produce that limits its use to minimization. If someone else find this useful, I would only welcome that. So in this sense, I'd prefer not to have a very specific name.

I am in favor of using -dump-minimization-hints to scope it down and avoid Hyrum's law. If we want to be more descriptive, we could have this as a CommaJoined flag, eg. -dump-minimization-hints=deserialized-decl-ranges,some-other-future-hint,all.

@VitaNuo
Copy link
Contributor Author

VitaNuo commented Apr 8, 2025

Another thought that crossed my mind is that we are actually heavily optimizing for minimization, but use a name that creates an impression we might care about something else. I don't have great ideas there, but maybe -dump-minimization-hints would be appropriate? Any thoughts on this?

TBH I would prefer not to do that. Although we're doing this for minimization, there's nothing particular about the deserialized ranges or the JSON format that we produce that limits its use to minimization. If someone else find this useful, I would only welcome that. So in this sense, I'd prefer not to have a very specific name.

I am in favor of using -dump-minimization-hints to scope it down and avoid Hyrum's law. If we want to be more descriptive, we could have this as a CommaJoined flag, eg. -dump-minimization-hints=deserialized-decl-ranges,some-other-future-hint,all.

Thanks. I will rename the flag in that case. PTAL.

@VitaNuo VitaNuo changed the title Implement -dump-deserialized-declaration-ranges flag. Implement -dump-minimization-hints flag. Apr 9, 2025
@VitaNuo VitaNuo added the clang:frontend Language frontend issues, e.g. anything involving "Sema" label Apr 10, 2025
@VitaNuo VitaNuo merged commit 9eeafc6 into llvm:main Apr 11, 2025
12 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Apr 11, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-bootstrap-asan running on sanitizer-buildbot1 while building clang at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/7502

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 90138 tests, 88 workers --
Testing:  0.. 10
FAIL: Clang :: Interpreter/inline-virtual.cpp (12959 of 90138)
******************** TEST 'Clang :: Interpreter/inline-virtual.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation      | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp # RUN: at line 6
+ cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
JIT session error: In graph incr_module_23-jitted-objectbuffer, section .text.startup: relocation target "_ZN1AD2Ev" at address 0x7dce64e23040 is out of range of Delta32 fixup at 0x79ce63c0d02d (<anonymous block> @ 0x79ce63c0d010 + 0x1d)
error: Failed to materialize symbols: { (main, { $.incr_module_23.__inits.0, __orc_init_func.incr_module_23, a2 }) }
error: Failed to materialize symbols: { (main, { __orc_init_func.incr_module_23 }) }
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp:26:11: error: CHECK: expected string not found in input
// CHECK: ~A(2)
          ^
<stdin>:1:262: note: scanning from here
clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1)
                                                                                                                                                                                                                                                                     ^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1) 
check:26                                                                                                                                                                                                                                                                          X error: no match found
          2: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl>  
check:26     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------
Step 11 (stage2/asan check) failure: stage2/asan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 90138 tests, 88 workers --
Testing:  0.. 10
FAIL: Clang :: Interpreter/inline-virtual.cpp (12959 of 90138)
******************** TEST 'Clang :: Interpreter/inline-virtual.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation      | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp # RUN: at line 6
+ cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
JIT session error: In graph incr_module_23-jitted-objectbuffer, section .text.startup: relocation target "_ZN1AD2Ev" at address 0x7dce64e23040 is out of range of Delta32 fixup at 0x79ce63c0d02d (<anonymous block> @ 0x79ce63c0d010 + 0x1d)
error: Failed to materialize symbols: { (main, { $.incr_module_23.__inits.0, __orc_init_func.incr_module_23, a2 }) }
error: Failed to materialize symbols: { (main, { __orc_init_func.incr_module_23 }) }
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp:26:11: error: CHECK: expected string not found in input
// CHECK: ~A(2)
          ^
<stdin>:1:262: note: scanning from here
clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1)
                                                                                                                                                                                                                                                                     ^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1) 
check:26                                                                                                                                                                                                                                                                          X error: no match found
          2: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl>  
check:26     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------

var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
This PR implements a CC1 flag `-dump-minimization-hints`.
The flag allows to specify a file path to dump ranges of deserialized
declarations in `ASTReader`. Example usage:

```
clang -Xclang=-dump-minimization-hints=/tmp/decls -c file.cc -o file.o
```

Example output:
```
// /tmp/decls
{
  "required_ranges": [
    {
      "file": "foo.h",
      "range": [
        {
          "from": {
            "line": 26,
            "column": 1
          },
          "to": {
            "line": 27,
            "column": 77
          }
        }
      ]
    },
    {
      "file": "bar.h",
      "range": [
        {
          "from": {
            "line": 30,
            "column": 1
          },
          "to": {
            "line": 35,
            "column": 1
          }
        },
        {
          "from": {
            "line": 92,
            "column": 1
          },
          "to": {
            "line": 95,
            "column": 1
          }
        }
      ]
    }
  ]
}

```
Specifying the flag creates an instance of
`DeserializedDeclsSourceRangePrinter`, which dumps ranges of deserialized
declarations to aid debugging and bug minimization (we use is as input to [C-Vise](https://github.com/emaxx-google/cvise/tree/multifile-hints).

Required ranges are computed from source ranges of Decls.
`TranslationUnitDecl`, `LinkageSpecDecl` and `NamespaceDecl` are ignored
for the sake of this PR.

Technical details:
* `DeserializedDeclsSourceRangePrinter` implements `ASTConsumer` and
`ASTDeserializationListener`, so that an object of
`DeserializedDeclsSourceRangePrinter` registers as its own listener.
* `ASTDeserializationListener` interface provides the `DeclRead`
callback that we use to collect the deserialized Decls.
Printing or otherwise processing them as this point is dangerous, since
that could trigger additional deserialization and crash compilation.
* The collected Decls are processed in `HandleTranslationUnit` method of
`ASTConsumer`. This is a safe point, since we know that by this point
all the Decls needed by the compiler frontend have been deserialized.
* In case our processing causes further deserialization, `DeclRead` from
the listener might be called again. However, at that point we don't
accept any more Decls for processing.
qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Apr 29, 2025
Local branch origin/amd-gfx 9398efe Merged main:d02a704ec952 into origin/amd-gfx:6efc92ea074c
Remote branch main 9eeafc6 Implement `-dump-minimization-hints` flag.  (llvm#133910)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants