[mlir][Pass] Enable the option for reproducer generation without crashing #75421

plotfi · 2023-12-14T01:59:43Z

This PR adds the makeReproducer API and the cl::opt flag --mlir-generate-reproducer=<filename> so that MLIR reproducers can be dumped even when the pipeline doesn't crash.

This PR also decouples the code that handles generation of an MLIR reproducer from the crash recovery portion. The purpose is to allow for generating reproducers outside of the context of a compiler crash.

This will be useful for frameworks and runtimes that use MLIR and need to reproduce pipeline behavior for reasons other than diagnosing crashes. One example is diagnosing performance issues with offline tools, where being able to dump the reproducer from a runtime compiler would be helpful.

llvmbot · 2023-12-14T02:00:11Z

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-core

Author: Puyan Lotfi (plotfi)

Changes

Adding API enableReproducerGeneration (with an optional alwaysGenerateReproducer parameter) and cl::opt flag -mlir-pass-pipeline-always-generate-reproducer in order to allow for mlir reproducer dumps even when the pipeline doesn't crash.

This will be useful for frameworks and runtimes that use MLIR and need to reproduce pipeline behavior for reasons other than diagnosing crashes. One example is diagnosing performance issues with offline tools, where being able to dump the reproducer from a runtime compiler would be helpful.

Full diff: https://github.com/llvm/llvm-project/pull/75421.diff

5 Files Affected:

(modified) mlir/include/mlir/Pass/PassManager.h (+20)
(modified) mlir/lib/Pass/PassCrashRecovery.cpp (+55-25)
(modified) mlir/lib/Pass/PassDetail.h (+2-1)
(modified) mlir/lib/Pass/PassManagerOptions.cpp (+14-3)
(added) mlir/test/Pass/crashless-reproducer.mlir (+13)

diff --git a/mlir/include/mlir/Pass/PassManager.h b/mlir/include/mlir/Pass/PassManager.h
index d5f1ea0fe0350d..e6461d90abea98 100644
--- a/mlir/include/mlir/Pass/PassManager.h
+++ b/mlir/include/mlir/Pass/PassManager.h
@@ -243,6 +243,16 @@ class PassManager : public OpPassManager {
   void enableCrashReproducerGeneration(StringRef outputFile,
                                        bool genLocalReproducer = false);
 
+  /// Enable support for the pass manager to generate a reproducer. With this
+  /// invocation the reproducer is generated depending on the parameter
+  /// `alwaysGenerateReproducer`. `outputFile` is a .mlir filename used to
+  /// write the generated reproducer. If `genLocalReproducer` is true, the pass
+  /// manager will attempt to generate a local reproducer that contains the
+  /// smallest pipeline.
+  void enableReproducerGeneration(StringRef outputFile,
+                                  bool genLocalReproducer = false,
+                                  bool alwaysGenerateReproducer = false);
+
   /// Streams on which to output crash reproducer.
   struct ReproducerStream {
     virtual ~ReproducerStream() = default;
@@ -266,6 +276,16 @@ class PassManager : public OpPassManager {
   void enableCrashReproducerGeneration(ReproducerStreamFactory factory,
                                        bool genLocalReproducer = false);
 
+  /// Enable support for the pass manager to generate a reproducer. With this
+  /// invocation the reproducer is generated depending on the parameter
+  /// `alwaysGenerateReproducer`. `factory` is used to construct the streams
+  /// to write the generated reproducer to. If `genLocalReproducer` is true, the
+  /// pass manager will attempt to generate a local reproducer that contains the
+  /// smallest pipeline.
+  void enableReproducerGeneration(ReproducerStreamFactory factory,
+                                  bool genLocalReproducer = false,
+                                  bool alwaysGenerateReproducer = false);
+
   /// Runs the verifier after each individual pass.
   void enableVerifier(bool enabled = true);
 
diff --git a/mlir/lib/Pass/PassCrashRecovery.cpp b/mlir/lib/Pass/PassCrashRecovery.cpp
index df1a0762ae34a5..4480d6797de4c5 100644
--- a/mlir/lib/Pass/PassCrashRecovery.cpp
+++ b/mlir/lib/Pass/PassCrashRecovery.cpp
@@ -176,8 +176,10 @@ void RecoveryReproducerContext::registerSignalHandler() {
 
 struct PassCrashReproducerGenerator::Impl {
   Impl(PassManager::ReproducerStreamFactory &streamFactory,
-       bool localReproducer)
-      : streamFactory(streamFactory), localReproducer(localReproducer) {}
+       bool localReproducer,
+       bool alwaysGenerateReproducer = false)
+      : streamFactory(streamFactory), localReproducer(localReproducer),
+        alwaysGenerateReproducer(alwaysGenerateReproducer){}
 
   /// The factory to use when generating a crash reproducer.
   PassManager::ReproducerStreamFactory streamFactory;
@@ -195,11 +197,17 @@ struct PassCrashReproducerGenerator::Impl {
 
   /// Various pass manager flags that get emitted when generating a reproducer.
   bool pmFlagVerifyPasses = false;
+
+  /// Flag indicating if reproducer generation should occur regardless of 
+  /// a crash or failing pass.
+  bool alwaysGenerateReproducer = false;
 };
 
 PassCrashReproducerGenerator::PassCrashReproducerGenerator(
-    PassManager::ReproducerStreamFactory &streamFactory, bool localReproducer)
-    : impl(std::make_unique<Impl>(streamFactory, localReproducer)) {}
+    PassManager::ReproducerStreamFactory &streamFactory, bool localReproducer,
+    bool alwaysGenerateReproducer)
+    : impl(std::make_unique<Impl>(streamFactory, localReproducer,
+                                  alwaysGenerateReproducer)) {}
 PassCrashReproducerGenerator::~PassCrashReproducerGenerator() = default;
 
 void PassCrashReproducerGenerator::initialize(
@@ -235,13 +243,10 @@ void PassCrashReproducerGenerator::finalize(Operation *rootOp,
     return;
 
   // If the pass manager execution succeeded, we don't generate any reproducers.
-  if (succeeded(executionResult))
+  const bool executionResultSucceeded = succeeded(executionResult);
+  if (executionResultSucceeded && !impl->alwaysGenerateReproducer)
     return impl->activeContexts.clear();
 
-  InFlightDiagnostic diag = emitError(rootOp->getLoc())
-                            << "Failures have been detected while "
-                               "processing an MLIR pass pipeline";
-
   // If we are generating a global reproducer, we include all of the running
   // passes in the error message for the only active context.
   if (!impl->localReproducer) {
@@ -251,13 +256,18 @@ void PassCrashReproducerGenerator::finalize(Operation *rootOp,
     std::string description;
     impl->activeContexts.front()->generate(description);
 
-    // Emit an error to the user.
-    Diagnostic &note = diag.attachNote() << "Pipeline failed while executing [";
-    llvm::interleaveComma(impl->runningPasses, note,
-                          [&](const std::pair<Pass *, Operation *> &value) {
-                            formatPassOpReproducerMessage(note, value);
-                          });
-    note << "]: " << description;
+    if (!executionResultSucceeded) {
+      InFlightDiagnostic diag = emitError(rootOp->getLoc())
+                                << "Failures have been detected while "
+                                   "processing an MLIR pass pipeline";
+      // Emit an error to the user.
+      Diagnostic &note = diag.attachNote() << "Pipeline failed while executing [";
+      llvm::interleaveComma(impl->runningPasses, note,
+                            [&](const std::pair<Pass *, Operation *> &value) {
+                              formatPassOpReproducerMessage(note, value);
+                            });
+      note << "]: " << description;
+    }
     impl->runningPasses.clear();
     impl->activeContexts.clear();
     return;
@@ -274,10 +284,15 @@ void PassCrashReproducerGenerator::finalize(Operation *rootOp,
   std::string description;
   reproducerContext.generate(description);
 
-  // Emit an error to the user.
-  Diagnostic &note = diag.attachNote() << "Pipeline failed while executing ";
-  formatPassOpReproducerMessage(note, impl->runningPasses.back());
-  note << ": " << description;
+  if (!executionResultSucceeded) {
+    InFlightDiagnostic diag = emitError(rootOp->getLoc())
+                              << "Failures have been detected while "
+                                 "processing an MLIR pass pipeline";
+    // Emit an error to the user.
+    Diagnostic &note = diag.attachNote() << "Pipeline failed while executing ";
+    formatPassOpReproducerMessage(note, impl->runningPasses.back());
+    note << ": " << description;
+  }
 
   impl->activeContexts.clear();
   impl->runningPasses.clear();
@@ -420,10 +435,22 @@ LogicalResult PassManager::runWithCrashRecovery(Operation *op,
 
 void PassManager::enableCrashReproducerGeneration(StringRef outputFile,
                                                   bool genLocalReproducer) {
+  enableReproducerGeneration(outputFile, genLocalReproducer);
+}
+
+void PassManager::enableCrashReproducerGeneration(
+    ReproducerStreamFactory factory, bool genLocalReproducer) {
+  enableReproducerGeneration(factory, genLocalReproducer);
+}
+
+void PassManager::enableReproducerGeneration(
+    StringRef outputFile,
+    bool genLocalReproducer,
+    bool alwaysGenerateReproducer) {
   // Capture the filename by value in case outputFile is out of scope when
   // invoked.
   std::string filename = outputFile.str();
-  enableCrashReproducerGeneration(
+  enableReproducerGeneration(
       [filename](std::string &error) -> std::unique_ptr<ReproducerStream> {
         std::unique_ptr<llvm::ToolOutputFile> outputFile =
             mlir::openOutputFile(filename, &error);
@@ -433,11 +460,14 @@ void PassManager::enableCrashReproducerGeneration(StringRef outputFile,
         }
         return std::make_unique<FileReproducerStream>(std::move(outputFile));
       },
-      genLocalReproducer);
+      genLocalReproducer,
+      alwaysGenerateReproducer);
 }
 
-void PassManager::enableCrashReproducerGeneration(
-    ReproducerStreamFactory factory, bool genLocalReproducer) {
+void PassManager::enableReproducerGeneration(
+    ReproducerStreamFactory factory,
+    bool genLocalReproducer,
+    bool alwaysGenerateReproducer) {
   assert(!crashReproGenerator &&
          "crash reproducer has already been initialized");
   if (genLocalReproducer && getContext()->isMultithreadingEnabled())
@@ -446,7 +476,7 @@ void PassManager::enableCrashReproducerGeneration(
         "pass-manager without disabling multi-threading first.");
 
   crashReproGenerator = std::make_unique<PassCrashReproducerGenerator>(
-      factory, genLocalReproducer);
+      factory, genLocalReproducer, alwaysGenerateReproducer);
   addInstrumentation(
       std::make_unique<CrashReproducerInstrumentation>(*crashReproGenerator));
 }
diff --git a/mlir/lib/Pass/PassDetail.h b/mlir/lib/Pass/PassDetail.h
index 0e964b6d6d36bc..decc34c2af6c28 100644
--- a/mlir/lib/Pass/PassDetail.h
+++ b/mlir/lib/Pass/PassDetail.h
@@ -100,7 +100,8 @@ class PassCrashReproducerGenerator {
 public:
   PassCrashReproducerGenerator(
       PassManager::ReproducerStreamFactory &streamFactory,
-      bool localReproducer);
+      bool localReproducer,
+      bool alwaysGenerateReproducer = false);
   ~PassCrashReproducerGenerator();
 
   /// Initialize the generator in preparation for reproducer generation. The
diff --git a/mlir/lib/Pass/PassManagerOptions.cpp b/mlir/lib/Pass/PassManagerOptions.cpp
index ffc53b7e3ed023..d77aed772fee85 100644
--- a/mlir/lib/Pass/PassManagerOptions.cpp
+++ b/mlir/lib/Pass/PassManagerOptions.cpp
@@ -29,6 +29,10 @@ struct PassManagerOptions {
       llvm::cl::desc("When generating a crash reproducer, attempt to generated "
                      "a reproducer with the smallest pipeline."),
       llvm::cl::init(false)};
+  llvm::cl::opt<bool> alwaysGenerateReproducer{
+      "mlir-pass-pipeline-always-generate-reproducer",
+      llvm::cl::desc("Generating a reproducer even if a crash did not occur "),
+      llvm::cl::init(false)};
 
   //===--------------------------------------------------------------------===//
   // IR Printing
@@ -135,9 +139,16 @@ LogicalResult mlir::applyPassManagerCLOptions(PassManager &pm) {
     return failure();
 
   // Generate a reproducer on crash/failure.
-  if (options->reproducerFile.getNumOccurrences())
-    pm.enableCrashReproducerGeneration(options->reproducerFile,
-                                       options->localReproducer);
+  if (options->reproducerFile.getNumOccurrences()) {
+    if (options->alwaysGenerateReproducer) {
+      pm.enableReproducerGeneration(options->reproducerFile,
+                                    options->localReproducer,
+                                    true /*alwaysGenerateReproducer*/);
+    } else {
+      pm.enableCrashReproducerGeneration(options->reproducerFile,
+                                         options->localReproducer);
+    }
+  }
 
   // Enable statistics dumping.
   if (options->passStatistics)
diff --git a/mlir/test/Pass/crashless-reproducer.mlir b/mlir/test/Pass/crashless-reproducer.mlir
new file mode 100644
index 00000000000000..e34efaeace89ea
--- /dev/null
+++ b/mlir/test/Pass/crashless-reproducer.mlir
@@ -0,0 +1,13 @@
+// RUN: mlir-opt %s -pass-pipeline='builtin.module(builtin.module(test-module-pass))' \
+// RUN:   -mlir-pass-pipeline-crash-reproducer=%t \
+// RUN:   -mlir-pass-pipeline-always-generate-reproducer=true -verify-diagnostics
+
+// RUN: cat %t | FileCheck -check-prefix=REPRO %s
+
+module @inner_mod1 {
+  module @foo {}
+}
+
+// REPRO: module @inner_mod1
+// REPRO: module @foo {
+// REPRO: pipeline: "builtin.module(builtin.module(test-module-pass))"
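
The control flow that the `finalize` hunk in this first revision introduces can be summarized with a small standalone model. This is only a sketch: `FinalizeAction` and `decideFinalizeAction` are hypothetical names that do not exist in MLIR, and the earlier early-return checks in `finalize` (visible only as context in the diff) are not modeled.

```cpp
// Simplified model of the decision made in the finalize() hunk above.
// The struct and function names are hypothetical, not part of MLIR.
struct FinalizeAction {
  bool writeReproducer; // the reproducer output is generated
  bool emitError;       // the "Failures have been detected" error is emitted
};

// `pipelineSucceeded` mirrors succeeded(executionResult);
// `alwaysGenerate` mirrors impl->alwaysGenerateReproducer.
inline FinalizeAction decideFinalizeAction(bool pipelineSucceeded,
                                           bool alwaysGenerate) {
  // Success without the "always" flag: nothing is written or reported.
  if (pipelineSucceeded && !alwaysGenerate)
    return {false, false};
  // Otherwise a reproducer is written; the error is reserved for failures.
  return {true, !pipelineSucceeded};
}
```

With both flags at their defaults the behavior matches the pre-existing crash reproducer: a successful run writes nothing, and a failing run writes the reproducer and reports the error.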

github-actions · 2023-12-14T02:02:22Z

✅ With the latest revision this PR passed the C/C++ code formatter.

joker-eph · 2023-12-14T02:22:18Z

Being able to generate a reproducer for a framework (or other) outside of a crash seems useful, but it’s not clear to me why it is a pass manager option?
I would think that the framework can already generate it independently before calling the pass manager.
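
As a rough illustration of what "generating it independently" could look like, the sketch below appends reproducer metadata to already-printed IR text. It assumes the reproducer layout described in the MLIR pass management documentation (an `mlir_reproducer` entry under `external_resources` inside a `{-# ... #-}` trailer). `appendReproducerMetadata` is a hypothetical helper, not an MLIR API, and it does not escape quotes in the pipeline string.

```cpp
#include <string>

// Hypothetical helper, not an MLIR API: appends a reproducer trailer to
// already-printed IR. The layout is an assumption based on MLIR's
// documented reproducer format.
std::string appendReproducerMetadata(const std::string &irText,
                                     const std::string &pipeline,
                                     bool disableThreading, bool verifyEach) {
  auto boolStr = [](bool b) { return b ? "true" : "false"; };
  std::string out = irText;
  // Make sure the IR ends with a newline before the trailer starts.
  if (!out.empty() && out.back() != '\n')
    out += '\n';
  out += "\n{-#\n";
  out += "  external_resources: {\n";
  out += "    mlir_reproducer: {\n";
  out += "      pipeline: \"" + pipeline + "\",\n";
  out += std::string("      disable_threading: ") +
         boolStr(disableThreading) + ",\n";
  out += std::string("      verify_each: ") + boolStr(verifyEach) + "\n";
  out += "    }\n";
  out += "  }\n";
  out += "#-}\n";
  return out;
}
```

A framework could write the returned string to a `.mlir` file after its own pipeline has run. In MLIR itself this metadata is attached while the operation is printed, so string concatenation here is only a stand-in.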

plotfi · 2023-12-14T02:27:11Z

> Being able to generate a reproducer for a framework (or other) outside of a crash seems useful, but it’s not clear to me why it is a pass manager option?

I considered that, but I am primarily targeting the ease of use of mlir-opt's reproducer file here. Can I generate a reproducer file that could be loaded by mlir-opt or similar tools without adding a pass manager option?

I guess I should mention this in my commit message explicitly?

plotfi · 2023-12-14T02:33:09Z

@joker-eph Ah, do you mean then to invoke something like RecoveryReproducerContext::generate() from the runtime?

joker-eph · 2023-12-14T06:28:04Z

> @joker-eph Ah, do you mean then to invoke something like RecoveryReproducerContext::generate() from the runtime?

Yes exactly!

If we need to extract the implementation of RecoveryReproducerContext::generate() as a more easily reusable API, that's an OK refactoring of course.

plotfi · 2023-12-14T06:55:03Z

> > @joker-eph Ah, do you mean then to invoke something like RecoveryReproducerContext::generate() from the runtime?
>
> Yes exactly!
>
> If we need to extract the implementation of RecoveryReproducerContext::generate() as a more easily reusable API, that's an OK refactoring of course.

I will have to dive a little deeper into the current implementation to see how things could be set up without doing it in the pass manager directly. Ideally I'd want a mechanism similar to the pass manager's, where the reproducer is dumped after everything has run. The pass manager's finalizer just made this conveniently easy, but I am not certain it is the right approach either.

plotfi · 2023-12-21T09:16:15Z

> > @joker-eph Ah, do you mean then to invoke something like RecoveryReproducerContext::generate() from the runtime?
>
> Yes exactly!
>
> If we need to extract the implementation of RecoveryReproducerContext::generate() as a more easily reusable API, that's an OK refactoring of course.

Work in progress, but I have refactored the bits that append the reproducer info without having to do it from inside of the pass manager. The reproducers can be appended like so:

  for (auto &pass : pm.getPasses())
    makeReproducer(&pass, op.get(), "reproducer.mlir");
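
The work-in-progress loop above calls makeReproducer once per pass, whereas a reproducer records the pipeline as a single string, as in the REPRO check `pipeline: "builtin.module(builtin.module(test-module-pass))"` from the test. A minimal sketch of assembling such a string from an anchor name and a list of pass arguments follows. `buildPipelineString` is a hypothetical stand-in; how the merged makeReproducer collects the passes is not shown in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in, not an MLIR API: joins textual pass arguments
// under an anchor operation name, e.g. "builtin.module(cse,canonicalize)".
std::string buildPipelineString(const std::string &anchorName,
                                const std::vector<std::string> &passArgs) {
  std::string out = anchorName + "(";
  for (std::size_t i = 0; i < passArgs.size(); ++i) {
    if (i != 0)
      out += ","; // separate sibling passes with commas
    out += passArgs[i];
  }
  out += ")";
  return out;
}
```

Nested pipelines fall out naturally if an element of `passArgs` is itself the result of a previous call.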

mlir/test/Pass/crashless-reproducer.mlir

plotfi · 2023-12-22T22:38:23Z

@joker-eph I've separated out the NFC portions from the test and the new functionality. Ready for review.

mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h

mlir/lib/Pass/Pass.cpp

mlir/lib/Pass/PassCrashRecovery.cpp

joker-eph · 2024-01-02T14:37:35Z

mlir/lib/Tools/mlir-opt/MlirOptMain.cpp

+        llvm::cl::desc("Generate a reproducer at"
+                       " --mlir-reproducer-filename=<filename> "
+                       " (no crash required)"),
+        cl::location(generateReproducerFlag), cl::init(false));


Can we just use one option?

--mlir-generate-reproducer=<file>

If it is set we generate the repro using the provided path.
(cl::opt exposes getNumOccurrences())

That is ok with me. Do you think it matters if we generate the same file path for a crash reproducer versus not? I wasn't completely certain about this, so that's why I added a separate cl::opt flag for the file.

I don't follow what you're asking about here?

You're introducing 2 new options, I'm saying you can introduce one only. But that won't conflict with the mlir-pass-pipeline-crash-reproducer option right?

Oh I see, I understand now.
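
The single-option design discussed in this thread lets the presence of the flag act as the on/off switch while its value carries the output path. The sketch below models that idea without depending on LLVM: `parseGenerateReproducer` is a hypothetical helper, and `std::optional` stands in for checking `getNumOccurrences()` on a `cl::opt`. Letting the last occurrence win is a simplification made for this sketch.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical parser, not LLVM's CommandLine library: returns the
// reproducer filename if --mlir-generate-reproducer=<file> was passed.
std::optional<std::string>
parseGenerateReproducer(const std::vector<std::string> &args) {
  const std::string prefix = "--mlir-generate-reproducer=";
  std::optional<std::string> file;
  for (const std::string &arg : args) {
    // An argument matches when it starts with the flag prefix.
    if (arg.compare(0, prefix.size(), prefix) == 0)
      file = arg.substr(prefix.size()); // last occurrence wins (assumption)
  }
  return file;
}
```

A caller would generate the reproducer only when the returned optional holds a value, so no separate boolean flag is needed.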

mlir/lib/Tools/mlir-opt/MlirOptMain.cpp

…ation This patch decouples the code that handles generation of an MLIR reproducer from the crash recovery portion. The purpose is to allow for generating reproducers outside of the context of a compiler crash. (cherry picked from commit bfb0682f2a318a84a05f91466757a03d68ddfd1f)

…hing This patch adds API `makeReproducer` and cl::opt flag --mlir-generate-reproducer=<filename> in order to allow for mlir reproducer dumps even when the pipeline doesn't crash. This will be useful for frameworks and runtimes that use MLIR where it is needed to reproduce the pipeline behavior for reasons outside of diagnosing crashes. An example is for diagnosing performance issues using offline tools, where being able to dump the reproducer from a runtime compiler would be helpful.

plotfi · 2024-01-03T01:44:19Z

@joker-eph Updated based on your feedback.

plotfi requested review from joker-eph, htyu and River707 December 14, 2023 01:59

llvmbot added mlir:core MLIR Core Infrastructure mlir labels Dec 14, 2023

plotfi force-pushed the plotfi-mlir-pass-always-reproducer branch 2 times, most recently from 0fb3d49 to 02d37a9 Compare December 14, 2023 02:11

plotfi force-pushed the plotfi-mlir-pass-always-reproducer branch from 02d37a9 to 15b3128 Compare December 21, 2023 09:12

plotfi force-pushed the plotfi-mlir-pass-always-reproducer branch 5 times, most recently from c4d138c to bfb0682 Compare December 22, 2023 21:38

plotfi force-pushed the plotfi-mlir-pass-always-reproducer branch 6 times, most recently from c3f9e71 to 67130c5 Compare December 29, 2023 21:30

joker-eph reviewed Jan 2, 2024

plotfi force-pushed the plotfi-mlir-pass-always-reproducer branch 2 times, most recently from ede2eab to 101b205 Compare January 3, 2024 01:06

plotfi added 2 commits January 2, 2024 17:14

plotfi force-pushed the plotfi-mlir-pass-always-reproducer branch from 101b205 to 8d21a8d Compare January 3, 2024 01:14

joker-eph approved these changes Jan 3, 2024

plotfi merged commit 03e29a4 into llvm:main Jan 3, 2024

plotfi mentioned this pull request Jan 3, 2024

[BACKEND] Adding optional environment variable for dumping reproducer mlir files triton-lang/triton#2869

Merged
