Add flags to dump IR to a file before and after LLVM passes #65179

NuriAmari · 2023-09-01T19:14:44Z

LLVM offers -print-after and -print-before flags that allow you to print IR to stderr before and after any pass you want. This can be useful for debugging LLVM optimization issue, but is far too noisy for large builds.

This patch adds analogous options -dump-after and -dump-before that dump the IR to appropriately named files. In addition, it also introduces flags -dump-after-all, -dump-before-all and -ir-dump-directory to control where the files are written to.

The implementation follows @chandlerc suggestions in https://reviews.llvm.org/D54794, but does not implement the suggested design completely. Notably, the IR extraction behavior differing from the print family of flags is not implemented. In addition to the suggestion of adding how many times a particular pass has run in at a given nesting level, I've also added how many passes have run at that nesting level. The tests provide a more clear explanation.

This is not complete, but I think it is an acceptable initial pass, and I'd like some feedback on the approach before I go too deep. Features that are missing / I would like:

Less verbose pass names (particular pass managers and adapters have extremely verbose names)
More descriptive names (ie. what function is this function pass operation on)
Filter parity with the print flags (those flags respect flags for example allowing filtering of which functions to print)
Legacy pass manager support (we need this to work for MIR, and at first glance it doesn't)

NuriAmari · 2023-09-01T19:17:40Z

llvm/lib/IR/PrintPasses.cpp

@@ -33,6 +33,31 @@ static cl::opt<bool> PrintAfterAll("print-after-all",
                                   llvm::cl::desc("Print IR after each pass"),
                                   cl::init(false), cl::Hidden);

+static cl::list<std::string>


There is a lot duplication between the print code and the dump code, though there are differences, it might be nice to try to combine them a little more.

NuriAmari · 2023-09-01T19:18:59Z

llvm/include/llvm/Passes/StandardInstrumentations.h

+
+private:
+  void dumpBeforePass(StringRef PassID, Any IR);
+  void dumpAfterPass(StringRef PassID, Any IR);


I did not implement a dumpAfterPassInvalidated callback, mostly because I didn't understand when that occurs. If it makes sense to include one, I can.

NuriAmari · 2023-09-01T19:20:43Z

llvm/test/Other/dump-before-after-filenames

+; Basic dump before and after a single module pass
+; RUN: opt %s -disable-output -passes='no-op-module' -ir-dump-directory %t/logs -dump-after=no-op-module -dump-before=no-op-module
+; RUN: find %t/logs -type f -print | FileCheck %s --check-prefix=SINGLE-RUN -DSOURCE_FILE_NAME=%s
+; SINGLE-RUN-DAG: [[SOURCE_FILE_NAME]]/0.NoOpModulePass.0-before.ll


The first 0 means this is the first pass to run at this nesting level. The second 0 means this is the first NoOpModulePass to run at this nesting level.

While dumping to files is less noisy, without this first number you lose the sequence in which passes run, which makes it harder to trace issues.

NuriAmari · 2023-09-01T19:22:14Z

llvm/test/Other/dump-before-after-filenames

+; MULTIPLE-NESTED-RUNS-DAG: [[SOURCE_FILE_NAME]]/1.ModuleToFunctionPassAdaptor.0/1.PassManager<llvm::Function>.1/0.NoOpFunctionPass.0-before.ll
+; MULTIPLE-NESTED-RUNS-DAG: [[SOURCE_FILE_NAME]]/1.ModuleToFunctionPassAdaptor.0/1.PassManager<llvm::Function>.1/0.NoOpFunctionPass.0-after.ll
+; MULTIPLE-NESTED-RUNS-DAG: [[SOURCE_FILE_NAME]]/2.ModuleToFunctionPassAdaptor.1/1.PassManager<llvm::Function>.1/0.FunctionToLoopPassAdaptor.0/1.PassManager<llvm::Loop, llvm::AnalysisManager<llvm::Loop, llvm::LoopStandardAnalysisResults&>, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>.0/0.NoOpLoopPass.0-before.ll
+; MULTIPLE-NESTED-RUNS-DAG: [[SOURCE_FILE_NAME]]/2.ModuleToFunctionPassAdaptor.1/1.PassManager<llvm::Function>.1/0.FunctionToLoopPassAdaptor.0/1.PassManager<llvm::Loop, llvm::AnalysisManager<llvm::Loop, llvm::LoopStandardAnalysisResults&>, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>.0/0.NoOpLoopPass.0-after.ll


Surely a less verbose pass name than PassManager<llvm::Loop, llvm::AnalysisManager<llvm::Loop, llvm::LoopStandardAnalysisResults&>, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&> is achievable, but I'm not sure how to best go about that.

I made an attempt at this, the trouble is the new PM instrumentation infrastructure only provides callbacks with the PassID, which is this verbose sometimes. The existing Class name to pass name mapping does not contain entries for all passes (ex. adaptors and PMs).

I think more consistent names are doable, but perhaps something for another patch.

NuriAmari · 2023-09-01T19:25:03Z

llvm/test/Other/lit.local.cfg

@@ -0,0 +1 @@
+config.suffixes = ['', '.ll']


This is because in LIT %s expands to the source file path with periods escaped like so: .../dump-before-after-filenames\\., which won't match the output of find.

I also considered running ls on each file, but I think the output of filecheck on find is a little easier to read when it fails.

llvm/include/llvm/Passes/StandardInstrumentations.h

Summary: LLVM offers -print-after and -print-before flags that allow you to print IR to stderr before and after any pass you want. This can be useful for debugging LLVM optimization issue, but is far too noisy for large builds. This patch adds analogous options -dump-after and -dump-before that dump the IR to appropriately named files. In addition, it also introduces flags -dump-after-all, -dump-before-all and -ir-dump-directory to control where the files are written to. Test Plan: Included LIT tests: ``` ninja check-llvm ```

mshockwave

I like this feature, though I feel like in many cases -print-changed is good enough.
Also slightly tangent: I think LLVM's interim github PR policy is to discourage force push on PR branch in favor of amending commits: https://discourse.llvm.org/t/update-on-github-pull-requests/71540/53

llvm/lib/Passes/StandardInstrumentations.cpp

mshockwave · 2023-09-01T23:33:36Z

llvm/lib/Passes/StandardInstrumentations.cpp

+  this->PIC = &PIC;
+
+  PIC.registerBeforeNonSkippedPassCallback(
+      [this](StringRef P, Any IR) { this->pushPass(P, IR); });


Could we omit this when possible? (ditto for other similar occurrences in this file)

I don't yet understand why, but this is not implicitly captured here, it fails to compile without an explicit capture.

I don't yet understand why, but this is not implicitly captured here, it fails to compile without an explicit capture.

Hmmm alright, then leave it as it for now I guess

llvm/include/llvm/Passes/StandardInstrumentations.h

llvm/lib/Passes/StandardInstrumentations.cpp

NuriAmari · 2023-09-02T00:58:38Z

I like this feature, though I feel like in many cases -print-changed is good enough.

I've never experienced it myself, but my teammate has said another reason to want this is that when using thinLTO, multiple modules are processed concurrently on different threads, and the printed output can end up interleaved and be hard to read.

Also slightly tangent: I think LLVM's interim github PR policy is to discourage force push on PR branch in favor of amending commits: https://discourse.llvm.org/t/update-on-github-pull-requests/71540/53

Ah ok sorry I didn't know, will make new commits moving forward.

mshockwave · 2023-09-02T04:06:05Z

I like this feature, though I feel like in many cases -print-changed is good enough.

I've never experienced it myself, but my teammate has said another reason to want this is that when using thinLTO, multiple modules are processed concurrently on different threads, and the printed output can end up interleaved and be hard to read.

Ah that makes sense

The name of a particular pass is determined automatically using the name of the type that implements in. This in turn is determined using __PRETTY_FUNCTION__ which is not defined in the standard and implementation specific. This means the output paths differ slightly depending on the platform. To adjust tests for this: - Add regexes that relax the requirements on output paths. - Use find to find output paths without specifying - Remove a test that was intended to test multiple modules being processing by a single pipeline, but was not really testing that PR Feedback is also addressed.

Tests are still failing in CI because the expected output paths do not match. Loosen the regexes to allow for more variation in the output paths.

NuriAmari · 2023-09-03T17:57:35Z

I'm having a bit of a hard time getting the output path tests to pass in CI, while they pass on my Linux machine. I also can't see the output observed in CI, so its a little hard to fix.

I believe the variance comes from the output of this function being unreliable:

llvm-project/llvm/include/llvm/Support/TypeName.h

Line 27 in 208fa9a

inline StringRef getTypeName() {

I've loosened the requirements on the paths again in the latest commit, hopefully that satisfies CI while still testing a reasonable amount.

Use a relative source filename rather than an absolute source filename when crafting an output path. Previously two absolute paths would be concatenated like so: ``` /foo/bar + /baz/qux => /foo/bar/baz/qux ``` Which is fine on Linux, but on windows absolute paths start with the partition (ie. C:\foo\bar) so we can't just append them.

NuriAmari · 2023-09-06T13:38:57Z

Latest round of windows failures seems unrelated.

NuriAmari · 2023-09-06T21:54:05Z

I've tried this on a real application, and the output paths quickly get extremely long. So long that we fail to create the directory structure. Perhaps the nesting idea isn't the right choice after all.

aeubanks · 2023-09-07T16:43:40Z

What about an option that just redirects the PrintIRInstrumentation output to files instead of stderr? The files can be something like $N-$modulename.ll where $N is an int that increments.

NuriAmari · 2023-09-07T16:52:28Z

What about an option that just redirects the PrintIRInstrumentation output to files instead of stderr? The files can be something like $N-$modulename.ll where $N is an int that increments.

Yeah that's more or less what I'm working on now. My plan is:

Add a flag that provides a directory to write files to
If that flag is passed, write files of the structure $N-$modulename-$passname-(before|after).ll into that directory instead of writing to stderr
If you pass both -print-after=foo and -print-before=foo the dump files for a particular run of Foo on a particular unit of IR should match except for the before / after suffix.

This should be relatively easy to implement in the old pass manager as well.

This patch adds a flag to LLVM such that the output generated by the `-print-(before|after|all)` family of flags is written to files in a directory rather than to stderr. This new flag is `-ir-dump-directory` and is used to specify where to write the files. No other flags are added, it just modifies the behavior of the print flags. This is a second simplified version of the changes proposed in #65179. This patch only adds support for the new pass manager. If this patch is accepted, similar support can be added to the legacy pass manager. Co-authored-by: Nuri Amari <[email protected]>

NuriAmari marked this pull request as draft September 1, 2023 19:16

NuriAmari commented Sep 1, 2023

View reviewed changes

NuriAmari force-pushed the dump-before-after branch from 3ca4f8b to a143ea0 Compare September 1, 2023 19:28

NuriAmari marked this pull request as ready for review September 1, 2023 19:34

NuriAmari requested a review from a team as a code owner September 1, 2023 19:34

aeubanks self-requested a review September 1, 2023 19:37

asl requested changes Sep 1, 2023

View reviewed changes

llvm/include/llvm/Passes/StandardInstrumentations.h Outdated Show resolved Hide resolved

NuriAmari marked this pull request as draft September 1, 2023 20:50

NuriAmari force-pushed the dump-before-after branch from a143ea0 to 5d395c8 Compare September 1, 2023 20:52

mshockwave reviewed Sep 1, 2023

View reviewed changes

Nuri Amari and others added 3 commits September 2, 2023 18:16

Merge branch 'llvm:main' into dump-before-after

61f9cf0

Loosen filename regex in dump-before-afer-filenames

c3a0df4

Tests are still failing in CI because the expected output paths do not match. Loosen the regexes to allow for more variation in the output paths.

NuriAmari marked this pull request as ready for review September 6, 2023 13:37

NuriAmari requested a review from asl September 6, 2023 19:11

NuriAmari closed this Sep 7, 2023

NuriAmari mentioned this pull request Sep 14, 2023

Add option to dump IR to files instead of stderr #66412

Merged

NuriAmari deleted the dump-before-after branch June 5, 2024 21:54

Add flags to dump IR to a file before and after LLVM passes #65179

Add flags to dump IR to a file before and after LLVM passes #65179

Uh oh!

Conversation

NuriAmari commented Sep 1, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mshockwave left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

NuriAmari commented Sep 2, 2023

Uh oh!

mshockwave commented Sep 2, 2023

Uh oh!

NuriAmari commented Sep 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NuriAmari commented Sep 6, 2023

Uh oh!

NuriAmari commented Sep 6, 2023

Uh oh!

aeubanks commented Sep 7, 2023

Uh oh!

NuriAmari commented Sep 7, 2023

Uh oh!

Uh oh!

NuriAmari commented Sep 3, 2023 •

edited

Loading