Skip to content

Add flags to dump IR to a file before and after LLVM passes #65179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

NuriAmari
Copy link
Contributor

LLVM offers -print-after and -print-before flags that allow you to print IR to stderr before and after any pass you want. This can be useful for debugging LLVM optimization issue, but is far too noisy for large builds.

This patch adds analogous options -dump-after and -dump-before that dump the IR to appropriately named files. In addition, it also introduces flags -dump-after-all, -dump-before-all and -ir-dump-directory to control where the files are written to.

The implementation follows @chandlerc suggestions in https://reviews.llvm.org/D54794, but does not implement the suggested design completely. Notably, the IR extraction behavior differing from the print family of flags is not implemented. In addition to the suggestion of adding how many times a particular pass has run in at a given nesting level, I've also added how many passes have run at that nesting level. The tests provide a more clear explanation.

This is not complete, but I think it is an acceptable initial pass, and I'd like some feedback on the approach before I go too deep. Features that are missing / I would like:

  • Less verbose pass names (particular pass managers and adapters have extremely verbose names)
  • More descriptive names (ie. what function is this function pass operation on)
  • Filter parity with the print flags (those flags respect flags for example allowing filtering of which functions to print)
  • Legacy pass manager support (we need this to work for MIR, and at first glance it doesn't)

@NuriAmari NuriAmari marked this pull request as draft September 1, 2023 19:16
@@ -33,6 +33,31 @@ static cl::opt<bool> PrintAfterAll("print-after-all",
llvm::cl::desc("Print IR after each pass"),
cl::init(false), cl::Hidden);

static cl::list<std::string>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot duplication between the print code and the dump code, though there are differences, it might be nice to try to combine them a little more.


private:
void dumpBeforePass(StringRef PassID, Any IR);
void dumpAfterPass(StringRef PassID, Any IR);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not implement a dumpAfterPassInvalidated callback, mostly because I didn't understand when that occurs. If it makes sense to include one, I can.

; Basic dump before and after a single module pass
; RUN: opt %s -disable-output -passes='no-op-module' -ir-dump-directory %t/logs -dump-after=no-op-module -dump-before=no-op-module
; RUN: find %t/logs -type f -print | FileCheck %s --check-prefix=SINGLE-RUN -DSOURCE_FILE_NAME=%s
; SINGLE-RUN-DAG: [[SOURCE_FILE_NAME]]/0.NoOpModulePass.0-before.ll
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first 0 means this is the first pass to run at this nesting level. The second 0 means this is the first NoOpModulePass to run at this nesting level.

While dumping to files is less noisy, without this first number you lose the sequence in which passes run, which makes it harder to trace issues.

; MULTIPLE-NESTED-RUNS-DAG: [[SOURCE_FILE_NAME]]/1.ModuleToFunctionPassAdaptor.0/1.PassManager<llvm::Function>.1/0.NoOpFunctionPass.0-before.ll
; MULTIPLE-NESTED-RUNS-DAG: [[SOURCE_FILE_NAME]]/1.ModuleToFunctionPassAdaptor.0/1.PassManager<llvm::Function>.1/0.NoOpFunctionPass.0-after.ll
; MULTIPLE-NESTED-RUNS-DAG: [[SOURCE_FILE_NAME]]/2.ModuleToFunctionPassAdaptor.1/1.PassManager<llvm::Function>.1/0.FunctionToLoopPassAdaptor.0/1.PassManager<llvm::Loop, llvm::AnalysisManager<llvm::Loop, llvm::LoopStandardAnalysisResults&>, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>.0/0.NoOpLoopPass.0-before.ll
; MULTIPLE-NESTED-RUNS-DAG: [[SOURCE_FILE_NAME]]/2.ModuleToFunctionPassAdaptor.1/1.PassManager<llvm::Function>.1/0.FunctionToLoopPassAdaptor.0/1.PassManager<llvm::Loop, llvm::AnalysisManager<llvm::Loop, llvm::LoopStandardAnalysisResults&>, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>.0/0.NoOpLoopPass.0-after.ll
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely a less verbose pass name than PassManager<llvm::Loop, llvm::AnalysisManager<llvm::Loop, llvm::LoopStandardAnalysisResults&>, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&> is achievable, but I'm not sure how to best go about that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made an attempt at this, the trouble is the new PM instrumentation infrastructure only provides callbacks with the PassID, which is this verbose sometimes. The existing Class name to pass name mapping does not contain entries for all passes (ex. adaptors and PMs).

I think more consistent names are doable, but perhaps something for another patch.

@@ -0,0 +1 @@
config.suffixes = ['', '.ll']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because in LIT %s expands to the source file path with periods escaped like so: .../dump-before-after-filenames\\., which won't match the output of find.

I also considered running ls on each file, but I think the output of filecheck on find is a little easier to read when it fails.

@NuriAmari NuriAmari marked this pull request as ready for review September 1, 2023 19:34
@NuriAmari NuriAmari requested a review from a team as a code owner September 1, 2023 19:34
@aeubanks aeubanks self-requested a review September 1, 2023 19:37
@NuriAmari NuriAmari marked this pull request as draft September 1, 2023 20:50
Summary:

LLVM offers -print-after and -print-before flags that allow you
to print IR to stderr before and after any pass you want. This can
be useful for debugging LLVM optimization issue, but is far too
noisy for large builds.

This patch adds analogous options -dump-after and -dump-before that
dump the IR to appropriately named files. In addition, it also
introduces flags -dump-after-all, -dump-before-all and -ir-dump-directory
to control where the files are written to.

Test Plan:

Included LIT tests:
```
ninja check-llvm
```
Copy link
Member

@mshockwave mshockwave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this feature, though I feel like in many cases -print-changed is good enough.
Also slightly tangent: I think LLVM's interim github PR policy is to discourage force push on PR branch in favor of amending commits: https://discourse.llvm.org/t/update-on-github-pull-requests/71540/53

this->PIC = &PIC;

PIC.registerBeforeNonSkippedPassCallback(
[this](StringRef P, Any IR) { this->pushPass(P, IR); });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we omit this when possible? (ditto for other similar occurrences in this file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't yet understand why, but this is not implicitly captured here, it fails to compile without an explicit capture.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't yet understand why, but this is not implicitly captured here, it fails to compile without an explicit capture.

Hmmm alright, then leave it as it for now I guess

@NuriAmari
Copy link
Contributor Author

I like this feature, though I feel like in many cases -print-changed is good enough.

I've never experienced it myself, but my teammate has said another reason to want this is that when using thinLTO, multiple modules are processed concurrently on different threads, and the printed output can end up interleaved and be hard to read.

Also slightly tangent: I think LLVM's interim github PR policy is to discourage force push on PR branch in favor of amending commits: https://discourse.llvm.org/t/update-on-github-pull-requests/71540/53

Ah ok sorry I didn't know, will make new commits moving forward.

@mshockwave
Copy link
Member

I like this feature, though I feel like in many cases -print-changed is good enough.

I've never experienced it myself, but my teammate has said another reason to want this is that when using thinLTO, multiple modules are processed concurrently on different threads, and the printed output can end up interleaved and be hard to read.

Ah that makes sense

Nuri Amari and others added 3 commits September 2, 2023 18:16
The name of a particular pass is determined automatically
using the name of the type that implements in. This in turn
is determined using __PRETTY_FUNCTION__ which is not defined
in the standard and implementation specific.

This means the output paths differ slightly depending on the
platform.

To adjust tests for this:

- Add regexes that relax the requirements on output paths.
- Use find to find output paths without specifying
- Remove a test that was intended to test multiple modules being
  processing by a single pipeline, but was not really testing that

PR Feedback is also addressed.
Tests are still failing in CI because the expected output
paths do not match. Loosen the regexes to allow for more
variation in the output paths.
@NuriAmari
Copy link
Contributor Author

NuriAmari commented Sep 3, 2023

I'm having a bit of a hard time getting the output path tests to pass in CI, while they pass on my Linux machine. I also can't see the output observed in CI, so its a little hard to fix.

I believe the variance comes from the output of this function being unreliable:

inline StringRef getTypeName() {

I've loosened the requirements on the paths again in the latest commit, hopefully that satisfies CI while still testing a reasonable amount.

Use a relative source filename rather than an absolute source filename
when crafting an output path. Previously two absolute paths would be
concatenated like so:

```
/foo/bar + /baz/qux => /foo/bar/baz/qux
```

Which is fine on Linux, but on windows absolute paths start with
the partition (ie. C:\foo\bar) so we can't just append them.
@NuriAmari NuriAmari marked this pull request as ready for review September 6, 2023 13:37
@NuriAmari
Copy link
Contributor Author

Latest round of windows failures seems unrelated.

@NuriAmari NuriAmari requested a review from asl September 6, 2023 19:11
@NuriAmari
Copy link
Contributor Author

I've tried this on a real application, and the output paths quickly get extremely long. So long that we fail to create the directory structure. Perhaps the nesting idea isn't the right choice after all.

@NuriAmari NuriAmari closed this Sep 7, 2023
@aeubanks
Copy link
Contributor

aeubanks commented Sep 7, 2023

What about an option that just redirects the PrintIRInstrumentation output to files instead of stderr? The files can be something like $N-$modulename.ll where $N is an int that increments.

@NuriAmari
Copy link
Contributor Author

What about an option that just redirects the PrintIRInstrumentation output to files instead of stderr? The files can be something like $N-$modulename.ll where $N is an int that increments.

Yeah that's more or less what I'm working on now. My plan is:

  • Add a flag that provides a directory to write files to
  • If that flag is passed, write files of the structure $N-$modulename-$passname-(before|after).ll into that directory instead of writing to stderr
  • If you pass both -print-after=foo and -print-before=foo the dump files for a particular run of Foo on a particular unit of IR should match except for the before / after suffix.

This should be relatively easy to implement in the old pass manager as well.

rmaz pushed a commit that referenced this pull request Sep 29, 2023
This patch adds a flag to LLVM such that the output generated by the
`-print-(before|after|all)`
family of flags is written to files in a directory rather than to
stderr.

This new flag is `-ir-dump-directory` and is used to specify where to
write the files. No other flags are added, it just modifies the behavior
of the print flags.

This is a second simplified version of the changes proposed in
#65179.

This patch only adds support for the new pass manager. If this patch is
accepted, similar support can be added to the legacy pass manager.

Co-authored-by: Nuri Amari <[email protected]>
@NuriAmari NuriAmari deleted the dump-before-after branch June 5, 2024 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants