mlir/lib/Dialect/GPU/Transforms: improve context management in SerializeToCubin #65779

Merged: 1 commit, Oct 20, 2023

Contributor

@rohany rohany commented Sep 8, 2023

This commit adjusts the CUDA context management in the SerializeToCubin pass. In particular, it uses the device 0 primary context instead of creating a new CUDA context on each invocation of SerializeToCubin. This yields very large improvements in compile time, especially if an application (like a JIT compiler) is calling SerializeToCubin repeatedly.

Differential Revision: https://reviews.llvm.org/D159487
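
To make the compile-time argument concrete, here is a minimal sketch that models the two context strategies with a mock driver. It does not call CUDA at all: `MockDriver`, `serializeWithFreshContexts`, and `serializeWithPrimaryContext` are hypothetical names invented for illustration, and the model assumes that the primary context is reference counted and that the host application (for example a JIT runtime) keeps its own reference alive. Real code would use driver calls such as `cuDevicePrimaryCtxRetain`, `cuCtxPushCurrent`, `cuCtxPopCurrent`, and `cuDevicePrimaryCtxRelease` instead.

```cpp
#include <cassert>

// Hypothetical stand-in for the CUDA driver. It performs no GPU work and only
// counts how often an expensive context initialization would happen.
struct MockDriver {
  int initializations = 0; // expensive context setups performed so far
  int primaryRefCount = 0; // outstanding references to the primary context

  // Models cuCtxCreate followed by cuCtxDestroy: every call pays for setup.
  void createAndDestroyContext() { ++initializations; }

  // Models cuDevicePrimaryCtxRetain: setup happens only on the 0 -> 1 edge
  // (an assumption of this model, not a measurement of the real driver).
  void retainPrimary() {
    if (primaryRefCount == 0)
      ++initializations;
    ++primaryRefCount;
  }

  // Models cuDevicePrimaryCtxRelease: drops one reference.
  void releasePrimary() {
    assert(primaryRefCount > 0 && "release without matching retain");
    --primaryRefCount;
  }
};

// Old strategy: each serialization creates and destroys a fresh context.
inline int serializeWithFreshContexts(MockDriver &driver, int invocations) {
  for (int i = 0; i < invocations; ++i)
    driver.createAndDestroyContext();
  return driver.initializations;
}

// New strategy: each serialization retains and releases the primary context.
// The host application is assumed to hold its own long-lived reference.
inline int serializeWithPrimaryContext(MockDriver &driver, int invocations) {
  driver.retainPrimary(); // host application's long-lived reference
  for (int i = 0; i < invocations; ++i) {
    driver.retainPrimary();  // start of one serialization
    driver.releasePrimary(); // end of one serialization
  }
  driver.releasePrimary(); // host application shuts down
  return driver.initializations;
}
```

Under this model, 100 invocations cost 100 context initializations with the first helper and a single one with the second. The absolute savings in a real build depend on how expensive context creation is on the machine in question.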

@rohany rohany requested a review from a team as a code owner September 8, 2023 17:03
@joker-eph joker-eph requested a review from fabianmcg September 8, 2023 18:18
Contributor

@fabianmcg fabianmcg left a comment

The patch itself LGTM. However, we are in the process of deprecating SerializeToCubin in favor of Target Attributes, and I'm introducing deprecation notices today. So I'm -1 on improving the existing passes; in my opinion, all new efforts should focus on the new mechanism. However, I don't know if @joker-eph or someone else has a different opinion.

@joker-eph
Collaborator

The patch is small enough that it seems worthwhile to take in. I would just want to make sure we don't diverge from the lowering done through the new flow: do we need to replicate this somewhere as well, @fabianmcg?

@fabianmcg
Contributor

Currently no, as we don't invoke the driver. However, I was thinking of adding a compilation path that stops at PTX and lets the driver JIT the code at runtime. I only need to make some small updates, so maybe then.

Collaborator

@joker-eph joker-eph left a comment

@rohany LG, but please acknowledge that this pass is on its way to deprecation.

@rohany
Contributor Author

rohany commented Sep 9, 2023

> LG, but please acknowledge that this pass is on its way to deprecation.

That's fine -- I'm currently using it (as part of the reference pipeline), so I am incentivized to make it faster.

I don't understand the CI failure. Can I get some help with that?

@joker-eph
Collaborator

It's an infra failure; feel free to ignore it.

@joker-eph
Collaborator

Actually, your PR is not rebased; it seems like you're based on a commit from May!

@rohany rohany force-pushed the serialize-cubin-context-management branch from 91bcd17 to fb0a003 Compare September 9, 2023 21:04
@llvmbot llvmbot added the mlir:core (MLIR Core Infrastructure), mlir:gpu, and mlir labels Sep 9, 2023
@rohany rohany force-pushed the serialize-cubin-context-management branch from fb0a003 to 5e1a41b Compare September 9, 2023 21:07
@rohany
Contributor Author

rohany commented Sep 9, 2023

Thanks, fixed it.

@xgupta
Contributor

xgupta commented Oct 20, 2023

@rohany Do you need help to commit this change?

@rohany
Contributor Author

rohany commented Oct 20, 2023

Yes, I don't know how to get it to land, given that the tests pass and the review was accepted.

@xgupta xgupta merged commit 71bdd2c into llvm:main Oct 20, 2023
Guzhu-AMD pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Oct 26, 2023
Local branch amd-gfx 319c66a Merged main:080fb3e5b73b into amd-gfx:7c4daea7af99
Remote branch main 71bdd2c mlir/lib/Dialect/GPU/Transforms: improve context management in SerializeToCubin (llvm#65779)