mlir/lib/Dialect/GPU/Transforms: improve context management in SerializeToCubin #65779
The patch itself LGTM. However, we are in the process of deprecating SerializeToCubin in favor of Target Attributes, and I'm introducing deprecation notices today. So I'm -1 on improving the existing passes; in my opinion, all new efforts should focus on the new mechanism. That said, I don't know whether @joker-eph or someone else has a different opinion.
The patch is small enough that it seems worthwhile to take in. I would just want to make sure we don't diverge from the lowering done through the new flow: do we need to replicate this somewhere as well, @fabianmcg?
Currently no, as we don't invoke the driver. However, I was thinking of adding a compilation path that stops at PTX and lets the driver JIT the code at runtime. I only need to make some small updates for that, so maybe then.
@rohany LG, but please acknowledge that this pass is on its way to deprecation.
That's fine -- I'm currently using it (as part of the reference pipeline), so I am incentivized to make it faster. I don't understand the CI failure; can I get some help with that?
It's an infra failure, feel free to ignore it.
Actually, your PR is not rebased; it seems you're based on a commit from May!
Branch updated: 91bcd17 to fb0a003
…izeToCubin
This commit adjusts the CUDA context management in the SerializeToCubin pass. In particular, it uses the device 0 primary context instead of creating a new CUDA context on each invocation of SerializeToCubin. This yields very large improvements in compile time, especially if an application (like a JIT compiler) is calling SerializeToCubin repeatedly.
Differential Revision: https://reviews.llvm.org/D159487
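The context change described in this commit message can be illustrated with a small, GPU-free model. The sketch below is not the code from the patch: `FakeDriver`, `PrimaryContextScope`, and both helper functions are hypothetical names, and the fake driver only counts how often a context has to be initialised. With the real CUDA driver API, the corresponding calls would be `cuCtxCreate`/`cuCtxDestroy` for the old behaviour and `cuDevicePrimaryCtxRetain`, `cuCtxPushCurrent`, `cuCtxPopCurrent`, and `cuDevicePrimaryCtxRelease` for the new one. The sketch also assumes that the host application (for example a JIT compiler) already holds a reference to the primary context.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the CUDA driver so the sketch runs without a GPU.
// It models the primary context as reference counted (an assumption based on
// the driver documentation) and counts every context initialisation.
struct FakeDriver {
  int primaryRefs = 0;            // outstanding retains on the primary context
  int initialisations = 0;        // number of times a context was set up
  int nextId = 1;                 // ids for non-primary contexts; 0 is primary
  std::vector<int> currentStack;  // the thread's current-context stack

  // Models cuDevicePrimaryCtxRetain: only the first retain initialises.
  int primaryRetain() {
    if (primaryRefs++ == 0) ++initialisations;
    return 0;
  }
  // Models cuDevicePrimaryCtxRelease.
  void primaryRelease() { assert(primaryRefs > 0); --primaryRefs; }
  // Models cuCtxCreate: always initialises and makes the context current.
  int createContext() {
    ++initialisations;
    currentStack.push_back(nextId);
    return nextId++;
  }
  // Models cuCtxDestroy: pops the context if it is current.
  void destroyContext(int ctx) {
    if (!currentStack.empty() && currentStack.back() == ctx)
      currentStack.pop_back();
  }
  void push(int ctx) { currentStack.push_back(ctx); }  // cuCtxPushCurrent
  void pop() { assert(!currentStack.empty()); currentStack.pop_back(); }
};

// RAII guard: retain and push the primary context, then pop and release it.
class PrimaryContextScope {
public:
  explicit PrimaryContextScope(FakeDriver &d) : driver(d) {
    driver.push(driver.primaryRetain());
  }
  ~PrimaryContextScope() {
    driver.pop();
    driver.primaryRelease();
  }
  PrimaryContextScope(const PrimaryContextScope &) = delete;
  PrimaryContextScope &operator=(const PrimaryContextScope &) = delete;

private:
  FakeDriver &driver;
};

// Old behaviour: a fresh context for every serialization call.
int initialisationsWithFreshContexts(int calls) {
  FakeDriver driver;
  for (int i = 0; i < calls; ++i) {
    int ctx = driver.createContext();
    // ... link PTX into a cubin here ...
    driver.destroyContext(ctx);
  }
  assert(driver.currentStack.empty());
  return driver.initialisations;
}

// New behaviour: every call borrows the device 0 primary context.
int initialisationsWithPrimaryContext(int calls) {
  FakeDriver driver;
  driver.primaryRetain();  // assumption: the host application keeps it alive
  for (int i = 0; i < calls; ++i) {
    PrimaryContextScope scope(driver);
    // ... link PTX into a cubin here ...
  }
  driver.primaryRelease();
  assert(driver.currentStack.empty() && driver.primaryRefs == 0);
  return driver.initialisations;
}
```

In this model, 100 calls cost 100 context initialisations with fresh contexts and a single one with the shared primary context. Real savings depend on the driver and on whether something else keeps the primary context alive between calls.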
Branch updated: fb0a003 to 5e1a41b
Thanks, fixed it.
@rohany Do you need help to commit this change?
Yes, I don't know how to get it to land, given that the tests pass and the review is accepted.
Local branch amd-gfx 319c66a Merged main:080fb3e5b73b into amd-gfx:7c4daea7af99 Remote branch main 71bdd2c mlir/lib/Dialect/GPU/Transforms: improve context management in SerializeToCubin (llvm#65779)