[mlir][GPU] Add RecursiveMemoryEffects to gpu.launch (#75315)
Merged: matthias-springer merged 1 commit into llvm:main from matthias-springer:gpu_launch_side_effects on Dec 20, 2023
I'm not very familiar with RecursiveMemoryEffects; what exactly does it do? Is it safe with managed memory or unified memory? In those cases, a pointer has the same address on the GPU and the CPU. One can imagine an llvm.ptr that is used to build a memref descriptor, which is then used inside the gpu.launch body, or something else along those lines. Is the trait still safe in these cases?
RecursiveMemoryEffects indicates that the side effects of this op are defined by the side effects of its nested ops.

In essence: gpu.launch by itself does not have any side effects; e.g., it does not read or write memory. But the GPU kernel may contain ops that read or write memory. When we use the C++ API to query the side effects of a gpu.launch, it automatically checks the side effects of the ops inside the kernel and returns those.

If the ops in the kernel have no side effects, then the gpu.launch has no side effects. That's why the gpu.launch in the newly added test case folds away: it does not read or write memory, so it can simply be removed.

When you use that memref value, e.g., with a memref.load, you have an op with a side effect, and the entire kernel will be considered to have that side effect.
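To illustrate the distinction with a hand-written sketch (these functions are invented for this explanation, not taken from the PR's test case): the first gpu.launch below has a side-effect-free body, so with RecursiveMemoryEffects it reports no effects and dead-code elimination can remove it; the second contains a memref.load, so querying the launch op reports a memory read and it is kept.

```mlir
// Empty, side-effect-free body: the launch op itself reports no
// effects and can be folded away.
func.func @foldable(%sz : index) {
  gpu.launch blocks(%bx, %by, %bz) in (%gx = %sz, %gy = %sz, %gz = %sz)
             threads(%tx, %ty, %tz) in (%tsx = %sz, %tsy = %sz, %tsz = %sz) {
    gpu.terminator
  }
  return
}

// The nested memref.load has a read effect, so the surrounding
// gpu.launch is reported as reading from memory and is not removed.
func.func @not_foldable(%sz : index, %m : memref<8xf32>, %i : index) {
  gpu.launch blocks(%bx, %by, %bz) in (%gx = %sz, %gy = %sz, %gz = %sz)
             threads(%tx, %ty, %tz) in (%tsx = %sz, %tsy = %sz, %tsz = %sz) {
    %v = memref.load %m[%i] : memref<8xf32>
    gpu.terminator
  }
  return
}
```

Note that the gpu.launch body may capture values such as %m from the enclosing function, which is exactly the situation where the nested op's effect propagates to the launch op.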
Thanks for your patience and the explanation. gpu.launch doesn't write to memory directly, but it still needs to read the kernel configuration even with an empty body. However, I think it's safe to fold.

Thinking about the 3rd option with the IR below: if taskC (which has an empty body) is folded, could taskD run before taskB because it no longer waits for stream2? (Currently, MLIR uses a single stream, but one can imagine implementing support for multiple streams.)

task execution graph:

Is it overkill to think about this case?
The kernel configuration and parameters (e.g., the number of blocks) are passed to gpu.launch as integer operands, so the op does not read anything from memory (a memref) at this point, right? (The op may be lowered to something that reads from memory; that op would then have a memory side effect.)

gpu.launch takes async token arguments and produces an async token result. If there is a particular order in which kernels must be launched (and waited for), I would expect this to be modeled with async tokens. IMO, execution order due to "stream is available or not" does not have to be preserved when folding gpu.launch ops.

In the above example, %taskC can be canonicalized away if it has no side effects, but then %taskD will depend on %taskA and %taskB.
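A sketch of how that dependency rewrite might look with async tokens (the operand lists and elided clauses are invented for illustration, since the IR from the discussion is not reproduced here; "..." stands for the blocks/threads clauses and kernel bodies):

```mlir
// Before: %taskC has an empty body and only forwards dependencies
// from %taskA and %taskB to %taskD.
%taskC = gpu.launch async [%taskA, %taskB] ... { gpu.terminator }
%taskD = gpu.launch async [%taskC] ... { ... }

// After canonicalizing %taskC away: the dependencies it carried are
// transferred directly to its user, so ordering is still preserved.
%taskD = gpu.launch async [%taskA, %taskB] ... { ... }
```

The point is that the execution-order constraint lives in the token operands, not in any implicit stream, so removing a side-effect-free launch only requires rewiring its token uses.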
We discussed offline. The way CUDA streams are expressed in the GPU dialect is via async tokens. It would not make sense for gpu.launch and other ops to take an additional %stream operand in the future; we use async tokens (%task) for that.

Coming back to RecursiveMemoryEffects: this trait should be safe to attach. There are no side effects w.r.t. streams, because streams are encoded with async tokens.

Empty GPU kernels like the one in the test case will not simply DCE away if they impose a dependency via async tokens. (A folder or canonicalization pattern may find a way to remove empty kernels in a way that preserves the dependencies expressed by the async tokens.)