-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[amdgpu] D2D memcpy via streams and HSA #69977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[amdgpu] D2D memcpy via streams and HSA #69977
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AMDGPUDeviceTy should have an Agent
inside of it. We can probably expose that via a getter to make the interface better.
It does, accessed via getAgent. In order to pass the DeviceTy I'd have to reorder a bunch of classes in the file or otherwise deal with the type being incomplete at the point it is used. Since this function is local to amdgpu (cuda doesn't have it), passing the agent seems better than the reordering mess plus immediately calling getAgent in the prologue. |
Simpler version is at #69977, this is a response to review comments asking to implement it on the streams api instead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me, one nit. Let @jhuber6 comment if he's ok.
AMDGPUStreamTy *Stream = nullptr; | ||
if (auto Err = getStream(AsyncInfoWrapper, Stream)) | ||
return Err; | ||
if (Size < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<=
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, yes. Done
c6e16d8
to
f0ace3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine as-is, we probably should abstract that agent better in the future. Maybe put a TODO there.
hsa_amd_memory_async_copy can handle device to device copies if passed the corresponding parameters.
No functional change - currently D2D copy goes through a fallback in libomptarget that stages through a host malloc, after this it goes directly through HSA.
Works under exactly the situations that HSA works. Verified locally on a performance benchmark. Hoping to attract further testing from internal developers after it lands.