Skip to content

[amdgpu] D2D memcpy via streams and HSA #69977

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

JonChesterfield
Copy link
Collaborator

@JonChesterfield JonChesterfield commented Oct 23, 2023

hsa_amd_memory_async_copy can handle device to device copies if passed the corresponding parameters.

No functional change - currently D2D copy goes through a fallback in libomptarget that stages through a host malloc, after this it goes directly through HSA.

Works under exactly the situations that HSA works. Verified locally on a performance benchmark. Hoping to attract further testing from internal developers after it lands.

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AMDGPUDeviceTy should have an Agent inside of it. We can probably expose that via a getter to make the interface better.

@JonChesterfield
Copy link
Collaborator Author

The AMDGPUDeviceTy should have an Agent inside of it. We can probably expose that via a getter to make the interface better.

It does, accessed via getAgent. In order to pass the DeviceTy I'd have to reorder a bunch of classes in the file or otherwise deal with the type being incomplete at the point it is used. Since this function is local to amdgpu (cuda doesn't have it), passing the agent seems better than the reordering mess plus immediately calling getAgent in the prologue.

@JonChesterfield
Copy link
Collaborator Author

Simpler version is at #69977, this is a response to review comments asking to implement it on the streams api instead

Copy link
Member

@jdoerfert jdoerfert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me, one nit. Let @jhuber6 comment if he's ok.

AMDGPUStreamTy *Stream = nullptr;
if (auto Err = getStream(AsyncInfoWrapper, Stream))
return Err;
if (Size < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<=

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, yes. Done

@JonChesterfield JonChesterfield force-pushed the jc_d2d_memcpy_stream_trunk branch from c6e16d8 to f0ace3c Compare October 23, 2023 22:44
Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine as-is, we probably should abstract that agent better in the future. Maybe put a TODO there.

@JonChesterfield JonChesterfield merged commit 840d0b7 into llvm:main Oct 23, 2023
@JonChesterfield JonChesterfield deleted the jc_d2d_memcpy_stream_trunk branch October 23, 2023 23:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants