-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[amdgpu] Implement D2D memcpy as HSA call #69955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2250,14 +2250,37 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { | |
PinnedMemoryManager); | ||
} | ||
|
||
/// Exchange data between two devices within the plugin. This function is not | ||
/// supported in this plugin. | ||
/// Exchange data between two devices within the plugin. | ||
Error dataExchangeImpl(const void *SrcPtr, GenericDeviceTy &DstGenericDevice, | ||
void *DstPtr, int64_t Size, | ||
AsyncInfoWrapperTy &AsyncInfoWrapper) override { | ||
// This function should never be called because the function | ||
// AMDGPUPluginTy::isDataExchangable() returns false. | ||
return Plugin::error("dataExchangeImpl not supported"); | ||
AMDGPUDeviceTy &DstDevice = static_cast<AMDGPUDeviceTy &>(DstGenericDevice); | ||
|
||
hsa_agent_t SrcAgent = getAgent(); | ||
hsa_agent_t DstAgent = DstDevice.getAgent(); | ||
|
||
AMDGPUSignalTy Signal; | ||
if (auto Err = Signal.init()) | ||
return Err; | ||
|
||
// The agents need to have access to the corresponding memory | ||
// This is presently only true if the pointers were originally | ||
// allocated by this runtime or the caller made the appropriate | ||
// access calls. | ||
hsa_status_t Status = hsa_amd_memory_async_copy( | ||
DstPtr, DstAgent, SrcPtr, SrcAgent, (Size > 0) ? (size_t)Size : 0, 0, | ||
nullptr, Signal.get()); | ||
if (auto Err = | ||
Plugin::check(Status, "Error in D2D hsa_amd_memory_async_copy: %s")) | ||
return Err; | ||
|
||
if (auto Err = Signal.wait(getStreamBusyWaitMicroseconds())) | ||
return Err; | ||
|
||
if (auto Err = Signal.deinit()) | ||
return Err; | ||
|
||
return Plugin::success(); | ||
} | ||
|
||
/// Initialize the async info for interoperability purposes. | ||
|
@@ -2899,7 +2922,7 @@ struct AMDGPUPluginTy final : public GenericPluginTy { | |
|
||
/// This plugin does not support exchanging data between two devices. | ||
bool isDataExchangable(int32_t SrcDeviceId, int32_t DstDeviceId) override { | ||
return false; | ||
return true; | ||
} | ||
|
||
/// Get the host device instance. | ||
|
@@ -3174,9 +3197,11 @@ void *AMDGPUDeviceTy::allocate(size_t Size, void *, TargetAllocTy Kind) { | |
return nullptr; | ||
} | ||
|
||
if (Alloc && (Kind == TARGET_ALLOC_HOST || Kind == TARGET_ALLOC_SHARED)) { | ||
if (Alloc) { | ||
auto &KernelAgents = Plugin::get<AMDGPUPluginTy>().getKernelAgents(); | ||
|
||
// Inherently necessary for host or shared allocations | ||
// Also enabled for device memory to allow device to device memcpy | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this costly if unsued? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unknown. I'd guess it's cheap if there's only one GPU in the system and has non-zero cost if there are multiple GPUs that do not perform D2D copies. However the alternative is to try to work backwards from the void* to a corresponding memory pool using maps and that's definitely non-zero cost as well. |
||
// Enable all kernel agents to access the buffer. | ||
if (auto Err = MemoryPool->enableAccess(Alloc, Size, KernelAgents)) { | ||
REPORT("%s\n", toString(std::move(Err)).data()); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have
AsyncInfoWrapper
here, my understanding is that these already contain a "stream" which is coded to an HSA signal. Maybe @jdoerfert can expand on this, but my understanding would be that we'd callgetStream
or something.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, this summon a signal from scratch was copied from another memcpy call. This should be slower, simpler and more robust than splicing it into the stream machinery, so better to land this version, see it goes through the testing, and then see if integration with the stream seems to work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that
AsyncInfoWrapper
is unused is a strong indicator that something is off. The code below is blocking, it should not be blocking. Instead of creating the signal, use the stream in AsyncInfoWrapper, something like above in the pinned case:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memory copies are handwritten in each direction so it's not a matter of pulling a signal from a stream and calling an existing function. It's implementable but not trivial.
I'd like to land this blocking version first in order to establish that the underling HSA calls behave as they're supposed to via the internal testing infrastructure, before adding the additional complication to wire it into the signals.
Especially so as I am far from convinced that signals work robustly between GPUs, given they're on different HSA queues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Streams approach in #69977, seems to work equivalently to this one