-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[libc] Update HSA queues to use the maximum size and set the barrier bit #110034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary: It's safer to use the maximum size, as this prevents the runtime from oversubscribing with multiple producers. Additionally we should set the barrier bit to ensure that the queue entries block if multiple are submitted (Which shouldn't happen for this tool).
@llvm/pr-subscribers-backend-amdgpu Author: Joseph Huber (jhuber6) ChangesSummary: Full diff: https://github.com/llvm/llvm-project/pull/110034.diff 1 Files Affected:
diff --git a/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp b/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
index 1beef8170475a1..f6ce598cd7102a 100644
--- a/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
+++ b/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
@@ -281,6 +281,7 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
// Initialize the packet header and set the doorbell signal to begin execution
// by the HSA runtime.
uint16_t header =
+ 1u << HSA_PACKET_HEADER_BARRIER |
(HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE) |
(HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE) |
(HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE);
@@ -540,11 +541,11 @@ int load(int argc, const char **argv, const char **envp, void *image,
}
}
- // Obtain a queue with the minimum (power of two) size, used to send commands
+ // Obtain a queue with the maximum (power of two) size, used to send commands
// to the HSA runtime and launch execution on the device.
uint64_t queue_size;
if (hsa_status_t err = hsa_agent_get_info(
- dev_agent, HSA_AGENT_INFO_QUEUE_MIN_SIZE, &queue_size))
+ dev_agent, HSA_AGENT_INFO_QUEUE_MAX_SIZE, &queue_size))
handle_error(err);
hsa_queue_t *queue = nullptr;
if (hsa_status_t err =
|
@llvm/pr-subscribers-libc Author: Joseph Huber (jhuber6) ChangesSummary: Full diff: https://github.com/llvm/llvm-project/pull/110034.diff 1 Files Affected:
diff --git a/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp b/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
index 1beef8170475a1..f6ce598cd7102a 100644
--- a/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
+++ b/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
@@ -281,6 +281,7 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
// Initialize the packet header and set the doorbell signal to begin execution
// by the HSA runtime.
uint16_t header =
+ 1u << HSA_PACKET_HEADER_BARRIER |
(HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE) |
(HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE) |
(HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE);
@@ -540,11 +541,11 @@ int load(int argc, const char **argv, const char **envp, void *image,
}
}
- // Obtain a queue with the minimum (power of two) size, used to send commands
+ // Obtain a queue with the maximum (power of two) size, used to send commands
// to the HSA runtime and launch execution on the device.
uint64_t queue_size;
if (hsa_status_t err = hsa_agent_get_info(
- dev_agent, HSA_AGENT_INFO_QUEUE_MIN_SIZE, &queue_size))
+ dev_agent, HSA_AGENT_INFO_QUEUE_MAX_SIZE, &queue_size))
handle_error(err);
hsa_queue_t *queue = nullptr;
if (hsa_status_t err =
|
Do you know if that helps with potentially cranking up the parallelism in testing? |
Nope, tested it and still got weird deadlocks. But this is safer either way (Was talking w/ Joe Greathouse). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Maybe give @JonChesterfield a chance to also look at it. He is probably more paranoid about such changes than I am.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good stuff. Thanks!
Summary:
It's safer to use the maximum size, as this prevents the runtime from
oversubscribing with multiple producers. Additionally we should set the
barrier bit to ensure that the queue entries block if multiple are
submitted (Which shouldn't happen for this tool).