Skip to content

[libc] Update HSA queues to use the maximum size and set the barrier bit #110034

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 28, 2024

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Sep 25, 2024

Summary:
It's safer to use the maximum size, as this prevents the runtime from
oversubscribing with multiple producers. Additionally we should set the
barrier bit to ensure that the queue entries block if multiple are
submitted (Which shouldn't happen for this tool).

Summary:
It's safer to use the maximum size, as this prevents the runtime from
oversubscribing with multiple producers. Additionally we should set the
barrier bit to ensure that the queue entries block if multiple are
submitted (Which shouldn't happen for this tool).
@llvmbot
Copy link
Member

llvmbot commented Sep 25, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Joseph Huber (jhuber6)

Changes

Summary:
It's safer to use the maximum size, as this prevents the runtime from
oversubscribing with multiple producers. Additionally we should set the
barrier bit to ensure that the queue entries block if multiple are
submitted (Which shouldn't happen for this tool).


Full diff: https://github.com/llvm/llvm-project/pull/110034.diff

1 Files Affected:

  • (modified) libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp (+3-2)
diff --git a/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp b/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
index 1beef8170475a1..f6ce598cd7102a 100644
--- a/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
+++ b/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
@@ -281,6 +281,7 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
   // Initialize the packet header and set the doorbell signal to begin execution
   // by the HSA runtime.
   uint16_t header =
+      1u << HSA_PACKET_HEADER_BARRIER |
       (HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE) |
       (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE) |
       (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE);
@@ -540,11 +541,11 @@ int load(int argc, const char **argv, const char **envp, void *image,
     }
   }
 
-  // Obtain a queue with the minimum (power of two) size, used to send commands
+  // Obtain a queue with the maximum (power of two) size, used to send commands
   // to the HSA runtime and launch execution on the device.
   uint64_t queue_size;
   if (hsa_status_t err = hsa_agent_get_info(
-          dev_agent, HSA_AGENT_INFO_QUEUE_MIN_SIZE, &queue_size))
+          dev_agent, HSA_AGENT_INFO_QUEUE_MAX_SIZE, &queue_size))
     handle_error(err);
   hsa_queue_t *queue = nullptr;
   if (hsa_status_t err =

@llvmbot
Copy link
Member

llvmbot commented Sep 25, 2024

@llvm/pr-subscribers-libc

Author: Joseph Huber (jhuber6)

Changes

Summary:
It's safer to use the maximum size, as this prevents the runtime from
oversubscribing with multiple producers. Additionally we should set the
barrier bit to ensure that the queue entries block if multiple are
submitted (Which shouldn't happen for this tool).


Full diff: https://github.com/llvm/llvm-project/pull/110034.diff

1 Files Affected:

  • (modified) libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp (+3-2)
diff --git a/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp b/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
index 1beef8170475a1..f6ce598cd7102a 100644
--- a/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
+++ b/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
@@ -281,6 +281,7 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
   // Initialize the packet header and set the doorbell signal to begin execution
   // by the HSA runtime.
   uint16_t header =
+      1u << HSA_PACKET_HEADER_BARRIER |
       (HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE) |
       (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE) |
       (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE);
@@ -540,11 +541,11 @@ int load(int argc, const char **argv, const char **envp, void *image,
     }
   }
 
-  // Obtain a queue with the minimum (power of two) size, used to send commands
+  // Obtain a queue with the maximum (power of two) size, used to send commands
   // to the HSA runtime and launch execution on the device.
   uint64_t queue_size;
   if (hsa_status_t err = hsa_agent_get_info(
-          dev_agent, HSA_AGENT_INFO_QUEUE_MIN_SIZE, &queue_size))
+          dev_agent, HSA_AGENT_INFO_QUEUE_MAX_SIZE, &queue_size))
     handle_error(err);
   hsa_queue_t *queue = nullptr;
   if (hsa_status_t err =

@jplehr
Copy link
Contributor

jplehr commented Sep 26, 2024

Do you know if that helps with potentially cranking up the parallelism in testing?

@jhuber6
Copy link
Contributor Author

jhuber6 commented Sep 26, 2024

Do you know if that helps with potentially cranking up the parallelism in testing?

Nope, tested it and still got weird deadlocks. But this is safer either way (Was talking w/ Joe Greathouse).

Copy link
Contributor

@jplehr jplehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Maybe give @JonChesterfield a chance to also look at it. He is probably more paranoid about such changes than I am.

Copy link
Collaborator

@JonChesterfield JonChesterfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good stuff. Thanks!

@jhuber6 jhuber6 merged commit 6558e56 into llvm:main Sep 28, 2024
10 checks passed
@jhuber6 jhuber6 deleted the hsa branch September 28, 2024 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants