[libc] Change the starting port index to use the SMID #79200
@llvm/pr-subscribers-libc @llvm/pr-subscribers-backend-amdgpu

Author: Joseph Huber (jhuber6)

Full diff: https://github.com/llvm/llvm-project/pull/79200.diff

5 Files Affected:
diff --git a/libc/src/__support/GPU/amdgpu/utils.h b/libc/src/__support/GPU/amdgpu/utils.h
index 9f0ff0c717a6c9b..174707b2a0a2360 100644
--- a/libc/src/__support/GPU/amdgpu/utils.h
+++ b/libc/src/__support/GPU/amdgpu/utils.h
@@ -179,6 +179,14 @@ LIBC_INLINE uint64_t fixed_frequency_clock() {
/// Terminates execution of the associated wavefront.
[[noreturn]] LIBC_INLINE void end_program() { __builtin_amdgcn_endpgm(); }
+/// Returns a unique identifier for the process cluster the current wavefront is
+/// executing on. Here we use the identifier for the compute unit (CU) and shader
+/// engine.
+/// FIXME: Currently unimplemented on AMDGPU until we have a simpler interface
+/// than the one at
+/// https://github.com/ROCm/clr/blob/develop/hipamd/include/hip/amd_detail/amd_device_functions.h#L899
+LIBC_INLINE uint32_t get_cluster_id() { return 0; }
+
} // namespace gpu
} // namespace LIBC_NAMESPACE
diff --git a/libc/src/__support/GPU/generic/utils.h b/libc/src/__support/GPU/generic/utils.h
index b701db482bbe98d..00b59837ccc6714 100644
--- a/libc/src/__support/GPU/generic/utils.h
+++ b/libc/src/__support/GPU/generic/utils.h
@@ -75,6 +75,8 @@ LIBC_INLINE uint64_t fixed_frequency_clock() { return 0; }
[[noreturn]] LIBC_INLINE void end_program() { __builtin_unreachable(); }
+LIBC_INLINE uint32_t get_cluster_id() { return 0; }
+
} // namespace gpu
} // namespace LIBC_NAMESPACE
diff --git a/libc/src/__support/GPU/nvptx/utils.h b/libc/src/__support/GPU/nvptx/utils.h
index 1519f36850a63c5..c60239ee7895614 100644
--- a/libc/src/__support/GPU/nvptx/utils.h
+++ b/libc/src/__support/GPU/nvptx/utils.h
@@ -159,6 +159,10 @@ LIBC_INLINE uint64_t fixed_frequency_clock() {
__builtin_unreachable();
}
+/// Returns a unique identifier for the process cluster the current warp is
+/// executing on. Here we use the identifier for the symmetric multiprocessor.
+LIBC_INLINE uint32_t get_cluster_id() { return __nvvm_read_ptx_sreg_smid(); }
+
} // namespace gpu
} // namespace LIBC_NAMESPACE
diff --git a/libc/src/__support/RPC/rpc.h b/libc/src/__support/RPC/rpc.h
index 7b2c89ac4dce48b..7924d4cec2ac84a 100644
--- a/libc/src/__support/RPC/rpc.h
+++ b/libc/src/__support/RPC/rpc.h
@@ -57,7 +57,7 @@ template <uint32_t lane_size = gpu::LANE_SIZE> struct alignas(64) Packet {
};
/// The maximum number of parallel ports that the RPC interface can support.
-constexpr uint64_t MAX_PORT_COUNT = 512;
+constexpr uint64_t MAX_PORT_COUNT = 4096;
/// A common process used to synchronize communication between a client and a
/// server. The process contains a read-only inbox and a write-only outbox used
@@ -519,7 +519,7 @@ LIBC_INLINE void Port<T, S>::recv_n(void **dst, uint64_t *size, A &&alloc) {
template <uint16_t opcode> LIBC_INLINE Client::Port Client::open() {
// Repeatedly perform a naive linear scan for a port that can be opened to
// send data.
- for (uint32_t index = 0;; ++index) {
+ for (uint32_t index = gpu::get_cluster_id();; ++index) {
// Start from the beginning if we run out of ports to check.
if (index >= process.port_count)
index = 0;
diff --git a/libc/utils/gpu/server/rpc_server.h b/libc/utils/gpu/server/rpc_server.h
index a818aab4ced94b1..f1a8fe06281cbf7 100644
--- a/libc/utils/gpu/server/rpc_server.h
+++ b/libc/utils/gpu/server/rpc_server.h
@@ -18,7 +18,7 @@ extern "C" {
#endif
/// The maximum number of ports that can be opened for any server.
-const uint64_t RPC_MAXIMUM_PORT_COUNT = 512;
+const uint64_t RPC_MAXIMUM_PORT_COUNT = 4096;
/// The symbol name associated with the client for use with the LLVM C library
/// implementation.
✅ With the latest revision this PR passed the C/C++ code formatter.
Summary: The RPC interface uses several ports to provide parallel access. Right now we begin the search at the beginning, which heavily contests the early ports. Using the SMID allows us to stagger the starting index based off of the cluster identifier that is executing the current warp. Multiple warps can share an SM, but it guarantees that contention for the low indices is lower. This also increases the maximum port count to 4096, because 512 isn't enough to cover the full hardware parallelism needed to guarantee this doesn't deadlock.
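The wrap-around linear scan with a staggered starting index can be sketched as a small standalone helper. This is a sketch only; `probe_index` and `kPortCount` are illustrative names, not the actual libc API, and `start` stands in for the value returned by `gpu::get_cluster_id()` in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for MAX_PORT_COUNT in the patch.
constexpr uint32_t kPortCount = 4096;

// Computes the i-th index probed by the scan in Client::open() when the
// search starts at `start`: start, start + 1, ..., port_count - 1, 0, 1, ...
// Staggering `start` per cluster spreads contention across the port space
// instead of piling every wavefront onto the low indices.
constexpr uint32_t probe_index(uint32_t start, uint32_t i,
                               uint32_t port_count = kPortCount) {
  return (start + i) % port_count;
}
```

Two clusters with different SMIDs thus begin their scans at different offsets and only collide once one of them has already swept past the other's starting region.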
@@ -57,7 +57,7 @@ template <uint32_t lane_size = gpu::LANE_SIZE> struct alignas(64) Packet {
 };

 /// The maximum number of parallel ports that the RPC interface can support.
-constexpr uint64_t MAX_PORT_COUNT = 512;
+constexpr uint64_t MAX_PORT_COUNT = 4096;
This constant is spread across multiple files. Do you want to have a macro defining this, something like LIBC_RPC_MAX_PORT_COUNT, defaulted to 4096 if undefined, so that it can even be set from the build command?
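The reviewer's suggestion amounts to the usual overridable-default pattern. A minimal sketch, assuming the macro name from the comment; the surrounding code is illustrative, not the actual libc source:

```cpp
#include <cassert>
#include <cstdint>

// Allow the build system to override the port count, e.g. with
// -DLIBC_RPC_MAX_PORT_COUNT=512 on the compile command line.
#ifndef LIBC_RPC_MAX_PORT_COUNT
#define LIBC_RPC_MAX_PORT_COUNT 4096
#endif

// A single macro definition would then feed every copy of the constant.
constexpr uint64_t MAX_PORT_COUNT = LIBC_RPC_MAX_PORT_COUNT;
```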
I'm unsure if it's worthwhile to have such a thing. The only reason we define this in multiple places is because the original discussions with Siva indicated that we didn't want to simply provide access to the rpc.h
header. So, the compromise was to define it twice and have a static assert in the implementation to ensure they always match. I just can't imagine a situation where a user would want to set this manually (maybe to save 128 bytes of GPU memory?), so it's probably best not to bloat the options.
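The compromise described here can be reconstructed as follows (an illustrative sketch of the pattern, not the actual libc source):

```cpp
#include <cassert>
#include <cstdint>

// Copy exposed to server users (rpc_server.h in the real tree).
const uint64_t RPC_MAXIMUM_PORT_COUNT = 4096;

// Copy private to the implementation (rpc.h in the real tree).
constexpr uint64_t MAX_PORT_COUNT = 4096;

// The implementation asserts at compile time that the two independently
// defined constants never drift apart.
static_assert(RPC_MAXIMUM_PORT_COUNT == MAX_PORT_COUNT,
              "Maximum port count mismatch between server and rpc.h");
```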