[libc] Change the starting port index to use the SMID #79200

Merged 1 commit on Jan 30, 2024
8 changes: 8 additions & 0 deletions libc/src/__support/GPU/amdgpu/utils.h
@@ -179,6 +179,14 @@ LIBC_INLINE uint64_t fixed_frequency_clock() {
/// Terminates execution of the associated wavefront.
[[noreturn]] LIBC_INLINE void end_program() { __builtin_amdgcn_endpgm(); }

+/// Returns a unique identifier for the process cluster the current wavefront is
+/// executing on. Here we use the identifier for the compute unit (CU) and
+/// shader engine.
+/// FIXME: Currently unimplemented on AMDGPU until we have a simpler interface
+/// than the one at
+/// https://github.com/ROCm/clr/blob/develop/hipamd/include/hip/amd_detail/amd_device_functions.h#L899
+LIBC_INLINE uint32_t get_cluster_id() { return 0; }

} // namespace gpu
} // namespace LIBC_NAMESPACE

2 changes: 2 additions & 0 deletions libc/src/__support/GPU/generic/utils.h
@@ -75,6 +75,8 @@ LIBC_INLINE uint64_t fixed_frequency_clock() { return 0; }

[[noreturn]] LIBC_INLINE void end_program() { __builtin_unreachable(); }

+LIBC_INLINE uint32_t get_cluster_id() { return 0; }

} // namespace gpu
} // namespace LIBC_NAMESPACE

4 changes: 4 additions & 0 deletions libc/src/__support/GPU/nvptx/utils.h
@@ -159,6 +159,10 @@ LIBC_INLINE uint64_t fixed_frequency_clock() {
__builtin_unreachable();
}

+/// Returns a unique identifier for the process cluster the current warp is
+/// executing on. Here we use the identifier for the streaming multiprocessor.
+LIBC_INLINE uint32_t get_cluster_id() { return __nvvm_read_ptx_sreg_smid(); }

} // namespace gpu
} // namespace LIBC_NAMESPACE

4 changes: 2 additions & 2 deletions libc/src/__support/RPC/rpc.h
@@ -57,7 +57,7 @@ template <uint32_t lane_size = gpu::LANE_SIZE> struct alignas(64) Packet {
};

/// The maximum number of parallel ports that the RPC interface can support.
-constexpr uint64_t MAX_PORT_COUNT = 512;
+constexpr uint64_t MAX_PORT_COUNT = 4096;
Review comment (Contributor):

This constant is duplicated across multiple files. Would you like a macro for it, something like LIBC_RPC_MAX_PORT_COUNT defaulting to 4096 if undefined, so that it can even be set from the build command?
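
For illustration only, the reviewer's suggestion amounts to a guarded default like the sketch below. `LIBC_RPC_MAX_PORT_COUNT` is the name proposed in the comment, not a macro that exists in this PR, and feeding it into `MAX_PORT_COUNT` this way is an assumption:

```cpp
#include <cstdint>

// Hypothetical: LIBC_RPC_MAX_PORT_COUNT is only the name proposed in review.
// If the build does not predefine it, fall back to the value used in this PR.
#ifndef LIBC_RPC_MAX_PORT_COUNT
#define LIBC_RPC_MAX_PORT_COUNT 4096
#endif

// The single source of truth that rpc.h (and the server header) could share.
constexpr uint64_t MAX_PORT_COUNT = LIBC_RPC_MAX_PORT_COUNT;
```

A build could then override the default on the compiler command line, for example with `-DLIBC_RPC_MAX_PORT_COUNT=1024`.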

Reply (Contributor Author):

I'm unsure whether it's worthwhile to have such a thing. The only reason we define this in multiple places is that the original discussions with Siva indicated we didn't want to simply provide access to the rpc.h header. So the compromise was to define it twice and have a static assert in the implementation to ensure the two always match. I just can't imagine a situation where a user would want to set this manually (maybe to save 128 bytes of GPU memory?), so it's probably best not to bloat the options.
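
The compromise described in the reply, two independent definitions guarded by a compile-time check, can be sketched as follows. The namespaces `client_side` and `server_side` are hypothetical stand-ins for `rpc.h` and `rpc_server.h`; where the real `static_assert` lives is not shown in this diff:

```cpp
#include <cstdint>

// Hypothetical stand-in for the client-side constant in rpc.h.
namespace client_side {
constexpr uint64_t MAX_PORT_COUNT = 4096;
} // namespace client_side

// Hypothetical stand-in for the C-compatible server constant in rpc_server.h.
namespace server_side {
const uint64_t RPC_MAXIMUM_PORT_COUNT = 4096;
} // namespace server_side

// Compilation fails if someone changes one definition but not the other.
static_assert(client_side::MAX_PORT_COUNT ==
                  server_side::RPC_MAXIMUM_PORT_COUNT,
              "client and server port counts must match");
```

A namespace-scope `const` integer initialized with a constant expression is usable in a constant expression, so the C-style `const` on the server side still works with `static_assert`.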


/// A common process used to synchronize communication between a client and a
/// server. The process contains a read-only inbox and a write-only outbox used
@@ -519,7 +519,7 @@ LIBC_INLINE void Port<T, S>::recv_n(void **dst, uint64_t *size, A &&alloc) {
template <uint16_t opcode> LIBC_INLINE Client::Port Client::open() {
// Repeatedly perform a naive linear scan for a port that can be opened to
// send data.
-for (uint32_t index = 0;; ++index) {
+for (uint32_t index = gpu::get_cluster_id();; ++index) {
// Start from the beginning if we run out of ports to check.
if (index >= process.port_count)
index = 0;
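
The effect of the new starting index can be modelled on the host. This sketch assumes a plain `std::vector<bool>` of busy flags in place of the real atomic port locks and a hypothetical `PortTable::open(start)` in place of `Client::open()`. It only reproduces the search order: begin at the caller's cluster (SM) ID, scan upward, and wrap to 0 past the last port.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of the RPC port table. `in_use[i]` stands in
// for the real per-port lock, which the GPU code acquires atomically; this
// single-threaded sketch only reproduces the order in which ports are tried.
struct PortTable {
  std::vector<bool> in_use;

  // Mirrors the scan in Client::open(): start at `start` (the cluster / SM
  // ID), walk upward, and restart from 0 once the index passes the last port.
  // Like the original unbounded loop, this never returns if every port stays
  // busy; in a single-threaded model nothing can free a port.
  uint32_t open(uint32_t start) {
    const uint32_t port_count = static_cast<uint32_t>(in_use.size());
    assert(port_count > 0 && "the table needs at least one port");
    for (uint32_t index = start;; ++index) {
      if (index >= port_count)
        index = 0; // wrap around, as in the original loop
      if (!in_use[index]) {
        in_use[index] = true; // claim the port
        return index;
      }
    }
  }
};
```

With this ordering, warps on SM 0 and SM 5 make their first probe at ports 0 and 5 instead of both starting at port 0. A starting ID at or beyond the port count simply wraps to 0. On AMDGPU and the generic target, `get_cluster_id()` returns 0 in this PR, so the search order there is unchanged.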
2 changes: 1 addition & 1 deletion libc/utils/gpu/server/rpc_server.h
@@ -18,7 +18,7 @@ extern "C" {
#endif

/// The maximum number of ports that can be opened for any server.
-const uint64_t RPC_MAXIMUM_PORT_COUNT = 512;
+const uint64_t RPC_MAXIMUM_PORT_COUNT = 4096;

/// The symbol name associated with the client for use with the LLVM C library
/// implementation.