Skip to content

[libc] Use the NVIDIA device allocator for GPU malloc #124277

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 24, 2025

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Jan 24, 2025

Summary:
This is a blocker on another patch in the OpenMP runtime. The problem is
that NVIDIA truly doesn't handle RPC-based allocations very well. It
cannot reliably update the MMU while a kernel is running and it will
usually deadlock if called from a separate thread due to internal use of
TLS.

This patch just removes the definition of malloc and free for NVPTX.
The result here is that they will be undefined, which is the cue for the
nvlink linker to define them for us. So, as far as libc is concerned
it still implements malloc.

Summary:
This is a blocker on another patch in the OpenMP runtime. The problem is
that NVIDIA truly doesn't handle RPC-based allocations very well. It
cannot reliably update the MMU while a kernel is running and it will
usually deadlock if called from a separate thread due to internal use of
TLS.

This patch just removes the definition of `malloc` and `free` for NVPTX.
The result here is that they will be undefined, which is the cue for the
`nvlink` linker to define them for us. So, as far as `libc` is concerned
it still implements malloc.
@llvmbot
Copy link
Member

llvmbot commented Jan 24, 2025

@llvm/pr-subscribers-libc

Author: Joseph Huber (jhuber6)

Changes

Summary:
This is a blocker on another patch in the OpenMP runtime. The problem is
that NVIDIA truly doesn't handle RPC-based allocations very well. It
cannot reliably update the MMU while a kernel is running and it will
usually deadlock if called from a separate thread due to internal use of
TLS.

This patch just removes the definition of malloc and free for NVPTX.
The result here is that they will be undefined, which is the cue for the
nvlink linker to define them for us. So, as far as libc is concerned
it still implements malloc.


Full diff: https://github.com/llvm/llvm-project/pull/124277.diff

3 Files Affected:

  • (modified) libc/src/stdlib/gpu/free.cpp (+4)
  • (modified) libc/src/stdlib/gpu/malloc.cpp (+4)
  • (modified) libc/test/src/stdlib/CMakeLists.txt (+2-1)
diff --git a/libc/src/stdlib/gpu/free.cpp b/libc/src/stdlib/gpu/free.cpp
index 1f0e9ec7359740..6ef9d718315a5c 100644
--- a/libc/src/stdlib/gpu/free.cpp
+++ b/libc/src/stdlib/gpu/free.cpp
@@ -14,6 +14,10 @@
 
 namespace LIBC_NAMESPACE_DECL {
 
+// FIXME: For now we just default to the NVIDIA device allocator which is
+// always available on NVPTX targets. This will be implemented fully later.
+#ifndef LIBC_TARGET_ARCH_IS_NVPTX
 LLVM_LIBC_FUNCTION(void, free, (void *ptr)) { gpu::deallocate(ptr); }
+#endif
 
 } // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/stdlib/gpu/malloc.cpp b/libc/src/stdlib/gpu/malloc.cpp
index 54f2d8843996ee..b5909cb9cb4d02 100644
--- a/libc/src/stdlib/gpu/malloc.cpp
+++ b/libc/src/stdlib/gpu/malloc.cpp
@@ -14,8 +14,12 @@
 
 namespace LIBC_NAMESPACE_DECL {
 
+// FIXME: For now we just default to the NVIDIA device allocator which is
+// always available on NVPTX targets. This will be implemented fully later.
+#ifndef LIBC_TARGET_ARCH_IS_NVPTX
 LLVM_LIBC_FUNCTION(void *, malloc, (size_t size)) {
   return gpu::allocate(size);
 }
+#endif
 
 } // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/test/src/stdlib/CMakeLists.txt b/libc/test/src/stdlib/CMakeLists.txt
index 8cc0428632ba39..aba76833be9d41 100644
--- a/libc/test/src/stdlib/CMakeLists.txt
+++ b/libc/test/src/stdlib/CMakeLists.txt
@@ -420,7 +420,8 @@ if(LLVM_LIBC_FULL_BUILD)
   )
 
   # Only baremetal and GPU has an in-tree 'malloc' implementation.
-  if(LIBC_TARGET_OS_IS_BAREMETAL OR LIBC_TARGET_OS_IS_GPU)
+  if((LIBC_TARGET_OS_IS_BAREMETAL OR LIBC_TARGET_OS_IS_GPU) AND
+      NOT LIBC_TARGET_ARCHITECTURE_IS_NVPTX)
     add_libc_test(
       malloc_test
       HERMETIC_TEST_ONLY

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 24, 2025

This unblocks #112988

@jhuber6 jhuber6 merged commit 256f40d into llvm:main Jan 24, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants