Skip to content

[OpenMP] Unconditionally provide an RPC client interface for OpenMP #117933

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 2, 2024

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Nov 27, 2024

Summary:
This patch adds an RPC interface that lives directly in the OpenMP
device runtime. This allows OpenMP to implement custom opcodes.
Currently this is only providing the host call interface, which is the
raw version of reverse offloading. Previously this lived in libc/ as
an extension which is not the correct place.

The interface here uses a weak symbol for the RPC client by the same
name that the libc interface uses. This means that it will defer to
the libc one if both are present so we don't need to set up multiple
instances.

The presense of this symbol is what controls whether or not we set up
the RPC server. Because this is an external symbol it normally won't be
optimized out, so there's a special pass in OpenMPOpt that deletes this
symbol if it is unused during linking. That means at O0 the RPC server
will always be present now, but will be removed trivially if it's not
used at O1 and higher.

@llvmbot
Copy link
Member

llvmbot commented Nov 27, 2024

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-offload

Author: Joseph Huber (jhuber6)

Changes

Summary:
This patch adds an RPC interface that lives directly in the OpenMP
device runtime. This allows OpenMP to implement custom opcodes.
Currently this is only providing the host call interface, which is the
raw version of reverse offloading. Previously this lived in libc/ as
an extension which is not the correct place.

The interface here uses a weak symbol for the RPC client by the same
name that the libc interface uses. This means that it will defer to
the libc one if both are present so we don't need to set up multiple
instances.

The presense of this symbol is what controls whether or not we set up
the RPC server. Because this is an external symbol it normally won't be
optimized out, so there's a special pass in OpenMPOpt that deletes this
symbol if it is unused during linking. That means at O0 the RPC server
will always be present now, but will be removed trivially if it's not
used at O1 and higher.


Full diff: https://github.com/llvm/llvm-project/pull/117933.diff

6 Files Affected:

  • (modified) offload/DeviceRTL/CMakeLists.txt (+2)
  • (modified) offload/DeviceRTL/src/Misc.cpp (+22)
  • (modified) offload/DeviceRTL/src/exports (+1)
  • (added) offload/include/Shared/RPCOpcodes.h (+25)
  • (modified) offload/plugins-nextgen/common/src/RPC.cpp (+17)
  • (renamed) offload/test/api/omp_host_call.c (+5-6)
diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt
index c76ad018ab4fe7..3da83e5c307132 100644
--- a/offload/DeviceRTL/CMakeLists.txt
+++ b/offload/DeviceRTL/CMakeLists.txt
@@ -131,6 +131,7 @@ set(bc_flags -c -foffload-lto -std=c++17 -fvisibility=hidden
              -DOMPTARGET_DEVICE_RUNTIME
              -I${include_directory}
              -I${devicertl_base_directory}/../include
+             -I${LLVM_MAIN_SRC_DIR}/../libc
              ${LIBOMPTARGET_LLVM_INCLUDE_DIRS_DEVICERTL}
 )
 
@@ -275,6 +276,7 @@ function(compileDeviceRTLLibrary target_cpu target_name target_triple)
     target_compile_definitions(${ide_target_name} PRIVATE SHARED_SCRATCHPAD_SIZE=512)
     target_include_directories(${ide_target_name} PRIVATE
       ${include_directory}
+      ${LLVM_MAIN_SRC_DIR}/../libc
       ${devicertl_base_directory}/../include
       ${LIBOMPTARGET_LLVM_INCLUDE_DIRS}
     )
diff --git a/offload/DeviceRTL/src/Misc.cpp b/offload/DeviceRTL/src/Misc.cpp
index 8e690f6fd8e7ce..c1df477365bcb6 100644
--- a/offload/DeviceRTL/src/Misc.cpp
+++ b/offload/DeviceRTL/src/Misc.cpp
@@ -12,6 +12,8 @@
 #include "Allocator.h"
 #include "Configuration.h"
 #include "DeviceTypes.h"
+#include "Shared/RPCOpcodes.h"
+#include "shared/rpc.h"
 
 #include "Debug.h"
 
@@ -110,6 +112,12 @@ void *indirectCallLookup(void *HstPtr) {
   return HstPtr;
 }
 
+/// The openmp client instance used to communicate with the server.
+/// FIXME: This is marked as 'retain' so that it is not removed via
+/// `-mlink-builtin-bitcode`
+[[gnu::visibility("protected"), gnu::weak,
+  gnu::retain]] rpc::Client Client asm("__llvm_rpc_client");
+
 } // namespace impl
 } // namespace ompx
 
@@ -156,6 +164,20 @@ void omp_free(void *ptr, omp_allocator_handle_t allocator) {
     return;
   }
 }
+
+unsigned long long __llvm_omp_host_call(void *fn, void *data, size_t size) {
+  rpc::Client::Port Port = ompx::impl::Client.open<OFFLOAD_HOST_CALL>();
+  Port.send_n(data, size);
+  Port.send([=](rpc::Buffer *buffer, uint32_t) {
+    buffer->data[0] = reinterpret_cast<uintptr_t>(fn);
+  });
+  unsigned long long Ret;
+  Port.recv([&](rpc::Buffer *Buffer, uint32_t) {
+    Ret = static_cast<unsigned long long>(Buffer->data[0]);
+  });
+  Port.close();
+  return Ret;
+}
 }
 
 ///}
diff --git a/offload/DeviceRTL/src/exports b/offload/DeviceRTL/src/exports
index 288ddf90b4a9f2..01667e7aba827a 100644
--- a/offload/DeviceRTL/src/exports
+++ b/offload/DeviceRTL/src/exports
@@ -15,4 +15,5 @@ malloc
 free
 memcmp
 printf
+__llvm_rpc_client
 __assert_fail
diff --git a/offload/include/Shared/RPCOpcodes.h b/offload/include/Shared/RPCOpcodes.h
new file mode 100644
index 00000000000000..beee29df1f7076
--- /dev/null
+++ b/offload/include/Shared/RPCOpcodes.h
@@ -0,0 +1,25 @@
+//===-- Shared/RPCOpcodes.h - Offload specific RPC opcodes ----- C++ ------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Defines RPC opcodes that are specifically used by the OpenMP device runtime.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef OMPTARGET_SHARED_RPC_OPCODES_H
+#define OMPTARGET_SHARED_RPC_OPCODES_H
+
+#define LLVM_OFFLOAD_RPC_BASE 'o'
+#define LLVM_OFFLOAD_OPCODE(n) (LLVM_OFFLOAD_RPC_BASE << 24 | n)
+
+typedef enum {
+  OFFLOAD_HOST_CALL = LLVM_OFFLOAD_OPCODE(0),
+} offload_opcode_t;
+
+#undef LLVM_OFFLOAD_OPCODE
+
+#endif // OMPTARGET_SHARED_RPC_OPCODES_H
diff --git a/offload/plugins-nextgen/common/src/RPC.cpp b/offload/plugins-nextgen/common/src/RPC.cpp
index c35431da69eb65..38509b5d78e615 100644
--- a/offload/plugins-nextgen/common/src/RPC.cpp
+++ b/offload/plugins-nextgen/common/src/RPC.cpp
@@ -9,6 +9,7 @@
 #include "RPC.h"
 
 #include "Shared/Debug.h"
+#include "Shared/RPCOpcodes.h"
 
 #include "PluginInterface.h"
 
@@ -93,6 +94,22 @@ Error RPCServerTy::runServer(plugin::GenericDeviceTy &Device) {
     });
     break;
   }
+  case OFFLOAD_HOST_CALL: {
+    uint64_t Sizes[64] = {0};
+    unsigned long long Results[64] = {0};
+    void *Args[64] = {nullptr};
+    Port->recv_n(Args, Sizes, [&](uint64_t Size) { return new char[Size]; });
+    Port->recv([&](rpc::Buffer *buffer, uint32_t ID) {
+      using FuncPtrTy = unsigned long long (*)(void *);
+      auto Func = reinterpret_cast<FuncPtrTy>(buffer->data[0]);
+      Results[ID] = Func(Args[ID]);
+    });
+    Port->send([&](rpc::Buffer *Buffer, uint32_t ID) {
+      Buffer->data[0] = static_cast<uint64_t>(Results[ID]);
+      delete[] reinterpret_cast<char *>(Args[ID]);
+    });
+    break;
+  }
   default:
     // Let the `libc` library handle any other unhandled opcodes.
     Status = handle_libc_opcodes(*Port, Device.getWarpSize());
diff --git a/offload/test/libc/host_call.c b/offload/test/api/omp_host_call.c
similarity index 82%
rename from offload/test/libc/host_call.c
rename to offload/test/api/omp_host_call.c
index 61c4e14d5b3881..a3d1a97822b31b 100644
--- a/offload/test/libc/host_call.c
+++ b/offload/test/api/omp_host_call.c
@@ -1,20 +1,18 @@
 // RUN: %libomptarget-compile-run-and-check-generic
 
-// REQUIRES: libc
-
 #include <assert.h>
 #include <omp.h>
 #include <stdio.h>
 
 #pragma omp begin declare variant match(device = {kind(gpu)})
 // Extension provided by the 'libc' project.
-unsigned long long rpc_host_call(void *fn, void *args, size_t size);
-#pragma omp declare target to(rpc_host_call) device_type(nohost)
+unsigned long long __llvm_omp_host_call(void *fn, void *args, size_t size);
+#pragma omp declare target to(__llvm_omp_host_call) device_type(nohost)
 #pragma omp end declare variant
 
 #pragma omp begin declare variant match(device = {kind(cpu)})
 // Dummy host implementation to make this work for all targets.
-unsigned long long rpc_host_call(void *fn, void *args, size_t size) {
+unsigned long long __llvm_omp_host_call(void *fn, void *args, size_t size) {
   return ((unsigned long long (*)(void *))fn)(args);
 }
 #pragma omp end declare variant
@@ -58,7 +56,8 @@ int main() {
 #pragma omp parallel num_threads(2)
   {
     args_t args = {omp_get_thread_num(), omp_get_team_num()};
-    unsigned long long res = rpc_host_call(fn_ptr, &args, sizeof(args_t));
+    unsigned long long res =
+        __llvm_omp_host_call(fn_ptr, &args, sizeof(args_t));
     printf("Result: %d\n", (int)res);
   }
 }

@jplehr
Copy link
Contributor

jplehr commented Nov 28, 2024

Thank you. IMHO that's easier to understand.

Summary:
This patch adds an RPC interface that lives directly in the OpenMP
device runtime. This allows OpenMP to implement custom opcodes.
Currently this is only providing the host call interface, which is the
raw version of reverse offloading. Previously this lived in `libc/` as
an extension which is not the correct place.

The interface here uses a weak symbol for the RPC client by the same
name that the `libc` interface uses. This means that it will defer to
the libc one if both are present so we don't need to set up multiple
instances.

The presense of this symbol is what controls whether or not we set up
the RPC server. Because this is an external symbol it normally won't be
optimized out, so there's a special pass in OpenMPOpt that deletes this
symbol if it is unused during linking. That means at `O0` the RPC server
will always be present now, but will be removed trivially if it's not
used at O1 and higher.
Copy link

github-actions bot commented Dec 2, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Dec 2, 2024

@ronlieb @jplehr This applies cleanly to amd-staging and works fine from my tests.

Copy link
Contributor

@jplehr jplehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
(and hopefully passes our buildbots now)

@jhuber6 jhuber6 merged commit 91f5f97 into llvm:main Dec 2, 2024
6 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Dec 2, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building offload at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/9461

Here is the relevant piece of the build log for the reference
Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: sanitizer/ptr_outside_alloc_2.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/ptr_outside_alloc_2.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/ptr_outside_alloc_2.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/ptr_outside_alloc_2.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/ptr_outside_alloc_2.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# RUN: at line 3
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_ALLOCATION_TRACES=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/ptr_outside_alloc_2.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/ptr_outside_alloc_2.c --check-prefixes=CHECK
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_ALLOCATION_TRACES=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/ptr_outside_alloc_2.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/ptr_outside_alloc_2.c --check-prefixes=CHECK
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/ptr_outside_alloc_2.c:21:11: error: CHECK: expected string not found in input
# | // CHECK: OFFLOAD ERROR: Memory access fault by GPU {{.*}} (agent 0x{{.*}}) at virtual address [[PTR:0x[0-9a-z]*]]. Reasons: {{.*}}
# |           ^
# | <stdin>:1:1: note: scanning from here
# | AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.
# | ^
# | 
# | Input file: <stdin>
# | Check file: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/ptr_outside_alloc_2.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |           1: AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events. 
# | check:21     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |           2: AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events. 
# | check:21     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           3: "PluginInterface" error: Failure to allocate device memory: Failed to allocate from memory manager 
# | check:21     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           4: omptarget error: Call to getTargetPointer returned null pointer (device failure or illegal mapping). 
# | check:21     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           5: omptarget error: Call to targetDataBegin failed, abort target. 
# | check:21     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           6: omptarget error: Failed to process data before launching the kernel. 
# | check:21     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           .
# |           .
# |           .
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

********************


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants