[mlir][gpu] Use alloc OP's `host_shared` in cuda runtime #99035

grypp · 2024-07-16T13:21:12Z

`host_shared` on `gpu.alloc` means the memory will be available on both the host and the device, which corresponds to managed memory on the NVIDIA side. However, `host_shared` is currently unused in the runtime: `mgpuMemAlloc` ignores its `isHostShared` argument. This PR uses it to call `cuMemAllocManaged` instead of `cuMemAlloc`.
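The branch order of the patched `mgpuMemAlloc` can be summarized in a small, CUDA-free C++ sketch. The `AllocPath` enum, `selectAllocPath`, and `checkAllocPaths` are hypothetical names introduced only for illustration; the real wrapper calls `cuMemAllocManaged` or `cuMemAlloc` directly and returns the resulting pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum naming the three outcomes of the patched mgpuMemAlloc.
// Illustration only: the real wrapper talks to the CUDA driver API directly.
enum class AllocPath {
  None,    // sizeBytes == 0: return a null pointer without calling CUDA.
  Managed, // host_shared: cuMemAllocManaged(&ptr, sizeBytes, CU_MEM_ATTACH_GLOBAL).
  Device   // default: cuMemAlloc(&ptr, sizeBytes), device-only memory.
};

// Hypothetical helper mirroring the control flow in the diff: the zero-size
// check comes first, then isHostShared selects managed vs. device memory.
AllocPath selectAllocPath(uint64_t sizeBytes, bool isHostShared) {
  if (sizeBytes == 0)
    return AllocPath::None;
  if (isHostShared)
    return AllocPath::Managed;
  return AllocPath::Device;
}

// Hypothetical spot checks covering each branch.
void checkAllocPaths() {
  assert(selectAllocPath(0, true) == AllocPath::None);
  assert(selectAllocPath(16, true) == AllocPath::Managed);
  assert(selectAllocPath(16, false) == AllocPath::Device);
}
```

A zero-byte request returns early regardless of `host_shared`, matching the early `return` added in the diff; only non-empty allocations reach the managed-vs-device choice.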

llvmbot · 2024-07-16T13:21:44Z

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-gpu

Author: Guray Ozen (grypp)

Changes

`host_shared` on `gpu.alloc` means the memory will be available on host and device. This means managed memory on the NVIDIA side. However, `host_shared` is unused in the runtime. This PR uses it to call `cuMemAllocManaged`.

Full diff: https://github.com/llvm/llvm-project/pull/99035.diff

2 Files Affected:

(modified) mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp (+10-3)
(added) mlir/test/Integration/GPU/CUDA/alloc-host-shared.mlir (+27)

```diff
diff --git a/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp b/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
index 09dc30365e37c..6a32309aa9e05 100644
--- a/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
+++ b/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
@@ -237,11 +237,18 @@ extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuEventRecord(CUevent event,
 }
 
 extern "C" MLIR_CUDA_WRAPPERS_EXPORT void *
-mgpuMemAlloc(uint64_t sizeBytes, CUstream /*stream*/, bool /*isHostShared*/) {
+mgpuMemAlloc(uint64_t sizeBytes, CUstream stream, bool isHostShared) {
   ScopedContext scopedContext;
   CUdeviceptr ptr = 0;
-  if (sizeBytes != 0)
-    CUDA_REPORT_IF_ERROR(cuMemAlloc(&ptr, sizeBytes));
+  if (sizeBytes == 0)
+    return reinterpret_cast<void *>(ptr);
+
+  if (isHostShared) {
+    CUDA_REPORT_IF_ERROR(
+        cuMemAllocManaged(&ptr, sizeBytes, CU_MEM_ATTACH_GLOBAL));
+    return reinterpret_cast<void *>(ptr);
+  }
+  CUDA_REPORT_IF_ERROR(cuMemAlloc(&ptr, sizeBytes));
   return reinterpret_cast<void *>(ptr);
 }
 
diff --git a/mlir/test/Integration/GPU/CUDA/alloc-host-shared.mlir b/mlir/test/Integration/GPU/CUDA/alloc-host-shared.mlir
new file mode 100644
index 0000000000000..77fa0deffdd69
--- /dev/null
+++ b/mlir/test/Integration/GPU/CUDA/alloc-host-shared.mlir
@@ -0,0 +1,27 @@
+// RUN: mlir-opt %s \
+// RUN: | mlir-opt -gpu-lower-to-nvvm-pipeline="cubin-format=%gpu_compilation_format" \
+// RUN: | mlir-cpu-runner \
+// RUN:   --shared-libs=%mlir_cuda_runtime \
+// RUN:   --shared-libs=%mlir_runner_utils \
+// RUN:   --entry-point-result=void \
+// RUN: | FileCheck %s
+
+// CHECK: 2000
+module attributes {gpu.container_module} {
+  func.func @main() {
+    %c1 = arith.constant 1 : index
+    %c0 = arith.constant 0 : index
+    %c1000_i32 = arith.constant 1000 : i32
+    %memref = gpu.alloc  host_shared () : memref<1xi32>
+    memref.store %c1000_i32, %memref[%c1] : memref<1xi32>
+    gpu.launch blocks(%arg0, %arg1, %arg2) in (%arg6 = %c1, %arg7 = %c1, %arg8 = %c1) threads(%arg3, %arg4, %arg5) in (%arg9 = %c1, %arg10 = %c1, %arg11 = %c1) {
+      %1 = memref.load %memref[%c1] : memref<1xi32>
+      %2 = arith.addi %1, %1 : i32
+      memref.store %2, %memref[%c1] : memref<1xi32>
+      gpu.terminator
+    }
+    %0 = memref.load %memref[%c1] : memref<1xi32>
+    vector.print %0 : i32
+    return
+  }
+}
```

llvm-ci · 2024-07-17T05:31:20Z

LLVM Buildbot has detected a new failure on builder mlir-nvidia running on mlir-nvidia while building mlir at step 6 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/138/builds/1358

Here is the relevant piece of the build log for reference:

Step 6 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
```text
******************** TEST 'MLIR :: Integration/GPU/CUDA/alloc-host-shared.mlir' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/alloc-host-shared.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-lower-to-nvvm-pipeline="cubin-format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-cpu-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/alloc-host-shared.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/alloc-host-shared.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-lower-to-nvvm-pipeline=cubin-format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-cpu-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void
# .---command stderr------------
# | JIT session error: Symbols not found: [ printNewline, printI64 ]
# | Error: Failed to materialize symbols: { (main, { main, _mlir_main }) }
# `-----------------------------
# error: command failed with exit status: 1
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/alloc-host-shared.mlir
# .---command stderr------------
# | FileCheck error: '<stdin>' is empty.
# | FileCheck command line:  /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/alloc-host-shared.mlir
# `-----------------------------
# error: command failed with exit status: 2

--

********************
```


Summary: This fixes the integration test that was broken by #99035. Differential Revision: https://phabricator.intern.facebook.com/D60251693

[mlir][gpu] Use alloc OP's host_shared in cuda runtime

232d5ed


grypp requested a review from aartbik July 16, 2024 13:21

llvmbot added mlir:gpu mlir mlir:execution-engine labels Jul 16, 2024

grypp requested review from matthias-springer and dcaballe July 16, 2024 13:22

matthias-springer approved these changes Jul 16, 2024


grypp merged commit 20861f1 into llvm:main Jul 17, 2024
9 of 10 checks passed

grypp added a commit that referenced this pull request Jul 17, 2024

[mlir][gpu] Add mlir_c_runner_utils to fix #99035

f2251f9

This fixes the integration test that was broken by #99035.

yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024

[mlir][gpu] Use alloc OP's host_shared in cuda runtime (#99035)

77bfd81

Differential Revision: https://phabricator.intern.facebook.com/D60250900
