Commit a7cdea7

[mlir][gpu] Add documentation for the new GPU compilation mechanism
Adds documentation to the GPU dialect docs giving a general overview of the new compilation mechanism introduced in the patch series ending in D154153.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D157461
1 parent b43068e commit a7cdea7

1 file changed: +99, -0 lines changed


mlir/docs/Dialects/GPU.md

@@ -36,6 +36,105 @@ instead; we chose not to use `alloca`-style approach that would require more
complex lifetime analysis following the principles of MLIR that promote
structure and representing analysis results in the IR.

## GPU Compilation

### Deprecation notice

The `--gpu-to-cubin` and `--gpu-to-hsaco` passes will be deprecated in a future
release.

### Compilation overview

The compilation process in the GPU dialect has two main stages: GPU module
serialization and translation of offloading operations. Together, these stages
can produce GPU binaries and the host code needed to execute them.

An example of the compilation workflow for an NVVM target:

```
# Pipeline steps, in order:
#   nvvm-attach-target{chip=sm_90 O=3}: attach an NVVM target to a gpu.module op.
#   gpu.module(convert-gpu-to-nvvm):    convert GPU to NVVM.
#   gpu-to-llvm:                        convert GPU to LLVM.
#   gpu-module-to-binary:               serialize GPU modules to binaries.
mlir-opt example.mlir \
  --pass-pipeline="builtin.module(nvvm-attach-target{chip=sm_90 O=3},gpu.module(convert-gpu-to-nvvm),gpu-to-llvm,gpu-module-to-binary)" \
  -o example-nvvm.mlir
# Obtain the translated LLVM IR.
mlir-translate example-nvvm.mlir --mlir-to-llvmir -o example.ll
```

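If you drive this workflow from a script, you can build the pipeline programmatically instead of quoting it by hand. The sketch below is a hypothetical Python helper. The names `gpu_pipeline`, `compile_commands` and `run` are illustrative and not part of MLIR. It reuses only the pass names and flags from the example above, and it assumes `mlir-opt` and `mlir-translate` are on `PATH` when `run` is called.

```python
import shlex
import shutil
import subprocess


def gpu_pipeline(chip: str = "sm_90", opt_level: int = 3) -> str:
    """Build the --pass-pipeline value used in the NVVM workflow above."""
    passes = [
        f"nvvm-attach-target{{chip={chip} O={opt_level}}}",  # attach an NVVM target
        "gpu.module(convert-gpu-to-nvvm)",  # convert GPU to NVVM
        "gpu-to-llvm",  # convert GPU to LLVM
        "gpu-module-to-binary",  # serialize GPU modules to binaries
    ]
    return "builtin.module(" + ",".join(passes) + ")"


def compile_commands(stem: str, chip: str = "sm_90", opt_level: int = 3) -> list[list[str]]:
    """Return the mlir-opt and mlir-translate argument vectors for `<stem>.mlir`."""
    opt_out = f"{stem}-nvvm.mlir"
    return [
        ["mlir-opt", f"{stem}.mlir",
         f"--pass-pipeline={gpu_pipeline(chip, opt_level)}", "-o", opt_out],
        ["mlir-translate", opt_out, "--mlir-to-llvmir", "-o", f"{stem}.ll"],
    ]


def run(stem: str) -> None:
    """Hypothetical driver: run both steps, stopping at the first failing tool."""
    for tool in ("mlir-opt", "mlir-translate"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    for cmd in compile_commands(stem):
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the equivalent shell commands without executing anything.
    for cmd in compile_commands("example"):
        print(shlex.join(cmd))
```

Passing argument lists to `subprocess.run` avoids the shell's quoting and line-continuation pitfalls. The pipeline string reaches `mlir-opt` exactly as built.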
### Module serialization

Attributes that implement the GPU Target Attribute Interface handle the
serialization process and are called Target attributes. These attributes can be
attached to GPU modules to indicate the serialization scheme used to compile the
module into a binary string.

The `gpu-module-to-binary` pass searches for all nested GPU modules and
serializes each one using the target attributes attached to it, producing a
binary with one object per target.

Example:
```
// Input:
gpu.module @kernels [#nvvm.target<chip = "sm_90">, #nvvm.target<chip = "sm_60">] {
  ...
}
// mlir-opt --gpu-module-to-binary:
gpu.binary @kernels [
  #gpu.object<#nvvm.target<chip = "sm_90">, "sm_90 cubin">,
  #gpu.object<#nvvm.target<chip = "sm_60">, "sm_60 cubin">
]
```

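The following toy Python model may help build intuition for what `gpu-module-to-binary` does to a single module. It is not MLIR's C++ API. `Target`, `GPUModule`, `GPUObject`, `GPUBinary` and `module_to_binary` are hypothetical names. `Target.serialize` returns a placeholder string at the point where a real target attribute would run its backend.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Target:
    """Hypothetical stand-in for a target attribute, e.g. #nvvm.target<chip = "sm_90">."""

    dialect: str
    chip: str

    def serialize(self, module_name: str) -> str:
        # A real target attribute would run its backend here; this returns a placeholder.
        return f"{self.chip} binary of {module_name}"


@dataclass
class GPUModule:
    name: str
    targets: list[Target]


@dataclass
class GPUObject:
    target: Target
    blob: str


@dataclass
class GPUBinary:
    name: str
    objects: list[GPUObject] = field(default_factory=list)


def module_to_binary(module: GPUModule) -> GPUBinary:
    """Serialize `module` once per attached target; the binary keeps the module's name."""
    objects = [GPUObject(t, t.serialize(module.name)) for t in module.targets]
    return GPUBinary(name=module.name, objects=objects)


if __name__ == "__main__":
    kernels = GPUModule("kernels", [Target("nvvm", "sm_90"), Target("nvvm", "sm_60")])
    for obj in module_to_binary(kernels).objects:
        print(obj.target.chip, "->", obj.blob)
```

The model keeps the module's symbol name on the binary. This matches the example above, where `gpu.module @kernels` becomes `gpu.binary @kernels`.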
### Offloading LLVM translation

Attributes that implement the GPU Offloading LLVM Translation Attribute
Interface handle the translation of GPU binaries and kernel launches into LLVM
instructions and are called Offloading attributes. These attributes are
attached to GPU binary operations.

During LLVM translation, each GPU binary is translated into LLVM instructions
using the scheme provided by its Offloading attribute. Kernel launches are
translated by looking up the binary that contains the kernel and invoking the
launch-translation procedure provided by that binary's Offloading attribute.

Example:
```
// Input:
// Binary with multiple objects but selecting the second one for embedding.
gpu.binary @binary <#gpu.select_object<#rocdl.target<chip = "gfx90a">>> [
  #gpu.object<#nvvm.target, "NVPTX">,
  #gpu.object<#rocdl.target<chip = "gfx90a">, "AMDGPU">
]
llvm.func @foo() {
  ...
  // Launching a kernel inside the binary.
  gpu.launch_func @binary::@func blocks in (%0, %0, %0)
                                 threads in (%0, %0, %0) : i64
                                 dynamic_shared_memory_size %2
                                 args(%1 : i32, %1 : i32)
  ...
}
// mlir-translate --mlir-to-llvmir:
@binary_bin_cst = internal constant [6 x i8] c"AMDGPU", align 8
@binary_func_kernel_name = private unnamed_addr constant [5 x i8] c"func\00", align 1
...
define void @foo() {
  ...
  %module = call ptr @mgpuModuleLoad(ptr @binary_bin_cst)
  %kernel = call ptr @mgpuModuleGetFunction(ptr %module, ptr @binary_func_kernel_name)
  call void @mgpuLaunchKernel(ptr %kernel, ...) ; Launch the kernel
  ...
  call void @mgpuModuleUnload(ptr %module)
  ...
}
...
```

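The lookup-and-delegate structure of launch translation can be sketched as follows. This is a hypothetical Python model, not MLIR's translation interface. `SelectObjectModel`, `launch_kernel` and `translate_launch_func` are illustrative names. The emitted lines reproduce the `mgpu*` call sequence from the translated example above, with the launch arguments still elided.

```python
class SelectObjectModel:
    """Toy stand-in for an offloading attribute attached to a gpu.binary."""

    def launch_kernel(self, binary: str, kernel: str) -> list[str]:
        # Mirrors the host-side calls in the translated example; launch args elided.
        return [
            f"%module = call ptr @mgpuModuleLoad(ptr @{binary}_bin_cst)",
            "%kernel = call ptr @mgpuModuleGetFunction(ptr %module, "
            f"ptr @{binary}_{kernel}_kernel_name)",
            "call void @mgpuLaunchKernel(ptr %kernel, ...)",
            "call void @mgpuModuleUnload(ptr %module)",
        ]


def translate_launch_func(callee: str, binaries: dict[str, SelectObjectModel]) -> list[str]:
    """Resolve `@binary::@kernel`, then delegate to that binary's attribute."""
    binary_ref, kernel_ref = callee.split("::")
    binary, kernel = binary_ref.lstrip("@"), kernel_ref.lstrip("@")
    if binary not in binaries:
        raise KeyError(f"no gpu.binary named @{binary}")
    # The launch is lowered by whatever attribute the binary carries.
    return binaries[binary].launch_kernel(binary, kernel)


if __name__ == "__main__":
    for line in translate_launch_func("@binary::@func", {"binary": SelectObjectModel()}):
        print(line)
```

In this model the launch procedure belongs to the binary's attribute. Swapping the attribute therefore changes how launches are lowered, while the launch operation itself stays the same.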
### The binary operation

Semantically, GPU binaries can represent many concepts, from simple object
files to fat binaries. By default, the binary operation uses the
`#gpu.select_object` offloading attribute, which embeds a single object from the
binary as a global string; see the attribute documentation for more
information.

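Here is a hypothetical Python sketch of the embedding step, where one object is stored as a global string. It assumes each object is identified by a plain target string, which simplifies real target attributes. The `@<binary>_bin_cst` naming and `align 8` are copied from the translated example above. Bytes are escaped following the LLVM IR `c"..."` convention: printable ASCII is kept, and everything else, including `"` and `\`, is written as `\XX` hex.

```python
def llvm_c_string(data: bytes) -> str:
    """Escape bytes for an LLVM IR c"..." literal; non-printables become \\XX."""
    parts = []
    for b in data:
        # 0x22 is '"' and 0x5C is '\', both of which must be escaped.
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):
            parts.append(chr(b))
        else:
            parts.append(f"\\{b:02X}")
    return "".join(parts)


def select_object_global(binary: str, objects: list[tuple[str, bytes]], target: str) -> str:
    """Embed the single object built for `target` as an internal global string."""
    matches = [blob for tgt, blob in objects if tgt == target]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one object for {target!r}, found {len(matches)}")
    blob = matches[0]
    return (
        f"@{binary}_bin_cst = internal constant "
        f'[{len(blob)} x i8] c"{llvm_c_string(blob)}", align 8'
    )


if __name__ == "__main__":
    objs = [("nvptx", b"NVPTX"), ("gfx90a", b"AMDGPU")]
    print(select_object_global("binary", objs, "gfx90a"))
```

The `[N x i8]` length comes from the raw bytes, not the escaped text, because each `\XX` escape stands for a single byte.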
## Operations

[include "Dialects/GPUOps.md"]
