instead; we chose not to use an `alloca`-style approach, which would require
more complex lifetime analysis, following the MLIR principles of promoting
structure and representing analysis results in the IR.

## GPU Compilation
### Deprecation notice
The `--gpu-to-(cubin|hsaco)` passes will be deprecated in a future release.

### Compilation overview
The compilation process in the GPU dialect has two main stages: GPU module
serialization and translation of offloading operations. Together, these stages
can produce GPU binaries and the code needed to execute them.

The following is an example of the compilation workflow:

```
# Lower and serialize the GPU code with mlir-opt:
#   nvvm-attach-target               attach an NVVM target to each gpu.module op,
#   gpu.module(convert-gpu-to-nvvm)  convert GPU ops inside gpu.module to NVVM,
#   gpu-to-llvm                      convert the host-side GPU ops to LLVM,
#   gpu-module-to-binary             serialize the GPU modules to binaries.
mlir-opt example.mlir \
  --pass-pipeline="builtin.module( \
    nvvm-attach-target{chip=sm_90 O=3}, \
    gpu.module(convert-gpu-to-nvvm), \
    gpu-to-llvm, \
    gpu-module-to-binary \
  )" -o example-nvvm.mlir
# Translate the result to LLVM IR.
mlir-translate example-nvvm.mlir \
  --mlir-to-llvmir \
  -o example.ll
```

### Module serialization
Attributes implementing the GPU Target Attribute Interface handle the
serialization process and are called Target attributes. These attributes can be
attached to GPU modules to indicate the serialization scheme used to compile
the module into a binary string.

The `gpu-module-to-binary` pass searches for all nested GPU modules and
serializes each one using the target attributes attached to it, producing a
binary with one object for every target.

Example:
```
// Input:
gpu.module @kernels [#nvvm.target<chip = "sm_90">, #nvvm.target<chip = "sm_60">] {
  ...
}
// mlir-opt --gpu-module-to-binary:
gpu.binary @kernels [
  #gpu.object<#nvvm.target<chip = "sm_90">, "sm_90 cubin">,
  #gpu.object<#nvvm.target<chip = "sm_60">, "sm_60 cubin">
]
```
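
As a rough mental model, and not MLIR's implementation, the pass can be pictured as the C++ sketch below. `ToyTarget`, `ToyGpuModule`, `ToyObject`, `ToyBinary`, and `moduleToBinary` are hypothetical names: `ToyTarget::serialize` stands in for a target attribute's serialization hook and only produces a placeholder string, while `moduleToBinary` keeps the module's symbol name and emits one object per attached target, in attribute order, as in the `@kernels` example above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an attribute implementing the GPU Target
// Attribute Interface (e.g. #nvvm.target<chip = "sm_90">). Not MLIR's API.
struct ToyTarget {
  std::string dialect; // e.g. "nvvm" or "rocdl"
  std::string chip;    // e.g. "sm_90"

  // Placeholder serialization: a real target would run a backend to produce
  // device code. Here we only tag the module name with the chip.
  std::string serialize(const std::string &moduleName) const {
    return chip + " binary for @" + moduleName;
  }
};

// Toy model of a gpu.module carrying a list of target attributes.
struct ToyGpuModule {
  std::string name;
  std::vector<ToyTarget> targets;
};

// Toy model of #gpu.object<target, "bytes">.
struct ToyObject {
  ToyTarget target;
  std::string bytes;
};

// Toy model of the resulting gpu.binary op.
struct ToyBinary {
  std::string name;
  std::vector<ToyObject> objects;
};

// Mirrors the idea of gpu-module-to-binary: one object per target, in
// attribute order; the binary keeps the module's symbol name.
ToyBinary moduleToBinary(const ToyGpuModule &module) {
  ToyBinary binary{module.name, {}};
  for (const ToyTarget &target : module.targets)
    binary.objects.push_back({target, target.serialize(module.name)});
  return binary;
}
```

For instance, a module carrying `sm_90` and `sm_60` targets yields a two-object binary whose second object is the `sm_60` placeholder. Producing one object per target is what lets a single `gpu.binary` behave like a fat binary; which object is actually used is left to the offloading attribute.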

### Offloading LLVM translation
Attributes implementing the GPU Offloading LLVM Translation Attribute Interface
handle the translation of GPU binaries and kernel launches into LLVM
instructions and are called Offloading attributes. These attributes are
attached to GPU binary operations.

During LLVM translation, each GPU binary is translated into LLVM instructions
using the scheme provided by its Offloading attribute. Kernel launches are
translated by looking up the binary that contains the kernel and invoking the
procedure provided by that binary's Offloading attribute, which translates the
launch into LLVM instructions.

Example:
+ ```
100
+ // Input:
101
+ // Binary with multiple objects but selecting the second one for embedding.
102
+ gpu.binary @binary <#gpu.select_object<#rocdl.target<chip = "gfx90a">>> [
103
+ #gpu.object<#nvvm.target, "NVPTX">,
104
+ #gpu.object<#rocdl.target<chip = "gfx90a">, "AMDGPU">
105
+ ]
106
+ llvm.func @foo() {
107
+ ...
108
+ // Launching a kernel inside the binary.
109
+ gpu.launch_func @binary::@func blocks in (%0, %0, %0)
110
+ threads in (%0, %0, %0) : i64
111
+ dynamic_shared_memory_size %2
112
+ args(%1 : i32, %1 : i32)
113
+ ...
114
+ }
115
+ // mlir-translate --mlir-to-llvmir:
116
+ @binary_bin_cst = internal constant [6 x i8] c"AMDGPU", align 8
117
+ @binary_func_kernel_name = private unnamed_addr constant [7 x i8] c"func\00", align 1
118
+ ...
119
+ define void @foo() {
120
+ ...
121
+ %module = call ptr @mgpuModuleLoad(ptr @binary_bin_cst)
122
+ %kernel = call ptr @mgpuModuleGetFunction(ptr %module, ptr @binary_func_kernel_name)
123
+ call void @mgpuLaunchKernel(ptr %kernel, ...) ; Launch the kernel
124
+ ...
125
+ call void @mgpuModuleUnload(ptr %module)
126
+ ...
127
+ }
128
+ ...
129
+ ```
130
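
The host code in the translated `@foo` follows a load, look-up, launch, unload sequence. The C++ sketch below replays that sequence against a hypothetical in-memory runtime; `ToyRuntime`, its methods, and `launchFromBinary` are illustrative stand-ins for the `mgpu*` calls and do not reproduce their real signatures or behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory runtime; its methods stand in for mgpuModuleLoad,
// mgpuModuleGetFunction, mgpuLaunchKernel and mgpuModuleUnload, but do not
// match their real signatures.
struct ToyRuntime {
  // Embedded binary contents -> names of the kernels it contains.
  std::map<std::string, std::vector<std::string>> images;
  // Runtime calls, recorded in the order they were made.
  std::vector<std::string> log;

  std::string moduleLoad(const std::string &blob) {
    if (images.count(blob) == 0)
      throw std::runtime_error("unknown binary: " + blob);
    log.push_back("load " + blob);
    return blob; // in this toy the blob doubles as the module handle
  }

  std::string moduleGetFunction(const std::string &moduleHandle,
                                const std::string &name) {
    const std::vector<std::string> &kernels = images.at(moduleHandle);
    if (std::find(kernels.begin(), kernels.end(), name) == kernels.end())
      throw std::runtime_error("kernel not found: " + name);
    log.push_back("get " + name);
    return moduleHandle + "::" + name; // toy kernel handle
  }

  void launchKernel(const std::string &kernel) {
    log.push_back("launch " + kernel);
  }

  void moduleUnload(const std::string &moduleHandle) {
    log.push_back("unload " + moduleHandle);
  }
};

// Replays the sequence emitted for @foo: load the embedded object, look up
// the kernel by name, launch it, then unload the module.
void launchFromBinary(ToyRuntime &rt, const std::string &blob,
                      const std::string &kernelName) {
  std::string moduleHandle = rt.moduleLoad(blob);
  std::string kernel = rt.moduleGetFunction(moduleHandle, kernelName);
  rt.launchKernel(kernel);
  rt.moduleUnload(moduleHandle);
}
```

Calling `launchFromBinary(rt, "AMDGPU", "func")` on a runtime that knows the `"AMDGPU"` image records the four calls in the same order as the LLVM IR above.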

### The binary operation
From a semantic point of view, GPU binaries allow the implementation of many
concepts, from simple object files to fat binaries. By default, the binary
operation uses the `#gpu.select_object` offloading attribute; this attribute
embeds a single object in the binary as a global string. See the attribute
documentation for more information.

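To illustrate what embedding "a single object" involves, the C++ sketch below models the selection step. It is a toy, not the attribute's implementation: objects are reduced to hypothetical `ToyGpuObject` records whose target is a made-up string such as `"rocdl:gfx90a"`, and `selectObject` picks an object by target (as `#gpu.select_object<#rocdl.target<chip = "gfx90a">>` does in the offloading example), by position, or, when no selector is given, returns the first object. Selection by index and the first-object default are assumptions of this sketch; check the `#gpu.select_object` docs for the exact rules.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical model of #gpu.object<target, "payload">; the target is
// reduced to a string such as "rocdl:gfx90a".
struct ToyGpuObject {
  std::string target;
  std::string payload;
};

// What the selection is keyed on: nothing (use the default), an object
// index, or a target. Index selection is an assumption of this sketch.
using ToySelector = std::variant<std::monostate, std::size_t, std::string>;

// Returns the single object whose payload would be embedded as a global
// string during LLVM translation.
const ToyGpuObject &selectObject(const std::vector<ToyGpuObject> &objects,
                                 const ToySelector &selector) {
  if (objects.empty())
    throw std::invalid_argument("binary has no objects");
  if (std::holds_alternative<std::monostate>(selector))
    return objects.front(); // assumed default: the first object
  if (const std::size_t *index = std::get_if<std::size_t>(&selector)) {
    if (*index >= objects.size())
      throw std::out_of_range("object index out of range");
    return objects[*index];
  }
  const std::string &target = std::get<std::string>(selector);
  for (const ToyGpuObject &object : objects)
    if (object.target == target)
      return object;
  throw std::invalid_argument("no object for target " + target);
}
```

With the two objects from the offloading example, selecting `"rocdl:gfx90a"` returns the `"AMDGPU"` payload, which matches the string that ends up in `@binary_bin_cst`.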
## Operations

[include "Dialects/GPUOps.md"]