@@ -12,8 +12,35 @@ manipulations to launch a GPU kernel and provide a simple path towards GPU
execution from MLIR. It may be targeted, for example, by DSLs using MLIR. The
dialect uses `gpu` as its canonical prefix.
+ This dialect also abstracts away primitives commonly available in GPU code, such
+ as `gpu.thread_id` (an operation that returns the ID of threads within
+ a thread block/workgroup along a given dimension). While the compilation
+ pipelines documented below expect such code to live inside a `gpu.module` and
+ `gpu.func`, these intrinsic wrappers may be used outside of this context.
+
+ Intrinsic-wrapping operations should not expect that they have a parent of type
+ `gpu.func`. However, operations that deal in compiling and launching GPU functions,
+ like `gpu.launch_func` or `gpu.binary`, may assume that the dialect's full layering
+ is being used.
+
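+ For example, a minimal sketch (with an illustrative function name) of such a
+ wrapper used outside a `gpu.func`:
+
+ ```mlir
+ func.func @consume_thread_id() -> index {
+   // gpu.thread_id is usable here even though the enclosing op is a
+   // func.func rather than a gpu.func.
+   %tid_x = gpu.thread_id x
+   return %tid_x : index
+ }
+ ```
+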
[TOC]
+ ## GPU address spaces
+
+ The GPU dialect exposes the `gpu.address_space` attribute, which currently has
+ three values: `global`, `workgroup`, and `private`.
+
+ These address spaces represent the types of buffers commonly seen in GPU compilation.
+ `global` memory resides in the GPU's global memory, which is shared between all
+ workgroups/thread blocks. `workgroup` memory is a limited, per-workgroup resource:
+ all threads in a workgroup/thread block access the same values in `workgroup`
+ memory. Finally, `private` memory is used to represent `alloca`-like buffers that
+ are private to a single thread/workitem.
+
+ These address spaces may be used as the `memorySpace` attribute on `memref` values.
+ The `gpu.module`/`gpu.func` compilation pipeline will lower such memory space
+ usages to the correct address spaces on target platforms. Memory attributions should
+ be created with the correct memory space on the memref.
+
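+ As an illustration, a minimal sketch of a kernel whose arguments and memory
+ attributions carry these address spaces (the module, kernel, and value names
+ are illustrative):
+
+ ```mlir
+ gpu.module @kernels {
+   gpu.func @kernel(%arg0: memref<32xf32, #gpu.address_space<global>>)
+       workgroup(%tile: memref<32xf32, #gpu.address_space<workgroup>>)
+       private(%scratch: memref<1xf32, #gpu.address_space<private>>)
+       kernel {
+     // Copy one element from global memory into the shared workgroup tile.
+     %tid = gpu.thread_id x
+     %v = memref.load %arg0[%tid] : memref<32xf32, #gpu.address_space<global>>
+     memref.store %v, %tile[%tid] : memref<32xf32, #gpu.address_space<workgroup>>
+     gpu.return
+   }
+ }
+ ```
+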
## Memory attribution
Memory buffers are defined at the function level, either in "gpu.launch" or in
@@ -58,6 +85,15 @@ mlir-translate example-nvvm.mlir \
-o example.ll
```
+ This compilation process expects all GPU code to live in a `gpu.module` and
+ expects all kernels to be `gpu.func` operations. Non-kernel functions, like
+ device library calls, may be defined using `func.func` or other non-GPU dialect
+ operations. This permits downstream systems to use these wrappers without
+ requiring them to use the GPU dialect's function operations, which might not include
+ information those systems want to have as intrinsic values on their functions.
+ Additionally, this allows for using `func.func` for device-side library functions
+ in `gpu.module`s.
+
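+ For instance, a sketch of this layering (the names and the body of the helper
+ are illustrative, not taken from the pipeline documentation):
+
+ ```mlir
+ gpu.module @kernels {
+   // A device-side library function: an ordinary func.func inside a gpu.module.
+   func.func @triple(%x: f32) -> f32 {
+     %c3 = arith.constant 3.0 : f32
+     %r = arith.mulf %x, %c3 : f32
+     return %r : f32
+   }
+
+   // Kernels themselves are gpu.func operations.
+   gpu.func @kernel(%buf: memref<32xf32>) kernel {
+     %tid = gpu.thread_id x
+     %v = memref.load %buf[%tid] : memref<32xf32>
+     %t = func.call @triple(%v) : (f32) -> f32
+     memref.store %t, %buf[%tid] : memref<32xf32>
+     gpu.return
+   }
+ }
+ ```
+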
### Default NVVM Compilation Pipeline: gpu-lower-to-nvvm-pipeline
The `gpu-lower-to-nvvm-pipeline` compilation pipeline serves as the default way
@@ -82,9 +118,9 @@ within GPU code execution:
func.func @main() {
%c2 = arith.constant 2 : index
%c1 = arith.constant 1 : index
- gpu.launch
- blocks(%0, %1, %2) in (%3 = %c1, %4 = %c1, %5 = %c1)
- threads(%6, %7, %8) in (%9 = %c2, %10 = %c1, %11 = %c1) {
+ gpu.launch
+ blocks(%0, %1, %2) in (%3 = %c1, %4 = %c1, %5 = %c1)
+ threads(%6, %7, %8) in (%9 = %c2, %10 = %c1, %11 = %c1) {
gpu.printf "Hello from %d\n" %6 : index
gpu.terminator
}