Commit 560b645

krzysz00, kuhar, and adam-smnk authored

[mlir] Document GPU dialect layering to capture discussions from a PR (#95812)

Co-authored-by: Jakub Kuderski <[email protected]>
Co-authored-by: Adam Siemieniuk <[email protected]>

1 parent 1ba2768


1 file changed: mlir/docs/Dialects/GPU.md (39 additions & 3 deletions)
@@ -12,8 +12,35 @@ manipulations to launch a GPU kernel and provide a simple path towards GPU
 execution from MLIR. It may be targeted, for example, by DSLs using MLIR. The
 dialect uses `gpu` as its canonical prefix.
 
+This dialect also abstracts away primitives commonly available in GPU code,
+such as `gpu.thread_id` (an operation that returns the ID of threads within
+a thread block/workgroup along a given dimension). While the compilation
+pipelines documented below expect such code to live inside a `gpu.module` and
+`gpu.func`, these intrinsic wrappers may be used outside of this context.
+
+Intrinsic-wrapping operations should not expect to have a parent of type
+`gpu.func`. However, operations that deal in compiling and launching GPU
+functions, like `gpu.launch_func` or `gpu.binary`, may assume that the
+dialect's full layering is being used.
+
 [TOC]
 
+## GPU address spaces
+
+The GPU dialect exposes the `gpu.address_space` attribute, which currently
+has three values: `global`, `workgroup`, and `private`.
+
+These address spaces represent the types of buffers commonly seen in GPU
+compilation. `global` memory resides in the GPU's global memory. `workgroup`
+memory is a limited, per-workgroup resource: all threads in a workgroup/thread
+block access the same values in `workgroup` memory. Finally, `private` memory
+represents `alloca`-like buffers that are private to a single thread/workitem.
+
+These address spaces may be used as the `memorySpace` attribute on `memref`
+values. The `gpu.module`/`gpu.func` compilation pipeline will lower such
+memory space usages to the correct address spaces on target platforms. Memory
+attributions should be created with the correct memory space on the memref.
+
 ## Memory attribution
 
 Memory buffers are defined at the function level, either in "gpu.launch" or in
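
To make the added paragraphs concrete, here is a minimal, hypothetical sketch (the function and value names are invented, and this code is not part of the commit's diff) of an intrinsic wrapper such as `gpu.thread_id` used outside of any `gpu.module` or `gpu.func`:

```mlir
// Hypothetical sketch: `gpu.thread_id` used from a plain `func.func`,
// outside the `gpu.module`/`gpu.func` layering, as the text above allows.
func.func @thread_offset(%base: index) -> index {
  // ID of the current thread along the x dimension of its block/workgroup.
  %tid = gpu.thread_id x
  %off = arith.addi %base, %tid : index
  return %off : index
}
```

And a sketch of the `gpu.address_space` attribute used as the memory space on `memref` types, including on a memory attribution (the kernel and its shapes are likewise made up for illustration):

```mlir
gpu.module @kernels {
  // The workgroup memory attribution carries the matching address space.
  gpu.func @copy_to_scratch(%in: memref<32xf32, #gpu.address_space<global>>)
      workgroup(%scratch: memref<32xf32, #gpu.address_space<workgroup>>)
      kernel {
    %tid = gpu.thread_id x
    %v = memref.load %in[%tid] : memref<32xf32, #gpu.address_space<global>>
    memref.store %v, %scratch[%tid]
        : memref<32xf32, #gpu.address_space<workgroup>>
    gpu.return
  }
}
```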
@@ -58,6 +85,15 @@ mlir-translate example-nvvm.mlir \
   -o example.ll
 ```
 
+This compilation process expects all GPU code to live in a `gpu.module` and
+expects all kernels to be `gpu.func` operations. Non-kernel functions, such
+as device library functions, may be defined using `func.func` or other
+non-GPU-dialect operations. This permits downstream systems to use these
+wrappers without requiring them to use the GPU dialect's function operations,
+which might not include information those systems want to have as intrinsic
+values on their functions. Additionally, this allows for using `func.func`
+for device-side library functions in `gpu.module`s.
+
 ### Default NVVM Compilation Pipeline: gpu-lower-to-nvvm-pipeline
 
 The `gpu-lower-to-nvvm-pipeline` compilation pipeline serves as the default way
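
As a sketch of the layering the added paragraph describes (the helper and kernel below are hypothetical, not from the commit), a `gpu.module` may mix `gpu.func` kernels with `func.func` device-side library functions:

```mlir
gpu.module @kernels {
  // Device-side library function; it does not need to be a `gpu.func`.
  func.func @square(%x: f32) -> f32 {
    %y = arith.mulf %x, %x : f32
    return %y : f32
  }

  // The kernel itself is still a `gpu.func` for this compilation pipeline.
  gpu.func @square_all(%buf: memref<?xf32>) kernel {
    %tid = gpu.thread_id x
    %v = memref.load %buf[%tid] : memref<?xf32>
    %sq = func.call @square(%v) : (f32) -> f32
    memref.store %sq, %buf[%tid] : memref<?xf32>
    gpu.return
  }
}
```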
@@ -82,9 +118,9 @@ within GPU code execution:
 func.func @main() {
     %c2 = arith.constant 2 : index
     %c1 = arith.constant 1 : index
-    gpu.launch 
-        blocks(%0, %1, %2) in (%3 = %c1, %4 = %c1, %5 = %c1) 
-        threads(%6, %7, %8) in (%9 = %c2, %10 = %c1, %11 = %c1) { 
+    gpu.launch
+        blocks(%0, %1, %2) in (%3 = %c1, %4 = %c1, %5 = %c1)
+        threads(%6, %7, %8) in (%9 = %c2, %10 = %c1, %11 = %c1) {
         gpu.printf "Hello from %d\n" %6 : index
         gpu.terminator
     }
