Commit 9ab3468

[mlir] add a simple gpu barrier elimination mechanism
GPU code generation, and specifically the shared memory copy insertion may introduce spurious barriers guarding read-after-read dependencies or read-after-write on non-aliasing data, which degrades performance due to unnecessary synchronization. Add a pattern and transform op that removes such barriers by analyzing memory effects that the barrier actually guards that are not also guarded by other barriers.

The code is adapted from the Polygeist incubator project.

Co-authored-by: William Moses <[email protected]>
Co-authored-by: Ivan Radanov Ivanov <[email protected]>

Reviewed By: nicolasvasilache, wsmoses

Differential Revision: https://reviews.llvm.org/D154720
1 parent 9b79e15 commit 9ab3468

5 files changed, +800 -2 lines changed

mlir/include/mlir/Dialect/GPU/TransformOps/GPUTransformOps.td

Lines changed: 32 additions & 0 deletions
@@ -14,6 +14,38 @@ include "mlir/Dialect/Transform/IR/TransformInterfaces.td"
 include "mlir/Interfaces/SideEffectInterfaces.td"
 include "mlir/IR/OpBase.td"
 
+def EliminateBarriersOp :
+  Op<Transform_Dialect, "apply_patterns.gpu.eliminate_barriers",
+    [DeclareOpInterfaceMethods<PatternDescriptorOpInterface>]> {
+  let description = [{
+    Removes unnecessary GPU barriers from the function. If a barrier does not
+    enforce any conflicting pair of memory effects, including a pair that is
+    enforced by another barrier, it is unnecessary and can be removed.
+
+    The approach is based on "High-Performance GPU-to-CPU Transpilation and
+    Optimization via High-Level Parallel Constructs" by Moses, Ivanov,
+    Domke, Endo, Doerfert, and Zinenko in PPoPP 2023. Specifically, it
+    analyzes the memory effects of the operations before and after the given
+    barrier and checks if the barrier enforces any of the memory
+    effect-induced dependencies that aren't already enforced by another
+    barrier.
+
+    For example, in the following code
+
+    ```mlir
+    store %A
+    barrier  // enforces load-after-store
+    load %A
+    barrier  // load-after-store already enforced by the previous barrier
+    load %A
+    ```
+
+    the second barrier can be removed.
+  }];
+
+  let assemblyFormat = [{ attr-dict }];
+}
+
 def MapNestedForallToThreads :
   Op<Transform_Dialect, "gpu.map_nested_forall_to_threads",
     [FunctionalStyleTransformOpTrait,
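
The rule in the op description above can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions, not the code added by this commit: it handles a straight-line instruction list rather than MLIR operations and their memory effects, it treats buffers with different names as never aliasing, and the names `Instr`, `conflicts`, `barrierIsNeeded`, `eliminateBarriers` and `countBarriers` are hypothetical. A barrier is kept only if some access between it and the previous barrier conflicts with some access between it and the next barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction kinds; the real pattern inspects MLIR memory effects instead.
enum class Kind { Load, Store, Barrier };

struct Instr {
  Kind kind;
  std::string buffer; // ignored for barriers; distinct names never alias here
};

// Two accesses conflict if they touch the same buffer and at least one writes.
static bool conflicts(const Instr &a, const Instr &b) {
  return a.buffer == b.buffer &&
         (a.kind == Kind::Store || b.kind == Kind::Store);
}

// Returns true if the barrier at `pos` orders a conflicting pair of accesses
// that is not already separated by a neighbouring barrier.
static bool barrierIsNeeded(const std::vector<Instr> &code, std::size_t pos) {
  // Window before the barrier: back to the previous barrier (exclusive).
  std::size_t begin = pos;
  while (begin > 0 && code[begin - 1].kind != Kind::Barrier)
    --begin;
  // Window after the barrier: up to the next barrier (exclusive).
  std::size_t end = pos + 1;
  while (end < code.size() && code[end].kind != Kind::Barrier)
    ++end;
  for (std::size_t i = begin; i < pos; ++i)
    for (std::size_t j = pos + 1; j < end; ++j)
      if (conflicts(code[i], code[j]))
        return true;
  return false;
}

// Erases redundant barriers in place, scanning left to right. Erasing a
// barrier widens the windows seen by the barriers that follow it.
static void eliminateBarriers(std::vector<Instr> &code) {
  for (std::size_t i = 0; i < code.size();) {
    if (code[i].kind == Kind::Barrier && !barrierIsNeeded(code, i))
      code.erase(code.begin() + i);
    else
      ++i;
  }
}

// Counts the barriers left in the instruction list.
static std::size_t countBarriers(const std::vector<Instr> &code) {
  std::size_t n = 0;
  for (const Instr &instr : code)
    if (instr.kind == Kind::Barrier)
      ++n;
  return n;
}
```

Applied to the sequence from the description (store `A`, barrier, load `A`, barrier, load `A`), `eliminateBarriers` keeps the first barrier, which orders the load after the store, and erases the second, which only separates two loads. Erasing in place rather than deciding every barrier against the original list matters: if two barriers both look redundant only because of each other, removing one widens the window of the other, which is then kept when it still guards a conflict.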

mlir/lib/Dialect/GPU/TransformOps/CMakeLists.txt

Lines changed: 3 additions & 2 deletions
@@ -11,10 +11,11 @@ add_mlir_dialect_library(MLIRGPUTransformOps
   MLIRGPUDeviceMapperEnumsGen
 
   LINK_LIBS PUBLIC
-  MLIRIR
+  MLIRGPUDialect
   MLIRGPUTransforms
+  MLIRIR
   MLIRParser
   MLIRSideEffectInterfaces
   MLIRTransformDialect
-  MLIRGPUDialect
+  MLIRVectorDialect
   )
