
[mlir][gpu] Eliminate redundant gpu.barrier ops #71575


Merged
merged 1 commit into llvm:main from dedup
Nov 9, 2023

Conversation

spaceotter
Contributor

Adds a canonicalizer for gpu.barrier that gets rid of duplicates.

@llvmbot
Member

llvmbot commented Nov 7, 2023

@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir

Author: None (spaceotter)

Changes

Adds a canonicalizer for gpu.barrier that gets rid of duplicates.


Full diff: https://github.com/llvm/llvm-project/pull/71575.diff

2 Files Affected:

  • (modified) mlir/include/mlir/Dialect/GPU/IR/GPUOps.td (+1)
  • (modified) mlir/lib/Dialect/GPU/IR/GPUDialect.cpp (+28)
diff --git a/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td b/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
index 6375d35f4311295..632cdd96c6d4c2b 100644
--- a/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
+++ b/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
@@ -1010,6 +1010,7 @@ def GPU_BarrierOp : GPU_Op<"barrier"> {
     in convergence.
   }];
   let assemblyFormat = "attr-dict";
+  let hasCanonicalizer = 1;
 }
 
 def GPU_GPUModuleOp : GPU_Op<"module", [
diff --git a/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp b/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
index 5eb2cadc884e151..d9ffacfd0d54f59 100644
--- a/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
+++ b/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
@@ -1139,6 +1139,34 @@ void ShuffleOp::build(OpBuilder &builder, OperationState &result, Value value,
         mode);
 }
 
+//===----------------------------------------------------------------------===//
+// BarrierOp
+//===----------------------------------------------------------------------===//
+
+namespace {
+
+/// Remove gpu.barrier after gpu.barrier, the threads are already synchronized!
+struct EraseRedundantGpuBarrierOpPairs : public OpRewritePattern<BarrierOp> {
+public:
+  using OpRewritePattern::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(BarrierOp op,
+                                PatternRewriter &rewriter) const final {
+    if (isa<BarrierOp>(op->getNextNode())) {
+      rewriter.eraseOp(op->getNextNode());
+      return success();
+    }
+    return failure();
+  }
+};
+
+} // end anonymous namespace
+
+void BarrierOp::getCanonicalizationPatterns(RewritePatternSet &results,
+                                            MLIRContext *context) {
+  results.add<EraseRedundantGpuBarrierOpPairs>(context);
+}
+
 //===----------------------------------------------------------------------===//
 // GPUFuncOp
 //===----------------------------------------------------------------------===//

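To illustrate the effect of the pattern, IR like the following (a minimal sketch, not taken from the patch) would be rewritten by -canonicalize:

// Before: two back-to-back barriers; the second synchronizes nothing new.
gpu.barrier
gpu.barrier

// After: the pattern erases the trailing barrier.
gpu.barrier
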
@grypp
Member

grypp commented Nov 7, 2023

The PR is a good start, but it doesn't really optimize anything. It only removes barriers that immediately follow one another. The program runs at the same speed even if we don't remove them, because the threads are already synchronized at the first barrier.

I think this PR needs to be extended substantially. A proper barrier elimination can remove many more redundant barriers.

Example 1:
Here the first barrier can be deleted. The second might also be eliminable, but that requires alias analysis and memref region information.

Read(%0 : memref<?xf32,3>)
gpu.barrier
Write(%0 : memref<?xf32,3>)
gpu.barrier
Read(%0 : memref<?xf32,3>)

Example 2:

We also need to consider if-else statements. In the example below, we can eliminate the first two barriers, because the last barrier already synchronizes the threads.

if(){ 
…
gpu.barrier
} else {
…
gpu.barrier
}
gpu.barrier

@nirvedhmeshram
Contributor

In the first example, wouldn't alias analysis be needed to remove the first barrier, since removing it could cause a write-after-read hazard?

@antiagainst
Member

@grypp Agreed that we can do better at eliding unnecessary barriers. To me, that can happen as a next step; we'd likely need a dedicated pass with some complicated analysis. Using a canonicalization pattern for this simple cleanup is actually a good match here, I think.

@spaceotter We need a test for this.

@spaceotter force-pushed the dedup branch 2 times, most recently from b81301e to c0c473e on November 7, 2023 at 21:45
@spaceotter
Contributor Author

@antiagainst I added the test.
Executing the barriers does have a performance cost, even if the threads are already synchronized, perhaps more on some architectures than others.
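
For reference, a minimal FileCheck test for this pattern might look like the following sketch (the function name is hypothetical, and this is not necessarily the test that was added to the PR):

// RUN: mlir-opt %s -canonicalize | FileCheck %s

// CHECK-LABEL: func.func @erase_duplicate_barriers
//       CHECK: gpu.barrier
//   CHECK-NOT: gpu.barrier
func.func @erase_duplicate_barriers() {
  gpu.barrier
  gpu.barrier
  gpu.barrier
  return
}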


github-actions bot commented Nov 7, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@spaceotter force-pushed the dedup branch 2 times, most recently from a3bd73a to 0088612 on November 7, 2023 at 21:52
@grypp
Member

grypp commented Nov 8, 2023

@grypp Agreed that we can do better at eliding unnecessary barriers. To me, that can happen as a next step; we'd likely need a dedicated pass with some complicated analysis.

I want to start by saying that I'm not against this barrier elimination, but at the moment the PR doesn't seem to offer any performance improvement. Additionally, if we were to introduce a dedicated pass, this code might become unnecessary.

I was wondering if we could explore ways to improve the value of this PR. For instance, could we consider checking for arithmetic operations between barriers and removing those barriers as well, to make the pattern more useful?

gpu.barrier
arith scalar
arith scalar
gpu.barrier -> remove this
arith scalar
arith scalar
gpu.barrier
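
Rendered as concrete MLIR, this sketch might look like the following (hypothetical values %a, %b, %c assumed to be defined earlier; thread-local scalar arithmetic never requires synchronization):

gpu.barrier
%0 = arith.addf %a, %b : f32
%1 = arith.mulf %0, %c : f32
// The next barrier separates only thread-local scalar computation,
// so a smarter pattern could remove it.
gpu.barrier
%2 = arith.addf %1, %c : f32
%3 = arith.mulf %2, %2 : f32
gpu.barrier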

@ftynse
Member

ftynse commented Nov 8, 2023

In case you missed it, we have added a more advanced barrier elimination in 9ab3468. It does some relatively expensive computations, including a localized alias analysis, so it absolutely shouldn't run as part of canonicalization. Having a simple cleanup as part of canonicalization looks good to me.

@grypp
Member

grypp commented Nov 8, 2023

I was not aware that there is a pass; that's awesome!

It does some relatively expensive computations including a localized alias analysis, so it absolutely shouldn't run as part of canonicalization.

Why shouldn't we run this as part of canonicalization? There is one and only one reason we use GPUs, and it is performance. This pass sounds essential to me.

@spaceotter
Contributor Author

@ftynse Yes, I am aware of that code. I've opened a separate PR to make it usable as a pass: #71762.

@ftynse
Member

ftynse commented Nov 9, 2023

Why shouldn't we run this as part of canonicalization? There is one and only one reason we use GPUs, and it is performance.

It doesn't mean it should run as part of canonicalization. The purpose of canonicalization isn't to make the generated code fast; it is to make the code easier for the compiler to reason about by bringing it into a canonical form (even though I have doubts about the existence of such a form in MLIR). Since canonicalization is called repeatedly in the pipeline as a generalized cleanup, we don't want canonicalization patterns to be expensive to apply. This pattern can still be applied by a separate pass, just not canonicalization.

@grypp
Member

grypp commented Nov 9, 2023

I'd say that removing barriers can be considered part of canonicalization, since it helps the compiler reason about the code (we have passes that act based on the number of barriers, etc.). However, I agree that we should not be calling an expensive pass too frequently in the pipeline.

I found this discussion valuable. Thanks for taking the time to discuss it.

@qedawkins merged commit 51af040 into llvm:main on Nov 9, 2023
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
Adds a canonicalizer for gpu.barrier that gets rid of duplicates.

Co-authored-by: Eric Eaton <[email protected]>