[mlir] fix a crash when lowering parallel loops to gpu (#75811) #75946


Merged
1 commit merged into llvm:main on Dec 20, 2023

Conversation

lipracer
Member

No description provided.

@llvmbot
Member

llvmbot commented Dec 19, 2023

@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir

Author: long.chen (lipracer)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/75946.diff

2 Files Affected:

  • (modified) mlir/lib/Conversion/SCFToGPU/SCFToGPU.cpp (+2-1)
  • (modified) mlir/test/Conversion/SCFToGPU/parallel_loop.mlir (+44)
diff --git a/mlir/lib/Conversion/SCFToGPU/SCFToGPU.cpp b/mlir/lib/Conversion/SCFToGPU/SCFToGPU.cpp
index c2218b7656a9b6..2bcd082fb3e82f 100644
--- a/mlir/lib/Conversion/SCFToGPU/SCFToGPU.cpp
+++ b/mlir/lib/Conversion/SCFToGPU/SCFToGPU.cpp
@@ -456,7 +456,8 @@ static LogicalResult processParallelLoop(
               rewriter.getAffineSymbolExpr(1));
       newIndex = rewriter.create<AffineApplyOp>(
           loc, annotation.getMap().compose(lowerAndStep),
-          ValueRange{operand, step, lowerBound});
+          ValueRange{operand, ensureLaunchIndependent(step),
+                     ensureLaunchIndependent(lowerBound)});
       // If there was also a bound, insert that, too.
       // TODO: Check that we do not assign bounds twice.
       if (annotation.getBound()) {
diff --git a/mlir/test/Conversion/SCFToGPU/parallel_loop.mlir b/mlir/test/Conversion/SCFToGPU/parallel_loop.mlir
index 734961ecfdde1b..deeaec2f81a94e 100644
--- a/mlir/test/Conversion/SCFToGPU/parallel_loop.mlir
+++ b/mlir/test/Conversion/SCFToGPU/parallel_loop.mlir
@@ -384,3 +384,47 @@ func.func @parallel_no_annotations(%arg0 : index, %arg1 : index, %arg2 : index,
 
 // CHECK-LABEL: @parallel_no_annotations
 // CHECK: scf.parallel
+
+// -----
+
+// CHECK-LABEL: @step_invariant
+func.func @step_invariant() {
+  %alloc = memref.alloc() : memref<1x1xf64>
+  %alloc_0 = memref.alloc() : memref<1x1xf64>
+  %alloc_1 = memref.alloc() : memref<1x1xf64>
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %c1_2 = arith.constant 1 : index
+  scf.parallel (%arg0) = (%c0) to (%c1) step (%c1_2) {
+    %c0_3 = arith.constant 0 : index
+    %c1_4 = arith.constant 1 : index
+    %c1_5 = arith.constant 1 : index
+    scf.parallel (%arg1) = (%c0_3) to (%c1_4) step (%c1_5) {
+      %0 = memref.load %alloc_1[%arg0, %arg1] : memref<1x1xf64>
+      %1 = memref.load %alloc_0[%arg0, %arg1] : memref<1x1xf64>
+      %2 = arith.addf %0, %1 : f64
+      memref.store %2, %alloc[%arg0, %arg1] : memref<1x1xf64>
+      scf.yield
+    } {mapping = [#gpu.loop_dim_map<processor = thread_x, map = (d0) -> (d0), bound = (d0) -> (d0)>]}
+    scf.yield
+  } {mapping = [#gpu.loop_dim_map<processor = block_x, map = (d0) -> (d0), bound = (d0) -> (d0)>]}
+  memref.dealloc %alloc_1 : memref<1x1xf64>
+  memref.dealloc %alloc_0 : memref<1x1xf64>
+  memref.dealloc %alloc : memref<1x1xf64>
+  return
+}
+
+// CHECK: %[[alloc_0:.*]] = memref.alloc() : memref<1x1xf64>
+// CHECK: %[[alloc_1:.*]] = memref.alloc() : memref<1x1xf64>
+// CHECK: %[[alloc_2:.*]] = memref.alloc() : memref<1x1xf64>
+// CHECK: %[[map_0:.*]] = affine.apply #map({{.*}})[{{.*}}, {{.*}}]
+// CHECK: %[[map_1:.*]] = affine.apply #map({{.*}})[{{.*}}, {{.*}}]
+// CHECK: gpu.launch
+// CHECK-SAME: blocks(%[[arg_0:.*]], %{{[^)]*}}, %{{[^)]*}}) in (%{{[^)]*}} = %[[map_0]], %{{[^)]*}} = %{{[^)]*}}, %{{[^)]*}} = %{{[^)]*}})
+// CHECK-SAME: threads(%[[arg_3:.*]], %{{[^)]*}}, %{{[^)]*}}) in (%{{[^)]*}} = %[[map_1]], %{{[^)]*}} = %{{[^)]*}}, %{{[^)]*}} = %{{[^)]*}})
+// CHECK: %[[dim0:.*]] = affine.apply #map1(%[[arg_0]])[{{.*}}, {{.*}}]
+// CHECK: %[[dim1:.*]] = affine.apply #map1(%[[arg_3]])[{{.*}}, {{.*}}]
+// CHECK: %[[lhs:.*]] = memref.load %[[alloc_2]][%[[dim0]], %[[dim1]]] : memref<1x1xf64>
+// CHECK: %[[rhs:.*]] = memref.load %[[alloc_1]][%[[dim0]], %[[dim1]]] : memref<1x1xf64>
+// CHECK: %[[sum:.*]] = arith.addf %[[lhs]], %[[rhs]] : f64
+// CHECK: memref.store %[[sum]], %[[alloc_0]][%[[dim0]], %[[dim1]]] : memref<1x1xf64>
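
A note on what the one-line change does, for readers skimming the diff: the crash in #75811 is triggered when the step or lower bound of an scf.parallel loop is an SSA value defined inside the region that becomes the gpu.launch body (as in the new step_invariant test, where the inner loop's constants are created inside the outer loop body). Composing the affine map with those raw values produces an affine.apply whose operands are not visible at the point where the launch is built. The fix routes step and lowerBound through ensureLaunchIndependent, an existing helper in SCFToGPU.cpp, so such values are made usable outside the launch first. The sketch below is not the upstream helper, only a minimal illustration of the invariant it has to provide; the function name, headers, and OpBuilder plumbing are assumptions.

#include "mlir/IR/Builders.h"
#include "mlir/IR/OpDefinition.h"
#include "mlir/IR/Region.h"
#include "mlir/IR/Value.h"

using namespace mlir;

// Sketch only (assumption, not the LLVM implementation): a value that feeds
// the grid/block-size computation must be usable outside the gpu.launch
// region. Constants defined inside the loop body can be re-materialized in
// front of the launch; anything else must already dominate it.
static Value cloneIfDefinedInsideLaunch(Value val, Region &launchBody,
                                        OpBuilder &beforeLaunch) {
  // Already defined outside the launch body: safe to use as-is.
  if (!launchBody.isAncestor(val.getParentRegion()))
    return val;
  // Constant-like ops can be cloned to a point before the launch.
  Operation *def = val.getDefiningOp();
  if (def && def->hasTrait<OpTrait::ConstantLike>())
    return beforeLaunch.clone(*def)->getResult(0);
  // Anything else cannot be made launch-independent by this sketch.
  return Value();
}

Applied at the call site shown in the diff, this guarantees that the operands of the affine.apply used for the map composition are valid where gpu.launch is created, which is exactly the situation the step_invariant test exercises.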

lipracer merged commit 227bfa1 into llvm:main on Dec 20, 2023
lipracer deleted the tmp_dev branch on Dec 20, 2023, 03:02