[MLIR][Affine] Fix affine data copy generate for zero-ranked memrefs #129186
Conversation
Fix affine data copy generate for zero-ranked memrefs. Fixes: llvm#122210 and llvm#61167. Test cases borrowed from https://reviews.llvm.org/D147298, authored by Lewuathe <Kai Sasaki>.
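As a rough illustration of how the fix surfaces to users of the utility, here is a minimal caller-side sketch. The `affineDataCopyGenerate` overload and the `AffineCopyOptions` fields used below are assumptions based on `mlir/Dialect/Affine/LoopUtils.h`, not something this PR adds or changes; the only point is that, with this patch, loops touching zero-ranked memrefs (e.g. `memref<i1>`) make copy generation report failure instead of crashing.

```cpp
// Sketch only: the overload and option fields below are assumptions.
#include <optional>

#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Dialect/Affine/LoopUtils.h"
#include "llvm/ADT/DenseSet.h"

using namespace mlir;
using namespace mlir::affine;

// Generate explicit (non-DMA) copies for memrefs accessed inside `forOp`.
static void generateCopiesForLoop(AffineForOp forOp) {
  AffineCopyOptions options;
  options.generateDma = false;           // point-wise copy loops, not DMAs
  options.slowMemorySpace = 0;
  options.fastMemorySpace = 0;
  options.tagMemorySpace = 0;
  options.fastMemCapacityBytes = 32 * 1024;

  llvm::DenseSet<Operation *> copyNests;
  // Before this patch, a zero-ranked memref (e.g. memref<i1>) accessed in the
  // loop could crash copy generation; now such memrefs are skipped and the
  // utility reports failure for them instead.
  if (failed(affineDataCopyGenerate(forOp, options,
                                    /*filterMemRef=*/std::nullopt, copyNests)))
    forOp.emitRemark("explicit copy generation did not cover all memrefs");
}
```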
@llvm/pr-subscribers-mlir
Author: Uday Bondhugula (bondhugula)
Changes: Fix affine data copy generate for zero-ranked memrefs. Test cases borrowed from https://reviews.llvm.org/D147298, authored by Lewuathe <Kai Sasaki>.
Full diff: https://github.com/llvm/llvm-project/pull/129186.diff
2 Files Affected: mlir/lib/Dialect/Affine/Utils/LoopUtils.cpp, mlir/test/Dialect/Affine/affine-data-copy.mlir
diff --git a/mlir/lib/Dialect/Affine/Utils/LoopUtils.cpp b/mlir/lib/Dialect/Affine/Utils/LoopUtils.cpp
index 82b96e9876a6f..b6ac44611a866 100644
--- a/mlir/lib/Dialect/Affine/Utils/LoopUtils.cpp
+++ b/mlir/lib/Dialect/Affine/Utils/LoopUtils.cpp
@@ -1828,14 +1828,14 @@ static void getMultiLevelStrides(const MemRefRegion &region,
}
}
-/// Generates a point-wise copy from/to `memref' to/from `fastMemRef' and
-/// returns the outermost AffineForOp of the copy loop nest. `lbMaps` and
-/// `ubMaps` along with `lbOperands` and `ubOperands` hold the lower and upper
-/// bound information for the copy loop nest. `fastBufOffsets` contain the
-/// expressions to be subtracted out from the respective copy loop iterators in
-/// order to index the fast buffer. If `copyOut' is true, generates a copy-out;
-/// otherwise a copy-in. Builder `b` should be set to the point the copy nest is
-/// inserted.
+/// Generates a point-wise copy from/to a non-zero ranked `memref' to/from
+/// `fastMemRef' and returns the outermost AffineForOp of the copy loop nest.
+/// `lbMaps` and `ubMaps` along with `lbOperands` and `ubOperands` hold the
+/// lower and upper bound information for the copy loop nest. `fastBufOffsets`
+/// contain the expressions to be subtracted out from the respective copy loop
+/// iterators in order to index the fast buffer. If `copyOut' is true, generates
+/// a copy-out; otherwise a copy-in. Builder `b` should be set to the point the
+/// copy nest is inserted.
//
/// The copy-in nest is generated as follows as an example for a 2-d region:
/// for x = ...
@@ -1856,6 +1856,8 @@ generatePointWiseCopy(Location loc, Value memref, Value fastMemRef,
}));
unsigned rank = cast<MemRefType>(memref.getType()).getRank();
+ // A copy nest can't be generated for 0-ranked memrefs.
+ assert(rank != 0 && "non-zero rank memref expected");
assert(lbMaps.size() == rank && "wrong number of lb maps");
assert(ubMaps.size() == rank && "wrong number of ub maps");
@@ -1919,19 +1921,20 @@ emitRemarkForBlock(Block &block) {
return block.getParentOp()->emitRemark();
}
-/// Creates a buffer in the faster memory space for the specified memref region;
-/// generates a copy from the lower memory space to this one, and replaces all
-/// loads/stores in the block range [`begin', `end') of `block' to load/store
-/// from that buffer. Returns failure if copies could not be generated due to
-/// yet unimplemented cases. `copyInPlacementStart` and `copyOutPlacementStart`
-/// in copyPlacementBlock specify the insertion points where the incoming copies
-/// and outgoing copies, respectively, should be inserted (the insertion happens
-/// right before the insertion point). Since `begin` can itself be invalidated
-/// due to the memref rewriting done from this method, the output argument
-/// `nBegin` is set to its replacement (set to `begin` if no invalidation
-/// happens). Since outgoing copies could have been inserted at `end`, the
-/// output argument `nEnd` is set to the new end. `sizeInBytes` is set to the
-/// size of the fast buffer allocated.
+/// Creates a buffer in the faster memory space for the specified memref region
+/// (memref has to be non-zero ranked); generates a copy from the lower memory
+/// space to this one, and replaces all loads/stores in the block range
+/// [`begin', `end') of `block' to load/store from that buffer. Returns failure
+/// if copies could not be generated due to yet unimplemented cases.
+/// `copyInPlacementStart` and `copyOutPlacementStart` in copyPlacementBlock
+/// specify the insertion points where the incoming copies and outgoing copies,
+/// respectively, should be inserted (the insertion happens right before the
+/// insertion point). Since `begin` can itself be invalidated due to the memref
+/// rewriting done from this method, the output argument `nBegin` is set to its
+/// replacement (set to `begin` if no invalidation happens). Since outgoing
+/// copies could have been inserted at `end`, the output argument `nEnd` is set
+/// to the new end. `sizeInBytes` is set to the size of the fast buffer
+/// allocated.
static LogicalResult generateCopy(
const MemRefRegion &region, Block *block, Block::iterator begin,
Block::iterator end, Block *copyPlacementBlock,
@@ -1982,6 +1985,11 @@ static LogicalResult generateCopy(
SmallVector<Value, 4> bufIndices;
unsigned rank = memRefType.getRank();
+ if (rank == 0) {
+ LLVM_DEBUG(llvm::dbgs() << "Non-zero ranked memrefs supported\n");
+ return failure();
+ }
+
SmallVector<int64_t, 4> fastBufferShape;
// Compute the extents of the buffer.
diff --git a/mlir/test/Dialect/Affine/affine-data-copy.mlir b/mlir/test/Dialect/Affine/affine-data-copy.mlir
index 5615acae5ecc4..26eef0a7925a7 100644
--- a/mlir/test/Dialect/Affine/affine-data-copy.mlir
+++ b/mlir/test/Dialect/Affine/affine-data-copy.mlir
@@ -354,3 +354,68 @@ func.func @arbitrary_memory_space() {
}
return
}
+
+// CHECK-LABEL: zero_ranked
+func.func @zero_ranked(%3:memref<480xi1>) {
+ %false = arith.constant false
+ %4 = memref.alloc() {alignment = 128 : i64} : memref<i1>
+ affine.store %false, %4[] : memref<i1>
+ %5 = memref.alloc() {alignment = 128 : i64} : memref<i1>
+ memref.copy %4, %5 : memref<i1> to memref<i1>
+ affine.for %arg0 = 0 to 480 {
+ %11 = affine.load %3[%arg0] : memref<480xi1>
+ %12 = affine.load %5[] : memref<i1>
+ %13 = arith.cmpi slt, %11, %12 : i1
+ %14 = arith.select %13, %11, %12 : i1
+ affine.store %14, %5[] : memref<i1>
+ }
+ return
+}
+
+// CHECK-LABEL: func @scalar_memref_copy_without_dma
+func.func @scalar_memref_copy_without_dma() {
+ %false = arith.constant false
+ %4 = memref.alloc() {alignment = 128 : i64} : memref<i1>
+ affine.store %false, %4[] : memref<i1>
+
+ // CHECK: %[[FALSE:.*]] = arith.constant false
+ // CHECK: %[[MEMREF:.*]] = memref.alloc() {alignment = 128 : i64} : memref<i1>
+ // CHECK: affine.store %[[FALSE]], %[[MEMREF]][] : memref<i1>
+ return
+}
+
+// CHECK-LABEL: func @scalar_memref_copy_in_loop
+func.func @scalar_memref_copy_in_loop(%3:memref<480xi1>) {
+ %false = arith.constant false
+ %4 = memref.alloc() {alignment = 128 : i64} : memref<i1>
+ affine.store %false, %4[] : memref<i1>
+ %5 = memref.alloc() {alignment = 128 : i64} : memref<i1>
+ memref.copy %4, %5 : memref<i1> to memref<i1>
+ affine.for %arg0 = 0 to 480 {
+ %11 = affine.load %3[%arg0] : memref<480xi1>
+ %12 = affine.load %5[] : memref<i1>
+ %13 = arith.cmpi slt, %11, %12 : i1
+ %14 = arith.select %13, %11, %12 : i1
+ affine.store %14, %5[] : memref<i1>
+ }
+
+ // CHECK: %[[FALSE:.*]] = arith.constant false
+ // CHECK: %[[MEMREF:.*]] = memref.alloc() {alignment = 128 : i64} : memref<i1>
+ // CHECK: affine.store %[[FALSE]], %[[MEMREF]][] : memref<i1>
+ // CHECK: %[[TARGET:.*]] = memref.alloc() {alignment = 128 : i64} : memref<i1>
+ // CHECK: memref.copy %alloc, %[[TARGET]] : memref<i1> to memref<i1>
+ // CHECK: %[[FAST_MEMREF:.*]] = memref.alloc() : memref<480xi1>
+ // CHECK: affine.for %{{.*}} = 0 to 480 {
+ // CHECK: %{{.*}} = affine.load %arg0[%{{.*}}] : memref<480xi1>
+ // CHECK: affine.store %{{.*}}, %[[FAST_MEMREF]][%{{.*}}] : memref<480xi1>
+ // CHECK: }
+ // CHECK: affine.for %arg1 = 0 to 480 {
+ // CHECK: %[[L0:.*]] = affine.load %[[FAST_MEMREF]][%arg1] : memref<480xi1>
+ // CHECK: %[[L1:.*]] = affine.load %[[TARGET]][] : memref<i1>
+ // CHECK: %[[CMPI:.*]] = arith.cmpi slt, %[[L0]], %[[L1]] : i1
+ // CHECK: %[[SELECT:.*]] = arith.select %[[CMPI]], %[[L0]], %[[L1]] : i1
+ // CHECK: affine.store %[[SELECT]], %[[TARGET]][] : memref<i1>
+ // CHECK: }
+ // CHECK: memref.dealloc %[[FAST_MEMREF]] : memref<480xi1>
+ return
+}
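The comments updated above spell out the new contract: `generatePointWiseCopy` asserts on non-zero rank, and `generateCopy` bails out gracefully when handed a zero-ranked memref region. For downstream users of these utilities, a small, hypothetical pre-filter (not part of this patch) that mirrors that rank check might look like this:

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/Value.h"

// Hypothetical pre-filter mirroring the rank check added in generateCopy():
// a copy loop nest cannot be built for a zero-ranked (single-element) memref,
// so such values are not candidates for explicit copy generation.
static bool isCopyGenCandidate(mlir::Value memref) {
  auto memrefType = llvm::dyn_cast<mlir::MemRefType>(memref.getType());
  if (!memrefType)
    return false;
  return memrefType.getRank() != 0;
}
```

This is the same decision the new `rank == 0` early return in `generateCopy` makes before any fast-buffer shape computation.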
Thanks for fixing that. It looks good to me.
…lvm#129186) Fix affine data copy generate for zero-ranked memrefs. Fixes: llvm#122210 and llvm#61167. Test cases borrowed from https://reviews.llvm.org/D147298, authored by Lewuathe <Kai Sasaki>. Co-authored-by: Kai Sasaki <[email protected]>
This also fixes #122208.