[mlir][vectorize] Support affine.apply in SuperVectorize #77968

Hsiangkai · 2024-01-12T19:36:23Z

There is no need to vectorize affine.apply inside the loop being vectorized. However, it still has to be regenerated in its original scalar form, with all of its operands replaced by the corresponding scalar values generated for the vector loop, e.g., the new induction variables.
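
The operand replacement described above can be sketched outside of MLIR as a plain map lookup. This is only an illustrative model, not the patch itself: `ValueName`, `ScalarReplacementMap`, and `remapOperands` are hypothetical stand-ins for `mlir::Value`, `VectorizationState::valueScalarReplacement`, and the loop inside `vectorizeAffineApplyOp`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for mlir::Value: an SSA name such as "%arg4".
using ValueName = std::string;

// Hypothetical stand-in for VectorizationState::valueScalarReplacement:
// maps a scalar value of the original loop to its scalar counterpart in
// the new vector loop.
using ScalarReplacementMap = std::unordered_map<ValueName, ValueName>;

// Builds the operand list for the re-created scalar affine.apply.
// Operands with a registered scalar replacement are swapped; all other
// operands are kept unchanged.
std::vector<ValueName> remapOperands(const std::vector<ValueName> &operands,
                                     const ScalarReplacementMap &replacements) {
  std::vector<ValueName> updated;
  updated.reserve(operands.size());
  for (const ValueName &operand : operands) {
    auto it = replacements.find(operand);
    updated.push_back(it != replacements.end() ? it->second : operand);
  }
  return updated;
}
```

In this model, an operand such as the induction variable of the loop being vectorized is replaced by the new loop's induction variable, while a value defined outside the loop (for example an outer induction variable with no entry in the map) passes through untouched.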

llvmbot · 2024-01-12T19:36:50Z

@llvm/pr-subscribers-mlir-affine

Author: Hsiangkai Wang (Hsiangkai)

Changes

We have no need to vectorize affine.apply inside the vectorizing loop. However, we still need to generate it in the original scalar form. We have to replace all its operands with the generated scalar operands in the vectorizing loop, e.g., induction variables.

Full diff: https://github.com/llvm/llvm-project/pull/77968.diff

2 Files Affected:

(modified) mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp (+27-4)
(added) mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir (+23)

diff --git a/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp b/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp
index 6b7a157925fae1..3b70618fe151c5 100644
--- a/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp
+++ b/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp
@@ -721,8 +721,7 @@ struct VectorizationState {
   /// Example:
   ///   * 'replaced': induction variable of a loop to be vectorized.
   ///   * 'replacement': new induction variable in the new vector loop.
-  void registerValueScalarReplacement(BlockArgument replaced,
-                                      BlockArgument replacement);
+  void registerValueScalarReplacement(Value replaced, Value replacement);
 
   /// Registers the scalar replacement of a scalar result returned from a
   /// reduction loop. 'replacement' must be scalar.
@@ -854,8 +853,8 @@ void VectorizationState::registerValueVectorReplacementImpl(Value replaced,
 /// Example:
 ///   * 'replaced': induction variable of a loop to be vectorized.
 ///   * 'replacement': new induction variable in the new vector loop.
-void VectorizationState::registerValueScalarReplacement(
-    BlockArgument replaced, BlockArgument replacement) {
+void VectorizationState::registerValueScalarReplacement(Value replaced,
+                                                        Value replacement) {
   registerValueScalarReplacementImpl(replaced, replacement);
 }
 
@@ -978,6 +977,28 @@ static arith::ConstantOp vectorizeConstant(arith::ConstantOp constOp,
   return newConstOp;
 }
 
+/// We have no need to vectorize affine.apply. However, we still need to
+/// generate it and replace the operands with values in valueScalarReplacement.
+static Operation *vectorizeAffineApplyOp(AffineApplyOp applyOp,
+                                         VectorizationState &state) {
+  SmallVector<Value, 8> updatedOperands;
+  for (Value operand : applyOp.getOperands()) {
+    Value updatedOperand = operand;
+    if (state.valueScalarReplacement.contains(operand)) {
+      updatedOperand = state.valueScalarReplacement.lookupOrDefault(operand);
+    }
+    updatedOperands.push_back(updatedOperand);
+  }
+
+  auto newApplyOp = state.builder.create<AffineApplyOp>(
+      applyOp.getLoc(), applyOp.getAffineMap(), updatedOperands);
+
+  // Register the new affine.apply result.
+  state.registerValueScalarReplacement(applyOp.getResult(),
+                                       newApplyOp.getResult());
+  return newApplyOp;
+}
+
 /// Creates a constant vector filled with the neutral elements of the given
 /// reduction. The scalar type of vector elements will be taken from
 /// `oldOperand`.
@@ -1493,6 +1514,8 @@ static Operation *vectorizeOneOperation(Operation *op,
     return vectorizeAffineYieldOp(yieldOp, state);
   if (auto constant = dyn_cast<arith::ConstantOp>(op))
     return vectorizeConstant(constant, state);
+  if (auto applyOp = dyn_cast<AffineApplyOp>(op))
+    return vectorizeAffineApplyOp(applyOp, state);
 
   // Other ops with regions are not supported.
   if (op->getNumRegions() != 0)
diff --git a/mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir
new file mode 100644
index 00000000000000..588663e1f97b61
--- /dev/null
+++ b/mlir/test/Dialect/Affine/SuperVectorize/vectorize_affine_apply.mlir
@@ -0,0 +1,23 @@
+// RUN: mlir-opt %s -affine-super-vectorize="virtual-vector-size=8 test-fastest-varying=0" -split-input-file | FileCheck %s
+
+// CHECK-DAG: #[[$map_id0:map[0-9a-zA-Z_]*]] = affine_map<(d0) -> (d0 mod 12)>
+// CHECK-DAG: #[[$map_id1:map[0-9a-zA-Z_]*]] = affine_map<(d0) -> (d0 mod 16)>
+
+// CHECK-LABEL: func @vec_affine_apply
+func.func @vec_affine_apply(%arg0: memref<8x12x16xf32>, %arg1: memref<8x24x48xf32>) {
+  affine.for %arg2 = 0 to 8 {
+// CHECK: affine.for %[[S0:.*]] = 0 to 24 {
+// CHECK-NEXT: affine.for %[[S1:.*]] = 0 to 48 step 8 {
+    affine.for %arg3 = 0 to 24 {
+      affine.for %arg4 = 0 to 48 {
+// CHECK-NEXT: affine.apply #[[$map_id0]](%[[S0]])
+// CHECK-NEXT: affine.apply #[[$map_id1]](%[[S1]])
+        %0 = affine.apply affine_map<(d0) -> (d0 mod 12)>(%arg3)
+        %1 = affine.apply affine_map<(d0) -> (d0 mod 16)>(%arg4)
+        %2 = affine.load %arg0[%arg2, %0, %1] : memref<8x12x16xf32>
+        affine.store %2, %arg1[%arg2, %arg3, %arg4] : memref<8x24x48xf32>
+      }
+    }
+  }
+  return
+}
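
One way to see why a scalar affine.apply is sufficient for the constants used in this test is to check the index arithmetic directly. This is an observation about the test's numbers, not a claim made in the PR, and it assumes the vectorized load reads 8 contiguous elements starting at the index produced by the scalar affine.apply. The helper name `scalarBaseCoversLanes` is hypothetical.

```cpp
#include <cassert>

// Returns true if, for every vector-loop iteration `base` in [0, upper)
// advancing by `step`, the per-lane index (base + lane) % modulus equals
// (base % modulus) + lane for every lane in [0, step). When this holds,
// applying the map once to the loop's induction variable yields the start
// of a contiguous run covering all lanes.
bool scalarBaseCoversLanes(int upper, int step, int modulus) {
  for (int base = 0; base < upper; base += step)
    for (int lane = 0; lane < step; ++lane)
      if ((base + lane) % modulus != (base % modulus) + lane)
        return false;
  return true;
}
```

For the test's innermost loop (`0 to 48 step 8` with `d0 mod 16`) the property holds, because the step divides the modulus. It would not hold for a modulus such as 12 with the same step, so the check is specific to these constants.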

llvmbot · 2024-01-12T19:36:50Z

@llvm/pr-subscribers-mlir


Hsiangkai · 2024-01-30T23:15:42Z

Ping.

dcaballe · 2024-02-01T00:50:40Z

I barely remember this code but IIRC we were able to deal with affine.apply ops. @sergei-grechanik ?

Hsiangkai · 2024-02-01T17:19:33Z

> I barely remember this code but IIRC we were able to deal with affine.apply ops. @sergei-grechanik ?

With the current implementation, the vectorizer fails on this input. The debug log is:

******************************************
******************************************
[early-vect] new pattern on parent op
func.func @vec_affine_apply(%arg0: memref<8x12x16xf32>, %arg1: memref<8x24x48xf32>) {
  affine.for %arg2 = 0 to 8 {
    affine.for %arg3 = 0 to 24 {
      affine.for %arg4 = 0 to 48 {
        %0 = affine.apply affine_map<(d0) -> (d0 mod 12)>(%arg3)
        %1 = affine.apply affine_map<(d0) -> (d0 mod 16)>(%arg4)
        %2 = affine.load %arg0[%arg2, %0, %1] : memref<8x12x16xf32>
        affine.store %2, %arg1[%arg2, %arg3, %arg4] : memref<8x24x48xf32>
      }
    }
  }
  return
}
[early-vect]+++++ Vectorizing: affine.for %arg4 = 0 to 48 {
  %0 = affine.apply affine_map<(d0) -> (d0 mod 12)>(%arg3)
  %1 = affine.apply affine_map<(d0) -> (d0 mod 16)>(%arg4)
  %2 = affine.load %arg0[%arg2, %0, %1] : memref<8x12x16xf32>
  affine.store %2, %arg1[%arg2, %arg3, %arg4] : memref<8x24x48xf32>
}
[early-vect]+++++ commit vectorized op:
"affine.for"() <{lowerBoundMap = affine_map<() -> (0)>, operandSegmentSizes = array<i32: 0, 0, 0>, step = 1 : index, upperBoundMap = affine_map<() -> (48)>}> ({
^bb0(%arg4: index):
  %0 = "affine.apply"(%arg3) <{map = affine_map<(d0) -> (d0 mod 12)>}> : (index) -> index
  %1 = "affine.apply"(%arg4) <{map = affine_map<(d0) -> (d0 mod 16)>}> : (index) -> index
  %2 = "affine.load"(%arg0, %arg2, %0, %1) <{map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>}> : (memref<8x12x16xf32>, index, index, index) -> f32
  "affine.store"(%2, %arg1, %arg2, %arg3, %arg4) <{map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>}> : (f32, memref<8x24x48xf32>, index, index, index) -> ()
  "affine.yield"() : () -> ()
}) : () -> ()
into
"affine.for"() <{lowerBoundMap = affine_map<() -> (0)>, operandSegmentSizes = array<i32: 0, 0, 0>, step = 8 : index, upperBoundMap = affine_map<() -> (48)>}> ({
^bb0(%arg4: index):
}) : () -> ()
[early-vect]+++++ Vectorizing: %0 = "affine.apply"(%arg3) <{map = affine_map<(d0) -> (d0 mod 12)>}> : (index) -> index
[early-vect]+++++ vectorize operand: <block argument> of type 'index' at index: 0-> uniform: %0 = "vector.broadcast"(%arg3) : (index) -> vector<8xindex>
[early-vect]+++++ commit vectorized op:
%1 = "affine.apply"(%arg3) <{map = affine_map<(d0) -> (d0 mod 12)>}> : (index) -> index
into
%1 = "affine.apply"(%0) <{map = affine_map<(d0) -> (d0 mod 12)>}> : (vector<8xindex>) -> vector<8xindex>
[early-vect]+++++ Vectorizing: %2 = "affine.apply"(%arg4) <{map = affine_map<(d0) -> (d0 mod 16)>}> : (index) -> index
[early-vect]+++++ vectorize operand: <block argument> of type 'index' at index: 0-> unsupported block argument

[early-vect]+++++ an operand failed vectorize
[early-vect]+++++ failed vectorizing the operation: %2 = "affine.apply"(%arg4) <{map = affine_map<(d0) -> (d0 mod 16)>}> : (index) -> index
[early-vect]+++++ failed vectorization for: "affine.for"() <{lowerBoundMap = affine_map<() -> (0)>, operandSegmentSizes = array<i32: 0, 0, 0>, step = 1 : index, upperBoundMap = affine_map<() -> (48)>}> ({
^bb0(%arg4: index):
  %1 = "affine.apply"(%arg3) <{map = affine_map<(d0) -> (d0 mod 12)>}> : (index) -> index
  %2 = "affine.apply"(%arg4) <{map = affine_map<(d0) -> (d0 mod 16)>}> : (index) -> index
  %3 = "affine.load"(%arg0, %arg2, %1, %2) <{map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>}> : (memref<8x12x16xf32>, index, index, index) -> f32
  "affine.store"(%3, %arg1, %arg2, %arg3, %arg4) <{map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>}> : (f32, memref<8x24x48xf32>, index, index, index) -> ()
  "affine.yield"() : () -> ()
}) : () -> ()
[early-vect]+++++ erasing:
"affine.for"() <{lowerBoundMap = affine_map<() -> (0)>, operandSegmentSizes = array<i32: 0, 0, 0>, step = 8 : index, upperBoundMap = affine_map<() -> (48)>}> ({
^bb0(%arg4: index):
  %1 = "affine.apply"(%0) <{map = affine_map<(d0) -> (d0 mod 12)>}> : (vector<8xindex>) -> vector<8xindex>
}) : () -> ()

sergei-grechanik

I'm not sure about handling affine.apply like this because it's recommended to fold affine.apply ops into load/store ops anyway.


Hsiangkai · 2024-02-13T20:47:54Z

Ping.

sergei-grechanik · 2024-02-14T04:18:53Z

I think the crash I've mentioned should be fixed, otherwise it would be a regression (I think it doesn't crash without the patch, just avoids vectorization).

Hsiangkai · 2024-02-15T16:20:11Z

> I think the crash I've mentioned should be fixed, otherwise it would be a regression (I think it doesn't crash without the patch, just avoids vectorization).

I fixed it and added two more test cases to verify the fix. Please take another look at the patch. Thank you.

sergei-grechanik

a minor nit, otherwise lgtm


Hsiangkai requested review from nicolasvasilache, dcaballe, rikhuijzer and River707 January 12, 2024 19:36

llvmbot added mlir:affine mlir labels Jan 12, 2024

dcaballe requested a review from sergei-grechanik February 1, 2024 00:49

Hsiangkai force-pushed the add-affine-apply-vectorize branch from 999d525 to 7bdcc9f Compare February 1, 2024 22:54

sergei-grechanik suggested changes Feb 2, 2024

Hsiangkai force-pushed the add-affine-apply-vectorize branch from 7bdcc9f to 24f52eb Compare February 5, 2024 23:09

Hsiangkai force-pushed the add-affine-apply-vectorize branch from 24f52eb to 788a7f1 Compare February 15, 2024 16:18

sergei-grechanik approved these changes Feb 16, 2024

Hsiangkai force-pushed the add-affine-apply-vectorize branch from 788a7f1 to f216992 Compare February 16, 2024 09:59

Hsiangkai merged commit 181d960 into llvm:main Feb 16, 2024
