[mlir][scf]: Avoid using wrong calculation loop pipelining kernel upperbound #116748

AviadCo · 2024-11-19T06:23:24Z

When using dynamic loop pipelining trasform, we are not sure if we enter kernel loop (depends on dynamic input). This patch fix the possible option that the kernel upperbound might be defined by 2 values from index/unsigned int type that is result of subtracting big value from smaller one.

…erbound * When using dynamic loop pipelining trasform, we are not sure if we enter kernel loop (depends on dynamic input). This patch fix the possible option that the kernel upperbound might be defined by 2 values from index/unsigned int type that is result of subtracting big value from smaller one.

llvmbot · 2024-11-19T06:24:00Z

@llvm/pr-subscribers-mlir-scf

@llvm/pr-subscribers-mlir

Author: Aviad Cohen (AviadCo)

Changes

When using dynamic loop pipelining trasform, we are not sure if we enter kernel loop (depends on dynamic input). This patch fix the possible option that the kernel upperbound might be defined by 2 values from index/unsigned int type that is result of subtracting big value from smaller one.

Full diff: https://github.com/llvm/llvm-project/pull/116748.diff

2 Files Affected:

(modified) mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp (+9-1)
(modified) mlir/test/Dialect/SCF/loop-pipelining.mlir (+7-2)

diff --git a/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp b/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
index 1b458f410af601..899fe7374dbbd0 100644
--- a/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
@@ -444,7 +444,15 @@ scf::ForOp LoopPipelinerInternal::createKernelLoop(
         loc, rewriter.getIntegerAttr(t, maxStage));
     Value maxStageByStep =
         rewriter.create<arith::MulIOp>(loc, step, maxStageValue);
-    newUb = rewriter.create<arith::SubIOp>(loc, ub, maxStageByStep);
+    Value hasAtLeastOneIteration = rewriter.create<arith::CmpIOp>(
+        loc, arith::CmpIPredicate::slt, maxStageByStep, ub);
+    Value possibleNewUB =
+        rewriter.create<arith::SubIOp>(loc, ub, maxStageByStep);
+    // In case of `index` or `unsigned` type, we need to make sure that the
+    // subtraction does not result in a negative value, instead we use lb
+    // to avoid entering the kernel loop.
+    newUb = rewriter.create<arith::SelectOp>(
+        loc, hasAtLeastOneIteration, possibleNewUB, forOp.getLowerBound());
   }
   auto newForOp =
       rewriter.create<scf::ForOp>(forOp.getLoc(), forOp.getLowerBound(), newUb,
diff --git a/mlir/test/Dialect/SCF/loop-pipelining.mlir b/mlir/test/Dialect/SCF/loop-pipelining.mlir
index c879c83275bf86..d13d258412f2d0 100644
--- a/mlir/test/Dialect/SCF/loop-pipelining.mlir
+++ b/mlir/test/Dialect/SCF/loop-pipelining.mlir
@@ -770,7 +770,10 @@ func.func @stage_0_value_escape(%A: memref<?xf32>, %result: memref<?xf32>, %ub:
 //    CHECK-DAG:   %[[C2:.*]] = arith.constant 2 : index
 //    CHECK-DAG:   %[[C0:.*]] = arith.constant 0 : index
 //    CHECK-DAG:   %[[CM1:.*]] = arith.constant -1 : index
-//        CHECK:   %[[UBM:.*]] = arith.subi %[[UB:.*]], %{{.*}}
+//        CHECK:   %[[IT2_UB:.*]] = arith.muli %[[STEP:.*]], %[[C2:.*]]
+//        CHECK:   %[[ENTERKERNEL:.*]] = arith.cmpi slt, %[[IT2_UB:.*]], %[[UB:.*]]
+//        CHECK:   %[[PUBM:.*]] = arith.subi %[[UB:.*]], %[[IT2_UB:.*]]
+//        CHECK:   %[[UBM:.*]] = arith.select %[[ENTERKERNEL:.*]], %[[PUBM:.*]], %[[LB:.*]]
 //        CHECK:   %{{.*}}:2 = scf.for %[[ARG5:.*]] = %[[LB:.*]] to %[[UBM]] step %[[STEP:.*]] iter_args(%[[ARG6:.*]] = %{{.*}}, %[[ARG7:.*]] = %{{.*}})
 //        CHECK:       memref.store %[[ARG6]], %{{.*}}[%[[ARG5]]]
 //        CHECK:       %[[ADDF_24:.*]] = arith.addf %[[ARG7]], %{{.*}}
@@ -844,7 +847,9 @@ func.func @dynamic_loop(%A: memref<?xf32>, %result: memref<?xf32>, %lb: index, %
 //   CHECK-DAG:   %[[C0:.*]] = arith.constant 0 : index
 //   CHECK-DAG:   %[[CM1:.*]] = arith.constant -1 : index
 //   CHECK-DAG:   %[[CF0:.*]] = arith.constant 0.000000e+00
-//       CHECK:   %[[UBM:.*]] = arith.subi %[[UB:.*]], %{{.*}}
+//       CHECK:   %[[ENTERKERNEL:.*]] = arith.cmpi slt, %[[STEP:.*]], %[[UB:.*]]
+//       CHECK:   %[[PUBM:.*]] = arith.subi %[[UB:.*]], %[[STEP:.*]]
+//       CHECK:   %[[UBM:.*]] = arith.select %[[ENTERKERNEL:.*]], %[[PUBM:.*]], %[[LB:.*]]
 //       CHECK:   %{{.*}}:2 = scf.for %[[ARG5:.*]] = %[[LB:.*]] to %[[UBM]] step %[[STEP:.*]] iter_args(%[[ARG6:.*]] = %{{.*}}, %[[ARG7:.*]] = %{{.*}})
 //       CHECK:       %[[ADDF_13:.*]] = arith.addf %[[ARG7]], %[[ARG6]]
 //       CHECK:       %[[MULF_14:.*]] = arith.mulf %[[ADDF_13]], %{{.*}}

ThomasRaoux · 2024-11-19T07:33:42Z

mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp

+    Value hasAtLeastOneIteration = rewriter.create<arith::CmpIOp>(
+        loc, arith::CmpIPredicate::slt, maxStageByStep, ub);


this assumes lb is 0?

Actualy what we want to make sure is that maxStageByStep < ub, because if those value are from unsigned or index may lead to wrong answer. Please see full answer in the second thread.

ThomasRaoux · 2024-11-19T07:34:51Z

mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp

+    Value hasAtLeastOneIteration = rewriter.create<arith::CmpIOp>(
+        loc, arith::CmpIPredicate::slt, maxStageByStep, ub);
+    Value possibleNewUB =
+        rewriter.create<arith::SubIOp>(loc, ub, maxStageByStep);
+    // In case of `index` or `unsigned` type, we need to make sure that the
+    // subtraction does not result in a negative value, instead we use lb
+    // to avoid entering the kernel loop.
+    newUb = rewriter.create<arith::SelectOp>(
+        loc, hasAtLeastOneIteration, possibleNewUB, forOp.getLowerBound());


I don't understand why this is needed, if ub-maxStageByStep < lb then the loop is never executed either so I don't see why we need to have this special case

@ThomasRaoux , actualy what we want to make sure is that maxStageByStep < ub, because if those value are from unsigned or index may lead to wrong answer.
For exmaple, if:

ub = 100 maxStageByStep = 150

Then:
100 - 150 = (-50)
But for index/unsigned this value will be intreprated to big possitive number as negative doesn't exist in unsigned.
This leads to error that we will actually enter the kernel loop although ub < maxStageByStep.
So in that case, to avoid using ub-maxStageByStep < lb which in the case mentioned above may be falsely true, I added a check and use lb just to avoid entering the kernel.
The rest of the code using ub-maxStageByStep is protected by pred condition and usually wrapped with if on the pred so I think we are o.k. there, the only place where it didn't work for me was on the for kernel mentioned above.

scf.for comparison is always signed as far as I know. In what case would we have an unsigned comparison of lb and ub?

Here is how the scf.for gets lowered:
https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/SCFToControlFlow/SCFToControlFlow.cpp#L371

The induction variable/lb/ub of scf.for are from AnySignlessIntegerOrIndex type:

llvm-project/mlir/include/mlir/Dialect/SCF/IR/SCFOps.td

Line 242 in 7e85cb8

let arguments = (ins AnySignlessIntegerOrIndex:$lowerBound,

So it also affects the calcualtion of ub-maxStageByStep is AnySignlessIntegerOrIndex.
Index type is defined by:

llvm-project/mlir/include/mlir/IR/BuiltinTypes.td

Line 451 in 7e85cb8

**Rationale:** integers of platform-specific bit widths are practical to

With the notion of:
**Rationale:** integers of platform-specific bit widths are practical to express sizes, dimensionalities and subscripts.

So we consider index as unsigned int in our platform.

I don't think this is correct, signless means the type doesn't contain information on the whether it is signed or unsigned and the information is carried by the op. In the case of scf.for the comparison is signed in scf.for semantic.

scf.for definition in the td file requires the step to be positive, but doesn't mentioned signed/unsigned. I agree that the pass you mentioned above lowers to signed, but from EmitC lowering I found lowering to size_t (which is similar unsigned int):

llvm-project/mlir/lib/Target/Cpp/TranslateToCpp.cpp

Line 1667 in abac5be

if (auto iType = dyn_cast<IndexType>(type))

@jpienaar @marbre @ThomasRaoux Personally I can agree with both signed/unsigned but I would like us to be aligned and define it in the scf.for td definition to make it clear. With the current status my platform lowers scf.for lb/ub/step into unsigned int

I agree we should clarify the definition of scf.for. The comparison needs to be signed or unsigned it cannot be either.
I would suggest we clarify the spec to clearly mention that each iteration does the signed comparison ind < ub

@ThomasRaoux @joker-eph
After some discussion iwth my team it seems more reasonable to make the lower/upper bounds as signed integer when lowered to HW.
Here is a followed up PR: #117534

Closing this one as now the change is not needed.

…ive or zero Per the discussion here: llvm#116748 (comment) , this commit properly declare that lower and upper bounds can be also negative or zero.

…ive or zero (#117534) Per the discussion here: #116748 (comment) , this commit properly declare that lower and upper bounds can be also negative or zero.

llvmbot added mlir mlir:scf labels Nov 19, 2024

AviadCo requested review from nicolasvasilache and ThomasRaoux November 19, 2024 06:24

ThomasRaoux reviewed Nov 19, 2024

View reviewed changes

AviadCo mentioned this pull request Nov 25, 2024

[nfc][mlir][scf]: Define scf.for lower/upper bounds can be also negative or zero #117534

Merged

AviadCo closed this Nov 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[mlir][scf]: Avoid using wrong calculation loop pipelining kernel upperbound #116748

[mlir][scf]: Avoid using wrong calculation loop pipelining kernel upperbound #116748

Uh oh!

AviadCo commented Nov 19, 2024

Uh oh!

llvmbot commented Nov 19, 2024 •

edited

Loading

Uh oh!

ThomasRaoux Nov 19, 2024

Uh oh!

AviadCo Nov 19, 2024

Uh oh!

ThomasRaoux Nov 19, 2024

Uh oh!

AviadCo Nov 19, 2024 •

edited

Loading

Uh oh!

ThomasRaoux Nov 19, 2024

Uh oh!

ThomasRaoux Nov 19, 2024

Uh oh!

AviadCo Nov 19, 2024

Uh oh!

ThomasRaoux Nov 19, 2024

Uh oh!

AviadCo Nov 19, 2024

Uh oh!

ThomasRaoux Nov 19, 2024

Uh oh!

AviadCo Nov 25, 2024 •

edited

Loading

Uh oh!

Uh oh!

		Value hasAtLeastOneIteration = rewriter.create<arith::CmpIOp>(
		loc, arith::CmpIPredicate::slt, maxStageByStep, ub);

[mlir][scf]: Avoid using wrong calculation loop pipelining kernel upperbound #116748

[mlir][scf]: Avoid using wrong calculation loop pipelining kernel upperbound #116748

Uh oh!

Conversation

AviadCo commented Nov 19, 2024

Uh oh!

llvmbot commented Nov 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AviadCo Nov 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AviadCo Nov 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvmbot commented Nov 19, 2024 •

edited

Loading

AviadCo Nov 19, 2024 •

edited

Loading

AviadCo Nov 25, 2024 •

edited

Loading