Skip to content

[mlir][scf]: Avoid using wrong calculation loop pipelining kernel upperbound #116748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

AviadCo
Copy link
Contributor

@AviadCo AviadCo commented Nov 19, 2024

  • When using dynamic loop pipelining trasform, we are not sure if we enter kernel loop (depends on dynamic input). This patch fix the possible option that the kernel upperbound might be defined by 2 values from index/unsigned int type that is result of subtracting big value from smaller one.

…erbound

* When using dynamic loop pipelining trasform, we are not sure if we
  enter kernel loop (depends on dynamic input). This patch fix the
possible option that the kernel upperbound might be defined by 2 values
from index/unsigned int type that is result of subtracting big value
from smaller one.
@llvmbot
Copy link
Member

llvmbot commented Nov 19, 2024

@llvm/pr-subscribers-mlir-scf

@llvm/pr-subscribers-mlir

Author: Aviad Cohen (AviadCo)

Changes
  • When using dynamic loop pipelining trasform, we are not sure if we enter kernel loop (depends on dynamic input). This patch fix the possible option that the kernel upperbound might be defined by 2 values from index/unsigned int type that is result of subtracting big value from smaller one.

Full diff: https://github.com/llvm/llvm-project/pull/116748.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp (+9-1)
  • (modified) mlir/test/Dialect/SCF/loop-pipelining.mlir (+7-2)
diff --git a/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp b/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
index 1b458f410af601..899fe7374dbbd0 100644
--- a/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
@@ -444,7 +444,15 @@ scf::ForOp LoopPipelinerInternal::createKernelLoop(
         loc, rewriter.getIntegerAttr(t, maxStage));
     Value maxStageByStep =
         rewriter.create<arith::MulIOp>(loc, step, maxStageValue);
-    newUb = rewriter.create<arith::SubIOp>(loc, ub, maxStageByStep);
+    Value hasAtLeastOneIteration = rewriter.create<arith::CmpIOp>(
+        loc, arith::CmpIPredicate::slt, maxStageByStep, ub);
+    Value possibleNewUB =
+        rewriter.create<arith::SubIOp>(loc, ub, maxStageByStep);
+    // In case of `index` or `unsigned` type, we need to make sure that the
+    // subtraction does not result in a negative value, instead we use lb
+    // to avoid entering the kernel loop.
+    newUb = rewriter.create<arith::SelectOp>(
+        loc, hasAtLeastOneIteration, possibleNewUB, forOp.getLowerBound());
   }
   auto newForOp =
       rewriter.create<scf::ForOp>(forOp.getLoc(), forOp.getLowerBound(), newUb,
diff --git a/mlir/test/Dialect/SCF/loop-pipelining.mlir b/mlir/test/Dialect/SCF/loop-pipelining.mlir
index c879c83275bf86..d13d258412f2d0 100644
--- a/mlir/test/Dialect/SCF/loop-pipelining.mlir
+++ b/mlir/test/Dialect/SCF/loop-pipelining.mlir
@@ -770,7 +770,10 @@ func.func @stage_0_value_escape(%A: memref<?xf32>, %result: memref<?xf32>, %ub:
 //    CHECK-DAG:   %[[C2:.*]] = arith.constant 2 : index
 //    CHECK-DAG:   %[[C0:.*]] = arith.constant 0 : index
 //    CHECK-DAG:   %[[CM1:.*]] = arith.constant -1 : index
-//        CHECK:   %[[UBM:.*]] = arith.subi %[[UB:.*]], %{{.*}}
+//        CHECK:   %[[IT2_UB:.*]] = arith.muli %[[STEP:.*]], %[[C2:.*]]
+//        CHECK:   %[[ENTERKERNEL:.*]] = arith.cmpi slt, %[[IT2_UB:.*]], %[[UB:.*]]
+//        CHECK:   %[[PUBM:.*]] = arith.subi %[[UB:.*]], %[[IT2_UB:.*]]
+//        CHECK:   %[[UBM:.*]] = arith.select %[[ENTERKERNEL:.*]], %[[PUBM:.*]], %[[LB:.*]]
 //        CHECK:   %{{.*}}:2 = scf.for %[[ARG5:.*]] = %[[LB:.*]] to %[[UBM]] step %[[STEP:.*]] iter_args(%[[ARG6:.*]] = %{{.*}}, %[[ARG7:.*]] = %{{.*}})
 //        CHECK:       memref.store %[[ARG6]], %{{.*}}[%[[ARG5]]]
 //        CHECK:       %[[ADDF_24:.*]] = arith.addf %[[ARG7]], %{{.*}}
@@ -844,7 +847,9 @@ func.func @dynamic_loop(%A: memref<?xf32>, %result: memref<?xf32>, %lb: index, %
 //   CHECK-DAG:   %[[C0:.*]] = arith.constant 0 : index
 //   CHECK-DAG:   %[[CM1:.*]] = arith.constant -1 : index
 //   CHECK-DAG:   %[[CF0:.*]] = arith.constant 0.000000e+00
-//       CHECK:   %[[UBM:.*]] = arith.subi %[[UB:.*]], %{{.*}}
+//       CHECK:   %[[ENTERKERNEL:.*]] = arith.cmpi slt, %[[STEP:.*]], %[[UB:.*]]
+//       CHECK:   %[[PUBM:.*]] = arith.subi %[[UB:.*]], %[[STEP:.*]]
+//       CHECK:   %[[UBM:.*]] = arith.select %[[ENTERKERNEL:.*]], %[[PUBM:.*]], %[[LB:.*]]
 //       CHECK:   %{{.*}}:2 = scf.for %[[ARG5:.*]] = %[[LB:.*]] to %[[UBM]] step %[[STEP:.*]] iter_args(%[[ARG6:.*]] = %{{.*}}, %[[ARG7:.*]] = %{{.*}})
 //       CHECK:       %[[ADDF_13:.*]] = arith.addf %[[ARG7]], %[[ARG6]]
 //       CHECK:       %[[MULF_14:.*]] = arith.mulf %[[ADDF_13]], %{{.*}}

Comment on lines +447 to +448
Value hasAtLeastOneIteration = rewriter.create<arith::CmpIOp>(
loc, arith::CmpIPredicate::slt, maxStageByStep, ub);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumes lb is 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actualy what we want to make sure is that maxStageByStep < ub, because if those value are from unsigned or index may lead to wrong answer. Please see full answer in the second thread.

Comment on lines +447 to +455
Value hasAtLeastOneIteration = rewriter.create<arith::CmpIOp>(
loc, arith::CmpIPredicate::slt, maxStageByStep, ub);
Value possibleNewUB =
rewriter.create<arith::SubIOp>(loc, ub, maxStageByStep);
// In case of `index` or `unsigned` type, we need to make sure that the
// subtraction does not result in a negative value, instead we use lb
// to avoid entering the kernel loop.
newUb = rewriter.create<arith::SelectOp>(
loc, hasAtLeastOneIteration, possibleNewUB, forOp.getLowerBound());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is needed, if ub-maxStageByStep < lb then the loop is never executed either so I don't see why we need to have this special case

Copy link
Contributor Author

@AviadCo AviadCo Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThomasRaoux , actualy what we want to make sure is that maxStageByStep < ub, because if those value are from unsigned or index may lead to wrong answer.
For exmaple, if:

ub = 100
maxStageByStep = 150

Then:
100 - 150 = (-50)
But for index/unsigned this value will be intreprated to big possitive number as negative doesn't exist in unsigned.
This leads to error that we will actually enter the kernel loop although ub < maxStageByStep.
So in that case, to avoid using ub-maxStageByStep < lb which in the case mentioned above may be falsely true, I added a check and use lb just to avoid entering the kernel.
The rest of the code using ub-maxStageByStep is protected by pred condition and usually wrapped with if on the pred so I think we are o.k. there, the only place where it didn't work for me was on the for kernel mentioned above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scf.for comparison is always signed as far as I know. In what case would we have an unsigned comparison of lb and ub?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The induction variable/lb/ub of scf.for are from AnySignlessIntegerOrIndex type:

let arguments = (ins AnySignlessIntegerOrIndex:$lowerBound,

So it also affects the calcualtion of ub-maxStageByStep is AnySignlessIntegerOrIndex.
Index type is defined by:

**Rationale:** integers of platform-specific bit widths are practical to

With the notion of:
**Rationale:** integers of platform-specific bit widths are practical to express sizes, dimensionalities and subscripts.

So we consider index as unsigned int in our platform.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct, signless means the type doesn't contain information on the whether it is signed or unsigned and the information is carried by the op. In the case of scf.for the comparison is signed in scf.for semantic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scf.for definition in the td file requires the step to be positive, but doesn't mentioned signed/unsigned. I agree that the pass you mentioned above lowers to signed, but from EmitC lowering I found lowering to size_t (which is similar unsigned int):

if (auto iType = dyn_cast<IndexType>(type))

@jpienaar @marbre @ThomasRaoux Personally I can agree with both signed/unsigned but I would like us to be aligned and define it in the scf.for td definition to make it clear. With the current status my platform lowers scf.for lb/ub/step into unsigned int

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should clarify the definition of scf.for. The comparison needs to be signed or unsigned it cannot be either.
I would suggest we clarify the spec to clearly mention that each iteration does the signed comparison ind < ub

Copy link
Contributor Author

@AviadCo AviadCo Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThomasRaoux @joker-eph
After some discussion iwth my team it seems more reasonable to make the lower/upper bounds as signed integer when lowered to HW.
Here is a followed up PR: #117534

Closing this one as now the change is not needed.

AviadCo pushed a commit to AviadCo/llvm-project that referenced this pull request Nov 25, 2024
…ive or zero

Per the discussion here: llvm#116748 (comment) , this commit properly declare that lower and upper bounds can be also negative or zero.
@AviadCo AviadCo closed this Nov 25, 2024
AviadCo pushed a commit to AviadCo/llvm-project that referenced this pull request Dec 14, 2024
…ive or zero

Per the discussion here: llvm#116748 (comment) , this commit properly declare that lower and upper bounds can be also negative or zero.
AviadCo added a commit that referenced this pull request Dec 14, 2024
…ive or zero (#117534)

Per the discussion here:
#116748 (comment) , this commit properly declare that lower and upper bounds can be also negative or zero.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants