-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[mlir][scf]: Avoid using wrong calculation loop pipelining kernel upperbound #116748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
AviadCo
commented
Nov 19, 2024
- When using dynamic loop pipelining trasform, we are not sure if we enter kernel loop (depends on dynamic input). This patch fix the possible option that the kernel upperbound might be defined by 2 values from index/unsigned int type that is result of subtracting big value from smaller one.
…erbound * When using dynamic loop pipelining trasform, we are not sure if we enter kernel loop (depends on dynamic input). This patch fix the possible option that the kernel upperbound might be defined by 2 values from index/unsigned int type that is result of subtracting big value from smaller one.
@llvm/pr-subscribers-mlir-scf @llvm/pr-subscribers-mlir Author: Aviad Cohen (AviadCo) Changes
Full diff: https://github.com/llvm/llvm-project/pull/116748.diff 2 Files Affected:
diff --git a/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp b/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
index 1b458f410af601..899fe7374dbbd0 100644
--- a/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
@@ -444,7 +444,15 @@ scf::ForOp LoopPipelinerInternal::createKernelLoop(
loc, rewriter.getIntegerAttr(t, maxStage));
Value maxStageByStep =
rewriter.create<arith::MulIOp>(loc, step, maxStageValue);
- newUb = rewriter.create<arith::SubIOp>(loc, ub, maxStageByStep);
+ Value hasAtLeastOneIteration = rewriter.create<arith::CmpIOp>(
+ loc, arith::CmpIPredicate::slt, maxStageByStep, ub);
+ Value possibleNewUB =
+ rewriter.create<arith::SubIOp>(loc, ub, maxStageByStep);
+ // In case of `index` or `unsigned` type, we need to make sure that the
+ // subtraction does not result in a negative value, instead we use lb
+ // to avoid entering the kernel loop.
+ newUb = rewriter.create<arith::SelectOp>(
+ loc, hasAtLeastOneIteration, possibleNewUB, forOp.getLowerBound());
}
auto newForOp =
rewriter.create<scf::ForOp>(forOp.getLoc(), forOp.getLowerBound(), newUb,
diff --git a/mlir/test/Dialect/SCF/loop-pipelining.mlir b/mlir/test/Dialect/SCF/loop-pipelining.mlir
index c879c83275bf86..d13d258412f2d0 100644
--- a/mlir/test/Dialect/SCF/loop-pipelining.mlir
+++ b/mlir/test/Dialect/SCF/loop-pipelining.mlir
@@ -770,7 +770,10 @@ func.func @stage_0_value_escape(%A: memref<?xf32>, %result: memref<?xf32>, %ub:
// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[CM1:.*]] = arith.constant -1 : index
-// CHECK: %[[UBM:.*]] = arith.subi %[[UB:.*]], %{{.*}}
+// CHECK: %[[IT2_UB:.*]] = arith.muli %[[STEP:.*]], %[[C2:.*]]
+// CHECK: %[[ENTERKERNEL:.*]] = arith.cmpi slt, %[[IT2_UB:.*]], %[[UB:.*]]
+// CHECK: %[[PUBM:.*]] = arith.subi %[[UB:.*]], %[[IT2_UB:.*]]
+// CHECK: %[[UBM:.*]] = arith.select %[[ENTERKERNEL:.*]], %[[PUBM:.*]], %[[LB:.*]]
// CHECK: %{{.*}}:2 = scf.for %[[ARG5:.*]] = %[[LB:.*]] to %[[UBM]] step %[[STEP:.*]] iter_args(%[[ARG6:.*]] = %{{.*}}, %[[ARG7:.*]] = %{{.*}})
// CHECK: memref.store %[[ARG6]], %{{.*}}[%[[ARG5]]]
// CHECK: %[[ADDF_24:.*]] = arith.addf %[[ARG7]], %{{.*}}
@@ -844,7 +847,9 @@ func.func @dynamic_loop(%A: memref<?xf32>, %result: memref<?xf32>, %lb: index, %
// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[CM1:.*]] = arith.constant -1 : index
// CHECK-DAG: %[[CF0:.*]] = arith.constant 0.000000e+00
-// CHECK: %[[UBM:.*]] = arith.subi %[[UB:.*]], %{{.*}}
+// CHECK: %[[ENTERKERNEL:.*]] = arith.cmpi slt, %[[STEP:.*]], %[[UB:.*]]
+// CHECK: %[[PUBM:.*]] = arith.subi %[[UB:.*]], %[[STEP:.*]]
+// CHECK: %[[UBM:.*]] = arith.select %[[ENTERKERNEL:.*]], %[[PUBM:.*]], %[[LB:.*]]
// CHECK: %{{.*}}:2 = scf.for %[[ARG5:.*]] = %[[LB:.*]] to %[[UBM]] step %[[STEP:.*]] iter_args(%[[ARG6:.*]] = %{{.*}}, %[[ARG7:.*]] = %{{.*}})
// CHECK: %[[ADDF_13:.*]] = arith.addf %[[ARG7]], %[[ARG6]]
// CHECK: %[[MULF_14:.*]] = arith.mulf %[[ADDF_13]], %{{.*}}
|
Value hasAtLeastOneIteration = rewriter.create<arith::CmpIOp>( | ||
loc, arith::CmpIPredicate::slt, maxStageByStep, ub); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this assumes lb is 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actualy what we want to make sure is that maxStageByStep < ub
, because if those value are from unsigned
or index
may lead to wrong answer. Please see full answer in the second thread.
Value hasAtLeastOneIteration = rewriter.create<arith::CmpIOp>( | ||
loc, arith::CmpIPredicate::slt, maxStageByStep, ub); | ||
Value possibleNewUB = | ||
rewriter.create<arith::SubIOp>(loc, ub, maxStageByStep); | ||
// In case of `index` or `unsigned` type, we need to make sure that the | ||
// subtraction does not result in a negative value, instead we use lb | ||
// to avoid entering the kernel loop. | ||
newUb = rewriter.create<arith::SelectOp>( | ||
loc, hasAtLeastOneIteration, possibleNewUB, forOp.getLowerBound()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this is needed, if ub-maxStageByStep < lb
then the loop is never executed either so I don't see why we need to have this special case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ThomasRaoux , actualy what we want to make sure is that maxStageByStep < ub, because if those value are from unsigned
or index
may lead to wrong answer.
For exmaple, if:
ub = 100
maxStageByStep = 150
Then:
100 - 150 = (-50)
But for index/unsigned this value will be intreprated to big possitive number as negative doesn't exist in unsigned.
This leads to error that we will actually enter the kernel loop although ub < maxStageByStep
.
So in that case, to avoid using ub-maxStageByStep < lb
which in the case mentioned above may be falsely true, I added a check and use lb
just to avoid entering the kernel.
The rest of the code using ub-maxStageByStep
is protected by pred
condition and usually wrapped with if
on the pred
so I think we are o.k. there, the only place where it didn't work for me was on the for
kernel mentioned above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scf.for comparison is always signed as far as I know. In what case would we have an unsigned comparison of lb and ub?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is how the scf.for gets lowered:
https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/SCFToControlFlow/SCFToControlFlow.cpp#L371
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The induction variable/lb/ub of scf.for
are from AnySignlessIntegerOrIndex
type:
let arguments = (ins AnySignlessIntegerOrIndex:$lowerBound, |
So it also affects the calcualtion of ub-maxStageByStep
is AnySignlessIntegerOrIndex
.
Index type is defined by:
**Rationale:** integers of platform-specific bit widths are practical to |
With the notion of:
**Rationale:** integers of platform-specific bit widths are practical to express sizes, dimensionalities and subscripts.
So we consider index
as unsigned int
in our platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct, signless means the type doesn't contain information on the whether it is signed or unsigned and the information is carried by the op. In the case of scf.for the comparison is signed in scf.for semantic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scf.for
definition in the td file requires the step
to be positive, but doesn't mentioned signed/unsigned. I agree that the pass you mentioned above lowers to signed, but from EmitC lowering I found lowering to size_t
(which is similar unsigned int):
if (auto iType = dyn_cast<IndexType>(type)) |
@jpienaar @marbre @ThomasRaoux Personally I can agree with both signed/unsigned but I would like us to be aligned and define it in the scf.for
td definition to make it clear. With the current status my platform lowers scf.for
lb/ub/step into unsigned int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we should clarify the definition of scf.for. The comparison needs to be signed or unsigned it cannot be either.
I would suggest we clarify the spec to clearly mention that each iteration does the signed comparison ind < ub
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ThomasRaoux @joker-eph
After some discussion iwth my team it seems more reasonable to make the lower/upper bounds as signed integer
when lowered to HW.
Here is a followed up PR: #117534
Closing this one as now the change is not needed.
…ive or zero Per the discussion here: llvm#116748 (comment) , this commit properly declare that lower and upper bounds can be also negative or zero.
…ive or zero Per the discussion here: llvm#116748 (comment) , this commit properly declare that lower and upper bounds can be also negative or zero.
…ive or zero (#117534) Per the discussion here: #116748 (comment) , this commit properly declare that lower and upper bounds can be also negative or zero.