Skip to content

[mlir][scf]: Avoid using wrong calculation loop pipelining kernel upperbound #116748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 9 additions & 1 deletion mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -444,7 +444,15 @@ scf::ForOp LoopPipelinerInternal::createKernelLoop(
loc, rewriter.getIntegerAttr(t, maxStage));
Value maxStageByStep =
rewriter.create<arith::MulIOp>(loc, step, maxStageValue);
newUb = rewriter.create<arith::SubIOp>(loc, ub, maxStageByStep);
Value hasAtLeastOneIteration = rewriter.create<arith::CmpIOp>(
loc, arith::CmpIPredicate::slt, maxStageByStep, ub);
Comment on lines +447 to +448
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumes lb is 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actualy what we want to make sure is that maxStageByStep < ub, because if those value are from unsigned or index may lead to wrong answer. Please see full answer in the second thread.

Value possibleNewUB =
rewriter.create<arith::SubIOp>(loc, ub, maxStageByStep);
// In case of `index` or `unsigned` type, we need to make sure that the
// subtraction does not result in a negative value, instead we use lb
// to avoid entering the kernel loop.
newUb = rewriter.create<arith::SelectOp>(
loc, hasAtLeastOneIteration, possibleNewUB, forOp.getLowerBound());
Comment on lines +447 to +455
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is needed, if ub-maxStageByStep < lb then the loop is never executed either so I don't see why we need to have this special case

Copy link
Contributor Author

@AviadCo AviadCo Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThomasRaoux , actualy what we want to make sure is that maxStageByStep < ub, because if those value are from unsigned or index may lead to wrong answer.
For exmaple, if:

ub = 100
maxStageByStep = 150

Then:
100 - 150 = (-50)
But for index/unsigned this value will be intreprated to big possitive number as negative doesn't exist in unsigned.
This leads to error that we will actually enter the kernel loop although ub < maxStageByStep.
So in that case, to avoid using ub-maxStageByStep < lb which in the case mentioned above may be falsely true, I added a check and use lb just to avoid entering the kernel.
The rest of the code using ub-maxStageByStep is protected by pred condition and usually wrapped with if on the pred so I think we are o.k. there, the only place where it didn't work for me was on the for kernel mentioned above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scf.for comparison is always signed as far as I know. In what case would we have an unsigned comparison of lb and ub?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The induction variable/lb/ub of scf.for are from AnySignlessIntegerOrIndex type:

let arguments = (ins AnySignlessIntegerOrIndex:$lowerBound,

So it also affects the calcualtion of ub-maxStageByStep is AnySignlessIntegerOrIndex.
Index type is defined by:

**Rationale:** integers of platform-specific bit widths are practical to

With the notion of:
**Rationale:** integers of platform-specific bit widths are practical to express sizes, dimensionalities and subscripts.

So we consider index as unsigned int in our platform.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct, signless means the type doesn't contain information on the whether it is signed or unsigned and the information is carried by the op. In the case of scf.for the comparison is signed in scf.for semantic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scf.for definition in the td file requires the step to be positive, but doesn't mentioned signed/unsigned. I agree that the pass you mentioned above lowers to signed, but from EmitC lowering I found lowering to size_t (which is similar unsigned int):

if (auto iType = dyn_cast<IndexType>(type))

@jpienaar @marbre @ThomasRaoux Personally I can agree with both signed/unsigned but I would like us to be aligned and define it in the scf.for td definition to make it clear. With the current status my platform lowers scf.for lb/ub/step into unsigned int

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should clarify the definition of scf.for. The comparison needs to be signed or unsigned it cannot be either.
I would suggest we clarify the spec to clearly mention that each iteration does the signed comparison ind < ub

Copy link
Contributor Author

@AviadCo AviadCo Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThomasRaoux @joker-eph
After some discussion iwth my team it seems more reasonable to make the lower/upper bounds as signed integer when lowered to HW.
Here is a followed up PR: #117534

Closing this one as now the change is not needed.

}
auto newForOp =
rewriter.create<scf::ForOp>(forOp.getLoc(), forOp.getLowerBound(), newUb,
Expand Down
9 changes: 7 additions & 2 deletions mlir/test/Dialect/SCF/loop-pipelining.mlir
Original file line number Diff line number Diff line change
Expand Up @@ -770,7 +770,10 @@ func.func @stage_0_value_escape(%A: memref<?xf32>, %result: memref<?xf32>, %ub:
// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[CM1:.*]] = arith.constant -1 : index
// CHECK: %[[UBM:.*]] = arith.subi %[[UB:.*]], %{{.*}}
// CHECK: %[[IT2_UB:.*]] = arith.muli %[[STEP:.*]], %[[C2:.*]]
// CHECK: %[[ENTERKERNEL:.*]] = arith.cmpi slt, %[[IT2_UB:.*]], %[[UB:.*]]
// CHECK: %[[PUBM:.*]] = arith.subi %[[UB:.*]], %[[IT2_UB:.*]]
// CHECK: %[[UBM:.*]] = arith.select %[[ENTERKERNEL:.*]], %[[PUBM:.*]], %[[LB:.*]]
// CHECK: %{{.*}}:2 = scf.for %[[ARG5:.*]] = %[[LB:.*]] to %[[UBM]] step %[[STEP:.*]] iter_args(%[[ARG6:.*]] = %{{.*}}, %[[ARG7:.*]] = %{{.*}})
// CHECK: memref.store %[[ARG6]], %{{.*}}[%[[ARG5]]]
// CHECK: %[[ADDF_24:.*]] = arith.addf %[[ARG7]], %{{.*}}
Expand Down Expand Up @@ -844,7 +847,9 @@ func.func @dynamic_loop(%A: memref<?xf32>, %result: memref<?xf32>, %lb: index, %
// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[CM1:.*]] = arith.constant -1 : index
// CHECK-DAG: %[[CF0:.*]] = arith.constant 0.000000e+00
// CHECK: %[[UBM:.*]] = arith.subi %[[UB:.*]], %{{.*}}
// CHECK: %[[ENTERKERNEL:.*]] = arith.cmpi slt, %[[STEP:.*]], %[[UB:.*]]
// CHECK: %[[PUBM:.*]] = arith.subi %[[UB:.*]], %[[STEP:.*]]
// CHECK: %[[UBM:.*]] = arith.select %[[ENTERKERNEL:.*]], %[[PUBM:.*]], %[[LB:.*]]
// CHECK: %{{.*}}:2 = scf.for %[[ARG5:.*]] = %[[LB:.*]] to %[[UBM]] step %[[STEP:.*]] iter_args(%[[ARG6:.*]] = %{{.*}}, %[[ARG7:.*]] = %{{.*}})
// CHECK: %[[ADDF_13:.*]] = arith.addf %[[ARG7]], %[[ARG6]]
// CHECK: %[[MULF_14:.*]] = arith.mulf %[[ADDF_13]], %{{.*}}
Expand Down
Loading