-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LoopVectorize] In LoopVectorize.cpp start using getSymbolicMaxBackedgeTakenCount #108833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms Author: David Sherwood (david-arm) ChangesLoopVectorizationLegality currently only treats a loop as legal to vectorise This also helps prepare the loop vectoriser for PR #88385. Full diff: https://github.com/llvm/llvm-project/pull/108833.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index f726b171969a30..5ab0fd12c538f3 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -907,7 +907,7 @@ Value *getRuntimeVF(IRBuilderBase &B, Type *Ty, ElementCount VF) {
const SCEV *createTripCountSCEV(Type *IdxTy, PredicatedScalarEvolution &PSE,
Loop *OrigLoop) {
- const SCEV *BackedgeTakenCount = PSE.getBackedgeTakenCount();
+ const SCEV *BackedgeTakenCount = PSE.getSymbolicMaxBackedgeTakenCount();
assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCount) && "Invalid loop count");
ScalarEvolution &SE = *PSE.getSE();
@@ -4090,7 +4090,7 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
unsigned MaxVFtimesIC =
UserIC ? *MaxPowerOf2RuntimeVF * UserIC : *MaxPowerOf2RuntimeVF;
ScalarEvolution *SE = PSE.getSE();
- const SCEV *BackedgeTakenCount = PSE.getBackedgeTakenCount();
+ const SCEV *BackedgeTakenCount = PSE.getSymbolicMaxBackedgeTakenCount();
const SCEV *ExitCount = SE->getAddExpr(
BackedgeTakenCount, SE->getOne(BackedgeTakenCount->getType()));
const SCEV *Rem = SE->getURemExpr(
@@ -9584,7 +9584,7 @@ static bool processLoopInVPlanNativePath(
ProfileSummaryInfo *PSI, LoopVectorizeHints &Hints,
LoopVectorizationRequirements &Requirements) {
- if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount())) {
+ if (isa<SCEVCouldNotCompute>(PSE.getSymbolicMaxBackedgeTakenCount())) {
LLVM_DEBUG(dbgs() << "LV: cannot compute the outer-loop trip count\n");
return false;
}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might be taking the "create small patches" idea a bit too far but I suppose this change does ensure LoopVectorize is not relying on the more restricted behaviour of getBackedgeTakenCount()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might be taking the "create small patches" idea a bit too far but I suppose this change does ensure LoopVectorize is not relying on the more restricted behaviour of getBackedgeTakenCount().
Yeah, I think as is this may be more confusing for the reader, at least without a comment. Might also clarify to with an assertion that getBackedgeTakenCount is not SCEVCouldNotCompute....
@@ -9584,7 +9584,7 @@ static bool processLoopInVPlanNativePath( | |||
ProfileSummaryInfo *PSI, LoopVectorizeHints &Hints, | |||
LoopVectorizationRequirements &Requirements) { | |||
|
|||
if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount())) { | |||
if (isa<SCEVCouldNotCompute>(PSE.getSymbolicMaxBackedgeTakenCount())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the change the behavior to allow vectorizing loops where getBackedgeTakenCount is SCEVCouldNotCompute? May need a test case and would be an unintended change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right. I was being too eager! I looked and there is no defence against loops with uncountable exits, since LoopVectorizationLegality doesn't filter those out in advance. I've reverted this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add a test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given my patch no longer changes this code are you asking if I can add a negative test for the native vplan case that proves we don't vectorise outer loops with an early exit? I can have a look, although I'm not sure that's even possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, not sure if this is what you wanted or not but I added a test for an outer loop with an early exit. It didn't seem to be covered by any existing tests, although the legality code bails out too early to even attempt calculating a backedge taken count, due to the form of the conditional branch that is rejected by canVectorizeOuterLoop. I hacked the legality code and verified that we can get a symbolic max backedge take count for the outer loop, but not a normal one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can add a negative test for the native vplan case that proves we don't vectorise outer loops with an early exit?
yes that was what I had in mind to guard against the regression in the future, thanks!
e0dc97f
to
459661e
Compare
I've added comments explaining why it's ok to use the symbolic max variant. I also fixed rename a variable in createInitialVPlan that was shadowing a variable of the same name in the VPlan class. |
// Currently only loops with countable exits are vectorized so it's safe to | ||
// use getSymbolicMaxBackedgeTakenCount as it should give the same result | ||
// as getBackedgeTakenCount. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me the comment suggests when uncountable exists are support calling getSymbolicMaxBackedgeTakenCount
becomes unsafe? which I think is the opposite of what you mean.
Perhaps something akin to:
Currently only loops with countable exits are vectorized, but calling getSymbolicMaxBackedgeTakenCount
allows enablement work for loops with uncountable exits whilst also ensuring the symbolic maximum and known back-edge taken count remain identical for loops with countable exits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a good suggestion! I've rebased the patch to see if the libc++ tests start passing now, and updated the comment. Two for the price of one. :)
…kenCount LoopVectorizationLegality currently only treats a loop as legal to vectorise if PredicatedScalarEvolution::getBackedgeTakenCount returns a valid SCEV, or more precisely that the loop must have an exact backedge taken count. Therefore, in LoopVectorize.cpp we can safely replace all calls to getBackedgeTakenCount with calls to getSymbolicMaxBackedgeTakenCount, since the result is the same. This also helps prepare the loop vectoriser for PR llvm#88385.
* Updated comments around calls to getSymbolicMaxBackedgeTakenCount
459661e
to
d12dbae
Compare
* Updated assert. * Add outer loop vectorisation test that has an early exit in the outer loop. I checked and there are no existing tests for this. It fails in LoopVectorizationLegality so we never reach the point in LoopVectorize.cpp where attempt to calculate the backedge taken count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
br i1 %cmp.early, label %for.early, label %for.body.inner | ||
|
||
for.body.inner: | ||
%indvars.iv = phi i64 [ 0, %for.body ], [ %indvars.iv.next, %for.body.inner ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%indvars.iv = phi i64 [ 0, %for.body ], [ %indvars.iv.next, %for.body.inner ] | |
%iv = phi i64 [ 0, %for.body ], [ %indvars.iv.next, %for.body.inner ] |
@@ -9584,7 +9584,7 @@ static bool processLoopInVPlanNativePath( | |||
ProfileSummaryInfo *PSI, LoopVectorizeHints &Hints, | |||
LoopVectorizationRequirements &Requirements) { | |||
|
|||
if (isa<SCEVCouldNotCompute>(PSE.getBackedgeTakenCount())) { | |||
if (isa<SCEVCouldNotCompute>(PSE.getSymbolicMaxBackedgeTakenCount())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can add a negative test for the native vplan case that proves we don't vectorise outer loops with an early exit?
yes that was what I had in mind to guard against the regression in the future, thanks!
…geTakenCount (llvm#108833) LoopVectorizationLegality currently only treats a loop as legal to vectorise if PredicatedScalarEvolution::getBackedgeTakenCount returns a valid SCEV, or more precisely that the loop must have an exact backedge taken count. Therefore, in LoopVectorize.cpp we can safely replace all calls to getBackedgeTakenCount with calls to getSymbolicMaxBackedgeTakenCount, since the result is the same. This also helps prepare the loop vectoriser for PR llvm#88385.
LoopVectorizationLegality currently only treats a loop as legal to vectorise
if PredicatedScalarEvolution::getBackedgeTakenCount returns a valid SCEV, or
more precisely that the loop must have an exact backedge taken count.
Therefore, in LoopVectorize.cpp we can safely replace all calls to
getBackedgeTakenCount with calls to getSymbolicMaxBackedgeTakenCount, since
the result is the same.
This also helps prepare the loop vectoriser for PR #88385.