-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[SCEV] Prove no-self-wrap from negative power of two step #101416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SCEV] Prove no-self-wrap from negative power of two step #101416
Conversation
We have existing code which reasons about a step evenly dividing the iteration space is a finite loop with a single exit implying no-self-wrap. The sign of the step doesn't effect this. Not really a fan of the conditional negate logic here, but it seemed like a bit of overkill to have isKnownToBeAPossibleNegativePowerOfTwo. Any better ideas on how to structure this?
@llvm/pr-subscribers-llvm-analysis Author: Philip Reames (preames) ChangesWe have existing code which reasons about a step evenly dividing the iteration space is a finite loop with a single exit implying no-self-wrap. The sign of the step doesn't effect this. Not really a fan of the conditional negate logic here, but it seemed like a bit of overkill to have isKnownToBeAPossibleNegativePowerOfTwo. Any better ideas on how to structure this? Full diff: https://github.com/llvm/llvm-project/pull/101416.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 264ac392b16d1..755fc9a6d35bc 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -9157,13 +9157,17 @@ ScalarEvolution::ExitLimit ScalarEvolution::computeExitLimitFromICmp(
if (auto *ZExt = dyn_cast<SCEVZeroExtendExpr>(LHS))
InnerLHS = ZExt->getOperand();
if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(InnerLHS);
- AR && !AR->hasNoSelfWrap() && AR->getLoop() == L && AR->isAffine() &&
- isKnownToBeAPowerOfTwo(AR->getStepRecurrence(*this), /*OrZero=*/true)) {
- auto Flags = AR->getNoWrapFlags();
- Flags = setFlags(Flags, SCEV::FlagNW);
- SmallVector<const SCEV *> Operands{AR->operands()};
- Flags = StrengthenNoWrapFlags(this, scAddRecExpr, Operands, Flags);
- setNoWrapFlags(const_cast<SCEVAddRecExpr *>(AR), Flags);
+ AR && !AR->hasNoSelfWrap() && AR->getLoop() == L && AR->isAffine()) {
+ const SCEV *Step = AR->getStepRecurrence(*this);
+ if (isKnownNegative(Step))
+ Step = getNegativeSCEV(Step);
+ if (isKnownToBeAPowerOfTwo(Step, /*OrZero=*/true)) {
+ auto Flags = AR->getNoWrapFlags();
+ Flags = setFlags(Flags, SCEV::FlagNW);
+ SmallVector<const SCEV *> Operands{AR->operands()};
+ Flags = StrengthenNoWrapFlags(this, scAddRecExpr, Operands, Flags);
+ setNoWrapFlags(const_cast<SCEVAddRecExpr *>(AR), Flags);
+ }
}
}
diff --git a/llvm/test/Analysis/ScalarEvolution/trip-count-scalable-stride.ll b/llvm/test/Analysis/ScalarEvolution/trip-count-scalable-stride.ll
index 7c9498304e939..eda28e24f1b0e 100644
--- a/llvm/test/Analysis/ScalarEvolution/trip-count-scalable-stride.ll
+++ b/llvm/test/Analysis/ScalarEvolution/trip-count-scalable-stride.ll
@@ -455,15 +455,16 @@ define void @vscale_countdown_ne(ptr nocapture %A, i32 %n) mustprogress vscale_r
; CHECK-NEXT: %start = sub i32 %n, %vscale
; CHECK-NEXT: --> ((-1 * vscale)<nsw> + %n) U: full-set S: full-set
; CHECK-NEXT: %iv = phi i32 [ %sub, %for.body ], [ %start, %entry ]
-; CHECK-NEXT: --> {((-1 * vscale)<nsw> + %n),+,(-1 * vscale)<nsw>}<%for.body> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT: --> {((-1 * vscale)<nsw> + %n),+,(-1 * vscale)<nsw>}<%for.body> U: full-set S: full-set Exits: ((vscale * (-1 + (-1 * (((-2 * vscale)<nsw> + %n) /u vscale))<nsw>)<nsw>) + %n) LoopDispositions: { %for.body: Computable }
; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, ptr %A, i32 %iv
-; CHECK-NEXT: --> {((4 * %n) + (-4 * vscale)<nsw> + %A),+,(-4 * vscale)<nsw>}<%for.body> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT: --> {((4 * %n) + (-4 * vscale)<nsw> + %A),+,(-4 * vscale)<nsw>}<%for.body> U: full-set S: full-set Exits: ((4 * %n) + (vscale * (-4 + (-4 * (((-2 * vscale)<nsw> + %n) /u vscale)))) + %A) LoopDispositions: { %for.body: Computable }
; CHECK-NEXT: %sub = sub i32 %iv, %vscale
-; CHECK-NEXT: --> {((-2 * vscale)<nsw> + %n),+,(-1 * vscale)<nsw>}<%for.body> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT: --> {((-2 * vscale)<nsw> + %n),+,(-1 * vscale)<nsw>}<nw><%for.body> U: full-set S: full-set Exits: ((vscale * (-2 + (-1 * (((-2 * vscale)<nsw> + %n) /u vscale))<nsw>)) + %n) LoopDispositions: { %for.body: Computable }
; CHECK-NEXT: Determining loop execution counts for: @vscale_countdown_ne
-; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
-; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
-; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: backedge-taken count is (((-2 * vscale)<nsw> + %n) /u vscale)
+; CHECK-NEXT: Loop %for.body: constant max backedge-taken count is i32 2147483647
+; CHECK-NEXT: Loop %for.body: symbolic max backedge-taken count is (((-2 * vscale)<nsw> + %n) /u vscale)
+; CHECK-NEXT: Loop %for.body: Trip multiple is 1
;
entry:
%vscale = call i32 @llvm.vscale.i32()
@@ -495,15 +496,16 @@ define void @vscalex4_countdown_ne(ptr nocapture %A, i32 %n) mustprogress vscale
; CHECK-NEXT: %start = sub i32 %n, %VF
; CHECK-NEXT: --> ((-4 * vscale)<nsw> + %n) U: full-set S: full-set
; CHECK-NEXT: %iv = phi i32 [ %sub, %for.body ], [ %start, %entry ]
-; CHECK-NEXT: --> {((-4 * vscale)<nsw> + %n),+,(-4 * vscale)<nsw>}<%for.body> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT: --> {((-4 * vscale)<nsw> + %n),+,(-4 * vscale)<nsw>}<%for.body> U: full-set S: full-set Exits: ((vscale * (-4 + (-4 * (((-8 * vscale)<nsw> + %n) /u (4 * vscale)<nuw><nsw>))<nsw>)<nsw>) + %n) LoopDispositions: { %for.body: Computable }
; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, ptr %A, i32 %iv
-; CHECK-NEXT: --> {((4 * %n) + (-16 * vscale)<nsw> + %A),+,(-16 * vscale)<nsw>}<%for.body> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT: --> {((4 * %n) + (-16 * vscale)<nsw> + %A),+,(-16 * vscale)<nsw>}<%for.body> U: full-set S: full-set Exits: ((4 * %n) + (vscale * (-16 + (-16 * (((-8 * vscale)<nsw> + %n) /u (4 * vscale)<nuw><nsw>)))) + %A) LoopDispositions: { %for.body: Computable }
; CHECK-NEXT: %sub = sub i32 %iv, %VF
-; CHECK-NEXT: --> {((-8 * vscale)<nsw> + %n),+,(-4 * vscale)<nsw>}<%for.body> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %for.body: Computable }
+; CHECK-NEXT: --> {((-8 * vscale)<nsw> + %n),+,(-4 * vscale)<nsw>}<nw><%for.body> U: full-set S: full-set Exits: ((vscale * (-8 + (-4 * (((-8 * vscale)<nsw> + %n) /u (4 * vscale)<nuw><nsw>))<nsw>)) + %n) LoopDispositions: { %for.body: Computable }
; CHECK-NEXT: Determining loop execution counts for: @vscalex4_countdown_ne
-; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
-; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
-; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: backedge-taken count is (((-8 * vscale)<nsw> + %n) /u (4 * vscale)<nuw><nsw>)
+; CHECK-NEXT: Loop %for.body: constant max backedge-taken count is i32 536870911
+; CHECK-NEXT: Loop %for.body: symbolic max backedge-taken count is (((-8 * vscale)<nsw> + %n) /u (4 * vscale)<nuw><nsw>)
+; CHECK-NEXT: Loop %for.body: Trip multiple is 1
;
entry:
%vscale = call i32 @llvm.vscale.i32()
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really a fan of the conditional negate logic here, but it seemed like a bit of overkill to have isKnownToBeAPossibleNegativePowerOfTwo. Any better ideas on how to structure this?
I think it would be okay to add an OrNegative parameter to the function. Computing getNegativeSCEV() may be expensive if it's not just negating a constant.
Will do. Update pending later today. |
return all_of(Mul->operands(), NonRecursive) && (OrZero || isKnownNonZero(S)); | ||
return all_of(Mul->operands(), NonRecursive) && | ||
(OrZero || isKnownNonZero(S)) && | ||
(!OrNegative || llvm::count_if(Mul->operands(), [this](const SCEV *S) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get why we need this condition.
If OrNegative=false the previous code is correct. If OrNegative=true, then we're fine with either a positive or negative power of two. So even if we have two negative factors, the result will be a positive power of two (or zero) and still be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I was originally thinking of this as power-of-two XOR negative power-of-two, and then generalized slightly, but you're right, this condition is a remnant of the XOR model. Will kill shortly.
Co-authored-by: Nikita Popov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -455,15 +455,16 @@ define void @vscale_countdown_ne(ptr nocapture %A, i32 %n) mustprogress vscale_r | |||
; CHECK-NEXT: %start = sub i32 %n, %vscale | |||
; CHECK-NEXT: --> ((-1 * vscale)<nsw> + %n) U: full-set S: full-set | |||
; CHECK-NEXT: %iv = phi i32 [ %sub, %for.body ], [ %start, %entry ] | |||
; CHECK-NEXT: --> {((-1 * vscale)<nsw> + %n),+,(-1 * vscale)<nsw>}<%for.body> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %for.body: Computable } | |||
; CHECK-NEXT: --> {((-1 * vscale)<nsw> + %n),+,(-1 * vscale)<nsw>}<%for.body> U: full-set S: full-set Exits: ((vscale * (-1 + (-1 * (((-2 * vscale)<nsw> + %n) /u vscale))<nsw>)<nsw>) + %n) LoopDispositions: { %for.body: Computable } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop the TODO above.
We have existing code which reasons about a step evenly dividing the iteration space is a finite loop with a single exit implying no-self-wrap. The sign of the step doesn't effect this.