[LAA] Perform checks for no-wrap separately from getPtrStride. #126971
Conversation
@llvm/pr-subscribers-llvm-analysis

Author: Florian Hahn (fhahn)

Changes

Reorganize the code in isNoWrap to perform the no-wrap checks without relying on getPtrStride directly. getPtrStride now uses isNoWrap.

The new structure allows deriving no-wrap in more cases in LAA, because there are cases where getPtrStride bails out early since it cannot return a constant stride, yet we can still prove no-wrap for the pointer. An example is AddRecs with non-ConstantInt strides on inbounds GEPs, as in the improved test cases.

This enables vectorization with runtime checks in a few more cases.

Full diff: https://github.com/llvm/llvm-project/pull/126971.diff

2 Files Affected:
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 3202ba81be78e..84401d1a6d751 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -813,16 +813,102 @@ static bool hasComputableBounds(PredicatedScalarEvolution &PSE, Value *Ptr,
return AR->isAffine();
}
+/// Try to compute the stride for \p AR. Used by getPtrStride.
+static std::optional<int64_t>
+getStrideFromAddRec(const SCEVAddRecExpr *AR, const Loop *Lp, Type *AccessTy,
+ Value *Ptr, PredicatedScalarEvolution &PSE) {
+ // The access function must stride over the innermost loop.
+ if (Lp != AR->getLoop()) {
+ LLVM_DEBUG(dbgs() << "LAA: Bad stride - Not striding over innermost loop "
+ << *Ptr << " SCEV: " << *AR << "\n");
+ return std::nullopt;
+ }
+
+ // Check the step is constant.
+ const SCEV *Step = AR->getStepRecurrence(*PSE.getSE());
+
+ // Calculate the pointer stride and check if it is constant.
+ const SCEVConstant *C = dyn_cast<SCEVConstant>(Step);
+ if (!C) {
+ LLVM_DEBUG(dbgs() << "LAA: Bad stride - Not a constant strided " << *Ptr
+ << " SCEV: " << *AR << "\n");
+ return std::nullopt;
+ }
+
+ const auto &DL = Lp->getHeader()->getDataLayout();
+ TypeSize AllocSize = DL.getTypeAllocSize(AccessTy);
+ int64_t Size = AllocSize.getFixedValue();
+ const APInt &APStepVal = C->getAPInt();
+
+ // Huge step value - give up.
+ if (APStepVal.getBitWidth() > 64)
+ return std::nullopt;
+
+ int64_t StepVal = APStepVal.getSExtValue();
+
+ // Strided access.
+ int64_t Stride = StepVal / Size;
+ int64_t Rem = StepVal % Size;
+ if (Rem)
+ return std::nullopt;
+
+ return Stride;
+}
+
+static bool isNoWrapAddRec(Value *Ptr, const SCEVAddRecExpr *AR,
+ PredicatedScalarEvolution &PSE, const Loop *L);
+
/// Check whether a pointer address cannot wrap.
static bool isNoWrap(PredicatedScalarEvolution &PSE,
const DenseMap<Value *, const SCEV *> &Strides, Value *Ptr,
- Type *AccessTy, Loop *L, bool Assume) {
- const SCEV *PtrScev = PSE.getSCEV(Ptr);
+ Type *AccessTy, const Loop *L, bool Assume,
+ std::optional<int64_t> Stride = std::nullopt) {
+ const SCEV *PtrScev = replaceSymbolicStrideSCEV(PSE, Strides, Ptr);
if (PSE.getSE()->isLoopInvariant(PtrScev, L))
return true;
- return getPtrStride(PSE, AccessTy, Ptr, L, Strides, Assume).has_value() ||
- PSE.hasNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
+ const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);
+ if (Assume && !AR)
+ AR = PSE.getAsAddRec(Ptr);
+ if (!AR)
+ return false;
+
+ // The address calculation must not wrap. Otherwise, a dependence could be
+ // inverted.
+ if (isNoWrapAddRec(Ptr, AR, PSE, L))
+ return true;
+
+ // An nusw getelementptr that is an AddRec cannot wrap. If it would wrap,
+ // the distance between the previously accessed location and the wrapped
+ // location will be larger than half the pointer index type space. In that
+ // case, the GEP would be poison and any memory access dependent on it would
+ // be immediate UB when executed.
+ if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);
+ GEP && GEP->hasNoUnsignedSignedWrap())
+ return true;
+
+ // If the null pointer is undefined, then a access sequence which would
+ // otherwise access it can be assumed not to unsigned wrap. Note that this
+ // assumes the object in memory is aligned to the natural alignment.
+ if (!Stride)
+ Stride = getStrideFromAddRec(AR, L, AccessTy, Ptr, PSE);
+ if (Stride) {
+ unsigned AddrSpace = Ptr->getType()->getPointerAddressSpace();
+ if (!NullPointerIsDefined(L->getHeader()->getParent(), AddrSpace) &&
+ (*Stride == 1 || *Stride == -1))
+ return true;
+ }
+
+ if (Assume) {
+ PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
+ LLVM_DEBUG(dbgs() << "LAA: Pointer may wrap:\n"
+ << "LAA: Pointer: " << *Ptr << "\n"
+ << "LAA: SCEV: " << *AR << "\n"
+ << "LAA: Added an overflow assumption\n");
+ return true;
+ }
+
+ return PSE.hasNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
}
static void visitPointers(Value *StartPtr, const Loop &InnermostLoop,
@@ -1458,74 +1544,13 @@ llvm::getPtrStride(PredicatedScalarEvolution &PSE, Type *AccessTy, Value *Ptr,
return std::nullopt;
}
- // The access function must stride over the innermost loop.
- if (Lp != AR->getLoop()) {
- LLVM_DEBUG(dbgs() << "LAA: Bad stride - Not striding over innermost loop "
- << *Ptr << " SCEV: " << *AR << "\n");
- return std::nullopt;
- }
-
- // Check the step is constant.
- const SCEV *Step = AR->getStepRecurrence(*PSE.getSE());
-
- // Calculate the pointer stride and check if it is constant.
- const SCEVConstant *C = dyn_cast<SCEVConstant>(Step);
- if (!C) {
- LLVM_DEBUG(dbgs() << "LAA: Bad stride - Not a constant strided " << *Ptr
- << " SCEV: " << *AR << "\n");
- return std::nullopt;
- }
-
- const auto &DL = Lp->getHeader()->getDataLayout();
- TypeSize AllocSize = DL.getTypeAllocSize(AccessTy);
- int64_t Size = AllocSize.getFixedValue();
- const APInt &APStepVal = C->getAPInt();
-
- // Huge step value - give up.
- if (APStepVal.getBitWidth() > 64)
- return std::nullopt;
-
- int64_t StepVal = APStepVal.getSExtValue();
-
- // Strided access.
- int64_t Stride = StepVal / Size;
- int64_t Rem = StepVal % Size;
- if (Rem)
- return std::nullopt;
-
- if (!ShouldCheckWrap)
- return Stride;
-
- // The address calculation must not wrap. Otherwise, a dependence could be
- // inverted.
- if (isNoWrapAddRec(Ptr, AR, PSE, Lp))
- return Stride;
-
- // An nusw getelementptr that is an AddRec cannot wrap. If it would wrap,
- // the distance between the previously accessed location and the wrapped
- // location will be larger than half the pointer index type space. In that
- // case, the GEP would be poison and any memory access dependent on it would
- // be immediate UB when executed.
- if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);
- GEP && GEP->hasNoUnsignedSignedWrap())
+ auto Stride = getStrideFromAddRec(AR, Lp, AccessTy, Ptr, PSE);
+ if (!ShouldCheckWrap || !Stride)
return Stride;
- // If the null pointer is undefined, then a access sequence which would
- // otherwise access it can be assumed not to unsigned wrap. Note that this
- // assumes the object in memory is aligned to the natural alignment.
- unsigned AddrSpace = Ty->getPointerAddressSpace();
- if (!NullPointerIsDefined(Lp->getHeader()->getParent(), AddrSpace) &&
- (Stride == 1 || Stride == -1))
+ if (isNoWrap(PSE, StridesMap, Ptr, AccessTy, Lp, Assume, Stride))
return Stride;
- if (Assume) {
- PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
- LLVM_DEBUG(dbgs() << "LAA: Pointer may wrap:\n"
- << "LAA: Pointer: " << *Ptr << "\n"
- << "LAA: SCEV: " << *AR << "\n"
- << "LAA: Added an overflow assumption\n");
- return Stride;
- }
LLVM_DEBUG(
dbgs() << "LAA: Bad stride - Pointer may wrap in the address space "
<< *Ptr << " SCEV: " << *AR << "\n");
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/retry-runtime-checks-after-dependence-analysis.ll b/llvm/test/Analysis/LoopAccessAnalysis/retry-runtime-checks-after-dependence-analysis.ll
index e42392df3e93e..26c571b9cb63a 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/retry-runtime-checks-after-dependence-analysis.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/retry-runtime-checks-after-dependence-analysis.ll
@@ -65,13 +65,38 @@ exit:
define void @dependency_check_and_runtime_checks_needed_gepb_not_inbounds_iv2_step5(ptr %a, ptr %b, i64 %offset, i64 %n) {
; CHECK-LABEL: 'dependency_check_and_runtime_checks_needed_gepb_not_inbounds_iv2_step5'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: cannot check memory dependencies at runtime
+; CHECK-NEXT: Memory dependences are safe with run-time checks
; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP4:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.a.iv = getelementptr inbounds float, ptr %a, i64 %iv
+; CHECK-NEXT: Against group ([[GRP5:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.a.iv.off = getelementptr inbounds float, ptr %a, i64 %iv.offset
+; CHECK-NEXT: Check 1:
+; CHECK-NEXT: Comparing group ([[GRP4]]):
+; CHECK-NEXT: %gep.a.iv = getelementptr inbounds float, ptr %a, i64 %iv
+; CHECK-NEXT: Against group ([[GRP6:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.b = getelementptr i8, ptr %b, i64 %iv2
+; CHECK-NEXT: Check 2:
+; CHECK-NEXT: Comparing group ([[GRP5]]):
+; CHECK-NEXT: %gep.a.iv.off = getelementptr inbounds float, ptr %a, i64 %iv.offset
+; CHECK-NEXT: Against group ([[GRP6]]):
+; CHECK-NEXT: %gep.b = getelementptr i8, ptr %b, i64 %iv2
; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP4]]:
+; CHECK-NEXT: (Low: %a High: ((4 * %n) + %a))
+; CHECK-NEXT: Member: {%a,+,4}<nuw><%loop>
+; CHECK-NEXT: Group [[GRP5]]:
+; CHECK-NEXT: (Low: ((4 * %offset) + %a) High: ((4 * %offset) + (4 * %n) + %a))
+; CHECK-NEXT: Member: {((4 * %offset) + %a),+,4}<%loop>
+; CHECK-NEXT: Group [[GRP6]]:
+; CHECK-NEXT: (Low: %b High: (-1 + (5 * %n) + %b))
+; CHECK-NEXT: Member: {%b,+,5}<%loop>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
+; CHECK-NEXT: {%b,+,5}<%loop> Added Flags: <nusw>
; CHECK-EMPTY:
; CHECK-NEXT: Expressions re-written:
;
@@ -102,10 +127,34 @@ exit:
define void @dependency_check_and_runtime_checks_needed_gepb_is_inbounds_iv2_step_not_constant(ptr %a, ptr %b, i64 %offset, i64 %n, i64 %s) {
; CHECK-LABEL: 'dependency_check_and_runtime_checks_needed_gepb_is_inbounds_iv2_step_not_constant'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: cannot check memory dependencies at runtime
+; CHECK-NEXT: Memory dependences are safe with run-time checks
; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP7:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.a.iv = getelementptr inbounds float, ptr %a, i64 %iv
+; CHECK-NEXT: Against group ([[GRP8:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.b = getelementptr inbounds i8, ptr %b, i64 %iv2
+; CHECK-NEXT: Check 1:
+; CHECK-NEXT: Comparing group ([[GRP7]]):
+; CHECK-NEXT: %gep.a.iv = getelementptr inbounds float, ptr %a, i64 %iv
+; CHECK-NEXT: Against group ([[GRP9:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.a.iv.off = getelementptr inbounds float, ptr %a, i64 %iv.offset
+; CHECK-NEXT: Check 2:
+; CHECK-NEXT: Comparing group ([[GRP8]]):
+; CHECK-NEXT: %gep.b = getelementptr inbounds i8, ptr %b, i64 %iv2
+; CHECK-NEXT: Against group ([[GRP9]]):
+; CHECK-NEXT: %gep.a.iv.off = getelementptr inbounds float, ptr %a, i64 %iv.offset
; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP7]]:
+; CHECK-NEXT: (Low: %a High: ((4 * %n) + %a))
+; CHECK-NEXT: Member: {%a,+,4}<nuw><%loop>
+; CHECK-NEXT: Group [[GRP8]]:
+; CHECK-NEXT: (Low: %b High: (3 + %n + %b))
+; CHECK-NEXT: Member: {%b,+,1}<%loop>
+; CHECK-NEXT: Group [[GRP9]]:
+; CHECK-NEXT: (Low: ((4 * %offset) + %a) High: ((4 * %offset) + (4 * %n) + %a))
+; CHECK-NEXT: Member: {((4 * %offset) + %a),+,4}<%loop>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
@@ -144,10 +193,34 @@ exit:
define void @dependency_check_and_runtime_checks_needed_gepb_not_inbounds_iv2_step_not_constant(ptr %a, ptr %b, i64 %offset, i64 %n, i64 %s) {
; CHECK-LABEL: 'dependency_check_and_runtime_checks_needed_gepb_not_inbounds_iv2_step_not_constant'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: cannot check memory dependencies at runtime
+; CHECK-NEXT: Memory dependences are safe with run-time checks
; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Check 0:
+; CHECK-NEXT: Comparing group ([[GRP10:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.a.iv = getelementptr inbounds float, ptr %a, i64 %iv
+; CHECK-NEXT: Against group ([[GRP11:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.b = getelementptr inbounds i8, ptr %b, i64 %iv2
+; CHECK-NEXT: Check 1:
+; CHECK-NEXT: Comparing group ([[GRP10]]):
+; CHECK-NEXT: %gep.a.iv = getelementptr inbounds float, ptr %a, i64 %iv
+; CHECK-NEXT: Against group ([[GRP12:0x[0-9a-f]+]]):
+; CHECK-NEXT: %gep.a.iv.off = getelementptr inbounds float, ptr %a, i64 %iv.offset
+; CHECK-NEXT: Check 2:
+; CHECK-NEXT: Comparing group ([[GRP11]]):
+; CHECK-NEXT: %gep.b = getelementptr inbounds i8, ptr %b, i64 %iv2
+; CHECK-NEXT: Against group ([[GRP12]]):
+; CHECK-NEXT: %gep.a.iv.off = getelementptr inbounds float, ptr %a, i64 %iv.offset
; CHECK-NEXT: Grouped accesses:
+; CHECK-NEXT: Group [[GRP10]]:
+; CHECK-NEXT: (Low: %a High: ((4 * %n) + %a))
+; CHECK-NEXT: Member: {%a,+,4}<nuw><%loop>
+; CHECK-NEXT: Group [[GRP11]]:
+; CHECK-NEXT: (Low: %b High: (3 + %n + %b))
+; CHECK-NEXT: Member: {%b,+,1}<%loop>
+; CHECK-NEXT: Group [[GRP12]]:
+; CHECK-NEXT: (Low: ((4 * %offset) + %a) High: ((4 * %offset) + (4 * %n) + %a))
+; CHECK-NEXT: Member: {((4 * %offset) + %a),+,4}<%loop>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
@@ -189,28 +262,28 @@ define void @dependency_check_and_runtime_checks_needed_gepb_may_wrap(ptr %a, pt
; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:
; CHECK-NEXT: Check 0:
-; CHECK-NEXT: Comparing group ([[GRP4:0x[0-9a-f]+]]):
+; CHECK-NEXT: Comparing group ([[GRP13:0x[0-9a-f]+]]):
; CHECK-NEXT: %gep.a.iv = getelementptr inbounds float, ptr %a, i64 %iv
-; CHECK-NEXT: Against group ([[GRP5:0x[0-9a-f]+]]):
+; CHECK-NEXT: Against group ([[GRP14:0x[0-9a-f]+]]):
; CHECK-NEXT: %gep.a.iv.off = getelementptr inbounds float, ptr %a, i64 %iv.offset
; CHECK-NEXT: Check 1:
-; CHECK-NEXT: Comparing group ([[GRP4]]):
+; CHECK-NEXT: Comparing group ([[GRP13]]):
; CHECK-NEXT: %gep.a.iv = getelementptr inbounds float, ptr %a, i64 %iv
-; CHECK-NEXT: Against group ([[GRP6:0x[0-9a-f]+]]):
+; CHECK-NEXT: Against group ([[GRP15:0x[0-9a-f]+]]):
; CHECK-NEXT: %gep.b = getelementptr float, ptr %b, i64 %iv2
; CHECK-NEXT: Check 2:
-; CHECK-NEXT: Comparing group ([[GRP5]]):
+; CHECK-NEXT: Comparing group ([[GRP14]]):
; CHECK-NEXT: %gep.a.iv.off = getelementptr inbounds float, ptr %a, i64 %iv.offset
-; CHECK-NEXT: Against group ([[GRP6]]):
+; CHECK-NEXT: Against group ([[GRP15]]):
; CHECK-NEXT: %gep.b = getelementptr float, ptr %b, i64 %iv2
; CHECK-NEXT: Grouped accesses:
-; CHECK-NEXT: Group [[GRP4]]:
+; CHECK-NEXT: Group [[GRP13]]:
; CHECK-NEXT: (Low: %a High: ((4 * %n) + %a))
; CHECK-NEXT: Member: {%a,+,4}<nuw><%loop>
-; CHECK-NEXT: Group [[GRP5]]:
+; CHECK-NEXT: Group [[GRP14]]:
; CHECK-NEXT: (Low: ((4 * %offset) + %a) High: ((4 * %offset) + (4 * %n) + %a))
; CHECK-NEXT: Member: {((4 * %offset) + %a),+,4}<%loop>
-; CHECK-NEXT: Group [[GRP6]]:
+; CHECK-NEXT: Group [[GRP15]]:
; CHECK-NEXT: (Low: %b High: (-4 + (8 * %n) + %b))
; CHECK-NEXT: Member: {%b,+,8}<%loop>
; CHECK-EMPTY:
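For intuition, here is a hedged C++ sketch of the access pattern this reorganization helps with; it is hypothetical (not taken from the patch) and mirrors the dependency_check_and_runtime_checks_needed_gepb_is_inbounds_iv2_step_not_constant test above: the second induction variable advances by a runtime step, so getPtrStride cannot return a constant stride, but the inbounds address computation still lets isNoWrap prove no-wrap.

#include <cstdint>

// Hypothetical source-level analogue: iv2 advances by a non-constant
// step s, so the AddRec for b[iv2] has no ConstantInt stride. Before
// this patch, LAA reported "cannot check memory dependencies at
// runtime"; with it, no-wrap follows from the inbounds GEP and LAA can
// emit runtime checks instead.
void sketch(float *a, uint8_t *b, int64_t offset, int64_t n, int64_t s) {
  int64_t iv2 = 0;
  for (int64_t iv = 0; iv < n; ++iv) {
    a[iv] += b[iv2];        // access with non-constant stride s
    a[iv + offset] = a[iv]; // may overlap a[iv]: needs runtime checks
    iv2 += s;
  }
}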
Can you split the NFC part out and land it? It's really hard to see what the functional change here actually is.
// location will be larger than half the pointer index type space. In that
// case, the GEP would be poison and any memory access dependent on it would
// be immediate UB when executed.
if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);
isNoWrapAddRec has this whole dance for analyzing nusw. It seems very odd to have the unchecked form right after a call to that routine?
(Though, now I see that in the current code too?)
Yep, this just moves the existing code. I could look into revisiting/consolidating this with isNoWrapAddRec separately?
@@ -813,16 +813,102 @@ static bool hasComputableBounds(PredicatedScalarEvolution &PSE, Value *Ptr,
  return AR->isAffine();
}

/// Try to compute the stride for \p AR. Used by getPtrStride.
static std::optional<int64_t>
Can you recommit this extraction and rebase?
Done in 424fcc5, thanks
// Huge step value - give up.
if (APStepVal.getBitWidth() > 64)
  return std::nullopt;

int64_t StepVal = APStepVal.getSExtValue();
trySExtValue?
(For now code just moved as NFC)
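For reference, a sketch of what the suggestion would look like, assuming APInt::trySExtValue(), which returns std::nullopt when the value does not fit into a signed 64-bit integer:

// Sketch: trySExtValue() folds the explicit bit-width guard into the
// extraction; a step that does not fit in int64_t bails out here.
std::optional<int64_t> StepVal = APStepVal.trySExtValue();
if (!StepVal)
  return std::nullopt;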
This is a good re-org and I was going to do it myself, and my opinion is that there is no need to extract an NFC from it, since the whole benefit comes from just the re-org.
if (Stride) {
  unsigned AddrSpace = Ptr->getType()->getPointerAddressSpace();
  if (!NullPointerIsDefined(L->getHeader()->getParent(), AddrSpace) &&
      (*Stride == 1 || *Stride == -1))
No need to dereference?
Done thanks!
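For reference, std::optional<int64_t> compares directly against plain integers (an empty optional simply compares unequal), so a sketch of the dereference-free form is:

// Sketch: optional<int64_t> vs. int comparison; no explicit dereference.
if (!NullPointerIsDefined(L->getHeader()->getParent(), AddrSpace) &&
    (Stride == 1 || Stride == -1))
  return true;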
// If the null pointer is undefined, then a access sequence which would
// otherwise access it can be assumed not to unsigned wrap. Note that this
// assumes the object in memory is aligned to the natural alignment.
Sink comment?
Done thanks!
if (Assume && !AR)
  AR = PSE.getAsAddRec(Ptr);
if (!AR)
  return false;
Can nest the Assume check within the !AR check?
Done thanks
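A sketch of the nesting the reviewer asked for (not necessarily the exact committed form):

const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);
if (!AR) {
  // Fall back to predicated SCEV only when the caller allows assumptions.
  if (!Assume)
    return false;
  AR = PSE.getAsAddRec(Ptr);
  if (!AR)
    return false;
}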
// be immediate UB when executed.
if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);
    GEP && GEP->hasNoUnsignedSignedWrap())
auto Stride = getStrideFromAddRec(AR, Lp, AccessTy, Ptr, PSE);
s/auto/actual type/ since it's not clear?
It's even mandatory according to the LLVM coding policy.
Updated to use the full type spelled out in 424fcc5, thanks!
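Spelled out, the call site reads roughly:

// Sketch with the result type written out, per the LLVM coding standards.
std::optional<int64_t> Stride = getStrideFromAddRec(AR, Lp, AccessTy, Ptr, PSE);
if (!ShouldCheckWrap || !Stride)
  return Stride;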
if (!Stride)
  Stride = getStrideFromAddRec(AR, L, AccessTy, Ptr, PSE);
if (Stride) {
  unsigned AddrSpace = Ptr->getType()->getPointerAddressSpace();
s/Ptr->getType()/AccessTy/?
I think here we need the pointer type (which has the accessed address space), vs AccessTy, which is the element type that's accessed.
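A short sketch of the distinction (the address-space value is hypothetical):

// Ptr->getType() is the pointer type and carries the address space,
// e.g. ptr addrspace(1); AccessTy is the accessed element type, e.g.
// float, which has no address space of its own.
unsigned AddrSpace = Ptr->getType()->getPointerAddressSpace();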
TypeSize AllocSize = DL.getTypeAllocSize(AccessTy);
int64_t Size = AllocSize.getFixedValue();
Can just change the type of AllocSize to int64_t, and it will automatically get the fixed value?
I don't mind making the improvements as a separate NFC.
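A sketch of that simplification, assuming TypeSize's usual implicit conversion to a fixed integer value (it asserts if the size is scalable):

// The implicit TypeSize-to-integer conversion subsumes getFixedValue()
// for fixed-size access types.
int64_t Size = DL.getTypeAllocSize(AccessTy);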
Effectively, isNoWrap now duplicates code from getPtrStride. Can that functionality somehow be refactored into a common function?
Thanks for taking a look!
Effectively, isNoWrap now duplicates code from getPtrStride. Can that functionality somehow be refactored into a common function?
Yes, at the moment it duplicates a bit of code, mostly the initial construction of the AddRec. At the moment, I am not sure if there is a good way to refactor that without introducing additional indirections that make it more difficult to see what's going on, vs. some small duplication.
Some duplication I am planning to remove as a next step, by passing a SCEV directly to isNoWrap, as there are some cases where it is currently not used because no pointer is available. Will share a patch once I've managed to create a test case.
Force-pushed from 559eb47 to 4e40088.
…se (NFC). Refactors the code to expose the core logic from getPtrStride to compute the stride for a given AddRec. Split off from llvm/llvm-project#126971 as suggested.
LGTM
Thanks for starting to clean up that spaghetti-code
LGTM, thanks!
…ide. (#126971) Reorganize the code in isNoWrap to perform the no-wrap checks without relying on getPtrStride directly. getPtrStride now uses isNoWrap. The new structure allows deriving no-wrap in more cases in LAA, because there are some cases where getPtrStride bails out early because it cannot return a constant stride, but we can still prove no-wrap for the pointer. An example are AddRecs with non-ConstantInt strides with inbound GEPs, in the improved test cases. This enables vectorization with runtime checks in a few more cases. PR: llvm/llvm-project#126971