-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LAA] Check accesses don't overlap early to determine NoDep #92307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If we have an invariant access which is strictly before the other access, the accesses are guaranteed to not access any overlapping memory and there is no dependence If either Src or Sink are loop invariant, check if Src + AccessSize <= Sink (or vice versa). This is done by checking (Sink - (Src + AccessSize)) s>= 0.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis Author: Florian Hahn (fhahn) ChangesIf we have an invariant access which is strictly before the other access, the accesses are guaranteed to not access any overlapping memory and there is no dependence If either Src or Sink are loop invariant, check if Src + AccessSize <= Sink (or vice versa). This is done by checking (Sink - (Src + AccessSize)) s>= 0. Full diff: https://github.com/llvm/llvm-project/pull/92307.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 4ba2e15222106..9e3b15db76b88 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -1982,6 +1982,30 @@ getDependenceDistanceStrideAndSize(
InnermostLoop))
return MemoryDepChecker::Dependence::IndirectUnsafe;
+ uint64_t TypeByteSizeA = ATy->isScalableTy() ? 0 : DL.getTypeAllocSize(ATy);
+ uint64_t TypeByteSizeB = BTy->isScalableTy() ? 0 : DL.getTypeAllocSize(BTy);
+ // If either Src or Sink are loop invariant, check if Src + AccessSize <= Sink
+ // (or vice versa). This is done by checking (Sink - (Src + AccessSize)) s>=
+ // 0.
+ if (SE.isLoopInvariant(Src, InnermostLoop) && TypeByteSizeA > 0) {
+ const SCEV *SrcLastAccess =
+ SE.getAddExpr(Src, SE.getConstant(Src->getType(), TypeByteSizeA));
+ const SCEV *D = SE.getMinusSCEV(Sink, SrcLastAccess);
+ if (!isa<SCEVCouldNotCompute>(D) &&
+ SE.isKnownPredicate(CmpInst::ICMP_SGE, D,
+ SE.getConstant(Src->getType(), 0)))
+ return MemoryDepChecker::Dependence::NoDep;
+ }
+ if (SE.isLoopInvariant(Sink, InnermostLoop) && TypeByteSizeB > 0) {
+ const SCEV *SinkLastAccess =
+ SE.getAddExpr(Sink, SE.getConstant(Src->getType(), TypeByteSizeB));
+ const SCEV *D = SE.getMinusSCEV(Src, SinkLastAccess);
+ if (!isa<SCEVCouldNotCompute>(D) &&
+ SE.isKnownPredicate(CmpInst::ICMP_SGE, D,
+ SE.getConstant(Src->getType(), 0)))
+ return MemoryDepChecker::Dependence::NoDep;
+ }
+
// Need accesses with constant strides and the same direction. We don't want
// to vectorize "A[B[i]] += ..." and similar code or pointer arithmetic that
// could wrap in the address space.
@@ -1991,13 +2015,12 @@ getDependenceDistanceStrideAndSize(
return MemoryDepChecker::Dependence::Unknown;
}
- uint64_t TypeByteSize = DL.getTypeAllocSize(ATy);
bool HasSameSize =
DL.getTypeStoreSizeInBits(ATy) == DL.getTypeStoreSizeInBits(BTy);
if (!HasSameSize)
- TypeByteSize = 0;
+ TypeByteSizeA = 0;
return DepDistanceStrideAndSizeInfo(Dist, std::abs(StrideAPtr),
- std::abs(StrideBPtr), TypeByteSize,
+ std::abs(StrideBPtr), TypeByteSizeA,
AIsWrite, BIsWrite);
}
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/invariant-dependence-before.ll b/llvm/test/Analysis/LoopAccessAnalysis/invariant-dependence-before.ll
index 2a210a5a445b9..d66d3528586a8 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/invariant-dependence-before.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/invariant-dependence-before.ll
@@ -4,13 +4,8 @@
define void @test_invar_dependence_before_positive_strided_access_1(ptr %a) {
; CHECK-LABEL: 'test_invar_dependence_before_positive_strided_access_1'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %a, align 4 ->
-; CHECK-NEXT: store i32 %l, ptr %gep, align 4
-; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
; CHECK-NEXT: Grouped accesses:
; CHECK-EMPTY:
@@ -39,13 +34,8 @@ exit:
define void @test_invar_dependence_before_positive_strided_access_2(ptr %a) {
; CHECK-LABEL: 'test_invar_dependence_before_positive_strided_access_2'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT: store i32 %l, ptr %a, align 4
-; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
; CHECK-NEXT: Grouped accesses:
; CHECK-EMPTY:
@@ -144,13 +134,8 @@ exit:
define void @test_invar_dependence_before_positive_strided_access_1_different_access_sizes(ptr %a) {
; CHECK-LABEL: 'test_invar_dependence_before_positive_strided_access_1_different_access_sizes'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %a, align 4 ->
-; CHECK-NEXT: store i8 %t, ptr %gep, align 1
-; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
; CHECK-NEXT: Grouped accesses:
; CHECK-EMPTY:
@@ -216,13 +201,8 @@ exit:
define void @test_invar_dependence_before_negative_strided_access_1(ptr %a) {
; CHECK-LABEL: 'test_invar_dependence_before_negative_strided_access_1'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %a, align 4 ->
-; CHECK-NEXT: store i32 %l, ptr %gep, align 4
-; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
; CHECK-NEXT: Grouped accesses:
; CHECK-EMPTY:
@@ -251,13 +231,8 @@ exit:
define void @test_invar_dependence_before_negative_strided_access_2(ptr %a) {
; CHECK-LABEL: 'test_invar_dependence_before_negative_strided_access_2'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT: store i32 %l, ptr %a, align 4
-; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
; CHECK-NEXT: Grouped accesses:
; CHECK-EMPTY:
@@ -357,13 +332,8 @@ exit:
define void @test_both_invar_before_1(ptr %a) {
; CHECK-LABEL: 'test_both_invar_before_1'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %a, align 4 ->
-; CHECK-NEXT: store i32 %l, ptr %gep.off, align 4
-; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
; CHECK-NEXT: Grouped accesses:
; CHECK-EMPTY:
@@ -391,13 +361,8 @@ exit:
define void @test_both_invar_before_2(ptr %a) {
; CHECK-LABEL: 'test_both_invar_before_2'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Memory dependences are safe
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Unknown:
-; CHECK-NEXT: %l = load i32, ptr %gep.off, align 4 ->
-; CHECK-NEXT: store i32 %l, ptr %a, align 4
-; CHECK-EMPTY:
; CHECK-NEXT: Run-time memory checks:
; CHECK-NEXT: Grouped accesses:
; CHECK-EMPTY:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
bool HasSameSize = | ||
DL.getTypeStoreSizeInBits(ATy) == DL.getTypeStoreSizeInBits(BTy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a difference now with the handling of scalable types. getTypeStoreSizeInBits
/getTypeAllocSize
return the minimum size. Now TypeByteSize
is 0 if ATy
is a scalable type (and happens to have the same minimum size as the TypeStoreSize
of BTy
), but not if the other way around with BTy
.
I think it's safer to pass 0
for such cases (minimum size is probably not safe for a scalable type, why wasn't the maximum size used?), so maybe do the same for BTy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've updated the code to only query DL.getTypeAllocSize
if non of the accessed types is scalable.
At the moment, those sizes will only be used by the newly added code; getPtrStride
will return 0 for scalable types, so we will for now always return an unknown dependence due to the stride checks above.
✅ With the latest revision this PR passed the C/C++ code formatter. |
Code as you have it looks correct, but a couple questions:
|
This allows use at other places, in particular an updated version of #92307.
Originally I thought this would be more difficult, as isKnownPredicate wasn't able to prove (End <= Sink if Sink was an AddRec). But it turns out that if we compute start and end pointers via So I went ahead and updated the PR to use that approach, which overall is slightly simpler and catches more cases. WDYT @preames @Meinersbur |
LGTM. Thanks for generalizing. The result is much cleaner and easier to follow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
const SCEV *Ex = PSE.getBackedgeTakenCount(); | ||
|
||
ScStart = AR->getStart(); | ||
ScEnd = AR->evaluateAtIteration(Ex, *SE); | ||
const SCEV *Step = AR->getStepRecurrence(*SE); | ||
|
||
// For expressions with negative step, the upper bound is ScStart and the | ||
// For expressions with negative step, the& upper bound is ScStart and the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dropped, thanks!
It looks like this causes large compile-time regressions in some cases: http://llvm-compile-time-tracker.com/compare.php?from=7cee61c82f7fe5a563c2781e855c13238c44931f&to=1b377dbeb73ef3abec246fe11fc98ec699625c0c&stat=instructions%3Au For example, libclamav_phishcheck.c regresses by more than 20%. |
oh wow that's definitely unexpected. Let me take a look |
This reverts commit b89010a. Revert "[LAA] Check accesses don't overlap early to determine NoDep (llvm#92307)" This reverts commit 1b377db.
Currently trying to evaluate 2 possible solutions to reduce compile time, but it looks like http://llvm-compile-time-tracker.com/index.php is down at the moment (returning |
@fhahn I've restarted the server, should work again now. |
@fhahn Any news on this? I'd suggest reverting the change for now -- with the amount of commits in between it will be hard to determine whether the regression was fully addressed. |
compile-time should be restored by limiting to loop-invariant source/sinks for now (234cc40). There's only a very small change with the original commit reverted on top of that one (0.01% change). WDYT? I also have a few patches to improve compile-time for the general version which I'll be working on as follow-ups |
That looks good to me, thanks! |
Use getStartAndEndForAccess to compute the start and end of both src
and sink (factored out to helper in bce3680). If they do not
overlap (i.e. SrcEnd <= SinkStart || SinkEnd <= SrcStart), there is no
dependence, regardless of stride.