Skip to content

[LAA] Check accesses don't overlap early to determine NoDep #92307

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 21, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented May 15, 2024

Use getStartAndEndForAccess to compute the start and end of both src
and sink (factored out to helper in bce3680). If they do not
overlap (i.e. SrcEnd <= SinkStart || SinkEnd <= SrcStart), there is no
dependence, regardless of stride.

If we have an invariant access which is strictly before the other
access, the accesses are guaranteed to not access any overlapping memory
and there is no dependence

If either Src or Sink are loop invariant, check if Src + AccessSize <= Sink
(or vice versa). This is done by checking (Sink - (Src + AccessSize)) s>= 0.
@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label May 15, 2024
@llvmbot
Copy link
Member

llvmbot commented May 15, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Florian Hahn (fhahn)

Changes

If we have an invariant access which is strictly before the other access, the accesses are guaranteed to not access any overlapping memory and there is no dependence

If either Src or Sink are loop invariant, check if Src + AccessSize <= Sink (or vice versa). This is done by checking (Sink - (Src + AccessSize)) s>= 0.


Full diff: https://github.com/llvm/llvm-project/pull/92307.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+26-3)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/invariant-dependence-before.ll (+7-42)
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 4ba2e15222106..9e3b15db76b88 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -1982,6 +1982,30 @@ getDependenceDistanceStrideAndSize(
                                    InnermostLoop))
     return MemoryDepChecker::Dependence::IndirectUnsafe;
 
+  uint64_t TypeByteSizeA = ATy->isScalableTy() ? 0 : DL.getTypeAllocSize(ATy);
+  uint64_t TypeByteSizeB = BTy->isScalableTy() ? 0 : DL.getTypeAllocSize(BTy);
+  // If either Src or Sink are loop invariant, check if Src + AccessSize <= Sink
+  // (or vice versa). This is done by checking (Sink - (Src + AccessSize)) s>=
+  // 0.
+  if (SE.isLoopInvariant(Src, InnermostLoop) && TypeByteSizeA > 0) {
+    const SCEV *SrcLastAccess =
+        SE.getAddExpr(Src, SE.getConstant(Src->getType(), TypeByteSizeA));
+    const SCEV *D = SE.getMinusSCEV(Sink, SrcLastAccess);
+    if (!isa<SCEVCouldNotCompute>(D) &&
+        SE.isKnownPredicate(CmpInst::ICMP_SGE, D,
+                            SE.getConstant(Src->getType(), 0)))
+      return MemoryDepChecker::Dependence::NoDep;
+  }
+  if (SE.isLoopInvariant(Sink, InnermostLoop) && TypeByteSizeB > 0) {
+    const SCEV *SinkLastAccess =
+        SE.getAddExpr(Sink, SE.getConstant(Src->getType(), TypeByteSizeB));
+    const SCEV *D = SE.getMinusSCEV(Src, SinkLastAccess);
+    if (!isa<SCEVCouldNotCompute>(D) &&
+        SE.isKnownPredicate(CmpInst::ICMP_SGE, D,
+                            SE.getConstant(Src->getType(), 0)))
+      return MemoryDepChecker::Dependence::NoDep;
+  }
+
   // Need accesses with constant strides and the same direction. We don't want
   // to vectorize "A[B[i]] += ..." and similar code or pointer arithmetic that
   // could wrap in the address space.
@@ -1991,13 +2015,12 @@ getDependenceDistanceStrideAndSize(
     return MemoryDepChecker::Dependence::Unknown;
   }
 
-  uint64_t TypeByteSize = DL.getTypeAllocSize(ATy);
   bool HasSameSize =
       DL.getTypeStoreSizeInBits(ATy) == DL.getTypeStoreSizeInBits(BTy);
   if (!HasSameSize)
-    TypeByteSize = 0;
+    TypeByteSizeA = 0;
   return DepDistanceStrideAndSizeInfo(Dist, std::abs(StrideAPtr),
-                                      std::abs(StrideBPtr), TypeByteSize,
+                                      std::abs(StrideBPtr), TypeByteSizeA,
                                       AIsWrite, BIsWrite);
 }
 
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/invariant-dependence-before.ll b/llvm/test/Analysis/LoopAccessAnalysis/invariant-dependence-before.ll
index 2a210a5a445b9..d66d3528586a8 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/invariant-dependence-before.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/invariant-dependence-before.ll
@@ -4,13 +4,8 @@
 define void @test_invar_dependence_before_positive_strided_access_1(ptr %a) {
 ; CHECK-LABEL: 'test_invar_dependence_before_positive_strided_access_1'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
-; CHECK-NEXT:            %l = load i32, ptr %a, align 4 ->
-; CHECK-NEXT:            store i32 %l, ptr %gep, align 4
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-EMPTY:
@@ -39,13 +34,8 @@ exit:
 define void @test_invar_dependence_before_positive_strided_access_2(ptr %a) {
 ; CHECK-LABEL: 'test_invar_dependence_before_positive_strided_access_2'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
-; CHECK-NEXT:            %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT:            store i32 %l, ptr %a, align 4
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-EMPTY:
@@ -144,13 +134,8 @@ exit:
 define void @test_invar_dependence_before_positive_strided_access_1_different_access_sizes(ptr %a) {
 ; CHECK-LABEL: 'test_invar_dependence_before_positive_strided_access_1_different_access_sizes'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
-; CHECK-NEXT:            %l = load i32, ptr %a, align 4 ->
-; CHECK-NEXT:            store i8 %t, ptr %gep, align 1
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-EMPTY:
@@ -216,13 +201,8 @@ exit:
 define void @test_invar_dependence_before_negative_strided_access_1(ptr %a) {
 ; CHECK-LABEL: 'test_invar_dependence_before_negative_strided_access_1'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
-; CHECK-NEXT:            %l = load i32, ptr %a, align 4 ->
-; CHECK-NEXT:            store i32 %l, ptr %gep, align 4
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-EMPTY:
@@ -251,13 +231,8 @@ exit:
 define void @test_invar_dependence_before_negative_strided_access_2(ptr %a) {
 ; CHECK-LABEL: 'test_invar_dependence_before_negative_strided_access_2'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
-; CHECK-NEXT:            %l = load i32, ptr %gep, align 4 ->
-; CHECK-NEXT:            store i32 %l, ptr %a, align 4
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-EMPTY:
@@ -357,13 +332,8 @@ exit:
 define void @test_both_invar_before_1(ptr %a) {
 ; CHECK-LABEL: 'test_both_invar_before_1'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
-; CHECK-NEXT:            %l = load i32, ptr %a, align 4 ->
-; CHECK-NEXT:            store i32 %l, ptr %gep.off, align 4
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-EMPTY:
@@ -391,13 +361,8 @@ exit:
 define void @test_both_invar_before_2(ptr %a) {
 ; CHECK-LABEL: 'test_both_invar_before_2'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
-; CHECK-NEXT:  Unknown data dependence.
+; CHECK-NEXT:      Memory dependences are safe
 ; CHECK-NEXT:      Dependences:
-; CHECK-NEXT:        Unknown:
-; CHECK-NEXT:            %l = load i32, ptr %gep.off, align 4 ->
-; CHECK-NEXT:            store i32 %l, ptr %a, align 4
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-EMPTY:

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 2018 to 2019
bool HasSameSize =
DL.getTypeStoreSizeInBits(ATy) == DL.getTypeStoreSizeInBits(BTy);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a difference now with the handling of scalable types. getTypeStoreSizeInBits/getTypeAllocSize return the minimum size. Now TypeByteSize is 0 if ATy is a scalable type (and happens to have the same minimum size as the TypeStoreSize of BTy), but not if the other way around with BTy.

I think it's safer to pass 0 for such cases (minimum size is probably not safe for a scalable type, why wasn't the maximum size used?), so maybe do the same for BTy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've updated the code to only query DL.getTypeAllocSize if non of the accessed types is scalable.

At the moment, those sizes will only be used by the newly added code; getPtrStride will return 0 for scalable types, so we will for now always return an unknown dependence due to the stride checks above.

fhahn added a commit that referenced this pull request May 20, 2024
Copy link

github-actions bot commented May 20, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@preames
Copy link
Collaborator

preames commented May 20, 2024

Code as you have it looks correct, but a couple questions:

  • Why the focus on strictly before specifically? I'd expect strictly after to work as well. Given that, wouldn't phrasing it as an compile time evaluated overlap check of two regions make the most sense?
  • Unless I'm missing something, the code you added work be correct and precise for scalable types if you used TypeSize.

fhahn added a commit that referenced this pull request May 20, 2024
This allows use at other places, in particular an updated version of
#92307.
@fhahn fhahn changed the title [LAA] Check if invariant accesses is strictly before other access. [LAA] Check accesses don't overlap early to determine NoDep May 20, 2024
@fhahn
Copy link
Contributor Author

fhahn commented May 20, 2024

Code as you have it looks correct, but a couple questions:

  • Why the focus on strictly before specifically? I'd expect strictly after to work as well. Given that, wouldn't phrasing it as an compile time evaluated overlap check of two regions make the most sense?
  • Unless I'm missing something, the code you added work be correct and precise for scalable types if you used TypeSize.

Originally I thought this would be more difficult, as isKnownPredicate wasn't able to prove (End <= Sink if Sink was an AddRec). But it turns out that if we compute start and end pointers via getStartAndEndForAccess factored out in bce3680, isKnownPredicate works as expected.

So I went ahead and updated the PR to use that approach, which overall is slightly simpler and catches more cases. WDYT @preames @Meinersbur

@preames
Copy link
Collaborator

preames commented May 20, 2024

LGTM. Thanks for generalizing. The result is much cleaner and easier to follow.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

const SCEV *Ex = PSE.getBackedgeTakenCount();

ScStart = AR->getStart();
ScEnd = AR->evaluateAtIteration(Ex, *SE);
const SCEV *Step = AR->getStepRecurrence(*SE);

// For expressions with negative step, the upper bound is ScStart and the
// For expressions with negative step, the& upper bound is ScStart and the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropped, thanks!

@fhahn fhahn merged commit 1b377db into llvm:main May 21, 2024
3 of 4 checks passed
@fhahn fhahn deleted the laa-invar-before branch May 21, 2024 10:00
@nikic
Copy link
Contributor

nikic commented May 21, 2024

It looks like this causes large compile-time regressions in some cases: http://llvm-compile-time-tracker.com/compare.php?from=7cee61c82f7fe5a563c2781e855c13238c44931f&to=1b377dbeb73ef3abec246fe11fc98ec699625c0c&stat=instructions%3Au

For example, libclamav_phishcheck.c regresses by more than 20%.

@fhahn
Copy link
Contributor Author

fhahn commented May 21, 2024

oh wow that's definitely unexpected. Let me take a look

fhahn added a commit to fhahn/llvm-project that referenced this pull request May 22, 2024
This reverts commit b89010a.

Revert "[LAA] Check accesses don't overlap early to determine NoDep (llvm#92307)"

This reverts commit 1b377db.
@fhahn
Copy link
Contributor Author

fhahn commented May 22, 2024

Currently trying to evaluate 2 possible solutions to reduce compile time, but it looks like http://llvm-compile-time-tracker.com/index.php is down at the moment (returning Error 503 Backend fetch failed)

@nikic
Copy link
Contributor

nikic commented May 22, 2024

@fhahn I've restarted the server, should work again now.

@nikic
Copy link
Contributor

nikic commented May 28, 2024

@fhahn Any news on this? I'd suggest reverting the change for now -- with the amount of commits in between it will be hard to determine whether the regression was fully addressed.

@fhahn
Copy link
Contributor Author

fhahn commented May 28, 2024

@fhahn Any news on this? I'd suggest reverting the change for now -- with the amount of commits in between it will be hard to determine whether the regression was fully addressed.

compile-time should be restored by limiting to loop-invariant source/sinks for now (234cc40). There's only a very small change with the original commit reverted on top of that one (0.01% change). WDYT?

I also have a few patches to improve compile-time for the general version which I'll be working on as follow-ups

@nikic
Copy link
Contributor

nikic commented May 28, 2024

That looks good to me, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants