-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LAA] Use MaxStride instead of CommonStride to calculate MaxVF #98142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-analysis Author: vaibhav (mrdaybird) ChangesWe bail out from safe MaxVF calculation if the strides are not the same. Instead, we are dependent on runtime checks, though not yet implemented. We could instead use the MaxStride. #define LEN 256 * 256
float a[LEN];
void gather() {
for (int i = 0; i < LEN - 1024 - 255; i++) {
#pragma clang loop interleave(disable)
#pragma clang loop unroll(disable)
for (int j = 0; j < 256; j++)
a[i + j + 1024] += a[j * 4 + i];
}
} I am not sure about the correctness, but intuitively it felt right. Full diff: https://github.com/llvm/llvm-project/pull/98142.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 018861a665c4c..3a984fafd44d3 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -2133,10 +2133,6 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
"different type sizes\n");
return Dependence::Unknown;
}
-
- if (!CommonStride)
- return Dependence::Unknown;
-
// Bail out early if passed-in parameters make vectorization not feasible.
unsigned ForcedFactor = (VectorizerParams::VectorizationFactor ?
VectorizerParams::VectorizationFactor : 1);
@@ -2176,7 +2172,7 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
// minimum for computations below, as this ensures we compute the closest
// possible dependence distance.
uint64_t MinDistanceNeeded =
- TypeByteSize * *CommonStride * (MinNumIter - 1) + TypeByteSize;
+ TypeByteSize * MaxStride * (MinNumIter - 1) + TypeByteSize;
if (MinDistanceNeeded > static_cast<uint64_t>(MinDistance)) {
if (!isa<SCEVConstant>(Dist)) {
// For non-constant distances, we checked the lower bound of the
@@ -2233,7 +2229,7 @@ MemoryDepChecker::Dependence::DepType MemoryDepChecker::isDependent(
// An update to MinDepDistBytes requires an update to MaxSafeVectorWidthInBits
// since there is a backwards dependency.
- uint64_t MaxVF = MinDepDistBytes / (TypeByteSize * *CommonStride);
+ uint64_t MaxVF = MinDepDistBytes / (TypeByteSize * MaxStride);
LLVM_DEBUG(dbgs() << "LAA: Positive min distance " << MinDistance
<< " with max VF = " << MaxVF << '\n');
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/different_strides.ll b/llvm/test/Analysis/LoopAccessAnalysis/different_strides.ll
new file mode 100644
index 0000000000000..fb3efc2768966
--- /dev/null
+++ b/llvm/test/Analysis/LoopAccessAnalysis/different_strides.ll
@@ -0,0 +1,80 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt --disable-output -mtriple=x86_64 --passes="print<access-info>" %s 2>&1 | FileCheck %s
+
+@a = dso_local local_unnamed_addr global [65536 x float] zeroinitializer, align 16
+
+; Equivalent C code for the test case:
+; #define LEN 256 * 256
+; float a[LEN];
+
+; void different_strides() {
+; for (int i = 0; i < LEN - 1024 - 255; i++) {
+; #pragma clang loop interleave(disable)
+; #pragma clang loop unroll(disable)
+; for (int j = 0; j < 256; j++)
+; a[i + j + 1024] += a[j * 4 + i];
+; }
+; }
+define dso_local void @different_strides() local_unnamed_addr {
+; CHECK-LABEL: 'different_strides'
+; CHECK-NEXT: for.body4:
+; CHECK-NEXT: Memory dependences are safe with a maximum safe vector width of 2048 bits
+; CHECK-NEXT: Dependences:
+; CHECK-NEXT: BackwardVectorizable:
+; CHECK-NEXT: %3 = load float, ptr %arrayidx, align 4 ->
+; CHECK-NEXT: store float %add9, ptr %arrayidx8, align 4
+; CHECK-EMPTY:
+; CHECK-NEXT: Forward:
+; CHECK-NEXT: %5 = load float, ptr %arrayidx8, align 4 ->
+; CHECK-NEXT: store float %add9, ptr %arrayidx8, align 4
+; CHECK-EMPTY:
+; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Grouped accesses:
+; CHECK-EMPTY:
+; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
+; CHECK-NEXT: SCEV assumptions:
+; CHECK-EMPTY:
+; CHECK-NEXT: Expressions re-written:
+; CHECK-NEXT: for.cond1.preheader:
+; CHECK-NEXT: Report: loop is not the innermost loop
+; CHECK-NEXT: Dependences:
+; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Grouped accesses:
+; CHECK-EMPTY:
+; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
+; CHECK-NEXT: SCEV assumptions:
+; CHECK-EMPTY:
+; CHECK-NEXT: Expressions re-written:
+;
+entry:
+ br label %for.cond1.preheader
+
+for.cond1.preheader:
+ %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.cond.cleanup3 ]
+ %0 = add nuw nsw i64 %indvars.iv25, 1024
+ br label %for.body4
+
+for.cond.cleanup:
+ ret void
+
+for.cond.cleanup3:
+ %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
+ %exitcond29.not = icmp eq i64 %indvars.iv.next26, 64257
+ br i1 %exitcond29.not, label %for.cond.cleanup, label %for.cond1.preheader
+
+for.body4:
+ %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body4 ]
+ %1 = shl nuw nsw i64 %indvars.iv, 2
+ %2 = add nuw nsw i64 %1, %indvars.iv25
+ %arrayidx = getelementptr inbounds [65536 x float], ptr @a, i64 0, i64 %2
+ %3 = load float, ptr %arrayidx, align 4
+ %4 = add nuw nsw i64 %0, %indvars.iv
+ %arrayidx8 = getelementptr inbounds [65536 x float], ptr @a, i64 0, i64 %4
+ %5 = load float, ptr %arrayidx8, align 4
+ %add9 = fadd fast float %5, %3
+ store float %add9, ptr %arrayidx8, align 4
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %exitcond.not = icmp eq i64 %indvars.iv.next, 256
+ br i1 %exitcond.not, label %for.cond.cleanup3, label %for.body4
+}
+
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting up the PR, this was an extension I had in mind when I generalized the logic. Could you also add a negative test case (different strides, but not safe)?
As for the title, I'd suggest to prefix it with |
After a bit of thought, I was wondering if we could do better by taking into account the sign of the strides. But I noticed that I was thinking along the lines that if both strides have the same sign, then only the Src's stride matters, otherwise we need to take the max. |
1c2eff2
to
d7668a7
Compare
@fhahn i have updated the test cases & comment. give me a heads up if there are any rookie mistakes in this pr. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this completely dropped off my radar, just coming back to is.
LGTM, thanks, with a few more small suggestions inline.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/144/builds/24539 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/18118 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/16064 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/160/builds/17025 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/180/builds/17168 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/18564 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/26754 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/25260 Here is the relevant piece of the build log for the reference
|
…98142) We bail out from MaxVF calculation if the strides are not the same. Instead, we are dependent on runtime checks, though not yet implemented. We could instead use the MaxStride to conservatively use an upper bound. This handles cases like the following: ```c #define LEN 256 * 256 float a[LEN]; void gather() { for (int i = 0; i < LEN - 1024 - 255; i++) { #pragma clang loop interleave(disable) #pragma clang loop unroll(disable) for (int j = 0; j < 256; j++) a[i + j + 1024] += a[j * 4 + i]; } } ``` --------- Co-authored-by: Florian Hahn <[email protected]>
We bail out from MaxVF calculation if the strides are not the same. Instead, we are dependent on runtime checks, though not yet implemented. We could instead use the MaxStride to conservatively use an upper bound.
This handles cases like the following: