[Polly] Data flow reduction detection to cover more cases #84901
The base concept is the same as the existing reduction algorithm, in which we collect a list of candidate <store,load> pairs. However, the existing algorithm works only if there is a single binary operation between the load and the store.

Example: sum += a[i];

This algorithm extends detection to more than one binary operation. It is implemented as data-flow reduction detection at the basic-block level. We propagate the loads, the number of times each load is used (flows into an instruction), and the binary operations performed until we reach a store.

Example: sum += a[i] + b[i];

  sum(Ld)   a[i](Ld)
        \ + /
         tmp   b[i](Ld)
           \+/
          sum(St)

In the above case, the candidate pairs are formed by associating the store to sum with all of its load inputs, which are sum, a[i] and b[i]. Check functions then filter these down to the valid reduction pair, i.e. {sum,sum}.
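The propagation described above can be illustrated with a toy model. This is a minimal sketch and not Polly's implementation: the `Inst`, `Flow`, and `findReductionPairs` names are hypothetical, memory locations are plain strings, and the filter at the store (same location, exactly one use, a single operator that is `+` or `*`) is an assumed simplification of the real check functions, which work on MemoryAccess objects and isl sets.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy IR for one basic block (hypothetical, not Polly's data structures).
enum class Kind { Load, BinOp, Store };

struct Inst {
  Kind K;
  std::string Addr;     // memory location for Load/Store, e.g. "sum"
  char Op;              // operator for BinOp, e.g. '+'; 0 otherwise
  std::vector<int> Ops; // indices of earlier instructions used as operands
};

// What is known about one load that flows into an instruction.
struct Flow {
  int Uses = 0;       // number of paths on which the load arrives
  char Op = 0;        // the single operator seen so far, 0 if none
  bool Mixed = false; // true once two different operators were seen
};

using FlowMap = std::map<int, Flow>; // load index -> flow information

// Returns <store index, load index> pairs that pass the toy reduction checks.
std::vector<std::pair<int, int>>
findReductionPairs(const std::vector<Inst> &BB) {
  std::vector<FlowMap> In(BB.size());
  std::vector<std::pair<int, int>> Pairs;
  for (std::size_t I = 0; I < BB.size(); ++I) {
    const Inst &X = BB[I];
    if (X.K == Kind::Load) {
      In[I][static_cast<int>(I)].Uses = 1; // a load flows into itself once
      continue;
    }
    // Merge the flow information of all operands.
    for (int O : X.Ops) {
      assert(O >= 0 && static_cast<std::size_t>(O) < I && "use before def");
      for (const auto &E : In[O]) {
        const Flow &F = E.second;
        Flow &T = In[I][E.first];
        T.Uses += F.Uses;
        T.Mixed = T.Mixed || F.Mixed || (T.Op && F.Op && T.Op != F.Op);
        if (!T.Op)
          T.Op = F.Op;
      }
    }
    if (X.K == Kind::BinOp) {
      // Record this operator on every load that passes through it.
      for (auto &E : In[I]) {
        Flow &F = E.second;
        if (F.Op && F.Op != X.Op)
          F.Mixed = true;
        F.Op = X.Op;
      }
      continue;
    }
    // Store: every incoming load is a candidate; keep only the valid ones.
    for (const auto &E : In[I]) {
      const Flow &F = E.second;
      bool SameAddr = BB[E.first].Addr == X.Addr;
      bool Reassociable = F.Op == '+' || F.Op == '*'; // assumed operator set
      if (SameAddr && F.Uses == 1 && !F.Mixed && Reassociable)
        Pairs.emplace_back(static_cast<int>(I), E.first);
    }
  }
  return Pairs;
}
```

For the `sum += a[i] + b[i]` block encoded as load sum, load a[i], add, load b[i], add, store sum, the candidates at the store are the three loads, and only the load of sum survives the filter. Replacing the first add with a multiply marks the flow as mixed, so the toy model reports no pair.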
Thanks for the patch. I really appreciate the good code comments.
Could you add a test that checks two independent reductions in the same loop, e.g.:
for (int i = 0; i < n; ++i) {
  *sum += A[i];
  *prod += B[i];
}
ScopBuilder::buildEqivClassBlockStmts should create two ScopStmts for the same BasicBlock here.
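For reference, the requested loop can be wrapped in a self-contained function. This is only a sketch of the source-level shape of such a test: the function name `twoReductions` and its signature are hypothetical, and the actual regression test would be LLVM IR checked through FileCheck rather than C++.

```cpp
#include <cassert>

// Hypothetical wrapper around the loop from the review comment.
// Each statement is its own load/add/store chain on a distinct location,
// so the two reductions do not feed into each other.
void twoReductions(const int *A, const int *B, int n, int *sum, int *prod) {
  for (int i = 0; i < n; ++i) {
    *sum += A[i];  // reduction on *sum
    *prod += B[i]; // independent reduction on *prod
  }
}
```

Calling it with two three-element arrays and separate accumulators updates each accumulator from its own array only.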
Ping @Meinersbur @efriedma-quic
Ping @Meinersbur
Ping @Meinersbur
@@ -0,0 +1,39 @@
; RUN: opt %loadPolly -basic-aa -polly-print-dependences -polly-allow-nonaffine -disable-output < %s | FileCheck %s
-passes=print<polly-dependences> (see #92918). I think legacy-pm -polly-print-scops wasn't ported yet, so it should be okay to continue using it for now.
Otherwise, patch looks fine.
Please don't forget to update this.
Thanks for pointing that out, Eli. Updated.
LGTM.
Please excuse the delay.
polly/lib/Analysis/ScopBuilder.cpp
Outdated
@@ -2557,7 +2568,6 @@ bool checkCandidatePairAccesses(MemoryAccess *LoadMA, MemoryAccess *StoreMA,
.intersect_domain(isl::manage(Domain.copy()));
isl::set RS = R.range();
isl::set WS = W.range();
Please try to avoid unrelated changes.
Co-authored-by: Michael Kruse
✅ With the latest revision this PR passed the C/C++ code formatter.
Thanks @Meinersbur and @efriedma-quic for taking the time to review and approve.
The base concept is the same as the existing reduction algorithm, in which we collect a list of candidate <store,load> pairs. However, the existing algorithm works only if there is a single binary operation between the load and the store.
Example: sum += a[i];
This algorithm extends detection to more than one binary operation. It is implemented as data-flow reduction detection at the basic-block level. We propagate the loads, the number of times each load is used (flows into an instruction), and the binary operations performed until we reach a store.
Example: sum += a[i] + b[i];
In the above case, the candidate pairs are formed by associating the store to sum with all of its load inputs, which are sum, a[i] and b[i]. Check functions then filter these down to the valid reduction pair, i.e. {sum,sum}.