-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[AccessEnforcementOpts] Add mergeAccesses analysis + optimization #18560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8092b2d
to
13425e0
Compare
I'm starting to be able to reason about the code now. I'll just point out the things that are making it hard for me to understand, and some things that are just unconventional for Swift/LLVM passes. Then you and @eeckstein can decide when to merge the PR.
What you called
This DenseMap's value has an extra indirection that requires dynamic allocation. I haven't seen anything like this in Swift/LLVM. It's pretty well accepted that new/delete are a recipe for bugs. I'm not sure what problem is being solved by dynamic allocation, so I can't suggest an alternative.
BlockRegionInfo needs to be allocated for all regions in the program. I don't expect persistent per-region info to hold redundant state that is not specific to a region. Here's why this is unexpected to me:
[Note: in my earlier implementation I had context information inside a local data flow result. There was never more than one instance of that result and it had purely local lifetime.] If would be fine to pass in Result to methods that use it.
This name throws me off. How about
It's probably fine to use this bitset, but here's something to consider first... In my experience, allocating and clearing a bitset for every node in the CFG can become a performance bottleneck for very large functions (lots of nodes and large bitset count). I always try to make the per-CFG-node data proportional to the amount of useful information at that program point, rather than proportional to the size of the function. For example, it would be fine to have a (very small) bitset indexed by storage location rather than access ID, or a dense set of the most recent access for each storage. [Note: In my earlier implementation I did the bitset + vector thing, but there was only one local instance of that set]. (continued...)
Continuing the previous comment. Doing an O(N) set removal defeats the purpose of using a bitset. If the vector is unordered, then you would naturally erase by moving the last element into the hole. However, the conventional way to get O(1) membership and ordered iteration is with MapVector. That has a high constant overhead per element in the set, but has the right asymptotic behavior. To avoid spending time micro-optimizing, just use a SmallMapVector. Then the previous comment about
All this sort of stuff goes away with SmallMapVector.
Comment: propagate access summaries bottom-up from nested loops.
This is insufficient for detecting conflicts, which may not have an identified location. See FunctionAccessedStorage.
I think mergePredAccesses should be called right here since it has nothing to do with being a block or a loop.
This is obviously baffling. Is it a loop or a block? Actually it doesn't matter...
This comment does not tell me enough about what's happening. Why/when do we need this check?
Please clarify. It looks like is to handle irreducable control flow within the current loop. |
13425e0
to
15d2805
Compare
Thanks for the input @atrick ! I updated the code based on it + review from @eeckstein |
@swift-ci Please test |
15d2805
to
84a30d7
Compare
@swift-ci Please clean test |
public: | ||
using AccessMap = llvm::SmallDenseMap<BeginAccessInst *, AccessInfo, 32>; | ||
using AccessedStorageSet = llvm::SmallDenseSet<AccessedStorage, 8>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't see any code to keep track of or handle unidentified accesses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops! forgot to include that from the last re-write. fixed.
b46fb01
to
24d502e
Compare
@swift-ci Please clean test and merge |
The linux swift tests pass but it seems there's an unrelated problem on the bots later on:
|
@swift-ci Please smoke test Linux |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The optimization pass looks good, just a few minor comments.
What is missing here are SIL tests which test all the parts of the algorithm (at least I didn't find them), e.g.
- summary propagation up the region tree
- data flow within a region
- all the bailing conditions, e.g. in mergePredAccesses
- the oldToNewMap mechanism in mergeAccesses
using RegionIDToLocalInfoMap = llvm::DenseMap<unsigned, RegionInfo>; | ||
// A map of instruction pairs we can merge from dominating instruction to | ||
// dominated | ||
using MergeableMap = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be named 'Map'
@@ -173,6 +243,9 @@ class AccessConflictAnalysis { | |||
/// the accesses, then AccessInfo::getAccessIndex() can be used. | |||
AccessMap accessMap; | |||
|
|||
/// A map of instruction pairs we can merge the scope of | |||
MergeableMap mergeMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here and please also change the comment
static void recordConflict(AccessInfo &info, SparseAccessSet &accessSet) { | ||
info.setSeenNestedConflict(); | ||
accessSet.setConflict(info.getAccessIndex()); | ||
// Returns a mapping from each loop sub-region to all its access storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment here that this function 'propagates access summaries bottom-up from nested regions'.
Also consider renaming this function to something like 'propagateAccessSetsBottomUp'
|
||
// make a temporary reverse copy to work on: | ||
// It is in reverse order just to make it easier to debug / follow | ||
AccessConflictAndMergeAnalysis::MergeableMap workMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is not a map
AccessConflictAndMergeAnalysis::MergeableMap workMap; | ||
workMap.append(mergeMap.rbegin(), mergeMap.rend()); | ||
|
||
// Assume we have two pairs in map (1,2) , (2,3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here, the result is not a map.
Actually the comment is a bit cryptic. How about:
"Assume the result contains two access pairs to be merged:
(begin_access %1, begin_access %2) // = merge end_access %1 with begin_access %2
(begin_access %2, begin_access %3) // = merge end_access %2 with begin_access %3
After merging the first pair, begin_access %2 is removed, so the second pair in the result list points to a to-be-deleted begin_access instruction. We store (begin_access %2 -> begin_access %1) to re-map a merged begin_access to it's replaced instruction."
@eeckstein I did the tests as Swift files - handling all the different conditions in the algorithm + interaction with other passes such as LICM. but I can write some SIL tests. |
The swift tests are great for testing if the optimization works. But they do not cover the negative cases and corner cases, e.g. irregular loops, etc. |
24d502e
to
40e0803
Compare
@eeckstein I added SIL tests to all the corner-cases + negative tests. |
@swift-ci Please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
40e0803
to
0a70a4a
Compare
0a70a4a
to
12adde2
Compare
@swift-ci Please test and merge |
same unrelated linux issue - something is really wrong on the bots:
|
@swift-ci Please smoke test Linux |
We have the following code:
If modify access fails then it is inclusive of the read access.
We can merge the two to a bigger “modify” block and reduce the amount of runtime overhead / calls.
We can also do the same for two merge accesses inside the same region, creating one larger scope access, as long as nothing conflicts in the larger scope / prevents us from doing so.
Now consider what happens when we couple what's described above with LICM: we can hoist the newly-merged large-scope access outside of loops! (see attached test case)
I have benchmarked this optimization on
RangeIteration
from the benchmark suite: it improves performance by over 7X !!!radar rdar://problem/40376124
replaces PR #18438