Skip to content

[AccessEnforcementOpts] Add mergeAccesses analysis + optimization #18560

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 10, 2018

Conversation

shajrawi
Copy link

@shajrawi shajrawi commented Aug 8, 2018

We have the following code:

  %30 = begin_access [read] [dynamic] [no_nested_conflict] %1 : $*UInt64 // users: %31, %32
  %31 = load %30 : $*UInt64                       // user: %34
  end_access %30 : $*UInt64                       // id: %32
  %33 = struct $UInt64 (%27 : $Builtin.Int64)     // user: %34
  %34 = apply %14(%31, %33) : $@convention(thin) (UInt64, UInt64) -> UInt64 // user: %36
  %35 = begin_access [modify] [dynamic] [no_nested_conflict] %1 : $*UInt64 // users: %36, %37
  store %34 to %35 : $*UInt64                     // id: %36
  end_access %35 : $*UInt64                       // id: %37

If modify access fails then it is inclusive of the read access.

We can merge the two to a bigger “modify” block and reduce the amount of runtime overhead / calls.

We can also do the same for two merge accesses inside the same region, creating one larger scope access, as long as nothing conflicts in the larger scope / prevents us from doing so.

Now consider what happens when we couple what's described above with LICM: we can hoist the newly-merged large-scope access outside of loops! (see attached test case)

I have benchmarked this optimization on RangeIteration from the benchmark suite: it improves performance by over 7X !!!

radar rdar://problem/40376124

replaces PR #18438

@shajrawi shajrawi requested a review from atrick August 8, 2018 01:04
@shajrawi shajrawi force-pushed the merge_accesses branch 2 times, most recently from 8092b2d to 13425e0 Compare August 8, 2018 04:08
@atrick
Copy link
Contributor

atrick commented Aug 8, 2018

I'm starting to be able to reason about the code now. I'll just point out the things that are making it hard for me to understand, and some things that are just unconventional for Swift/LLVM passes. Then you and @eeckstein can decide when to merge the PR.

using RegionIDToLocalInfoMap = llvm::DenseMap<unsigned, BlockRegionInfo *>;

What you called BlockRegionInfo is actually dataflow state for each node (or region) in the control flow subgraph provided by LoopRegionFunctionInfo. The data flow state needs to be maintained for each node in that graph independent of whether the region is a block or loop. In fact, nothing about the data flow state is specific to blocks or loops. The only part of the code that should know anything about blocks or loops is the transfer function for the region. So, why is it called "BlockRegionInfo"?

using RegionIDToLocalInfoMap = llvm::DenseMap<unsigned, BlockRegionInfo *>;

This DenseMap's value has an extra indirection that requires dynamic allocation. I haven't seen anything like this in Swift/LLVM. It's pretty well accepted that new/delete are a recipe for bugs. I'm not sure what problem is being solved by dynamic allocation, so I can't suggest an alternative.

class BlockRegionInfo {
AccessConflictAndMergeAnalysis::Result &result;

BlockRegionInfo needs to be allocated for all regions in the program. I don't expect persistent per-region info to hold redundant state that is not specific to a region. Here's why this is unexpected to me:

  • It's innefficient to allocate redundant state.

  • It creates a web of object references in the heap making it
    harder to understand object lifetimes and dependencies.

  • It signals that the class definition doesn't provide a clean
    encapsulation of its state. i.e. the class "knows something" that it
    isn't supposed to know based on its state and lifetime.

[Note: in my earlier implementation I had context information inside a local data flow result. There was never more than one instance of that result and it had purely local lifetime.]

If would be fine to pass in Result to methods that use it.

using ScopeAccessStuct = struct AccessStuct {

This name throws me off. How about RegionAccessInfo or AccessSummary? I'm also not sure what the using does.

llvm::SmallBitVector accessBitmask;

It's probably fine to use this bitset, but here's something to consider first...

In my experience, allocating and clearing a bitset for every node in the CFG can become a performance bottleneck for very large functions (lots of nodes and large bitset count). I always try to make the per-CFG-node data proportional to the amount of useful information at that program point, rather than proportional to the size of the function. For example, it would be fine to have a (very small) bitset indexed by storage location rather than access ID, or a dense set of the most recent access for each storage.

[Note: In my earlier implementation I did the bitset + vector thing, but there was only one local instance of that set].

(continued...)

void removeInScopeAccess(BeginAccessInst *beginAccess) {
auto it = std::find(
inScopeConflictFreeAccesses.conflictFreeAccesses.begin(),
inScopeConflictFreeAccesses.conflictFreeAccesses.end(), beginAccess);
assert(it != inScopeConflictFreeAccesses.conflictFreeAccesses.end() &&
"the begin_access should have been in Set.");
inScopeConflictFreeAccesses.conflictFreeAccesses.erase(it);

Continuing the previous comment. Doing an O(N) set removal defeats the purpose of using a bitset. If the vector is unordered, then you would naturally erase by moving the last element into the hole.

However, the conventional way to get O(1) membership and ordered iteration is with MapVector. That has a high constant overhead per element in the set, but has the right asymptotic behavior. To avoid spending time micro-optimizing, just use a SmallMapVector. Then the previous comment about accessBitmask is moot.

assert(std::find(inScopeConflictFreeAccesses.conflictFreeAccesses.begin(),
                inScopeConflictFreeAccesses.conflictFreeAccesses.end(),
                beginAccess) ==

All this sort of stuff goes away with SmallMapVector.

void AccessConflictAndMergeAnalysis::identifyRegionToAccessedStorageMapping(
auto subRegionStorageIt = regionToStorageMap.find(subID);

Comment: propagate access summaries bottom-up from nested loops.

using LoopRegionToAccessedStorage =
llvm::SmallDenseMap<unsigned, AccessedStorageSet>;
...
accessedStorageSet.insert(storage);

This is insufficient for detecting conflicts, which may not have an identified location. See FunctionAccessedStorage.

for (auto subID : region->getSubregions()) {
 auto *subRegion = LRFI->getRegion(subID);

I think mergePredAccesses should be called right here since it has nothing to do with being a block or a loop.

void AccessConflictAndMergeAnalysis::mergePredAccesses(
LoopRegion *bbRegion

This is obviously baffling. Is it a loop or a block? Actually it doesn't matter...

if (predRegion->getParentID() != bbRegionParentID) {
// Unhandled control flow - bail - set empty in/out of scope

This comment does not tell me enough about what's happening. Why/when do we need this check?

 if (blockIDToLocalInfoMap.find(pred) == blockIDToLocalInfoMap.end()) {
   // Backedge / did not visit predecessor - bail

Please clarify. It looks like is to handle irreducable control flow within the current loop.

@shajrawi
Copy link
Author

shajrawi commented Aug 9, 2018

Thanks for the input @atrick ! I updated the code based on it + review from @eeckstein

@shajrawi
Copy link
Author

shajrawi commented Aug 9, 2018

@swift-ci Please test

@swiftlang swiftlang deleted a comment from swift-ci Aug 9, 2018
@swiftlang swiftlang deleted a comment from swift-ci Aug 9, 2018
@shajrawi
Copy link
Author

shajrawi commented Aug 9, 2018

@swift-ci Please clean test

@swiftlang swiftlang deleted a comment from swift-ci Aug 9, 2018
@swiftlang swiftlang deleted a comment from swift-ci Aug 9, 2018
public:
using AccessMap = llvm::SmallDenseMap<BeginAccessInst *, AccessInfo, 32>;
using AccessedStorageSet = llvm::SmallDenseSet<AccessedStorage, 8>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't see any code to keep track of or handle unidentified accesses.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops! forgot to include that from the last re-write. fixed.

@shajrawi shajrawi force-pushed the merge_accesses branch 2 times, most recently from b46fb01 to 24d502e Compare August 9, 2018 06:58
@shajrawi
Copy link
Author

shajrawi commented Aug 9, 2018

@swift-ci Please clean test and merge

@shajrawi
Copy link
Author

shajrawi commented Aug 9, 2018

The linux swift tests pass but it seems there's an unrelated problem on the bots later on:

clang: error: no such file or directory: 'tools/SourceKit/tools/sourcekitd/bin/InProc/CMakeFiles/sourcekitdInProc.dir/sourcekitdInProc.cpp.o'

@shajrawi
Copy link
Author

shajrawi commented Aug 9, 2018

@swift-ci Please smoke test Linux

Copy link
Contributor

@eeckstein eeckstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimization pass looks good, just a few minor comments.

What is missing here are SIL tests which test all the parts of the algorithm (at least I didn't find them), e.g.

  • summary propagation up the region tree
  • data flow within a region
  • all the bailing conditions, e.g. in mergePredAccesses
  • the oldToNewMap mechanism in mergeAccesses

using RegionIDToLocalInfoMap = llvm::DenseMap<unsigned, RegionInfo>;
// A map of instruction pairs we can merge from dominating instruction to
// dominated
using MergeableMap =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be named 'Map'

@@ -173,6 +243,9 @@ class AccessConflictAnalysis {
/// the accesses, then AccessInfo::getAccessIndex() can be used.
AccessMap accessMap;

/// A map of instruction pairs we can merge the scope of
MergeableMap mergeMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here and please also change the comment

static void recordConflict(AccessInfo &info, SparseAccessSet &accessSet) {
info.setSeenNestedConflict();
accessSet.setConflict(info.getAccessIndex());
// Returns a mapping from each loop sub-region to all its access storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment here that this function 'propagates access summaries bottom-up from nested regions'.
Also consider renaming this function to something like 'propagateAccessSetsBottomUp'


// make a temporary reverse copy to work on:
// It is in reverse order just to make it easier to debug / follow
AccessConflictAndMergeAnalysis::MergeableMap workMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is not a map

AccessConflictAndMergeAnalysis::MergeableMap workMap;
workMap.append(mergeMap.rbegin(), mergeMap.rend());

// Assume we have two pairs in map (1,2) , (2,3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, the result is not a map.
Actually the comment is a bit cryptic. How about:
"Assume the result contains two access pairs to be merged:
(begin_access %1, begin_access %2) // = merge end_access %1 with begin_access %2
(begin_access %2, begin_access %3) // = merge end_access %2 with begin_access %3
After merging the first pair, begin_access %2 is removed, so the second pair in the result list points to a to-be-deleted begin_access instruction. We store (begin_access %2 -> begin_access %1) to re-map a merged begin_access to it's replaced instruction."

@shajrawi
Copy link
Author

shajrawi commented Aug 9, 2018

@eeckstein I did the tests as Swift files - handling all the different conditions in the algorithm + interaction with other passes such as LICM. but I can write some SIL tests.

@eeckstein
Copy link
Contributor

eeckstein commented Aug 9, 2018

The swift tests are great for testing if the optimization works. But they do not cover the negative cases and corner cases, e.g. irregular loops, etc.

@shajrawi
Copy link
Author

shajrawi commented Aug 9, 2018

@eeckstein I added SIL tests to all the corner-cases + negative tests.
Thanks to your suggestion, I managed to create an irreducible control flow (testIrreducibleGraph2 in the test cases) that we didn't handle previously.
While I don't think we'll ever see flow like testIrreducibleGraph2's in real-life, I handled that test case / fixed the bug + discussed my solution with @atrick

@shajrawi
Copy link
Author

shajrawi commented Aug 9, 2018

@swift-ci Please test

Copy link
Contributor

@eeckstein eeckstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@shajrawi
Copy link
Author

shajrawi commented Aug 9, 2018

@swift-ci Please test and merge

@shajrawi
Copy link
Author

same unrelated linux issue - something is really wrong on the bots:

19:16:39 
clang: error: no such file or directory: 'tools/SourceKit/tools/sourcekitd/bin/InProc/CMakeFiles/sourcekitdInProc.dir/sourcekitdInProc.cpp.o'

@shajrawi
Copy link
Author

@swift-ci Please smoke test Linux

@shajrawi shajrawi merged commit e68e087 into swiftlang:master Aug 10, 2018
@shajrawi shajrawi deleted the merge_accesses branch April 12, 2019 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants