[AccessEnforcementOpts] Add mergeAccesses analysis + optimization #18560

shajrawi · 2018-08-08T01:04:42Z

We have the following code:

  %30 = begin_access [read] [dynamic] [no_nested_conflict] %1 : $*UInt64 // users: %31, %32
  %31 = load %30 : $*UInt64                       // user: %34
  end_access %30 : $*UInt64                       // id: %32
  %33 = struct $UInt64 (%27 : $Builtin.Int64)     // user: %34
  %34 = apply %14(%31, %33) : $@convention(thin) (UInt64, UInt64) -> UInt64 // user: %36
  %35 = begin_access [modify] [dynamic] [no_nested_conflict] %1 : $*UInt64 // users: %36, %37
  store %34 to %35 : $*UInt64                     // id: %36
  end_access %35 : $*UInt64                       // id: %37

The modify access subsumes the read access: any conflict that would make the read fail would also make the modify fail.

We can merge the two to a bigger “modify” block and reduce the amount of runtime overhead / calls.

We can also merge two accesses inside the same region into one larger-scope access, as long as nothing in the larger scope conflicts or otherwise prevents the merge.
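To make the idea concrete, here is a minimal sketch of the merge on a toy model. `AccessKind`, `AccessScope`, and `mergeAdjacentScopes` are hypothetical names, not types from the pass, and the model assumes that two neighbouring entries in the list have nothing conflicting between them.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for SIL access kinds and scopes; not compiler types.
enum class AccessKind { Read, Modify };

struct AccessScope {
  int storage;     // which storage location is accessed
  AccessKind kind; // read or modify
};

// Fold each scope into the previous one when both access the same storage.
// Assumption of this model: neighbours in the list have nothing conflicting
// between them. The merged scope takes the stronger of the two kinds.
std::vector<AccessScope> mergeAdjacentScopes(const std::vector<AccessScope> &in) {
  std::vector<AccessScope> out;
  for (const AccessScope &scope : in) {
    if (!out.empty() && out.back().storage == scope.storage) {
      if (scope.kind == AccessKind::Modify)
        out.back().kind = AccessKind::Modify;
      continue; // absorbed into the previous scope
    }
    out.push_back(scope);
  }
  return out;
}
```

In this model the read/modify pair from the SIL above collapses into a single modify scope, which is one runtime check instead of two.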

Now consider what happens when we couple what's described above with LICM: we can hoist the newly-merged large-scope access outside of loops! (see attached test case)

I have benchmarked this optimization on RangeIteration from the benchmark suite: it improves performance by over 7X !!!

radar rdar://problem/40376124

replaces PR #18438

atrick · 2018-08-08T20:35:05Z

I'm starting to be able to reason about the code now. I'll just point out the things that are making it hard for me to understand, and some things that are just unconventional for Swift/LLVM passes. Then you and @eeckstein can decide when to merge the PR.

using RegionIDToLocalInfoMap = llvm::DenseMap<unsigned, BlockRegionInfo *>;

What you called BlockRegionInfo is actually dataflow state for each node (or region) in the control flow subgraph provided by LoopRegionFunctionInfo. The data flow state needs to be maintained for each node in that graph independent of whether the region is a block or loop. In fact, nothing about the data flow state is specific to blocks or loops. The only part of the code that should know anything about blocks or loops is the transfer function for the region. So, why is it called "BlockRegionInfo"?

using RegionIDToLocalInfoMap = llvm::DenseMap<unsigned, BlockRegionInfo *>;

This DenseMap's value has an extra indirection that requires dynamic allocation. I haven't seen anything like this in Swift/LLVM. It's pretty well accepted that new/delete are a recipe for bugs. I'm not sure what problem is being solved by dynamic allocation, so I can't suggest an alternative.
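For illustration only, this is roughly what storing the state by value looks like. `std::unordered_map` stands in for `llvm::DenseMap` so the sketch is self-contained, and `RegionState` and `getOrCreateState` are made-up names.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical per-region data flow state; the field is illustrative only.
struct RegionState {
  std::vector<int> inScopeAccesses;
};

// std::unordered_map stands in for llvm::DenseMap here. The state lives in
// the map itself, so there is no new/delete pair to keep balanced.
using RegionIDToStateMap = std::unordered_map<unsigned, RegionState>;

// Returns the state for regionID, default-constructing it on first use.
RegionState &getOrCreateState(RegionIDToStateMap &map, unsigned regionID) {
  return map[regionID];
}
```

The map owns every `RegionState`, so all of the state is destroyed when the map goes out of scope.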

class BlockRegionInfo {
AccessConflictAndMergeAnalysis::Result &result;

BlockRegionInfo needs to be allocated for all regions in the program. I don't expect persistent per-region info to hold redundant state that is not specific to a region. Here's why this is unexpected to me:

It's inefficient to allocate redundant state.
It creates a web of object references in the heap, making it harder to understand object lifetimes and dependencies.
It signals that the class definition doesn't provide a clean encapsulation of its state, i.e. the class "knows something" that it isn't supposed to know based on its state and lifetime.

[Note: in my earlier implementation I had context information inside a local data flow result. There was never more than one instance of that result and it had purely local lifetime.]

It would be fine to pass in Result to methods that use it.

using ScopeAccessStuct = struct AccessStuct {

This name throws me off. How about RegionAccessInfo or AccessSummary? I'm also not sure what the using does.

llvm::SmallBitVector accessBitmask;

It's probably fine to use this bitset, but here's something to consider first...

In my experience, allocating and clearing a bitset for every node in the CFG can become a performance bottleneck for very large functions (lots of nodes and large bitset count). I always try to make the per-CFG-node data proportional to the amount of useful information at that program point, rather than proportional to the size of the function. For example, it would be fine to have a (very small) bitset indexed by storage location rather than access ID, or a dense set of the most recent access for each storage.

[Note: In my earlier implementation I did the bitset + vector thing, but there was only one local instance of that set].

(continued...)

void removeInScopeAccess(BeginAccessInst *beginAccess) {
auto it = std::find(
inScopeConflictFreeAccesses.conflictFreeAccesses.begin(),
inScopeConflictFreeAccesses.conflictFreeAccesses.end(), beginAccess);
assert(it != inScopeConflictFreeAccesses.conflictFreeAccesses.end() &&
"the begin_access should have been in Set.");
inScopeConflictFreeAccesses.conflictFreeAccesses.erase(it);

Continuing the previous comment. Doing an O(N) set removal defeats the purpose of using a bitset. If the vector is unordered, then you would naturally erase by moving the last element into the hole.
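A minimal sketch of that erase on a plain `std::vector<int>`; `eraseUnordered` is a hypothetical helper name, and the element type is simplified from `BeginAccessInst *`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Erase `value` from an unordered vector by moving the last element into the
// hole. The search is still linear; only the erase avoids shifting the tail.
// Returns false if the value was not present.
bool eraseUnordered(std::vector<int> &vec, int value) {
  auto it = std::find(vec.begin(), vec.end(), value);
  if (it == vec.end())
    return false;
  *it = vec.back(); // overwrite the hole with the last element
  vec.pop_back();   // drop the now-duplicated tail
  return true;
}
```

This only works when iteration order does not matter, since the last element changes position.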

However, the conventional way to get O(1) membership and ordered iteration is with MapVector. That has a high constant overhead per element in the set, but has the right asymptotic behavior. To avoid spending time micro-optimizing, just use a SmallMapVector. Then the previous comment about accessBitmask is moot.

assert(std::find(inScopeConflictFreeAccesses.conflictFreeAccesses.begin(),
                inScopeConflictFreeAccesses.conflictFreeAccesses.end(),
                beginAccess) ==

All this sort of stuff goes away with SmallMapVector.
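As a rough picture of why the linear searches disappear, here is a standalone sketch of the index-map-plus-vector idea that an insertion-ordered map is built on. `OrderedAccessSet` is a hypothetical class written with standard containers; it is not LLVM's MapVector API, and it uses integer access IDs in place of instruction pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical insertion-ordered set keyed by access ID.
class OrderedAccessSet {
  std::unordered_map<int, std::size_t> indexOf; // access ID -> position in order
  std::vector<int> order;                       // access IDs in insertion order

public:
  // Inserts id if absent; returns true when it was newly added.
  bool insert(int id) {
    if (!indexOf.emplace(id, order.size()).second)
      return false;
    order.push_back(id);
    return true;
  }

  // Hash lookup instead of a std::find over the vector.
  bool contains(int id) const { return indexOf.count(id) != 0; }

  // Deterministic iteration in insertion order.
  const std::vector<int> &inOrder() const { return order; }
};
```

Membership asserts become a single `contains` call, and iteration order stays deterministic.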

void AccessConflictAndMergeAnalysis::identifyRegionToAccessedStorageMapping(
auto subRegionStorageIt = regionToStorageMap.find(subID);

Comment: propagate access summaries bottom-up from nested loops.
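A sketch of what such bottom-up propagation could look like on a toy region tree. `Region` and `summarize` are hypothetical, storage locations are plain integers, and `summaries` is assumed to be pre-sized to the number of regions.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical region node: not the LoopRegion class from the compiler.
struct Region {
  std::set<int> localStorage;       // storage accessed directly in this region
  std::vector<unsigned> subregions; // IDs of nested regions
};

// Computes the storage accessed in region `id` or anything nested inside it,
// recording every visited region's summary in `summaries` (indexed by ID).
// Assumes `summaries` already has one slot per region.
const std::set<int> &summarize(const std::vector<Region> &regions, unsigned id,
                               std::vector<std::set<int>> &summaries) {
  std::set<int> result = regions[id].localStorage;
  for (unsigned subID : regions[id].subregions) {
    // Children are summarized first, then folded into the parent.
    const std::set<int> &sub = summarize(regions, subID, summaries);
    result.insert(sub.begin(), sub.end());
  }
  summaries[id] = std::move(result);
  return summaries[id];
}
```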

using LoopRegionToAccessedStorage =
llvm::SmallDenseMap<unsigned, AccessedStorageSet>;
...
accessedStorageSet.insert(storage);

This is insufficient for detecting conflicts, which may not have an identified location. See FunctionAccessedStorage.
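One conservative way to cover that case, sketched with a hypothetical `AccessSummary` type. This is not the FunctionAccessedStorage interface, only an illustration of carrying an extra "unidentified" flag next to the set.

```cpp
#include <cassert>
#include <set>

// Hypothetical summary type; storage locations are modeled as integers.
struct AccessSummary {
  std::set<int> identified;           // storage locations we could identify
  bool hasUnidentifiedAccess = false; // set when an access had no known location

  void recordIdentified(int storage) { identified.insert(storage); }
  void recordUnidentified() { hasUnidentifiedAccess = true; }

  // Conservative query: an unidentified access may touch any storage.
  bool mayConflictWith(int storage) const {
    return hasUnidentifiedAccess || identified.count(storage) != 0;
  }
};
```

Once the flag is set, every query answers "may conflict", which blocks merging across that region.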

for (auto subID : region->getSubregions()) {
 auto *subRegion = LRFI->getRegion(subID);

I think mergePredAccesses should be called right here since it has nothing to do with being a block or a loop.

void AccessConflictAndMergeAnalysis::mergePredAccesses(
LoopRegion *bbRegion

This is obviously baffling. Is it a loop or a block? Actually it doesn't matter...

if (predRegion->getParentID() != bbRegionParentID) {
// Unhandled control flow - bail - set empty in/out of scope

This comment does not tell me enough about what's happening. Why/when do we need this check?

 if (blockIDToLocalInfoMap.find(pred) == blockIDToLocalInfoMap.end()) {
   // Backedge / did not visit predecessor - bail

Please clarify. It looks like this is to handle irreducible control flow within the current loop.
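For readers following along, this is one way the bail-out could be modeled. `mergePredOutSets` is a hypothetical helper, and using set intersection as the merge of predecessor states is an assumption of this sketch, not something confirmed by the patch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using AccessSet = std::set<int>;

// Computes a region's in-set as the intersection of its predecessors'
// out-sets (assumed merge rule). If any predecessor has no recorded out-set
// yet, e.g. because it is only reachable through a backedge, bail out with
// the empty set, which is always conservative.
AccessSet mergePredOutSets(const std::vector<unsigned> &preds,
                           const std::map<unsigned, AccessSet> &outSets) {
  AccessSet result;
  bool first = true;
  for (unsigned pred : preds) {
    auto it = outSets.find(pred);
    if (it == outSets.end())
      return AccessSet(); // unvisited predecessor: bail
    if (first) {
      result = it->second;
      first = false;
      continue;
    }
    AccessSet kept;
    for (int access : result)
      if (it->second.count(access) != 0)
        kept.insert(access);
    result = std::move(kept);
  }
  return result;
}
```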

shajrawi · 2018-08-09T02:22:00Z

Thanks for the input @atrick ! I updated the code based on it + review from @eeckstein

shajrawi · 2018-08-09T02:22:11Z

@swift-ci Please test

shajrawi · 2018-08-09T04:22:23Z

@swift-ci Please clean test

atrick · 2018-08-09T05:30:54Z

lib/SILOptimizer/Transforms/AccessEnforcementOpts.cpp

 public:
  using AccessMap = llvm::SmallDenseMap<BeginAccessInst *, AccessInfo, 32>;
+  using AccessedStorageSet = llvm::SmallDenseSet<AccessedStorage, 8>;


I still don't see any code to keep track of or handle unidentified accesses.

Whoops! forgot to include that from the last re-write. fixed.

shajrawi · 2018-08-09T07:05:33Z

@swift-ci Please clean test and merge

shajrawi · 2018-08-09T16:24:03Z

The linux swift tests pass but it seems there's an unrelated problem on the bots later on:

clang: error: no such file or directory: 'tools/SourceKit/tools/sourcekitd/bin/InProc/CMakeFiles/sourcekitdInProc.dir/sourcekitdInProc.cpp.o'

shajrawi · 2018-08-09T16:27:03Z

@swift-ci Please smoke test Linux

eeckstein

The optimization pass looks good, just a few minor comments.

What is missing here are SIL tests which test all the parts of the algorithm (at least I didn't find them), e.g.

summary propagation up the region tree
data flow within a region
all the bailing conditions, e.g. in mergePredAccesses
the oldToNewMap mechanism in mergeAccesses

eeckstein · 2018-08-09T15:59:35Z

lib/SILOptimizer/Transforms/AccessEnforcementOpts.cpp

+  using RegionIDToLocalInfoMap = llvm::DenseMap<unsigned, RegionInfo>;
+  // A map of instruction pairs we can merge from dominating instruction to
+  // dominated
+  using MergeableMap =


This should not be named 'Map'

eeckstein · 2018-08-09T15:59:51Z

lib/SILOptimizer/Transforms/AccessEnforcementOpts.cpp

@@ -173,6 +243,9 @@ class AccessConflictAnalysis {
    /// the accesses, then AccessInfo::getAccessIndex() can be used.
    AccessMap accessMap;

+    /// A map of instruction pairs we can merge the scope of
+    MergeableMap mergeMap;


same here and please also change the comment

eeckstein · 2018-08-09T16:12:28Z

lib/SILOptimizer/Transforms/AccessEnforcementOpts.cpp

-static void recordConflict(AccessInfo &info, SparseAccessSet &accessSet) {
-  info.setSeenNestedConflict();
-  accessSet.setConflict(info.getAccessIndex());
+// Returns a mapping from each loop sub-region to all its access storage


Please add a comment here that this function 'propagates access summaries bottom-up from nested regions'.
Also consider renaming this function to something like 'propagateAccessSetsBottomUp'

eeckstein · 2018-08-09T16:17:24Z

lib/SILOptimizer/Transforms/AccessEnforcementOpts.cpp

+
+  // make a temporary reverse copy to work on:
+  // It is in reverse order just to make it easier to debug / follow
+  AccessConflictAndMergeAnalysis::MergeableMap workMap;


Again, this is not a map

eeckstein · 2018-08-09T16:28:40Z

lib/SILOptimizer/Transforms/AccessEnforcementOpts.cpp

+  AccessConflictAndMergeAnalysis::MergeableMap workMap;
+  workMap.append(mergeMap.rbegin(), mergeMap.rend());
+
+  // Assume we have two pairs in map (1,2) , (2,3)


Also here, the result is not a map.
Actually the comment is a bit cryptic. How about:
"Assume the result contains two access pairs to be merged:
(begin_access %1, begin_access %2) // = merge end_access %1 with begin_access %2
(begin_access %2, begin_access %3) // = merge end_access %2 with begin_access %3
After merging the first pair, begin_access %2 is removed, so the second pair in the result list points to a to-be-deleted begin_access instruction. We store (begin_access %2 -> begin_access %1) to re-map a merged begin_access to its replacement instruction."
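The re-mapping can be pictured with a small standalone sketch. `resolveMerges` is a hypothetical function working on integer access IDs, and it assumes the pairs arrive with dominating merges first so that one lookup per pair is enough.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Each pair (first, second) means: fold access `second` into access `first`.
// Returns, for every removed access ID, the ID of the access that survives.
// Assumes pairs are ordered so that dominating merges come first.
std::map<int, int> resolveMerges(const std::vector<std::pair<int, int>> &pairs) {
  std::map<int, int> oldToNew;
  for (const auto &p : pairs) {
    int parent = p.first;
    int child = p.second;
    // If the parent was itself merged away earlier, use its replacement.
    auto it = oldToNew.find(parent);
    if (it != oldToNew.end())
      parent = it->second;
    oldToNew[child] = parent;
  }
  return oldToNew;
}
```

With the pairs (1,2) and (2,3) from the comment above, both access 2 and access 3 end up mapped to access 1.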

shajrawi · 2018-08-09T17:45:47Z

@eeckstein I did the tests as Swift files - handling all the different conditions in the algorithm + interaction with other passes such as LICM. but I can write some SIL tests.

eeckstein · 2018-08-09T18:24:45Z

The swift tests are great for testing if the optimization works. But they do not cover the negative cases and corner cases, e.g. irregular loops, etc.

shajrawi · 2018-08-09T21:15:36Z

@eeckstein I added SIL tests to all the corner-cases + negative tests.
Thanks to your suggestion, I managed to create an irreducible control flow (testIrreducibleGraph2 in the test cases) that we didn't handle previously.
While I don't think we'll ever see flow like testIrreducibleGraph2's in real-life, I handled that test case / fixed the bug + discussed my solution with @atrick

shajrawi · 2018-08-09T21:16:44Z

@swift-ci Please test

eeckstein

LGTM, thanks!

shajrawi · 2018-08-09T23:26:09Z

@swift-ci Please test and merge

shajrawi · 2018-08-10T05:47:41Z

same unrelated linux issue - something is really wrong on the bots:

19:16:39 
clang: error: no such file or directory: 'tools/SourceKit/tools/sourcekitd/bin/InProc/CMakeFiles/sourcekitdInProc.dir/sourcekitdInProc.cpp.o'

shajrawi · 2018-08-10T06:59:52Z

@swift-ci Please smoke test Linux

shajrawi requested a review from atrick August 8, 2018 01:04

shajrawi force-pushed the merge_accesses branch 2 times, most recently from 8092b2d to 13425e0 Compare August 8, 2018 04:08

shajrawi force-pushed the merge_accesses branch from 13425e0 to 15d2805 Compare August 9, 2018 02:21

swiftlang deleted a comment from swift-ci Aug 9, 2018

shajrawi force-pushed the merge_accesses branch from 15d2805 to 84a30d7 Compare August 9, 2018 04:22

swiftlang deleted a comment from swift-ci Aug 9, 2018

atrick reviewed Aug 9, 2018


shajrawi force-pushed the merge_accesses branch 2 times, most recently from b46fb01 to 24d502e Compare August 9, 2018 06:58

eeckstein requested changes Aug 9, 2018


[AccessEnforcementOpts] Add merge accesses analysis to conflict analysis

62e43a1

shajrawi force-pushed the merge_accesses branch from 24d502e to 40e0803 Compare August 9, 2018 21:13

eeckstein approved these changes Aug 9, 2018


[AccessEnforcementOpts] Add mergeAccesses optimization

7281a76

shajrawi force-pushed the merge_accesses branch from 40e0803 to 0a70a4a Compare August 9, 2018 23:23

[AccessEnforcementOpts] Add tests for mergeAccesses optimization

12adde2

shajrawi force-pushed the merge_accesses branch from 0a70a4a to 12adde2 Compare August 9, 2018 23:24

shajrawi merged commit e68e087 into swiftlang:master Aug 10, 2018

shajrawi deleted the merge_accesses branch April 12, 2019 22:42
