-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[OpenMPOpt] Fix incorrect end-of-kernel barrier removal #65670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2676,17 +2676,19 @@ struct AAExecutionDomainFunction : public AAExecutionDomain { | |
if (!ED.EncounteredAssumes.empty() && !A.isModulePass()) | ||
return; | ||
|
||
// We can remove this barrier, if it is one, or all aligned barriers | ||
// reaching the kernel end. In the latter case we can transitively work | ||
// our way back until we find a barrier that guards a side-effect if we | ||
// are dealing with the kernel end here. | ||
// We can remove this barrier, if it is one, or aligned barriers reaching | ||
// the kernel end (if CB is nullptr). Aligned barriers reaching the kernel | ||
// end may have other successors besides the kernel end (especially if | ||
// they're in loops) with non-local side-effects, so those barriers can | ||
// only be removed if they also only reach the kernel end. If those | ||
// barriers have other barriers reaching them, those can be transitively | ||
// removed as well. | ||
if (CB) { | ||
DeletedBarriers.insert(CB); | ||
A.deleteAfterManifest(*CB); | ||
++NumBarriersEliminated; | ||
Changed = ChangeStatus::CHANGED; | ||
} else if (!ED.AlignedBarriers.empty()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't we just delete 2729-2731 and this conditional? I think the idea that the kernel end is an aligned barrier was fundamentally flawed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm still wrapping my head around this pass, but from my understanding it doesn't seem unreasonable to treat the kernel end as an aligned barrier. Even if the kernel exit instructions themselves are divergent and different ones are reached in different threads, it seems like we can still model the control flow as re-converging after the different kernel exits before our implicit post-kernel barrier. My understanding is that it's legal and useful to remove barriers near the end of kernels in cases like this:
It seems like we need a concept of a post-kernel barrier to do this, though, since otherwise There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We might want to treat the kernel end special, but it is not an aligned barrier. Threads are allowed to run into the kernel end independently, and the others can do something else in the meantime, e.g., wait in an aligned barrier. What I'm trying to say is that our current abstraction is too coarse-grained. That said, you might be right that for the purpose of barrier elimination, we can continue to "handle" the kernel end. That said, I'm still not sure if this is the right way to fix this. I think the change in multiple_blocks_functions_kernel_effects_0 actually exposes a different error we should first address. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. First, can we have a few more tests though (and pre-commit the tests please).
Also add an unknown call in between the 4 aligned barriers.
I'm still thinking about this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've added the requested testcases in my update. With this new updated the three barriers in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you are right. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm seeing the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This issue with the three barriers in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jdoerfert, is this issue with these barriers still being removed what's holding up this change? It looks to me like the fix for this should be separate from the fix I've implemented here to only remove barriers with a straight line to the kernel end, but is there something I'm missing? |
||
NumBarriersEliminated += ED.AlignedBarriers.size(); | ||
Changed = ChangeStatus::CHANGED; | ||
SmallVector<CallBase *> Worklist(ED.AlignedBarriers.begin(), | ||
ED.AlignedBarriers.end()); | ||
|
@@ -2697,7 +2699,11 @@ struct AAExecutionDomainFunction : public AAExecutionDomain { | |
continue; | ||
if (LastCB->getFunction() != getAnchorScope()) | ||
continue; | ||
const ExecutionDomainTy &PostLastED = CEDMap[{LastCB, POST}]; | ||
if (!PostLastED.IsReachingAlignedBarrierOnly) | ||
continue; | ||
if (!DeletedBarriers.count(LastCB)) { | ||
++NumBarriersEliminated; | ||
A.deleteAfterManifest(*LastCB); | ||
continue; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not true, nor does it reflect what happens below. The kernel end is not an aligned barrier which is why your test case starts to work (the explicit aligned barrier does not only reach aligned barriers but also the kernel end and is consequently not removed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be "if they also only reach the kernel end or other aligned barriers", right?
And since
IsReachingAlignedBarrierOnly
doesn't take side-effects like the@unknown()
call into account, I'd need to check something else to make sure there aren't any extra side-effect instructions after this barrier that aren't ahead of the kernel end. Would I need to do another backwards propagation for that? It seems likeEncounteredNonLocalSideEffect
is only for side-effects preceding the execution domain.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that is accurate at least not with the rest in place. See my example (loop_barrier3) with an explicit barrier in the end. The one in the loop will hit only aligned barriers. The one in the end is useless and removed. The end -> barrier elimination will skip it and eliminate the one in the loop.
Here are my current thoughts:
The end -> barrier case is different because we delete the "former"(/only) barrier, not the latter one.
The logic we have is basically trying to determine when a barrier is not actually synchronizing/communicating anything, and it is wrong in the kernel end case because the barrier can (evidently) synchronize with itself.
The logic we use right now is fundamentally flawed since end -> barrier does not say anything about other paths from barrier to somewhere else. In fact, there is no need for the threads reaching the barrier to go (directly) to the end at all. I think this is the crucial point. Our logic only holds if the barrier we are about to eliminate, due to the kernel end case, cannot reach any other synchronization point. If it can, the path from it to the end is not enough reason to remove it. Even if it can only reach other aligned barriers, e.g.,:
I see two paths forward:
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think what we're doing now with special-casing the kernel end is not the right approach, but I think there is some value in generalizing the "former" barrier removal if we can get the conditions for it right. That would help in cases like this:
We can't remove
Barrier B
in this case and we can't removeBarrier A
using our normal "latter" barrier removal because it has anunknown
call before it. ButBarrier A
is still redundant here because it's always immediately followed byBarrier B
, so if we had "former" barrier removal in cases other than the kernel end case we could safely remove it.As in your example with a conditional return between the barriers, one of the big challenges here is that if we're removing both former and latter barriers in the common straight-line case we'll remove the former barrier because it's immediately followed by a barrier and also the latter barrier because it's immediately preceded by a barrier. This is a stale analysis problem, though: once we remove one of the barriers it's no longer the case that the remaining barrier does not have any side effects (or only reaches aligned barriers) before/after it. We could fix this by making sure the analysis is up-to-date, either by propagating the results from barriers as we remove them or by re-computing the analysis between the first sweep removing latter barriers and the second sweep removing former barriers.
This is a more ambitious strategy that will take more time to get right, though. For now I think just doing what we are now but only removing barriers in the end -> barrier case when the kernel end is the unique successor is a reasonable solution since it should fix all of the cases where barriers are being incorrectly removed while still removing barriers in simple cases. I'll update this PR to implement this instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is reasonable. Also, change the initialization of ExitED, the Kernel end is never an aligned barrier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done