-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Fix logic related to isTriviallyDuplicatable. #28249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In SILInstruction::isTriviallyDuplicatable(): - Make deallocating instructions trivially duplicatable. They are by any useful definition--duplicating an instruction does not imply reordering it. Tail duplication was already treating deallocations as duplicatable, but doing it inconsistently. Sometimes it checks isTriviallyDuplicatable, and sometimes it doesn't, which appears to have been an accident. Disallowing duplication of deallocations will cause severe performance regressions. Instead, consistently allow them to be duplicated, making tail duplication more powerful, which could expose other bugs. - Do not duplicate on-stack AllocRefInst (without special consideration). This is a correctness fix that apparently was never exposed. Fix SILLoop::canDuplicate(): - Handle isDeallocatingStack. It's not clear how we were avoiding an assertion before when a stack allocatable reference was confined to a loop--probably just by luck. - Handle begin/end_access inside a loop. This is extremely important and probably prevented many loop optimizations from working with exclusivity. Update LoopRotate canDuplicateOrMoveToPreheader(). This is NFC.
I figured out this is what was been blocking another correctness fix: That fix was regressing performance because it was correctly checking isTriviallyDuplicatable before cloning blocks. |
@swift-ci test |
@swift-ci benchmark |
@swift-ci test source compatibility |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Tough I don't know why @aschwaighofer excluded partial_apply on stack. He should approve this specific change.
The case of |
if (auto *Dealloc = dyn_cast<DeallocRefInst>(I)) | ||
Alloc = dyn_cast<AllocRefInst>(Dealloc->getOperand()); | ||
// The matching alloc_stack must be in the loop. | ||
return Alloc && contains(Alloc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add partial_apply [stack]
here.
I don't think the previous support in Loop canDuplicate for on-stack PartialApply actually worked because it only handled the PartialApply, not the corresponding DeallocStack. I added the full support here |
In SILInstruction::isTriviallyDuplicatable():
Make deallocating instructions trivially duplicatable. They are by
any useful definition--duplicating an instruction does not imply
reordering it. Tail duplication was already treating deallocations
as duplicatable, but doing it inconsistently. Sometimes it checks
isTriviallyDuplicatable, and sometimes it doesn't, which appears to
have been an accident. Disallowing duplication of deallocations will
cause severe performance regressions. Instead, consistently allow
them to be duplicated, making tail duplication more powerful, which
could expose other bugs.
Do not duplicate on-stack AllocRefInst (without special
consideration). This is a correctness fix that apparently was never
exposed.
Fix SILLoop::canDuplicate():
Handle isDeallocatingStack. It's not clear how we were avoiding an
assertion before when a stack allocatable reference was confined to
a loop--probably just by luck.
Handle begin/end_access inside a loop. This is extremely important
and probably prevented many loop optimizations from working with
exclusivity.
Update LoopRotate canDuplicateOrMoveToPreheader(). This is NFC.