-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Devirtualize calls to protocol composition type methods #28043
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Devirtualize calls to protocol composition type methods #28043
Conversation
cc @atrick |
auto firstUse = *instance->getUses().begin(); | ||
if (auto *store = dyn_cast<StoreInst>(firstUse->getUser())) { | ||
openedVal = store->getSrc(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existential value stored on the stack here might not be the same value on the stack when it is opened.
Also, the value stored on the stack might no longer be live at the apply site.
There's a whole bunch of complexity in getAddressOfStackInit
and canReplaceCopiedValue
to deal with on-stack existentials.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think I understand your first comment but, could you elaborate on the following?
Also, the value stored on the stack might no longer be live at the apply site.
I've updated line 222 to use getAddressOfStackInit(..., firstUse->getUser(), ...)
instead of a dyn_cast<StoreInst>(...)
. Do you think this will be sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the abstract, we want to avoid producing a sequence like
r = open_existential_ref
destroy(r)
apply(r)
That shouldn't actually be a problem though because we're not in OSSA form so there will only be 'retain/release_value(r)' operations. As long as none of the retains get eliminated during this transformation, it should be safe.
A lot of the complexity in the existing code comes from the fact that it's handling 'open_existential_addr', which can be destroyed with 'destroy_addr'.
The main thing here is to prove that the stored value is the value used by the apply. One way to do that is to recognize all uses of the alloc_stack and make sure only one of them is a store. getStackInitInst already does that, but only handles copy_addr, not store.
Please try to write some .sil test cases to for corner cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, that was extremely helpful.
A lot of the complexity in the existing code comes from the fact that it's handling 'open_existential_addr', which can be destroyed with 'destroy_addr'.
The constructor I modified does handle open_existential_addr
but the specific optimization I'm working on should only handle open_existential_ref
, right?
The main thing here is to prove that the stored value is the value used by the apply. One way to do that is to recognize all uses of the alloc_stack and make sure only one of them is a store.
I've added more checking around the above code but, I suspect it isn't quite right yet. Should the stack alloc instruction ever be used in more than the initial store, the apply, and then the dealloc instructions? If not it may make it a bit easier to filter out what we can optimize. But that might be too harsh of a filter.
Anyway, I've added many more tests and have uncovered several bugs. Right now I'm in the process of testing those bugs against master to see if they're my issue and if so, fixing them. I'm going to be pretty busy this week with other things so, turn around on this patch might be a bit slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atrick I've added a dominance order check which should ensure that there aren't those kinds of issues. I've also added one .sil test. I plan on writing some more, what other edge cases should I cover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Developing a breadth of .sil test cases is extremely helpful. That's what this code was missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to generalize this pattern match, I think it should be part of the existing utilities so there's no new code in this function. getStackInitInst
should be extended so it can return a store. It already checks for dominance of ASIUser
. It's a bit more general than your new code. Whether it should set isCopied
in that case is very subtle. The purpose of that flag is to make sure the original value isn't destroyed before the apply that needs to use it. In the case of a store it's simpler of we do not set isCopied
, but then we should assert that the store's ownership qualifier is Unqualified (non-OSSA) and comment when we support OSSA here, we need to insert a new copy of the value before the store (and make sure that copy is destroyed when replacing the apply operand).
FYI: in non-OSSA it should just work. If we have
retain val
store val to address
destroy val
use address
destroy_addr
It's actually ok in non-OSSA to reuse the original value like this:
retain val
store val to address
destroy val
use val
destroy_addr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea-- that's much cleaner. Also, thanks for the information about OSSA/non-OSSA and when copies are needed. Very interesting.
assert(srcTy.isAddress() == destTy.isAddress() | ||
&& "Addresses aren't compatible with values"); | ||
if (srcTy.isAddress() == destTy.isAddress()) | ||
return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This assertion was triggered after the above change. I suspect some other optimization caused the store instruction to trigger this assertion. This was the simplest fix but, maybe I should try to do some more digging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not part of this APIs contract to return nullptr and it doesn't seem to be anticipated by its callers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am now thinking that this assertion shouldn't exist. The next condition srcTy.isAddress() && destTy.isAddress()
can never succeed if this assertion is triggered (if bool != bool
then bool && bool == false
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is valid code
assert(A == B); if (A && B)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. I got confused for a second :P
I'm not sure what the best solution is, but it seems to be OK when I remove the assertion (not a reason on its own to remove the assertion, though).
…f dyn_cast to StoreInst
assert(srcTy.isAddress() == destTy.isAddress() | ||
&& "Addresses aren't compatible with values"); | ||
if (srcTy.isAddress() == destTy.isAddress()) | ||
return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is valid code
assert(A == B); if (A && B)
auto firstUse = *instance->getUses().begin(); | ||
if (auto *store = dyn_cast<StoreInst>(firstUse->getUser())) { | ||
openedVal = store->getSrc(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the abstract, we want to avoid producing a sequence like
r = open_existential_ref
destroy(r)
apply(r)
That shouldn't actually be a problem though because we're not in OSSA form so there will only be 'retain/release_value(r)' operations. As long as none of the retains get eliminated during this transformation, it should be safe.
A lot of the complexity in the existing code comes from the fact that it's handling 'open_existential_addr', which can be destroyed with 'destroy_addr'.
The main thing here is to prove that the stored value is the value used by the apply. One way to do that is to recognize all uses of the alloc_stack and make sure only one of them is a store. getStackInitInst already does that, but only handles copy_addr, not store.
Please try to write some .sil test cases to for corner cases.
return shouldOptimize1(c) | ||
} | ||
|
||
// TODO: create SR -- this causes a crash on master too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an example. This crashes because of the issue #28012 resolved but even after that commit is applied, it still crashes because of another issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. Now that you've been looking at this code, can you see what the problem is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take another look and see if I can figure it out.
@@ -907,7 +907,7 @@ SILCombiner::buildConcreteOpenedExistentialInfoFromSoleConformingType( | |||
(archetypeTy->getConformsTo().size() == 1)) { | |||
PD = archetypeTy->getConformsTo()[0]; | |||
} else if (ArgType.isExistentialType() && !ArgType.isAnyObject() && | |||
!SwiftArgType->isAny()) { | |||
!SwiftArgType->isAny() && SwiftArgType->getAnyNominal()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woops, didn't rebase off master.
|
||
public func test(x: B, y: C) -> Int | ||
|
||
// CHECK-LABEL: sil shared [noinline] @${{.*}}impl{{.*}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the relevant bit of this test.
@atrick friendly ping. |
@zoecarver sorry I keep getting side tracked. This is next on my list to review. |
@atrick don't worry about it. Thanks for the update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you put a lot of work into this. Thanks. There are a few small cleanups below and an optional suggestion for generalizing the implementation.
store = foundStore; | ||
} else if (auto *foundApply = dyn_cast<ApplyInst>(use->getUser())) { | ||
apply = foundApply; | ||
} else if (auto *foundDealloc = dyn_cast<DeallocStackInst>(use->getUser())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
80-col violation
for (auto *use : instance->getUses()) { | ||
if (auto *foundStore = dyn_cast<StoreInst>(use->getUser())) { | ||
store = foundStore; | ||
} else if (auto *foundApply = dyn_cast<ApplyInst>(use->getUser())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just check that user->getUser() == user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean use->getUser() == user
? How would that check be useful? Wouldn't it fail if use
(the argument) and use
(the for-loop element) were different uses of the alloc_stack
instruction?
I'm going to rename some of these variables, I didn't realize there were two use
variables.
auto firstUse = *instance->getUses().begin(); | ||
if (auto *store = dyn_cast<StoreInst>(firstUse->getUser())) { | ||
openedVal = store->getSrc(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Developing a breadth of .sil test cases is extremely helpful. That's what this code was missing.
auto firstUse = *instance->getUses().begin(); | ||
if (auto *store = dyn_cast<StoreInst>(firstUse->getUser())) { | ||
openedVal = store->getSrc(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to generalize this pattern match, I think it should be part of the existing utilities so there's no new code in this function. getStackInitInst
should be extended so it can return a store. It already checks for dominance of ASIUser
. It's a bit more general than your new code. Whether it should set isCopied
in that case is very subtle. The purpose of that flag is to make sure the original value isn't destroyed before the apply that needs to use it. In the case of a store it's simpler of we do not set isCopied
, but then we should assert that the store's ownership qualifier is Unqualified (non-OSSA) and comment when we support OSSA here, we need to insert a new copy of the value before the store (and make sure that copy is destroyed when replacing the apply operand).
FYI: in non-OSSA it should just work. If we have
retain val
store val to address
destroy val
use address
destroy_addr
It's actually ok in non-OSSA to reuse the original value like this:
retain val
store val to address
destroy val
use val
destroy_addr
return shouldOptimize1(c) | ||
} | ||
|
||
// TODO: create SR -- this causes a crash on master too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. Now that you've been looking at this code, can you see what the problem is?
dealloc_stack %5 : $*@opened("A1F1B526-1463-11EA-932F-ACBC329C418B") A & P | ||
return %8 : $Int | ||
} // end sil function 'impl' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We certainly want as many optimizer issues as possible covered by SIL tests, but the tests themselves should be minimal. The maintenance burden of these tests is a primary reason that it's so difficult to make SIL changes. It seems like everything after this point should be deleted, except maybe the SIL declaration of "foo". You don't need to include bodies of called SIL functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I removed much of this sil test. Any other test cases I should cover?
@atrick thank you for the review, I really appreciate all the info you gave me here. I might not get to updating this until next week, just FYI. |
@swift-ci test |
Build failed |
OK, at least I don't have to fix anything :P |
Build failed |
Looks like all the tests passed. Maybe you could trigger the benchmarks again and if it looks good to you, can you merge it for me? Thanks again for all the help with this patch, @atrick! |
@@ -914,7 +924,29 @@ SILInstruction *SILCombiner::createApplyWithConcreteType( | |||
}); | |||
} | |||
|
|||
if (!UpdatedArgs) { | |||
bool canUpdateArgs, madeUpdate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole section of new code was confusing at first. I think what it's doing is determining both whether it is possible to rewrite the call at all and whether at least one argument has changed. If not, avoid creating a new call and delete any temporary instructions to avoid the top-level SILCombine driver from infinitely attempting to optimize. If that's right, then the code looks good but please explain that in a comment before new code.
I'm not sure why it's a written as a closure. The results need to be exposed anyway and there's not much else to hide here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. I'll factor it out and add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I requested a comment, but otherwise this PR looks good now.
@atrick thanks, as I've said before, I really appreciate all your reviews of this patch. I updated it based on your comment. |
@zoecarver migrating passes to OSSA is one of the most useful things that can be done. I'm not sure which pass is easiest to start with. It's something we'll be figuring over the next few months. @gottesmm is looking into it. |
@swift-ci smoke test |
@zoecarver thanks for iterating on this to get it right! |
Of course.
I think I remember @gottesmm saying that it would be hard to find a good pass for me to work on but, let me know if that changes, or if there are other things that I can do that would be helpful. |
@atrick could you also run the benchmarks? The benchmark I added should now work properly. |
@swift-ci benchmark |
Performance: -O
Code size: -OPerformance: -Osize
Code size: -OsizePerformance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
return sum | ||
} | ||
|
||
func shouldOptimize1<T>(_ x: ClassA<T> & ProtocolA, count: Int) -> Int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix the names here.
@swift-ci smoke test |
@swift-ci benchmark |
Performance: -O
Code size: -OPerformance: -Osize
Code size: -OsizePerformance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
@atrick friendly ping. Anything else I need to do with this? |
@swift-ci smoke test |
Looks like a swiftpm issue. |
@swift-ci please smoke test Linux |
@theblixguy thanks! |
@swift-ci smoke test linux |
This patch devirtualizes calls to methods of protocol composition types when those types are composed of a class and method. (when possible) It removes
init_existential_ref
andopen_existential_ref
instructions (existentials/existential unwrapping). This also allows for optimization of the methods directly (inlining, etc.).Here's an example:
Before:
After:
Partially resolves SR-11695 (doesn't optimize the overridden class -- I can work on that in another PR).
Once reviewed, can someone with commit access run swift-ci to test and benchmark, please?