Cut down on exponential growth in checking switch exhaustivity. #21999
Conversation
Saves a few copies of linked lists. No real impact expected, certainly no user-facing changes.
The problem

Assume `A` and `B` are enums with cases `.a1`, `.a2`, etc. and `.b1`, `.b2`, etc. If we try to typecheck this switch:

    switch (a, b) {
    case (.a1, .b1), (.a1, .b2):
      break
    ...

the compiler is going to try to perform the following series of operations:

    var s = (A, B)
    s -= (.a1, .b1)   // ((A - .a1, B) | (A, B - .b1))
    s -= (.a1, .b2)   // (((A - .a1, B) | (A - .a1, B - .b2)) | ((A - .a1, B - .b1) | (A, B - .b1 - .b2)))
    ...

As you can see, the disjunction representing the uncovered space is growing exponentially. Eventually some of these terms will start disappearing (for instance, if `B` only has `.b1` and `.b2`, that last term can go away), and if the switch is exhaustive they can *all* start disappearing. But several of them are also redundant: the second and third terms are fully covered by the first.

I exaggerated a little: the compiler is already smart enough to know that the second term is redundant with the first, because it already knows that `(.a1, .b2)` isn't a subset of `(A - .a1, B)`. However, the third and fourth terms are generated separately from the first two, and so nothing ever checks that the third term is also redundant.

This patch changes the logic for subtracting from a disjunction so that

1. any resulting disjunctions are flattened, and
2. any later terms that are subspaces of earlier terms are dropped.

This is a quadratic algorithm in the worst case (compare every space against every earlier space), but because it saves us from exponential growth (or at least brings down the exponent) it's worth it. For the test case now committed in the repository, we went from 40 seconds (using `-switch-checking-invocation-threshold=20000000` to avoid cutting off early) to 0.2 seconds.

I'll admit I was only timing this one input, and it's possible that other complex switches will not see the same benefit, or may even see a slowdown. But I do think this kind of switch is common in both hand-written and auto-generated code, and therefore this is likely to be a benefit in many places.

rdar://problem/47365349
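To make the flatten-and-prune idea concrete, here is a rough Swift sketch of the subtraction step described above. The `Space` model, `isSubspace(_:of:)`, and the `minus` closure are simplified stand-ins invented for illustration (the real checker is C++ inside the compiler), so treat this as a sketch of the idea rather than the actual implementation.

```swift
// A toy model of "spaces" of values, just enough to show the pruning step.
indirect enum Space {
    case full                 // an entire type
    case cases(Set<String>)   // some remaining enum cases
    case product([Space])     // a tuple of spaces
    case disjunction([Space]) // "any of these spaces is still uncovered"
}

// Placeholder: the real checker has a structural subspace test.
func isSubspace(_ a: Space, of b: Space) -> Bool {
    return false
}

/// Subtract `pattern` from every term of a disjunction, then
/// (1) flatten nested disjunctions and (2) drop terms that are
/// subspaces of earlier terms.
func subtract(_ pattern: Space,
              fromDisjunction terms: [Space],
              minus: (Space, Space) -> Space) -> [Space] {
    var result: [Space] = []
    for term in terms {
        let remainder = minus(term, pattern)

        // (1) Flatten: a nested disjunction contributes its terms directly.
        let pieces: [Space]
        if case .disjunction(let inner) = remainder {
            pieces = inner
        } else {
            pieces = [remainder]
        }

        // (2) Prune: skip any piece already covered by an earlier term,
        //     since it adds nothing to the uncovered space.
        for piece in pieces where !result.contains(where: { isSubspace(piece, of: $0) }) {
            result.append(piece)
        }
    }
    return result
}
```

The pruning in step (2) is what keeps the uncovered-space disjunction from accumulating terms an earlier term already covers, which is where the exponential growth in the example came from.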
9234619 to 671944d
@swift-ci Please test
@swift-ci Please test compiler performance
@swift-ci Please test source compatibility
Sigh... I would guess that this change, considered in isolation, is probably good. But I can't help thinking that this whole file seems to be much too much to maintain relative to its utility. As I understand it, the benefit of this algorithm is good diagnostics about missing cases. Something much simpler could just say covered or not-covered, which is what I think we concluded last year when we discussed the topic.
I hate to say it, and maybe I'm wrong, but maintaining this periodically-troublesome and seemingly-complicated-to-me sophisticated diagnostic capability just may not be the best use of our precious cognitons.
I agree with @davidungar, but I also think this is a viable approach for the time being, until one of us gets to implement a better algorithm here...
@jrose-apple for types
The optimization is sound and warranted. There are probably a few more places where it applies as well.
@davidungar It's important to note that, while we're seeing space complexity blowups, the overall operations involved in the algorithm itself are relatively cheap to compute and scale well with the problem space. If you take a step back and look at where we're seeing these cases, it's consistently large products of large enums. If we were losing on straight-line switches I'd be more fervently looking into a replacement. Even then, it seems like the replacements might only postpone the inevitable. The ideas we discussed before about optimizing the check seemed attractive at first (briefly: a hypothetical quasi-linear-time algorithm for a quick check, then the full monty if that fails), but the inherently infinitely-recursive nature of the problem space means it resists the kinds of caching you have in mind. My experience tells me that we can make this check fast by focusing on memoizing or pruning (sub)-computations, since that's the generalization of the solution @jrose-apple has identified here: we're duplicating a ton of work we've already performed down different subtrees.
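For illustration only, a minimal sketch of the kind of memoization being alluded to here; the cache shape and the `String` keys are assumptions, not anything in the compiler. The point is just that the answer to an expensive pairwise query gets reused instead of being recomputed down different subtrees.

```swift
// Hypothetical cache of subspace queries, keyed by a canonical textual
// form of the two spaces being compared.
final class SubspaceCache {
    private var cache: [String: Bool] = [:]

    func isSubspace(_ lhs: String, of rhs: String,
                    compute: () -> Bool) -> Bool {
        let key = lhs + " <= " + rhs
        if let cached = cache[key] {
            return cached          // reuse a result computed down another subtree
        }
        let answer = compute()
        cache[key] = answer
        return answer
    }
}
```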
@CodaFi Thanks for your response. I trust your judgement here; I've not studied this algorithm closely enough. Forgive the repetition, but I'm hoping to get to a point where we put no further effort into this area. I harbor this hope because I feel that we can more effectively improve the experience for our users by investing our time elsewhere; just a personal opinion. I had thought that the counter I had put into this code a while ago might accomplish that feat, and am dismayed that this code continues to divert us from other pursuits.
That ends up not helping because of … Also, recursive enums. :-(
This is a true statement, but it doesn't seem helpful. If you're missing a case from a switch, how would you ever find out which case it was? You can't even break apart the switch like you can with a complex expression. By the way, I don't believe the compiler perf instruction counter there—it says we executed three times as many instructions but the overall run time didn't go up by more than a few percent. So I guess that wasn't useful after all.
By the way…
The impetus to look into this comes pretty much directly from Robert removing the last source of unsoundness from the implementation, an early exit that approximated whether the switch was covered. Without that, an existing moderately complicated switch was suddenly being checked "properly", and thus ran afoul of the counter-based timeout. So we might be closer to that goal now, where we shouldn't regress anything for our users as long as enum capabilities don't change again. On the other hand, we might just not have encountered some other bad patterns that other languages handle fine and Swift does not.
People who use complex case statements will be well-served by these 1400 lines of code. I hope we don't have to put much more effort into them (the code, not the people!). Jordan- and Robert-cycles are rare and precious!
Thanks to all three of you for reviewing!
…expletives-to-quadratic-quality Cut down on exponential growth in checking switch exhaustivity. (cherry picked from commit 37c30b1)
@jrose-apple If we remove recursive enums from consideration (assuming that it's easy to detect enums like that), wouldn't it be a better algorithm to eagerly expand types into their Cartesian products and match linearly, and handle …
I don't thiiink so but I'll come back and think about it again later!
@xedin Your comment reminds me of the sort of algorithm I expected to see there before I looked at the code. My naive intuition felt that what you describe ought to be much faster for some cases.
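Purely as an illustration of the alternative being floated in this exchange (hypothetical names, and it sidesteps recursive enums entirely), an eager Cartesian-product expansion might look like this:

```swift
// Enumerate the full product of cases up front and strike out every
// combination that some `case` pattern covers; whatever remains is missing.
func uncoveredCombinations(_ aCases: [String], _ bCases: [String],
                           covered: [(String, String)]) -> [(String, String)] {
    var remaining: [(String, String)] = []
    for a in aCases {
        for b in bCases {
            if !covered.contains(where: { $0 == (a, b) }) {
                remaining.append((a, b))
            }
        }
    }
    return remaining
}

// With A = {a1, a2}, B = {b1, b2} and the two patterns from the example above,
// this reports (a2, b1) and (a2, b2) as uncovered.
let missing = uncoveredCombinations(["a1", "a2"], ["b1", "b2"],
                                    covered: [("a1", "b1"), ("a1", "b2")])
```

This materializes the whole product up front, which is presumably why it could win on small switches but not on the large products of large enums discussed earlier in the thread.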