-
-
Notifications
You must be signed in to change notification settings - Fork 753
Recursive bisect strategy #1997
Recursive bisect strategy #1997
Conversation
Thanks for digging into this! I'm quite keen to see the bisector improved, and you've clearly thought a lot about this....so this looks great! I do have a few questions/concerns:
Thanks again! |
Hi Myron,
In this instance the algorithms should be identical. The recursive strategy uses the same two initial slices as the rounds-based one, so if there is only a single failure, one of those slices can always be eliminated until the terminating condition is reached. It's only when both of the initial slices are implicated that the algorithms start to differ. There's a minor difference in that the recursive strategy tests the expected failure(s) on their own at the start to eliminate totally independent failures, then proceeds with the knowledge that at least one culprit exists. For singly-dependent failures this makes no difference to the number of runs since the rounds-based strategy also does a run with just the expected failure; it just does it at the end, rather than the beginning. For totally independent failures, the recursive strategy will return in a single run, whereas the rounds-based strategy will take ~log2(n) runs. For multiply-dependent failures the number of runs differs anyway.
This is my mistake - the diagram is incorrect at that point, and the last run would be skipped because of the
At the moment because the code is clearer when the recursive steps are independent. It's certainly an optimisation that could be applied, but my (brief) attempt to do so resulted in a bit of a mess, so I thought I'd submit the PR with the more understandable code. I'll have another look at this.
I'd love to try something like the diagram. :) The challenging bit would be handling cases where there are more examples than will fit in the terminal width (or 80 chars, or whatever). I'll certainly give it a go - is there a standard width you try to keep your formatter output to? I guess the output also needs to be comprehensible in a non-color terminal, so we couldn't rely solely on colors.
Yep, some real-world testing would definitely be nice. I have a good test project (private source unfortunately) that had a bunch of order-dependent failures I can resurrect and try bisecting, but broader testing would be nice, too. I don't suppose the rspec project has any example test suites lying around to hand that are used for this sort of thing? If not I could look at using the github API to gather some, or just make a twitter appeal. :) |
We don't have any width we limit output to. In fact, our formatters commonly go over normal terminal widths and the terminal just wraps it. The dots for the progress formatter are just one line line that the terminal wraps, for example. For the doc formatter, the length is whatever the doc strings provided by the user are, and we do nothing to limit it. Here it could be a problem for it to wrap if it causes it to run together and makes it hard to distinguish different runs. Note that we also support multiple bisect formatters -- currently we have a normal one and a verbose one that is enabled via |
Sadly, no. |
end | ||
|
||
def bisect_over_recursive(candidate_ids, fixed) | ||
return candidate_ids if candidate_ids.length == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one?
works nicely for this case:
return candidate_ids if candidate_ids.one?
b909be7
to
b600552
Compare
Thanks for the code feedback, Myron. I have:
I'll definitely continue to work on getting a diagrammatic formatter going, but from my POV it'd be nice to do that as a separate PR after doing the real-world testing on the new algorithm and getting this merged (if it's accepted, of course :)). Would that be a good idea? If you'd like it all in one PR though, that's fine with me. The full test suite now passes for me locally; I've fixed the travis failures I understand, but some of the suites seem to be failing despite having only pending failures. |
The build is failing due to missing code coverage. To see the missing coverage, run |
Bisecting over non-failing examples 1-9 .. ignoring examples 6-9 | ||
Bisecting over non-failing examples 1-5 .. ignoring examples 4-5 | ||
Bisecting over non-failing examples 1-3 .. ignoring example 3 | ||
Bisecting over non-failing examples 1-2 . ignoring example 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The duration of each round was really nice before. Can we include the duration on these lines as well?
Also, how does this format it when the set being bisected over isn't contiguous? e.g. when you've fixed the LHS and are running part of the RHS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I'll look at putting that back. It dropped out because there wasn't a natural equivalent of :bisect_round_finished
in the recursive scheme, but I'll work it back in.
There's an illustration of the more complex recursive case here - in the case where both sides need recursing on it just doesn't display any ignorable ids at the moment. I'm thinking of adding text like "Both sides implicated - splitting..." to indicate this situation. Will follow up with a proper test to illustrate.
This is looking really good, @urbanautomaton. I haven't had a chance to finish my review and have to get ready for work but I'll try to finish my review later today or tomorrow. |
Thanks @myronmarston - I've only been able to grab 30 minutes here and there to work on this since submitting it, so a steady drip of feedback is perfect. :) |
bfb9348
to
080da9d
Compare
Okay, feedback addressed so far:
For the latter issue, I figured that rather than try to cover formatter edge cases in either the |
@@ -106,8 +107,10 @@ Feature: Bisect | |||
- ./spec/calculator_7_spec.rb[1:1] | |||
- ./spec/calculator_8_spec.rb[1:1] | |||
- ./spec/calculator_9_spec.rb[1:1] | |||
Checking that failures are order-dependent.. | |||
- Running: rspec ./spec/calculator_1_spec.rb[1:1] --seed 1234 (n.nnnn seconds) failure is order-dependent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "failure is order dependent" bit isn't separated at all from the command so it kinda runs together. Thoughts on adding an elipses or something similar to separate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll sort something out there (might put it on a separate line):
Checking that failures are order-dependent...
- Running: <snip>
Failure is order-dependent
(with the above-mentioned changes to the "is order-dependent" copy, of course)
080da9d
to
c54fc0b
Compare
) | ||
end | ||
|
||
it 'reduces a multiply-dependent failure' do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean "multiple-dependent failure" instead of "multiply-dependent failure"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the adverb form (c.f. "singly") - happy to change it though if it's unusual in US English.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do. I've never heard "multiply" used as anything other than a verb in the arithmetic sense. I assumed it was a typo and I think other US English speakers would think the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never heard of singly either :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gadzooks! All right then, away with it. :)
"Multiple-dependent" sounds odd to me in turn, though. How's it "reduces a failure with multiple dependencies"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
d235493
to
ba8a16c
Compare
Okay, thanks for all the latest feedback - I think I've covered it all:
Edit: there's a single JRuby failure which appears to be related to my changes (at least, it appears in a bisect-related spec), but is pretty baffling. Don't suppose anyone's seen something similar in the suite before? |
We periodically get file descriptor errors like that in our builds for JRuby. I have no idea why, but it's always transient so we just kick the build and run it again. That's what I just did. It would be nice to not have to do that but since it has to do with a JVM resource limit and not a logic bug in RSpec or our specs, I'm not sure what to do about it :(. |
it 'detects an unminimisable failure in the minimum number of runs' do | ||
counting_minimizer.find_minimal_repro | ||
|
||
expect(counting_reporter.round_count).to eq(15) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear from this example why 15 is the minimum number of runs. It kinda feels like a magic number right now. Can you add a comment (or update the docstring, if the explanation is short enough) explaining why 15 is the minimum?
Thanks, @urbanautomaton, this is fantastic! I left two more comments that are cosmetic changes (formatting / comments). I'm completely happy (ecstatic, even) with this otherwise. Do you want to rebase into a smaller number of commits before we merge? It's not strictly needed but given the back-and-forth and the fact that we're up to 26 commits it could help cleanup the history a bit. |
ba8a16c
to
0a654a5
Compare
if start == finish | ||
"example #{start}" | ||
else | ||
"examples #{[start, finish].join("-")}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems simpler as:
"examples #{start}-#{finish}"
...which should also be a bit faster -- it's fewer object allocations (no unnecessary array allocation or string allocation for "-"
) and fewer method calls (no join
call).
To illustrate cases for which a recursive bisection strategy is faster than subset enumeration, FakeRunner needs to support failures that are dependent on multiple candidate examples.
8f58382
to
fcaa214
Compare
If an expected failure depends on a large number of the passing examples, the subset enumeration strategy can take a large number of rounds to minimise the run. Add a test to count rounds in the worst case, where a failure depends on all of the preceding examples. With 8 candidates preceding a failure, a recursive strategy could return in 15 rounds (1 + 2 + 4 + 8).
7339f87
to
cf1482a
Compare
Okay, remaining bits of feedback addressed - I've rebased into a smaller number of commits, too. The rebase isn't perfect - because of all the back-and-forth I wasn't able to do it totally cleanly, so there are a couple of additions in 6a22644 that only make complete sense in the light of cf1482a (relating to formatter notifications). If you're happy with that then we could just leave it - alternatively I could just squash those commits together and have one big commit that lands both the algorithm and formatter modifications at once. Thanks once again for all the feedback - the PR is so much better for it. I've always been appreciative of the hard work that goes into maintaining RSpec, but I don't think I've really understood just how much work that actually is. So, cheers. :) |
Thanks, @urbanautomaton! The way you've squashed this looks great to me. It tells a very clear story, which is quite helpful. I did notice one of the commit messages that I think is slightly inaccurate (in a way that can cause confusion), and if I'm correct, I'd like the message amended before we merge. I'm happy to take care of amending the message (or you can take care of it; I don't particularly care) but can you first confirm for me whether or not I'm correct about the message? I'm talking specifically about the message for 6a22644:
Specifically, these lines:
It shows that it first discovers on that the failure happens with using the RH quarter of the RHS...and then for some reason it proceeds to also try the LH quarter of the RHS, when it shouldn't need to. Given that you use Actually, the
...where the LH quarter of the RHS is tested before the RH quarter of the RHS. Not trying to be anal here but the usefulness of the graphical diagram (which is really, really great, BTW!) depends on it being accurate or else it causes confusion so I think it's worth getting right. Can you confirm if my understanding is correct? |
cf1482a
to
17087bb
Compare
To speed up bisection of complex dependent failures, a recursive bisection strategy is introduced in place of permutation searching. The strategy works as follows: 1. Split the candidate set into two slices 2. Test each slice with the expected failure(s) 3. If either slice can be ignored: - recurse on the other slice 4. If neither slice can be ignored, for each slice: - recurse on slice, retaining other slice for test runs This is easier to illustrate graphically - shown below is a case where a single expected failure depends on two preceding non-failing examples: [✘] - expected failure (failing) [✔︎] - expected failure (passing) . - uninvolved example % - culprit Failing run: . % . . . . % . [✘] Divide candidates in two and test: . . % . [✔︎] . % . . [✔︎] Both sides implicated Recurse on LHS with RHS fixed . . . . % . [✔︎] . % . . % . [✘] % . . % . [✘] Identified ^ Recurse on RHS with LHS result % % . [✘] % . [✔︎] % % [✘] Identified ^ Both sides returned, final results: % % [✘]
This modifies BisectProgressFormatter to more closely model the recursive bisection algorithm, fixing the associated BisectCoordinator specs.
17087bb
to
5a2f476
Compare
Hmm - I think you're right about the former, but not the latter. The ids_to_ignore, duration = track_duration do
[lhs, rhs].find do |ids|
get_expected_failures_for?(remaining_ids - ids)
end
end
Edit: sorry, I keep trying to reply in too little time. So I think the following is correct:
and have amended the commit accordingly. |
Recursive bisect strategy
👍 Merged. Thanks again! |
…rategy Recursive bisect strategy
[ci skip]
Hi,
First, thank you very much for the new
--bisect
feature - it's already saved me and a colleague hours of painstaking manual debugging, and it doesn't get much better than that. :)While looking at the implementation recently, I came across a way I think it might be possible to improve the bisection strategy in
ExampleMinimizer
. Presently a round-based approach is used, which attempts to find ~50% of the current residual candidates to ignore in each round. It first searches for contiguous blocks of candidates to ignore, then for more complex combinations of the candidate set. If it can't find ~50% of the candidate set to ignore at any stage, it concludes that the run is irreducible.In the case where there is a single, singly-dependent failure (i.e. when there is a single expected failure, which depends upon a single preceding example), I believe the current strategy is optimal.
However, when there are multiple preceding examples that combine to cause the expected failure, then there are cases that the current strategy fails to reduce, and cases that it successfully reduces, but could reduce in fewer steps.
Examples
Since the problem cases involve expected failures that depend on multiple preceding examples, I first expanded the
FakeBisectRunner
class to support simulating multiply-dependent failures.I then added failing test cases:
The rounds-based strategy will not reduce the case
C C C C C C C I F
(where C = culprit, I = innocent, and F = expected failure), as the first round fails to remove 50% of the candidates.Also, the rounds-based strategy takes 70 runs to infer that the following is irreducible:
C C C C C C C C F
. With a recursive strategy, I believe this should be possible in1 + 2 + 4 + 8 = 15
runs:(My test case originally had 14 runs, because I hadn't accounted for the first step.)
Recursive strategy
Instead of searching amongst combinations of the candidate set, the recursive algorithm always splits its candidate set into two slices. Informally:
(Where "fixed" means while reducing one set, the other set is always included as part of each run.)
In case my lousy prose description makes no sense, here is a visual representation of a simple case where there are two culprit examples, one in each half of the candidate set:
I've implemented the above strategy in this PR, and the additional test cases now pass.
Formatter
There remain some failures elsewhere in the suite, however, because the formatter output no longer matches the previous algorithm's, and because the recursive strategy doesn't proceed in rounds, it's difficult to restore these tests without changing the bisection output.
So, before I tackle that: is this something you'd be interested in merging? I'm happy to modify anything that's unclear. :)
Cheers,
Simon