Skip to content
This repository was archived by the owner on Nov 30, 2024. It is now read-only.

Recursive bisect strategy #1997

Merged
merged 6 commits into from
Jun 29, 2015

Conversation

urbanautomaton
Copy link
Contributor

Hi,

First, thank you very much for the new --bisect feature - it's already saved me and a colleague hours of painstaking manual debugging, and it doesn't get much better than that. :)

While looking at the implementation recently, I came across a way I think it might be possible to improve the bisection strategy in ExampleMinimizer. Presently a round-based approach is used, which attempts to find ~50% of the current residual candidates to ignore in each round. It first searches for contiguous blocks of candidates to ignore, then for more complex combinations of the candidate set. If it can't find ~50% of the candidate set to ignore at any stage, it concludes that the run is irreducible.

In the case where there is a single, singly-dependent failure (i.e. when there is a single expected failure, which depends upon a single preceding example), I believe the current strategy is optimal.

However, when there are multiple preceding examples that combine to cause the expected failure, then there are cases that the current strategy fails to reduce, and cases that it successfully reduces, but could reduce in fewer steps.

Examples

Since the problem cases involve expected failures that depend on multiple preceding examples, I first expanded the FakeBisectRunner class to support simulating multiply-dependent failures.

I then added failing test cases:

The rounds-based strategy will not reduce the case C C C C C C C I F (where C = culprit, I = innocent, and F = expected failure), as the first round fails to remove 50% of the candidates.

Also, the rounds-based strategy takes 70 runs to infer that the following is irreducible: C C C C C C C C F. With a recursive strategy, I believe this should be possible in 1 + 2 + 4 + 8 = 15 runs:

  • 1 run to determine that at least one of the candidates is responsible
  • 2 runs to determine that both halves of the candidates are responsible
  • 4 runs to determine that all quarters of the candidates are responsible
  • 8 runs to determine that all of the candidates are responsible

(My test case originally had 14 runs, because I hadn't accounted for the first step.)

Recursive strategy

Instead of searching amongst combinations of the candidate set, the recursive algorithm always splits its candidate set into two slices. Informally:

  1. Is the candidate set a single example?
    • yes: return it as the culprit
    • no:
      1. Divide the candidate set in two
      2. Run each slice with the expected failure
      3. Can either set be eliminated?
        • yes: recurse on other set
        • no: recurse on both sets, retaining other set as "fixed" for each run

(Where "fixed" means while reducing one set, the other set is always included as part of each run.)

In case my lousy prose description makes no sense, here is a visual representation of a simple case where there are two culprit examples, one in each half of the candidate set:

I've implemented the above strategy in this PR, and the additional test cases now pass.

Formatter

There remain some failures elsewhere in the suite, however, because the formatter output no longer matches the previous algorithm's, and because the recursive strategy doesn't proceed in rounds, it's difficult to restore these tests without changing the bisection output.

So, before I tackle that: is this something you'd be interested in merging? I'm happy to modify anything that's unclear. :)

Cheers,
Simon

@myronmarston
Copy link
Member

Thanks for digging into this! I'm quite keen to see the bisector improved, and you've clearly thought a lot about this....so this looks great!

I do have a few questions/concerns:

  • How does this affect the common case where there is on culprit causing one or more other failures when it runs before them? You mention that the current algorithm is optimal for this common case, so presumably your recursive algorithm performs less optimally...any ideas what the affect is? (Particularly in terms of runtime and number of runs)
  • The diagram you provided is reaaallly helpful, but there are a couple things that surprise me about it:
    • For the "Recurse on RHS with LHS Fixed" section, why is the last run needed? It looks like you've identified the culprit on the RHS after the 3rd run so the 4th seems unnecessary (and wasteful).
    • For the "Recurse on LHS with RHS Fixed" section, why is the RHS fixed to all 4 examples when you've already identified which of the 4 examples is the culprit? The individual runs will take less time if we don't run the extra examples we've already identified as safe to ignore.
  • Given that this is a departure from a rounds-based strategy, and the formatter output is built around that...do you have thoughts on what the formatter output might look like for this strategy? The diagram is really cool and it makes me wonder if we could do something like that...
  • I'd be interested on seeing some A/B test results of the two strategies on real-world example repos. It'd help me understand how this affects users.

Thanks again!

@urbanautomaton
Copy link
Contributor Author

Hi Myron,

How does this affect the common case where there is one culprit

In this instance the algorithms should be identical. The recursive strategy uses the same two initial slices as the rounds-based one, so if there is only a single failure, one of those slices can always be eliminated until the terminating condition is reached. It's only when both of the initial slices are implicated that the algorithms start to differ.

There's a minor difference in that the recursive strategy tests the expected failure(s) on their own at the start to eliminate totally independent failures, then proceeds with the knowledge that at least one culprit exists.

For singly-dependent failures this makes no difference to the number of runs since the rounds-based strategy also does a run with just the expected failure; it just does it at the end, rather than the beginning.

For totally independent failures, the recursive strategy will return in a single run, whereas the rounds-based strategy will take ~log2(n) runs.

For multiply-dependent failures the number of runs differs anyway.

For the "Recurse on RHS with LHS Fixed" section, why is the last run needed?

This is my mistake - the diagram is incorrect at that point, and the last run would be skipped because of the #find usage here, combined with the terminating condition that assumes any candidate set of length 1 must be the culprit.

why is the RHS fixed to all 4 examples when you've already identified which of the 4 examples is the culprit?

At the moment because the code is clearer when the recursive steps are independent. It's certainly an optimisation that could be applied, but my (brief) attempt to do so resulted in a bit of a mess, so I thought I'd submit the PR with the more understandable code. I'll have another look at this.

do you have thoughts on what the formatter output might look like for this strategy?

I'd love to try something like the diagram. :) The challenging bit would be handling cases where there are more examples than will fit in the terminal width (or 80 chars, or whatever). I'll certainly give it a go - is there a standard width you try to keep your formatter output to? I guess the output also needs to be comprehensible in a non-color terminal, so we couldn't rely solely on colors.

I'd be interested on seeing some A/B test results of the two strategies on real-world example repos.

Yep, some real-world testing would definitely be nice. I have a good test project (private source unfortunately) that had a bunch of order-dependent failures I can resurrect and try bisecting, but broader testing would be nice, too. I don't suppose the rspec project has any example test suites lying around to hand that are used for this sort of thing? If not I could look at using the github API to gather some, or just make a twitter appeal. :)

@myronmarston
Copy link
Member

I'd love to try something like the diagram. :) The challenging bit would be handling cases where there are more examples than will fit in the terminal width (or 80 chars, or whatever). I'll certainly give it a go - is there a standard width you try to keep your formatter output to? I guess the output also needs to be comprehensible in a non-color terminal, so we couldn't rely solely on colors.

We don't have any width we limit output to. In fact, our formatters commonly go over normal terminal widths and the terminal just wraps it. The dots for the progress formatter are just one line line that the terminal wraps, for example. For the doc formatter, the length is whatever the doc strings provided by the user are, and we do nothing to limit it.

Here it could be a problem for it to wrap if it causes it to run together and makes it hard to distinguish different runs.

Note that we also support multiple bisect formatters -- currently we have a normal one and a verbose one that is enabled via --bisect=verbose. So maybe the diagram could use --bisect=diagram or replace the verbose output or something.

@myronmarston
Copy link
Member

I don't suppose the rspec project has any example test suites lying around to hand that are used for this sort of thing?

Sadly, no.

end

def bisect_over_recursive(candidate_ids, fixed)
return candidate_ids if candidate_ids.length == 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one? works nicely for this case:

return candidate_ids if candidate_ids.one?

@urbanautomaton
Copy link
Contributor Author

Thanks for the code feedback, Myron. I have:

  • Altered the algorithm to track eliminated ids rather than remaining ids, allowing the algorithm to use the knowledge gained from LHS recursion when recursing on the RHS (updated algorithm diagram).
  • Simplified the recursive logic to eliminate the #permutation call (don't know what I was thinking there, really :) )
  • Had another go at eliminating the object-level progress state, but it got messy again. In the end I think if it has to exist for the graceful-exit-on-SIGINT feature, it's less confusing to just accept and use it, rather than having multiple sources of truth for algorithm progress.
  • Updated the progress and verbose formatters as minimally as possible to make sense with the new algorithm - here's a gist with sample output.
  • Patched up the specs and features involving formatter output to match the new display (for the debug formatter, this had the nice side-effect of demonstrating that the algorithms are essentially identical for the singly-dependent case).

I'll definitely continue to work on getting a diagrammatic formatter going, but from my POV it'd be nice to do that as a separate PR after doing the real-world testing on the new algorithm and getting this merged (if it's accepted, of course :)). Would that be a good idea? If you'd like it all in one PR though, that's fine with me.

The full test suite now passes for me locally; I've fixed the travis failures I understand, but some of the suites seem to be failing despite having only pending failures.

@myronmarston
Copy link
Member

The build is failing due to missing code coverage. To see the missing coverage, run script/rspec_with_simplecov and the open coverage/index.html. It looks like the bisect formatter has some uncovered lines. Let me know if you want help with that.

Bisecting over non-failing examples 1-9 .. ignoring examples 6-9
Bisecting over non-failing examples 1-5 .. ignoring examples 4-5
Bisecting over non-failing examples 1-3 .. ignoring example 3
Bisecting over non-failing examples 1-2 . ignoring example 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duration of each round was really nice before. Can we include the duration on these lines as well?

Also, how does this format it when the set being bisected over isn't contiguous? e.g. when you've fixed the LHS and are running part of the RHS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll look at putting that back. It dropped out because there wasn't a natural equivalent of :bisect_round_finished in the recursive scheme, but I'll work it back in.

There's an illustration of the more complex recursive case here - in the case where both sides need recursing on it just doesn't display any ignorable ids at the moment. I'm thinking of adding text like "Both sides implicated - splitting..." to indicate this situation. Will follow up with a proper test to illustrate.

@myronmarston
Copy link
Member

This is looking really good, @urbanautomaton. I haven't had a chance to finish my review and have to get ready for work but I'll try to finish my review later today or tomorrow.

@urbanautomaton
Copy link
Contributor Author

Thanks @myronmarston - I've only been able to grab 30 minutes here and there to work on this since submitting it, so a steady drip of feedback is perfect. :)

@urbanautomaton urbanautomaton force-pushed the recursive-bisect-strategy branch from bfb9348 to 080da9d Compare June 21, 2015 12:44
@urbanautomaton
Copy link
Contributor Author

Okay, feedback addressed so far:

  • Renamed bisect_over/bisect_over_recursive as suggested
  • Reverted to tracking remaining ids instead of eliminated ones
  • Restored durations to formatter output for both simple and complex cases
  • Fixed the coverage issues for the bisect formatter

For the latter issue, I figured that rather than try to cover formatter edge cases in either the Bisect::Coordinator spec or the feature tests, it made most sense to have a dedicated spec for BisectProgressFormatter. I added an example for the case I needed, but it now looks weirdly incomplete. Since I'm planning to try a diagrammatic version soon, though, I don't really want to exhaustively test the current behaviour only to immediately rewrite it. Are you happy for me to fill this spec out in a follow-up PR with the diagram formatter?

@@ -106,8 +107,10 @@ Feature: Bisect
- ./spec/calculator_7_spec.rb[1:1]
- ./spec/calculator_8_spec.rb[1:1]
- ./spec/calculator_9_spec.rb[1:1]
Checking that failures are order-dependent..
- Running: rspec ./spec/calculator_1_spec.rb[1:1] --seed 1234 (n.nnnn seconds) failure is order-dependent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "failure is order dependent" bit isn't separated at all from the command so it kinda runs together. Thoughts on adding an elipses or something similar to separate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll sort something out there (might put it on a separate line):

Checking that failures are order-dependent...
 - Running: <snip>
Failure is order-dependent

(with the above-mentioned changes to the "is order-dependent" copy, of course)

@urbanautomaton urbanautomaton force-pushed the recursive-bisect-strategy branch from 080da9d to c54fc0b Compare June 23, 2015 21:52
)
end

it 'reduces a multiply-dependent failure' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean "multiple-dependent failure" instead of "multiply-dependent failure"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the adverb form (c.f. "singly") - happy to change it though if it's unusual in US English.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do. I've never heard "multiply" used as anything other than a verb in the arithmetic sense. I assumed it was a typo and I think other US English speakers would think the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never heard of singly either :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gadzooks! All right then, away with it. :)

"Multiple-dependent" sounds odd to me in turn, though. How's it "reduces a failure with multiple dependencies"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@urbanautomaton urbanautomaton force-pushed the recursive-bisect-strategy branch 3 times, most recently from d235493 to ba8a16c Compare June 26, 2015 09:17
@urbanautomaton
Copy link
Contributor Author

Okay, thanks for all the latest feedback - I think I've covered it all:

Edit: there's a single JRuby failure which appears to be related to my changes (at least, it appears in a bisect-related spec), but is pretty baffling. Don't suppose anyone's seen something similar in the suite before?

@myronmarston
Copy link
Member

Edit: there's a single JRuby failure which appears to be related to my changes (at least, it appears in a bisect-related spec), but is pretty baffling. Don't suppose anyone's seen something similar in the suite before?

We periodically get file descriptor errors like that in our builds for JRuby. I have no idea why, but it's always transient so we just kick the build and run it again. That's what I just did.

It would be nice to not have to do that but since it has to do with a JVM resource limit and not a logic bug in RSpec or our specs, I'm not sure what to do about it :(.

it 'detects an unminimisable failure in the minimum number of runs' do
counting_minimizer.find_minimal_repro

expect(counting_reporter.round_count).to eq(15)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear from this example why 15 is the minimum number of runs. It kinda feels like a magic number right now. Can you add a comment (or update the docstring, if the explanation is short enough) explaining why 15 is the minimum?

@myronmarston
Copy link
Member

Thanks, @urbanautomaton, this is fantastic!

I left two more comments that are cosmetic changes (formatting / comments). I'm completely happy (ecstatic, even) with this otherwise.

Do you want to rebase into a smaller number of commits before we merge? It's not strictly needed but given the back-and-forth and the fact that we're up to 26 commits it could help cleanup the history a bit.

@urbanautomaton urbanautomaton force-pushed the recursive-bisect-strategy branch from ba8a16c to 0a654a5 Compare June 26, 2015 15:27
if start == finish
"example #{start}"
else
"examples #{[start, finish].join("-")}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems simpler as:

"examples #{start}-#{finish}"

...which should also be a bit faster -- it's fewer object allocations (no unnecessary array allocation or string allocation for "-") and fewer method calls (no join call).

To illustrate cases for which a recursive bisection strategy is faster than
subset enumeration, FakeRunner needs to support failures that are dependent on
multiple candidate examples.
@urbanautomaton urbanautomaton force-pushed the recursive-bisect-strategy branch from 8f58382 to fcaa214 Compare June 28, 2015 22:10
If an expected failure depends on a large number of the passing examples, the
subset enumeration strategy can take a large number of rounds to minimise the
run.

Add a test to count rounds in the worst case, where a failure depends on all
of the preceding examples. With 8 candidates preceding a failure, a recursive
strategy could return in 15 rounds (1 + 2 + 4 + 8).
@urbanautomaton urbanautomaton force-pushed the recursive-bisect-strategy branch 2 times, most recently from 7339f87 to cf1482a Compare June 28, 2015 22:14
@urbanautomaton
Copy link
Contributor Author

Okay, remaining bits of feedback addressed - I've rebased into a smaller number of commits, too. The rebase isn't perfect - because of all the back-and-forth I wasn't able to do it totally cleanly, so there are a couple of additions in 6a22644 that only make complete sense in the light of cf1482a (relating to formatter notifications). If you're happy with that then we could just leave it - alternatively I could just squash those commits together and have one big commit that lands both the algorithm and formatter modifications at once.

Thanks once again for all the feedback - the PR is so much better for it. I've always been appreciative of the hard work that goes into maintaining RSpec, but I don't think I've really understood just how much work that actually is. So, cheers. :)

@myronmarston
Copy link
Member

Thanks, @urbanautomaton! The way you've squashed this looks great to me. It tells a very clear story, which is quite helpful.

I did notice one of the commit messages that I think is slightly inaccurate (in a way that can cause confusion), and if I'm correct, I'd like the message amended before we merge. I'm happy to take care of amending the message (or you can take care of it; I don't particularly care) but can you first confirm for me whether or not I'm correct about the message? I'm talking specifically about the message for 6a22644:

[✘] - expected failure (failing)
[✔︎] - expected failure (passing)
.   - uninvolved example
%   - culprit

Failing run:
          . % . . . . % . [✘]

Divide candidates in two and test:
                  . . % . [✔︎]
          . % . .         [✔︎]
Both sides implicated

Recurse on LHS with RHS fixed
              . . . . % . [✔︎]
          . %     . . % . [✘]
            %     . . % . [✘]
 Identified ^

Recurse on RHS with LHS result
            %         % . [✘]
            %     . .     [✔︎]
            %           . [✔︎]
            %         %   [✘]
           Identified ^

Both sides returned, final results:
            %         %   [✘]

Specifically, these lines:

Recurse on RHS with LHS result
            %         % . [✘]
            %     . .     [✔︎]

It shows that it first discovers on that the failure happens with using the RH quarter of the RHS...and then for some reason it proceeds to also try the LH quarter of the RHS, when it shouldn't need to. Given that you use Enumerable#find to find test the two quarters, it should abort as soon as it finds a quarter that causes the expected failures so it wouldn't actually try the 2nd quarter, right?

Actually, the find is run against [lhs, rhs] so really I think that should be:

Recurse on RHS with LHS result
            %     . .     [✔︎]
            %         % . [✘]

...where the LH quarter of the RHS is tested before the RH quarter of the RHS.

Not trying to be anal here but the usefulness of the graphical diagram (which is really, really great, BTW!) depends on it being accurate or else it causes confusion so I think it's worth getting right. Can you confirm if my understanding is correct?

@urbanautomaton urbanautomaton force-pushed the recursive-bisect-strategy branch from cf1482a to 17087bb Compare June 29, 2015 07:27
To speed up bisection of complex dependent failures, a recursive bisection
strategy is introduced in place of permutation searching. The strategy works
as follows:

1. Split the candidate set into two slices
2. Test each slice with the expected failure(s)
3. If either slice can be ignored:
  - recurse on the other slice
4. If neither slice can be ignored, for each slice:
  - recurse on slice, retaining other slice for test runs

This is easier to illustrate graphically - shown below is a case where a
single expected failure depends on two preceding non-failing examples:

    [✘] - expected failure (failing)
    [✔︎] - expected failure (passing)
    .   - uninvolved example
    %   - culprit

    Failing run:
              . % . . . . % . [✘]

    Divide candidates in two and test:
                      . . % . [✔︎]
              . % . .         [✔︎]
    Both sides implicated

    Recurse on LHS with RHS fixed
                  . . . . % . [✔︎]
              . %     . . % . [✘]
                %     . . % . [✘]
     Identified ^

    Recurse on RHS with LHS result
                %         % . [✘]
                %           . [✔︎]
                %         %   [✘]
               Identified ^

    Both sides returned, final results:
                %         %   [✘]
This modifies BisectProgressFormatter to more closely model the recursive
bisection algorithm, fixing the associated BisectCoordinator specs.
@urbanautomaton urbanautomaton force-pushed the recursive-bisect-strategy branch from 17087bb to 5a2f476 Compare June 29, 2015 07:29
@urbanautomaton
Copy link
Contributor Author

Hmm - I think you're right about the former, but not the latter. The find runs on the lhs first, but the block tests whether the failure exists without the tested side:

ids_to_ignore, duration = track_duration do
  [lhs, rhs].find do |ids|
    get_expected_failures_for?(remaining_ids - ids)
  end
end

So I think your first version is correct:

Edit: sorry, I keep trying to reply in too little time. So I think the following is correct:

Recurse on RHS with LHS result
                %         % . [✘]
                %           . [✔︎]
                %         %   [✘]

and have amended the commit accordingly.

myronmarston added a commit that referenced this pull request Jun 29, 2015
@myronmarston myronmarston merged commit 86c4fd4 into rspec:master Jun 29, 2015
@myronmarston
Copy link
Member

👍 Merged. Thanks again!

myronmarston added a commit that referenced this pull request Jun 29, 2015
MatheusRich pushed a commit to MatheusRich/rspec-core that referenced this pull request Oct 30, 2020
MatheusRich pushed a commit to MatheusRich/rspec-core that referenced this pull request Oct 30, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants