Bisect #1917

myronmarston · 2015-03-24T07:28:19Z

This adds a new --bisect flag for #1767.

TODOs:

myronmarston · 2015-03-25T22:26:03Z

@rspec/rspec -- There are a number of TODOs above, but someone could go ahead and start reviewing this anytime. What's already there is pretty stable and what's left is largely additive.

JonRowe · 2015-03-30T00:14:30Z

lib/rspec/core/bisect/example_minimizer.rb

+
+        def debug(level, msg)
+          puts "#{'  ' * level}Minimizer: #{msg}" if ENV['DEBUG_RSPEC_BISECT']
+        end


Is this intended for "release" usage? or just whilst this is WIP?

Is this intended for "release" usage? or just whilst this is WIP?

Kinda both, actually. It was originally inspired by a similar feature in minitest-bisect:

https://github.com/seattlerb/minitest-bisect/blob/233b4fe69131358efa03da79270f91b2e7da9ea5/lib/minitest/find_minimal_combination.rb#L82-L84

It helped me get it working right. I think it also has value to keep something like this in place, though. Without it, if bisect's not working for a user, there's not much we can do to understand why. Having a verbose/debug mode will allow them to give us a print out that will hopefully allow us to understand where it went wrong. Bundler has something similar -- see rubygems/bundler#2248.

That said, this is a crappy WIP implementation of such debugging output. I'm thinking the debug output should be moved into the formatter (or perhaps be an alternate formatter). And perhaps we shouldn't necessarily use an ENV var to activate it, although I don't necessarily want to add a public API for it either -- it's more just something we can ask users to enable when they report bisect bugs.

Thoughts?

Well I think we should at least use notify(:message, "#{' ' * level}Minimizer: #{msg}"), given the formatter has something designed for this sort of usage... I'm not totally opposed to the usage of the ENV variable if it's intended for "semi private" debugging use...

myronmarston · 2015-03-30T18:41:25Z

OK, @JonRowe, I pushed some improvements to the debug output. Check it out and let me know what you think.

myronmarston · 2015-04-01T07:32:14Z

OK, I think I've taken this as far as I'm going to take it. Would be great to get a full review from @rspec/rspec folks. Anyone?

One open question/issue: the some of the bisect specs (particularly those in spec/integration/bisect_spec.rb) are quite slow -- notice the results of bin/rspec --profile:

Top 10 slowest examples (7.42 seconds, 57.0% of total time):
  Bisect finds the minimum rerun command and exits
    2.31 seconds ./spec/integration/bisect_spec.rb:19
  Bisect supports a verbose mode via `DEBUG_RSPEC_BISECT` so we can get detailed log output from users when they report bugs
    2.06 seconds ./spec/integration/bisect_spec.rb:39
  Bisect when the user aborts the bisect with ctrl-c prints the most minimal repro command it has found so far
    1.14 seconds ./spec/integration/bisect_spec.rb:128
  Bisect when the spec ordering is inconsistent stops bisecting and surfaces the problem to the user
    0.61015 seconds ./spec/integration/bisect_spec.rb:99
  RSpec::Core::Bisect::Runner#run ensures environment variables are propagated to the spawned process
    0.32271 seconds ./spec/rspec/core/bisect/runner_spec.rb:29
  RSpec behaves like library wide checks has no malformed whitespace
    0.28569 seconds
  RSpec::Core::Ordering::Random.order has a good distribution
    0.2568 seconds ./spec/rspec/core/ordering_spec.rb:38
  Bisect when a load-time problem occurs while running the suite surfaces the stdout and stderr output to the user
    0.19322 seconds ./spec/integration/bisect_spec.rb:92
  RSpec loads mocks and expectations when the constants are referenced
    0.14021 seconds ./spec/rspec/core_spec.rb:234
  RSpec::Core::Formatters::HtmlFormatter produced HTML with mathn loaded is identical to the one we designed manually
    0.10045 seconds ./spec/rspec/core/formatters/html_formatter_spec.rb:105

Notice that the bisect specs dominate the slow list! I'd really like to avoid having slow specs, but I'm not quite sure what to do. A few ideas:

Can we convert some of them to cukes? I'm trying to limit the cukes to just scenarios that have real documentation value. Generally using aruba from RSpec is faster, anyway.
Are there any of the slow specs I can combine or test in a different way (while still getting the same level of test coverage)?

myronmarston · 2015-04-02T00:00:04Z

...and the build is finally green. (That wasn't easy).

JonRowe · 2015-04-05T11:17:58Z

lib/rspec/core/bisect/subset_enumerator.rb

+            end
+
+            slice_size  /= 2
+            combo_count *= 2


Oddly I'd never considered that this would work, nice!

JonRowe · 2015-04-07T07:09:17Z

What benefit does DRB get us? Without having tried it, shelling out and capturing output of a JSON formatter (or something) seems to be simple and keeps networking out of the picture.

Actually I wonder if that would be better... It'd also involve less of the std lib... (we could even use a custom text output rather than a standard formatter to control communication)

myronmarston · 2015-04-07T14:54:57Z

(Moving this comment from above because it's hidden since I pushed a fix to the docstring you commented on)

is this correct? Seems intuitive to me that flakey specs are more likely to be poorly written and having global side effects.

Flakey specs are likely to be poorly written and have global side effects. The question is what we should do about them.

The bisect feature is trying to find a command that will cause the same set of failures as you started with. For each spec run, it compares what failed against the original set of failures to see if it was able to get that set of failures again. If so, it means the examples that were not included in the current run can be removed from the set of remaining ids to consider. Originally, we used a simple original_failed_example_ids == this_run_failed_examples_ids check for that. In trying bisect on a real-world project that has a flapping example, I noticed that the flapping example was interfering. When it failed, it meant that the == check failed and it would get "stuck" on that round for a while, having to try more and more combos. I realized that the == check was overly strict -- when bisect runs, we're interested in finding a command that will cause all the failures from the original first run to fail, and any additional failures aren't part of what bisect is looking for and can simply be ignored. So in eb5c5fe I changed it, and now the == check is a less-strict "did all the original failures fail this time, too?" check.

We could perhaps do something else with flapping failures (suggestions?) but if they aren't flapping based on an ordering dependency then it's not what bisect is good at working with anyway. The important thing in my mind is that we don't allow a flapping example or two to interfere with bisect being able to find a minimal repro command to get the same set of failures as the original run.

Thoughts?

myronmarston · 2015-04-07T15:09:32Z

What benefit does DRB get us? Without having tried it, shelling out and capturing output of a JSON formatter (or something) seems to be simple and keeps networking out of the picture.

DRb allows two-way communication. We use that so that that the BisectFormatter can ask the bisect server what the expected failure ids are. The knowledge of what the expected failure ids are is then used to abort an individual run as early as possible, once we've learned everything we can from this run:
- If an example that is expected to fail passes, we abort immediately; there's no reason to continue since we've already answered the question "do all the original failures fail with this limited subset of examples loaded?".
- When the last expected failure finishes, we abort immediately, not allowing later examples to run, since we don't care about the results of any later examples.
See 32045fd for more details on that change. Being able to abort a particular spec run early can make a big difference in how long --bisect takes.
If we dropped DRb, we'd have to find an alternate way to communicate the set of expected failures from the main bisect process to the individual spec run process. DRb just works.
Using a JSON formatter to STDOUT (I assume that's where it would go, right?) risks mixing the output that specific to bisecting with other output from the user. For example, if they had a puts in their suite it would get mixed with the JSON output and cause problems.
My use of DRb was initially inspired by minitest-bisect, which uses it and has proved it out as a good solution for this kind of thing. Initially I was a bit unsure about using DRb since I have never really worked with it before, but it was incredibly easy to setup, and appears to "just work". I've yet to see a problem with it.

Actually I wonder if that would be better... It'd also involve less of the std lib... (we could even use a custom text output rather than a standard formatter to control communication)

Avoiding the useful bits of the stdlib has never been (and is still not) one of the goals of RSpec. As a courtesy to users we don way the tradeoffs involved in bringing in new stdlib bits, but as explained above, our use drb and open3 for bisect shouldn't be a concern at all -- the bisect logic (and those dependencies) will never be loaded in the same process as the user's code (unless they user is specifically requiring rspec/core/bisect/runner.rb or whatever....).

DRb is working really well for bisect, I think.

myronmarston · 2015-04-07T16:56:35Z

As a point of comparison, I tried identifying examples in xspec using a short hash:

Interesting. Seems like a good approach for what you were trying to accomplish but clearly wouldn't work here.

127.0.0.1 is ipv4, whereas localhost should work for ipv4 and ipv6.

This allows it to be loaded by the bisect runner but if it’s loaded by other things it’ll notify by causing the “minimizes stdlibs” spec to fail.

myronmarston · 2015-04-07T17:20:41Z

Thanks for the review feedback, everyone. I think I've responded to all the feedback above (and/or pushed updates to address it).

myronmarston · 2015-04-07T17:42:43Z

JRuby seems to intermittently fail with "Errno::EBADF: Bad file descriptor - Bad file descriptor"...but I have no idea what we can do about that other than retry the build :(.

JonRowe · 2015-04-07T23:23:57Z

JRuby seems to intermittently fail with "Errno::EBADF: Bad file descriptor - Bad file descriptor"...but I have no idea what we can do about that other than retry the build :(.

Yeah thats a regular flake

xaviershay · 2015-04-08T03:23:04Z

Ship it!

Makes sense re DRb, I didn't realise the two-way communication piece. Ditto re flakey specs, your reasoning is good, thanks for spelling it out for me :)

myronmarston · 2015-04-08T03:30:33Z

@xaviershay thanks for the thorough review. Before merging this, I was wondering if you had any ideas to address this concern I mentioned above?

Some of the bisect specs (particularly those in spec/integration/bisect_spec.rb) are quite slow -- notice the results of bin/rspec --profile:

Top 10 slowest examples (7.42 seconds, 57.0% of total time):
  Bisect finds the minimum rerun command and exits
    2.31 seconds ./spec/integration/bisect_spec.rb:19
  Bisect supports a verbose mode via `DEBUG_RSPEC_BISECT` so we can get detailed log output from users when they report bugs
    2.06 seconds ./spec/integration/bisect_spec.rb:39
  Bisect when the user aborts the bisect with ctrl-c prints the most minimal repro command it has found so far
    1.14 seconds ./spec/integration/bisect_spec.rb:128
  Bisect when the spec ordering is inconsistent stops bisecting and surfaces the problem to the user
    0.61015 seconds ./spec/integration/bisect_spec.rb:99
  RSpec::Core::Bisect::Runner#run ensures environment variables are propagated to the spawned process
    0.32271 seconds ./spec/rspec/core/bisect/runner_spec.rb:29
  RSpec behaves like library wide checks has no malformed whitespace
    0.28569 seconds
  RSpec::Core::Ordering::Random.order has a good distribution
    0.2568 seconds ./spec/rspec/core/ordering_spec.rb:38
  Bisect when a load-time problem occurs while running the suite surfaces the stdout and stderr output to the user
    0.19322 seconds ./spec/integration/bisect_spec.rb:92
  RSpec loads mocks and expectations when the constants are referenced
    0.14021 seconds ./spec/rspec/core_spec.rb:234
  RSpec::Core::Formatters::HtmlFormatter produced HTML with mathn loaded is identical to the one we designed manually
    0.10045 seconds ./spec/rspec/core/formatters/html_formatter_spec.rb:105

Notice that the bisect specs dominate the slow list! I'd really like to avoid having slow specs, but I'm not quite sure what to do. A few ideas:

Can we convert some of them to cukes? I'm trying to limit the cukes to just scenarios that have real documentation value. Generally using aruba from RSpec is faster, anyway.
Are there any of the slow specs I can combine or test in a different way (while still getting the same level of test coverage)?

I wasn't sure that ENV vars would be propagated to spawned process properly, but no logic was required to make that work. I tried backticks and `system` as well and those also do the right thing. Given that, there's no need to keep this slow spec around. Revert "Add spec demonstrating that ENV vars are propagated properly." This reverts commit 45ed50c.

JonRowe · 2015-04-09T00:47:38Z

spec/integration/bisect_spec.rb

+      normalize_durations(formatter_output.string)
+    end
+
+    it 'finds the minimum rerun command and exits' do


In respect to your last comment, this could be a cucumber scenario (as it'd be useful documentation of the output).

- This surfaces the output to the user so they know what to expect based on reading the relish docs. - This greatly reduces the runtime of our spec suite, from ~11.8 seconds to ~6.9 seconds.

myronmarston · 2015-04-09T05:14:45Z

OK, I updated the specs/cukes based on @JonRowe's suggestions (which were good). I also decided to switch from DEBUG_RSPEC_BISECT to supporting --bisect=verbose, which seemed to make sense once we were going to document that in the cuke.

xaviershay · 2015-04-09T06:44:31Z

LGTM

Bisect

tmertens · 2015-07-17T16:00:19Z

@myronmarston Thanks for this! I just upgraded to 3.3 and feel like a kid in a candy shop right now 🍬 💯

myronmarston · 2015-07-17T16:46:16Z

@tmertens -- you're welcome!

parrish · 2015-07-23T16:29:53Z

@myronmarston This is awesome! Thanks!

myronmarston force-pushed the bisect branch 5 times, most recently from eff255b to 81deaa4 Compare March 25, 2015 17:18

myronmarston force-pushed the bisect branch 7 times, most recently from 34b672c to 788036e Compare March 27, 2015 21:31

JonRowe reviewed Mar 30, 2015
View reviewed changes

myronmarston force-pushed the bisect branch 4 times, most recently from b04b4e9 to f91d4a1 Compare April 1, 2015 06:49

myronmarston changed the title ~~[WIP] Bisect~~ Bisect Apr 1, 2015

myronmarston mentioned this pull request Apr 1, 2015

Feature idea: new --bisect flag #1767

Closed

myronmarston force-pushed the bisect branch from 3877923 to 858f17c Compare April 1, 2015 19:54

myronmarston force-pushed the bisect branch from 858f17c to 41378e1 Compare April 3, 2015 15:17

JonRowe reviewed Apr 5, 2015
View reviewed changes

myronmarston force-pushed the bisect branch from 41378e1 to 46e7f33 Compare April 5, 2015 19:03

Remove unused instance variable.

d48d07f

Fix oddly worded sentences.

9e90efe

myronmarston added 2 commits April 7, 2015 10:04

Prefer localhost over 127.0.0.1.

20c9709

127.0.0.1 is ipv4, whereas localhost should work for ipv4 and ipv6.

Change how we whitelist open3.

f5483c4

This allows it to be loaded by the bisect runner but if it’s loaded by other things it’ll notify by causing the “minimizes stdlibs” spec to fail.

JonRowe reviewed Apr 9, 2015
View reviewed changes

Move bisect integration tests into cucumber scenarios.

dc07521

- This surfaces the output to the user so they know what to expect based on reading the relish docs. - This greatly reduces the runtime of our spec suite, from ~11.8 seconds to ~6.9 seconds.

myronmarston added 2 commits April 8, 2015 22:18

Update test to ensure bisect coordinator closes stream.

1f11351

Use the --bisect flag value rather than ENV var for verbose mode.

2666401

myronmarston force-pushed the bisect branch from 543d408 to 2666401 Compare April 9, 2015 05:19

myronmarston added a commit that referenced this pull request Apr 9, 2015

Merge pull request #1917 from rspec/bisect

978f0b7

Bisect

myronmarston merged commit 978f0b7 into master Apr 9, 2015

myronmarston deleted the bisect branch April 9, 2015 06:45

rspec locked and limited conversation to collaborators Jul 23, 2015

Uh oh!

Bisect #1917

Bisect #1917

Uh oh!

Conversation

myronmarston commented Mar 24, 2015

Uh oh!

myronmarston commented Mar 25, 2015

Uh oh!

JonRowe Mar 30, 2015

Choose a reason for hiding this comment

Uh oh!

myronmarston Mar 30, 2015

Choose a reason for hiding this comment

Uh oh!

JonRowe Mar 30, 2015

Choose a reason for hiding this comment

Uh oh!

myronmarston commented Mar 30, 2015

Uh oh!

myronmarston commented Apr 1, 2015

Uh oh!

myronmarston commented Apr 2, 2015

Uh oh!

JonRowe Apr 5, 2015

Choose a reason for hiding this comment

Uh oh!

JonRowe commented Apr 7, 2015

Uh oh!

myronmarston commented Apr 7, 2015

Uh oh!

myronmarston commented Apr 7, 2015

Uh oh!

myronmarston commented Apr 7, 2015

Uh oh!

myronmarston commented Apr 7, 2015

Uh oh!

myronmarston commented Apr 7, 2015

Uh oh!

JonRowe commented Apr 7, 2015

Uh oh!

xaviershay commented Apr 8, 2015

Uh oh!

myronmarston commented Apr 8, 2015

Uh oh!

JonRowe Apr 9, 2015

Choose a reason for hiding this comment

Uh oh!

myronmarston commented Apr 9, 2015

Uh oh!

xaviershay commented Apr 9, 2015

Uh oh!

tmertens commented Jul 17, 2015

Uh oh!

myronmarston commented Jul 17, 2015

Uh oh!

parrish commented Jul 23, 2015

Uh oh!

Uh oh!