Skip to content
This repository was archived by the owner on Nov 30, 2024. It is now read-only.

Bisect #1917

Merged
merged 38 commits into from
Apr 9, 2015
Merged

Bisect #1917

merged 38 commits into from
Apr 9, 2015

Conversation

myronmarston
Copy link
Member

This adds a new --bisect flag for #1767.

TODOs:

  • Running a DRb server opens the process up to undesirable remote communication. We should probably use drb/acl to limit access to just localhost.
  • Sometimes users use ENV vars to toggle certain behaviors in their test suite, and we need to make sure that any passed ENV vars get forwarded from the rspec --bisect process to the test suite runs.
  • If there are any puts or warn statements in the test suite, it will print output in the middle of the bisect progress output. We should silence it.
  • If the test suite bails with an error while booting (e.g. due to a NoMethodError in a describe block or whatever), we don't surface that to the user.
  • Bisect only works if the ordering is consistent. If the user is using a custom ordering that is inconsistent (such as using a shuffle in there...), we should detect that and bail with a clear error about the problem.
  • The runner ignores --format progress when running the suite but not -fp. This should be fixed.
  • Ensure things work OK when no examples fail.
  • Ignore additional failures that didn't fail the on the first run.
  • Figure out why the AppVeyor build for 2.1 is failing.
  • When bisection fails (e.g. due to bailing early from a spec suite load time error, or inconsistent ordering or whatever) we should exit with a non-zero status.
  • Handle SIGINT, so that it prints the most minimal repro command it's been able to find so far and then exits -- that way, when bisect is taking along time, if the user gets impatient and hits ctrl-c, they can still benefit from the work we've done up to that point.

@myronmarston myronmarston force-pushed the bisect branch 5 times, most recently from eff255b to 81deaa4 Compare March 25, 2015 17:18
@myronmarston
Copy link
Member Author

@rspec/rspec -- There are a number of TODOs above, but someone could go ahead and start reviewing this anytime. What's already there is pretty stable and what's left is largely additive.

@myronmarston myronmarston force-pushed the bisect branch 7 times, most recently from 34b672c to 788036e Compare March 27, 2015 21:31

def debug(level, msg)
puts "#{' ' * level}Minimizer: #{msg}" if ENV['DEBUG_RSPEC_BISECT']
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended for "release" usage? or just whilst this is WIP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended for "release" usage? or just whilst this is WIP?

Kinda both, actually. It was originally inspired by a similar feature in minitest-bisect:

https://github.com/seattlerb/minitest-bisect/blob/233b4fe69131358efa03da79270f91b2e7da9ea5/lib/minitest/find_minimal_combination.rb#L82-L84

It helped me get it working right. I think it also has value to keep something like this in place, though. Without it, if bisect's not working for a user, there's not much we can do to understand why. Having a verbose/debug mode will allow them to give us a print out that will hopefully allow us to understand where it went wrong. Bundler has something similar -- see rubygems/bundler#2248.

That said, this is a crappy WIP implementation of such debugging output. I'm thinking the debug output should be moved into the formatter (or perhaps be an alternate formatter). And perhaps we shouldn't necessarily use an ENV var to activate it, although I don't necessarily want to add a public API for it either -- it's more just something we can ask users to enable when they report bisect bugs.

Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I think we should at least use notify(:message, "#{' ' * level}Minimizer: #{msg}"), given the formatter has something designed for this sort of usage... I'm not totally opposed to the usage of the ENV variable if it's intended for "semi private" debugging use...

@myronmarston
Copy link
Member Author

OK, @JonRowe, I pushed some improvements to the debug output. Check it out and let me know what you think.

@myronmarston myronmarston force-pushed the bisect branch 4 times, most recently from b04b4e9 to f91d4a1 Compare April 1, 2015 06:49
@myronmarston
Copy link
Member Author

OK, I think I've taken this as far as I'm going to take it. Would be great to get a full review from @rspec/rspec folks. Anyone?

One open question/issue: the some of the bisect specs (particularly those in spec/integration/bisect_spec.rb) are quite slow -- notice the results of bin/rspec --profile:

Top 10 slowest examples (7.42 seconds, 57.0% of total time):
  Bisect finds the minimum rerun command and exits
    2.31 seconds ./spec/integration/bisect_spec.rb:19
  Bisect supports a verbose mode via `DEBUG_RSPEC_BISECT` so we can get detailed log output from users when they report bugs
    2.06 seconds ./spec/integration/bisect_spec.rb:39
  Bisect when the user aborts the bisect with ctrl-c prints the most minimal repro command it has found so far
    1.14 seconds ./spec/integration/bisect_spec.rb:128
  Bisect when the spec ordering is inconsistent stops bisecting and surfaces the problem to the user
    0.61015 seconds ./spec/integration/bisect_spec.rb:99
  RSpec::Core::Bisect::Runner#run ensures environment variables are propagated to the spawned process
    0.32271 seconds ./spec/rspec/core/bisect/runner_spec.rb:29
  RSpec behaves like library wide checks has no malformed whitespace
    0.28569 seconds
  RSpec::Core::Ordering::Random.order has a good distribution
    0.2568 seconds ./spec/rspec/core/ordering_spec.rb:38
  Bisect when a load-time problem occurs while running the suite surfaces the stdout and stderr output to the user
    0.19322 seconds ./spec/integration/bisect_spec.rb:92
  RSpec loads mocks and expectations when the constants are referenced
    0.14021 seconds ./spec/rspec/core_spec.rb:234
  RSpec::Core::Formatters::HtmlFormatter produced HTML with mathn loaded is identical to the one we designed manually
    0.10045 seconds ./spec/rspec/core/formatters/html_formatter_spec.rb:105

Notice that the bisect specs dominate the slow list! I'd really like to avoid having slow specs, but I'm not quite sure what to do. A few ideas:

  • Can we convert some of them to cukes? I'm trying to limit the cukes to just scenarios that have real documentation value. Generally using aruba from RSpec is faster, anyway.
  • Are there any of the slow specs I can combine or test in a different way (while still getting the same level of test coverage)?

@myronmarston myronmarston changed the title [WIP] Bisect Bisect Apr 1, 2015
@myronmarston
Copy link
Member Author

...and the build is finally green. (That wasn't easy).

end

slice_size /= 2
combo_count *= 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oddly I'd never considered that this would work, nice!

@JonRowe
Copy link
Member

JonRowe commented Apr 7, 2015

What benefit does DRB get us? Without having tried it, shelling out and capturing output of a JSON formatter (or something) seems to be simple and keeps networking out of the picture.

Actually I wonder if that would be better... It'd also involve less of the std lib... (we could even use a custom text output rather than a standard formatter to control communication)

@myronmarston
Copy link
Member Author

(Moving this comment from above because it's hidden since I pushed a fix to the docstring you commented on)

is this correct? Seems intuitive to me that flakey specs are more likely to be poorly written and having global side effects.

Flakey specs are likely to be poorly written and have global side effects. The question is what we should do about them.

The bisect feature is trying to find a command that will cause the same set of failures as you started with. For each spec run, it compares what failed against the original set of failures to see if it was able to get that set of failures again. If so, it means the examples that were not included in the current run can be removed from the set of remaining ids to consider. Originally, we used a simple original_failed_example_ids == this_run_failed_examples_ids check for that. In trying bisect on a real-world project that has a flapping example, I noticed that the flapping example was interfering. When it failed, it meant that the == check failed and it would get "stuck" on that round for a while, having to try more and more combos. I realized that the == check was overly strict -- when bisect runs, we're interested in finding a command that will cause all the failures from the original first run to fail, and any additional failures aren't part of what bisect is looking for and can simply be ignored. So in eb5c5fe I changed it, and now the == check is a less-strict "did all the original failures fail this time, too?" check.

We could perhaps do something else with flapping failures (suggestions?) but if they aren't flapping based on an ordering dependency then it's not what bisect is good at working with anyway. The important thing in my mind is that we don't allow a flapping example or two to interfere with bisect being able to find a minimal repro command to get the same set of failures as the original run.

Thoughts?

@myronmarston
Copy link
Member Author

What benefit does DRB get us? Without having tried it, shelling out and capturing output of a JSON formatter (or something) seems to be simple and keeps networking out of the picture.

  • DRb allows two-way communication. We use that so that that the BisectFormatter can ask the bisect server what the expected failure ids are. The knowledge of what the expected failure ids are is then used to abort an individual run as early as possible, once we've learned everything we can from this run:
    • If an example that is expected to fail passes, we abort immediately; there's no reason to continue since we've already answered the question "do all the original failures fail with this limited subset of examples loaded?".
    • When the last expected failure finishes, we abort immediately, not allowing later examples to run, since we don't care about the results of any later examples.
  • See 32045fd for more details on that change. Being able to abort a particular spec run early can make a big difference in how long --bisect takes.
  • If we dropped DRb, we'd have to find an alternate way to communicate the set of expected failures from the main bisect process to the individual spec run process. DRb just works.
  • Using a JSON formatter to STDOUT (I assume that's where it would go, right?) risks mixing the output that specific to bisecting with other output from the user. For example, if they had a puts in their suite it would get mixed with the JSON output and cause problems.
  • My use of DRb was initially inspired by minitest-bisect, which uses it and has proved it out as a good solution for this kind of thing. Initially I was a bit unsure about using DRb since I have never really worked with it before, but it was incredibly easy to setup, and appears to "just work". I've yet to see a problem with it.

Actually I wonder if that would be better... It'd also involve less of the std lib... (we could even use a custom text output rather than a standard formatter to control communication)

Avoiding the useful bits of the stdlib has never been (and is still not) one of the goals of RSpec. As a courtesy to users we don way the tradeoffs involved in bringing in new stdlib bits, but as explained above, our use drb and open3 for bisect shouldn't be a concern at all -- the bisect logic (and those dependencies) will never be loaded in the same process as the user's code (unless they user is specifically requiring rspec/core/bisect/runner.rb or whatever....).

DRb is working really well for bisect, I think.

@myronmarston
Copy link
Member Author

As a point of comparison, I tried identifying examples in xspec using a short hash:

Interesting. Seems like a good approach for what you were trying to accomplish but clearly wouldn't work here.

127.0.0.1 is ipv4, whereas localhost should work
for ipv4 and ipv6.
This allows it to be loaded by the bisect runner but if
it’s loaded by other things it’ll notify by causing the
“minimizes stdlibs” spec to fail.
@myronmarston
Copy link
Member Author

Thanks for the review feedback, everyone. I think I've responded to all the feedback above (and/or pushed updates to address it).

@myronmarston
Copy link
Member Author

JRuby seems to intermittently fail with "Errno::EBADF: Bad file descriptor - Bad file descriptor"...but I have no idea what we can do about that other than retry the build :(.

@JonRowe
Copy link
Member

JonRowe commented Apr 7, 2015

JRuby seems to intermittently fail with "Errno::EBADF: Bad file descriptor - Bad file descriptor"...but I have no idea what we can do about that other than retry the build :(.

Yeah thats a regular flake

@xaviershay
Copy link
Member

Ship it!

Makes sense re DRb, I didn't realise the two-way communication piece. Ditto re flakey specs, your reasoning is good, thanks for spelling it out for me :)

@myronmarston
Copy link
Member Author

@xaviershay thanks for the thorough review. Before merging this, I was wondering if you had any ideas to address this concern I mentioned above?

Some of the bisect specs (particularly those in spec/integration/bisect_spec.rb) are quite slow -- notice the results of bin/rspec --profile:

Top 10 slowest examples (7.42 seconds, 57.0% of total time):
  Bisect finds the minimum rerun command and exits
    2.31 seconds ./spec/integration/bisect_spec.rb:19
  Bisect supports a verbose mode via `DEBUG_RSPEC_BISECT` so we can get detailed log output from users when they report bugs
    2.06 seconds ./spec/integration/bisect_spec.rb:39
  Bisect when the user aborts the bisect with ctrl-c prints the most minimal repro command it has found so far
    1.14 seconds ./spec/integration/bisect_spec.rb:128
  Bisect when the spec ordering is inconsistent stops bisecting and surfaces the problem to the user
    0.61015 seconds ./spec/integration/bisect_spec.rb:99
  RSpec::Core::Bisect::Runner#run ensures environment variables are propagated to the spawned process
    0.32271 seconds ./spec/rspec/core/bisect/runner_spec.rb:29
  RSpec behaves like library wide checks has no malformed whitespace
    0.28569 seconds
  RSpec::Core::Ordering::Random.order has a good distribution
    0.2568 seconds ./spec/rspec/core/ordering_spec.rb:38
  Bisect when a load-time problem occurs while running the suite surfaces the stdout and stderr output to the user
    0.19322 seconds ./spec/integration/bisect_spec.rb:92
  RSpec loads mocks and expectations when the constants are referenced
    0.14021 seconds ./spec/rspec/core_spec.rb:234
  RSpec::Core::Formatters::HtmlFormatter produced HTML with mathn loaded is identical to the one we designed manually
    0.10045 seconds ./spec/rspec/core/formatters/html_formatter_spec.rb:105

Notice that the bisect specs dominate the slow list! I'd really like to avoid having slow specs, but I'm not quite sure what to do. A few ideas:

  • Can we convert some of them to cukes? I'm trying to limit the cukes to just scenarios that have real documentation value. Generally using aruba from RSpec is faster, anyway.
  • Are there any of the slow specs I can combine or test in a different way (while still getting the same level of test coverage)?

I wasn't sure that ENV vars would be propagated to spawned
process properly, but no logic was required to make that work.
I tried backticks and `system` as well and those also do
the right thing.

Given that, there's no need to keep this slow spec around.

Revert "Add spec demonstrating that ENV vars are propagated properly."

This reverts commit 45ed50c.
normalize_durations(formatter_output.string)
end

it 'finds the minimum rerun command and exits' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In respect to your last comment, this could be a cucumber scenario (as it'd be useful documentation of the output).

- This surfaces the output to the user so they know what
  to expect based on reading the relish docs.
- This greatly reduces the runtime of our spec suite, from
  ~11.8 seconds to ~6.9 seconds.
@myronmarston
Copy link
Member Author

OK, I updated the specs/cukes based on @JonRowe's suggestions (which were good). I also decided to switch from DEBUG_RSPEC_BISECT to supporting --bisect=verbose, which seemed to make sense once we were going to document that in the cuke.

@xaviershay
Copy link
Member

LGTM

myronmarston added a commit that referenced this pull request Apr 9, 2015
@myronmarston myronmarston merged commit 978f0b7 into master Apr 9, 2015
@myronmarston myronmarston deleted the bisect branch April 9, 2015 06:45
@tmertens
Copy link

@myronmarston Thanks for this! I just upgraded to 3.3 and feel like a kid in a candy shop right now 🍬 💯

@myronmarston
Copy link
Member Author

@tmertens -- you're welcome!

@parrish
Copy link

parrish commented Jul 23, 2015

@myronmarston This is awesome! Thanks!

@rspec rspec locked and limited conversation to collaborators Jul 23, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants