Skip to content

[Testing] Formalize stress tests #15211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 21, 2018

Conversation

davezarzycki
Copy link
Contributor

Stress tests are, by definition, stressful. They intentionally burn a lot of resources by using randomness to hopefully surface state machine bugs. Additionally, many stress tests are multi-threaded these days and they may attempt to use all of the available CPUs to better uncover bugs. In isolation, this is not a problem, but the test suite as a whole assumes that individual tests are single threaded and therefore running multiple stress tests at once can quickly spiral out of control.

This change formalizes stress tests and then treats them like long tests, i.e. tested via 'check-swift-all' and otherwise opt-in.

Finally, with this change, the CI build bots might need to change if they are still only testing 'validation' instead of all of the tests. I see three options:

  1. Run all of the tests. -- There are very few long tests left these days, and the additional costs seems small relative to the cost of the whole validation test suite before this change.
  2. Continue checking 'validation', now sans stress tests.
  3. Check 'validation', then the stress tests. If the former doesn't pass, then there is no point in the latter, and by running the stress tests separately, they stand a better chance of uncovering bugs and not overwhelming build bot resources. And by running the stress tests second, 'lit' can potentially be configured to run the stress tests serially, thus giving each stress test its best chance at uncovering bugs.

@davezarzycki
Copy link
Contributor Author

@swift-ci please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 7e00f96f40d02dfff4e2b0dec3c9c10a1bbc7a83

@davezarzycki
Copy link
Contributor Author

@swift-ci please test

@davezarzycki
Copy link
Contributor Author

@swift-ci please test linux

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 7e00f96f40d02dfff4e2b0dec3c9c10a1bbc7a83

@davezarzycki
Copy link
Contributor Author

@swift-ci please smoke test linux

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 7e00f96f40d02dfff4e2b0dec3c9c10a1bbc7a83

@davezarzycki
Copy link
Contributor Author

@swift-ci please smoke test osx

@davezarzycki
Copy link
Contributor Author

@swift-ci please smoke test macos

@davezarzycki
Copy link
Contributor Author

Hi @gottesmm, @shahmishal, and @jrose-apple – Is there anybody else at Apple that should review this?

@jrose-apple jrose-apple requested a review from Rostepher March 14, 2018 15:53
@jrose-apple
Copy link
Contributor

jrose-apple commented Mar 14, 2018

I'm not sure how I feel about adding another category of tests, rather than just using long_tests again, but I can see the point of differentiating these (maybe you want to run them with a lower number of CPUs, to deal with the higher memory usage or the threading). I personally think it makes sense to say that long tests and stress tests all belong under validation-test/, though. What do you think?

@davezarzycki
Copy link
Contributor Author

davezarzycki commented Mar 14, 2018

Personally, I feel that long_test and the primary versus validation differentiation are not clearly defined, and even if they are clearly defined, actual test writers forget (at best) or don't care (at worst) about the differences over time.

If it were up to me, I'd ditch long_test as a REQUIRES flag, and then I'd have three test directories: tests, nonlinear-tests, and stress-tests.

  1. tests – quick functional/unit tests. In other words, a good signal-to-cost ratio.
  2. nonlinear-tests – combinatorial, permutation, exhaustive, and misc "long" tests. In other words, a low signal-to-cost ratio.
  3. stress-tests – Unreliable failure or success scenarios. Resource intensive.

@jrose-apple
Copy link
Contributor

jrose-apple commented Mar 14, 2018

validation-test is for "low signal-to-cost ratio" already—even though they're fast, we don't expect compiler_crashers to provide much information 90% of the time. long_test is "maybe it has signal but it's really slow". The fact that we differentiate one of those by directory and the other by REQUIRES is perhaps questionable, but that's a separate issue.

@davezarzycki
Copy link
Contributor Author

If you want long tests and stress tests under validation-test, that sounds fine.

Personally speaking, 99.99% of the validation suite is fast enough that I run it all of the time and then Ctrl-C at the end when the stress tests are fighting with each other and the latent unmarked long_test tests are still running. If I don't hit Ctrl-C, then the validation suite takes about six times longer. That's why I modeled stress_test after long_test which is already excluded from validation.

@davezarzycki
Copy link
Contributor Author

Hey @jrose-apple – I've consolidated the stress/long tests into validation-test as you requested. I did not mark the latent long_test tests as such, because I didn't want to conflate this PR with the question/debate of "what counts as long?" and because long_test is conflated with no build bot testing as far as I know.

Should anybody else sign off on this PR before it is merged?

@swift-ci please smoke test

@jrose-apple
Copy link
Contributor

Ah, don't forget to move the inputs to the tests too!

long_tests are only run on specific bots, but it's not none. (They aren't included in PR testing unless you use test-with-preset, though.)

I'd like all three of the other people to get a chance to look at this before you merge, sorry.

@davezarzycki davezarzycki force-pushed the formalize_stress_tests branch from 78760d3 to 466352c Compare March 14, 2018 18:43
@davezarzycki
Copy link
Contributor Author

Nothing to be sorry for, this is a nontrivial testing change and it needs signing off. :-)

I moved the test inputs that I forgot. This just happened to work on my machine because I don't have the sanitizers setup and the command-line stress tests already had a copy of the inputs in validation-test.

@gottesmm
Copy link
Contributor

@jrose-apple I need to do after-commit review of this. I need to focus on +0 right now... sorry = (.

@davezarzycki
Copy link
Contributor Author

@swift-ci please smoke test

@davezarzycki davezarzycki changed the title DO NOT MERGE YET -- [Testing] Formalize stress tests WAITING FOR REVIEW – [Testing] Formalize stress tests Mar 14, 2018
@davezarzycki
Copy link
Contributor Author

Hi @jrose-apple – I posted to the compiler forum over the weekend in an effort to give people a heads up. Do think that if nobody responds by Friday, we can merge this? We can always revert this if people complain.

@jrose-apple
Copy link
Contributor

Someone else should look at it; I'd be fine with either @Rostepher or @shahmishal signing off.

@davezarzycki
Copy link
Contributor Author

@jrose-apple – Is there any specific question or concern that you have about this change? Or do you want review out of an abundance of caution?

For whatever it may be worth, this change is basically just copy-and-pasted from the long_test logic, so any feedback to this change would likely apply to long_test too.

@jrose-apple
Copy link
Contributor

It's not about the implementation at all. It's just making sure that someone else besides you and me knows that it's happening and what it means for the buildbots.

@gottesmm
Copy link
Contributor

@davezarzycki @jrose-apple I landed +0. I can spend some time reviewing this now.

@gottesmm
Copy link
Contributor

@davezarzycki Just from thinking about this in the abstract, I think that this is fine as long as we make sure that the long test bots (e.g.: https://ci.swift.org/job/oss-swift-incremental-RA-linux-ubuntu-14_04-long-test/) also run the stress tests. As long as that is true, I think this is fine.

@davezarzycki
Copy link
Contributor Author

Hi @gottesmm – Right. That's the idea. The least among of work would be for the long test bot to run the check-swift-all target (especially given that only_long doesn't work as advertised).

(Someday, the bots should be changed to check the stress tests serially, but that isn't urgent right now.)

How will I know when the bots have been updated?

@gottesmm
Copy link
Contributor

@davezarzycki I would just update all of the long test presets to invoke the right flag for stress testing as well. See ./utils/build-presets.ini. I may need to update a few things internally, but I do not think that this will /break/ anything (we just won't run all of the tests that we need to). But I can fix/take care of that after this is merged.

@davezarzycki
Copy link
Contributor Author

Right. The stress tests will no longer be run on the build bots after this change. If you're willing to update the build bots, that's great! Thanks! If you could mark this as reviewed, and maybe nudge @Rostepher or @shahmishal to review this as well, I'd be happy to merge this. Thanks again!

@shahmishal
Copy link
Member

@davezarzycki Can you update the following preset with the new stress test flag? otherwise, it looks good.

buildbot_incremental_linux,long_test
buildbot,tools=RA,stdlib=RDA,long_test

@davezarzycki
Copy link
Contributor Author

Please let me know if I updated utils/build-presets.ini correctly. Thanks

@davezarzycki
Copy link
Contributor Author

@swift-ci please smoke test

@shahmishal
Copy link
Member

You will need to add support for stress test flag in build-script, after that you want to add the flag in presets. I would look at how long-test flag was added in the build-script.

@davezarzycki
Copy link
Contributor Author

@swift-ci please smoke test

@jrose-apple
Copy link
Contributor

I think the point was to keep the preset names the same but change the flags they pass.

@davezarzycki
Copy link
Contributor Author

@jrose-apple – I don't think I fully appreciate the build-presets.ini file and its interaction with the build-script. Can you be more specific about what you want changed?

@jrose-apple
Copy link
Contributor

You can check out build-script --help for more information, but they're basically just predefined sets of arguments. Nearly all of the buildbots then run build-script --preset=xyz. That makes it possible to reproduce a buildbot build locally.

@gottesmm
Copy link
Contributor

gottesmm commented Mar 20, 2018

@davezarzycki You can see how presents translate to arguments by using the flag --expand-build-script-invocation. For instance:

cartan:solon gottesmm$ ./swift/utils/build-script --expand-build-script-invocation --preset=buildbot_incremental,tools=DA,stdlib=DA,build
./swift/utils/build-script: note: using preset "buildbot_incremental,tools=DA,stdlib=DA,build", which expands to 

./swift/utils/build-script --build-subdir=buildbot_incremental --compiler-vendor=apple --ios --tvos --watchos --reconfigure --verbose-build --build-ninja --build-swift-stdlib-unittest-extra --skip-build-benchmarks --release-debuginfo --assertions --debug-llvm --debug-swift --swift-stdlib-build-type=RelWithDebInfo --swift-stdlib-enable-assertions=true

Stress tests are, by definition, stressful. They intentionally burn a
lot of resources by using randomness to hopefully surface state machine
bugs. Additionally, many stress tests are multi-threaded these days and
they may attempt to use all of the available CPUs to better uncover
bugs. In isolation, this is not a problem, but the test suite as a whole
assumes that individual tests are single threaded and therefore running
multiple stress tests at once can quickly spiral out of control.

This change formalizes stress tests and then treats them like long
tests, i.e. tested via 'check-swift-all' and otherwise opt-in.

Finally, with this change, the CI build bots might need to change if
they are still only testing 'validation' instead of all of the tests.
I see three options:

1) Run all of the tests. -- There are very few long tests left these
   days, and the additional costs seems small relative to the cost of
   the whole validation test suite before this change.
2) Continue checking 'validation', now sans stress tests.
3) Check 'validation', *then* the stress tests. If the former doesn't
   pass, then there is no point in the latter, and by running the stress
   tests separately, they stand a better chance of uncovering bugs and
   not overwhelming build bot resources.
@davezarzycki davezarzycki force-pushed the formalize_stress_tests branch from 7af2f8c to cc7ac38 Compare March 21, 2018 02:25
@davezarzycki
Copy link
Contributor Author

I've rebased and dropped the unintentional preset rename. I think this should be good to go now.

@swift-ci please smoke test

Copy link
Member

@shahmishal shahmishal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@davezarzycki davezarzycki force-pushed the formalize_stress_tests branch from cc7ac38 to dc22a4d Compare March 21, 2018 10:48
@davezarzycki
Copy link
Contributor Author

A test is determined to make 79 columns be the new "80 columns".

@swift-ci please smoke test

@davezarzycki davezarzycki changed the title WAITING FOR REVIEW – [Testing] Formalize stress tests [Testing] Formalize stress tests Mar 21, 2018
@davezarzycki davezarzycki merged commit d9f5232 into swiftlang:master Mar 21, 2018
@davezarzycki davezarzycki deleted the formalize_stress_tests branch March 21, 2018 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants