[Testing] Formalize stress tests #15211

davezarzycki · 2018-03-13T18:15:45Z

Stress tests are, by definition, stressful. They intentionally burn a lot of resources by using randomness to hopefully surface state machine bugs. Additionally, many stress tests are multi-threaded these days and they may attempt to use all of the available CPUs to better uncover bugs. In isolation, this is not a problem, but the test suite as a whole assumes that individual tests are single threaded and therefore running multiple stress tests at once can quickly spiral out of control.

This change formalizes stress tests and then treats them like long tests, i.e. tested via 'check-swift-all' and otherwise opt-in.

Finally, with this change, the CI build bots might need to change if they are still only testing 'validation' instead of all of the tests. I see three options:

Run all of the tests. -- There are very few long tests left these days, and the additional costs seems small relative to the cost of the whole validation test suite before this change.
Continue checking 'validation', now sans stress tests.
Check 'validation', then the stress tests. If the former doesn't pass, then there is no point in the latter, and by running the stress tests separately, they stand a better chance of uncovering bugs and not overwhelming build bot resources. And by running the stress tests second, 'lit' can potentially be configured to run the stress tests serially, thus giving each stress test its best chance at uncovering bugs.

davezarzycki · 2018-03-13T18:15:59Z

@swift-ci please test

swift-ci · 2018-03-13T18:35:36Z

Build failed
Swift Test Linux Platform
Git Sha - 7e00f96f40d02dfff4e2b0dec3c9c10a1bbc7a83

davezarzycki · 2018-03-13T18:39:29Z

@swift-ci please test

davezarzycki · 2018-03-13T18:39:52Z

@swift-ci please test linux

swift-ci · 2018-03-13T18:47:36Z

Build failed
Swift Test Linux Platform
Git Sha - 7e00f96f40d02dfff4e2b0dec3c9c10a1bbc7a83

davezarzycki · 2018-03-13T19:15:00Z

@swift-ci please smoke test linux

swift-ci · 2018-03-13T20:22:49Z

Build failed
Swift Test OS X Platform
Git Sha - 7e00f96f40d02dfff4e2b0dec3c9c10a1bbc7a83

davezarzycki · 2018-03-13T20:26:11Z

@swift-ci please smoke test osx

davezarzycki · 2018-03-13T20:27:03Z

@swift-ci please smoke test macos

davezarzycki · 2018-03-14T11:43:11Z

Hi @gottesmm, @shahmishal, and @jrose-apple – Is there anybody else at Apple that should review this?

jrose-apple · 2018-03-14T15:55:16Z

I'm not sure how I feel about adding another category of tests, rather than just using long_tests again, but I can see the point of differentiating these (maybe you want to run them with a lower number of CPUs, to deal with the higher memory usage or the threading). I personally think it makes sense to say that long tests and stress tests all belong under validation-test/, though. What do you think?

davezarzycki · 2018-03-14T16:14:32Z

Personally, I feel that long_test and the primary versus validation differentiation are not clearly defined, and even if they are clearly defined, actual test writers forget (at best) or don't care (at worst) about the differences over time.

If it were up to me, I'd ditch long_test as a REQUIRES flag, and then I'd have three test directories: tests, nonlinear-tests, and stress-tests.

tests – quick functional/unit tests. In other words, a good signal-to-cost ratio.
nonlinear-tests – combinatorial, permutation, exhaustive, and misc "long" tests. In other words, a low signal-to-cost ratio.
stress-tests – Unreliable failure or success scenarios. Resource intensive.

jrose-apple · 2018-03-14T17:28:21Z

validation-test is for "low signal-to-cost ratio" already—even though they're fast, we don't expect compiler_crashers to provide much information 90% of the time. long_test is "maybe it has signal but it's really slow". The fact that we differentiate one of those by directory and the other by REQUIRES is perhaps questionable, but that's a separate issue.

davezarzycki · 2018-03-14T17:49:25Z

If you want long tests and stress tests under validation-test, that sounds fine.

Personally speaking, 99.99% of the validation suite is fast enough that I run it all of the time and then Ctrl-C at the end when the stress tests are fighting with each other and the latent unmarked long_test tests are still running. If I don't hit Ctrl-C, then the validation suite takes about six times longer. That's why I modeled stress_test after long_test which is already excluded from validation.

davezarzycki · 2018-03-14T18:11:41Z

Hey @jrose-apple – I've consolidated the stress/long tests into validation-test as you requested. I did not mark the latent long_test tests as such, because I didn't want to conflate this PR with the question/debate of "what counts as long?" and because long_test is conflated with no build bot testing as far as I know.

Should anybody else sign off on this PR before it is merged?

@swift-ci please smoke test

jrose-apple · 2018-03-14T18:19:49Z

Ah, don't forget to move the inputs to the tests too!

long_tests are only run on specific bots, but it's not none. (They aren't included in PR testing unless you use test-with-preset, though.)

I'd like all three of the other people to get a chance to look at this before you merge, sorry.

davezarzycki · 2018-03-14T18:46:17Z

Nothing to be sorry for, this is a nontrivial testing change and it needs signing off. :-)

I moved the test inputs that I forgot. This just happened to work on my machine because I don't have the sanitizers setup and the command-line stress tests already had a copy of the inputs in validation-test.

gottesmm · 2018-03-14T19:17:56Z

@jrose-apple I need to do after-commit review of this. I need to focus on +0 right now... sorry = (.

davezarzycki · 2018-03-14T20:22:18Z

@swift-ci please smoke test

davezarzycki · 2018-03-20T15:57:27Z

Hi @jrose-apple – I posted to the compiler forum over the weekend in an effort to give people a heads up. Do think that if nobody responds by Friday, we can merge this? We can always revert this if people complain.

jrose-apple · 2018-03-20T16:02:43Z

Someone else should look at it; I'd be fine with either @Rostepher or @shahmishal signing off.

davezarzycki · 2018-03-20T16:10:50Z

@jrose-apple – Is there any specific question or concern that you have about this change? Or do you want review out of an abundance of caution?

For whatever it may be worth, this change is basically just copy-and-pasted from the long_test logic, so any feedback to this change would likely apply to long_test too.

jrose-apple · 2018-03-20T17:14:44Z

It's not about the implementation at all. It's just making sure that someone else besides you and me knows that it's happening and what it means for the buildbots.

gottesmm · 2018-03-20T20:01:03Z

@davezarzycki @jrose-apple I landed +0. I can spend some time reviewing this now.

gottesmm · 2018-03-20T20:24:20Z

@davezarzycki Just from thinking about this in the abstract, I think that this is fine as long as we make sure that the long test bots (e.g.: https://ci.swift.org/job/oss-swift-incremental-RA-linux-ubuntu-14_04-long-test/) also run the stress tests. As long as that is true, I think this is fine.

davezarzycki · 2018-03-20T20:35:13Z

Hi @gottesmm – Right. That's the idea. The least among of work would be for the long test bot to run the check-swift-all target (especially given that only_long doesn't work as advertised).

(Someday, the bots should be changed to check the stress tests serially, but that isn't urgent right now.)

How will I know when the bots have been updated?

gottesmm · 2018-03-20T20:41:40Z

@davezarzycki I would just update all of the long test presets to invoke the right flag for stress testing as well. See ./utils/build-presets.ini. I may need to update a few things internally, but I do not think that this will /break/ anything (we just won't run all of the tests that we need to). But I can fix/take care of that after this is merged.

davezarzycki · 2018-03-20T20:47:52Z

Right. The stress tests will no longer be run on the build bots after this change. If you're willing to update the build bots, that's great! Thanks! If you could mark this as reviewed, and maybe nudge @Rostepher or @shahmishal to review this as well, I'd be happy to merge this. Thanks again!

shahmishal · 2018-03-20T21:08:54Z

@davezarzycki Can you update the following preset with the new stress test flag? otherwise, it looks good.

buildbot_incremental_linux,long_test
buildbot,tools=RA,stdlib=RDA,long_test

davezarzycki · 2018-03-20T21:19:32Z

Please let me know if I updated utils/build-presets.ini correctly. Thanks

davezarzycki · 2018-03-20T21:20:09Z

@swift-ci please smoke test

shahmishal · 2018-03-20T21:28:03Z

You will need to add support for stress test flag in build-script, after that you want to add the flag in presets. I would look at how long-test flag was added in the build-script.

davezarzycki · 2018-03-20T21:53:18Z

@swift-ci please smoke test

jrose-apple · 2018-03-20T22:06:30Z

I think the point was to keep the preset names the same but change the flags they pass.

davezarzycki · 2018-03-20T22:20:49Z

@jrose-apple – I don't think I fully appreciate the build-presets.ini file and its interaction with the build-script. Can you be more specific about what you want changed?

jrose-apple · 2018-03-20T22:36:35Z

You can check out build-script --help for more information, but they're basically just predefined sets of arguments. Nearly all of the buildbots then run build-script --preset=xyz. That makes it possible to reproduce a buildbot build locally.

gottesmm · 2018-03-20T22:39:32Z

@davezarzycki You can see how presents translate to arguments by using the flag --expand-build-script-invocation. For instance:

cartan:solon gottesmm$ ./swift/utils/build-script --expand-build-script-invocation --preset=buildbot_incremental,tools=DA,stdlib=DA,build
./swift/utils/build-script: note: using preset "buildbot_incremental,tools=DA,stdlib=DA,build", which expands to 

./swift/utils/build-script --build-subdir=buildbot_incremental --compiler-vendor=apple --ios --tvos --watchos --reconfigure --verbose-build --build-ninja --build-swift-stdlib-unittest-extra --skip-build-benchmarks --release-debuginfo --assertions --debug-llvm --debug-swift --swift-stdlib-build-type=RelWithDebInfo --swift-stdlib-enable-assertions=true

Stress tests are, by definition, stressful. They intentionally burn a lot of resources by using randomness to hopefully surface state machine bugs. Additionally, many stress tests are multi-threaded these days and they may attempt to use all of the available CPUs to better uncover bugs. In isolation, this is not a problem, but the test suite as a whole assumes that individual tests are single threaded and therefore running multiple stress tests at once can quickly spiral out of control. This change formalizes stress tests and then treats them like long tests, i.e. tested via 'check-swift-all' and otherwise opt-in. Finally, with this change, the CI build bots might need to change if they are still only testing 'validation' instead of all of the tests. I see three options: 1) Run all of the tests. -- There are very few long tests left these days, and the additional costs seems small relative to the cost of the whole validation test suite before this change. 2) Continue checking 'validation', now sans stress tests. 3) Check 'validation', *then* the stress tests. If the former doesn't pass, then there is no point in the latter, and by running the stress tests separately, they stand a better chance of uncovering bugs and not overwhelming build bot resources.

davezarzycki · 2018-03-21T02:27:26Z

I've rebased and dropped the unintentional preset rename. I think this should be good to go now.

@swift-ci please smoke test

shahmishal

LGTM! Thanks!

davezarzycki · 2018-03-21T10:49:28Z

A test is determined to make 79 columns be the new "80 columns".

@swift-ci please smoke test

davezarzycki requested review from gottesmm, jrose-apple and shahmishal March 13, 2018 18:17

jrose-apple requested a review from Rostepher March 14, 2018 15:53

davezarzycki force-pushed the formalize_stress_tests branch from 78760d3 to 466352c Compare March 14, 2018 18:43

davezarzycki changed the title ~~DO NOT MERGE YET -- [Testing] Formalize stress tests~~ WAITING FOR REVIEW – [Testing] Formalize stress tests Mar 14, 2018

davezarzycki added 2 commits March 20, 2018 21:45

[Tests] Consolidate stress/long tests in validation-tests

c00c18f

davezarzycki force-pushed the formalize_stress_tests branch from 7af2f8c to cc7ac38 Compare March 21, 2018 02:25

shahmishal approved these changes Mar 21, 2018

View reviewed changes

[build-script] Track underlying CMake/lit changes

dc22a4d

davezarzycki force-pushed the formalize_stress_tests branch from cc7ac38 to dc22a4d Compare March 21, 2018 10:48

davezarzycki changed the title ~~WAITING FOR REVIEW – [Testing] Formalize stress tests~~ [Testing] Formalize stress tests Mar 21, 2018

davezarzycki merged commit d9f5232 into swiftlang:master Mar 21, 2018

davezarzycki deleted the formalize_stress_tests branch March 21, 2018 12:17

[Testing] Formalize stress tests #15211

[Testing] Formalize stress tests #15211

Uh oh!

Conversation

davezarzycki commented Mar 13, 2018

Uh oh!

davezarzycki commented Mar 13, 2018

Uh oh!

swift-ci commented Mar 13, 2018

Uh oh!

davezarzycki commented Mar 13, 2018

Uh oh!

davezarzycki commented Mar 13, 2018

Uh oh!

swift-ci commented Mar 13, 2018

Uh oh!

davezarzycki commented Mar 13, 2018

Uh oh!

swift-ci commented Mar 13, 2018

Uh oh!

davezarzycki commented Mar 13, 2018

Uh oh!

davezarzycki commented Mar 13, 2018

Uh oh!

davezarzycki commented Mar 14, 2018

Uh oh!

jrose-apple commented Mar 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davezarzycki commented Mar 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jrose-apple commented Mar 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davezarzycki commented Mar 14, 2018

Uh oh!

davezarzycki commented Mar 14, 2018

Uh oh!

jrose-apple commented Mar 14, 2018

Uh oh!

davezarzycki commented Mar 14, 2018

Uh oh!

gottesmm commented Mar 14, 2018

Uh oh!

davezarzycki commented Mar 14, 2018

Uh oh!

davezarzycki commented Mar 20, 2018

Uh oh!

jrose-apple commented Mar 20, 2018

Uh oh!

davezarzycki commented Mar 20, 2018

Uh oh!

jrose-apple commented Mar 20, 2018

Uh oh!

gottesmm commented Mar 20, 2018

Uh oh!

gottesmm commented Mar 20, 2018

Uh oh!

davezarzycki commented Mar 20, 2018

Uh oh!

gottesmm commented Mar 20, 2018

Uh oh!

davezarzycki commented Mar 20, 2018

Uh oh!

shahmishal commented Mar 20, 2018

Uh oh!

davezarzycki commented Mar 20, 2018

Uh oh!

davezarzycki commented Mar 20, 2018

Uh oh!

shahmishal commented Mar 20, 2018

Uh oh!

davezarzycki commented Mar 20, 2018

Uh oh!

jrose-apple commented Mar 20, 2018

Uh oh!

davezarzycki commented Mar 20, 2018

Uh oh!

jrose-apple commented Mar 20, 2018

jrose-apple commented Mar 14, 2018 •

edited

Loading

davezarzycki commented Mar 14, 2018 •

edited

Loading

jrose-apple commented Mar 14, 2018 •

edited

Loading

gottesmm commented Mar 20, 2018 •

edited

Loading