Add float32 tests to travis #2264


Merged: 12 commits into pymc-devs:master on Jun 6, 2017

Conversation

kyleabeauchamp
Contributor

No description provided.

@kyleabeauchamp
Contributor Author

The test failures are exactly as expected: the float32 jobs fail, while the float64 jobs succeed.

PS: I made the choice of running py27 + float32 and py3k + float64, under the assumption that it is unlikely that any errors will be correlated with BOTH py version and float precision.

@kyleabeauchamp
Contributor Author

Another possibility would be to separately add 3 more lines to the env matrix so that we have py27+float32, py27+float64, and py36+float64. That would take more CI time, but would let us see whether py27 and py36 are working for float64.

@ColCarroll
Member

Erg -- I like the idea of these tests, but I'm not sure how to get everything to run. I'd guess the tests could run about twice as fast if we looked over them all and figured out what is actually needed.

If I recall correctly, the free tier of Travis has a limit of 50 minutes per job and 4 hours across all jobs. The last successful build of master was 6 jobs of between 19 and 38 minutes, totaling 2:36.

All this is to say that I like your approach here, and wish there were a way to see whether failures were due to python2 vs python3 or not. I wonder if there's a sneaky test fixture we could use that

  • catches theano TypeErrors,
  • reruns with float64
  • fails with the original message along with (and {passes | fails} with float64) appended
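A minimal sketch of that fixture idea, assuming a plain decorator (rather than a pytest fixture) and that the float32 problem surfaces as a TypeError raised from theano; the helper name and message format are illustrative, not part of the PR. Note that flipping theano.config.floatX at runtime only affects graphs compiled afterwards, so this is at best a rough diagnostic:

import functools

import theano


def rerun_with_float64(test_func):
    # Hypothetical helper: catch a theano TypeError raised under float32,
    # rerun the same test body under float64, and re-raise with a note on
    # whether the float64 rerun passed or failed.
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        try:
            return test_func(*args, **kwargs)
        except TypeError as err:
            original_floatX = theano.config.floatX
            theano.config.floatX = 'float64'
            try:
                test_func(*args, **kwargs)
                outcome = 'passes'
            except Exception:
                outcome = 'fails'
            finally:
                theano.config.floatX = original_floatX
            raise AssertionError('%s (and %s with float64)' % (err, outcome))
    return wrapper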

@kyleabeauchamp
Contributor Author

OK, I tried adding the 3 extra lines of tests. This will let us see py27/36 errors orthogonally from float32 errors. Hopefully the wall time is still within the limits.

@twiecki
Member

twiecki commented Jun 3, 2017

I agree with @ColCarroll. We would need to fix all tests before merging this. A different approach would be to fix individual tests and add a fixture for those that already work.

@kyleabeauchamp
Contributor Author

We don't need to fix all tests first. We have 3 separate indicators for the float32 tests, which will allow you to ignore them at will.

The alternative is not testing float32, which will inevitably lead to regressions and duplicated efforts.

@ColCarroll
Member

What about using pytest's xfail decorator? It allows you to mark expected failures given a condition; it won't fail the suite on them, but will flag tests that "unexpectedly pass". As more of the float32 tests start passing, the decorator could be removed to prevent regressions.

@ColCarroll
Member

ColCarroll commented Jun 3, 2017

The unexpected pass seems super useful for figuring out where the decorator can be removed, and you might even add the strict argument so that unexpected passes fail the test suite, forcing the committer to remove the decorator greedily.
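As a rough illustration of that suggestion (the test name and assertion are made up, not taken from the PR), a marker along these lines only applies under float32 and, via strict=True, turns an unexpected pass into a suite failure:

import numpy as np
import pytest
import theano


@pytest.mark.xfail(theano.config.floatX == 'float32',
                   reason='known float32 precision failure',
                   strict=True)
def test_tiny_increment_is_not_lost():
    x = np.asarray(1.0, dtype=theano.config.floatX)
    eps = np.asarray(1e-12, dtype=x.dtype)
    # Holds in float64; in float32 the increment rounds away and the
    # assertion fails, which the marker records as an expected failure.
    assert x + eps != x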

@kyleabeauchamp
Contributor Author

We have several tests that are run using parameterized, which IIRC won't work with xfail decorations.

Also, we're currently almost at the point where we should remove the decorator to prevent regressions. I basically did everything that I could without getting feedback from the owners of individual tests. If float32 support is something that is desired, then exposing the test failures on Travis is a good way to put floatX behavior on everyone's radar.

@twiecki
Member

twiecki commented Jun 4, 2017

@kyleabeauchamp What do you think we should do about the failing tests then? Keep float32 failing for the time being until we address all of them?

@kyleabeauchamp
Contributor Author

I guess I'm proposing that we could keep them failing, with the idea that we then judge new features by success on float64 and by not increasing the number of float32 failures.

@ColCarroll
Member

Hm... I haven't used xfail myself, so I'm not familiar with its difficulties, but this suggests you can mark individual parameters as failing.
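For reference, a hedged sketch of marking a single parametrized case as an expected failure (the parameter values and test body are illustrative, and pytest.param needs a reasonably recent pytest):

import numpy as np
import pytest
import theano


# Only the last case is expected to fail under float32; the other
# parameters must still pass everywhere, and strict=True makes an
# unexpected pass fail the suite.
@pytest.mark.parametrize('scale', [
    1.0,
    1e-10,
    pytest.param(1e-50, marks=pytest.mark.xfail(
        theano.config.floatX == 'float32',
        reason='underflows to zero in float32',
        strict=True)),
])
def test_scale_survives_cast(scale):
    x = np.asarray(scale, dtype=theano.config.floatX)
    assert x > 0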

The problem with merging failing tests is that it becomes way more onerous to see why they're failing, and increases the chance of missing a regression. I think especially when trying to push out a 3.1 release, we need all the automated help we can get.

I'm suggesting having a single commit activate the float32 tests and xfail(strict=True) all the currently failing cases. As those remaining failures get fixed (inadvertently or not), the strict flag will naturally force removing those decorators. I have a pretty good test setup on my machine and can make a PR on your branch with those decorators if it would help?

I just feel like I should also emphasize that I think adding float32 to the test matrix is a great idea for getting float32 support working (and that it would be awesome if everything ran smoothly on the GPU). Maybe that can be a/the focus for 3.2, @twiecki?

@twiecki
Member

twiecki commented Jun 5, 2017

@kyleabeauchamp Merged the previous two PRs, want to rebase?

@kyleabeauchamp
Contributor Author

So the failing tests seem to be running fine on my machine:

 pytest -x pymc3/tests/test_text_backend.py 
=============================================================================================== test session starts ================================================================================================
platform linux -- Python 3.6.0, pytest-3.0.7, py-1.4.32, pluggy-0.4.0
rootdir: pymc3, inifile: setup.cfg
collected 113 items 

pymc3/tests/test_text_backend.py ..X....X....X..XXXXXxXXXXXXXXXXXXXXXXXXXXxXXXXXXXXXXXXXXXXXXXXxXXXXXXXXXXXXXXXXX...XXXxXXXXXXXXXXXXXXxXXXXXXXXXXX

================================================================================ 15 passed, 5 xfailed, 93 xpassed in 20.74 seconds =================================================================================

@kyleabeauchamp
Contributor Author

Never mind, I still see issues with the sqlite backend.

@kyleabeauchamp
Contributor Author

So the previous commit fixes a problem that I only see on Travis, but not on my local setup. :(

@kyleabeauchamp
Contributor Author

OK this seems to be working now.

@@ -430,7 +430,7 @@ def test_bound_normal(self):
         PositiveNormal = Bound(Normal, lower=0.)
         self.pymc3_matches_scipy(PositiveNormal, Rplus, {'mu': Rplus, 'sd': Rplus},
                                  lambda value, mu, sd: sp.norm.logpdf(value, mu, sd),
-                                 decimal=select_by_precision(float64=6, float32=0))
+                                 decimal=select_by_precision(float64=6, float32=-1))
Member

Crazy that float32 matches so little...
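For context on that diff: select_by_precision is the pymc3 test helper that picks the comparison tolerance for the active precision. Its behavior is roughly the following (reconstructed from how it is called above, not copied from the repo), so float32=-1 holds the float32 run to a far looser tolerance than the float64 run:

import theano


def select_by_precision(float64, float32):
    # Return the 'decimal' tolerance matching the active theano precision:
    # the float64 value under float64, otherwise the (looser) float32 value.
    return float64 if theano.config.floatX == 'float64' else float32

Assuming the value ends up as the decimal argument of numpy's assert_almost_equal, decimal=-1 only requires the two log-probabilities to agree to within about 1.5 * 10**1, which is what the comment above is reacting to.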

@twiecki
Member

twiecki commented Jun 6, 2017

@kyleabeauchamp This looks fantastic, thanks for the herculean effort.

twiecki merged commit 21511ad into pymc-devs:master on Jun 6, 2017