MPICH test 58/72 co_reduce-factorial-int8 fails on Fedora #522
Comments
Hi Susi, thanks for the report! Can you please re-run the tests using either
and then report the results here? This will show us the test output and help us track down the source of the problem. Also, which version of MPICH are you using? There is an MPICH bug that has been fixed, but as far as I know the fix has not yet made it into a release or pre-release. If you disable failed image support by passing Some additional details (suggested in the default new-issue form) would be helpful too, such as the number of physical cores, the versions of GFortran, GCC and MPICH, and any links to build logs. Thanks! |
For anyone interested in testing, the RPM looks like it is available here: https://jussilehtola.fedorapeople.org/OpenCoarrays-2.0.0-1.fc27.src.rpm |
Hi @susilehtola, yes, I've also hit this bug. With my configuration (mpich I've dug into the code but could not find any starting point for this bug; it might be happening at a lower level (in the MPICH implementation). Deserves a bounty! |
After adding
Running on an 8-core machine with
|
I have confirmed that this bug happens intermittently. Here is some more detailed debug output from a recent Travis-CI job. To trigger it, it helps to run the tests multiple times in a row. Relevant output from https://travis-ci.org/sourceryinstitute/OpenCoarrays/jobs/372200706#L1603:
|
I wonder if @vehre's MPICH patch might help/fix this. This is a reminder to myself to see if it needs to be backported to MPICH release branches. |
Nope, the MPICH patch does not address this issue. It only addresses a problem in the parts of MPICH that are needed to support failed images. This test is presumably failing because the datatype used in the reduce (4-byte int) is too large for the target datatype (1-byte int). |
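To illustrate the hypothesis in the comment above, here is a minimal Python sketch (not OpenCoarrays or MPICH code) of what happens when a 4-byte integer is stored where only a 1-byte slot is available. It assumes a little-endian layout and four adjacent 1-byte integers; the `store` helper and the buffer contents are hypothetical and chosen only for illustration.

```python
import struct


def store(buffer: bytearray, offset: int, value: int, fmt: str) -> None:
    """Write `value` into `buffer` at `offset` using a struct format string."""
    struct.pack_into(fmt, buffer, offset, value)


# Four adjacent 1-byte integers; slot 0 plays the role of the reduction target.
correct = bytearray([1, 7, 7, 7])
clobbered = bytearray([1, 7, 7, 7])

# 5! = 120 still fits in a signed 1-byte integer.
store(correct, 0, 120, "<b")    # 1-byte write: only slot 0 changes
store(clobbered, 0, 120, "<i")  # 4-byte write: slots 1..3 are overwritten too

print(list(correct))    # [120, 7, 7, 7]
print(list(clobbered))  # [120, 0, 0, 0]
```

If the reduce really does use a wider datatype than the target, the neighbouring bytes would be overwritten in this way, which could explain an intermittent failure that depends on what happens to sit next to the result.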
Hmmm I see. So _gfortran_caf_get() in caf_this == image_index, size = 1, dst_kind = 1, src_kind = 1 is trying to send and receive 1 byte ints without converting, or converting incorrectly. |
No, that one is of course working as expected. |
OK, I guess there is no added debug output highlighting where the type conversion error is happening. Thanks for responding! |
I've packaged OpenCoarrays for Fedora both using OpenMPI and MPICH, see review request at
https://bugzilla.redhat.com/show_bug.cgi?id=1560874
With OpenMPI all tests run successfully, but with MPICH test 58/72 fails: