Defect: event post hangs using 2 images per node. #411
Comments
Hi, this sounds like a problem related to MPI progress and the fact that events are currently based on MPI atomic operations. What happens if you run the same code on a single node with 4 processes?
In that case I got the expected output. I also get the expected output if I run the same code on 4 nodes. The problem arises only when I distribute the images across 2 nodes (2 per node).
OK, I know what is going on; I'll produce a patch in the next hour. Because of our policy, the patch may take up to 24 hours before hitting the trunk. If you need to fix this issue quickly, I recommend applying the patch yourself to your OpenCoarrays. The patch will be just a few lines long.
I'm just curious: have you tried to run your code with MVAPICH?
@afanfa Alessandro, very responsive fix! I'm happy to merge this soon, but I saw over on the PR you wanted to discuss this further. I'm happy to look at it with you, or we can also try to get Damian online too. @Ambra91 Thanks for the detailed bug report, and for using the template! It helps A LOT when people go the extra mile to help make our lives easier! Should be getting a fix out pretty soon.
Thank you for the quick response and for addressing the issue.
@Ambra91 thanks again for such a wonderful and detailed report! 💯 Please try again using the Thanks again for your contribution!
@Ambra91 Can we use your code for a regression/unit test? FYI we use the Linux Foundation CLA: https://gist.github.com/zbeekman/0a5d60a1cbd1f6a8cfa5
@zbeekman Yes, of course.
Really fix #411 on all mpi-installations reliably.
Defect/Bug Report
Hi. When running the following toy example under certain configurations, the event post statement hangs.
For example, this happens with 4 and 8 images, but only when the images are distributed 2 per node.
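The reporter's original listing did not survive in this copy of the thread. For orientation only, here is a minimal sketch of the kind of program that exercises `event post` and `event wait` across images. It is a hypothetical stand-in, not the code from the report: the program name, the `ready` event variable, and the ring pattern are all assumptions made for illustration.

```fortran
program event_ring_sketch
  ! Hypothetical toy example, NOT the reporter's original code.
  ! Each image posts one event to its right neighbour in a ring,
  ! then waits for the event posted by its left neighbour.
  use, intrinsic :: iso_fortran_env, only: event_type
  implicit none
  type(event_type) :: ready[*]   ! event variables must be coarrays
  integer :: me, np, right

  me = this_image()
  np = num_images()
  right = merge(1, me + 1, me == np)   ! wrap around at the last image

  ! Post to the neighbour's event variable (a remote operation when
  ! the neighbour lives on another node) ...
  event post (ready[right])
  ! ... then block until the local event has been posted once.
  event wait (ready)

  print '(a,i0,a,i0)', 'image ', me, ' got its event after posting to image ', right
end program event_ring_sketch
```

With OpenCoarrays, a program like this would typically be built with the `caf` wrapper and launched with `cafrun -n 4`; the reporter's actual compile line and PBS script are not shown in this copy. A run that completes should print one line per image, while a hang of the kind described would leave one or more images blocked.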
uname -a: Linux yoda 2.6.32-642.11.1.el6.x86_64 #1 SMP Fri Nov 18 19:25:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Observed Behavior
(program hangs)
Expected Behavior
Steps to Reproduce
I am using a PBS script to run the program.
I have compiled the program using:
And I run it using the following script: