Add installer detection of gfortran/mpich mismatch #246
Comments
There may be some cases when this isn't a problem, at least once we've decoupled the need to explicitly pass in … One big issue stems from the … Furthermore, the only hard dependency of the actual library is on the C MPI interface: if my understanding is correct, we depend on the Fortran MPI interface only for the OpenCoarrays wrapper module and perhaps some tests or system introspection. I think the best option for detecting this would be system introspection at configure/CMake time: we can try to build and run an MPI hello world Fortran program. If we're using a recent GCC, then we could even consider skipping the OpenCoarrays wrapper module and any tests in which …
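A minimal sketch of the configure-time introspection suggested above, written in Python purely for illustration (the real installer uses shell and CMake, so this is not its code). It assumes the MPI compiler wrapper is `mpif90` and the launcher is `mpiexec`; the names `probe_mpi_hello`, `HELLO_F90`, and the injectable `run` callable are hypothetical. The probe compiles a `use mpi` hello world, launches it on two ranks, and fails if either step fails or if any rank is missing or reports the wrong communicator size.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# Fortran MPI hello world: every rank prints its rank and the world size.
HELLO_F90 = """\
program hello
  use mpi
  implicit none
  integer :: ierr, rank, nranks
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)
  print '(a,i0,a,i0)', 'hello from rank ', rank, ' of ', nranks
  call MPI_Finalize(ierr)
end program hello
"""

# A runner takes an argv list and returns (exit code, combined output).
# It is injectable so the logic can be exercised without MPI installed.
Runner = Callable[[List[str]], Tuple[int, str]]


def subprocess_runner(argv: List[str]) -> Tuple[int, str]:
    """Run a command for real, capturing stdout and stderr together."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True)
    except FileNotFoundError as exc:
        return 127, str(exc)  # wrapper or launcher not on PATH
    return proc.returncode, proc.stdout


def probe_mpi_hello(run: Runner = subprocess_runner,
                    fc: str = "mpif90", launcher: str = "mpiexec",
                    nranks: int = 2) -> Tuple[bool, str]:
    """Compile and launch an MPI hello world; return (ok, diagnostic)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "hello.f90"
        exe = Path(tmp) / "hello"
        src.write_text(HELLO_F90)
        code, out = run([fc, str(src), "-o", str(exe)])
        if code != 0:
            # e.g. gfortran refusing an mpi.mod written by another version
            return False, "compile failed:\n" + out
        code, out = run([launcher, "-n", str(nranks), str(exe)])
        if code != 0:
            return False, "launch failed:\n" + out
        # Every rank must appear and see the full world size; output such
        # as "rank 0 of 1" from each process often means the launcher and
        # the linked library belong to different MPI installations.
        for r in range(nranks):
            if f"hello from rank {r} of {nranks}" not in out:
                return False, f"rank {r} missing or wrong size:\n" + out
        return True, "ok"
```

On failure, an installer could print the diagnostic and stop, rather than producing an installation that only breaks at run time.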
@zbeekman Thanks for writing this thorough and thoughtful response. It helped me remember the details of the failure mode I've seen. On several occasions, both in my own work and in supporting collaborators, I've seen the OpenCoarrays installation proceed without error or warning, and the resulting installation builds executable programs without warning, but even the simplest of these programs fails to launch MPI correctly. Your suggestion to build and run a simple MPI hello world program is spot on. In such situations, we could either terminate the build with an error message or attempt to recover by finding a working MPI installation and offering the user the option to build with it. This also reminds me, however, that the failure mode is sometimes even more subtle. I think it might even be the cause of the behavior that @vehre just encountered. He downloaded the virtual machine and built OpenCoarrays with the system-installed MPI. The system MPI was built with an older version of gfortran, but in his set-up, … @vehre, please let us know whether your new build confirms my suspicion.
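The recovery option described above could look roughly like this: try each candidate MPI installation in turn, keep the first one whose smoke test passes, ask the user before accepting anything other than the first choice, and otherwise stop with an error listing every failure. This is a hypothetical Python sketch; `select_mpi`, `NoWorkingMPI`, the candidate prefixes, and the `probe` and `confirm` callables are assumptions, not part of the OpenCoarrays installer.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# probe(prefix) -> (ok, diagnostic); in practice it might compile and run
# an MPI hello world with <prefix>/bin/mpif90 and <prefix>/bin/mpiexec.
Probe = Callable[[str], Tuple[bool, str]]


class NoWorkingMPI(RuntimeError):
    """Raised when no candidate MPI installation passes the probe."""


def select_mpi(candidates: Iterable[str], probe: Probe,
               confirm: Optional[Callable[[str], bool]] = None) -> str:
    """Return the first candidate prefix that passes the probe.

    If `confirm` is given, the user must approve any working fallback
    (i.e. any candidate other than the first one tried).
    """
    failures: List[str] = []
    for i, prefix in enumerate(candidates):
        ok, diag = probe(prefix)
        if not ok:
            failures.append(f"{prefix}: {diag}")
            continue
        if i > 0 and confirm is not None and not confirm(prefix):
            failures.append(f"{prefix}: works, but declined by user")
            continue
        return prefix
    # Terminate with an error that explains every rejected candidate.
    raise NoWorkingMPI("no usable MPI installation found:\n  "
                       + "\n  ".join(failures))
```

Putting the system MPI first in `candidates` keeps the default behavior unchanged when it works, and makes any switch to another installation an explicit, user-approved decision.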
If this is true, that is a very insidious error. I wonder if there is a way to determine which GCC built MPICH. If not, then there's no real guarantee that we can catch and prevent this issue. Continuing to use the …
Hi all, my previous post did not show up here, so I am inlining it now. I did the following test cycle:
e) module load mpich/3.2
So nothing helped. It seems to be some other issue, one that does not depend on the … The arrays in the test case are all plain C-style arrays, i.e., without an array … Regards,
Well, my last comment above is incorrect. I was looking at a scalar that was transferred, which obviously has no array bounds attached to it.
Correct: with dfe2ec0, all tests pass on bare metal and on the virtual machine. You still need to bisect to find which commit is the troublemaker; this commit is only the one that I know is working.
Yes, I'm about to start running git bisect. Which GCC trunk are you currently on? For me, I always get a test failure, but the odd part is that when I run from the latest master with GCC 7, I get test #8 …
On bare metal I am on trunk from noon today; on the virtual machine, trunk is from yesterday at noon, so both are quite recent. I don't see test #9 failing on the VM with the dfe2ec0 commit. The VM is using MPICH 3.3a1. Sure, give me a call.
@vehre, have we determined the cause of the test 8 and/or test 9 failures, or are your comments still pertinent?
Well, I am convinced I know the causes of the failures: sameloc.f90 needs an improved gfortran compiler, as available in vehre/coarray on github.gcc.
@vehre, great, thanks so much for clarifying this for me.
So, @rouson, I've done some more research, and AFAICT there is no good way to determine which compiler built the MPI library. Perhaps on some Linux systems you can use readelf, and perhaps on some systems that don't use … I think the closest we can come to detecting these sorts of issues is system introspection: compiling and running a few MPI hello world type examples. This will at least tell us if there is an incompatible or missing …
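Running several hello world variants (for example `use mpi`, `use mpi_f08`, and `include 'mpif.h'`) and inspecting the compiler output could separate a missing Fortran MPI interface from an incompatible one. The Python sketch below is illustrative only: `classify` and `summarize` are hypothetical names, and the matched message fragments are assumptions based on gfortran's module-file diagnostics, whose exact wording varies between GCC releases.

```python
import re
from typing import Dict, Tuple

# Assumed gfortran diagnostic fragments (wording differs across releases):
#   "... because it was created by a different version of GNU Fortran"
#   "Can't open module file ..." / "Cannot open module file ..."
_INCOMPATIBLE = re.compile(r"created by a different version", re.I)
_MISSING = re.compile(r"can(?:no|')t open module file", re.I)


def classify(compile_rc: int, compile_out: str, run_rc: int = 0) -> str:
    """Map one probe's compile and run results to a short status."""
    if compile_rc != 0:
        if _INCOMPATIBLE.search(compile_out):
            return "incompatible"   # .mod written by another gfortran
        if _MISSING.search(compile_out):
            return "missing"        # interface not installed or not found
        return "compile error"
    if run_rc != 0:
        return "runtime failure"
    return "ok"


def summarize(results: Dict[str, Tuple[int, str, int]]) -> Dict[str, str]:
    """Classify every probe: {probe name: status}."""
    return {name: classify(*res) for name, res in results.items()}
```

A report like `{"use mpi": "incompatible", "include 'mpif.h'": "ok"}` would suggest the MPI `.mod` files came from a different gfortran while the library itself is still usable.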
Fall back to using `#include 'mpif.h'`
Create interfaces when needed
On branch #246-gfortran-mpich-mismatch-detection; changes committed:
modified: ../CMakeLists.txt
modified: ../src/extensions/opencoarrays.F90
modified: ../src/mpi/CMakeLists.txt
modified: ../src/tests/integration/dist_transpose/coarray_distributed_transpose.F90
modified: ../src/tests/integration/pde_solvers/navier-stokes/coarray-shear_coll.F90
modified: ../src/tests/performance/BurgersMPI/shared.F90
modified: ../src/tests/performance/mpi_dist_transpose/mpi_distributed_transpose.F90
modified: ../src/tests/unit/init_register/CMakeLists.txt
renamed: ../src/tests/unit/init_register/initialize_mpi.f90 -> ../src/tests/unit/init_register/initialize_mpi.F90
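The commit message above falls back to `mpif.h` when the `mpi` module is unusable. One way to drive such a fallback from configure-time probes is sketched below in Python; `mpi_interface_flags`, `SNIPPET_F90`, and the macro `USE_MPIF_H` are hypothetical names, not the ones used in OpenCoarrays. The Fortran snippet goes in a `.F90` file so the preprocessor runs, and it respects statement order: `use` comes before `implicit none`, while the included declarations come after it.

```python
from typing import List, Mapping

# Fortran side: USE_MPIF_H is a hypothetical macro selecting the fallback.
SNIPPET_F90 = """\
program p
#ifndef USE_MPIF_H
  use mpi
#endif
  implicit none
#ifdef USE_MPIF_H
#include "mpif.h"
#endif
  integer :: ierr
  call MPI_Init(ierr)
  call MPI_Finalize(ierr)
end program p
"""


def mpi_interface_flags(probe_ok: Mapping[str, bool]) -> List[str]:
    """Prefer `use mpi`; fall back to mpif.h; otherwise fail loudly.

    `probe_ok` maps "use mpi" and "include 'mpif.h'" to whether the
    corresponding hello world probe compiled and ran.
    """
    if probe_ok.get("use mpi", False):
        return []                   # module works: no extra macro needed
    if probe_ok.get("include 'mpif.h'", False):
        return ["-DUSE_MPIF_H"]     # select the mpif.h branch above
    raise RuntimeError("neither `use mpi` nor mpif.h works: the MPI "
                       "installation is missing or incompatible with "
                       "this gfortran")
```

Preferring the module keeps compile-time argument checking where it is available, while the header path avoids depending on `.mod` files written by another compiler version.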
On branch #246-gfortran-mpich-mismatch-detection; changes committed:
modified: src/tests/integration/dist_transpose/coarray_distributed_transpose.F90
On branch #246-gfortran-mpich-mismatch-detection; changes committed:
modified: src/tests/integration/dist_transpose/coarray_distributed_transpose.F90
modified: src/tests/integration/pde_solvers/navier-stokes/coarray-shear_coll.F90
modified: src/tests/performance/mpi_dist_transpose/mpi_distributed_transpose.F90
I'm going to close this when we merge #258 unless anyone objects; I don't really see a way to do anything beyond this.
…smatch-detection Handle MPI and Fortran mod file compatibility more robustly - Fixes #246
A common installation failure mode arises when OpenCoarrays is built with an MPICH installation that was itself built by a different gfortran version than the one currently invoked via `mpif90`. Detecting this failure mode in the OpenCoarrays installer has long been on my To Do list; this issue is a reminder to work on it. Detecting and preventing such subtle, hard-to-diagnose problems is one of the main aims of the OpenCoarrays installer.