Defect: MPI_Type_extent, type of 2nd argument is pointer to MPI_Aint #435
Comments
N.B.: Officially, we do not support Windows, because we don't have good access to Windows machines and are short on manpower and funding. However, we welcome any contributions to improve our unofficial Windows support. I can't remember the details, but I seem to recall that MS MPI may have some issues with MPI-3 RMA. Try searching through our closed issues for "windows" to see if you can find any relevant clues. Also, perhaps @jeffhammond or @afanfa knows more about this. As far as appropriate defines go, for GCC 7.2 (would be the same for 7.1) on macOS, CMake generates the following compilation lines:
Here is a guide to the defines you should and should not be using:
@jbmaggard Is my understanding correct that by applying the patch wherein |
Also, FYI, there is a script to build OpenCoarrays using WSL at https://github.com/sourceryinstitute/OpenCoarrays/blob/master/windows-install.sh |
Also, this def looks like a bug in OpenCoarrays: https://www.mpich.org/static/docs/v3.2/www3/MPI_Type_extent.html |
@jbmaggard one additional question: Do you anticipate those defines need to always be set on Windows? (i.e. for cygwin or other non mingw windows environments? Or other MPI implementations?) Could they ever cause any harm if set when Windows is detected? |
Based on tales of success by @jbmaggard in issue #435
Fixes #435
Error in the type def of an argument to `MPI_Type_extent`
`long int string_len;` → `MPI_Aint string_len;` on `mpi_caf.c:4140`
[L4140]: https://github.com/sourceryinstitute/OpenCoarrays/blob/f7a5f2ebeaf935a67184a978bc40177e4399b82b/src/mpi/mpi_caf.c#L4140
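To illustrate why the declared type of that out-parameter matters, here is a minimal sketch with no MPI dependency. `aint_t` and `fake_type_extent` are hypothetical stand-ins for `MPI_Aint` and `MPI_Type_extent`, and `aint_t` is assumed to be pointer-sized, which matches the typedefs discussed in this thread but is not taken from any particular `mpi.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MPI_Aint: an integer wide enough to hold an address. */
typedef intptr_t aint_t;

/* Hypothetical stand-in for MPI_Type_extent: it writes sizeof(aint_t) bytes
 * through the pointer it is given. */
static int fake_type_extent(size_t type_size, aint_t *extent)
{
    *extent = (aint_t)type_size;
    return 0;
}

/* Correct pattern: declare the out-parameter with the library's own typedef.
 * Declaring it as `long int` only works where long is pointer-sized (LP64);
 * on LLP64 (Win64) long is 4 bytes and the write above would overrun it. */
static long long string_extent(size_t type_size)
{
    aint_t string_len = 0;
    fake_type_extent(type_size, &string_len);
    return (long long)string_len;
}

/* Returns 1 when `long` is as wide as a pointer (LP64), 0 otherwise (e.g. LLP64). */
static int long_matches_pointer_width(void)
{
    return sizeof(long) == sizeof(void *);
}
```

Calling `string_extent(80)` yields 80 on either data model, whereas `long_matches_pointer_width()` reports whether the old `long int` declaration would have happened to work on the current platform.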
@jbmaggard please let me know if https://github.com/sourceryinstitute/OpenCoarrays/pull/436/files looks like it will fix the issue. |
@zbeekman Affirmative. Using the Intel MPI Runtime, along with an import library to impi.dll allows build and run of all the src/tests examples (Win64 native, not cygwin or WSL). Results are comparable to what I get on linux by cafrun'ing (all of) the executables built by install.sh.
@zbeekman Thanks for the comments on compiler defines. I started by doing a cmake, make clean, make VERBOSE=1 on linux to see exactly what gfortran, gcc, and ar command lines were being used for compile and link.
-D_POSIX is a mingw-w64 issue. You will see the #ifdef if you take a look at signal.h. Specifically, it was needed to have #include <signal.h> in mpi_caf.c define SIGKILL.
From Intel mpi.h, on -DUSE_GCC to typedef MPI_Aint as long long int:
From mpi_caf.c on alloca.h; functionality is apparently provided elsewhere by other includes for mingw-w64:
@zbeekman On the src/mpi/CMakeLists.txt changes you propose, I think it may be a little more complicated than that.
|
@zbeekman I do not consider Windows to be a relevant platform for parallel computing and know very little about how various parallel programming tools behave in that environment. My standard recommendation for using HPC tools on Windows is to install a Debian VM in Virtual Box. My attempts to use WSL in Windows 10 have proven unsuccessful and frustrating. |
@jeffhammond sure, I guess I misremembered you being part of a past discussion. Thanks for the input! @jbmaggard What do you suggest I do RE: PR #436? The change to Am I understanding you correctly that:
As far as |
Unclear if it applies to all mingw (i.e. mingw-w64 & mingw) but this is my best guess. See #435 for further discussion.
I hope you don't mind but I'm uploading your notes on Windows MPI + caf here for future reference. Step3_oca_windows_results.txt I also asked the CMake folks for Intel MPI introspection advice: https://gitlab.kitware.com/cmake/cmake/issues/17189 |
@zbeekman I've tried to keep my comments to facts. That is why my original post was only about something I was certain was an error in mpi_caf.c, based on the mpi.h of MPICH 3.2. My hope in posting was that OpenCoarrays will use this correction as a way to go forward without making the code specific to linux (not LP64 only). Since Intel MPI is at least available on many high performance clusters, and works with gfortran and OpenCoarrays, I thought it might be of interest to the project that the Windows Intel MPI can work with 1.9.1 and gfortran for a native Win64 CAF. I make no claims to expertise on open source for the Win64 platform, but I'll try to help as I can. I've been using mingw-w64 for only a few months, but with very good success.
I plead ignorance as to whether -D_POSIX applies to any native Win64 toolchains other than mingw-w64 (other native Win64 toolchains include TDM, mingw, msys2-mingw-w64). On -D_POSIX, I observed a compile error that SIGKILL was undefined, and looking at signal.h, observed that -D_POSIX would fix that on mingw-w64. The -DUSE_GCC is specifically for the Intel Windows MPI typedef of MPI_Aint in mpi.h, which is why I excerpted a few relevant lines from mpi.h at the part where the typedef of MPI_Aint is made. Looking at mpi.h from Microsoft MPI SDK (8.1) for windows indicates that Microsoft does something similar for the typedef of MPI_Aint, using -D_WIN64.
Microsoft MPI (8.1) defines MPI_VERSION as 2, and my experience indicates that trying to compile mpi_caf.c (1.9.1) with mpi.h from Microsoft MPI results in several missing defines and "implicit procedure" errors. Intel Windows MPI Runtime (2017, Update 3) indicates it is MPI-3.1, and its mpi.h sets MPI_VERSION 3. I do know that with the import library to impi.dll, OpenCoarrays 1.9.1 can be used to build CAF applications. I don't really know what might be a good choice going forward if this project wants to have a cmake build of a native Win64 OpenCoarrays.
I will try to put together some thoughts on this and make a post when I have a few facts.
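On the `SIGKILL` point in the comment above, one possible alternative to passing `-D_POSIX` on the command line is to guard on the macro itself. This is only a sketch under assumptions: the fallback value 9 is the conventional POSIX number for `SIGKILL`, and `caf_kill_signal` is a hypothetical helper name, not part of OpenCoarrays. Whether a native Win64 runtime does anything useful with that signal number is a separate question that this sketch does not address.

```c
#include <assert.h>
#include <signal.h>

/* Assumption: if <signal.h> did not provide SIGKILL (as reported for
 * mingw-w64 without -D_POSIX), fall back to the conventional POSIX value. */
#ifndef SIGKILL
#define SIGKILL 9
#endif

/* Hypothetical helper: the signal number the runtime would use to force
 * termination. It only returns the constant; it does not raise anything. */
static int caf_kill_signal(void)
{
    return SIGKILL;
}
```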
On going forward with a native Win64 "install.sh" type of install, it looks like MSYS2 might work well for the project, as it provides a bash shell, cmake, gnu make, current GCC, etc. It appears that MSYS2/mingw64/GCC-7.2.0 is a native x86_64-posix-seh toolchain built from (or at least very similar to) the mingw-w64 project. I don't know very much about cmake (MSYS2 has 3.9.1 as mingw64/), but I did post my notes on installing MSYS2, building native Win64 libcaf_mpi, and documented testing of natively compiled and linked Win64 CAF programs with the same examples under src/tests as built by install.sh on linux; using Win7-64, updated MSYS2/mingw64/GCC-7.2.0, Intel MPI Runtime 2017, Update 3 (by creating an import library interfacing impi.dll), and OpenCoarrays-1.9.1.zip. Detailed notes are posted at people.tamu.edu/~bmaggard in the hpc/oca folder. On the topic of introspection for cmake, the install of the Intel MPI Runtime sets the I_MPI_ROOT environment variable to the location where .\intel64\bin\impi.dll is installed. If/when Microsoft MPI reaches a point of development compatible with OpenCoarrays, it may be noteworthy that its install sets the MSMPI_BIN environment variable to the location where .\msmpi.dll is located. |
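The environment variables mentioned above could drive a simple lookup. The following is a sketch only: the variable names `I_MPI_ROOT` and `MSMPI_BIN` and the relative DLL locations come from the comment, while the helper names `path_from_env` and `find_mpi_dll`, the Intel-first search order, and the path joining are assumptions made for illustration. It builds a candidate path and does not check that the file exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes "<value of var><suffix>" into out.
 * Returns 0 on success, -1 if the variable is unset/empty or out is too small. */
static int path_from_env(const char *var, const char *suffix,
                         char *out, size_t out_size)
{
    const char *root = getenv(var);
    if (root == NULL || root[0] == '\0')
        return -1;
    int n = snprintf(out, out_size, "%s%s", root, suffix);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Hypothetical helper: try the Intel MPI runtime first, then Microsoft MPI.
 * Returns 0 and fills out with a candidate DLL path, or -1 if neither
 * environment variable is usable. */
static int find_mpi_dll(char *out, size_t out_size)
{
    if (path_from_env("I_MPI_ROOT", "\\intel64\\bin\\impi.dll",
                      out, out_size) == 0)
        return 0;
    return path_from_env("MSMPI_BIN", "\\msmpi.dll", out, out_size);
}
```

A CMake find module could apply the same idea with `$ENV{I_MPI_ROOT}` and `$ENV{MSMPI_BIN}` as search hints; the C version is shown only to keep the examples in one language.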
Apologies for joining this discussion late. I haven't read every post in detail, but I would like to provide context that I hope is helpful. I suggest we support all platforms but add significant caveats regarding non-HPC platforms or uncommon platforms. For example, we could require that addressing issues on such platforms be handled via user contribution of code or funding. I will be the technical lead on a PDE solver project starting soon with the following amongst its requirements and preferences for the application code: Requirements:
Preferences:
Because this list relates to the application, it doesn't constrain the compiler or runtime library. However, it makes coarray Fortran and gfortran/OpenCoarrays very attractive. It therefore makes sense to consider Windows supported whenever made feasible via user contributions of code or funding. As a fallback, if Windows proves infeasible, then the aforementioned project could require the Intel compiler be used on Windows, but that precludes the use of the Fortran 2015 parallel features that gfortran and OpenCoarrays support. |
Fix bug & improve Windows support
- Fixes #435
- Thanks to @jbmaggard for pointing this out in #435
In the first reply above, zbeekman says:
Please comment, especially with regard to GCC 7.2, MPICH 3.2, and Intel MPI 2017, Update 3. Excerpted from mpi.h (Intel MPI 2017, Update 3)
|
Excerpted from your previous comment above:
Could you add a bit of detail on Fortran 2015 parallel features supported by gfortran (version specific please) with OpenCoarrays, that go beyond current ifort capabilities? I've been successful building libcaf_mpi.a on the ada cluster (CentOS 6, GCC 6.4.0, Intel MPI 2017 Update 3, OpenCoarrays 1.9.1), and am considering a presentation to share with HPRC staff. On a side note that might be more appropriate through a different channel, I am highly interested in PDE solution... I'd be fascinated to hear more about the problem being solved, approach, methods, etc.
Intel 17 support for 2015 features seems to be limited to From TS 18508 the following are the major F2015 additional parallel features:
More on the utility of events: For example, if you have a finite-difference (FD) or finite-volume (FV) code for the solution of PDEs with a traditional domain decomposition, each image can use events to determine when to do a halo exchange with its neighbor. The images can use puts, which are non-blocking, to give their neighbors the data they need, and then
This way, each image tries to get out of the way of the other image as quickly as possible, and will perform its own local work while it waits on remote data, if any work is left to do. You can introduce two buffers for each halo exchange region (N, W, S, E, up, down, NW, SW, SE, NE, etc.), similar to a double-buffered read, so that you can put data even if the remote image has yet to consume the current time step's data held in that buffer. Using defined assignment, data dependencies can be handled completely automatically with this scheme.
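The double-buffering idea can be sketched in plain C as a single-process simulation. This is not coarray or OpenCoarrays code: `halo_channel`, `halo_put`, and `halo_try_consume` are hypothetical names, the `ready` flag stands in for an event count, and a plain copy stands in for a remote put.

```c
#include <assert.h>
#include <stddef.h>

#define HALO_LEN 4 /* values per halo region (illustrative size) */
#define NBUF 2     /* two buffers per region, alternating by time step */

/* One halo region as seen by the receiving side. */
typedef struct {
    double data[NBUF][HALO_LEN];
    int ready[NBUF]; /* stand-in for an event: 1 = posted, 0 = consumed/free */
} halo_channel;

/* Sender side: "put" this step's halo into its slot, then "post" the event.
 * Returns 0 on success, -1 if the receiver has not yet consumed that slot. */
static int halo_put(halo_channel *ch, int step, const double *src)
{
    int slot = step % NBUF;
    if (ch->ready[slot])
        return -1;
    for (size_t i = 0; i < HALO_LEN; ++i)
        ch->data[slot][i] = src[i];
    ch->ready[slot] = 1;
    return 0;
}

/* Receiver side: non-blocking query. Copies the halo out and frees the slot
 * if it has been posted. Returns 1 if data was consumed, 0 if not ready, so
 * the caller can go back to local work in the meantime. */
static int halo_try_consume(halo_channel *ch, int step, double *dst)
{
    int slot = step % NBUF;
    if (!ch->ready[slot])
        return 0;
    for (size_t i = 0; i < HALO_LEN; ++i)
        dst[i] = ch->data[slot][i];
    ch->ready[slot] = 0;
    return 1;
}
```

With two slots, the sender can deliver steps 0 and 1 back to back; a put for step 2 is refused until the receiver has consumed step 0, which is the point at which a real code would keep doing local work and retry.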
@zbeekman Above, you said:
@afanfa
|
I suspect we may need to edit our INSTALL.md again.... Your best bet is to avoid strided transfers if you can (for efficiency). If you do need them, try -DSTRIDED, but I think its implementation is not 100% complete (i.e. across puts, gets, and getputs).
@zbeekman |
OK, this is a bit of a long story... GCC 7 has been supported since 1.9.0, BUT for the average everyday user we default to the 6.x branch. Perhaps we're at the point where enough bugs have been fixed only in the 7.x branch to outweigh the outstanding regressions in 7.x. To me, the biggest issue with 7.x is #292. What happened with #292 is that, to support allocatable components and further optimization (and perhaps for other reasons), the responsibility for type conversion during coarray assignments was moved from GFortran to the -fcoarray=lib library. This change was made to GFortran without consulting us... or at least the majority of us. What this means is that assignments involving coarrays with type conversions will fail when using 7.x. In my mind this is a pretty substantial issue, so we point most users at the 6.x branch when they install OpenCoarrays with the super user friendly (we hope) If you want to install OpenCoarrays with GFortran 7.x you have a few options.
|
~Oh, also I need to look into the code for acceptable_compiler.f90...~
~Perhaps we stopped using it? Or more likely there is some strange bug.~
Never mind I forgot that this was not part of the CMake build, only the install.sh build...
…On Mon, Sep 25, 2017 at 5:32 PM jbmaggard ***@***.***> wrote:
@zbeekman <https://github.com/zbeekman>
Good job on 1.9.2 release. I like the improvements to install.sh. With the
SOVERSIONing system, is GCC7 now supported? I observe that
acceptable_compiler.f90 still has < 7.0.0.
|
I had some time on the weekend and am looking forward to finding some on Tuesday, which is a bank holiday in Germany.
Defect/Bug Report
Defect: MPI_Type_extent, type of 2nd argument is pointer to MPI_Aint
`ver`: Microsoft Windows [Version 6.1.7601]
Observed Behavior
- `-I..` used to find `libcaf.h`
- `-I../..` used to find `mpi.h`
- `-D_POSIX` used to find `SIGKILL` in `signal.h`
- `-DALLOCA_MISSING` because it is missing
- `-DUSE_GCC` for correct typedef of `MPI_Aint` in `mpi.h` (`long long int`)
- `-g -Og` for debugging with `gdb`
- `-m64` to give an error message if compile attempted with Win32 version of mingw-w64 gcc

In linux (which is LP64), the correct typedef of `MPI_Aint` is long int (see `mpi.h` from MPICH 3.2). For Windows (which is LLP64), the mpi.h from the Intel MPI Windows SDK defines `-DUSE_GCC` to implement the correct typedef of long long int for `MPI_Aint` with GCC.
Question
I posted details of my build (including build of import library for impi.dll) and results at people.tamu.edu/~bmaggard, in case anyone is interested in a native Win64 CAF solution (Win7-64; mingw-w64 gfortran 7.1.0.rev2; Intel MPI Runtime 2017 Update 3; OpenCoarrays-1.9.1). Execution results are equivalent to building 1.9.1 on linux with install.sh (Fedora 26; gcc-gfortran 7.1.1-3, MPICH 3.2, OpenCoarrays-1.9.1).
For GCC 7.1.0, my question at this time is whether I should be using any/all of the defines: COMPILER_SUPPORTS_CAF_INTRINSICS, STRIDED, and COMPILER_SUPPORTS_ATOMICS. Feedback appreciated on this question.