Build failure when compiling w/ patched build system in parallel #366
Comments
This appears to be a CMake bug, although I need to confirm this to ensure we're setting up the dependencies correctly. Which version of CMake are you using? |
@amckinstry I suspect that your patch to Knowing which version of CMake you're using would still be helpful so I can test/confirm locally and then submit a bug report to CMake. |
3.7.2 |
Yes, it is triggered by your patch. It's because we're compiling and adding the coarrays extension module with the library... If we were to make that its own target this would go away... I'll look into this. |
Moving patch issue from #365 here because implementing the patch goes hand in hand with fixing the parallel build issue...
Index: ./src/mpi/CMakeLists.txt
===================================================================
--- ./src/mpi/CMakeLists.txt
+++ ./src/mpi/CMakeLists.txt
@@ -36,7 +36,8 @@ if (MPI_Fortran_MODULE_COMPILES)
set(MPI_CAF_FORTRAN_FILES ../extensions/opencoarrays.F90)
endif()
-add_library(caf_mpi mpi_caf.c ../common/caf_auxiliary.c ${MPI_CAF_FORTRAN_FILES})
+add_library(caf_mpi SHARED mpi_caf.c ../common/caf_auxiliary.c ${MPI_CAF_FORTRAN_FILES})
+add_library(caf_mpi_static STATIC mpi_caf.c ../common/caf_auxiliary.c ${MPI_CAF_FORTRAN_FILES})
target_link_libraries(caf_mpi PRIVATE ${MPI_C_LIBRARIES} ${MPI_Fortran_LIBRARIES})
set_target_properties ( caf_mpi
@@ -53,9 +54,14 @@ endif()
include_directories(${CMAKE_BINARY_DIR}/mod)
install(TARGETS caf_mpi EXPORT OpenCoarraysTargets
+ DESTINATION "${CMAKE_INSTALL_LIBDIR}"
+ LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}"
+)
+install(TARGETS caf_mpi_static EXPORT OpenCoarraysTargets
ARCHIVE DESTINATION "${CMAKE_INSTALL_LIBDIR}"
LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}"
)
+set_target_properties(caf_mpi PROPERTIES SOVERSION 1 SONAME "libcaf_mpi.so.${OpenCoarraysVersion}")
# Install modules to standard include dir, but namespace them with compiler/version
set (mod_install "OpenCoarrays/${CMAKE_Fortran_COMPILER_ID}/${CMAKE_Fortran_COMPILER_VERSION}") |
Hello. I come from a completely different Fortran project. We have also hit (I believe) the same race condition during compilation. Unlike you, we don't build shared libraries, though. The error is non-deterministic and hard to reproduce reliably. I'd like to file a bug report against (CMake? gfortran?), but I can't really nail it down. I understand from this topic that you also didn't find the root cause of the bug, and that it went away accidentally with the fix for another issue, #365. Am I correct? |
No, I know the root cause of this behavior. It is due to CMake's handling of CMake/Kitware is aware of this behavior, and it is intentional (not a bug). In fact, under some circumstances the same problem may be encountered using
add_library(my_obj_lib OBJECT ftn_mod1.f90 ftn_mod2.f90 ftn_mod3.f90)
add_library(lib_using_ftn_mods STATIC $<TARGET_OBJECTS:my_obj_lib> lib_src.f90)
add_executable(exe_using_ftn_mods $<TARGET_OBJECTS:my_obj_lib> a.out)
add_library(other_ftn_lib STATIC $<TARGET_OBJECTS:my_obj_lib> other_lib_src.f90)
Now CMake compiles object files (and corresponding Hope this helps! |
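For readers following along, here is a minimal sketch of how the object-library pattern could be applied to the caf_mpi patch quoted earlier in this thread. It is an illustration under assumptions, not necessarily what the eventual fix did: caf_mpi_objs is a hypothetical target name, and the sketch assumes the surrounding CMakeLists.txt still defines MPI_CAF_FORTRAN_FILES, MPI_C_LIBRARIES and MPI_Fortran_LIBRARIES as in the diff.

```cmake
# Sketch only: caf_mpi_objs is a hypothetical target name, not one from the project.
# Compile the C sources and the Fortran extension module exactly once.
add_library(caf_mpi_objs OBJECT
  mpi_caf.c
  ../common/caf_auxiliary.c
  ${MPI_CAF_FORTRAN_FILES})

# Objects that end up in a shared library need position-independent code.
set_target_properties(caf_mpi_objs PROPERTIES POSITION_INDEPENDENT_CODE ON)

# Both libraries reuse the same objects, so opencoarrays.F90 (and its .mod file)
# is produced by a single compile rule instead of one rule per library.
add_library(caf_mpi SHARED $<TARGET_OBJECTS:caf_mpi_objs>)
add_library(caf_mpi_static STATIC $<TARGET_OBJECTS:caf_mpi_objs>)

target_link_libraries(caf_mpi PRIVATE ${MPI_C_LIBRARIES} ${MPI_Fortran_LIBRARIES})
```

The point of the design is that only one target ever compiles the module source, so two parallel jobs never write the same .mod file.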
Fixed in #440 |
I am seeing a very similar race condition, but was thinking the problem was with the Intel Fortran compiler. In my case I compile the same source directory twice in two different build directories. Each creates a different library and the .mod files are copied to different "include" subdirectories. (One build is single precision and one is double precision, but I don't think that is relevant here.) I was thinking that the Intel compiler is somehow using a shared resource because the source file is the same for both builds and I get errors like "empty" stream. CMake uses Intel's "-module" flag to move the .mod files, and I think this is correlated. OTOH, I've not been able to make a non-CMake reproducer for this behavior yet. I'll try some more experiments tomorrow. Unfortunately, I only hit the error in about 50% of my builds and it takes about 2 minutes to get to the race after a make clean. |
oh boy, that doesn't sound fun to debug... I'm not sure I completely understand though... are the builds in the two separate directories (single and double precision) happening concurrently? Are they triggered manually or by a script/super-build? My thoughts on the matter are:
I would make sure that all source files containing module definitions appear in one and only one I've never been one to shy away from accusing Intel of having Fortran compiler bugs, but the "empty stream" error you're getting really sounds to me like a CMake parallel build issue. This CMake behavior is pretty obnoxious, IMO, whether they consider it a "bug" or not, and I'd be willing to share my opinions on the matter with them if you file a new issue. |
Just to elaborate more, the "empty stream" error sounds like one instance/thread of |
The directory in question is built in two separate directories. This is using cmake's optional argument for add_subdirectory. Until recently, only one of the directories was actually being built because only one was a dependency for my targets. Having resolved a long standing issue with another target, both directories are now dependencies for my ultimate target and so now both are being built - apparently simultaneously. In theory, there is no need to serialize anything. If the compiler only creates files in the different working directories, it should be fine that two threads are reading from the same file simultaneously. I've had a number of issues where Intel is "overly" clever about finding .mod files in directories that are not part of the build, so I'm perhaps predisposed to be suspicious here. I have a project where I can do a conventional GNUmake build in the source tree or a CMake build in a build tree. I've spent hours tracking down a problem that eventually turned out to be that Intel was first looking in the same directory as the source file for .mod files rather than the working directory. (No "-I" was pointing to the source directory.) May have been tied to a particular version of Intel, but I'm now careful to either clone or to do make clean if I need to check the GNUmake build. |
Regarding your "empty stream" comment. Each invocation of the compiler should be creating a different .mod file. Either in a different build directory as explained above, or due to the use of CMake's module move option which specifies a different target directory for each precision. Hence why I think Intel is doing something unnecessary (like a tmp file based upon the source file name) but possibly not technically a bug. |
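The layout described in the last two comments (one source directory built twice through add_subdirectory's optional binary-directory argument, with a separate module directory per precision) might look roughly like the sketch below. Every name in it (PRECISION_TAG, shared_src, solver_r4, kinds.F90, and so on) is hypothetical, and it only illustrates the setup; it does not claim to reproduce or fix the "empty stream" error.

```cmake
# Top-level CMakeLists.txt (sketch; every name here is hypothetical).
# The same source directory is added twice, each time with its own binary directory.
set(PRECISION_TAG r4)
add_subdirectory(shared_src shared_src_r4)
set(PRECISION_TAG r8)
add_subdirectory(shared_src shared_src_r8)

# shared_src/CMakeLists.txt
# Target names must be unique across the build, so the tag is part of the name.
add_library(solver_${PRECISION_TAG} STATIC kinds.F90 solver.F90)

# Keep each build's .mod files in a directory that no other target writes to.
set_target_properties(solver_${PRECISION_TAG} PROPERTIES
  Fortran_MODULE_DIRECTORY "${CMAKE_BINARY_DIR}/include/${PRECISION_TAG}")

# Let targets that link against this library find the matching .mod files.
target_include_directories(solver_${PRECISION_TAG} PUBLIC
  "${CMAKE_BINARY_DIR}/include/${PRECISION_TAG}")
```

With this arrangement each module source is still compiled twice, but by two distinct targets that write to distinct object and module directories.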
And of course, typing "make" a second time always appears to work, so it is more annoying than problematic. (May undermine my attempt to sell cmake to the org though.) |
CMake does some tricky stamp creation, modfile relocation etc. too. Without seeing what's going on myself it's hard to know for sure what's up.
|
I am now encountering this issue (or similar) with gfortran on Darwin. It happens almost immediately when I start to build (but of course not with VERBOSE=1 or with serial).
Notice that apparently two different threads are trying to build the same file. I've double checked that this file is only listed once in a single target for the build. Further, it is not always this file or even this target that is affected. Another potential data point is that this is currently happening when I am building a top level target that includes EXCLUDE_FROM_ALL targets (building tests). Hmmmm. This is now with cmake 3.11.4 |
After a bit more investigating, I understand what was causing my current issue. It could easily be argued that it is not a CMake bug. I had used ALLOW_DUPLICATE_CUSTOM_TARGETS for my tests and this was triggering simultaneous builds. I had already planned my next bit of work to be eliminating that cmake antipattern before I raised the issue above. Having done that now, the build seems fine again. |
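One way to avoid relying on ALLOW_DUPLICATE_CUSTOM_TARGETS is sketched below: give each test target a unique name and hang them all on a single umbrella target. This is an assumption about how such a cleanup could look, not a description of the actual project; build_tests, layer_a_tests, layer_a and test_layer_a.F90 are all hypothetical names.

```cmake
# Top-level CMakeLists.txt: one umbrella target, declared exactly once.
# (Replaces: set_property(GLOBAL PROPERTY ALLOW_DUPLICATE_CUSTOM_TARGETS ON))
add_custom_target(build_tests)

# In each subdirectory: a uniquely named test executable that is kept out of
# the default "all" target and attached to the umbrella target instead.
add_executable(layer_a_tests EXCLUDE_FROM_ALL test_layer_a.F90)
target_link_libraries(layer_a_tests PRIVATE layer_a)
add_dependencies(build_tests layer_a_tests)
```

Because every target name is unique, no two rules in the generated build claim the same source file, and `make -j build_tests` builds each test once.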
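Happy to hear you've sorted it out. This is always a source of confusion. My rule of thumb is never, ever allow a source file providing a .mod module file to be built by more than one thread. In practice this usually means the use of object libraries. |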
I completely agree on the principle, but I had no idea that cmake was doing this to me via the indirect route. The compilation errors were happening in a layer that was quite clean. It was 2 different “test” targets with the same name that both wanted the same source file to be built …
|
Yes, I agree that it is very unintuitive.
|
This is seen in Debian (stretch, unstable):
This appears (randomly?) when make -j2 or -j4 is used, but not with -j1. A race condition?
This is with openmpi-2.0.2