non-coarray vector lhs produces crashes or wrong results with gets #322

Closed
reinh-bader opened this issue Jan 30, 2017 · 12 comments

@reinh-bader

reinh-bader commented Jan 30, 2017

The attached source code was built with

  • gfortran 6.3 + opencoarrays 1.6.2
  • gfortran 7.0 (preview) + opencoarrays 1.8.2

and produces wrong results when executed with any number of images. The version built with gfortran -fcoarray=single executes fine, hence I suspect the problem to be in the runtime library.

The program checks its output.
For opencoarrays 1.8.2, I get the run time error

Fatal error in MPI_Win_unlock_all: Invalid MPI_Win, error stack:
MPI_Win_unlock_all(127): MPI_Win_unlock_all(win=0x55444f4d) failed
MPI_Win_unlock_all(82).: Invalid MPI_Win
Fatal error in MPI_Win_unlock_all: Invalid MPI_Win, error stack:
MPI_Win_unlock_all(127): MPI_Win_unlock_all(win=0x55444f4d) failed
MPI_Win_unlock_all(82).: Invalid MPI_Win

PMR.zip

@zbeekman
Collaborator

zbeekman commented Jan 31, 2017

Can you please include a little bit more information: MPI implementation, system (OS, arch, virtualized?) and how you built and installed OpenCoarrays (with install log if possible). Also how did you compile and invoke your code? How many procs?

Thanks!

@reinh-bader
Author

reinh-bader commented Jan 31, 2017

Base MPI is Intel MPI 5.1.3.
The system is an x86-based shared-memory node with SLES11 SP4 as OS.
Here is the command line used to build:

CC=gcc FC="gfortran -I/lrz/sys/intel/impi/5.1.3.181/intel64/include/gfortran/7.0.0" cmake /lrz/noarch/src/gcc/opencoarrays/opencoarrays-1.8.2 -DBUILD_SHARED_LIBS:BOOL=ON -DCMAKE_INSTALL_PREFIX=${PWD}

The only log file I found was CMakeFiles/CMakeOutput.log which I have attached (adding a .txt extension, to enable the upload).

The example code was compiled with

caf vector_subscript_pos_01.f90

and executed with

cafrun -n 2 ./a.out

CMakeOutput.log.txt

Hope this helps.

@zbeekman
Collaborator

Thanks so much @reinh-bader... I think I know of a machine or two that have Intel MPI 5.1.3, if the problem does not show up with MPICH/OpenMPI.

@zbeekman
Collaborator

zbeekman commented Feb 8, 2017

Sorry, I have been swamped and have not had a chance to fully investigate yet, but this is still on my radar.

@reinh-bader
Author

No trouble, take your time.

@zbeekman zbeekman self-assigned this Mar 14, 2017
@zbeekman zbeekman added the ready label Aug 30, 2017
@zbeekman
Collaborator

This is still broken as of 0c2dce7 on 12/26/2017

Here is the test case inlined:

program mod_vecsub_01
  implicit none
  integer, parameter :: ndim = 5, vdim = 2
  real :: vec(ndim), res(ndim)[*]
  integer :: idx(vdim)
  integer :: i, me
  logical :: ok[*]

  res = [ (real(i), i=1, ndim) ]
  vec = 0.0
  idx = [ ndim, 1 ]
  ok = .true.
  sync all
  me = this_image()
  vec(idx) = res(1:2)[1]
  if (vec(1) /= 2.0 .or. vec(5) /= 1.0) then
    critical
      ok[1] = .false.
      write(*, *) 'FAIL on image ',me,vec(idx)
    end critical
  end if
  if (me == 1) then
     if (ok) then
       write(*, *) 'OK'
     end if
  end if
end program

@zbeekman changed the title from "assignment to vector subscripted lhs produces crashes or wrong results" to "non-coarray vector lhs produces crashes or wrong results with gets" on Dec 26, 2017
@vehre
Collaborator

vehre commented Dec 27, 2017

That's got nothing to do with opencoarrays. It's a compiler bug. Gfortran is not respecting the indexed addressing for the lhs after the get(). It's trying to do it before.
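
If the problem is indeed that the vector subscript on the left-hand side is applied at the wrong point relative to the get, one conceivable way to sidestep it is to split the statement in two: fetch the remote section into a plain local temporary first, then do the vector-subscripted assignment as a purely local operation. The following is only a sketch of that idea. The names `vecsub_workaround` and `tmp` are made up for illustration, and it is an assumption (not verified in this thread) that this avoids the failure on the affected compiler versions.

```fortran
program vecsub_workaround
  implicit none
  integer, parameter :: ndim = 5
  integer :: i
  integer :: vec(ndim)
  integer :: tmp(2)          ! hypothetical local buffer, not in the original test case
  integer :: res(ndim)[*]

  res = [ (i, i = 1, ndim) ]
  vec = 0
  sync all                   ! make sure image 1 has filled res before anyone reads it

  tmp = res(1:2)[1]          ! the get now targets a contiguous, non-indexed local array
  vec([ndim, 1]) = tmp       ! purely local assignment with the vector subscript

  if (vec(1) /= 2 .or. vec(ndim) /= 1) then
    print *, 'wrong result ', vec([ndim, 1]), ' on image ', this_image()
  else
    print *, 'OK on image ', this_image()
  end if
end program vecsub_workaround
```

It can be built and run the same way as the original test case (`caf`, then `cafrun -n 2 ./a.out`). The cost is one extra copy of the fetched section.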

@zbeekman
Collaborator

@vehre awesome, thanks for the response! @rouson we should look and see if this bug has been filed, and if it has not, make sure we file one and post the tracking number here. Perhaps @afanfa or Sorren might be interested in working on a fix for this?

@rouson
Member

rouson commented Dec 28, 2017

This has been submitted as Bug 83606 on gfortran with the following slightly simplified code:

  integer, parameter :: ndim=5
  integer :: i,vec(ndim)=0, res(ndim)[*]=[ (i, i=1, ndim) ]
  vec([ndim,1]) = res(1:2)[1]
  if (vec(1) /= res(2) .or. vec(ndim) /= res(1)) &
    print *," wrong result ",vec([ndim,1])," on image ",this_image()
end

@zbeekman
Collaborator

zbeekman commented Jan 2, 2018

See also #427.

@vehre
Collaborator

vehre commented Apr 30, 2018

Fixed on all gcc's from 6 on.

@vehre vehre closed this as completed Apr 30, 2018
@zbeekman
Collaborator

zbeekman commented May 1, 2018

@vehre is the fix in 6.4 and 7.3 or will it be in 6.5 and 7.4?

zbeekman added a commit that referenced this issue May 1, 2018
 - Enable this once we figure out which GCCs it's fixed on.