osc/pt2pt hang in master #1299

Closed
ggouaillardet opened this issue Jan 13, 2016 · 1 comment · Fixed by #1339
ggouaillardet (Contributor) commented:

@hjelmn can you please have a look at this?

Here is a reproducer:

#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Win win;
    double * d, a;

    MPI_Init(&argc, &argv);

    a = 0;
    MPI_Win_allocate(sizeof(double), 1, MPI_INFO_NULL, MPI_COMM_SELF, (void *)&d, &win);
    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
    *d = 0.;

    MPI_Accumulate(&a, 1, MPI_DOUBLE, 0, 0, 1, MPI_DOUBLE, MPI_SUM, win);

    MPI_Win_flush_all(win);
    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);

    MPI_Finalize();

    return 0;
}

This can be run with only one MPI task.

It works fine with --mca osc sm on both v1.10 and master, but with --mca osc pt2pt it works fine on v1.10 and hangs on master.
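For reference, one way to reproduce (the file name reproducer.c is just for illustration; the --mca settings and single task are as described above):

    mpicc reproducer.c -o reproducer
    mpirun -np 1 --mca osc sm    ./reproducer   # works on v1.10 and master
    mpirun -np 1 --mca osc pt2pt ./reproducer   # works on v1.10, hangs on master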

I ran this under the debugger and ended up writing the patch below so that master mimics v1.10.
That being said, I have no idea whether this is correct or not ...

diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt_sync.c b/ompi/mca/osc/pt2pt/osc_pt2pt_sync.c
index 7e28914..34df3ab 100644
--- a/ompi/mca/osc/pt2pt/osc_pt2pt_sync.c
+++ b/ompi/mca/osc/pt2pt/osc_pt2pt_sync.c
@@ -2,6 +2,8 @@
 /*
  * Copyright (c) 2015      Los Alamos National Security, LLC.  All rights
  *                         reserved.
+ * Copyright (c) 2016      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -17,6 +19,7 @@ static void ompi_osc_pt2pt_sync_constructor (ompi_osc_pt2pt_sync_t *sync)
     sync->type = OMPI_OSC_PT2PT_SYNC_TYPE_NONE;
     sync->eager_send_active = false;
     sync->epoch_active = false;
+    sync->sync.pscw.group = NULL;
     OBJ_CONSTRUCT(&sync->lock, opal_mutex_t);
     OBJ_CONSTRUCT(&sync->cond, opal_condition_t);
 }
diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt_sync.h b/ompi/mca/osc/pt2pt/osc_pt2pt_sync.h
index eee2964..cfed4e9 100644
--- a/ompi/mca/osc/pt2pt/osc_pt2pt_sync.h
+++ b/ompi/mca/osc/pt2pt/osc_pt2pt_sync.h
@@ -2,6 +2,8 @@
 /*
  * Copyright (c) 2015      Los Alamos National Security, LLC.  All rights
  *                         reserved.
+ * Copyright (c) 2016      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -45,7 +47,7 @@ struct ompi_osc_pt2pt_sync_t {
     ompi_osc_pt2pt_sync_type_t type;

     /** synchronization data */
-    union {
+    struct {
         /** lock specific synchronization data */
         struct {
             /** lock target rank (-1 for all) */
@@ -129,13 +131,15 @@ bool ompi_osc_pt2pt_sync_pscw_peer (struct ompi_osc_pt2pt_module_t *module, int
  */
 static inline void ompi_osc_pt2pt_sync_wait (ompi_osc_pt2pt_sync_t *sync)
 {
-    OPAL_THREAD_LOCK(&sync->lock);
-    while (!sync->eager_send_active) {
-        OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output,
-                             "waiting for access epoch to start"));
-        opal_condition_wait(&sync->cond, &sync->lock);
+    if (sync->sync.pscw.group) {
+        OPAL_THREAD_LOCK(&sync->lock);
+        while (!sync->eager_send_active) {
+            OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output,
+                                 "waiting for access epoch to start"));
+            opal_condition_wait(&sync->cond, &sync->lock);
+        }
+        OPAL_THREAD_UNLOCK(&sync->lock);
     }
-    OPAL_THREAD_UNLOCK(&sync->lock);

     OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output,
                          "access epoch ready"));
@hppritcha hppritcha added this to the v2.0.1 milestone Jan 19, 2016
hjelmn (Member) commented Jan 26, 2016:

Not quite right but I see what is going on. Working on a patch now.

hjelmn added a commit to hjelmn/ompi that referenced this issue Feb 2, 2016
This commit fixes several bugs identified by a new multi-threaded RMA
benchmarking suite. The following bugs have been identified and fixed:

 - The code that signaled the actual start of an access epoch changed
   the eager_send_active flag on a synchronization object without
   holding the object's lock. This could cause another thread waiting
   on eager sends to block indefinitely because the entirety of
   ompi_osc_pt2pt_sync_expected could execute between the check of
   eager_send_active and the condition wait of
   ompi_osc_pt2pt_sync_wait.

 - The bookkeeping of fragments could get screwed up when performing
   long put/accumulate operations from different threads. This was
   caused by the fragment flush code at the end of both put and
   accumulate. This code was put in place to avoid sending a large
   number of unexpected messages to a peer. To fix the bookkeeping
   issue we now 1) wait for eager sends to be active before starting
   any large isends, and 2) keep track of the number of large isends
   associated with a fragment. If the number of large isends reaches
   32 the active fragment is flushed.

 - Use atomics to update the large receive/send tag counters. This
   prevents duplicate tags from being used. The tag space has also
   been updated to use the entire 16 bits of the tag space.

These changes should also fix open-mpi#1299.

Signed-off-by: Nathan Hjelm <[email protected]>
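The first bullet describes a classic lost-wakeup race: if the flag is flipped without holding the lock, a waiter can test the flag, be preempted while another thread sets it and signals, and then block on the condition variable forever. Below is a minimal standalone sketch of the fixed pattern using plain pthreads rather than the OPAL wrappers; the names wait_for_epoch and mark_epoch_started are made up for illustration and are not OMPI functions.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool eager_send_active = false;   /* the flag the waiter checks */

/* Waiter: analogous to ompi_osc_pt2pt_sync_wait(). The check of the flag
 * and the condition wait are atomic with respect to the signaler because
 * both happen under the same lock. */
static void *wait_for_epoch(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&lock);
    while (!eager_send_active) {
        pthread_cond_wait(&cond, &lock);
    }
    pthread_mutex_unlock(&lock);
    puts("access epoch ready");
    return NULL;
}

/* Signaler: the fix is to update the flag while holding the lock before
 * broadcasting. Setting eager_send_active without the lock (the bug
 * described above) lets the waiter see the flag as false, miss the
 * broadcast, and then sleep indefinitely. */
static void mark_epoch_started(void)
{
    pthread_mutex_lock(&lock);
    eager_send_active = true;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    pthread_t waiter;
    pthread_create(&waiter, NULL, wait_for_epoch, NULL);
    mark_epoch_started();
    pthread_join(waiter, NULL);
    return 0;
}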
hjelmn added a commit to hjelmn/ompi that referenced this issue Feb 2, 2016
hjelmn added a commit to hjelmn/ompi-release that referenced this issue Feb 3, 2016

(cherry picked from open-mpi/ompi@d7264aa)

Signed-off-by: Nathan Hjelm <[email protected]>
hjelmn added a commit to hjelmn/ompi-release that referenced this issue Feb 3, 2016
This commit fixes open-mpi/ompi#1299.

Signed-off-by: Nathan Hjelm <[email protected]>

(cherry picked from open-mpi/ompi@519fffb)

Signed-off-by: Nathan Hjelm <[email protected]>
bosilca pushed a commit to bosilca/ompi that referenced this issue Oct 3, 2016
bosilca pushed a commit to bosilca/ompi that referenced this issue Oct 3, 2016