
ThreadSafetySupport

Jeff Squyres edited this page Sep 2, 2014 · 6 revisions

This page tracks the progress of thread-safety testing by community members. Later, we will use this page (or another one) to jot down ideas that could improve OMPI's thread-safety performance.

Issues

  • There are issues with concurrent thread access to a TCP endpoint. r15963 provides a workaround that prevents a segv when a NULL frag pointer is dereferenced, but the underlying problem of multiple threads polling the same endpoint still needs to be resolved; hence this note.
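To illustrate the race, here is a minimal C sketch using pthreads. Everything in it is hypothetical: `endpoint_t`, `frag_t`, `endpoint_take_frag()` and `run_pollers()` are made-up stand-ins, not the TCP BTL's real structures, and this is not the r15963 change. It shows the general pattern when several threads poll one endpoint: the pending fragment is claimed under a lock, and every thread that loses the race gets NULL and must check for it instead of dereferencing it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT Open MPI's real TCP BTL structures. */
typedef struct frag {
    int id;
} frag_t;

typedef struct endpoint {
    pthread_mutex_t lock;   /* serializes access to pending_frag */
    frag_t *pending_frag;   /* fragment waiting to be handled, or NULL */
} endpoint_t;

/* Claim the pending fragment under the lock. Exactly one caller gets it;
 * every other concurrent poller gets NULL. */
static frag_t *endpoint_take_frag(endpoint_t *ep)
{
    pthread_mutex_lock(&ep->lock);
    frag_t *frag = ep->pending_frag;
    ep->pending_frag = NULL;
    pthread_mutex_unlock(&ep->lock);
    return frag;
}

typedef struct poller_arg {
    endpoint_t *ep;
    int got_frag;           /* 1 if this thread claimed the fragment */
} poller_arg_t;

/* One polling thread: it checks for NULL rather than assuming a fragment
 * is still there by the time it reaches the endpoint. */
static void *poller(void *p)
{
    poller_arg_t *arg = p;
    frag_t *frag = endpoint_take_frag(arg->ep);
    arg->got_frag = (frag != NULL);
    return NULL;
}

/* Start nthreads pollers on one endpoint holding a single fragment and
 * return how many of them claimed it. */
static int run_pollers(int nthreads)
{
    enum { MAX_THREADS = 16 };
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    frag_t frag = { 42 };
    endpoint_t ep;
    pthread_mutex_init(&ep.lock, NULL);
    ep.pending_frag = &frag;

    pthread_t tids[MAX_THREADS];
    poller_arg_t args[MAX_THREADS];
    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        args[i].ep = &ep;
        args[i].got_frag = 0;
        if (pthread_create(&tids[i], NULL, poller, &args[i]) != 0) {
            break;          /* could not start more threads */
        }
        started++;
    }

    int claimed = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        claimed += args[i].got_frag;
    }
    pthread_mutex_destroy(&ep.lock);
    return claimed;
}
```

The lock and NULL check only make concurrent polling safe; the threads still compete for the same endpoint, which is the underlying issue described in the note.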

Enhancements

  • Possible communicator-creation performance improvement: implement an unexpected CID queue (this gets rid of a collective).
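This page does not describe how the unexpected CID queue would work, so the following C sketch is only one possible shape, under stated assumptions: each incoming message is assumed to carry the CID of its target communicator, and `ucid_queue_t`, `ucid_queue_push()` and `ucid_queue_drain()` are hypothetical names. Messages that arrive for a CID the receiver has not created yet are parked in a mutex-protected FIFO and handed over in arrival order once the local communicator with that CID exists.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical message that arrived for a communicator ID (CID)
 * that this process has not created yet. */
typedef struct pending_msg {
    int cid;
    int payload;
    struct pending_msg *next;
} pending_msg_t;

/* Hypothetical unexpected-CID queue: a mutex-protected singly linked FIFO. */
typedef struct ucid_queue {
    pthread_mutex_t lock;
    pending_msg_t *head;
    pending_msg_t *tail;
} ucid_queue_t;

static void ucid_queue_init(ucid_queue_t *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
}

/* Receive path: no local communicator matches cid, so park the message.
 * Returns 0 on success, -1 if allocation fails. */
static int ucid_queue_push(ucid_queue_t *q, int cid, int payload)
{
    pending_msg_t *m = malloc(sizeof *m);
    if (m == NULL) {
        return -1;
    }
    m->cid = cid;
    m->payload = payload;
    m->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL) {
        q->tail->next = m;
    } else {
        q->head = m;
    }
    q->tail = m;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Communicator-creation path: the communicator with this cid now exists,
 * so unlink up to max queued messages for it (in arrival order), copy their
 * payloads into out[], and return how many were taken. */
static int ucid_queue_drain(ucid_queue_t *q, int cid, int *out, int max)
{
    int n = 0;
    pthread_mutex_lock(&q->lock);
    pending_msg_t **link = &q->head;
    pending_msg_t *prev = NULL;     /* last node kept in the queue */
    while (*link != NULL && n < max) {
        pending_msg_t *m = *link;
        if (m->cid == cid) {
            *link = m->next;        /* unlink m */
            if (q->tail == m) {
                q->tail = prev;
            }
            out[n++] = m->payload;
            free(m);
        } else {
            prev = m;
            link = &m->next;
        }
    }
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

For example, after pushing messages for CIDs 5, 7 and 5, draining CID 5 returns the two CID-5 payloads in arrival order and leaves the CID-7 message queued. A single lock keeps the sketch short; a real implementation may want finer-grained locking.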

IU

  • Intel Tests (We run these nightly via MTT. Here is a recent run)
    • MPI: r15955 w/ --enable-mpi-threads
    • Tests: All tests in the file 'all_tests_no_perf'
    • System: Thor: dual-processor Xeon (32-bit) w/ hyperthreading, running on 4 nodes, 2 ppn
    • Network combinations used: "gm,self", "gm,sm,self", "openib,self", "tcp,openib,sm,self", "tcp,self", "tcp,sm,self"
    • All tests in the file 'all_tests_no_perf' pass.
  • Threads Tests
    • MPI: r15957 w/ --enable-mpi-threads
    • Tests: built with pthreads and ran with defaults (no arguments)
    • System: Thor: dual-processor Xeon (32-bit) w/ hyperthreading, running on 8 nodes, 2 ppn
      ||Test Group ||Test Name ||BTLs ||Status ||Detail ||
      ||MT_sendrecv ||All ||gm,self ||Pass || ||
      || || ||gm,sm,self ||Pass || ||
      || || ||openib,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      || || ||tcp,self ||Fail ||segfault in tcp btl ||
      || || ||tcp,sm,self ||Pass || ||
      ||MT_coll || ||gm,self ||Fail ||assertion failure in mca_coll_base_comm_unselect ||
      || || ||gm,sm,self ||Fail ||aborted: assertion error pml_ob1_recvreq.c:542 ||
      || || ||openib,self ||Fail ||assertion failure in pml_ob1_irecv.c:69 ||
      || || ||openib,sm,self ||Fail ||hangs: mca_oob_tcp_peer_send_handler: invalid connection state (3) ||
      || || ||tcp,self ||Fail ||segfault in tcp btl ||
      || || ||tcp,sm,self ||Fail ||segfault in tcp btl ||
      ||MT_comm || ||gm,self ||Fail ||hangs ||
      || || ||gm,sm,self ||Fail ||hangs ||
      ||MT_commcaching || ||gm,self ||Pass || ||
      || || ||gm,sm,self ||Pass || ||
      ||MT_env || ||gm,self ||Fail ||abort: assert error in pcomm_set_errhandler.c:61 ||
      || || ||gm,sm,self ||Fail ||abort: assert error in pcomm_set_errhandler.c:61 ||
      ||MT_greqs || ||gm,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      || || ||gm,sm,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      ||MT_group || ||gm,self ||Pass || ||
      || || ||gm,sm,self ||Pass || ||
      ||MT_misc || ||gm,self ||Fail ||hangs ||
      || || ||gm,sm,self ||Fail ||hangs ||
      ||MT_pt2pt2 || ||gm,self ||Pass || ||
      || || ||gm,sm,self ||Pass || ||
      ||MT_rcvany || ||gm,self ||Pass || ||
      || || ||gm,sm,self ||Pass || ||
      ||MT_send || ||gm,self ||Pass ||Intentionally calling MPI_Abort ||
      || || ||gm,sm,self ||Pass ||Intentionally calling MPI_Abort ||
      ||MT_send2 || ||gm,self ||Pass ||Intentionally calling MPI_Abort ||
      || || ||gm,sm,self ||Pass ||Intentionally calling MPI_Abort ||
      ||MT_win || ||gm,self ||Fail ||Error in MPI_Win_free: MPI_ERR_RMA_SYNC ||
      || || ||gm,sm,self ||Fail ||Error in MPI_Win_free: MPI_ERR_RMA_SYNC ||

Sun

  • Intel Tests
  • Threads Tests
    • MPI: built r15584 w/ --enable-mpi-threads --with-threads=posix --disable-progress-threads (and lock init fix)
    • Tests: built with Solaris Threads and ran with defaults (no arguments)
    • System: Solaris 10/x86
      ||Test Group ||Test Name ||BTLs ||Status ||Detail ||
      ||mt_sendrecv ||All ||tcp,sm,self ||Pass || ||
      ||mt_coll || ||tcp,self ||Fail ||np=2 hangs in Gather, np=4 hangs in Bcast ||
      ||mt_comm || ||tcp,self ||Fail ||various hangs, and segv in mca_pml_ob1_recv_frag_callback ||
      ||mt_commcaching || ||tcp,self ||false Pass ||messages about truncated receives ||
      ||mt_env || ||tcp,self ||Fail ||np=4 fails assertion in pcomm_set_errhandler.c, line 56 ||
      ||mt_greqs || ||tcp,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      ||mt_group || ||tcp,self ||Pass || ||
      ||mt_misc || ||tcp,self ||Fail ||hangs with np=4 ||
      ||mt_pt2pt2 || ||tcp,self ||Pass || ||
      ||mt_rcvany || ||tcp,self ||Fail ||hangs with np=2 & 4 ||
      ||mt_send || ||tcp,self ||Pass ||Note: MTT actually interprets this as a failure because of how the test aborts ||
      ||mt_send2 || ||tcp,self ||Pass ||Note: MTT actually interprets this as a failure because of how the test aborts ||
      ||mtstartcompl || ||tcp,self ||Fail ||np=2 hangs, np=4 segv's in mca_allocator_bucket_cleanup ||
      ||mt_win || ||tcp,self ||Fail ||several test failures, and segv in MPI_Win_lock and ompi_comm_nextcid ||
      ||mt_wincache || ||tcp,self ||Pass || ||
      ||mtf2cc2f || ||tcp,self ||Pass || ||
  • Threads Tests
    • MPI: built r15936 w/ --enable-mpi-threads --with-threads=posix --disable-progress-threads
    • Tests: built with Solaris Threads and ran with defaults (no arguments)
    • System: Solaris 10/x86
      ||Test Group ||Test Name ||BTLs ||Status ||Detail ||
      ||mt_sendrecv ||All ||tcp,self ||Fail ||np=2 hangs, np=4 segv's in tuned collectives ||
      ||mt_coll || ||tcp,self ||Fail ||np=2 hangs in Gather, np=4 hangs in Bcast ||
      ||mt_commcaching || ||tcp,self ||Pass || ||
      ||mt_env || ||tcp,self ||Fail ||np=4 fails assertion in pcomm_set_errhandler.c, line 56 ||
      ||mt_greqs || ||tcp,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      ||mt_group || ||tcp,self ||Pass || ||
      ||mt_misc || ||tcp,self ||Fail ||hangs with np=4 ||
      ||mt_pt2pt2 || ||tcp,self ||Pass || ||
      ||mt_rcvany || ||tcp,self ||Pass || ||
      ||mt_send || ||tcp,self ||Pass ||Note: MTT actually interprets this as a failure because of how the test aborts ||
      ||mt_send2 || ||tcp,self ||Pass ||Note: MTT actually interprets this as a failure because of how the test aborts ||
      ||mtstartcompl || ||tcp,self ||Fail ||np=2 passes, np=4 hangs ||
      ||mt_win || ||tcp,self ||Fail ||fails with MPI_ERR_RMA_SYNC error message ||
      ||mt_wincache || ||tcp,self ||Pass || ||
      ||mtf2cc2f || ||tcp,self ||Pass || ||
  • Threads Tests
    • MPI: built r16064 w/ --enable-mpi-threads --with-threads=posix --disable-progress-threads
    • Tests: built with Solaris Threads and ran with defaults (no arguments)
    • System: Solaris 10/x86
      ||Test Group ||Test Name ||BTLs ||Status ||Detail ||
      ||mt_sendrecv ||All ||tcp,self ||Fail ||np=2 & 4 hang ||
      ||mt_coll || ||tcp,self ||Fail ||np=2 hangs in Gather, np=4 hangs in Bcast ||
      ||mt_comm || ||tcp,self ||Fail ||np=2 hangs in Free tests, np=4 passes??? ||
      ||mt_commcaching || ||tcp,self ||Pass || ||
      ||mt_env || ||tcp,self ||Fail ||np=4 fails assertion in pcomm_set_errhandler.c, line 56 ||
      ||mt_greqs || ||tcp,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      ||mt_group || ||tcp,self ||Pass || ||
      ||mt_misc || ||tcp,self ||Fail ||hangs with np=4 ||
      ||mt_pt2pt2 || ||tcp,self ||Pass || ||
      ||mt_rcvany || ||tcp,self ||Pass || ||
      ||mt_send || ||tcp,self ||Fail ||np=4 hangs ||
      ||mt_send2 || ||tcp,self ||Fail ||np=4 hangs ||
      ||mtstartcompl || ||tcp,self ||Fail ||np=2 passes, np=4 hangs ||
      ||mt_win || ||tcp,self ||Fail ||fails with MPI_ERR_RMA_SYNC error message ||
      ||mt_wincache || ||tcp,self ||Pass || ||
      ||mtf2cc2f || ||tcp,self ||Pass || ||
  • Threads Tests
    • MPI: built r16237 w/ --enable-mpi-threads --with-threads=posix --disable-progress-threads
    • Tests: built with Solaris Threads and ran with defaults (no arguments)
    • System: Solaris 10/x86
      ||Test Group ||Test Name ||BTLs ||Status ||Detail ||
      ||mt_sendrecv ||All ||tcp,self ||Fail ||np=2 & 4 hang ||
      ||mt_coll || ||tcp,self ||Fail ||np=2 hangs in Gather, np=4 hangs in Bcast ||
      ||mt_comm || ||tcp,self ||Pass || ||
      ||mt_commcaching || ||tcp,self ||Pass || ||
      ||mt_env || ||tcp,self ||Fail ||fails assertion in pcomm_set_errhandler.c, line 56 ||
      ||mt_greqs || ||tcp,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      ||mt_group || ||tcp,self ||Pass || ||
      ||mt_misc || ||tcp,self ||Fail ||hangs with np=4 ||
      ||mt_pt2pt2 || ||tcp,self ||Fail ||hangs ||
      ||mt_rcvany || ||tcp,self ||Pass || ||
      ||mt_send || ||tcp,self ||Fail ||hangs ||
      ||mt_send2 || ||tcp,self ||Pass || ||
      ||mtstartcompl || ||tcp,self ||Fail ||np=2 passes, np=4 hangs ||
      ||mt_win || ||tcp,self ||Fail ||fails with MPI_ERR_RMA_SYNC error message ||
      ||mt_wincache || ||tcp,self ||Pass || ||
      ||mtf2cc2f || ||tcp,self ||Pass || ||
  • Threads Tests
    • MPI: built r16237 w/ --enable-mpi-threads --with-threads=posix --disable-progress-threads
    • Tests: built with Solaris Threads and ran with defaults (no arguments)
    • System: Solaris 10/x86
      ||Test Group ||Test Name ||BTLs ||Status ||Detail ||
      ||mt_sendrecv ||All ||sm,self ||Pass || ||
      ||mt_coll || ||sm,self ||Pass || ||
      ||mt_comm || ||sm,self ||Pass || ||
      ||mt_commcaching || ||sm,self ||Pass || ||
      ||mt_env || ||sm,self ||Fail ||fails assertion in pcomm_set_errhandler.c, line 56 ||
      ||mt_greqs || ||sm,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      ||mt_group || ||sm,self ||Pass || ||
      ||mt_misc || ||sm,self ||Fail ||hangs with np=4 ||
      ||mt_pt2pt2 || ||sm,self ||Pass || ||
      ||mt_rcvany || ||sm,self ||Pass || ||
      ||mt_send || ||sm,self ||Pass || ||
      ||mt_send2 || ||sm,self ||Pass || ||
      ||mtstartcompl || ||sm,self ||Pass || ||
      ||mt_win || ||sm,self ||Fail ||fails with MPI_ERR_RMA_SYNC error message ||
      ||mt_wincache || ||sm,self ||Pass || ||
      ||mtf2cc2f || ||sm,self ||Fail ||np=4 segv in ompi_osc_rdma_component_finalize ||
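Several tables note that mt_send and mt_send2 pass (they call MPI_Abort on purpose) while MTT reports them as failures. Assuming, as that note suggests, that the harness treats any abnormal termination as a failure, the following C sketch shows one way a harness could account for tests that are expected to abort. `classify()` and `run_child()` are hypothetical helpers, not MTT code; `run_child()` uses fork() and _exit() only to simulate a test's exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef enum { RESULT_PASS, RESULT_FAIL } result_t;

/* Hypothetical classifier, not MTT's logic. A normal test passes only on a
 * clean exit (exit status 0). A test that calls MPI_Abort on purpose
 * (expects_abort) passes only if it terminated abnormally: a nonzero exit
 * status or death by a signal. */
static result_t classify(int wstatus, bool expects_abort)
{
    bool clean_exit = WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0;
    if (expects_abort) {
        return clean_exit ? RESULT_FAIL : RESULT_PASS;
    }
    return clean_exit ? RESULT_PASS : RESULT_FAIL;
}

/* Simulate a test run: fork a child that exits with exit_code and return
 * its wait status, or -1 if fork() or waitpid() fails. */
static int run_child(int exit_code)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        _exit(exit_code);           /* child: end immediately */
    }
    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0) {
        return -1;
    }
    return wstatus;
}
```

Treating a clean exit from an expected-abort test as a failure is a design choice: it catches tests that silently stop reaching their MPI_Abort call.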

UTK

  • Intel Tests
  • Threads Tests

HLRS

  • Threads Tests

    • MPI: built r15661 w/ --enable-mpi-threads --enable-progress-threads
    • Tests: ran with defaults (no arguments)
    • System: Linux EM64T
      ||Test Group ||Test Name ||BTLs ||Status ||Detail ||
      ||MT_sendrecv ||All ||tcp,self ||Pass || ||
      || ||All ||sm,self ||Pass || ||
      ||MT_coll || ||tcp,self ||Pass || ||
      || || ||sm,self ||Fail ||various hangs and segv ||
      ||MT_comm || ||tcp,self ||Fail ||hangs ||
      || || ||sm,self ||Fail ||segv ||
      ||MT_commcaching || ||tcp,self ||false Pass ||messages about truncated receives ||
      || || ||sm,self ||false Pass ||messages about truncated receives ||
      ||MT_env || ||tcp,self ||Fail ||np=4 is OK, otherwise fails assertion and segv ||
      || || ||sm,self ||Fail ||np=4 is OK, otherwise fails assertion and segv ||
      ||MT_greqs || ||tcp,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      || || ||sm,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      ||MT_group || ||tcp,self ||Pass || ||
      || || ||sm,self ||Fail ||segv ||
      ||MT_misc || ||tcp,self ||Pass || ||
      || || ||sm,self ||Pass || ||
      ||MT_pt2pt2 || ||tcp,self ||Pass || ||
      || || ||sm,self ||Fail ||hangs ||
      ||MT_rcvany || ||tcp,self ||Fail ||segv ||
      || || ||sm,self ||Fail ||segv ||
      ||MT_send || ||tcp,self ||Pass ||Intentionally calling MPI_Abort ||
      || || ||sm,self ||Pass ||Intentionally calling MPI_Abort ||
      ||MT_send2 || ||tcp,self ||Pass ||Intentionally calling MPI_Abort ||
      || || ||sm,self ||Pass ||Intentionally calling MPI_Abort ||
      ||MT_win || ||tcp,self ||Fail ||hangs ||
      || || ||sm,self ||Fail ||segv ||
  • Threads Tests

    • MPI: built r15936 w/ --enable-mpi-threads
    • Tests: ran with defaults (no arguments)
    • System: Linux EM64T, 2 nodes / 4 processes
      ||Test Group ||Test Name ||BTLs ||Status ||Detail ||
      ||MT_sendrecv ||All ||tcp,self ||Fail ||segfault ||
      || ||All ||mvapi,self ||Pass || ||
      || ||All ||sm,self ||Pass || ||
      ||MT_coll || ||tcp,self ||Fail ||segfault ||
      || || ||mvapi,self ||Pass || ||
      || || ||sm,self ||Pass || ||
      ||MT_comm || ||tcp,self ||Pass || ||
      || || ||mvapi,self ||Pass || ||
      || || ||sm,self ||Pass || ||
      ||MT_commcaching || ||tcp,self ||false Pass ||messages about truncated receives ||
      || || ||mvapi,self ||false Pass ||messages about truncated receives ||
      || || ||sm,self ||false Pass ||messages about truncated receives ||
      ||MT_env || ||tcp,self ||Fail ||fails assertion and segv ||
      || || ||mvapi,self ||Fail ||fails assertion and segv ||
      || || ||sm,self ||Pass || ||
      ||MT_greqs || ||tcp,self ||Fail ||segfault ||
      || || ||mvapi,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      || || ||sm,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      ||MT_group || ||tcp,self ||Pass || ||
      || || ||mvapi,self ||Pass || ||
      || || ||sm,self ||Pass || ||
      ||MT_misc || ||tcp,self ||Fail ||hang ||
      || || ||mvapi,self ||Fail ||hang ||
      || || ||sm,self ||Fail ||hang ||
      ||MT_pt2pt2 || ||tcp,self ||Fail ||hang ||
      || || ||mvapi,self ||Fail ||hang ||
      || || ||sm,self ||Fail ||hang ||
      ||MT_rcvany || ||tcp,self ||Pass || ||
      || || ||mvapi,self ||Fail ||segfault ||
      || || ||sm,self ||Pass || ||
      ||MT_send || ||tcp,self ||Pass ||Intentionally calling MPI_Abort ||
      || || ||mvapi,self ||Pass ||Intentionally calling MPI_Abort ||
      || || ||sm,self ||Pass ||Intentionally calling MPI_Abort ||
      ||MT_send2 || ||tcp,self ||Pass ||Intentionally calling MPI_Abort ||
      || || ||mvapi,self ||Pass ||Intentionally calling MPI_Abort ||
      || || ||sm,self ||Pass ||Intentionally calling MPI_Abort ||
      ||MT_win || ||tcp,self ||Fail ||segv ||
      || || ||mvapi,self ||Fail ||segv ||
      || || ||sm,self ||Fail ||segv ||

IBM

  • Threads Tests

    • MPI: built r15980 w/ --enable-mpi-threads --with-threads=posix --disable-progress-threads
    • Tests: ran with defaults (no arguments)
    • System: PPC64/SLES10/eHCA, 2 nodes, np=6 (3 processes/node)
      ||Test Group ||Test Name ||BTLs ||Status ||Detail ||
      ||MT_sendrecv ||All ||tcp,sm,self ||Pass || ||
      || ||All ||openib,sm,self ||Pass || ||
      ||MT_coll || ||tcp,sm,self ||Fail ||mca_pml_ob1_irecv obj_magic_id failed assert ||
      || || ||openib,sm,self ||Pass || ||
      ||MT_comm || ||tcp,sm,self ||Fail ||intermittent hangs ||
      || || ||openib,sm,self ||Fail ||intermittent hangs ||
      ||MT_commcaching || ||tcp,sm,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      ||MT_env || ||tcp,sm,self ||Fail ||pcomm_set_errhandler obj_magic_id failed assert ||
      || || ||openib,sm,self ||Fail ||pcomm_set_errhandler obj_magic_id failed assert ||
      ||MT_greqs || ||tcp,sm,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      || || ||openib,sm,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      ||MT_group || ||tcp,sm,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      ||MT_misc || ||tcp,sm,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      ||MT_pt2pt2 || ||tcp,sm,self ||Fail ||hangs ||
      || || ||openib,sm,self ||Fail ||hangs ||
      ||MT_rcvany || ||tcp,sm,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      ||MT_send || ||tcp,sm,self ||Pass ||Intentionally calling MPI_Abort ||
      || || ||openib,sm,self ||Pass ||Intentionally calling MPI_Abort ||
      ||MT_send2 || ||tcp,sm,self ||Pass ||Intentionally calling MPI_Abort ||
      || || ||openib,sm,self ||Pass ||Intentionally calling MPI_Abort ||
      ||MTstartcompl || ||tcp,sm,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      ||MT_win || ||tcp,self ||Fail ||err in MPI_Win_free: error while executing rma sync ||
      || || ||openib,sm,self ||Fail ||err in MPI_Win_free: error while executing rma sync ||
      ||mt_wincache || ||tcp,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      ||mtf2cc2f || ||tcp,self ||Fail ||Error: comparing MPI_LOGICAL{1,2,4} ||
      || || ||openib,sm,self ||Fail ||Error: comparing MPI_LOGICAL{1,2,4} ||
  • Threads Tests

    • MPI: built r15980 w/ --enable-mpi-threads --with-threads=posix --disable-progress-threads
    • Tests: ran with defaults (no arguments)
    • System: x86_64/Fedora 7/mthca, 2 nodes, np=4 (2 processes/node)
      ||Test Group ||Test Name ||BTLs ||Status ||Detail ||
      ||MT_sendrecv ||All ||tcp,sm,self ||Pass || ||
      || ||All ||openib,sm,self ||Pass || ||
      ||MT_coll || ||tcp,sm,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      ||MT_comm || ||tcp,sm,self ||Fail ||intermittent hangs ||
      || || ||openib,sm,self ||Fail ||intermittent hangs ||
      ||MT_commcaching || ||tcp,sm,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      ||MT_env || ||tcp,sm,self ||Fail ||intermittent segv's: bad addr from PMPI_Comm_set_errhandler? ||
      || || ||openib,sm,self ||Fail ||intermittent segv's: bad addr from PMPI_Comm_set_errhandler? ||
      ||MT_greqs || ||tcp,sm,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      || || ||openib,sm,self ||Fail ||MPI_Wait failed with MPI_ERR_TYPE: invalid datatype ||
      ||MT_group || ||tcp,sm,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      ||MT_misc || ||tcp,sm,self ||Fail ||hangs ||
      || || ||openib,sm,self ||Fail ||hangs ||
      ||MT_pt2pt2 || ||tcp,sm,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      ||MT_rcvany || ||tcp,sm,self ||Fail ||occasional hangs ||
      || || ||openib,sm,self ||Pass || ||
      ||MT_send || ||tcp,sm,self ||Pass ||Intentionally calling MPI_Abort ||
      || || ||openib,sm,self ||Pass ||Intentionally calling MPI_Abort ||
      ||MT_send2 || ||tcp,sm,self ||Pass ||Intentionally calling MPI_Abort ||
      || || ||openib,sm,self ||Pass ||Intentionally calling MPI_Abort ||
      ||MTstartcompl || ||tcp,sm,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      ||MT_win || ||tcp,self ||Fail ||err in MPI_Win_free: error while executing rma sync ||
      || || ||openib,sm,self ||Fail ||err in MPI_Win_free: error while executing rma sync ||
      ||mt_wincache || ||tcp,self ||Pass || ||
      || || ||openib,sm,self ||Pass || ||
      ||mtf2cc2f || ||tcp,self ||Fail ||Error: comparing MPI_LOGICAL{1,8} ||
      || || ||openib,sm,self ||Fail ||Error: comparing MPI_LOGICAL{1,8} ||