ompi5.0.0rc10 - mapping issues - PMIX #11450


Closed
thomasgillis opened this issue Feb 27, 2023 · 47 comments

@thomasgillis

Background information

I get PMIX errors when I request non-straightforward mpiexec bindings.
I have 2 nodes of 128 cores each; with 2 MPI processes and 16 threads each, I am trying to get:

node 0: core:[0-15]
node 1: core:[0-15]

To achieve this I use

mpiexec -np 2 --map-by ppr:1:node:pe=16

But then I get some errors from PMIX:

PMIX ERROR: OUT-OF-RESOURCE in file base/bfrop_base_unpack.c at line 750

I am not 100% confident about the --map-by ppr:1:node:pe=16 command, but the segfault from PMIx seems suspicious as well.
Is my command correct? Is there something I need to change to get rid of the PMIx error?

details:

  • ompi-5.0.0rc10 built from source
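For reference, a minimal sketch of how the intended layout can be checked (the hostfile path and the test binary are illustrative, assuming a 2-node allocation):

mpiexec -np 2 --map-by ppr:1:node:pe=16 --report-bindings --hostfile ./hostfile ./osu_bw
# expected: one rank per node, each bound to cores 0-15, i.e.
#   Rank 0 bound to package[0][core:0-15]   (node 0)
#   Rank 1 bound to package[0][core:0-15]   (node 1)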
@thomasgillis thomasgillis changed the title ompi5.0.0rc10 - binding to core with threads ompi5.0.0rc10 - binding issues - PMIX Feb 27, 2023
@hppritcha
Member

There's a remote chance this may be related to the use of CMA.
Could you rerun your test with

--mca smsc none

and see whether you still get the PMIX error?

@hppritcha
Member

so did PMIx still emit the error message with CMA disabled?

@thomasgillis
Author

thomasgillis commented Feb 27, 2023

here is what I get:

==================================================================
 mpi cmd:  /home/users/u100155/lib-OMPI-5.0.0rc10-UCX-1.13.1/bin/mpiexec -np 2 --map-by ppr:1:node:pe=16 --mca smsc none
------------------------------------------------------------------
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      mel0429
Framework: smsc
Component: none
--------------------------------------------------------------------------
^@--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  mca_base_framework_open on opal_smsc failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
^@*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and MPI will try to terminate your MPI job as well)
[mel0429:18037] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
prterun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [prterun-mel0429-18023@1,0]
  Exit code:    14
--------------------------------------------------------------------------

apparently I don't have smsc, is that expected?

@hppritcha
Member

Okay, I got the syntax wrong; try

--mca smsc ^cma

@hppritcha
Member

smsc is an optimization for intra-node long message transfers but is not essential for correct operation.
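As a side check, a quick sketch (output format approximate) of how to list which smsc components were actually built into an install; "none" is not a component name, which is why the earlier run aborted:

ompi_info | grep -i "MCA smsc"
# e.g.:
#   MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.0.0)
#   MCA smsc: xpmem (MCA v2.1.0, API v1.0.0, Component v5.0.0)   <- only if built with XPMEM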

@thomasgillis
Author

the PMIX error is still there:

 mpi cmd:  /home/users/u100155/lib-OMPI-5.0.0rc10-UCX-1.13.1/bin/mpiexec -np 2 --map-by ppr:1:node:pe=16 --mca smsc ^cma
------------------------------------------------------------------
[mel0429:29695] PMIX ERROR: OUT-OF-RESOURCE in file base/bfrop_base_unpack.c at line 750

@thomasgillis
Author

@hppritcha FYI, when the binding leads to an intra-node layout, the error is not there:

 mpi cmd:  /home/users/u100155/lib-OMPI-5.0.0rc10-UCX-1.13.1/bin/mpiexec -np 2 --map-by ppr:1:node:pe=16 --report-bindings --hostfile /home/users/u100155/temp_hostfile_2_16.txt
------------------------------------------------------------------
[mel0429:06002] Rank 0 bound to package[0][core:0-15]
[mel0430:03768] Rank 1 bound to package[0][core:0-15]
[mel0429:06002] PMIX ERROR: OUT-OF-RESOURCE in file base/bfrop_base_unpack.c at line 750

vs

==================================================================
 mpi cmd:  /home/users/u100155/lib-OMPI-5.0.0rc10-UCX-1.13.1/bin/mpiexec -np 2 --cpus-per-rank 16 --report-bindings --hostfile nodefile
------------------------------------------------------------------
[mel0429:13751] Rank 0 bound to package[0][core:0-15]
[mel0429:13751] Rank 1 bound to package[0][core:16-31]

@rhc54
Contributor

rhc54 commented Feb 27, 2023

The command is correct and leads to the correct mapping, at least on PRRTE master:

$ prterun --prtemca ras_simulator_num_nodes 2 --prtemca hwloc_use_topo_file /Users/rhc/pmix/topologies/summit.h17n08.lstopo-2.2.0.xml --map-by ppr:1:node:pe=16 --display map-devel hostname

=================================   JOB MAP   =================================
Data for JOB prterun-Ralphs-iMac-2-22235@1 offset 0 Total slots allocated 84
Mapper requested: NULL  Last mapper: ppr  Mapping policy: BYNODE:NOOVERSUBSCRIBE  Ranking policy: SLOT
Binding policy: HWTHREAD:IF-SUPPORTED  Cpu set: N/A  PPR: 1:node  Cpus-per-rank: 16  Cpu Type: HWT
Num new daemons: 0	New daemon starting vpid INVALID
Num nodes: 2

Data for node: nodeA0	State: 3	Flags: MAPPED:SLOTS_GIVEN
        Daemon: [prterun-Ralphs-iMac-2-22235@0,1]	Daemon launched: False
            Num slots: 42	Slots in use: 1	Oversubscribed: FALSE
            Num slots allocated: 42	Max slots: 42	Num procs: 1
        Data for proc: [prterun-Ralphs-iMac-2-22235@1,0]
                Pid: 0	Local rank: 0	Node rank: 0	App rank: 0
                State: INITIALIZED	App_context: 0
        	Binding: package[0][hwt:0-15]

Data for node: nodeA1	State: 3	Flags: MAPPED:SLOTS_GIVEN
        Daemon: [prterun-Ralphs-iMac-2-22235@0,2]	Daemon launched: False
            Num slots: 42	Slots in use: 1	Oversubscribed: FALSE
            Num slots allocated: 42	Max slots: 42	Num procs: 1
        Data for proc: [prterun-Ralphs-iMac-2-22235@1,1]
                Pid: 0	Local rank: 0	Node rank: 0	App rank: 1
                State: INITIALIZED	App_context: 0
        	Binding: package[0][hwt:0-15]

Warning: This map has been generated with the DONOTLAUNCH option;
	The compute node architecture has not been probed, and the displayed
	map reflects the HEADNODE ARCHITECTURE. On systems with a different
	architecture between headnode and compute nodes, the map can be
	displayed using `prte --display map /bin/true`, which will launch
	enough of the DVM to probe the compute node architecture.

=============================================================

I used the topology from Summit as it matches the one described.

@thomasgillis
Author

thomasgillis commented Feb 27, 2023

@rhc54 OK, thanks for the confirmation.
Any idea about the error from PMIx?
The system I use is InfiniBand (so I have built ucx-1.13.1) :-)

EDIT: FYI, I have built rc9 and everything works fine, so I guess the problem was introduced recently:

==================================================================
 mpi cmd:  /home/users/u100155/lib-OMPI-5.0.0rc9-UCX-1.13.1/bin/mpiexec -np 2 --map-by ppr:1:node:pe=16 --report-bindings
------------------------------------------------------------------
[mel0385:19899] Rank 0 bound to package[0][core:0-15]
[mel0386:30153] Rank 1 bound to package[0][core:0-15]

@rhc54
Contributor

rhc54 commented Feb 27, 2023

No ideas, I'm afraid. Looks like it is coming from an application process? If so, then I think I've seen some OMPI bug reports about incorrect data retrieval for IB transports - not sure if anyone has addressed those.

@hppritcha
Member

Is there anything special about the application you are trying to launch? I'd like to be able to reproduce it.

@thomasgillis
Author

No, it's a ping-pong/OSU bandwidth measurement.
I use MPI_THREAD_MULTIPLE, but that might not be relevant to the issue.

@jsquyres jsquyres added this to the v5.0.0 milestone Feb 28, 2023
@hppritcha hppritcha self-assigned this Feb 28, 2023
@jsquyres jsquyres changed the title ompi5.0.0rc10 - binding issues - PMIX ompi5.0.0rc10 - mapping issues - PMIX Feb 28, 2023
@hppritcha
Member

could you try running with

--mca pml ^ucx

@thomasgillis
Author

thomasgillis commented Feb 28, 2023

could you try running with

--mca pml ^ucx

It goes through (no segfault), but the error is still there:

 mpi cmd:  /home/users/u100155/lib-OMPI-5.0.0rc10-UCX-1.13.1/bin/mpiexec -np 2 --map-by ppr:1:node:pe=16 --report-bindings --mca pml ^ucx
------------------------------------------------------------------
[mel0237:11996] Rank 0 bound to package[0][core:0-15]
[mel0239:05266] Rank 1 bound to package[0][core:0-15]
[mel0237:11996] PMIX ERROR: OUT-OF-RESOURCE in file base/bfrop_base_unpack.c at line 750
[mel0237:11996] PMIX ERROR: OUT-OF-RESOURCE in file base/bfrop_base_unpack.c at line 750

@jsquyres
Member

It might be worth trying with non-MPI executables to see if the problem is in the OMPI stack or in PRTE.

@hppritcha
Member

I'm having problems trying to reproduce this with main.

@thomasgillis
Author

@hppritcha here is the script I have just run:

mpi_dir=${HOME}/lib-OMPI-5.0.0rc9-UCX-1.13.1
${mpi_dir}/bin/ompi_info
${mpi_dir}/bin/mpiexec --np 2 --map-by ppr:1:node:pe=1 --report-bindings true
${mpi_dir}/bin/mpiexec --np 2 --map-by ppr:1:node:pe=1 --report-bindings ${mpi_dir}/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw

mpi_dir=${HOME}/lib-OMPI-5.0.0rc10-UCX-1.13.1
${mpi_dir}/bin/ompi_info
${mpi_dir}/bin/mpiexec --np 2 --map-by ppr:1:node:pe=1 --report-bindings true
${mpi_dir}/bin/mpiexec --np 2 --map-by ppr:1:node:pe=1 --report-bindings ${mpi_dir}/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw

I think this should answer @jsquyres' comment as well: true works fine, while osu_bw prints the PMIX error but seems to run. Here is my Slurm output:
slurm-288631.out.txt

@rhc54
Contributor

rhc54 commented Feb 28, 2023

Not a surprise - as I noted above, the error report is coming from the application process, not the PRRTE daemon.

@thomasgillis
Author

thomasgillis commented Feb 28, 2023

@rhc54 Side question: is there an env variable I can set to replace --map-by ppr:1:node:pe=16? I have tried export PRRTE_MCA_rmaps_default_mapping_policy=ppr:1:node:pe=16 with no success.

@rhc54
Contributor

rhc54 commented Feb 28, 2023

Try it with just one 'R' in the name: PRTE_MCA_rmaps_default_mapping_policy=ppr:1:node:pe=16
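A minimal sketch of setting that in the environment (value copied from the command line above; whether ppr is accepted as a default is exactly what is being tested here):

export PRTE_MCA_rmaps_default_mapping_policy=ppr:1:node:pe=16
mpiexec -np 2 --report-bindings ./osu_bw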

@thomasgillis
Author

thomasgillis commented Feb 28, 2023

Try it with just one 'R' in the name: PRTE_MCA_rmaps_default_mapping_policy=ppr:1:node:pe=16

@rhc54 I have tried that and I get the following error:

--------------------------------------------------------------------------
A mapping policy was provided that is not supported as a default value:

  Policy:  ppr

You can provide this policy on a per-job basis, but it cannot
be the default setting.
--------------------------------------------------------------------------

BTW, in the docs it is written with two Rs :-)

@rhc54
Contributor

rhc54 commented Feb 28, 2023

Ah, that's an OMPI doc - not mine. 😄

I had forgotten that we don't allow ppr as a default - I believe that was because of PRRTE's origins as a persistent DVM (prior to OMPI re-using it as mpirun). I'll have to look at how we might handle that case.

@thomasgillis
Author

ok, let me know what the solution is then :-) thx!

@hppritcha
Member

Still trying to reproduce this. I thought UCX might be causing a problem, but I'm not seeing an issue with these mpirun options and osu_bw; it's working for me with 5.0.0rc10.

@thomasgillis
Author

Where do you run the tests?
I don't think it's UCX-related; I have seen similar issues on Perlmutter (OFI).
I can try to reproduce there if it helps.

@hppritcha
Member

I was using an IB/aarch64 cluster. I do have accounts on the JLSE cluster, Polaris, and Sunspot; if you were hitting this on one of those systems, I can try there.

@hppritcha
Member

please try to reproduce on perlmutter.

@hppritcha
Member

One thing I noticed in the ompi_info you posted that is different from mine is the smsc xpmem option. It's kind of a long shot, but could you rerun the 5.0.0rc10 build with

--mca smsc ^xpmem

included on the mpiexec command line?

@thomasgillis
Author

Sorry for the late response.
I have tried with your module on Perlmutter (/global/common/software/m3169/perlmutter/openmpi/5.0.0rc10-ofi-cuda-22.5_11.7/gnu/bin/mpiexec) and I get the same warning message:

[nid004692:163609] PMIX ERROR: OUT-OF-RESOURCE in file base/bfrop_base_unpack.c at line 750

@rhc54
Contributor

rhc54 commented Mar 13, 2023

Ensure you have a debug build (i.e., configure with --enable-debug) and then set OMPI_MCA_pmix_base_verbose=10 in your environment and run your test again. Should tell us where the problem is being hit.
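A hedged sketch of those steps (paths and launch details are illustrative; any site-specific configure options, e.g. --with-ucx, would need to be added):

# in the ompi-5.0.0rc10 source tree
./configure --prefix=$HOME/ompi-5.0.0rc10-debug --enable-debug
make -j 8 install
# rerun the failing case with PMIx verbosity enabled
export OMPI_MCA_pmix_base_verbose=10
$HOME/ompi-5.0.0rc10-debug/bin/mpiexec -np 2 --map-by ppr:1:node:pe=16 ./osu_bw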

@thomasgillis
Author

thomasgillis commented Mar 13, 2023

Here is the output:

[nid004616:52327] mca: base: components_register: registering framework pmix components
[nid004616:52327] mca: base: components_open: opening pmix components
[nid004617:171294] mca: base: components_register: registering framework pmix components
[nid004617:171294] mca: base: components_open: opening pmix components
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:631] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],0] KEY pmix.hname
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:642] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],0] KEY pmix.lrank
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:657] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],0] KEY pmix.nrank
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:676] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.job.size
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:691] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.univ.size
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:707] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.job.napps
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:716] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],0] KEY pmix.appnum
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:752] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.app.argv
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:772] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],0] KEY pmix.reinc
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:781] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.local.size
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:789] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.tmpdir
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:804] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.nsdir
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:819] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],0] KEY pmix.pdir
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:835] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.wdir
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:848] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],0] KEY pmix.cpuset
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:859] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],0] KEY pmix.locstr
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:870] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.lpeers
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:930] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.ndosub
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:631] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],1] KEY pmix.hname
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:642] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],1] KEY pmix.lrank
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:657] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],1] KEY pmix.nrank
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:676] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.job.size
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:691] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.univ.size
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:707] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.job.napps
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:716] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],1] KEY pmix.appnum
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:752] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.app.argv
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:772] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],1] KEY pmix.reinc
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:781] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.local.size
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:789] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.tmpdir
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:804] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.nsdir
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:819] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],1] KEY pmix.pdir
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:835] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.wdir
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:848] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],1] KEY pmix.cpuset
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:859] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],1] KEY pmix.locstr
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:870] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.lpeers
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:930] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.ndosub
[nid004616:52327] [[4349,1],0][base/hwloc_base_util.c:218] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.hwlocfile
[nid004616:52327] [[4349,1],0][base/hwloc_base_util.c:220] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.hwlocaddr
[nid004616:52327] [[4349,1],0][base/hwloc_base_util.c:222] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.hwlocsize
[nid004616:52327] [[4349,1],0][base/hwloc_base_util.c:262] MODEX RECV VALUE IMMEDIATE FOR PROC [[4349,1],WILDCARD] KEY pmix.hwlocxml2
[nid004617:171294] [[4349,1],1][base/hwloc_base_util.c:218] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.hwlocfile
[nid004617:171294] [[4349,1],1][base/hwloc_base_util.c:220] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.hwlocaddr
[nid004617:171294] [[4349,1],1][base/hwloc_base_util.c:222] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.hwlocsize
[nid004617:171294] [[4349,1],1][base/hwloc_base_util.c:262] MODEX RECV VALUE IMMEDIATE FOR PROC [[4349,1],WILDCARD] KEY pmix.hwlocxml2
[nid004616:52327] [[4349,1],0][common_ofi.c:566] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.pkgrank
[nid004616:52327] [[4349,1],0][common_ofi.c:573] MODEX RECV VALUE FOR PROC [[4349,1],WILDCARD] KEY pmix.lpeers
[nid004616:52327] [[4349,1],0][common_ofi.c:586] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],0] KEY pmix.locstr
[nid004617:171294] [[4349,1],1][common_ofi.c:566] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.pkgrank
[nid004617:171294] [[4349,1],1][common_ofi.c:573] MODEX RECV VALUE FOR PROC [[4349,1],WILDCARD] KEY pmix.lpeers
[nid004617:171294] [[4349,1],1][common_ofi.c:586] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],1] KEY pmix.locstr
[nid004617:171294] [[4349,1],1][common_ofi.c:566] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.pkgrank
[nid004617:171294] [[4349,1],1][common_ofi.c:573] MODEX RECV VALUE FOR PROC [[4349,1],WILDCARD] KEY pmix.lpeers
[nid004617:171294] [[4349,1],1][common_ofi.c:586] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],1] KEY pmix.locstr
[nid004616:52327] [[4349,1],0][common_ofi.c:566] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.pkgrank
[nid004616:52327] [[4349,1],0][common_ofi.c:573] MODEX RECV VALUE FOR PROC [[4349,1],WILDCARD] KEY pmix.lpeers
[nid004616:52327] [[4349,1],0][common_ofi.c:586] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],0] KEY pmix.locstr
[nid004616:52327] [[4349,1],0][proc/proc.c:315] MODEX RECV VALUE FOR PROC [[4349,1],WILDCARD] KEY pmix.lpeers
[nid004616:52327] [[4349,1],0][mtl_ofi.c:271] MODEX RECV FOR PROC [[4349,1],0] KEY mtl.ofi.5.0
[nid004616:52327] [[4349,1],0][mtl_ofi.c:271] MODEX RECV STRING FOR PROC [[4349,1],0] KEY mtl.ofi.5.0
[nid004617:171294] [[4349,1],1][proc/proc.c:315] MODEX RECV VALUE FOR PROC [[4349,1],WILDCARD] KEY pmix.lpeers
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:1109] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.dbg.notify
[nid004617:171294] [[4349,1],1][base/pml_base_select.c:294] MODEX RECV STRING FOR PROC [[4349,1],0] KEY pml.base.2.0
[nid004617:171294] [[4349,1],1][mtl_ofi.c:271] MODEX RECV FOR PROC [[4349,1],1] KEY mtl.ofi.5.0
[nid004617:171294] [[4349,1],1][mtl_ofi.c:271] MODEX RECV STRING FOR PROC [[4349,1],1] KEY mtl.ofi.5.0
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:1109] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.dbg.notify
[nid004616:52327] [[4349,1],0][communicator/comm_init.c:250] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.mapby
[nid004617:171294] [[4349,1],1][communicator/comm_init.c:250] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.mapby
[nid004616:52327] [[4349,1],0][runtime/ompi_rte.c:1109] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.dbg.notify
[nid004617:171294] [[4349,1],1][runtime/ompi_rte.c:1109] MODEX RECV VALUE OPTIONAL FOR PROC [[4349,1],WILDCARD] KEY pmix.dbg.notify
[nid004616:52327] [[4349,1],0][mtl_ofi.c:271] MODEX RECV FOR PROC [[4349,1],1] KEY mtl.ofi.5.0
[nid004616:52327] [[4349,1],0][mtl_ofi.c:271] MODEX RECV STRING FOR PROC [[4349,1],1] KEY mtl.ofi.5.0
[nid004617:171294] [[4349,1],1][base/pml_base_select.c:294] MODEX RECV STRING FOR PROC [[4349,1],0] KEY pml.base.2.0
[nid004617:171294] [[4349,1],1][mtl_ofi.c:271] MODEX RECV FOR PROC [[4349,1],0] KEY mtl.ofi.5.0
[nid004617:171294] [[4349,1],1][mtl_ofi.c:271] MODEX RECV STRING FOR PROC [[4349,1],0] KEY mtl.ofi.5.0

@hppritcha
Member

Okay, I think this PMIX OUT-OF-RESOURCE message is associated with the "estimated size" feature in PRRTE. If one moves the SHA for prrte ahead to 10496e38a0b54722723ec83923f6311ec82d692b (in the v5.0.0rc10 tag checkout), the problem appears to disappear. Note that if one does this SHA advance, one also has to advance the pmix SHA, the oac submodule SHAs, etc.
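A rough sketch of that SHA advance (the 3rd-party submodule paths are assumptions based on the usual ompi repository layout; the matching pmix and oac SHAs still have to be picked by hand):

git clone --recurse-submodules https://github.com/open-mpi/ompi.git
cd ompi
git checkout v5.0.0rc10
git submodule update --init --recursive
# advance prrte to the commit mentioned above
(cd 3rd-party/prrte && git fetch origin && git checkout 10496e38a0b54722723ec83923f6311ec82d692b)
# 3rd-party/openpmix and the oac submodules need matching advances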

@rhc54
Contributor

rhc54 commented Mar 13, 2023

Interesting - note that the error log did not appear once --enable-debug was provided. Just normal modex recv messages.

@hppritcha
Member

The OUT-OF-RESOURCE message doesn't appear at the head of 5.0.x. Also, additional debug statements show that the "estimated size" key and its associated activities are no longer present.

@rhc54
Contributor

rhc54 commented Mar 13, 2023

Guess I'm getting confused - I didn't ask @thomasgillis to update submodule pointers, just to enable debug. Were the submodule pointers also advanced??

@hppritcha
Member

I got suspicious about this after talking with Thomas here at the Forum, and because I was having problems reproducing it in my 5.0.x sandbox.

@thomasgillis could you clone ompi (and check out v5.0.x) and see if the PMIx error messages vanish for you?

@rhc54
Contributor

rhc54 commented Mar 13, 2023

I had forgotten that we don't allow ppr as a default - I believe that was because of PRRTE's origins as a persistent DVM (prior to OMPI re-using it as mpirun). I'll have to look at how we might handle that case.

Took a small amount of code, but I have enabled ppr to be provided as a default mapping policy. It is in PRRTE now and I am updating the (patiently waiting for some time now) OMPI PRs.

@Xiang-cd

Xiang-cd commented Oct 7, 2023

PMIX ERROR: OUT-OF-RESOURCE in file base/bfrop_base_unpack.c at line 750
Same issue here, Open MPI version 4.1.6.
Running on one machine is fine; this error appears when running across multiple machines.

@rhc54
Contributor

rhc54 commented Oct 7, 2023

not enough info there to do anything - please explain what you did, your cmd line, etc.

@Xiang-cd

Xiang-cd commented Oct 7, 2023

not enough info there to do anything - please explain what you did, your cmd line, etc.

Sorry. I am trying to run a program based on the StarPU runtime system with multiple processes; each process starts several threads.
hw info: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz, IB card
system info: Linux i4 6.1.0-11-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4 (2023-08-08) x86_64 GNU/Linux
system software: openmpi@4.1.6 built with Spack, and ucx, both CUDA-aware; StarPU version 1.4.0
running command:

spack load openmpi intel-oneapi-mkl [email protected]
export STARPU_COMM_STATS=1
`which mpirun` -np 6 -x LD_LIBRARY_PATH -x STARPU_COMM_STATS --bind-to none -hostfile hostfile --rankfile rankfile chameleon_dtesting --mtxfmt 1 --nowarmup -l 5 --uplo 1 -s -t 30 -o potrf -b 300 -n 300000 -r 0 -D 0 -P 0 -F 6 -R 4 -v 1

This runs with 6 processes; each process starts 30 threads (plus an additional management thread).
hostfile:

i1 slots=64
i2 slots=64
i3 slots=64
i4 slots=64

rankfile:

rank 0=i1 slot=0-31
rank 1=i1 slot=32-63
rank 2=i2 slot=0-31
rank 3=i2 slot=32-63
rank 4=i4 slot=0-31
rank 5=i4 slot=32-63

@rhc54
Contributor

rhc54 commented Oct 8, 2023

I won't have a chance to look at this until late this week. One thing that stands out, though, is that --bind-to none makes no sense with the rankfile placement policy. Rankfile requires that we bind the process to the specified cpus - otherwise, why are you bothering to specify them?

It looks like you are trying to have two procs on each node, each bound to half of the cpus. If that's the case, then why not just --map-by ppr:2:node:pe=32? Or, since you have 64 cpus on each node, you could just --map-by :pe=32. Each proc will be assigned 32 sequential cpus, and since overload isn't allowed, that means only 2 procs can fit on each node. You appear to be skipping i3, so something like this should do the trick:

mpirun --host ^i3 --map-by :pe=32 ...
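A hedged sketch of that suggestion filled in with the arguments from the report above (keeping the original hostfile and adding --report-bindings to confirm the placement):

mpirun -np 6 -hostfile hostfile --host ^i3 --map-by :pe=32 --report-bindings \
    -x LD_LIBRARY_PATH -x STARPU_COMM_STATS \
    chameleon_dtesting --mtxfmt 1 --nowarmup -l 5 --uplo 1 -s -t 30 \
    -o potrf -b 300 -n 300000 -r 0 -D 0 -P 0 -F 6 -R 4 -v 1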

@Xiang-cd

Xiang-cd commented Oct 9, 2023

Thanks, I learned a lot about the command usage. However, process placement seems irrelevant to the PMIX OUT-OF-RESOURCE error; I've tried reducing the number of threads, but the error still appears. Furthermore, my program ends normally; my concern is that this error will affect performance.

@rhc54
Contributor

rhc54 commented Oct 9, 2023

my program ends normally

Sigh - you really need to provide more complete issue reports. This is an important piece of information.

openmpi@4.1.6

This issue was opened relative to an OMPI v5 release candidate. Piggy-backing on it about a different release series totally confuses the problem.

Please don't do that. Open a new issue that clearly explains the version you are using, what you did, and the problem you are concerned about.

Meantime, you might want to try updating OMPI to the most recent release - in the 4.2 series, I believe. They will need to help you from there as I don't support PMIx back that far (the embedded version is a few release series old).

@hppritcha
Member

I am not seeing this using 5.0.0rc15 on the Perlmutter GPU/CPU partitions with a non-CUDA executable. I will try to get the module files and permissions set for access at

/global/common/software/m3169/perlmutter/openmpi/5.0.0rc15-ofi-cuda-22.7_11.7

tomorrow.

@jsquyres jsquyres modified the milestones: v5.0.0, v5.0.1 Oct 30, 2023
@hppritcha
Member

is this problem still being observed with the Open MPI 5.0.0 release?

@hppritcha
Member

please reopen this issue if you observe this problem with the 5.0.0 release.

@HamedSharifian

I get the same on OMPI v5.0.0rc12 and UCX v1.13.1. I'm running a simple application with 320 ranks over 10 exclusive nodes (each with 40 cores and 32 ranks). I haven't changed the mapping/binding settings. It prints this error but doesn't terminate my application. The application does its job and terminates without any problems; I just see this error in the output file.
