Skip to content

Fix/nb leaks #8926

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Fix/nb leaks #8926

wants to merge 2 commits into from

Conversation

bosilca
Copy link
Member

@bosilca bosilca commented May 5, 2021

A possible fix for #8763

@bosilca bosilca requested review from jsquyres and ggouaillardet May 5, 2021 04:39
@bosilca bosilca force-pushed the fix/nb_leaks branch 3 times, most recently from a4caa6f to aa31cf8 Compare October 6, 2021 04:59
@open-mpi open-mpi deleted a comment from ibm-ompi Oct 6, 2021
@open-mpi open-mpi deleted a comment from ibm-ompi Oct 6, 2021
@open-mpi open-mpi deleted a comment from ibm-ompi Oct 6, 2021
@bosilca bosilca added this to the v5.0.0 milestone Oct 6, 2021
@open-mpi open-mpi deleted a comment from jjhursey Oct 6, 2021
@bosilca
Copy link
Member Author

bosilca commented Oct 6, 2021

We need to converge on this and once we reached a consensus we need to apply the same logic to basically all the non-blocking Fortran functions. This includes:

  • ineighbor_alltoallw_f.c,
  • ineighbor_alltoallv_f.c,
  • neighbor_allgatherv_init_f.c,
  • neighbor_alltoallw_init_f.c,
  • neighbor_alltoallv_init_f.c,
  • ineighbor_allgatherv_f.c,
  • igatherv_f.c,
  • iallgatherv_f.c,
  • iscatterv_f.c,
  • ialltoallv_f.c,
  • ialltoallw_f.c,
  • ireduce_scatter_f.c,
  • reduce_scatter_init_f.c,
  • alltoallv_init_f.c,
  • alltoallw_init_f.c,
  • allgatherv_init_f.c,
  • gatherv_init_f.c,
  • scatterv_init_f.c.

Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will approve once the other functions are in 👍

@open-mpi open-mpi deleted a comment from ibm-ompi Oct 6, 2021
@open-mpi open-mpi deleted a comment from ibm-ompi Oct 6, 2021
@open-mpi open-mpi deleted a comment from ibm-ompi Oct 6, 2021
Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small formatting nitpick, looks good otherwise.

@bosilca bosilca force-pushed the fix/nb_leaks branch 2 times, most recently from 7507a31 to 301c45e Compare October 7, 2021 19:18
@gpaulsen
Copy link
Member

gpaulsen commented Nov 1, 2021

Looks like Mellanox build failed with two compile errors:

                        ^
coll_hcoll_rte.c: In function 'get_coll_handle':
coll_hcoll_rte.c:355:19: error: 'struct <anonymous>' has no member named 'objs'
     ompi_req->data.objs.objs[0]          = NULL;
                   ^
coll_hcoll_rte.c:356:19: error: 'struct <anonymous>' has no member named 'objs'
     ompi_req->data.objs.objs[1]          = NULL;
                   ^

@open-mpi open-mpi deleted a comment from jsquyres Nov 16, 2021
@open-mpi open-mpi deleted a comment from azure-pipelines bot Nov 16, 2021
@open-mpi open-mpi deleted a comment from janjust Nov 16, 2021
@open-mpi open-mpi deleted a comment from azure-pipelines bot Nov 16, 2021
@bosilca
Copy link
Member Author

bosilca commented Nov 16, 2021

I don't have hcoll on mu clusters, but I think I pushed a fix for this compile error. The PR is ready for final review and them merge. @ggouaillardet and @devreal please one last review, maybe we can make it in the pending releases.

@gpaulsen gpaulsen requested a review from devreal November 30, 2021 19:42
@gpaulsen
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bosilca bosilca force-pushed the fix/nb_leaks branch 2 times, most recently from bc791a2 to 301c45e Compare December 28, 2021 19:27
If Fortran and C integers have different sizes we allocate the storage
for the C objects in the MPI API. For nonblocking functions we need to
release these temporary arrays upon completion of the request in order
to avoid memory leaks.

Signed-off-by: George Bosilca <[email protected]>
@gpaulsen
Copy link
Member

@ggouaillardet @devreal can you please re-review?

@gpaulsen
Copy link
Member

gpaulsen commented Feb 9, 2022

@devreal @ggouaillardet ?

@gpaulsen
Copy link
Member

gpaulsen commented Feb 9, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Austen Lauria <[email protected]>
(cherry picked from commit 4a1c29a)
@awlauria
Copy link
Contributor

@bosilca I fixed the compile error. Does that look right?

@janjust ^^ this touches hcoll.

@awlauria
Copy link
Contributor

If that's right can you squash and rebase on current main?

@awlauria
Copy link
Contributor

awlauria commented Apr 18, 2022

Closing for #10288

It's a squash of the two commits here and a rebase on top of main.

@awlauria awlauria closed this Apr 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants