Skip to content

Clear tracker for the current pool on destroy #759

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ldorau
Copy link
Contributor

@ldorau ldorau commented Sep 26, 2024

Description

Clear tracker for the current pool when destroying the pool
and clear the whole tracker when destroying the UMF library.
Print error messages about unfreed allocations left in the tracker
only if provider supports the free() operation.

Checklist

  • Code compiles without errors locally
  • All tests pass locally
  • CI workflows execute properly

@ldorau ldorau requested a review from a team as a code owner September 26, 2024 09:22
@lukaszstolarczuk
Copy link
Contributor

do we have any place where this was failing (without your change)?

@ldorau
Copy link
Contributor Author

ldorau commented Sep 26, 2024

do we have any place where this was failing (without your change)?

We do not parse logs in our CI tests.

@ldorau ldorau force-pushed the Check_if_the_provider_supports_the_free_operation branch from e14fcbd to abf8876 Compare September 26, 2024 10:32
@vinser52
Copy link
Contributor

I do not understand why this logic is needed. I thought that free is a mandatory operation for all memory providers, isn't it?

@ldorau
Copy link
Contributor Author

ldorau commented Sep 26, 2024

I do not understand why this logic is needed. I thought that free is a mandatory operation for all memory providers, isn't it?

The DevDax memory provider:
https://github.com/oneapi-src/unified-memory-framework/blob/main/README.md#devdax-memory-provider-linux-only

and the file memory provider:
https://github.com/oneapi-src/unified-memory-framework/blob/main/README.md#file-memory-provider-linux-only-yet

do not support the free() operation - they both free whole available memory in the destroy() operation.

@vinser52
Copy link
Contributor

I do not understand why this logic is needed. I thought that free is a mandatory operation for all memory providers, isn't it?

The DevDax memory provider: https://github.com/oneapi-src/unified-memory-framework/blob/main/README.md#devdax-memory-provider-linux-only

and the file memory provider: https://github.com/oneapi-src/unified-memory-framework/blob/main/README.md#file-memory-provider-linux-only-yet

do not support the free() operation - they both free whole available memory in the destroy() operation.

I missed that (probably because I was on vacation), what is the reason for such a limitation? Does it mean that only jemalloc pool can be used today with these memory providers?

@ldorau
Copy link
Contributor Author

ldorau commented Sep 26, 2024

Does it mean that only jemalloc pool can be used today with these memory providers?

Only the jemalloc pool or the course provider (#715 - hopefully it will be merged soon).

@ldorau
Copy link
Contributor Author

ldorau commented Sep 26, 2024

I missed that (probably because I was on vacation), what is the reason for such a limitation?

The reason is to keep these providers as simple as possible. The course provider will take over the memory management for them.

@ldorau ldorau changed the title Check if the provider supports the free() operation Call check_if_tracker_is_empty() conditionally Sep 27, 2024
@ldorau ldorau force-pushed the Check_if_the_provider_supports_the_free_operation branch from abf8876 to 34197ef Compare September 27, 2024 07:09
@vinser52
Copy link
Contributor

The reason is to keep these providers as simple as possible. The course provider will take over the memory management for them.

I look from the user perspective, does "keep these providers as simple as possible" mean that we simplify things for us (UMF developers) but complicate it for the users?

Another question regarding correctness: when memory tracker is cleared? When the memory provider allocates a memory region it is added to the memory tracker, but if the pool implementation does not release memory back and relies on the fact that the memory provider implementation will release virtual memory mapping who will cleanup the memory tracker? If it is not done then it is an error, because after the memory provider is destroyed the virtual memory address range might be re-used by other memory providers and the same virtual memory region might be added to the tracker and it will fail because the same virtual memory region already presented in the tracker.

@ldorau ldorau changed the title Call check_if_tracker_is_empty() conditionally Clear tracker for the pool if provider does not support free() operation Sep 27, 2024
@ldorau ldorau force-pushed the Check_if_the_provider_supports_the_free_operation branch from 34197ef to b147c91 Compare September 27, 2024 11:59
@ldorau
Copy link
Contributor Author

ldorau commented Sep 27, 2024

Another question regarding correctness: when memory tracker is cleared? When the memory provider allocates a memory region it is added to the memory tracker, but if the pool implementation does not release memory back and relies on the fact that the memory provider implementation will release virtual memory mapping who will cleanup the memory tracker? If it is not done then it is an error, because after the memory provider is destroyed the virtual memory address range might be re-used by other memory providers and the same virtual memory region might be added to the tracker and it will fail because the same virtual memory region already presented in the tracker.

Right. So I replaced it with clearing the tracker for the current pool - please re-review

@ldorau ldorau force-pushed the Check_if_the_provider_supports_the_free_operation branch 2 times, most recently from 120b06d to 3d3adf1 Compare September 28, 2024 16:37
@ldorau ldorau changed the title Clear tracker for the pool if provider does not support free() operation Clear tracker for the current pool on destroy Sep 28, 2024
@ldorau ldorau force-pushed the Check_if_the_provider_supports_the_free_operation branch 2 times, most recently from d7d4afd to 04a55e3 Compare September 30, 2024 06:44
@ldorau
Copy link
Contributor Author

ldorau commented Sep 30, 2024

Clear tracker for the current pool when destroying the pool
and clear the whole tracker when destroying the UMF library.
Print error messages about unfreed allocations left in the tracker
only if provider supports the free() operation.

@vinser52
Copy link
Contributor

Another question regarding correctness: when memory tracker is cleared? When the memory provider allocates a memory region it is added to the memory tracker, but if the pool implementation does not release memory back and relies on the fact that the memory provider implementation will release virtual memory mapping who will cleanup the memory tracker? If it is not done then it is an error, because after the memory provider is destroyed the virtual memory address range might be re-used by other memory providers and the same virtual memory region might be added to the tracker and it will fail because the same virtual memory region already presented in the tracker.

Right. So I replaced it with clearing the tracker for the current pool - please re-review

I still think that not supporting free operation in the memory provider is weird:

  • First, it means that not every pool manager can be used with such memory provider or user needs to use coarse provider on top of upstream provider that does not support free operation.
  • Second, in this PR you need to iterate via the tracker (that contains records from all pools) and filter only records for the corresponding pool. It might be a performance impact in real life.

Another consideration, I am working on the cache implementation for opened IPC handles, the cache is global and I need to remove entries from the IPC cache when the corresponding pool is destroyed. It looks like a similar problem. BUt I still think that in your case it is better to require free implementation in all providers.

@ldorau
Copy link
Contributor Author

ldorau commented Oct 1, 2024

I still think that not supporting free operation in the memory provider is weird:

  • First, it means that not every pool manager can be used with such memory provider or user needs to use coarse provider on top of upstream provider that does not support free operation.
  • Second, in this PR you need to iterate via the tracker (that contains records from all pools) and filter only records for the corresponding pool. It might be a performance impact in real life.

Another consideration, I am working on the cache implementation for opened IPC handles, the cache is global and I need to remove entries from the IPC cache when the corresponding pool is destroyed. It looks like a similar problem. BUt I still think that in your case it is better to require free implementation in all providers.

@vinser52 OK, let's move this discussion to #769.

This PR is about clearing the tracker. Currently it is not cleared and this patch adds clearing the tracker.

@vinser52 You have written recently:

Another question regarding correctness: when memory tracker is cleared? When the memory provider allocates a memory region it is added to the memory tracker, but if the pool implementation does not release memory back and relies on the fact that the memory provider implementation will release virtual memory mapping who will cleanup the memory tracker? If it is not done then it is an error, because after the memory provider is destroyed the virtual memory address range might be re-used by other memory providers and the same virtual memory region might be added to the tracker and it will fail because the same virtual memory region already presented in the tracker.

So now the question is: should we clear the tracker or not?

}
#ifndef NDEBUG
// print error messages only if provider supports the free() operation
if (n_items && !upstreamDoesNotFree) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is upstreamDoesNotFree needed for anything other than debug printing?

Copy link
Contributor Author

@ldorau ldorau Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, currently it is not. The tracker is cleared from all left non-freed pointers regardless of value of upstreamDoesNotFree.

@vinser52
Copy link
Contributor

vinser52 commented Oct 1, 2024

So now the question is: should we clear the tracker or not?

The general rule is that the tracker should always be cleared when some chunk is deallocated. Otherwise the state of the tracker is incorrect.

Now regarding this PR: since file and DAX providers are already upstreamed to the main branch, this PR fixes the issue with the tracker. Yesterday we discussed this with @bratpiorka and agreed that while I disagree with the general design and we need design discussion on free API implementation, we need to merge this PR as a temporary workaround to fix the main branch.

@ldorau
Copy link
Contributor Author

ldorau commented Oct 1, 2024

So now the question is: should we clear the tracker or not?

The general rule is that the tracker should always be cleared when some chunk is deallocated. Otherwise the state of the tracker is incorrect.

Now regarding this PR: since file and DAX providers are already upstreamed to the main branch, this PR fixes the issue with the tracker. Yesterday we discussed this with @bratpiorka and agreed that while I disagree with the general design and we need design discussion on free API implementation, we need to merge this PR as a temporary workaround to fix the main branch.

Yes, I agree.

So you can approve or request changes to this PR.

@vinser52
Copy link
Contributor

vinser52 commented Oct 1, 2024

So you can approve or request changes to this PR.

I approved, but maybe you can add a TODO comment to note that it is a temporary solution.

@ldorau
Copy link
Contributor Author

ldorau commented Oct 2, 2024

So you can approve or request changes to this PR.

I approved, but maybe you can add a TODO comment to note that it is a temporary solution.

@vinser52 It seems that after the yesterday's discussion (free() is an optional op) it is not a temporary solution,
so TODO is not needed?

@vinser52
Copy link
Contributor

vinser52 commented Oct 2, 2024

So you can approve or request changes to this PR.

I approved, but maybe you can add a TODO comment to note that it is a temporary solution.

@vinser52 It seems that after the yesterday's discussion (free() is an optional op) it is not a temporary solution, so TODO is not needed?

free() is optional, but on the meeting we agreed that in case some provider does not define free() UMF will wrap such provider with coarse grain provider that supports free(). So my initial comment was about your solution to cleanup the tracker: we do not want to iterate over memory tracker, and we are doing it right now temporary to fix the main branch.

Clear tracker for the current pool on destroy.
Do not print error messages if provider
does not support the free() operation.

Signed-off-by: Lukasz Dorau <[email protected]>
@ldorau ldorau force-pushed the Check_if_the_provider_supports_the_free_operation branch from 04a55e3 to 2766a21 Compare October 2, 2024 09:02
@ldorau
Copy link
Contributor Author

ldorau commented Oct 2, 2024

@vinser52 TODO added.

@ldorau ldorau merged commit 3e4f65f into oneapi-src:main Oct 2, 2024
72 checks passed
@ldorau ldorau deleted the Check_if_the_provider_supports_the_free_operation branch October 2, 2024 13:38
ldorau added a commit to ldorau/unified-memory-framework that referenced this pull request Feb 18, 2025
Clearing the tracker was a temporary solution and should be removed.
The tracker should be cleared using the provider's free() operation.

Replace clear_tracker_for_the_pool() with check_if_tracker_is_empty().

This patch reverts commit 2766a21

Ref: oneapi-src#759

Signed-off-by: Lukasz Dorau <[email protected]>
ldorau added a commit to ldorau/unified-memory-framework that referenced this pull request Feb 18, 2025
Clearing the tracker was a temporary solution and should be removed.
The tracker should be cleared using the provider's free() operation.

Replace clear_tracker_for_the_pool() with check_if_tracker_is_empty().

This patch reverts commit 2766a21

Ref: oneapi-src#759

Signed-off-by: Lukasz Dorau <[email protected]>
ldorau added a commit to ldorau/unified-memory-framework that referenced this pull request Feb 18, 2025
Clearing the tracker was a temporary solution and should be removed.
The tracker should be cleared using the provider's free() operation.

Replace clear_tracker_for_the_pool() with check_if_tracker_is_empty().

This patch reverts commit 2766a21

Ref: oneapi-src#759

Signed-off-by: Lukasz Dorau <[email protected]>
rbanka1 pushed a commit to rbanka1/unified-memory-framework that referenced this pull request Feb 21, 2025
Clearing the tracker was a temporary solution and should be removed.
The tracker should be cleared using the provider's free() operation.

Replace clear_tracker_for_the_pool() with check_if_tracker_is_empty().

This patch reverts commit 2766a21

Ref: oneapi-src#759

Signed-off-by: Lukasz Dorau <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants