
add support for CUDA allocation flags #1079


Merged — 1 commit merged into oneapi-src:main on Feb 14, 2025

Conversation

@bratpiorka (Contributor) commented Feb 6, 2025

Add support for CUDA allocation flags.

This PR adds one new API call for CUDA:

  • umfCUDAMemoryProviderParamsSetAllocFlags()

Checklist

  • Code compiles without errors locally
  • All tests pass locally
  • CI workflows execute properly

@bratpiorka bratpiorka force-pushed the rrudnick_cuda_flags branch 3 times, most recently from 4114f01 to 9d300d4 Compare February 6, 2025 13:58
@bratpiorka bratpiorka force-pushed the rrudnick_cuda_flags branch 5 times, most recently from 6d18197 to 0bbce23 Compare February 7, 2025 14:18
@bratpiorka bratpiorka added this to the v0.11.x milestone Feb 10, 2025
@bratpiorka bratpiorka marked this pull request as ready for review February 10, 2025 21:33
@bratpiorka bratpiorka requested a review from a team as a code owner February 10, 2025 21:33
@bratpiorka bratpiorka force-pushed the rrudnick_cuda_flags branch 2 times, most recently from f216ec8 to 43544e5 Compare February 12, 2025 09:09
@bratpiorka bratpiorka requested a review from igchor February 12, 2025 09:17
@bratpiorka (Contributor, Author) commented:

@ldrumm @igchor @vinser52 all tests pass, ready to re-review

@ldrumm commented Feb 13, 2025 via email

@ldrumm left a comment

I'm happy with this implementation of this feature as it pertains to CUDA, but do we need to better generalize for other accelerator APIs that also take flags for allocations? CUDA is one, but HIP supports a different set of flags, and Intel OpenCL extensions also extend allocation to take hints.

As I discussed here, I've already got work in progress that does similar things for all accelerator targets, so if UMF can provide a generic way to do this, then I'd prefer that.

Please note, I'm still not to be considered competent in understanding how UMF works, so feel free to tell me I'm misunderstanding.

// Handle to the CUDA context
void *cuda_context_handle;

// Handle to the CUDA device

These are no longer Doxygen-style comments. Is that intended, given that this is implementation code and non-public?

@bratpiorka (Contributor, Author) replied:

Yes, this code was moved from the public include files to this location when we updated the UMF API to use setters instead of directly exposing the parameter structure. So this change is just a cleanup.

@igchor (Member) commented Feb 13, 2025

> I'm happy with this implementation of this feature as it pertains to CUDA, but do we need to better generalize for other accelerator APIs that also take flags for allocations? CUDA is one, but HIP supports a different set of flags, and Intel OpenCL extensions also extend allocation to take hints.
>
> As I discussed here, I've already got work in progress that does similar things for all accelerator targets, so if UMF can provide a generic way to do this, then I'd prefer that.
>
> Please note, I'm still not to be considered competent in understanding how UMF works, so feel free to tell me I'm misunderstanding.

@ldrumm UMF doesn't allow flags/hints to be passed to the pool allocation APIs. This is because those flags/hints can only be applied to an entire memory page. If we were, for example, to allow umfPoolMalloc(1024, READ_MOSTLY_HINT), internally UMF would have to apply the hint to the entire memory page, which would impact other, unrelated allocations.

Our current thinking is that the upper level (UR or SYCL) should decide what to do for every specific flag. For some flags (ones that are used often), it might make sense to create dedicated pools (for which the entire memory is allocated with those flags). For others (or when the allocation size is big), it might be better not to do any pooling (just pass the request directly to the driver). The UR spec gives us quite a lot of flexibility in how we can implement this in the adapters.

@bratpiorka bratpiorka merged commit 1160a27 into oneapi-src:main Feb 14, 2025
77 of 78 checks passed