[SYCL][CUDA] Set the device primary context for the cuMemGetInfo call #7906

Merged: 2 commits, Jan 6, 2023

zjin-lcf (Contributor) commented Jan 3, 2023

This PR tries to fix the error `Cuda API error detected: cuMemGetInfo_v2 returned (0xc9)` reported in #5713. Thank you for your review.

zjin-lcf requested a review from a team as a code owner January 3, 2023 17:55
abagusetty (Contributor) commented

Thanks for the fix!

I see that when the device object is used directly to query `free_memory`, the context is not initialized; however, when the device is accessed via a queue (`queue->get_device()`), a valid context has been established.

bader merged commit 4713aeb into intel:sycl on Jan 6, 2023
steffenlarsen pushed a commit that referenced this pull request Feb 2, 2023
Extend the `ScopedContext` to work with just a device; in that case it
simply uses the device's primary context.

This is helpful for entry points that only have a `pi_device` and no
`pi_context` but still need to make CUDA calls that require an active
context, such as the device info queries.

This addresses a bug where querying the amount of free memory before
creating any queue or context would crash.

This was partially solved in a previous PR (#7906). However, that PR
released the primary context while leaving it active on the current
thread, so querying the device info twice in a row would crash again:
the second call would just use the active but already released primary
context.

This should address: #8117
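
The difference between the two fixes described above can be illustrated with a toy model. This is only a sketch: `FakePrimaryContext`, `FakeThread`, `fakeMemGetInfo`, `leakyQuery`, `ScopedPrimaryContext`, and `scopedQuery` are hypothetical names invented for illustration, and none of them is the CUDA driver API or the plugin's actual `ScopedContext`. The model assumes only what the commit message states: a query needs a live context that is current on the thread, the earlier fix released the primary context while leaving it current, and a scoped guard restores the previous state on exit.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Toy stand-in for a device's primary context (hypothetical, not the CUDA API).
struct FakePrimaryContext {
  int refCount = 0;                          // retain/release counter
  bool alive() const { return refCount > 0; }
};

// Toy stand-in for the per-thread "current context" slot.
struct FakeThread {
  FakePrimaryContext *current = nullptr;
};

// Models a call that requires a live current context (like a memory query).
std::size_t fakeMemGetInfo(const FakeThread &t) {
  if (t.current == nullptr || !t.current->alive())
    throw std::runtime_error("invalid context");
  return 1024;  // arbitrary "free bytes" value for the model
}

// Models the behaviour attributed to the earlier fix: the context is
// retained and made current, then released but left current on the thread.
std::size_t leakyQuery(FakeThread &t, FakePrimaryContext &ctx) {
  if (t.current == nullptr) {
    ++ctx.refCount;
    t.current = &ctx;
    std::size_t freeBytes = fakeMemGetInfo(t);
    --ctx.refCount;  // released, yet t.current still points at it
    return freeBytes;
  }
  return fakeMemGetInfo(t);  // a second call lands here with a dead context
}

// RAII guard: retains and activates the context, then restores the
// previously current context and releases on scope exit.
class ScopedPrimaryContext {
  FakeThread &thread_;
  FakePrimaryContext &ctx_;
  FakePrimaryContext *previous_;

public:
  ScopedPrimaryContext(FakeThread &t, FakePrimaryContext &c)
      : thread_(t), ctx_(c), previous_(t.current) {
    ++ctx_.refCount;
    thread_.current = &ctx_;
  }
  ~ScopedPrimaryContext() {
    thread_.current = previous_;
    --ctx_.refCount;
  }
  ScopedPrimaryContext(const ScopedPrimaryContext &) = delete;
  ScopedPrimaryContext &operator=(const ScopedPrimaryContext &) = delete;
};

// Query that leaves the thread in the state it found it.
std::size_t scopedQuery(FakeThread &t, FakePrimaryContext &ctx) {
  ScopedPrimaryContext guard(t, ctx);
  return fakeMemGetInfo(t);
}
```

In this model, calling `leakyQuery` twice on the same thread throws on the second call, while `scopedQuery` can be called repeatedly and leaves no context current afterwards. The real driver's retain, push, and release semantics are richer than a single counter and pointer, so treat this only as a picture of the ordering problem.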