Skip to content

Throw when device does not support USM allocation #12446

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jan 25, 2024

Conversation

lbushi25
Copy link
Contributor

@lbushi25 lbushi25 commented Jan 19, 2024

The USM allocation functions are changed to conform with the spec and throw when certain usm allocation modes are not supported.
malloc_shared is left incomplete because it breaks a lot of tests and its more reasonable to handle that in a later PR. In the meantime, malloc_device and malloc_host have been changed to conform with the spec and throw an exception when the corresponding USM aspect is not supported. The test usm_pooling.cpp has also been updated to reflect this query about the memory access properties in the L0 API so that it knows to expect it in the output.

Copy link
Contributor

github-actions bot commented Jan 19, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@lbushi25 lbushi25 marked this pull request as ready for review January 23, 2024 02:05
@lbushi25 lbushi25 requested a review from a team as a code owner January 23, 2024 02:05
@lbushi25 lbushi25 requested a review from againull January 23, 2024 02:05
@lbushi25
Copy link
Contributor Author

lbushi25 commented Jan 23, 2024

This PR is meant to fix this issue : #9513.
The issue has been fixed however there is a test named sycl/test-e2e/USM/usm_pooling.cpp that fails when tested on Intel L0 GPU. This however seems to be due to the test implementation and not an implementation error on my part. In particular, the test is failing because my changes will make the driver code in the test to emit an extra L0 API call which the test does not expect for some reason and the check then fails when the command at line 8 of the test is run.

@lbushi25
Copy link
Contributor Author

lbushi25 commented Jan 23, 2024

@againull
I have implemented your suggested corrections.
However, it seems like my latest change is breaking a number of tests due to throwing an exception when usm_shared_allocations is not supported. The reason seems to be that many of these tests do not catch exceptions when attempting to allocate memory rather they rely on the returned pointer being null. Thoughts on how to move forward?

@againull
Copy link
Contributor

againull commented Jan 23, 2024

@againull I have implemented your suggested corrections. However, it seems like my latest change is breaking a number of tests due to throwing an exception when usm_shared_allocations is not supported. The reason seems to be that many of these tests do not catch exceptions when attempting to allocate memory rather they rely on the returned pointer being null. Thoughts on how to move forward?

If tests are really relying on the behavior that these functions return nullptr when aspect is not supported then we have to fix all these tests in my opinion. Because such behaviour is not aligned with specification.

Per SYCL 2020 spec:
malloc_device:
Throws a synchronous exception with the errc::feature_not_supported error code if the syclDevice does not have aspect::usm_device_allocations.

malloc_shared:
Throws a synchronous exception with the errc::feature_not_supported error code if the syclDevice does not have aspect::usm_shared_allocations.

malloc_host:
Throws a synchronous exception with the errc::feature_not_supported error code if the syclDevice does not have aspect::usm_host_allocations.

@lbushi25
Copy link
Contributor Author

@againull I have implemented your suggested corrections. However, it seems like my latest change is breaking a number of tests due to throwing an exception when usm_shared_allocations is not supported. The reason seems to be that many of these tests do not catch exceptions when attempting to allocate memory rather they rely on the returned pointer being null. Thoughts on how to move forward?

If tests are really relying on the behavior that these functions return nullptr when aspect is not supported then we have to fix all these tests in my opinion. Because such behaviour is not aligned with specification.

Per SYCL 2020 spec: malloc_device: Throws a synchronous exception with the errc::feature_not_supported error code if the syclDevice does not have aspect::usm_device_allocations.

malloc_shared: Throws a synchronous exception with the errc::feature_not_supported error code if the syclDevice does not have aspect::usm_shared_allocations.

malloc_host: Throws a synchronous exception with the errc::feature_not_supported error code if the syclDevice does not have aspect::usm_host_allocations.

@againull Is there someone we can tag to have a closer look at these tests?
For convencience here is the failure link: https://github.com/intel/llvm/actions/runs/7630226166/job/20787344943?pr=12446

@againull
Copy link
Contributor

againull commented Jan 23, 2024

@againull I have implemented your suggested corrections. However, it seems like my latest change is breaking a number of tests due to throwing an exception when usm_shared_allocations is not supported. The reason seems to be that many of these tests do not catch exceptions when attempting to allocate memory rather they rely on the returned pointer being null. Thoughts on how to move forward?

If tests are really relying on the behavior that these functions return nullptr when aspect is not supported then we have to fix all these tests in my opinion. Because such behaviour is not aligned with specification.
Per SYCL 2020 spec: malloc_device: Throws a synchronous exception with the errc::feature_not_supported error code if the syclDevice does not have aspect::usm_device_allocations.
malloc_shared: Throws a synchronous exception with the errc::feature_not_supported error code if the syclDevice does not have aspect::usm_shared_allocations.
malloc_host: Throws a synchronous exception with the errc::feature_not_supported error code if the syclDevice does not have aspect::usm_host_allocations.

@againull Is there someone we can tag to have a closer look at these tests? For convencience here is the failure link: https://github.com/intel/llvm/actions/runs/7630226166/job/20787344943?pr=12446

I believe tests need to be fixed/updated in scope of this PR. I don't think it makes sense to ask authors of each test to fix it. We don't have a single person who is responsible for maintaining the tests.

@lbushi25
Copy link
Contributor Author

lbushi25 commented Jan 24, 2024

@againull I have implemented your suggested corrections. However, it seems like my latest change is breaking a number of tests due to throwing an exception when usm_shared_allocations is not supported. The reason seems to be that many of these tests do not catch exceptions when attempting to allocate memory rather they rely on the returned pointer being null. Thoughts on how to move forward?

If tests are really relying on the behavior that these functions return nullptr when aspect is not supported then we have to fix all these tests in my opinion. Because such behaviour is not aligned with specification.
Per SYCL 2020 spec: malloc_device: Throws a synchronous exception with the errc::feature_not_supported error code if the syclDevice does not have aspect::usm_device_allocations.
malloc_shared: Throws a synchronous exception with the errc::feature_not_supported error code if the syclDevice does not have aspect::usm_shared_allocations.
malloc_host: Throws a synchronous exception with the errc::feature_not_supported error code if the syclDevice does not have aspect::usm_host_allocations.

@againull Is there someone we can tag to have a closer look at these tests? For convencience here is the failure link: https://github.com/intel/llvm/actions/runs/7630226166/job/20787344943?pr=12446

I believe tests need to be fixed/updated in scope of this PR. I don't think it makes sense to ask authors of each test to fix it. We don't have a single person who is responsible for maintaining the tests.

@againull
I removed the code for the case when the allocation is shared because almost all the tests are failing because of it. I think this is the best thing to do because having all the tests changed to conform with the spec could be very risky because I have no knowledge as to what the tests are actually testing and expecting and I will most likely introduce bugs in many of these tests. Instead, I will create another github issue that can be looked at a later time.

@AlexeySachkov AlexeySachkov requested a review from a team January 24, 2024 17:43
Copy link
Contributor

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with condition that tracker for malloc_shared will be created.

@lbushi25
Copy link
Contributor Author

@intel/llvm-gatekeepers Can we get this merged?
The 3 failures are all due to the same test which is unrelated to this PR and is being fixed by #12485

@lbushi25
Copy link
Contributor Author

LGTM with condition that tracker for malloc_shared will be created.

A github issue has been created here: #12490
Will soon create a JIRA ticket.

@steffenlarsen
Copy link
Contributor

With #12485 merged, I am just retriggering testing for good measure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants