-
Notifications
You must be signed in to change notification settings - Fork 789
[SYCL] Implement sycl_ext_oneapi_address_cast #12382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pennycook
commented
Jan 12, 2024
- Adds support for OpGenericCastToPtr SPIR-V intrinsic.
- Implements static_address_cast using OpGenericCastToPtr.
- Implements dynamic_address_cast using OpGenericCastToPtrExplicit.
- Adds tests for both new forms of address_cast.
- Adds support for OpGenericCastToPtr SPIR-V intrinsic. - Implements static_address_cast using OpGenericCastToPtr. - Implements dynamic_address_cast using OpGenericCastToPtrExplicit. - Adds tests for both new forms of address_cast. Signed-off-by: John Pennycook <[email protected]>
Neither form of OpGenericCastToPtr is implemented for AMD. Signed-off-by: John Pennycook <[email protected]>
Neither form of OpGenericCastToPtr is implemented for NVIDIA. Signed-off-by: John Pennycook <[email protected]>
c145219
to
152e355
Compare
template <typename dataT> | ||
extern __attribute__((opencl_global)) dataT * | ||
__SYCL_GenericCastToPtr_ToGlobal(void *Ptr) noexcept { | ||
return (__attribute__((opencl_global)) dataT *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: no need for cast, the returned type should be already correctly typed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I just copied the existing definitions of __SYCL_GenericCastToPtrExplicit
here.
I agree it makes sense to change this, but I'd prefer for it to be done as part of a follow-up PR. I don't really want to touch the existing definitions, in case something breaks unexpectedly.
Success &= LocalPointer.get_raw() == nullptr; | ||
Success &= PrivatePointer.get_raw() == nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mmoadeli for your awareness, your patch is going to be required to enable this test on cuda and hip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FE changes LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CFE change looks good to me. Thanks
@intel/llvm-gatekeepers, I think this can be merged now. |