[SYCL][libdevice] Get rid of builtins for memcpy memset in libdevice #4919
The patch is done, but it needs more extensive testing. We are working on enhancing the tests for libdevice memcpy/memset using USM: intel/llvm-test-suite#558. Thanks very much.
Previously, memcpy and memset were implemented in libdevice by calling __builtin_memcpy and __builtin_memset. Such code leads to an infinite loop if the underlying CPU runtime implements the memcpy/memset builtins by calling memcpy and memset. The function call chain is as follows:

In libdevice:
memcpy(dest, src, n) {
  return __devicelib_memcpy(dest, src, n);
}
__devicelib_memcpy(dest, src, n) {
  return __builtin_memcpy(dest, src, n);
}

In CPU runtime:
Handling of __builtin_memcpy <------------------|
        |                                       |
        |                                       |
        |----> memcpy ------> __devicelib_memcpy

To fix this, we have to provide implementations of memcpy/memset that do not use __builtin_*.
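For illustration, a builtin-free implementation could look like the sketch below. This is not the code from the patch: the names sketch_memcpy and sketch_memset are hypothetical stand-ins for the libdevice entry points, and a plain byte-wise loop is assumed for simplicity.

```cpp
#include <cstddef>

// Hypothetical stand-in for __devicelib_memcpy: copies n bytes one at a
// time, so no call to __builtin_memcpy (or memcpy) is written in the source.
void *sketch_memcpy(void *dest, const void *src, std::size_t n) {
  unsigned char *d = static_cast<unsigned char *>(dest);
  const unsigned char *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
  return dest;
}

// Hypothetical stand-in for __devicelib_memset: as with the standard memset,
// the fill value is converted to unsigned char before being stored.
void *sketch_memset(void *dest, int c, std::size_t n) {
  unsigned char *d = static_cast<unsigned char *>(dest);
  const unsigned char v = static_cast<unsigned char>(c);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = v;
  return dest;
}
```

One caveat with this approach: an optimizing compiler may recognize such loops and turn them back into memcpy/memset calls, so a real implementation may need to be built in a way that prevents this (for example with -fno-builtin).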
A few style changes required to align with the project coding guidelines.
Signed-off-by: jinge90 <[email protected]>