-
Notifications
You must be signed in to change notification settings - Fork 790
[SYCL] Implement device libraries: C assert #922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Device libraries provide a way to support functions from standard C and C++ system headers in SYCL device code. Runtime library requirements (functions that are normally defined in libc, libm or libstdc++) are provided in a SPIR-V library (libsycl-fallback.spv), which is linked at JIT (or AOT) time. If a particular device supports a library OpenCL extension (e.g. cl_intel_devicelib_assert), then SPIR-V implementation is not linked and the device compiler has to provide definitions for the corresponding functions. Signed-off-by: Andrew Savonichev <[email protected]>
sycl/doc/extensions/C-CXX-StandardLibrary/DeviceLibExtensions.rst
Outdated
Show resolved
Hide resolved
Signed-off-by: Andrew Savonichev <[email protected]>
Since build() may return a different program object, we need to properly release the old one (or not, if we return the same object). Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
This should fix: fatal error: 'CL/__spirv/spirv_vars.hpp' file not found Signed-off-by: Andrew Savonichev <[email protected]>
sycl/doc/extensions/C-CXX-StandardLibrary/C-CXX-StandardLibrary.rst
Outdated
Show resolved
Hide resolved
sycl/include/CL/sycl/detail/program_manager/program_manager.hpp
Outdated
Show resolved
Hide resolved
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Clang calls it under the hood, but it has no explicit dependency for it in cmake. Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PI_CALL(piContextGetInfo)(Context, PI_CONTEXT_INFO_NUM_DEVICES, | ||
sizeof(NumDevices), &NumDevices, nullptr); | ||
Devices.resize(NumDevices); | ||
PI_CALL(piContextGetInfo)(Context, PI_CONTEXT_INFO_DEVICES, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor. Please handle possible error from piContextGetInfo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PI_CALL is supposed to throw an exception if the call fails, right? Not sure what else is missing.
RT::PiResult Error = PI_CALL_NOCHECK(piProgramBuild)( | ||
Program.get(), Devices.size(), Devices.data(), Opts, nullptr, nullptr); | ||
if (Error != PI_SUCCESS) | ||
compile_program_error(getProgramBuildLog(Program.get())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OMG, we really need some static analysis tool here!
// an incomplete (but valid) LinkedProg, and throw. | ||
throw compile_program_error(getProgramBuildLog(LinkedProg)); | ||
} | ||
pi::checkPiResult(Error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens here? Probably throw more user friendly exception than checkPiResult throws here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or at least leave a TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current check should throw an exception and mention the error code. What should I mention in addition to that?
} | ||
pi::checkPiResult(Error); | ||
} | ||
return Program; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest checking that error == CL_SUCCESS and return so error handling is not in the if body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is going to look weird:
if (Error != PI_SUCCESS) {
return Program;
}
if (LinkedProg) {
// A non-trivial error occurred during linkage: get a build log, release
// an incomplete (but valid) LinkedProg, and throw.
throw compile_program_error(getProgramBuildLog(LinkedProg));
}
pi::checkPiResult(Error);
A function that returns non-void ends with just pi::checkPiResult(Error);
.
It is not obvious that the function is going to throw an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar nitpicks are not blocking, of course.
sycl/doc/extensions/C-CXX-StandardLibrary/C-CXX-StandardLibrary.rst
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/C-CXX-StandardLibrary/C-CXX-StandardLibrary.rst
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/C-CXX-StandardLibrary/C-CXX-StandardLibrary.rst
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/C-CXX-StandardLibrary/C-CXX-StandardLibrary.rst
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/C-CXX-StandardLibrary/C-CXX-StandardLibrary.rst
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/C-CXX-StandardLibrary/DeviceLibExtensions.rst
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/C-CXX-StandardLibrary/DeviceLibExtensions.rst
Outdated
Show resolved
Hide resolved
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good to me, I don't have any major comments.
Anyway, I would like to @sergey-semenov to approve this patch too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The global idea of allowing more normal code in kernels on accelerators is great.
But some points are unclear to me.
Probably I miss some internal context.
cgh.parallel_for<class SimpleVadd<T>>(numOfItems, | ||
[=](cl::sycl::id<1> wiID) { | ||
accessorC[wiID] = accessorA[wiID] + accessorB[wiID]; | ||
assert(accessorC[wiID] > 0 && "Invalid value"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is assert a good example? It relies on some kernel features not available in OpenCL... Are there real use cases?
What about this implementation:
#ifdef __SYCL_DEVICE_ONLY__
#define NDEBUG 1
#endif
? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is assert a good example? It relies on some kernel features not available in OpenCL...
Right, assert is an edge case and it is probably a bad example.
However, it is the only standard C and C++ function supported at this moment.
Are there real use cases?
For assert? It is widely used as a sanity check in host code, so I don't see a reason why it cannot be used for device code as well.
What about this implementation:
#ifdef __SYCL_DEVICE_ONLY__ #define NDEBUG 1 #endif
That should work if you define the macro before #include <assert.h>
.
Beware any significant changes under #if NDEBUG
:
struct shared {
int i;
#ifndef NDEBUG
int debug;
#endif
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure that assert()
is really "supported", in the sense that the behaviour of what is proposed here is undefined.
The macro __SYCL_DEVICE_ONLY__
is set by the SYCL device compiler.
Having the device compiler driver to also set NDEBUG
would be abusive but would somehow solve a little bit of the "problem".
But you could always #define assert()
in your sycl.hpp
even after including "cassert".
Anyway all this looks messy because, well, this is outside of the scope of SYCL and kernels...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the device compiler driver to also set
NDEBUG
would be abusive but would somehow solve a little bit of the "problem".
But you could always#define assert()
in yoursycl.hpp
even after including "cassert".
There are two problems:
- This solution depends on order of
#include
directives: if you includeassert.h
aftersycl.hpp
, the macro will be redefined again.
#ifdef _ASSERT_H
# undef _ASSERT_H
# undef assert
[...]
#endif /* assert.h */
- Even if you manage to define the macro properly, there is still a problem of not having a proper abort function.
So we avoid this dance around __assert_fail
and _wassert
, but we still have to manage the devicelib assert extension.
Anyway all this looks messy because, well, this is outside of the scope of SYCL and kernels...
Agree.
libraries, and allows to use them in SYCL device code. | ||
|
||
Function declarations are taken from the standard headers (e.g. from | ||
<assert.h> or <complex>), and the corresponding header has to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<assert.h>
or <complex>
But note that normally you should use in C++ instead...
clang++ -fsycl main.obj %SYCL_INSTALL%/lib/libsycl-msvc.o -o a.exe | ||
|
||
List of supported functions from C standard library: | ||
- assert macro (from assert.h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert.h
and it indicates that a program does not execute as expected. | ||
The function should print a message containing the information | ||
provided in the arguments. In addition to that, the function is free | ||
to terminate the current kernel invocation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if not, what happens next?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns to the caller, and the kernel continues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it launch a nuclear missile that should have been avoided?
I would prefer the program not to compile at the first place!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd advise against using SYCL for anything that involves radioactive materials or weapons of mass destruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it launch a nuclear missile that should have been avoided?
Wait, have you guarded this just by an assertion? Not the best idea, IMO
// assert() call in device code guarantees nothing: on some devices it behaves | ||
// in the usual way and terminates the program. On other devices it can print an | ||
// error message and *continue* execution. Less capable devices can even ignore | ||
// an assert! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the conclusion is...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the conclusion is...
... the assert's behavior in SYCL device code is unspecified. However, you should get the expected behavior on devices that have capabilities to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. But how to do this in a consistent and reasonable way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to implement cl_intel_devicelib_cassert extension for all devices you care about to guarantee consistency between them.
// | ||
// Fallback mode (aka the best we can do by following the OpenCL spec): | ||
// 1. Assertion condition is printed to *stdout* by the OpenCL printf(). | ||
// 2. Process (both host and device) is terminated by a SIGSEGV. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am lost here. The fallback in SYCL is just the normal C++ code on the host so I have the feeling it should be the same for a "normal" assert()
of the system.
What is the expectation of the end-user? That some functions are just changed by the Intel implementation when in kernels on the host? How a programmer can be sure that we can call the right function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fallback in SYCL is just the normal C++ code on the host so I have the feeling it should be the same for a "normal"
assert()
of the system.
The fallback in this context is a generic implementation of standard library functions provided in a SPIR-V form. This implementation is not aware about device-specific features, and can only use what is available in SPIR-V (i.e. no termination is possible for assert).
If a device compiler can do something better than a generic implementation, it can override some functions to achieve performance or functional improvements.
Anyway, the line 20 is misleading (2. Process is terminated). The SIGSEGV is not a side-effect from the fallback implementation: an unreachable
instruction follows an __assert_fail
call, so the process terminates after reaching it.
// | ||
// Native mode (same behavior as libc assert on CPU): | ||
// 1. Assertion condition is printed to *stderr*. | ||
// 2. Process (both host and device) is terminated by a SIGABRT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the fallback mode is not the native mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or are you thinking to some OpenCL native kernels that could be used in SYCL interoperability mode or even by some OpenCL compiler for x86? https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clEnqueueNativeKernel.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the fallback mode is not the native mode?
Basically, we have not 2, but 3 modes:
- Host device, where everything works already (it is a standard C++ code after all)
- Fallback mode (on OpenCL/PI device), which is supposed to work with any device
- Native mode (on OpenCL/PI device) provides the best possible implementation for a particular device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this seems different from what OpenCL defines as "native" then.
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, just one concern. sycl::program
and sycl::kernel
classes, generally speaking, cannot be used for kernels with assert() as of this patch. Are there plans to address that? This limitation should probably be mentioned somewhere.
This is because they don't use functions from program manager, right? |
Right, current program manager API does not cover all possible use-cases (e. g. |
For example, on Linux with GNU glibc: | ||
.. code: | ||
clang++ -fsycl -c main.cpp -o main.o | ||
clang++ -fsycl main.o $(SYCL_INSTALL)/lib/libsycl-glibc.o -o a.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think device library should be linked by default i.e. the same way OpenMP/CUDA compiler links device side library + driver option to disable linking device code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. This requires changes in clang driver.
|
||
Implementation requires a special device library to be linked with a | ||
SYCL program. The library should match the C or C++ standard library | ||
used to compile the program: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The library should match the C or C++ standard library used to compile the program:
Can we have a single device library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, see the line 77 in this document.
Basically, different libraries may use the same function name (e.g. __assert_fail), but with different arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's the case for existing C/C++ standard library implementations.
Assuming that we want avoid potential issues with conflicting names, why do you propose "a special device library" instead of "a special header" as it's done for CUDA/OpenMP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's the case for existing C/C++ standard library implementations.
These little inconsistencies definitely exist.
For example, in musl C library, __assert_fail
has an line argument of type int
, and Glibc has this argument as unsigned int
.
Even if we consider only one library (e.g. Glibc), I'm not sure that ABI between headers and the library will always remain stable, so future/past versions may bring similar problems.
Assuming that we want avoid potential issues with conflicting names, why do you propose "a special device library" instead of "a special header" as it's done for CUDA/OpenMP?
Can you describe this "special header" approach using the assert function as an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, take a look here: https://github.com/intel/llvm/blob/sycl/clang/lib/Headers/__clang_cuda_runtime_wrapper.h#L335.
I suggest reading comments in https://github.com/intel/llvm/blob/sycl/clang/lib/Headers/__clang_cuda_* and take a look at https://github.com/intel/llvm/tree/sycl/clang/lib/Headers/cuda_wrappers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is hard to tell how this approach is going to co-exist with the rest of the standard library, and scale beyond a handful of functions. The code you refer to doesn't seem to support anything other than glibc and libc++/libstdc++.
In general, I think the most difficult part here is (1) to establish an interface with a device compiler (DevicelibExtensions.rst), and (2) wire standard library functions through that interface, instead of using library specific routines.
(2) can probably be done in a header file, but I don't see a huge benefit in comparison with the library approach. If you (or anyone reading this) see a benefit and know to to implement this with portability in mind - feel free to send a patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this at least should be discussed with the clang community.
It should be safe to develop solution aligned with the existing approaches in the community code w/o constantly checking with the community, but in this case we went in slightly different direction.
|
||
.. code: | ||
clang++ -fsycl -c main.cpp -o main.o | ||
clang++ -fsycl main.o $(SYCL_INSTALL)/lib/libsycl-glibc.o -o a.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two points:
- Do we really need a separate wrapper for each standard library implementation? I don't think I understand the problem with a
boilerplate code in *every* device compiler
. I think linker is able to import only necessary functions, so scattering "boilerplate" code into separate wrapper libraries seems unnecessary. - If we use single device library, it can be linked by the driver implicitly, but even if we stick to existing proposal, driver is able to infer right library from the host triple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) should be explained on line 77. Let me know if I should elaborate.
(2) can be done as a separate change in driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) I understand that we should have definitions for __assert_fail
, _wassert
and __assert_func
to support Glibc, MSVC and newlib, but can we keep them all in the same library? What is the strategy for the symbols shared between different implementation (if any): e.g. Glibc and newlib?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) I understand that we should have definitions for
__assert_fail
,_wassert
and__assert_func
to support Glibc, MSVC and newlib, but can we keep them all in the same library?
This doesn't work for functions that have the same name, but different arguments (e.g. __assert_fail in glibc vs musl).
What is the strategy for the symbols shared between different implementation (if any): e.g. Glibc and newlib?
There are no shared symbols. If there is a function with identical name and signature, it will be duplicated in every wrapper library.
In any case, I anticipate the wrapper libraries to go away soon. @s-kanaev is working on a header-based approach to provide the wrappers.
Device libraries provide a way to support functions from standard C
and C++ system headers in SYCL device code.
Runtime library requirements (functions that are normally defined in
libc, libm or libstdc++) are provided in a SPIR-V
library (libsycl-fallback.spv), which is linked at JIT (or AOT) time.
If a particular device supports a library OpenCL
extension (e.g. cl_intel_devicelib_assert), then SPIR-V implementation
is not linked and the device compiler has to provide definitions for
the corresponding functions.
NOTE:
The patch is a bit rough in program_manager.cpp after the recent refactoring changes there.
I'm going to refactor this bit soon.
Signed-off-by: Andrew Savonichev [email protected]