Skip to content

[SYCL] Implement device libraries: C assert #922

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Dec 24, 2019

Conversation

asavonic
Copy link
Contributor

Device libraries provide a way to support functions from standard C
and C++ system headers in SYCL device code.

Runtime library requirements (functions that are normally defined in
libc, libm or libstdc++) are provided in a SPIR-V
library (libsycl-fallback.spv), which is linked at JIT (or AOT) time.

If a particular device supports a library OpenCL
extension (e.g. cl_intel_devicelib_assert), then SPIR-V implementation
is not linked and the device compiler has to provide definitions for
the corresponding functions.

NOTE:
The patch is a bit rough in program_manager.cpp after the recent refactoring changes there.
I'm going to refactor this bit soon.

Signed-off-by: Andrew Savonichev [email protected]

Device libraries provide a way to support functions from standard C
and C++ system headers in SYCL device code.

Runtime library requirements (functions that are normally defined in
libc, libm or libstdc++) are provided in a SPIR-V
library (libsycl-fallback.spv), which is linked at JIT (or AOT) time.

If a particular device supports a library OpenCL
extension (e.g. cl_intel_devicelib_assert), then SPIR-V implementation
is not linked and the device compiler has to provide definitions for
the corresponding functions.

Signed-off-by: Andrew Savonichev <[email protected]>
Andrew Savonichev added 12 commits December 11, 2019 16:52
Since build() may return a different program object, we need to
properly release the old one (or not, if we return the same object).

Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
This should fix:
fatal error: 'CL/__spirv/spirv_vars.hpp' file not found

Signed-off-by: Andrew Savonichev <[email protected]>
Andrew Savonichev added 3 commits December 13, 2019 17:14
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Andrew Savonichev added 2 commits December 13, 2019 20:34
Clang calls it under the hood, but it has no explicit dependency for
it in cmake.

Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Copy link
Contributor

@romanovvlad romanovvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

PI_CALL(piContextGetInfo)(Context, PI_CONTEXT_INFO_NUM_DEVICES,
sizeof(NumDevices), &NumDevices, nullptr);
Devices.resize(NumDevices);
PI_CALL(piContextGetInfo)(Context, PI_CONTEXT_INFO_DEVICES,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor. Please handle possible error from piContextGetInfo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PI_CALL is supposed to throw an exception if the call fails, right? Not sure what else is missing.

RT::PiResult Error = PI_CALL_NOCHECK(piProgramBuild)(
Program.get(), Devices.size(), Devices.data(), Opts, nullptr, nullptr);
if (Error != PI_SUCCESS)
compile_program_error(getProgramBuildLog(Program.get()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG, we really need some static analysis tool here!

// an incomplete (but valid) LinkedProg, and throw.
throw compile_program_error(getProgramBuildLog(LinkedProg));
}
pi::checkPiResult(Error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens here? Probably throw more user friendly exception than checkPiResult throws here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or at least leave a TODO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current check should throw an exception and mention the error code. What should I mention in addition to that?

}
pi::checkPiResult(Error);
}
return Program;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest checking that error == CL_SUCCESS and return so error handling is not in the if body.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is going to look weird:

  if (Error != PI_SUCCESS) {
    return Program;
  }
  if (LinkedProg) {
    // A non-trivial error occurred during linkage: get a build log, release
    // an incomplete (but valid) LinkedProg, and throw.
    throw compile_program_error(getProgramBuildLog(LinkedProg));
  }
  pi::checkPiResult(Error);

A function that returns non-void ends with just pi::checkPiResult(Error);.
It is not obvious that the function is going to throw an exception.

Copy link
Contributor

@sergey-semenov sergey-semenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar nitpicks are not blocking, of course.

Andrew Savonichev added 2 commits December 17, 2019 18:01
Signed-off-by: Andrew Savonichev <[email protected]>
Andrew Savonichev added 2 commits December 18, 2019 18:04
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
AlexeySachkov
AlexeySachkov previously approved these changes Dec 18, 2019
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good to me, I don't have any major comments.

Anyway, I would like to @sergey-semenov to approve this patch too

Copy link
Contributor

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The global idea of allowing more normal code in kernels on accelerators is great.
But some points are unclear to me.
Probably I miss some internal context.

cgh.parallel_for<class SimpleVadd<T>>(numOfItems,
[=](cl::sycl::id<1> wiID) {
accessorC[wiID] = accessorA[wiID] + accessorB[wiID];
assert(accessorC[wiID] > 0 && "Invalid value");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is assert a good example? It relies on some kernel features not available in OpenCL... Are there real use cases?
What about this implementation:

#ifdef __SYCL_DEVICE_ONLY__
#define NDEBUG 1
#endif 

? :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is assert a good example? It relies on some kernel features not available in OpenCL...

Right, assert is an edge case and it is probably a bad example.
However, it is the only standard C and C++ function supported at this moment.

Are there real use cases?

For assert? It is widely used as a sanity check in host code, so I don't see a reason why it cannot be used for device code as well.

What about this implementation:

#ifdef __SYCL_DEVICE_ONLY__
#define NDEBUG 1
#endif 

That should work if you define the macro before #include <assert.h>.
Beware any significant changes under #if NDEBUG:

struct shared {
  int i;
#ifndef NDEBUG
  int debug;
#endif
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure that assert() is really "supported", in the sense that the behaviour of what is proposed here is undefined.

The macro __SYCL_DEVICE_ONLY__ is set by the SYCL device compiler.
Having the device compiler driver to also set NDEBUG would be abusive but would somehow solve a little bit of the "problem".
But you could always #define assert() in your sycl.hpp even after including "cassert".
Anyway all this looks messy because, well, this is outside of the scope of SYCL and kernels...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the device compiler driver to also set NDEBUG would be abusive but would somehow solve a little bit of the "problem".
But you could always #define assert() in your sycl.hpp even after including "cassert".

There are two problems:

  1. This solution depends on order of #include directives: if you include assert.h after sycl.hpp, the macro will be redefined again.
#ifdef  _ASSERT_H

# undef _ASSERT_H
# undef assert
[...]
#endif /* assert.h      */
  1. Even if you manage to define the macro properly, there is still a problem of not having a proper abort function.

So we avoid this dance around __assert_fail and _wassert, but we still have to manage the devicelib assert extension.

Anyway all this looks messy because, well, this is outside of the scope of SYCL and kernels...

Agree.

libraries, and allows to use them in SYCL device code.

Function declarations are taken from the standard headers (e.g. from
<assert.h> or <complex>), and the corresponding header has to be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<assert.h> or <complex>

But note that normally you should use in C++ instead...

clang++ -fsycl main.obj %SYCL_INSTALL%/lib/libsycl-msvc.o -o a.exe

List of supported functions from C standard library:
- assert macro (from assert.h)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert.h

and it indicates that a program does not execute as expected.
The function should print a message containing the information
provided in the arguments. In addition to that, the function is free
to terminate the current kernel invocation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if not, what happens next?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns to the caller, and the kernel continues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it launch a nuclear missile that should have been avoided?

I would prefer the program not to compile at the first place!!!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd advise against using SYCL for anything that involves radioactive materials or weapons of mass destruction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it launch a nuclear missile that should have been avoided?

Wait, have you guarded this just by an assertion? Not the best idea, IMO

// assert() call in device code guarantees nothing: on some devices it behaves
// in the usual way and terminates the program. On other devices it can print an
// error message and *continue* execution. Less capable devices can even ignore
// an assert!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the conclusion is...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the conclusion is...

... the assert's behavior in SYCL device code is unspecified. However, you should get the expected behavior on devices that have capabilities to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. But how to do this in a consistent and reasonable way?

Copy link
Contributor Author

@asavonic asavonic Dec 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to implement cl_intel_devicelib_cassert extension for all devices you care about to guarantee consistency between them.

//
// Fallback mode (aka the best we can do by following the OpenCL spec):
// 1. Assertion condition is printed to *stdout* by the OpenCL printf().
// 2. Process (both host and device) is terminated by a SIGSEGV.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am lost here. The fallback in SYCL is just the normal C++ code on the host so I have the feeling it should be the same for a "normal" assert() of the system.
What is the expectation of the end-user? That some functions are just changed by the Intel implementation when in kernels on the host? How a programmer can be sure that we can call the right function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback in SYCL is just the normal C++ code on the host so I have the feeling it should be the same for a "normal" assert() of the system.

The fallback in this context is a generic implementation of standard library functions provided in a SPIR-V form. This implementation is not aware about device-specific features, and can only use what is available in SPIR-V (i.e. no termination is possible for assert).

If a device compiler can do something better than a generic implementation, it can override some functions to achieve performance or functional improvements.

Anyway, the line 20 is misleading (2. Process is terminated). The SIGSEGV is not a side-effect from the fallback implementation: an unreachable instruction follows an __assert_fail call, so the process terminates after reaching it.

//
// Native mode (same behavior as libc assert on CPU):
// 1. Assertion condition is printed to *stderr*.
// 2. Process (both host and device) is terminated by a SIGABRT.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the fallback mode is not the native mode?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or are you thinking to some OpenCL native kernels that could be used in SYCL interoperability mode or even by some OpenCL compiler for x86? https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clEnqueueNativeKernel.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the fallback mode is not the native mode?

Basically, we have not 2, but 3 modes:

  1. Host device, where everything works already (it is a standard C++ code after all)
  2. Fallback mode (on OpenCL/PI device), which is supposed to work with any device
  3. Native mode (on OpenCL/PI device) provides the best possible implementation for a particular device.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this seems different from what OpenCL defines as "native" then.

Andrew Savonichev added 2 commits December 19, 2019 19:06
Signed-off-by: Andrew Savonichev <[email protected]>
Copy link
Contributor

@sergey-semenov sergey-semenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just one concern. sycl::program and sycl::kernel classes, generally speaking, cannot be used for kernels with assert() as of this patch. Are there plans to address that? This limitation should probably be mentioned somewhere.

@asavonic
Copy link
Contributor Author

LGTM overall, just one concern. sycl::program and sycl::kernel classes, generally speaking, cannot be used for kernels with assert() as of this patch. Are there plans to address that? This limitation should probably be mentioned somewhere.

This is because they don't use functions from program manager, right?
Any ideas why?

@sergey-semenov
Copy link
Contributor

This is because they don't use functions from program manager, right?
Any ideas why?

Right, current program manager API does not cover all possible use-cases (e. g. program::compile_with_program_type). It will have to be extended and then used in sycl::program instead of direct calls to PI.

@romanovvlad romanovvlad merged commit 0039ee0 into intel:sycl Dec 24, 2019
For example, on Linux with GNU glibc:
.. code:
clang++ -fsycl -c main.cpp -o main.o
clang++ -fsycl main.o $(SYCL_INSTALL)/lib/libsycl-glibc.o -o a.out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think device library should be linked by default i.e. the same way OpenMP/CUDA compiler links device side library + driver option to disable linking device code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. This requires changes in clang driver.


Implementation requires a special device library to be linked with a
SYCL program. The library should match the C or C++ standard library
used to compile the program:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The library should match the C or C++ standard library used to compile the program:

Can we have a single device library?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, see the line 77 in this document.
Basically, different libraries may use the same function name (e.g. __assert_fail), but with different arguments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's the case for existing C/C++ standard library implementations.
Assuming that we want avoid potential issues with conflicting names, why do you propose "a special device library" instead of "a special header" as it's done for CUDA/OpenMP?

Copy link
Contributor Author

@asavonic asavonic Dec 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's the case for existing C/C++ standard library implementations.

These little inconsistencies definitely exist.
For example, in musl C library, __assert_fail has an line argument of type int, and Glibc has this argument as unsigned int.

Even if we consider only one library (e.g. Glibc), I'm not sure that ABI between headers and the library will always remain stable, so future/past versions may bring similar problems.

Assuming that we want avoid potential issues with conflicting names, why do you propose "a special device library" instead of "a special header" as it's done for CUDA/OpenMP?

Can you describe this "special header" approach using the assert function as an example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard to tell how this approach is going to co-exist with the rest of the standard library, and scale beyond a handful of functions. The code you refer to doesn't seem to support anything other than glibc and libc++/libstdc++.

In general, I think the most difficult part here is (1) to establish an interface with a device compiler (DevicelibExtensions.rst), and (2) wire standard library functions through that interface, instead of using library specific routines.

(2) can probably be done in a header file, but I don't see a huge benefit in comparison with the library approach. If you (or anyone reading this) see a benefit and know to to implement this with portability in mind - feel free to send a patch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this at least should be discussed with the clang community.
It should be safe to develop solution aligned with the existing approaches in the community code w/o constantly checking with the community, but in this case we went in slightly different direction.


.. code:
clang++ -fsycl -c main.cpp -o main.o
clang++ -fsycl main.o $(SYCL_INSTALL)/lib/libsycl-glibc.o -o a.out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two points:

  1. Do we really need a separate wrapper for each standard library implementation? I don't think I understand the problem with a boilerplate code in *every* device compiler. I think linker is able to import only necessary functions, so scattering "boilerplate" code into separate wrapper libraries seems unnecessary.
  2. If we use single device library, it can be linked by the driver implicitly, but even if we stick to existing proposal, driver is able to infer right library from the host triple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) should be explained on line 77. Let me know if I should elaborate.
(2) can be done as a separate change in driver.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) I understand that we should have definitions for __assert_fail, _wassert and __assert_func to support Glibc, MSVC and newlib, but can we keep them all in the same library? What is the strategy for the symbols shared between different implementation (if any): e.g. Glibc and newlib?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) I understand that we should have definitions for __assert_fail, _wassert and __assert_func to support Glibc, MSVC and newlib, but can we keep them all in the same library?

This doesn't work for functions that have the same name, but different arguments (e.g. __assert_fail in glibc vs musl).

What is the strategy for the symbols shared between different implementation (if any): e.g. Glibc and newlib?

There are no shared symbols. If there is a function with identical name and signature, it will be duplicated in every wrapper library.

In any case, I anticipate the wrapper libraries to go away soon. @s-kanaev is working on a header-based approach to provide the wrappers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants