-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][ESIMD] Software-emulation preparation #2963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][ESIMD] Software-emulation preparation #2963
Conversation
sycl/include/CL/sycl/INTEL/esimd/esimd_libcm_lambda_wrapper.hpp
Outdated
Show resolved
Hide resolved
sycl/include/CL/sycl/INTEL/esimd/esimd_libcm_lambda_wrapper.hpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stylistic comments.
Mostly OK with the patch, but it would be nice if @kbobrovs reviews as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This addition is not only NFC, but it also isn't being used at all. Please add a minimal use, so it is checked at compile-time, at least.
Also, please, add lots of comments in the header sources.
@smaslov-intel The reason why there is only single header file not being used at all in this PR is because enabling a single change for this multithreaded CPU emulation would require a big change set including a new plug-in module (about 1400 lines). Due to its size, I have abandoned several PRs already (e.g. : #2939) and staged change sets for incremental PR pushing. I'll forward an e-mail exchanging thread discussing this staged-push to you. |
Why can't you just start with a fake .cpp that's includes the header being introduced here? This way it will at least be checked that it is buildable. |
While I don't want to add any intermediate dummy CPP file, adding 'pi_esimd_cpu.cpp' for newly introduced ESIMD_CPU plug-in will incur many changes in multiple files - including CMake files. This is because the header file in this PR imports header files from CM (imported from another github repo) and the new plug-in needs to be practically built for showing 'buildability' of all files newly added for the feature. This approach will break staging approach I planned for adding & enabling this ESIMD_CPU (CPU emulation of ESIMD) feature. |
Then don't include that header yet, since it doesn't exists yet.
To me this means that the planned approach is not good enough. I don't see a reason in adding a header that's not even being used by anyone. FWIW, if others support this approach (asking @romanovvlad , @kbobrovs specifically) then I can live with it. |
I agree with @smaslov-intel that there must be some way to make sure the added code at least compiles. I suggest to add a test to either
|
I will be the owner. Will update the owners file once we have this in. |
|
||
inline void __cm_mt_barrier() { cm_support::mt_barrier(); } | ||
|
||
inline void __cm_set_slm_size(size_t size) { cm_support::set_slm_size(size); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear why __cm
prefix is used here. What is the purpose of these functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like mentioned above, they are called from ESIMD intrinsic implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we noticed an elephant in the room :) I envision a couple of problems with this:
__cm_set_slm_size
is a device-side API unlike kernel launching and they should not be intermingled within the same header- This will require linkage of user application to the libCM where
cm_support::set_slm_size
is implemented, even if emulation is not needed.
@romanovvlad, @smaslov-intel, we need to decide what is acceptable here. We obviously don't want to link with libCM always. So I see 2 variants (maybe you can see more) - require the user to compile and link differently when EMU support is wanted:
- add -DSUPPORT_CPU_EMU to compilation under which all
__esimd*
intrinsics will call libcm APIs when EMU is enabled - add -lcm to link command
- add -DSUPPORT_CPU_EMU to compilation under which all
- introduce a CM device interface table dynamically initialized by the SYCL RT. The interface table will be obtained from the EMU plugin. We can discuss required PI extensions if needed.
Red arrow below shows the second approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does a kernel ends up having direct calls to __cm_set_slm_size
or __esimd_intrinsicX
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romanovvlad In my implementation prepared for later PR-posting, following functions are calling __cm_*
functions through a couple of wrapping functions. The 'caller' functions are defined in '/sycl/include/CL/sycl/INTEL/esimd/esimd_memory.hpp'. __cm_*
functions are mostly called in 'esimd_memory_intrin.hpp'.
slm_init(size)
->__cm_set_slm_size(size)
esimd_barrier()
-> __cm_mt_barrier()
slm_load(offsets, predicates)
-> __cm_get_slm()
slm_load4(offsets, predicates)
-> __cm_get_slm()
slm_block_load(offset)
-> __cm_get_slm()
slm_store(values, offsets, predicates)
-> __cm_get_slm()
slm_store4(values, offsets, predicates)
-> __cm_get_slm()
slm_block_store(offset, values)
-> __cm_get_slm()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my suggestion in more details:
struct ESIMDDeviceInterface {
virtual void barrier() = 0;
virtual void set_slm_size(size_t size) = 0;
};
// PI interface (name TBD):
void* piextGetPluginSpecificData();
// Implementation in the plugin:
struct ESIMDDeviceInterfaceImpl : public ESIMDDeviceInterface {
void barrier() override {
cm_support::mt_barrier();
}
void set_slm_size(size_t size) override {
cm_support::set_slm_size(size);
}
};
void* piextGetPluginSpecificData() {
static ESIMDDeviceInterfaceImpl* ESIMDIntf = nullptr;
if (!ESIMDIntf)
ESIMDIntf = new ESIMDDeviceInterfaceImpl();
return ESIMDIntf;
}
// SYCL Rt changes:
std::map<backend, void*> PluginOpaqueDataRegistry;
void* getPluginSpecificData(backend be) { return PluginOpaqueDataRegistry[be]; }
// somewhere in SYCL RT:
{
PluginOpaqueDataRegistry[esimd_emu] = esimd_plugin.piextGetPluginSpecificData();
}
// intrinsic implementaiton:
#ifndef __SYCL_DEVICE_ONLY__
void __esimd_barrier() {
ESIMDDeviceInterface *I = reinterpret_cast<ESIMDDeviceInterface*>(getPluginSpecificData(esimd_emu));
I->barrier();
}
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided that we are going to follow what @romanovvlad suggested - delayed-library loading.
In Linux, this delayed-library loading is supported with 'dlopen([LIB_NAME],[LOADING_OPTION])
' calls from c/c++ files along with '-ldl
' command line option as POSIX standard.
In Windows, delayed loading is supported with compiler command line argument('[LIB_NAME] delayimp.lib /link /DELAYLOAD:[DLL_NAME]
') - not a single argument can be omitted for delayed-loading to work. It seems that there is no c/c++ API equivalent to 'dlopen()
'.
(There is an active open-source project enabling 'dl*' functionalities under Win32 environment, which could imply that Microsoft does not support the functionalities officially yet. (https://github.com/dlfcn-win32/dlfcn-win32))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested delayed-library loading on Windows environment.
Without delayed-library loading, the command line for compilation looks like follows
%LLVM_ROOT%\bin\clang++ -fsycl -fsycl-explicit-simd -llibcm -I %LLVM_ROOT%\include\sycl -g -std=c++17 -o .\vadd_1d.exe .\vadd_1d.cpp
In order to enable delayed-library loading, following two lines have to be added in application file (such as vadd_1d.cpp). This is an alternative approach to specifying library(ies) to be delay-loaded instead of [LIB_NAME] delayimp.lib
in command line.
#pragma comment(lib, "delayimp")
#pragma comment(lib, "libcm")
With these two lines in application source file, following command line works.
%LLVM_ROOT%\bin\clang++ -fsycl -fsycl-explicit-simd -I %LLVM_ROOT%\include\sycl -g -std=c++17 -o .\vadd_1d.exe .\vadd_1d.cpp
I don't know why it is working without link /delayload:libcm.dll
command line argument under DPCPP/CM. When I tested with a small sample code from MS documentation, building for the sample failed without the argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(There is an active open-source project enabling 'dl*' functionalities under Win32 environment, which could imply that Microsoft does not support the functionalities officially yet. (https://github.com/dlfcn-win32/dlfcn-win32))
I was wrong about this statement - there is C/C++ API equivalent to 'dlopen' in Windows - LoadLibrary. Its usage is almost same as 'dlopen' - (1) open library file / (2) search symbol from the file handle and get function pointer / (3) call function through function pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record - @dongkyunahn-intel tried the delayed loading approach on Windows - did not work w/o explicit GetProcaddress-based binding. We decided to go with the interface table.
35fddf7
to
f36e2dc
Compare
f36e2dc
to
5ca2386
Compare
sycl/include/CL/sycl/detail/pi.h
Outdated
@@ -913,6 +913,9 @@ piextDeviceGetNativeHandle(pi_device device, pi_native_handle *nativeHandle); | |||
__SYCL_EXPORT pi_result piextDeviceCreateWithNativeHandle( | |||
pi_native_handle nativeHandle, pi_platform platform, pi_device *device); | |||
|
|||
/// Returns plug-in specific data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please describe the argument, what it returns and how caller interprets that.
Why cannot this opaque pointer be added as a field of _pi_plugin
llvm/sycl/include/CL/sycl/detail/pi.h
Line 1608 in 6a49170
struct _pi_plugin { |
piPluginInit
API? Why do we need the new API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I'll remove the new API and include the instantiation of device-specific data in piPluginInit
.
sycl/include/CL/sycl/detail/pi.h
Outdated
@@ -1309,6 +1312,12 @@ __SYCL_EXPORT pi_result piEnqueueKernelLaunch( | |||
const size_t *local_work_size, pi_uint32 num_events_in_wait_list, | |||
const pi_event *event_wait_list, pi_event *event); | |||
|
|||
__SYCL_EXPORT pi_result piEnqueueHostKernelLaunch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this API needed? Why not just implement piEnqueueKernelLaunch
specially in the pi_esimd_cpu.cpp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new API will be removed in next update of this PR.
} | ||
|
||
extern "C" inline void invokeLambda_ID_2DIM(void *Wrapper) { | ||
auto *w = reinterpret_cast<LambdaWrapper_ID_2DIM *>(Wrapper); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, use consistent style of naming variables and function. Wrapper
and w
are different styles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
#ifdef _MSC_VER | ||
typedef unsigned int uint; | ||
typedef unsigned short ushort; | ||
typedef unsigned char uchar; | ||
#endif // _MSC_VER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please clarify why those typedef's are needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
sycl/include/CL/sycl/INTEL/esimd/detail/emu/esimdcpu_device_interface.hpp
Show resolved
Hide resolved
ESIMDDeviceInterfaceImpl() { | ||
#ifdef __GNUC__ | ||
// Linux | ||
void *handle = dlopen("libcm.so", RTLD_LAZY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please clarify where the library will be dlclose
'd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dlclose
is not called. I'll add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to start testing. But I have a question - why these two are needed for EMU plugin?
template __SYCL_EXPORT const plugin &getPluginbackend::opencl();
template __SYCL_EXPORT const plugin &getPluginbackend::level_zero();
I only added __SYCL_EXPORT in those lines. I'll revert the changes if necessary. |
Yes, please. + @romanovvlad |
Since now only one specializations is |
They are generated with 'abi_test.py' after I reverted changes for those two lines. In other words, there is only one specialization for 'getPlugin' in pi.cpp, but three symbols are generated in libsycl.so. I checked this after doing clean build several times. I will manually remove two symbols - leaving the last one with '4' which should correspond to |
|
* This PR adds new Plug-In module for ESIMD Software-emulation, ESIMD_CPU * The new feature will use CM emulation library for runtime support and ESIMD intrinsics such as Software-multithreaded kernel launching and buffer management/access * pi* functions are filled with dummy codes as placeholder * Steps for importing CM and enumerating/enabling pi_esimd_cpu will be added later * New PI_API is added - piextPluginGetOpaqueData * New API is added - getPluginOpaqueData
b8b6566
29198d1
to
b8b6566
Compare
@smaslov-intel, please re-approve |
This function is not called anywhere from SYCL headers and therefore there is no reason to have it exported. It was made exported as part of ESIMD emulator plugin addition (intel#2963), but that plugin was since then deprecated and removed (intel#13295).
This function is not called anywhere from SYCL headers and therefore there is no reason to have it exported. It was made exported as part of ESIMD emulator plugin addition (#2963), but that plugin was since then deprecated and removed (#13295). This is patch is essentially a by-product of #14145 and it is done to simplify that change, i.e. PI plugins removal should not be ABI-breaking by itself, we just need to cleanup some of our exported symbols.
…#2964) `SPV_KHR_cooperative_matrix` extension defines that the only argument accepted in this instruction is `Matrix Type <id>`, not the pointer to an actual matrix. This resolves #2963 Original commit: KhronosGroup/SPIRV-LLVM-Translator@197a800558787c9
Software-emulation using new Plug-In - ESIMD_CPU
and ESIMD intrinsics such as Software-multithreaded kernel launching
and buffer management and its access