Skip to content

[SYCL][ESIMD] Software-emulation preparation #2963

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dongkyunahn-intel
Copy link
Contributor

  • This PR adds new header files to be used for ESIMD
    Software-emulation using new Plug-In - ESIMD_CPU
  • The new feature will use CM Emulation library for runtime support
    and ESIMD intrinsics such as Software-multithreaded kernel launching
    and buffer management and its access
  • Plug-In files and enabling logic will be added in order.

Copy link
Contributor

@romanovvlad romanovvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistic comments.
Mostly OK with the patch, but it would be nice if @kbobrovs reviews as well.

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This addition is not only NFC, but it also isn't being used at all. Please add a minimal use, so it is checked at compile-time, at least.

Also, please, add lots of comments in the header sources.

@dongkyunahn-intel
Copy link
Contributor Author

@smaslov-intel The reason why there is only single header file not being used at all in this PR is because enabling a single change for this multithreaded CPU emulation would require a big change set including a new plug-in module (about 1400 lines). Due to its size, I have abandoned several PRs already (e.g. : #2939) and staged change sets for incremental PR pushing. I'll forward an e-mail exchanging thread discussing this staged-push to you.

@smaslov-intel
Copy link
Contributor

The reason why there is only single header file not being used at all in this PR is because enabling a single change for this multithreaded CPU emulation would require a big change set including a new plug-in module (about 1400 lines). Due to its size, I have abandoned several PRs already (e.g. : #2939) and staged change sets for incremental PR pushing. I'll forward an e-mail exchanging thread discussing this staged-push to you.

Why can't you just start with a fake .cpp that's includes the header being introduced here? This way it will at least be checked that it is buildable.

@dongkyunahn-intel
Copy link
Contributor Author

Why can't you just start with a fake .cpp that's includes the header being introduced here? This way it will at least be checked that it is buildable.

While I don't want to add any intermediate dummy CPP file, adding 'pi_esimd_cpu.cpp' for newly introduced ESIMD_CPU plug-in will incur many changes in multiple files - including CMake files. This is because the header file in this PR imports header files from CM (imported from another github repo) and the new plug-in needs to be practically built for showing 'buildability' of all files newly added for the feature. This approach will break staging approach I planned for adding & enabling this ESIMD_CPU (CPU emulation of ESIMD) feature.

@smaslov-intel
Copy link
Contributor

This is because the header file in this PR imports header files from CM (imported from another github repo)

Then don't include that header yet, since it doesn't exists yet.

approach will break staging approach I planned

To me this means that the planned approach is not good enough. I don't see a reason in adding a header that's not even being used by anyone.

FWIW, if others support this approach (asking @romanovvlad , @kbobrovs specifically) then I can live with it.
Also, I am not the right code owner for sycl/plugins/esimd_cpu, so who would be?

@kbobrovs
Copy link
Contributor

kbobrovs commented Feb 4, 2021

I agree with @smaslov-intel that there must be some way to make sure the added code at least compiles. I suggest to add a test to either

  • sycl/test/esimd (or nearby) which would include the new header and compile OR to
  • sycl/unittests

@kbobrovs
Copy link
Contributor

kbobrovs commented Feb 4, 2021

I am not the right code owner for sycl/plugins/esimd_cpu, so who would be?

I will be the owner. Will update the owners file once we have this in.


inline void __cm_mt_barrier() { cm_support::mt_barrier(); }

inline void __cm_set_slm_size(size_t size) { cm_support::set_slm_size(size); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear why __cm prefix is used here. What is the purpose of these functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like mentioned above, they are called from ESIMD intrinsic implementations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we noticed an elephant in the room :) I envision a couple of problems with this:

  • __cm_set_slm_size is a device-side API unlike kernel launching and they should not be intermingled within the same header
  • This will require linkage of user application to the libCM where cm_support::set_slm_size is implemented, even if emulation is not needed.
    @romanovvlad, @smaslov-intel, we need to decide what is acceptable here. We obviously don't want to link with libCM always. So I see 2 variants (maybe you can see more)
  • require the user to compile and link differently when EMU support is wanted:
    • add -DSUPPORT_CPU_EMU to compilation under which all __esimd* intrinsics will call libcm APIs when EMU is enabled
    • add -lcm to link command
  • introduce a CM device interface table dynamically initialized by the SYCL RT. The interface table will be obtained from the EMU plugin. We can discuss required PI extensions if needed.
    Red arrow below shows the second approach.

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does a kernel ends up having direct calls to __cm_set_slm_size or __esimd_intrinsicX?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romanovvlad In my implementation prepared for later PR-posting, following functions are calling __cm_* functions through a couple of wrapping functions. The 'caller' functions are defined in '/sycl/include/CL/sycl/INTEL/esimd/esimd_memory.hpp'. __cm_* functions are mostly called in 'esimd_memory_intrin.hpp'.

slm_init(size)
 ->__cm_set_slm_size(size)
esimd_barrier()
 -> __cm_mt_barrier()
slm_load(offsets, predicates)
 -> __cm_get_slm()
slm_load4(offsets, predicates)
 -> __cm_get_slm()
slm_block_load(offset)
 -> __cm_get_slm()
slm_store(values, offsets, predicates)
 -> __cm_get_slm()
slm_store4(values, offsets, predicates)
 -> __cm_get_slm()
slm_block_store(offset, values)
 -> __cm_get_slm()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my suggestion in more details:

struct ESIMDDeviceInterface {
  virtual void barrier() = 0;
  virtual void set_slm_size(size_t size) = 0;
};

// PI interface (name TBD):
void* piextGetPluginSpecificData();

// Implementation in the plugin:
struct ESIMDDeviceInterfaceImpl : public ESIMDDeviceInterface {
  void barrier() override {
    cm_support::mt_barrier();
  }
  void set_slm_size(size_t size) override {
    cm_support::set_slm_size(size);
  }
};

void* piextGetPluginSpecificData() {
  static ESIMDDeviceInterfaceImpl* ESIMDIntf = nullptr;

  if (!ESIMDIntf)
    ESIMDIntf = new ESIMDDeviceInterfaceImpl();
  return ESIMDIntf;
}

// SYCL Rt changes:
std::map<backend, void*> PluginOpaqueDataRegistry;

void* getPluginSpecificData(backend be) { return PluginOpaqueDataRegistry[be]; }

// somewhere in SYCL RT:
{
  PluginOpaqueDataRegistry[esimd_emu] = esimd_plugin.piextGetPluginSpecificData();
}

// intrinsic implementaiton:
#ifndef __SYCL_DEVICE_ONLY__
void __esimd_barrier() {
  ESIMDDeviceInterface *I = reinterpret_cast<ESIMDDeviceInterface*>(getPluginSpecificData(esimd_emu));
  I->barrier();
}
#endif

Copy link
Contributor Author

@dongkyunahn-intel dongkyunahn-intel Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided that we are going to follow what @romanovvlad suggested - delayed-library loading.

In Linux, this delayed-library loading is supported with 'dlopen([LIB_NAME],[LOADING_OPTION])' calls from c/c++ files along with '-ldl' command line option as POSIX standard.

In Windows, delayed loading is supported with compiler command line argument('[LIB_NAME] delayimp.lib /link /DELAYLOAD:[DLL_NAME]') - not a single argument can be omitted for delayed-loading to work. It seems that there is no c/c++ API equivalent to 'dlopen()'.

(There is an active open-source project enabling 'dl*' functionalities under Win32 environment, which could imply that Microsoft does not support the functionalities officially yet. (https://github.com/dlfcn-win32/dlfcn-win32))

Copy link
Contributor Author

@dongkyunahn-intel dongkyunahn-intel Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested delayed-library loading on Windows environment.

Without delayed-library loading, the command line for compilation looks like follows
%LLVM_ROOT%\bin\clang++ -fsycl -fsycl-explicit-simd -llibcm -I %LLVM_ROOT%\include\sycl -g -std=c++17 -o .\vadd_1d.exe .\vadd_1d.cpp

In order to enable delayed-library loading, following two lines have to be added in application file (such as vadd_1d.cpp). This is an alternative approach to specifying library(ies) to be delay-loaded instead of [LIB_NAME] delayimp.lib in command line.
#pragma comment(lib, "delayimp")
#pragma comment(lib, "libcm")

With these two lines in application source file, following command line works.
%LLVM_ROOT%\bin\clang++ -fsycl -fsycl-explicit-simd -I %LLVM_ROOT%\include\sycl -g -std=c++17 -o .\vadd_1d.exe .\vadd_1d.cpp

I don't know why it is working without link /delayload:libcm.dll command line argument under DPCPP/CM. When I tested with a small sample code from MS documentation, building for the sample failed without the argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There is an active open-source project enabling 'dl*' functionalities under Win32 environment, which could imply that Microsoft does not support the functionalities officially yet. (https://github.com/dlfcn-win32/dlfcn-win32))

I was wrong about this statement - there is C/C++ API equivalent to 'dlopen' in Windows - LoadLibrary. Its usage is almost same as 'dlopen' - (1) open library file / (2) search symbol from the file handle and get function pointer / (3) call function through function pointer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record - @dongkyunahn-intel tried the delayed loading approach on Windows - did not work w/o explicit GetProcaddress-based binding. We decided to go with the interface table.

@dongkyunahn-intel dongkyunahn-intel force-pushed the private/dongkyun/esimd_cpu_emulation_new_header_files branch from 35fddf7 to f36e2dc Compare February 23, 2021 05:57
@dongkyunahn-intel dongkyunahn-intel force-pushed the private/dongkyun/esimd_cpu_emulation_new_header_files branch from f36e2dc to 5ca2386 Compare April 2, 2021 05:56
@dongkyunahn-intel dongkyunahn-intel requested review from bader and a team as code owners April 2, 2021 05:56
@@ -913,6 +913,9 @@ piextDeviceGetNativeHandle(pi_device device, pi_native_handle *nativeHandle);
__SYCL_EXPORT pi_result piextDeviceCreateWithNativeHandle(
pi_native_handle nativeHandle, pi_platform platform, pi_device *device);

/// Returns plug-in specific data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please describe the argument, what it returns and how caller interprets that.
Why cannot this opaque pointer be added as a field of _pi_plugin

struct _pi_plugin {
and just be always filled at plugin's initialization piPluginInit API? Why do we need the new API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I'll remove the new API and include the instantiation of device-specific data in piPluginInit.

@@ -1309,6 +1312,12 @@ __SYCL_EXPORT pi_result piEnqueueKernelLaunch(
const size_t *local_work_size, pi_uint32 num_events_in_wait_list,
const pi_event *event_wait_list, pi_event *event);

__SYCL_EXPORT pi_result piEnqueueHostKernelLaunch(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this API needed? Why not just implement piEnqueueKernelLaunch specially in the pi_esimd_cpu.cpp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new API will be removed in next update of this PR.

@intel intel deleted a comment from dweeratu Apr 19, 2021
}

extern "C" inline void invokeLambda_ID_2DIM(void *Wrapper) {
auto *w = reinterpret_cast<LambdaWrapper_ID_2DIM *>(Wrapper);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, use consistent style of naming variables and function. Wrapper and w are different styles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Comment on lines 19 to 23
#ifdef _MSC_VER
typedef unsigned int uint;
typedef unsigned short ushort;
typedef unsigned char uchar;
#endif // _MSC_VER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify why those typedef's are needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

ESIMDDeviceInterfaceImpl() {
#ifdef __GNUC__
// Linux
void *handle = dlopen("libcm.so", RTLD_LAZY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify where the library will be dlclose'd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dlclose is not called. I'll add.

kbobrovs
kbobrovs previously approved these changes Jun 3, 2021
Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to start testing. But I have a question - why these two are needed for EMU plugin?

template __SYCL_EXPORT const plugin &getPluginbackend::opencl();
template __SYCL_EXPORT const plugin &getPluginbackend::level_zero();

@dongkyunahn-intel
Copy link
Contributor Author

Approving to start testing. But I have a question - why these two are needed for EMU plugin?

template __SYCL_EXPORT const plugin &getPluginbackend::opencl();
template __SYCL_EXPORT const plugin &getPluginbackend::level_zero();

I only added __SYCL_EXPORT in those lines. I'll revert the changes if necessary.

kbobrovs
kbobrovs previously approved these changes Jun 3, 2021
@kbobrovs
Copy link
Contributor

kbobrovs commented Jun 3, 2021

Approving to start testing. But I have a question - why these two are needed for EMU plugin?
template __SYCL_EXPORT const plugin &getPluginbackend::opencl();
template __SYCL_EXPORT const plugin &getPluginbackend::level_zero();

I only added __SYCL_EXPORT in those lines. I'll revert the changes if necessary.

Yes, please. + @romanovvlad

@romanovvlad
Copy link
Contributor

Approving to start testing. But I have a question - why these two are needed for EMU plugin?
template __SYCL_EXPORT const plugin &getPluginbackend::opencl();
template __SYCL_EXPORT const plugin &getPluginbackend::level_zero();

I only added __SYCL_EXPORT in those lines. I'll revert the changes if necessary.

Yes, please. + @romanovvlad

Since now only one specializations is __SYCL_EXPORTed, 2 symbols out of 3 should be removed from the abi test:
_ZN2cl4sycl6detail2pi9getPluginILNS0_7backendE1EEERKNS1_6pluginEv
_ZN2cl4sycl6detail2pi9getPluginILNS0_7backendE2EEERKNS1_6pluginEv
_ZN2cl4sycl6detail2pi9getPluginILNS0_7backendE4EEERKNS1_6pluginEv

@dongkyunahn-intel
Copy link
Contributor Author

Approving to start testing. But I have a question - why these two are needed for EMU plugin?
template __SYCL_EXPORT const plugin &getPluginbackend::opencl();
template __SYCL_EXPORT const plugin &getPluginbackend::level_zero();

I only added __SYCL_EXPORT in those lines. I'll revert the changes if necessary.

Yes, please. + @romanovvlad

Since now only one specializations is __SYCL_EXPORTed, 2 symbols out of 3 should be removed from the abi test:
_ZN2cl4sycl6detail2pi9getPluginILNS0_7backendE1EEERKNS1_6pluginEv
_ZN2cl4sycl6detail2pi9getPluginILNS0_7backendE2EEERKNS1_6pluginEv
_ZN2cl4sycl6detail2pi9getPluginILNS0_7backendE4EEERKNS1_6pluginEv

They are generated with 'abi_test.py' after I reverted changes for those two lines. In other words, there is only one specialization for 'getPlugin' in pi.cpp, but three symbols are generated in libsycl.so. I checked this after doing clean build several times.

I will manually remove two symbols - leaving the last one with '4' which should correspond to backend::esimd_cpu. If abi_test.py from build jobs fails again, I'll put those two lines back.

romanovvlad
romanovvlad previously approved these changes Jun 4, 2021
@dongkyunahn-intel
Copy link
Contributor Author

buildbot/sycl-ubu-x64-pr job failed because of two missing symbols. I'll revert the latest change.

$ ":" "RUN: at line 6"
$ "env" "LLVM_BIN_PATH=/localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.obj/bin/" "python" "/localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.src/sycl/tools//abi_check.py" "--mode" "check_symbols" "--reference" "/localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.src/sycl/test/abi/sycl_symbols_linux.dump" "/localdisk2/sycl_ci/buildbot/worker/sycl-ubu-x64-pr/llvm.obj/./lib/libsycl.so"
# command output:
There are new symbols in the new library. It is a non-breaking change. Refer to sycl/doc/ABIPolicyGuide.md for further instructions.
The following symbols are new to the object file:
_ZN2cl4sycl6detail2pi9getPluginILNS0_7backendE2EEERKNS1_6pluginEv
_ZN2cl4sycl6detail2pi9getPluginILNS0_7backendE1EEERKNS1_6pluginEv
error: command failed with exit status: 255

kbobrovs
kbobrovs previously approved these changes Jun 4, 2021
smaslov-intel
smaslov-intel previously approved these changes Jun 4, 2021
* This PR adds new Plug-In module for ESIMD Software-emulation, ESIMD_CPU

* The new feature will use CM emulation library for runtime support
and ESIMD intrinsics such as Software-multithreaded kernel launching
and buffer management/access

* pi* functions are filled with dummy codes as placeholder

* Steps for importing CM and enumerating/enabling pi_esimd_cpu will be
added later

* New PI_API is added - piextPluginGetOpaqueData

* New API is added - getPluginOpaqueData
@dongkyunahn-intel dongkyunahn-intel dismissed stale reviews from smaslov-intel and kbobrovs via b8b6566 June 8, 2021 19:18
@dongkyunahn-intel dongkyunahn-intel force-pushed the private/dongkyun/esimd_cpu_emulation_new_header_files branch from 29198d1 to b8b6566 Compare June 8, 2021 19:18
@kbobrovs
Copy link
Contributor

kbobrovs commented Jun 9, 2021

@smaslov-intel, please re-approve

@pvchupin pvchupin merged commit d8cb011 into intel:sycl Jun 9, 2021
AlexeySachkov added a commit to AlexeySachkov/llvm that referenced this pull request Jul 18, 2024
This function is not called anywhere from SYCL headers and therefore
there is no reason to have it exported.

It was made exported as part of ESIMD emulator plugin addition
(intel#2963), but that plugin was since
then deprecated and removed (intel#13295).
AlexeySachkov added a commit that referenced this pull request Jul 18, 2024
This function is not called anywhere from SYCL headers and therefore
there is no reason to have it exported.

It was made exported as part of ESIMD emulator plugin addition
(#2963), but that plugin was since
then deprecated and removed (#13295).

This is patch is essentially a by-product of #14145 and it is
done to simplify that change, i.e. PI plugins removal should not be
ABI-breaking by itself, we just need to cleanup some of our exported
symbols.
jsji pushed a commit that referenced this pull request Jan 26, 2025
…#2964)

`SPV_KHR_cooperative_matrix` extension defines that the only argument
accepted in this instruction is `Matrix Type <id>`, not the pointer to
an actual matrix.

This resolves #2963

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@197a800558787c9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants