-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Make invoke_simd convert its arguments to appropriate type. #6544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -139,14 +139,14 @@ bool isESIMDFunction(const Function &F) { | |||
|
|||
// This function makes one or two groups depending on kernel types (SYCL, ESIMD) | |||
EntryPointGroupVec | |||
groupEntryPointsByKernelType(const ModuleDesc &MD, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
is removed in a number of places because some entry points can be changed now - their linkage fixed.
Folks, I would appreciate if could you review the following parts: @rolandschulz: @v-klochkov: @asudarsa: |
9a8d833
to
345e247
Compare
- Introduce an intermediate lambda in invoke_simd and call SIMD target from the lambda with given (SIMD) arguments coming from the lambda's formal paremeters. This way compiler automatically performs necessary argument type conversion. - Introduce a new implicit constructor for simd to allow conversion of simd objects with _VecExt storage kind (used in invoke_simd extension). - Fix linkonce_odr linkage of entry points : change it to external linkage to avoid removal by the inliner and llvm-linker. This new scheme complicates the SIMD target function pointer flow, so LowerInvokeSimd.cpp update is needed to accomodate the new flow. Signed-off-by: Konstantin S Bobrovsky <[email protected]>
345e247
to
222263a
Compare
The llvm/test/SYCLLowerIR/ESIMD/lower_invoke_simd.ll failure will be fixed by test update (hopefully). |
@@ -209,8 +211,15 @@ using SpmdRetType = | |||
|
|||
template <class SimdCallable, class... SpmdArgs> | |||
static constexpr int get_sg_size() { | |||
if constexpr (has_uniform_signature<SimdCallable, SpmdArgs...>()) { | |||
return 0; // subgroup size does not matter then | |||
if constexpr (all_uniform_types<SpmdArgs...>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a a good idea. But we should update the extension document to reflect this. It seems at the moment it is missing a description when the sg-size can be deduced. BTW now that this is implemented should the extension be moved to experimental?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I can create a PR, you can do this as well if you wish
BTW now that this is implemented should the extension be moved to experimental?
This is in sycl::ext::oneapi::experimental already, am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I can create a PR, you can do this as well if you wish
happy for you to do it
I meant the location of https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_invoke_simd.asciidoc. I think this should be moved from proposed to experimental.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy for you to do it
OK, I'll do this
I think this should be moved from proposed to experimental.
Oh, that's what you mean. Agree.
typename detail::spmd2simd<T, N>::type... simd_args) { | ||
SYCL_EXTERNAL __regcall detail::SimdRetType<N, Callable, T...> | ||
simd_obj_call_helper(const void *obj_ptr, | ||
typename detail::spmd2simd<T, N>::type... simd_args) { | ||
auto f = | ||
*reinterpret_cast<const std::remove_reference_t<Callable> *>(obj_ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this reinterpret_cast needed? (I realize it's not changed in this PR but I would like to understand)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think because const void*
can't be used as a function pointer. Another question is why obj_ptr
is const void*
and not const Callable*
. I tried changing (along with __builtin_invoke_simd
declaration, got lots of errors. Will need to figure out once functors/lambdas are enabled.
// definition. This function changes linkage of linkonce_odr entry points. | ||
// An example of a such entry point is an instantiation of a function template | ||
// even marked SYCL_EXTERNAL. | ||
// TODO: is it practical to fix FE instead to assign different linkage for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be useful to get @rolandschulz's comment here. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either inliner pass or LLVM linker will remove functions with such linkage
They will only remove them if unused. Why do they incorrectly assume they are unused? Is it because you spit into SPMD and SIMD and the entry point into SIMD is unused inside the SIMD part? And therefore it gets removed after splitting?
Co-authored-by: Vyacheslav Klochkov <[email protected]>
// definition. This function changes linkage of linkonce_odr entry points. | ||
// An example of a such entry point is an instantiation of a function template | ||
// even marked SYCL_EXTERNAL. | ||
// TODO: is it practical to fix FE instead to assign different linkage for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either inliner pass or LLVM linker will remove functions with such linkage
They will only remove them if unused. Why do they incorrectly assume they are unused? Is it because you spit into SPMD and SIMD and the entry point into SIMD is unused inside the SIMD part? And therefore it gets removed after splitting?
|
||
if ((L == GlobalValue::LinkageTypes::LinkOnceODRLinkage) || | ||
(L == GlobalValue::LinkageTypes::LinkOnceAnyLinkage)) { | ||
F->setLinkage(GlobalValue::LinkageTypes::ExternalLinkage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you change it to external and not weak_odr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Entry point need to be available externally, yet an attempt to later link in another module which defines the same entry point would be an error. So ExternalLinkage
seems the best fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would it be an error to have the same template function (which are entry point to invoke_simd) in two different TUs? This is valid C++. Why should this not work for invoke_simd?
No, it is not because of splitting. They (linker and inliner) correctly assume they are unused, because there can remain no calls to entry points within the module. In this case inliner turns this
into
which is correct from the LLVMIR optimization standpoint, but not correct conceptually because UserF is declared as entry point and someone might want to link to it / call it later. Ideally, SYCL_EXTERNAL must not lead to More practical example would be a template SYCL_EXTERNAL function. |
I'm pretty sure the intention of SYCL_EXTERNAL (as defined by SYCL spec and therefore not in the context of invoke_simd) was never to give a function external linkage if it didn't have it according to the C++ spec. Only to guarantee that a function which has external linkage also has external linkage for device linking. An implicit instantiated template function doesn't have external linkage in C++. You need to explicit instantiate it. And if you do that the LLVM-IR linkage is weak not linkonce. Which guarantees it works.
For the case of invoke_simd why is it possible that the inliner gets rid of the call. My understanding is that we don't support inlining through invoke_simd (yet). |
Hm. This excerpt from the spec + details in (5.10.1.) pretty much means SYCL_EXTERNAL = external linkage for me:
But WeakODR might be better in this case of implicit instantiation indeed. I changed the code. |
We now don't try to retain linkonce_odr entry points if they are removed by optimizations.
- "__invoke_simd_target" function attribute introduced to mark call targets. The attribute is used to protect call targets referenced from SYCL callgraph from DCE'ing on split ESIMD module. - Automatically mark invoke_simd call helper library function as ESIMD, so that no ESIMD specifics needs to be added to the invoke_simd.hpp. - Fix linkage of the helper functions so that they are not dropped by Linker/Global DCE. - Fix test.
We discussed the
In this patch I do the linkage replacement for both helper and target. For helper this is permanent change, for target - temporary. Just to protect the target from removal when ModuleDesc::cleanup() is called on split ESIMD module, because there can be other uses of the target in the SYCL part of the program. Once ESIMD and SYCL are linked back together (this is what invoke_simd requires), cleanup is run on the linked module. Original Pseudo-IR illustrating helper and target:
Optimized to:
but there can be still other uses of |
llvm/test/tools/sycl-post-link/sycl-esimd/no-sycl-esimd-split-shared-func.ll
Outdated
Show resolved
Hide resolved
The only failure is InvokeSimd/invoke_simd_smoke.cpp, which is fixed by intel/llvm-test-suite#1146 |
/verify with intel/llvm-test-suite#1146 |
@asudarsa, @cperkinsintel - I resolved outstanding comments with @rolandschulz and @v-klochkov. Please take a look. |
@@ -613,6 +603,52 @@ void ModuleDesc::renameDuplicatesOf(const Module &MA, StringRef Suff) { | |||
} | |||
} | |||
|
|||
constexpr char SYCL_ORIG_LINKAGE_ATTR[] = "__sycl_orig_linkage"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will help to have a short comment about this constexpr and why we need this. Thanks
} | ||
} | ||
|
||
// TODO: try to move including all passes (cleanup, spec consts, compile time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo? may be 'including' needs to be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only nitpicks. LGTM.
Thanks
@asudarsa, thanks - let me address your NIT comments in a separate PR to reduce the risk of conflicts for this big PR. |
b4b6749
Had to fix test failure (see 9dd341e), so I addressed @asudarsa's review comments as well (b4b6749). |
/verify with intel/llvm-test-suite#1146 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks
|
@cperkinsintel, please review/approve for @intel/llvm-reviewers-runtime |
Actually, I have approval from @intel/llvm-reviewers-runtime - from @v-klochkov. Merging. |
Introduce an intermediate lambda in invoke_simd and call SIMD target
from the lambda with given (SIMD) arguments coming from the lambda's
formal parameters. This way compiler automatically performs necessary
argument type conversion.
Introduce a new implicit constructor for simd to allow conversion
of simd objects with _VecExt storage kind (used in invoke_simd
extension).
Fix linkonce_odr linkage of entry points : change it to external
linkage to avoid removal by the inliner and llvm-linker.
This new scheme complicates the SIMD target function pointer flow, so
LowerInvokeSimd.cpp update is needed to accommodate the new flow.
E2E tests: intel/llvm-test-suite#1146
Signed-off-by: Konstantin S Bobrovsky [email protected]