Skip to content

[sycl-post-link] Do not drop SYCL_EXTERNAL functions #3793

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jun 18, 2021

Conversation

DenisBakhvalov
Copy link
Contributor

@DenisBakhvalov DenisBakhvalov commented May 20, 2021

There are a few use cases where unreferenced device functions should go through sycl-post-link. Such functions may be referenced later, for example on a SPIRV level.
Dynamic linking of device binary images is one such case. I.e. when the kernel will have an import list with functionality implemented in a dynamic library. DPCPP RT will resolve imported functions, which could be implemented as SYCL_EXTERNAL functions. Thus we need to allow SYCL_EXTERNAL to go through sycl-post-link to make it work.
ESIMD/ISPC interoperability requested by @aneshlya is another motivation for this functionality.

@DenisBakhvalov
Copy link
Contributor Author

I'm not 100% sure it always safe to allow unreferenced functions to go through sycl-post-link. One case I have in mind is when we split per-kernel. In such a scenario, sycl-post-link will place unreferenced functions in a separate module. So far, I believe the DPCPP runtime expects to see kernels in each SPIRV module, but there might be none. I know that RT claims binary images based on symbols they contain, but still, I'm not 100% sure about it. @romanovvlad, do you have any comments here?

@DenisBakhvalov
Copy link
Contributor Author

I will address CI failures shortly.
Does anyone have a strong opinion on whether I should put this functionality under a dedicated option?

@bader bader requested a review from Fznamznon May 21, 2021 08:32
@romanovvlad
Copy link
Contributor

I'm not 100% sure it always safe to allow unreferenced functions to go through sycl-post-link. One case I have in mind is when we split per-kernel. In such a scenario, sycl-post-link will place unreferenced functions in a separate module. So far, I believe the DPCPP runtime expects to see kernels in each SPIRV module, but there might be none. I know that RT claims binary images based on symbols they contain, but still, I'm not 100% sure about it. @romanovvlad, do you have any comments here?

Currently SYCL RT will assume a device images has all kernels if the list of kernel in the device image is zero.
But it uses such a device image if only a kernel is not found in images where list of kernels is not zero.


std::vector<llvm::Function *> UnreferencedFuncs;
for (auto &F : M.functions()) {
if (!F.isDeclaration() && !ReferencedFuncs.count(&F))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we have an internal (I mean linkage type) unreferenced function here? Could it cause any conflicts later?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can ignore unreferenced internal functions, just like we did before. AFAIK if two LLVM modules defined internal functions with same names, they won't cause conflicts when these two LLVM modules linked. I'm not sure about SPIR-V though, I can't find any analog of internal linkage type in SPIR-V spec, but the translator seems to support some Internal linkage type. (@AlexeySotkin , any idea why?).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be cases when we want to preserve unreferenced functions in SPIR-V, e.g. those which are mentioned in @llvm.used variable.
Whether they will conflict or not depends on linkage type.
internal linkage type is the default in SPIR-V, i.e. when global variable or function doesn't have LinkageType decoration it has internal linkage.

@Fznamznon
Copy link
Contributor

The idea behind dynamic linking support was to consider SYCL_EXTERNAL functions as entry points. I.e. SYCL_EXTERNAL functions would be processed as kernels. Kernels don't have references as well, so the algorithm will require smaller change than you have now. The rest non-SYCL_EXTERNAL functions are still considered as internal functions in device code, so they can be removed from the module if not referenced.
Support for dynamic linking would require adding sycl-module-id attribute to SYCL_EXTERNAL functions. Applying of this idea can also help to fix some lit fails because the reason is missing sycl-module-id attribute.

@DenisBakhvalov
Copy link
Contributor Author

The idea behind dynamic linking support was to consider SYCL_EXTERNAL functions as entry points. I.e. SYCL_EXTERNAL functions would be processed as kernels.

That's exactly what I'm trying to do in this patch.

Kernels don't have references as well, so the algorithm will require smaller change than you have now. The rest non-SYCL_EXTERNAL functions are still considered as internal functions in device code, so they can be removed from the module if not referenced.

I thought that non-SYCL_EXTERNAL functions are filtered by FE, no?

Support for dynamic linking would require adding sycl-module-id attribute to SYCL_EXTERNAL functions. Applying of this idea can also help to fix some lit fails because the reason is missing sycl-module-id attribute.

Do I understand correctly that your idea is to distinguish roots (entry points) by the presence of sycl-module-id attributes, correct? So, we essentially can replace:
if (F.getCallingConv() == CallingConv::SPIR_KERNEL)
with
if (F->hasFnAttribute(ATTR_SYCL_MODULE_ID))
right?

Also, there is a comment in clang/lib/CodeGen/CodeGenFunction.cpp that says:

// TODO Module identifier is not reliable for this purpose since two modules
// can have the same ID, needs improvement
if (getLangOpts().SYCLIsDevice)
  Fn->addFnAttr("sycl-module-id", Fn->getParent()->getModuleIdentifier());

@Fznamznon
Copy link
Contributor

I thought that non-SYCL_EXTERNAL functions are filtered by FE, no?

Yes, FE ignores functions that is not referenced from kernels/SYCL_EXTERNALs (I'll call it device context), but with early optimizations there is the case when function was originally referenced from device context, but use of it was optimized away. In this case we get unreferenced non-SYCL_EXTERNAL function in the module on sycl-post-link stage. I guess this is happening in failed tests.

Do I understand correctly that your idea is to distinguish roots (entry points) by the presence of sycl-module-id attributes, correct?

Yes, that is right.

Also, there is a comment in clang/lib/CodeGen/CodeGenFunction.cpp that says:

Yes, there is still a room for improvement, I don't remember if anyone ever returned to it. But it shouldn't block us from re-using the attribute for SYCL_EXTERNAL functions. The only thing that might be affected by collision in sycl-module-id attribute values is per-source device code split distribution.

@DenisBakhvalov DenisBakhvalov force-pushed the external_funcs_post_link branch from 210d80b to d76610c Compare May 25, 2021 22:39
@DenisBakhvalov
Copy link
Contributor Author

Thanks, @Fznamznon,
I found that FE already generates the sycl-module-id attribute for SYCL_EXTERNAL functions. I updated the PR according to the previous comments.

@DenisBakhvalov DenisBakhvalov changed the title [sycl-post-link] Do not drop unreferenced functions [sycl-post-link] Do not drop SYCL_EXTERNAL functions May 25, 2021
Comment on lines 708 to 709
" any kernel are dropped from the resulting module(s), except from\n"
" SYCL_EXTERNAL functions.\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "Functions unreachable from any kernel or SYCL_EXTERNAL function are dropped from the resulting module(s)."?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024"
target triple = "spir64-unknown-linux-sycldevice"

define dso_local spir_func void @externalDeviceFunc() #0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test where some SYCL_EXTERNAL function foo is defined in source "a.cpp" but it is also referenced by a kernel defined in source "b.cpp"? In this case design of dynamic code linking says that definition of function foo should be present in both resulting modules when per-source split is requested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@DenisBakhvalov
Copy link
Contributor Author

DenisBakhvalov commented May 27, 2021

This PR is supposed to fix #3239.

Fznamznon
Fznamznon previously approved these changes Jun 2, 2021
@DenisBakhvalov
Copy link
Contributor Author

Looks like CUDA AOT mode generates call without SPIR_KERNEL or SPIR_FUNC and still uses sycl-post-link. So, I removed the check for unsupported calling conventions.

@DenisBakhvalov
Copy link
Contributor Author

I have a new issue that popped up in the CI testing (AOT/multiple-devices.cpp). The problem there is that DPCPP RT does not allow device binary images to have a disjoint set of kernels&functions in them. With the changes in the current patch, it is the case: we compile the same code for 3 different targets in AOT mode: cpu, gpu, and fpga. For cpu and gpu we compile with -mllvm -sycl-opt, while for fpga we compile with -fno-sycl-early-optimizations. This causes SYCL_EXTERNAL size_t __spirv_GlobalInvocationId_x() to be not inlined for FPGA and later recognized as external symbol by sycl-post-link. So, in the end we have something like:

# cpu symbols:
_ZTS10SimpleVaddIiE

# gpu symbols:
_ZTS10SimpleVaddIiE

# fpga symbols:
_ZTS10SimpleVaddIiE
_Z28__spirv_GlobalInvocationId_xv

And this is why DPCPP RT asserts that the set of kernel symbols are disjoint across all the images in the device binary.

I tend to think we should filter SPIRV ("_spirv") and SYCL ("_sycl") built-in functions in sycl-post-link. Such functions should not be exposed to the outside even though they are declared as SYCL_EXTERNAL. So, I think we don't need to add them to the ResKernelModuleMap in sycl-post-link. If we do that, _Z28__spirv_GlobalInvocationId_xv will be gone from the fpga symbols and the problem will be solved.

@Fznamznon, @AlexeySachkov, @kbobrovs any thoughts?

@Fznamznon
Copy link
Contributor

The problem there is that DPCPP RT does not allow device binary images to have a disjoint set of kernels&functions in them.

Is there any check for functions? I remember RT has a thing called kernel set id, it is based on kernels presented in the device image. Are you saying with your changes, we add names of SYCL_EXTERNAL functions to entries table (

_pi_offload_entry EntriesBegin;
) as well?
The idea behind dynamic linking was to have a separate property set for SYCL_EXTERNAL function names and not add them to entries table, so this thing wouldn't break. However I think we actually could use either only entries table or only property table to save kernel names and SYCL_EXTERNAL names both, but we will need to re-do RT part with kernel set id. If I understand it correct, this kernel set id is actually not a necessity, but some choice made by RT developers some time ago. @s-kanaev , WDYT?

I tend to think we should filter SPIRV ("_spirv") and SYCL ("_sycl") built-in functions in sycl-post-link. Such functions should not be exposed to the outside even though they are declared as SYCL_EXTERNAL.

Not sure about __sycl built-ins, but we should definitely not treat __spirv functions as external symbols, since they disappear after SPIR-V translation. AFAIK some of __sycl built-ins are used for spec constants, @AlexeySachkov , could you please comment?

@AlexeySachkov
Copy link
Contributor

Not sure about __sycl built-ins, but we should definitely not treat __spirv functions as external symbols, since they disappear after SPIR-V translation. AFAIK some of __sycl built-ins are used for spec constants, @AlexeySachkov , could you please comment?

We do have a few __sycl_ pseudo-intrinsics, but all of them are replaced with something else by sycl-post-link.

kbobrovs
kbobrovs previously approved these changes Jun 11, 2021
Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DenisBakhvalov, looks like there is a real regression: [SYCL :: AOT/multiple-devices.cpp]

@DenisBakhvalov
Copy link
Contributor Author

@DenisBakhvalov, looks like there is a real regression: [SYCL :: AOT/multiple-devices.cpp]

Right. I didn't have the time to fix it yet. But it's still under my radar. :)

@DenisBakhvalov
Copy link
Contributor Author

Sorry for the late response.

@Fznamznon ,

Are you saying with your changes, we add names of SYCL_EXTERNAL functions to entries table as well?

Yes, here is the assert that gets triggered:

sycl/source/detail/program_manager/program_manager.cpp:965: void cl::sycl::detail::ProgramManager::addImages(pi_device_binaries): Assertion `KSIdMap[EntriesIt->name] == KSIdIt->second && "Kernel sets are not disjoint"' failed.

we should definitely not treat __spirv functions as external symbols, since they disappear after SPIR-V translation.

So, I went ahead and added this exception.

@DenisBakhvalov
Copy link
Contributor Author

@Fznamznon, @AlexeySachkov, please review.

@@ -21,25 +21,27 @@

declare dso_local spir_func zeroext i1 @_Z33__sycl_getScalarSpecConstantValueIbET_PKc(i8 addrspace(4)*)

define dso_local spir_kernel void @KERNEL_AAA() {
define dso_local spir_kernel void @KERNEL_AAA() #0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious, why do we need to add the attribute to this test. Why It wasn't here before? Is it not possible to process the code without the attribute?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this is how we detect whether the function should be considered as an entry point or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct.

@@ -6,20 +6,22 @@
; RUN: FileCheck %s -input-file=%t.files.table --check-prefixes CHECK-TABLE
; RUN: FileCheck %s -input-file=%t.files_0.sym --match-full-lines --check-prefixes CHECK-SYM

define dso_local spir_kernel void @KERNEL_AAA() {
define dso_local spir_kernel void @KERNEL_AAA() #0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@@ -21,25 +21,27 @@

declare dso_local spir_func zeroext i1 @_Z33__sycl_getScalarSpecConstantValueIbET_PKc(i8 addrspace(4)*)

define dso_local spir_kernel void @KERNEL_AAA() {
define dso_local spir_kernel void @KERNEL_AAA() #0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this is how we detect whether the function should be considered as an entry point or not

@DenisBakhvalov DenisBakhvalov requested a review from Fznamznon June 18, 2021 15:32
@DenisBakhvalov
Copy link
Contributor Author

@kbobrovs , @mlychkov please review/approve.

@DenisBakhvalov
Copy link
Contributor Author

@pvchupin , @kbobrovs , I have all the approvals, please merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants