-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][NATIVECPU] Fix assert in MS name mangler #11103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a front end change, can you add a frontend test instead?
@@ -2902,6 +2902,17 @@ void MicrosoftCXXNameMangler::mangleCallingConvention(CallingConv CC) { | |||
else | |||
Out << "w"; | |||
break; | |||
case CC_OpenCLKernel: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ok to me but @premanandrao can you also please take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a lot to learn in this area. I don't understand why OpenCL is relevant for the added test case. Is OpenCL always used for -fsycl-is-native-cpu
?
Mangling also seems to be missing for IA64 ABIs. See CXXNameMangler::getCallingConvQualifierName()
in clang/lib/AST/ItaniumMangle.cpp
. Why isn't a similar change needed there? The test exercises Linux as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has anyone tried to determine what, if any, mangling is used by MSVC for OpenCL kernels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a lot to learn in this area. I don't understand why OpenCL is relevant for the added test case. Is OpenCL always used for -fsycl-is-native-cpu?
My understanding was that this is the calling convention for kernel. I think that when using -fsycl-is-native-cpu
device compilation is done with the host triple and this results in us entering MS mangler for the kernel. @uwedolinsky did I get that correct?
Mangling also seems to be missing for IA64 ABIs. See CXXNameMangler::getCallingConvQualifierName() in clang/lib/AST/ItaniumMangle.cpp. Why isn't a similar change needed there? The test exercises Linux as well.
This is a good point. Why is this not an issue for Linux?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why OpenCL is relevant for the added test case.
OpenCL is not relevant for this test, but since the frontend attaches the clang CC_OpenCLKernel
convention (independent from C++ ABI and target) to the kernel declaration here, we need to handle that convention in the currently used name mangler (which is the MS mangler when targeting Windows).
Is OpenCL always used for
-fsycl-is-native-cpu
?
-fsycl-is-native-cpu
does not use or enable OpenCL in compiler or runtime. NativeCPU
has no dependency on OpenCL.
Mangling also seems to be missing for IA64 ABIs. See
CXXNameMangler::getCallingConvQualifierName()
inclang/lib/AST/ItaniumMangle.cpp
. Why isn't a similar change needed there?
CXXNameMangler::getCallingConvQualifierName()
already handles the CC_OpenCLKernel
and other conventions here (without providing mangling), but this doesn't even seem to be called on SYCL kernels when targeting the IA64 ABI, so that problem will not occur on that ABI.
The test exercises Linux as well.
We wanted the test to also cover other ABIs and different combinations of cpu triples (for possible cross compilation) to ensure they don't have similar problems. Some of these invocations may not exercise the fix but are still worth testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for explaining. I think I'm convinced this change is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still isn't clear to me that this is the right change. Why is the CC_OpenCLKernel
attached to SYCL kernels in the first place? Can we just stop attaching them to SYCL kernels? I think we should understand why they are needed before choosing to ignore them.
MSVC doesn't directly support the OpenCL language. From what I can tell, OpenCL kernels are always resolved by a call to clCreateKernel()
that passes the name of the kernel as a string. Overloaded kernels are supported (at least by some implementations) by allowing a custom mangled name to be specified via an attribute. It seems any kind of mangling is therefore implementation dependent and thus, the Microsoft ABI doesn't specify one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still isn't clear to me that this is the right change. Why is the
CC_OpenCLKernel
attached to SYCL kernels in the first place?
Probably because of historical reasons since SYCL initially targeted OpenCL, many data structures and identifiers may have been inspired by it and SYCL kernels were implemented as OpenCL kernels. Also, querying the calling convention is one way to identify kernels.
Can we just stop attaching them to SYCL kernels?
Probably, but that would likely lead to a much larger, potentially intrusive SYCL frontend change I mentioned previously and I was trying to avoid that. If NativeCPU is the only SYCL target that has this issue, and there are no other issues elsewhere with using this convention "under the hood", it may not be worth making that large a change and instead it may be easier to just handle the assert.
MSVC doesn't directly support the OpenCL language. From what I can tell, OpenCL kernels are always resolved by a call to
clCreateKernel()
that passes the name of the kernel as a string. Overloaded kernels are supported (at least by some implementations) by allowing a custom mangled name to be specified via an attribute. It seems any kind of mangling is therefore implementation dependent and thus, the Microsoft ABI doesn't specify one.
Please note that this PR does not specify any MS mangling, it merely prevents the assert in the MS mangler triggered for the CC_OpenCLKernel
convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for being responsive to my comments. I won't hold this review up further; the change seems unlikely to cause any problems in the short term. However, we will have to revisit all of this as part of the SYCL upstreaming effort as the reuse of CC_OpenCLKernel
for SYCL here is quite confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if the clang infrastructure for OpenCL isn't used to implement SYCL kernels (not even when targeting OpenCL), then this "reuse" of CC_OpenCLKernel
could probably be avoided. It may be possible to instead use the information from SYCLKernelAttr
or other SYCL-related clang attributes that are currently attached to SYCL kernels in the AST. But such potentially larger refactoring is probably better done as a separate PR.
Thank you for your comments. I've converted this into a frontend test. |
Sorry I didn't specify the folder earlier. Since this test is checking IR, I would except it to be in clang/test/CodeGenSYCL
I think this makes sense. I see there are several other which have similar ownership - #10096 |
That's the test moved to |
template <typename KernelName = name1, typename KernelType> | ||
void parallel_for_work_group1(const KernelType &KernelFunc) { | ||
kernel_parallel_for_work_group<KernelName, KernelType>(KernelFunc); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add the parallel_for_work_group1 function and the kernel_parall_for_work_group function to the mock header in the Inputs directory and use it from there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless this matches real header implementation, I don't think it should be added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having said that, if a barebones API to test nativecpu can be added to mock headers, that would be preferred. This is how we handle most new features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the test to reproduce the assert in the MS mangler using the existing mock API instead, so there is no need to add the reduced API from the test to the mock headers. Hope this is acceptable.
Hello @intel/llvm-gatekeepers , this PR looks ready for merge. Thank you! |
What does make you think so? |
Mainly because this issue seems to affect only the NativeCPU target. No other sycl or (upstream) clang/llvm component appears to use the |
I'll leave it for @elizabethandrews to decide, but eventually this code is supposed to be upstreamed to clang. In my opinion, the earlier we get the feedback from the community, the better (especially if larger changes are required). The general rule is to avoid committing patches, which we think will be rejected in upstream. |
Hmm....Just to make sure I am understanding this PR correctly - For native CPU we compile device code with target triple. In this case the target is Windows, and this is causing an issue because Microsoft mangler is being applied to openCL kernels and mangler cannot handle CC_OpenCLKernel calling convention. The solution in this PR is to explicitly handle CC_OpenCLKernel calling convention. I guess the question is whether it correct that the kernels have this calling convention (I do not know the answer to this. @premanandrao do you?). If it is correct, I don't see a problem with explicitly handling it at the mangler, but if it is incorrect we should probably fix it before reaching the mangler. @AaronBallman can you also please take a look here? |
@AaronBallman is on vacation this week. @tahonermann @premanandrao could you please weigh in here? |
h.parallel_for_work_group<name1>(sycl::range<1>(2),sycl::range<1>(1), [=](sycl::group<1> G) {}); | ||
} | ||
|
||
// CHECK: void @_ZTS5name1( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ZTS5name1
is an Itanium mangled name. Shouldn't there be a check for a Microsoft mangled name? How does this check pass on Windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All SYCL targets need to generate the same name for a kernel to enable lookup in different device binaries. All SYCL targets in DPC++ use the Itanium mangling for kernel names.
We still need the MS mangler fix though because when targeting NativeCPU
to Windows the MS C++ ABI is used (causing the assert), but subsequently SemaSYCL
generates the unique Itanium-mangled name for each kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm following. We generate a mangled name for the MS C++ ABI that is never used? If so, wouldn't a better fix be to skip generating such a mangled name in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm following. We generate a mangled name for the MS C++ ABI that is never used? If so, wouldn't a better fix be to skip generating such a mangled name in the first place?
We were considering this but it would probably be more intrusive to query/skip/change the kernel name mangling only for NativeCPU kernels targeting Windows than just handling the CC_OpenCLKernel
convention which is already in core clang. We wanted to avoid changing mangling in the core clang Windows pipeline just for NativeCPU.
Also, SemaSYCL
currently always generates a "stable name" (using the Itanium mangling) irrespective of what C++ ABI is used. NativeCPU just selects that existing name - it doesn't create a new name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to close out this thread... I've still been struggling to follow these explanations, but as mentioned elsewhere, I won't hold this review up since the change should be harmless, but we'll need to revisit this as part of SYCL upstreaming; the use of CC_OpenCLKernel
in this context just doesn't make sense.
This PR prevents the MS name mangler to assert on the unhandled
CC_OpenCLKernel
calling convention which can end up in the AST when targeting NativeCPU on Windows - see attached test case that triggers the assert. This assert is also triggered on a number of e2e tests on NativeCPU. This patch handles the CC_OpenCLKernel convention currently only for NativeCPU and the usual error is raised for other targets as before.Making this change in an upstream llvm file may not be ideal - any suggestions for a better way to fix this are very welcome. Thanks!