[SYCL][ESIMD] Implement IR pass to lower C++ ESIMD intrinsics. #1881

kbobrovs · 2020-06-12T19:13:35Z

DO NOT REVIEW THE FIRST COMMIT (covered by #1880)

The pass transforms _esimd Itanium - mangled C++ intrinsics to
genx.*style parseable by the ESIMD - capable SPIRV translator.

This pass depends on the vc-intrinsics repo (GenXIntrinsics component), which is not part of the build yet - in progress.

Authors:
Konstantin S Bobrovsky [email protected]
Gang Chen [email protected]
Wei Pan
Denis Bakhvalov [email protected]
Anton Sidorenko [email protected]
Kaiyu Chen [email protected]
Pratik Ashar [email protected]

Signed-off-by: Konstantin S Bobrovsky [email protected]

againull

Overall looks good to me, just minor comments. And currently build is failing with this error:

CMake Error at cmake/modules/AddLLVM.cmake:646 (add_dependencies):
  The dependency target "LLVMGenXIntrinsics" of target "LLVMSYCLLowerIR" does
  not exist.

Coule you please fix this.

againull · 2020-06-17T18:35:03Z

llvm/include/llvm/SYCLLowerIR/LowerESIMD.h

@@ -0,0 +1,39 @@
+//===-- LowerESIMD.cpp - lower Explicit SIMD (ESIMD) constructs -----------===//


Could you please fix the header.

againull · 2020-06-17T18:43:18Z

llvm/test/SYCLLowerIR/esimd_lower_intrins.ll

+; RUN: opt < %s -LowerESIMD -S | FileCheck %s
+
+target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-n8:16:32:64"
+target triple = "spir64-unknown-windows-sycldevice"


Pass uses information about triple, so probably linux should also be tested, not sure if there will be any difference.

againull · 2020-06-17T20:55:45Z

llvm/lib/SYCLLowerIR/LowerESIMD.cpp

+    // Add this kernel to the root.
+    Kernels->addOperand(MDNode::get(Ctx, MDArgs));
+    F.addFnAttr("oclrt", "1");
+    // F.setDLLStorageClass(llvm::GlobalValue::DLLExportStorageClass);


Probably you forgot to remove this.

againull · 2020-06-17T21:00:57Z

llvm/lib/SYCLLowerIR/LowerESIMD.cpp

+
+    // Iterate argument list to gather argument kinds and generate argument
+    // descriptors.
+    for (auto AI = F.arg_begin(), AE = F.arg_end(); AI != AE; ++AI) {


Nice to have: range-based for loop over F.args().

kbobrovs · 2020-06-17T21:18:43Z

Overall looks good to me, just minor comments. And currently build is failing with this error:
CMake Error at cmake/modules/AddLLVM.cmake:646 (add_dependencies):
  The dependency target "LLVMGenXIntrinsics" of target "LLVMSYCLLowerIR" does
  not exist.
Coule you please fix this.

Right. As I wrote in the description, this is pending (other people are working on this already):

This pass depends on the vc-intrinsics repo (GenXIntrinsics component), which is not part of the
build yet - in progress.

againull · 2020-06-17T21:35:11Z

Overall looks good to me, just minor comments. And currently build is failing with this error:
CMake Error at cmake/modules/AddLLVM.cmake:646 (add_dependencies):
  The dependency target "LLVMGenXIntrinsics" of target "LLVMSYCLLowerIR" does
  not exist.
Coule you please fix this.
Right. As I wrote in the description, this is pending (other people are working on this already):

This pass depends on the vc-intrinsics repo (GenXIntrinsics component), which is not part of the
build yet - in progress.

Oh, I see, thanks!

AlexeySachkov

A few notes and comments, mostly about StringRef usage.

In general, I have no objections, but I'm not sure that this approach is the best one. Have you considered using existing clang infrastructure for built-ins?
I think it would be better (or at least more reliable than parsing and comparing mangled names) to use it.

I imagine that some SYCL header file will just call a built-in function, which will be then mapped to corresponding intrinsic and no pass would be needed at all. I cannot say for sure that amount of code would be less than in this pass, but it would look more aligned with existing clang stuff, for sure.

A few links:

llvm/lib/SYCLLowerIR/LowerESIMD.cpp

AlexeySachkov · 2020-06-19T09:01:58Z

llvm/lib/SYCLLowerIR/LowerESIMD.cpp

+    if (!Name.startswith(ESIMD_INTRIN_PREF0))
+      continue;
+    // now skip the digits
+    StringRef Name1 = Name.substr(std::strlen(ESIMD_INTRIN_PREF0));


I'm also not sure why do you need Name1 here if you can just re-assign result of substr or drop_while back to Name

I'll see if I can simplify

llvm/lib/SYCLLowerIR/LowerESIMD.cpp

kbobrovs · 2020-06-19T16:41:07Z

Have you considered using existing clang infrastructure for built-ins?

Yes we have. Big difference with e.g. AMD GPU is that we compile with SPIRV target, not with Gen-specific target. SPIRV target can't have Gen-specific intrinsics enabled easily and non-intrusively. Another difference is that there is no Gen back-end, and no Gen TargetMachine, which also complicates things.

AlexeySachkov · 2020-06-19T19:44:06Z

Big difference with e.g. AMD GPU is that we compile with SPIRV target, not with Gen-specific target. SPIRV target can't have Gen-specific intrinsics enabled easily and non-intrusively. Another difference is that there is no Gen back-end, and no Gen TargetMachine, which also complicates things.

AFAIK, you can have not only backend-specific built-ins, but also lang-specific or even lib-specific built-ins. Also, I don't fully understand why do you need actual backend or TargetMachine to add new intrinsics and emit them in LLVM IR (for native code, yes, some BE is required, but here we are talking about LLVM IR only)

kbobrovs · 2020-06-19T22:18:09Z

AFAIK, you can have not only backend-specific built-ins, but also lang-specific or even lib-specific built-ins.

Can you please please be more specific about your suggestion and why it is better than what's implemented? We can discuss this as a potential future alternative and add a TODO (if we agree that it is something worth doing). So far:

this is really back-end specific, not lang or lib-specific intrinsics
there are tons of combinations of template parameters which would require tons of new built-in definitions and lowering
we use SPIRV target, which is device neutral, but need to represent and emit device-specific intrinsics, I don't think there is a prescribed recipe for this in LLVM, and I believe the solution implemented fits the purpose. But I'll be definitely interested in good alternative suggestions.

Also, I don't fully understand why do you need actual backend or TargetMachine to add new intrinsics and emit them in LLVM IR (for native code, yes, some BE is required, but here we are talking about LLVM IR only)

What I said is that absence of TargetMachine will (well, is likely to) complicate implementation of this approach. You showed x86 and AMD as an example, but similar approach for Gen won't work w/o target machine.

AlexeySachkov · 2020-06-22T16:50:00Z

Please note that I'm not against going with the current approach, I'm just trying to understand whether we can implement the same in a way which is better aligned with existing stuff in clang & llvm (and potentially more reliable: dealing with mangler/demangler is evil)

Also, I don't fully understand why do you need actual backend or TargetMachine to add new intrinsics and emit them in LLVM IR (for native code, yes, some BE is required, but here we are talking about LLVM IR only)

What I said is that absence of TargetMachine will (well, is likely to) complicate implementation of this approach. You showed x86 and AMD as an example, but similar approach for Gen won't work w/o target machine.

Why so? I don't understand why emitting call to an intrinsic in CGBuiltin.cpp based on some descriptor from Builtins.td requires some TargetMachine and why it won't work for Gen.

AFAIK, you can have not only backend-specific built-ins, but also lang-specific or even lib-specific built-ins.

Can you please please be more specific about your suggestion and why it is better than what's implemented?

Overall, I see that we could declare several new clang built-ins, which will be lowered into corresponding GenX intrinsics. Header files should be updated to call these new built-ins. Main advantages of that approach are the following:

no dealing with demangling. I believe that relying on mangling might be fragile
potentially better performance of the compiler itself: no need to demangle each EISMD-specific call
this approach is more aligned with existing clang/llvm infrastructure

We can discuss this as a potential future alternative and add a TODO (if we agree that it is something worth doing). So far:

this is really back-end specific, not lang or lib-specific intrinsics

I don't see any problems with this

there are tons of combinations of template parameters which would require tons of new built-in definitions and lowering

I may miss something, so correct me if something is wrong, missing, or more complex than I think:

My understanding is that you are converting call to some function into intrinsic, based on some rules: for example:

{"rdregion", {a(0), t(3), t(4), t(5), a(1), t(6)}, nk(-1)}}

Here first argument of the intrinsic is the first argument of a call, the second argument of the intrinsic is the third template argument of a called function, etc.

I think that absolutely the same can be expressed via existing clang built-ins infrastructure. If you know particular types which could be used in a(0) and a(1), then you can declare a built-in with particular types (t is always just an integer value). nk shouldn't be documented in arguments at all - it will just be applied to intrinsic name when call to built-in is handled. There is no need in specific handling of template arguments or demangling anything: I suggest that this built-in is directly called from headers and all required template arguments are passed there as arguments.
If you don't know exact types which could be used in a(0) and a(1) or there are several possibilities, then you just need a built-in with custom type-checking.

we use SPIRV target, which is device neutral, but need to represent and emit device-specific intrinsics, I don't think there is a prescribed recipe for this in LLVM,

There is no way to represent any arbitrary LLVM intrinsic in SPIR-V: this is not LLVM IR representation or generation problem, this is SPIR-V specification design. If you want to have more LLVM IR intrinsics represented in SPIR-V, then you need to have a SPIR-V spec updated or SPIR-V spec extension prepared or SPIR-V extended instruction set prepared.

and I believe the solution implemented fits the purpose. But I'll be definitely interested in good alternative suggestions.

I think this patch implements a conversion from DPC++ methods/functions into LLVM IR intrinsics, but it has nothing to do with SPIR-V.

AlexeySachkov · 2020-06-22T16:51:29Z

llvm/lib/SYCLLowerIR/LowerESIMD.cpp

@@ -1027,7 +1026,7 @@ static void translateESIMDIntrinsicCall(CallInst &CI) {
  Function *F = CI.getCalledFunction();
  StringRef MnglName = F->getName();
  const char *MnglNameCStr = MnglName.data();


This variable is probably became unused

understand whether we can implement the same in a way which is better aligned with existing stuff in clang & llvm (and potentially more reliable: dealing with mangler/demangler is evil)

The word "better" and "evil" are highly subjective in this case :) I have the opposite opinion, for example. That's OK. I would suggest that you file an issue after we merge this patch and put your design sketches there. We definitely need more people to discuss what's the best here if there are doubts. Does this sound good to you?

There is no way to represent any arbitrary LLVM intrinsic in SPIR-V:

As far as I can see --spirv-allow-unknown-intrinsics serves pretty well for this purpose. What exactly do you mean?

I think this patch implements a conversion from DPC++ methods/functions into LLVM IR intrinsics

right

, but it has nothing to do with SPIR-V.

It has. This is done for the SPIRV target which means I can't use device-specific built-ins like AMDGPU built-ins, which are enabled for specific target. So SPIRV is essential here.

Actually my main doubt/objection about using clang is not really complications related to hacking built-ins mechanism for our purposes. It is the need to introduce a lot of new intrinsics derived from combinations of element type, vector length and some other parameters. Not only I'd have to introduce those, I'd also have to write lowering for each, of course.

Overall, I think this kind of debate should better happen in the dedicated issue.

Why so? I don't understand why emitting call to an intrinsic in CGBuiltin.cpp based on some descriptor from Builtins.td requires some TargetMachine and why it won't work for Gen.

Do you mean Builtins.def? This is target-neutral list of built-ins. It is not a good place for Gen intrinsics. Also I don't think e.g. ARM built-ins can be used with targets other than ARM. So I'd have to follow the same mechanism as employed for BuiltinsARM.def. I haven't digged deeper into this, but all the target-specific built-ins precedents I'm aware of are for real BEs with target machines and code gen infra.

This variable is probably became unused

yep, I haven't yet pushed the fixed variant, just committed you suggestions. Will do shortly

kbobrovs · 2020-06-26T05:28:59Z

Had to rebase and force-push.

This workaround is to avoid compilation error when building on Windows with MSVC++ 2017. Signed-off-by: Konstantin S Bobrovsky <[email protected]>

The pass transforms *__esimd_* Itanium - mangled C++ intrinsics to genx.*style parseable by the ESIMD - capable SPIRV translator. Authors: Konstantin S Bobrovsky <[email protected]> Gang Chen <[email protected]> Wei Pan Denis Bakhvalov <[email protected]> Anton Sidorenko <[email protected]> Kaiyu Chen <[email protected]> Pratik Ashar <[email protected]> Signed-off-by: Konstantin S Bobrovsky <[email protected]>

Signed-off-by: Konstantin S Bobrovsky <[email protected]>

Co-authored-by: Alexey Sachkov <[email protected]>

Signed-off-by: Konstantin S Bobrovsky <[email protected]>

…erIR. SYCLLowerIR is the only user of vc-intrinsics. Signed-off-by: Konstantin S Bobrovsky <[email protected]>

kbobrovs · 2020-07-01T06:13:20Z

Had to rebase again to fix another problem in ItaniumDemangler.h.
@AlexeySachkov - please review and approve if no more objections.
@againull - could you please re-submit approval.

Signed-off-by: Konstantin S Bobrovsky <[email protected]>

AlexeySachkov

As I already said, I have no objections. Different approaches can be discussed and evaluated later, if needed

bader · 2020-07-03T11:43:41Z

llvm/include/llvm/Demangle/ItaniumDemangle.h

@@ -445,6 +445,14 @@ class EnableIfAttr : public Node {
  }
 };

+#ifdef _MSC_VER


Hm... is this a GitHub bug or this code will be added again?
It should be already in place - #2021.

This is to make compilation and testing for this PR pass. I will rebase and this will go away.

kbobrovs requested review from AlexeySachkov and againull June 12, 2020 19:13

kbobrovs requested a review from bader as a code owner June 12, 2020 19:13

kbobrovs force-pushed the esimd-lower branch from 703efce to 35d9f11 Compare June 12, 2020 19:19

kbobrovs added the esimd Explicit SIMD feature label Jun 12, 2020

kbobrovs mentioned this pull request Jun 13, 2020

[SYCL][ESIMD] Use vload/vstore for simd object getter/setter. #1887

Merged

againull reviewed Jun 17, 2020

View reviewed changes

againull self-requested a review June 17, 2020 23:32

againull previously approved these changes Jun 17, 2020

View reviewed changes

AlexeySachkov reviewed Jun 19, 2020

View reviewed changes

asi-sc mentioned this pull request Jun 19, 2020

[SYCL][ESIMD] Bring vc-intrinsics sources #1930

Merged

kbobrovs dismissed againull’s stale review via 787f903 June 19, 2020 22:40

AlexeySachkov reviewed Jun 22, 2020

View reviewed changes

kbobrovs force-pushed the esimd-lower branch 2 times, most recently from 09560e1 to 572d874 Compare June 26, 2020 05:28

kbobrovs requested a review from AlexeySachkov June 26, 2020 05:29

kbobrovs and others added 6 commits June 30, 2020 22:53

Add workaround for MSVC++ 2017 bug to ItaniumDemangle.h.

26e493f

This workaround is to avoid compilation error when building on Windows with MSVC++ 2017. Signed-off-by: Konstantin S Bobrovsky <[email protected]>

[SQUASH] Address review comments.

ba89a49

Signed-off-by: Konstantin S Bobrovsky <[email protected]>

Apply suggestions from code review

32584d8

Co-authored-by: Alexey Sachkov <[email protected]>

[SQUASH] Fixes after code review comments implementation.

f22c5cd

Signed-off-by: Konstantin S Bobrovsky <[email protected]>

[SYCL][ESIMD] Move vc-intrinsics dependency build to llvm/lib/SYCLLow…

6c0a911

…erIR. SYCLLowerIR is the only user of vc-intrinsics. Signed-off-by: Konstantin S Bobrovsky <[email protected]>

kbobrovs force-pushed the esimd-lower branch from 572d874 to 6c0a911 Compare July 1, 2020 05:59

kbobrovs requested a review from a team as a code owner July 1, 2020 05:59

[SQUASH] Applied clang-format.

24f099f

Signed-off-by: Konstantin S Bobrovsky <[email protected]>

kbobrovs force-pushed the esimd-lower branch from e79b631 to 24f099f Compare July 1, 2020 06:15

againull approved these changes Jul 1, 2020

View reviewed changes

AlexeySachkov approved these changes Jul 2, 2020

View reviewed changes

bader reviewed Jul 3, 2020

View reviewed changes

bader approved these changes Jul 3, 2020

View reviewed changes

bader merged commit a115520 into intel:sycl Jul 3, 2020

kbobrovs deleted the esimd-lower branch July 30, 2020 12:30

		@@ -0,0 +1,39 @@
		//===-- LowerESIMD.cpp - lower Explicit SIMD (ESIMD) constructs -----------===//

[SYCL][ESIMD] Implement IR pass to lower C++ ESIMD intrinsics. #1881

[SYCL][ESIMD] Implement IR pass to lower C++ ESIMD intrinsics. #1881

Uh oh!

Conversation

kbobrovs commented Jun 12, 2020

Uh oh!

againull left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

againull Jun 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kbobrovs commented Jun 17, 2020

Uh oh!

againull commented Jun 17, 2020

Uh oh!

AlexeySachkov left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kbobrovs commented Jun 19, 2020

Uh oh!

AlexeySachkov commented Jun 19, 2020

Uh oh!

kbobrovs commented Jun 19, 2020

Uh oh!

AlexeySachkov commented Jun 22, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kbobrovs Jun 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kbobrovs commented Jun 26, 2020

Uh oh!

kbobrovs commented Jul 1, 2020

Uh oh!

AlexeySachkov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

againull Jun 17, 2020 •

edited

Loading

kbobrovs Jun 22, 2020 •

edited

Loading