Skip to content

[SYCL][ESIMD] Implement IR pass to lower C++ ESIMD intrinsics. #1881

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 3, 2020

Conversation

kbobrovs
Copy link
Contributor

DO NOT REVIEW THE FIRST COMMIT (covered by #1880)

The pass transforms _esimd Itanium - mangled C++ intrinsics to
genx.*style parseable by the ESIMD - capable SPIRV translator.

This pass depends on the vc-intrinsics repo (GenXIntrinsics component), which is not part of the build yet - in progress.

Authors:
Konstantin S Bobrovsky [email protected]
Gang Chen [email protected]
Wei Pan
Denis Bakhvalov [email protected]
Anton Sidorenko [email protected]
Kaiyu Chen [email protected]
Pratik Ashar [email protected]

Signed-off-by: Konstantin S Bobrovsky [email protected]

Copy link
Contributor

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, just minor comments. And currently build is failing with this error:

CMake Error at cmake/modules/AddLLVM.cmake:646 (add_dependencies):
  The dependency target "LLVMGenXIntrinsics" of target "LLVMSYCLLowerIR" does
  not exist.

Coule you please fix this.

@@ -0,0 +1,39 @@
//===-- LowerESIMD.cpp - lower Explicit SIMD (ESIMD) constructs -----------===//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please fix the header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

; RUN: opt < %s -LowerESIMD -S | FileCheck %s

target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-n8:16:32:64"
target triple = "spir64-unknown-windows-sycldevice"
Copy link
Contributor

@againull againull Jun 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass uses information about triple, so probably linux should also be tested, not sure if there will be any difference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

// Add this kernel to the root.
Kernels->addOperand(MDNode::get(Ctx, MDArgs));
F.addFnAttr("oclrt", "1");
// F.setDLLStorageClass(llvm::GlobalValue::DLLExportStorageClass);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you forgot to remove this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


// Iterate argument list to gather argument kinds and generate argument
// descriptors.
for (auto AI = F.arg_begin(), AE = F.arg_end(); AI != AE; ++AI) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to have: range-based for loop over F.args().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@kbobrovs
Copy link
Contributor Author

Overall looks good to me, just minor comments. And currently build is failing with this error:

CMake Error at cmake/modules/AddLLVM.cmake:646 (add_dependencies):
  The dependency target "LLVMGenXIntrinsics" of target "LLVMSYCLLowerIR" does
  not exist.

Coule you please fix this.

Right. As I wrote in the description, this is pending (other people are working on this already):

This pass depends on the vc-intrinsics repo (GenXIntrinsics component), which is not part of the
build yet - in progress.

@againull
Copy link
Contributor

Overall looks good to me, just minor comments. And currently build is failing with this error:

CMake Error at cmake/modules/AddLLVM.cmake:646 (add_dependencies):
  The dependency target "LLVMGenXIntrinsics" of target "LLVMSYCLLowerIR" does
  not exist.

Coule you please fix this.

Right. As I wrote in the description, this is pending (other people are working on this already):

This pass depends on the vc-intrinsics repo (GenXIntrinsics component), which is not part of the
build yet - in progress.

Oh, I see, thanks!

@againull againull self-requested a review June 17, 2020 23:32
againull
againull previously approved these changes Jun 17, 2020
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes and comments, mostly about StringRef usage.

In general, I have no objections, but I'm not sure that this approach is the best one. Have you considered using existing clang infrastructure for built-ins?
I think it would be better (or at least more reliable than parsing and comparing mangled names) to use it.

I imagine that some SYCL header file will just call a built-in function, which will be then mapped to corresponding intrinsic and no pass would be needed at all. I cannot say for sure that amount of code would be less than in this pass, but it would look more aligned with existing clang stuff, for sure.

A few links:

if (!Name.startswith(ESIMD_INTRIN_PREF0))
continue;
// now skip the digits
StringRef Name1 = Name.substr(std::strlen(ESIMD_INTRIN_PREF0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure why do you need Name1 here if you can just re-assign result of substr or drop_while back to Name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if I can simplify

@kbobrovs
Copy link
Contributor Author

Have you considered using existing clang infrastructure for built-ins?

Yes we have. Big difference with e.g. AMD GPU is that we compile with SPIRV target, not with Gen-specific target. SPIRV target can't have Gen-specific intrinsics enabled easily and non-intrusively. Another difference is that there is no Gen back-end, and no Gen TargetMachine, which also complicates things.

@AlexeySachkov
Copy link
Contributor

Big difference with e.g. AMD GPU is that we compile with SPIRV target, not with Gen-specific target. SPIRV target can't have Gen-specific intrinsics enabled easily and non-intrusively. Another difference is that there is no Gen back-end, and no Gen TargetMachine, which also complicates things.

AFAIK, you can have not only backend-specific built-ins, but also lang-specific or even lib-specific built-ins. Also, I don't fully understand why do you need actual backend or TargetMachine to add new intrinsics and emit them in LLVM IR (for native code, yes, some BE is required, but here we are talking about LLVM IR only)

@kbobrovs
Copy link
Contributor Author

AFAIK, you can have not only backend-specific built-ins, but also lang-specific or even lib-specific built-ins.

Can you please please be more specific about your suggestion and why it is better than what's implemented? We can discuss this as a potential future alternative and add a TODO (if we agree that it is something worth doing). So far:

  • this is really back-end specific, not lang or lib-specific intrinsics
  • there are tons of combinations of template parameters which would require tons of new built-in definitions and lowering
  • we use SPIRV target, which is device neutral, but need to represent and emit device-specific intrinsics, I don't think there is a prescribed recipe for this in LLVM, and I believe the solution implemented fits the purpose. But I'll be definitely interested in good alternative suggestions.

Also, I don't fully understand why do you need actual backend or TargetMachine to add new intrinsics and emit them in LLVM IR (for native code, yes, some BE is required, but here we are talking about LLVM IR only)

What I said is that absence of TargetMachine will (well, is likely to) complicate implementation of this approach. You showed x86 and AMD as an example, but similar approach for Gen won't work w/o target machine.

@AlexeySachkov
Copy link
Contributor

Please note that I'm not against going with the current approach, I'm just trying to understand whether we can implement the same in a way which is better aligned with existing stuff in clang & llvm (and potentially more reliable: dealing with mangler/demangler is evil)

Also, I don't fully understand why do you need actual backend or TargetMachine to add new intrinsics and emit them in LLVM IR (for native code, yes, some BE is required, but here we are talking about LLVM IR only)

What I said is that absence of TargetMachine will (well, is likely to) complicate implementation of this approach. You showed x86 and AMD as an example, but similar approach for Gen won't work w/o target machine.

Why so? I don't understand why emitting call to an intrinsic in CGBuiltin.cpp based on some descriptor from Builtins.td requires some TargetMachine and why it won't work for Gen.

AFAIK, you can have not only backend-specific built-ins, but also lang-specific or even lib-specific built-ins.

Can you please please be more specific about your suggestion and why it is better than what's implemented?

Overall, I see that we could declare several new clang built-ins, which will be lowered into corresponding GenX intrinsics. Header files should be updated to call these new built-ins. Main advantages of that approach are the following:

  • no dealing with demangling. I believe that relying on mangling might be fragile
  • potentially better performance of the compiler itself: no need to demangle each EISMD-specific call
  • this approach is more aligned with existing clang/llvm infrastructure

We can discuss this as a potential future alternative and add a TODO (if we agree that it is something worth doing). So far:

  • this is really back-end specific, not lang or lib-specific intrinsics

I don't see any problems with this

  • there are tons of combinations of template parameters which would require tons of new built-in definitions and lowering

I may miss something, so correct me if something is wrong, missing, or more complex than I think:

My understanding is that you are converting call to some function into intrinsic, based on some rules: for example:

{"rdregion", {a(0), t(3), t(4), t(5), a(1), t(6)}, nk(-1)}}

Here first argument of the intrinsic is the first argument of a call, the second argument of the intrinsic is the third template argument of a called function, etc.

I think that absolutely the same can be expressed via existing clang built-ins infrastructure. If you know particular types which could be used in a(0) and a(1), then you can declare a built-in with particular types (t is always just an integer value). nk shouldn't be documented in arguments at all - it will just be applied to intrinsic name when call to built-in is handled. There is no need in specific handling of template arguments or demangling anything: I suggest that this built-in is directly called from headers and all required template arguments are passed there as arguments.
If you don't know exact types which could be used in a(0) and a(1) or there are several possibilities, then you just need a built-in with custom type-checking.

  • we use SPIRV target, which is device neutral, but need to represent and emit device-specific intrinsics, I don't think there is a prescribed recipe for this in LLVM,

There is no way to represent any arbitrary LLVM intrinsic in SPIR-V: this is not LLVM IR representation or generation problem, this is SPIR-V specification design. If you want to have more LLVM IR intrinsics represented in SPIR-V, then you need to have a SPIR-V spec updated or SPIR-V spec extension prepared or SPIR-V extended instruction set prepared.

and I believe the solution implemented fits the purpose. But I'll be definitely interested in good alternative suggestions.

I think this patch implements a conversion from DPC++ methods/functions into LLVM IR intrinsics, but it has nothing to do with SPIR-V.

@@ -1027,7 +1026,7 @@ static void translateESIMDIntrinsicCall(CallInst &CI) {
Function *F = CI.getCalledFunction();
StringRef MnglName = F->getName();
const char *MnglNameCStr = MnglName.data();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is probably became unused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understand whether we can implement the same in a way which is better aligned with existing stuff in clang & llvm (and potentially more reliable: dealing with mangler/demangler is evil)

The word "better" and "evil" are highly subjective in this case :) I have the opposite opinion, for example. That's OK. I would suggest that you file an issue after we merge this patch and put your design sketches there. We definitely need more people to discuss what's the best here if there are doubts. Does this sound good to you?

There is no way to represent any arbitrary LLVM intrinsic in SPIR-V:

As far as I can see --spirv-allow-unknown-intrinsics serves pretty well for this purpose. What exactly do you mean?

I think this patch implements a conversion from DPC++ methods/functions into LLVM IR intrinsics

right

, but it has nothing to do with SPIR-V.

It has. This is done for the SPIRV target which means I can't use device-specific built-ins like AMDGPU built-ins, which are enabled for specific target. So SPIRV is essential here.

Actually my main doubt/objection about using clang is not really complications related to hacking built-ins mechanism for our purposes. It is the need to introduce a lot of new intrinsics derived from combinations of element type, vector length and some other parameters. Not only I'd have to introduce those, I'd also have to write lowering for each, of course.

Overall, I think this kind of debate should better happen in the dedicated issue.

Copy link
Contributor Author

@kbobrovs kbobrovs Jun 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why so? I don't understand why emitting call to an intrinsic in CGBuiltin.cpp based on some descriptor from Builtins.td requires some TargetMachine and why it won't work for Gen.

Do you mean Builtins.def? This is target-neutral list of built-ins. It is not a good place for Gen intrinsics. Also I don't think e.g. ARM built-ins can be used with targets other than ARM. So I'd have to follow the same mechanism as employed for BuiltinsARM.def. I haven't digged deeper into this, but all the target-specific built-ins precedents I'm aware of are for real BEs with target machines and code gen infra.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is probably became unused

yep, I haven't yet pushed the fixed variant, just committed you suggestions. Will do shortly

@kbobrovs kbobrovs force-pushed the esimd-lower branch 2 times, most recently from 09560e1 to 572d874 Compare June 26, 2020 05:28
@kbobrovs
Copy link
Contributor Author

Had to rebase and force-push.

@kbobrovs kbobrovs requested a review from AlexeySachkov June 26, 2020 05:29
kbobrovs and others added 6 commits June 30, 2020 22:53
This workaround is to avoid compilation error when building on Windows
with MSVC++ 2017.

Signed-off-by: Konstantin S Bobrovsky <[email protected]>
The pass transforms *__esimd_* Itanium - mangled C++ intrinsics to
genx.*style parseable by the ESIMD - capable SPIRV translator.

Authors:
Konstantin S Bobrovsky <[email protected]>
Gang Chen <[email protected]>
Wei Pan
Denis Bakhvalov <[email protected]>
Anton Sidorenko <[email protected]>
Kaiyu Chen <[email protected]>
Pratik Ashar <[email protected]>

Signed-off-by: Konstantin S Bobrovsky <[email protected]>
Signed-off-by: Konstantin S Bobrovsky <[email protected]>
…erIR.

SYCLLowerIR is the only user of vc-intrinsics.

Signed-off-by: Konstantin S Bobrovsky <[email protected]>
@kbobrovs kbobrovs requested a review from a team as a code owner July 1, 2020 05:59
@kbobrovs
Copy link
Contributor Author

kbobrovs commented Jul 1, 2020

Had to rebase again to fix another problem in ItaniumDemangler.h.
@AlexeySachkov - please review and approve if no more objections.
@againull - could you please re-submit approval.

Signed-off-by: Konstantin S Bobrovsky <[email protected]>
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I already said, I have no objections. Different approaches can be discussed and evaluated later, if needed

@@ -445,6 +445,14 @@ class EnableIfAttr : public Node {
}
};

#ifdef _MSC_VER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... is this a GitHub bug or this code will be added again?
It should be already in place - #2021.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to make compilation and testing for this PR pass. I will rebase and this will go away.

@bader bader merged commit a115520 into intel:sycl Jul 3, 2020
@kbobrovs kbobrovs deleted the esimd-lower branch July 30, 2020 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esimd Explicit SIMD feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants