Skip to content

[SYCL][ESIMD]Limit propagation of ESIMD attributes #7191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Dec 5, 2022

Conversation

fineg74
Copy link
Contributor

@fineg74 fineg74 commented Oct 26, 2022

No description provided.

@fineg74
Copy link
Contributor Author

fineg74 commented Oct 26, 2022

Complementary test PR: intel/llvm-test-suite#1352

@fineg74 fineg74 changed the title Limit propagation of ESIMD attributes to RoundedRangeKernel [SYCL][ESIMD]Limit propagation of ESIMD attributes to RoundedRangeKernel Oct 27, 2022
v-klochkov
v-klochkov previously approved these changes Oct 31, 2022
#define DEBUG_TYPE "LowerESIMDKernelAttrs"

using namespace llvm;

namespace {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allocator is used in other places - please factor this out to a separate source to avoid code duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -30,7 +63,28 @@ SYCLFixupESIMDKernelWrapperMDPass::run(Module &M, ModuleAnalysisManager &MAM) {
sycl::utils::traverseCallgraphUp(
&F,
[&](Function *GraphNode) {
if (!llvm::esimd::isESIMD(*GraphNode)) {
if (!llvm::esimd::isESIMD(*GraphNode) &&
llvm::esimd::isKernel(*GraphNode)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filtering leaves out SYCL_EXTERNAL functions, which does not seem correct.

if (!Name.contains("sycl::_V1::detail::RoundedRangeKernel<")) {
return;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the GraphNode->setMetadata... call below is reached only when GraphNode is a RoundedRangeKernel, which means only RoundedRangeKernel gets the marker, all others are left out. Previous logic was that all functions were marked. This needs additional investigation. I'll also give this a thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally this pass was introduced to resolve a problem of mixing ESIMD and NonESIMD kernels and back then the issue was determined as
The immediate cause of the issue is that ESIMD kernel is wrapped with RoundedRangeKernel and sycl_explicit_esimd attribute is not propagated to the wrapper class. As a result ModuleSplitter in sycl-post-link creates a spurious split for the wrapper and later creates a spurious non-esimd module and pulls called functions into it. As a result non-esimd module would contain esimd functions and later fails to link. It looks like that for named ESIMD kernels the attributes are properly propagated and therefore the issue doesn't occur. (#6557)
This bug basically says that marking all functions that call ESIMD functions is also incorrect as it results in a crash in some circumstances. Therefore the fix here is to limit marking to RoundedRangeKernel which was the original source of the problem. So in this sense the fix does what it is expected to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this patch does is it excludes all the functions in the caller chain from given ESIMD function F up to RoundedRangeKernel from marking them as ESIMD, instead, it only marks the kernel. Previously all the function in the call chain were marked.
Now, this patch is supposed to fix internal LLVM error on some function met in the IR:
Assertion `isa(Val) && "cast() argument of incompatible type!"' failed.

So

  1. All the functions in a call chain ending at ESIMD function must be ESIMD functions, so not marking them is incorrect.
  2. I think the patch just masks the problem by skipping marking the problematic function (which in turn can spawn other problems).

I think we need to come up with a patch that fixes the root cause of the above assert.

@v-klochkov v-klochkov dismissed their stale review November 1, 2022 17:25

It needs more review/investigation.

@@ -0,0 +1,53 @@
//===--------- Allocator.h - Allocator for demangler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should've expressed my proposal clearer, sorry. This is just implementation detail of demangling code, not a general purpose service, so I would very much prefer this to reside in ESIMDUtils.cpp/ESIMDUtils.hpp

@fineg74 fineg74 requested a review from a team as a code owner November 10, 2022 18:35

using namespace llvm::esimd;
bool isAddressArgumentInvokeSIMD(const CallInst *CI) {
constexpr char INVOKE_SIMD_PREF[] = "_Z33__regcall3____builtin_invoke_simd";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please factor out. There is another user:
llvm/lib/SYCLLowerIR/LowerInvokeSimd.cpp:constexpr char INVOKE_SIMD_PREF[] = "_Z33__regcall3____builtin_invoke_simd";


namespace llvm {
namespace sycl {
namespace utils {

using namespace llvm::esimd;
bool isAddressArgumentInvokeSIMD(const CallInst *CI) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool isAddressArgumentInvokeSIMD(const CallInst *CI) {
bool isInvokeSimdBuiltinCall(const CallInst *CI) {

return false;
}

bool filterFunctionPointer(Value *address) {
Copy link
Contributor

@kbobrovs kbobrovs Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add comment what this function does, what it returns.
I also thought the deduceFunction from LowerInvokeSimd.cpp could be used directly (factored out first, of course).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that deduceFunction is doing the opposite: i.e. given an argument of invoke_simd, it tries to find the actual function that invoke_simd tries to call.

@@ -43,6 +101,13 @@ void traverseCallgraphUp(llvm::Function *F, CallGraphNodeAction ActionF,
} else {
// ... non-call is OK - add using function to the worklist
if (auto *I = dyn_cast<Instruction>(FCall)) {
if (auto *SI = dyn_cast<StoreInst>(I)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we talked, such filtering within traverseCallgraphUp should be opaque, and filtering function should be passed as parameter rather than hardcoded, as there are multiple users of traverseCallgraphUp, but we need this invoke_simd behavior only for attribute markup. Somthing like

void traverseCallgraphUp(llvm::Function *F, CallGraphNodeAction ActionF,
                         SmallPtrSetImpl<Function *> &FunctionsVisited,
                         bool ErrorOnNonCallUse, std::function<bool(const Instruction *I, const Function *F)> && FuncUseFilter);

and the LowerESIMDKernelAttrs pass would invoke the traverseCallgraphUp with proper FuncUseFilter argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. However, I believe we are going to encounter similar problems in other cases where traverseCallgraphUp is used with invoke_simd and therefore embedding the filtering functionality to traverseCallgraphUp would make more sense

@fineg74 fineg74 changed the title [SYCL][ESIMD]Limit propagation of ESIMD attributes to RoundedRangeKernel [SYCL][ESIMD]Limit propagation of ESIMD attributes Nov 10, 2022
void traverseCallgraphUp(llvm::Function *F, CallGraphNodeAction NodeF,
SmallPtrSetImpl<Function *> &Visited,
bool ErrorOnNonCallUse);
void traverseCallgraphUp(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment how functionFilter parameter is used.

Comment on lines 11 to 14
#include "llvm/Demangle/Demangle.h"
#include "llvm/Demangle/ItaniumDemangle.h"
#include "llvm/IR/Module.h"
#include "llvm/Pass.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these still needed?

#define DEBUG_TYPE "LowerESIMDKernelAttrs"

using namespace llvm;
using namespace llvm::esimd;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
using namespace llvm::esimd;

@@ -37,7 +39,7 @@ SYCLFixupESIMDKernelWrapperMDPass::run(Module &M, ModuleAnalysisManager &MAM) {
Modified = true;
}
},
false);
false, filterFunctionPointer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
false, filterFunctionPointer);
false, llvm::esimd::filterFunctionPointer);

@@ -43,6 +47,13 @@ void traverseCallgraphUp(llvm::Function *F, CallGraphNodeAction ActionF,
} else {
// ... non-call is OK - add using function to the worklist
if (auto *I = dyn_cast<Instruction>(FCall)) {
if (auto *SI = dyn_cast<StoreInst>(I)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add comment why this special branch is needed if (auto *SI = dyn_cast<StoreInst>(I))

@@ -43,6 +47,13 @@ void traverseCallgraphUp(llvm::Function *F, CallGraphNodeAction ActionF,
} else {
// ... non-call is OK - add using function to the worklist
if (auto *I = dyn_cast<Instruction>(FCall)) {
if (auto *SI = dyn_cast<StoreInst>(I)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The StoreInst check should be moved into the filter as it has nothing to do with this generic traversal functionality. That's why I suggested std::function<bool(const Instruction *I, const Function *F)>, not std::function<bool(const Value *)>

Suggested change
if (auto *SI = dyn_cast<StoreInst>(I)) {
if (!functionFilter(I, CurF)) {
continue;
}

Comment on lines 51 to 56
Value *addr = SI->getPointerOperand();
if (!functionFilter(addr)) {
continue;
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Value *addr = SI->getPointerOperand();
if (!functionFilter(addr)) {
continue;
}
}

// Tracks the use of a function pointer being stored in a memory.
// Returns false if the function pointer is used as an argument for invoke_simd
// function call, true otherwise.
bool filterFunctionPointer(Value *address) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Since this function remains quite specific and is not reused anywhere, it should be moved into the place of use - LowerESIMDKernelAttrs.cpp.
  2. Please improve comment. It is not clear what is memory, what is function pointer, what is the incoming argument.
  3. I added a note below that this should have different signature.

Comment on lines 35 to 37
if (address == nullptr) {
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: LLVM Coding Standard says don’t use braces on simple single-statement bodies of if/else/loop statements

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always thought this is not strict - we prefer to omit the braces to avoid unnecessary line noise.

Comment on lines 25 to 28
if (F && F->getName().startswith(esimd::INVOKE_SIMD_PREF)) {
return true;
}
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (F && F->getName().startswith(esimd::INVOKE_SIMD_PREF)) {
return true;
}
return false;
return F && F->getName().startswith(esimd::INVOKE_SIMD_PREF);

// if the instruction is to store address of a function, check if it is later
// used by InvokeSimd.
if (auto *SI = dyn_cast<StoreInst>(I)) {
const Value *addr = SI->getPointerOperand();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const Value *addr = SI->getPointerOperand();
const Value *Addr = SI->getPointerOperand();

@@ -73,10 +72,6 @@ ModulePass *llvm::createSYCLLowerInvokeSimdPass() {

namespace {
// TODO support lambda and functor overloads
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to move this TODO as well or is it unrelated to the removed code below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO is unrelated to the removed code

// Checks the use of a function address being stored in a memory.
// Returns false if the function address is used as an argument for
// invoke_simd function call, true otherwise.
bool checkFunctionAddressUse(const Value *address) {
Copy link
Contributor

@kbobrovs kbobrovs Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment still not updated

BTW, why not just inline it into filterInvokeSimdUse, since it is not general purpose function anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses recursion to follow the use of a function. filterInvokeSimdUse was introduced to take StoreInst check out of generic traverseCallgraphUp and inlining will complicate the recursion.

@@ -43,6 +47,10 @@ void traverseCallgraphUp(llvm::Function *F, CallGraphNodeAction ActionF,
} else {
// ... non-call is OK - add using function to the worklist
if (auto *I = dyn_cast<Instruction>(FCall)) {
if (!functionFilter(I)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function* should be part of the interface, as we are deciding whether a use of particular function should be filtered or not. What if there are multiple function pointers used by I?

@@ -25,8 +25,8 @@ bool isInvokeSimdBuiltinCall(const CallInst *CI) {
return F && F->getName().startswith(esimd::INVOKE_SIMD_PREF);
}

// Checks the use of a function address being stored in a memory.
// Returns false if the function address is used as an argument for
// Checks the use of a function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no "function" per se in any of the parameters. So the comment is not clear.

// used by InvokeSimd.
if (auto *SI = dyn_cast<StoreInst>(I)) {
const Value *Addr = SI->getPointerOperand();
return checkFunctionAddressUse(Addr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return checkFunctionAddressUse(Addr);
return !isInvokeSimdArgumentSource(Addr, F);

// location is used diectly or indirectly as an argument for invoke_simd.
// Returns false if the function is used as an argument for
// invoke_simd function call, true otherwise.
bool checkFunctionAddressUse(const Value *address) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool checkFunctionAddressUse(const Value *address) {
bool isInvokeSimdArgumentSource(const Value *address) {

// invoke_simd function call, true otherwise.
bool checkFunctionAddressUse(const Value *address) {
if (address == nullptr)
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return true;
return false;

} else if (const auto *CI = dyn_cast<CallInst>(V)) {
// if __builtin_invoke_simd uses the pointer, do not traverse the function
if (isInvokeSimdBuiltinCall(CI))
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return false;
return true;

// location is used diectly or indirectly as an argument for invoke_simd.
// Returns false if the function is used as an argument for
// invoke_simd function call, true otherwise.
bool checkFunctionAddressUse(const Value *address) {
Copy link
Contributor

@kbobrovs kbobrovs Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into the code deeper, I think this part should be reworked. The overall approach can be:

Loads = find all loads from `address`
for (Load : Loads) {
  Uses = collectUsesSkipThroughCasts(Load);
  for (Use : Uses) {
    if (Use is parameter to invoke_simd) {
      return true;
    }
  }
  return false;
}

Current logic is very different. We also don't need recursion, so this can be inlined intp the caller.

Copy link
Contributor Author

@fineg74 fineg74 Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the suggested rework is a simplification of the current approach and we are going to miss even more cases that we may currently miss.
For example what if we store the value to a different address, we are not going to track this new address and are going to miss the valid code flow that may lead to invoke_simd. The purpose of recursion was to track such chain of load or load/store manipulations especially for -O0 when no optimizations that eliminate redundant loads/stores are applied.
For example following code will fail:

FuncType SIMD_CALLEE_PTR = SIMD_CALLEE;
....
FuncType SIMD_CALLEE_PTR1 = SIMD_CALLEE_PTR;
....
invoke_simd(sg, SIMD_CALLEE_PTR1 , ...);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think original checkFunctionAddressUse is borrowed from https://github.com/intel/llvm/blob/sycl/llvm/lib/SYCLLowerIR/LowerInvokeSimd.cpp#L102, whose purpose is quite different from what we need here. collectPossibleStoredVals is used to conservatively detect if optimization is possible, checkFunctionAddressUse must tell with 100% reliability that given address:

  1. either contains only given function address or cannot contain it at all, anything in between should be an assert (as we lose reliability in this case)
  2. if it contains only given function address, then the value stored (the function pointer) either can't appear as an argument of __builtin_invoke_simd.
    In general case this is not possible, e.g. when user program uses some complex data flow on the function pointer before calling invoke_simd.

The more I think of this, the more I'm getting inclined to stop adding a call graph edge on any use of a function other than call.
In our case, if a user function can call ESIMD function via a pointer, we should require that this function is marked with SYCL_ESIMD_FUNCTION explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow the conclusion. The problem that this fix is trying to fix is to mark all callers of an ESIMD function up the chain with ESIMD until it reaches either kernel or invoke_simd. So it is invoked only for functions explicitly marked as SYCL_ESIMD_FUNCTION. If we allow function pointers as arguments for invoke_simd we would need some data flow analysis, similar to what checkFunctionAddressUse does. Yes it has its limitations and doesn't cover all the cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we talked, the idea is:

  • When propagating sycl_explicit_simd attribute we never go up through non-call use of function pointers, which automatically covers invoke_simd
  • Users must explicitly add sycl_explicit_simd attribute if function calls a sycl_explicit_simd function through pointer.

@pvchupin
Copy link
Contributor

pvchupin commented Dec 1, 2022

hip fails are due to driver fail.

[16528323.699738] amdgpu: Runlist is getting oversubscribed. Expect reduced ROCm performance.
[16528323.892286] amdgpu: Runlist is getting oversubscribed. Expect reduced ROCm performance.
[16528325.741668] amdgpu: Runlist is getting oversubscribed. Expect reduced ROCm performance.
[16528327.333944] amdgpu: Runlist is getting oversubscribed. Expect reduced ROCm performance.
[16528720.285095] static-buffer-d[1433031]: segfault at 39 ip 00007fe85cea9871 sp 00007fffd57bed80 error 4 in libamdhip64.so.5.4.50400[7fe85cdf0000+37e000]

I've rebooted runner.

@@ -68,5 +74,35 @@ void collectUsesLookThroughCastsAndZeroGEPs(const Value *V,
/// in it. Returns nullptr if failed to do so.
Type *getVectorTyOrNull(StructType *STy);

// Simplest possible implementation of an allocator for the Itanium demangler
Copy link
Contributor

@kbobrovs kbobrovs Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change (and other ones related to demangling) is no longer needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon second thought, it is good to have the allocator defined in shared header, as others may need to use it too.

@kbobrovs
Copy link
Contributor

kbobrovs commented Dec 5, 2022

@AlexeySachkov, do you approve?

@AlexeySachkov
Copy link
Contributor

@AlexeySachkov, do you approve?

My comments were mostly minor about coding style, I don't have objections against merging this PR, feel free to go ahead

Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formal approval (I forgot that there are two files owned by dpcpp-tools-reviewers)

@kbobrovs kbobrovs merged commit 4d3c150 into intel:sycl Dec 5, 2022
@fineg74 fineg74 deleted the esimdSplitFix branch December 27, 2022 01:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants