Skip to content

[SYCL] Separate host instantiation from HostKernel #18534

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 29, 2025

Conversation

Pennycook
Copy link
Contributor

@Pennycook Pennycook commented May 19, 2025

The HostKernel class is currently used for two different things:

  1. To extend the lifetime of the kernel function object/lambda; and
  2. To force the kernel to be instantiated on the host.

Requiring a single solution to solve both problems prevents optimization. Providing separate solutions will enable us to use the fastest solution for each problem.


A few notes that might help with review:

  • I would like us to eventually reach a point where we can remove the HostKernel class entirely and avoid extending the lifetime of the kernel function object. We can't do that if it's being relied upon for debugging.

  • I had to change the runKernelWithoutArg signature to remove SFINAE because MSVC complained about creating a pointer to an overloaded function. I would have expected that the compiler could figure out the function pointer after SFINAE, but it failed.

  • Removing the old mechanism of instantiating on the host is an ABI break because it removes a function from the vtable.

  • With the new approach, we can avoid instantiating (and compiling!) the kernel on the host under NDEBUG. I've verified that -DNDEBUG -fpreview-breaking-changes removes the kernel symbols from the binary. -DNDEBUG alone isn't currently sufficient, because the old instantiation mechanism is still triggered. I wasn't sure how much of the old implementation to leave in tact.

The HostKernel class is currently used for two different things:
1) To extend the lifetime of the kernel function object/lambda; and
2) To force the kernel to be instantiated on the host.

Requiring a single solution to solve both problems prevents optimization.
Providing separate solutions will enable us to use the fastest solution
for each problem.

Signed-off-by: John Pennycook <[email protected]>
Pennycook added 2 commits May 21, 2025 12:49
Signed-off-by: John Pennycook <[email protected]>
Apparently the symbol is different for different versions of MSVC...

Signed-off-by: John Pennycook <[email protected]>
@Pennycook Pennycook marked this pull request as ready for review May 21, 2025 12:51
@Pennycook Pennycook requested a review from a team as a code owner May 21, 2025 12:51
@Pennycook Pennycook requested a review from sergey-semenov May 21, 2025 12:51
@Pennycook
Copy link
Contributor Author

@intel/llvm-gatekeepers - I think this can be merged now.

@martygrant martygrant merged commit 12584c2 into intel:sycl May 29, 2025
24 checks passed
@aelovikov-intel
Copy link
Contributor

aelovikov-intel commented May 29, 2025

When I tried to do this, instantiating wasn't enough, I had to ensure the code isn't DCE'd out (which is why I ultimately kept it). Have you tried to actually run all our tests in preview mode?

Update: I see that handler::instantiateKernelOnHost(void *InstantiateKernelOnHostPtr) prevents DCE.

// This function is needed for host-side compilation to keep kernels
// instantitated. This is important for debuggers to be able to associate
// kernel code instructions with source code lines.
// NOTE: InstatiateKernelOnHost() should not be called.
void InstantiateKernelOnHost() override {
using IDBuilder = sycl::detail::Builder;
constexpr bool HasKernelHandlerArg =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we just call "new" instantiation here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants