Skip to content

[SYCL] Implement the sycl_oneapi_raw_kernel_arg extension #14335

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jul 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -44,11 +44,21 @@ SYCL specification refer to that revision.

== Status

This is a proposed extension specification, intended to gather community
feedback. Interfaces defined in this specification may not be implemented yet
or may be in a preliminary state. The specification itself may also change in
incompatible ways before it is finalized. *Shipping software products should
not rely on APIs defined in this specification.*
This is an experimental extension specification, intended to provide early
access to features and gather community feedback. Interfaces defined in this
specification are implemented in {dpcpp}, but they are not finalized and may
change incompatibly in future versions of {dpcpp} without prior notice.
*Shipping software products should not rely on APIs defined in this
specification.*


== Backend support status

This extension is currently implemented in {dpcpp} only for GPU devices and
only when using the Level Zero backend. Attempting to use this extension in
kernels that run on other devices or backends may result in undefined
behavior. Be aware that the compiler is not able to issue a diagnostic to
warn you if this happens.


== Overview
Expand All @@ -71,7 +81,7 @@ char* opaque_type;
int nbytes;
...
h.set_arg(0, a);
h.set_arg(1, sycl::ext::oneapi::raw_kernel_arg(opaque_type, nbytes));
h.set_arg(1, sycl::ext::oneapi::experimental::raw_kernel_arg(opaque_type, nbytes));
h.parallel_for(range, kernel);
----

Expand All @@ -94,7 +104,8 @@ implementation supports.
|Description

|1
|Initial version of this extension.
|The APIs of this experimental extension are not versioned, so the
feature-test macro always has this value.
|===

=== The `raw_kernel_arg` class
Expand All @@ -104,26 +115,28 @@ kernel arguments via a raw byte representation.

[source,c++]
----
namespace sycl::ext::oneapi {
namespace sycl::ext::oneapi::experimental {

class raw_kernel_arg {
public:
raw_kernel_arg(void* bytes, size_t count);
raw_kernel_arg(const void* bytes, size_t count);
};

} // namespace sycl::ext::oneapi
} // namespace sycl::ext::oneapi::experimental
----

[source,c++]
----
raw_kernel_arg(void* bytes, size_t count);
raw_kernel_arg(const void* bytes, size_t count);
----
_Preconditions_: `bytes` must point to an array of at least `count` bytes,
which is the byte representation of a kernel argument that is trivially
copyable.

_Effects_: Constructs a `raw_kernel_arg` representing a view of the `count`
bytes starting at the address specified by `bytes`.
bytes starting at the address specified by `bytes`. Since the `raw_kernel_arg`
object is only a view, the caller must ensure that the lifetime of the `bytes`
memory lasts at least as long as the lifetime of the `raw_kernel_arg` object.

=== Using a raw kernel argument

Expand All @@ -138,7 +151,7 @@ argument in `args` was passed to `set_arg` ", adding a new overload of

[source,c++]
----
void set_arg(int argIndex, sycl::ext::oneapi::raw_kernel_arg&& arg);
void set_arg(int argIndex, sycl::ext::oneapi::experimental::raw_kernel_arg&& arg);
----
_Effects_: Sets the kernel argument associated with index `argIndex` using the
bytes represented by `arg`.
Expand Down
34 changes: 34 additions & 0 deletions sycl/include/sycl/ext/oneapi/experimental/raw_kernel_arg.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
//==--- raw_kernel_arg.hpp --- SYCL extension for raw kernel args ----------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#pragma once

#include <stddef.h>

namespace sycl {
inline namespace _V1 {

class handler;

namespace ext::oneapi::experimental {

class raw_kernel_arg {
public:
raw_kernel_arg(const void *bytes, size_t count)
: MArgData(bytes), MArgSize(count) {}

private:
const void *MArgData;
size_t MArgSize;

friend class sycl::handler;
};

} // namespace ext::oneapi::experimental
} // namespace _V1
} // namespace sycl
22 changes: 22 additions & 0 deletions sycl/include/sycl/handler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
#include <sycl/ext/oneapi/device_global/device_global.hpp>
#include <sycl/ext/oneapi/device_global/properties.hpp>
#include <sycl/ext/oneapi/experimental/graph.hpp>
#include <sycl/ext/oneapi/experimental/raw_kernel_arg.hpp>
#include <sycl/ext/oneapi/experimental/use_root_sync_prop.hpp>
#include <sycl/ext/oneapi/experimental/virtual_functions.hpp>
#include <sycl/ext/oneapi/kernel_properties/properties.hpp>
Expand Down Expand Up @@ -523,6 +524,14 @@ class __SYCL_EXPORT handler {
return Storage;
}

void *
storeRawArg(const sycl::ext::oneapi::experimental::raw_kernel_arg &RKA) {
CGData.MArgsStorage.emplace_back(RKA.MArgSize);
void *Storage = static_cast<void *>(CGData.MArgsStorage.back().data());
std::memcpy(Storage, RKA.MArgData, RKA.MArgSize);
return Storage;
}

void setType(detail::CG::CGTYPE Type) { MCGType = Type; }

detail::CG::CGTYPE getType() { return MCGType; }
Expand Down Expand Up @@ -758,6 +767,14 @@ class __SYCL_EXPORT handler {
registerDynamicParameter(DynamicParam, ArgIndex);
}

// setArgHelper for the raw_kernel_arg extension type.
void setArgHelper(int ArgIndex,
sycl::ext::oneapi::experimental::raw_kernel_arg &&Arg) {
auto StoredArg = storeRawArg(Arg);
MArgs.emplace_back(detail::kernel_param_kind_t::kind_std_layout, StoredArg,
Arg.MArgSize, ArgIndex);
}

/// Registers a dynamic parameter with the handler for later association with
/// the node being created
/// @param DynamicParamBase
Expand Down Expand Up @@ -2047,6 +2064,11 @@ class __SYCL_EXPORT handler {
setArgHelper(argIndex, dynamicParam);
}

// set_arg for the raw_kernel_arg extension type.
void set_arg(int argIndex, ext::oneapi::experimental::raw_kernel_arg &&Arg) {
setArgHelper(argIndex, std::move(Arg));
}

/// Sets arguments for OpenCL interoperability kernels.
///
/// Registers pack of arguments(Args) with indexes starting from 0.
Expand Down
1 change: 1 addition & 0 deletions sycl/include/sycl/sycl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,7 @@
#include <sycl/ext/oneapi/experimental/opportunistic_group.hpp>
#include <sycl/ext/oneapi/experimental/prefetch.hpp>
#include <sycl/ext/oneapi/experimental/profiling_tag.hpp>
#include <sycl/ext/oneapi/experimental/raw_kernel_arg.hpp>
#include <sycl/ext/oneapi/experimental/root_group.hpp>
#include <sycl/ext/oneapi/experimental/tangle_group.hpp>
#include <sycl/ext/oneapi/filter_selector.hpp>
Expand Down
1 change: 1 addition & 0 deletions sycl/source/feature_test.hpp.in
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,7 @@ inline namespace _V1 {
#define SYCL_EXT_ONEAPI_FREE_FUNCTION_KERNELS 1
#define SYCL_EXT_ONEAPI_PROD 1
#define SYCL_EXT_ONEAPI_ENQUEUE_FUNCTIONS 1
#define SYCL_EXT_ONEAPI_RAW_KERNEL_ARG 1

#ifndef __has_include
#define __has_include(x) 0
Expand Down
67 changes: 67 additions & 0 deletions sycl/test-e2e/RawKernelArg/arg_combinations.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
// REQUIRES: aspect-usm_shared_allocations
// REQUIRES: ocloc && level_zero

// RUN: %{build} -o %t.out
// RUN: %{run} %t.out

// Tests raw_kernel_arg in different combinations.

#include <sycl/detail/core.hpp>
#include <sycl/usm.hpp>

constexpr size_t NumArgs = 4;

auto constexpr CLSource = R"===(
__kernel void Kernel(int in1, char in2, __global float *out, float in3) {
out[0] = (float)in1 + (float)in2 + in3;
}
)===";

template <typename T>
void SetArg(sycl::handler &CGH, T &&Arg, size_t Index, size_t Iteration) {
// Pick how we set the arg based on the bit at Index in Iteration.
if (Iteration & (1 << Index))
CGH.set_arg(Index, sycl::ext::oneapi::experimental::raw_kernel_arg(
&Arg, sizeof(T)));
else
CGH.set_arg(Index, Arg);
}

int main() {
sycl::queue Q;

auto SourceKB =
sycl::ext::oneapi::experimental::create_kernel_bundle_from_source(
Q.get_context(),
sycl::ext::oneapi::experimental::source_language::opencl, CLSource);
auto ExecKB = sycl::ext::oneapi::experimental::build(SourceKB);

int Failed = 0;

float *Out = sycl::malloc_shared<float>(1, Q);
int32_t IntVal = 42;
char CharVal = 100;
float FloatVal = 1.23;

float Expected =
static_cast<float>(IntVal) + static_cast<float>(CharVal) + FloatVal;
for (size_t I = 0; I < (2 >> (NumArgs - 1)); ++I) {
Out[0] = 0.0f;
Q.submit([&](sycl::handler &CGH) {
SetArg(CGH, IntVal, 0, I);
SetArg(CGH, CharVal, 1, I);
SetArg(CGH, Out, 2, I);
SetArg(CGH, FloatVal, 3, I);
CGH.single_task(ExecKB.ext_oneapi_get_kernel("Kernel"));
}).wait();

if (Out[0] != Expected) {
std::cout << "Failed for iteration " << I << ": " << Out[0]
<< " != " << Expected << std::endl;
++Failed;
}
}

sycl::free(Out, Q);
return Failed;
}
63 changes: 63 additions & 0 deletions sycl/test-e2e/RawKernelArg/diff_size.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
// REQUIRES: aspect-usm_shared_allocations
// REQUIRES: ocloc && level_zero

// RUN: %{build} -o %t.out
// RUN: %{run} %t.out

// Tests raw_kernel_arg with pointers and scalars to different types with
// different sizes.

#include <sycl/detail/core.hpp>
#include <sycl/usm.hpp>

auto constexpr CLSource = R"===(
__kernel void Kernel1(int in, __global int *out) {
out[0] = in;
}

__kernel void Kernel2(short in, __global short *out) {
out[0] = in;
}
)===";

int main() {
sycl::queue Q;

auto SourceKB =
sycl::ext::oneapi::experimental::create_kernel_bundle_from_source(
Q.get_context(),
sycl::ext::oneapi::experimental::source_language::opencl, CLSource);
auto ExecKB = sycl::ext::oneapi::experimental::build(SourceKB);

int32_t *IntOut = sycl::malloc_shared<int32_t>(1, Q);
int16_t *ShortOut = sycl::malloc_shared<int16_t>(1, Q);
int32_t IntVal = 42;
int16_t ShortVal = 24;

for (size_t I = 0; I < 2; ++I) {
std::string KernelName = I == 0 ? "Kernel1" : "Kernel2";
Q.submit([&](sycl::handler &CGH) {
sycl::ext::oneapi::experimental::raw_kernel_arg KernelArg0 =
I == 0 ? sycl::ext::oneapi::experimental::raw_kernel_arg(
&IntVal, sizeof(int32_t))
: sycl::ext::oneapi::experimental::raw_kernel_arg(
&ShortVal, sizeof(int16_t));
sycl::ext::oneapi::experimental::raw_kernel_arg KernelArg1 =
I == 0 ? sycl::ext::oneapi::experimental::raw_kernel_arg(
&IntOut, sizeof(int32_t *))
: sycl::ext::oneapi::experimental::raw_kernel_arg(
&ShortOut, sizeof(int16_t *));

CGH.set_arg(0, KernelArg0);
CGH.set_arg(1, KernelArg1);
CGH.single_task(ExecKB.ext_oneapi_get_kernel(KernelName));
}).wait();
}

assert(IntOut[0] == IntVal);
assert(ShortOut[0] == ShortVal);

sycl::free(IntOut, Q);
sycl::free(ShortOut, Q);
return 0;
}
62 changes: 62 additions & 0 deletions sycl/test-e2e/RawKernelArg/same_size_pointer.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
// REQUIRES: aspect-usm_shared_allocations
// REQUIRES: ocloc && level_zero

// RUN: %{build} -o %t.out
// RUN: %{run} %t.out

// Tests raw_kernel_arg with pointers and scalars to different 32-bit types.

#include <sycl/detail/core.hpp>
#include <sycl/usm.hpp>

auto constexpr CLSource = R"===(
__kernel void Kernel1(int in, __global int *out) {
out[0] = in;
}

__kernel void Kernel2(float in, __global float *out) {
out[0] = in;
}
)===";

int main() {
sycl::queue Q;

auto SourceKB =
sycl::ext::oneapi::experimental::create_kernel_bundle_from_source(
Q.get_context(),
sycl::ext::oneapi::experimental::source_language::opencl, CLSource);
auto ExecKB = sycl::ext::oneapi::experimental::build(SourceKB);

int32_t *IntOut = sycl::malloc_shared<int32_t>(1, Q);
float *FloatOut = sycl::malloc_shared<float>(1, Q);
int32_t IntVal = 42;
float FloatVal = 3.12f;

for (size_t I = 0; I < 2; ++I) {
std::string KernelName = I == 0 ? "Kernel1" : "Kernel2";
Q.submit([&](sycl::handler &CGH) {
sycl::ext::oneapi::experimental::raw_kernel_arg KernelArg0 =
I == 0 ? sycl::ext::oneapi::experimental::raw_kernel_arg(
&IntVal, sizeof(int32_t))
: sycl::ext::oneapi::experimental::raw_kernel_arg(
&FloatVal, sizeof(float));
sycl::ext::oneapi::experimental::raw_kernel_arg KernelArg1 =
I == 0 ? sycl::ext::oneapi::experimental::raw_kernel_arg(
&IntOut, sizeof(int32_t *))
: sycl::ext::oneapi::experimental::raw_kernel_arg(
&FloatOut, sizeof(float *));

CGH.set_arg(0, KernelArg0);
CGH.set_arg(1, KernelArg1);
CGH.single_task(ExecKB.ext_oneapi_get_kernel(KernelName));
}).wait();
}

assert(IntOut[0] == IntVal);
assert(FloatOut[0] == FloatVal);

sycl::free(IntOut, Q);
sycl::free(FloatOut, Q);
return 0;
}
Loading
Loading