[SYCL] Add code examples for all SYCL Function Attributes #3107

Merged
merged 3 commits into from
Jan 28, 2021
192 changes: 185 additions & 7 deletions clang/include/clang/Basic/AttrDocs.td
@@ -2219,6 +2219,15 @@ program is ill-formed and no diagnostic is required.

The ``intel::kernel_args_restrict`` attribute has an effect when applied to a
function, and no effect otherwise.

.. code-block:: c++

  [[intel::kernel_args_restrict]] void func() {}

  struct bar {
    [[intel::kernel_args_restrict]] void operator()() const {}
  };

}];
}

@@ -2230,6 +2239,25 @@ Applies to a device function/lambda function. Indicates the number of work
items that should be processed in parallel. Valid values are positive integers.
If ``intel::num_simd_work_items`` is applied to a function called from a
device kernel, the attribute is not ignored and it is propagated to the kernel.

.. code-block:: c++

  [[intel::num_simd_work_items(4)]] void foo() {}

  template <int N>
  [[intel::num_simd_work_items(N)]] void bar() {}

  class Foo {
  public:
    [[intel::num_simd_work_items(6)]] void operator()() const {}
  };

  template <int N>
  class Functor {
  public:
    [[intel::num_simd_work_items(N)]] void operator()() const {}
  };

}];
}

@@ -2285,14 +2313,64 @@ those device functions, such that the kernel attributes are the sum of all
attributes of all device functions called in this kernel.
See section 6.7 Attributes for more details.

.. code-block:: c++

  [[cl::reqd_work_group_size(4, 4, 4)]] void foo() {}

  class Foo {
  public:
    [[cl::reqd_work_group_size(2, 2, 2)]] void operator()() const {}
  };

  template <int N, int N1, int N2>
  class Functor {
  public:
    [[cl::reqd_work_group_size(N, N1, N2)]] void operator()() const {}
  };

  template <int N, int N1, int N2>
  [[cl::reqd_work_group_size(N, N1, N2)]] void func() {}

As an Intel extension, the ``[[intel::reqd_work_group_size(X, Y, Z)]]``
spelling is supported. This spelling allows the Y and Z arguments to be
optional. If not provided by the user, Y and Z each default to 1.
This simplifies usage of the attribute when a 1- or 2-dimensional ND-range
is assumed.

.. code-block:: c++

  [[intel::reqd_work_group_size(5)]]
  // identical to [[intel::reqd_work_group_size(5, 1, 1)]]
  void quux() {}

  [[intel::reqd_work_group_size(5, 5)]]
  // identical to [[intel::reqd_work_group_size(5, 5, 1)]]
  void qux() {}

  [[intel::reqd_work_group_size(4, 4, 4)]] void foo() {}

  class Foo {
  public:
    [[intel::reqd_work_group_size(2, 2, 2)]] void operator()() const {}
  };

  template <int X, int Y, int Z>
  class Functor {
  public:
    [[intel::reqd_work_group_size(X, Y, Z)]] void operator()() const {}
  };

  template <int X, int Y, int Z>
  [[intel::reqd_work_group_size(X, Y, Z)]] void func() {}

In OpenCL C, this attribute is available in the GNU spelling
(``__attribute__((reqd_work_group_size(X, Y, Z)))``). See section
6.7.2, Optional Attribute Qualifiers, of the OpenCL 1.2 specification for details.

.. code-block:: c++

  __kernel __attribute__((reqd_work_group_size(8, 16, 32))) void test() {}

}];
}
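
The defaulting rule documented above for the ``intel::reqd_work_group_size`` spelling (Y and Z are optional and default to 1) can be modelled in plain C++ with default arguments. The following is only a sketch of that rule, not how Clang implements the attribute; ``WorkGroupSize``, ``make_reqd_work_group_size`` and ``check_defaults`` are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Hypothetical model type; it is not part of Clang or SYCL.
struct WorkGroupSize {
  int x;
  int y;
  int z;
};

constexpr bool operator==(const WorkGroupSize &a, const WorkGroupSize &b) {
  return a.x == b.x && a.y == b.y && a.z == b.z;
}

// Mirrors the documented rule: Y and Z are optional and default to 1.
constexpr WorkGroupSize make_reqd_work_group_size(int x, int y = 1,
                                                  int z = 1) {
  return WorkGroupSize{x, y, z};
}

// Runtime checks that match the "identical to" comments in the
// documentation examples.
inline void check_defaults() {
  assert(make_reqd_work_group_size(5) == make_reqd_work_group_size(5, 1, 1));
  assert(make_reqd_work_group_size(5, 5) ==
         make_reqd_work_group_size(5, 5, 1));
}
```

Under this model, the one-argument and two-argument forms are shorthand for the full three-argument form, which is what the ``quux`` and ``qux`` examples in the documentation state.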

@@ -2306,6 +2384,25 @@ reqd_work_group_size, but allows work groups that are smaller than or equal to the
specified sizes.
If ``intel::max_work_group_size`` is applied to a function called from a
device kernel, the attribute is not ignored and it is propagated to the kernel.

.. code-block:: c++

  [[intel::max_work_group_size(4, 4, 4)]] void foo() {}

  class Foo {
  public:
    [[intel::max_work_group_size(2, 2, 2)]] void operator()() const {}
  };

  template <int N, int N1, int N2>
  class Functor {
  public:
    [[intel::max_work_group_size(N, N1, N2)]] void operator()() const {}
  };

  template <int N, int N1, int N2>
  [[intel::max_work_group_size(N, N1, N2)]] void func() {}

}];
}
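
The difference between ``reqd_work_group_size`` and ``intel::max_work_group_size`` described above can be sketched as two predicates over a launch size. This is an illustrative model only: ``Size3``, ``satisfies_reqd`` and ``satisfies_max`` are hypothetical names, and the per-dimension comparison in ``satisfies_max`` is an assumption about how "smaller than or equal to the specified sizes" is meant.

```cpp
#include <cassert>

// Hypothetical model type; it is not part of Clang or SYCL.
struct Size3 {
  int x;
  int y;
  int z;
};

// reqd_work_group_size: the launch size must match the attribute exactly.
constexpr bool satisfies_reqd(const Size3 &launch, const Size3 &reqd) {
  return launch.x == reqd.x && launch.y == reqd.y && launch.z == reqd.z;
}

// max_work_group_size: the launch size may be smaller or equal.
// Assumption: the comparison is made separately in each dimension.
constexpr bool satisfies_max(const Size3 &launch, const Size3 &max) {
  return launch.x <= max.x && launch.y <= max.y && launch.z <= max.z;
}

// Small usage example: a 2x2x2 launch fits a 4x4x4 maximum but does not
// satisfy a 4x4x4 required size.
inline void check_size_rules() {
  const Size3 launch{2, 2, 2};
  const Size3 limit{4, 4, 4};
  assert(satisfies_max(launch, limit));
  assert(!satisfies_reqd(launch, limit));
}
```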

@@ -2316,12 +2413,42 @@ def SYCLIntelMaxGlobalWorkDimAttrDocs : Documentation {
Applies to a device function/lambda function or function call operator (of a
function object). Indicates the largest valid global work dimension that will be
accepted when running the kernel on a device. Valid values are integers in the
range [0, 3].
If ``intel::max_global_work_dim`` is applied to a function called from a
device kernel, the attribute is not ignored and it is propagated to the kernel.

.. code-block:: c++

  [[intel::max_global_work_dim(1)]] void foo() {}

  template <int N>
  [[intel::max_global_work_dim(N)]] void bar() {}

  class Foo {
  public:
    [[intel::max_global_work_dim(1)]] void operator()() const {}
  };

  template <int N>
  class Functor {
  public:
    [[intel::max_global_work_dim(N)]] void operator()() const {}
  };

A kernel with ``intel::max_global_work_dim(0)`` must be invoked with
``single_task``. If ``intel::max_work_group_size`` or
``cl::reqd_work_group_size`` is also applied to the kernel, its arguments
must be (1, 1, 1).
If ``intel::max_global_work_dim`` is applied to a function called from a
device kernel, the attribute is not ignored and it is propagated to the kernel.

.. code-block:: c++

  struct TRIFuncObjGood {
    [[intel::max_global_work_dim(0)]]
    [[intel::max_work_group_size(1, 1, 1)]]
    [[cl::reqd_work_group_size(1, 1, 1)]]
    void operator()() const {}
  };

}];
}
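
The two constraints stated above for ``intel::max_global_work_dim`` (the value lies in [0, 3], and a value of 0 forces any accompanying work-group size attribute to be (1, 1, 1)) can be written out as a small validity check. This is a sketch of the documented rules in plain C++, not the compiler's diagnostic logic; ``Dim3``, ``is_valid_max_global_work_dim``, ``is_all_ones`` and ``single_task_constraints_hold`` are hypothetical names, and a null pointer is used here to mean that the corresponding attribute is absent.

```cpp
#include <cassert>

// Hypothetical model type; it is not part of Clang or SYCL.
struct Dim3 {
  int x;
  int y;
  int z;
};

// Documented range: valid values are integers in [0, 3].
constexpr bool is_valid_max_global_work_dim(int dim) {
  return dim >= 0 && dim <= 3;
}

constexpr bool is_all_ones(const Dim3 &s) {
  return s.x == 1 && s.y == 1 && s.z == 1;
}

// With max_global_work_dim(0), any max_work_group_size or
// reqd_work_group_size on the same kernel must be (1, 1, 1).
// A null pointer models an attribute that is not present.
inline bool single_task_constraints_hold(int dim, const Dim3 *max_wg,
                                         const Dim3 *reqd_wg) {
  assert(is_valid_max_global_work_dim(dim));
  if (dim != 0)
    return true; // The (1, 1, 1) rule only applies when dim is 0.
  if (max_wg != nullptr && !is_all_ones(*max_wg))
    return false;
  if (reqd_wg != nullptr && !is_all_ones(*reqd_wg))
    return false;
  return true;
}
```

The ``TRIFuncObjGood`` example in the documentation corresponds to calling this check with a dimension of 0 and both sizes equal to (1, 1, 1), which the model accepts.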

@@ -2344,6 +2471,25 @@ This attribute enables communication of the desired maximum frequency of the
device operation, guiding the FPGA backend to insert the appropriate number of
registers to break up the combinational logic circuit, thereby controlling
the length of the longest combinational path.

.. code-block:: c++

  [[intel::scheduler_target_fmax_mhz(4)]] void foo() {}

  template <int N>
  [[intel::scheduler_target_fmax_mhz(N)]] void bar() {}

  class Foo {
  public:
    [[intel::scheduler_target_fmax_mhz(6)]] void operator()() const {}
  };

  template <int N>
  class Functor {
  public:
    [[intel::scheduler_target_fmax_mhz(N)]] void operator()() const {}
  };

}];
}

@@ -2355,6 +2501,29 @@ Applies to a device function/lambda function or function call operator (of a
function object). If the argument is 1, the compiler does not use the global
work offset values for the device function. Valid values are 0 and 1. If the
attribute is used without an argument, a value of 1 is set implicitly.

.. code-block:: c++

  [[intel::no_global_work_offset]]
  // identical to [[intel::no_global_work_offset(1)]]
  void quux() {}

  [[intel::no_global_work_offset(0)]] void foo() {}

  class Foo {
  public:
    [[intel::no_global_work_offset(1)]] void operator()() const {}
  };

  template <int N>
  class Functor {
  public:
    [[intel::no_global_work_offset(N)]] void operator()() const {}
  };

  template <int N>
  [[intel::no_global_work_offset(N)]] void func() {}

}];
}

@@ -2645,6 +2814,15 @@ optimization.
This attribute allows the name and address of the function to be passed to a
special ``cl::sycl::intel::get_device_func_ptr`` API call, which extracts the
device function pointer for the specified function.

.. code-block:: c++

  // Returns a value so that calling the function is well-defined.
  [[intel::device_indirectly_callable]] int func3() { return 0; }

  class A {
    [[intel::device_indirectly_callable]] A() {}
  };

}];
}
