Skip to content

Commit 2c27477

Browse files
authored
[SYCL] Kernel property to control Cache/SLM size on GPU (#8597)
When L1 cache & SLM share same physical memory module, developers may want more L1 or SLM based on their application. So, the purpose of the introduced kernel property is to give developers flexibility to tune the division. E2E test: intel/llvm-test-suite#1687
1 parent 0c212e3 commit 2c27477

File tree

22 files changed

+465
-46
lines changed

22 files changed

+465
-46
lines changed

clang/lib/Sema/SemaDeclAttr.cpp

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -8025,10 +8025,12 @@ void Sema::CheckSYCLAddIRAttributesFunctionAttrConflicts(Decl *D) {
80258025
// "sycl-single-task" is present on all single_task invocations, implicitly
80268026
// added by the SYCL headers. It can only conflict with max_global_work_dim,
80278027
// but the value will be the same so there is no need for a warning.
8028-
if (NumArgsWithoutFilter == 2 &&
8029-
AddIRFuncAttr->getAttributeNameValuePairs(Context)[0].first ==
8030-
"sycl-single-task")
8031-
return;
8028+
if (NumArgsWithoutFilter == 2) {
8029+
auto NameValuePairs = AddIRFuncAttr->getAttributeNameValuePairs(Context);
8030+
if (NameValuePairs.size() > 0 &&
8031+
NameValuePairs[0].first == "sycl-single-task")
8032+
return;
8033+
}
80328034

80338035
// If there are potentially conflicting attributes, we issue a warning.
80348036
for (const auto *Attr : std::vector<AttributeCommonInfo *>{
Lines changed: 183 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,183 @@
1+
= sycl_ext_intel_cache_config
2+
:source-highlighter: coderay
3+
:coderay-linenums-mode: table
4+
5+
// This section needs to be after the document title.
6+
:doctype: book
7+
:toc2:
8+
:toc: left
9+
:encoding: utf-8
10+
:lang: en
11+
:dpcpp: pass:[DPC++]
12+
13+
:blank: pass:[ +]
14+
15+
// Set the default source code type in this document to C++,
16+
// for syntax highlighting purposes. This is needed because
17+
// docbook uses c++ and html5 uses cpp.
18+
:language: {basebackend@docbook:c++:cpp}
19+
20+
== Notice
21+
22+
[%hardbreaks]
23+
Copyright (c) 2023-2023 Intel Corporation. All rights reserved.
24+
25+
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
26+
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
27+
permission by Khronos.
28+
29+
== Contact
30+
31+
To report problems with this extension, please open a new issue at:
32+
https://github.com/intel/llvm/issues
33+
34+
== Contributors
35+
36+
Greg Lueck, Intel +
37+
John Pennycook, Intel +
38+
Artur Gainullin, Intel
39+
40+
== Dependencies
41+
42+
This extension is written against the SYCL 2020 specification, Revision 6 and
43+
the following extensions:
44+
45+
- link:../experimental/sycl_ext_oneapi_properties.asciidoc[sycl_ext_oneapi_properties]
46+
- link:sycl_ext_oneapi_kernel_properties.asciidoc[sycl_ext_oneapi_kernel_properties]
47+
48+
== Status
49+
50+
This is an experimental extension specification, intended to provide early
51+
access to features and gather community feedback. Interfaces defined in this
52+
specification are implemented in {dpcpp}, but they are not finalized and may
53+
change incompatibly in future versions of {dpcpp} without prior notice.
54+
*Shipping software products should not rely on APIs defined in this
55+
specification.*
56+
57+
== Overview
58+
59+
There are devices where the same hardware resources are used for shared local
60+
memory (SLM) and L1 data cache. Developers may want more L1 data cache or SLM based
61+
on their application. This extension adds runtime kernel property `cache_config`
62+
which provides a way to set the preferred cache configuration for a kernel.
63+
64+
=== Feature Test Macro
65+
66+
This extension provides a feature-test macro as described in the core SYCL
67+
specification section 6.3.3 "Feature test macros". Therefore, an
68+
implementation supporting this extension must predefine the macro
69+
`SYCL_EXT_INTEL_CACHE_CONFIG` to one of the values defined
70+
in the table below. Applications can test for the existence of this macro to
71+
determine if the implementation supports this feature, or applications can test
72+
the macro's value to determine which of the extension's APIs the implementation
73+
supports.
74+
75+
[%header,cols="1,5"]
76+
|===
77+
|Value |Description
78+
|1 |Initial extension version. Base features are supported.
79+
|===
80+
81+
=== Introduction
82+
83+
This extension introduces new kernel property that can be applied to kernels
84+
using the mechanism defined in link:sycl_ext_oneapi_kernel_properties.asciidoc[sycl_ext_oneapi_kernel_properties].
85+
86+
=== Cache Config Property
87+
88+
```c++
89+
namespace sycl::ext::intel::experimental {
90+
91+
92+
enum class cache_config_enum : std::uint16_t { large_slm, large_data };
93+
94+
inline constexpr cache_config_enum large_slm =
95+
cache_config_enum::large_slm;
96+
inline constexpr cache_config_enum large_data =
97+
cache_config_enum::large_data;
98+
99+
struct cache_config {
100+
cache_config(cache_config_enum v) : value(v) {}
101+
cache_config_enum value;
102+
};
103+
104+
} // namespace sycl::ext::intel::experimental
105+
```
106+
107+
The `cache_config` property provides a way to set the preferred cache
108+
configuration for a kernel. The following values are supported:
109+
110+
* `large_slm`: Prefer having larger shared local memory and smaller L1 data cache.
111+
In this case driver will ensure that all workgroups will have enough
112+
SLM to run.
113+
114+
* `large_data`: Prefer having larger L1 data cache and smaller shared local memory.
115+
In this case SLM size may be shrinked (which may result in workgroups
116+
spawning as there will be not enough SLM to handle multiple workgroups)
117+
but L1 data cache will be bigger. There may be rare use cases when this
118+
is beneficial.
119+
120+
These property may be passed to any kernel invocation function (e.g.
121+
`parallel_for`) via the properties parameter. At most, only one of these
122+
values may be passed to any single kernel invocation function.
123+
124+
Backends that do not support this extension may accept and ignore this
125+
property.
126+
127+
=== Adding a Property List to a Kernel Launch
128+
129+
A simple example of using this extension is shown below.
130+
131+
The example assumes that the kernel will benefit from large SLM and hence uses the property
132+
`cache_config_large_slm`:
133+
134+
```c++
135+
using namespace sycl::ext::intel::experimental;
136+
{
137+
...
138+
properties kernel_properties{cache_config{large_slm}};
139+
140+
q.single_task(kernel_properties, [=] {
141+
*a = *b + *c;
142+
}).wait();
143+
}
144+
```
145+
146+
=== Embedding Property into a Kernel
147+
148+
The example below shows how the kernel from the previous section could be
149+
rewritten to leverage an embedded property list (see link:sycl_ext_oneapi_kernel_properties.asciidoc#embedding-properties-into-a-kernel[embedding-properties-into-a-kernel]):
150+
151+
```c++
152+
using namespace sycl::ext::intel::experimental;
153+
struct KernelFunctor {
154+
155+
KernelFunctor(int* a, int* b, int* c) : a(a), b(b), c(c) {}
156+
157+
void operator()() const {
158+
*a = *b + *c;
159+
}
160+
161+
auto get(properties_tag) const {
162+
return properties{cache_config{large_slm}};
163+
}
164+
165+
int* a;
166+
int* b;
167+
int* c;
168+
};
169+
170+
...
171+
172+
q.single_task(KernelFunctor{a, b, c}).wait();
173+
```
174+
175+
== Revision History
176+
177+
[cols="5,15,15,70"]
178+
[grid="rows"]
179+
[options="header"]
180+
|========================================
181+
|Rev|Date|Author|Changes
182+
|1|2022-03-01|Artur Gainullin|*Initial public working draft*
183+
|========================================

sycl/include/sycl/detail/cg.hpp

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -144,6 +144,7 @@ class CGExecKernel : public CG {
144144
detail::OSModuleHandle MOSModuleHandle;
145145
std::vector<std::shared_ptr<detail::stream_impl>> MStreams;
146146
std::vector<std::shared_ptr<const void>> MAuxiliaryResources;
147+
RT::PiKernelCacheConfig MKernelCacheConfig;
147148

148149
CGExecKernel(NDRDescT NDRDesc, std::unique_ptr<HostKernelBase> HKernel,
149150
std::shared_ptr<detail::kernel_impl> SyclKernel,
@@ -157,7 +158,8 @@ class CGExecKernel : public CG {
157158
detail::OSModuleHandle OSModuleHandle,
158159
std::vector<std::shared_ptr<detail::stream_impl>> Streams,
159160
std::vector<std::shared_ptr<const void>> AuxiliaryResources,
160-
CGTYPE Type, detail::code_location loc = {})
161+
CGTYPE Type, RT::PiKernelCacheConfig KernelCacheConfig,
162+
detail::code_location loc = {})
161163
: CG(Type, std::move(ArgsStorage), std::move(AccStorage),
162164
std::move(SharedPtrStorage), std::move(Requirements),
163165
std::move(Events), std::move(loc)),
@@ -166,7 +168,8 @@ class CGExecKernel : public CG {
166168
MKernelBundle(std::move(KernelBundle)), MArgs(std::move(Args)),
167169
MKernelName(std::move(KernelName)), MOSModuleHandle(OSModuleHandle),
168170
MStreams(std::move(Streams)),
169-
MAuxiliaryResources(std::move(AuxiliaryResources)) {
171+
MAuxiliaryResources(std::move(AuxiliaryResources)),
172+
MKernelCacheConfig(std::move(KernelCacheConfig)) {
170173
assert((getType() == RunOnHostIntel || getType() == Kernel) &&
171174
"Wrong type of exec kernel CG.");
172175
}

sycl/include/sycl/detail/pi.h

Lines changed: 17 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -77,9 +77,12 @@
7777
// 12.22 Add piGetDeviceAndHostTimer to query device wall-clock timestamp
7878
// 12.23 Added new piextEnqueueDeviceGlobalVariableWrite and
7979
// piextEnqueueDeviceGlobalVariableRead functions.
80+
// 12.24 Added new PI_EXT_KERNEL_EXEC_INFO_CACHE_CONFIG property to the
81+
// _pi_kernel_exec_info. Defined _pi_kernel_cache_config enum with values of
82+
// the new PI_EXT_KERNEL_EXEC_INFO_CACHE_CONFIG property.
8083

8184
#define _PI_H_VERSION_MAJOR 12
82-
#define _PI_H_VERSION_MINOR 23
85+
#define _PI_H_VERSION_MINOR 24
8386

8487
#define _PI_STRING_HELPER(a) #a
8588
#define _PI_CONCAT(a, b) _PI_STRING_HELPER(a.b)
@@ -621,6 +624,15 @@ constexpr pi_queue_properties PI_EXT_ONEAPI_QUEUE_FLAG_PRIORITY_LOW = (1 << 5);
621624
constexpr pi_queue_properties PI_EXT_ONEAPI_QUEUE_FLAG_PRIORITY_HIGH = (1 << 6);
622625
// clang-format on
623626

627+
typedef enum {
628+
// No preference for SLM or data cache.
629+
PI_EXT_KERNEL_EXEC_INFO_CACHE_DEFAULT = 0x0,
630+
// Large SLM size.
631+
PI_EXT_KERNEL_EXEC_INFO_CACHE_LARGE_SLM = 0x1,
632+
// Large General Data size.
633+
PI_EXT_KERNEL_EXEC_INFO_CACHE_LARGE_DATA = 0x2
634+
} _pi_kernel_cache_config;
635+
624636
using pi_result = _pi_result;
625637
using pi_platform_info = _pi_platform_info;
626638
using pi_device_type = _pi_device_type;
@@ -650,6 +662,7 @@ using pi_program_build_status = _pi_program_build_status;
650662
using pi_program_binary_type = _pi_program_binary_type;
651663
using pi_kernel_info = _pi_kernel_info;
652664
using pi_profiling_info = _pi_profiling_info;
665+
using pi_kernel_cache_config = _pi_kernel_cache_config;
653666

654667
// For compatibility with OpenCL define this not as enum.
655668
using pi_device_partition_property = intptr_t;
@@ -1357,7 +1370,9 @@ typedef enum {
13571370
/// indicates that the kernel might access data through USM ptrs
13581371
PI_USM_INDIRECT_ACCESS,
13591372
/// provides an explicit list of pointers that the kernel will access
1360-
PI_USM_PTRS = 0x4203
1373+
PI_USM_PTRS = 0x4203,
1374+
/// provides the preferred cache configuration (large slm or large data)
1375+
PI_EXT_KERNEL_EXEC_INFO_CACHE_CONFIG = 0x4204
13611376
} _pi_kernel_exec_info;
13621377

13631378
using pi_kernel_exec_info = _pi_kernel_exec_info;

sycl/include/sycl/detail/pi.hpp

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -145,6 +145,7 @@ using PiMemImageInfo = ::pi_image_info;
145145
using PiMemObjectType = ::pi_mem_type;
146146
using PiMemImageChannelOrder = ::pi_image_channel_order;
147147
using PiMemImageChannelType = ::pi_image_channel_type;
148+
using PiKernelCacheConfig = ::pi_kernel_cache_config;
148149

149150
__SYCL_EXPORT void contextSetExtendedDeleter(const sycl::context &constext,
150151
pi_context_extended_deleter func,
Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
//==----- gpu_kernel_properties.hpp - Kernel properties for Intel GPUs ---==//
2+
//
3+
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4+
// See https://llvm.org/LICENSE.txt for license information.
5+
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6+
//
7+
//===--------------------------------------------------------------------===//
8+
9+
#pragma once
10+
11+
#include <sycl/ext/oneapi/properties/property.hpp>
12+
13+
namespace sycl {
14+
__SYCL_INLINE_VER_NAMESPACE(_V1) {
15+
namespace ext::intel::experimental {
16+
17+
template <typename T, typename PropertyListT> class gpu_kernel_attribute;
18+
19+
enum class cache_config_enum : std::uint16_t { large_slm, large_data };
20+
21+
inline constexpr cache_config_enum large_slm =
22+
cache_config_enum::large_slm;
23+
inline constexpr cache_config_enum large_data =
24+
cache_config_enum::large_data;
25+
26+
struct cache_config {
27+
cache_config(cache_config_enum v) : value(v) {}
28+
cache_config_enum value;
29+
};
30+
31+
using cache_config_key = cache_config;
32+
33+
inline bool operator==(const cache_config &lhs,
34+
const cache_config &rhs) {
35+
return lhs.value == rhs.value;
36+
}
37+
inline bool operator!=(const cache_config &lhs,
38+
const cache_config &rhs) {
39+
return !(lhs == rhs);
40+
}
41+
42+
} // namespace ext::intel::experimental
43+
44+
namespace ext::oneapi::experimental {
45+
template <>
46+
struct is_property_key<intel::experimental::cache_config_key>
47+
: std::true_type {};
48+
49+
template <typename T, typename PropertyListT>
50+
struct is_property_key_of<
51+
intel::experimental::cache_config_key,
52+
intel::experimental::gpu_kernel_attribute<T, PropertyListT>>
53+
: std::true_type {};
54+
55+
namespace detail {
56+
template <> struct PropertyToKind<intel::experimental::cache_config_key> {
57+
static constexpr PropKind Kind = PropKind::CacheConfig;
58+
};
59+
60+
template <>
61+
struct IsRuntimeProperty<intel::experimental::cache_config_key>
62+
: std::true_type {};
63+
64+
} // namespace detail
65+
} // namespace ext::oneapi::experimental
66+
} // __SYCL_INLINE_VER_NAMESPACE(_V1)
67+
} // namespace sycl

sycl/include/sycl/ext/oneapi/properties/property.hpp

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -187,8 +187,9 @@ enum PropKind : uint32_t {
187187
MaxBurst = 21,
188188
WaitRequest = 22,
189189
Alignment = 23,
190+
CacheConfig = 24,
190191
// PropKindSize must always be the last value.
191-
PropKindSize = 24,
192+
PropKindSize = 25,
192193
};
193194

194195
// This trait must be specialized for all properties and must have a unique

0 commit comments

Comments
 (0)