Commit 9fbf6b2

[SYCL][Doc] Add sycl_ext_oneapi_cache_size draft (#14837)
Adds an extension for querying the availability and size of different levels of cache within a device.

Signed-off-by: John Pennycook <[email protected]>
1 parent f04c79b commit 9fbf6b2

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
= sycl_ext_oneapi_cache_size

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]
:endnote: &#8212;{nbsp}end{nbsp}note

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) 2024 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

This extension is written against the SYCL 2020 revision 8 specification. All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.


== Status

This is a proposed extension specification, intended to gather community
feedback. Interfaces defined in this specification may not be implemented yet
or may be in a preliminary state. The specification itself may also change in
incompatible ways before it is finalized. *Shipping software products should
not rely on APIs defined in this specification.*


== Overview

SYCL 2020's device partitioning functions acknowledge that devices will
typically have multiple levels of cache (L1, L2, L3 and L4) but its device
queries only allow developers to request information about one (unnamed) level
of cache.

This extension proposes a mechanism to query the availability and size of
specific levels of cache on individual devices, to help developers with
performance tuning and writing other cache-aware operations.


== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro `SYCL_EXT_ONEAPI_CACHE_SIZES` to one of the values defined in the table
below. Applications can test for the existence of this macro to determine if
the implementation supports this feature, or applications can test the macro's
value to determine which of the extension's features the implementation
supports.


[%header,cols="1,5"]
|===
|Value
|Description

|1
|The APIs of this experimental extension are not versioned, so the
feature-test macro always has this value.
|===
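
As a minimal sketch of the macro test described above, an application might guard its use of the extension as follows. The names `cache_size_query_available`, `cache_size_extension_version` and `has_cache_size_extension` are hypothetical and not part of the extension. The block deliberately omits `<sycl/sycl.hpp>` so that it compiles with any C++ compiler, in which case the macro is undefined and the fallback branch is taken.

```cpp
// In a real program the macro would be predefined by the SYCL implementation
// (after including <sycl/sycl.hpp>); that header is omitted here on purpose.
#if defined(SYCL_EXT_ONEAPI_CACHE_SIZES)
// Extension present: record its (currently unversioned) value.
constexpr bool cache_size_query_available = true;
constexpr int cache_size_extension_version = SYCL_EXT_ONEAPI_CACHE_SIZES;
#else
// Extension absent: the application must not use cache_size queries.
constexpr bool cache_size_query_available = false;
constexpr int cache_size_extension_version = 0;
#endif

// Hypothetical helper so that callers can branch without repeating the #if.
constexpr bool has_cache_size_extension() {
  return cache_size_query_available;
}
```

Code that calls the extension's queries would sit behind the same `#if`, since the declarations do not exist when the macro is undefined.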


=== Cache Levels

A new `enum` is added to describe the four levels of cache:

[source,c++]
----
namespace sycl::ext::oneapi::experimental {
enum class cache_level : /* unspecified */
{
  L1 = 1,
  L2 = 2,
  L3 = 3,
  L4 = 4,
};
} // namespace sycl::ext::oneapi::experimental
----


=== Device Queries

[source,c++]
----
namespace sycl::ext::oneapi::experimental::info::device {
template <cache_level CacheLevel>
struct cache_size {
  using return_type = size_t;
};
} // namespace sycl::ext::oneapi::experimental::info::device
----

_Remarks_: Template parameter to `device::get_info`.

_Returns_: The size in bytes of the cache at the requested `cache_level` for
this device, or 0 if this level of cache does not exist on this device.

The set of cache levels for which a device returns a non-zero value is not
required to be contiguous (e.g., a device may report an L1 and L3 cache without
reporting an L2 cache).

[_Note:_ Although this may seem an unusual choice, there are several real-life
devices that name their cache levels such that there are gaps. This extension
allows for this behavior to minimize the cognitive burden to developers of
shifting between the naming of cache levels in hardware specification sheets
and in SYCL. _{endnote}_]
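
The query and its "0 means absent" convention can be sketched without a SYCL implementation. Everything in the `query_sketch` namespace below is an assumption made for illustration: `cache_level` and `cache_size` are stand-ins that mirror the declarations above, `mock_device` is a hypothetical substitute for `sycl::device` with made-up sizes, and `cache_sizes` and `outermost_cache_level` are hypothetical helpers. With a real implementation the call would presumably take the form `dev.get_info<info::device::cache_size<cache_level::L2>>()`.

```cpp
#include <array>
#include <cstddef>

namespace query_sketch {

// Stand-ins for the declarations in this extension (not real SYCL types).
enum class cache_level : int { L1 = 1, L2 = 2, L3 = 3, L4 = 4 };

template <cache_level CacheLevel>
struct cache_size {
  using return_type = std::size_t;
};

// Hypothetical mock of sycl::device with made-up sizes: it reports an L1
// and an L3 cache but no L2 or L4, so the populated levels have a gap.
class mock_device {
public:
  template <typename Param>
  typename Param::return_type get_info() const {
    return lookup(Param{});
  }

private:
  template <cache_level Level>
  static std::size_t lookup(cache_size<Level>) {
    switch (Level) {
      case cache_level::L1: return 64u * 1024u;
      case cache_level::L3: return 8u * 1024u * 1024u;
      default: return 0;  // this level does not exist on the mock device
    }
  }
};

// Queries every level individually; index 0 holds L1, index 3 holds L4.
template <typename Device>
std::array<std::size_t, 4> cache_sizes(const Device& dev) {
  return {dev.template get_info<cache_size<cache_level::L1>>(),
          dev.template get_info<cache_size<cache_level::L2>>(),
          dev.template get_info<cache_size<cache_level::L3>>(),
          dev.template get_info<cache_size<cache_level::L4>>()};
}

// Returns the number (1-4) of the highest level with a non-zero size, or 0
// if no cache was reported. All levels are scanned rather than stopping at
// the first zero, because a device may skip a level.
inline int outermost_cache_level(const std::array<std::size_t, 4>& sizes) {
  int outermost = 0;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    if (sizes[i] != 0) outermost = static_cast<int>(i) + 1;
  }
  return outermost;
}

}  // namespace query_sketch
```

The point of the sketch is the loop in `outermost_cache_level`: code that stops at the first level returning 0 would miss the L3 cache on a device like the mock one.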


== Implementation notes

This non-normative section provides information about one possible
implementation of this extension. It is not part of the specification of the
extension's API.

CUDA exposes an `l2CacheSize` property via the `cudaDeviceProp` struct, which
could be used to implement the size query for `cache_level::L2`. Other sizes
could be derived from the Compute Capability.
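
The L2 mapping described above might look roughly like the following sketch. It is not taken from any existing backend. To stay compilable without the CUDA toolkit, `cuda_props_stub` is a hypothetical stand-in that carries only the `l2CacheSize` field of `cudaDeviceProp`, and `cache_size_from_cuda` is a hypothetical helper name. The sketch does not attempt to derive other levels from the Compute Capability; it returns an empty `std::optional` for them, which is only one of the possible ways a backend could represent a size it does not know.

```cpp
#include <cstddef>
#include <optional>

namespace cuda_sketch {

// Stand-in for the extension's enum (not the real SYCL type).
enum class cache_level : int { L1 = 1, L2 = 2, L3 = 3, L4 = 4 };

// Hypothetical stand-in for cudaDeviceProp, keeping only the field used
// here. In a real backend the value would come from the CUDA runtime.
struct cuda_props_stub {
  int l2CacheSize;  // size of the L2 cache in bytes
};

// Maps a cache level to a size in bytes where this sketch knows how to.
// Only L2 is handled; other levels yield std::nullopt ("not known to this
// sketch"), which is distinct from a size of 0 ("level does not exist").
inline std::optional<std::size_t> cache_size_from_cuda(
    cache_level level, const cuda_props_stub& props) {
  if (level == cache_level::L2 && props.l2CacheSize >= 0) {
    return static_cast<std::size_t>(props.l2CacheSize);
  }
  return std::nullopt;
}

}  // namespace cuda_sketch
```

A conforming implementation would still have to turn the empty case into a concrete `size_t` for `device::get_info`, which is the open question recorded under Issues.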


== Issues

. Should devices be able to signal an "unknown"/"unsupported" cache size?
+
--
*UNRESOLVED*:
There are many mechanisms that could be used to signal that an implementation
simply does not know anything about a specific level of cache (e.g.,
an exception, a special return value, an orthogonal query). However, requiring
implementations to determine and return an accurate size would make the query
significantly easier for developers to use.

We should revisit this issue once we have implementation experience across
multiple backends, which should give us a better idea of how hard it is to
return accurate cache sizes in practice.
--
