-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][Doc] Add initial draft of sycl load_store proposal #5655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,341 @@ | ||
= sycl_ext_oneapi_load_store | ||
|
||
:source-highlighter: coderay | ||
:coderay-linenums-mode: table | ||
|
||
// This section needs to be after the document title. | ||
:doctype: book | ||
:toc2: | ||
:toc: left | ||
:encoding: utf-8 | ||
:lang: en | ||
:dpcpp: pass:[DPC++] | ||
|
||
// Set the default source code type in this document to C++, | ||
// for syntax highlighting purposes. This is needed because | ||
// docbook uses c++ and html5 uses cpp. | ||
:language: {basebackend@docbook:c++:cpp} | ||
|
||
|
||
== Notice | ||
|
||
[%hardbreaks] | ||
Copyright (C) 2022-2022 Intel Corporation. All rights reserved. | ||
|
||
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks of | ||
The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by | ||
permission by Khronos. | ||
|
||
|
||
== Contact | ||
|
||
To report problems with this extension, please open a new issue at: | ||
|
||
https://github.com/intel/llvm/issues | ||
|
||
|
||
== Dependencies | ||
|
||
This extension is written against the SYCL 2020 revision 4 specification. All | ||
references below to the "core SYCL specification" or to section numbers in the | ||
SYCL specification refer to that revision. | ||
|
||
This extension also depends on the following other SYCL extensions: | ||
|
||
* link:https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_properties.asciidoc[sycl_ext_oneapi_properties] | ||
* link:https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_uniform.asciidoc[sycl_ext_oneapi_uniform] | ||
|
||
== Status | ||
|
||
This is a proposed extension specification, intended to gather community | ||
feedback. Interfaces defined in this specification may not be implemented yet | ||
or may be in a preliminary state. The specification itself may also change in | ||
incompatible ways before it is finalized. *Shipping software products should | ||
not rely on APIs defined in this specification.* | ||
|
||
== Overview | ||
|
||
Many architectures allow for sophisticated controls to be placed on how memory | ||
operations are executed, but these vary in form and execution. This extension | ||
adds high-level abstractions for expressing semantic hints. | ||
|
||
This extension consists of a family of free functions for loading and storing | ||
data; these functions support hints that are passed through property list | ||
arguments. This extension also proposes a set of such hints for describing | ||
temporal behavior. | ||
|
||
== Specification | ||
|
||
=== Feature test macro | ||
|
||
This extension provides a feature-test macro as described in the core SYCL | ||
specification. An implementation supporting this extension must predefine the | ||
macro `SYCL_EXT_ONEAPI_LOAD_STORE` to one of the values defined in the table | ||
below. Applications can test for the existence of this macro to determine if | ||
the implementation supports this feature, or applications can test the macro's | ||
value to determine which of the extension's features the implementation | ||
supports. | ||
|
||
[%header,cols="1,5"] | ||
|=== | ||
|Value | ||
|Description | ||
|
||
|1 | ||
|Initial version of this extension. | ||
|=== | ||
|
||
== Load and store functions | ||
|
||
These are fine-grained functions that accept property lists and apply them to | ||
the granularity of a single logical memory transaction. | ||
|
||
=== Work-item granularity | ||
|
||
The following functions operate on a per work-item basis. | ||
|
||
```c++ | ||
namespace sycl { | ||
namespace ext { | ||
namespace oneapi { | ||
namespace experimental { | ||
|
||
template <typename T, typename Props> | ||
T load(const T *addr, Props p); // 1 | ||
|
||
template <typename T> | ||
T load(const T *addr); // 1a | ||
|
||
template <typename T, typename Props> | ||
void store(T *addr, const T &value, Props p); // 2 | ||
|
||
template <typename T> | ||
void store(T *addr, T &value); // 2a | ||
|
||
} // namespace experimental | ||
} // namespace ext | ||
} // namespace oneapi | ||
} // namespace sycl | ||
``` | ||
|
||
1:: Load and return the object of type `T` at `addr` with the hints in property | ||
list `p`. Each work-item recieves a copy of the loaded object. | ||
1a:: Special case of 1 with no property list. | ||
2:: Store `value` at `addr` with the hints in property list `p`. | ||
2a:: Special case of 2 with no property list. | ||
|
||
=== Joint (cooperative) group granularity | ||
|
||
The following functions apply to the passed `Group g`; the group cooperates to | ||
perform the operation on uniform arguments. These are cooperative group | ||
functions, so they have the same restrictions and behaviors described in | ||
Sec. 4.17.3 "Group functions" of the core SYCL specification. | ||
|
||
```c++ | ||
namespace sycl { | ||
namespace ext { | ||
namespace oneapi { | ||
namespace experimental { | ||
|
||
template <typename Group, typename T, typename Props> | ||
T joint_load(Group g, const T *addr, Props p); // 1 | ||
|
||
template <typename Group, typename T> | ||
T joint_load(Group g, const T *addr); // 1a | ||
|
||
// Available only when Group == sub_group | ||
template <typename Group, typename T, typename Props> | ||
uniform<T> joint_load(Group g, const T *addr, Props p); // 1b | ||
|
||
// Available only when Group == sub_group | ||
template <typename Group, typename T> | ||
uniform<T> joint_load(Group g, const T *addr); // 1c | ||
|
||
template <typename Group, typename T, typename Props> | ||
void joint_store(Group g, T *addr, const T &value, Props p); // 2 | ||
|
||
template <typename Group, typename T> | ||
void joint_store(Group g, T *addr, const T &value); // 2a | ||
|
||
} // namespace experimental | ||
} // namespace ext | ||
} // namespace oneapi | ||
} // namespace sycl | ||
``` | ||
|
||
1:: Load and return the object of type `T` at `addr` with the hints in property | ||
list `p`. Each argument must be the same for each work-item in `g`. If `Group` is a | ||
`sub_group`, in which case a `sycl::ext::oneapi::experimental::uniform<T>` is | ||
returned (see 1b-1c.) | ||
1a:: Special case of 1 with no property list. | ||
1b:: Special case of 1 with `sub_group` | ||
1c:: Special case of 1 with `sub_group` and no property list | ||
2:: Store `value` at `addr` with the hints in property list `p`. `p` cannot vary across | ||
work-items, but `value` and `addr` are expected to. | ||
2a:: Special case of 2 with no property list. | ||
|
||
=== `group_block` | ||
|
||
The following functions apply to the passed `Group g` and operate on the memory | ||
range `[addr, addr + g.get_group_linear_range())` (`[addr, addr + | ||
g.get_max_local_range())` for `sub_groups'); see below for more details. These | ||
functions follow the restrictions and behaviors described in Sec. 4.17.3: Group | ||
functions. | ||
|
||
```c++ | ||
namespace sycl { | ||
namespace ext { | ||
namespace oneapi { | ||
namespace experimental { | ||
|
||
template <typename Group, typename T, typename Props> | ||
T group_block_load(Group g, const T *addr, Props p); // 1 | ||
|
||
template <typename Group, typename T> | ||
T group_block_load(Group g, const T *addr); // 1a | ||
|
||
template <typename Group, typename T, typename Props> | ||
void group_block_store(Group g, T *addr, const T &value, Props p); // 2 | ||
|
||
template <typename Group, typename T> | ||
void group_block_store(Group g, T *addr, const T &value); // 2a | ||
|
||
} // namespace experimental | ||
} // namespace ext | ||
} // namespace oneapi | ||
} // namespace sycl | ||
``` | ||
|
||
1:: Load and return an object of type `T` for each work-item in `g`; each | ||
work-item in `g` will return the corresponding object `T` at `addr + | ||
g.get_local_linear_id()`, subject to any hints in `p`. | ||
1a:: Special case of 1 with no property list. | ||
2:: For each work-item in `g`, store that item's `value` at | ||
`addr + g.get_local_linear_id()` as computed by that work-item, using the | ||
hints in `p`. | ||
2a:: Special case of 2 with no property list. | ||
|
||
== Nontemporal properties | ||
|
||
These properties allow programmers to express hints at how memory accesses | ||
should behave. These assume compile-time property values, and are passed to | ||
various constructs via property lists so that they may be associated with memory | ||
operations. | ||
|
||
The default behavior for any property class, if some other specified property | ||
class does not override it, is to assume the most temporal behavior as possible. | ||
|
||
=== Values | ||
|
||
Each nontemporal property is parameterized to take one of two values: | ||
|
||
```c++ | ||
namespace sycl { | ||
namespace ext { | ||
namespace oneapi { | ||
namespace experimental { | ||
|
||
struct nontemporal { /* unspecified */ }; // 1 | ||
struct temporal { /* unspecified */ }; // 2 | ||
|
||
} // namespace experimental | ||
} // namespace extg | ||
} // namespace oneapi | ||
} // namespace sycl | ||
``` | ||
|
||
1:: indicates that the associated memory be accessed in as maximally nontemporal | ||
a fashion as possible. | ||
2:: indicates that the associated memory be accessed in | ||
as maximally temporal a fashion as possible. | ||
|
||
=== Properties | ||
|
||
The nontemporal properties that are parameterized by the above are: | ||
|
||
```c++ | ||
namespace sycl { | ||
namespace ext { | ||
namespace oneapi { | ||
namespace experimental { | ||
|
||
struct temporality_hint_key { | ||
template <typename T> | ||
using value_t = property_value<temporality_hint_key, T>; | ||
}; | ||
|
||
struct L1_cache_hint_key { | ||
template <typename T> | ||
using value_t = property_value<L1_cache_hint_key, T>; | ||
}; | ||
|
||
struct L2_cache_hint_key { | ||
template <typename T> | ||
using value_t = property_value<L2_cache_hint_key, T>; | ||
}; | ||
|
||
struct L3_cache_hint_key { | ||
template <typename T> | ||
using value_t = property_value<L3_cache_hint_key, T>; | ||
}; | ||
|
||
struct L4_cache_hint_key { | ||
template <typename T> | ||
using value_t = property_value<L4_cache_hint_key, T>; | ||
}; | ||
|
||
} // namespace experimental | ||
} // namespace ext | ||
} // namespace oneapi | ||
} // namespace sycl | ||
``` | ||
|
||
The `temporality_hint_key` property is the most generic and it should override | ||
any other nontemporal properties, if present. | ||
|
||
The property values as passed to the `{L1,L2,L3,L4}_cache_hint_key` property | ||
classes should apply only to the cache level specified; the precise mapping to | ||
hardware constructs is otherwise implementation-defined. | ||
|
||
=== Convenience variables | ||
|
||
The following convenience variables help use the above when declaring property lists: | ||
|
||
```c++ | ||
namespace sycl { | ||
namespace ext { | ||
namespace oneapi { | ||
namespace experimental { | ||
|
||
template <typename T> | ||
inline constexpr temporality_hint_key::value_t<T> temporality_hint; | ||
|
||
template <typename T> | ||
inline constexpr L1_cache_hint_key::value_t<T> L1_cache_hint; | ||
|
||
template <typename T> | ||
inline constexpr L2_cache_hint_key::value_t<T> L2_cache_hint; | ||
|
||
template <typename T> | ||
inline constexpr L3_cache_hint_key::value_t<T> L3_cache_hint; | ||
|
||
template <typename T> | ||
inline constexpr L4_cache_hint_key::value_t<T> L4_cache_hint; | ||
|
||
} // namespace experimental | ||
} // namespace ext | ||
} // namespace oneapi | ||
} // namespace sycl | ||
``` | ||
|
||
== Implementation Notes | ||
|
||
These properties are intended to be hints to guide the compiler; specific | ||
nontemporal behavior should not be assumed. | ||
|
||
Most extant architectures lack awareness of categories of memory as they are | ||
understood by the programmer (i.e. buffers, arrays, structures) and only expose | ||
temporality controls at the granularity of memory-transacting instructions. This | ||
extension provides a groundwork for future extensions that expose pointer- and | ||
accessor-level semantics. A future extension may provide more | ||
architecture-specific hints and coarser controls for applying hints. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (adding a global comment here only so it can be a threaded conversation) After reviewing #5755, I'm wondering if we can use
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, man. I guess you don't remember that There are a few reasons for this:
Does that make sense? I think it absolutely makes sense to have this work with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gmlueck , did you see this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the ping. At this point we already have a proposed extension for Regardless, we should decide now how the non-temporary properties interact with If we decided to use If we keep the non-temporal support in the load/store extension, how would the interaction with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do think (2) warranted. I see these as complimentary features. annotated_ptr + nontemporal properties allows for a productive, coarse-grained way of marking some semantically important piece of memory, while load/store allow for fine-grained expressions. In my original proposal, I suggested that load/store would override any annotated properties, and I stand by that. It's a 'most specific' type of behavior that I think users can easily reason about. I should add that this was renamed "load_store" from "nontemporal" at the suggestion of @Pennycook; we are considering how to add marray/vec behavior as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, it seems better to expose the temporal/nontemporal choice as an enum. There should also be predefined variables for each of these properties, which will make their use much less verbose:
It would also reduce verbosity if we name the property
temporality_key
instead oftemporality_hint_key
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had convenience variables in my POC that didn't make it to my proposal; I've fixed that. I don't understand the value of the enum vs. independent classes, to be honest.