intel · jasonsewall-intel · Feb 24, 2022 · Mar 8, 2022 · Mar 8, 2022 · gmlueck
@@ -0,0 +1,282 @@
+= sycl_ext_oneapi_load_store
+:source-highlighter: coderay
+:coderay-linenums-mode: table
+
+// This section needs to be after the document title.
+:doctype: book
+:toc2:
+:toc: left
+:encoding: utf-8
+:lang: en
+
+:blank: pass:[ +]
+
+// Set the default source code type in this document to C++,
+// for syntax highlighting purposes.  This is needed because
+// docbook uses c++ and html5 uses cpp.
+:language: {basebackend@docbook:c++:cpp}
+
+== Introduction
+IMPORTANT: This specification is a draft.
+
+NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
+trademarks of The Khronos Group Inc.  OpenCL(TM) is a trademark of Apple Inc.
+used by permission by Khronos.
+
+This proposal adds support for a family of load and store functions to SYCL. These functions are intended to support semantic hints to help guide code generation.  This document describes these functions, the hint mechanisms, and a group of hints for control over nontemporal memory operations.
+
+== Notice
+
+Copyright (c) 2021-2022 Intel Corporation.  All rights reserved.
+
+== Status
+
+Working Draft
+
+This is a proposed extension specification, intended to gather community
+feedback. Interfaces defined in this specification may not be implemented yet
+or may be in a preliminary state.  The specification itself may also change in
+incompatible ways before it is finalized. Shipping software products should not
+rely on APIs defined in this specification.
+
+== Version
+
+Revision: 1
+
+== Contributors
+
+Jason Sewall, Intel +
+Konst Bobrovsky, Intel +
+John Pennycook, Intel
+
+== Dependencies
+
+This extension is written against the SYCL 2020 specification, Revision 4 and
+the following extensions:
+
+* link:https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_properties.asciidoc[sycl_ext_oneapi_properties]
+* link:https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_uniform.asciidoc[sycl_ext_oneapi_uniform]
+
+== Feature Test Macro
+
+This extension provides a feature-test macro as described in the core SYCL
+specification section 6.3.3 "Feature test macros".  Therefore, an
+implementation supporting this extension must predefine the macro
+`SYCL_EXT_ONEAPI_LOAD_STORE` to one of the values defined in the table
+below.  Applications can test for the existence of this macro to determine if
+the implementation supports this feature, or applications can test the macro's
+value to determine which of the extension's APIs the implementation supports.
+
+[%header,cols="1,5"]
+|===
+|Value |Description
+|1     |Initial extension version.  Base features are supported.
+|===
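
As a minimal sketch of the two tests described above (existence and value), an application might wrap the macro as follows. The names `load_store_extension_version` and `has_load_store_extension` are hypothetical helpers, not part of this extension, and the value 0 for "not supported" is a convention of this sketch only.

```cpp
// Hypothetical helpers (not defined by the extension) that capture whether the
// implementation predefines SYCL_EXT_ONEAPI_LOAD_STORE, and with which value.
#ifdef SYCL_EXT_ONEAPI_LOAD_STORE
constexpr int load_store_extension_version = SYCL_EXT_ONEAPI_LOAD_STORE;
#else
constexpr int load_store_extension_version = 0;  // sketch convention: absent
#endif

// True when at least the initial extension version (value 1) is available.
constexpr bool has_load_store_extension() {
  return load_store_extension_version >= 1;
}
```

Code that calls the extension's functions could then be guarded with `#if SYCL_EXT_ONEAPI_LOAD_STORE >= 1` or with a check on the helper.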
+
+== Overview
+
+Many architectures allow for sophisticated controls to be placed on how memory operations are executed, but these vary in form and execution. This extension adds high-level abstractions for expressing semantic hints.
+
+This extension consists of a family of free functions for loading and storing data; these functions support hints that are passed through property list arguments. This extension also proposes a set of such hints for describing temporal behavior.
+
+== Load and store functions
+
+These are fine-grained functions that accept property lists and apply them at the granularity of a single logical memory transaction.
+
+=== Work-item granularity
+
+The following functions operate on a per work-item basis.
+
+```c++
+namespace sycl {
+namespace ext {
+namespace oneapi {
+namespace experimental {
+
+  template <typename T, typename Props>
+  T load(const T *addr, Props p); // 1
+
+  template <typename T>
+  T load(const T *addr); // 1a
+
+  template <typename T, typename Props>
+  void store(T *addr, const T &value, Props p); // 2
+
+  template <typename T>
+  void store(T *addr, const T &value); // 2a
+
+} // namespace experimental
+} // namespace oneapi
+} // namespace ext
+} // namespace sycl
+```
+
+1:: Load and return the object of type `T` at `addr` with the hints in property list `p`. `p` cannot vary across work-items, but `addr` is expected to. Each work-item receives a copy of the loaded object.
+1a:: Special case of 1 with no property list.
+2:: Store `value` at `addr` with the hints in property list `p`. `p` cannot vary across work-items, but `value` and `addr` are expected to.
+2a:: Special case of 2 with no property list.
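
The per-work-item semantics can be illustrated with a host-only emulation. This is a sketch under stated assumptions, not the extension's implementation: the `sketch` namespace, the `no_hints` placeholder, the function bodies, and the `scale_by_two` driver are all hypothetical, and the hints are simply ignored, which is a valid treatment only because they are hints.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

struct no_hints {};  // hypothetical placeholder for an empty property list

// Stand-in for overload 1: the hints do not change the value that is loaded.
template <typename T, typename Props>
T load(const T *addr, Props /*hints*/) {
  assert(addr != nullptr);
  return *addr;
}

// Stand-in for overload 1a: no property list.
template <typename T>
T load(const T *addr) { return load(addr, no_hints{}); }

// Stand-in for overload 2.
template <typename T, typename Props>
void store(T *addr, const T &value, Props /*hints*/) {
  assert(addr != nullptr);
  *addr = value;
}

// Stand-in for overload 2a: no property list.
template <typename T>
void store(T *addr, const T &value) { store(addr, value, no_hints{}); }

// Emulates `n` work-items one after another; work-item i computes
// out[i] = 2 * in[i], so `addr` and `value` differ per work-item.
inline void scale_by_two(const int *in, int *out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    const int v = load(in + i);
    int doubled = 2 * v;
    store(out + i, doubled);
  }
}

}  // namespace sketch
```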
+
+=== Joint (cooperative) group granularity
+
+The following functions apply to the passed `Group g`; the group cooperates to perform the operation on uniform arguments. These functions follow the restrictions and behaviors described in Sec. 4.17.3: Group functions.
+
+```c++
+namespace sycl {
+namespace ext {
+namespace oneapi {
+namespace experimental {
+
+  template <typename Group, typename T, typename Props>
+  T joint_load(Group g, const T *addr, Props p); // 1
+
+  template <typename Group, typename T>
+  T joint_load(Group g, const T *addr); // 1a
+
+  // Available only when Group == sub_group
+  template <typename Group, typename T, typename Props>
+  uniform<T> joint_load(Group g, const T *addr, Props p); // 1b
+
+  // Available only when Group == sub_group
+  template <typename Group, typename T>
+  uniform<T> joint_load(Group g, const T *addr); // 1c
+
+  template <typename Group, typename T, typename Props>
+  void joint_store(Group g, T *addr, const T &value, Props p); // 2
+
+  template <typename Group, typename T>
+  void joint_store(Group g, T *addr, const T &value); // 2a
+
+} // namespace experimental
+} // namespace oneapi
+} // namespace ext
+} // namespace sycl
+```
+
+1:: Load and return the object of type `T` at `addr` with the hints in property list `p`. Each argument must be the same for each work-item in `g`. Each work-item receives its own copy of the loaded object, unless `Group` is `sub_group`, in which case a `sycl::ext::oneapi::experimental::uniform<T>` is returned (see 1b and 1c).
+1a:: Special case of 1 with no property list.
+1b:: Special case of 1 with `sub_group`.
+1c:: Special case of 1 with `sub_group` and no property list.
+2:: Store `value` at `addr` with the hints in property list `p`. Each argument must be the same for each work-item in `g`.
+2a:: Special case of 2 with no property list.
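
One way to picture a joint load is as a single logical load whose result is handed to every work-item in the group. The host-only sketch below models that reading; `emulate_joint_load` and the `sketch` namespace are hypothetical names, and a real implementation is free to distribute the work among work-items differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Host-only emulation (an assumption of this sketch): the group performs one
// logical load from the uniform address, and every work-item then receives
// its own copy of the result.
template <typename T>
std::vector<T> emulate_joint_load(std::size_t group_size, const T *addr) {
  assert(addr != nullptr);
  const T loaded = *addr;                     // single load for the group
  return std::vector<T>(group_size, loaded);  // element i: work-item i's copy
}

}  // namespace sketch
```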
+
+=== `group_block`
+
+The following functions apply to the passed `Group g` and operate on the memory range `[addr, addr + g.get_local_linear_range())` (`[addr, addr + g.get_max_local_range())` for `sub_group`); see below for more details. These functions follow the restrictions and behaviors described in Sec. 4.17.3: Group functions.
+
+```c++
+namespace sycl {
+namespace ext {
+namespace oneapi {
+namespace experimental {
+
+  template <typename Group, typename T, typename Props>
+  T group_block_load(Group g, const T *addr, Props p); // 1
+
+  template <typename Group, typename T>
+  T group_block_load(Group g, const T *addr); // 1a
+
+  template <typename Group, typename T, typename Props>
+  void group_block_store(Group g, T *addr, const T &value, Props p); // 2
+
+  template <typename Group, typename T>
+  void group_block_store(Group g, T *addr, const T &value); // 2a
+
+} // namespace experimental
+} // namespace oneapi
+} // namespace ext
+} // namespace sycl
+```
+
+1:: Load and return an object of type `T` for each work-item in `g`; each work-item receives the object of type `T` at `addr + g.get_local_linear_id()`, subject to any hints in `p`.
+1a:: Special case of 1 with no property list.
+2:: For each work-item in `g`, store that item's `value` at `addr + g.get_local_linear_id()` as computed by that work-item, using the hints in `p`.
+2a:: Special case of 2 with no property list.
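
The addressing rule for the block functions (work-item `i` touches `addr + i`) can be sketched on the host as follows. Everything here is illustrative: `mock_group`, the `sketch` namespace, and the `add_ten` driver are hypothetical, only the overloads without a property list are modeled, and the work-items run one after another rather than cooperatively.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for one work-item's handle to its group.
struct mock_group {
  std::size_t local_linear_id;
  std::size_t get_local_linear_id() const { return local_linear_id; }
};

// Stand-in for overload 1a: each work-item reads its own element.
template <typename Group, typename T>
T group_block_load(Group g, const T *addr) {
  assert(addr != nullptr);
  return addr[g.get_local_linear_id()];
}

// Stand-in for overload 2a: each work-item writes its own element.
template <typename Group, typename T>
void group_block_store(Group g, T *addr, const T &value) {
  assert(addr != nullptr);
  addr[g.get_local_linear_id()] = value;
}

// Runs data.size() emulated work-items in sequence; each adds 10 to the
// element at its local linear id.
inline std::vector<int> add_ten(std::vector<int> data) {
  for (std::size_t id = 0; id < data.size(); ++id) {
    const mock_group g{id};
    const int updated = group_block_load(g, data.data()) + 10;
    group_block_store(g, data.data(), updated);
  }
  return data;
}

}  // namespace sketch
```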
+
+== Nontemporal properties
+
+These properties allow programmers to express hints about how memory accesses should behave. They take compile-time property values and are passed to various constructs via property lists so that they can be associated with memory operations.
+
+The default behavior for any property class, unless another specified property class overrides it, is to assume the most temporal behavior possible.
+
+=== Values
+
+Each nontemporal property is parameterized to take one of two values:
+
+```c++
+namespace sycl {
+namespace ext {
+namespace oneapi {
+namespace experimental {
+
+struct nontemporal { /* unspecified */ }; // 1
+struct temporal { /* unspecified */ }; // 2
+
+} // namespace experimental
+} // namespace oneapi
+} // namespace ext
+} // namespace sycl
+```
+
+1:: Indicates that the associated memory should be accessed in as nontemporal a fashion as possible.
+2:: Indicates that the associated memory should be accessed in as temporal a fashion as possible.
+
+=== Properties
+
+The nontemporal properties that are parameterized by the above are:
+
+```c++
+namespace sycl {
+namespace ext {
+namespace oneapi {
+namespace experimental {
+
+struct temporality_hint_key {
+  template <typename T>
+  using value_t = property_value<temporality_hint_key, T>;
+};
+
+struct L1_cache_hint_key {
+  template <typename T>
+  using value_t = property_value<L1_cache_hint_key, T>;
+};
+
+struct L2_cache_hint_key {
+  template <typename T>
+  using value_t = property_value<L2_cache_hint_key, T>;
+};
+
+struct L3_cache_hint_key  {
+  template <typename T>
+  using value_t = property_value<L3_cache_hint_key, T>;
+};
+
+struct L4_cache_hint_key {
+  template <typename T>
+  using value_t = property_value<L4_cache_hint_key, T>;
+};
+
+} // namespace experimental
+} // namespace oneapi
+} // namespace ext
+} // namespace sycl
+```
+
+The `temporality_hint_key` property is the most generic and it should override any other nontemporal properties, if present.
+
+The property values as passed to the `{L1,L2,L3,L4}_cache_hint_key` property classes should apply only to the cache level specified; the precise mapping to hardware constructs is otherwise implementation-defined.
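
The following compile-time sketch shows one possible reading of the rules above for the L1 cache level: a `temporality_hint_key` value wins if present, otherwise the `L1_cache_hint_key` value applies, otherwise the default is `temporal`. The `sketch` namespace, the simplified `property_value` stand-in (the real one comes from sycl_ext_oneapi_properties), and the `find_value` and `effective_L1_hint` helpers are assumptions made for illustration only.

```cpp
#include <type_traits>

namespace sketch {

struct nontemporal {};
struct temporal {};

// Simplified, hypothetical stand-in for property_value.
template <typename Key, typename T>
struct property_value {
  using key_t = Key;
  using value_t = T;
};

struct temporality_hint_key {
  template <typename T>
  using value_t = property_value<temporality_hint_key, T>;
};

struct L1_cache_hint_key {
  template <typename T>
  using value_t = property_value<L1_cache_hint_key, T>;
};

// Yields the value attached to Key in a pack of property values, or Default
// when no property in the pack uses Key.
template <typename Key, typename Default, typename... Props>
struct find_value { using type = Default; };

template <typename Key, typename Default, typename First, typename... Rest>
struct find_value<Key, Default, First, Rest...> {
  using type = std::conditional_t<
      std::is_same<typename First::key_t, Key>::value,
      typename First::value_t,
      typename find_value<Key, Default, Rest...>::type>;
};

// Hypothetical resolution rule: look for the generic hint first and fall back
// to the L1-specific hint, which itself falls back to `temporal`.
template <typename... Props>
using effective_L1_hint = typename find_value<
    temporality_hint_key,
    typename find_value<L1_cache_hint_key, temporal, Props...>::type,
    Props...>::type;

}  // namespace sketch
```

With this rule, an empty property list resolves to `temporal`, and a list containing both an L1 hint and a generic hint resolves to the generic one.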
+
+== Notes
+
+These properties are intended to be hints to guide the compiler; specific nontemporal behavior should not be assumed.
+
+Most extant architectures lack awareness of categories of memory as they are understood by the programmer (e.g., buffers, arrays, structures) and only expose temporality controls at the granularity of memory-transacting instructions. This extension lays the groundwork for future extensions that expose pointer- and accessor-level semantics. A future extension may provide more architecture-specific hints and coarser controls for applying hints.
+
+== Revision History
+
+[cols="5,15,15,70"]
+[grid="rows"]
+[options="header"]
+|========================================
+|Rev|Date|Author|Changes
+|1|2022-02-22|Jason Sewall|*Initial public working draft*
+|========================================