Skip to content

[SYCL][Doc] Add initial draft of sycl load_store proposal #5655

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
282 changes: 282 additions & 0 deletions sycl/doc/extensions/proposed/sycl_ext_oneapi_load_store.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,282 @@
= sycl_ext_oneapi_load_store
:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en

:blank: pass:[ +]

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}

== Introduction
IMPORTANT: This specification is a draft.

NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
used by permission by Khronos.

This proposal adds support for a family of load and store functions to SYCL. These functions are intended to support semantic hints to help guide code generation. This document describes these functions, the hint mechanisms, and a group of hints for control over nontemporal memory operations.

== Notice

Copyright (c) 2021-2022 Intel Corporation. All rights reserved.

== Status

Working Draft

This is a proposed extension specification, intended to gather community
feedback. Interfaces defined in this specification may not be implemented yet
or may be in a preliminary state. The specification itself may also change in
incompatible ways before it is finalized. Shipping software products should not
rely on APIs defined in this specification.

== Version

Revision: 1

== Contributors

Jason Sewall, Intel +
Konst Bobrovsky, Intel +
John Pennycook, Intel

== Dependencies

This extension is written against the SYCL 2020 specification, Revision 4 and
the following extensions:

* link:https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_properties.asciidoc[sycl_ext_oneapi_properties]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link is no longer correct because the extension is now implemented. The spec now lives in the "experimental" directory.

* link:https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_uniform.asciidoc[sycl_ext_oneapi_uniform]

== Feature Test Macro

This extension provides a feature-test macro as described in the core SYCL
specification section 6.3.3 "Feature test macros". Therefore, an
implementation supporting this extension must predefine the macro
`SYCL_EXT_ONEAPI_LOAD_STORE` to one of the values defined in the table
below. Applications can test for the existence of this macro to determine if
the implementation supports this feature, or applications can test the macro's
value to determine which of the extension's APIs the implementation supports.

[%header,cols="1,5"]
|===
|Value |Description
|1 |Initial extension version. Base features are supported.
|===

== Overview

Many architectures allow for sophisticated controls to be placed on how memory operations are executed, but these vary in form and execution. This extension adds high-level abstractions for expressing semantic hints.

This extension consists of a family of free functions for loading and storing data; these functions support hints that are passed through property list arguments. This extension also proposes a set of such hints for describing temporal behavior.

== Load and store functions

These are fine-grained functions that accept property lists and apply them to the granularity of a single logical memory transaction.

=== Work-item granularity

The following functions operate on a per work-item basis.

```c++
namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental {

template <typename T, typename Props>
T load(const T *addr, Props p); // 1

template <typename T>
T load(const T *addr); // 1a

template <typename T, typename Props>
void store(T *addr, const T &value, Props p); // 2

template <typename T>
void store(T *addr, T &value); // 2a

} // namespace experimental
} // namespace ext
} // namespace oneapi
} // namespace sycl
```

1:: Load and return the object of type `T` at `addr` with the hints in property list `p`. `p` cannot vary across work-items, but `addr` is expected to. Each work-item recieves a copy of the loaded object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does not make sense to say:

p cannot vary across work-items

because these are not cooperative group functions. Is this just a typo?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give me example of how it could vary across work-items? If it were runtime only, I guess?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current wording make is seem like the following would not be legal:

if (i == 0) {
  sycl::ext::oneapi::experimental::properties prop1{sycl::ext::oneapi::experimental::temporality_hint_nontemporal};
  sycl::ext::oneapi::experimental::load(ptr1, prop1);
}
else {
  sycl::ext::oneapi::experimental::properties prop2{sycl::ext::oneapi::experimental::temporality_hint_temporal};
  sycl::ext::oneapi::experimental::load(ptr2, prop2);
}

However, since these are not cooperative group functions, it seems like that should be legal.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I just don't understand SYCL. Are you saying that those two loads are the same call, and that we are effectively calling load with different arguments?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm saying that the language "cannot vary across work-items" is non-sensical unless you are talking about a function that must be called in convergent code. Since this is not a cooperative group function, that phrase just doesn't make any sense.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

1a:: Special case of 1 with no property list.
2:: Store `value` at `addr` with the hints in property list `p`. `p` cannot vary across work-items, but `value` and `addr` are expected to.
2a:: Special case of 2 with no property list.

=== Joint (cooperative) group granularity

The following functions apply to the passed `Group g`; the group cooperates to perform the operation to uniform arguments. These functions follow the restrictions and behaviors described in Sec. 4.17.3: Group functions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a wording suggestion, but it might be clearer to say that these are "cooperative group functions":

These are cooperative group functions, so they have all the restrictions and behaviors described in Section 4.17.3 "Group functions" of the core SYCL specification.


```c++
namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental {

template <typename Group, typename T, typename Props>
T joint_load(Group g, const T *addr, Props p); // 1

template <typename Group, typename T>
T joint_load(Group g, const T *addr); // 1a

// Available only when Group == sub_group
template <typename Group, typename T, typename Props>
uniform<T> joint_load(Group g, const T *addr, Props p); // 1b

// Available only when Group == sub_group
template <typename Group, typename T>
uniform<T> joint_load(Group g, const T *addr); // 1c

template <typename Group, typename T, typename Props>
void joint_store(Group g, T *addr, const T &value, Props p); // 2

template <typename Group, typename T>
void joint_store(Group g, T *addr, const T &value); // 2a

} // namespace experimental
} // namespace ext
} // namespace oneapi
} // namespace sycl
```

1:: Load and return the object of type `T` at `addr` with the hints in property list `p`. Each argument must be the same for each work-item in `g`, and a different object is returned for each work-item, unless the `Group` is a `sub_group`, in which case a `sycl::ext::oneapi::experimental::uniform<T>` is returned (see 1b-1c.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrase "a different object is returned for each work-item" seems a little confusing to me. Since all work-items load from the same address, they will all get the same value. I agree that each work-item gets a unique object of type T. However, even in the sub-group case, each work-item gets a unique object of type uniform<T>.

Maybe something like this:

1:: The addr and p arguments must be the same for each work-item in the group. Each work-item loads the object of type T at that address, using the hints in property list p.

1b:: Special case of 1 with sub_group. Each work-item loads the object of type T at that address, and returns that value wrapped in a sycl::ext::oneapi::experimental::uniform<T> object.

1a:: Special case of 1 with no property list.
1b:: Special case of 1 with `sub_group`
1c:: Special case of 1 with `sub_group` and no property list
2:: Store `value` at `addr` with the hints in property list `p`. `p` cannot vary across work-items, but `value` and `addr` are expected to.
2a:: Special case of 2 with no property list.

=== `group_block`

The following functions apply to the passed `Group g` and operate on the memory range `[addr, addr + g.get_group_linear_range())` (`[addr, addr + g.get_max_local_range())` for `sub_groups'); see below for more details. These functions follow the restrictions and behaviors described in Sec. 4.17.3: Group functions.

```c++
namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental {

template <typename Group, typename T, typename Props>
T group_block_load(Group g, const T *addr, Props p); // 1

template <typename Group, typename T>
T group_block_load(Group g, const T *addr); // 1a

template <typename Group, typename T, typename Props>
void group_block_store(Group g, T *addr, const T &value, Props p); // 2

template <typename Group, typename T>
void group_block_store(Group g, T *addr, const T &value); // 2a

} // namespace experimental
} // namespace ext
} // namespace oneapi
} // namespace sycl
```

1:: Load and return an object of type `T` for each work-item in `g`; each work-item in `g` will return the corresponding object `T` at `addr + g.get_local_linear_id()`, subject to any hints in `p`.
1a:: Special case of 1 with no property list.
2:: For each work-item in `g`, store that item's `value` at `addr + g.get_local_linear_id()` as computed by that work-item, using the hints in `p`.
2a:: Special case of 2 with no property list.

== Nontemporal properties

These properties allow programmers to express hints at how memory accesses should behave. These assume compile-time property values, and are passed to various constructs via property lists so that they may be associated with memory operations.

The default behavior for any property class, if some other specified property class does not override it, is to assume the most temporal behavior as possible.

=== Values

Each nontemporal property is parameterized to take one of two values:

```c++
namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental {

struct nontemporal { /* unspecified */ }; // 1
struct temporal { /* unspecified */ }; // 2

} // namespace experimental
} // namespace ext
} // namespace oneapi
} // namespace sycl
```

1:: indicates that the associated memory be accessed in as maximally nontemporal a fashion as possible.
2:: indicates that the associated memory be accessed in as maximally temporal a fashion as possible.

=== Properties

The nontemporal properties that are parameterized by the above are:

```c++
namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental {

struct temporality_hint_key {
template <typename T>
using value_t = property_value<temporality_hint_key, T>;
};

struct L1_cache_hint_key {
template <typename T>
using value_t = property_value<L1_cache_hint_key, T>;
};

struct L2_cache_hint_key {
template <typename T>
using value_t = property_value<L2_cache_hint_key, T>;
};

struct L3_cache_hint_key {
template <typename T>
using value_t = property_value<L3_cache_hint_key, T>;
};

struct L4_cache_hint_key {
template <typename T>
using value_t = property_value<L4_cache_hint_key, T>;
};

} // namespace experimental
} // namespace ext
} // namespace oneapi
} // namespace sycl
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it seems better to expose the temporal/nontemporal choice as an enum. There should also be predefined variables for each of these properties, which will make their use much less verbose:

namespace sycl::oneapi::ext::experimental {

enum class temporality_hint_enum : /*unspecified*/ {
  nontemporal,
  temporal
};

struct temporality_hint_key {
  template <temporality_hint_enum Hint>
  using value_t = property_value<temporality_hint_key, Hint>;
};

inline constexpr temporality_hint_key::value_t<temporality_hint_enum::nontemporal> temporality_hint_nontemporal;
inline constexpr temporality_hint_key::value_t<temporality_hint_enum::temporal> temporality_hint_temporal;

// Etc. for other properties.

} // namespace

It would also reduce verbosity if we name the property temporality_key instead of temporality_hint_key.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had convenience variables in my POC that didn't make it to my proposal; I've fixed that. I don't understand the value of the enum vs. independent classes, to be honest.


The `temporality_hint_key` property is the most generic and it should override any other nontemporal properties, if present.

The property values as passed to the `{L1,L2,L3,L4}_cache_hint_key` property classes should apply only to the cache level specified; the precise mapping to hardware constructs is otherwise implementation-defined.

== Notes

These properties are intended to be hints to guide the compiler; specific nontemporal behavior should not be assumed.

Most extant architectures lack awareness of categories of memory as they are understood by the programmer (i.e. buffers, arrays, structures) and only expose temporality controls at the granularity of memory-transacting instructions. This extension provides a groundwork for future extensions that expose pointer- and accessor-level semantics. A future extension may provide more architecture-specific hints and coarser controls for applying hints.

== Revision History

[cols="5,15,15,70"]
[grid="rows"]
[options="header"]
|========================================
|Rev|Date|Author|Changes
|1|2022-02-22|Jason Sewall|*Initial public working draft*
|========================