|
| 1 | += sycl_ext_oneapi_append_and_shift |
| 2 | + |
| 3 | +:source-highlighter: coderay |
| 4 | +:coderay-linenums-mode: table |
| 5 | + |
| 6 | +// This section needs to be after the document title. |
| 7 | +:doctype: book |
| 8 | +:toc2: |
| 9 | +:toc: left |
| 10 | +:encoding: utf-8 |
| 11 | +:lang: en |
| 12 | +:dpcpp: pass:[DPC++] |
| 13 | + |
| 14 | +// Set the default source code type in this document to C++, |
| 15 | +// for syntax highlighting purposes. This is needed because |
| 16 | +// docbook uses c++ and html5 uses cpp. |
| 17 | +:language: {basebackend@docbook:c++:cpp} |
| 18 | + |
| 19 | + |
| 20 | +== Notice |
| 21 | + |
| 22 | +[%hardbreaks] |
| 23 | +Copyright (C) 2023-2023 Intel Corporation. All rights reserved. |
| 24 | + |
| 25 | +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks |
| 26 | +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by |
| 27 | +permission by Khronos. |
| 28 | + |
| 29 | + |
| 30 | +== Contact |
| 31 | + |
| 32 | +To report problems with this extension, please open a new issue at: |
| 33 | + |
| 34 | +https://github.com/intel/llvm/issues |
| 35 | + |
| 36 | + |
| 37 | +== Dependencies |
| 38 | + |
| 39 | +This extension is written against the SYCL 2020 revision 6 specification. All references below to the "core SYCL specification" or to section numbers in the SYCL specification refer to that revision. |
| 40 | + |
| 41 | + |
| 42 | +== Status |
| 43 | + |
| 44 | +This is a proposed extension specification, intended to gather community feedback. Interfaces defined in this specification may not be implemented yet or may be in a preliminary state. The specification itself may also change in incompatible ways before it is finalized. Shipping software products should not rely on APIs defined in this specification. |
| 45 | + |
| 46 | + |
| 47 | +== Overview |
| 48 | + |
| 49 | +The current specification and implementation of `sycl::shift_group_left` returns "the value of x from the work-item whose group local id (`id`) is delta larger than that of the calling work-item." If `id+delta` is greater or equal than the group's linear size (`sg_size`), then the value returned is unspecified. An equivalent problem occurs for `sycl::shift_group_right` if `id-delta < 0`. |
| 50 | + |
| 51 | +The proposed extension of `sycl::shift_group_left` takes two values, the "current" value `x` and the "next" (or "previous") value `to_append` (`to_prepend`). If `id+delta` is less than the group's linear size, the function returns `x` of the work-item with group local id `id+delta`. If `id+delta` is greater than or equal to the group's linear size, the function returns `to_append` of the work-item with the group local id `id+delta-sg_size`. Thus, in all cases the return value is defined and valid. |
| 52 | + |
| 53 | + |
| 54 | +Similarly, the proposed extension of `sycl::shift_group_right` returns `x` of the work-item with group local id `id-delta`, if `id-delta` is greater than or equal to 0 and `to_prepend` of the work-item with group local id `sg_size+(id-delta)`, otherwise. |
| 55 | + |
| 56 | +While this feature can easily be reproduced manually, as indicated below in the section "possible implementation", an extension is feasible since the underlying `pass:[__]spirv_SubgroupShuffleDownINTEL` and `pass:[__]spirv_SubgroupShuffleUpINTEL` already take the proposed two values and support the required capabilities. |
| 57 | + |
| 58 | +=== Example |
| 59 | +Assuming the group `g` consists of 4 work items (`sg_size = 4`), which are enumerated `0-3` (`WI0`, `WI1`, `WI2`, `WI3`). |
| 60 | +Each work item holds a value `x` and a value `to_append`. To indicate the relation between a work item and its value, we enumerate the values as `x0`, `x1`, `x2`, `x3` for the values associated with work items `WI0`, `WI1`, `WI2`, `WI3`, respectively. Similarly for `to_append0`, etc. |
| 61 | + |
| 62 | +Assuming `append_and_shift_group_left` is called with `delta=2`, we would get the following return values: |
| 63 | +[%header,cols="1,1,1,1,1"] |
| 64 | +|=== |
| 65 | +|Work Item |
| 66 | +|WI0 |
| 67 | +|WI1 |
| 68 | +|WI2 |
| 69 | +|WI3 |
| 70 | + |
| 71 | +|Input x |
| 72 | +|x0 |
| 73 | +|x1 |
| 74 | +|x2 |
| 75 | +|x3 |
| 76 | + |
| 77 | +|Input to_append |
| 78 | +|to_append0 |
| 79 | +|to_append1 |
| 80 | +|to_append2 |
| 81 | +|to_append3 |
| 82 | + |
| 83 | +|Returned Value |
| 84 | +|x2 |
| 85 | +|x3 |
| 86 | +|to_append0 |
| 87 | +|to_append1 |
| 88 | +|=== |
| 89 | + |
| 90 | +Similarly, calling `prepend_and_shift_group_right` (with enumerated `to_prepend0` to `to_prepend3` analogously to the above `to_append`) with `delta=3` would yield |
| 91 | + |
| 92 | +[%header,cols="1,1,1,1,1"] |
| 93 | +|=== |
| 94 | +|Work Item |
| 95 | +|WI0 |
| 96 | +|WI1 |
| 97 | +|WI2 |
| 98 | +|WI3 |
| 99 | + |
| 100 | +|Input x |
| 101 | +|x0 |
| 102 | +|x1 |
| 103 | +|x2 |
| 104 | +|x3 |
| 105 | + |
| 106 | +|Input to_prepend |
| 107 | +|to_prepend0 |
| 108 | +|to_prepend1 |
| 109 | +|to_prepend2 |
| 110 | +|to_prepend3 |
| 111 | + |
| 112 | +|Returned Value |
| 113 | +|to_prepend1 |
| 114 | +|to_prepend2 |
| 115 | +|to_prepend3 |
| 116 | +|x0 |
| 117 | +|=== |
| 118 | + |
| 119 | + |
| 120 | + |
| 121 | +== Specification |
| 122 | + |
| 123 | +=== Feature test macro |
| 124 | +This extension provides a feature-test macro as described in the core SYCL specification. An implementation supporting this extension must predefine the macro `SYCL_EXT_ONEAPI_APPEND_AND_SHIFT` to one of the values defined in the table below. Applications can test for the existence of this macro to determine if the implementation supports this feature, or applications can test the macro's value to determine which of the extension's features the implementation supports. |
| 125 | + |
| 126 | +[%header,cols="1,5"] |
| 127 | +|=== |
| 128 | +|Value |
| 129 | +|Description |
| 130 | + |
| 131 | +|1 |
| 132 | +|The APIs of this experimental extension are not versioned, so the feature-test macro always has this value. |
| 133 | +|=== |
| 134 | + |
| 135 | +=== API |
| 136 | + |
| 137 | +```c++ |
| 138 | +namespace sycl::ext::oneapi { |
| 139 | + |
| 140 | +template <typename Group, typename T> |
| 141 | +T append_and_shift_group_left(Group g, T x, T to_append, Group::linear_id_type delta=1) |
| 142 | + |
| 143 | +} // namespace sycl::ext::oneapi |
| 144 | +``` |
| 145 | + |
| 146 | +_Constraints_: Available only if `std::is_same_v<std::decay_t<Group>, sub_group>` is true and `T` is a trivially copyable type. |
| 147 | + |
| 148 | +_Preconditions_: `delta` must be the same non-negative value for all work-items in the group. |
| 149 | + |
| 150 | +_Returns_: The value `x` of the work-item with group local id `id+delta`, where `id` denotes the group local id of the work-item calling the function. If `id+delta` is greater than or equal to the group's linear size (`sg_size`), the function returns `to_append` of the work-item with the group local id `id+delta-sg_size`. If `id+delta` is greater than or equal to `2*sg_size` (which may happen if `delta` is greater than `sg_size`), the return value is undefined. |
| 151 | + |
| 152 | + |
| 153 | + |
| 154 | +```c++ |
| 155 | +namespace sycl::ext::oneapi { |
| 156 | + |
| 157 | +template <typename Group, typename T> |
| 158 | +T prepend_and_shift_group_right(Group g, T x, T to_prepend, Group::linear_id_type delta=1) |
| 159 | + |
| 160 | +} // namespace sycl::ext::oneapi |
| 161 | +``` |
| 162 | +_Constraints_: Available only if `std::is_same_v<std::decay_t<Group>, sub_group>` is true and `T` is a trivially copyable type. |
| 163 | + |
| 164 | +_Preconditions_: `delta` must be the same non-negative value for all work-items in the group. |
| 165 | + |
| 166 | +_Returns_: The value `x` of the work-item with group local id `id-delta` if `id-delta` is greater than or equal to 0, and `to_prepend` of the work-item with group local id `sg_size+(id-delta)` otherwise. If `id-delta` is less than `-sg_size` (which may happen if `delta` is greater than `sg_size`), the return value is undefined. |
| 167 | + |
| 168 | + |
| 169 | +=== Possible Implementation |
| 170 | + |
| 171 | +The feature can be implemented based on `pass:[__]spirv_SubgroupShuffleDownINTEL` and `pass:[__]spirv_SubgroupShuffleUpINTEL`. For devices without that capability, it can be implemented with the existing shuffle capabilities as follows: |
| 172 | + |
| 173 | +```c++ |
| 174 | +template <typename Group, typename T> |
| 175 | +T sycl::ext::oneapi::append_and_shift_group_left(Group g, T x, T to_append, Group::linear_id_type delta = 1) |
| 176 | +{ |
| 177 | + T down_val = sycl::shift_group_left(g, x, delta); |
| 178 | + T up_val = sycl::shift_group_right(g,to_append, g.get_local_linear_range()-delta); |
| 179 | + |
| 180 | + return delta+g.get_local_linear_id() > g.get_local_linear_range() ? down_val : up_val; |
| 181 | +} |
| 182 | + |
| 183 | +template <typename Group, typename T> |
| 184 | +T sycl::ext::oneapi::prepend_and_shift_group_right(Group g, T x, T to_prepend, Group::linear_id_type delta = 1) |
| 185 | +{ |
| 186 | + T up_val = sycl::shift_group_right(g, x, delta); |
| 187 | + T down_val = sycl::shift_group_left(g,to_prepend, g.get_local_linear_range()-delta); |
| 188 | + |
| 189 | + return g.get_local_linear_id()-delta >= 0 ? up_val : down_val; |
| 190 | +} |
| 191 | +``` |
| 192 | + |
0 commit comments