|
| 1 | += sycl_ext_oneapi_reduction_properties |
| 2 | + |
| 3 | +:source-highlighter: coderay |
| 4 | +:coderay-linenums-mode: table |
| 5 | + |
| 6 | +// This section needs to be after the document title. |
| 7 | +:doctype: book |
| 8 | +:toc2: |
| 9 | +:toc: left |
| 10 | +:encoding: utf-8 |
| 11 | +:lang: en |
| 12 | +:dpcpp: pass:[DPC++] |
| 13 | +:endnote: —{nbsp}end{nbsp}note |
| 14 | + |
| 15 | +// Set the default source code type in this document to C++, |
| 16 | +// for syntax highlighting purposes. This is needed because |
| 17 | +// docbook uses c++ and html5 uses cpp. |
| 18 | +:language: {basebackend@docbook:c++:cpp} |
| 19 | + |
| 20 | + |
| 21 | +== Notice |
| 22 | + |
| 23 | +[%hardbreaks] |
| 24 | +Copyright (C) 2024 Intel Corporation. All rights reserved. |
| 25 | + |
| 26 | +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks |
| 27 | +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by |
| 28 | +permission by Khronos. |
| 29 | + |
| 30 | + |
| 31 | +== Contact |
| 32 | + |
| 33 | +To report problems with this extension, please open a new issue at: |
| 34 | + |
| 35 | +https://github.com/intel/llvm/issues |
| 36 | + |
| 37 | + |
| 38 | +== Dependencies |
| 39 | + |
| 40 | +This extension is written against the SYCL 2020 revision 9 specification. All |
| 41 | +references below to the "core SYCL specification" or to section numbers in the |
| 42 | +SYCL specification refer to that revision. |
| 43 | + |
| 44 | +This extension also depends on the following other SYCL extensions: |
| 45 | + |
| 46 | +* link:../experimental/sycl_ext_oneapi_properties.asciidoc[ |
| 47 | + sycl_ext_oneapi_properties] |
| 48 | + |
| 49 | + |
| 50 | +== Status |
| 51 | + |
| 52 | +This is a proposed extension specification, intended to gather community |
| 53 | +feedback. Interfaces defined in this specification may not be implemented yet |
| 54 | +or may be in a preliminary state. The specification itself may also change in |
| 55 | +incompatible ways before it is finalized. *Shipping software products should |
| 56 | +not rely on APIs defined in this specification.* |
| 57 | + |
| 58 | + |
| 59 | +== Overview |
| 60 | + |
| 61 | +In order to maximize portability across different device types, the SYCL 2020 |
| 62 | +`reduction` interface gives implementers a significant amount of freedom in |
| 63 | +selecting the correct reduction algorithm to use for different types. |
| 64 | + |
| 65 | +In the majority of cases, a developer can trust an implementation to choose the |
| 66 | +best algorithm; however, there are situations in which a user may wish to |
| 67 | +constrain algorithm selection (e.g., to ensure run-to-run reproducibility). |
| 68 | +This extension introduces new compile-time properties for the `reduction` |
| 69 | +interface that enable developers to provide such constraints. |
| 70 | + |
| 71 | + |
| 72 | +== Specification |
| 73 | + |
| 74 | +=== Feature test macro |
| 75 | + |
| 76 | +This extension provides a feature-test macro as described in the core SYCL |
| 77 | +specification. An implementation supporting this extension must predefine the |
| 78 | +macro `SYCL_EXT_ONEAPI_REDUCTION_PROPERTIES` to one of the values defined in |
| 79 | +the table below. Applications can test for the existence of this macro to |
| 80 | +determine if the implementation supports this feature, or applications can test |
| 81 | +the macro's value to determine which of the extension's features the |
| 82 | +implementation supports. |
| 83 | + |
| 84 | +[%header,cols="1,5"] |
| 85 | +|=== |
| 86 | +|Value |
| 87 | +|Description |
| 88 | + |
| 89 | +|1 |
| 90 | +|Initial version of this extension. |
| 91 | +|=== |
| 92 | + |
| 93 | +=== `reduction` overload |
| 94 | + |
| 95 | +New `reduction` overloads are introduced to allow developers to attach |
| 96 | +compile-time properties to a reduction object. |
| 97 | + |
| 98 | +Each new overload has the same behavior as its corresponding definition in the |
| 99 | +SYCL 2020 specification unless the definition of a property passed in via the |
| 100 | +final `sycl::ext::oneapi::experimental::properties` parameter says otherwise. |
| 101 | + |
| 102 | +[source,c++] |
| 103 | +---- |
| 104 | +namespace sycl { |
| 105 | +
|
| 106 | +template <typename BufferT, typename BinaryOperation, typename PropertyList> |
| 107 | +__unspecified__ reduction(BufferT vars, handler& cgh, BinaryOperation combiner, |
| 108 | + PropertyList properties); |
| 109 | +
|
| 110 | +template <typename T, typename BinaryOperation, typename PropertyList> |
| 111 | +__unspecified__ reduction(T* var, BinaryOperation combiner, |
| 112 | + PropertyList properties); |
| 113 | +
|
| 114 | +template <typename T, typename Extent, typename BinaryOperation, typename PropertyList> |
| 115 | +__unspecified__ reduction(span<T, Extent> vars, BinaryOperation combiner, |
| 116 | + PropertyList properties); |
| 117 | +
|
| 118 | +template <typename BufferT, typename BinaryOperation, typename PropertyList> |
| 119 | +__unspecified__ |
| 120 | +reduction(BufferT vars, handler& cgh, const BufferT::value_type& identity, |
| 121 | + BinaryOperation combiner, PropertyList properties); |
| 122 | +
|
| 123 | +template <typename T, typename BinaryOperation, typename PropertyList> |
| 124 | +__unspecified__ reduction(T* var, const T& identity, BinaryOperation combiner, |
| 125 | + PropertyList properties); |
| 126 | +
|
| 127 | +template <typename T, typename Extent, typename BinaryOperation, typename PropertyList> |
| 128 | +__unspecified__ reduction(span<T, Extent> vars, const T& identity, |
| 129 | + BinaryOperation combiner, |
| 130 | + PropertyList properties); |
| 131 | +
|
| 132 | +} |
| 133 | +---- |
| 134 | + |
| 135 | +=== Reduction properties |
| 136 | + |
| 137 | +New `reduction` properties are introduced to allow developers to constrain |
| 138 | +reduction algorithm selection based on desired behavior(s). Compile-time |
| 139 | +properties corresponding to existing runtime properties are also introduced to |
| 140 | +ensure that all information can be passed via a single property list. |
| 141 | + |
| 142 | +If a reduction kernel is submitted to a device that cannot satisfy the |
| 143 | +request for specific reduction behavior(s), the implementation must throw an |
| 144 | +`exception` with the `errc::feature_not_supported` error code. |
| 145 | + |
| 146 | +[source,c++] |
| 147 | +---- |
| 148 | +namespace sycl::ext::oneapi { |
| 149 | +
|
| 150 | +struct deterministic_key { |
| 151 | + using value_t = property_value<deterministic_key>; |
| 152 | +}; |
| 153 | +inline constexpr deterministic_key::value_t deterministic; |
| 154 | +
|
| 155 | +struct initialize_to_identity_key { |
| 156 | + using value_t = property_value<initialize_to_identity_key>; |
| 157 | +}; |
| 158 | +inline constexpr initialize_to_identity_key::value_t initialize_to_identity; |
| 159 | +
|
| 160 | +} |
| 161 | +---- |
| 162 | + |
| 163 | +|=== |
| 164 | +|Property|Description |
| 165 | + |
| 166 | +|`deterministic` |
| 167 | +a|When two reductions both have this property, they are guaranteed to produce |
| 168 | +the same result when all of the following conditions hold: |
| 169 | + |
| 170 | +* Both reductions run on the same device. |
| 171 | +* Both reductions are invoked with the same launch configuration (i.e., `range` |
| 172 | + or `nd_range`). |
| 173 | +* The same values are contributed to each reduction. |
| 174 | +* The work-items in each reduction contribute those values in the same pattern |
| 175 | + and the same order. For example, if the first reduction contributes values |
| 176 | + _V1_, _V2_, and _V3_ (in that order) from a work-item with linear index _i_; |
| 177 | + then the second reduction must also contribute values _V1_, _V2_, and _V3_ |
| 178 | + (in that order) from the work-item with linear index _i_. |
| 179 | + |
| 180 | +[_Note:_ Work-items may contribute different values to a reduction because of |
| 181 | +other potential sources of non-determinism, such as calls to group algorithms, |
| 182 | +use of atomic operations, etc. _{endnote}_] |
| 183 | + |
| 184 | +|`initialize_to_identity` |
| 185 | +|Adds the same requirement as |
| 186 | +`sycl::property::reduction::initialize_to_identity`. |
| 187 | + |
| 188 | +|=== |
| 189 | + |
| 190 | + |
| 191 | +=== Usage example |
| 192 | + |
| 193 | +[source,c++] |
| 194 | +---- |
| 195 | +using syclex = sycl::ext::oneapi::experimental; |
| 196 | +
|
| 197 | +float sum(sycl::queue q, float* input, size_t N) { |
| 198 | +
|
| 199 | + float result = 0; |
| 200 | + { |
| 201 | + sycl::buffer<float> buf{&result, 1}; |
| 202 | +
|
| 203 | + q.submit([&](sycl::handler& h) { |
| 204 | + auto reduction = sycl::reduction(buf, h, sycl::plus<>(), syclex::properties(syclex::deterministic)); |
| 205 | + h.parallel_for(N, reduction, [=](size_t i, auto& reducer) { |
| 206 | + reducer += input[i]; |
| 207 | + }); |
| 208 | + } |
| 209 | + } |
| 210 | + return result; |
| 211 | + |
| 212 | +} |
| 213 | +
|
| 214 | +... |
| 215 | +
|
| 216 | +float x = sum(q, array, 1024); |
| 217 | +float y = sum(q, array, 1024); |
| 218 | +
|
| 219 | +// NB: determinism guarantees bitwise reproducible reductions for floats |
| 220 | +assert(sycl::bit_cast<unsigned int>(x) == sycl::bit_cast<unsigned int>(y)); |
| 221 | +---- |
| 222 | + |
| 223 | + |
| 224 | +== Implementation notes |
| 225 | + |
| 226 | +This non-normative section provides information about one possible |
| 227 | +implementation of this extension. It is not part of the specification of the |
| 228 | +extension's API. |
| 229 | + |
| 230 | +Since SYCL implementations must support arbitrary types, we anticipate that |
| 231 | +many implementations will already have appropriate reduction variants available |
| 232 | +that satisfy the constraints imposed by these new properties. Implementing |
| 233 | +support for these new constraints may therefore be as straightforward as |
| 234 | +providing a new overload of `sycl::reduction` that overrides the algorithm |
| 235 | +selection process. |
| 236 | + |
| 237 | +The steps necessary to guarantee deterministic results are type-dependent. For |
| 238 | +integers and built-in combination operators, all implementations should be |
| 239 | +deterministic by default. For floating-point numbers and/or custom combination |
| 240 | +operators, it becomes necessary to ensure that the intermediate results from |
| 241 | +each work-item are always combined in the same order. |
| 242 | + |
| 243 | + |
| 244 | +== Issues |
| 245 | + |
| 246 | +None. |
0 commit comments