[SYCL][Doc] Remove now incorrect info from Reduction_status.md (#7751)
* ext::oneapi::reduction removed in #6634
* sycl::item in kernel supported since #7478
* sycl::range + many reductions implemented in #7456
* CPU reduction performance implemented in #6164
* span support implemented in #6019
There might be other things that have been implemented already, but I
cannot immediately identify them, if any.
sycl/doc/design/Reduction_status.md: 2 additions & 37 deletions

--- a/sycl/doc/design/Reduction_status.md
+++ b/sycl/doc/design/Reduction_status.md
@@ -2,20 +2,6 @@
 
 **NOTE**: This document is a quick draft. It is written to help developers of SYCL headers/library to understand the current status, currently used algorithms and known problems.
 
-
-
-# Reduction specifications
-
-There are 2 specifications of the reduction feature and both are still actual:
-
-*`sycl::ext::oneapi::reduction` is described in [this document](../extensions/deprecated/sycl_ext_oneapi_nd_range_reductions.md). This extension is deprecated, and was created as part of a pathfinding/prototyping work before it was added to SYCL 2020 standard.
-
-*`sycl::reduction` is described in [SYCL 2020 standard](https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:reduction).
-
-These two specifications for reduction are pretty similar. The implementation of `sycl::reduction` is based on (basically re-uses) the implementation of `sycl::ext::oneapi::reduction`.
-
-There are non-critical differences in API to create the reduction object. `sycl::reduction` accepts either `sycl::buffer` or `usm memory` and optional property `property::reduction::initialize_to_identity` as parameter to create a reduction, while `sycl::ext::oneapi::reduction` accepts `sycl::accessor` that has `access::mode` equal to either `read_write` (which corresponds to SYCL 2020 reduction initialized without `property::reduction::initialize_to_identity`) or `discard_write`(corresponds to case when `property::reduction::initialize_to_identity` is used).
-
 ---
 ---
 # Implementation details: `reduction` in `parallel_for()` accepting `nd_range`
@@ -140,10 +126,7 @@ Variants (B) and (C) use the same approach. The only difference is how the parti
 
 ---
 
-TODO #4 (Performance): The `reductionLoop()` has some order in which it choses indexes from the global index space. Currently it has huge stride to help vectorizer and get more vector insturction for the device code, which though may cause competition among devices for the memory due to pretty bad memory locality. On two-socket server CPUs using smaller stride to prioritize better memory locality gives additional perf improvement.
-
----
-TODO #5 (Performance): Some devices may provide unique-thread-id where the number of worker threads running simultaneously is limited. Such feature opens way for more efficient implementations (up to 2x faster, especially on many stacks/tiles devices). See this extension for reference: https://github.com/intel/llvm/pull/4747
+TODO #4 (Performance): Some devices may provide unique-thread-id where the number of worker threads running simultaneously is limited. Such feature opens way for more efficient implementations (up to 2x faster, especially on many stacks/tiles devices). See this extension for reference: https://github.com/intel/llvm/pull/4747
 
 ---
 ---
@@ -162,25 +145,7 @@ The rest of this work is temporarily blocked by XPTI instrumentation that need t
 The problem is known, the fix in SYCL headers is implemented: https://github.com/intel/llvm/pull/4352 and is waiting for some re-work in XPTI component that must be done before the fix merge.
 
 ---
-### 2) Support `parallel_for` accepting `range` and having `item` as the parameter of the kernel function.
-Currently only kernels accepting `id` are supported.
-
----
-### 3) Support `parallel_for` accepting `range` and 2 or more reduction variables.
-Currently `parallel_for()` accepting `range` may handle only 1 reduction variable. It does not support 2 or more.
-
-The temporary work-around for that is to use some container multiple reduction variables, i.e. std::pair, std::tuple or a custom struct/class containing 2 or more reduction variables, and also define a custom operator that would be passed to `reduction` constructor.
-Another work-around is to provide `nd_range`.
-
----
-### 4) Support `parallel_for` accepting `reduction` constructed with `span`:
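For readers unfamiliar with the workaround that the removed section 3) described (packing several reduction variables into one container and supplying a custom combining operator), here is a minimal sketch in plain standard C++. It is not SYCL code and not the library's implementation: `std::accumulate` stands in for the reduction, and the names `SumAndMax`, `CombineSumAndMax`, `identitySumAndMax` and `reduceSumAndMax` are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <limits>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical container bundling two reduction variables: (sum, max).
using SumAndMax = std::pair<long long, int>;

// Hypothetical custom operator: adds the sums and keeps the larger maximum.
// It is associative and commutative, which a reduction operator needs to be.
struct CombineSumAndMax {
  SumAndMax operator()(const SumAndMax &a, const SumAndMax &b) const {
    return {a.first + b.first, std::max(a.second, b.second)};
  }
};

// Identity of the combined operator: 0 for the sum, the lowest int for the max.
inline SumAndMax identitySumAndMax() {
  return {0LL, std::numeric_limits<int>::lowest()};
}

// Computes both quantities in a single pass over the data.
// Assumption: std::accumulate is only a sequential stand-in for a reduction.
inline SumAndMax reduceSumAndMax(const std::vector<int> &data) {
  CombineSumAndMax combine;
  return std::accumulate(
      data.begin(), data.end(), identitySumAndMax(),
      [&](const SumAndMax &acc, int x) {
        // Lift each element into the container, then combine.
        return combine(acc, SumAndMax{x, x});
      });
}
```

Calling `reduceSumAndMax` on a vector yields the sum in `.first` and the maximum in `.second`. In the workaround as the removed text described it, an operator of this shape would be what is passed to the `reduction` constructor, so that one reduction variable carries both results.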