Commit b51f267

[SYCL][Doc] Remove now incorrect info from Reduction_status.md (#7751)
* ext::oneapi::reduction removed in #6634
* sycl::item in kernel supported since #7478
* sycl::range + many reductions implemented in #7456
* CPU reduction performance implemented in #6164
* span support implemented in #6019

There might be other things that have been implemented already, but I cannot immediately identify them, if any.
1 parent 29aa7ba commit b51f267

1 file changed: +2 -37 lines changed


sycl/doc/design/Reduction_status.md

Lines changed: 2 additions & 37 deletions
@@ -2,20 +2,6 @@
22

33
**NOTE**: This document is a quick draft. It is written to help developers of SYCL headers/library to understand the current status, currently used algorithms and known problems.
44

5-
6-
7-
# Reduction specifications
8-
9-
There are two specifications of the reduction feature, and both are still current:
10-
11-
* `sycl::ext::oneapi::reduction` is described in [this document](../extensions/deprecated/sycl_ext_oneapi_nd_range_reductions.md). This extension is deprecated, and was created as part of a pathfinding/prototyping work before it was added to SYCL 2020 standard.
12-
13-
* `sycl::reduction` is described in [SYCL 2020 standard](https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:reduction).
14-
15-
These two specifications for reduction are pretty similar. The implementation of `sycl::reduction` is based on (basically re-uses) the implementation of `sycl::ext::oneapi::reduction`.
16-
17-
There are non-critical differences in API to create the reduction object. `sycl::reduction` accepts either `sycl::buffer` or `usm memory` and optional property `property::reduction::initialize_to_identity` as parameter to create a reduction, while `sycl::ext::oneapi::reduction` accepts `sycl::accessor` that has `access::mode` equal to either `read_write` (which corresponds to SYCL 2020 reduction initialized without `property::reduction::initialize_to_identity`) or `discard_write`(corresponds to case when `property::reduction::initialize_to_identity` is used).
18-
195
---
206
---
217
# Implementation details: `reduction` in `parallel_for()` accepting `nd_range`
@@ -140,10 +126,7 @@ Variants (B) and (C) use the same approach. The only difference is how the parti
140126
141127
---
142128
143-
TODO #4 (Performance): The `reductionLoop()` chooses indexes from the global index space in a particular order. Currently it uses a huge stride to help the vectorizer and get more vector instructions in the device code, which may cause competition among devices for memory because of poor memory locality. On two-socket server CPUs, using a smaller stride to prioritize memory locality gives an additional performance improvement.
144-
145-
---
146-
TODO #5 (Performance): Some devices may provide unique-thread-id where the number of worker threads running simultaneously is limited. Such feature opens way for more efficient implementations (up to 2x faster, especially on many stacks/tiles devices). See this extension for reference: https://github.com/intel/llvm/pull/4747
129+
TODO #4 (Performance): Some devices may provide unique-thread-id where the number of worker threads running simultaneously is limited. Such feature opens way for more efficient implementations (up to 2x faster, especially on many stacks/tiles devices). See this extension for reference: https://github.com/intel/llvm/pull/4747
147130
148131
---
149132
---
@@ -162,25 +145,7 @@ The rest of this work is temporarily blocked by XPTI instrumentation that need t
162145
The problem is known, the fix in SYCL headers is implemented: https://github.com/intel/llvm/pull/4352 and is waiting for some re-work in XPTI component that must be done before the fix merge.
163146

164147
---
165-
### 2) Support `parallel_for` accepting `range` and having `item` as the parameter of the kernel function.
166-
Currently only kernels accepting `id` are supported.
167-
168-
---
169-
### 3) Support `parallel_for` accepting `range` and 2 or more reduction variables.
170-
Currently `parallel_for()` accepting `range` may handle only 1 reduction variable. It does not support 2 or more.
171-
172-
The temporary work-around for that is to use a container for multiple reduction variables, i.e. std::pair, std::tuple or a custom struct/class containing 2 or more reduction variables, and also to define a custom operator that is passed to the `reduction` constructor.
173-
Another work-around is to provide `nd_range`.
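
The container work-around described above can be sketched on the host in plain C++. This is an illustration only: it uses no SYCL calls, and the names `SumMax`, `SumMaxCombiner`, `sum_max_identity` and `reduce_sum_and_max` are hypothetical. It assumes the two reduction variables are a sum and a maximum; a device version would pass the same kind of combiner and an explicit identity to the `reduction` constructor.

```cpp
#include <algorithm>
#include <limits>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical container bundling two reduction variables: {sum, max}.
using SumMax = std::pair<long, int>;

// Hypothetical custom combiner: merges two partial results component-wise.
struct SumMaxCombiner {
  SumMax operator()(const SumMax &a, const SumMax &b) const {
    return {a.first + b.first, std::max(a.second, b.second)};
  }
};

// Identity of SumMaxCombiner: 0 for the sum, the lowest int for the max.
inline SumMax sum_max_identity() {
  return {0L, std::numeric_limits<int>::lowest()};
}

// Host-only stand-in for the device reduction: folds every element into a
// single SumMax value using the custom combiner.
inline SumMax reduce_sum_and_max(const std::vector<int> &data) {
  SumMaxCombiner combine;
  return std::accumulate(data.begin(), data.end(), sum_max_identity(),
                         [&](const SumMax &acc, int x) {
                           return combine(acc, SumMax{x, x});
                         });
}
```

Because the combined type has no identity known to the library, the identity has to be supplied explicitly, which is why the sketch defines `sum_max_identity()` next to the combiner.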
174-
175-
---
176-
### 4) Support `parallel_for` accepting `reduction` constructed with `span`:
177-
```c++
178-
template <typename T, typename Extent, typename BinaryOperation>
179-
__unspecified__ reduction(span<T, Extent> vars, const T& identity, BinaryOperation combiner);
180-
```
181-
182-
---
183-
### 5) Support identity-less reductions even when the reduction cannot be determined automatically.
148+
### 2) Support identity-less reductions even when the reduction cannot be determined automatically.
184149

185150
Currently identity-less reductions are supported, but only in cases when sycl::has_known_identity<BinaryOperation, ElementType> returns true.
186151
When sycl::has_known_identity returns false, the implementation of the reduction may be less efficient, but still be functional.
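
One way to see why a reduction without a known identity can still work, at some cost, is the host-only C++ sketch below. It is not the DPC++ implementation; the function name `reduce_without_identity` is hypothetical. The assumption is that, with no neutral value to start from, the accumulator must carry an "is initialized" flag, modeled here with `std::optional`, and each step pays for checking it.

```cpp
#include <optional>
#include <vector>

// Hypothetical helper: reduces `data` with `combiner` without needing an
// identity value. An empty optional means "no element combined yet".
template <typename T, typename BinaryOperation>
std::optional<T> reduce_without_identity(const std::vector<T> &data,
                                         BinaryOperation combiner) {
  std::optional<T> acc;
  for (const T &x : data) {
    // The first element seeds the accumulator; later ones are combined.
    acc = acc ? combiner(*acc, x) : x;
  }
  return acc; // Stays empty when the input has no elements.
}
```

With a known identity the accumulator could simply start at that value and the per-element check would disappear, which is the efficiency difference the note above refers to.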
