Skip to content

Commit d4045be

Browse files
authored
[ESIMD][NFC][DOC] Add 'restriction' section to atomic_update() doc (#13202)
Signed-off-by: Klochkov, Vyacheslav N <[email protected]>
1 parent ac4808a commit d4045be

File tree

2 files changed

+35
-19
lines changed

2 files changed

+35
-19
lines changed

sycl/doc/extensions/supported/sycl_ext_intel_esimd/sycl_ext_intel_esimd.md

Lines changed: 7 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -46,7 +46,7 @@
4646

4747
## Introduction
4848

49-
The main motivation for introducing the "Explicit SIMD" SYCL extension
49+
The main motivation for introducing the "Explicit SIMD" SYCL extension
5050
(or simply "ESIMD") is enabling efficient low-level programming for Intel graphics
5151
architectures. It provides APIs close to the Intel GPU ISA
5252
and allows writing explicitly vectorized device code.
@@ -356,7 +356,7 @@ See more details on the API documentation [page TODO](https://intel.github.io/ll
356356
### Memory access APIs
357357

358358
Explicit SIMD memory access interface is quite different from the standard SYCL
359-
memory access interface. It supports main SYCL's device memory representations:
359+
memory access interface. It supports main SYCL device memory representations:
360360
- USM pointers
361361
- SYCL accessors
362362
- 1D global accessors
@@ -651,7 +651,7 @@ ESIMD supports the following non-standard math functions implemented in hardware
651651
- Fraction - `frc`, extracts the fractional parts of the input vector elements.
652652
- Count leading zeroes - `lzd`.
653653
- Linear interpolation - `lrp`. Basically computes `src1 * src0 + src2 * (1.0f - src0)`
654-
- Plane equation - `plane`. Solves a component-wise plane equation
654+
- Plane equation - `plane`. Solves a component-wise plane equation
655655
`w = p*u + q*v + r` where `u`, `v`, `w` are vectors and `p`, `q`, `r` are scalars.
656656
657657
@@ -865,7 +865,7 @@ There are other useful miscellaneous APIs provided by ESIMD.
865865
types with saturation.
866866
- Conversion - `convert`. Converts between vectors with different element data
867867
types.
868-
- Reverse bits - `bf_reverse`.
868+
- Reverse bits - `bf_reverse`.
869869
- Insert bit field - `bf_insert`.
870870
- Extract bit field - `bf_extract`.
871871
- Convert mask to integer and back - `pack_mask`, `unpack_mask`.
@@ -978,7 +978,7 @@ More examples of the unwrap/merge process:
978978
B6 b;
979979
char x;
980980
char y;
981-
981+
982982
C6 foo() { return *this; }
983983
};
984984
```
@@ -989,7 +989,7 @@ More examples of the unwrap/merge process:
989989
```
990990
%struct.C6 = type { %struct.B6, i8, i8 }
991991
%struct.B6 = type { i32 addrspace(4)*, i32 }
992-
```
992+
```
993993
994994
Note that `__regcall` does not guarantee passing through registers in the final
995995
generated code. For example, compiler will use a threshold for argument or
@@ -1162,8 +1162,7 @@ inside ESIMD kernels and functions. Most of missing SYCL features listed below
11621162
must be supported eventually:
11631163
- 2D and 3D target::device accessor and local_accessor;
11641164
- Constant accessors;
1165-
- `sycl::accessor::get_pointer()` and `sycl::accessor::operator[]` are supported only with `-fsycl-esimd-force-stateless-mem`. Otherwise, All memory accesses through an accessor are
1166-
done via explicit APIs; e.g. `sycl::ext::intel::esimd::block_store(acc, offset)`
1165+
- `sycl::accessor::get_pointer()` and `sycl::accessor::operator[]` are not supported with with `-fno-sycl-esimd-force-stateless-mem` compilation switch.
11671166
- Accessors with non-zero offsets to accessed buffer;
11681167
- Accessors with access/memory range specified;
11691168
- `sycl::image`, `sycl::sampler` and `sycl::stream` classes.

sycl/doc/extensions/supported/sycl_ext_intel_esimd/sycl_ext_intel_esimd_functions.md

Lines changed: 28 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -122,7 +122,7 @@ The optional [compile-time properties](#compile-time-properties) list `props` ma
122122
123123
`N` - the valid values may depend on usage of cache-hints or passing of the `pred` argument:
124124
125-
| `Function` | `Condition` | Requirement for `N` | Required/supported Intel GPU |
125+
| `Function` | `Condition` | Requirement for `N` | Required Intel GPU |
126126
|-|-|-|-|
127127
| `(usm-bl-*)` | (no cache-hints) and (`pred` is not passed) | `N` is any positive number | Any Intel GPU |
128128
| `(usm-bl-*)` | (cache-hints) or (`pred` is passed) | `N` must be from [Table1 below](#table1---valid-values-of-n-if-cache-hints-used-or-pred-parameter-is-passed) | DG2 or PVC |
@@ -195,7 +195,7 @@ The optional [compile-time properties](#compile-time-properties) list `props` ma
195195

196196
`N` - the valid values may depend on usage of cache-hints or passing of the `pred` argument:
197197

198-
| `Function` | `Condition` | Requirement for `N` | Required/supported Intel GPU |
198+
| `Function` | `Condition` | Requirement for `N` | Required Intel GPU |
199199
|-|-|-|-|
200200
| `(usm-bs-*)` | (no cache-hints) and (`pred` is not passed) | `N` is any positive number | Any Intel GPU |
201201
| `(usm-bs-*)` | (cache-hints) or (`pred` is passed) | `N` must be from [Table2 below](#table1---valid-values-of-n-if-cache-hints-used-or-pred-parameter-is-passed) | DG2 or PVC |
@@ -338,7 +338,7 @@ template <typename T, int N, int VS = 1, typename OffsetSimdViewT, typename Prop
338338
`(slm-ga-*)`: Loads ("gathers") elements of the type `T` from shared local memory locations addressed by `byte_offsets`.
339339
The parameter `byte_offset` is a vector of any integral type elements for `(usm-ga-*)`, 32-bit integer elements for `(lacc-ga-*)` and `(slm-ga-*)`, any integral type integer elements for `(acc-ga-*)` in [stateless](#statelessstateful-memory-mode) mode(default),
340340
and up-to-32-bit integer elements for `(acc-ga-*)` in [stateful](#statelessstateful-memory-mode) mode.
341-
The optional parameter `pred` provides a `simd_mask`. If some element in `pred` is zero, then the load of the corresponding memory location is skipped and the element of the result is copied from `pass_thru` (if it is passed) or it is undefined (if `pass_thru` is omitted).
341+
The optional parameter `mask` provides a `simd_mask`. If some element in `mask` is zero, then the load of the corresponding memory location is skipped and the element of the result is copied from `pass_thru` (if it is passed) or it is undefined (if `pass_thru` is omitted).
342342
The optional [compile-time properties](#compile-time-properties) list `props` may specify `alignment` and/or `cache-hints`. The cache-hints are ignored for `(lacc-*)` and `(slm-*)` functions.
343343
The template parameter `N` can be any positive number.
344344
The optional template parameter `VS` must be one of `{1, 2, 3, 4, 8, 16, 32, 64}` values. It specifies how many conseсutive elements are loaded per each element in `byte_offsets`.
@@ -354,7 +354,7 @@ simd<float, 8> vec8 = gather<float, 8, 2>(ptr, offsets);
354354
```
355355

356356
### Restrictions
357-
| `Function` | `Condition` | Required/supported Intel GPU |
357+
| `Function` | `Condition` | Required Intel GPU |
358358
|-|-|-|
359359
| `(usm-ga-1,4,7)`,`(acc-ga-1,4,7)` | true (`pass_thru` arg is passed) | DG2 or PVC |
360360
| `(usm-ga-2,3,8,9)`, `(acc-ga-2,3,8,9)` | !(cache-hints) and (`VS` == 1) and (`N` == 1,2,4,8,16,32) | Any Intel GPU |
@@ -439,7 +439,7 @@ template <typename T, int N, int VS = 1, typename OffsetSimdViewT, typename Prop
439439
`(slm-sc-*)`: Stores ("scatters") the vector `vals` to shared local memory locations addressed by `byte_offsets`.
440440
The parameter `byte_offset` is a vector of any integral type elements for `(usm-sc-*)`, 32-bit integer elements for `(lacc-sc-*)` and `(slm-sc-*)`, any integral type integer elements for `(acc-sc-*)` in [stateless](#statelessstateful-memory-mode) mode(default),
441441
and up-to-32-bit integer elements for `(acc-sc-*)` in [stateful](#statelessstateful-memory-mode) mode.
442-
The optional parameter `pred` provides a `simd_mask`. If some element in `pred` is zero, then the store to the corresponding memory location is skipped.
442+
The optional parameter `mask` provides a `simd_mask`. If some element in `mask` is zero, then the store to the corresponding memory location is skipped.
443443
The optional [compile-time properties](#compile-time-properties) list `props` may specify `alignment` and/or `cache-hints`. The cache-hints are ignored for `(lacc-sc-*)` and `(slm-sc-*)` functions.
444444
The template parameter `N` can be any positive number.
445445
The optional template parameter `VS` must be one of `{1, 2, 3, 4, 8, 16, 32, 64}` values. It specifies how many conseсutive elements are written per each element in `byte_offsets`.
@@ -457,7 +457,7 @@ scatter<float, 8, 2>(ptr, offsets4);
457457
```
458458

459459
### Restrictions
460-
| `Function` | `Condition` | Required/supported Intel GPU |
460+
| `Function` | `Condition` | Required Intel GPU |
461461
|-|-|-|
462462
| `(usm-sc-*)`, `(acc-sc-*)` | !(cache-hints) and (`VS` == 1) and (`N` == 1,2,4,8,16,32) | Any Intel GPU |
463463
| `(usm-sc-*)`, `(acc-sc-*)` | (cache-hints) or (`VS` > 1) or (`N` != 1,2,4,8,16,32) | DG2 or PVC |
@@ -502,7 +502,7 @@ template <atomic_op Op, typename T, int N, typename OffsetObjT, typename Accesso
502502
// Atomic update the memory locations referenced by local-accessor (SLM) - zero operands (dec, load, etc.).
503503
template <atomic_op Op, typename T, int N, typename AccessorT>
504504
/*lacc-au0-1*/ simd<T, N> atomic_update(AccessorT lacc, simd<uint32_t, N> byte_offset,
505-
simd_mask<1> pred = 1);
505+
simd_mask<1> mask = 1);
506506

507507
// Atomic update the shared local memory (SLM) - zero operands (dec, load, etc.).
508508
template <atomic_op Op, typename T, int N>
@@ -543,7 +543,7 @@ template <atomic_op Op, typename T, int N, typename OffsetObjT, typename Accesso
543543
// Atomic update the memory locations referenced by local-accessor (SLM) - one operand (add, max, etc.).
544544
template <atomic_op Op, typename T, int N, typename AccessorT>
545545
/*lacc-au1-1*/ simd<T, N> atomic_update(AccessorT lacc, simd<uint32_t, N> byte_offset,
546-
simd<T, N> src0, simd_mask<1> pred = 1);
546+
simd<T, N> src0, simd_mask<1> mask = 1);
547547
548548
// Atomic update the shared local memory (SLM) - one operand (add, max etc.).
549549
template <atomic_op Op, typename T, int N>
@@ -583,7 +583,7 @@ template <atomic_op Op, typename T, int N, typename OffsetObjT, typename Accesso
583583
// Atomic update the memory locations referenced by local-accessor (SLM) - two operands: cmpxchg, fcmpxchg.
584584
template <atomic_op Op, typename T, int N, typename AccessorT>
585585
/*lacc-au2-1*/ simd<T, N> atomic_update(AccessorT lacc, simd<uint32_t, N> byte_offset,
586-
simd<T, N> src0, simd<T, N> src1, simd_mask<1> pred = 1);
586+
simd<T, N> src0, simd<T, N> src1, simd_mask<1> mask = 1);
587587

588588
// Atomic update the shared local memory (SLM) - two operands: cmpxchg, fcmpxchg.
589589
template <atomic_op Op, typename T, int N>
@@ -597,8 +597,25 @@ template <atomic_op Op, typename T, int N>
597597
`(slm-*)`: Atomically updates the shared memory locations addressed by `byte_offset`.
598598
The parameter `byte_offset` is a vector of any integral type elements for `(usm-*)`, 32-bit integer elements for `(lacc-*)` and `(slm-*)`, any integral type integer elements for `(acc-*)` in [stateless](#statelessstateful-memory-mode) mode(default),
599599
and up-to-32-bit integer elements for `(acc-*)` in [stateful](#statelessstateful-memory-mode) mode.
600-
The optional parameter `pred` provides a `simd_mask`. If some element in `pred` is zero, then the corresponding memory location is not updated.
601-
`(usm-*)`, `(acc-*)`: The optional [compile-time properties](#compile-time-properties) list `props` may specify `cache-hints`.
600+
The optional parameter `mask` provides a `simd_mask`. If some element in `mask` is zero, then the corresponding memory location is not updated.
601+
`(usm-*)`, `(acc-*)`: The optional [compile-time properties](#compile-time-properties) list `props` may specify `cache-hints`.
602+
The template parameter `Op` specifies the atomic operation applied to the memory.
603+
The template parameter `T` specifies the type of the elements used in the atomic_update operation. Only 2,4,8-byte types are supported.
604+
The template parameter `N` is the number of elements being atomically updated.
605+
606+
### Restrictions
607+
| `Function` | `Condition` | Required Intel GPU |
608+
|-|-|-|
609+
| `(usm-au0-*)`, `(acc-au0-*)` | !(cache-hints) and (`N` == 1,2,4,8,16,32) and (sizeof(T) >= 4) | Any Intel GPU |
610+
| `(usm-au0-*)`, `(acc-au0-*)` | (cache-hints) or (`N` != 1,2,4,8,16,32) or (sizeof(T) == 2) | DG2 or PVC |
611+
| `(usm-au1-*)`, `(acc-au1-*)`, `(usm-au2-*)`, `(acc-au2-*)` | !(cache-hints) and (`N` == 1,2,4,8,16,32) and (sizeof(T) >= 4) and (`Op` is integral operation) | Any Intel GPU |
612+
| `(usm-au1-*)`, `(acc-au1-*)`, `(usm-au2-*)`, `(acc-au2-*)` | (cache-hints) or (`N` != 1,2,4,8,16,32) or (sizeof(T) == 2) or (`Op` is FP operation) | DG2 or PVC |
613+
|-|-|-|
614+
| `(slm-au0-*)`, `(lacc-au0-*)` | (`N` == 1,2,4,8,16,32) and (sizeof(T) == 4) | Any Intel GPU |
615+
| `(slm-au0-*)`, `(lacc-au0-*)` | (`N` != 1,2,4,8,16,32) or (sizeof(T) == 2) or (sizeof(T) == 8)| DG2 or PVC |
616+
| `(slm-au1-*)`, `(lacc-au1-*)`, `(slm-au2-*)`, `(lacc-au2-*)` | (`N` == 1,2,4,8,16,32) and (sizeof(T) == 4) and (`Op` is integral operation) | Any Intel GPU |
617+
| `(slm-au1-*)`, `(lacc-au1-*)`, `(slm-au2-*)`, `(lacc-au2-*)` | (`N` != 1,2,4,8,16,32) or (sizeof(T) == 2) or (sizeof(T) == 8) or (`Op` is FP operation)| DG2 or PVC |
618+
602619
603620
## prefetch(...)
604621
```C++

0 commit comments

Comments
 (0)