-- Plane equation - `plane`. Solves a component-wise plane equation
+- Plane equation - `plane`. Solves a component-wise plane equation
 `w = p*u + q*v + r` where `u`, `v`, `w` are vectors and `p`, `q`, `r` are scalars.
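For readers of this hunk, the formula `w = p*u + q*v + r` can be illustrated with a scalar loop. This is only a sketch of the arithmetic: `plane_sketch` and its `std::array` parameters are hypothetical and are not the ESIMD `plane` signature.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not the ESIMD `plane` API): evaluates
// w[i] = p * u[i] + q * v[i] + r for every component i.
template <typename T, std::size_t N>
std::array<T, N> plane_sketch(const std::array<T, N> &u,
                              const std::array<T, N> &v, T p, T q, T r) {
  std::array<T, N> w{};
  for (std::size_t i = 0; i < N; ++i)
    w[i] = p * u[i] + q * v[i] + r;
  return w;
}
```

The scalars `p`, `q` and `r` are shared by all components, while `u` and `v` contribute one element per component.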
@@ -865,7 +865,7 @@ There are other useful miscellaneous APIs provided by ESIMD.
 types with saturation.
 - Conversion - `convert`. Converts between vectors with different element data
 types.
-- Reverse bits - `bf_reverse`.
+- Reverse bits - `bf_reverse`.
 - Insert bit field - `bf_insert`.
 - Extract bit field - `bf_extract`.
 - Convert mask to integer and back - `pack_mask`, `unpack_mask`.
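The mask-to-integer round trip named in the last bullet can be sketched in plain C++. The helpers below are hypothetical stand-ins, not the ESIMD `pack_mask`/`unpack_mask` functions, and the mapping of lane `i` to bit `i` is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: lane i of the mask becomes bit i of the result.
// The bit order is an assumption, not taken from the ESIMD documentation.
template <std::size_t N>
std::uint32_t pack_mask_sketch(const std::array<std::uint16_t, N> &mask) {
  static_assert(N <= 32, "a 32-bit integer holds at most 32 lanes");
  std::uint32_t bits = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i] != 0)
      bits |= (std::uint32_t{1} << i);
  return bits;
}

// Inverse of the sketch above: bit i becomes lane i, stored as 0 or 1.
template <std::size_t N>
std::array<std::uint16_t, N> unpack_mask_sketch(std::uint32_t bits) {
  static_assert(N <= 32, "a 32-bit integer holds at most 32 lanes");
  std::array<std::uint16_t, N> mask{};
  for (std::size_t i = 0; i < N; ++i)
    mask[i] = static_cast<std::uint16_t>((bits >> i) & 1u);
  return mask;
}
```

Any non-zero lane is treated as enabled when packing, so unpacking returns a normalized 0/1 mask rather than the original lane values.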
@@ -978,7 +978,7 @@ More examples of the unwrap/merge process:
 B6 b;
 char x;
 char y;
-
+
 C6 foo() { return *this; }
 };
 ```
@@ -989,7 +989,7 @@ More examples of the unwrap/merge process:
 ```
 %struct.C6 = type { %struct.B6, i8, i8 }
 %struct.B6 = type { i32 addrspace(4)*, i32 }
-```
+```
 
 Note that `__regcall` does not guarantee passing through registers in the final
 generated code. For example, compiler will use a threshold for argument or
@@ -1162,8 +1162,7 @@ inside ESIMD kernels and functions. Most of missing SYCL features listed below
 must be supported eventually:
 - 2D and 3D target::device accessor and local_accessor;
 - Constant accessors;
-- `sycl::accessor::get_pointer()` and `sycl::accessor::operator[]` are supported only with `-fsycl-esimd-force-stateless-mem`. Otherwise, All memory accesses through an accessor are
-done via explicit APIs; e.g. `sycl::ext::intel::esimd::block_store(acc, offset)`
+- `sycl::accessor::get_pointer()` and `sycl::accessor::operator[]` are not supported with the `-fno-sycl-esimd-force-stateless-mem` compilation switch.
 - Accessors with non-zero offsets to accessed buffer;
 - Accessors with access/memory range specified;
 - `sycl::image`, `sycl::sampler` and `sycl::stream` classes.
 | `(usm-bl-*)` | (no cache-hints) and (`pred` is not passed) | `N` is any positive number | Any Intel GPU |
 | `(usm-bl-*)` | (cache-hints) or (`pred` is passed) | `N` must be from [Table1 below](#table1---valid-values-of-n-if-cache-hints-used-or-pred-parameter-is-passed) | DG2 or PVC |
@@ -195,7 +195,7 @@ The optional [compile-time properties](#compile-time-properties) list `props` ma
 `N` - the valid values may depend on usage of cache-hints or passing of the `pred` argument:
 
-|`Function`|`Condition`| Requirement for `N`| Required/supported Intel GPU |
+|`Function`|`Condition`| Requirement for `N`| Required Intel GPU |
 |-|-|-|-|
 |`(usm-bs-*)`| (no cache-hints) and (`pred` is not passed) |`N` is any positive number | Any Intel GPU |
 |`(usm-bs-*)`| (cache-hints) or (`pred` is passed) |`N` must be from [Table2 below](#table1---valid-values-of-n-if-cache-hints-used-or-pred-parameter-is-passed)| DG2 or PVC |
@@ -338,7 +338,7 @@ template <typename T, int N, int VS = 1, typename OffsetSimdViewT, typename Prop
 `(slm-ga-*)`: Loads ("gathers") elements of the type `T` from shared local memory locations addressed by `byte_offsets`.
 The parameter `byte_offset` is a vector of any integral type elements for `(usm-ga-*)`, 32-bit integer elements for `(lacc-ga-*)` and `(slm-ga-*)`, any integral type integer elements for `(acc-ga-*)` in [stateless](#statelessstateful-memory-mode) mode(default),
 and up-to-32-bit integer elements for `(acc-ga-*)` in [stateful](#statelessstateful-memory-mode) mode.
-The optional parameter `pred` provides a `simd_mask`. If some element in `pred` is zero, then the load of the corresponding memory location is skipped and the element of the result is copied from `pass_thru` (if it is passed) or it is undefined (if `pass_thru` is omitted).
+The optional parameter `mask` provides a `simd_mask`. If some element in `mask` is zero, then the load of the corresponding memory location is skipped and the element of the result is copied from `pass_thru` (if it is passed) or it is undefined (if `pass_thru` is omitted).
 The optional [compile-time properties](#compile-time-properties) list `props` may specify `alignment` and/or `cache-hints`. The cache-hints are ignored for `(lacc-*)` and `(slm-*)` functions.
 The template parameter `N` can be any positive number.
 The optional template parameter `VS` must be one of `{1, 2, 3, 4, 8, 16, 32, 64}` values. It specifies how many consecutive elements are loaded per each element in `byte_offsets`.
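The `mask`/`pass_thru` behaviour described in this hunk can be modelled with a scalar loop. The function below is a hypothetical sketch, not the ESIMD `gather` API: it always requires `pass_thru` (the documented "undefined" case is not modelled), and the placement of the `vs` consecutive elements at `result[i * vs + k]` is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical scalar model of a masked gather (not the ESIMD API).
// For each offset i whose mask lane is set, `vs` consecutive elements of
// type T are read starting at base + byte_offsets[i]. Lanes whose mask
// element is zero keep the corresponding pass_thru values.
template <typename T>
std::vector<T> gather_sketch(const void *base,
                             const std::vector<std::uint32_t> &byte_offsets,
                             const std::vector<bool> &mask,
                             const std::vector<T> &pass_thru,
                             std::size_t vs = 1) {
  assert(mask.size() == byte_offsets.size());
  assert(pass_thru.size() == byte_offsets.size() * vs);
  const auto *bytes = static_cast<const unsigned char *>(base);
  std::vector<T> result(pass_thru); // start from the pass_thru values
  for (std::size_t i = 0; i < byte_offsets.size(); ++i) {
    if (!mask[i])
      continue; // load skipped, pass_thru value stays in place
    for (std::size_t k = 0; k < vs; ++k)
      std::memcpy(&result[i * vs + k],
                  bytes + byte_offsets[i] + k * sizeof(T), sizeof(T));
  }
  return result;
}
```

Offsets are in bytes, so reading element `j` of an `int32_t` array needs the offset `j * 4`.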
 |`(usm-ga-1,4,7)`,`(acc-ga-1,4,7)`| true (`pass_thru` arg is passed) | DG2 or PVC |
 |`(usm-ga-2,3,8,9)`, `(acc-ga-2,3,8,9)`| !(cache-hints) and (`VS` == 1) and (`N` == 1,2,4,8,16,32) | Any Intel GPU |
@@ -439,7 +439,7 @@ template <typename T, int N, int VS = 1, typename OffsetSimdViewT, typename Prop
 `(slm-sc-*)`: Stores ("scatters") the vector `vals` to shared local memory locations addressed by `byte_offsets`.
 The parameter `byte_offset` is a vector of any integral type elements for `(usm-sc-*)`, 32-bit integer elements for `(lacc-sc-*)` and `(slm-sc-*)`, any integral type integer elements for `(acc-sc-*)` in [stateless](#statelessstateful-memory-mode) mode(default),
 and up-to-32-bit integer elements for `(acc-sc-*)` in [stateful](#statelessstateful-memory-mode) mode.
-The optional parameter `pred` provides a `simd_mask`. If some element in `pred` is zero, then the store to the corresponding memory location is skipped.
+The optional parameter `mask` provides a `simd_mask`. If some element in `mask` is zero, then the store to the corresponding memory location is skipped.
 The optional [compile-time properties](#compile-time-properties) list `props` may specify `alignment` and/or `cache-hints`. The cache-hints are ignored for `(lacc-sc-*)` and `(slm-sc-*)` functions.
 The template parameter `N` can be any positive number.
 The optional template parameter `VS` must be one of `{1, 2, 3, 4, 8, 16, 32, 64}` values. It specifies how many consecutive elements are written per each element in `byte_offsets`.
 `(slm-*)`: Atomically updates the shared memory locations addressed by `byte_offset`.
 The parameter `byte_offset` is a vector of any integral type elements for `(usm-*)`, 32-bit integer elements for `(lacc-*)` and `(slm-*)`, any integral type integer elements for `(acc-*)` in [stateless](#statelessstateful-memory-mode) mode(default),
 and up-to-32-bit integer elements for `(acc-*)` in [stateful](#statelessstateful-memory-mode) mode.
-The optional parameter `pred` provides a `simd_mask`. If some element in `pred` is zero, then the corresponding memory location is not updated.
-`(usm-*)`, `(acc-*)`: The optional [compile-time properties](#compile-time-properties) list `props` may specify `cache-hints`.
+The optional parameter `mask` provides a `simd_mask`. If some element in `mask` is zero, then the corresponding memory location is not updated.
+`(usm-*)`, `(acc-*)`: The optional [compile-time properties](#compile-time-properties) list `props` may specify `cache-hints`.
+The template parameter `Op` specifies the atomic operation applied to the memory.
+The template parameter `T` specifies the type of the elements used in the atomic_update operation. Only 2,4,8-byte types are supported.
+The template parameter `N` is the number of elements being atomically updated.
+
+### Restrictions
+| `Function` | `Condition` | Required Intel GPU |
+|-|-|-|
+| `(usm-au0-*)`, `(acc-au0-*)` | !(cache-hints) and (`N` == 1,2,4,8,16,32) and (sizeof(T) >= 4) | Any Intel GPU |
+| `(usm-au0-*)`, `(acc-au0-*)` | (cache-hints) or (`N` != 1,2,4,8,16,32) or (sizeof(T) == 2) | DG2 or PVC |
+| `(usm-au1-*)`, `(acc-au1-*)`, `(usm-au2-*)`, `(acc-au2-*)` | !(cache-hints) and (`N` == 1,2,4,8,16,32) and (sizeof(T) >= 4) and (`Op` is integral operation) | Any Intel GPU |
+| `(usm-au1-*)`, `(acc-au1-*)`, `(usm-au2-*)`, `(acc-au2-*)` | (cache-hints) or (`N` != 1,2,4,8,16,32) or (sizeof(T) == 2) or (`Op` is FP operation) | DG2 or PVC |
+|-|-|-|
+| `(slm-au0-*)`, `(lacc-au0-*)` | (`N` == 1,2,4,8,16,32) and (sizeof(T) == 4) | Any Intel GPU |
+| `(slm-au0-*)`, `(lacc-au0-*)` | (`N` != 1,2,4,8,16,32) or (sizeof(T) == 2) or (sizeof(T) == 8)| DG2 or PVC |
+| `(slm-au1-*)`, `(lacc-au1-*)`, `(slm-au2-*)`, `(lacc-au2-*)` | (`N` == 1,2,4,8,16,32) and (sizeof(T) == 4) and (`Op` is integral operation) | Any Intel GPU |
+| `(slm-au1-*)`, `(lacc-au1-*)`, `(slm-au2-*)`, `(lacc-au2-*)` | (`N` != 1,2,4,8,16,32) or (sizeof(T) == 2) or (sizeof(T) == 8) or (`Op` is FP operation)| DG2 or PVC |
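The added Restrictions table can be read as a set of predicates. The sketch below encodes its rows as `constexpr` functions. All names (`Gpu`, `is_legacy_n`, `required_gpu_*`) are hypothetical and exist only for illustration, and the functions assume `sizeof(T)` is 2, 4 or 8 as stated above the table.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical encoding of the "Required Intel GPU" column.
enum class Gpu { Any, Dg2OrPvc };

// True for the element counts the table lists as N == 1,2,4,8,16,32.
constexpr bool is_legacy_n(int n) {
  return n == 1 || n == 2 || n == 4 || n == 8 || n == 16 || n == 32;
}

// Rows for (usm-au0-*) and (acc-au0-*).
constexpr Gpu required_gpu_usm_au0(bool cache_hints, int n, std::size_t size_of_t) {
  return (!cache_hints && is_legacy_n(n) && size_of_t >= 4) ? Gpu::Any
                                                             : Gpu::Dg2OrPvc;
}

// Rows for (usm-au1-*), (acc-au1-*), (usm-au2-*), (acc-au2-*).
constexpr Gpu required_gpu_usm_au12(bool cache_hints, int n,
                                    std::size_t size_of_t, bool fp_op) {
  return (!cache_hints && is_legacy_n(n) && size_of_t >= 4 && !fp_op)
             ? Gpu::Any
             : Gpu::Dg2OrPvc;
}

// Rows for (slm-au0-*) and (lacc-au0-*).
constexpr Gpu required_gpu_slm_au0(int n, std::size_t size_of_t) {
  return (is_legacy_n(n) && size_of_t == 4) ? Gpu::Any : Gpu::Dg2OrPvc;
}

// Rows for (slm-au1-*), (lacc-au1-*), (slm-au2-*), (lacc-au2-*).
constexpr Gpu required_gpu_slm_au12(int n, std::size_t size_of_t, bool fp_op) {
  return (is_legacy_n(n) && size_of_t == 4 && !fp_op) ? Gpu::Any
                                                       : Gpu::Dg2OrPvc;
}
```

For example, an 8-byte element type is listed under "Any Intel GPU" for the USM and accessor variants, but under "DG2 or PVC" for the SLM and local accessor variants.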