auto vec_b = block_load<float, 16>(f32_ptr + 1, props);
```
### Cache-hint properties

Cache-hint properties (if passed) currently restrict the target device: it must be an Intel® Arc Series (aka DG2) or Intel® Data Center GPU Max Series (aka PVC).

The valid combinations of L1/L2 cache-hints depend on the usage context. There are 4 contexts:
#### Valid combinations of `L1` and `L2` cache-hints for `load` functions:
|`L1`|`L2`|
|-|-|
| none | none |
| uncached | uncached |
| uncached | cached |
| cached | uncached |
| cached | cached |
| streaming | uncached |
| streaming | cached |
| read_invalidate | cached |

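The `load` table above can be read as a simple lookup. The following is a minimal sketch of that lookup in plain C++, not ESIMD code: the `Hint` enum, `HintPair`, and `is_valid_load_hint` are hypothetical names introduced only for illustration and are not part of the ESIMD API. The same pattern would apply to the `prefetch`, `store`, and `atomic_update` tables with their own rows.

```C++
#include <cassert>

// Hypothetical stand-in for the hint names used in the table;
// this is NOT the ESIMD cache_hint enum.
enum class Hint { none, uncached, cached, streaming, read_invalidate };

// One row of the table: an L1 hint paired with an L2 hint.
struct HintPair {
  Hint l1;
  Hint l2;
};

// Rows of the `load` table, in the same order as in the document.
constexpr HintPair kValidLoadHints[] = {
    {Hint::none, Hint::none},
    {Hint::uncached, Hint::uncached},
    {Hint::uncached, Hint::cached},
    {Hint::cached, Hint::uncached},
    {Hint::cached, Hint::cached},
    {Hint::streaming, Hint::uncached},
    {Hint::streaming, Hint::cached},
    {Hint::read_invalidate, Hint::cached},
};

// Returns true if (l1, l2) appears as a row of the `load` table.
constexpr bool is_valid_load_hint(Hint l1, Hint l2) {
  for (const HintPair &p : kValidLoadHints)
    if (p.l1 == l1 && p.l2 == l2)
      return true;
  return false;
}

// Compile-time spot checks against the table.
static_assert(is_valid_load_hint(Hint::streaming, Hint::cached),
              "streaming/cached is listed for load");
static_assert(!is_valid_load_hint(Hint::read_invalidate, Hint::uncached),
              "read_invalidate/uncached is not listed for load");
```

Keeping the rows in a `constexpr` array lets an invalid combination be rejected at compile time with `static_assert`, which matches the compile-time nature of the `props` list.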
#### Valid combinations of `L1` and `L2` cache-hints for `prefetch` functions:
|`L1`|`L2`|
|-|-|
| uncached | cached |
| cached | uncached |
| cached | cached |
| streaming | uncached |
| streaming | cached |

#### Valid combinations of `L1` and `L2` cache-hints for `store` functions:
|`L1`|`L2`|
|-|-|
| none | none |
| uncached | uncached |
| uncached | write_back |
| write_through | uncached |
| write_through | write_back |
| streaming | uncached |
| streaming | write_back |
| write_back | write_back |

#### Valid combinations of `L1` and `L2` cache-hints for `atomic_update` functions:
|`L1`|`L2`|
|-|-|
| none | none |
| uncached | uncached |
| uncached | write_back |
## block_load(...) - fast load from a contiguous memory block
```C++
@@ -114,6 +164,8 @@ The optional [compile-time properties](#compile-time-properties) list `props` ma
### Restrictions/assumptions:
`Alignment` - if not specified by the `props` param, then `assumed` alignment is used. If the actual memory reference has a smaller alignment than the `assumed`, then it must be explicitly passed in the `props` argument.

`Cache-hint` properties, if passed, must follow the [rules](#valid-combinations-of-l1-and-l2-cache-hints-for-load-functions) for `load` functions.
| `(usm-bl-*)` | `max(4, sizeof(T))` | `sizeof(T)` if no cache-hints, otherwise it is `max(4, sizeof(T))` |
@@ -183,6 +235,8 @@ The optional [compile-time properties](#compile-time-properties) list `props` ma
### Restrictions/assumptions:
`Alignment` - if not specified by the `props` param, then `assumed` alignment is used. If the actual memory reference requires a smaller alignment than the `assumed`, then it must be explicitly passed in the `props` argument.

`Cache-hint` properties, if passed, must follow the [rules](#valid-combinations-of-l1-and-l2-cache-hints-for-store-functions) for `store` functions.
unsigned SurfacePitch, int X, int Y, simd<T, N> Vals, PropertyListT props = {});
```
### Description
Stores the vector `Vals` of the type `simd<T, N>` to a 2D memory block, where `N` is `BlockWidth * BlockHeight`.

`T` - the element type of the values to be stored to memory.

`BlockWidth` - the block width in number of elements.

`BlockHeight` - the block height in number of elements.

`N` - (automatically deduced) the size of the vector to be stored.

`Ptr` - the surface base address for this operation.

`SurfaceWidth` - the surface width minus 1 in bytes.

`SurfaceHeight` - the surface height minus 1 in rows.

`SurfacePitch` - the surface pitch minus 1 in bytes.

`X` - zero-based X-coordinate of the upper-left rectangle corner in number of elements.

`Y` - zero-based Y-coordinate of the upper-left rectangle corner in rows.

`props` - the optional compile-time properties. Only cache-hint properties are used.

### Restrictions
* This function is available only for Intel® Data Center GPU Max Series (aka PVC).
* `Cache-hint` properties, if passed, must follow the [rules](#valid-combinations-of-l1-and-l2-cache-hints-for-store-functions) for `store` functions.
* `BlockWidth * BlockHeight * sizeof(T)` must not exceed 512 bytes.
* `BlockHeight` must not exceed 8.
* `BlockWidth` must be 4 or more for `bytes`, 2 or more for `words`, 1 or more for `dwords` and `qwords`.
* `BlockWidth` must not exceed 64 for `bytes`, 32 for `words`, 16 for `dwords`, and 8 for `qwords`.

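The block-shape restrictions and the "minus 1" surface parameters above are easy to get wrong, so a small host-side check may help. The following is a sketch in plain C++ under stated assumptions: `store_2d_block_ok`, `Surface2DArgs`, and `make_surface_args` are hypothetical helpers introduced here for illustration and are not part of the ESIMD API. It also assumes that `bytes`, `words`, `dwords`, and `qwords` correspond to element sizes of 1, 2, 4, and 8 bytes.

```C++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: checks the block-shape restrictions listed above.
template <typename T, int BlockWidth, int BlockHeight>
constexpr bool store_2d_block_ok() {
  constexpr std::size_t sz = sizeof(T);
  static_assert(sz == 1 || sz == 2 || sz == 4 || sz == 8,
                "assumed element sizes: bytes, words, dwords, qwords");
  // Minimum width: 4 for bytes, 2 for words, 1 for dwords and qwords.
  constexpr int min_width = sz == 1 ? 4 : (sz == 2 ? 2 : 1);
  // Maximum width: 64 for bytes, 32 for words, 16 for dwords, 8 for qwords.
  constexpr int max_width = static_cast<int>(64 / sz);
  // BlockHeight >= 1 is an added sanity check, not a listed restriction.
  return BlockWidth >= min_width && BlockWidth <= max_width &&
         BlockHeight >= 1 && BlockHeight <= 8 &&
         static_cast<std::size_t>(BlockWidth) * BlockHeight * sz <= 512;
}

// Hypothetical holder for the three "minus 1" surface parameters.
struct Surface2DArgs {
  unsigned width;  // surface width in bytes, minus 1
  unsigned height; // surface height in rows, minus 1
  unsigned pitch;  // surface pitch in bytes, minus 1
};

// Converts natural surface dimensions into the "minus 1" form.
template <typename T>
constexpr Surface2DArgs make_surface_args(unsigned width_elems, unsigned rows,
                                          unsigned pitch_bytes) {
  return {static_cast<unsigned>(width_elems * sizeof(T)) - 1, rows - 1,
          pitch_bytes - 1};
}

// A 16x8 block of floats is exactly 512 bytes, so it is accepted.
static_assert(store_2d_block_ok<float, 16, 8>(), "16x8 floats fit");
// A byte block narrower than 4 elements is rejected.
static_assert(!store_2d_block_ok<std::uint8_t, 2, 8>(), "too narrow");
```

For a surface of 1024 `float` elements per row, 768 rows, and a 4096-byte pitch, `make_surface_args<float>(1024, 768, 4096)` yields `{4095, 767, 4095}`.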
## atomic_update(...)

### atomic_update() with 0 operands (inc, dec, load)
@@ -604,6 +777,8 @@ The template parameter `T` specifies the type of the elements used in the atomic
The template parameter `N` is the number of elements being atomically updated.

### Restrictions
`Cache-hint` properties, if passed, must follow the [rules](#valid-combinations-of-l1-and-l2-cache-hints-for-atomic_update-functions) for `atomic_update` functions.

|`Function`|`Condition`| Required Intel GPU |
|-|-|-|
|`(usm-au0-*)`, `(acc-au0-*)`| !(cache-hints) and (`N` == 1,2,4,8,16,32) and (sizeof(T) >= 4) | Any Intel GPU |
@@ -699,13 +874,18 @@ The `byte_offsets` is a vector of any integral type elements, limited in [statef

`(acc-pf-7,8,9,10)`: Prefetches a linear block of memory addressed by the accessor `acc` and the optional `byte-offset` parameter, which is 64-bit in [stateless](#statelessstateful-memory-mode) mode (default), and 32-bit in [stateful](#statelessstateful-memory-mode) mode.

`(usm-pf-1,2,3,4,5,6)`, `(acc-pf-1,2,3,4,5,6)`: The optional parameter `mask` provides a `simd_mask`. If some element in `mask` is zero, then the corresponding memory location is not prefetched.
`(usm-pf-7,8,9,10)`, `(acc-pf-7,8,9,10)`: The optional parameter `mask` provides a 1-element `simd_mask`. If it is zero, then the whole prefetch operation is skipped.

`(usm-pf-*)`, `(acc-pf-*)`: The [compile-time properties](#compile-time-properties) list `props` must specify `cache-hints`.

### Restrictions

* This function is available only for Intel® Arc Series (aka DG2) or Intel® Data Center GPU Max Series (aka PVC).
* `Cache-hint` properties must follow the [rules](#valid-combinations-of-l1-and-l2-cache-hints-for-prefetch-functions) for `prefetch` functions.