Skip to content

Commit f1e66f5

Browse files
authored
[ESIMD][NFC][DOC] Add fence to the ESIMD SPEC functions (#13135)
It also adds a deprecated warning for l3_flush_instructions, l3_flush_texture_data, l3_flush_constant_data, l3_flush_rw_data enums as they in fact refer to L2 cache, not L3. Corresponding l2_* enums were added. --------- Signed-off-by: Klochkov, Vyacheslav N <[email protected]>
1 parent 7fb3b20 commit f1e66f5

File tree

3 files changed

+100
-11
lines changed

3 files changed

+100
-11
lines changed

sycl/doc/extensions/supported/sycl_ext_intel_esimd/sycl_ext_intel_esimd_functions.md

Lines changed: 82 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@ See more general ESIMD documentation [here](./sycl_ext_intel_esimd.md).
1313
- [scatter(...)](#scatter---store-to-memory-locations-addressed-by-a-vector-of-offsets)
1414
- [atomic_update(...)](#atomic_update)
1515
- [prefetch(...)](#prefetch)
16+
- [fence(...) - set the memory read/write order](#fence---set-the-memory-readwrite-order)
1617
- [Examples](#examples)
1718

1819
## Other content:
@@ -99,7 +100,7 @@ template <typename T, int N, typename PropertyListT = empty_properties_t>
99100
/*slm-bl-1*/ simd<T, N> slm_block_load(uint32_t byte_offset, PropertyListT props={});
100101
/*slm-bl-2*/ simd<T, N> slm_block_load(uint32_t byte_offset, simd_mask<1> pred, PropertyListT props={});
101102
/*slm-bl-3*/ simd<T, N> slm_block_load(uint32_t byte_offset, simd_mask<1> pred, simd<T, N> pass_thru, PropertyListT props={});
102-
}
103+
} // end namespace sycl::ext::intel::esimd
103104
```
104105
### Description
105106
`(usm-bl-*)`: Loads a contiguous memory block from global memory referenced by the USM pointer `ptr` optionally adjusted by `byte_offset`.
@@ -169,7 +170,7 @@ void block_store(AccessorT lacc, simd<T, N> vals, simd_mask<1> pred, PropertyLis
169170
template <typename T, int N, typename PropertyListT = empty_properties_t>
170171
/*slm-bs-1*/ void slm_block_store(uint32_t byte_offset, simd<T, N> vals, simd_mask<1> pred, PropertyListT props={});
171172
/*slm-bs-2*/ void slm_block_store(uint32_t byte_offset, simd<T, N> vals, PropertyListT props={});
172-
}
173+
} // end namespace sycl::ext::intel::esimd
173174
```
174175
### Description
175176
`(usm-bs-*)`: Stores `vals` to a contiguous global memory block referenced by the USM pointer `ptr` optionally adjusted by `byte_offset`.
@@ -328,6 +329,7 @@ template <typename T, int N, int VS = 1, typename OffsetSimdViewT, typename Prop
328329
simd_mask<N / VS> mask, PropertyListT props = {});
329330
/*slm-ga-9*/ simd <T, N> gather(OffsetSimdViewT byte_offsets,
330331
PropertyListT props = {});
332+
} // end namespace sycl::ext::intel::esimd
331333
```
332334
333335
### Description
@@ -399,6 +401,7 @@ template <typename T, int N, int VS = 1, typename OffsetSimdViewT, typename Prop
399401
simd_mask<N / VS> mask, PropertyListT props = {});
400402
/*slm-sc-4*/ void scatter(OffsetSimdViewT byte_offsets, simd<T, N> vals,
401403
PropertyListT props = {});
404+
} // end namespace sycl::ext::intel::esimd
402405
```
403406

404407
### Description
@@ -414,6 +417,7 @@ The optional [compile-time properties](#compile-time-properties) list `props` ma
414417

415418
### atomic_update() with 0 operands (inc, dec, load)
416419
```C++
420+
namespace sycl::ext::intel::esimd {
417421
// Atomic update the USM memory locations - zero operands (dec, load, etc.).
418422
template <atomic_op Op, typename T, int N, typename Toffset, typename PropertyListT = empty_properties_t>
419423
/*usm-au0-1*/ simd<T, N> atomic_update(T *p, simd<Toffset, N> byte_offset, simd_mask<N> mask, props = {});
@@ -533,6 +537,7 @@ template <atomic_op Op, typename T, int N, typename AccessorT>
533537
template <atomic_op Op, typename T, int N>
534538
/*slm-au2-1*/ simd<T, N> slm_atomic_update(simd<uint32_t, N> byte_offset,
535539
simd<T, N> src0, simd<T, N> src1, simd_mask<N> mask = 1);
540+
} // end namespace sycl::ext::intel::esimd
536541
```
537542
### Description
538543
`(usm-*)`: Atomically updates the global memory locations addressed by the base USM pointer `ptr` and byte-offsets `byte_offset`.
@@ -545,6 +550,7 @@ The optional parameter `pred` provides a `simd_mask`. If some element in `pred`
545550
546551
## prefetch(...)
547552
```C++
553+
namespace sycl::ext::intel::esimd {
548554
template <typename T, int N, int VS, typename OffsetT, typename PropertyListT = empty_properties_t>
549555
/*usm-pf-1*/ void prefetch(const T *p, simd<OffsetT, N / VS> byte_offsets,
550556
simd_mask<N / VS> mask, PropertyListT props = {});
@@ -612,6 +618,7 @@ template <typename T, int VS = 1, typename AccessorT,
612618
typename PropertyListT = empty_properties_t>
613619
/*acc-pf-9*/ void prefetch(AccessorT acc, simd_mask<1> mask, PropertyListT props = {});
614620
/*acc-pf-10*/ void prefetch(AccessorT acc, PropertyListT props = {});
621+
} // end namespace sycl::ext::intel::esimd
615622
```
616623
### Description
617624
`(usm-pf-1,2,3,4,5,6)`: Prefetches the memory locations addressed by the base USM pointer `ptr` and the vector of any integral type byte-offsets `byte_offsets`.
@@ -631,6 +638,79 @@ The `byte_offsets` is a vector of any integral type elements, limited in [statef
631638
`(usm-pf-*)`, `(acc-pf-*)`: The [compile-time properties](#compile-time-properties) list `props` must specify `cache-hints`.
632639

633640

641+
## fence(...) - set the memory read/write order
642+
```C++
643+
namespace sycl::ext::intel::esimd {
644+
enum fence_mask : uint8_t {
645+
/// “Commit enable” - wait for fence to complete before continuing.
646+
global_coherent_fence = 0x1,
647+
/// Flush the instruction cache.
648+
l2_flush_instructions = 0x2,
649+
/// Flush sampler (texture) cache.
650+
l2_flush_texture_data = 0x4,
651+
/// Flush constant cache.
652+
l2_flush_constant_data = 0x8,
653+
/// Flush constant cache.
654+
l2_flush_rw_data = 0x10,
655+
/// Issue SLM memory barrier only. If not set, the memory barrier is global.
656+
local_barrier = 0x20,
657+
/// Flush L1 read - only data cache.
658+
l1_flush_ro_data = 0x40
659+
};
660+
/*fence-1*/template <uint8_t ctrl_mask> void fence();
661+
662+
663+
/// The target memory kind for fence() operation.
664+
enum class memory_kind : uint8_t {
665+
global = 0, /// untyped global memory
666+
image = 2, /// image (also known as typed global memory)
667+
local = 3, /// shared local memory
668+
};
669+
/// The cache flush operation to apply to caches after fence() is complete.
670+
enum class fence_flush_op : uint8_t {
671+
none = 0, /// no operation;
672+
evict = 1, /// R/W: evict dirty lines; R/W and RO: invalidate clean lines
673+
invalidate = 2, /// R/W and RO: invalidate all clean lines;
674+
clean = 4 /// R/W: dirty lines are written to memory, but retained in
675+
/// cache in clean state; RO: no effect.
676+
};
677+
/// The scope that fence() operation should apply to.
678+
enum class fence_scope : uint8_t {
679+
/// Wait until all previous memory transactions from this thread are observed
680+
/// within the local thread-group.
681+
group = 0,
682+
/// Wait until all previous memory transactions from this thread are observed
683+
/// within the local sub-slice.
684+
local = 1,
685+
/// Wait until all previous memory transactions from this thread are observed
686+
/// in the local tile.
687+
tile = 2,
688+
/// Wait until all previous memory transactions from this thread are observed
689+
/// in the local GPU.
690+
gpu = 3,
691+
/// Wait until all previous memory transactions from this thread are observed
692+
/// across all GPUs in the system.
693+
gpus = 4,
694+
/// Global memory data-port only: wait until all previous memory transactions
695+
/// from this thread are observed at the "system" level.
696+
system = 5,
697+
/// Global memory data-port only: for GPUs that do not follow
698+
/// PCIe Write ordering for downstream writes targeting device memory,
699+
/// this op will commit to device memory all downstream and peer writes that
700+
/// have reached the device.
701+
system_acquire = 6
702+
};
703+
704+
/*fence-2*/template <memory_kind Kind = memory_kind::global,
705+
fence_flush_op FenceOp = fence_flush_op::none,
706+
fence_scope Scope = fence_scope::group> void fence();
707+
} // end namespace sycl::ext::intel::esimd
708+
```
709+
### Description
710+
`(fence-1)`: Sets the memory read/write order. This function has pretty limited functionality compared to `(fence-2)`. It accepts an 8-bit `ctrl_mask` containing one or more `fence_mask` enum values in it. It can be used for any Intel GPU.
711+
712+
`(fence-2)`: Sets the memory read/write order. This function provide a bit more flexible controls comparing to `(fence-1)`, but requires `Intel® Arc Series` (aka `DG2`) or `Intel® Data Center GPU Max Series` (aka `PVC`) to run.
713+
634714
## Examples
635715
```C++
636716
using namespace sycl;

sycl/include/sycl/ext/intel/esimd/memory.hpp

Lines changed: 13 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -7978,21 +7978,29 @@ enum fence_mask : uint8_t {
79787978
/// “Commit enable” - wait for fence to complete before continuing.
79797979
global_coherent_fence = 0x1,
79807980
/// Flush the instruction cache.
7981-
l3_flush_instructions = 0x2,
7981+
l2_flush_instructions = 0x2,
7982+
l3_flush_instructions __SYCL_DEPRECATED(
7983+
"it means L2 here, use l2_flush_instructions") = l2_flush_instructions,
79827984
/// Flush sampler (texture) cache.
7983-
l3_flush_texture_data = 0x4,
7985+
l2_flush_texture_data = 0x4,
7986+
l3_flush_texture_data __SYCL_DEPRECATED(
7987+
"it means L2 here, use l2_flush_texture_data") = l2_flush_texture_data,
79847988
/// Flush constant cache.
7985-
l3_flush_constant_data = 0x8,
7989+
l2_flush_constant_data = 0x8,
7990+
l3_flush_constant_data __SYCL_DEPRECATED(
7991+
"it means L2 here, use l2_flush_constant_data") = l2_flush_constant_data,
79867992
/// Flush constant cache.
7987-
l3_flush_rw_data = 0x10,
7993+
l2_flush_rw_data = 0x10,
7994+
l3_flush_rw_data __SYCL_DEPRECATED("it means L2 here, use l2_flush_rw_data") =
7995+
l2_flush_rw_data,
79887996
/// Issue SLM memory barrier only. If not set, the memory barrier is global.
79897997
local_barrier = 0x20,
79907998
/// Flush L1 read - only data cache.
79917999
l1_flush_ro_data = 0x40,
79928000
/// Creates a software (compiler) barrier, which does not generate
79938001
/// any instruction and only prevents instruction scheduler from
79948002
/// reordering instructions across this barrier at compile time.
7995-
sw_barrier = 0x80
8003+
sw_barrier __SYCL_DEPRECATED("reserved - this enum is ignored") = 0x80
79968004
};
79978005

79988006
/// esimd::fence sets the memory read/write order.

sycl/test/esimd/slm_gather_scatter.cpp

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,4 @@
11
// RUN: %clangxx -fsycl -fsycl-device-only -fsyntax-only -Xclang -verify %s
2-
// expected-no-diagnostics
32

43
#include <limits>
54
#include <sycl/ext/intel/esimd.hpp>
@@ -17,10 +16,12 @@ void kernel() __attribute__((sycl_device)) {
1716
auto v0 = slm_gather<int, 32>(offsets);
1817

1918
constexpr auto fm =
20-
fence_mask::global_coherent_fence | fence_mask::l3_flush_instructions |
21-
fence_mask::l3_flush_texture_data | fence_mask::l3_flush_constant_data |
22-
fence_mask::l3_flush_rw_data | fence_mask::local_barrier |
19+
fence_mask::global_coherent_fence | fence_mask::l2_flush_instructions |
20+
fence_mask::l2_flush_texture_data | fence_mask::l2_flush_constant_data |
21+
fence_mask::l2_flush_rw_data | fence_mask::local_barrier |
2322
fence_mask::l1_flush_ro_data | fence_mask::sw_barrier;
23+
// expected-warning@-1 {{'sw_barrier' is deprecated}}
24+
// expected-note@sycl/ext/intel/esimd/memory.hpp:* {{has been explicitly marked deprecated here}}
2425

2526
esimd::fence<fm>();
2627
esimd::barrier();

0 commit comments

Comments
 (0)