Skip to content

Commit a69b053

Browse files
committed
Merge branch 'sycl' into oneapins
Signed-off-by: James Brodman <[email protected]>
2 parents df2f889 + 4d5b34d commit a69b053

File tree

15 files changed

+40
-226
lines changed

15 files changed

+40
-226
lines changed

sycl/doc/extensions/GroupAlgorithms/SYCL_INTEL_group_algorithms.asciidoc

Lines changed: 1 addition & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -51,9 +51,7 @@ John Pennycook, Intel (john 'dot' pennycook 'at' intel 'dot' com)
5151

5252
== Dependencies
5353

54-
This extension is written against the SYCL 1.2.1 specification, Revision v1.2.1-6 and the following extensions:
55-
56-
- +SYCL_INTEL_extended_atomics+
54+
This extension is written against the SYCL 1.2.1 specification, Revision v1.2.1-6.
5755

5856
== Overview
5957

@@ -69,10 +67,6 @@ The extension introduces the following functions:
6967
- +reduce+
7068
- +exclusive_scan+
7169
- +inclusive_scan+
72-
- +barrier+
73-
74-
The definitions and behavior of the following functions are based on equivalents in the SYCL 2020 provisional specification:
75-
- +barrier+
7670

7771
=== Alignment with OpenCL vs C++
7872

@@ -258,21 +252,6 @@ The return types of the collective functions in {cpp}17 are not deduced from the
258252
|Perform an inclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the inclusive scan of the first +i+ values in the range and an initial value specified by _init_. Returns a pointer to the end of the output range. _first_, _last_, _result_, _binary_op_ and _init_ must be the same for all work-items in the group. _binary_op(init, *first)_ must return a value of type _T_.
259253
|===
260254

261-
==== Synchronization
262-
263-
The behavior of memory fences in this section is aligned with the single happens-before relationship defined by the +SYCL_INTEL_extended_atomics+ extension.
264-
265-
|===
266-
|Function|Description
267-
268-
|+template <typename Group> void barrier(Group g);+
269-
|Synchronize all work-items in the group, and ensure that all memory accesses to any address space prior to the barrier are visible to all work-items in the group after the barrier. The scope of the group memory fences implied by this barrier is the narrowest scope including all work-items in the group.
270-
271-
|+template <typename Group> void barrier(Group g, memory_scope scope);+
272-
|Synchronize all work-items in the group, and ensure that all memory accesses to any address space prior to the barrier are visible to all work-items specified by _scope_ after the barrier. The scope of the group memory fences implied by this barrier is controlled by _scope_ and must be broader than the narrowest scope including all work-items in the group. If the specified _scope_ is narrower than the narrowest scope including all work-items in the group, the _scope_ argument is ignored.
273-
274-
|===
275-
276255
== Issues
277256

278257
None.
@@ -291,7 +270,6 @@ None.
291270
|========================================
292271
|Rev|Date|Author|Changes
293272
|1|2020-01-30|John Pennycook|*Initial public working draft*
294-
|2|2020-07-28|John Pennycook|*Add group barrier*
295273
|========================================
296274
297275
//************************************************************************

sycl/doc/extensions/SubGroupAlgorithms/SYCL_INTEL_sub_group_algorithms.asciidoc

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,6 @@ This extension adds sub-group support to all of the functions from +SYCL_INTEL_g
7070
- +reduce+
7171
- +exclusive_scan+
7272
- +inclusive_scan+
73-
- +barrier+
7473

7574
It additionally introduces a number of functions that are currently specific to sub-groups:
7675

@@ -166,7 +165,6 @@ None.
166165
|========================================
167166
|Rev|Date|Author|Changes
168167
|1|2020-03-16|John Pennycook|*Initial public working draft*
169-
|2|2020-07-28|John Pennycook|*Add group barrier*
170168
|========================================
171169
172170
//************************************************************************

sycl/include/CL/sycl/ONEAPI/group_algorithm.hpp

Lines changed: 0 additions & 46 deletions
Original file line numberDiff line numberDiff line change
@@ -79,16 +79,6 @@ template <> inline id<3> linear_id_to_id(range<3> r, size_t linear_id) {
7979
return result;
8080
}
8181

82-
// TODO: Replace with Group::fence_scope from SYCL 2020 provisional
83-
template <typename Group> struct FenceScope {
84-
static constexpr ONEAPI::memory_scope value =
85-
ONEAPI::memory_scope::work_group;
86-
};
87-
88-
template <> struct FenceScope<ONEAPI::sub_group> {
89-
static constexpr ONEAPI::memory_scope value = ONEAPI::memory_scope::sub_group;
90-
};
91-
9282
template <typename T, class BinaryOperation> struct identity {};
9383

9484
template <typename T, typename V> struct identity<T, ONEAPI::plus<V>> {
@@ -908,42 +898,6 @@ template <typename Group> bool leader(Group g) {
908898
#endif
909899
}
910900

911-
template <typename Group> void barrier(Group, memory_scope scope) {
912-
static_assert(sycl::detail::is_generic_group<Group>::value,
913-
"Group algorithms only support the sycl::group and "
914-
"intel::sub_group class.");
915-
#ifdef __SYCL_DEVICE_ONLY__
916-
// MemoryScope must be broader than Group scope for correctness
917-
auto GroupScope = detail::FenceScope<Group>::value;
918-
auto BroadestScope = (scope > GroupScope) ? scope : GroupScope;
919-
auto MemoryScope = sycl::detail::spirv::getScope(BroadestScope);
920-
auto ExecutionScope = sycl::detail::spirv::group_scope<Group>::value;
921-
__spirv_ControlBarrier(ExecutionScope, MemoryScope,
922-
__spv::MemorySemanticsMask::AcquireRelease |
923-
__spv::MemorySemanticsMask::SubgroupMemory |
924-
__spv::MemorySemanticsMask::WorkgroupMemory |
925-
__spv::MemorySemanticsMask::CrossWorkgroupMemory);
926-
#else
927-
(void)scope;
928-
throw runtime_error("Group algorithms are not supported on host device.",
929-
PI_INVALID_DEVICE);
930-
#endif
931-
}
932-
933-
template <typename Group> void barrier(Group g) {
934-
static_assert(sycl::detail::is_generic_group<Group>::value,
935-
"Group algorithms only support the sycl::group and "
936-
"intel::sub_group class.");
937-
#ifdef __SYCL_DEVICE_ONLY__
938-
auto MemoryScope = detail::FenceScope<Group>::value;
939-
barrier(g, MemoryScope);
940-
#else
941-
(void)g;
942-
throw runtime_error("Group algorithms are not supported on host device.",
943-
PI_INVALID_DEVICE);
944-
#endif
945-
}
946-
947901
} // namespace ONEAPI
948902
} // namespace sycl
949903
} // __SYCL_INLINE_NAMESPACE(cl)

sycl/test/group-algorithm/barrier.cpp

Lines changed: 0 additions & 58 deletions
This file was deleted.

sycl/test/lit.cfg.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,7 @@
3636
config.test_exec_root = os.path.join(config.sycl_obj_root, 'test')
3737

3838
# Propagate some variables from the host environment.
39-
llvm_config.with_system_environment(['PATH', 'OCL_ICD_FILENAME', 'SYCL_DEVICE_ALLOWLIST', 'SYCL_CONFIG_FILE_NAME'])
39+
llvm_config.with_system_environment(['PATH', 'OCL_ICD_FILENAMES', 'SYCL_DEVICE_ALLOWLIST', 'SYCL_CONFIG_FILE_NAME'])
4040

4141
# Configure LD_LIBRARY_PATH or corresponding os-specific alternatives
4242
if platform.system() == "Linux":

sycl/test/sub_group/broadcast.hpp

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -8,12 +8,10 @@
88

99
#include "helper.hpp"
1010
#include <CL/sycl.hpp>
11-
template <typename T>
12-
class sycl_subgr;
11+
template <typename T> class sycl_subgr;
1312
using namespace cl::sycl;
14-
template <typename T>
15-
void check(queue &Queue) {
16-
const int G = 240, L = 60;
13+
template <typename T> void check(queue &Queue) {
14+
const int G = 256, L = 64;
1715
try {
1816
nd_range<1> NdRange(G, L);
1917
buffer<T> syclbuf(G);
@@ -23,9 +21,10 @@ void check(queue &Queue) {
2321
auto sgsizeacc = sgsizebuf.get_access<access::mode::read_write>(cgh);
2422
cgh.parallel_for<sycl_subgr<T>>(NdRange, [=](nd_item<1> NdItem) {
2523
ONEAPI::sub_group SG = NdItem.get_sub_group();
26-
/*Broadcast GID of element with SGLID == SGID */
24+
/*Broadcast GID of element with SGLID == SGID % SGMLR*/
2725
syclacc[NdItem.get_global_id()] =
28-
broadcast(SG, T(NdItem.get_global_id(0)), SG.get_group_id());
26+
broadcast(SG, T(NdItem.get_global_id(0)),
27+
SG.get_group_id() % SG.get_max_local_range()[0]);
2928
if (NdItem.get_global_id(0) == 0)
3029
sgsizeacc[0] = SG.get_max_local_range()[0];
3130
});
@@ -44,7 +43,8 @@ void check(queue &Queue) {
4443
WGid++;
4544
SGid = 0;
4645
}
47-
exit_if_not_equal<T>(syclacc[j], L * WGid + SGid + SGid * sg_size,
46+
exit_if_not_equal<T>(syclacc[j],
47+
L * WGid + SGid % sg_size + SGid * sg_size,
4848
"broadcasted value");
4949
}
5050
} catch (exception e) {

sycl/test/sub_group/generic-shuffle.cpp

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,5 @@
1-
// UNSUPPORTED: cuda || cpu
1+
// UNSUPPORTED: cuda
22
// CUDA compilation and runtime do not yet support sub-groups.
3-
// #2245 failed on OpenCL CPU (2020.10.7.0.15) with avx2 instruction set
43
//
54
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
65
// RUN: env SYCL_DEVICE_TYPE=HOST %t.out
@@ -24,7 +23,7 @@ template <typename T> class pointer_kernel;
2423
using namespace cl::sycl;
2524

2625
template <typename T>
27-
void check_pointer(queue &Queue, size_t G = 240, size_t L = 60) {
26+
void check_pointer(queue &Queue, size_t G = 256, size_t L = 64) {
2827
try {
2928
nd_range<1> NdRange(G, L);
3029
buffer<T *> buf(G);
@@ -118,7 +117,7 @@ void check_pointer(queue &Queue, size_t G = 240, size_t L = 60) {
118117
}
119118

120119
template <typename T, typename Generator>
121-
void check_struct(queue &Queue, Generator &Gen, size_t G = 240, size_t L = 60) {
120+
void check_struct(queue &Queue, Generator &Gen, size_t G = 256, size_t L = 64) {
122121

123122
// Fill a vector with values that will be shuffled
124123
std::vector<T> values(G);

sycl/test/sub_group/load_store.cpp

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
1-
// UNSUPPORTED: cuda
1+
// UNSUPPORTED: cuda || cpu
22
// CUDA compilation and runtime do not yet support sub-groups.
3+
// #2252 Disable until all variants of built-ins are available in OpenCL CPU
4+
// runtime for every supported ISA
35
//
46
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
57
// RUN: env SYCL_DEVICE_TYPE=HOST %t.out

sycl/test/sub_group/reduce.cpp

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
1-
// UNSUPPORTED: cuda
1+
// UNSUPPORTED: cuda || cpu
22
// CUDA compilation and runtime do not yet support sub-groups.
3+
// #2252 Disable until all variants of built-ins are available in OpenCL CPU
4+
// runtime for every supported ISA
35
//
46
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
57
// RUN: env SYCL_DEVICE_TYPE=HOST %t.out

sycl/test/sub_group/reduce.hpp

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -9,14 +9,13 @@
99
#include "helper.hpp"
1010
#include <CL/sycl.hpp>
1111

12-
template <typename T, class BinaryOperation>
13-
class sycl_subgr;
12+
template <typename T, class BinaryOperation> class sycl_subgr;
1413

1514
using namespace cl::sycl;
1615

1716
template <typename T, class BinaryOperation>
1817
void check_op(queue &Queue, T init, BinaryOperation op, bool skip_init = false,
19-
size_t G = 240, size_t L = 60) {
18+
size_t G = 256, size_t L = 64) {
2019
try {
2120
nd_range<1> NdRange(G, L);
2221
buffer<T> buf(G);
@@ -65,8 +64,7 @@ void check_op(queue &Queue, T init, BinaryOperation op, bool skip_init = false,
6564
}
6665
}
6766

68-
template <typename T>
69-
void check(queue &Queue, size_t G = 240, size_t L = 60) {
67+
template <typename T> void check(queue &Queue, size_t G = 256, size_t L = 64) {
7068
// limit data range for half to avoid rounding issues
7169
if (std::is_same<T, cl::sycl::half>::value) {
7270
G = 64;

sycl/test/sub_group/reduce_fp64.cpp

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
1-
// UNSUPPORTED: cuda
1+
// UNSUPPORTED: cuda || cpu
22
// CUDA compilation and runtime do not yet support sub-groups.
3+
// #2252 Disable until all variants of built-ins are available in OpenCL CPU
4+
// runtime for every supported ISA
35
//
46
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
57
// RUN: env SYCL_DEVICE_TYPE=HOST %t.out

sycl/test/sub_group/scan.cpp

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
1-
// UNSUPPORTED: cuda
1+
// UNSUPPORTED: cuda || cpu
22
// CUDA compilation and runtime do not yet support sub-groups.
3+
// #2252 Disable until all variants of built-ins are available in OpenCL CPU
4+
// runtime for every supported ISA
35
//
46
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
57
// RUN: env SYCL_DEVICE_TYPE=HOST %t.out

sycl/test/sub_group/scan.hpp

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -10,14 +10,13 @@
1010
#include <CL/sycl.hpp>
1111
#include <limits>
1212

13-
template <typename T, class BinaryOperation>
14-
class sycl_subgr;
13+
template <typename T, class BinaryOperation> class sycl_subgr;
1514

1615
using namespace cl::sycl;
1716

1817
template <typename T, class BinaryOperation>
1918
void check_op(queue &Queue, T init, BinaryOperation op, bool skip_init = false,
20-
size_t G = 120, size_t L = 60) {
19+
size_t G = 256, size_t L = 64) {
2120
try {
2221
nd_range<1> NdRange(G, L);
2322
buffer<T> exbuf(G), inbuf(G);
@@ -73,8 +72,7 @@ void check_op(queue &Queue, T init, BinaryOperation op, bool skip_init = false,
7372
}
7473
}
7574

76-
template <typename T>
77-
void check(queue &Queue, size_t G = 120, size_t L = 60) {
75+
template <typename T> void check(queue &Queue, size_t G = 256, size_t L = 64) {
7876
// limit data range for half to avoid rounding issues
7977
if (std::is_same<T, cl::sycl::half>::value) {
8078
G = 64;

sycl/test/sub_group/scan_fp64.cpp

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
1-
// UNSUPPORTED: cuda
1+
// UNSUPPORTED: cuda || cpu
22
// CUDA compilation and runtime do not yet support sub-groups.
3+
// #2252 Disable until all variants of built-ins are available in OpenCL CPU
4+
// runtime for every supported ISA
35
//
46
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
57
// RUN: env SYCL_DEVICE_TYPE=HOST %t.out

0 commit comments

Comments
 (0)