@@ -43,7 +43,6 @@ SYCL specification refer to that revision.
43
43
44
44
This extension also depends on the following other SYCL extensions:
45
45
46
- * link:https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_sub_group_mask.asciidoc[sycl_ext_oneapi_sub_group_mask]
47
46
* link:https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_root_group.asciidoc[sycl_ext_oneapi_root_group]
48
47
49
48
@@ -143,7 +142,6 @@ following user-constructed groups:
143
142
- `cluster_group`
144
143
- `tangle_group`
145
144
- `opportunistic_group`
146
- - `masked_sub_group`
147
145
148
146
The `is_fixed_topology_group` and `is_user_constructed_group` traits can be
149
147
used to detect whether a group type represents a fixed topology or
@@ -175,12 +173,11 @@ namespace sycl::ext::oneapi::experimental {
175
173
`root_group`, `group` or `sub_group`.
176
174
177
175
`is_user_constructed_group<T>::value` is `std::true_type` if `T` is one of:
178
- `ballot_group`, `cluster_group`, `tangle_group`, `opportunisic_group` or
179
- `masked_sub_group`.
176
+ `ballot_group`, `cluster_group`, `tangle_group`, or `opportunisic_group`.
180
177
181
178
Additionally, the `is_group<T>::value` trait from SYCL 2020 is `std::true_type`
182
- if `T` is one of: `ballot_group`, `cluster_group`, `tangle_group`,
183
- `opportunistic_group` or `masked_sub_group` .
179
+ if `T` is one of: `ballot_group`, `cluster_group`, `tangle_group`, or
180
+ `opportunistic_group`.
184
181
185
182
186
183
=== Group Functions and Algorithms
@@ -199,13 +196,13 @@ make assumptions regarding work-item scheduling and forward progress
199
196
guarantees.
200
197
201
198
The following group functions support the `ballot_group`, `cluster_group`,
202
- `tangle_group`, `opportunistic_group` and `masked_sub_group ` group types:
199
+ `tangle_group`, and `opportunistic_group ` group types:
203
200
204
201
* `group_barrier`
205
202
* `group_broadcast`
206
203
207
204
The following group algorithms support `ballot_group`, `cluster_group`,
208
- `tangle_group`, `opportunistic_group` and `masked_sub_group ` group types:
205
+ `tangle_group`, and `opportunistic_group ` group types:
209
206
210
207
* `joint_any_of` and `any_of_group`
211
208
* `joint_all_of` and `all_of_group`
@@ -898,217 +895,6 @@ int atomic_aggregate_inc(sycl::sub_group sg, sycl::atomic_ref<int, Order, Scope,
898
895
----
899
896
900
897
901
- === Masked Sub-groups
902
-
903
- A masked sub-group is a non-contiguous subset of a sub-group, representing an
904
- arbitrary user-defined subset of work-items. The members of a masked sub-group
905
- are described by a bitmask, where a 1 denotes membership of the group.
906
-
907
- The work-items within a masked sub-group retain information about the original
908
- sub-group, and many member functions of the `masked_sub_group` class reflect
909
- this. Developers are strongly recommended to use other user-constructed groups
910
- that match their use-case, both for improved performance and a simplified
911
- mental model.
912
-
913
- NOTE: Masked sub-groups exist primarily to support experimentation with
914
- arbitrary subsets of work-items within a sub-group, and to support the
915
- migration of algorithms already expressed via masks.
916
-
917
-
918
- ==== Creation
919
-
920
- Masked sub-groups are created by calls to the `get_masked_sub_group()`
921
- function, which applies a bitmask to an existing sub-group.
922
-
923
- NOTE: Creating a masked sub-group does not require a barrier across all
924
- work-items in the parent sub-group or introduce any sychronization, since
925
- work-items can independently identify members directly from the specified
926
- membership mask.
927
-
928
- [source, c++]
929
- ----
930
- namespace ext::oneapi::experimental {
931
-
932
- masked_sub_group get_masked_sub_group(sub_group sg, sub_group_mask mask);
933
-
934
- } // namespace ext::oneapi::experimental
935
- ----
936
-
937
- _Preconditions_: All work-items in `sg` with a corresponding bit set in `mask`
938
- must encounter this function in converged control flow.
939
-
940
- _Returns_: A `masked_sub_group` consisting of the work-items in `sg` with a
941
- corresponding bit set in `mask`.
942
-
943
-
944
- ==== `masked_sub_group` Class
945
-
946
- The `masked_sub_group` class contains an additional `get_mask()` function,
947
- returning the membership mask. Since the other member functions of
948
- `masked_sub_group` reflect the original sub-group, developers must use this
949
- mask to reason about the local numbering of work-items within the group.
950
-
951
- [source, c++]
952
- ----
953
- namespace sycl::ext::oneapi::experimental {
954
-
955
- class masked_sub_group {
956
- public:
957
- using id_type = id<1>;
958
- using range_type = range<1>;
959
- using linear_id_type = uint32_t;
960
- static constexpr int dimensions = 1;
961
- static constexpr sycl::memory_scope fence_scope =
962
- sycl::memory_scope::sub_group;
963
-
964
- id_type get_group_id() const;
965
-
966
- id_type get_local_id() const;
967
-
968
- range_type get_group_range() const;
969
-
970
- range_type get_local_range() const;
971
-
972
- linear_id_type get_group_linear_id() const;
973
-
974
- linear_id_type get_local_linear_id() const;
975
-
976
- linear_id_type get_group_linear_range() const;
977
-
978
- linear_id_type get_local_linear_range() const;
979
-
980
- bool leader() const;
981
-
982
- sub_group_mask get_mask() const;
983
- };
984
-
985
- }
986
- ----
987
-
988
- [source,c++]
989
- ----
990
- id_type get_group_id() const;
991
- ----
992
- _Returns_: An `id` representing the index of the sub-group within the
993
- parent work-group.
994
-
995
- [source,c++]
996
- ----
997
- id_type get_local_id() const;
998
- ----
999
- _Returns_: An `id` representing the calling work-item's position within
1000
- the sub-group.
1001
-
1002
- [source,c++]
1003
- ----
1004
- range_type get_group_range() const;
1005
- ----
1006
- _Returns_: A `range` representing the number of sub-groups within the parent
1007
- work-group.
1008
-
1009
- [source,c++]
1010
- ----
1011
- range_type get_local_range() const;
1012
- ----
1013
- _Returns_: A `range` representing the number of work-items in the sub-group.
1014
-
1015
- [source,c++]
1016
- ----
1017
- id_type get_group_linear_id() const;
1018
- ----
1019
- _Returns_: A linearized version of the `id` returned by `get_group_id()`.
1020
-
1021
- [source,c++]
1022
- ----
1023
- id_type get_local_linear_id() const;
1024
- ----
1025
- _Returns_: A linearized version of the `id` returned by `get_local_linear_id()`.
1026
-
1027
- [source,c++]
1028
- ----
1029
- range_type get_group_linear_range() const;
1030
- ----
1031
- _Returns_: A linearized version of the `id` returned by `get_group_range()`.
1032
-
1033
- [source,c++]
1034
- ----
1035
- range_type get_local_linear_range() const;
1036
- ----
1037
- _Returns_: A linearized version of the `id` returned by `get_local_range()`.
1038
-
1039
- [source,c++]
1040
- ----
1041
- bool leader() const;
1042
- ----
1043
- _Returns_: `true` for exactly one work-item in the masked sub-group, if the
1044
- calling work-item is the leader of the masked sub-group, and `false` for all
1045
- other work-items in the masked sub-group. The leader of the masked sub-group
1046
- is guaranteed to be the work-item corresponding to the least-significant bit in
1047
- the mask.
1048
-
1049
- [source,c++]
1050
- ----
1051
- sub_group_mask get_mask() const;
1052
- ----
1053
- _Returns_: A `sub_group_mask` representing which work-items from the sub-group
1054
- are considered a member of this `masked_sub_group`.
1055
-
1056
-
1057
- ==== Usage Example
1058
-
1059
- A `masked_sub_group` can be used to implement algorithms where a membership
1060
- mask is already present or easily computed:
1061
-
1062
- [source, c++]
1063
- ----
1064
- // set initial mask to full sub-group
1065
- auto sg = it.get_sub_group();
1066
- auto active = std::pow(2, sg.get_max_local_range()) - 1;
1067
-
1068
- float sum = x;
1069
- for (int shift = sg.get_max_local_range() / 2; shift > 0; shift /= 2)
1070
- {
1071
- // create representation of work-items still active in this phase
1072
- auto masked_sg = sycl::ext::oneapi::experimental::get_masked_sub_group(sg, active);
1073
-
1074
- // call shift only for work-items that are still active
1075
- // using the parent sub_group would have been unsafe due to divergence
1076
- sum += sycl::shift_group_left(masked_sg, x, shift);
1077
-
1078
- // remove half of the work-items from the group
1079
- active >>= shift;
1080
- }
1081
- ----
1082
-
1083
- Note that in many cases these algorithms can be translated (manually) to use
1084
- one of the alternative group types:
1085
-
1086
- [source, c++]
1087
- ----
1088
- // set initial mask to full sub-group
1089
- auto sg = it.get_sub_group();
1090
-
1091
- float sum = x;
1092
- for (int phase = 1; phase < sg.get_max_local_range() / 2; phase *= 2)
1093
- {
1094
- // create representation of work-items still active in this phase
1095
- auto active_group = sycl::ext::oneapi::experimental::get_tangle_group(sg);
1096
-
1097
- // call shift only for work-items that are still active
1098
- // note that the shift is now 1, because of how tangle-group local IDs are defined
1099
- sum += sycl::shift_group_left(active_group, x, 1);
1100
- }
1101
- ----
1102
-
1103
- Or, even more simply, one of the SYCL group algorithms:
1104
-
1105
- [source, c++]
1106
- ----
1107
- auto sg = it.get_sub_group();
1108
- sum = sycl::reduce_over_group(sg, x, sycl::plus<>());
1109
- ----
1110
-
1111
-
1112
898
== Implementation notes
1113
899
1114
900
This non-normative section provides information about one possible
0 commit comments