`sycl/doc/EnvironmentVariables.md` (5 additions, 48 deletions)
@@ -161,54 +161,10 @@ If this environment variable is not set, the preferred work-group size for reduc
Note that when configuration tuples in the same list conflict, the last entry takes precedence. For example, the list `cpu:32,gpu:32,cpu:16` sets the preferred work-group size of reductions to 32 for GPUs and 16 for CPUs. This also applies to `*`: for example, `cpu:32,*:16` sets the preferred work-group size of reductions on all devices to 16, while `*:16,cpu:32` sets it to 32 on CPUs and to 16 on all other devices.
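The last-entry-wins rule for `device:size` lists can be modeled with a small parser. This is a minimal host-side C++ sketch, not the DPC++ runtime's own parsing code: the function name `resolvePreferredSize`, the device-type strings `cpu` and `gpu`, and the choice to skip malformed entries are assumptions made for illustration.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper (not a runtime API): resolves the preferred reduction
// work-group size for one device type from a comma-separated list of
// `device:size` tuples. Entries are applied left to right, so a later
// matching entry, including the `*` wildcard, overrides any earlier one.
std::optional<int> resolvePreferredSize(const std::string &list,
                                        const std::string &deviceType) {
  std::optional<int> result; // Empty means "not configured for this device".
  std::istringstream entries(list);
  std::string entry;
  while (std::getline(entries, entry, ',')) {
    const auto colon = entry.find(':');
    if (colon == std::string::npos)
      continue; // Assumption: malformed entries are silently skipped here.
    const std::string key = entry.substr(0, colon);
    if (key == deviceType || key == "*")
      result = std::stoi(entry.substr(colon + 1));
  }
  return result;
}
```

With this sketch, `resolvePreferredSize("*:16,cpu:32", "cpu")` yields 32 and `resolvePreferredSize("*:16,cpu:32", "gpu")` yields 16, matching the examples in the paragraph above.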
## Range Rounded Parallel For

Kernels launched with a `sycl::range`, rather than a `sycl::nd_range`, may have
their execution space reconfigured by the SYCL runtime, because oddly shaped
execution dimensions can hinder performance, especially when executing kernels
on GPUs. Although a `sycl::parallel_for` over a `sycl::range` does not expose
the concept of a work-group to the user, behind the scenes all GPU APIs require
a work-group configuration when dispatching a kernel. In this case the
work-group configuration is provided by the implementation, not by the user.

As an example, suppose a SYCL kernel is dispatched with the 1D range `{7727}`.
Since 7727 is prime, its only divisors are 1 and 7727, and a single work-group
of 7727 work-items exceeds the maximum work-group size of typical GPUs. The
kernel can therefore only be divided into work-groups of size 1, so 7727
work-groups are dispatched, each containing a single work-item. Because modern
GPUs execute work-items in lockstep within each (implicit) sub-group, a
work-group of one work-item leaves most of the sub-group's lanes idle. This
results in low occupancy and can hinder performance.

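The arithmetic behind the 7727 example can be checked with a short host-side C++ model. It assumes a maximum work-group size of 256 and a sub-group width of 16 lanes; both are illustrative values, not figures queried from any device. The divisor search in the hypothetical `largestDividingWorkGroup` is one simple strategy, not the heuristic any particular SYCL runtime uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: returns the largest work-group size that evenly
// divides the global range and does not exceed maxWorkGroupSize. Without
// range rounding, an implementation is limited to such exact divisors.
std::size_t largestDividingWorkGroup(std::size_t globalSize,
                                     std::size_t maxWorkGroupSize) {
  for (std::size_t wg = maxWorkGroupSize; wg > 1; --wg)
    if (globalSize % wg == 0)
      return wg;
  return 1; // Every range is divisible by 1.
}

// Fraction of sub-group lanes doing useful work when a work-group of
// wgSize items is packed into sub-groups of subGroupSize lanes each.
double laneUtilization(std::size_t wgSize, std::size_t subGroupSize) {
  const std::size_t subGroups = (wgSize + subGroupSize - 1) / subGroupSize;
  return static_cast<double>(wgSize) /
         static_cast<double>(subGroups * subGroupSize);
}
```

Under these assumptions, `largestDividingWorkGroup(7727, 256)` returns 1, so 7727 work-groups are launched, and `laneUtilization(1, 16)` is 0.0625: only one of the 16 lanes in each sub-group does useful work.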
To mitigate the performance cost of an awkward implicit work-group size, the
SYCL runtime generates two kernels for each kernel launched with a
`sycl::range`:

1. The original kernel, without any modifications.
2. The "range rounded" kernel, which checks the global index of each work-item
   at the beginning of execution and exits early for any work-item whose global
   index falls outside the user-provided execution range. If the original
   kernel has the signature `foo`, then this kernel will have a signature akin to
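The early-exit check performed by the range-rounded kernel can be modeled on the host. This is a sequential C++ sketch, not the runtime's device code: the helper names `roundUp` and `runRangeRounded` and the rounding multiple passed as `wgSize` are hypothetical, and the `for` loop stands in for work-items that a device would execute in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds n up to the next multiple of `multiple` (multiple must be > 0).
std::size_t roundUp(std::size_t n, std::size_t multiple) {
  return ((n + multiple - 1) / multiple) * multiple;
}

// Hypothetical host-side model of a range-rounded launch: every index in the
// rounded range is visited, but the user's functor only runs for indices
// inside the original range; padding work-items return immediately.
// Returns how many times the user functor was invoked.
template <typename Kernel>
std::size_t runRangeRounded(std::size_t userRange, std::size_t wgSize,
                            Kernel userKernel) {
  const std::size_t rounded = roundUp(userRange, wgSize);
  std::size_t executed = 0;
  for (std::size_t gid = 0; gid < rounded; ++gid) {
    if (gid >= userRange)
      continue; // Early exit: this work-item lies outside the user's range.
    userKernel(gid);
    ++executed;
  }
  return executed;
}
```

For example, rounding 7727 up to a multiple of 32 gives 7744, so the modeled launch covers 7744 work-items; the 17 padding items exit immediately, and the user functor still runs exactly once for each index in `[0, 7727)`.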