You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[SYCL] [NATIVECPU] Disabling default -sycl-opt for NativeCPU and updating docs (#12025)
This PR no longer passes `-mllvm -sycl-opt` to the device compiler
invocation for NativeCPU to enable the original full llvm optimization
pipeline.
`EnvironmentVariables.md` has been updated to document NativeCPU and its
usage in environment variables.
Copy file name to clipboardExpand all lines: sycl/doc/EnvironmentVariables.md
+5-5Lines changed: 5 additions & 5 deletions
Original file line number
Diff line number
Diff line change
@@ -9,7 +9,7 @@ compiler and runtime.
9
9
| -------------------- | ------ | ----------- |
10
10
|`ONEAPI_DEVICE_SELECTOR`|[See below.](#oneapi_device_selector)| This device selection environment variable can be used to limit the choice of devices available when the SYCL-using application is run. Useful for limiting devices to a certain type (like GPUs or accelerators) or backends (like Level Zero or OpenCL). This device selection mechanism is replacing `SYCL_DEVICE_FILTER` . The `ONEAPI_DEVICE_SELECTOR` syntax is shared with OpenMP and also allows sub-devices to be chosen. [See below.](#oneapi_device_selector) for a full description. |
11
11
|`SYCL_DEVICE_FILTER` (deprecated) |`backend:device_type:device_num`| Please use `ONEAPI_DEVICE_SELECTOR` environment variable instead. See section [`SYCL_DEVICE_FILTER`](#sycl_device_filter) below for `SYCL_DEVICE_FILTER` description. |
12
-
|`SYCL_DEVICE_ALLOWLIST`| See [below](#sycl_device_allowlist)| Filter out devices that do not match the pattern specified. `BackendName` accepts `host`, `opencl`, `level_zero` or `cuda`. `DeviceType` accepts `host`, `cpu`, `gpu` or `acc`. `DeviceVendorId` accepts uint32_t in hex form (`0xXYZW`). `DriverVersion`, `PlatformVersion`, `DeviceName` and `PlatformName` accept regular expression. Special characters, such as parenthesis, must be escaped. DPC++ runtime will select only those devices which satisfy provided values above and regex. More than one device can be specified using the piping symbol "\|".|
12
+
|`SYCL_DEVICE_ALLOWLIST`| See [below](#sycl_device_allowlist)| Filter out devices that do not match the pattern specified. `BackendName` accepts `host`, `opencl`, `level_zero`, `native_cpu` or `cuda`. `DeviceType` accepts `host`, `cpu`, `gpu` or `acc`. `DeviceVendorId` accepts uint32_t in hex form (`0xXYZW`). `DriverVersion`, `PlatformVersion`, `DeviceName` and `PlatformName` accept regular expression. Special characters, such as parenthesis, must be escaped. DPC++ runtime will select only those devices which satisfy provided values above and regex. More than one device can be specified using the piping symbol "\|".|
|`SYCL_CACHE_DIR`| Path | Path to persistent cache root directory. Default values are `%AppData%\libsycl_cache` for Windows and `$XDG_CACHE_HOME/libsycl_cache` on Linux, if `XDG_CACHE_HOME` is not set then `$HOME/.cache/libsycl_cache`. When none of the environment variables are set SYCL persistent cache is disabled. |
15
15
|`SYCL_CACHE_DISABLE_PERSISTENT (deprecated)`| Any(\*) | Has no effect. |
@@ -58,13 +58,13 @@ The device indices are zero-based and are unique only within a backend. Therefor
58
58
59
59
Additionally, if a sub-device is chosen (via numeric index or wildcard), then an additional layer of partitioning can be specified. In other words, a sub-sub-device can be selected. Like sub-devices, this is done with a period ( `.` ) and a sub-sub-device specifier which is a wildcard symbol ( `*` ) or a numeric index. Example `ONEAPI_DEVICE_SELECTOR=level_zero:0.*.*` would partition device 0 into sub-devices and then partition each of those into sub-sub-devices. The range of grandchild sub-sub-devices would be the final devices available to the app, neither device 0, nor its child partitions would be in that list.
60
60
61
-
Lastly, a filter in the grammar can be thought of as a term in conjuction with an action that is taken on all devices that are selected by the term. The action can be an accept action or a discard action. Based on the action, a filter can be an accept filter or a discard filter.
61
+
Lastly, a filter in the grammar can be thought of as a term in conjunction with an action that is taken on all devices that are selected by the term. The action can be an accept action or a discard action. Based on the action, a filter can be an accept filter or a discard filter.
62
62
The string `<term>` represents an accept filter and the string `!<term>` represents a discard filter. The underlying term is the same but they perform different actions on the matching devices list.
63
63
For example, `!opencl:*` discards all devices of the opencl backend from the list of available devices. The discarding filters, if there are any, must all appear at the end of the selector string.
64
64
When one or more filters accept a device and one or more filters discard the device, the latter have priority and the device is ultimately not made available to the user. This allows the user to provide selector strings such as `*:gpu;!cuda:*` that accepts all gpu devices except those with a CUDA backend.
65
65
Furthermore, if the value of this environment variable only has discarding filters, an accepting filter that matches all devices, but not sub-devices and sub-sub-devices, will be implicitly included in the
66
66
environment variable to allow the user to specify only the list of devices that must not be made available. Therefore, `!*:cpu` will accept all devices except those that are of the cpu type and `opencl:*;!*:cpu`
67
-
will accept all devices of the opencl backend exept those that are of the opencl backend and of the cpu type. It is legal to have a rejection filter even if it specifies devices have already been omitted by previous filters in the selection string. Doing so has no effect; the rejected devices are still omitted.
67
+
will accept all devices of the opencl backend except those that are of the opencl backend and of the cpu type. It is legal to have a rejection filter even if it specifies devices have already been omitted by previous filters in the selection string. Doing so has no effect; the rejected devices are still omitted.
68
68
69
69
The following examples further illustrate the usage of this environment variable:
70
70
@@ -290,7 +290,7 @@ variables in production code.</span>
290
290
|`SYCL_PI_LEVEL_ZERO_EXPOSE_CSLICE_IN_AFFINITY_PARTITIONING` (Deprecated) | Integer | When set to non-zero value exposes compute slices as sub-sub-devices in `sycl::info::partition_property::partition_by_affinity_domain` partitioning scheme. Default is zero meaning that they are only exposed when partitioning by `sycl::info::partition_property::ext_intel_partition_by_cslice`. This option is introduced for compatibility reasons and is immediately deprecated. New code must not rely on this behavior. Also note that even if sub-sub-device was created using `partition_by_affinity_domain` it would still be reported as created via partitioning by compute slices. |
291
291
|`SYCL_PI_LEVEL_ZERO_COMMANDLISTS_CLEANUP_THRESHOLD`| Integer | If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of command lists in a queue exceeds this threshold, an attempt is made to cleanup completed command lists for their subsequent reuse. The default is 20. |
292
292
|`SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD`| Integer | If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of events associated with an immediate command list exceeds this threshold, a check is made for signaled events and these events are recycled. Setting this threshold low causes events to be checked more often, which could result in unneeded events being recycled sooner. However, more frequent event status checks may cost time. The default is 1000. |
293
-
|`SYCL_PI_LEVEL_ZERO_USM_RESIDENT`| Integer | Bit-mask controls if/where to make USM allocations resident at the time of allocation. Input value is of the form 0xHSD, where 4-bits of D control device allocations, 4-bits of S control shared allocations, and 4-bits of H control host allocations. Each 4-bit componenet is holding one of the following values: "0" - then no special residency is forced, "1" - then allocation is made resident at the device of allocation, or "2" - then allocation is made resident on all devices in the context of allocation that have P2P access to the device of allocation. Default is 0x002, i.e. force full residency for device allocations only. |
293
+
|`SYCL_PI_LEVEL_ZERO_USM_RESIDENT`| Integer | Bit-mask controls if/where to make USM allocations resident at the time of allocation. Input value is of the form 0xHSD, where 4-bits of D control device allocations, 4-bits of S control shared allocations, and 4-bits of H control host allocations. Each 4-bit component is holding one of the following values: "0" - then no special residency is forced, "1" - then allocation is made resident at the device of allocation, or "2" - then allocation is made resident on all devices in the context of allocation that have P2P access to the device of allocation. Default is 0x002, i.e. force full residency for device allocations only. |
294
294
|`SYCL_PI_LEVEL_ZERO_USE_NATIVE_USM_MEMCPY2D`| Integer | When set to a positive value enables the use of Level Zero USM 2D memory copy operations. Default is 0. |
0 commit comments