Commit cfc803c
[SYCL] Redistribute USM aspects among CUDA devices (#18782)

We were previously reporting all USM aspects as supported on all CUDA devices. This is incorrect behaviour: many devices support neither USM system allocations nor atomic host/shared USM allocations. Unfortunately, it is very difficult to get a conclusive list of which devices support which features. Links such as [1] suggest that pageable memory access (which the UR adapter uses to determine the runtime equivalents of these aspects) is limited to Grace Hopper devices and newer, or to Linux systems with HMM enabled. Whether HMM is enabled is not something we can currently determine at compile time for these aspects, so this change is conservative for older devices (SM6.X) with HMM enabled, where we will now report "false".

For atomic host/shared allocations, the documentation on the 'hostNativeAtomicSupported' property at [1] and [2] suggests that we need a hardware-coherent system, for which [3] suggests we again need at least a Grace Hopper device. Note, however, that only "some" hardware-coherent systems support host native atomics, "including" NVLink-connected devices; this is not an exhaustive list, and we cannot derive anything conclusive from it. This change might therefore again be conservative for architectures older than Grace Hopper.

In short, this PR essentially punts the problem slightly further down the road and prevents these three USM aspects from being reported as supported on SM89 devices and earlier.

[1]: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#system-requirements-for-unified-memory
[2]: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#host-native-atomics
[3]: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cpu-and-gpu-page-tables-hardware-coherency-vs-software-coherency
1 parent f2fa176 commit cfc803c
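The redistribution can be illustrated with a minimal Python sketch (not part of the PR). The list names mirror the `defvar`s introduced in `DeviceConfigFile.td`; the `cuda_target_aspects` helper and the Python structure itself are hypothetical, standing in for the TableGen `!listconcat` calls:

```python
# USM aspects every supported CUDA target continues to report
# (mirrors CudaMinUSMAspects in the diff below).
CUDA_MIN_USM_ASPECTS = [
    "usm_device_allocations",
    "usm_host_allocations",
    "usm_shared_allocations",
]

# USM aspects now reported only for SM90-class targets
# (mirrors CudaSM90USMAspects in the diff below).
CUDA_SM90_USM_ASPECTS = [
    "usm_system_allocations",
    "usm_atomic_host_allocations",
    "usm_atomic_shared_allocations",
]

def cuda_target_aspects(sm_arch: str) -> set:
    """Hypothetical helper: USM aspects a nvidia_gpu_<sm_arch> target reports."""
    aspects = set(CUDA_MIN_USM_ASPECTS)
    # Only the sm_90 and sm_90a targets additionally concatenate
    # the SM90-only USM aspects.
    if sm_arch in ("sm_90", "sm_90a"):
        aspects |= set(CUDA_SM90_USM_ASPECTS)
    return aspects

# SM89 and earlier no longer report the system/atomic USM aspects.
assert "usm_system_allocations" not in cuda_target_aspects("sm_89")
assert "usm_system_allocations" in cuda_target_aspects("sm_90")
```

The sketch only models which aspect lists are concatenated for which target; the actual reporting is driven by the TableGen definitions in the diff.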

File tree

1 file changed: +6 −3 lines changed

llvm/include/llvm/SYCLLowerIR/DeviceConfigFile.td
@@ -262,7 +262,10 @@ class CudaTargetInfo<string targetName, list<Aspect> aspectList, int subGroupSiz
   assert !eq(subGroupSize, 32), "sub-group size for Cuda must be equal to 32 and not " # subGroupSize # ".";
 }
 
-defvar CudaMinAspects = !listconcat(AllUSMAspects, [AspectGpu, AspectFp64, AspectOnline_compiler, AspectOnline_linker,
+defvar CudaMinUSMAspects = [AspectUsm_device_allocations, AspectUsm_host_allocations, AspectUsm_shared_allocations];
+defvar CudaSM90USMAspects = [AspectUsm_system_allocations, AspectUsm_atomic_host_allocations, AspectUsm_atomic_shared_allocations];
+
+defvar CudaMinAspects = !listconcat(CudaMinUSMAspects, [AspectGpu, AspectFp64, AspectOnline_compiler, AspectOnline_linker,
   AspectQueue_profiling, AspectExt_intel_pci_address, AspectExt_intel_max_mem_bandwidth, AspectExt_intel_memory_bus_width,
   AspectExt_intel_device_info_uuid, AspectExt_oneapi_native_assert, AspectExt_intel_free_memory, AspectExt_intel_device_id,
   AspectExt_intel_memory_clock_rate, AspectExt_oneapi_ballot_group, AspectExt_oneapi_fixed_size_group,
@@ -292,9 +295,9 @@ def : CudaTargetInfo<"nvidia_gpu_sm_87", !listconcat(CudaMinAspects, CudaBindles
   [AspectFp16, AspectAtomic64, AspectExt_oneapi_cuda_async_barrier])>;
 def : CudaTargetInfo<"nvidia_gpu_sm_89", !listconcat(CudaMinAspects, CudaBindlessImagesAspects,
   [AspectFp16, AspectAtomic64, AspectExt_oneapi_cuda_async_barrier])>;
-def : CudaTargetInfo<"nvidia_gpu_sm_90", !listconcat(CudaMinAspects, CudaBindlessImagesAspects,
+def : CudaTargetInfo<"nvidia_gpu_sm_90", !listconcat(CudaMinAspects, CudaSM90USMAspects, CudaBindlessImagesAspects,
   [AspectFp16, AspectAtomic64, AspectExt_oneapi_cuda_async_barrier, AspectExt_oneapi_cuda_cluster_group])>;
-def : CudaTargetInfo<"nvidia_gpu_sm_90a", !listconcat(CudaMinAspects, CudaBindlessImagesAspects,
+def : CudaTargetInfo<"nvidia_gpu_sm_90a", !listconcat(CudaMinAspects, CudaSM90USMAspects, CudaBindlessImagesAspects,
   [AspectFp16, AspectAtomic64, AspectExt_oneapi_cuda_async_barrier, AspectExt_oneapi_cuda_cluster_group])>;
 
 //
