Commit f48f96e

[SYCL][CUDA][DOC] State how to pass ptxas options (#7045)
This patch documents the utilization of `-Xcuda-ptxas` in SYCL. Refer to #6821 and #6942.
1 parent 8a0c9a0 commit f48f96e

1 file changed: +5 -1 lines changed

sycl/doc/GetStartedGuide.md

Lines changed: 5 additions & 1 deletion
@@ -644,11 +644,15 @@ clang++ -fsycl -fsycl-targets=amdgcn-amd-amdhsa \
 The target architecture may also be specified for the CUDA backend, with
 `-Xsycl-target-backend --cuda-gpu-arch=<arch>`. Specifying the architecture is
 necessary if an application aims to use newer hardware features, such as
-native atomic operations or tensor core operations.
+native atomic operations or tensor core operations.
+Moreover, it is possible to pass specific options to CUDA `ptxas` (such as
+`--maxrregcount=<n>` for limiting the register usage or `--verbose` for
+printing generation statistics) using the `-Xcuda-ptxas` flag.
 
 ```bash
 clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
 simple-sycl-app.cpp -o simple-sycl-app-cuda.exe \
+-Xcuda-ptxas --maxrregcount=128 -Xcuda-ptxas --verbose \
 -Xsycl-target-backend --cuda-gpu-arch=sm_80
 ```
 
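Note that the added command repeats `-Xcuda-ptxas` before each `ptxas` option rather than passing a single combined list. A minimal sketch of that pattern follows, assuming a hypothetical shell helper named `ptxas_args` (not part of DPC++ or this patch) and `ptxas` options that contain no spaces. It only prints the resulting command instead of invoking the compiler.

```bash
# Hypothetical helper (an assumption, not part of DPC++): prefix every
# ptxas option with its own -Xcuda-ptxas, as the command in the diff does.
ptxas_args() {
  local out="" opt
  for opt in "$@"; do
    # Add a separating space only once something has been emitted.
    out="$out${out:+ }-Xcuda-ptxas $opt"
  done
  printf '%s\n' "$out"
}

# Build the flag list for the two options used in the documentation example.
PTXAS_FLAGS=$(ptxas_args --maxrregcount=128 --verbose)

# Print the full command line; drop the leading "echo" to actually compile.
# $PTXAS_FLAGS is intentionally unquoted so it splits into separate arguments.
echo clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
  simple-sycl-app.cpp -o simple-sycl-app-cuda.exe \
  $PTXAS_FLAGS \
  -Xsycl-target-backend --cuda-gpu-arch=sm_80
```

Keeping the options in one place like this may be convenient in build scripts where the register limit is tuned per target, but the plain command from the documentation is equivalent.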
