[SYCL][CUDA][DOC] State how to pass ptxas options (#7045)

pgorlani · web-flow · commit f48f96eb3faa · 2022-10-13T12:35:25.000+01:00
This patch documents how to use `-Xcuda-ptxas` to pass options to `ptxas` when compiling SYCL code for the CUDA backend. See #6821 and #6942.
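
As a rough illustration of the pattern documented here, the sketch below assembles the compile command so that every `ptxas` option is preceded by its own `-Xcuda-ptxas`, as in the example added to the guide. The variable names `PTXAS_OPTS` and `CMD` are hypothetical and not part of the patch. The script only prints the command, so it can be inspected without a CUDA toolchain installed.

```shell
# Sketch only: PTXAS_OPTS and CMD are hypothetical names, not part of the patch.
# Options to forward to ptxas, separated by spaces.
PTXAS_OPTS="--maxrregcount=128 --verbose"

# Base command, taken from the example in GetStartedGuide.md.
CMD="clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda"
CMD="$CMD simple-sycl-app.cpp -o simple-sycl-app-cuda.exe"

# Prefix each ptxas option with its own -Xcuda-ptxas.
# $PTXAS_OPTS is left unquoted on purpose so the shell splits it on spaces.
for opt in $PTXAS_OPTS; do
  CMD="$CMD -Xcuda-ptxas $opt"
done

# Target architecture, passed to the backend as in the guide.
CMD="$CMD -Xsycl-target-backend --cuda-gpu-arch=sm_80"

# Print the assembled command instead of running it.
echo "$CMD"
```

Run the printed command directly once it looks right. Adding another `ptxas` option only requires appending it to `PTXAS_OPTS`.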
diff --git a/sycl/doc/GetStartedGuide.md b/sycl/doc/GetStartedGuide.md
@@ -644,11 +644,15 @@ clang++ -fsycl -fsycl-targets=amdgcn-amd-amdhsa \
 The target architecture may also be specified for the CUDA backend, with 
 `-Xsycl-target-backend --cuda-gpu-arch=<arch>`. Specifying the architecture is 
 necessary if an application aims to use newer hardware features, such as
-native atomic operations or tensor core operations. 
+native atomic operations or tensor core operations.
+Moreover, it is possible to pass specific options to CUDA `ptxas` (such as
+`--maxrregcount=<n>` for limiting the register usage or `--verbose` for
+printing generation statistics) using the `-Xcuda-ptxas` flag.
 
 ```bash
 clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
   simple-sycl-app.cpp -o simple-sycl-app-cuda.exe \
+  -Xcuda-ptxas --maxrregcount=128 -Xcuda-ptxas --verbose \
   -Xsycl-target-backend --cuda-gpu-arch=sm_80
 ```