Skip to content

Commit 9b9e5de

Browse files
[SYCL][NATIVECPU] Enable source-based code coverage in Native CPU (#15073)
Supports [clang's source-based code coverage](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html) to enable code coverage testing of SYCL applications via the Native CPU SYCL target. Clang's `-fprofile-instr-generate -fcoverage-mapping` options can now be used with the `native_cpu` SYCL target to compile/instrument host and device code, enabling 'llvm-cov' to render a coverage report after running the SYCL application (see also updated documentation in this PR). Subsequent PRs will enable in NativeCPU more of the currently unsupported options for device compilation, also for performance profiling. **Details and explanations for the changes in this PR:** This PR tests coverage options on the existing NativeCPU vector-add test by adding an additional invocation with previously disabled options `-fprofile-instr-generate -fcoverage-mapping -mllvm -system-headers-coverage`. Enabling these options on device code caused an [assert in the upstream clang profiling code generation tools](https://github.com/intel/llvm/blob/b023d407862bd853ba5881c34985f99d039d856c/clang/lib/CodeGen/CoverageMappingGen.cpp#L960) due to the invalid source location on the AST for the implicitly generated kernel body, specifically the compound statement containing the kernel body. This PR honors this upstream clang assert by replacing the invalid source location in the compound statement with the source location of the kernel body. Using this now valid source location maintains the location (of the kernel caller function) currently tested by [non-upstream-llvm lit test `CodeGenSYCL/debug-info-srcpos-kernel.cpp`](https://github.com/intel/llvm/blob/sycl/clang/test/CodeGenSYCL/debug-info-srcpos-kernel.cpp), but exposed an issue that led to a change in behavior in [non-llvm-upstream lit test `SemaSYCL/kernel-arg-opt-report.cpp`](https://github.com/intel/llvm/blob/sycl/clang/test/SemaSYCL/kernel-arg-opt-report.cpp), which was due to the previously invalid source location causing the compiler to skip code to set the current location. To restore the original behavior of this test (checking for the location of the kernel functor, as opposed to the kernel caller function) this PR temporarily (and only for the purpose of generating the report) sets the current location to the location of the kernel argument using the upstream clang utility [clang::CodeGen::ApplyDebugLocation](https://github.com/intel/llvm/blob/b023d407862bd853ba5881c34985f99d039d856c/clang/lib/CodeGen/CGDebugInfo.h#L860). --------- Co-authored-by: Michael Toguchi <[email protected]>
1 parent 77abc08 commit 9b9e5de

File tree

7 files changed

+93
-10
lines changed

7 files changed

+93
-10
lines changed

clang/lib/CodeGen/CodeGenFunction.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1789,6 +1789,9 @@ void CodeGenFunction::GenerateCode(GlobalDecl GD, llvm::Function *Fn,
17891789
if (SyclOptReport.HasOptReportInfo(FD)) {
17901790
llvm::OptimizationRemarkEmitter ORE(Fn);
17911791
for (auto ORI : llvm::enumerate(SyclOptReport.GetInfo(FD))) {
1792+
// Temporarily apply arg location to ensure SourceLocToDebugLoc
1793+
// picks up the expected file.
1794+
ApplyDebugLocation TempApplyLoc(*this, ORI.value().KernelArgLoc);
17921795
llvm::DiagnosticLocation DL =
17931796
SourceLocToDebugLoc(ORI.value().KernelArgLoc);
17941797
StringRef NameInDesc = ORI.value().KernelArgDescName;

clang/lib/Driver/ToolChains/SYCL.cpp

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1618,6 +1618,23 @@ static std::vector<OptSpecifier> getUnsupportedOpts(void) {
16181618
return UnsupportedOpts;
16191619
}
16201620

1621+
// Currently supported options by SYCL NativeCPU device compilation
1622+
static inline bool SupportedByNativeCPU(const SYCLToolChain &TC,
1623+
const OptSpecifier &Opt) {
1624+
if (!TC.IsSYCLNativeCPU)
1625+
return false;
1626+
1627+
switch (Opt.getID()) {
1628+
case options::OPT_fcoverage_mapping:
1629+
case options::OPT_fno_coverage_mapping:
1630+
case options::OPT_fprofile_instr_generate:
1631+
case options::OPT_fprofile_instr_generate_EQ:
1632+
case options::OPT_fno_profile_instr_generate:
1633+
return true;
1634+
}
1635+
return false;
1636+
}
1637+
16211638
SYCLToolChain::SYCLToolChain(const Driver &D, const llvm::Triple &Triple,
16221639
const ToolChain &HostTC, const ArgList &Args)
16231640
: ToolChain(D, Triple, Args), HostTC(HostTC),
@@ -1629,6 +1646,9 @@ SYCLToolChain::SYCLToolChain(const Driver &D, const llvm::Triple &Triple,
16291646
// Diagnose unsupported options only once.
16301647
for (OptSpecifier Opt : getUnsupportedOpts()) {
16311648
if (const Arg *A = Args.getLastArg(Opt)) {
1649+
// Native CPU can support options unsupported by other targets.
1650+
if (SupportedByNativeCPU(*this, Opt))
1651+
continue;
16321652
// All sanitizer options are not currently supported, except
16331653
// AddressSanitizer
16341654
if (A->getOption().getID() == options::OPT_fsanitize_EQ &&
@@ -1669,6 +1689,9 @@ SYCLToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
16691689
bool Unsupported = false;
16701690
for (OptSpecifier UnsupportedOpt : getUnsupportedOpts()) {
16711691
if (Opt.matches(UnsupportedOpt)) {
1692+
// NativeCPU should allow most normal cpu options.
1693+
if (SupportedByNativeCPU(*this, Opt.getID()))
1694+
continue;
16721695
if (Opt.getID() == options::OPT_fsanitize_EQ &&
16731696
A->getValues().size() == 1) {
16741697
std::string SanitizeVal = A->getValue();

clang/lib/Sema/SemaSYCL.cpp

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3625,8 +3625,11 @@ class SyclKernelBodyCreator : public SyclKernelFieldHandler {
36253625
BodyStmts.insert(BodyStmts.end(), FinalizeStmts.begin(),
36263626
FinalizeStmts.end());
36273627

3628+
SourceLocation LL = NewBody ? NewBody->getBeginLoc() : SourceLocation();
3629+
SourceLocation LR = NewBody ? NewBody->getEndLoc() : SourceLocation();
3630+
36283631
return CompoundStmt::Create(SemaSYCLRef.getASTContext(), BodyStmts,
3629-
FPOptionsOverride(), {}, {});
3632+
FPOptionsOverride(), LL, LR);
36303633
}
36313634

36323635
void annotateHierarchicalParallelismAPICalls() {

clang/test/Driver/sycl-native-cpu.cpp

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,3 +24,9 @@
2424

2525
// RUN: %clangxx -fsycl -fsycl-targets=spir64 %s -### 2>&1 | FileCheck -check-prefix=CHECK-NONATIVECPU %s
2626
// CHECK-NONATIVECPU-NOT: "-D" "__SYCL_NATIVE_CPU__"
27+
28+
// Checking that coverage testing options are accepted by native_cpu, and that device and host compilation invocations receive the same options
29+
// RUN: %clangxx -fsycl -fsycl-targets=native_cpu -Werror -fno-profile-instr-generate -fprofile-instr-generate -fno-coverage-mapping -fcoverage-mapping -### %s 2>&1 | FileCheck %s --check-prefix=CHECK_COV_INVO
30+
// CHECK_COV_INVO:{{.*}}clang{{.*}}-fsycl-is-device{{.*}}"-fsycl-is-native-cpu" "-D" "__SYCL_NATIVE_CPU__"{{.*}}"-fprofile-instrument=clang"{{.*}}"-fcoverage-mapping" "-fcoverage-compilation-dir={{.*}}"
31+
// CHECK_COV_INVO:{{.*}}clang{{.*}}"-fsycl-is-host"{{.*}}"-fprofile-instrument=clang"{{.*}}"-fcoverage-mapping" "-fcoverage-compilation-dir={{.*}}"
32+
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
// RUN: %clang_cc1 -fsycl-is-device -ast-dump %s | FileCheck %s
2+
//
3+
// Checks that the compound statement of the implicitly generated kernel body
4+
// has a valid source location (containing "line"). Previously this location
5+
// was invalid containing "<<invalid sloc>>" which causes asserts in the
6+
// llvm profiling tools.
7+
8+
#include "Inputs/sycl.hpp"
9+
10+
struct Functor {
11+
void operator()() const {}
12+
};
13+
14+
// CHECK: FunctionDecl {{.*}} _ZTS7Functor 'void ()'
15+
// CHECK-NEXT: |-CompoundStmt {{.*}} <{{.*}}line{{.*}}>
16+
17+
int main() {
18+
19+
sycl::queue().submit([&](sycl::handler &cgh) {
20+
cgh.single_task(Functor{});
21+
});
22+
}

sycl/doc/design/SYCLNativeCPU.md

Lines changed: 28 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,10 @@
11
# SYCL Native CPU
22

3-
The SYCL Native CPU flow aims at treating the host CPU as a "first class citizen", providing a SYCL implementation that targets CPUs of various different architectures, with no other dependencies than DPC++ itself, while bringing performances comparable to state-of-the-art CPU backends.
3+
The SYCL Native CPU flow aims at treating the host CPU as a "first class citizen", providing a SYCL implementation that targets CPUs of various different architectures, with no other dependencies than DPC++ itself, while bringing performances comparable to state-of-the-art CPU backends. SYCL Native CPU also provides some initial/experimental support for LLVM's [source-based code coverage tools](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html) (see also section [Code coverage](#code-coverage)).
44

55
# Compiler and runtime options
66

7-
The SYCL Native CPU flow is enabled by setting `native_cpu` as a `sycl-target` (please note that currently doing so overrides any other SYCL target specified in the compiler invocation):
7+
The SYCL Native CPU flow is enabled by setting `native_cpu` as a `sycl-target`:
88

99
```
1010
clang++ -fsycl -fsycl-targets=native_cpu <input> -o <output>
@@ -28,9 +28,16 @@ clang++ <device-ir> -o <device-o>
2828
clang++ -L<sycl-lib-path> -lsycl <device-o> <host-o> -o <output>
2929
```
3030

31+
Note that SYCL Native CPU co-exists alongside the other SYCL targets. For example, the following command line builds SYCL code simultaneously for SYCL Native CPU and for OpenCL.
32+
33+
```
34+
clang++ -fsycl -fsycl-targets=native_cpu,spir64 <input> -o <output>
35+
```
36+
The application can then run on either SYCL target by setting the DPC++ `ONEAPI_DEVICE_SELECTOR` environment variable accordingly.
37+
3138
## Configuring DPC++ with SYCL Native CPU
3239

33-
SYCL Native CPU needs to be enabled explictly when configuring DPC++, using `--native_cpu`, e.g.
40+
SYCL Native CPU needs to be enabled explicitly when configuring DPC++, using `--native_cpu`, e.g.
3441

3542
```
3643
python buildbot/configure.py \
@@ -86,7 +93,19 @@ Whole Function Vectorization is enabled by default, and can be controlled throug
8693
* `-mllvm -sycl-native-cpu-no-vecz`: disable Whole Function Vectorization.
8794
* `-mllvm -sycl-native-cpu-vecz-width`: sets the vector width to the specified value, defaults to 8.
8895

89-
For more details on how the Whole Function Vectorizer is integrated for SYCL Native CPU, refer to the [Technical details[(#technical-details) section.
96+
For more details on how the Whole Function Vectorizer is integrated for SYCL Native CPU, refer to the [Technical details](#technical-details) section.
97+
98+
# Code coverage
99+
100+
SYCL Native CPU has experimental support for LLVM's source-based [code coverage](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html). This enables coverage testing across device and host code.
101+
Example usage:
102+
103+
```bash
104+
clang.exe -fsycl -fsycl-targets=native_cpu -fprofile-instr-generate -fcoverage-mapping %fname% -o vector-add.exe
105+
.\vector-add.exe
106+
llvm-profdata merge -sparse default.profraw -o foo.profdata
107+
llvm-cov show .\vector-add.exe -instr-profile=foo.profdata
108+
```
90109

91110
## Ongoing work
92111

@@ -95,7 +114,7 @@ For more details on how the Whole Function Vectorizer is integrated for SYCL Nat
95114
* Subgroup support
96115
* Performance optimizations
97116

98-
### Please note that Windows support is temporarily disabled due to some implementation details, it will be reinstantiated soon.
117+
### Please note that Windows is partially supported but temporarily disabled due to some implementation details, it will be re-enabled soon.
99118

100119
# Technical details
101120

@@ -140,13 +159,13 @@ entry:
140159
}
141160
```
142161

143-
For the SYCL Native CPU target, the device compiler is in charge of materializing the SPIRV builtins (such as `@__spirv_BuiltInGlobalInvocationId`), so that they can be correctly updated by the runtime when executing the kernel. This is performed by the [PrepareSYCLNativeCPU pass](https://github.com/intel/llvm/blob/sycl/llvm/lib/SYCLLowerIR/PrepareSYCLNativeCPU.cpp).
162+
For the SYCL Native CPU target, the device compiler is in charge of materializing the SPIRV builtins (such as `@__spirv_BuiltInGlobalInvocationId`), so that they can be correctly updated by the runtime when executing the kernel. This is performed by the [PrepareSYCLNativeCPU pass](https://github.com/intel/llvm/blob/sycl/llvm/lib/SYCLNativeCPUUtils/PrepareSYCLNativeCPU.cpp).
144163
The PrepareSYCLNativeCPUPass also emits a `subhandler` function, which receives the kernel arguments from the SYCL runtime (packed in a vector), unpacks them, and forwards only the used ones to the actual kernel.
145164

146165

147166
## PrepareSYCLNativeCPU Pass
148167

149-
This pass will add a pointer to a `nativecpu_state` struct as kernel argument to all the kernel functions, and it will replace all the uses of SPIRV builtins with the return value of appropriately defined functions, which will read the requested information from the `__nativecpu_state` struct. The `__nativecpu_state` struct and the builtin functions are defined in [native_cpu.hpp](https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/detail/native_cpu.hpp).
168+
This pass will add a pointer to a `native_cpu::state` struct as kernel argument to all the kernel functions, and it will replace all the uses of SPIRV builtins with the return value of appropriately defined functions, which will read the requested information from the `native_cpu::state` struct. The `native_cpu::state` struct is defined in the [native_cpu UR adapter](https://github.com/oneapi-src/unified-runtime/blob/main/source/adapters/native_cpu/nativecpu_state.hpp) and the builtin functions are defined in the [native_cpu device library](https://github.com/intel/llvm/blob/sycl/libdevice/nativecpu_utils.cpp).
150169

151170

152171
The resulting IR is:
@@ -188,11 +207,11 @@ entry:
188207
}
189208
```
190209

191-
As you can see, the `subhandler` steals the kernel's function name, and receives two pointer arguments: the first one points to the kernel arguments from the SYCL runtime, and the second one to the `__nativecpu_state` struct.
210+
As you can see, the `subhandler` steals the kernel's function name, and receives two pointer arguments: the first one points to the kernel arguments from the SYCL runtime, and the second one to the `nativecpu::state` struct.
192211

193212
## Handling barriers
194213

195-
On SYCL Native CPU, calls to `__spirv_ControlBarrier` are handled using the `WorkItemLoopsPass` from the oneAPI Construction Kit. This pass handles barriers by splitting the kernel between calls calls to `__spirv_ControlBarrier`, and creating a wrapper that runs the subkernels over the local range. In order to correctly interface to the oneAPI Construction Kit pass pipeline, SPIRV builtins are converted to `mux` builtins (used by the OCK) by the `ConvertToMuxBuiltinsSYCLNativeCPUPass`.
214+
On SYCL Native CPU, calls to `__spirv_ControlBarrier` are handled using the `WorkItemLoopsPass` from the oneAPI Construction Kit. This pass handles barriers by splitting the kernel between calls to `__spirv_ControlBarrier`, and creating a wrapper that runs the subkernels over the local range. In order to correctly interface to the oneAPI Construction Kit pass pipeline, SPIRV builtins are defined in the device library to call the corresponding `mux` builtins (used by the OCK).
196215

197216
## Vectorization
198217

sycl/test/native_cpu/vector-add.cpp

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,13 @@
1515
// RUN: %clangxx -fsycl -fsycl-targets=native_cpu -mllvm -sycl-native-cpu-vecz-width=4 %s -g -o %t-vec
1616
// RUN: env ONEAPI_DEVICE_SELECTOR="native_cpu:cpu" %t-vec
1717

18+
// Ensure coverage options work in the compiler invocations.
19+
// For builds with asserts enabled we also need to pass the option
20+
// -mllvm -system-headers-coverage
21+
// We need to also check if clang-rt is built and then run the executable and
22+
// verify the (profiling) outputs.
23+
// RUN: %clangxx -fsycl -fsycl-targets=native_cpu %s -fprofile-instr-generate -fcoverage-mapping -mllvm -system-headers-coverage -c -o %t
24+
1825
#include <sycl/sycl.hpp>
1926

2027
#include <array>

0 commit comments

Comments
 (0)