Skip to content

Commit 9454aee

Browse files
author
Sergey Kanaev
committed
[Doc] Add overview of kernel-program caching
Signed-off-by: Sergey Kanaev <[email protected]>
1 parent a476a49 commit 9454aee

File tree

1 file changed

+82
-0
lines changed

1 file changed

+82
-0
lines changed

sycl/doc/KernelProgramCache.md

Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,82 @@
1+
# A brief overview of kernel/program caching mechanism.
2+
3+
The cache is employed when one submits kernel for execution or builds program or
4+
kernel with SYCL API. At the same time programs and kernels are cached only when
5+
they're built from C++ source, i.e. `program::build_with_kernel_type<>()` and
6+
`program::get_kernel<>()` methods are employed. This restriction is implemented
7+
via use of `program_impl::is_cacheable_with_options()` and
8+
`program_impl::is_cacheable()` methods. The latter method only returns a boolean
9+
flag which is set to false on default and is set to true in a single use-case.
10+
One can find use-cases and cache filling in the [unit-tests](https://github.com/intel/llvm/blob/sycl/sycl/unittests/kernel-and-program/Cache.cpp).
11+
12+
How does it work, i.e. at which point is the cache employed? At some point of
13+
`ExecCGCommand`'s enqueue process the program manager's method will be called:
14+
either `ProgramManager::getBuildPIProgram` or
15+
`ProgramManager::getOrCreateKernel`. Now, both these methods will call template
16+
function [`getOrBuild`](../source/detail/program_manager/program_manager.cpp#L149)
17+
with multiple lambdas passed to it:
18+
- Acquire function;
19+
- GetCache function;
20+
- Build function.
21+
22+
Acquire function returns a locked version of cache. Locking is employed for
23+
thread safety. The threads are blocked only for insert-or-acquire attempt, i.e.
24+
when calling to `map::insert` in [`getOrBuild`](../source/detail/program_manager/program_manager.cpp#L149)
25+
function. The rest of operation is done with the help of atomics and condition
26+
variables (plus a mutex for proper work of condition variable).
27+
28+
GetCache function returns a reference to mapping `key->value` out of locked
29+
instance of cache. We will see rationale behind it a bit later.
30+
31+
Build function actually builds the kernel or program.
32+
33+
When we say "cache" we think about mapping of some key to value. These maps are
34+
contained within [KernelProgramCache](https://github.com/intel/llvm/blob/sycl/sycl/source/detail/kernel_program_cache.hpp)
35+
class instance which on its own belongs to `context_impl` class instance.
36+
Kernel cache is per program mapping of kernel name plus device pair to
37+
`BuildResult<PiKernel>`<sup>[1](#remove-pointer)</sup>. When `getOrBuild`
38+
function is called the key for kernel cache is pair/tuple of kernel name and
39+
device. Program cache maps triple (spec consts, kernel set id, device) to
40+
`BuildResult<PiProgram>`<sup>[1](#remove-pointer)</sup>.
41+
42+
Now, we have a helper [Locked](https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/detail/locked.hpp)
43+
class. It's to use RAII to make code look cleaner a bit. Acquire function/lambda
44+
will return a specialization of Locked class for reference of proper mapping.
45+
Now, GetCache function will return the mapping to be employed i.e. it'll fetch
46+
mapping of kernel name plus device to `BuildResult` for proper program as
47+
`getOrBuild` will work with mapping of key (whichever it is) to `BuildResult`
48+
specialization.
49+
50+
`BuildResult` structure contains three information fields - pointer to built
51+
kernel/program, build error (if applicable) and current build status
52+
(either of "in progress", "succeeded", "failed").
53+
54+
Now, how `getOrBuild` works?
55+
First, we fetch the cache with sequential calls to Acquire and GetCache
56+
functions. Then, we check if we're the first ones who build this kernel/program.
57+
This is achieved with attempt to insert another key-value pair into the map.
58+
At this point we try to insert `BuildResult` stub instance with status equal to
59+
"in progress" which will allow other threads to know that someone is (i.e.
60+
we're) building the object (i.e. kernel or program) now. If insertion fails we
61+
will wait for building thread to finish with call to `waitUntilBuilt` function.
62+
This function will throw stored exception<sup>[2](#exception-data)</sup> upon
63+
build failure. This allows waiting threads to result the same as the building
64+
thread. Special case of the failure is when build result doesn't contain the
65+
error (i.e. the error wasn't of `cl::sycl::exception` type) and the pointer to
66+
object in `BuildResult` instance is nil. In this case the building thread has
67+
finished build process and returned the user an error. Though, this error could
68+
be of spurious/sporadic nature. Hence, the waiting thread will try to build the
69+
same object once more.
70+
71+
`BuildResult` structure also contains synchronization objects: mutex and
72+
condition variable. We employ them to signal waiting threads that the build
73+
process for this kernl/program is finished (either successfuly or with a
74+
failure).
75+
76+
77+
<a name="remove-pointer">1</a>: The use of `std::remove_pointer` was omitted in sake of
78+
simplicity here.
79+
80+
<a name="exception-data">2</a>: Actually, we store contents of the exception: its message and
81+
error code.
82+

0 commit comments

Comments
 (0)