|
| 1 | +# A brief overview of kernel/program caching mechanism. |
| 2 | + |
| 3 | +The cache is employed when one submits kernel for execution or builds program or |
| 4 | +kernel with SYCL API. At the same time programs and kernels are cached only when |
| 5 | +they're built from C++ source, i.e. `program::build_with_kernel_type<>()` and |
| 6 | +`program::get_kernel<>()` methods are employed. This restriction is implemented |
| 7 | +via use of `program_impl::is_cacheable_with_options()` and |
| 8 | +`program_impl::is_cacheable()` methods. The latter method only returns a boolean |
| 9 | +flag which is set to false on default and is set to true in a single use-case. |
| 10 | +One can find use-cases and cache filling in the [unit-tests](https://github.com/intel/llvm/blob/sycl/sycl/unittests/kernel-and-program/Cache.cpp). |
| 11 | + |
| 12 | +How does it work, i.e. at which point is the cache employed? At some point of |
| 13 | +`ExecCGCommand`'s enqueue process the program manager's method will be called: |
| 14 | +either `ProgramManager::getBuildPIProgram` or |
| 15 | +`ProgramManager::getOrCreateKernel`. Now, both these methods will call template |
| 16 | +function [`getOrBuild`](../source/detail/program_manager/program_manager.cpp#L149) |
| 17 | +with multiple lambdas passed to it: |
| 18 | + - Acquire function; |
| 19 | + - GetCache function; |
| 20 | + - Build function. |
| 21 | + |
| 22 | +Acquire function returns a locked version of cache. Locking is employed for |
| 23 | +thread safety. The threads are blocked only for insert-or-acquire attempt, i.e. |
| 24 | +when calling to `map::insert` in [`getOrBuild`](../source/detail/program_manager/program_manager.cpp#L149) |
| 25 | +function. The rest of operation is done with the help of atomics and condition |
| 26 | +variables (plus a mutex for proper work of condition variable). |
| 27 | + |
| 28 | +GetCache function returns a reference to mapping `key->value` out of locked |
| 29 | +instance of cache. We will see rationale behind it a bit later. |
| 30 | + |
| 31 | +Build function actually builds the kernel or program. |
| 32 | + |
| 33 | +When we say "cache" we think about mapping of some key to value. These maps are |
| 34 | +contained within [KernelProgramCache](https://github.com/intel/llvm/blob/sycl/sycl/source/detail/kernel_program_cache.hpp) |
| 35 | +class instance which on its own belongs to `context_impl` class instance. |
| 36 | +Kernel cache is per program mapping of kernel name plus device pair to |
| 37 | +`BuildResult<PiKernel>`<sup>[1](#remove-pointer)</sup>. When `getOrBuild` |
| 38 | +function is called the key for kernel cache is pair/tuple of kernel name and |
| 39 | +device. Program cache maps triple (spec consts, kernel set id, device) to |
| 40 | +`BuildResult<PiProgram>`<sup>[1](#remove-pointer)</sup>. |
| 41 | + |
| 42 | +Now, we have a helper [Locked](https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/detail/locked.hpp) |
| 43 | +class. It's to use RAII to make code look cleaner a bit. Acquire function/lambda |
| 44 | +will return a specialization of Locked class for reference of proper mapping. |
| 45 | +Now, GetCache function will return the mapping to be employed i.e. it'll fetch |
| 46 | +mapping of kernel name plus device to `BuildResult` for proper program as |
| 47 | +`getOrBuild` will work with mapping of key (whichever it is) to `BuildResult` |
| 48 | +specialization. |
| 49 | + |
| 50 | +`BuildResult` structure contains three information fields - pointer to built |
| 51 | +kernel/program, build error (if applicable) and current build status |
| 52 | +(either of "in progress", "succeeded", "failed"). |
| 53 | + |
| 54 | +Now, how `getOrBuild` works? |
| 55 | +First, we fetch the cache with sequential calls to Acquire and GetCache |
| 56 | +functions. Then, we check if we're the first ones who build this kernel/program. |
| 57 | +This is achieved with attempt to insert another key-value pair into the map. |
| 58 | +At this point we try to insert `BuildResult` stub instance with status equal to |
| 59 | +"in progress" which will allow other threads to know that someone is (i.e. |
| 60 | +we're) building the object (i.e. kernel or program) now. If insertion fails we |
| 61 | +will wait for building thread to finish with call to `waitUntilBuilt` function. |
| 62 | +This function will throw stored exception<sup>[2](#exception-data)</sup> upon |
| 63 | +build failure. This allows waiting threads to result the same as the building |
| 64 | +thread. Special case of the failure is when build result doesn't contain the |
| 65 | +error (i.e. the error wasn't of `cl::sycl::exception` type) and the pointer to |
| 66 | +object in `BuildResult` instance is nil. In this case the building thread has |
| 67 | +finished build process and returned the user an error. Though, this error could |
| 68 | +be of spurious/sporadic nature. Hence, the waiting thread will try to build the |
| 69 | +same object once more. |
| 70 | + |
| 71 | +`BuildResult` structure also contains synchronization objects: mutex and |
| 72 | +condition variable. We employ them to signal waiting threads that the build |
| 73 | +process for this kernl/program is finished (either successfuly or with a |
| 74 | +failure). |
| 75 | + |
| 76 | + |
| 77 | +<a name="remove-pointer">1</a>: The use of `std::remove_pointer` was omitted in sake of |
| 78 | +simplicity here. |
| 79 | + |
| 80 | +<a name="exception-data">2</a>: Actually, we store contents of the exception: its message and |
| 81 | +error code. |
| 82 | + |
0 commit comments