sycl/doc/KernelProgramCache.md
Lines changed: 73 additions & 63 deletions
@@ -1,8 +1,15 @@
-# A brief overview of kernel/program caching mechanism.
+# A brief overview of the kernel and program caching mechanism.

 ## Rationale behind caching

-*Use-case #1.* Looped enqueue of the same kernel:
+During SYCL program execution, the SYCL runtime creates internal objects
+representing kernels and programs; it may also invoke the JIT compiler to bring
+a program's kernels to an executable state. These runtime operations are quite
+expensive, and in some cases caching can be employed to eliminate redundant
+re-creation of kernel and program objects and online recompilation. The
+examples below illustrate scenarios where such an optimization is possible.
+
+*Use-case #1.* Submission of the same kernel in a loop:
 ```C++
 using namespace cl::sycl::queue;

@@ -23,7 +30,7 @@
 }
 ```

-*Use-case #2.* Enqueue of multiple kernels within a single program<sup>[1](#what-is-program)</sup>:
+*Use-case #2.* Submission of multiple kernels within a single program<sup>[1](#what-is-program)</sup>:
 ```C++
 using namespace cl::sycl::queue;

@@ -56,36 +63,36 @@
 });
 ```

-Both these use-cases will need to built the program or kernel multiple times.
-When JIT is employed this process may take quite a lot of time.
+In both cases the SYCL runtime needs to build the program and kernels multiple
+times, which may involve JIT compilation and take quite a lot of time.

-In order to eliminate this waste of run-time we introduce a kernel/program
+In order to eliminate this waste of run time, we introduce kernel and program
 caching. The cache is per-context and it caches underlying objects of non
 interop kernels and programs which are built with no options.

-<a name="what-is-program">1</a>: Here we use the term "program" in the same
-sense as OpenCL does i.e. a set of kernels.
+<a name="what-is-program">1</a>: Here "program" means an internal SYCL runtime
+object corresponding to a SPIR-V module or native binary defining a set of SYCL
+kernels and/or device functions.
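The caching rule stated above can be illustrated with a minimal C++ sketch. `ObjectOriginSketch` and `isEligibleForCaching` are hypothetical names made up for this illustration and are not part of the SYCL runtime. The sketch encodes only what the text says: objects created through interop, or built with options, are not cached.

```cpp
#include <string>

// Hypothetical description of how a program or kernel object was obtained.
struct ObjectOriginSketch {
  bool CreatedFromInterop;  // true if wrapped from a native (interop) handle
  std::string BuildOptions; // options passed to the build; empty if none
};

// Eligible for caching only if the object is non-interop and was built
// with no options, per the rule described in the text.
bool isEligibleForCaching(const ObjectOriginSketch &O) {
  return !O.CreatedFromInterop && O.BuildOptions.empty();
}
```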


 ## Data structure of cache

-The cache stores underlying PI objects of `cl::sycl::program` and
-`cl::sycl::kernel` in a per-context data storage. The storage consists of two
-maps: one is for programs and the other is for kernels.
+The cache stores the underlying PI objects behind `cl::sycl::program` and
+`cl::sycl::kernel` user-level objects in per-context data storage. The storage
+consists of two maps: one for programs and the other for kernels.

-Programs mapping's key consists of three components:
-kernel set id<sup>[1](#what-is-ksid)</sup>, specialized constants, device this
-program is built for.
+The programs map's key consists of three components: kernel set id<sup>[1](#what-is-ksid)</sup>,
+specialization constants, and the device this program is built for.

-Kernels mapping's key consists of three components too: program the kernel
+The kernels map's key also consists of three components: the program the kernel
 belongs to, kernel name<sup>[2](#what-is-kname)</sup>, device the program is
 built for.

-<a name="what-is-ksid">1</a>: Kernel set id is merely a number of translation
-unit which contains at least one kernel.
+<a name="what-is-ksid">1</a>: Kernel set id is an ordinal number of the device
+binary image the kernel is contained in.

-<a name="what-is-kname">2</a>: Kernel name is mangled class name which is
-provided to methods of `cl::sycl::handler` (e.g. `parallel_for` or
+<a name="what-is-kname">2</a>: Kernel name is the mangled name of the class used as
+the kernel ID, which is provided to methods of `cl::sycl::handler` (e.g. `parallel_for` or
 `single_task`).
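The two-map layout described above can be pictured with a minimal C++ sketch. It rests on several assumptions. PI handles are modeled as integer ids, so `std::map` gets a well-defined ordering. Specialization constants are modeled as a serialized byte vector. `KernelProgramCacheSketch` and `getOrBuildProgram` are hypothetical names, not the real `KernelProgramCache` interface, which may organize its storage differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for PI handles, modeled as integer ids.
using PiProgram = std::uintptr_t;
using PiKernel = std::uintptr_t;
using PiDevice = std::uintptr_t;

// Ordinal number of the device binary image (the "kernel set id").
using KernelSetId = std::size_t;
// Assumption: specialization constant values serialized to raw bytes.
using SpecConstsBlob = std::vector<unsigned char>;

// Programs map key: kernel set id, specialization constants, device.
using ProgramKey = std::tuple<KernelSetId, SpecConstsBlob, PiDevice>;
// Kernels map key: owning program, kernel name, device.
using KernelKey = std::tuple<PiProgram, std::string, PiDevice>;

// One instance of this storage would exist per context.
struct KernelProgramCacheSketch {
  std::map<ProgramKey, PiProgram> Programs;
  std::map<KernelKey, PiKernel> Kernels;
};

// Returns the cached program for Key, calling Build() only on a cache miss.
template <typename BuildFn>
PiProgram getOrBuildProgram(KernelProgramCacheSketch &Cache,
                            const ProgramKey &Key, BuildFn Build) {
  auto It = Cache.Programs.find(Key);
  if (It != Cache.Programs.end())
    return It->second; // hit: no rebuild, no JIT
  PiProgram Built = Build();
  Cache.Programs.emplace(Key, Built);
  return Built;
}
```

Because the device is part of the key, the same kernel set built for two devices occupies two entries.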


@@ -102,19 +109,23 @@ provided to methods of `cl::sycl::handler` (e.g. `parallel_for` or
 ## Implementation details

 The caches are represented with instance of [`KernelProgramCache`](https://github.com/intel/llvm/blob/sycl/sycl/source/detail/kernel_program_cache.hpp)
-class. The class is instantiated in a per-context manner.
+class. The runtime creates one instance of the class per distinct SYCL context
+(a context object that is a copy of another context object is not
+"distinct", as it corresponds to the same underlying internal object
+representing a context).

-The `KernelProgramCache` is the storage descrived above.
+The `KernelProgramCache` is essentially a pair of maps as described above.
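The "one cache per distinct context" rule above can be modeled with a minimal C++ sketch. It assumes a user-level context is a thin handle holding a `std::shared_ptr` to an implementation object that owns the cache. All names here (`ContextSketch`, `ContextImplSketch`, `CacheSketch`) are hypothetical, and the maps are simplified stand-ins for the real programs and kernels maps.

```cpp
#include <map>
#include <memory>
#include <string>

// Simplified stand-in for the per-context pair of maps.
struct CacheSketch {
  std::map<std::string, int> Programs;
  std::map<std::string, int> Kernels;
};

// Hypothetical internal object: owns exactly one cache.
struct ContextImplSketch {
  CacheSketch Cache;
};

// Hypothetical user-level handle: copies share the same internal object.
class ContextSketch {
public:
  ContextSketch() : Impl(std::make_shared<ContextImplSketch>()) {}
  CacheSketch &cache() { return Impl->Cache; }

private:
  std::shared_ptr<ContextImplSketch> Impl;
};
```

Copying a `ContextSketch` copies only the shared pointer, so the copy sees the same cache. A separately constructed context gets its own cache. This mirrors SYCL's common reference semantics for `cl::sycl::context`.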


 ### When does the cache come at work?

-The cache is employed when one submits kernel for execution or builds program or
-kernel with SYCL API. That means that the cache works when either user