# Design

## [Why AbstractVectors Everywhere?](@id why_abstract_vectors)

To understand the advantages of using `AbstractVector`s everywhere to represent collections of inputs, first consider the following properties that it is desirable for a collection of inputs to satisfy.

#### Unique Ordering

There must be a clearly-defined first, second, etc. element of an input collection.
If this were not the case, it would not be possible to determine a unique mapping between a collection of inputs and the output of `kernelmatrix`, as it would be unclear in what order the rows and columns of the output should appear.

Moreover, ordering guarantees that if you permute the collection of inputs, the rows and columns of the `kernelmatrix` are correspondingly permuted.
| 13 | + |
| 14 | +#### Generality |
| 15 | + |
| 16 | +There must be no restriction on the domain of the input. |
| 17 | +Collections of `Real`s, vectors, graphs, finite-dimensional domains, or really anything else that you fancy should be straightforwardly representable. |
| 18 | +Moreover, whichever input class is chosen should not prevent optimal performance from being obtained. |
| 19 | + |
| 20 | +#### Unambiguously-Defined Length |
| 21 | + |
| 22 | +Knowing the length of a collection of inputs is important. |
| 23 | +For example, a well-defined length guarantees that the size of the output of `kernelmatrix`, |
| 24 | +and related functions, are predictable. |
| 25 | +It also makes it possible to perform internal error-checking that ensures that e.g. there |
| 26 | +are the same number of inputs in two collections of inputs. |
| 27 | + |
| 28 | + |
| 29 | + |
### AbstractMatrices Do Not Cut It

Notably, while `AbstractMatrix` objects are often used to represent collections of vector-valued inputs, they do _not_ immediately satisfy these properties, as it is unclear whether a matrix of size `P x Q` represents a collection of `P` `Q`-dimensional inputs (each row is an input), or `Q` `P`-dimensional inputs (each column is an input).

Moreover, they occasionally add some aesthetic inconvenience.
For example, a collection of `Real`-valued inputs, which might be straightforwardly represented as an `AbstractVector{<:Real}`, must be reshaped into a matrix.
There are two commonly used ways to partly resolve these shortcomings:

#### Resolution 1: Specify a Convention

One way that these shortcomings can be partly resolved is by specifying a convention that everyone adheres to regarding the interpretation of rows vs columns.
However, opinions about the choice of convention are often surprisingly strongly held, and users regularly have to remind themselves _which_ convention has been chosen.
While this resolves the ordering problem, and in principle defines the "length" of a collection of inputs, `AbstractMatrix`s already have a `length` defined in Julia, which would generally disagree with our internal notion of `length`.
This isn't a show-stopper, but it isn't an especially clean situation.

There is also the opportunity for some kinds of silent bugs.
For example, if an input matrix happens to be square because the number of input dimensions is the same as the number of inputs, it would be hard to know whether the correct `kernelmatrix` has been computed.
This kind of bug seems unlikely, but the possibility exists regardless.
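To make this concrete, here is a sketch of the silent failure mode using an `obsdim`-style matrix interface (kernel and function names as in KernelFunctions.jl; the square data matrix is chosen deliberately):

```julia
using KernelFunctions

k = SqExponentialKernel()
X = randn(3, 3)  # square: 3 inputs of dimension 3 -- but are they rows or columns?

# Both calls succeed and both return a 3 x 3 matrix, but their entries generally
# differ, so passing the wrong `obsdim` produces no error -- only wrong answers.
K_rows = kernelmatrix(k, X; obsdim=1)  # each row treated as an input
K_cols = kernelmatrix(k, X; obsdim=2)  # each column treated as an input
```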
| 59 | + |
| 60 | +Finally, suppose that your inputs are some type `T` that is not simply a vector of real |
| 61 | +numbers, say a graph. |
| 62 | +In this situation, how should a collection of inputs be represented? |
| 63 | +A `N x 1` or `1 x N` matrix is the only obvious candidate, but the additional singular |
| 64 | +dimension seems somewhat redundant. |
| 65 | + |
#### Resolution 2: Always Specify An `obsdim` Argument

Another way to partly resolve these problems is to not commit to a convention, and instead to propagate some additional information through the codebase that specifies how the input data is to be interpreted.
For example, a kernel `k` that represents the sum of two other kernels might implement `kernelmatrix` as follows:
```julia
function kernelmatrix(k::KernelSum, x::AbstractMatrix; obsdim=1)
    return kernelmatrix(k.kernels[1], x; obsdim=obsdim) +
        kernelmatrix(k.kernels[2], x; obsdim=obsdim)
end
```
While this prevents this package from having to pre-specify a convention, it doesn't resolve the `length` issue, or the issue of representing collections of inputs which aren't immediately represented as vectors.
Moreover, it complicates the internals; in contrast, consider what this function looks like with an `AbstractVector`:
```julia
function kernelmatrix(k::KernelSum, x::AbstractVector)
    return kernelmatrix(k.kernels[1], x) + kernelmatrix(k.kernels[2], x)
end
```
This code is clearer (less visual noise), and removes a possible bug -- if the implementer of `kernelmatrix` forgets to pass the `obsdim` kwarg into each subsequent `kernelmatrix` call, it's possible to get the wrong answer.

This being said, we do support matrix-valued inputs -- see [Why We Have Support for Both](@ref).


### AbstractVectors

Requiring all collections of inputs to be `AbstractVector`s resolves all of these problems, and ensures that the data is self-describing to the extent that KernelFunctions.jl requires.

Firstly, the question of how to interpret the columns and rows of a matrix of inputs is resolved.
Users _must_ wrap matrices which represent collections of inputs in either a `ColVecs` or a `RowVecs`, both of which have clearly-defined semantics that are hard to confuse.

By design, there is also no discrepancy between the number of inputs in the collection and the `length` function -- the `length` of a `ColVecs`, `RowVecs`, or `Vector{<:Real}` is equal to the number of inputs.
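For example, a minimal sketch of the wrapper semantics:

```julia
using KernelFunctions

X = randn(2, 5)  # a 2 x 5 matrix of data

x_cols = ColVecs(X)  # interpret X as 5 inputs, each a 2-dimensional vector
x_rows = RowVecs(X)  # interpret X as 2 inputs, each a 5-dimensional vector

length(x_cols)  # 5 -- the number of inputs, not length(X) == 10
length(x_rows)  # 2
```

Both wrappers are thin views over the same underlying matrix, so choosing one over the other costs nothing in memory or time.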

There is no loss of performance.

A collection of `N` `Real`-valued inputs can be represented by an `AbstractVector{<:Real}` of `length` `N`, rather than needing to use an `AbstractMatrix{<:Real}` of size either `N x 1` or `1 x N`.
The same can be said for any other input type `T`, and new subtypes of `AbstractVector` can be added if particularly efficient ways exist to store collections of inputs of type `T`.
A good example of this in practice is using `Tuple{S, Int}`, for some input type `S`, as the [Inputs for Multiple Outputs](@ref inputs_for_multiple_outputs).

This approach can also lead to clearer user code.
A user need only wrap their inputs in a `ColVecs` or `RowVecs` once, and this specification is automatically re-used _everywhere_ in their code.
In this sense, it is straightforward to write code in such a way that there is one unique source of "truth" about the way in which a particular data set should be interpreted.
Conversely, the `obsdim` resolution requires that the `obsdim` keyword argument is passed around with the data _every_ _single_ _time_ that it is used.

The benefits of the `AbstractVector` approach are likely most strongly felt when writing a substantial amount of code on top of KernelFunctions.jl -- in the same way that using `AbstractVector`s inside KernelFunctions.jl removes the need for large amounts of keyword-argument propagation, the same will be true of other code.


### Why We Have Support for Both

In short: many people like matrices, and are familiar with `obsdim`-style keyword arguments.

All internals are implemented using `AbstractVector`s, though, and the `obsdim` interface is just a thin layer of utility functionality that sits on top of this.
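Concretely, the thin layer amounts to a correspondence along the following lines (a sketch, assuming the `obsdim` conventions used by KernelFunctions.jl, where `obsdim=1` treats rows as observations and `obsdim=2` treats columns as observations):

```julia
using KernelFunctions

k = SqExponentialKernel()
X = randn(2, 5)

# The matrix methods simply wrap `X` and dispatch to the `AbstractVector` methods:
kernelmatrix(k, X; obsdim=2) ≈ kernelmatrix(k, ColVecs(X))  # columns as inputs
kernelmatrix(k, X; obsdim=1) ≈ kernelmatrix(k, RowVecs(X))  # rows as inputs
```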




## [Kernels for Multiple-Outputs](@id inputs_for_multiple_outputs)

There are two equally-valid perspectives on multi-output kernels: they can either be treated as matrix-valued kernels, or as standard kernels on an extended input domain.
Each of these perspectives is convenient in different circumstances, but the latter greatly simplifies the incorporation of multi-output kernels in KernelFunctions.

More concretely, let `k_mat` be a matrix-valued kernel, mapping pairs of inputs of type `T` to matrices of size `P x P` to describe the covariance between `P` outputs.
Given inputs `x` and `y` of type `T`, and integers `p` and `q`, we can always find an equivalent standard kernel `k` mapping from pairs of inputs of type `Tuple{T, Int}` to the `Real`s as follows:
```julia
k((x, p), (y, q)) = k_mat(x, y)[p, q]
```
This ability to treat multi-output kernels as single-output kernels is very helpful, as it means that there is no need to introduce additional concepts into the API of KernelFunctions.jl, just additional kernels!
This in turn simplifies downstream code, as it doesn't need to "know" about the existence of multi-output kernels in addition to standard kernels.
For example, GP libraries built on top of KernelFunctions.jl just need to know about `Kernel`s, and they get multi-output kernels, and hence multi-output GPs, for free.
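As a concrete sketch of this in use, the `MOInput` type bundles a collection of inputs with output indices to form a single `AbstractVector` of `Tuple{T, Int}`s, which can then be passed to a multi-output kernel such as `IndependentMOKernel` (type names as in KernelFunctions.jl; the sizes below assume 4 inputs and 3 outputs):

```julia
using KernelFunctions

# 4 underlying inputs, each paired with each of 3 output indices,
# giving an AbstractVector of 12 (input, output_index) tuples.
x = MOInput(randn(4), 3)

# A multi-output kernel that models each output as an independent GP.
k = IndependentMOKernel(SqExponentialKernel())

K = kernelmatrix(k, x)  # an ordinary kernel matrix, of size 12 x 12
```

Note that nothing about this example required new API: `x` is just an `AbstractVector`, and `k` is just a `Kernel`.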

Where there is the need to specialise _implementations_ for multi-output kernels, this is done in an encapsulated manner -- parts of KernelFunctions that have nothing to do with multi-output kernels know _nothing_ about the existence of multi-output kernels.