
[WIP] Refactoring of KernelSum and KernelProduct #73


Closed

Conversation

theogf
Member

@theogf theogf commented Mar 30, 2020

Following #68, we decided to refactor KernelSum and KernelProduct to contain only two kernels, to avoid abstract types and unnecessary dispatch. There is still work to do to check whether, for kernels sharing the same metric AND transform, one can save some computation time.

@theogf theogf added the WIP label Mar 30, 2020
@willtebbutt
Member

willtebbutt commented Apr 8, 2020

This seems reasonable to me @theogf -- it's what Stheno has always done for precisely the same reasons.

I wonder whether we can do better though. Another option is to allow for one of a couple of containers, rather than requiring only two elements -- specifically either a Tuple or AbstractVector of things. This handles all of the following cases gracefully:

  • k1 + k2 yields Sum{Tuple{typeof(k1), typeof(k2)}}
  • k1 + k2 + k3 + ... yields Sum{Tuple{typeof(k1), typeof(k2), typeof(k3), ...}}
  • sum(ks), where ks is either an AbstractVector or a Tuple of kernels, yields Sum{typeof(ks)}, where typeof(ks) is either AbstractVector{<:Kernel} or whatever the tuple type looks like.

(The above also applies to * / prod.) The advantage of this is that you still get to flatten things like k1 + k2 + k3 and sum(ks) into a single kernel, which opens up some opportunities for optimisation and makes the types simpler, without introducing type instability in the k1 + k2 case.

Not requiring the container to be a tuple type allows for large combinations of kernels. Say you want to implement an N-component spectral mixture kernel. If N is large, this will yield large compilation times, and the compiler might even hit a fallback of some kind. Moreover, in the spectral mixture case it's quite possible that the sum will comprise N kernels of the same type, even if they have different numerical values for the length scale and variance. In this case, you might store them in a Vector{some_concrete_kernel_type}, which doesn't lead to any type instabilities.
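To make the container idea concrete, here is a rough sketch of what the constructors could look like. All names here are hypothetical (Sum and the stand-in Kernel type only mirror the naming in this comment, not the package's API):

# Rough sketch of the proposed container design; names are illustrative only.
abstract type Kernel end   # stand-in for the package's abstract kernel type

struct Sum{Tks} <: Kernel
    ks::Tks   # either a Tuple of kernels or an AbstractVector of kernels
end

# Binary + produces a Tuple-backed Sum with fully concrete element types.
Base.:+(k1::Kernel, k2::Kernel) = Sum((k1, k2))

# Flatten k1 + k2 + k3 into a single Sum instead of nesting Sums inside Sums.
Base.:+(ksum::Sum, k::Kernel) = Sum((ksum.ks..., k))

# Many kernels of one concrete type (e.g. an N-component spectral mixture)
# can be stored in a Vector, avoiding huge tuple types and type instability.
Base.sum(ks::AbstractVector{<:Kernel}) = Sum(ks)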

There is still work to do to check whether, for kernels sharing the same metric AND transform, one can save some computation time.

My intuition has always been that this optimisation is likely not worth it. Do you have any particular kernels in mind for which this is going to be helpful?

@devmotion
Member

My intuition has always been that this optimisation is likely not worth it. Do you have any particular kernels in mind for which this is going to be helpful?

I suggested this since I was wondering if it could be helpful, e.g., for the computation of kernelmatrix. Currently (and also in this PR), a separate kernel matrix is computed for every component of a KernelSum (or KernelProduct) and the matrices are then summed (or multiplied). I assumed it might be more efficient to compute the pairwise distances only once and then map them once with a combined sum (or product) of the kappa functions, if all components are based on the same metric.

In the case of two components or a tuple of components, the compiler could figure out whether such an optimization is possible, I guess, provided the metric system is improved to a proper trait. In the case of vectorized components, one could check at runtime whether this optimization is possible.
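For concreteness, a hedged sketch of this optimisation, assuming per-kernel metric(k) and kappa(k, x) helpers in the spirit of the existing kernel interface (kernelmatrix_shared_metric is a made-up name, not code from this PR):

using Distances: pairwise

# Sketch only: if every component kernel is based on the same metric, the
# pairwise distances can be computed once and each component only applies its
# kappa function to that single matrix.
function kernelmatrix_shared_metric(ks, X::AbstractMatrix, Y::AbstractMatrix)
    d = metric(first(ks))             # assumed to be identical for all kernels in ks
    D = pairwise(d, X, Y; dims = 2)   # one distance computation for the whole sum
    return sum(k -> map(x -> kappa(k, x), D), ks)   # combined map of the kappa functions
end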

@willtebbutt
Member

Okay, my question was more of the form: can you give me some kernels for which this optimisation would be helpful, that I can't trivially re-write as a more efficient kernel, and that I actually care about?

For example, Matern32() + Matern32() doesn't count, because I can trivially re-write it as 2 * Matern32(). Similarly, I would argue that Matern32() + Matern12() doesn't count, because I don't think I really care about that particular kernel: you will always have different length scales for the different kernels, which, in general, requires the construction of different distance matrices.

@theogf
Member Author

theogf commented Apr 8, 2020

Yeah, that's a fair point. Theoretically it is possible and it would lead to a performance improvement, but in practice such a case would never be found! To stretch it to the maximum, in the case of SqEuclidean and Euclidean, with multiple kernels having isotropic lengthscales it would be possible to compute the untransformed pairwise distance matrix once and scale/square/sqrt the values appropriately afterwards.
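For illustration only (toy data and a made-up lengthscale; a sketch of the trick, not code from this PR):

using Distances: pairwise, SqEuclidean

X, Y = rand(3, 10), rand(3, 8)   # toy inputs, observations as columns
ℓ = 2.0                          # isotropic lengthscale (made up)

# Compute the untransformed pairwise matrix once ...
D2 = pairwise(SqEuclidean(), X, Y; dims = 2)

# ... and recover the matrices for rescaled inputs by cheap elementwise ops.
D2ℓ = D2 ./ ℓ^2      # squared Euclidean distances between x/ℓ and y/ℓ
Dℓ  = sqrt.(D2ℓ)     # the corresponding (non-squared) Euclidean distances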

@willtebbutt
Member

willtebbutt commented Apr 8, 2020

with multiple kernels having isotropic lengthscales it would be possible to compute the untransformed pairwise distance matrix once and scale/square/sqrt the values appropriately afterwards.

Yeah, this is the only case that I can see this helping in. It might actually come up fairly often in 1 / low dimensions if you've got a sum of isotropic kernels with different length scales or something. No idea whether it's ever a bottleneck though.

In any case, I'm certainly not opposed to someone figuring out whether this is really worth it / how to make it work in cases that we care about, but it doesn't (and probably shouldn't) happen in this PR.

@theogf
Member Author

theogf commented Apr 8, 2020

Regarding what you said about Tuple vs Vector, do you know at what number of kernels a Tuple starts to cause a performance decrease?
Generally the whole thing seems a bit messy to me; given that there is no optimization in the background for now, having large tuples or sums of sums etc. does not make any difference.

@willtebbutt
Member

willtebbutt commented Apr 8, 2020

Regarding what you said about Tuple vs Vector, do you know at what number of kernels a Tuple starts to cause a performance decrease?

Not entirely sure. There is a number somewhere in wherever Tuples are implemented, but the rule of thumb is that you shouldn't use Tuples to encode more than a handful of things -- in my experience they quickly create noticeably slower compile times when they're large, and they inflate your type signatures horribly. They're just not the right choice if you're trying to hold lots of things.

Generally the whole thing seems a bit messy to me; given that there is no optimization in the background for now, having large tuples or sums of sums etc. does not make any difference

I'm not sure that I follow. For example, the Sum implementation would look something like

struct Sum{Tks} <: Kernel
    ks::Tks   # either a Tuple of kernels or an AbstractVector of kernels
end

function kernelmatrix(
    κ::Sum,
    X::AbstractMatrix,
    Y::AbstractMatrix;
    obsdim::Int = defaultobs,
)
    # Compute each component's kernel matrix and add them elementwise.
    return sum(map(k -> _kernelmatrix(k, X, Y, obsdim), κ.ks))
end

Have I missed something?

@theogf
Member Author

theogf commented Apr 8, 2020

The whole thing of having an "arbitrary" threshold on the number of kernels seems a bit weird. And I thought having two different parametrizations would be complicated, but you just showed that the common interface is actually pretty simple.

@theogf theogf mentioned this pull request Apr 27, 2020
@theogf theogf added this to the Road to 1.0 milestone Apr 28, 2020
@theogf
Member Author

theogf commented Aug 4, 2020

Closed in favor of #146

@theogf theogf closed this Aug 4, 2020
@theogf theogf deleted the refactor_sum_product branch November 24, 2020 16:34