
Add Spectral Mixture Kernel #80


Merged
merged 23 commits into JuliaGaussianProcesses:master on May 5, 2020

Conversation

sharanry
Contributor

Issue #44
Gaussian Spectral Mixture kernel function. The kernel function
parametrization depends on the sign of Q.

Let t (D×1) be an offset vector in data space, e.g. t = x-z. Then w (D×P)
are the weights, and m (D×|Q|) = 1/p and v (D×|Q|) = (2*pi*ell)^-2 are the spectral
means (frequencies) and variances, where p is the period and ell the length
scale of the Gabor function h(t2v, tm) given by the expression

    h(t2v, tm) = exp(-2 * pi^2 * t2v) .* cos(2 * pi * tm)

Then, the two covariances are obtained as follows:

SM, spectral mixture: Q>0 => P = 1

    k(x, y) = w' * h((t .* t)' * v, t' * m), t = x-y

SMP, spectral mixture product: Q<0 => P = D

    k(x, y) = prod(w' * h(T * T * v, T * m)), T = diag(t), t = x-y

Note that for D=1, the two modes +Q and -Q are exactly the same.
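
For concreteness, a minimal plain-Julia sketch of the two covariances above (an illustration only, not this PR's implementation; it assumes w is a length-Q weight vector for SM and a D×Q weight matrix for SMP, with m and v stored as D×Q):

    using LinearAlgebra

    # Gabor envelope applied elementwise, matching h(t2v, tm) above.
    h(t2v, tm) = exp.(-2 * pi^2 .* t2v) .* cos.(2 * pi .* tm)

    # SM (Q > 0): weighted sum over the Q spectral components.
    function k_sm(x, y, w, m, v)
        t = x .- y
        dot(w, h(v' * (t .* t), m' * t))
    end

    # SMP (Q < 0): per-dimension mixtures multiplied over the D dimensions.
    function k_smp(x, y, w, m, v)
        T = Diagonal(x .- y)
        prod(sum(w .* h(T * T * v, T * m); dims=2))
    end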

References:
[1] SM: Gaussian Process Kernels for Pattern Discovery and Extrapolation,
ICML, 2013, by Andrew Gordon Wilson and Ryan Prescott Adams,
[2] SMP: GPatt: Fast Multidimensional Pattern Extrapolation with GPs,
arXiv 1310.5288, 2013, by Andrew Gordon Wilson, Elad Gilboa,
Arye Nehorai and John P. Cunningham, and
[3] Covariance kernels for fast automatic pattern discovery and extrapolation
with Gaussian processes, Andrew Gordon Wilson, PhD Thesis, January 2014.
http://www.cs.cmu.edu/~andrewgw/andrewgwthesis.pdf
[4] http://www.cs.cmu.edu/~andrewgw/pattern/.

@sharanry
Contributor Author

Having a few issues coding up the SM kernel's kappa. The dimensions don't seem to add up. Trying to find answers in the references.

@sharanry
Contributor Author

Also, do you think it is better to have Spectral Mixture and Spectral Mixture Product as two separate kernels? I didn't really like GPML's setup.

@theogf
Member

theogf commented Apr 14, 2020

@sharanry GPML is a good base, and it's interesting to enlarge our collection with the multiple examples they have. But we don't have to copy exactly what they do. If you feel another structure is more appropriate, you should go for it.

@willtebbutt
Member

I don't believe that this is the best way to implement the spectral mixture. @sharanry, is there a reason not to do this by simply writing a function that accepts the parameters of a spectral mixture and spits out a kernel built by combining the existing building blocks?

@sharanry
Contributor Author

sharanry commented Apr 14, 2020

@willtebbutt I am trying to do that using SqExponentialKernel and CosineKernel or just GaborKernel.

Problem with using GaborKernel:
The inputs to the SqExponential and Cosine kernels are different for the Spectral Mixture kernel. This can't be accommodated by directly using Gabor's kappa function, which forces us to access GaborKernel's component kernels separately, like k.kernel.kernel.kernels[1] and k.kernel.kernel.kernels[2]. The point of using Gabor is lost: it's not really a Gabor kernel anymore.

Problem in general with using pre-existing kernels:
The parameters w, m and v vary across dimensions. The only reason we would use pre-existing kernels is to reuse their kappa functions, but their parameters cannot accommodate these per-dimension values, whether for the Gabor kernel or for the SqExponential and Cosine kernels. Again, the point of using SqExponentialKernel and CosineKernel might be lost.

This is why I thought it would be cleaner to implement from scratch.

Do you suggest I use pre-existing kernels?

@willtebbutt
Member

willtebbutt commented Apr 14, 2020

Okay, maybe I've misunderstood something. Considering first just the 1-dimensional SM kernel, is there any reason that we can't implement that as

sum(ws .* SqExponential(sigmas) .* Cosine(mus))

with appropriate re-scalings to match the parametrisations in the paper?
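
For reference, a hedged sketch of this construction in KernelFunctions-style syntax (the ∘/ScaleTransform composition and the exact re-scalings are assumptions to be checked against the paper, not this PR's code):

    using KernelFunctions

    # One weighted SqExponential * Cosine component per (w, σ, μ) triple,
    # summed into a mixture.
    function sm_kernel_1d(ws, σs, μs)
        sum(w * (SqExponentialKernel() ∘ ScaleTransform(σ)) *
                (CosineKernel() ∘ ScaleTransform(μ))
            for (w, σ, μ) in zip(ws, σs, μs))
    end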

edit: there are also things like [1], so it would be really nice to be able to parametrise a larger space of spectral mixture kernels than just the ones involving Exponentiated Quadratic kernels. (See eqn 6)

[1] - Samo, Yves-Laurent Kom, and Stephen Roberts. "Generalized spectral kernels." arXiv preprint arXiv:1506.02236 (2015).

@sharanry
Contributor Author

sharanry commented Apr 15, 2020

@willtebbutt I think something like sum(ws .* SqExponential(sigmas) .* Cosine(mus)) should work just fine for the 1-dimensional SM kernel. The problem arises when dealing with the multi-dimensional SM kernel.

Edit:
In the case of the multi-dimensional SM kernel, i.e. D > 1, we would require an element-wise product, i.e.

sum(w' .* exp(-2 * pi^2 * t2v) .* cos(2 * pi * tm))

but applying the two kernels' kappas and taking their product would give (a product of two reals)

exp(-2 * pi^2 * ||t2v||) * cos(2 * pi * ||tm||)

which is quite different.

One possible way I can think of addressing this issue while still using SqExponentialKernel and CosineKernel would be to initialize a separate pair of kernels for each dimension.

@willtebbutt
Member

willtebbutt commented Apr 15, 2020

Yeah, so for the separable multi-dimensional kernel, we need some kind of Separable abstraction that takes as input a collection of D kernels that are valid in 1D, and produces the following kernel:

prod(map(d -> k[d](x[d], y[d]), 1:D))

Once you've got this, constructing a spectral mixture kernel that is separable over dimensions, such as the one in the GPatt paper, is straightforward.

It's pretty common to want separable kernels, so we should definitely have this abstraction in KernelFunctions.jl.
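
A minimal sketch of what that abstraction might look like (the SeparableKernel name is hypothetical; a real version would subtype Kernel and follow the package's kappa conventions):

    # Wraps D one-dimensional kernels and evaluates their product across dimensions.
    struct SeparableKernel{T}
        kernels::T  # a collection of D kernels, each valid on 1-D inputs
    end

    (k::SeparableKernel)(x, y) =
        prod(kd(xd, yd) for (kd, xd, yd) in zip(k.kernels, x, y))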

@sharanry
Contributor Author

sharanry commented Apr 15, 2020

Yeah, so for the separable multi-dimensional kernel, we need some kind of Separable abstraction that takes as input a collection of D kernels that are valid in 1D, and produces the following kernel:

I think so too. I mentioned something like this just now in the edit of my earlier comment.

@sharanry
Contributor Author

It's pretty common to want separable kernels, so we should definitely have this abstraction in KernelFunctions.jl.

@willtebbutt Do you suggest I open a separate PR for this?

@willtebbutt
Member

Yeah. I reckon open a PR that does the separable stuff, then come back to this one.

@theogf
Member

theogf commented Apr 15, 2020

I would be happy to make a PR for such a kernel; what should it be called?

@willtebbutt
Member

I would suggest SeparableKernel; there might be a better choice though.

@devmotion
Member

Isn't that a general tensor product kernel? 🤔 See #56 for a discussion of a version with just two kernels.

@willtebbutt
Member

Yes, yes it is. It would be great if that PR could be extended to work nicely in D dimensions. I'm thinking along the same lines as the discussions around sum / product kernels, where you allow for sums / products over arbitrarily many kernels, but let the user specify the storage to ensure efficiency over a wide range of components in the sum / product / tensor product.

@willtebbutt
Member

I think it should now be possible to implement this as a function, without introducing a new type. @sharanry are you happy with how that would look?

@sharanry
Contributor Author

@willtebbutt I am currently trying to implement this using TensorProduct, but I still can't think of a straightforward implementation where SpectralMixtureKernel is just a function.

However, TensorProduct might be useful for SpectralMixtureProductKernel. I am trying it out right now.

@willtebbutt
Member

Regarding the spectral mixture kernel, I would suggest taking a look at equation 6 of [1] (we don't need to implement precisely the parametrisation originally suggested) -- it's a strict generalisation of equation 5 in the same paper, which is AGW's spectral mixture kernel for multi-dimensional inputs. We should be able to implement it just as a sum of Gabor kernels, or more generally something like

sum(alphas .* StretchTransform.(SqExponential(), gammas) .* Cosine.(omegas))

(this definitely isn't exactly correct, but hopefully the gist is clear).

Then we could just write a function that takes in the parameters alphas, gammas, and omegas, and spits out the kernel.
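
As a hedged sketch of that function (the sm_kernel name and the per-dimension ARDTransform scaling are assumptions; the correspondence to eqn 6 of [1] would still need checking):

    using KernelFunctions

    # αs: component weights; γs and ωs: per-component vectors of per-dimension
    # inverse length scales and frequencies, respectively.
    function sm_kernel(αs, γs, ωs)
        sum(α * (SqExponentialKernel() ∘ ARDTransform(γ)) *
                (CosineKernel() ∘ ARDTransform(ω))
            for (α, γ, ω) in zip(αs, γs, ωs))
    end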

Certainly we won't be able to use the tensor product kernel for anything other than the tensor-product version of the one-dimensional spectral mixture, as it's quite a restrictive form to impose on a multi-dimensional kernel.

[1] - Samo, Yves-Laurent Kom, and Stephen Roberts. "Generalized spectral kernels." arXiv preprint arXiv:1506.02236 (2015).

@sharanry
Contributor Author

sharanry commented May 2, 2020

@willtebbutt Thanks! I will go through the paper and get back to you.

@sharanry
Contributor Author

sharanry commented May 3, 2020

@willtebbutt I had the chance to go through the paper you mentioned and its parametrization.

To confirm what you meant: the idea is to create a transform called StretchTransform, which would transform the input space using the omegas/gammas before applying the kernel? It would also enable element-wise products (.*) between two StretchTransform-ed kernel outputs.

@sharanry
Contributor Author

sharanry commented May 3, 2020

The problem I am facing right now is that CosineKernel/SqExponentialKernel should be applied element-wise to the output of StretchTransform. I am not sure we can currently do this with the transform function.

Edit:
I don't see a way to reduce this to a function without implementing a few more transforms.
In order to reduce this to a simple function, we will need:

  • A way to apply kernels element-wise after getting a transformed output. We could make this a separate transform, say SplitTransform, whose kernel output is an array (see the sketch after this list).
  • A way to stack kernels/transforms, i.e. apply a kernel to the output of another kernel/transform.
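
A rough sketch of the element-wise idea (SplitTransform is hypothetical, not an existing KernelFunctions transform):

    # Apply a 1-D base kernel coordinate-wise to two (already transformed)
    # inputs, returning one value per dimension instead of a single scalar.
    elementwise_eval(k, tx, ty) = map(k, tx, ty)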

@willtebbutt
Member

You're completely right, @sharanry, good point. That is what we want for the input to h, but as @devmotion points out, the LinearTransform is the right thing for the cosine bit.

@willtebbutt
Member

Nice changes :)

@sharanry changed the title from "[WIP] Add Spectral Mixture Kernel" to "Add Spectral Mixture Kernel" on May 3, 2020
@sharanry
Contributor Author

sharanry commented May 5, 2020

@willtebbutt @devmotion Do you suggest any other changes or can this be merged?

@willtebbutt
Member

Currently the parameters for spectral_mixture_kernel and spectral_mixture_product_kernel are represented using row-major storage. This probably isn't optimal since Julia uses column-major storage, so it's really the wrong access pattern. I think you should change the parameter matrices to be of size D x K rather than K x D.

This should simplify the implementations a bit. For example, you should be able to relax the AbstractMatrix type constraint to AbstractVecOrMat for the spectral_mixture_kernel inputs, so that there's no need to reshape stuff in spectral_mixture_product_kernel.
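
For illustration, a hedged usage sketch of the suggested D x K layout (the spectral_mixture_kernel signature here is assumed, not quoted from this PR's code):

    D, K = 3, 5           # input dimension, number of spectral components
    αs = rand(K)          # one weight per component
    γs = rand(D, K)       # one column of length-scale parameters per component
    ωs = rand(D, K)       # one column of frequencies per component
    # k = spectral_mixture_kernel(αs, γs, ωs)  # each component is a contiguous column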

@sharanry
Contributor Author

sharanry commented May 5, 2020

@willtebbutt Currently, I believe spectral_mixture_kernel accesses in a column-major way, whereas spectral_mixture_product_kernel accesses in a row-major way. AFAIK, the only way to prevent this is to keep the storage for spectral_mixture_kernel as D x A but change the storage for spectral_mixture_product_kernel to A x D, to prevent row-wise access.

Edit:
D here is the input dimension and A is the number of spectral components.

@willtebbutt
Member

willtebbutt commented May 5, 2020

Currently, I believe spectral_mixture_kernel accesses in a column major way whereas spectral_mixture_product_kernel accesses in a row major way.

You're totally right about this, my mistake.

The other alternative here would be the vector-of-vectors approach, as we've taken with the inputs, but I think the current implementation is probably fine. I'm happy for this to be merged now :) Nice work.

edit: @sharanry could you add a default for the h parameter in both kernels? Specifically, if h is not provided, SqExponential is used? I think this is what users would probably expect.
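
A sketch of how such a default could look (a forwarding method; the exact argument names are assumptions rather than this PR's code):

    # If no base kernel h is given, fall back to the squared exponential.
    spectral_mixture_kernel(αs, γs, ωs) =
        spectral_mixture_kernel(SqExponentialKernel(), αs, γs, ωs)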

@willtebbutt
Member

LGTM! Will merge once tests pass if you're happy @sharanry

@sharanry
Contributor Author

sharanry commented May 5, 2020

LGTM too. :)

@willtebbutt merged commit df3819a into JuliaGaussianProcesses:master on May 5, 2020