
Add general TensorProduct kernel #81


Merged
devmotion merged 7 commits into JuliaGaussianProcesses:master from tensor on Apr 16, 2020

Conversation

devmotion (Member)

See #80 (comment).

As discussed in the outdated PR #56 (comment), it is still unclear what type of arguments kernelmatrix and kerneldiagmatrix should take, or how to implement this in a general way that supports both data layouts: collections of joint observations across all spaces (e.g., of type Vector{Tuple{Vector{Float64},Vector{Int}}}), or collections of per-space collections of observations (e.g., of type Tuple{Matrix{Float64},Matrix{Int}}).
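For concreteness, the two layouts could look like this (illustrative values only; the types match the ones mentioned above):

    # One entry per observation, bundling the components from all spaces:
    x_joint = [([1.0, 2.0], [1]), ([3.0, 4.0], [2])]   # Vector{Tuple{Vector{Float64},Vector{Int}}}

    # One container per space, holding all observations of that component:
    x_grouped = ([1.0 3.0; 2.0 4.0], [1 2])            # Tuple{Matrix{Float64},Matrix{Int}}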

willtebbutt (Member) commented Apr 15, 2020

It would be good to get a basic implementation off the ground, so it should be fine to make this a Matrix for now, in line with the rest of the package. In the future, when we refactor the input types as per the discussion in #43, we could consider having some custom input type that plays nicely with this type of kernel, e.g. an internal representation consisting of a Vector or Tuple of D Vectors that looks like a Vector of N Vectors from the outside.
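A rough sketch of what such a wrapper could look like (all names here are hypothetical and not part of the package; each observation is returned as a D-tuple rather than a Vector, but the idea is the same):

    # Stores D component vectors internally, but is indexed like a collection of
    # N observations, each returned as a D-tuple of components.
    struct GroupedInputs{T<:Tuple}
        components::T  # Tuple of D Vectors, each of length N
    end

    Base.length(x::GroupedInputs) = length(first(x.components))
    Base.getindex(x::GroupedInputs, i::Integer) = map(c -> c[i], x.components)
    Base.iterate(x::GroupedInputs, i = 1) = i > length(x) ? nothing : (x[i], i + 1)

    x = GroupedInputs(([1.0, 2.0, 3.0], [4, 5, 6]))
    x[2]  # (2.0, 5)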

In an initial implementation, we could just assume 1 dimension per group.

theogf (Member) commented Apr 15, 2020

Realising that we now have quite a collection of kernels that only work with kappa(k, x, y), shouldn't we create generic kernelmatrix and kerneldiagmatrix implementations for these kernels (we could add a trait) that iterate over all columns/rows?
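Something along these lines, as a sketch (function names hypothetical; observations are assumed to be stored in the columns of X, and kappa(k, x, y) is assumed to be defined for the kernel):

    # Generic fallback: iterate over all pairs of observations (columns of X).
    generic_kernelmatrix(k, X::AbstractMatrix) =
        [kappa(k, X[:, i], X[:, j]) for i in 1:size(X, 2), j in 1:size(X, 2)]

    # Diagonal only: one evaluation per observation.
    generic_kerneldiagmatrix(k, X::AbstractMatrix) =
        [kappa(k, X[:, i], X[:, i]) for i in 1:size(X, 2)]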

willtebbutt (Member) commented Apr 15, 2020

> Realising that we now have quite a collection of kernels that only work with kappa(k, x, y), shouldn't we create generic kernelmatrix and kerneldiagmatrix implementations for these kernels (we could add a trait) that iterate over all columns/rows?

I would be up for that in general, but in a separate PR. In this case I think we're best off implementing kernelmatrix and kerneldiagmatrix directly, since they can be written in terms of other calls to kernelmatrix and kerneldiagmatrix, and that is what is going to be most efficient in general.

theogf (Member) commented Apr 15, 2020

Ok, I will take care of this tomorrow then. This could also be a good start for tackling the "my data is not a matrix" problem.

devmotion (Member Author)

> In an initial implementation, we could just assume 1 dimension per group.

So you suggest implementing kernelmatrix and kerneldiagmatrix for Matrix-like inputs, where columns (or rows, depending on obsdim) with n elements represent observations of n groups that are handled by n different kernels?

> In this case I think we're best off implementing kernelmatrix and kerneldiagmatrix directly, since they can be written in terms of other calls to kernelmatrix and kerneldiagmatrix

Intuitively, I would say in the setting described above it's more efficient to fill the matrix by evaluating kappa(::TensorProduct, obs_i, obs_j) for each pair of observations (exploiting symmetry as well, I guess) instead of multiplying together n different matrices?
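For illustration, the pairwise strategy described here might look roughly like this (a sketch with hypothetical names; kappa(ki, xi, yi) is assumed to be defined for scalar inputs of the component kernels):

    # Per-pair evaluation of the tensor product kernel: product over the
    # component kernels, with one dimension per kernel.
    tensor_kappa(kernels, x, y) = prod(kappa(ki, xi, yi) for (ki, xi, yi) in zip(kernels, x, y))

    # Fill the kernel matrix pairwise over the columns of X, exploiting symmetry.
    function tensor_kernelmatrix_pairwise(kernels, X::AbstractMatrix)
        n = size(X, 2)
        K = Matrix{Float64}(undef, n, n)
        for j in 1:n, i in 1:j
            K[i, j] = tensor_kappa(kernels, view(X, :, i), view(X, :, j))
            K[j, i] = K[i, j]
        end
        return K
    end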

willtebbutt (Member)

> So you suggest implementing kernelmatrix and kerneldiagmatrix for Matrix-like inputs, where columns (or rows, depending on obsdim) with n elements represent observations of n groups that are handled by n different kernels?

Yeah, although in my notation I was thinking of D groups of inputs with D different kernels.

> Intuitively, I would say in the setting described above it's more efficient to fill the matrix by evaluating kappa(::TensorProduct, obs_i, obs_j) for each pair of observations (exploiting symmetry as well, I guess) instead of multiplying together n different matrices?

I can see your point, but it's not generally the case that every one of the D kernel matrices is most efficiently computed in this manner. For example, if one of the D groups has P-dimensional inputs and the Exponentiated Quadratic kernel, the most efficient thing is generally going to be to make a call out to Distances.jl to compute that kernel matrix and then include it in the product, since the naive elementwise implementation of the kernel matrix in that case is really bad.
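For instance, a snippet like the following (illustrative only, not package code, unit lengthscale assumed) computes the exponentiated quadratic kernel matrix of a whole P-dimensional group from a single pairwise-distance call to Distances.jl:

    using Distances

    # Columns of X are P-dimensional observations; one pairwise call replaces
    # n^2 scalar kernel evaluations.
    eq_kernelmatrix(X::AbstractMatrix) = exp.(-0.5 .* pairwise(SqEuclidean(), X; dims = 2))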

I think the high level point is that any given kernel knows best how to execute kernelmatrix and kerneldiagmatrix, so if you don't use their implementation it's quite possible that you'll be stuck with sub-optimal behaviour.

devmotion (Member Author)

I completely agree that a general implementation should just call kernelmatrix and kerneldiagmatrix of the individual kernels. But in the case of matrices as inputs, P always has to be 1, doesn't it? I got confused since we were talking about implementing this special case with its very particular structure, but at the same time the discussion of kernelmatrix and kerneldiagmatrix seemed to suggest that one should still use the more general, but in this case maybe less efficient, implementation.

willtebbutt (Member)

That's a fair point; my previous argument really only applies to P > 1.

I think there are other reasons though, including code complexity and ease of doing reverse-mode AD. The former is in the eye of the beholder, of course, and the latter is hard to assess without a decent suite of benchmarks.

devmotion (Member Author)

I ended up using mapreduce for kernelmatrix and kerneldiagmatrix. I'm not completely happy with the implementation of kernelmatrix! and kerneldiagmatrix!; somehow the use of Iterators.drop feels unsatisfying. It's basically just a more explicit version of the out-of-place implementation, in which the output array is filled with kernelmatrix! and kerneldiagmatrix! in the initial step.
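A minimal sketch of the mapreduce formulation, assuming one dimension (row) of X per component kernel, observations in columns, and the usual kernelmatrix(kernel, X::AbstractMatrix; obsdim) methods for the component kernels (the function name is hypothetical, and this is not necessarily identical to the merged code):

    # Each component kernel computes its own kernel matrix from its row of X;
    # the results are combined with an elementwise product.
    function tensorproduct_kernelmatrix(kernels, X::AbstractMatrix)
        return mapreduce((K1, K2) -> K1 .* K2, zip(kernels, eachrow(X))) do (k, x)
            kernelmatrix(k, reshape(x, 1, :); obsdim = 2)
        end
    end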

I also had to add some methods for constructing kernel matrices (and their diagonals) from vector-valued inputs (it felt silly to reshape the slices only in tensorproduct.jl; IMO this should work for other kernels as well).

willtebbutt (Member) left a comment

Quite a few readability-related comments; also it would be great if we could have some more tests, in particular for the edge case in which there's only a single kernel in the TensorProduct.

(edit: this looks great other than the above / below 🙂 )


    featuredim = feature_dim(obsdim)
    if !check_dims(X, X, featuredim, obsdim)
        throw(DimensionMismatch("Dimensions of the target array K $(size(K)) are not consistent with X $(size(X))"))
willtebbutt (Member)

Not within the 92-character limit. Please wrap the string over multiple lines.

devmotion (Member Author)

Yeah, I just copied this from kernelmatrix.jl (which is not great either) and missed that it doesn't follow the style guide.

willtebbutt (Member)

Happy for you to merge when you're happy, @devmotion. Nice work.

devmotion (Member Author)

Just want to make sure that tests pass. Otherwise I'm happy.

devmotion (Member Author)

OK, tests pass (test failures on Julia master are unrelated).

devmotion merged commit bb3e859 into JuliaGaussianProcesses:master on Apr 16, 2020.
devmotion deleted the tensor branch on April 16, 2020 at 18:11.