
Add Latent-factor multi-output kernel #143


Merged: 19 commits into master on Sep 2, 2020

Conversation

@sharanry (Contributor) commented Jul 30, 2020:

return sum([κ.g[i](first(x), first(y)) * κ.A[last(x), i] for i in 1:length(κ.g)]) +
    κ.e[last(x)](first(x), first(y))
else
    return 0.0
Member:
This introduces a type instability, maybe we can avoid it by computing some dummy value z of the same type as in the other branch and then return zero(z)?

Contributor Author:

I am not sure what you mean. How do we ensure the computed dummy value is of the same type without actually executing it?

Member:

I'm not completely sure, I was still wondering what's the best way to do it. Sometimes one can perform some possibly cheaper dummy operation by, e.g., using zeros instead of actually indexing etc. Alternatively one could always evaluate the first branch and just call zero(res) on its results res - however, that will perform superfluous computations if last(x) != last(y).
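A minimal sketch of the second option, reusing the names from the diff above (an illustration, not a tested implementation):

function (κ::LatentFactorMOKernel)(x, y)
    # Always compute the "diagonal" value so both branches share its type.
    res = sum(κ.g[i](first(x), first(y)) * κ.A[last(x), i] for i in 1:length(κ.g)) +
        κ.e[last(x)](first(x), first(y))
    # Return a zero of the same type instead of a hard-coded 0.0; type
    # stable, at the cost of superfluous work when last(x) != last(y).
    return last(x) == last(y) ? res : zero(res)
end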

Contributor Author:

Are you worried about the first branch not giving a Float64 output? When could this possibly happen?
Maybe we could explicitly impose Float64 on both outputs.

function (κ::LatentFactorMOKernel)((x, px)::Tuple{Vector, Int}, (y, py)::Tuple{Vector, Int})
    if px == py
        return Float64(sum([κ.g[i](x, y) * κ.A[px, i] for i in 1:length(κ.g)]) +
            κ.e[px](x, y))
    else
        return Float64(0.0)
    end
end

I am not sure if this would cause any problems with AD.

Member:

Yeah it could happen for dual numbers (or basically any other number types). Type instability occurs even if the kernels return values of type Float32 and the elements of A are of type Float32. IMO this example also shows that it's not a good idea to enforce Float64.
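For instance (an illustrative sketch, not taken from the PR):

kvals = Float32[0.5, 0.25]       # stand-ins for κ.g[i](x, y) with Float32 kernels
A = ones(Float32, 2, 2)
branch1 = sum(kvals .* A[1, :])  # Float32
branch2 = 0.0                    # Float64 -- the two branches disagree in type
# Wrapping both in Float64(...) makes the types agree, but it widens Float32
# results and cannot faithfully handle dual numbers, so AD and mixed
# precision both suffer.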

Contributor Author:

@willtebbutt @theogf Any suggestions on this?

Member:

I'm not completely clear on why there's this zero branch as the SLFM produces non-zero covariance between all outputs. Am I missing something?

@sharanry (Contributor Author) commented Aug 2, 2020:

I am confused. Which kernel(s) from g and e would we use if px!=py?

Contributor Author:

@willtebbutt Is this resolved? I am still unsure how to handle cases when px!=py.

@theogf (Member) commented Jul 30, 2020:

I don't know if it's worth opening another PR for it specifically, but it would be nice to add a multi-output section to the docs.

@sharanry (Contributor Author):

> I don't know if it's worth opening another PR for it specifically, but it would be nice to add a multi-output section to the docs.

I think the docs as a whole require some fixing: https://juliagaussianprocesses.github.io/KernelFunctions.jl/stable/ doesn't seem to load properly.

@devmotion (Member):

https://juliagaussianprocesses.github.io/KernelFunctions.jl/ works though.

@sharanry (Contributor Author) commented Jul 30, 2020:

> https://juliagaussianprocesses.github.io/KernelFunctions.jl/ works though.

Oh that's weird. Maybe we should change the link in the "About" section then.

Edit: I changed it.

@willtebbutt (Member) left a comment:

Looks like this is getting there, just a few things to think about.

Comment on lines 27 to 28:

all(gi isa Kernel for gi in g) || error("`g` should be a collection of kernels")
all(ei isa Kernel for ei in e) || error("`e` should be a collection of kernels")
Member:

Could this not be achieved by type checking? i.e. constrain g::AbstractVector{<:Kernel} or something?

Contributor Author:

Yes, we can, but @devmotion suggested we allow tuples of Kernels as inputs too.
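A sketch of how the runtime check keeps both options open (type parameters and supertype assumed here; not the merged code):

struct LatentFactorMOKernel{Tg, Te, TA<:AbstractMatrix} <: Kernel
    g::Tg  # Tuple or AbstractVector of latent kernels
    e::Te  # Tuple or AbstractVector of noise kernels
    A::TA  # mixing matrix
    function LatentFactorMOKernel(g, e, A::AbstractMatrix)
        all(gi isa Kernel for gi in g) || error("`g` should be a collection of kernels")
        all(ei isa Kernel for ei in e) || error("`e` should be a collection of kernels")
        return new{typeof(g), typeof(e), typeof(A)}(g, e, A)
    end
end

Constraining g::AbstractVector{<:Kernel} in the signature would reject Tuples, which is why the element check stays at runtime.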


@theogf (Member) left a comment:

Just one typo to fix :)

@sharanry (Contributor Author) commented Aug 5, 2020:

Any objections to merging this?

@willtebbutt (Member):

> Any objections to merging this?

Yes -- we still need to figure out what's going on with the off-diagonal kernel elements.

> @willtebbutt Is this resolved? I am still unsure how to handle cases when px!=py.

The best equation to look at is 1.7 from my doc. This lays out how to compute the covariance between two inputs that correspond to different outputs.
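For reference, eq. 1.7 is presumably the standard SLFM cross-covariance (sketched here from the usual model, not copied from the doc):

k((x, p), (x', p')) = Σ_{q=1}^{Q} A[p, q] * A[p', q] * k^g_q(x, x') + δ_{p,p'} * k^e_p(x, x')

The second term contributes only when p = p', but the first term is generally non-zero for p != p', which is why there should be no zero branch.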

@sharanry (Contributor Author) commented Aug 6, 2020:

> The best equation to look at is 1.7 from my doc. This lays out how to compute the covariance between two inputs that correspond to different outputs.

How many κ^e kernels would there be? My initial assumption was that there would be out_dim of them. However, the second half of eq. 1.7, κ^e((x, p), (x', p')), would indicate we need out_dim*out_dim of them, as it is parametrized by p and p', each of which can take out_dim values. Is this true or am I missing something?

@devmotion (Member):

There's exactly one kernel k^e and one kernel k^g in this setup. The output dimension is taken care of by the modified input space (showing up by the additional p, p', and q in the equations).

@sharanry (Contributor Author) commented Aug 6, 2020:

> There's exactly one kernel k^e and one kernel k^g in this setup. The output dimension is taken care of by the modified input space (showing up by the additional p, p', and q in the equations).

So this implies both k^e and k^g would be single multi-output kernels, as they would need to handle the modified input space?

@willtebbutt (Member):

> So this implies both k^e and k^g would be single multi-output kernels, as they would need to handle the modified input space?

Ish. You can definitely look at them that way:

k_g((x, p), (y, q)) = p == q ? kernels_g[p](x, y) : 0 

but notice that in practice you'll only ever query k_g when p == q because equation 1.7 has already cancelled the zero terms. So from an implementation perspective, you can just utilise vectors of kernels. But it might actually make sense to make k_e an IndependentMOKernel (or whatever it is we called the multi-output kernels).
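Putting this together, a sketch of the evaluation without the zero branch (field names taken from the diff; not the final merged code):

function (κ::LatentFactorMOKernel)((x, px)::Tuple{Any, Int}, (y, py)::Tuple{Any, Int})
    # First term of eq. 1.7: mix the Q latent kernels via A.
    latent = sum(κ.g[q](x, y) * κ.A[px, q] * κ.A[py, q] for q in 1:length(κ.g))
    # Second term: independent noise kernels, only on matching outputs.
    noise = px == py ? κ.e[px](x, y) : zero(latent)
    return latent + noise
end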

@sharanry sharanry requested a review from willtebbutt August 10, 2020 13:34
@willtebbutt (Member):

I think that this is pretty much good to go now.

The only other thing that I would ask for is that the AD tests are run for at least Zygote.

@sharanry (Contributor Author):

> The only other thing that I would ask for is that the AD tests are run for at least Zygote.

@willtebbutt I am not sure how to move forward with this. I am facing problems while defining adjoints for MOKernels, which expect a pair of tuples whose second element is an integer. Is there any way to indicate to Zygote/FiniteDifferences that we are not looking for an adjoint w.r.t. these integer parts of the inputs? I believe we would just need to return nothing for these parts to indicate to the calling functions that there is no adjoint information available.

@willtebbutt (Member) commented Aug 30, 2020:

This is actually a little tricky at the minute. I would suggest writing out the test yourself (using the same structure as the test_AD tests) and get the gradient w.r.t. the kernel parameters and the vector of inputs that you use to parametrise the MOInput object. This way you should be able to avoid handling the integer values entirely.

edit: the same should work for Tuples -- compose with a thing that just accepts the vector of numbers that you care about and builds the tuples inside.
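A sketch of such a test (hypothetical names; assumes MOInput(x, out_dim) builds the (input, output-index) pairs and k is the multi-output kernel under test):

using KernelFunctions, Zygote, FiniteDifferences

# k = ... the multi-output kernel under test (e.g. the LatentFactorMOKernel)
x = [rand(3) for _ in 1:4]  # base inputs parametrising the MOInput
out_dim = 2

# Only real-valued arguments are differentiated; the integer output
# indices are built inside the closure and never touch AD.
function loss(x)
    X = MOInput(x, out_dim)
    return sum(kernelmatrix(k, X))
end

grad_zygote = only(Zygote.gradient(loss, x))
grad_fd = only(FiniteDifferences.grad(central_fdm(5, 1), loss, x))
@assert all(isapprox.(grad_zygote, grad_fd; rtol=1e-6))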

@willtebbutt (Member) left a comment:

This looks great. Once the minor suggestions have been addressed and the patch version bumped, please feel free to go ahead and merge.

@sharanry sharanry merged commit 8c99314 into master Sep 2, 2020
@sharanry sharanry deleted the sharan/slfm branch September 15, 2020 06:23