Skip to content

Add median_heuristic_transform #245

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 12, 2022
Merged

Conversation

devmotion
Copy link
Member

This PR adds median_heuristic_transform(distance, x) which returns a ScaleTransform with the lengthscale set to the median distance of the datapoints in x.

Fixes #78.

@theogf
Copy link
Member

theogf commented Jan 23, 2021

That's very nice! But you probably want to ignore the contributions of the diagonal for a better estimate...

@theogf
Copy link
Member

theogf commented Jan 24, 2021

Should there also be a wrapper median_heuristic_transform(k::BaseKernel, X) that directly returns a Transformed Kernel?

@devmotion
Copy link
Member Author

Probably a general wrapper is not useful since it will give incorrect/unexpected results for kernels with non-standard "metrics" such as SqEuclidean for SqExponentialKernel. There one should still use Euclidean or take the square root of the median_heuristic with SqEuclidean. So one would have to add special implementations for different kernels and this seems a bit too much.

I am wondering if one should maybe only define median_heuristic (without transform) that returns the inverse lengthscale according to the heuristic instead of the ScaleTransform. In the literature different definitions of the median heuristic are used, e.g. for the SqExponentialKernel sometimes with and sometimes without a scaling factor of 1/sqrt(2). It would be easier to modify the median heuristic for users if only the scalar value instead of the transform would be returned.

@theogf
Copy link
Member

theogf commented Jan 25, 2021

Probably a general wrapper is not useful since it will give incorrect/unexpected results for kernels with non-standard "metrics" such as SqEuclidean for SqExponentialKernel. There one should still use Euclidean or take the square root of the median_heuristic with SqEuclidean. So one would have to add special implementations for different kernels and this seems a bit too much.

Couldn't we automatically check what distance is used and correctly infer what should be returned :

function median_heuristic_transform(k::BaseKernel, X)
	if metric(k) isa SqEuclidean
		return transform(k, inv(sqrt(median_heuristic(metric(k), X))))
	elseif metric(k) isa Euclidean
		return transform(k, inv(median_heuristic(metric(k), X)))
	else
		error("No heuristic could be inferred with the given kernel, see docs for `median_heuristic`")
	end
end

I am wondering if one should maybe only define median_heuristic (without transform) that returns the inverse lengthscale according to the heuristic instead of the ScaleTransform.

I completely agree! It also allows to also use it as an initialization for an ARDTransform for example

@devmotion
Copy link
Member Author

If one wants to implement something like this I would suggest not hardcoding the metrics but instead using something like:

median_heuristic(k::BaseKernel, x::AbstractVector) = median_heuristic(base_metric(metric(k)), x)

base_metric(m) = m
base_metric(::SqEuclidean) = Euclidean()

This could be extended and defined for other metrics as well (well, if they are not defined by the user it would be type piracy...).

I am still a bit sceptical if it is useful though since it does not cover kernels that are not BaseKernels which includes any more complex kernel such as sums of kernels, products of kernels, and scaled kernels. For the latter one could add another special implementation but in general it is more difficult.

@theogf
Copy link
Member

theogf commented Jan 25, 2021

We could also implement this approach (Section 5) https://arxiv.org/pdf/1807.01750.pdf

@st--
Copy link
Member

st-- commented Jan 27, 2021

What do you want the scope of KernelFunctions.jl to be? What should be within scope, what shouldn't be? I agree it's nice to have this functionality available, I'm just wondering whether it should go into this package - I normally tend to be on the side of "yay more features" but in other projects I've worked in I tended in hindsight to appreciate when we limited the scope and regret when we let it creep. Perhaps having a little bit of an outsider view makes it easier for me to wonder if this is the right place for it - it seems to me not well connected to the rest of this package. Personally I would include it as a demonstration in a notebook perhaps, or in a separate "heuristics"/"assorted tools" package.

@theogf
Copy link
Member

theogf commented Mar 19, 2021

Personally I would include it as a demonstration in a notebook perhaps, or in a separate "heuristics"/"assorted tools" package.

That's a good point but there is also a risk in separating functionalities too much. median_heuristic is definitely a standard tool for kernels, just as much as adding diagonal jitter to be sure to obtain a PD matrix (done via kernelpdmat).
This is a typical error that beginner users tend to face and having it directly in the main package would be nice.

If we don't want to have additional transform APIs I would be in favor to provide median_heuristic only and to properly document its use in different examples.

@theogf
Copy link
Member

theogf commented Jul 22, 2021

I think the only thing missing for this PR is to properly introduce the method in the docs.

@theogf
Copy link
Member

theogf commented Nov 17, 2021

@devmotion Do you want to finish this PR, or if you want I can take of finishing it.

@devmotion
Copy link
Member Author

Oh, I missed your earlier comment during my vacation.

I think the only thing missing for this PR is to properly introduce the method in the docs.

It's already included in the documentation but I noticed that it should be moved to the section with "convenience helpers". Do you think it should advertised more prominently? While it is a popular and simple heuristic, it can be quite suboptimal and there is a diverse set of different definitions in literature. Therefore I'm not even sure anymore if it should be included at all 😄

That's very nice! But you probably want to ignore the contributions of the diagonal for a better estimate...

This is not done in the current implementation. I found both approaches in literature, so even this point is not clearly defined. Personally I think they should be excluded.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@codecov
Copy link

codecov bot commented Nov 19, 2021

Codecov Report

Merging #245 (b1ba461) into master (99b53c6) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #245      +/-   ##
==========================================
+ Coverage   93.14%   93.16%   +0.02%     
==========================================
  Files          52       52              
  Lines        1254     1259       +5     
==========================================
+ Hits         1168     1173       +5     
  Misses         86       86              
Impacted Files Coverage Δ
src/KernelFunctions.jl 100.00% <ø> (ø)
src/transform/scaletransform.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 99b53c6...b1ba461. Read the comment docs.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@devmotion
Copy link
Member Author

Test error is caused by the unrelated Zygote issue.

@devmotion devmotion requested a review from theogf April 12, 2022 16:52
@devmotion devmotion merged commit 7143f87 into master Apr 12, 2022
@devmotion devmotion deleted the dw/median_heuristic_transform branch April 12, 2022 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a good lengthscale initialization
3 participants