Add `median_heuristic_transform` #245

devmotion · 2021-01-22T22:57:29Z

This PR adds median_heuristic_transform(distance, x) which returns a ScaleTransform with the lengthscale set to the median distance of the datapoints in x.

Fixes #78.

theogf · 2021-01-23T13:23:24Z

That's very nice! But you probably want to ignore the contributions of the diagonal for a better estimate...

theogf · 2021-01-24T11:24:20Z

Should there also be a wrapper median_heuristic_transform(k::BaseKernel, X) that directly returns a Transformed Kernel?

devmotion · 2021-01-24T11:33:53Z

Probably a general wrapper is not useful since it will give incorrect/unexpected results for kernels with non-standard "metrics" such as SqEuclidean for SqExponentialKernel. There one should still use Euclidean or take the square root of the median_heuristic with SqEuclidean. So one would have to add special implementations for different kernels and this seems a bit too much.

I am wondering if one should maybe only define median_heuristic (without transform) that returns the inverse lengthscale according to the heuristic instead of the ScaleTransform. In the literature different definitions of the median heuristic are used, e.g. for the SqExponentialKernel sometimes with and sometimes without a scaling factor of 1/sqrt(2). It would be easier to modify the median heuristic for users if only the scalar value instead of the transform would be returned.

theogf · 2021-01-25T09:32:46Z

Probably a general wrapper is not useful since it will give incorrect/unexpected results for kernels with non-standard "metrics" such as SqEuclidean for SqExponentialKernel. There one should still use Euclidean or take the square root of the median_heuristic with SqEuclidean. So one would have to add special implementations for different kernels and this seems a bit too much.

Couldn't we automatically check what distance is used and correctly infer what should be returned :

function median_heuristic_transform(k::BaseKernel, X)
	if metric(k) isa SqEuclidean
		return transform(k, inv(sqrt(median_heuristic(metric(k), X))))
	elseif metric(k) isa Euclidean
		return transform(k, inv(median_heuristic(metric(k), X)))
	else
		error("No heuristic could be inferred with the given kernel, see docs for `median_heuristic`")
	end
end

I am wondering if one should maybe only define median_heuristic (without transform) that returns the inverse lengthscale according to the heuristic instead of the ScaleTransform.

I completely agree! It also allows to also use it as an initialization for an ARDTransform for example

devmotion · 2021-01-25T09:49:41Z

If one wants to implement something like this I would suggest not hardcoding the metrics but instead using something like:

median_heuristic(k::BaseKernel, x::AbstractVector) = median_heuristic(base_metric(metric(k)), x)

base_metric(m) = m
base_metric(::SqEuclidean) = Euclidean()

This could be extended and defined for other metrics as well (well, if they are not defined by the user it would be type piracy...).

I am still a bit sceptical if it is useful though since it does not cover kernels that are not BaseKernels which includes any more complex kernel such as sums of kernels, products of kernels, and scaled kernels. For the latter one could add another special implementation but in general it is more difficult.

theogf · 2021-01-25T13:42:35Z

We could also implement this approach (Section 5) https://arxiv.org/pdf/1807.01750.pdf

st-- · 2021-01-27T10:15:30Z

What do you want the scope of KernelFunctions.jl to be? What should be within scope, what shouldn't be? I agree it's nice to have this functionality available, I'm just wondering whether it should go into this package - I normally tend to be on the side of "yay more features" but in other projects I've worked in I tended in hindsight to appreciate when we limited the scope and regret when we let it creep. Perhaps having a little bit of an outsider view makes it easier for me to wonder if this is the right place for it - it seems to me not well connected to the rest of this package. Personally I would include it as a demonstration in a notebook perhaps, or in a separate "heuristics"/"assorted tools" package.

theogf · 2021-03-19T14:56:37Z

Personally I would include it as a demonstration in a notebook perhaps, or in a separate "heuristics"/"assorted tools" package.

That's a good point but there is also a risk in separating functionalities too much. median_heuristic is definitely a standard tool for kernels, just as much as adding diagonal jitter to be sure to obtain a PD matrix (done via kernelpdmat).
This is a typical error that beginner users tend to face and having it directly in the main package would be nice.

If we don't want to have additional transform APIs I would be in favor to provide median_heuristic only and to properly document its use in different examples.

theogf · 2021-07-22T12:45:55Z

I think the only thing missing for this PR is to properly introduce the method in the docs.

theogf · 2021-11-17T14:50:08Z

@devmotion Do you want to finish this PR, or if you want I can take of finishing it.

devmotion · 2021-11-19T01:36:22Z

Oh, I missed your earlier comment during my vacation.

I think the only thing missing for this PR is to properly introduce the method in the docs.

It's already included in the documentation but I noticed that it should be moved to the section with "convenience helpers". Do you think it should advertised more prominently? While it is a popular and simple heuristic, it can be quite suboptimal and there is a diverse set of different definitions in literature. Therefore I'm not even sure anymore if it should be included at all 😄

That's very nice! But you probably want to ignore the contributions of the diagonal for a better estimate...

This is not done in the current implementation. I found both approaches in literature, so even this point is not clearly defined. Personally I think they should be excluded.

src/transform/scaletransform.jl

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

codecov · 2021-11-19T02:07:08Z

Codecov Report

Merging #245 (b1ba461) into master (99b53c6) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #245      +/-   ##
==========================================
+ Coverage   93.14%   93.16%   +0.02%     
==========================================
  Files          52       52              
  Lines        1254     1259       +5     
==========================================
+ Hits         1168     1173       +5     
  Misses         86       86

Impacted Files	Coverage Δ
src/KernelFunctions.jl	`100.00% <ø> (ø)`
src/transform/scaletransform.jl	`100.00% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 99b53c6...b1ba461. Read the comment docs.

test/transform/scaletransform.jl

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

devmotion · 2021-11-19T13:21:49Z

Test error is caused by the unrelated Zygote issue.

Add median_heuristic_transform

ed0f218

devmotion mentioned this pull request Jan 25, 2021

Bandwith selection via the heat equation #248

Open

Merge branch 'master' into dw/median_heuristic_transform

53d9904

devmotion added 3 commits November 19, 2021 02:36

Merge branch 'master' into dw/median_heuristic_transform

75faa4f

Move median_heuristic_transform to convenience fcns in the docs

04a7f3d

Only compute pairwise distances between different elements

d68e207

github-actions bot reviewed Nov 19, 2021

View reviewed changes

src/transform/scaletransform.jl Outdated Show resolved Hide resolved

Update src/transform/scaletransform.jl

027d607

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Fix tests

389a383

github-actions bot reviewed Nov 19, 2021

View reviewed changes

test/transform/scaletransform.jl Outdated Show resolved Hide resolved

Update test/transform/scaletransform.jl

a250a93

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

devmotion added 2 commits April 10, 2022 01:30

Merge branch 'master' into dw/median_heuristic_transform

eaf7e95

Update Project.toml

b1ba461

devmotion requested a review from theogf April 12, 2022 16:52

theogf approved these changes Apr 12, 2022

View reviewed changes

devmotion merged commit 7143f87 into master Apr 12, 2022

devmotion deleted the dw/median_heuristic_transform branch April 12, 2022 20:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add `median_heuristic_transform` #245

Add `median_heuristic_transform` #245

Uh oh!

devmotion commented Jan 22, 2021

Uh oh!

theogf commented Jan 23, 2021

Uh oh!

theogf commented Jan 24, 2021

Uh oh!

devmotion commented Jan 24, 2021

Uh oh!

theogf commented Jan 25, 2021

Uh oh!

devmotion commented Jan 25, 2021

Uh oh!

theogf commented Jan 25, 2021

Uh oh!

st-- commented Jan 27, 2021

Uh oh!

theogf commented Mar 19, 2021

Uh oh!

theogf commented Jul 22, 2021

Uh oh!

theogf commented Nov 17, 2021

Uh oh!

devmotion commented Nov 19, 2021

Uh oh!

Uh oh!

codecov bot commented Nov 19, 2021 •

edited

Loading

Uh oh!

Uh oh!

devmotion commented Nov 19, 2021

Uh oh!

Uh oh!

Add median_heuristic_transform #245

Add median_heuristic_transform #245

Uh oh!

Conversation

devmotion commented Jan 22, 2021

Uh oh!

theogf commented Jan 23, 2021

Uh oh!

theogf commented Jan 24, 2021

Uh oh!

devmotion commented Jan 24, 2021

Uh oh!

theogf commented Jan 25, 2021

Uh oh!

devmotion commented Jan 25, 2021

Uh oh!

theogf commented Jan 25, 2021

Uh oh!

st-- commented Jan 27, 2021

Uh oh!

theogf commented Mar 19, 2021

Uh oh!

theogf commented Jul 22, 2021

Uh oh!

theogf commented Nov 17, 2021

Uh oh!

devmotion commented Nov 19, 2021

Uh oh!

Uh oh!

codecov bot commented Nov 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

devmotion commented Nov 19, 2021

Uh oh!

Uh oh!

Add `median_heuristic_transform` #245

Add `median_heuristic_transform` #245

codecov bot commented Nov 19, 2021 •

edited

Loading