-
Notifications
You must be signed in to change notification settings - Fork 36
Add median_heuristic_transform
#245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
That's very nice! But you probably want to ignore the contributions of the diagonal for a better estimate... |
Should there also be a wrapper |
Probably a general wrapper is not useful since it will give incorrect/unexpected results for kernels with non-standard "metrics" such as I am wondering if one should maybe only define |
Couldn't we automatically check what distance is used and correctly infer what should be returned : function median_heuristic_transform(k::BaseKernel, X)
if metric(k) isa SqEuclidean
return transform(k, inv(sqrt(median_heuristic(metric(k), X))))
elseif metric(k) isa Euclidean
return transform(k, inv(median_heuristic(metric(k), X)))
else
error("No heuristic could be inferred with the given kernel, see docs for `median_heuristic`")
end
end
I completely agree! It also allows to also use it as an initialization for an |
If one wants to implement something like this I would suggest not hardcoding the metrics but instead using something like: median_heuristic(k::BaseKernel, x::AbstractVector) = median_heuristic(base_metric(metric(k)), x)
base_metric(m) = m
base_metric(::SqEuclidean) = Euclidean() This could be extended and defined for other metrics as well (well, if they are not defined by the user it would be type piracy...). I am still a bit sceptical if it is useful though since it does not cover kernels that are not |
We could also implement this approach (Section 5) https://arxiv.org/pdf/1807.01750.pdf |
What do you want the scope of KernelFunctions.jl to be? What should be within scope, what shouldn't be? I agree it's nice to have this functionality available, I'm just wondering whether it should go into this package - I normally tend to be on the side of "yay more features" but in other projects I've worked in I tended in hindsight to appreciate when we limited the scope and regret when we let it creep. Perhaps having a little bit of an outsider view makes it easier for me to wonder if this is the right place for it - it seems to me not well connected to the rest of this package. Personally I would include it as a demonstration in a notebook perhaps, or in a separate "heuristics"/"assorted tools" package. |
That's a good point but there is also a risk in separating functionalities too much. If we don't want to have additional transform APIs I would be in favor to provide |
I think the only thing missing for this PR is to properly introduce the method in the docs. |
@devmotion Do you want to finish this PR, or if you want I can take of finishing it. |
Oh, I missed your earlier comment during my vacation.
It's already included in the documentation but I noticed that it should be moved to the section with "convenience helpers". Do you think it should advertised more prominently? While it is a popular and simple heuristic, it can be quite suboptimal and there is a diverse set of different definitions in literature. Therefore I'm not even sure anymore if it should be included at all 😄
This is not done in the current implementation. I found both approaches in literature, so even this point is not clearly defined. Personally I think they should be excluded. |
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Codecov Report
@@ Coverage Diff @@
## master #245 +/- ##
==========================================
+ Coverage 93.14% 93.16% +0.02%
==========================================
Files 52 52
Lines 1254 1259 +5
==========================================
+ Hits 1168 1173 +5
Misses 86 86
Continue to review full report at Codecov.
|
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Test error is caused by the unrelated Zygote issue. |
This PR adds
median_heuristic_transform(distance, x)
which returns aScaleTransform
with the lengthscale set to the mediandistance
of the datapoints inx
.Fixes #78.