-
Notifications
You must be signed in to change notification settings - Fork 36
Make pullback error for ColVecs and RowVecs a bit more informative #523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make pullback error for ColVecs and RowVecs a bit more informative #523
Conversation
Codecov ReportAll modified lines are covered by tests ✅
... and 22 files with indirect coverage changes 📢 Thoughts on this report? Let us know!. |
To avoid surprisingly slow code. The code was introduced in #84 initially (moved to ChainRules in #208) but as it was copied from Stheno.jl maybe @willtebbutt has some additional comments. |
Looks good to me, can you also bump the patch version? |
That's what I suspected, but is it worth it given how difficult it can be to debug these adjoint issues for most users? 😬 |
Difficult to debug implies that probably it would be even more difficult to notice the problem and the cause for the slow performance without the error 😛 |
True! But KernelFunctions.jl generally supports these slow-paths given that it's all defined on |
I think the error makes complete sense in the scenario where all usages of |
For example, the issue referenced above, is caused because we end up silently taking the "slow path", but we're doing this because there exists a default implementation for EDIT: I realize AbstractGPs is a different package, but since it's under the same org + probably done by the same people, I'm guessing the decisions are somewhat related:) |
Without looking into all details, if AbstractGPs takes a slow path, that's a bug in AbstractGPs that should be fixed there. No method in KernelFunctions should support slow paths for ColVecs and RowVecs. But of course we want to be as generic as possible so we define methods for |
I agree with this. I can definitely see you point @torfjelde , in that if a slow path is taken, it might be nice for the code not to fall over. Moreover, if the forwards-pass takes a slow path, then I am completely fine with the reverse-pass also taking a slow path. What I was primarily trying to guard against was the forwards-pass taking the fast path, and the pullback somehow hitting the slow path (I believe that this bit me several times when I hit a piratic rrule). I'll add another comment to the PR, as I think we could clarify this even further. |
Ah gotcha, then it makes sense 👍
Yeah I can see how an incorrect But so this is in general a "restriction" with EDIT: As in, without either defining an explicit overload for the forward-pass or defining a custom adjoint. |
Depends on the kind of slow path. If when you AD the slow path, you produce a |
With "slow path" I mean when a method that works with |
Kind of. I'm just saying it's more specific than that in that the reverse-pass must return a |
Sure, that I'm with; I was referring to "slow paths" because I thought that was the original motivation as to why this But is it understandable that it's somewhat confusing that something like |
It's entirely understandable, but I felt (still feel) that it's the lesser of two evils -- it's opting for a loud error when performance is bad (read: catastrophic) on the reverse-pass, rather than allowing the code to run very slowly, but eventually produce an answer. |
Gotcha, gotcha 👍 Thanks for explaining the decision! I'm curious about this stuff partially because of the discussions we've had regarding more general batching structures in the past. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sorting this out @torfjelde . I'll merge when CI passes.
Something has broken with AD 🤦 . It's clearly not this PR's fault, so I'm going to merge and tag a release anyway. |
The adjoint defined for
ColVecs
andRowVecs
indicates that the cause of the error might be something internal to KernelFunctions.jl. But other packages are also making use ofColVecs
andRowVecs
, e.g. AbstractGPs.jl, which in turn means that issues often encountered in practice are not related to what the adjoint-error indicates, e.g. JuliaGaussianProcesses/AbstractGPs.jl#344I ran into this yesterday and because of the error message I spent a fair bit of time looking for bugs in the kernel-related code rather than looking elsewhere. Hence the PR:)
My wording in the error message can probably do with an improvement?
EDIT: Oh and why does this adjoint definition exist? Why not just pull back the vector of vectors to a matrix? AFAIK the adjoint is still valid?