-
Notifications
You must be signed in to change notification settings - Fork 333
Variance along an axis #440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @LukeMathWalker. Welcome to Rust, and thanks for the PR! It looks well-written. There are a few things I'd like to see before this gets merged:
Here's an example modification for items 2 and 3 (along with a couple of stylistic changes and more docs). (I hope I didn't break anything :-).) It uses Is the implementation correct for complex numbers? It's not obvious to me whether or not it is. NumPy handles complex numbers by taking the absolute value before squaring, which seems like a reasonable approach. (I think NumPy's approach follows Uncyclopedia's statement that "The variance is always a nonnegative real number. It is equal to the sum of the variances of the real and imaginary part of the complex random variable".) Edit: One more question – is there a source that isn't paywalled for the Welford method? Edit2: The docs should also indicate what conditions cause the method to panic. |
src/numeric/impl_numeric.rs
Outdated
/// ); | ||
/// ``` | ||
pub fn var_axis(&self, axis: Axis) -> Array<A, D::Smaller> | ||
where A: LinalgScalar + ScalarOperand, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ScalarOperand is not intended to be used like this, so I'd try to find a different way to do it, without that trait.
- Variance usually has a ddof parameter, and I think we need to allow for it in some way for var and std.
- We need to find a different way to compute this, to avoid using
.to_owned()
on every row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by ScalarOperand is not intended used like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a crutch specifically for some operator impls for arrays. Not all traits have the same role.
The Welford online algorithm does not handle complex numbers by default. Nonetheless, if What is the best way to do this in Rust? if issubclass(arr.dtype.type, nt.complexfloating):
x = um.multiply(x, um.conjugate(x), out=x).real
else:
x = um.multiply(x, x, out=x) What is the most idiomatic way to handle it in Rust? Because it looks strange to me to have a type check in a generic function... |
@LukeMathWalker The way to handle this type of thing in Rust is to have all of the types you want to operate on implement the necessary trait bounds. If necessary, you can define your own traits and implement them for external types. (In this case, we need a trait for calculating the complex conjugate of the value (or getting the real and imaginary parts), and it would also be nice to have a trait define the associated real type (e.g. After thinking about this some more, I would not object to supporting only real numbers, at least until rust-num/num-complex#2 is resolved. In fact, I'd prefer to support only
@bluss What are your thoughts? Other comments on the PR:
|
Add tests with discontiguous owned data
Add note about memory layout in .to_owned() docs
…to avoid panicking if length of axis is equal to 1. Added panic if ddof is greater or equal than the length of the variance axis.
I have refactored I added another panic trigger in the case of I am open to implementing the overall trait architecture to support For the time being I have not dropped the |
Signed-off-by: Dan Mack <[email protected]>
DOC: Fix spelling and white space in docstrings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry about the delay; life has been busy recently.
The Float
bound is sufficient; we can remove the ScalarOperand
bound by rewriting the division to use mapv
. (See the comments below.) Everything else looks good to me.
@LukeMathWalker Will you please make the changes listed in the comments below and squash this PR into a single commit? It would also be nice if you could rebase off of the latest master.
Once that's done, I'll merge this PR unless @bluss has any objections.
src/numeric/impl_numeric.rs
Outdated
panic!("Ddof needs to be strictly smaller than the length \ | ||
of the axis you are computing the variance for!") | ||
} else { | ||
sum_sq / (count - ddof) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can change this line to sum_sq.mapv(|s| s / (count - ddof))
to remove the ScalarOperand
bound.
Edit: It might be faster to do this instead:
let dof = count - ddof;
sum_sq.mapv(|s| s / dof)
to avoid recomputing of count - ddof
for every element.
src/numeric/impl_numeric.rs
Outdated
/// ``` | ||
pub fn var_axis(&self, axis: Axis, ddof: A) -> Array<A, D::Smaller> | ||
where | ||
A: Float + ScalarOperand, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ScalarOperand
bound can be removed.
src/numeric/impl_numeric.rs
Outdated
@@ -14,6 +14,7 @@ use imp_prelude::*; | |||
use numeric_util; | |||
|
|||
use { | |||
ScalarOperand, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import can be removed.
src/numeric/impl_numeric.rs
Outdated
/// ``` | ||
/// | ||
/// **Panics** if `ddof` is greater equal than the length of `axis`. | ||
/// **Panics** if `axis` is out of bounds or if lenght of `axis` is zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: "lenght" should be "length"
src/numeric/impl_numeric.rs
Outdated
{ | ||
let mut count = A::zero(); | ||
let mut mean = Array::zeros(self.dim.remove_axis(axis)); | ||
let mut sum_sq = Array::zeros(self.dim.remove_axis(axis)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I tried making the other changes, the compiler had trouble inferring the type of this array, so it was necessary to change this to Array::<A, _>::zeros(...)
.
…to avoid panicking if length of axis is equal to 1. Added panic if ddof is greater or equal than the length of the variance axis.
I have made all the edits you suggested to remove the |
Okay, I squashed, merged, and closed this PR. @LukeMathWalker Thanks for working on this and for your patience! I've wanted to add a variance method to |
Morning!
I crafted a small function to compute the variance along an axis, which seemed to be a missing functionality.
It's my first Rust PR, so I am more than open to suggestions and critics.