Variance along an axis #440


Closed
LukeMathWalker wants to merge 37 commits

Conversation

LukeMathWalker
Member

Morning!

I crafted a small function to compute the variance along an axis, which seemed to be a missing functionality.
It's my first Rust PR, so I am more than open to suggestions and criticism.

@jturner314
Member

jturner314 commented Apr 24, 2018

Hi @LukeMathWalker. Welcome to Rust, and thanks for the PR! It looks well-written.

There are a few things I'd like to see before this gets merged:

  1. Add a few tests of .var_axis() in tests/array.rs, including a test with complex numbers.
  2. Add a ddof parameter so that the method can calculate both the population variance and sample variance. (For example, see the ddof parameter for numpy.var.)
  3. Reduce the number of heap allocations. (There are two heap allocations in each iteration of the loop. self.subview(axis, i).to_owned() clones the data in the subview. &new_row - &mean performs a heap allocation for its result.)

Here's an example modification for items 2 and 3 (along with a couple of stylistic changes and more docs). (I hope I didn't break anything :-).) It uses azip! to avoid making any heap allocations in the loop. This provides a large performance boost in some cases (e.g. 72 μs → 21 μs on one benchmark).

Is the implementation correct for complex numbers? It's not obvious to me whether or not it is. NumPy handles complex numbers by taking the absolute value before squaring, which seems like a reasonable approach. (I think NumPy's approach follows Wikipedia's statement that "The variance is always a nonnegative real number. It is equal to the sum of the variances of the real and imaginary part of the complex random variable".)

Edit: One more question – is there a source that isn't paywalled for the Welford method?

Edit2: The docs should also indicate what conditions cause the method to panic.
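
For reference, a minimal scalar sketch of the Welford update being discussed (a hypothetical standalone function, not the PR code):

/// One-pass (Welford) variance of a slice; `ddof` is the "delta degrees of freedom".
fn welford_var(xs: &[f64], ddof: f64) -> f64 {
    let mut count = 0.0;
    let mut mean = 0.0;
    let mut sum_sq = 0.0; // running sum of squared deviations from the mean
    for &x in xs {
        count += 1.0;
        let delta = x - mean;
        mean += delta / count;
        sum_sq += delta * (x - mean); // uses the old and the new mean
    }
    sum_sq / (count - ddof)
}

fn main() {
    let v = welford_var(&[1.0, 2.0, 3.0, 4.0], 1.0); // sample variance
    assert!((v - 5.0 / 3.0).abs() < 1e-12);
}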

/// );
/// ```
pub fn var_axis(&self, axis: Axis) -> Array<A, D::Smaller>
where A: LinalgScalar + ScalarOperand,
Member

  1. ScalarOperand is not intended to be used like this, so I'd try to find a different way to do it, without that trait.
  2. Variance usually has a ddof parameter, and I think we need to allow for it in some way for var and std.
  3. We need to find a different way to compute this, to avoid using .to_owned() on every row.

Member Author

What do you mean by "ScalarOperand is not intended to be used like this"?

Member

It's a crutch specifically for some operator impls for arrays. Not all traits have the same role.

@LukeMathWalker
Member Author

LukeMathWalker commented May 1, 2018

The Welford online algorithm does not handle complex numbers by default. Nonetheless, if X=a+ib, where a and b are real vectors, Var[X]=Var[a]+Var[b]; we can thus use this function on the real and on the imaginary parts separately and then sum the results.

What is the best way to do this in Rust?
In Python I would have done something similar to what NumPy does, i.e. a type check:

if issubclass(arr.dtype.type, nt.complexfloating):
    x = um.multiply(x, um.conjugate(x), out=x).real
else:
    x = um.multiply(x, x, out=x)

What is the most idiomatic way to handle it in Rust? Because it looks strange to me to have a type check in a generic function...
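
One illustrative sketch (not from the PR) of applying the Var[X] = Var[a] + Var[b] identity with num-complex, assuming the real-valued var_axis(axis, ddof) that this PR eventually adds:

use ndarray::{Array1, Array2, Axis};
use num_complex::Complex64;

// Hypothetical helper: variance of a complex array as the sum of the variances
// of its real and imaginary parts.
fn complex_var_axis(a: &Array2<Complex64>, axis: Axis, ddof: f64) -> Array1<f64> {
    let re = a.mapv(|z| z.re); // real parts
    let im = a.mapv(|z| z.im); // imaginary parts
    re.var_axis(axis, ddof) + im.var_axis(axis, ddof)
}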

@jturner314
Member

@LukeMathWalker The way to handle this type of thing in Rust is to have all of the types you want to operate on implement the necessary trait bounds. If necessary, you can define your own traits and implement them for external types. (In this case, we need a trait for calculating the complex conjugate of the value (or getting the real and imaginary parts), and it would also be nice to have a trait define the associated real type (e.g. f64 for c64) so that we could always return a real number instead of a complex number from var_axis.)

After thinking about this some more, I would not object to supporting only real numbers, at least until rust-num/num-complex#2 is resolved. In fact, I'd prefer to support only A: Float for the time being. Here's my reasoning:

  • If we want to support complex numbers before rust-num/num-complex#2 is resolved, then we need to add traits to be able to operate over element types in a generic way. In particular, we'd need traits similar to Conjugate and AssociatedReal from the ndarray-linalg crate. I'd prefer to avoid adding these traits to ndarray because I don't see any obvious uses for them other than variance and standard deviation, so the additional complexity seems to outweigh the benefit of supporting complex numbers in this case. (A rough sketch of what such traits could look like follows below.)

  • I'd prefer an A: Float constraint over A: Add + Div + ... (or equivalent bound) for two reasons:

    1. It doesn't seem very useful to be able to calculate the variance of integer arrays since the error will be fairly large.
    2. Since Float is not implemented for complex numbers, this also avoids users trying to take the variance of a complex array and getting an incorrect result.

@bluss What are your thoughts?
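
(A rough sketch of what such element-level traits could look like; the names and signatures below are illustrative, not the actual ndarray-linalg definitions.)

use num_complex::Complex64;

// Illustrative only: a trait tying an element type to its associated real type,
// with |x|^2 as the quantity a complex-aware variance would sum.
trait AssociatedReal {
    type Real;
    fn abs_sq(&self) -> Self::Real; // always real and nonnegative
}

impl AssociatedReal for f64 {
    type Real = f64;
    fn abs_sq(&self) -> f64 {
        self * self
    }
}

impl AssociatedReal for Complex64 {
    type Real = f64;
    fn abs_sq(&self) -> f64 {
        self.norm_sqr() // re^2 + im^2
    }
}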

Other comments on the PR:

  • The most recent commits strip trailing whitespace on some lines unrelated to the changes (e.g. quite a few in src/lib.rs). I prefer not to do this in order to keep a clean history.

  • I noticed that this method will panic if the length of the axis is less than 2. I think we should support axes of length 1 without panicking. What should the behavior be for axes of length 0? Panic or return Err/None? I lean towards "panic".

    To support axes of length 1, we can just change the initialization and iteration to:

    let mut count = A::zero();
    let mut mean = Array::zeros(self.dim.remove_axis(axis));
    let mut sum_sq = Array::zeros(self.dim.remove_axis(axis));
    for subview in self.axis_iter(axis) {

    or we could instead add an if statement that checks for the length = 1 case.
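
    For illustration, a concrete f64 version of this suggestion with a Zip-based loop body (for_each in recent ndarray releases; the method was called apply in 2018-era versions), ddof checks omitted:

    use ndarray::{Array1, Array2, Axis, Zip};

    // Sketch only: Welford update over subviews, with the initialization above,
    // so that axes of length 1 work without panicking.
    fn var_axis0(a: &Array2<f64>, ddof: f64) -> Array1<f64> {
        let mut count = 0.0;
        let mut mean = Array1::<f64>::zeros(a.ncols());
        let mut sum_sq = Array1::<f64>::zeros(a.ncols());
        for subview in a.axis_iter(Axis(0)) {
            count += 1.0;
            Zip::from(&mut mean)
                .and(&mut sum_sq)
                .and(&subview)
                .for_each(|m, s, &x| {
                    let delta = x - *m;
                    *m += delta / count;
                    *s += delta * (x - *m); // no per-iteration heap allocations
                });
        }
        sum_sq.mapv(|s| s / (count - ddof))
    }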

@LukeMathWalker
Member Author

I have refactored var_axis to accept A: Float + ScalarOperand instead of LinalgScalar + ScalarOperand.
I have also changed the initialization, as @jturner314 suggested, to avoid panicking on axes of length 1.
I opted to have the method panic for axes of length 0.

I added another panic trigger for the case where ddof is greater than or equal to the length of the axis (otherwise we might return negative values, which would not make sense for a variance).

I am open to implementing the overall trait architecture to support Complex values, if you believe that to be the best course of action.

For the time being I have not dropped the ScalarOperand bound: the alternative is to instantiate count as a one-element array and rely on broadcasting behaviour, which seems a little more obscure to me than a plain scalar division on an array. I am open to implementing that change as well if @bluss believes it to be really necessary.

danmack and others added 2 commits May 9, 2018 09:39

@jturner314 jturner314 left a comment


I'm sorry about the delay; life has been busy recently.

The Float bound is sufficient; we can remove the ScalarOperand bound by rewriting the division to use mapv. (See the comments below.) Everything else looks good to me.

@LukeMathWalker Will you please make the changes listed in the comments below and squash this PR into a single commit? It would also be nice if you could rebase off of the latest master.

Once that's done, I'll merge this PR unless @bluss has any objections.

panic!("Ddof needs to be strictly smaller than the length \
of the axis you are computing the variance for!")
} else {
sum_sq / (count - ddof)
@jturner314 jturner314 May 23, 2018

We can change this line to sum_sq.mapv(|s| s / (count - ddof)) to remove the ScalarOperand bound.

Edit: It might be faster to do this instead:

let dof = count - ddof;
sum_sq.mapv(|s| s / dof)

to avoid recomputing count - ddof for every element.

/// ```
pub fn var_axis(&self, axis: Axis, ddof: A) -> Array<A, D::Smaller>
where
A: Float + ScalarOperand,
Member

The ScalarOperand bound can be removed.

@@ -14,6 +14,7 @@ use imp_prelude::*;
use numeric_util;

use {
ScalarOperand,
Member

This import can be removed.

/// ```
///
/// **Panics** if `ddof` is greater equal than the length of `axis`.
/// **Panics** if `axis` is out of bounds or if lenght of `axis` is zero.
Member

Typo: "lenght" should be "length"

{
let mut count = A::zero();
let mut mean = Array::zeros(self.dim.remove_axis(axis));
let mut sum_sq = Array::zeros(self.dim.remove_axis(axis));
Member

When I tried making the other changes, the compiler had trouble inferring the type of this array, so it was necessary to change this to Array::<A, _>::zeros(...).

@LukeMathWalker
Member Author

I have made all the edits you suggested to remove the ScalarOperand bound.
I have rebased from master too.
To squash all the commits you can select "Squash and merge" from the Pull Request "Merge" button - I don't want to make a mess ^^"

jturner314 added a commit to jturner314/ndarray that referenced this pull request May 28, 2018
@jturner314
Member

Okay, I squashed, merged, and closed this PR.

@LukeMathWalker Thanks for working on this and for your patience! I've wanted to add a variance method to ndarray for a while.
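
For completeness, a quick usage sketch of the merged method, assuming the var_axis(axis, ddof) signature settled on above:

use ndarray::{array, Axis};

fn main() {
    let a = array![[1., 2.], [3., 4.], [5., 6.]];
    // Sample variance (ddof = 1) along axis 0, i.e. over the three rows.
    let v = a.var_axis(Axis(0), 1.);
    assert_eq!(v, array![4., 4.]);
}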

@jturner314 jturner314 closed this May 28, 2018