Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Means #20

Merged
merged 10 commits into from
Jan 10, 2019
Merged

Means #20

merged 10 commits into from
Jan 10, 2019

Conversation

LukeMathWalker
Copy link
Member

A basic implementation of arithmetic, harmonic and geometric mean.

Even though ndarray exposes mean_axis it does not provide mean, hence I have added it to the PR. Should I contribute it back to ndarray @jturner314?

@LukeMathWalker LukeMathWalker mentioned this pull request Jan 5, 2019
17 tasks
Copy link
Member

@jturner314 jturner314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I've added some comments.

A couple more things:

  1. I do think we should add .mean() to ndarray since ndarray already has .sum() and .mean_axis().

  2. It's unfortunate that harmonic_mean and geometric_mean allocate a temporary array for the result of the map. However, for the time being, I don't see a good way to avoid this while still taking advantage of the pairwise implementation of .sum() (Pairwise summation ndarray#577).

src/summary_statistics/means.rs Outdated Show resolved Hide resolved
src/summary_statistics/means.rs Outdated Show resolved Hide resolved
src/summary_statistics/means.rs Outdated Show resolved Hide resolved
src/summary_statistics/mod.rs Outdated Show resolved Hide resolved
src/summary_statistics/mod.rs Outdated Show resolved Hide resolved
src/summary_statistics/means.rs Outdated Show resolved Hide resolved
@LukeMathWalker
Copy link
Member Author

LukeMathWalker commented Jan 9, 2019

I do think we should add .mean() to ndarray since ndarray already has .sum() and .mean_axis().

I agree - but the semantic of mean_axis is different at the moment: it panics instead of returning a None if the array is empty. That might have to be changed.

It's unfortunate that harmonic_mean and geometric_mean allocate a temporary array for the result of the map. However, for the time being, I don't see a good way to avoid this while still taking advantage of the pairwise implementation of .sum() (rust-ndarray/ndarray#577).

It's exactly what I thought (and mean calculation is why I drafted the pairwise summation PR in the first place). I think it can be optimized to use less memory (we could use a lazy version of map and re-engineer pairwise summation to consume the array in blocks, thus only allocating a fixed size buffer to hold the current block it's working on and partial results), but I wouldn't pursue it right now.

@LukeMathWalker
Copy link
Member Author

LukeMathWalker commented Jan 9, 2019

CI on Rust 1.30 is failing with error: Edition 2018 is unstable and only available for nightly builds of rustc. for noisy_floats. I am bumping Rust version to 1.31 in CI to fix it.

@jturner314 jturner314 merged commit b3161ce into rust-ndarray:master Jan 10, 2019
@jturner314
Copy link
Member

Everything looks good, so I merged the PR. If .mean() gets added to ndarray, we can remove it from ndarray-stats.

I agree - but the semantic of mean_axis is different at the moment: it panics instead of returning a None if the array is empty. That might have to be changed.

Will you please submit a PR to ndarray to add .mean()? I lean towards changing mean_axis to return None instead of making mean divide by zero, because of this reasoning. However, for floating-point values, division by zero would be more convenient since it would just result in NaN instead of panicking. What do you think? I'll be interested to hear @bluss's opinion too.

@bluss
Copy link
Member

bluss commented Jan 10, 2019

@jturner314 Your rule makes sense, sounds good to me to return an Option. I think it's quite frequent anyway that you want to compute a mean and if empty, get zero, and the option makes it simple to do that.

@LukeMathWalker
Copy link
Member Author

LukeMathWalker commented Jan 10, 2019

I do agree with that reasoning and I still think that it's better to return None even for float values: it's much easier to track down an unwrap or an expect that panicked instead of a NaN that might have propagated way down the line when it actually gets detected.

I'll draft the PR for ndarray.

@LukeMathWalker LukeMathWalker deleted the harmonic-mean branch March 10, 2019 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants