Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add deviation functions #41

Merged
merged 5 commits into from
Jul 4, 2019

Conversation

munckymagik
Copy link
Contributor

@munckymagik munckymagik commented Apr 17, 2019

Port of deviation functions from StatsBase.jl.

References:

TODO

Basic port:

Design:

  • Design error handling
  • Consider using sum, max etc

Testing:

  • Test with integer inputs
  • Test with multi-dimensional inputs
  • Test with empty inputs
  • Test with inconsistent lengths
  • Test with no differences
  • Test with NaNs / NoisyFloats
  • Test with mut input
  • Test with Clone types (BigInt, Complex maybe?)
  • Consider using quickcheck

Finishing up:

  • Try to simplify trait bounds
  • Review function names
  • Review variable names, r, a, and b etc.
  • private_impl! marker
  • Documentation & examples
  • Squash commits down a bit

@munckymagik
Copy link
Contributor Author

Hi @jturner314 @LukeMathWalker, I still have a number of things I want to address (see todo list in description). However, would you mind having a quick look through to make sure I'm heading in the right direction?

@LukeMathWalker
Copy link
Member

Sorry @munckymagik, but I won't be able to take a look before the beginning of June - I will be traveling around for work.
Once I am back I should be able go over this PR and contribute to #47!

@munckymagik
Copy link
Contributor Author

@LukeMathWalker @jturner314 don't worry if you need time, I'm in no hurry.

Here are 4 specific questions I need to answer to continue:

  1. Implement in terms of ArrayBase::sum and QuantileExt::max etc., and wait for Work on performance issues in summary statistics due to using ArrayBase::sum #35 to address performance? Or retain the more self-contained implementations we have as of now (f11665d) and look to optimise?
  2. Do we really need gkldiv considering we have EntropyExt::kl_divergence?
  3. Should these methods be associated with a trait or would they be more convenient as free standing functions? a.f(b) vs. f(a, b)? We could provide both options with the trait methods forwarding to the free-standing functions.
  4. Longer more explicit method names or the abbreviated ones as used in StatsBase.jl? I've tried for longer ones so far (because it seemed like that was more consistent), but some feel a bit too long e.g. peak_signal_to_noise_ratio vs. psnr.

@LukeMathWalker
Copy link
Member

LukeMathWalker commented Jun 6, 2019

@LukeMathWalker @jturner314 don't worry if you need time, I'm in no hurry.

Here are 4 specific questions I need to answer to continue:

  1. Implement in terms of ArrayBase::sum and QuantileExt::max etc., and wait for Work on performance issues in summary statistics due to using ArrayBase::sum #35 to address performance? Or retain the more self-contained implementations we have as of now (f11665d) and look to optimise?

I don't have a strong preference in either direction. I think we could retain the current implementation and then revise once we have sorted out the performance/memory usage issue.

  1. Do we really need gkldiv considering we have EntropyExt::kl_divergence?

I would avoid duplicating functionality: where we have an equivalent function, I would just drop it from this deviation sub-module. There is no need to emulate 1 to 1 StatsBase's organization.

  1. Should these methods be associated with a trait or would they be more convenient as free standing functions? a.f(b) vs. f(a, b)? We could provide both options with the trait methods forwarding to the free-standing functions.

For consistency purposes, I would prefer to have them as methods. To avoid confusion, I'd prefer to choose one way or the other: having them both (method and free function) doesn't provide any significant advantage in my eyes.

  1. Longer more explicit method names or the abbreviated ones as used in StatsBase.jl? I've tried for longer ones so far (because it seemed like that was more consistent), but some feel a bit too long e.g. peak_signal_to_noise_ratio vs. psnr.

I generally prefer longer, self-explanatory names. It's consistent with the rest of the API and optimises for searchability when you are browsing the docs/trying things out in an IDE. It also minimises the need to actual go and read the documentation for that specific method, if its intent is clear from the name.
On the other side, we can be pragmatic: if we have some really atrociously long name, we can think of some abbreviation.

@munckymagik munckymagik force-pushed the deviations branch 3 times, most recently from 3b5656a to 65b73fa Compare June 22, 2019 15:51
@munckymagik
Copy link
Contributor Author

@LukeMathWalker really nearly there now. Is there anything specific we should do to handle NaNs?

@LukeMathWalker
Copy link
Member

For a first iteration I would do as we did for SummaryStatisticsExt: no special handling, if NaN could cause a panic make sure to point it out in the docs. @munckymagik

@munckymagik munckymagik changed the title [WIP] deviation Add deviation functions Jul 1, 2019
@munckymagik
Copy link
Contributor Author

Ok @LukeMathWalker and @jturner314, this is now complete enough to be worth reviewing. Thanks for your patience.

src/deviation.rs Outdated Show resolved Hide resolved
src/deviation.rs Outdated Show resolved Hide resolved
src/deviation.rs Outdated Show resolved Hide resolved
src/deviation.rs Outdated Show resolved Hide resolved
src/deviation.rs Outdated Show resolved Hide resolved
src/deviation.rs Outdated Show resolved Hide resolved
src/deviation.rs Outdated Show resolved Hide resolved
src/deviation.rs Outdated Show resolved Hide resolved
src/deviation.rs Outdated Show resolved Hide resolved
src/deviation.rs Outdated Show resolved Hide resolved
src/deviation.rs Outdated Show resolved Hide resolved
@LukeMathWalker
Copy link
Member

Thank you for working on it @munckymagik!
I have gone through the PR and it looks quite solid to me - I have just left some minor comments regarding the docs.

@munckymagik
Copy link
Contributor Author

Thanks for the review @LukeMathWalker. I've addressed all the feedback.

I also added a link from the package summary to DeviationExt.

@LukeMathWalker
Copy link
Member

Nice work @munckymagik! Happy to merge 🚀

@LukeMathWalker LukeMathWalker merged commit 6f16dd8 into rust-ndarray:master Jul 4, 2019
@munckymagik munckymagik deleted the deviations branch July 4, 2019 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants