-
Notifications
You must be signed in to change notification settings - Fork 645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rolling window result centering #889
base: master
Are you sure you want to change the base?
Conversation
Pandas-like center parameter; allows centering the result values instead of having a bunch of NaNs at the beginning of the column.
@@ -164,6 +164,10 @@ default RollingColumn rolling(final int windowSize) { | |||
return new RollingColumn(this, windowSize); | |||
} | |||
|
|||
default RollingColumn rolling(final int windowSize, final boolean center) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could use a javadoc for people not familiar with pandas's center
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apologies if this is duplicating something i wrote elsewhere. I made a comment but don't see it now, so perhaps i didn't save it.
I think the javadoc for this method needs to incorporate a slightly shorter version of the explanation you wrote for the PR. The comment here isn't enough for anyone to really get what it does, but the explanation you wrote is very clear. If you can add that, I think we can resolve this and approve and merge the PR
so this puts n/a at front and back? I'm curious when you might use this |
(I deleted my old comment in order to write a more detailed one) I use this centering when working with date(time)s indices so that the label is set to the midpoint of the rolling window, here's an example: Let's say we have this Table
Now let's calculate a 7-Day rolling mean for this table as usual
As you can see, the mean corresponding to the period 2020-01-01 to 2020-01-07 is set to the last value of the interval (2020-01-07). Now let's do the same centering the value:
Now the same value is set to the central point of the interval, this can be useful when working with dates, specially if this data is going to be subject to interpolation, where the results would be way more accurate this way. I hope I made it clear enough, I'll be happy to answer any other question, |
Uhh. This PR looks messed up now. It says 79 files changed |
You're right, I mistakenly pushed other changes to master, it should be fine now. |
Can you give a more concrete use case? It's not clear to me why you might want to use this. In the example dataset you shared, you're utilizing data from the future, which makes it useless for prediction, so I'm not sure what you'd use it for. Perhaps there are use cases I'm not aware of though |
I'm personally not a big fan of adding more code complication to handle specific use cases that can already be handled rather simply. In this case you could use |
Pandas-like center parameter; allows centering the result values instead of having a bunch of NaNs at the beginning of the column.
Thanks for contributing.
Description
It adds the option to center the result of a RollingColumn instead of having all the missing values at the beginning, like pandas does.
Testing
Did you add a unit test?
Yes.