Adds @unnest_wider, @unnest_longer, and @nest #77
Thanks so much, @drizk1! Let me take a look at the This is exciting!
Sounds good! For context, originally, after separating them into two underlying functions, I thought I could use multiple dispatch with 2 For For I'm also happy to go back to the drawing board and see if I can make one function that performs the different nests to make the macro syntax match more easily.
Ah, this may be because while functions can perform dispatch based on types, macros can only perform dispatch based on the number of arguments. This is because macros see all arguments as expressions, so they are all the same type.
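To make the function/macro distinction concrete, here is a small Base-Julia sketch; `value_kind` and `@syntax_kind` are made-up names for illustration. Strictly speaking, macro methods can also be selected by the *syntax* type of an argument (`Symbol`, `Expr`, or a literal), as the Julia manual's section on macros and dispatch notes, but never by the type an expression would evaluate to at runtime. Keyword-style arguments such as `data = b:c` and `by = x` both arrive as `Expr` nodes, so they cannot be told apart this way.

```julia
# Functions dispatch on the runtime type of the *value* they receive.
value_kind(x::Integer) = "integer value"
value_kind(x::AbstractString) = "string value"

# Macros receive unevaluated syntax. Their methods are chosen by the number of
# arguments and by the syntax node type (Symbol, Expr, literal), not by what
# the expression would evaluate to.
macro syntax_kind(x::Symbol)
    return "symbol syntax"
end

macro syntax_kind(x::Expr)
    return "expression syntax"
end

macro syntax_kind(x, y)
    return "two arguments"
end

# Both keyword-style arguments are Expr nodes, so they hit the same method.
# `b`, `c`, and `x` are never evaluated, so they need not be defined.
println(@syntax_kind(data = b:c))
println(@syntax_kind(by = x))
```

This is why two call shapes like `@nest(df, data = b:c)` and `@nest(df, by = ..., key = ...)` can only be separated inside a single macro by inspecting the expressions themselves (or by argument count).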
Looking at the tidyr
Re the macros: OK, this is great to know for the future. Thank you for clarifying the rowwise aspect. I spent some time reading the cheat sheet and documentation again this morning, and it lines up in my mind more now. Focusing on nest and commenting out nest_by until Tidier has the rowwise dataframe ability sounds good to me. Let me know if there's anything I can do to help.
At this point, the main thing I'd like to see is support for grouped data frames in If you pass it a grouped data frame, it should nest the selected columns separately for each group, producing one nested data frame per group.
Sweet. I was tinkering with that last night, actually. I think I know how to make it happen now, so I'll try to make it official in the next day or two.
Alright, I sorted out the grouping. While doing so, I realized tidyr nests into tibbles, which might be closer to dataframes than to the arrays I was nesting into. Nesting into dataframes was just a few lines to change, so switching is no problem. The question I now have is which one you would prefer it to nest into. My understanding is that arrays may be less memory intensive but also less flexible than a dataframe? We could theoretically offer an argument so the user can choose. I'm open to anything, but I fully defer the decision to you, and I will implement it.
Ooh, thanks for catching this. I think we should nest into DataFrames, which is important if you are nesting multiple columns. Let's not implement an option for alternatives for now. Just make sure that unnesting works correctly if we nest into DataFrames. Once you do that, I'll review and merge. Exciting!
Alright, so now, unnesting supports dataframes.
And I tested it with the following:
This looks amazing! I will review and merge soon.
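Unnesting a DataFrame-valued column, as discussed above, can be sketched with DataFrames.jl as follows: each row's key values are repeated once per row of its nested DataFrame, placed beside it, and the pieces are stacked. `unnest_df` is a hypothetical helper, not the PR's implementation, and it assumes every nested DataFrame has the same column names.

```julia
using DataFrames

# Hypothetical helper: expand a column whose cells are DataFrames.
function unnest_df(nested::DataFrame, col::Symbol)
    keycols = setdiff(propertynames(nested), [col])
    pieces = map(eachrow(nested)) do row
        inner = row[col]
        # Repeat this row's key values once per row of its nested DataFrame.
        keys_df = DataFrame([k => fill(row[k], nrow(inner)) for k in keycols])
        hcat(keys_df, inner)
    end
    # Stack the per-row results; assumes matching column names in every piece.
    return reduce(vcat, pieces)
end

nested = DataFrame(
    a = ['a', 'b'],
    data = [DataFrame(b = 1:3, c = 16:18), DataFrame(b = 4:6, c = 19:21)],
)
unnest_df(nested, :data)   # 6×3 DataFrame with columns a, b, c
```

Because the inner DataFrames carry several columns at once, nesting into DataFrames lets one nested cell round-trip multiple columns, which is the case the maintainer flagged above.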
Great work thus far. Discovered one issue.
For example, in R, nesting multiple columns produces this:

```r
> df = tibble(a = rep(letters[1:5], each = 3), b = 1:15, c = 16:30)
> df |> nest(data = b:c)
# A tibble: 5 × 2
  a     data
  <chr> <list>
1 a     <tibble [3 × 2]>
2 b     <tibble [3 × 2]>
3 c     <tibble [3 × 2]>
4 d     <tibble [3 × 2]>
5 e     <tibble [3 × 2]>
```

And in this PR, nesting multiple columns produces this:

```julia
julia> df = DataFrame(a = repeat('a':'e', inner = 3), b = 1:15, c = 16:30)
julia> @chain df @nest(data = b:c)
15×2 DataFrame
 Row │ a     data
     │ Char  DataFrame
─────┼─────────────────────
   1 │ a     1×2 DataFrame
   2 │ a     1×2 DataFrame
   3 │ a     1×2 DataFrame
   4 │ b     1×2 DataFrame
   5 │ b     1×2 DataFrame
   6 │ b     1×2 DataFrame
   7 │ c     1×2 DataFrame
   8 │ c     1×2 DataFrame
   9 │ c     1×2 DataFrame
  10 │ d     1×2 DataFrame
  11 │ d     1×2 DataFrame
  12 │ d     1×2 DataFrame
  13 │ e     1×2 DataFrame
  14 │ e     1×2 DataFrame
  15 │ e     1×2 DataFrame
```

Any thoughts on how to fix?
Oh wow. Great catch. It is almost as if it groups it based on the remaining columns and then nests. I think using Edit: it works for one nest; now I'm trying to sort out nesting multiple columns.
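The R output suggests the rule to aim for: the columns that are *not* being nested act as grouping keys, and each group contributes one output row whose nested cell holds that group's selected columns. A minimal DataFrames.jl sketch of that rule follows; `nest_df` and its `into` keyword are hypothetical names, not the PR's API. For a grouped data frame, the existing group columns would take the place of the inferred keys.

```julia
using DataFrames

# Hypothetical helper: nest `cols` into one DataFrame per combination of the
# remaining (key) columns, mirroring tidyr's `nest(data = b:c)`.
function nest_df(df::DataFrame, cols::Vector{Symbol}; into::Symbol = :data)
    keycols = setdiff(propertynames(df), cols)
    gdf = groupby(df, keycols)
    # One output row per group: the group's key values...
    out = DataFrame([k => [first(g[!, k]) for g in gdf] for k in keycols])
    # ...plus a DataFrame holding that group's nested columns.
    out[!, into] = [g[:, cols] for g in gdf]
    return out
end

df = DataFrame(a = repeat('a':'e', inner = 3), b = 1:15, c = 16:30)
nested = nest_df(df, [:b, :c])   # 5 rows, each `data` cell a 3×2 DataFrame
```

Grouping before nesting is what collapses the 15 input rows into 5, matching the tibble above instead of producing one 1×2 DataFrame per row.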
Hmm, this gives me an idea. It might be possible to implement
Actually, go ahead with modifying what you have now and see if you can get it working. There's one piece of parsing functionality I'd still need to add to get this working with less code, so I'll revisit this if you can't get it working the way you have it.
Alright,
I will note, though, that when trying to unnest the multiple nested columns that I nest in the second example above, I am getting slightly different dimensions than with R. I suspect this might have to do with the slightly different behavior of unnest_wider (illustrated below: in Julia it won't add new rows, but in R it will)? Of note, when using only unnest_longer and unnest_wider in R, the 6x5 df above does not return to a 6x5. It only does so if unnesting with Depending on what you think, I may have to go back and rework unnest_longer and unnest_wider given the example below.
In R, to go back to the original df:
In Julia, to go back to the original df:
Thanks for the update. I'll take a look and see if I can figure out why it's behaving differently. While I am eager to merge, I want to make sure things behave similarly across the implementations, especially for the use case where we nest and then unnest.
I totally agree. Please ignore the two commits below, and frankly most of my more recent comment above. They were my mind playing tricks on me.
Unnesting multiple columns of nests back to the original dataframe is the last frontier, I think. I still get different dimensions for that. Edit: I finally figured out where the bug is. The bug is not in either of the unnests, but in the
Sorry to have taken you on a journey of excess commits over the last week. I have a deep appreciation for un/nesting now. Last night, I realized that This is now fixed and it behaves the same as in R. So now the behavior for This returns to the original dataframe
just like in R
I checked the intermediate state after unnesting wider, and they match each other as well. I think it is finally ready from my standpoint. Again, sorry for the whirlwind of preemptive commits, and thank you for helping me figure out some of the bugs.
Awesome! Will look at this soon. Super excited to see this.
This pull request got a little bigger than I initially anticipated, but the four added macros all support tidy selection and interpolation.
`@unnest_wider` and `@unnest_longer`: these two (#34) support grouped dataframes (ungroup -> regroup), `indices_include`, and `keep_empty`.
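As a reference point for those two options, tidyr's `unnest_longer()` documents `indices_include` as adding a column of each element's position (named `<col>_id` by default) and `keep_empty = TRUE` as keeping rows whose element is empty, filled with missing values, instead of dropping them. Below is a minimal DataFrames.jl sketch of those semantics for a column of vectors; `unnest_longer_sketch` is a hypothetical name and this is not the PR's code. Using `missing` as the index of a kept empty row is an assumption.

```julia
using DataFrames

# Hypothetical sketch of unnest_longer semantics for a column of vectors.
function unnest_longer_sketch(df::DataFrame, col::Symbol;
                              indices_include::Bool = false,
                              keep_empty::Bool = false)
    src = Int[]                    # source row of each output row
    vals = Any[]                   # unnested element values
    ids = Union{Int, Missing}[]    # position of each element within its vector
    for (i, v) in enumerate(df[!, col])
        if isempty(v)
            keep_empty || continue # default: rows with empty vectors are dropped
            # Assumption: a kept empty row gets `missing` for value and index.
            push!(src, i); push!(vals, missing); push!(ids, missing)
        else
            for (j, x) in enumerate(v)
                push!(src, i); push!(vals, x); push!(ids, j)
            end
        end
    end
    out = df[src, Not(col)]        # repeat the other columns once per element
    out[!, col] = identity.(vals)  # narrow the element type where possible
    if indices_include
        out[!, Symbol(col, "_id")] = identity.(ids)
    end
    return out                     # note: unnested columns end up last here
end

df = DataFrame(g = [1, 2, 3], x = [[10, 20], Int[], [30]])
unnest_longer_sketch(df, :x)   # 3 rows; g = 1, 1, 3
unnest_longer_sketch(df, :x; indices_include = true, keep_empty = true)   # 4 rows
```

Unlike tidyr, this sketch moves the unnested column to the end rather than keeping it in place; it only aims to show how the two flags change the row count.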
After the `unnests`, I thought I would try `nest`. I was struggling with some syntax issues keeping `@nest(df, by = , key = )` in the same macro with the `@nest(df, nested_col = cols)` method, so I ended up splitting them for the sake of simplicity. `@nest_by` looks slightly different than the tidyr version in that the `by` and `key` are not explicitly written, but supported. `by` argument above, but similar (looks like maybe each group becomes its own df/array?).

Before going further and writing brief documentation for the `nests`, I thought I would check in. Should I drop the nests from this PR for now, while I try to sort out grouping and continue trying to reduce them back into just one macro?

I also added tidy selection to `@unite`.