[Bug]: Series.to_dummies raises for PyArrow when nulls are present #1037
Comments
I am taking a look at this. Currently, the pandas-like backend also fails to add a column for None, although it does not raise an error. The hardest part, though, is mapping the column names and their ordering 😂
😄 thanks for taking a look 🤔 i do think the Polars default output looks a bit odd here. I spotted this in Formulaic, and they wouldn't include the "null" column in the output anyway. wondering if we should add a required
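For readers following along, here is a minimal pure-Python sketch of the semantics under discussion: one indicator column per distinct non-null value, plus an optional null column. The function and its `include_null` and `null_label` parameters are hypothetical, chosen only for illustration; they are not an existing Narwhals or Polars API, and the column order (first appearance) is an assumption.

```python
from __future__ import annotations

from typing import Hashable, Sequence


def to_dummies(
    values: Sequence[Hashable | None],
    name: str,
    separator: str = "_",
    include_null: bool = True,  # hypothetical flag: emit a "<name>_null" column?
    null_label: str = "null",
) -> dict[str, list[int]]:
    # Distinct non-null values, in order of first appearance
    categories = list(dict.fromkeys(v for v in values if v is not None))
    columns: dict[str, list[int]] = {}
    for cat in categories:
        # 1 where the row equals this category, else 0 (None never matches)
        columns[f"{name}{separator}{cat}"] = [
            int(v is not None and v == cat) for v in values
        ]
    if include_null and any(v is None for v in values):
        # Separate indicator column for missing values
        columns[f"{name}{separator}{null_label}"] = [int(v is None) for v in values]
    return columns
```

A caller that does not want the null column (as the comment above says of Formulaic) would pass `include_null=False`; the null column also never appears when the input has no nulls.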
Agreed 👌 I opened the PR, but the pandas behavior is so inconsistent and hard to remap. Tests are failing because for int values, adding the
Do you think this should be reported upstream?
Upstream to Polars? Yes, I think that might be good.
I meant the pandas behavior that converts int to float in the column labels:

```python
import pandas as pd

data = [1, 2, 1]
# With dummy_na=True, the column labels become floats
pd.get_dummies(pd.Series(data), dummy_na=True)
```

while:

```python
# Without dummy_na, the column labels stay ints
pd.get_dummies(pd.Series(data))
```

What I mean is that, by adding the extra argument, the names of the first two columns change as well, even though no nulls/NaNs are present. This does not happen with nullable types. Honestly, I am not sure how we should go about it.
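One possible way to sidestep the float upcast is to request the NaN column only when nulls are actually present, and then give that column a string label. This is a minimal sketch under that assumption; `to_dummies_pandas` and its `null_label` parameter are hypothetical and not what the linked PR necessarily does.

```python
import pandas as pd


def to_dummies_pandas(s: pd.Series, null_label: str = "null") -> pd.DataFrame:
    has_nulls = bool(s.isna().any())
    # Only pass dummy_na=True when nulls exist, so null-free integer data
    # keeps its integer column labels
    out = pd.get_dummies(s, dummy_na=has_nulls, dtype="int8")
    if has_nulls:
        # pandas appends the NaN indicator as the last column; relabel it
        out.columns = [*out.columns[:-1], null_label]
    return out
```

The trade-off is that the output schema now depends on the data (a null column appears only if nulls are present), which is a design choice rather than pandas' own `dummy_na=True` behavior.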
ah i see - well, pandas allows column names to be floats, so I don't think pandas would consider this a bug or something that needs changing. The string "null" in Polars, on the other hand... maybe that should at least be configurable, with a
Edited the snippet above to better explain what I mean for pandas.
thanks, but i still think the response would be "this is just what pandas does" (😭)
Alright, then let's discuss how we should behave 😂
i discussed this a bit with polars devs; they're open to having something like
I think the Polars API should be revised further, because currently, if you get
then you have no way to know if it came from
or
I'm still more inclined to just follow the pandas default here (!) and loudly document that we differ slightly from Polars (my prediction is that the Polars API will have to change for this function anyway).
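The naming ambiguity can be illustrated with a small sketch, assuming Polars-style labels of the form `<name>_<value>` with nulls rendered as the string `null` (the exact snippets elided above are not reconstructed here). `label_sources` and `ambiguous_labels` are hypothetical helpers that record which input values map to each label and flag labels with more than one source.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Hashable, Sequence


def label_sources(
    values: Sequence[Hashable | None],
    name: str,
    separator: str = "_",
    null_label: str = "null",
) -> dict[str, set]:
    sources: defaultdict[str, set] = defaultdict(set)
    for v in values:
        # Assumed naming scheme: nulls become null_label, everything else str(v)
        rendered = null_label if v is None else str(v)
        sources[f"{name}{separator}{rendered}"].add(v)
    return dict(sources)


def ambiguous_labels(values: Sequence[Hashable | None], name: str, **kwargs: Any) -> list[str]:
    # A label is ambiguous if two distinct input values produce it
    return sorted(
        label
        for label, src in label_sources(values, name, **kwargs).items()
        if len(src) > 1
    )
```

For example, `ambiguous_labels(["null", None, "x"], "a")` flags `"a_null"`, because both the string `"null"` and a missing value map to it; a configurable null label would let callers avoid the clash.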
Thanks @MarcoGorelli, shall we wait for them to decide on the Polars API? IIRC this was needed for Formulaic, so you can set the pace for how soon we need this one out. I added some commits because I figured out how to work around the pandas issue 😂
Describe the bug
Calling `to_dummies` on a Series with nulls present raises an error.
Steps or code to reproduce the bug
Expected results
Actual results
Please run narwhals.show_version() and enter the output below.
Relevant log output
No response