Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clib.conversion._to_numpy: Add tests for pandas.Series with pyarrow date32/date64 types #3610

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

seisman
Copy link
Member

@seisman seisman commented Nov 10, 2024

Description of proposed changes

Add tests for panda.Series with pyarrow date32/date64 dtypes.

In PR #2845, we added the support of date32[day][pyarrow] and date64[ms][pyarrow] and mapped them to np.datetime64. This PR improves it and maps them to "datetime64[D]" and datetim64[ms] respectively.

After approval, I plan to cherry-pick changes in e13b5d3 into a separate PR so that we can have a changelog entry.

@seisman seisman added maintenance Boring but important stuff for the core devs needs review This PR has higher priority and needs review. labels Nov 10, 2024
@seisman seisman added this to the 0.14.0 milestone Nov 10, 2024
@weiji14 weiji14 changed the title clib.conversion._to_numpy: Add tests for panda.Series with pyarrow date32/date64 types clib.conversion._to_numpy: Add tests for pandas.Series with pyarrow date32/date64 types Nov 12, 2024
@michaelgrund michaelgrund added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Nov 12, 2024
@seisman
Copy link
Member Author

seisman commented Nov 14, 2024

Ping @weiji14 for final reviews on the date32/date64 behavior changes.

@weiji14
Copy link
Member

weiji14 commented Nov 14, 2024

Maybe update the test here too to check explicitly for "datetime64[D]"/"datetim64[ms]" instead of just np.datetime64? Or does it not matter I guess?

vectors = [
pd.Series(
data=[datetime.date(2020, 1, 1), datetime.date(2021, 12, 31)],
dtype="date32[day][pyarrow]",
),
pd.Series(
data=[datetime.date(2022, 1, 1), datetime.date(2023, 12, 31)],
dtype="date64[ms][pyarrow]",
),
]
arrays = vectors_to_arrays(vectors)
assert all(i.dtype.type == np.datetime64 for i in arrays)

@seisman
Copy link
Member Author

seisman commented Nov 14, 2024

Actually I feel the test test_vectors_to_arrays_pyarrow_datetime can be removed, since it's already covered by the newly added tests in this PR.

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I feel the test test_vectors_to_arrays_pyarrow_datetime can be removed, since it's already covered by the newly added tests in this PR.

On second thought, maybe keep the test. we need to test that vectors_to_arrays can handle mixed dtypes (and that test checks both date32/date64) from a pandas.DataFrame or nested list, whereas _to_numpy is for single dtypes on a pandas.Series or single list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final review call This PR requires final review and approval from a second reviewer maintenance Boring but important stuff for the core devs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants