clib.conversion._to_numpy: Add tests for pandas.Series with pyarrow numeric dtypes #3585
# PyArrow dtypes can be specified using the following formats:
#
# - Prefixed with the name of the dtype and "[pyarrow]" (e.g., "int8[pyarrow]")
# - Specified using ``ArrowDtype`` (e.g., "pd.ArrowDtype(pa.int8())")
We can't use pd.ArrowDtype(pa.int8()) here because pa is not defined when pyarrow is not installed. So we have to use the string aliases.
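The constraint in this comment can be illustrated with a small sketch. String aliases are ordinary Python strings, so a module that lists them can be imported whether or not pyarrow is installed, whereas an expression such as `pa.int8()` is evaluated at import time and needs `pa` to exist. The names `HAS_PYARROW`, `PYARROW_NUMERIC_DTYPES`, and `make_series` are hypothetical and are not taken from the PR.

```python
import importlib.util

# True only if pyarrow can be found in this environment.
HAS_PYARROW = importlib.util.find_spec("pyarrow") is not None

# Plain strings: building this list never imports or references pyarrow.
PYARROW_NUMERIC_DTYPES = [
    f"{name}[pyarrow]" for name in ("int8", "uint8", "float64")
]


def make_series(dtype):
    """Build a small pandas.Series; the alias is resolved only when called."""
    import pandas as pd  # imported lazily so the module itself needs no pandas

    return pd.Series([1, 2, 3], dtype=dtype)
```

Calling `make_series("int8[pyarrow]")` is only expected to succeed when `HAS_PYARROW` is true, which is why the tests pair these aliases with a skip marker.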
pygmt/tests/test_clib_to_numpy.py
        pytest.param("float64[pyarrow]", np.float64, id="float64[pyarrow]"),
    ],
)
def test_to_numpy_pandas_series_pyarrow_dtypes_numeric(dtype, expected_dtype):
This test is exactly the same as the test test_to_numpy_pandas_series_numpy_dtypes_numeric above, but I don't think we should merge them into a single test.
To merge them into a single test, we would have to change each pytest.param to
pytest.param("float64[pyarrow]", np.float64, id="float64[pyarrow]", marks=skip_if_no(package="pyarrow")),
which is too long to fit on one line and would make the pytest.param entries hard to read.
Managed to fit it into a single test like so:
from pygmt.helpers.testing import skip_if_no

pa_marks = {"marks": skip_if_no(package="pyarrow")}

@pytest.mark.parametrize(
    ("dtype", "expected_dtype"),
    [
        ...,
        pytest.param("int8[pyarrow]", np.int8, id="int8[pyarrow]", **pa_marks),
        pytest.param("int16[pyarrow]", np.int16, id="int16[pyarrow]", **pa_marks),
        pytest.param("int32[pyarrow]", np.int32, id="int32[pyarrow]", **pa_marks),
        pytest.param("int64[pyarrow]", np.int64, id="int64[pyarrow]", **pa_marks),
        pytest.param("uint8[pyarrow]", np.uint8, id="uint8[pyarrow]", **pa_marks),
        pytest.param("uint16[pyarrow]", np.uint16, id="uint16[pyarrow]", **pa_marks),
        pytest.param("uint32[pyarrow]", np.uint32, id="uint32[pyarrow]", **pa_marks),
        pytest.param("uint64[pyarrow]", np.uint64, id="uint64[pyarrow]", **pa_marks),
        pytest.param("float16[pyarrow]", np.float16, id="float16[pyarrow]", **pa_marks),
        pytest.param("float32[pyarrow]", np.float32, id="float32[pyarrow]", **pa_marks),
        pytest.param("float64[pyarrow]", np.float64, id="float64[pyarrow]", **pa_marks),
    ],
)
The longest line is just under 88 characters.
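For readers without PyGMT at hand, the keyword-unpacking trick in this comment can be tried with plain pytest. In this sketch, `pytest.mark.skipif` combined with `importlib.util.find_spec` stands in for `skip_if_no(package="pyarrow")`; that substitution and the `params` list are assumptions made for illustration, not the PR's code.

```python
import importlib.util

import pytest

# Stand-in for skip_if_no(package="pyarrow"): skip when pyarrow is missing.
pa_marks = {
    "marks": pytest.mark.skipif(
        importlib.util.find_spec("pyarrow") is None,
        reason="pyarrow is not installed",
    )
}

# Unpacking pa_marks passes marks=... without lengthening each line much.
params = [
    pytest.param("int8", id="int8"),
    pytest.param("int8[pyarrow]", id="int8[pyarrow]", **pa_marks),
]
```

Only the second entry carries the skip marker, so NumPy-backed and pyarrow-backed cases can share one parametrized test.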
Nice changes. We probably should merge this PR into #3584, so that we can test all pandas dtypes (including pyarrow-backed) in a single PR.
Sure, I see that this branch is targeting to_numpy/pandas_numeric (#3584), so you can merge this in then.
pygmt/tests/test_clib_to_numpy.py
# In PyArrow, array types can be specified in two ways:
#
# - Using string aliases (e.g., "int8")
# - Using pyarrow.DataType (e.g., ``pa.int8()``)
Again, we can't use pa.int8() here because pa is undefined if pyarrow is not installed.
This reverts commit 50e6872.
This PR adds tests for pandas.Series with pyarrow numeric dtypes. Available pyarrow numeric dtypes are at https://arrow.apache.org/docs/python/api/datatypes.html.
Similar to the issue reported in #3584, there are also dtype conversion behavior changes in pandas 2.2.
With pandas>=2.2, everything works as expected:
With pandas 2.1, a Series containing missing values needs to be converted to a NumPy float dtype explicitly.
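The snippets from the PR description are not reproduced here. As a hedged sketch of what an explicit conversion to a NumPy float dtype can look like, the example below uses `Series.to_numpy` with its `dtype` and `na_value` parameters. It uses pandas' nullable `Int64` dtype so that it runs without pyarrow; the helper name `to_float_array` is hypothetical, and whether the PR used this exact call is an assumption.

```python
import numpy as np
import pandas as pd


def to_float_array(series):
    """Convert a Series that may hold pd.NA into a float64 NumPy array."""
    # pd.NA has no float representation, so ask pandas to substitute NaN.
    return series.to_numpy(dtype=np.float64, na_value=np.nan)


series = pd.Series([1, None, 3], dtype="Int64")
array = to_float_array(series)
```

With pyarrow installed, the same call should also accept a Series created with a dtype such as "int64[pyarrow]".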