pd.NA treated differently in filter_array_like with newest pandas version #504

Open
DamianBarabonkovQC opened this issue Jan 10, 2022 · 2 comments · May be fixed by #505

DamianBarabonkovQC commented Jan 10, 2022

Problem description

In older versions of pandas (before commit pandas-dev/pandas@b2d54d9), when filter_array_like encountered a pd.NA in a pandas BooleanArray, it treated the value as False. In newer versions (after pandas-dev/pandas@b2d54d9), the pd.NA is kept as pd.NA, and converting the result to a NumPy array then raises an error.

This relates to pandas issue pandas-dev/pandas#45249. The new behavior is a behavioral change in pandas, not a bug; the old behavior of treating pd.NA as False was the actual bug.
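
The mechanism can be reproduced with pandas alone, without kartothek. The snippet below is a minimal sketch, assuming a pandas version in which comparisons on a nullable boolean array propagate pd.NA; the variable names are made up for this illustration and do not come from kartothek.

```python
import pandas as pd

boolean_array = pd.array([True, False, None], dtype="boolean")

# Comparing a nullable boolean array keeps the missing entry as <NA>.
comparison = boolean_array == False  # noqa: E712

# Converting to a plain NumPy bool array fails while <NA> is present ...
try:
    comparison.to_numpy(dtype=bool)
    conversion_failed = False
except ValueError:
    conversion_failed = True

# ... unless the caller states explicitly what <NA> should become.
as_bool = comparison.to_numpy(dtype=bool, na_value=False)
```

Passing na_value makes the choice of how to treat missing entries explicit instead of relying on whatever the underlying buffer happens to hold.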

Example code (ideally copy-pastable)


import pandas as pd
from kartothek.serialization import filter_array_like

boolean_array = pd.array([True, False, None], dtype="boolean")
# <BooleanArray>
# [True, False, <NA>]
# Length: 3, dtype: boolean

ret = filter_array_like(
    boolean_array,
    "==",
    False,
)

print(boolean_array, ret)
# Newer pandas: ValueError: cannot convert to 'bool'-dtype NumPy array with missing values. Specify an appropriate 'na_value' for this dtype.
# Older pandas: <BooleanArray>
#                          [True, False, <NA>]
#                          Length: 3, dtype: boolean [False  True  True]

Used versions

# packages in environment at /opt/miniconda3/envs/nightly:
#
# Name                    Version                   Build  Channel
abseil-cpp                20210324.2           he49afe7_0    conda-forge
alabaster                 0.7.12                     py_0    conda-forge
altair                    4.2.0              pyhd8ed1ab_1    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
appnope                   0.1.2                    pypi_0    pypi
argon2-cffi               21.3.0             pyhd8ed1ab_0    conda-forge
argon2-cffi-bindings      21.2.0                   pypi_0    pypi
arrow-cpp                 6.0.1           py310h71bd60a_7_cpu    conda-forge
async_generator           1.10                       py_0    conda-forge
attrs                     21.4.0             pyhd8ed1ab_0    conda-forge
aws-c-cal                 0.5.11               hd2e2f4b_0    conda-forge
aws-c-common              0.6.2                h0d85af4_0    conda-forge
aws-c-event-stream        0.2.7               hb9330a7_13    conda-forge
aws-c-io                  0.10.5               h35aa462_0    conda-forge
aws-checksums             0.1.11               h0010a65_7    conda-forge
aws-sdk-cpp               1.8.186              h766a74d_3    conda-forge
babel                     2.9.1              pyh44b312d_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
bleach                    4.1.0              pyhd8ed1ab_0    conda-forge
bokeh                     2.4.2                    pypi_0    pypi
brotlipy                  0.7.0                    pypi_0    pypi
bzip2                     1.0.8                h0d85af4_4    conda-forge
c-ares                    1.18.1               h0d85af4_0    conda-forge
ca-certificates           2021.10.8            h033912b_0    conda-forge
certifi                   2021.10.8                pypi_0    pypi
cffi                      1.15.0                   pypi_0    pypi
cfgv                      3.3.1              pyhd8ed1ab_0    conda-forge
charset-normalizer        2.0.9              pyhd8ed1ab_0    conda-forge
click                     8.0.3                    pypi_0    pypi
cloudpickle               2.0.0              pyhd8ed1ab_0    conda-forge
colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
coverage                  6.2                      pypi_0    pypi
cryptography              36.0.1                   pypi_0    pypi
cython                    0.29.26                  pypi_0    pypi
cytoolz                   0.11.2                   pypi_0    pypi
dask                      2021.12.0          pyhd8ed1ab_0    conda-forge
dask-core                 2021.12.0          pyhd8ed1ab_0    conda-forge
debugpy                   1.5.1                    pypi_0    pypi
decorator                 5.1.0              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
distlib                   0.3.4              pyhd8ed1ab_0    conda-forge
distributed               2021.12.0                pypi_0    pypi
docutils                  0.17.1                   pypi_0    pypi
editdistance-s            1.0.0                    pypi_0    pypi
entrypoints               0.3             pyhd8ed1ab_1003    conda-forge
filelock                  3.4.2              pyhd8ed1ab_0    conda-forge
flit-core                 3.6.0              pyhd8ed1ab_0    conda-forge
freetype                  2.10.4               h4cff582_1    conda-forge
freezegun                 1.1.0              pyhd8ed1ab_0    conda-forge
fsspec                    2021.11.1          pyhd8ed1ab_0    conda-forge
gflags                    2.2.2             hb1e8313_1004    conda-forge
glog                      0.5.0                h25b26a9_0    conda-forge
great-expectations        0.13.49            pyha770c72_0    conda-forge
grpc-cpp                  1.42.0               h6da9ac5_1    conda-forge
heapdict                  1.0.1                      py_0    conda-forge
identify                  2.3.7              pyhd8ed1ab_0    conda-forge
idna                      3.1                pyhd3deb0d_0    conda-forge
imagesize                 1.3.0              pyhd8ed1ab_0    conda-forge
importlib-metadata        4.10.0                   pypi_0    pypi
importlib_resources       5.4.0              pyhd8ed1ab_0    conda-forge
iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
ipykernel                 6.6.1                    pypi_0    pypi
ipython                   7.30.1                   pypi_0    pypi
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.6.5              pyhd8ed1ab_0    conda-forge
jbig                      2.1               h0d85af4_2003    conda-forge
jedi                      0.18.1                   pypi_0    pypi
jinja2                    3.0.3              pyhd8ed1ab_0    conda-forge
jpeg                      9d                   hbcb3906_0    conda-forge
jsonpatch                 1.32               pyhd8ed1ab_0    conda-forge
jsonpointer               2.0                        py_0    conda-forge
jsonschema                4.3.3              pyhd8ed1ab_0    conda-forge
jupyter-core              4.9.1                    pypi_0    pypi
jupyter_client            7.1.0              pyhd8ed1ab_0    conda-forge
jupyter_core              4.9.1           py310h2ec42d9_1    conda-forge
jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge
jupyterlab_widgets        1.0.2              pyhd8ed1ab_0    conda-forge
jupytext                  1.13.5             pyheef035f_0    conda-forge
kartothek                 4.0.3              pyhd8ed1ab_1    conda-forge
krb5                      1.19.2               hcfbf3a7_3    conda-forge
lcms2                     2.12                 h577c468_0    conda-forge
lerc                      3.0                  he49afe7_0    conda-forge
libblas                   3.9.0           12_osx64_openblas    conda-forge
libbrotlicommon           1.0.9                h0d85af4_6    conda-forge
libbrotlidec              1.0.9                h0d85af4_6    conda-forge
libbrotlienc              1.0.9                h0d85af4_6    conda-forge
libcblas                  3.9.0           12_osx64_openblas    conda-forge
libcurl                   7.80.0               hf45b732_1    conda-forge
libcxx                    12.0.1               habf9029_1    conda-forge
libdeflate                1.8                  h0d85af4_0    conda-forge
libedit                   3.1.20191231         h0678c8f_2    conda-forge
libev                     4.33                 haf1e3a3_1    conda-forge
libevent                  2.1.10               h815e4d9_4    conda-forge
libffi                    3.4.2                h0d85af4_5    conda-forge
libgfortran               5.0.0           9_3_0_h6c81a4c_23    conda-forge
libgfortran5              9.3.0               h6c81a4c_23    conda-forge
liblapack                 3.9.0           12_osx64_openblas    conda-forge
libnghttp2                1.43.0               h6f36284_1    conda-forge
libopenblas               0.3.18          openmp_h3351f45_0    conda-forge
libpng                    1.6.37               h7cec526_2    conda-forge
libprotobuf               3.19.1               hcf210ce_0    conda-forge
libsodium                 1.0.18               hbcb3906_1    conda-forge
libssh2                   1.10.0               h52ee1ee_2    conda-forge
libthrift                 0.15.0               hab56fdc_1    conda-forge
libtiff                   4.3.0                hd146c10_2    conda-forge
libutf8proc               2.7.0                h0d85af4_0    conda-forge
libwebp-base              1.2.1                h0d85af4_0    conda-forge
libzlib                   1.2.11            h9173be1_1013    conda-forge
llvm-openmp               12.0.1               hda6cdc1_1    conda-forge
locket                    0.2.0                      py_2    conda-forge
lz4-c                     1.9.3                he49afe7_1    conda-forge
make                      4.3                  h22f3db7_1    conda-forge
markdown-it-py            1.1.0              pyhd8ed1ab_0    conda-forge
markupsafe                2.0.1                    pypi_0    pypi
matplotlib-inline         0.1.3              pyhd8ed1ab_0    conda-forge
mdit-py-plugins           0.3.0              pyhd8ed1ab_0    conda-forge
milksnake                 0.1.5                      py_0    conda-forge
minimalkv                 1.3.1              pyhd8ed1ab_1    conda-forge
mistune                   0.8.4                    pypi_0    pypi
more-itertools            8.12.0             pyhd8ed1ab_0    conda-forge
msgpack                   1.0.3                    pypi_0    pypi
msgpack-python            1.0.3           py310h2fea185_0    conda-forge
nbclient                  0.5.9              pyhd8ed1ab_0    conda-forge
nbconvert                 6.4.0                    pypi_0    pypi
nbformat                  5.1.3              pyhd8ed1ab_0    conda-forge
ncurses                   6.2                  h2e338ed_4    conda-forge
nest-asyncio              1.5.4              pyhd8ed1ab_0    conda-forge
nodeenv                   1.6.0              pyhd8ed1ab_0    conda-forge
notebook                  6.4.6              pyha770c72_0    conda-forge
numpy                     1.22.0                   pypi_0    pypi
numpydoc                  1.1.0                      py_1    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openjpeg                  2.4.0                h6e7aa92_1    conda-forge
openssl                   1.1.1l               h0d85af4_0    conda-forge
orc                       1.7.2                h84518c8_0    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.5.0.dev0+11.g8c21dce69d           dev_0    <develop>
pandoc                    2.16.2               h0d85af4_0    conda-forge
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
parquet-cpp               1.5.1                         1    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
partd                     1.2.0              pyhd8ed1ab_0    conda-forge
pbr                       5.8.0              pyhd8ed1ab_1    conda-forge
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    8.4.0                    pypi_0    pypi
pip                       21.3.1             pyhd8ed1ab_0    conda-forge
pluggy                    1.0.0                    pypi_0    pypi
pre-commit                2.16.0                   pypi_0    pypi
prometheus_client         0.12.0             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.24             pyha770c72_0    conda-forge
psutil                    5.9.0                    pypi_0    pypi
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
py                        1.11.0             pyh6c4a22f_0    conda-forge
pyarrow                   6.0.1                    pypi_0    pypi
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pygments                  2.11.1             pyhd8ed1ab_0    conda-forge
pyopenssl                 21.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 2.4.7              pyhd8ed1ab_1    conda-forge
pyrsistent                0.18.0                   pypi_0    pypi
pysocks                   1.7.1                    pypi_0    pypi
pytest                    6.2.5                    pypi_0    pypi
pytest-cov                3.0.0              pyhd8ed1ab_0    conda-forge
pytest-mock               3.6.1              pyhd8ed1ab_0    conda-forge
python                    3.10.1          h1248fe1_2_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-slugify            5.0.2              pyhd8ed1ab_0    conda-forge
python-tzdata             2021.5             pyhd8ed1ab_0    conda-forge
python-xxhash             2.0.2           py310he24745e_1    conda-forge
python_abi                3.10                    2_cp310    conda-forge
pytz                      2021.3             pyhd8ed1ab_0    conda-forge
pytz-deprecation-shim     0.1.0.post0              pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
pyzmq                     22.3.0                   pypi_0    pypi
quantcore-thek            1.5.0.post6+gb4f8386.d20220106           dev_0    <develop>
re2                       2021.11.01           he49afe7_0    conda-forge
readline                  8.1                  h05e3726_0    conda-forge
requests                  2.26.0             pyhd8ed1ab_1    conda-forge
ruamel-yaml               0.17.19                  pypi_0    pypi
ruamel-yaml-clib          0.2.6                    pypi_0    pypi
ruamel.yaml               0.17.19         py310he24745e_0    conda-forge
ruamel.yaml.clib          0.2.6           py310he24745e_0    conda-forge
scipy                     1.7.3                    pypi_0    pypi
send2trash                1.8.0              pyhd8ed1ab_0    conda-forge
setuptools                60.2.0                   pypi_0    pypi
simplejson                3.17.6                   pypi_0    pypi
simplekv                  0.14.1             pyh9f0ad1d_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.8                hb1e8313_3    conda-forge
snowballstemmer           2.2.0              pyhd8ed1ab_0    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
sphinx                    4.3.2              pyh6c4a22f_0    conda-forge
sphinx_rtd_theme          1.0.0              pyhd8ed1ab_0    conda-forge
sphinxcontrib-apidoc      0.3.0                      py_1    conda-forge
sphinxcontrib-applehelp   1.0.2                      py_0    conda-forge
sphinxcontrib-devhelp     1.0.2                      py_0    conda-forge
sphinxcontrib-htmlhelp    2.0.0              pyhd8ed1ab_0    conda-forge
sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
sphinxcontrib-qthelp      1.0.3                      py_0    conda-forge
sphinxcontrib-serializinghtml 1.1.5              pyhd8ed1ab_1    conda-forge
sqlite                    3.37.0               h23a322b_0    conda-forge
storefact                 0.10.0                     py_0    conda-forge
tabulate                  0.8.9              pyhd8ed1ab_0    conda-forge
tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
termcolor                 1.1.0                      py_2    conda-forge
terminado                 0.12.1                   pypi_0    pypi
testpath                  0.5.0              pyhd8ed1ab_0    conda-forge
text-unidecode            1.3                        py_0    conda-forge
tk                        8.6.11               h5dbffcc_1    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.0              pyhd8ed1ab_1    conda-forge
toolz                     0.11.2             pyhd8ed1ab_0    conda-forge
tornado                   6.1                      pypi_0    pypi
tqdm                      4.62.3             pyhd8ed1ab_0    conda-forge
traitlets                 5.1.1              pyhd8ed1ab_0    conda-forge
typing_extensions         4.0.1              pyha770c72_0    conda-forge
tzdata                    2021e                he74cb21_0    conda-forge
tzlocal                   4.1                      pypi_0    pypi
unidecode                 1.3.2              pyhd8ed1ab_0    conda-forge
uritools                  4.0.0              pyhd8ed1ab_0    conda-forge
urllib3                   1.26.7             pyhd8ed1ab_0    conda-forge
urlquote                  1.1.4                    pypi_0    pypi
virtualenv                20.4.7                   pypi_0    pypi
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
widgetsnbextension        3.5.2                    pypi_0    pypi
xxhash                    2.0.2                    pypi_0    pypi
xz                        5.2.5                haf1e3a3_1    conda-forge
yaml                      0.2.5                h0d85af4_2    conda-forge
zeromq                    4.3.4                he49afe7_1    conda-forge
zict                      2.0.0                      py_0    conda-forge
zipp                      3.6.0              pyhd8ed1ab_0    conda-forge
zlib                      1.2.11            h9173be1_1013    conda-forge
zstandard                 0.16.0                   pypi_0    pypi
zstd                      1.5.1                h582d3a0_0    conda-forge



xhochy commented Jan 10, 2022

Is there anything that needs to be addressed regarding this in kartothek?

@DamianBarabonkovQC
Author

I have a hacky patch in filter_array_like that looks like:

    with np.errstate(invalid="ignore"):
        if op == "==":
            if pd.isnull(value):
                np.logical_and(pd.isnull(array_like), mask, out=out)
            else:
                res_eq = array_like == value
                np.logical_and(res_eq.fillna(False), mask, out=out)

Basically, it fills any NA in the comparison result with False before passing it to np.logical_and.
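
For anyone who wants to try the idea outside kartothek, here is a standalone sketch of the same approach. The helper name eq_treating_na_as_false and its signature are hypothetical and are not part of kartothek; it uses to_numpy(..., na_value=False) instead of fillna so that plain NumPy inputs, which have no fillna method, also work.

```python
import numpy as np
import pandas as pd


def eq_treating_na_as_false(array_like, value, mask):
    """Hypothetical helper: elementwise (array_like == value) AND mask.

    Missing entries in the comparison result count as False.
    """
    out = np.zeros(len(array_like), dtype=bool)
    if pd.isnull(value):
        # Looking for missing values: match exactly the null positions.
        np.logical_and(pd.isnull(array_like), mask, out=out)
        return out
    res_eq = array_like == value
    if hasattr(res_eq, "to_numpy"):
        # Nullable pandas arrays: replace <NA> explicitly during conversion.
        res_eq = res_eq.to_numpy(dtype=bool, na_value=False)
    np.logical_and(res_eq, mask, out=out)
    return out


boolean_array = pd.array([True, False, None], dtype="boolean")
mask = np.ones(len(boolean_array), dtype=bool)
result = eq_treating_na_as_false(boolean_array, False, mask)
```

With this approach the <NA> position yields False in the result, whereas the older-pandas output quoted in the example above shows True at that position, so the patch is not a strict restoration of the old behavior.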

@DamianBarabonkovQC DamianBarabonkovQC linked a pull request Jan 18, 2022 that will close this issue