from polyfuzz import PolyFuzz #36

Open
wdchild opened this issue May 19, 2022 · 2 comments

wdchild commented May 19, 2022

I was able to run some of your basic example code with PolyFuzz once, but as soon as I started experimenting with Embeddings and BERT, the entire package stopped importing. It seems to be related to incompatible numpy versions. Currently, if I do a basic

pip install polyfuzz
followed by

from polyfuzz import PolyFuzz
I get the following error.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [63], in <cell line: 1>()
----> 1 from polyfuzz import PolyFuzz

File /opt/conda/envs/vespid/lib/python3.9/site-packages/polyfuzz/__init__.py:1, in <module>
----> 1 from .polyfuzz import PolyFuzz
      2 __version__ = "0.3.2"

File /opt/conda/envs/vespid/lib/python3.9/site-packages/polyfuzz/polyfuzz.py:7, in <module>
      5 from polyfuzz.linkage import single_linkage
      6 from polyfuzz.utils import check_matches, check_grouped, create_logger
----> 7 from polyfuzz.models import TFIDF, RapidFuzz, Embeddings, BaseMatcher
      8 from polyfuzz.metrics import precision_recall_curve, visualize_precision_recall
     10 logger = create_logger()

File /opt/conda/envs/vespid/lib/python3.9/site-packages/polyfuzz/models/__init__.py:4, in <module>
      2 from ._distance import EditDistance
      3 from ._rapidfuzz import RapidFuzz
----> 4 from ._tfidf import TFIDF
      5 from ._utils import cosine_similarity
      7 from polyfuzz.error import NotInstalled

File /opt/conda/envs/vespid/lib/python3.9/site-packages/polyfuzz/models/_tfidf.py:7, in <module>
      4 from typing import List, Tuple
      5 from sklearn.feature_extraction.text import TfidfVectorizer
----> 7 from ._utils import cosine_similarity
      8 from ._base import BaseMatcher
     11 class TFIDF(BaseMatcher):

File /opt/conda/envs/vespid/lib/python3.9/site-packages/polyfuzz/models/_utils.py:9, in <module>
      6 from sklearn.metrics.pairwise import cosine_similarity as scikit_cosine_similarity
      8 try:
----> 9     from sparse_dot_topn import awesome_cossim_topn
     10     _HAVE_SPARSE_DOT = True
     11 except ImportError:

File /opt/conda/envs/vespid/lib/python3.9/site-packages/sparse_dot_topn/__init__.py:5, in <module>
      2 import sys
      4 if sys.version_info[0] >= 3:
----> 5     from sparse_dot_topn.awesome_cossim_topn import awesome_cossim_topn
      6 else:
      7     from awesome_cossim_topn import awesome_cossim_topn

File /opt/conda/envs/vespid/lib/python3.9/site-packages/sparse_dot_topn/awesome_cossim_topn.py:7, in <module>
      4 from scipy.sparse import isspmatrix_csr
      6 if sys.version_info[0] >= 3:
----> 7     from sparse_dot_topn import sparse_dot_topn as ct
      8     from sparse_dot_topn import sparse_dot_topn_threaded as ct_thread
      9 else:

File /opt/conda/envs/vespid/lib/python3.9/site-packages/sparse_dot_topn/sparse_dot_topn.pyx:1, in init sparse_dot_topn.sparse_dot_topn()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

Following some StackOverflow posts, I tried installing different versions of numpy, but something always ends up unhappy, and I can no longer use PolyFuzz no matter what I do. It would be great if it worked with the latest version of numpy, or if at least one version were known to work reliably! Thanks for looking into this.
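For anyone debugging the same error, it helps to record which versions are actually installed in the environment before swapping numpy releases. A minimal sketch using only the standard library; the package list is an assumption based on the traceback and the later comments in this thread, and `installed_versions` is a hypothetical helper name, not part of PolyFuzz:

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed from the traceback and this thread.
PACKAGES = ["numpy", "scipy", "scikit-learn", "sparse_dot_topn", "hdbscan", "polyfuzz"]


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print a small report that can be pasted into an issue.
for name, ver in installed_versions().items():
    print(f"{name:16} {ver or 'not installed'}")
```

Run it in the same kernel that raises the error, since a notebook can point at a different environment than the shell where `pip` was run.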


wdchild commented May 19, 2022

I eventually got this working by reinstalling hdbscan! Very strange.
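In case it helps someone else, a forced reinstall can be pointed at the exact interpreter the notebook is using. This is only a sketch: `--upgrade`, `--force-reinstall` and `--no-cache-dir` are standard pip options, but whether a stale cached build was the culprit here is an assumption, and `reinstall_command` is a hypothetical helper name:

```python
import subprocess
import sys


def reinstall_command(package):
    """Build a pip command that reinstalls `package` for the running interpreter."""
    return [
        sys.executable, "-m", "pip", "install",
        "--upgrade",          # pick up the newest release
        "--force-reinstall",  # reinstall even if already present (dependencies too)
        "--no-cache-dir",     # skip any previously cached build
        package,
    ]


# To actually run it, uncomment the next line, then restart the kernel:
# subprocess.run(reinstall_command("hdbscan"), check=True)
```

Note that `--force-reinstall` also reinstalls the package's dependencies, so the numpy version may change as a side effect.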

MaartenGr (Owner) commented

> I eventually got this working by reinstalling hdbscan! Very strange.

Glad to hear that it worked out! This used to be an issue with HDBSCAN versions <0.28.0, which did not yet build against oldest-supported-numpy to match the ABI. Keeping HDBSCAN on the newest version, now and in future environments, will prevent this.
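To catch this kind of mismatch early in a fresh environment, one option is to probe the compiled packages at startup and report a readable message instead of a long traceback. A rough sketch, assuming the mismatch always surfaces as a `ValueError` whose text contains "binary incompatibility" (as in the traceback above); `is_abi_mismatch` and `check_import` are hypothetical helper names:

```python
import importlib

# Phrase taken from the error message reported in this issue.
ABI_HINT = "binary incompatibility"


def is_abi_mismatch(exc):
    """True if `exc` looks like the numpy ABI error from this issue."""
    return isinstance(exc, ValueError) and ABI_HINT in str(exc)


def check_import(module_name):
    """Return None if the module imports cleanly, else a short diagnostic string."""
    try:
        importlib.import_module(module_name)
    except ValueError as exc:
        if is_abi_mismatch(exc):
            return f"{module_name}: compiled extension does not match the installed numpy ({exc})"
        raise  # unrelated ValueError: do not hide it
    except ImportError as exc:
        return f"{module_name}: not importable ({exc})"
    return None


# Example usage (module names assumed from this thread):
# for name in ("sparse_dot_topn", "hdbscan", "polyfuzz"):
#     print(check_import(name) or f"{name}: OK")
```

If a module is flagged, upgrading or reinstalling that package is the first thing to try.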
