results of get_matches() are not sorted by similarity score for all the values #50
Comments
Could you create a minimal example out of what you show here? So with values for |
Please find below a minimal example:

    # Imports needed to run the snippet
    from polyfuzz import PolyFuzz
    from polyfuzz.models import TFIDF

    # Two candidate lists; the first contains one extra, near-duplicate entry
    test_tolist_1 = ["2 IN 1 LAVENDER & CAMOMILE", "3 IN 1 LAVENDER & CAMOMILE",
                     "3 IN 1 LAVENDER", "3 IN 1 LAVENDER & CHAMOMILE", "LAVENDER CAMOMILE"]
    test_tolist_2 = ["2 IN 1 LAVENDER & CAMOMILE", "3 IN 1 LAVENDER & CAMOMILE",
                     "3 IN 1 LAVENDER", "LAVENDER CAMOMILE"]
    test_fromlist = ["3 IN 1 LAVENDER & CAMOMILE"]

    test_model = TFIDF(n_gram_range=(2, 5), min_similarity=0, top_n=5, model_id="tfidf")

    PolyFuzz(test_model).fit_transform(test_fromlist, test_tolist_1)["TF-IDF"]
    PolyFuzz(test_model).fit_transform(test_fromlist, test_tolist_2)["TF-IDF"]

Output for test_tolist_1:
Output for test_tolist_2:
Problems:
Just to add to this: I have commented out the following line of code, as I had asked about in the previous issue #48:
PolyFuzz/polyfuzz/models/_tfidf.py Line 130 in b26638f
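Until the ordering is fixed in the library, one possible workaround is to re-sort each row of the result after the fact. The sketch below uses only pandas and NumPy and does not touch PolyFuzz internals. It assumes the result DataFrame uses the column layout `To`, `Similarity`, `To_2`, `Similarity_2`, ... up to `top_n`; that layout, the helper name `sort_matches_by_similarity`, and the similarity values in the example frame are all assumptions made for illustration, not output taken from this issue.

```python
import numpy as np
import pandas as pd


def sort_matches_by_similarity(matches: pd.DataFrame, top_n: int) -> pd.DataFrame:
    """Reorder each row's (To, Similarity) pairs by descending similarity.

    Hypothetical helper: assumes columns To/Similarity for the first match,
    followed by To_2/Similarity_2 ... To_<top_n>/Similarity_<top_n>.
    """
    to_cols = ["To"] + [f"To_{i}" for i in range(2, top_n + 1)]
    sim_cols = ["Similarity"] + [f"Similarity_{i}" for i in range(2, top_n + 1)]

    sims = matches[sim_cols].to_numpy(dtype=float)
    tos = matches[to_cols].to_numpy(dtype=object)

    # Treat missing similarities as the lowest possible value so they sort last.
    keys = np.where(np.isnan(sims), -np.inf, sims)
    # Stable sort on the negated keys gives descending order and keeps ties in place.
    order = np.argsort(-keys, axis=1, kind="stable")

    result = matches.copy()
    result[sim_cols] = np.take_along_axis(sims, order, axis=1)
    result[to_cols] = np.take_along_axis(tos, order, axis=1)
    return result


# Made-up example row in which the best match is not in the first slot.
example = pd.DataFrame({
    "From": ["3 IN 1 LAVENDER & CAMOMILE"],
    "To": ["2 IN 1 LAVENDER & CAMOMILE"],
    "Similarity": [0.8],
    "To_2": ["3 IN 1 LAVENDER & CAMOMILE"],
    "Similarity_2": [1.0],
    "To_3": ["LAVENDER CAMOMILE"],
    "Similarity_3": [0.6],
})
fixed = sort_matches_by_similarity(example, top_n=3)
```

This only reorders what the model already returned. If the underlying top_n selection itself drops a better candidate, sorting afterwards will not recover it.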
Did you install PolyFuzz through
The |
Hi,
I was running the PolyFuzz TF-IDF model to get matches, but in a few rows of the result the top_n matches were not sorted by similarity score.
e.g.: