In the code below (with output in the attached picture) I perform a simple TFIDF matching of `["apple", "apples", "appl", "recal", "happy"]`.
The initial `min_similarity` is set to 0.2. The similarity of `happy` and `appl` is 0.24.
When grouping with a `link_min_similarity` of 0.5, `happy` should not belong in the `apples` group, yet the output of `.get_matches()` places it there.
I am not entirely sure but there seems to be an issue with the group_all_strings parameter combined with link_min_similarity. What most likely is happening is that (appl, apple) gets into the cluster apples and (happy, appl) gets into the same cluster because it shared appl. I'll have to dig a little deeper to figure this stuff out but I'll make sure it gets released in the next version!
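To illustrate the suspected mechanism, here is a minimal sketch in plain Python. It is not PolyFuzz's actual grouping code: `group_by_shared_strings` is a hypothetical helper, and every score except the reported 0.24 for `happy`/`appl` is made up. It shows how linking every matched pair that passed the initial `min_similarity` of 0.2 would pull `happy` into the `apples` cluster through the shared `appl`, while linking only pairs at or above the `link_min_similarity` of 0.5 would keep it out.

```python
from collections import defaultdict

# Only the happy/appl score (0.24) comes from the report above.
# The other two scores are hypothetical values chosen for illustration.
pairs = [
    ("apple", "apples", 0.80),
    ("appl", "apple", 0.75),
    ("happy", "appl", 0.24),
]


def group_by_shared_strings(pairs, threshold):
    """Merge strings into clusters whenever a pair scores >= threshold.

    Hypothetical helper: clusters grow transitively, so two strings end up
    together if a chain of linked pairs connects them.
    """
    parent = {}

    def find(s):
        # Union-find lookup with path halving.
        parent.setdefault(s, s)
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b, score in pairs:
        root_a, root_b = find(a), find(b)  # registers both strings
        if score >= threshold:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for s in list(parent):
        clusters[find(s)].add(s)
    return sorted(sorted(members) for members in clusters.values())


# Linking at the initial min_similarity: happy joins via the shared "appl".
loose = group_by_shared_strings(pairs, threshold=0.2)
# Linking at link_min_similarity: the happy/appl pair is too weak to link.
strict = group_by_shared_strings(pairs, threshold=0.5)

print(loose)
print(strict)
```

If the grouping step behaves like the `loose` case, that would explain why `happy` shows up under `apples` even though its only link scores below 0.5.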
However, `happy` does not appear to be in the cluster itself.
Plain text code: