Feature Request: Similarity scores #33
Comments
Along these lines: showing separate checkboxes (or tags?) for the separate criteria that each account meets would be helpful, too. So, for example, instead of just showing one of the following, show each of them simultaneously:
|
Maybe also as a side-feature, allow prioritization or weighting of matching criteria -- e.g., identical handle names can be set as "Strong similarity" and "Handle name in description" set as "Minor similarity". The rendering might get crowded, but you could then display top-N matches below each profile on the "Following" page, rather than just having a single Bluesky account below the Twitter account. |
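A rough sketch of how weighted criteria and top-N ranking might look. The criterion names, the weights, and the `ScoredMatch` shape are all made up for illustration and are not taken from the project's code:

```typescript
// Hypothetical criteria a Bluesky account could meet for a given Twitter account.
type Criterion = "sameHandle" | "sameDisplayName" | "handleInDescription";

// Assumed weights: higher means stronger evidence ("Strong" vs "Minor" similarity).
const WEIGHTS: Record<Criterion, number> = {
  sameHandle: 3,
  sameDisplayName: 2,
  handleInDescription: 1,
};

interface ScoredMatch {
  handle: string;        // Bluesky handle of the candidate
  criteria: Criterion[]; // every criterion the candidate meets, not just one
  score: number;         // sum of the weights of the met criteria
}

// Build a scored match from the list of criteria a candidate meets.
function scoreMatch(handle: string, criteria: Criterion[]): ScoredMatch {
  const score = criteria.reduce((sum, c) => sum + WEIGHTS[c], 0);
  return { handle, criteria, score };
}

// Return the n highest-scoring candidates without mutating the input.
function topN(matches: ScoredMatch[], n: number): ScoredMatch[] {
  return [...matches].sort((a, b) => b.score - a.score).slice(0, n);
}
```

Keeping the full `criteria` list on each match would let the UI render one tag per criterion, and `topN(matches, 3)` would give the three best candidates to show under a profile.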
While this sounds like it could be interesting and useful, I worry that the added complexity would make it prohibitively hard to get working... |
If you wanted to do fuzzy matching between display/handle names with similarity scoring, Levenshtein edit distance would be useful. There are implementations in most languages. For Typescript: levenshtein-edit-distance (I'm not a Typescript programmer, so sadly I can't help with this) |
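For reference, here is a minimal self-contained sketch of Levenshtein distance, so the idea can be tried without a dependency. It is not the API of the levenshtein-edit-distance package; the `similarity` helper and its normalization by the longer string's length are assumptions made for illustration:

```typescript
// Levenshtein distance via dynamic programming, keeping only two rows.
function levenshtein(a: string, b: string): number {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: turn a distance into a 0..1 similarity score,
// ignoring case and surrounding whitespace.
function similarity(a: string, b: string): number {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}
```

A score of 1 means the names are identical after normalization; a threshold somewhere below 1 would then decide what counts as a match, at the cost of more false positives.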
I believe the intervention to support similarity scores, and Levenshtein edit distance specifically, should be made here:
Replace the tests for equality with, for example:
More from the usage instructions:
[Disclaimer: I'm not a Typescript programmer!] |
This could be useful for comparing avatars: pixelmatch ("The smallest, simplest and fastest JavaScript pixel-level image comparison library"). I'm guessing that differences in image compression between Twitter and Bluesky on upload might introduce artifacts that make otherwise identical avatars differ, so it might be useful to convert both images to greyscale and/or scale them down to the same resolution before comparing. This would be a nice feature to have, but honestly it should be decoupled from the text matching part of this issue, which ought to be a lot easier to implement. |
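To illustrate the greyscale and downscale idea only, here is a sketch that works on raw RGBA pixel data and does not use pixelmatch at all. It assumes both avatars are already decoded into square RGBA buffers of the same size; all function names are hypothetical:

```typescript
// Convert RGBA bytes to one luminance value (0..255) per pixel.
function toGreyscale(rgba: Uint8ClampedArray): number[] {
  const grey: number[] = [];
  for (let i = 0; i < rgba.length; i += 4) {
    grey.push(0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]);
  }
  return grey;
}

// Downscale a square greyscale image by averaging factor x factor blocks.
function downscale(grey: number[], size: number, factor: number): number[] {
  const outSize = Math.floor(size / factor);
  const out: number[] = [];
  for (let y = 0; y < outSize; y++) {
    for (let x = 0; x < outSize; x++) {
      let sum = 0;
      for (let dy = 0; dy < factor; dy++) {
        for (let dx = 0; dx < factor; dx++) {
          sum += grey[(y * factor + dy) * size + (x * factor + dx)];
        }
      }
      out.push(sum / (factor * factor));
    }
  }
  return out;
}

// Mean absolute difference, scaled to 0 (identical) .. 1 (black vs white).
function meanAbsDiff(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("images must have the same size");
  if (a.length === 0) return 0;
  let total = 0;
  for (let i = 0; i < a.length; i++) total += Math.abs(a[i] - b[i]);
  return total / a.length / 255;
}
```

Averaging blocks before comparing should smooth out small compression artifacts, though the right downscale factor and difference threshold would need experimenting.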
That should go in #13 |
Here's a rudimentary PoC that I've tested and works; this is the first time I've coded in this language, so don't shoot me if I got something wrong! It found more of my friends, but also there were more false positives, as expected. It returns a match, but not the BEST match; some kind of loop to minimize and find the match with the lowest
|
I don't know this programming language well enough to implement this myself, but I'm guessing the logic should be something like this: in
I think this should find the bsky user with the smallest edit distance (=best match) for each twitter following. |
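The "find the smallest edit distance" loop described above could be sketched roughly as follows. The `Candidate` shape, the `findBestMatch` name, and the `maxDistance` cutoff are assumptions for illustration, not names from the extension's code:

```typescript
// Hypothetical shape of a Bluesky search result.
interface Candidate {
  handle: string;
  displayName: string;
}

// Compact Levenshtein distance (two rolling rows).
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the candidate with the lowest distance to the Twitter name,
// or null if no candidate is within maxDistance edits.
function findBestMatch(
  twitterName: string,
  candidates: Candidate[],
  maxDistance = 2
): { candidate: Candidate; distance: number } | null {
  const target = twitterName.toLowerCase();
  let best: { candidate: Candidate; distance: number } | null = null;
  for (const candidate of candidates) {
    // Compare against both the handle and the display name; keep the closer one.
    const distance = Math.min(
      levenshtein(target, candidate.handle.toLowerCase()),
      levenshtein(target, candidate.displayName.toLowerCase())
    );
    if (distance <= maxDistance && (best === null || distance < best.distance)) {
      best = { candidate, distance };
    }
  }
  return best;
}
```

Tracking the running minimum means the function returns the best match rather than the first one under the threshold, which should cut down on some of the false positives.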
Instead of only using strict equalities, percentage similarities would be more inclusive in potential matches.