Improve type search #641

Open
wants to merge 2 commits into
base: main

Conversation

wbazant
Collaborator

wbazant commented Dec 14, 2024

Closes #635

Different resolution:

  1. show the synonyms when selecting

As Ethan points out, the synonyms can help find the match, but using them only in the background would produce a "matching but not sure why" experience. They're also quite interesting in their own right (a bit of plant trivia, e.g. okra is also known as Ladies' fingers) and might also help when browsing.

  2. don't reorder search results

I originally proposed reordering the results, and from searching it seems to be borderline possible, but the library really didn't want me to do it, and it probably has drawbacks. Instead, match from the start only, and use the common name, the scientific name, and all synonyms as possible starts.
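
A minimal sketch of the matching rule described above. It assumes each type is an object with hypothetical `commonName`, `scientificName`, and `synonyms` fields; the actual data shape and the select library's filter hook are not shown in this conversation.

```javascript
// Hypothetical type shape: { commonName, scientificName, synonyms: [] }.
// Returns true when the query is a case-insensitive prefix of any name.
function matchesFromStart(type, query) {
  const q = query.trim().toLowerCase()
  if (q === '') {
    return true // an empty query does not filter anything out
  }
  const candidates = [type.commonName, type.scientificName].concat(type.synonyms || [])
  return candidates.some(function (name) {
    return typeof name === 'string' && name.toLowerCase().startsWith(q)
  })
}

const okra = {
  commonName: 'Okra',
  scientificName: 'Abelmoschus esculentus',
  synonyms: ["Ladies' fingers"]
}

matchesFromStart(okra, 'lad') // true, via the synonym "Ladies' fingers"
matchesFromStart(okra, 'fingers') // false, not at the start of any name
```

Because a match can come from a synonym rather than the displayed common name, showing the synonyms in the option label is what keeps the result explainable to the user.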

@ezwelty
Collaborator

ezwelty commented Jan 26, 2025

@wbazant It's immediately fun to see the synonyms displayed. A reminder of all this data we have but aren't yet using. So thanks for bringing them to the fore!

The only request for change in this PR is fixing the design to handle many synonyms. Can we allow rows to expand?

[Screenshots: 2025-01-26 at 13:51:36 and 13:51:12]

I'm willing to accept that prefix matching will lead to the best result for most users/searches, although it may fail in some cases:

  • Search "mulberry" and expect to be able to choose from a list like "black mulberry", "red mulberry", "white mulberry" rather than just get "mulberry".
  • Search by cultivar name "Reinette ..." and find nothing. This could be solved by parsing cultivar names from scientific names (or perhaps returning a list of cultivars from the API) and adding them to the search bucket?
  • Search for anything whose common name starts with "common ...". This could be solved by always including the second part as a synonym (e.g. "common yarrow", "yarrow").
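
The last case could also be handled when building the search index rather than by editing the data. A rough sketch, assuming a hypothetical `searchKeys` helper (not part of this PR) that takes a type's names and also emits each name without a leading "common ":

```javascript
// Hypothetical helper: expand a type's names into the lowercase strings that
// prefix matching is run against. A name starting with "common " is also
// indexed without that word, so "yarrow" finds "common yarrow".
function searchKeys(names) {
  const keys = new Set()
  for (const name of names) {
    const lower = name.trim().toLowerCase()
    if (lower === '') continue
    keys.add(lower)
    const rest = lower.replace(/^common\s+/, '')
    if (rest !== '') keys.add(rest)
  }
  return Array.from(keys)
}

searchKeys(['Common yarrow', 'Yarrow']) // ['common yarrow', 'yarrow']
```

This only covers the English word "common". Indexing the remainder of a name from every word boundary would also let "mulberry" find "black mulberry", but that is a broader change than the one suggested above and would return more matches per query.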

During testing, I realized that pending types cannot be distinguished from approved ones, which could lead to confusion. It isn't so serious, because pending types will get merged, but it might be worth considering flagging them somehow. We decided to include them so that a user can add a new type and then use it for subsequent locations without having to wait for the type to be approved (which could take months).

[Screenshot: 2025-01-26 at 13:51:55]

I also realized that matching fails because synonyms cannot realistically capture all permutations of no-space, space, and hyphenated versions (e.g. little-leaf linden, littleleaf linden, little leaf linden, small-leaved linden, ... lime, ...), which is a challenge in English, French, Portuguese (especially), and probably many other languages. Would it be crazy or helpful to ignore spaces and dashes when matching?
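
One way this could look, as a sketch only: strip whitespace and hyphen-like characters from both the name and the query before the prefix comparison. The function names here are hypothetical and not taken from the codebase.

```javascript
// Collapse case, whitespace, and hyphen-like characters so that spacing
// variants of the same name compare equal.
function squash(text) {
  return text.toLowerCase().replace(/[\s\-\u2010-\u2015]+/g, '')
}

// Prefix match on the squashed forms; an empty query matches nothing here.
function prefixMatchIgnoringSeparators(name, query) {
  const q = squash(query)
  return q !== '' && squash(name).startsWith(q)
}

prefixMatchIgnoringSeparators('little-leaf linden', 'littleleaf l') // true
prefixMatchIgnoringSeparators('little leaf linden', 'little-leaf') // true
prefixMatchIgnoringSeparators('small-leaved linden', 'littleleaf') // false
```

This covers the no-space, space, and hyphenated permutations of one name, but genuinely different wordings ("small-leaved", "lime") would still need to exist as synonyms.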

Finally, while we're on the topic, there is the option of diacritic-insensitive matching for languages that use the Latin script as a base. I have this function in JavaScript for the purpose:
https://github.com/ezwelty/opentrees-harvester/blob/57110ccd51e5078665639ea593f799a7d59f9889/lib/helpers.js#L659
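
The linked helper is not reproduced here. Purely as an illustration of the general technique, one common approach uses the standard `String.prototype.normalize` to decompose accented letters and then removes the combining marks; the function name below is hypothetical and this is not necessarily how the linked function works.

```javascript
// Decompose accented letters (NFD), then drop the combining marks.
// Letters without a canonical decomposition (e.g. "ø", "ß") are left unchanged.
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/\p{M}/gu, '')
}

stripDiacritics('Açaí') // 'Acai'
stripDiacritics('pêssego') // 'pessego'
```

Applying the same transformation to both the query and the candidate names before comparing would make "acai" match "Açaí" without storing an unaccented synonym.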

@ezwelty
Collaborator

ezwelty commented Jan 26, 2025

One more little idea: maybe separate the synonyms with an interpunct (·) instead of commas, for legibility and consistency with other lists?

Development

Successfully merging this pull request may close these issues.

Type search in location form: prioritise prefix matches
2 participants