Make search queries more coherent #193

Open
6 tasks
alexgarel opened this issue Jun 26, 2024 · 2 comments

alexgarel commented Jun 26, 2024

We are trying to abstract away the complexity of Elasticsearch through a configuration and the Lucene query language.

Right now this abstraction is quite leaky. For example, if you declare labels as a text_lang field, you have to query labels.en:"my query" to reach it, even though you also specify separately which language you want to query in. There are a number of other problems of this type.

Most of those problems can be solved by using Lucene to modify the request on the fly, based on the field definitions.
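
As a rough illustration of that idea (not the project's implementation), the sketch below rewrites a bare field name into its language-specific subfield when the configuration declares the field as text_lang. The `FIELD_TYPES` mapping and the `expand_lang_fields` helper are hypothetical names, and a regular expression stands in for a real Lucene query parser, so it would mishandle colons inside quoted phrases.

```python
import re

# Hypothetical field configuration: field name -> declared type.
FIELD_TYPES = {"labels": "text_lang", "code": "keyword"}

# A field name directly followed by ":", not preceded by a word character
# or a dot (so already-qualified names such as "labels.en:" are left alone).
FIELD_RE = re.compile(r"(?<![\w.])([A-Za-z_]\w*):")


def expand_lang_fields(query: str, lang: str) -> str:
    """Rewrite `field:` to `field.<lang>:` for fields declared as text_lang."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        if FIELD_TYPES.get(name) == "text_lang":
            return f"{name}.{lang}:"
        # Other field types are passed through unchanged.
        return match.group(0)

    return FIELD_RE.sub(repl, query)
```

With this, a user could write `labels:"my query"` and pass the language once as a separate parameter, instead of spelling out `labels.en` in the query.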

To be fixed

@alexgarel (Member, Author)

@raphael0202 this should be discussed to see what's best (next month)

@alexgarel (Member, Author)

I have started working on this, to handle text better and to avoid having a big index because of synonyms.

Here is my plan:

  • I will use synonyms files generated from taxonomies (I prefer them to synonyms sets, to stay compatible with OpenSearch), and we will use a token filter on the query analyzer to apply synonyms at search time.
  • I will try to rewrite the query for text matches at the Lucene language level, transforming simple words and phrases into more complex expressions that match against the various fields, with optional boosting. An alternative is to rewrite the final ES query (if, for example, it is simpler to use a multi_match).
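
To make the second point concrete, here is a minimal sketch of expanding a bare word or phrase into a Lucene expression over several fields with optional boosts. The `expand_text_match` helper and the field names in the usage note are hypothetical; the actual rewrite would work on a parsed query tree rather than on strings.

```python
from typing import Mapping


def expand_text_match(text: str, fields: Mapping[str, float]) -> str:
    """Turn a bare word or phrase into an OR over several fields.

    `fields` maps a field name to its boost; a boost of 1 is omitted.
    """
    # Escape backslashes and double quotes so the text is a valid phrase.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    clauses = []
    for field, boost in fields.items():
        clause = f'{field}:"{escaped}"'
        if boost != 1:
            # Lucene boost syntax: clause^factor
            clause += f"^{boost:g}"
        clauses.append(clause)
    return "(" + " OR ".join(clauses) + ")"
```

For example, `expand_text_match("dark chocolate", {"product_name.en": 2, "labels.en": 1})` yields `(product_name.en:"dark chocolate"^2 OR labels.en:"dark chocolate")`.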

I'm currently working on generating the synonyms files and integrating them into the query analyzer.
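
A possible shape for this step, sketched under assumptions: the taxonomy is represented as a plain dict (entry id to per-language names), the file uses the Solr synonyms format (one comma-separated group per line), and the analyzer uses the `synonym_graph` token filter with `synonyms_path`, which Elasticsearch and OpenSearch both provide. The helper names, the filter and analyzer names, and the taxonomy layout are hypothetical.

```python
def synonyms_lines(taxonomy: dict, lang: str) -> list:
    """Build Solr-format synonym lines for one language.

    Assumes `taxonomy` maps an entry id to {lang: [names...]}.
    Names containing commas would need escaping, which is not handled here.
    """
    lines = []
    for _entry_id, names in sorted(taxonomy.items()):
        synonyms = names.get(lang, [])
        # A single name has no synonyms to declare.
        if len(synonyms) > 1:
            lines.append(", ".join(synonyms))
    return lines


def query_analyzer_settings(lang: str, synonyms_path: str) -> dict:
    """Index settings with a synonym filter meant for the search analyzer only."""
    filter_name = f"{lang}_synonyms"
    return {
        "analysis": {
            "filter": {
                filter_name: {
                    "type": "synonym_graph",
                    "synonyms_path": synonyms_path,
                }
            },
            "analyzer": {
                f"{lang}_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", filter_name],
                }
            },
        }
    }
```

The resulting analyzer would be referenced as the `search_analyzer` of a text field, so synonyms are expanded at query time and do not need to be stored in the index.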

alexgarel self-assigned this Aug 21, 2024
teolemon changed the title from "Make search queries more coherents" to "Make search queries more coherent" Aug 21, 2024
alexgarel moved this from Backlog (ready for dev) to In Progress in 🔎 Search-a-licious Aug 21, 2024
alexgarel added a commit that referenced this issue Oct 24, 2024
* Use the synonyms capabilities of ES to avoid storing taxonomy fields in the index
* Better handling of full text queries, which are now supported within any expression
* Make boost_phrase a separate parameter
* Raise errors if the query is not well understood or does not pass some sanity checks
* Use the main translation of a taxonomy entry for facet values (instead of a random synonym)
* Better handling of global config to avoid treacherous patterns
* Unify parameters for GET and POST (better use of pydantic)
* Error on extraneous search parameters to avoid hard-to-debug issues with typos
* Add a command to clean indexes
* Integration tests on search and analyzers

Part of: #193

---------

Co-authored-by: Raphaël Bournhonesque <[email protected]>
Status: In Progress