Roadmap / Future #99

Open
valencik opened this issue Apr 21, 2023 · 0 comments

What should Textmogrify be?

Currently:

  • started from the need to use Lucene Analyzers in a cats-effect app
  • lets you use Analyzers with Streams (I have not used this myself so far)
  • provides a builder interface for customizing Analyzers
  • supports 8 languages

Protosearch

Is textmogrify suitable for protosearch?

Easily addressable needs:

  • tokenization
  • customizable collection type
  • offset tracking
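
To make the three needs above concrete, here is a minimal sketch of what a pure Scala tokenizer covering them could look like. It is not textmogrify's or protosearch's API: `Token`, `WhitespaceTokenizer`, and `tokenize` are hypothetical names, and splitting on whitespace stands in for whatever real tokenization rules we would want. It assumes Scala 2.13 or later for `scala.collection.Factory`.

```scala
import scala.collection.Factory

// Hypothetical token type: the text plus its [start, end) character offsets
// in the original input.
final case class Token(text: String, start: Int, end: Int)

// Hypothetical tokenizer: splits on whitespace only, which is far simpler
// than what a Lucene Analyzer does.
object WhitespaceTokenizer {

  // The caller picks the result collection by passing a Factory,
  // for example a collection companion such as Vector or List.
  def tokenize[C](input: String, factory: Factory[Token, C]): C = {
    val builder = factory.newBuilder
    var i = 0
    while (i < input.length) {
      // Skip any run of whitespace before the next token.
      while (i < input.length && input.charAt(i).isWhitespace) i += 1
      val start = i
      // Consume characters until the next whitespace or the end of input.
      while (i < input.length && !input.charAt(i).isWhitespace) i += 1
      // Only emit a token if we actually consumed something.
      if (i > start) builder += Token(input.substring(start, i), start, i)
    }
    builder.result()
  }
}
```

Calling `WhitespaceTokenizer.tokenize("hello  world", Vector)` would give a `Vector[Token]`, while passing `List` gives a `List[Token]`. No effects are involved and nothing here is JVM-specific, so a sketch like this should cross-compile to JS and Native.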

Hard-to-meet needs:

Not using effects is at odds with leveraging Lucene.
And of course, Lucene is JVM-only, so cross-platform support really means either writing our own
pure Scala implementation or finding equivalent implementations for JS and Native.
I think trying to leverage different JS and Native libraries is not worth the pain.

Pure Scala Implementation

If we go through the effort of writing a pure Scala implementation of a tokenizer,
what is the benefit of keeping the Lucene module around then?

  • Easy testing against Lucene for both performance and correctness
  • Lucene has MANY more Analyzers to use
  • Likely still a great solution if you're on the JVM