
'json' query does not return expected results from 'application/json' #365

Closed
lewisnyman opened this issue Jul 31, 2018 · 5 comments

Comments

@lewisnyman

Hi 👋 I'm a bit stumped by the behaviour I'm seeing in this implementation. For some reason a search for the term json doesn't return pages that contain API documentation, but a search for application/json does. Is it related to the tokeniser?

I've created a jsfiddle here with the same content and the default pipeline functions turned off: https://jsfiddle.net/f3sbrtq2/25/

@hoelzro
Contributor

hoelzro commented Jul 31, 2018

@lewisnyman Hi! Your suspicions about the tokenizer are spot on; it only splits tokens on - characters and whitespace by default. lunr looks up whole tokens in its inverted index, so a document containing application/json is filed under the token application/json and not under the standalone json token, hence the "missing" results.
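
To illustrate the whole-token lookup described here, the following is a minimal sketch of a toy inverted index. It is not lunr's implementation: `tokenize`, `buildIndex`, `lookup`, and the sample documents are hypothetical names made up for this example, and the separator simply models "whitespace and hyphens".

```javascript
// Toy model of an inverted index (assumption: this only mimics the behaviour
// described above; it is not how lunr is actually implemented).
const DEFAULT_SEPARATOR = /[\s\-]+/; // splits on whitespace and hyphens only

// Split text into lowercase tokens, dropping empty strings.
function tokenize(text, separator) {
  return text.toLowerCase().split(separator).filter((t) => t.length > 0);
}

// Map every token to the set of document ids that contain it.
function buildIndex(docs, separator) {
  const index = new Map();
  for (const [id, text] of Object.entries(docs)) {
    for (const token of tokenize(text, separator)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(id);
    }
  }
  return index;
}

// Exact-match lookup: the query term must equal a stored token.
function lookup(index, term) {
  return [...(index.get(term.toLowerCase()) || [])];
}

// Hypothetical sample documents.
const docs = {
  api: "Send a request with Content-Type application/json",
  guide: "The response body is json",
};

const index = buildIndex(docs, DEFAULT_SEPARATOR);
console.log(lookup(index, "json"));             // [ 'guide' ]
console.log(lookup(index, "application/json")); // [ 'api' ]
```

The `api` document never shows up for `json` because the only token it contributes for that phrase is `application/json`.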

Another thing I noticed about your example: you remove everything from the builder pipeline, but you leave the search pipeline alone. You might want to call this.searchPipeline.remove(lunr.stemmer) as well.
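
A rough sketch of why the two pipelines need to agree, assuming a toy setup rather than lunr itself: `toyStemmer` is a hypothetical stand-in that strips a trailing "s" (it is not `lunr.stemmer`), and `runPipeline` is an illustrative helper.

```javascript
// Hypothetical stand-in for a stemmer: strips one trailing "s".
// This is NOT lunr.stemmer, just enough to show the mismatch.
function toyStemmer(token) {
  return token.endsWith("s") ? token.slice(0, -1) : token;
}

// Apply each pipeline function to every token, in order.
function runPipeline(tokens, pipeline) {
  return pipeline.reduce((acc, fn) => acc.map(fn), tokens);
}

const buildPipeline = [];            // everything removed at index time
const searchPipeline = [toyStemmer]; // stemmer still active at query time

// Terms are stored exactly as written because the build pipeline is empty.
const indexedTerms = new Set(runPipeline(["responses", "json"], buildPipeline));

// The query is stemmed, so it no longer equals the stored term.
const stemmedQuery = runPipeline(["responses"], searchPipeline)[0];
console.log(stemmedQuery);                   // response
console.log(indexedTerms.has(stemmedQuery)); // false

// With the stemmer removed from the search side too, the terms line up.
const rawQuery = runPipeline(["responses"], [])[0];
console.log(indexedTerms.has(rawQuery));     // true
```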

Out of curiosity, what are you trying to use lunr to accomplish? By removing all of the processing functions from the pipeline you're removing a lot of the value of what lunr provides! Or was that just for the purposes of your example?

@olivernn
Owner

What @hoelzro says is spot on. You can customise what is considered a separator by overriding the lunr.tokenizer.separator property. If you want even more control (at the cost of more work) you can implement a custom tokeniser and use it when building an index:

lunr(function () {
  this.tokenizer = myCustomTokenizer
})

@lewisnyman lewisnyman changed the title 'json' query does not return expected results 'json' query does not return expected results from 'application/json' Aug 1, 2018
@lewisnyman
Author

lewisnyman commented Aug 1, 2018

Thanks for the advice. I've updated the jsfiddle now with the fix: https://jsfiddle.net/f3sbrtq2/37/
this.tokenizer.separator = /[\s\-/]+/;
Note: I'm running an older version of Lunr (0.7.0) so my line is: this.tokenizerFn.seperator = /[\s\-/]+/;
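
For comparison, here is a small sketch of what the extra / in the separator changes, using plain `String.prototype.split` rather than lunr. The helper `splitTokens` and the sample string are hypothetical.

```javascript
// Separator that splits on whitespace and hyphens only.
const defaultSeparator = /[\s\-]+/;
// Separator from the fix above: also splits on forward slashes.
const slashSeparator = /[\s\-/]+/;

// Hypothetical helper: split and drop empty strings.
function splitTokens(text, separator) {
  return text.split(separator).filter((t) => t.length > 0);
}

const sample = "Accept: application/json";

console.log(splitTokens(sample, defaultSeparator));
// [ 'Accept:', 'application/json' ]

console.log(splitTokens(sample, slashSeparator));
// [ 'Accept:', 'application', 'json' ]
```

One trade-off worth keeping in mind: with / as a separator, a path such as /v1/users is also split into v1 and users, so a query for the full path would be tokenised the same way.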

@hoelzro I was removing all of the default processing just to prove that nothing unexpected was affecting the query string. In my real world use case I've only replaced the stop word filter. Thanks for the searchPipeline tip, I completely missed that I need to remove it twice in the docs and examples.

@olivernn
Owner

olivernn commented Aug 1, 2018

> Note: I'm running an older version of Lunr (0.7.0)

What is blocking you from upgrading to Lunr 2.x?

@lewisnyman
Author

We're using middleman-search, which helpfully prebuilds the index but is not actively maintained: manastech/middleman-search#29
