Sort by a combination of fields #1802

Open
elektito opened this issue Mar 25, 2023 · 4 comments

Comments

@elektito

I'm trying to implement custom sorting based on a combination of _score and two other numeric fields. I've looked at the implementation of SortField and the like but I'm still not quite sure how to proceed.

What I've gathered so far is that I would need to extract the values from the terms provided to UpdateVisitor, convert them to a number and store them, and then in the Value function combine them with score as I wish and then encode them back somehow.

I haven't been able to figure out how the encoding/decoding part works. Is there a resource or an example you can point me to?
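For readers with the same question, here is a minimal standard-library sketch of the general idea behind encoding a number into a string sort key: map the float to bits whose unsigned order matches numeric order, then write those bits big-endian so that plain string comparison agrees with numeric comparison. The helper names (`sortableBits`, `encodeSortKey`, `decodeSortKey`) are hypothetical, and this is not bleve's actual prefix-coded format, which its `numeric` package implements in its own way.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"sort"
)

// sortableBits maps a float64 to a uint64 whose unsigned order matches the
// numeric order of the floats (NaN is not handled in this sketch).
func sortableBits(f float64) uint64 {
	b := math.Float64bits(f)
	if b&(1<<63) != 0 {
		return ^b // negative: flip every bit so more negative values sort first
	}
	return b | (1 << 63) // non-negative: set the top bit so it sorts after negatives
}

// fromSortableBits is the inverse of sortableBits.
func fromSortableBits(u uint64) float64 {
	if u&(1<<63) != 0 {
		return math.Float64frombits(u &^ (1 << 63))
	}
	return math.Float64frombits(^u)
}

// encodeSortKey returns an 8-byte string; comparing two keys as strings
// gives the same result as comparing the original numbers.
func encodeSortKey(f float64) string {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], sortableBits(f))
	return string(buf[:])
}

// decodeSortKey recovers the number from a key produced by encodeSortKey.
func decodeSortKey(s string) float64 {
	return fromSortableBits(binary.BigEndian.Uint64([]byte(s)))
}

func main() {
	values := []float64{3.5, -2, 0, 10, -0.25}
	keys := make([]string, len(values))
	for i, v := range values {
		keys[i] = encodeSortKey(v)
	}
	sort.Strings(keys) // plain lexicographic sort of the encoded keys
	for _, k := range keys {
		fmt.Println(decodeSortKey(k)) // values come out in ascending numeric order
	}
}
```

In a custom sort, a Value-style function would combine its inputs into one number and return a key built along these lines, so the engine can compare hits as strings.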

@abhinavdangeti
Member

abhinavdangeti commented Mar 27, 2023

Multiple sort keys come in handy when some of the document hits are missing a value, or have equal values, for the earlier sort keys.

Here is an example test case that could help -
https://github.com/blevesearch/bleve/blob/v2.3.7/examples_test.go#L443
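
As a rough illustration of why a second sort key matters, the sketch below uses only the standard library, not bleve. The `hit` struct and `sortHits` function are hypothetical stand-ins: hits are ordered by age, and the score only breaks ties between hits with the same age.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical stand-in for a search result.
type hit struct {
	ID    string
	Age   int
	Score float64
}

// sortHits orders hits by Age ascending; Score (descending) is consulted
// only when two hits have the same Age.
func sortHits(hits []hit) {
	sort.SliceStable(hits, func(i, j int) bool {
		if hits[i].Age != hits[j].Age {
			return hits[i].Age < hits[j].Age
		}
		return hits[i].Score > hits[j].Score
	})
}

func main() {
	hits := []hit{{"a", 30, 0.4}, {"b", 25, 0.1}, {"c", 30, 0.9}}
	sortHits(hits)
	for _, h := range hits {
		fmt.Println(h.ID, h.Age, h.Score)
	}
}
```

Note that the later key never outranks the earlier one, so this kind of sort cannot express a blend of two fields.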

@elektito
Author

elektito commented Mar 28, 2023

This doesn't seem to be quite what I was looking for. As far as I can tell, it's sorting based only on age, whereas what I was looking for is something like sorting by the product of the relevance score and a field in the document. FWIW, I finally came up with this, which seems to work:

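// UpdateVisitor is called with the indexed terms of each visited field.
// Only the longest term seen for each rank field is kept.
func (so *RankedSort) UpdateVisitor(field string, term []byte) {
	switch field {
	case "pRank":
		if len(term) > len(so.pRank) {
			so.pRank = make([]byte, len(term))
			copy(so.pRank, term)
		}
	case "hRank":
		if len(term) > len(so.hRank) {
			so.hRank = make([]byte, len(term))
			copy(so.hRank, term)
		}
	}
}

// Value decodes the stored rank terms, combines them with the hit's
// relevance score, and returns the result re-encoded as a sort key.
func (so *RankedSort) Value(a *search.DocumentMatch) string {
	// Decode the prefix-coded terms back to float64 (errors are ignored here).
	prp, _ := numeric.PrefixCoded(so.pRank).Int64()
	pr := math.Float64frombits(uint64(prp))

	hrp, _ := numeric.PrefixCoded(so.hRank).Int64()
	hr := math.Float64frombits(uint64(hrp))

	// Reset the buffers so the next document starts clean.
	so.pRank = so.pRank[:0]
	so.hRank = so.hRank[:0]

	// Add 1 to each factor before multiplying them together.
	score := numeric.Float64ToInt64((a.Score + 1) * (pr + 1) * (hr + 1))

	return string(numeric.MustNewPrefixCodedInt64(score, 0))
}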

(I'm pretty sure how I use the absolute value of score and multiply it by other values is technically incorrect, since the score value does not have any bounds that I'm aware of; would love any suggestions on that front too.)

@abhinavdangeti
Member

Ah I see, for custom sorting - we've provided an API for the SearchRequest that you could leverage. I suppose you're registering your RankedSort using this -
https://github.com/blevesearch/bleve/blob/v2.3.7/search.go#L629

The internal score determination is based on the tf-idf algorithm. Boosting is the only way we allow users to adjust/influence this score generated for your document hits.

@elektito
Author

I'm using SortByCustom. I didn't know about SetSortFunc. It looks like it's more concerned with the implementation of the sort than with scoring, or did I get that wrong?

Are you saying that what I'm doing is borderline unsupported? Boosting is not going to work for my use case, because I want the scores to come partially from relevance and partially from the rank of each item. This is a small search engine, and pRank is actually PageRank; the score*rank multiplication was the best I could come up with to achieve what I wanted. It does seem to be mostly working. I just don't like that I'm multiplying by score, whose dimensions and bounds I don't know, and indeed I think I read somewhere that it's not even meant to be comparable between different searches.
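
One possible way around the unbounded score, sketched here with only the standard library, is to rescale the scores within a single result set before blending them with the rank. Everything in this sketch is an assumption for illustration: the `rankedHit` struct, the `rerank` and `combined` functions, the weight `w`, and the premise that PRank lies in [0, 1]. It is not something bleve provides or recommends.

```go
package main

import (
	"fmt"
	"sort"
)

// rankedHit is a hypothetical result carrying both signals.
type rankedHit struct {
	ID    string
	Score float64 // relevance score from the engine, no known upper bound
	PRank float64 // PageRank-like value, assumed to lie in [0, 1]
}

// combined blends the per-query normalized score with PRank.
// w is the weight given to relevance, between 0 and 1.
func combined(h rankedHit, maxScore, w float64) float64 {
	norm := 0.0
	if maxScore > 0 {
		norm = h.Score / maxScore // now in [0, 1] for this result set
	}
	return w*norm + (1-w)*h.PRank
}

// rerank sorts hits in place by the blended value, highest first.
func rerank(hits []rankedHit, w float64) {
	maxScore := 0.0
	for _, h := range hits {
		if h.Score > maxScore {
			maxScore = h.Score
		}
	}
	sort.SliceStable(hits, func(i, j int) bool {
		return combined(hits[i], maxScore, w) > combined(hits[j], maxScore, w)
	})
}

func main() {
	hits := []rankedHit{{"a", 4.0, 0.1}, {"b", 3.0, 0.9}, {"c", 1.0, 0.5}}
	rerank(hits, 0.5)
	for _, h := range hits {
		fmt.Println(h.ID)
	}
}
```

Because the normalization is done per result set, the blended values are still not comparable between different searches; they are only used to order the hits of one query.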
