-
Notifications
You must be signed in to change notification settings - Fork 328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix issue with elasticsearch extension to respect limit and minReleva… #935
base: main
Are you sure you want to change the base?
Conversation
…nce parameters when calling GetSimilarListAsync API
@microsoft-github-policy-service agree |
@@ -218,15 +218,20 @@ public async Task<string> UpsertAsync( | |||
Embedding embedding = await this._embeddingGenerator.GenerateEmbeddingAsync(text, cancellationToken).ConfigureAwait(false); | |||
var coll = embedding.Data.ToArray(); | |||
|
|||
//adjust min score for cosine similarity | |||
//https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#knn-similarity-search | |||
float adjustedMinSimilarityScore = (float)(2 * minRelevance - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This formula transforms from score to cosine similarity, however internally ES works with score.
I believe this should be the inverse, e.g. float adjustedMinSimilarityScore = (float)((minRelevance + 1) / 2)
By the way, the formula (2 * minRelevance - 1)
should be used in line 253, replacing hit.Score
with the actual cosine similarity. The value returned is incorrect and doesn't match the similarity returned by other memory connectors.
var resp = await this._client.SearchAsync<ElasticsearchMemoryRecord>(s => | ||
s.Index(index) | ||
s.Index(index).Size(limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qd.k(limit)
already sets a limit, isn't this redundant? should we remove the limit
param in qd.k(limit)
?
maxRelevance = partitions.Max(p => p.Relevance); | ||
minRelevance = partitions.Min(p => p.Relevance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 2 vars are not used. Missing some assertions?
//relevance 0.82: | ||
results = await this.KernelMemory.SearchAsync("What is Silicon?", index: indexName, null, null, minRelevance: 0.82, limit: 10); | ||
partitions = results.Results[0].Partitions; | ||
maxRelevance = partitions.Max(p => p.Relevance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The var is not used. Missing an assertion?
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/TestsHelper.cs
Outdated
Show resolved
Hide resolved
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/KernelMemoryTests.cs
Outdated
Show resolved
Hide resolved
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/KernelMemoryTests.cs
Outdated
Show resolved
Hide resolved
…nal/KernelMemoryTests.cs
…nal/KernelMemoryTests.cs
…nal/TestsHelper.cs
…nce parameters when calling GetSimilarListAsync API
Motivation and Context (Why the change? What's the scenario?)
Fixing issue #934
High level description (Approach, Design)
using ElasticSearch Client .Size(limit) clause during query.
using .Similarity(min score) to fix minRelevance (after adjusting the score value based on cosine similarity as per https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#knn-similarity-search)