community: fix duplicate content (#28003)
Thank you for reading my first PR!

**Description:**
Deduplicate content in AzureSearch vectorstore.
Currently, by default, the retrieved content is placed in both the
metadata and the page_content of a Document.
This PR removes the content from metadata and leaves it in
page_content.

**Issue:**
Previously, the content was popped from result before metadata was
populated.
In #25828, the order was changed, which leads to a response with
duplicated content.
This was not the intention of that PR and seems undesirable.

Looking forward to seeing my contribution in the next version!

Cheers, 
Renzo
Renzo-vS authored Nov 20, 2024
1 parent abaea28 commit 567dc1e
Showing 1 changed file with 4 additions and 2 deletions.
@@ -1798,15 +1798,17 @@ def _result_to_document(result: Dict) -> Document:
         fields_metadata = json.loads(result[FIELDS_METADATA])
     else:
         fields_metadata = {
-            key: value for key, value in result.items() if key != FIELDS_CONTENT_VECTOR
+            key: value
+            for key, value in result.items()
+            if key not in [FIELDS_CONTENT_VECTOR, FIELDS_CONTENT]
         }
     # IDs
     if FIELDS_ID in result:
         fields_id = {FIELDS_ID: result.pop(FIELDS_ID)}
     else:
         fields_id = {}
     return Document(
-        page_content=result.pop(FIELDS_CONTENT),
+        page_content=result[FIELDS_CONTENT],
         metadata={
             **fields_id,
             **fields_metadata,
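
The effect of the changed filter can be sketched in isolation. This is a minimal illustration, not the library code: the `Document` dataclass is a stand-in for the real class, the field-name values (`"id"`, `"content"`, `"content_vector"`) are assumed for the example, the helper names are hypothetical, and the ID handling and the `FIELDS_METADATA` branch are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Assumed field names for illustration only; the real constants are defined
# in the AzureSearch vectorstore module and may hold different values.
FIELDS_CONTENT = "content"
FIELDS_CONTENT_VECTOR = "content_vector"


@dataclass
class Document:
    """Minimal stand-in for the real Document class."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def metadata_before(result: Dict[str, Any]) -> Dict[str, Any]:
    """Filter as it was before this commit: only the vector is excluded."""
    return {k: v for k, v in result.items() if k != FIELDS_CONTENT_VECTOR}


def metadata_after(result: Dict[str, Any]) -> Dict[str, Any]:
    """Filter as of this commit: the vector and the content are excluded."""
    excluded = [FIELDS_CONTENT_VECTOR, FIELDS_CONTENT]
    return {k: v for k, v in result.items() if k not in excluded}


def to_document(
    result: Dict[str, Any],
    build_metadata: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> Document:
    """Build the metadata first, then read the content (hypothetical helper)."""
    metadata = build_metadata(result)
    return Document(page_content=result[FIELDS_CONTENT], metadata=metadata)


# A made-up search hit with one extra field.
hit = {
    "id": "1",
    "content": "hello",
    "content_vector": [0.1, 0.2],
    "source": "a.txt",
}

old_doc = to_document(dict(hit), metadata_before)
new_doc = to_document(dict(hit), metadata_after)

print(FIELDS_CONTENT in old_doc.metadata)  # True: content is duplicated
print(FIELDS_CONTENT in new_doc.metadata)  # False: content only in page_content
```

Because the content key is now excluded by the filter itself, the result no longer depends on whether the content is popped before or after the metadata is built.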