Best practices for limiting responses to a specific source document #95
In order to enrich this discussion, it would be beneficial to be able to include keywords that can aid in "directing" the response. Keywords: Cat, Cat Book, Meals.
Yes. Several options:
I'm unsure how the db is stored, but I don't think it's ordered by metadata, so if your db is big, filtering by your document may take a long time.
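To illustrate the concern in pure Python: without an index, restricting results to one document means scanning every stored chunk, whereas an inverted index built once at ingest time makes the lookup direct (Qdrant offers payload indexes for exactly this purpose). This is only a sketch; the `source_document` field name and the sample data are assumptions, not the repo's actual schema.

```python
# Hypothetical in-memory stand-in for the vector db's stored chunks.
points = [
    {"id": 1, "source_document": "cats.pdf", "text": "Cats purr."},
    {"id": 2, "source_document": "dogs.pdf", "text": "Dogs bark."},
    {"id": 3, "source_document": "cats.pdf", "text": "Cats nap."},
]

# Linear scan: touches every point, which is slow when the db is big.
scan_hits = [p for p in points if p["source_document"] == "cats.pdf"]

# Inverted index: build once at ingest time, then lookup is direct.
index = {}
for p in points:
    index.setdefault(p["source_document"], []).append(p)
indexed_hits = index.get("cats.pdf", [])

# Both strategies return the same chunks; only the cost differs.
assert scan_hits == indexed_hits
```

A payload index on the filter field in the real store gives the same effect without scanning the whole collection.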
Yes, that sounds like a good idea.
Some delightful ideas I would have to think about, thanks. For starters, I see no reason why we could not just pipe in:

```python
if selected_document is not None:
    # Restrict retrieval to the selected document via a metadata filter.
    # Note: LangChain's as_retriever() takes these options through
    # search_kwargs rather than as direct keyword arguments.
    retriever = self.qdrant_langchain.as_retriever(
        search_type="mmr",
        search_kwargs={
            "filter": {"source_document": selected_document},
            "k": n_forward_documents,
            "fetch_k": n_retrieve_documents,
        },
    )
else:
    retriever = self.qdrant_langchain.as_retriever(
        search_type="mmr",
        search_kwargs={"k": n_forward_documents, "fetch_k": n_retrieve_documents},
    )
```

This should not perform worse than any other approach (MMR compatibility still has to be checked, though). I think it should be possible to fetch a list of all available documents and set the filter on selection.
Nice, thanks for the quick suggestions. I'd probably go with the easiest for now (#95 (comment)). Another idea is to modify the … The challenge here is that many users may prefer to search across all their documents, which is not my specific use case.

Regardless of the approach, we will likely need a mechanism to keep track of which names have been ingested. This could potentially be achieved by creating another table in Qdrant for storing global metadata.

I'd love to collaborate, but my plate is pretty full these days; I usually sneak in some time for AI projects at night. Really appreciate what you're doing here!
Hi, thanks for the contribution.
I have been using your repository to train a model on a collection of books. My goal is to generate answers that are specific to a single source document, essentially using the model as an assistant that draws information from one selected book at a time (such as "cats.pdf").
Initially, I attempted to implement this by modifying the prompts, but the results were inconsistent, and the model sometimes used information from other sources. Here's an example of how I structured the prompts:
Seems like ingest.py adds the source path to the doc metadata. However, when a question is asked, the model retrieves the most relevant documents based on the semantic similarity between the query's embedding and the documents' embeddings, not a specific document identifier. The model does not consider the document's metadata (like its source path) during retrieval, which means it can't be instructed to refer to a specific document just by mentioning the document's name or identifier in the prompt (?).
Considering this, I'm evaluating the option of creating a dropdown menu that lists all the books I've trained the model on. When a book is selected from this menu, I would swap the databases to only include documents from the selected book when a query is made.
With that context, I have a few questions:
Thanks for your time & I'd appreciate your insights.
PS: Adding this under docs, because it might be a result of my lack of understanding of how everything works together.