Thank you for taking the time to review this issue! Our team is actively developing around Azure Cloud, and we have found LlamaIndex to be a fantastic framework for building RAG-based AI applications.
When implementing the Document Summary Index with Azure AI Search and Cosmos DB as part of a fully Azure-based setup, we encounter an issue during querying. The index builds successfully, but querying fails with a KeyError: the summary ID looked up during retrieval has no entry in the index struct's summary_id_to_node_ids mapping.
Here is what we observed:
Before moving to Azure: Querying works correctly in a local persistent storage context.
In Azure: When using Azure AI Search as the vector store, the index_store does not seem to be used. Instead, the document summary mapping is stored in the docstore with type='document_summary', while other entries have type='1' (TextNode).
If the docstore is queried to retrieve all documents, it fails due to the presence of document_summary entries that should ideally reside in the index_store.
This behavior seems specific to how Azure AI Search handles the vector store and storage for the Document Summary Index.
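The docstore observation above can be checked by grouping raw records by their type field before deserializing them. The following is a minimal sketch on in-memory data, not LlamaIndex code: the record layout, the `type` field name, and the helpers `split_by_type`, `load_nodes_strict`, and `load_nodes_tolerant` are assumptions made for illustration. Only the two type values ('1' and 'document_summary') come from the report.

```python
from collections import defaultdict

# Hypothetical raw records from a key-value docstore table.
# Only the type values '1' and 'document_summary' are taken from the report.
RAW_RECORDS = {
    "node-a": {"type": "1", "text": "first chunk"},
    "node-b": {"type": "1", "text": "second chunk"},
    "index-1": {"type": "document_summary", "text": None},
}

# Types that a docstore reader is assumed to know how to deserialize.
NODE_TYPES = {"1"}


def split_by_type(raw):
    """Group record keys by their type field so stray entries become visible."""
    groups = defaultdict(list)
    for key, record in raw.items():
        groups[record.get("type", "<missing>")].append(key)
    return dict(groups)


def load_nodes_strict(raw):
    """Mimic a reader that fails on the first record it cannot deserialize."""
    nodes = {}
    for key, record in raw.items():
        if record.get("type") not in NODE_TYPES:
            raise ValueError(f"Unknown doc type: {record.get('type')}")
        nodes[key] = record["text"]
    return nodes


def load_nodes_tolerant(raw):
    """Return node records and list the keys that were skipped."""
    nodes, skipped = {}, []
    for key, record in raw.items():
        if record.get("type") in NODE_TYPES:
            nodes[key] = record["text"]
        else:
            skipped.append(key)
    return nodes, skipped
```

Running `split_by_type` on a dump of the real table would show at a glance whether index structs and nodes share one collection, which is the symptom described above.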
Additional Context
The issue does not occur when using a local persistent storage setup.
The behavior appears specific to Azure AI Search as the vector store.
The docstore contains entries of type document_summary, which should ideally reside in the index_store.
Version
0.12.5
Steps to Reproduce
Instantiate docstore, index_store, and vector_store using Azure services.
Index the documents.
Construct the Document Summary Index with the same parameters used for querying.
Run a query using query_engine.query(question).
Additionally, attempt to retrieve all documents from the docstore.
Code Example
from llama_index.core import DocumentSummaryIndex, StorageContext
from llama_index.storage.docstore.azure import AzureDocumentStore
from llama_index.storage.index_store.azure import AzureIndexStore
from llama_index.storage.kvstore.azure.base import ServiceMode
from llama_index.vector_stores.azureaisearch import (
    AzureAISearchVectorStore,
    IndexManagement,
)

# Document store backed by the Cosmos DB Table API.
docstore = AzureDocumentStore.from_connection_string(
    connection_string=cosmos_table_connection_string,
    namespace=namespace,
    service_mode=ServiceMode.STORAGE,
    partition_key=self.index_name,  # use index name as partition key
)

# Index store on the same connection, namespace, and partition key.
index_store = AzureIndexStore.from_connection_string(
    connection_string=cosmos_table_connection_string,
    namespace=namespace,
    service_mode=ServiceMode.STORAGE,
    partition_key=self.index_name,  # use index name as partition key
)

# Vector store backed by Azure AI Search.
index_vector_store = AzureAISearchVectorStore(
    search_or_index_client=self.index_client,
    # filterable_metadata_field_keys=metadata_fields,
    index_name=self.index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=self.embeddings_dimension,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
    # compression_type="binary"  # Option to use "scalar" or "binary". NOTE: compression is only supported for HNSW
)

storage_context = StorageContext.from_defaults(
    docstore=docstore,
    index_store=index_store,
    vector_store=index_vector_store,
)

# Build the index, then query it through the default query engine.
index = DocumentSummaryIndex.from_documents(
    documents=documents,
    storage_context=storage_context,
    llm=self.llm,
    embed_model=self.embed_model,
)
query_engine = index.as_query_engine()
response = query_engine.query(question)
Relevant Logs/Tracebacks
Traceback (most recent call last):
  File "<user-path>/manual_test_document_summary_index_indexing.py", line 31, in <module>
    response = query_engine.query(question)
  File "<llama-index-path>/instrumentation/dispatcher.py", line 321, in wrapper
    result = func(*args, **kwargs)
  File "<llama-index-path>/base_query_engine.py", line 52, in query
    query_result = self._query(str_or_query_bundle)
  File "<llama-index-path>/instrumentation/dispatcher.py", line 321, in wrapper
    result = func(*args, **kwargs)
  File "<llama-index-path>/retriever_query_engine.py", line 178, in _query
    nodes = self.retrieve(query_bundle)
  File "<llama-index-path>/retriever_query_engine.py", line 133, in retrieve
    nodes = self._retriever.retrieve(query_bundle)
  File "<llama-index-path>/instrumentation/dispatcher.py", line 321, in wrapper
    result = func(*args, **kwargs)
  File "<llama-index-path>/base_retriever.py", line 245, in retrieve
    nodes = self._retrieve(query_bundle)
  File "<llama-index-path>/instrumentation/dispatcher.py", line 321, in wrapper
    result = func(*args, **kwargs)
  File "<llama-index-path>/document_summary/retrievers.py", line 185, in _retrieve
    node_ids = self._index_struct.summary_id_to_node_ids[summary_id]
KeyError: '358720ff-2d3b-4d1a-9f00-6eb137ff8ed0'
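The failing line indexes summary_id_to_node_ids directly, so any summary ID that retrieval produces but the loaded index struct does not contain raises KeyError. Below is a minimal sketch of that check on plain dictionaries. The helper names `find_unmapped_summary_ids` and `node_ids_for`, the second summary ID, and the node IDs are hypothetical; only the first summary ID is taken from the traceback.

```python
from typing import Dict, Iterable, List


def find_unmapped_summary_ids(
    retrieved_summary_ids: Iterable[str],
    summary_id_to_node_ids: Dict[str, List[str]],
) -> List[str]:
    """Return retrieved summary IDs that have no entry in the mapping."""
    return [
        summary_id
        for summary_id in retrieved_summary_ids
        if summary_id not in summary_id_to_node_ids
    ]


def node_ids_for(
    summary_id: str, summary_id_to_node_ids: Dict[str, List[str]]
) -> List[str]:
    """Look up node IDs without raising when the summary ID is unknown."""
    return list(summary_id_to_node_ids.get(summary_id, []))


# Hypothetical state: the loaded mapping knows one summary, retrieval returns two.
loaded_mapping = {"summary-known": ["node-a", "node-b"]}
retrieved = ["358720ff-2d3b-4d1a-9f00-6eb137ff8ed0", "summary-known"]

missing = find_unmapped_summary_ids(retrieved, loaded_mapping)
```

A non-empty `missing` list at query time would suggest that the index struct loaded for querying is not the one written during indexing.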
The issue you're encountering seems to be related to how the DocumentSummaryIndex interacts with Azure AI Search and Cosmos DB in the LlamaIndex framework. Specifically, the traceback shows that a summary ID has no entry in summary_id_to_node_ids, the mapping the retriever needs to turn a matched summary into its nodes.
Here are some points to consider for troubleshooting and resolving this issue:
Mapping Storage: The DocumentSummaryIndex keeps its mappings as dictionaries on the index struct, including doc_id_to_summary_id and summary_id_to_node_ids. The traceback fails on the latter, so ensure that this mapping is populated when the index is built and that the same index struct is loaded at query time [1].
Docstore and Index Store: It seems that the docstore is being used to store entries of type document_summary, which might be causing the issue. Ideally, these entries should be managed by the index_store. Verify that the index_store is correctly configured and used for storing document summaries.
Configuration Consistency: Double-check the configuration of your AzureDocumentStore, AzureIndexStore, and AzureAISearchVectorStore. Ensure that they are correctly set up to handle the storage and retrieval of document summaries and indices. Pay attention to parameters like partition_key and service_mode to ensure they align with your setup.
Debugging the KeyError: The KeyError indicates that a summary_id is missing from the mapping. You might want to add logging or debugging statements to track the creation and retrieval of these mappings. This can help identify where the mapping is failing.
Local vs. Azure Setup: Since the issue does not occur in a local setup, compare the configurations and data flow between your local and Azure environments. There might be differences in how data is stored or retrieved that could lead to this issue.
By focusing on these areas, you should be able to identify and resolve the problem with the missing mappings in your Azure-based setup.
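One configuration detail worth checking is that the code example passes the same namespace and partition_key to both the docstore and the index store. The sketch below models what would happen if both stores derived their collection name from the namespace alone. That naming rule, the "/data" suffix, the namespace strings, and the `SharedTable` class are assumptions used for illustration; the report does not confirm that this is the cause.

```python
from typing import Any, Dict


class SharedTable:
    """In-memory stand-in for a key-value backend (not the Azure client)."""

    def __init__(self) -> None:
        self.collections: Dict[str, Dict[str, Any]] = {}

    def put(self, collection: str, key: str, value: Any) -> None:
        self.collections.setdefault(collection, {})[key] = value

    def get_all(self, collection: str) -> Dict[str, Any]:
        return dict(self.collections.get(collection, {}))


def collection_name(namespace: str, suffix: str = "/data") -> str:
    # Assumed naming rule: the namespace plus a fixed suffix.
    return f"{namespace}{suffix}"


def docstore_contents(docstore_namespace: str, index_store_namespace: str):
    """Write one node and one index struct, then read back the docstore side."""
    table = SharedTable()
    doc_collection = collection_name(docstore_namespace)
    index_collection = collection_name(index_store_namespace)
    table.put(doc_collection, "node-a", {"type": "1"})
    table.put(index_collection, "index-1", {"type": "document_summary"})
    return table.get_all(doc_collection)


# Same namespace for both stores: the index struct lands among the nodes.
shared = docstore_contents("my_index", "my_index")
# Distinct namespaces: the docstore collection holds nodes only.
separate = docstore_contents("my_index_docs", "my_index_structs")

shared_types = {record["type"] for record in shared.values()}
separate_types = {record["type"] for record in separate.values()}
```

If a dump of the real table shows the same mixing of types, giving each store its own namespace would be one experiment to try.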