Question Validation
I have searched both the documentation and Discord for an answer.
Question
Context: build_nodes_from_splits in llama-index-core/llama_index/core/node_parser/node_utils.py
Each node's embedding is initialized to the document's embedding. This is incorrect: a node holds only one chunk of the document's text, so the document-level vector does not represent it.
We should probably initialize it with None and add embeddings later as needed.
def build_nodes_from_splits(
    text_splits: List[str],
    document: BaseNode,
    ref_doc: Optional[BaseNode] = None,
    id_func: Optional[IdFuncCallable] = None,
) -> List[TextNode]:
    """Build nodes from splits."""
    ref_doc = ref_doc or document
    id_func = id_func or default_id_func
    nodes: List[TextNode] = []
    """Calling as_related_node_info() on a document recomputes the hash for the whole text and metadata"""
    """It is not that bad, when creating relationships between the nodes, but is terrible when adding a relationship"""
    """between the node and a document, hence we create the relationship only once here and pass it to the nodes"""
    relationships = {NodeRelationship.SOURCE: ref_doc.as_related_node_info()}
    for i, text_chunk in enumerate(text_splits):
        logger.debug(f"> Adding chunk: {truncate_text(text_chunk, 50)}")

        if isinstance(document, ImageDocument):
            image_node = ImageNode(
                id_=id_func(i, document),
                text=text_chunk,
                embedding=document.embedding,
                image=document.image,
                image_path=document.image_path,
                image_url=document.image_url,
                excluded_embed_metadata_keys=document.excluded_embed_metadata_keys,
                excluded_llm_metadata_keys=document.excluded_llm_metadata_keys,
                metadata_seperator=document.metadata_separator,
                metadata_template=document.metadata_template,
                text_template=document.text_template,
                relationships=relationships,
            )
            nodes.append(image_node)  # type: ignore
        elif isinstance(document, Document):
            node = TextNode(
                id_=id_func(i, document),
                text=text_chunk,
                embedding=document.embedding,
                excluded_embed_metadata_keys=document.excluded_embed_metadata_keys,
                excluded_llm_metadata_keys=document.excluded_llm_metadata_keys,
                metadata_seperator=document.metadata_separator,
                metadata_template=document.metadata_template,
                text_template=document.text_template,
                relationships=relationships,
            )
            nodes.append(node)
        elif isinstance(document, TextNode):
            node = TextNode(
                id_=id_func(i, document),
                text=text_chunk,
                embedding=document.embedding,
                excluded_embed_metadata_keys=document.excluded_embed_metadata_keys,
                excluded_llm_metadata_keys=document.excluded_llm_metadata_keys,
                metadata_seperator=document.metadata_seperator,
                metadata_template=document.metadata_template,
                text_template=document.text_template,
                relationships=relationships,
            )
            nodes.append(node)
        else:
            raise ValueError(f"Unknown document type: {type(document)}")

    return nodes
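To make the suggestion concrete, here is a minimal sketch of the idea. It is not a patch against llama-index: SketchNode, build_nodes_without_embeddings, embed_missing, and toy_embed are hypothetical names invented for illustration, SketchNode only stands in for TextNode, and the toy embedding is a placeholder for a real embedding model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for TextNode; not the llama-index class."""

    text: str
    embedding: Optional[List[float]] = None


def build_nodes_without_embeddings(text_splits: List[str]) -> List[SketchNode]:
    # Each node starts with embedding=None instead of copying the parent's vector.
    return [SketchNode(text=chunk) for chunk in text_splits]


def embed_missing(
    nodes: List[SketchNode], embed_fn: Callable[[str], List[float]]
) -> None:
    # Compute an embedding from each node's own text, skipping nodes that have one.
    for node in nodes:
        if node.embedding is None:
            node.embedding = embed_fn(node.text)


def toy_embed(text: str) -> List[float]:
    # Toy embedding for illustration only: [character count, word count].
    return [float(len(text)), float(len(text.split()))]


nodes = build_nodes_without_embeddings(["first chunk", "a second, longer chunk"])
assert all(node.embedding is None for node in nodes)

embed_missing(nodes, toy_embed)
for node in nodes:
    print(node.text, node.embedding)
```

With this split, an unset embedding is visibly None until something computes it from the chunk's own text, so a node never silently carries a vector that describes the whole document.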
Hey there, @sgondala! I'm here to help you out with any bugs, questions, or contributions. Let's tackle this together!
The build_nodes_from_splits function initializes each node embedding to the value of the document embedding to maintain consistency and context from the original document. The embedding represents the semantic content of the document, and initializing each node with this embedding ensures that the nodes retain the semantic information of the original document. Initializing with None would mean losing this semantic context, which might not be desirable for the intended use of these nodes. Therefore, the current implementation is intentional and should not be changed to None without a specific reason that aligns with the application's requirements [1].