The issue you're experiencing with duplicated and extra characters in the ReAct mode of the LlamaIndex chat engine could be related to how the `astream_chat` method processes messages and generates responses. The `astream_chat` method uses an asynchronous generator to yield response tokens incrementally, which might lead to duplication if the message handling or memory update logic is not correctly implemented [1].
Here are a few things you might want to check or try:
1. **Message Handling:** Ensure that the chat history and message handling logic correctly manage state and do not inadvertently duplicate messages. The chat history should be accurately set and updated to prevent repeated processing of the same message.
2. **Response Generation:** Verify that the response synthesizer and the asynchronous generator (`async_response_gen`) are correctly configured to handle streaming responses without duplicating tokens; the generator should yield each token only once [2]. A minimal way to inspect the stream is sketched after this list.
3. **Version-Specific Changes:** Since the issue appears between versions 0.12.5 and 0.12.7, reviewing the changelogs or commits for these versions might provide insights into any changes that could affect message processing or response generation.
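As a sketch of that inspection (assuming a ReAct chat engine named `chat_engine` already exists; the names here are illustrative, not taken from this issue):

```python
# A minimal sketch, assuming an existing ReAct chat engine `chat_engine`.
# Collecting each delta makes any re-emitted text easy to spot.
import asyncio

async def main() -> None:
    response = await chat_engine.astream_chat("你好")
    tokens = []
    async for token in response.async_response_gen():
        tokens.append(token)
    print(tokens)           # each streamed delta, in order
    print("".join(tokens))  # the assembled reply; duplication shows up here

asyncio.run(main())
```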
If these steps do not resolve the issue, further investigation into the specific implementation details of the ReAct mode in these versions might be necessary.
I think I roughly understand the issue. Since 你好 ("Hello") is a simple message, the LLM didn't follow the ReAct prompt template but directly output the final answer: 你好！请问你需要了解哪些信息呢？我可以帮助你查询相关信息。 ("Hello! What information would you like to know? I can help you look it up.")
In the OpenAI protocol-compatible streaming output, the content of the final chunk is empty.
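For reference, a final chunk in the OpenAI streaming format typically has this shape (recalled from the OpenAI spec, not taken from this issue's logs):

```python
# Illustrative final chunk: the delta carries no content; only
# finish_reason marks the end of the stream.
final_chunk = {
    "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]
}
```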
`latest_chunk` and `missed_chunks_storage` were modified in `_infer_stream_chunk_is_final`:

- `missed_chunks_storage` is `['你好', '!', '请问']`
- `latest_chunk` is `'你好!请问。。。'`
Eventually, the result was reassembled, leading to the strange output described in my issue:

```python
# add back the chunks that were missed
response_stream = self._async_add_back_chunk_to_stream(
    chunks=[*missed_chunks_storage, latest_chunk], chat_stream=chat_stream
)
```
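To make the failure mode concrete, here is a self-contained reconstruction (my sketch, not library code) using the values above:

```python
# Reconstruction sketch: the missed deltas are prepended, but latest_chunk
# already contains the same text, so the prefix is emitted twice.
missed_chunks_storage = ["你好", "!", "请问"]
latest_chunk = "你好!请问。。。"  # cumulative text repeating the missed deltas

reassembled = "".join([*missed_chunks_storage, latest_chunk])
print(reassembled)  # 你好!请问你好!请问。。。 -- the duplicated output
```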
Bug Description
When using the ReAct mode of the LlamaIndex chat engine, the output contains duplicated and extra characters that are not expected.
Version
0.12.5-0.12.7
Steps to Reproduce
1. `set_global_handler("simple")`
2. Send the message 你好 to a chat engine in ReAct mode (a minimal repro sketch follows below).

All other ChatModes work fine.
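A hypothetical minimal reproduction (assumes an OpenAI-compatible LLM is configured; the index setup here is an assumption, only `chat_mode="react"` and the streamed message matter):

```python
from llama_index.core import Document, VectorStoreIndex, set_global_handler

set_global_handler("simple")

# Any small index works; the document content is a placeholder.
index = VectorStoreIndex.from_documents([Document(text="placeholder")])
chat_engine = index.as_chat_engine(chat_mode="react")

# Streaming the simple greeting triggers the duplicated output.
response = chat_engine.stream_chat("你好")
for token in response.response_gen:
    print(token, end="", flush=True)
```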
Relevant Logs/Tracebacks