[Bug]: Incorrect Output in ReAct Mode of LlamaIndex Chat Engine #17322

Open
whisper-bye opened this issue Dec 19, 2024 · 2 comments
Labels: bug (Something isn't working), triage (Issue needs to be triaged/prioritized)

Comments

whisper-bye commented Dec 19, 2024

Bug Description

When using the ReAct mode of the LlamaIndex chat engine, the output contains duplicated and extra characters that are not expected.

Version

0.12.5-0.12.7

Steps to Reproduce

from llama_index.core.llms import ChatMessage

# the last message is the one being sent; the earlier ones become chat history
message = messages[-1]["content"]
chat_history = [
    ChatMessage(role=m["role"], content=m["content"])
    for m in messages[:-1]
]

chat_engine = self.index.as_chat_engine()
streaming_response = await chat_engine.astream_chat(
    message=message, chat_history=chat_history
)

async for token in streaming_response.async_response_gen():
    print(token)
  1. Set up set_global_handler("simple").
  2. Send some questions.
  3. Then send the message 你好 ("Hello").
  4. Observe the expected log output:
    你好！请问你需要了解哪些信息呢？我可以帮助你查询相关信息。
    ("Hello! What information would you like to know? I can help you look up the relevant information.")
  5. Check the actual output in ReAct mode:
    你好！请问你好！请问你需要了解哪些信息呢？我可以帮助你查询相关信息。
    (note the duplicated "你好！请问" prefix)

All other chat modes work fine.
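
For reference, here is a roughly self-contained version of the snippet above (a sketch, with assumed setup: documents loaded from a local ./data directory and whatever LLM is configured globally):

import asyncio

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, set_global_handler
from llama_index.core.llms import ChatMessage

set_global_handler("simple")  # step 1: log the raw LLM inputs/outputs

async def main():
    # assumption: ./data contains a few documents to build the index from
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)

    # "react" is the chat mode this issue is about
    chat_engine = index.as_chat_engine(chat_mode="react")

    # fabricated earlier turns standing in for "send some questions"
    chat_history = [
        ChatMessage(role="user", content="What documents do you have?"),
        ChatMessage(role="assistant", content="I have access to the indexed documents."),
    ]

    streaming_response = await chat_engine.astream_chat(
        message="你好", chat_history=chat_history
    )
    async for token in streaming_response.async_response_gen():
        print(token, end="", flush=True)

asyncio.run(main())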

Relevant Logs/Tracebacks


dosubot bot commented Dec 19, 2024

The issue you're experiencing with duplicated and extra characters in the ReAct mode of the LlamaIndex chat engine could be related to how the astream_chat method processes messages and generates responses. The astream_chat method uses an asynchronous generator to yield response tokens incrementally, which might lead to duplication if the message handling or memory update logic is not correctly implemented [1].

Here are a few things you might want to check or try:

  1. Message Handling: Ensure that the chat history and message handling logic correctly manage the state and do not inadvertently duplicate messages. The chat history should be accurately set and updated to prevent repeated processing of the same message.

  2. Response Generation: Verify that the response synthesizer and the asynchronous generator (async_response_gen) are correctly configured to handle streaming responses without duplicating tokens. The generator should yield each token only once [2].

  3. Version-Specific Changes: Since the issue occurs between versions 0.12.5 and 0.12.7, reviewing the changelogs or commits for these versions might provide insights into any changes that could affect message processing or response generation.

If these steps do not resolve the issue, further investigation into the specific implementation details of the ReAct mode in these versions might be necessary.
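
As a rough way to check point 2 above (a plain sketch against the public streaming API, not LlamaIndex internals), accumulate the streamed tokens and look for an immediately repeated prefix:

async def collect_and_check(streaming_response):
    # gather every streamed token in order
    tokens = []
    async for token in streaming_response.async_response_gen():
        tokens.append(token)
    text = "".join(tokens)
    # crude heuristic: a prefix repeated back-to-back (e.g. "你好！请问你好！请问…")
    # suggests some chunks were emitted twice
    for n in range(2, min(20, len(text) // 2) + 1):
        if text[:n] == text[n:2 * n]:
            print(f"possible duplicated prefix of length {n}: {text[:n]!r}")
    return text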


whisper-bye commented Dec 19, 2024

@logan-markewich

I think I roughly understand the issue. Since 你好 is a simple greeting, the LLM didn't follow the ReAct prompt template and instead directly output the final answer 你好！请问你需要了解哪些信息呢？我可以帮助你查询相关信息。

At https://github.com/run-llama/llama_index/blob/v0.12.6/llama-index-core/llama_index/core/agent/react/step.py#L750:

  1. In the OpenAI protocol-compatible streaming output, the content of the final chunk is empty.
  2. latest_chunk and missed_chunks_storage are modified in _infer_stream_chunk_is_final. At that point:

missed_chunks_storage is ['你好', '！', '请问']

latest_chunk is '你好！请问。。。' (i.e., it already contains the text of the missed chunks)

The missed chunks are then added back in front of latest_chunk, so the reassembled stream produces the strange output described in my issue (a short illustration follows the snippet below):

# add back the chunks that were missed
response_stream = self._async_add_back_chunk_to_stream(
    chunks=[*missed_chunks_storage, latest_chunk], chat_stream=chat_stream
)
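
To make the failure mode concrete, here is a plain-Python sketch (illustrative values taken from the observations above, not the actual LlamaIndex stream objects) of what the reassembled output ends up containing:

# values reported above: chunks consumed while deciding whether the output
# is a direct answer, plus the cumulative content of the latest chunk
missed_chunks = ["你好", "！", "请问"]
latest_chunk_text = "你好！请问"
rest_of_stream = "你需要了解哪些信息呢？我可以帮助你查询相关信息。"

# adding the missed chunks back in front of a chunk that already contains
# their text re-emits the prefix
output = "".join(missed_chunks) + latest_chunk_text + rest_of_stream
print(output)
# -> 你好！请问你好！请问你需要了解哪些信息呢？我可以帮助你查询相关信息。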
