
[Bug]: Unresponsive ReACT agent that can't resolve #17337

Open
BrianMwas opened this issue Dec 20, 2024 · 3 comments
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

@BrianMwas

Bug Description

This is the code I am using:

class EnhancedChatEngine:
    """Enhanced chat agent with specialized query engines"""

    def __init__(
        self,
        vector_store_manager: "VectorStoreManager",
        org_id: UUID,
        session_id: str,
        llm: Optional[Any],
        memory_buffer: Optional[ChatMemoryBuffer] = None,
    ):
        """Initialize the props"""
        self.vector_store_manager = vector_store_manager
        self.token_counter = TokenCountingHandler(
            tokenizer=None  # Will use default GPT tokenizer
        )
        callback_manager = CallbackManager([self.token_counter])

        self.org_id = org_id
        self.query_engines: Dict[str, RetrieverQueryEngine] = {}
        self.context_index = None
        self.agent = None
        self.llm = llm or Groq(
            model="llama3-70b-8192",
            api_key=settings.GROQ_API_KEY,
            temperature=0.4,
            callback_manager=callback_manager,
        )
        self.openai = OpenAI(api_key=settings.OPENAI_API_KEY, model="gpt-3.5-turbo")

        Settings.callback_manager = callback_manager

        chat_store = UpstashChatStore(
            redis_url=settings.UPSTASH_URL,
            redis_token=settings.UPSTASH_TOKEN,
            ttl=300,  # Optional: Time to live in seconds
        )
        self.memory_buffer = memory_buffer or ChatMemoryBuffer.from_defaults(
            token_limit=4000,
            llm=self.llm,
            chat_store=chat_store,
            chat_store_key=session_id,
        )

    async def initialize(self):
        """Initialize the agent with specialized query engines and context"""
        try:
            Settings.llm = self.llm
            Settings.embed_model = OpenAIEmbedding(
                model="text-embedding-ada-002",
                api_key=settings.OPENAI_API_KEY,
                embed_batch_size=100,
            )
            # Get vector store and create base index
            vector_store = await self.vector_store_manager.get_org_vector_store(
                self.org_id
            )
            base_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

            # Create specialized query engines with different strategies
            self.query_engines = {
                # VARA Regulations Engine - use TREE_SUMMARIZE for complex regulatory info
                "vara_regulations": base_index.as_query_engine(
                    similarity_top_k=8,
                    # Good for hierarchical regulatory info
                    response_mode=ResponseMode.TREE_SUMMARIZE,
                    node_postprocessors=[
                        SimilarityPostprocessor(similarity_cutoff=0.4),
                        KeywordNodePostprocessor(
                            required_keywords=[
                                "VARA",
                                "regulation",
                                "license",
                                "compliance",
                                "requirement",
                            ],
                            exclude_keywords=["technical", "protocol"],
                            match_threshold=0.2,
                        ),
                        MetadataReplacementPostProcessor(
                            target_metadata_key="document_id"
                        ),
                    ],
                    streaming=True,
                ),
                # Blockchain Tech Engine - use COMPACT for technical details
                "blockchain_tech": base_index.as_query_engine(
                    similarity_top_k=5,
                    response_mode=ResponseMode.COMPACT,  # Efficient for technical info
                    node_postprocessors=[
                        SimilarityPostprocessor(similarity_cutoff=0.5),
                        KeywordNodePostprocessor(
                            required_keywords=[
                                "blockchain",
                                "technical",
                                "protocol",
                                "implementation",
                                "system",
                            ],
                            exclude_keywords=["VARA", "regulation"],
                            match_threshold=0.3,
                        ),
                    ],
                    streaming=True,
                ),
                # Compliance Engine - use COMPACT_ACCUMULATE for comprehensive compliance info
                "compliance": base_index.as_query_engine(
                    similarity_top_k=3,
                    # Better for detailed compliance info
                    response_mode=ResponseMode.COMPACT_ACCUMULATE,
                    node_postprocessors=[
                        SimilarityPostprocessor(similarity_cutoff=0.6),
                        KeywordNodePostprocessor(
                            required_keywords=[
                                "compliance",
                                "audit",
                                "report",
                                "requirement",
                                "guideline",
                            ],
                            match_threshold=0.4,
                        ),
                        MetadataReplacementPostProcessor(
                            target_metadata_key="document_id"
                        ),
                    ],
                    streaming=True,
                ),
                # General Knowledge Engine - use REFINE for broad queries
                "general": base_index.as_query_engine(
                    similarity_top_k=10,
                    response_mode=ResponseMode.REFINE,  # Better for general knowledge synthesis
                    node_postprocessors=[
                        SimilarityPostprocessor(similarity_cutoff=0.3),
                    ],
                    streaming=True,
                ),
            }

            # Create query engine tools with detailed descriptions
            query_engine_tools = [
                QueryEngineTool(
                    query_engine=self.query_engines["vara_regulations"],
                    metadata=ToolMetadata(
                        name="vara_regulations",
                        description=(
                            "Specialized tool for VARA regulatory information. Best for: "
                            "- Specific VARA rules and guidelines "
                            "- Licensing requirements "
                            "- Regulatory compliance questions "
                            "- Updates to VARA regulations"
                        ),
                    ),
                ),
                QueryEngineTool(
                    query_engine=self.query_engines["blockchain_tech"],
                    metadata=ToolMetadata(
                        name="blockchain_tech",
                        description=(
                            "Technical blockchain information tool. Best for: "
                            "- Technical implementation details "
                            "- Protocol specifications "
                            "- System architecture questions "
                            "- Technical compliance requirements"
                        ),
                    ),
                ),
                QueryEngineTool(
                    query_engine=self.query_engines["compliance"],
                    metadata=ToolMetadata(
                        name="compliance",
                        description=(
                            "Compliance requirements tool. Best for: "
                            "- Audit requirements "
                            "- Reporting guidelines "
                            "- Compliance deadlines "
                            "- Process requirements"
                        ),
                    ),
                ),
            ]

            general_tool = QueryEngineTool(
                query_engine=self.query_engines["general"],
                metadata=ToolMetadata(
                    name="general_knowledge",
                    description=(
                        "General knowledge base. Best for: "
                        "- Broad overview questions "
                        "- Multiple topic queries "
                        "- General information needs"
                    ),
                ),
            )

            # Add Tavily tool for external context
            tavily_tool = FunctionTool.from_defaults(
                fn=lambda query: TavilyToolSpec(
                    api_key=settings.TAVILY_SEARCH_API_KEY
                ).search(query),
                name="tavily_search",
                description=(
                    "Search tool for finding current and external information"
                    " about blockchain and crypto regulations. Expects a query string."
                ),
            )

            # Create query plan tool with response synthesizer
            response_synthesizer = get_response_synthesizer(
                response_mode="tree_summarize"
            )

            query_plan_tool = QueryPlanTool.from_defaults(
                query_engine_tools=query_engine_tools,
                response_synthesizer=response_synthesizer,
                name="query_plan",
                description_prefix=(
                    "ONLY use for complex blockchain/VARA regulation queries."
                    "DO NOT use for greetings or basic questions."
                ),
            )

            # Create ReAct agent with all tools
            agent = ReActAgent.from_tools(
                tools=[query_plan_tool, tavily_tool, general_tool],
                llm=self.llm,
                verbose=True,
                memory=self.memory_buffer,
                system_prompt=self._default_system_prompt(),  # call the method; passing the bound method hands over a function, not a prompt string
                max_iterations=3,
            )
            self.agent = agent

        except Exception as e:
            logger.error(f"Failed to initialize agent: {str(e)}")
            raise

    async def chat(self, message: str) -> ChatResponse:
        """Process chat message using the agent"""
        try:
            # Check if agent is initialized
            if self.agent is None:
                await self.initialize()
                if self.agent is None:  # Double check after initialization
                    raise ValueError("Failed to initialize chat agent")

            # Reset token counter
            self.token_counter.reset_counts()
            source_nodes = []
            response_text = ""
            used_tool = ""

            # Get agent response
            response = self.agent.chat(message)
            response_text = str(response)

            # Extract sources from agent's response if available
            if hasattr(response, "source_nodes"):
                source_nodes.extend(
                    [
                        NodeInfo(
                            text=node.text,  # Access directly from node
                            score=node.score if hasattr(node, "score") else 1.0,
                            document_id=(
                                node.node.ref_doc_id
                                if hasattr(node.node, "ref_doc_id")
                                else "unknown"
                            ),
                            title="Agent",
                            chunk_id=(
                                node.node.id_
                                if hasattr(node.node, "id_")
                                else str(uuid4())
                            ),
                        )
                        for node in response.source_nodes
                    ]
                )
                used_tool = "agent"

            print(f"response is here {response_text}")

            # Handle empty or unclear responses
            if not response_text.strip():
                # Try with general knowledge engine as fallback
                general_response = await self.query_engines["general"].aquery(message)

                # Add sources from general engine
                if hasattr(general_response, "source_nodes"):
                    source_nodes = [
                        NodeInfo(
                            text=node.text,  # Access directly from node
                            score=node.score if hasattr(node, "score") else 1.0,
                            document_id=(
                                node.node.ref_doc_id
                                if hasattr(node.node, "ref_doc_id")
                                else "unknown"
                            ),
                            title="Agent",
                            chunk_id=(
                                node.node.id_
                                if hasattr(node.node, "id_")
                                else str(uuid4())
                            ),
                        )
                        for node in general_response.source_nodes
                    ]
                    used_tool = "general_engine"

                if not str(general_response).strip():
                    # If still empty, use Tavily search
                    tavily_response = await self.agent.tools[1].acall(message)

                    if tavily_response:
                        response_text = (
                            f"Based on current information: {tavily_response}"
                        )
                        # Add Tavily result as a source
                        source_nodes.append(
                            NodeInfo(
                                text=str(tavily_response),
                                score=1.0,
                                document_id="tavily_search",
                                title="Web Search Result",
                                chunk_id=f"tavily_{str(uuid4())[:8]}",
                            )
                        )
                        used_tool = "tavily"
                    else:
                        response_text = (
                            f"I understand you're asking about {message}. "
                            "To provide a more accurate response, could you please:\n"
                            "- Provide more specific details about your query\n"
                            "- Specify which aspect of blockchain or VARA regulations you're"
                            "  interested in\n"
                            "- Let me know if you're looking for technical or "
                            "compliance information"
                        )
                        used_tool = "clarification"

            # Create response object
            metadata = {
                "token_usage": {
                    "prompt_tokens": self.token_counter.prompt_llm_token_count,
                    "completion_tokens": self.token_counter.completion_llm_token_count,
                    "total_tokens": self.token_counter.total_llm_token_count,
                },
                "used_tool": used_tool,
                "source_count": len(source_nodes),
                "chat_history_length": (
                    len(self.memory_buffer.get_all()) if self.memory_buffer else 0
                ),
            }

            # Log retrieval metrics
            logger.info(
                "Chat completion metrics",
                extra={
                    "message": message,
                    "source_count": len(source_nodes),
                    "used_tool": used_tool,
                    "token_usage": metadata["token_usage"],
                    "has_response": bool(response_text.strip()),
                },
            )

            return ChatResponse(
                response=response_text,
                response_id=str(uuid4()),
                source_nodes=source_nodes,
                metadata=metadata,
            )

        except Exception as e:
            import traceback

            stack_trace = traceback.format_exc()
            logger.error(
                f"Error in chat processing: {str(e)}\n" f"Stack trace:\n{stack_trace}"
            )
            raise RuntimeError(f"Error during chat: {str(e)}")

    def _default_system_prompt(self) -> str:
        return """
            You are a bot with knowledge of blockchain and VARA regulations.

            IMPORTANT INSTRUCTIONS:
            1. For greetings (like "hi", "hello"):
            - Respond directly with a simple greeting
            - DO NOT use any tools
            - DO NOT try to proactively provide VARA information

            2. For basic questions:
            - Respond naturally and briefly
            - Only use tools if specifically asked about blockchain or VARA

            3. Only use tools when:
            - User explicitly asks about VARA regulations
            - User asks about blockchain compliance
            - User needs specific technical information

            Example responses:
            - "Hi" → "Hello! How can I help you?"
            - "Hello" → "Hi there! Feel free to ask me about blockchain or VARA regulations."
        """

I have been trying for days to get this fixed, but for some reason it always ends with an exceeded rate limit. Even for simple greetings like "Hi", it goes through an endless loop of Thought and Observation with no resolution.
Any help is welcome.

Version

0.12.0

Steps to Reproduce

Run the code, make a request using the terminal

Relevant Logs/Tracebacks

  File "/opt/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1092, in _retry_request
    return self._request(
  File "/opt/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1043, in _request
    return self._retry_request(
  File "/opt/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1092, in _retry_request
    return self._request(
  File "/opt/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1058, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_01j9rchczxfxmr4sr5ckc7s9qh` on tokens per minute (TPM): Limit 6000, Used 5420, Requested 1351. Please try again in 7.709s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}

2024-12-20 09:47:24,856 - src.api.routes.chat - ERROR - RuntimeError: Error during chat: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_01j9rchczxfxmr4sr5ckc7s9qh` on tokens per minute (TPM): Limit 6000, Used 5420, Requested 1351. Please try again in 7.709s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}

> Running step 085986b1-f870-491a-b22d-06be71a2c1e3. Step input: Hi
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: general_knowledge
Action Input: {'input': 'Hi'}
Observation: Hello!
> Running step 1719698a-8820-4618-84d9-d075695550ad. Step input: None
Thought: The user seems to be initiating a conversation, but I'm not sure what they want to talk about. I need to ask for clarification.
@BrianMwas BrianMwas added bug Something isn't working triage Issue needs to be triaged/prioritized labels Dec 20, 2024

dosubot bot commented Dec 20, 2024

To address the unresponsive ReACT agent in the EnhancedChatEngine class, which exceeds the rate limit and loops endlessly through Thought and Observation, you can take the following steps:

  1. Set Maximum Iterations: Ensure that the max_iterations parameter in the ReActAgent is set to a reasonable number to prevent endless loops. In your code, it is set to 3, which should be sufficient, but verify that this is being respected in the agent's logic.

  2. Handle Exceeded Iterations: Ensure that the logic in the _get_response method of the ReActAgent properly raises an exception or stops processing when the maximum number of iterations is reached. This will prevent the agent from continuing in an endless loop.

  3. Check for Final Response: Verify that the _infer_stream_chunk_is_final method correctly identifies when a response should be considered complete. This will help avoid unnecessary iterations.

  4. Verbose Logging: Since the verbose flag is set to True, use the detailed logs to identify where the loop might be getting stuck. This can provide insights into whether the agent is incorrectly deciding to continue processing.

  5. Tool Execution: Ensure that the tools being called, such as general_knowledge, are functioning correctly and not causing the loop to continue indefinitely. Check if the tools are returning expected results and if the agent is handling these results properly.

By addressing these areas, you should be able to mitigate the issue of an unresponsive agent and prevent it from entering an endless loop [1].
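
One way to act on points 1 and 2 is to intercept trivial messages before they ever reach the ReAct loop. The sketch below is a hypothetical pre-filter, not LlamaIndex API; the `GREETINGS` set and the `respond` signature are illustrative, with `agent_chat` standing in for something like `self.agent.chat` from the issue's code:

```python
# Hypothetical pre-filter: answer trivial greetings directly so they never
# enter the ReAct Thought/Action loop (and never spend Groq tokens on tools).
GREETINGS = {"hi", "hello", "hey", "good morning", "good evening"}

def is_greeting(message: str) -> bool:
    """True for bare greetings like 'Hi' or 'hello!'."""
    return message.strip().lower().rstrip("!.?") in GREETINGS

def respond(message: str, agent_chat) -> str:
    """Short-circuit greetings; delegate everything else to the agent."""
    if is_greeting(message):
        return "Hello! How can I help you with blockchain or VARA regulations?"
    return str(agent_chat(message))
```

With a pre-filter like this, the agent (and its max_iterations budget) is only spent on substantive questions, which also reduces pressure on the 6000 TPM limit.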

To continue talking to Dosu, mention @dosu.



@BrianMwas
Author

@dosu I applied all the recommendations but still got the error


dosubot bot commented Dec 20, 2024

If you've already applied the previous recommendations and are still encountering the issue, consider the following additional steps:

  1. Rate Limit Handling: Implement a retry mechanism with exponential backoff for handling rate limit errors. This can help manage the rate limit more effectively by spacing out retries when a rate limit is hit.

  2. Simplify Initial Responses: For simple greetings like "Hi", ensure that the agent does not unnecessarily invoke any tools. You can modify the logic to directly return a response without engaging the ReActAgent or any tools.

  3. Inspect Tool Logic: Double-check the logic within the general_knowledge tool and any other tools to ensure they are not causing unnecessary iterations. Make sure they return promptly and correctly when invoked.

  4. Debugging and Logging: Increase the verbosity of your logging to capture more detailed information about each step the agent takes. This can help pinpoint where the loop might be getting stuck or why the rate limit is being exceeded.

  5. Token Usage Monitoring: Use the TokenCountingHandler to closely monitor token usage and identify if there are any unexpected spikes in token consumption that could be contributing to the rate limit issue.

  6. Agent Initialization: Ensure that the agent is correctly initialized and that all components are set up properly. Any misconfiguration could lead to unexpected behavior.

By focusing on these areas, you should be able to further diagnose and address the issue.
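
Point 1 above can be sketched as a small retry wrapper. This is a generic sketch, not part of LlamaIndex or the OpenAI SDK; the string-based `is_rate_limit` check is an assumption for illustration, and in real code you would match the concrete exception type (e.g. `openai.RateLimitError`) instead:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0,
                 is_rate_limit=lambda e: "429" in str(e)):
    """Retry `call` with exponential backoff plus jitter on rate-limit errors.

    `call` is a zero-argument callable (e.g. lambda: agent.chat(message)).
    Non-rate-limit errors, and the final failed attempt, are re-raised.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if not is_rate_limit(exc) or attempt == max_retries - 1:
                raise
            # 1s, 2s, 4s, ... plus jitter, so retries spread out under the
            # Groq tokens-per-minute window instead of hammering it.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

Note that backoff only treats the symptom; pairing it with fewer agent iterations and a greeting short-circuit addresses the token spend itself.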

To continue talking to Dosu, mention @dosu.


