
HIL (redis) #52

Closed
wants to merge 14 commits into from

Conversation

@jankrepl (Collaborator) commented Nov 19, 2024:

How to test?

First, spin up Redis:

$ docker run -p 6380:6379 -d redis

Then add the following entry to your .env:

NEUROAGENT_HIL__REDIS_URI=redis://localhost:6380

I added a get-now tool; it is the only tool with hil: True.
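A minimal sketch of what such a flag could look like (the class and attribute names here are illustrative, not the actual definitions in this repo):

```python
# Hypothetical sketch of a HIL-flagged tool; the real tool classes in
# this codebase may look different.
from datetime import datetime, timezone


class GetNowTool:
    """Toy tool that returns the current UTC time."""

    name = "get-now"
    hil = True  # execution pauses until a human approves or declines

    async def arun(self) -> str:
        # Return the current time as an ISO-8601 string.
        return datetime.now(timezone.utc).isoformat()
```

Then trigger it: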

$ curl -X POST -H "content-type: application/json" -H "x-project-id: $PROJECT_ID" -H "x-virtual-lab-id: $VIRTUAL_LAB_ID" -H "Authorization: Bearer $TOKEN" -d '{"query": "Could you please call the get-now tool three times at the same time"}' 'localhost:8001/qa/run/' | jq

You can then use the endpoints of the approvals router to view and approve/decline outstanding requests.
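For example (the route shapes below are assumptions, not verified against the code; check the approvals router for the exact paths):

$ curl -H "Authorization: Bearer $TOKEN" 'localhost:8001/approvals/' | jq
$ curl -X PATCH -H "content-type: application/json" -H "Authorization: Bearer $TOKEN" -d '{"decision": "approved"}' 'localhost:8001/approvals/<approval_id>'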

@jankrepl marked this pull request as draft on November 19, 2024, 09:20.
```diff
@@ -230,7 +232,7 @@ async def get_vlab_and_project(
 def get_starting_agent(
-    _: Annotated[None, Depends(get_vlab_and_project)],
+    # _: Annotated[None, Depends(get_vlab_and_project)],
```
@jankrepl (Collaborator, Author):

I only did this because the token-generating service was down at the time of writing this PR.

We should not merge this change.

```diff
@@ -21,7 +21,7 @@ async def stream_agent_response(
 ) -> AsyncIterator[str]:
     """Redefine fastAPI connections to enable streaming."""
     if isinstance(agents_routine.client, AsyncOpenAI):
-        connected_agents_routine = AgentsRoutine(
+        connected_agents_routine = AgentsRoutine(  # type: ignore[call-arg]
```
@jankrepl (Collaborator, Author):

I did not implement the logic here. Nothing would really change; I just did not want to overcomplicate this POC.

@jankrepl marked this pull request as ready for review on November 19, 2024, 12:34.
@WonderPG (Contributor) commented:

Similar to Phantasm in terms of the idea (besides the approval dashboard); however, there is still the issue of the execution hanging. The hanging time is bounded by the TTL, which for now is set to 10 minutes by default. In my opinion, issues will arise once we start having more users: with more traffic we will have more and more concurrent hanging executions. To limit that we would have to reduce the TTL, which still needs a lower bound so that a user has enough time to read and modify their configuration.

Also, thinking about the future: if we implement the queue mechanism to pick up requests, a request will block the queue for as long as the user wants, up to the TTL limit. This is both an advantage and a disadvantage: if your request is picked up, you should be able to execute it to the end without going back to the queue, but you also block other requests that could run while you do your approval. This matters mostly for larger simulations, since I assume you will want to double-check/do some research before clicking the "Yes, I want to spend a ton of money" button. Speaking of which, a larger TTL might also be necessary for bigger simulations, since you will want to quadruple-check before ruining your vlab.

On the plus side, this solution is definitely cleaner than ours. I'm pretty sure we could refactor ours big time, but it would remain some sort of mess regardless.

Also, a couple of side questions:

  • Is it okay for the frontend to wait for the answer to the user's request while sending approval requests on the side?
  • What will happen if the user clicks on another conversation while polling for approval and then goes back to the original one?
  • How does the frontend know when there is something awaiting approval? Does it have to regularly query the GET-all endpoint to find out?
  • What if the user wants to list the tools used to generate an AI message and one of the tool calls was modified by HIL? Will the user see the wrong inputs? We have also seen that this can be an issue for the model, e.g. ask to resolve the Thalamus id -> HIL -> change Thalamus to auditory cortex -> the model says "I found the id of the thalamus" while showing the id of the auditory cortex. Modifying the inputs of the AI_TOOL message apparently solved the problem.
  • Aren't we hanging resources on the frontend as well?

Anyway, I tried to be super critical, but it is definitely a great solution and I like it. Simple and efficient. Thank you for that!

@WonderPG (Contributor) commented:

Ah, I see that modifying the input is not possible for now. I guess the patch endpoint will handle it somehow?

@jankrepl (Collaborator, Author) commented Nov 19, 2024:

> In my opinion, issues will arise once we start having more users: with more traffic we will have more and more concurrent hanging executions.

Possibly. Honestly, I am not able to say at this point what is going to happen when there are a lot of requests. We could alternatively go for the pub/sub model (Redis supports it), but I guess there might be other downsides.
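For reference, a rough sketch of what the pub/sub variant could look like with redis-py's asyncio client (the channel naming and decision strings are assumptions):

```python
# Sketch only: block on a Redis pub/sub channel instead of polling a key.
import redis.asyncio as redis


async def wait_for_decision(user_id: str, approval_id: str) -> str:
    client = redis.Redis.from_url("redis://localhost:6380")
    pubsub = client.pubsub()
    # Channel name mirrors the approval_key scheme used elsewhere in this PR.
    await pubsub.subscribe(f"approval/{user_id}/{approval_id}")
    # Blocks until the approvals endpoint publishes a decision.
    async for message in pubsub.listen():
        if message["type"] == "message":
            return message["data"].decode()  # e.g. "approved" or "declined"
    raise RuntimeError("pub/sub stream closed without a decision")
```

The obvious trade-off: the request handler still hangs while waiting; it just avoids the poll interval.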

> A larger TTL might also be necessary for bigger simulations, since you will want to quadruple-check before ruining your vlab.

Yeah. Very likely. Agreed.

> Is it okay for the frontend to wait for the answer to the user's request while sending approval requests on the side?

Sure. Frontends do everything async, so this isn't a problem.

> What will happen if the user clicks on another conversation while polling for approval and then goes back to the original one?

Currently the approvals are identified via approval_key=approval/<userid>/<approvalid>, so they are "global" for a given user and there is no concept of threads (= conversations). Of course, this could be customized.
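A minimal sketch of that key scheme, assuming redis-py and the 10-minute default TTL mentioned above (the payload fields are illustrative):

```python
# Sketch of the approval key scheme; the payload fields are illustrative.
import json
import uuid

import redis.asyncio as redis


async def create_approval(client: redis.Redis, user_id: str, tool_call: dict) -> str:
    approval_id = uuid.uuid4().hex
    key = f"approval/{user_id}/{approval_id}"
    # Expire after 10 minutes, the default TTL discussed above.
    await client.set(key, json.dumps(tool_call), ex=600)
    return key
```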

> How does the frontend know when there is something awaiting approval? Does it have to regularly query the GET-all endpoint to find out?

Exactly. The frontend would indeed have to poll the approvals/ endpoint every X milliseconds: https://developer.mozilla.org/en-US/docs/Web/API/Window/setInterval
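Sketched in Python for illustration (a browser frontend would do the equivalent with setInterval and fetch; the endpoint path is assumed from this thread):

```python
# Illustrative polling loop; a real frontend would use setInterval + fetch.
import time

import requests  # third-party: pip install requests


def poll_approvals(token: str, interval_s: float = 2.0) -> list:
    """Poll the approvals endpoint until something awaits approval."""
    while True:
        resp = requests.get(
            "http://localhost:8001/approvals/",  # path assumed from this thread
            headers={"Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()
        pending = resp.json()
        if pending:
            return pending
        time.sleep(interval_s)
```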

> What if the user wants to list the tools used to generate an AI message and one of the tool calls was modified by HIL? Will the user see the wrong inputs? We have also seen that this can be an issue for the model, e.g. ask to resolve the Thalamus id -> HIL -> change Thalamus to auditory cortex -> the model says "I found the id of the thalamus" while showing the id of the auditory cortex. Modifying the inputs of the AI_TOOL message apparently solved the problem.

This POC does not allow for modifications. I guess they are useful, and Phantasm does support them.

> Aren't we hanging resources on the frontend as well?

I think the frontend -> backend polling is not a problem. Backend -> Redis might be more prone to issues, but I am not sure.

> Anyway, I tried to be super critical, but it is definitely a great solution and I like it. Simple and efficient. Thank you for that!

:)) Thanks. Again, it was just a suggestion. Happy to change it / not use it.


@jankrepl (Collaborator, Author) commented Jan 9, 2025:

Closing this one since it is not the approach we decided to go for.

@jankrepl closed this on Jan 9, 2025.