create a push to talk example for OAI realtime api #1359
base: main
Just my 2c, but it would be nice if this used @bcherry's RPC feature via LiveKit instead of a custom WebSocket.
The frontend in the example is just a simple button without a full LiveKit client. What is the simplest way to incorporate RPC in a single web page? Or do we want to add a full client to the example? Alternatively, we could mention in the README and comments that RPC should be used in production.
@lukasIO @davidzhao Updated to use RPC, with a frontend example in livekit-examples/voice-assistant-frontend#23
        agent.interrupt()
    elif data.payload == "release":
        agent.generate_reply(on_duplicate="cancel_existing")
    return "ok"
FWIW I think it would make more sense for these to just be separate RPC methods, e.g. ptt.start and ptt.end, with an empty payload.
Yeah, this is clearer. Updated.
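For illustration, the split into two methods suggested above could look roughly like the sketch below. It is not the code from this PR: FakeAgent and make_ptt_handlers are hypothetical names, the agent is a stand-in that only records calls, and the actual registration of the handlers with LiveKit RPC is left out so the snippet runs on its own.

```python
from typing import Callable, Dict, List, Tuple


class FakeAgent:
    """Hypothetical stand-in for the agent; records calls instead of acting."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, dict]] = []

    def interrupt(self) -> None:
        self.calls.append(("interrupt", {}))

    def generate_reply(self, on_duplicate: str = "cancel_existing") -> None:
        self.calls.append(("generate_reply", {"on_duplicate": on_duplicate}))


def make_ptt_handlers(agent) -> Dict[str, Callable[[str], str]]:
    """Build one handler per method name; the payload is ignored (empty)."""

    def on_start(payload: str = "") -> str:
        # Button pressed: stop whatever the agent is currently saying.
        agent.interrupt()
        return "ok"

    def on_end(payload: str = "") -> str:
        # Button released: ask the agent to respond to what was said.
        agent.generate_reply(on_duplicate="cancel_existing")
        return "ok"

    return {"ptt.start": on_start, "ptt.end": on_end}


agent = FakeAgent()
handlers = make_ptt_handlers(agent)
handlers["ptt.start"]("")
handlers["ptt.end"]("")
```

Compared with a single method that branches on the payload, each handler here does exactly one thing, so there is no unhandled payload value to worry about.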
Thinking about this some more... you might consider having the frontend send PTT heartbeats every few seconds while the user continues to hold. That way, if something goes wrong in the remote client, the agent can time out its PTT handler and generate a response. Otherwise it may be possible for the state to get out of sync and leave the agent hanging for a significant period of time.
This makes sense, but since it's just an example of how to interrupt and manually trigger an agent response, I'd prefer to keep it simple and mention the risk of out-of-sync state, along with the solution, in a comment.
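For anyone who wants the heartbeat variant, here is a minimal sketch of the agent-side timeout described above. Everything in it is an assumption for illustration: PttWatchdog, its method names, and the 5 second timeout are hypothetical, and the clock is passed in explicitly so the logic can be checked without real timers.

```python
from typing import Callable, Optional


class PttWatchdog:
    """Hypothetical helper: fires on_timeout if heartbeats stop during a hold."""

    def __init__(self, on_timeout: Callable[[], None], timeout_s: float = 5.0) -> None:
        self._on_timeout = on_timeout
        self._timeout_s = timeout_s
        self._last_seen: Optional[float] = None  # None means the button is not held

    def start(self, now: float) -> None:
        # Button pressed: begin tracking the hold.
        self._last_seen = now

    def heartbeat(self, now: float) -> None:
        # Heartbeats only count while a hold is in progress.
        if self._last_seen is not None:
            self._last_seen = now

    def end(self) -> None:
        # Button released normally: stop tracking.
        self._last_seen = None

    def check(self, now: float) -> bool:
        """Return True if the hold timed out and on_timeout was fired."""
        if self._last_seen is None:
            return False
        if now - self._last_seen <= self._timeout_s:
            return False
        self._last_seen = None  # treat the timeout as an implicit release
        self._on_timeout()
        return True


fired = []
watchdog = PttWatchdog(on_timeout=lambda: fired.append("reply"), timeout_s=5.0)
watchdog.start(now=0.0)
watchdog.heartbeat(now=3.0)
first_check = watchdog.check(now=7.0)   # 4s since the last heartbeat: still held
second_check = watchdog.check(now=9.0)  # 6s since the last heartbeat: timed out
```

In a real agent, check would be called periodically (for example from an asyncio task), and on_timeout would do whatever the release handler does.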
@longcw since the frontend will take a while, can we push the multimodal_agent changes in first, without the PTT example?
Sounds good. How about we keep the example but just describe how it should work in the README, without a link to the frontend?
@property
def server_vad_enabled(self) -> bool:
    return self._opts.turn_detection is not None
I think we should put that inside the Capabilities dataclass
I was about to add a support_manual_vad field to the Capabilities dataclass, but it would still need to read self._opts.turn_detection to check whether server VAD is enabled. Should we have both support_manual_vad and session.vad_mode?
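To make the question concrete, one possible shape for having both is sketched below. This is only an illustration of the idea, not the actual classes in the repo: Session here is a hypothetical stand-in, and the "manual"/"server" string values for vad_mode are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    # Static: whether the model can be driven without server-side VAD at all.
    support_manual_vad: bool = False


class Session:
    """Hypothetical stand-in for a realtime session."""

    def __init__(self, turn_detection: Optional[dict]) -> None:
        self._turn_detection = turn_detection
        self.capabilities = Capabilities(support_manual_vad=True)

    @property
    def vad_mode(self) -> str:
        # Dynamic: derived from the current options, mirroring
        # server_vad_enabled (turn_detection is not None).
        return "manual" if self._turn_detection is None else "server"


manual_session = Session(turn_detection=None)
server_session = Session(turn_detection={"type": "server_vad"})
```

The capability says what the model supports, while vad_mode says how this particular session is configured, so the two do not duplicate each other.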
Created a push-to-talk example using manual VAD with the OAI Realtime API.
A follow-up to #1347.