create a push to talk example for OAI realtime api #1359

Open
longcw wants to merge 9 commits into main

longcw (Collaborator) commented Jan 10, 2025

Created a push-to-talk example for manual VAD with the OpenAI Realtime API.

A follow-up to #1347.

changeset-bot bot commented Jan 10, 2025

⚠️ No Changeset found

Latest commit: cfdc00b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

longcw requested a review from a team on January 10, 2025 at 10:40

lukasIO (Contributor) commented Jan 10, 2025

Just my 2c, but it would be nice if this used @bcherry's RPC feature via LiveKit instead of a custom WebSocket.

longcw (Collaborator, Author) commented Jan 10, 2025

> just my 2c, but would be nice if this was using @bcherry's RPC feature via LiveKit, instead of a custom WebSocket ?

The frontend in the example is just a simple button without a full LiveKit client. What is the simplest way to incorporate RPC in a single web page? Or do we want to add a full client to the example? Alternatively, we could mention in the README and comments that RPC should be used in production.

longcw (Collaborator, Author) commented Jan 11, 2025

@lukasIO @davidzhao Updated to use RPC, with a frontend example in livekit-examples/voice-assistant-frontend#23.

        agent.interrupt()
    elif data.payload == "release":
        agent.generate_reply(on_duplicate="cancel_existing")
    return "ok"

Contributor

FWIW, I think it would make more sense for these to be separate RPC methods, e.g. ptt.start and ptt.end, each with an empty payload.

Collaborator Author

Yeah, this is clearer. Updated.
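
A minimal sketch of the "separate methods" shape suggested above, for illustration only. `StubAgent`, `build_ptt_handlers`, and `dispatch` are hypothetical names; the plain dict stands in for whatever RPC registration the LiveKit SDK provides, and only the two agent calls visible in the reviewed snippet are modelled.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class StubAgent:
    """Hypothetical stand-in for the agent; records calls instead of acting."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def interrupt(self) -> None:
        self.calls.append("interrupt")

    def generate_reply(self, on_duplicate: str = "cancel_existing") -> None:
        self.calls.append(f"generate_reply:{on_duplicate}")


RpcHandler = Callable[[str], Awaitable[str]]


def build_ptt_handlers(agent: StubAgent) -> Dict[str, RpcHandler]:
    """Return one handler per RPC method instead of branching on the payload."""

    async def ptt_start(payload: str) -> str:
        # Button pressed: stop any agent speech that is in progress.
        agent.interrupt()
        return "ok"

    async def ptt_end(payload: str) -> str:
        # Button released: ask the agent to respond to what was said.
        agent.generate_reply(on_duplicate="cancel_existing")
        return "ok"

    return {"ptt.start": ptt_start, "ptt.end": ptt_end}


async def dispatch(handlers: Dict[str, RpcHandler], method: str, payload: str = "") -> str:
    # Stand-in for the RPC layer: look up the method by name and call it.
    return await handlers[method](payload)
```

With this shape the payload carries no meaning, so a typo in a payload string can no longer fall through silently; an unknown method name fails at lookup instead.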

Contributor

Thinking about this some more: you might consider having the frontend send PTT heartbeats every few seconds while the user continues to hold. That way, if something goes wrong in the remote client, the agent can time out its PTT handler and generate a response. Otherwise, the state could get out of sync and the agent could hang for a significant period of time.

Collaborator Author

This makes sense. But since it's just an example of how to interrupt and manually trigger an agent response, I prefer to keep it simple and mention the risk of out-of-sync state, and the solution, in a comment.
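
For reference, the heartbeat idea could be sketched roughly as follows. This is not part of the PR: `PttWatchdog`, its method names, and the 5-second default are assumptions made for illustration, and the clock is injectable so the timeout logic can be exercised without sleeping.

```python
import time
from typing import Callable, Optional


class PttWatchdog:
    """Hypothetical helper: tracks whether a push-to-talk hold is still alive."""

    def __init__(
        self,
        timeout_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen: Optional[float] = None  # None means no hold is active

    def start(self) -> None:
        # Called when the press event arrives.
        self._last_seen = self._clock()

    def heartbeat(self) -> None:
        # Refresh the deadline; stray heartbeats outside a hold are ignored.
        if self._last_seen is not None:
            self._last_seen = self._clock()

    def end(self) -> None:
        # Called when the release event arrives (or after a timeout is handled).
        self._last_seen = None

    def expired(self) -> bool:
        """True if a hold is active but no heartbeat arrived within the timeout."""
        if self._last_seen is None:
            return False
        return self._clock() - self._last_seen > self._timeout_s
```

The agent side would check `expired()` periodically and, when it returns true, treat the hold as released: call `end()` and generate a reply so the session does not hang waiting for a release that never arrives.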

davidzhao (Member) left a comment

@longcw since the frontend will take a while, can we push the multimodal_agent changes in first, without the PTT example?

longcw (Collaborator, Author) commented Jan 15, 2025

> @longcw since the frontend will take awhile, can we push the multimodal_agent changes in first, without the PTT example?

Sounds good. How about we keep the example but just describe how it should work in the README, without a link to the frontend?

Comment on lines +1035 to +1037
    @property
    def server_vad_enabled(self) -> bool:
        return self._opts.turn_detection is not None

Member

I think we should put that inside the Capabilities dataclass.

Collaborator Author

I was about to add a support_manual_vad field to the Capabilities dataclass, but it would still need to read self._opts.turn_detection to check whether server VAD is enabled. Should we have both support_manual_vad and session.vad_mode?
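
To make the question concrete, here is one way the split could look. It is a sketch under assumptions, not the project's actual classes: `Capabilities`, `SessionOptions`, and `Session` are simplified stand-ins, and the `"server"` / `"manual"` strings and the sample `turn_detection` value are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    # Static fact about the model: can it run with server VAD turned off?
    support_manual_vad: bool


@dataclass
class SessionOptions:
    # None stands for "server VAD disabled", mirroring the reviewed property.
    turn_detection: Optional[dict] = None


class Session:
    """Simplified stand-in for a realtime session."""

    def __init__(self, capabilities: Capabilities, opts: SessionOptions) -> None:
        self.capabilities = capabilities
        self._opts = opts

    @property
    def vad_mode(self) -> str:
        # Per-session configuration, derived from the current options.
        return "server" if self._opts.turn_detection is not None else "manual"
```

Here `support_manual_vad` answers "can this model do it at all", while `vad_mode` answers "how is this session configured right now", which is why the two might reasonably coexist.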
