Skip to content

Latest commit

 

History

History
186 lines (131 loc) · 5.87 KB

0.8-migration-guide.md

File metadata and controls

186 lines (131 loc) · 5.87 KB

Migrating to 0.8.x

v0.8 is a major release of the framework, featuring significant reliability improvements to VoiceAssistant. This update includes a few breaking API changes that will impact the way you build your agents. We strive to minimize breaking changes, and will stabilize the API as we approach version 1.0.

Job and Worker API

Specifying your entrypoint function

entrypoint_fnc is now a parameter in WorkerOptions. Previously, you were required to explicitly accept the job.

Namespace has been removed

We've removed the namespace option in order to simplify the registration process. In future versions, it'll be possible to provide an explicit agent_name to launch multiple kinds of agents for each room.

Connecting to room is explicit

You now need to call await ctx.connect() to initiate the connection to the room. This allows for pre-connect setup (such as callback registrations) to avoid race conditions.

Example

The above changes are reflected in the following minimal example:

from livekit.agents import JobContext, JobRequest, WorkerOptions, cli

async def job_entrypoint(ctx: JobContext):
    await ctx.connect()
    # your logic here
    ...

if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(entrypoint_fnc=job_entrypoint)
    )

VoiceAssistant

VoiceAssistant API remains mostly unchanged, despite significant improvements to functionality and internals. However, there have been changes to the configuration.

Initialization args

  • Removed
    • base_volume
    • debug
    • sentence_tokenizer, word_tokenizer, hyphenate_word
  • Changed
    • transcription related options are grouped within transcription param
class VoiceAssistant(utils.EventEmitter[EventTypes]):
    def __init__(
        self,
        *,
        vad: vad.VAD,
        stt: stt.STT,
        llm: LLM,
        tts: tts.TTS,
        chat_ctx: ChatContext | None = None,
        fnc_ctx: FunctionContext | None = None,
        allow_interruptions: bool = True,
        interrupt_speech_duration: float = 0.6,
        interrupt_min_words: int = 0,
        preemptive_synthesis: bool = True,
        transcription: AssistantTranscriptionOptions = AssistantTranscriptionOptions(),
        will_synthesize_assistant_reply: WillSynthesizeAssistantReply = _default_will_synthesize_assistant_reply,
        plotting: bool = False,
        loop: asyncio.AbstractEventLoop | None = None,
    ) -> None:
    ...

LLM

The LLM class has been restructured to enhance ergonomics and improve the function calling support.

Function/tool calling

Function calling has gotten a complete overhaul in v0.8.0. The primary breaking change is that function calls are now NOT automatically invoked when iterating the LLM stream. LLMStream.execute_functions needs to be called instead. (VoiceAssistant handles this automatically)

LLM.chat is no longer an async method

Previously, LLM.chat() was an async method that returned an LLMStream (which itself was an AsyncIterable).

We found it easier and less-confusing for LLM.chat() to be synchronous, while still returning the same AsyncIterable LLMStream.

LLM.chat history has been renamed to chat_ctx

In order to improve consistency and reduce confusion.

chat_ctx = llm.ChatContext()
chat_ctx.append(role="user", text="user message")
stream = llm_plugin.chat(chat_ctx=chat_ctx)

STT

SpeechStream.flush

Previously, to communicate to a STT provider that you have sent enough input to generate a response - you could push_frame(None) to coax the TTS into synthesizing a response.

In v0.8.0 that API has been removed and replaced with flush()

SpeechStream.end_input

end_input signals to the STT provider that the input is complete and no additional input will follow. Previously, this was done using aclose(wait=True).

SpeechStream.aclose

The wait arg of aclose has been removed in favor of SpeechStream.end_input (see above). Now, if you call TTS.aclose() without first calling STT.end_input, the behavior will be that the request is cancelled.

stt_stream = my_stt_instance.stream()
async for ev in audio_stream:
  stt_stream.push_frame(ev.frame)
  # optionally flush when enough frames have been pushed
  stt_stream.flush()

stt_stream.end_input()
await stt_stream.aclose()

TTS

SynthesizedAudio changed and SynthesisEvent removed

SynthesizedAudio dataclass has gone through a major change

# New SynthesizedAudio dataclass
@dataclass
class SynthesizedAudio:
    request_id: str
    """Request ID (one segment could be made up of multiple requests)"""
    segment_id: str
    """Segment ID, each segment is separated by a flush"""
    frame: rtc.AudioFrame
    """Synthesized audio frame"""
    delta_text: str = ""
    """Current segment of the synthesized audio"""

#Old SynthesizedAudio dataclass
@dataclass
class SynthesizedAudio:
    text: str
    data: rtc.AudioFrame

The SynthesisEvent has been removed entirely. All occurrences of it have been replaced with SynthesizedAudio

SynthesizeStream.flush

Similar to the STT changes, this coaxes the TTS provider into generating a response. The SynthesizedAudio response will have a new segment_id after calls to flush().

SynthesizeStream.end_input

Similar to the STT changes, aclose(wait=True) has been replaced.

SynthesizeStream.aclose

Similar to the STT changes, the wait arg has been removed.

tts_stream = my_tts_instance.stream()
tts_stream.push_text("This is the first sentence")
tts_stream.flush()
tts_stream.push_text("This is the second sentence")
tts_stream.end_input()
await tts_stream.aclose()

VAD

The same changes made to STT and TTS have also been made to VAD

vad_stream = my_vad_instance.stream()
async for ev in audio_stream:
  vad_stream.push_frame(ev.frame)
  # optionally flush when enough frames have been pushed
  vad_stream.flush()

vad_stream.end_input()
await vad_stream.aclose()