All notable changes to Pipecat will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
-
Added
AssemblyAISTTService
and corresponding foundational examples07o-interruptible-assemblyai.py
and13d-assemblyai-transcription.py
. -
Added a foundational example for Gladia transcription:
13c-gladia-transcription.py
- Fixed
enable_usage_metrics
to control LLM/TTS usage metrics separately fromenable_metrics
.
-
Added
audio_passthrough
parameter toSTTService
. If enabled it allows audio frames to be pushed downstream in case other processors need them. -
Added input parameter options for
PlayHTTTSService
andPlayHTHttpTTSService
.
-
Moved
SileroVAD
audio processor toprocessors.audio.vad
. -
Module
utils.audio
is nowaudio.utils
. A newresample_audio
function has been added. -
PlayHTTTSService
now uses PlayHT websockets instead of HTTP requests. -
The previous
PlayHTTTSService
HTTP implementation is nowPlayHTHttpTTSService
. -
PlayHTTTSService
andPlayHTHttpTTSService
now use avoice_engine
ofPlayHT3.0-mini
, which allows for multi-lingual support. -
Renamed
OpenAILLMServiceRealtimeBeta
toOpenAIRealtimeBetaLLMService
to match other services.
-
LLMUserResponseAggregator
andLLMAssistantResponseAggregator
are mostly deprecated, useOpenAILLMContext
instead. -
The
vad
package is now deprecated andaudio.vad
should be used instead. Theavd
package will get removed in a future release.
-
Fixed an issue that would cause an error if no VAD analyzer was passed to
LiveKitTransport
params. -
Fixed
SileroVAD
processor to support interruptions properly.
- Added
examples/foundational/07-interruptible-vad.py
. This is the same as07-interruptible.py
but using theSileroVAD
processor instead of passing theVADAnalyzer
in the transport.
- Metrics messages have moved out from the transport's base output into RTVI.
-
Added support for OpenAI Realtime API with the new
OpenAILLMServiceRealtimeBeta
processor. (see https://platform.openai.com/docs/guides/realtime/overview) -
Added
RTVIBotTranscriptionProcessor
which will send the RTVIbot-transcription
protocol message. These are TTS text aggregated (into sentences) messages. -
Added new input params to the
MarkdownTextFilter
utility. You can setfilter_code
to filter code from text andfilter_tables
to filter tables from text. -
Added
CanonicalMetricsService
. This processor uses the newAudioBufferProcessor
to capture conversation audio and later send it to Canonical AI. (see https://canonical.chat/) -
Added
AudioBufferProcessor
. This processor can be used to buffer mixed user and bot audio. This can later be saved into an audio file or processed by some audio analyzer. -
Added
on_first_participant_joined
event toLiveKitTransport
.
-
LLM text responses are now logged properly as unicode characters.
-
UserStartedSpeakingFrame
,UserStoppedSpeakingFrame
,BotStartedSpeakingFrame
,BotStoppedSpeakingFrame
,BotSpeakingFrame
andUserImageRequestFrame
are now based fromSystemFrame
-
Merge
RTVIBotLLMProcessor
/RTVIBotLLMTextProcessor
andRTVIBotTTSProcessor
/RTVIBotTTSTextProcessor
to avoid out of order issues. -
Fixed an issue in RTVI protocol that could cause a
bot-llm-stopped
orbot-tts-stopped
message to be sent before abot-llm-text
orbot-tts-text
message. -
Fixed
DeepgramSTTService
constructor settings not being merged with default ones. -
Fixed an issue in Daily transport that would cause tasks to be hanging if urgent transport messages were being sent from a transport event handler.
-
Fixed an issue in
BaseOutputTransport
that would causeEndFrame
to be pushed downed too early and callFrameProcessor.cleanup()
before letting the transport stop properly.
-
Added a new util called
MarkdownTextFilter
which is a subclass of a new base class calledBaseTextFilter
. This is a configurable utility which is intended to filter text received by TTS services. -
Added new
RTVIUserLLMTextProcessor
. This processor will send an RTVIuser-llm-text
message with the user content's that was sent to the LLM.
-
TransportMessageFrame
doesn't have anurgent
field anymore, instead there's now aTransportMessageUrgentFrame
which is aSystemFrame
and therefore skip all internal queuing. -
For TTS services, convert inputted languages to match each service's language format
- Fixed an issue where changing a language with the Deepgram STT service wouldn't apply the change. This was fixed by disconnecting and reconnecting when the language changes.
-
SentryMetrics
has been added to report frame processor metrics to Sentry. This is now possible becauseFrameProcessorMetrics
can now be passed toFrameProcessor
. -
Added Google TTS service and corresponding foundational example
07n-interruptible-google.py
-
Added AWS Polly TTS support and
07m-interruptible-aws.py
as an example. -
Added InputParams to Azure TTS service.
-
Added
LivekitTransport
(audio-only for now). -
RTVI 0.2.0 is now supported.
-
All
FrameProcessors
can now register event handlers.
tts = SomeTTSService(...)
@tts.event_handler("on_connected"):
async def on_connected(processor):
...
-
Added
AsyncGeneratorProcessor
. This processor can be used together with aFrameSerializer
as an async generator. It provides agenerator()
function that returns anAsyncGenerator
and that yields serialized frames. -
Added
EndTaskFrame
andCancelTaskFrame
. These are new frames that are meant to be pushed upstream to tell the pipeline task to stop nicely or immediately respectively. -
Added configurable LLM parameters (e.g., temperature, top_p, max_tokens, seed) for OpenAI, Anthropic, and Together AI services along with corresponding setter functions.
-
Added
sample_rate
as a constructor parameter for TTS services. -
Pipecat has a pipeline-based architecture. The pipeline consists of frame processors linked to each other. The elements traveling across the pipeline are called frames.
To have a deterministic behavior the frames traveling through the pipeline should always be ordered, except system frames which are out-of-band frames. To achieve that, each frame processor should only output frames from a single task.
In this version all the frame processors have their own task to push frames. That is, when
push_frame()
is called the given frame will be put into an internal queue (with the exception of system frames) and a frame processor task will push it out. -
Added pipeline clocks. A pipeline clock is used by the output transport to know when a frame needs to be presented. For that, all frames now have an optional
pts
field (prensentation timestamp). There's currently just one clock implementationSystemClock
and thepts
field is currently only used forTextFrame
s (audio and image frames will be next). -
A clock can now be specified to
PipelineTask
(defaults toSystemClock
). This clock will be passed to each frame processor via theStartFrame
. -
Added
CartesiaHttpTTSService
. -
DailyTransport
now supports setting the audio bitrate to improve audio quality through theDailyParams.audio_out_bitrate
parameter. The new default is 96kbps. -
DailyTransport
now uses the number of audio output channels (1 or 2) to set mono or stereo audio when needed. -
Interruptions support has been added to
TwilioFrameSerializer
when usingFastAPIWebsocketTransport
. -
Added new
LmntTTSService
text-to-speech service. (see https://www.lmnt.com/) -
Added
TTSModelUpdateFrame
,TTSLanguageUpdateFrame
,STTModelUpdateFrame
, andSTTLanguageUpdateFrame
frames to allow you to switch models, language and voices in TTS and STT services. -
Added new
transcriptions.Language
enum.
-
Context frames are now pushed downstream from assistant context aggregators.
-
Removed Silero VAD torch dependency.
-
Updated individual update settings frame classes into a single
ServiceUpdateSettingsFrame
class. -
We now distinguish between input and output audio and image frames. We introduce
InputAudioRawFrame
,OutputAudioRawFrame
,InputImageRawFrame
andOutputImageRawFrame
(and other subclasses of those). The input frames usually come from an input transport and are meant to be processed inside the pipeline to generate new frames. However, the input frames will not be sent through an output transport. The output frames can also be processed by any frame processor in the pipeline and they are allowed to be sent by the output transport. -
ParallelTask
has been renamed toSyncParallelPipeline
. ASyncParallelPipeline
is a frame processor that contains a list of different pipelines to be executed concurrently. The difference between aSyncParallelPipeline
and aParallelPipeline
is that, given an input frame, theSyncParallelPipeline
will wait for all the internal pipelines to complete. This is achieved by making sure the last processor in each of the pipelines is synchronous (e.g. an HTTP-based service that waits for the response). -
StartFrame
is back a system frame to make sure it's processed immediately by all processors.EndFrame
stays a control frame since it needs to be ordered allowing the frames in the pipeline to be processed. -
Updated
MoondreamService
revision to2024-08-26
. -
CartesiaTTSService
andElevenLabsTTSService
now add presentation timestamps to their text output. This allows the output transport to push the text frames downstream at almost the same time the words are spoken. We say "almost" because currently the audio frames don't have presentation timestamp but they should be played at roughly the same time. -
DailyTransport.on_joined
event now returns the full session data instead of just the participant. -
CartesiaTTSService
is now a subclass ofTTSService
. -
DeepgramSTTService
is now a subclass ofSTTService
. -
WhisperSTTService
is now a subclass ofSegmentedSTTService
. ASegmentedSTTService
is aSTTService
where the provided audio is given in a big chunk (i.e. from when the user starts speaking until the user stops speaking) instead of a continous stream.
-
Fixed OpenAI multiple function calls.
-
Fixed a Cartesia TTS issue that would cause audio to be truncated in some cases.
-
Fixed a
BaseOutputTransport
issue that would stop audio and video rendering tasks (after receiving andEndFrame
) before the internal queue was emptied, causing the pipeline to finish prematurely. -
StartFrame
should be the first frame every processor receives to avoid situations where things are not initialized (because initialization happens onStartFrame
) and other frames come in resulting in undesired behavior.
obj_id()
andobj_count()
now useitertools.count
avoiding the need ofthreading.Lock
.
- Pipecat now uses Ruff as its formatter (https://github.com/astral-sh/ruff).
- Added
LivekitFrameSerializer
audio frame serializer.
-
Fix
FastAPIWebsocketOutputTransport
variable name clash with subclass. -
Fix an
AnthropicLLMService
issue with empty arguments in function calling.
- Fixed
studypal
example errors.
-
VAD parameters can now be dynamicallt updated using the
VADParamsUpdateFrame
. -
ErrorFrame
has now afatal
field to indicate the bot should exit if a fatal error is pushed upstream (false by default). A newFatalErrorFrame
that sets this flag to true has been added. -
AnthropicLLMService
now supports function calling and initial support for prompt caching. (see https://www.anthropic.com/news/prompt-caching) -
ElevenLabsTTSService
can now specify ElevenLabs input parameters such asoutput_format
. -
TwilioFrameSerializer
can now specify Twilio's and Pipecat's desired sample rates to use. -
Added new
on_participant_updated
event toDailyTransport
. -
Added
DailyRESTHelper.delete_room_by_name()
andDailyRESTHelper.delete_room_by_url()
. -
Added LLM and TTS usage metrics. Those are enabled when
PipelineParams.enable_usage_metrics
is True. -
AudioRawFrame
s are now pushed downstream from the base output transport. This allows capturing the exact words the bot says by adding an STT service at the end of the pipeline. -
Added new
GStreamerPipelineSource
. This processor can generate image or audio frames from a GStreamer pipeline (e.g. reading an MP4 file, and RTP stream or anything supported by GStreamer). -
Added
TransportParams.audio_out_is_live
. This flag is False by default and it is useful to indicate we should not synchronize audio with sporadic images. -
Added new
BotStartedSpeakingFrame
andBotStoppedSpeakingFrame
control frames. These frames are pushed upstream and they should wrapBotSpeakingFrame
. -
Transports now allow you to register event handlers without decorators.
-
Support RTVI message protocol 0.1. This includes new messages, support for messages responses, support for actions, configuration, webhooks and a bunch of new cool stuff. (see https://docs.rtvi.ai/)
-
SileroVAD
dependency is now imported via pip'ssilero-vad
package. -
ElevenLabsTTSService
now useseleven_turbo_v2_5
model by default. -
BotSpeakingFrame
is now a control frame. -
StartFrame
is now a control frame similar toEndFrame
. -
DeepgramTTSService
now is more customizable. You can adjust the encoding and sample rate.
-
TTSStartFrame
andTTSStopFrame
are now sent when TTS really starts and stops. This allows for knowing when the bot starts and stops speaking even with asynchronous services (like Cartesia). -
Fixed
AzureSTTService
transcription frame timestamps. -
Fixed an issue with
DailyRESTHelper.create_room()
expirations which would cause this function to stop working after the initial expiration elapsed. -
Improved
EndFrame
andCancelFrame
handling.EndFrame
should end things gracefully while aCancelFrame
should cancel all running tasks as soon as possible. -
Fixed an issue in
AIService
that would cause a yieldedNone
value to be processed. -
RTVI's
bot-ready
message is now sent when the RTVI pipeline is ready and a first participant joins. -
Fixed a
BaseInputTransport
issue that was causing incoming system frames to be queued instead of being pushed immediately. -
Fixed a
BaseInputTransport
issue that was causing start/stop interruptions incoming frames to not cancel tasks and be processed properly.
-
Added
studypal
example (from to the Cartesia folks!). -
Most examples now use Cartesia.
-
Added examples
foundational/19a-tools-anthropic.py
,foundational/19b-tools-video-anthropic.py
andfoundational/19a-tools-togetherai.py
. -
Added examples
foundational/18-gstreamer-filesrc.py
andfoundational/18a-gstreamer-videotestsrc.py
that show how to useGStreamerPipelineSource
-
Remove
requests
library usage. -
Cleanup examples and use
DailyRESTHelper
.
- Fixed a regression introduced in 0.0.38 that would cause Daily transcription to stop the Pipeline.
-
Added
force_reload
,skip_validation
andtrust_repo
toSileroVAD
andSileroVADAnalyzer
. This allows caching and various GitHub repo validations. -
Added
send_initial_empty_metrics
flag toPipelineParams
to request for initial empty metrics (zero values). True by default.
-
Fixed initial metrics format. It was using the wrong keys name/time instead of processor/value.
-
STT services should be using ISO 8601 time format for transcription frames.
-
Fixed an issue that would cause Daily transport to show a stop transcription error when actually none occurred.
-
Added
RTVIProcessor
which implements the RTVI-AI standard. See https://github.com/rtvi-ai -
Added
BotInterruptionFrame
which allows interrupting the bot while talking. -
Added
LLMMessagesAppendFrame
which allows appending messages to the current LLM context. -
Added
LLMMessagesUpdateFrame
which allows changing the LLM context for the one provided in this new frame. -
Added
LLMModelUpdateFrame
which allows updating the LLM model. -
Added
TTSSpeakFrame
which causes the bot say some text. This text will not be part of the LLM context. -
Added
TTSVoiceUpdateFrame
which allows updating the TTS voice.
- We remove the
LLMResponseStartFrame
andLLMResponseEndFrame
frames. These were added in the past to properly handle interruptions for theLLMAssistantContextAggregator
. But theLLMContextAggregator
is now based onLLMResponseAggregator
which handles interruptions properly by just processing theStartInterruptionFrame
, so there's no need for these extra frames any more.
-
Fixed an issue with
StatelessTextTransformer
where it was pushing a string instead of aTextFrame
. -
TTSService
end of sentence detection has been improved. It now works with acronyms, numbers, hours and others. -
Fixed an issue in
TTSService
that would not properly flush the current aggregated sentence if anLLMFullResponseEndFrame
was found.
CartesiaTTSService
now uses websockets which improves speed. It also leverages the new Cartesia contexts which maintains generated audio prosody when multiple inputs are sent, therefore improving audio quality a lot.
-
Added
GladiaSTTService
. See https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition -
Added
XTTSService
. This is a local Text-To-Speech service. See https://github.com/coqui-ai/TTS -
Added
UserIdleProcessor
. This processor can be used to wait for any interaction with the user. If the user doesn't say anything within a given timeout a provided callback is called. -
Added
IdleFrameProcessor
. This processor can be used to wait for frames within a given timeout. If no frame is received within the timeout a provided callback is called. -
Added new frame
BotSpeakingFrame
. This frame will be continuously pushed upstream while the bot is talking. -
It is now possible to specify a Silero VAD version when using
SileroVADAnalyzer
orSileroVAD
. -
Added
AysncFrameProcessor
andAsyncAIService
. Some services likeDeepgramSTTService
need to process things asynchronously. For example, audio is sent to Deepgram but transcriptions are not returned immediately. In these cases we still require all frames (except system frames) to be pushed downstream from a single task. That's whatAsyncFrameProcessor
is for. It creates a task and all frames should be pushed from that task. So, whenever a new Deepgram transcription is ready that transcription will also be pushed from this internal task. -
The
MetricsFrame
now includes processing metrics if metrics are enabled. The processing metrics indicate the time a processor needs to generate all its output. Note that not all processors generate these kind of metrics.
-
WhisperSTTService
model can now also be a string. -
Added missing * keyword separators in services.
-
WebsocketServerTransport
doesn't try to send frames anymore if serializers returnsNone
. -
Fixed an issue where exceptions that occurred inside frame processors were being swallowed and not displayed.
-
Fixed an issue in
FastAPIWebsocketTransport
where it would still try to send data to the websocket after being closed.
-
Added Fly.io deployment example in
examples/deployment/flyio-example
. -
Added new
17-detect-user-idle.py
example that shows how to use the newUserIdleProcessor
.
-
FastAPIWebsocketParams
now require a serializer. -
TwilioFrameSerializer
now requires astreamSid
.
- Silero VAD number of frames needs to be 512 for 16000 sample rate or 256 for 8000 sample rate.
-
Fixed an issue with asynchronous STT services (Deepgram and Azure) that could interruptions to ignore transcriptions.
-
Fixed an issue introduced in 0.0.33 that would cause the LLM to generate shorter output.
- Upgraded to Cartesia's new Python library 1.0.0.
CartesiaTTSService
now expects a voice ID instead of a voice name (you can get the voice ID from Cartesia's playground). You can also specify the audiosample_rate
andencoding
instead of the previousoutput_format
.
-
Fixed an issue with asynchronous STT services (Deepgram and Azure) that could cause static audio issues and interruptions to not work properly when dealing with multiple LLMs sentences.
-
Fixed an issue that could mix new LLM responses with previous ones when handling interruptions.
-
Fixed a Daily transport blocking situation that occurred while reading audio frames after a participant left the room. Needs daily-python >= 0.10.1.
-
Allow specifying a
DeepgramSTTService
url which allows using on-prem Deepgram. -
Added new
FastAPIWebsocketTransport
. This is a new websocket transport that can be integrated with FastAPI websockets. -
Added new
TwilioFrameSerializer
. This is a new serializer that knows how to serialize and deserialize audio frames from Twilio. -
Added Daily transport event:
on_dialout_answered
. See https://reference-python.daily.co/api_reference.html#daily.EventHandler -
Added new
AzureSTTService
. This allows you to use Azure Speech-To-Text.
- Convert
BaseOutputTransport
andBaseOutputTransport
to fully use asyncio and remove the use of threads.
-
Added
twilio-chatbot
. This is an example that shows how to integrate Twilio phone numbers with a Pipecat bot. -
Updated
07f-interruptible-azure.py
to useAzureLLMService
,AzureSTTService
andAzureTTSService
.
- Break long audio frames into 20ms chunks instead of 10ms.
-
Added
report_only_initial_ttfb
toPipelineParams
. This will make it so only the initial TTFB metrics after the user stops talking are reported. -
Added
OpenPipeLLMService
. This service will let you run OpenAI through OpenPipe's SDK. -
Allow specifying frame processors' name through a new
name
constructor argument. -
Added
DeepgramSTTService
. This service has an ongoing websocket connection. To handle this, it subclassesAIService
instead ofSTTService
. The output of this service will be pushed from the same task, except system frames likeStartFrame
,CancelFrame
orStartInterruptionFrame
.
-
FrameSerializer.deserialize()
can now returnNone
in case it is not possible to desearialize the given data. -
daily_rest.DailyRoomProperties
now allows extra unknown parameters.
-
Fixed an issue where
DailyRoomProperties.exp
always had the same old timestamp unless set by the user. -
Fixed a couple of issues with
WebsocketServerTransport
. It needed to usepush_audio_frame()
and also VAD was not working properly. -
Fixed an issue that would cause LLM aggregator to fail with small
VADParams.stop_secs
values. -
Fixed an issue where
BaseOutputTransport
would send longer audio frames preventing interruptions.
-
Added new
07h-interruptible-openpipe.py
example. This example shows how to use OpenPipe to run OpenAI LLMs and get the logs stored in OpenPipe. -
Added new
dialin-chatbot
example. This examples shows how to call the bot using a phone number.
-
Added a new
FunctionFilter
. This filter will let you filter frames based on a given function, except system messages which should never be filtered. -
Added
FrameProcessor.can_generate_metrics()
method to indicate if a processor can generate metrics. In the future this might get an extra argument to ask for a specific type of metric. -
Added
BasePipeline
. All pipeline classes should be based on this class. All subclasses should implement aprocessors_with_metrics()
method that returns a list of allFrameProcessor
s in the pipeline that can generate metrics. -
Added
enable_metrics
toPipelineParams
. -
Added
MetricsFrame
. TheMetricsFrame
will report different metrics in the system. Right now, it can report TTFB (Time To First Byte) values for different services, that is the time spent between the arrival of aFrame
to the processor/service until the firstDataFrame
is pushed downstream. If metrics are enabled an intialMetricsFrame
with all the services in the pipeline will be sent. -
Added TTFB metrics and debug logging for TTS services.
- Moved
ParallelTask
topipecat.pipeline.parallel_task
.
- Fixed PlayHT TTS service to work properly async.
- Fixed an issue with
SileroVADAnalyzer
that would cause memory to keep growing indefinitely.
- Added
DailyTransport.participants()
andDailyTransport.participant_counts()
.
-
Added
OpenAITTSService
. -
Allow passing
output_format
andmodel_id
toCartesiaTTSService
to change audio sample format and the model to use. -
Added
DailyRESTHelper
which helps you create Daily rooms and tokens in an easy way. -
PipelineTask
now has ahas_finished()
method to indicate if the task has completed. If a task is never ranhas_finished()
will return False. -
PipelineRunner
now supports SIGTERM. If received, the runner will be canceled.
-
Fixed an issue where
BaseInputTransport
andBaseOutputTransport
where stopping push tasks before pushingEndFrame
frames could cause the bots to get stuck. -
Fixed an error closing local audio transports.
-
Fixed an issue with Deepgram TTS that was introduced in the previous release.
-
Fixed
AnthropicLLMService
interruptions. If an interruption occurred, auser
message could be appended after the previoususer
message. Anthropic does not allow that because it requires alternateuser
andassistant
messages.
-
The
BaseInputTransport
does not pull audio frames from sub-classes any more. Instead, sub-classes now push audio frames into a queue in the base class. Also,DailyInputTransport
now pushes audio frames every 20ms instead of 10ms. -
Remove redundant camera input thread from
DailyInputTransport
. This should improve performance a little bit when processing participant videos. -
Load Cartesia voice on startup.
-
Added WebsocketServerTransport. This will create a websocket server and will read messages coming from a client. The messages are serialized/deserialized with protobufs. See
examples/websocket-server
for a detailed example. -
Added function calling (LLMService.register_function()). This will allow the LLM to call functions you have registered when needed. For example, if you register a function to get the weather in Los Angeles and ask the LLM about the weather in Los Angeles, the LLM will call your function. See https://platform.openai.com/docs/guides/function-calling
-
Added new
LangchainProcessor
. -
Added Cartesia TTS support (https://cartesia.ai/)
-
Fixed SileroVAD frame processor.
-
Fixed an issue where
camera_out_enabled
would cause the highg CPU usage if no image was provided.
- Removed unnecessary audio input tasks.
-
Exposed
on_dialin_ready
for Daily transport SIP endpoint handling. This notifies when the Daily room SIP endpoints are ready. This allows integrating with third-party services like Twilio. -
Exposed Daily transport
on_app_message
event. -
Added Daily transport
on_call_state_updated
event. -
Added Daily transport
start_recording()
,stop_recording
andstop_dialout
.
-
Added
PipelineParams
. This replaces theallow_interruptions
argument inPipelineTask
and will allow future parameters in the future. -
Fixed Deepgram Aura TTS base_url and added ErrorFrame reporting.
-
GoogleLLMService
api_key
argument is now mandatory.
-
Daily tranport
dialin-ready
doesn't not block anymore and it now handles timeouts. -
Fixed AzureLLMService.
- Fixed an issue handling Daily transport
dialin-ready
event.
-
Added Daily transport
start_dialout()
to be able to make phone or SIP calls. See https://reference-python.daily.co/api_reference.html#daily.CallClient.start_dialout -
Added Daily transport support for dial-in use cases.
-
Added Daily transport events:
on_dialout_connected
,on_dialout_stopped
,on_dialout_error
andon_dialout_warning
. See https://reference-python.daily.co/api_reference.html#daily.EventHandler
-
Added vision support to Anthropic service.
-
Added
WakeCheckFilter
which allows you to pass information downstream only if you say a certain phrase/word.
Filter
has been renamed toFrameFilter
and it's now underprocessors/filters
.
-
Fixed Anthropic service to use new frame types.
-
Fixed an issue in
LLMUserResponseAggregator
andUserResponseAggregator
that would cause frames after a brief pause to not be pushed to the LLM. -
Clear the audio output buffer if we are interrupted.
-
Re-add exponential smoothing after volume calculation. This makes sure the volume value being used doesn't fluctuate so much.
- In order to improve interruptions we now compute a loudness level using pyloudnorm. The audio coming WebRTC transports (e.g. Daily) have an Automatic Gain Control (AGC) algorithm applied to the signal, however we don't do that on our local PyAudio signals. This means that currently incoming audio from PyAudio is kind of broken. We will fix it in future releases.
-
Fixed an issue where
StartInterruptionFrame
would causeLLMUserResponseAggregator
to push the accumulated text causing the LLM respond in the wrong task. TheStartInterruptionFrame
should not trigger any new LLM response because that would be spoken in a different task. -
Fixed an issue where tasks and threads could be paused because the executor didn't have more tasks available. This was causing issues when cancelling and recreating tasks during interruptions.
LLMUserResponseAggregator
andLLMAssistantResponseAggregator
internal messages are now exposed through themessages
property.
- Fixed an issue where
LLMAssistantResponseAggregator
was not accumulating the full response but short sentences instead. If there's an interruption we only accumulate what the bot has spoken until now in a long response as well.
- Fixed an issue in
DailyOuputTransport
where transport messages were not being sent.
-
Added
google.generativeai
model support, including vision. This newgoogle
service defaults to usinggemini-1.5-flash-latest
. Example inexamples/foundational/12a-describe-video-gemini-flash.py
. -
Added vision support to
openai
service. Example inexamples/foundational/12a-describe-video-gemini-flash.py
. -
Added initial interruptions support. The assistant contexts (or aggregators) should now be placed after the output transport. This way, only the completed spoken context is added to the assistant context.
-
Added
VADParams
so you can control voice confidence level and others. -
VADAnalyzer
now uses an exponential smoothed volume to improve speech detection. This is useful when voice confidence is high (because there's someone talking near you) but volume is low.
-
Fixed an issue where TTSService was not pushing TextFrames downstream.
-
Fixed issues with Ctrl-C program termination.
-
Fixed an issue that was causing
StopTaskFrame
to actually not exit thePipelineTask
.
-
DailyTransport
: don't publish camera and audio tracks if not enabled. -
Fixed an issue in
BaseInputTransport
that was causing frames pushed downstream not pushed in the right order.
- Quick hot fix for receiving
DailyTransportMessage
.
-
Added
DailyTransport
eventon_participant_left
. -
Added support for receiving
DailyTransportMessage
.
-
Images are now resized to the size of the output camera. This was causing images not being displayed.
-
Fixed an issue in
DailyTransport
that would not allow the input processor to shutdown if no participant ever joined the room. -
Fixed base transports start and stop. In some situation processors would halt or not shutdown properly.
-
MoondreamService
argumentmodel_id
is nowmodel
. -
VADAnalyzer
arguments have been renamed for more clarity.
-
Fixed an issue with
DailyInputTransport
andDailyOutputTransport
that could cause some threads to not start properly. -
Fixed
STTService
. Addmax_silence_secs
andmax_buffer_secs
to handle better what's being passed to the STT service. Also add exponential smoothing to the RMS. -
Fixed
WhisperSTTService
. Addno_speech_prob
to avoid garbage output text.
- Added
DailyTranscriptionSettings
to be able to specify transcription settings much easier (e.g. language).
-
Updated
simple-chatbot
with Spanish. -
Add missing dependencies in some of the examples.
- Allow stopping pipeline tasks with new
StopTaskFrame
.
- TTS, STT and image generation service now use
AsyncGenerator
.
DailyTransport
: allow registering for participant transcriptions even if input transport is not initialized yet.
- Updated
storytelling-chatbot
.
-
Added Intel GPU support to
MoondreamService
. -
Added support for sending transport messages (e.g. to communicate with an app at the other end of the transport).
-
Added
FrameProcessor.push_error()
to easily send anErrorFrame
upstream.
- Fixed Azure services (TTS and image generation).
- Updated
simple-chatbot
,moondream-chatbot
andtranslation-chatbot
examples.
Many things have changed in this version. Many of the main ideas such as frames, processors, services and transports are still there but some things have changed a bit.
-
Frame
s describe the basic units for processing. For example, text, image or audio frames. Or control frames to indicate a user has started or stopped speaking. -
FrameProcessor
s process frames (e.g. they convert aTextFrame
to anImageRawFrame
) and push new frames downstream or upstream to their linked peers. -
FrameProcessor
s can be linked together. The easiest wait is to use thePipeline
which is a container for processors. Linking processors allow frames to travel upstream or downstream easily. -
Transport
s are a way to send or receive frames. There can be local transports (e.g. local audio or native apps), network transports (e.g. websocket) or service transports (e.g. https://daily.co). -
Pipeline
s are just a processor container for other processors. -
A
PipelineTask
know how to run a pipeline. -
A
PipelineRunner
can run one or more tasks and it is also used, for example, to capture Ctrl-C from the user.
-
Added
FireworksLLMService
. -
Added
InterimTranscriptionFrame
and enable interim results inDailyTransport
transcriptions.
FalImageGenService
now uses newfal_client
package.
-
FalImageGenService
: useasyncio.to_thread
to not block main loop when generating images. -
Allow
TranscriptionFrame
after an end frame (transcriptions can be delayed and received afterUserStoppedSpeakingFrame
).
- Add
use_cpu
argument toMoondreamService
.
-
Added
FalImageGenService.InputParams
. -
Added
URLImageFrame
andUserImageFrame
. -
Added
UserImageRequestFrame
and allow requesting an image from a participant. -
Added base
VisionService
andMoondreamService
-
Don't pass
image_size
toImageGenService
, images should have their own size. -
ImageFrame
now receives a tuple(width,height)
to specify the size. -
on_first_other_participant_joined
now gets a participant argument.
- Check if camera, speaker and microphone are enabled before writing to them.
DailyTransport
only subscribe to desired participant video track.
-
Use
camera_bitrate
andcamera_framerate
. -
Increase
camera_framerate
to 30 by default.
- Fixed
LocalTransport.read_audio_frames
.
- Added project optional dependencies
[silero,openai,...]
.
-
Moved thransports to its own directory.
-
Use
OPENAI_API_KEY
instead ofOPENAI_CHATGPT_API_KEY
.
- Don't write to microphone/speaker if not enabled.
-
Added live translation example.
-
Fix foundational examples.
- Added
storybot
andchatbot
examples.
Initial public release.