[EPD-333] ElevenLabs Threaded MP3 Streaming #316

zaptrem · 2023-07-28T22:23:17Z

This PR adds the ability to stream audio from the ElevenLabs synthesizer API. It uses a threaded worker to convert the MP3 chunks returned by the API into WAV chunks that the vocode streaming pipeline can consume.

ThreadAsyncWorker doesn't terminate properly
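
For readers unfamiliar with the pattern, here is a minimal sketch of a thread-backed worker that moves chunks off the event loop, decodes them, and hands the results back through an asyncio queue. It is not the code in this PR: `fake_decode`, `_decode_loop`, `convert_stream`, and the `None` sentinel are all hypothetical names chosen for illustration, and the decoder is a stand-in that does no real MP3 decoding.

```python
import asyncio
import queue
import threading
from typing import List

_SENTINEL = None  # assumed end-of-stream marker, illustration only


def fake_decode(mp3_chunk: bytes) -> bytes:
    """Stand-in for a real MP3-to-PCM decoder; it only tags the bytes."""
    return b"wav:" + mp3_chunk


def _decode_loop(
    inbox: queue.Queue,
    outbox: asyncio.Queue,
    loop: asyncio.AbstractEventLoop,
) -> None:
    """Runs in a worker thread so decoding never blocks the event loop."""
    while True:
        chunk = inbox.get()
        if chunk is _SENTINEL:
            # Forward the sentinel so the consumer knows the stream ended.
            loop.call_soon_threadsafe(outbox.put_nowait, _SENTINEL)
            return
        # asyncio.Queue is not thread-safe, so hop back onto the loop.
        loop.call_soon_threadsafe(outbox.put_nowait, fake_decode(chunk))


async def convert_stream(mp3_chunks: List[bytes]) -> List[bytes]:
    loop = asyncio.get_running_loop()
    inbox: queue.Queue = queue.Queue()
    outbox: asyncio.Queue = asyncio.Queue()
    thread = threading.Thread(
        target=_decode_loop, args=(inbox, outbox, loop), daemon=True
    )
    thread.start()

    for chunk in mp3_chunks:
        inbox.put(chunk)
    inbox.put(_SENTINEL)

    wav_chunks: List[bytes] = []
    while True:
        item = await outbox.get()
        if item is _SENTINEL:
            break
        wav_chunks.append(item)
    thread.join()  # the thread has already returned once the sentinel arrives
    return wav_chunks
```

Calling `asyncio.run(convert_stream([b"a", b"b"]))` runs the whole round trip. The sentinel gives the thread a clean exit path, which is one way to avoid a worker that never terminates.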

linear · 2023-07-28T22:28:46Z

EPD-333 Support ElevenLabs Streaming

This may (significantly?) improve latency

https://docs.elevenlabs.io/api-reference/text-to-speech-stream

ajar98

the client session PR is merged too, so you can terminate the worker in synthesizer.tear_down()

ajar98 · 2023-07-29T01:08:57Z

playground/streaming/benchmark.py

@@ -94,7 +94,7 @@


 # These synthesizers stream output so they need to be traced within this file.
-STREAMING_SYNTHESIZERS = ["azure"]
+STREAMING_SYNTHESIZERS = ["azure", "elevenlabs"]


ajar98 · 2023-07-29T01:09:39Z

tests/synthesizer/conftest.py

@@ -38,23 +40,23 @@ def mock_eleven_labs_api():


 @pytest.fixture(scope="module")
-def eleven_labs_synthesizer_with_api_key():
+async def eleven_labs_synthesizer_with_api_key():


nit: can we rename all of these to _fixture - then it makes more sense that we'd want to await it

ajar98 · 2023-07-29T01:10:14Z

vocode/streaming/utils/worker.py

@@ -249,3 +255,37 @@ class InterruptibleAgentResponseWorker(
    InterruptibleWorker[InterruptibleAgentResponseEvent]
 ):
    pass
+
+class PydubWorker(ThreadAsyncWorker):


seems like this would be better inside of the synthesizer directory

ajar98 · 2023-07-29T01:11:58Z

vocode/streaming/utils/worker.py

@@ -84,7 +90,7 @@ def _run_loop(self):
    def terminate(self):
        return super().terminate()

-
+# ThreadedAsyncWorker with a run loop that exposes something


unclear comment

ajar98 · 2023-07-29T01:12:53Z

vocode/streaming/synthesizer/eleven_labs_synthesizer.py

@@ -44,6 +47,14 @@ def __init__(
        self.optimize_streaming_latency = synthesizer_config.optimize_streaming_latency
        self.words_per_minute = 150

+        # Create a PydubWorker instance as an attribute
+        self.pydub_worker = PydubWorker(


if we're breaking on is_last in the worker, shouldn't we make a new PydubWorker on each create_speech call?

ajar98 · 2023-07-29T01:13:20Z

vocode/streaming/synthesizer/eleven_labs_synthesizer.py

+            synthesizer_config, asyncio.Queue(), asyncio.Queue()
+        )
+        # Start the PydubWorker and store the task
+        self.pydub_worker_task = self.pydub_worker.start()


shouldn't need to store this task - worker.terminate should kill it for you

ajar98 · 2023-07-29T01:13:41Z

vocode/streaming/synthesizer/eleven_labs_synthesizer.py

-
-                return result
+
+        session = aiohttp.ClientSession()


can use self.aiohttp_session once my PR is done

ajar98 · 2023-07-31T18:12:15Z

vocode/streaming/synthesizer/eleven_labs_synthesizer.py

+
+        return SynthesisResult(
+            output_generator(response, session),  # should be wav
+            lambda _: "",  # useless for now


I think we have some estimation code for this: https://github.com/vocodedev/vocode-python/blob/4ce40fcfae6543d904f9b78b81f32aa77b3d9df0/vocode/streaming/synthesizer/base_synthesizer.py#L182-L188

let's use this for now and set some reasonable WPM
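
As a rough illustration of this kind of estimate (not the linked helper itself), the sketch below guesses how much of a message has been spoken after a given number of seconds at a fixed speaking rate. The function name `estimate_spoken_text` and the whitespace-based word splitting are assumptions made for the example. The default of 150 WPM matches `self.words_per_minute` in the diff above.

```python
import math


def estimate_spoken_text(
    message: str, seconds: float, words_per_minute: int = 150
) -> str:
    """Estimate the prefix of `message` spoken after `seconds` (hypothetical helper)."""
    words_per_second = words_per_minute / 60
    words_spoken = math.floor(words_per_second * seconds)
    # Whitespace splitting is a simplification; a real tokenizer may differ.
    return " ".join(message.split()[:words_spoken])
```

A function like this could be passed in place of `lambda _: ""` so that an interrupted response reports an approximate cutoff instead of an empty string.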

ajar98 · 2023-08-01T22:51:50Z

vocode/streaming/synthesizer/MiniaudioWorker.py

@@ -0,0 +1,41 @@
+import miniaudio


nit: filename should be lowercase (miniaudio)

vocode/streaming/synthesizer/eleven_labs_synthesizer.py

ajar98 · 2023-08-01T22:54:54Z

vocode/streaming/utils/mp3_helper.py

+    mp3_chunk = io.BytesIO(mp3_bytes)
+
+    # Convert it to a wav chunk using miniaudio
+    wav_chunk = miniaudio.decode(mp3_chunk.read(), nchannels=1)


how come we need to make an in-memory mp3_chunk file if we're just going to read from it?
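
A small check of the point being raised, using only the standard library: wrapping bytes in `io.BytesIO` and immediately calling `.read()` returns the same bytes. Since the diff passes `mp3_chunk.read()` to the decoder, the wrapper appears to be unnecessary there. The byte string below is a placeholder, not real audio.

```python
import io

mp3_bytes = b"\xff\xfb\x90\x00 placeholder, not real audio"

# Wrapping the bytes in an in-memory file and reading them straight back...
round_tripped = io.BytesIO(mp3_bytes).read()

# ...yields identical bytes, so the wrapper adds nothing when the
# consumer accepts bytes directly.
same = round_tripped == mp3_bytes
```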

ajar98 · 2023-08-01T22:55:37Z

vocode/streaming/models/synthesizer.py

@@ -104,9 +104,10 @@ class ElevenLabsSynthesizerConfig(
 ):
    api_key: Optional[str] = None
    voice_id: Optional[str] = ELEVEN_LABS_ADAM_VOICE_ID
+    optimize_streaming_latency: Optional[int] = 3


why are we adding a default value here?

* initial work, still blocking
* Add threaded mp3 worker
* fix tests add todo
* use miniaudio worker and fix sync issues
* attempt to handle error from short chunks?
* make streaming optional & refac miniaudio decoding
* teardown the experimental worker
* fix tests and mypy
* resolve comments
* use sentinel to fix /stream endpoint
* potentially fix typing
* Revert "potentially fix typing"

  This reverts commit fdd37d7.
* forgot about __future__
* fix termination code

---------

Co-authored-by: Ajay Raj <[email protected]>

damzalla · 2024-04-11T22:49:47Z

how much latency do you gain?

zaptrem added 3 commits July 27, 2023 11:03

initial work, still blocking

066ac13

Merge remote-tracking branch 'origin/main' into zaptrem/streaming-ele…

cb4ae0e

…venlabs

Add threaded mp3 worker

2f20e5b

zaptrem requested a review from ajar98 July 28, 2023 22:23

zaptrem changed the title ~~ElevenLabs Threaded MP3 Streaming~~ [EPD-333] ElevenLabs Threaded MP3 Streaming Jul 28, 2023

fix tests add todo

e45f657

ajar98 requested changes Jul 31, 2023

zaptrem added 6 commits July 31, 2023 12:32

merge main

11d2edf

use miniaudio worker and fix sync issues

1521afa

attempt to handle error from short chunks?

82af895

make streaming optional & refac miniaudio decoding

5a1413e

teardown the experimental worker

098bc8c

fix tests and mypy

639034c

zaptrem requested a review from ajar98 August 1, 2023 22:23

ajar98 requested changes Aug 1, 2023

ajar98 added 7 commits August 1, 2023 16:14

resolve comments

364b850

use sentinel to fix /stream endpoint

f18b28e

potentially fix typing

fdd37d7

Revert "potentially fix typing"

fbf9200

This reverts commit fdd37d7.

forgot about __future__

27577cb

Merge branch 'main' into zaptrem/streaming-elevenlabs

f93bb12

fix termination code

2b2cfc4

ajar98 merged commit d078d19 into main Aug 2, 2023
4 checks passed

zaptrem commented Jul 28, 2023 • edited by ajar98
