From 41c1f6cb4f4cbec0d79da36092d629c378c0f6c1 Mon Sep 17 00:00:00 2001
From: n-Arno
Date: Sat, 2 Nov 2024 21:38:08 +0000
Subject: [PATCH] docs: Update documentation for text-to-audio feature regarding response_format

Signed-off-by: n-Arno
---
 docs/content/docs/features/text-to-audio.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs/content/docs/features/text-to-audio.md b/docs/content/docs/features/text-to-audio.md
index 0e82f7f07ba8..3c650ad179c6 100644
--- a/docs/content/docs/features/text-to-audio.md
+++ b/docs/content/docs/features/text-to-audio.md
@@ -201,3 +201,21 @@ curl -L http://localhost:8080/tts \
   "input": "Bonjour, je suis Ana Florence. Comment puis-je vous aider?"
 }' | aplay
 ```
+
+## Response format
+
+To provide some compatibility with the OpenAI API's `response_format` parameter, ffmpeg must be installed (or a Docker image that includes ffmpeg must be used) so that the generated wav file can be converted before the API returns its response.
+
+Warning: this is a change in behaviour. Before this addition, the parameter was ignored and a wav file was always returned, which could cause codec errors later in an integration (for example, a client trying to decode the returned wav data as mp3, the default format used by OpenAI).
+
+The formats supported through ffmpeg are `wav`, `mp3`, `aac`, `flac` and `opus`. If no format or an unknown format is provided, the response defaults to `wav`.
+
+```bash
+curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
+  "input": "Hello world",
+  "model": "tts",
+  "response_format": "mp3"
+}'
+```
+
+If a `response_format` other than `wav` is set in the request and ffmpeg is not available, the call will fail.
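
Since a request for a non-`wav` format fails when ffmpeg is missing, a caller may want to choose the format defensively. The following is a minimal sketch, not part of LocalAI: `pick_format` is a hypothetical helper, and it assumes the server runs on the same machine as the script, so that a local `command -v ffmpeg` check says something about the server. The output file name `hello.<format>` is also only an illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a LocalAI command): choose a response_format
# that the /tts endpoint should be able to honour.
# Assumption: the server runs on this machine, so checking for ffmpeg
# locally reflects what the server can use.
pick_format() {
  local wanted="$1"
  case "$wanted" in
    wav|mp3|aac|flac|opus) ;;   # formats listed in the documentation
    *) wanted="wav" ;;          # mirror the documented default for unknown or empty values
  esac
  # Any format other than wav needs ffmpeg for the conversion step.
  if [ "$wanted" != "wav" ] && ! command -v ffmpeg >/dev/null 2>&1; then
    wanted="wav"
  fi
  printf '%s\n' "$wanted"
}

# Usage (left commented out so that sourcing this file makes no network call):
# fmt="$(pick_format mp3)"
# curl http://localhost:8080/tts -H "Content-Type: application/json" \
#   -d "{\"input\": \"Hello world\", \"model\": \"tts\", \"response_format\": \"${fmt}\"}" \
#   --output "hello.${fmt}"
```

Falling back to `wav` trades the requested encoding for a call that still succeeds. A caller that strictly needs `mp3` should let the request fail instead.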