feat(tts): Implement naive response_format for tts endpoint #4035

n-Arno · 2024-11-02T11:40:45Z

Description

This PR fixes #2732

Notes for Reviewers

This is a naive implementation meant as a starting point or workaround. I wrote it for, and use it with, a Livekit Agent integration, since the OpenAI plugin expects the default mp3 format. It leverages ffmpeg to convert the generated wav file at the endpoint level, not the backend level.

It is neither the best nor the prettiest, but since it works, I am contributing it :D

netlify · 2024-11-02T11:41:06Z

✅ Deploy Preview for localai ready!

Name	Link
🔨 Latest commit	`6b5dbfd`
🔍 Latest deploy log	https://app.netlify.com/sites/localai/deploys/67265138ae70380008d8bb31
😎 Deploy Preview	https://deploy-preview-4035--localai.netlify.app

Signed-off-by: n-Arno <[email protected]>

dave-gray101 · 2024-11-02T16:41:48Z

One potential issue with this:

For license reasons, not all of our images include ffmpeg.

We'll need to verify that the error handling works on systems without ffmpeg and that it returns an error in that situation.

To be more clear: your function already -has- error handling, I think we just need to make sure the default option is to -not- format, and add some documentation that this option requires ffmpeg

n-Arno · 2024-11-02T16:57:11Z

Indeed, I didn't consider the ffmpeg licensing "problem", since I rebuild the image with --build-arg FFMPEG=true. I did this work to leverage LocalAI (which is altogether VERY COOL) for an internal demo.

I'll add a quick note and a check (like I said, it's a very naive implementation, not something rock solid yet).

mudler

Looking good here, thanks @n-Arno !

mudler · 2024-11-02T17:40:49Z

> To be more clear: your function already -has- error handling, I think we just need to make sure the default option is to -not- format, and add some documentation that this option requires ffmpeg

I think it does already, no ?

https://github.com/mudler/LocalAI/pull/4035/files#diff-09ebe993ca661f1de77ed47e54d46755a5189bde28f94f70f52a095b244c904eR38

If no format is specified, wav is implied, which in turn skips the ffmpeg call completely.
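A minimal sketch of that branching, for readers who do not want to open the diff. `conversionPlan` is a hypothetical helper written for illustration, not the function from this PR: it only decides whether a conversion is needed and which ffmpeg arguments would be used, assuming ffmpeg infers the output format from the file extension.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// conversionPlan is a hypothetical helper (not LocalAI's actual code).
// It returns the path of the file to serve, the ffmpeg arguments needed
// to produce it, and whether ffmpeg has to be called at all.
func conversionPlan(wavPath, format string) (outPath string, args []string, needed bool) {
	format = strings.ToLower(strings.TrimSpace(format))
	// An empty format implies wav: serve the generated file untouched.
	if format == "" || format == "wav" {
		return wavPath, nil, false
	}
	// Swap the extension so ffmpeg can pick the output format from it.
	outPath = strings.TrimSuffix(wavPath, filepath.Ext(wavPath)) + "." + format
	// -y overwrites an existing output file, -i names the input file.
	args = []string{"-y", "-i", wavPath, outPath}
	return outPath, args, true
}

func main() {
	out, args, needed := conversionPlan("/tmp/speech.wav", "mp3")
	fmt.Println(out, args, needed)
}
```

With this shape, a request without `response_format` never touches ffmpeg, so images built without it keep working for the default case.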

n-Arno · 2024-11-02T18:00:41Z

Indeed, if no format is given, wav is used and ffmpeg is not called. I think the idea was to avoid a failure due to its absence if a format is specified.

I am adding a "simple" function like this to test if ffmpeg is ok:

// FFmpegReady tests if ffmpeg is available by trying to print help
func FFmpegReady() bool {
	commandArgs := []string{"-h"}
	_, err := ffmpegCommand(commandArgs)
	return err == nil
}

Once the build is done, I'll run a quick test and commit this "safeguard" (if I figure out how to squash two commits with a merge from master in between :D)

mudler · 2024-11-02T18:03:29Z

> Indeed, if no format is given, wav is used and ffmpeg is not called. I think the idea was to avoid a failure due to its absence if a format is specified.

gotcha, yes in this case we should error out in a sane way so the user is aware of the image limitation (no ffmpeg present).

Can be done in a follow-up tho, if tests are passing I'd merge it as is, unless you want to improve it with the error propagation in this PR.

n-Arno · 2024-11-02T18:14:40Z

As is, the error is propagated correctly, so a merge is possible. It does not fall back silently, but maybe that is not the desired behaviour.

n-Arno changed the title ~~Implement naive response_format for tts endpoint~~ feat: Implement naive response_format for tts endpoint Nov 2, 2024

feat(tts): Implement naive response_format for tts endpoint

0309351

Signed-off-by: n-Arno <[email protected]>

n-Arno force-pushed the tts-response-format branch from 566fa5b to 0309351 Compare November 2, 2024 16:19

Merge branch 'mudler:master' into tts-response-format

6b5dbfd

n-Arno changed the title ~~feat: Implement naive response_format for tts endpoint~~ feat(tts): Implement naive response_format for tts endpoint Nov 2, 2024

mudler added the enhancement New feature or request label Nov 2, 2024

mudler approved these changes Nov 2, 2024

mudler enabled auto-merge (squash) November 2, 2024 17:37

mudler merged commit 65c3df3 into mudler:master Nov 2, 2024
30 checks passed

BrewTestBot mentioned this pull request Nov 10, 2024

localai 2.23.0 Homebrew/homebrew-core#197255

Merged

1 task
