refactor(ai): apply some small formatting changes (#686)
This commit applies some small formatting changes to clean up the
codebase. It also simplifies the SAM2 pipeline Docker image URL.
rickstaa authored Nov 12, 2024
1 parent 112b74e commit 40e3f58
Showing 6 changed files with 44 additions and 24 deletions.
3 changes: 2 additions & 1 deletion ai/pipelines/image-to-image.mdx
@@ -126,7 +126,8 @@ curl -X POST https://<GATEWAY_IP>/image-to-image \
-F loras='{ "nerijs/pixel-art-xl": 1.2 }'
```

-You can find a list of available LoRa models for various models on [lora-studio](https://huggingface.co/spaces/enzostvs/lora-studio).
+You can find a list of available LoRa models for various models on
+[lora-studio](https://huggingface.co/spaces/enzostvs/lora-studio).
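For readers of this hunk, a complete request that exercises the `loras` field might look like the sketch below. Only the `-F loras=...` line appears in the diff; the gateway address, `model_id`, `prompt`, and input image are illustrative assumptions, not taken from the commit.

```bash
# Hypothetical image-to-image request; only the loras field is documented in
# the diff above. The model_id, prompt, and image values are placeholders.
curl -X POST https://<GATEWAY_IP>/image-to-image \
  -F model_id="timbrooks/instruct-pix2pix" \
  -F prompt="a pixel-art rendition of the input image" \
  -F image=@input.png \
  -F loras='{ "nerijs/pixel-art-xl": 1.2 }'
```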

## Orchestrator Configuration

8 changes: 4 additions & 4 deletions ai/pipelines/image-to-video.mdx
@@ -1,5 +1,5 @@
---
-title: Image-to-video
+title: Image-to-Video
---

## Overview
@@ -34,7 +34,7 @@ graph LR

The current warm model requested for the `image-to-video` pipeline is:

-- [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
+- [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
An updated version of the stable-video-diffusion-img2vid-xt model with
enhanced performance
([limited-commercial use license](https://stability.ai/license)).
@@ -59,9 +59,9 @@ pipeline:

{/* prettier-ignore */}
<Accordion title="Tested and Verified Diffusion Models">
-- [stable-video-diffusion-img2vid-xt](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt):
+- [stable-video-diffusion-img2vid-xt](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt):
A model by Stability AI designed for stable video diffusion from images ([limited-commercial use license](https://stability.ai/license)).
-- [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
+- [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
An updated version of the stable-video-diffusion-img2vid-xt model with enhanced performance ([limited-commercial use license](https://stability.ai/license)).
</Accordion>

6 changes: 3 additions & 3 deletions ai/pipelines/segment-anything-2.mdx
@@ -1,5 +1,5 @@
---
-title: Segment-anything-2
+title: Segment-Anything-2
---

## Overview
@@ -21,7 +21,7 @@ HuggingFace's

The current warm model requested for the `segment-anything-2` pipeline is:

-- [facebook/sam2-hiera-large](https://huggingface.co/facebook/sam2-hiera-large):
+- [facebook/sam2-hiera-large](https://huggingface.co/facebook/sam2-hiera-large):
The largest model in the Segment Anything 2 model suite, designed for the most
accurate image segmentation.

@@ -108,7 +108,7 @@ The following system requirements are recommended for optimal performance:

To serve the `segment-anything-2` pipeline, you must use a pipeline specific AI
Runner container. Pull the required container from
-[Docker Hub](https://hub.docker.com/layers/livepeer/ai-runner/segment-anything-2/images/sha256-b47b04e31907670db673152c38221373e5d749173ed54f932f8d9f8ad5959a33?context=explore)
+[Docker Hub](https://hub.docker.com/r/livepeer/ai-runner/tags?name=segment-anything-2-latest)
using the following command:

```bash
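# The rest of this hunk is truncated in the diff view. Based on the tag named
# in the updated Docker Hub link, the pull command is presumably the
# following (an assumption, not shown in the commit):
docker pull livepeer/ai-runner:segment-anything-2-latest
```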
4 changes: 2 additions & 2 deletions ai/pipelines/text-to-image.mdx
@@ -30,14 +30,14 @@ graph LR

The current warm model requested for the `text-to-image` pipeline is:

-- [SG161222/RealVisXL_V4.0_Lightning](https://huggingface.co/SG161222/RealVisXL_V4.0_Lightning):
+- [SG161222/RealVisXL_V4.0_Lightning](https://huggingface.co/SG161222/RealVisXL_V4.0_Lightning):
A streamlined version of RealVisXL_V4.0, designed for faster inference while
still aiming for photorealism.

Furthermore, several Orchestrators are currently maintaining the following model
in a ready state:

-- [ByteDance/SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning):
+- [ByteDance/SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning):
A high-performance diffusion model developed by ByteDance.

<Tip>
45 changes: 32 additions & 13 deletions ai/pipelines/text-to-speech.mdx
@@ -4,17 +4,22 @@ title: Text-to-Speech

## Overview

-The text-to-speech endpoint in Livepeer utilizes [Parler-TTS](https://github.com/huggingface/parler-tts), specifically `parler-tts/parler-tts-large-v1`. This model can generate speech with customizable characteristics such as voice type, speaking style, and audio quality.
+The text-to-speech endpoint in Livepeer utilizes
+[Parler-TTS](https://github.com/huggingface/parler-tts), specifically
+`parler-tts/parler-tts-large-v1`. This model can generate speech with
+customizable characteristics such as voice type, speaking style, and audio
+quality.

## Basic Usage Instructions

<Tip>
-For a detailed understanding of the `text-to-speech` endpoint and to experiment
-with the API, see the [Livepeer AI API
+For a detailed understanding of the `text-to-speech` endpoint and to
+experiment with the API, see the [Livepeer AI API
Reference](/ai/api-reference/text-to-speech).
</Tip>

-To use the text-to-speech feature, submit a POST request to the `/text-to-speech` endpoint. Here's an example of how to structure your request:
+To use the text-to-speech feature, submit a POST request to the
+`/text-to-speech` endpoint. Here's an example of how to structure your request:

```bash
curl -X POST "http://<GATEWAY_IP>/text-to-speech" \
@@ -28,29 +28,43 @@ curl -X POST "http://<GATEWAY_IP>/text-to-speech" \

### Request Parameters

-- `model_id`: The ID of the text-to-speech model to use. Currently, this should be set to `"parler-tts/parler-tts-large-v1"`.
+- `model_id`: The ID of the text-to-speech model to use. Currently, this should
+be set to `"parler-tts/parler-tts-large-v1"`.
- `text`: The text you want to convert to speech.
-- `description`: A description of the desired voice characteristics. This can include details about the speaker's voice, speaking style, and audio quality.
+- `description`: A description of the desired voice characteristics. This can
+include details about the speaker's voice, speaking style, and audio quality.
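The curl snippet above is truncated before its headers and body, so as a sketch, a complete call combining the three documented parameters might look like the following. The JSON body shape, header, and field values are assumptions for illustration, not taken from the commit.

```bash
# Illustrative text-to-speech request assembling the documented parameters;
# the JSON payload shape is assumed, not shown in the diff.
curl -X POST "http://<GATEWAY_IP>/text-to-speech" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "parler-tts/parler-tts-large-v1",
    "text": "Welcome to the Livepeer AI network.",
    "description": "A clear, neutral voice with studio-quality recording."
  }'
```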

### Voice Customization

-You can customize the generated voice by adjusting the `description` parameter. Some aspects you can control include:
+You can customize the generated voice by adjusting the `description` parameter.
+Some aspects you can control include:

- Speaker identity (e.g., "Jon's voice")
- Speaking style (e.g., "monotone", "expressive")
- Speaking speed (e.g., "slightly fast")
- Audio quality (e.g., "very close recording", "no background noise")
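These aspects compose into a single free-text description. As a sketch, the value below folds a speaker, style, speed, and quality cue into one string; the exact wording is illustrative:

```bash
# Illustrative description combining speaker, style, speed, and audio quality.
DESCRIPTION="Jon's voice is monotone yet slightly fast in delivery, with a very close recording and no background noise."
```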

-The checkpoint was trained on 34 speakers. The full list of available speakers includes: Laura, Gary, Jon, Lea, Karen, Rick, Brenda, David, Eileen, Jordan, Mike, Yann, Joy, James, Eric, Lauren, Rose, Will, Jason, Aaron, Naomie, Alisa, Patrick, Jerry, Tina, Jenna, Bill, Tom, Carol, Barbara, Rebecca, Anna, Bruce, and Emily.
+The checkpoint was trained on 34 speakers. The full list of available speakers
+includes: Laura, Gary, Jon, Lea, Karen, Rick, Brenda, David, Eileen, Jordan,
+Mike, Yann, Joy, James, Eric, Lauren, Rose, Will, Jason, Aaron, Naomie, Alisa,
+Patrick, Jerry, Tina, Jenna, Bill, Tom, Carol, Barbara, Rebecca, Anna, Bruce,
+and Emily.

-However, the models performed better with certain speakers. A list of the top 20 speakers for each model variant, ranked by their average speaker similarity scores can be found [here](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md#speaker-consistency)
+However, the models performed better with certain speakers. A list of the top 20
+speakers for each model variant, ranked by their average speaker similarity
+scores can be found
+[here](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md#speaker-consistency)

## Limitations and Considerations

-- The maximum length of the input text may be limited. For long-form content, you will need to split your text into smaller chunks. The training default configuration in parler-tts is max 30sec, max text length 600 characters.
-https://github.com/huggingface/parler-tts/blob/main/training/README.md#3-training
-- While the model supports various voice characteristics, the exact replication of a specific speaker's voice is not guaranteed.
-- The quality of the generated speech can vary based on the complexity of the input text and the specificity of the voice description.
+- The maximum length of the input text may be limited. For long-form content,
+you will need to split your text into smaller chunks. The training default
+configuration in parler-tts is max 30sec, max text length 600 characters.
+https://github.com/huggingface/parler-tts/blob/main/training/README.md#3-training
+- While the model supports various voice characteristics, the exact replication
+of a specific speaker's voice is not guaranteed.
+- The quality of the generated speech can vary based on the complexity of the
+input text and the specificity of the voice description.
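Given the roughly 600-character limit noted in the first item above, a minimal way to pre-chunk long-form text from the shell is `fold`, which wraps at word boundaries; the file names are illustrative:

```bash
# Split long-form text into chunks of at most 600 characters, breaking at
# word boundaries; each output line can then be sent as a separate request.
fold -s -w 600 long_text.txt > chunks.txt
```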

## Orchestrator Configuration

2 changes: 1 addition & 1 deletion ai/pipelines/upscale.mdx
@@ -30,7 +30,7 @@ graph LR

The current warm model requested for the `upscale` pipeline is:

-- [stabilityai/stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler):
+- [stabilityai/stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler):
A text-guided upscaling diffusion model trained on large LAION images,
offering enhanced resolution and controlled noise addition.
