refactor(ai): apply some small formatting changes (#686)
This commit applies some small formatting changes to clean up the
codebase. It also simplifies the SAM2 pipeline Docker image URL.
rickstaa authored Nov 12, 2024
1 parent 112b74e commit 40e3f58
Showing 6 changed files with 44 additions and 24 deletions.
3 changes: 2 additions & 1 deletion ai/pipelines/image-to-image.mdx
@@ -126,7 +126,8 @@ curl -X POST https://<GATEWAY_IP>/image-to-image \
-F loras='{ "nerijs/pixel-art-xl": 1.2 }'
```

-You can find a list of available LoRa models for various models on [lora-studio](https://huggingface.co/spaces/enzostvs/lora-studio).
+You can find a list of available LoRa models for various models on
+[lora-studio](https://huggingface.co/spaces/enzostvs/lora-studio).
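For readers of this hunk, a complete request that exercises the `loras` field might look like the sketch below. Only the `-F loras=...` line appears in the diff; the gateway address, `model_id`, `prompt`, and input image are illustrative assumptions, not taken from the commit.

```bash
# Hypothetical image-to-image request; only the loras field is documented in
# the diff above. The model_id, prompt, and image values are placeholders.
curl -X POST https://<GATEWAY_IP>/image-to-image \
  -F model_id="timbrooks/instruct-pix2pix" \
  -F prompt="a pixel-art rendition of the input image" \
  -F image=@input.png \
  -F loras='{ "nerijs/pixel-art-xl": 1.2 }'
```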

## Orchestrator Configuration

8 changes: 4 additions & 4 deletions ai/pipelines/image-to-video.mdx
@@ -1,5 +1,5 @@
---
-title: Image-to-video
+title: Image-to-Video
---

## Overview
@@ -34,7 +34,7 @@ graph LR

The current warm model requested for the `image-to-video` pipeline is:

-- [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
+- [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
An updated version of the stable-video-diffusion-img2vid-xt model with
enhanced performance
([limited-commercial use license](https://stability.ai/license)).
@@ -59,9 +59,9 @@ pipeline:

{/* prettier-ignore */}
<Accordion title="Tested and Verified Diffusion Models">
-- [stable-video-diffusion-img2vid-xt](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt):
+- [stable-video-diffusion-img2vid-xt](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt):
A model by Stability AI designed for stable video diffusion from images ([limited-commercial use license](https://stability.ai/license)).
-- [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
+- [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
An updated version of the stable-video-diffusion-img2vid-xt model with enhanced performance ([limited-commercial use license](https://stability.ai/license)).
</Accordion>

6 changes: 3 additions & 3 deletions ai/pipelines/segment-anything-2.mdx
@@ -1,5 +1,5 @@
---
-title: Segment-anything-2
+title: Segment-Anything-2
---

## Overview
@@ -21,7 +21,7 @@ HuggingFace's

The current warm model requested for the `segment-anything-2` pipeline is:

-- [facebook/sam2-hiera-large](https://huggingface.co/facebook/sam2-hiera-large):
+- [facebook/sam2-hiera-large](https://huggingface.co/facebook/sam2-hiera-large):
The largest model in the Segment Anything 2 model suite, designed for the most
accurate image segmentation.

@@ -108,7 +108,7 @@ The following system requirements are recommended for optimal performance:

To serve the `segment-anything-2` pipeline, you must use a pipeline specific AI
Runner container. Pull the required container from
-[Docker Hub](https://hub.docker.com/layers/livepeer/ai-runner/segment-anything-2/images/sha256-b47b04e31907670db673152c38221373e5d749173ed54f932f8d9f8ad5959a33?context=explore)
+[Docker Hub](https://hub.docker.com/r/livepeer/ai-runner/tags?name=segment-anything-2-latest)
using the following command:

```bash
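# The rest of this hunk is truncated in the diff view. Based on the tag named
# in the updated Docker Hub link, the pull command is presumably the
# following (an assumption, not shown in the commit):
docker pull livepeer/ai-runner:segment-anything-2-latest
```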
4 changes: 2 additions & 2 deletions ai/pipelines/text-to-image.mdx
@@ -30,14 +30,14 @@ graph LR

The current warm model requested for the `text-to-image` pipeline is:

-- [SG161222/RealVisXL_V4.0_Lightning](https://huggingface.co/SG161222/RealVisXL_V4.0_Lightning):
+- [SG161222/RealVisXL_V4.0_Lightning](https://huggingface.co/SG161222/RealVisXL_V4.0_Lightning):
A streamlined version of RealVisXL_V4.0, designed for faster inference while
still aiming for photorealism.

Furthermore, several Orchestrators are currently maintaining the following model
in a ready state:

-- [ByteDance/SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning):
+- [ByteDance/SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning):
A high-performance diffusion model developed by ByteDance.

<Tip>
45 changes: 32 additions & 13 deletions ai/pipelines/text-to-speech.mdx
@@ -4,17 +4,22 @@ title: Text-to-Speech

## Overview

-The text-to-speech endpoint in Livepeer utilizes [Parler-TTS](https://github.com/huggingface/parler-tts), specifically `parler-tts/parler-tts-large-v1`. This model can generate speech with customizable characteristics such as voice type, speaking style, and audio quality.
+The text-to-speech endpoint in Livepeer utilizes
+[Parler-TTS](https://github.com/huggingface/parler-tts), specifically
+`parler-tts/parler-tts-large-v1`. This model can generate speech with
+customizable characteristics such as voice type, speaking style, and audio
+quality.

## Basic Usage Instructions

<Tip>
-For a detailed understanding of the `text-to-speech` endpoint and to experiment
-with the API, see the [Livepeer AI API
+For a detailed understanding of the `text-to-speech` endpoint and to
+experiment with the API, see the [Livepeer AI API
Reference](/ai/api-reference/text-to-speech).
</Tip>

-To use the text-to-speech feature, submit a POST request to the `/text-to-speech` endpoint. Here's an example of how to structure your request:
+To use the text-to-speech feature, submit a POST request to the
+`/text-to-speech` endpoint. Here's an example of how to structure your request:

```bash
curl -X POST "http://<GATEWAY_IP>/text-to-speech" \
@@ -28,29 +28,43 @@ curl -X POST "http://<GATEWAY_IP>/text-to-speech" \

### Request Parameters

-- `model_id`: The ID of the text-to-speech model to use. Currently, this should be set to `"parler-tts/parler-tts-large-v1"`.
+- `model_id`: The ID of the text-to-speech model to use. Currently, this should
+be set to `"parler-tts/parler-tts-large-v1"`.
- `text`: The text you want to convert to speech.
-- `description`: A description of the desired voice characteristics. This can include details about the speaker's voice, speaking style, and audio quality.
+- `description`: A description of the desired voice characteristics. This can
+include details about the speaker's voice, speaking style, and audio quality.
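The curl snippet above is truncated before its headers and body, so as a sketch, a complete call combining the three documented parameters might look like the following. The JSON body shape, header, and field values are assumptions for illustration, not taken from the commit.

```bash
# Illustrative text-to-speech request assembling the documented parameters;
# the JSON payload shape is assumed, not shown in the diff.
curl -X POST "http://<GATEWAY_IP>/text-to-speech" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "parler-tts/parler-tts-large-v1",
    "text": "Welcome to the Livepeer AI network.",
    "description": "A clear, neutral voice with studio-quality recording."
  }'
```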

### Voice Customization

-You can customize the generated voice by adjusting the `description` parameter. Some aspects you can control include:
+You can customize the generated voice by adjusting the `description` parameter.
+Some aspects you can control include:

- Speaker identity (e.g., "Jon's voice")
- Speaking style (e.g., "monotone", "expressive")
- Speaking speed (e.g., "slightly fast")
- Audio quality (e.g., "very close recording", "no background noise")
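These aspects compose into a single free-text description. As a sketch, the value below folds a speaker, style, speed, and quality cue into one string; the exact wording is illustrative:

```bash
# Illustrative description combining speaker, style, speed, and audio quality.
DESCRIPTION="Jon's voice is monotone yet slightly fast in delivery, with a very close recording and no background noise."
```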

-The checkpoint was trained on 34 speakers. The full list of available speakers includes: Laura, Gary, Jon, Lea, Karen, Rick, Brenda, David, Eileen, Jordan, Mike, Yann, Joy, James, Eric, Lauren, Rose, Will, Jason, Aaron, Naomie, Alisa, Patrick, Jerry, Tina, Jenna, Bill, Tom, Carol, Barbara, Rebecca, Anna, Bruce, and Emily.
+The checkpoint was trained on 34 speakers. The full list of available speakers
+includes: Laura, Gary, Jon, Lea, Karen, Rick, Brenda, David, Eileen, Jordan,
+Mike, Yann, Joy, James, Eric, Lauren, Rose, Will, Jason, Aaron, Naomie, Alisa,
+Patrick, Jerry, Tina, Jenna, Bill, Tom, Carol, Barbara, Rebecca, Anna, Bruce,
+and Emily.

-However, the models performed better with certain speakers. A list of the top 20 speakers for each model variant, ranked by their average speaker similarity scores can be found [here](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md#speaker-consistency)
+However, the models performed better with certain speakers. A list of the top 20
+speakers for each model variant, ranked by their average speaker similarity
+scores can be found
+[here](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md#speaker-consistency)

## Limitations and Considerations

-- The maximum length of the input text may be limited. For long-form content, you will need to split your text into smaller chunks. The training default configuration in parler-tts is max 30sec, max text length 600 characters.
-https://github.com/huggingface/parler-tts/blob/main/training/README.md#3-training
-- While the model supports various voice characteristics, the exact replication of a specific speaker's voice is not guaranteed.
-- The quality of the generated speech can vary based on the complexity of the input text and the specificity of the voice description.
+- The maximum length of the input text may be limited. For long-form content,
+you will need to split your text into smaller chunks. The training default
+configuration in parler-tts is max 30sec, max text length 600 characters.
+https://github.com/huggingface/parler-tts/blob/main/training/README.md#3-training
+- While the model supports various voice characteristics, the exact replication
+of a specific speaker's voice is not guaranteed.
+- The quality of the generated speech can vary based on the complexity of the
+input text and the specificity of the voice description.
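Given the roughly 600-character limit noted in the first item above, a minimal way to pre-chunk long-form text from the shell is `fold`, which wraps at word boundaries; the file names are illustrative:

```bash
# Split long-form text into chunks of at most 600 characters, breaking at
# word boundaries; each output line can then be sent as a separate request.
fold -s -w 600 long_text.txt > chunks.txt
```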

## Orchestrator Configuration

2 changes: 1 addition & 1 deletion ai/pipelines/upscale.mdx
@@ -30,7 +30,7 @@ graph LR

The current warm model requested for the `upscale` pipeline is:

-- [stabilityai/stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler):
+- [stabilityai/stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler):
A text-guided upscaling diffusion model trained on large LAION images,
offering enhanced resolution and controlled noise addition.
