Ultravox vLLM Truss

This is a Truss for Ultravox that runs the vLLM OpenAI-compatible server. It is designed to serve Ultravox and other models efficiently and at scale through an OpenAI-compatible API.

OpenAI Bridge Compatibility

This Truss is compatible with a custom version of our bridge endpoint for OpenAI ChatCompletion users. This means you can easily integrate this model into your existing applications that use the OpenAI API format.

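import os
from openai import OpenAI

model_id = "MODEL-ID"  # replace with the ID of your deployed model
client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url=f"https://bridge.baseten.co/{model_id}/direct/v1",
)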

Truss

Truss is an open-source model serving framework developed by Baseten. It allows you to develop and deploy machine learning models onto Baseten (and other platforms like AWS or GCP). Using Truss, you can develop a GPU model using live-reload, package models and their associated code, create Docker containers, and deploy on Baseten.

Deployment

First, clone this repository:

git clone https://github.com/basetenlabs/truss-examples.git
cd truss-examples/ultravox

Before deployment:

  1. Make sure you have a Baseten account and API key.
  2. Install the latest version of Truss: pip install --upgrade truss

With ultravox as your working directory, you can deploy the model with:

truss push

Paste your Baseten API key if prompted.

For more information, see the Truss documentation.

vLLM OpenAI Compatible Server

This Truss demonstrates how to start vLLM's OpenAI-compatible server. The Truss itself mainly starts the server and then routes requests to it. It currently supports only ChatCompletions.
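Because the Truss starts the server before routing requests to it, something has to wait until the server is ready. The following is a minimal sketch of one way to do that, not the code this Truss actually uses. The function name wait_until_ready and its parameters are hypothetical; the sketch assumes the server exposes a health route that returns HTTP 200 once it is up, as vLLM's OpenAI-compatible server does at /health.

```python
import time
import urllib.request


def wait_until_ready(health_url: str, timeout_s: float = 300.0, poll_s: float = 1.0) -> bool:
    """Poll health_url until it answers HTTP 200 or timeout_s elapses.

    Returns True if the server became ready, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(health_url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers connection refused, timeouts, and non-2xx HTTP errors:
            # the server is not ready yet, so keep polling.
            pass
        time.sleep(poll_s)
    return False
```

A caller might start the server process first and then block on wait_until_ready("http://localhost:8000/health") before forwarding any requests. Port 8000 is vLLM's default and is an assumption here; adjust it if you pass a different port at startup.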

Passing startup arguments to the server

In config.yaml, any key-value pairs under model_metadata: arguments: are passed to the vLLM OpenAI-compatible server at startup.
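To make the mapping concrete, here is a small sketch of how such key-value pairs could be turned into a server command line. It is an illustration under assumptions, not this Truss's implementation: the helper build_server_command is hypothetical, and the rule that each key becomes a --key value flag (with True meaning a bare flag) is a guess at a reasonable convention.

```python
from typing import Any, Dict, List


def build_server_command(arguments: Dict[str, Any]) -> List[str]:
    """Convert a mapping of startup arguments into a vLLM server command.

    Assumed convention (hypothetical): underscores become dashes, True emits
    a bare flag, and False or None omits the flag entirely.
    """
    command = ["python", "-m", "vllm.entrypoints.openai.api_server"]
    for key, value in arguments.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            command.append(flag)
        elif value is False or value is None:
            continue
        else:
            command.extend([flag, str(value)])
    return command


# Example values, as they might appear under model_metadata: arguments:
example_arguments = {
    "model": "fixie-ai/ultravox-v0.2",
    "max_model_len": 4096,
    "trust_remote_code": True,
}
print(" ".join(build_server_command(example_arguments)))
```

The example values are illustrative; check the vLLM documentation for the arguments your version of the server accepts.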

Base Image

You can use any vLLM compatible base image.

API Documentation

The API follows the OpenAI ChatCompletion format. You can interact with the model using the standard ChatCompletion interface.

Example usage:

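import base64

from openai import OpenAI

# "audio.wav" is a placeholder path; point it at your own WAV file.
with open("audio.wav", "rb") as f:
    base64_wav = base64.b64encode(f.read()).decode("utf-8")

client = OpenAI(
    api_key="YOUR-API-KEY",
    base_url="https://bridge.baseten.co/MODEL-ID/v1",
)

response = client.chat.completions.create(
    model="fixie-ai/ultravox-v0.2",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the following: <|audio|>"},
                # The audio is sent as a base64 data URL in an image_url part.
                {"type": "image_url", "image_url": {"url": f"data:audio/wav;base64,{base64_wav}"}},
            ],
        }
    ],
    stream=True,
)

# Each streamed chunk carries an incremental delta of the reply.
for chunk in response:
    print(chunk.choices[0].delta)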
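The example prints raw delta objects as they arrive. If you want the full reply as one string, or want to reuse the audio message shape, two small helpers along these lines may be useful. Both names, audio_message and collect_text, are hypothetical and not part of this Truss or the OpenAI client; the fake chunks at the end only imitate the shape of streamed responses so the sketch runs without a deployed model.

```python
import base64
from types import SimpleNamespace  # used only to fake streamed chunks below


def audio_message(wav_bytes: bytes, prompt: str = "Summarize the following: <|audio|>") -> dict:
    """Build a user message that carries WAV audio as a base64 data URL."""
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:audio/wav;base64,{encoded}"}},
        ],
    }


def collect_text(chunks) -> str:
    """Join the text content of streamed chat-completion chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        content = chunk.choices[0].delta.content
        if content:  # content can be None on role-only or final chunks
            parts.append(content)
    return "".join(parts)


# Offline demo with stand-in chunks shaped like streamed responses.
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hel"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="lo"))]),
]
print(collect_text(fake_chunks))
```

With a real client, you would pass messages=[audio_message(wav_bytes)] to client.chat.completions.create(..., stream=True) and hand the returned stream to collect_text.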

Future Improvements

We are actively working on enhancing this Truss. Stay tuned for updates!

Support

If you have any questions or need assistance, please open an issue in this repository or contact our support team.