feat: Add "local" source manager #285

Merged · 6 commits · Oct 4, 2024
211 changes: 167 additions & 44 deletions README.md
@@ -1,56 +1,183 @@
# 🦙 Llama Deploy 🤖

Llama Deploy (formerly `llama-agents`) is an async-first framework for deploying, scaling, and productionizing agentic
multi-service systems based on [workflows from `llama_index`](https://docs.llamaindex.ai/en/stable/understanding/workflows/).
With Llama Deploy, you can build any number of workflows in `llama_index` and then run them as services, accessible
through an HTTP API by a user interface or by other services that are part of your system.

In Llama Deploy each workflow is wrapped in a _Service_ object that endlessly processes incoming requests in the form
of _Task_ objects. Each service pulls and publishes messages to and from a _Message Queue_. An internal component called
_Control Plane_ handles ongoing tasks, manages the internal state, keeps track of which services are available, and
decides which service should handle the next step of a task using another internal component called _Orchestrator_.
A well-defined set of these components is called a _Deployment_, and a single Llama Deploy instance can serve multiple
of them.

The goal of Llama Deploy is to easily transition something that you built in a notebook to something running on the
cloud with the minimum amount of changes to the original code, possibly zero. To make this transition a pleasant one,
the intrinsic complexity of running agents as services is managed by a component called the _API Server_, the only
user-facing component in Llama Deploy. You can interact with the API Server in two ways:

- Using the `llamactl` CLI from a shell.
- Through the _Llama Deploy SDK_ from a Python application or script.

Both the SDK and the CLI are distributed with the Llama Deploy Python package, so batteries are included.

The overall system layout is pictured below.

![A basic system in llama_deploy](./system_diagram.png)

## Why Llama Deploy?

1. **Seamless Deployment**: It bridges the gap between development and production, allowing you to deploy `llama_index`
   workflows with minimal changes to your code.
2. **Scalability**: The microservices architecture enables easy scaling of individual components as your system grows.
3. **Flexibility**: By using a hub-and-spoke architecture, you can easily swap out components (like message queues) or
   add new services without disrupting the entire system.
4. **Fault Tolerance**: With built-in retry mechanisms and failure handling, Llama Deploy adds robustness in
   production environments.
5. **State Management**: The control plane manages state across services, simplifying complex multi-step processes.
6. **Async-First**: Designed for high-concurrency scenarios, making it suitable for real-time and high-throughput
   applications.

## Wait, where is `llama-agents`?

The introduction of [Workflows](https://docs.llamaindex.ai/en/stable/module_guides/workflow/#workflows) in `llama_index`
turned out to be the most intuitive way for our users to develop agentic applications. While we keep adding features to
`llama_index` to support agentic applications, Llama Deploy focuses on closing the gap between local development and
remote execution of agents as services.

## Installation

`llama_deploy` can be installed with pip, and includes the API Server Python SDK and `llamactl`:

```bash
pip install llama_deploy
```

## Getting Started

Let's start with deploying a simple workflow on a local instance of Llama Deploy. After installing Llama Deploy, create
a `src` folder and add a `workflow.py` file to it containing the following Python code:

```python
import asyncio

from llama_index.core.workflow import Workflow, StartEvent, StopEvent, step


class EchoWorkflow(Workflow):
    """A dummy workflow with only one step sending back the input given."""

    @step()
    async def run_step(self, ev: StartEvent) -> StopEvent:
        message = str(ev.get("message", ""))
        return StopEvent(result=f"Message received: {message}")


# `echo_workflow` will be imported by Llama Deploy
echo_workflow = EchoWorkflow()


async def main():
    print(await echo_workflow.run(message="Hello!"))


# Make this script runnable from the shell so we can test the workflow execution
if __name__ == "__main__":
    asyncio.run(main())
```

Test that the workflow runs locally:

```
$ python src/workflow.py
Message received: Hello!
```

Time to deploy that workflow! Create a file called `deployment.yml` containing the following YAML code:

```yaml
name: QuickStart

control-plane:
  port: 8000

default-service: echo_workflow

services:
  echo_workflow:
    name: Echo Workflow
    # We tell Llama Deploy where to look for our workflow
    source:
      # In this case, we instruct Llama Deploy to look in the local filesystem
      type: local
      # The path in the local filesystem where to look. This assumes there's an src folder in the
      # current working directory containing the file workflow.py we created previously
      name: ./src
    # This assumes the file workflow.py contains a variable called `echo_workflow` containing our workflow instance
    path: workflow:echo_workflow
```

**Collaborator:** This makes me want `llamactl init` 😁

**Member Author:** Let's do that!

The YAML code above defines the deployment that Llama Deploy will create and run as a service. As you can
see, this deployment has a name, some configuration for the control plane and one service to wrap our workflow. The
service will look for a Python variable named `echo_workflow` in a Python module named `workflow` and run the workflow.

At this point we have all we need to run this deployment. Ideally, we would have the API server already running
somewhere in the cloud, but to get started let's start an instance locally. Run the following command from a shell:

```
$ python -m llama_deploy.apiserver
INFO: Started server process [10842]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:4501 (Press CTRL+C to quit)
```

From another shell, use `llamactl` to create the deployment:

```
$ llamactl deploy deployment.yml
Deployment successful: QuickStart
```

Our workflow is now part of the `QuickStart` deployment and ready to serve requests! We can use `llamactl` to interact
with this deployment:

```
$ llamactl run --deployment QuickStart --arg message 'Hello from my shell!'
Message received: Hello from my shell!
```

### Run the API server with Docker

Llama Deploy comes with Docker images that can be used to run the API server without effort. If you have Docker
installed, you can replace the local `python -m llama_deploy.apiserver` invocation from the previous example with:

```
$ docker run -p 4501:4501 -v .:/opt/quickstart -w /opt/quickstart llamaindex/llama-deploy
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:4501 (Press CTRL+C to quit)
```

The API server will be available at `http://localhost:4501` on your host, so `llamactl` will work the same as if you
had run `python -m llama_deploy.apiserver` directly.

## Manual deployment without the API server

Llama Deploy offers different abstraction layers for maximum flexibility. For example, if you don't need the API
server, you can go down one layer and orchestrate the core components on your own. Llama Deploy provides a simple way
to self-manage a deployment using configuration objects and helper functions.

### Deploying the Core System

> [!NOTE]
> When manually orchestrating a deployment, you'll generally want to deploy the core components and the workflow
> services each from their own Python scripts (or Docker images, etc.).

To manually orchestrate a deployment, the first thing to do is to deploy the core system: message queue, control plane,
and orchestrator. You can use the `deploy_core` function:

```python
from llama_deploy import (
@@ -73,7 +200,8 @@ if __name__ == "__main__":
    asyncio.run(main())
```

This will set up the basic infrastructure for your deployment. You can customize the configs to adjust ports and basic
settings, as well as swap in different message queue configs (Redis, Kafka, RabbitMQ, etc.).
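
Since the snippet above is collapsed in this diff, here is a minimal sketch of what a `deploy_core` call can look like. `ControlPlaneConfig` appears elsewhere in this README; `SimpleMessageQueueConfig` is assumed here to be the config counterpart of `SimpleMessageQueue` and may differ in your version.

```python
import asyncio

from llama_deploy import (
    ControlPlaneConfig,
    SimpleMessageQueueConfig,
    deploy_core,
)


async def main():
    # Start the message queue, control plane and default orchestrator with default settings
    await deploy_core(
        control_plane_config=ControlPlaneConfig(),
        message_queue_config=SimpleMessageQueueConfig(),
    )


if __name__ == "__main__":
    asyncio.run(main())
```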

### Deploying a Workflow

@@ -132,7 +260,7 @@ if __name__ == "__main__":
    asyncio.run(main())
```

This will deploy your workflow as a service and register it with the existing control plane and message queue.
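
The full example is collapsed in the diff above, so here is a hedged sketch of a `deploy_workflow` call. The `WorkflowServiceConfig` name with its `host`/`port`/`service_name` fields, and the `workflow` import path, are assumptions for illustration.

```python
import asyncio

from llama_deploy import (
    ControlPlaneConfig,
    WorkflowServiceConfig,
    deploy_workflow,
)

# Reuse the echo workflow defined earlier in this README (assumes `src` is on the import path)
from workflow import echo_workflow


async def main():
    # Register the workflow as a service with the already running control plane
    await deploy_workflow(
        workflow=echo_workflow,
        workflow_config=WorkflowServiceConfig(
            host="127.0.0.1", port=8002, service_name="echo_workflow"
        ),
        control_plane_config=ControlPlaneConfig(),
    )


if __name__ == "__main__":
    asyncio.run(main())
```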

### Interacting with your Deployment

@@ -206,12 +334,14 @@ outer = OuterWorkflow()
outer.add_workflows(inner=InnerWorkflow())
```

Llama Deploy makes it dead simple to spin up each workflow above as a service, and run everything without any changes
to your code!

Just deploy each workflow:

> [!NOTE]
> This code is launching both workflows from the same script, but these could easily be separate scripts, machines,
> or docker containers!

```python
import asyncio
@@ -266,18 +396,10 @@ print(result)
# prints 'hello_world_result_result'
```

## Manual deployment using the lower-level API

For more control over the deployment process, you can use the lower-level API. Here's what's happening under the hood
when you use `deploy_core` and `deploy_workflow`:

### deploy_core

@@ -410,13 +532,14 @@ This function:
6. Sets up a consumer task for the service
7. Sets up a shutdown handler and keeps the event loop running

## Using the Python SDK

Llama Deploy provides access to a deployed system through a synchronous and an asynchronous client. Both clients have
the same interface, but the asynchronous client is recommended for production use to enable concurrent operations.

Generally, there is a top-level client for interacting with the control plane, and a session client for interacting
with a specific session. The session client is created automatically for you by the top-level client and returned from
specific methods.

To create a client, you need to point it to a control plane.

@@ -549,7 +672,7 @@ async_client = AsyncLlamaDeployClient(ControlPlaneConfig())
print(result.result)
```
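
As a minimal sketch of the synchronous flow, assuming the sync client is named `LlamaDeployClient` and exposes `create_session()` and `session.run()` (the service name and message are illustrative):

```python
from llama_deploy import ControlPlaneConfig, LlamaDeployClient

# Point the top-level client at the control plane
client = LlamaDeployClient(ControlPlaneConfig())

# The top-level client hands back a session client
session = client.create_session()

# Run a task against a service by name and print the returned result
result = session.run("echo_workflow", message="Hello from the SDK!")
print(result)
```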

## Message Queue Integrations

In addition to `SimpleMessageQueue`, we provide integrations for various
message queue providers, such as RabbitMQ, Redis, etc. The general usage pattern
14 changes: 14 additions & 0 deletions examples/quick_start/quick_start.yml
@@ -0,0 +1,14 @@
name: QuickStart

control-plane:
  port: 8000

default-service: dummy_workflow

services:
  dummy_workflow:
    name: Dummy Workflow
    source:
      type: local
      name: src
    path: workflow:echo_workflow
24 changes: 24 additions & 0 deletions examples/quick_start/src/workflow.py
@@ -0,0 +1,24 @@
import asyncio

from llama_index.core.workflow import Workflow, StartEvent, StopEvent, step


# create a dummy workflow
class EchoWorkflow(Workflow):
    """A dummy workflow with only one step sending back the input given."""

    @step()
    async def run_step(self, ev: StartEvent) -> StopEvent:
        message = str(ev.get("message", ""))
        return StopEvent(result=f"Message received: {message}")


echo_workflow = EchoWorkflow()


async def main():
    print(await echo_workflow.run(message="Hello!"))


if __name__ == "__main__":
    asyncio.run(main())
1 change: 1 addition & 0 deletions llama_deploy/apiserver/config_parser.py
@@ -32,6 +32,7 @@ class SourceType(str, Enum):

git = "git"
docker = "docker"
local = "local"


class ServiceSource(BaseModel):
7 changes: 5 additions & 2 deletions llama_deploy/apiserver/deployment.py
@@ -30,10 +30,13 @@
    Service,
    MessageQueueConfig,
)
from .source_managers import GitSourceManager, LocalSourceManager, SourceManager


SOURCE_MANAGERS: dict[SourceType, SourceManager] = {
    SourceType.git: GitSourceManager(),
    SourceType.local: LocalSourceManager(),
}


class DeploymentError(Exception):
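
The call site for this registry is not shown in the diff, but as a hedged sketch, resolving a manager and syncing a local source could look like the following (the destination path is illustrative):

```python
from llama_deploy.apiserver.config_parser import SourceType
from llama_deploy.apiserver.deployment import SOURCE_MANAGERS

# Resolve the manager registered for a "local" source and copy the quickstart
# sources into a deployment working directory (paths are illustrative).
manager = SOURCE_MANAGERS[SourceType.local]
manager.sync("./src", ".deployments/QuickStart")
```
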
3 changes: 2 additions & 1 deletion llama_deploy/apiserver/source_managers/__init__.py
@@ -1,8 +1,9 @@
from typing import Protocol

from .git import GitSourceManager
from .local import LocalSourceManager

__all__ = ["GitSourceManager"]
__all__ = ["GitSourceManager", "LocalSourceManager"]


class SourceManager(Protocol):
21 changes: 21 additions & 0 deletions llama_deploy/apiserver/source_managers/local.py
@@ -0,0 +1,21 @@
import shutil


class LocalSourceManager:
    """A SourceManager specialized for sources of type `local`."""

    def sync(self, source: str, destination: str | None = None) -> None:
        """Copies the folder with path `source` into a local path `destination`.

        Args:
            source: The filesystem path to the folder containing the source code.
            destination: The path in the local filesystem where to copy the source directory.
        """
        if not destination:
            raise ValueError("Destination cannot be empty")

        try:
            shutil.copytree(source, destination, dirs_exist_ok=True)
        except shutil.Error as e:
            msg = f"Unable to copy {source} into {destination}: {e}"
            raise ValueError(msg) from e
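
A quick usage sketch of the new manager, using only the `sync` signature shown above (the destination path is made up for illustration):

```python
from llama_deploy.apiserver.source_managers import LocalSourceManager

manager = LocalSourceManager()

# Copies ./src into the destination folder, merging with any existing content
# (dirs_exist_ok=True); an empty destination raises ValueError.
manager.sync("./src", "/tmp/llama_deploy/QuickStart")
```
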
Binary file modified system_diagram.png