Add Managed Identity ACI Prefect Worker recipe #238

Closed
wants to merge 14 commits into from
6 changes: 6 additions & 0 deletions azure/prefect-2/prefect-worker-on-aci/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
FROM prefecthq/prefect:2-python3.9

COPY requirements.txt /opt/prefect/flows/
COPY retrieve_secrets.py /opt/prefect/

RUN pip install -r /opt/prefect/flows/requirements.txt
262 changes: 262 additions & 0 deletions azure/prefect-2/prefect-worker-on-aci/README.md
@@ -0,0 +1,262 @@
# Azure Container Instances with Managed Identity

## Introduction

The end goal of this configuration is to have a Prefect Worker running as an Azure Container Group.

Prefect flow runs are created as new container groups, and each container group has a user-assigned identity attached when it is created.
This identity has the minimal permissions required to retrieve secrets from an Azure Keyvault. Here it is used to obtain a BitBucket access token in a secure manner.

> :warning: This tutorial targets a private repository, while prefect-recipes is public. If you wish to follow the tutorial as written, fork the repo to a private repository. The example is written with BitBucket in mind, but any SCM tool can be used; the only difference is the private token format.

Once a new container group is running and the token has been retrieved, the configured SCM repository containing the deployed Prefect flow is cloned into the container.

## Overview

This document provides step-by-step instructions to set up, configure, and run Prefect flows through ACI using Managed Identity. Some steps are order dependent (e.g. a managed identity must be created before permissions can be assigned to it), while others are not.


The following example assumes that a resource group named `aci-prefect-agent` already exists.

Steps that will be covered:

- Creating and pushing a Docker Image with the requisite packages for Prefect flows
- Creating a User-Assigned Managed Identity
- Creating an Azure Keyvault
- Assigning Permissions to the Managed Identity
- Configuring an ACI Work-Pool
- Creating an ACI Container Group for the Prefect ACI Worker
- Deploying a Prefect Flow



### Start from a clean directory
```bash
mkdir ~/aci_identity && cd ~/aci_identity
```

### Clone the repository:
```bash
git clone https://bitbucket.org/sopkin/azure-deployments.git
```

### Change to the right directory:
```bash
cd azure-deployments/part_2/aci
```

### Create a Docker Image
The image can be public or private, which becomes relevant when deploying the worker. This demonstration uses a private repository, as is typical in production.
```bash
export image_tag="chaboy/prefect-aci-worker:0.2.10"
docker build --platform linux/amd64 -t $image_tag .
docker push $image_tag
```

### Create a User-Assigned Identity:

```bash
# This creates a user identity with no permissions by the name of myaciid
az identity create \
--resource-group aci-prefect-agent \
--name myaciid
```

### Retrieve the resource ID and principal ID of the identity:
```bash
# Get service principal ID of the user-assigned identity
# Appears like 1234ef-818e-4186-b441-88e239941234
SP_ID=$(az identity show \
--resource-group aci-prefect-agent \
--name myaciid \
--query principalId --output tsv)

# Get resource ID of the user-assigned identity
# This will be necessary for adding to the work-pool
# Looks like: "/subscriptions/<subscription>/resourcegroups/aci-prefect-agent/providers/Microsoft.ManagedIdentity/userAssignedIdentities/myaciid"
RESOURCE_ID=$(az identity show \
--resource-group aci-prefect-agent \
--name myaciid \
--query id --output tsv)
```
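Because the resource ID is later pasted into the work-pool, it can help to sanity-check the value first. The following is a minimal sketch, not part of the recipe; `parse_identity_resource_id` is a hypothetical helper that only splits the string captured in `$RESOURCE_ID` and assumes the format shown in the comment above.

```python
def parse_identity_resource_id(resource_id: str) -> dict:
    """Split a user-assigned identity resource ID into its named segments.

    Hypothetical helper: assumes the layout
    /subscriptions/<id>/resourcegroups/<group>/providers/
    Microsoft.ManagedIdentity/userAssignedIdentities/<name>
    """
    parts = resource_id.strip("/").split("/")
    if len(parts) != 8:
        raise ValueError(f"expected 8 path segments, got {len(parts)}")

    # Azure treats these keywords case-insensitively, so compare in lowercase.
    keywords = (parts[0].lower(), parts[2].lower(), parts[4].lower())
    if keywords != ("subscriptions", "resourcegroups", "providers"):
        raise ValueError("not an Azure resource ID")

    resource_type = f"{parts[5]}/{parts[6]}".lower()
    if resource_type != "microsoft.managedidentity/userassignedidentities":
        raise ValueError("not a user-assigned managed identity")

    return {
        "subscription": parts[1],
        "resource_group": parts[3],
        "name": parts[7],
    }
```

The subscription and resource group returned here should match the values entered in the work-pool configuration.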

### Create an Azure KeyVault:
```bash
# Azure Keyvault names must be globally unique, 3-24 characters, and contain only alphanumerics and hyphens
az keyvault create \
--name prefectkeyvault1234 \
--resource-group aci-prefect-agent \
--location eastus
```
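A vault name can be checked locally before calling `az keyvault create`. This is a minimal sketch based on my understanding of the Azure naming rules (3-24 characters, letters, digits and hyphens, starting with a letter, ending with a letter or digit, no consecutive hyphens); `is_valid_vault_name` is a hypothetical helper, and it cannot check global uniqueness.

```python
import re

# Starts with a letter, ends with a letter or digit, and any hyphen in
# between must not be followed by another hyphen.
_VAULT_NAME = re.compile(r"[A-Za-z](?:[A-Za-z0-9]|-(?!-))*[A-Za-z0-9]")


def is_valid_vault_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed Key Vault naming rules."""
    return 3 <= len(name) <= 24 and _VAULT_NAME.fullmatch(name) is not None
```

For example, `is_valid_vault_name("prefectkeyvault1234")` passes, while a name that starts with a digit or contains `--` does not.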

### Add a Secret to the Vault:
```bash
# A BitBucket PAT has the form: <user>:<access token>
# A BitBucket repository access token has the form: x-token-auth:<access token>
az keyvault secret set \
--name bucketaccess \
--value "user:accesstoken" \
--description BitBucket --vault-name prefectkeyvault1234
```

### Give the user identity permissions to retrieve secrets:
```bash
# SP_ID is the service principal of the user identity from a previous step
az keyvault set-policy \
--name prefectkeyvault1234 \
--resource-group aci-prefect-agent \
--object-id $SP_ID \
--secret-permissions get
```

### Create a Prefect Work-Pool using Prefect CLI or UI
```bash
# This will create a work-pool named aci-test of type azure-container-instance
prefect work-pool create -t azure-container-instance aci-test
```

### Update the Work-Pool
- (Required) Update the Subscription ID
- (Required) Update the Resource Group Name; without it, the worker has no scope in Azure in which to create container groups.
- (Required) Attach an Azure Credentials Block for permissions to provision containers
- (Required) Attach the User-Assigned Identity (the `$RESOURCE_ID` value from the earlier step)
- (Optional) If the image is private, attach a Docker Registry Credentials Block. An identity can be used in lieu of credentials only when using an Azure Container Registry.


### Option 1 - Deploy via CLI
```bash
# ACI Worker Requires prefect-azure package
# Setup variables for az container create
export PREFECT_API_URL=<API URL, no quotations>
export PREFECT_API_KEY=<API key, no quotations>
export REGISTRY_PASSWORD=<Private REPO Password>
export REGISTRY_USERNAME=<Private Registry User>


az container create \
--resource-group aci-prefect-agent \
--name aci-prefect-worker \
--image index.docker.io/chaboy/prefect-aci-worker:0.2.10 \
--environment-variables PREFECT_API_ENABLE_HTTP2=False \
--secure-environment-variables PREFECT_API_URL=$PREFECT_API_URL PREFECT_API_KEY=$PREFECT_API_KEY \
--registry-login-server 'index.docker.io' \
--registry-password $REGISTRY_PASSWORD \
--registry-username $REGISTRY_USERNAME \
--command-line "/bin/bash -c 'prefect worker start --pool aci-test --type azure-container-instance'"
```
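A misspelled variable name in the exports silently passes an empty value to `az container create`. The sketch below is an illustration rather than part of the recipe: it assembles the same argument list in Python and fails early when a required variable is unset. `build_worker_command` and `REQUIRED_VARS` are hypothetical names; the flags are the ones used in the command above.

```python
REQUIRED_VARS = (
    "PREFECT_API_URL",
    "PREFECT_API_KEY",
    "REGISTRY_USERNAME",
    "REGISTRY_PASSWORD",
)


def build_worker_command(env: dict) -> list:
    """Build the `az container create` argument list from environment values."""
    # Fail before calling Azure if anything is missing or empty.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))

    return [
        "az", "container", "create",
        "--resource-group", "aci-prefect-agent",
        "--name", "aci-prefect-worker",
        "--image", "index.docker.io/chaboy/prefect-aci-worker:0.2.10",
        "--environment-variables", "PREFECT_API_ENABLE_HTTP2=False",
        "--secure-environment-variables",
        f"PREFECT_API_URL={env['PREFECT_API_URL']}",
        f"PREFECT_API_KEY={env['PREFECT_API_KEY']}",
        "--registry-login-server", "index.docker.io",
        "--registry-username", env["REGISTRY_USERNAME"],
        "--registry-password", env["REGISTRY_PASSWORD"],
        "--command-line",
        "/bin/bash -c 'prefect worker start --pool aci-test "
        "--type azure-container-instance'",
    ]
```

The resulting list could be passed to `subprocess.run(build_worker_command(dict(os.environ)), check=True)`; passing a list avoids shell quoting issues with the nested `--command-line` value.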

### Option 2 - Deploy via .yaml
```bash
# container.yaml is included in this directory. You'll need to provide the actual API key and URL,
# the registry username/password, and, if required, a subnet ID:
# https://learn.microsoft.com/en-us/azure/container-instances/container-instances-custom-dns
az container create --resource-group aci-prefect-agent -f container.yaml
```


## Deploy Code
- Create a Deployment

### Init a Deployment

Reference for more info [here](https://docs.prefect.io/2.10.17/concepts/deployments-ux/).

Run the following from the root of the cloned repository (here, `~/aci_identity/azure-deployments/`). This configures `prefect.yaml` with the branch and repository information used for deploying.
```bash
cd ~/aci_identity/azure-deployments
prefect init --recipe git
```

### Update Prefect.yaml to Retrieve Secrets from Keyvault

Included in this directory, and packaged in the Docker image being used, is `retrieve_secrets.py`.
The `pull` section must call it to retrieve the BitBucket access token before the git clone operation is attempted.
The configuration below works as follows:

1. `retrieve_secrets.main` - `retrieve_secrets` is the module, and `main` is the entry function. The file was copied into `/opt/prefect` during the Docker image build step earlier in this tutorial, which makes the function accessible during the pull steps.
2. The `id` key names the step so that its return value can be referenced later. Because `retrieve_secrets.py` returns the access token in the form `{"access_token": <> }`, the next step can read it via `get-access-token.access_token`.
3. The repository information and access token. At this time (June 28th, 2023), a template value cannot be mixed with other text, such as `user:{{ get-access-token }}` or `https://user:{{ get-access-token }}@bitbucket.org`; it must stand entirely on its own. For BitBucket in particular, this means the secret stored in the keyvault must already be in the exact form needed to clone the repository, so that `access_token: '{{ get-access-token.access_token }}'` works unmodified.
    - With a PAT, this is stored as `<user>:<access token>` in the keyvault, like `userABC:pnu_asja12356zzcx`.
    - With a Repo Token or BitBucket Cloud, this is stored as `x-token-auth:<access token>`

```yaml
pull:
- retrieve_secrets.main:
    id: get-access-token
- prefect.deployments.steps.git_clone:
    repository: https://bitbucket.org/sopkin/azure-deployments.git
    branch: master
    access_token: '{{ get-access-token.access_token }}'
```
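The way a step's `id` exposes its return value to later steps can be pictured with a small resolver. This is a simplified sketch and not Prefect's implementation: `resolve` and `step_outputs` are hypothetical names, and the sketch only illustrates the lookup by `id` and key; it does not reproduce Prefect's templating restrictions.

```python
import re

# Matches placeholders of the form {{ step-id.key }}
_PLACEHOLDER = re.compile(r"\{\{\s*([\w-]+)\.([\w-]+)\s*\}\}")


def resolve(value: str, step_outputs: dict) -> str:
    """Replace {{ step-id.key }} with the value an earlier step returned."""

    def lookup(match):
        step_id, key = match.group(1), match.group(2)
        return str(step_outputs[step_id][key])

    return _PLACEHOLDER.sub(lookup, value)


# The dict returned by retrieve_secrets.main is stored under the step's id.
# The token below is a made-up placeholder, not a real credential.
step_outputs = {"get-access-token": {"access_token": "userABC:example-token"}}
resolved = resolve("{{ get-access-token.access_token }}", step_outputs)
```

Here `resolved` holds the string returned under the `access_token` key, which is why the key name in `retrieve_secrets.py` and the template must match.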


### Deploy the Flow

Because the `prefect.yaml` file is created at the root of the repository, the deploy command must be run from the root, and the flow path must be given relative to it. For the purposes of this tutorial, the flow exists at the path:
`part_2/aci/transform_flow.py`.
The root is `azure-deployments`.
For production environments, it's generally best practice to have each flow in its own directory at the root of the repository.
```bash
# -p is the ACI Worker Pool we already Created
# -n is the name we give the deployment
# ./part_2/aci/transform_flow.py:transform_flow is the entrypoint of the flow.
# This is the path in the repository AND locally - this is why we run prefect init from the root.

prefect deploy -n aci-test ./part_2/aci/transform_flow.py:transform_flow -p aci-test
```

### Run the deployment

At this point, assuming the following have been configured and completed, we can run a flow / deployment:
* Work-pool is configured - Subscription ID, Image provided, Docker Credentials Block attached, ACI Credentials Attached, User-Assigned Identity Attached, Resource-Group provided
* KeyVault contains the correct token
* ACI Worker is Healthy
* Prefect Deployment deployed with correct values

```bash
# Run the flow, flow_name/deployment_name
prefect deployment run 'transform_flow/aci-test'
```

## Private Bitbucket Auth Examples:

See this issue for why `Secrets` and not GitHub / BitBucket Credentials:
https://github.com/PrefectHQ/prefect/issues/9683

```yaml
pull:
- prefect.deployments.steps.git_clone_project:
    repository: https://bitbucket.org/sopkin/azure-deployments.git
    branch: master
    access_token: '"{{ prefect.blocks.secret.secret-bitbucket-boyd }}"'
# The secret block was created in the UI with a value like: x-token-auth:pnu_aasedqjklczjklqklrj
```

```yaml
pull:
- prefect.deployments.steps.git_clone_project:
    repository: https://bitbucket.org/sopkin/azure-deployments.git
    branch: master
    access_token: '"x-token-auth:<PAT Token here>"'
```

```yaml
pull:
- prefect.deployments.steps.git_clone_project:
    repository: https://x-token-auth:<PAT Token here>@bitbucket.org/sopkin/azure-deployments.git
    branch: master
    access_token: null
```
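The last two examples are equivalent if, as I understand it, the clone step places the access token in front of the host name. The sketch below illustrates that assumption; it is not Prefect's code, `with_credentials` is a hypothetical helper, and the token is a made-up placeholder.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def with_credentials(repository: str, access_token: Optional[str]) -> str:
    """Return the clone URL with `access_token` placed before the host."""
    if not access_token:
        # access_token: null -> the URL is used exactly as written.
        return repository

    parts = urlsplit(repository)
    if "@" in parts.netloc:
        raise ValueError("repository URL already contains credentials")

    # https://host/path -> https://<access_token>@host/path
    return urlunsplit(parts._replace(netloc=f"{access_token}@{parts.netloc}"))


clone_url = with_credentials(
    "https://bitbucket.org/sopkin/azure-deployments.git",
    "userABC:example-token",
)
```

This also shows why the stored secret must already include the `<user>:` or token-type prefix: whatever string the secret contains is used as the full credential portion of the URL.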

It's also possible to run a shell script, and use other commands, such as azure-cli:
```yaml
pull:
- prefect.deployments.steps.run_shell_script:
    id: get-access-token
    script: az keyvault secret show --name prefectboyd --vault-name boydaciprefectkv --query "value" --output tsv
    stream_output: true
- prefect.deployments.steps.git_clone:
    repository: https://bitbucket.org/sopkin/azure-deployments.git
    branch: master
    access_token: "{{ get-access-token.stdout }}"
```
28 changes: 28 additions & 0 deletions azure/prefect-2/prefect-worker-on-aci/container.yaml
@@ -0,0 +1,28 @@
apiVersion: '2018-10-01'
location: eastus
name: mycontainergroup
properties:
  containers:
  - name: prefect-worker
    properties:
      image: index.docker.io/chaboy/prefect-aci-worker:0.2.10
      command: ['/bin/bash', '-c', 'prefect worker start --pool aci-test --type azure-container-instance']
      environmentVariables:
      - name: 'PREFECT_API_KEY'
        secureValue: <REAL API KEY HERE>
      - name: 'PREFECT_API_URL'
        secureValue: <REAL API URL HERE>
      - name: 'PREFECT_API_ENABLE_HTTP2'
        value: 'False'
      resources:
        requests:
          cpu: 1.0
          memoryInGb: 1.5
  imageRegistryCredentials:
  - server: index.docker.io
    username: <imageRegistryUsername>
    password: <imageRegistryPassword>
  subnetIds:
  - id: /subscriptions/<subscription-ID>/resourceGroups/ACIResourceGroup/providers/Microsoft.Network/virtualNetworks/aci-vnet/subnets/aci-subnet
  osType: Linux
  restartPolicy: OnFailure
49 changes: 49 additions & 0 deletions azure/prefect-2/prefect-worker-on-aci/prefect.yaml
@@ -0,0 +1,49 @@
# Welcome to your prefect.yaml file! You can use this file for storing and managing
# configuration for deploying your flows. We recommend committing this file to source
# control along with your flow code.

# Generic metadata about this project
name: test
prefect-version: 2.10.16

# build section allows you to manage and build docker images
build:

# push section allows you to manage if and how this project is uploaded to remote locations
push:

# pull section allows you to provide instructions for cloning this project in remote locations
pull:
- retrieve_secrets.main:
    id: get-access-token
- prefect.deployments.steps.git_clone:
    repository: https://bitbucket.org/sopkin/azure-deployments.git
    branch: master
    access_token: '{{ get-access-token.access_token }}'

# the deployments section allows you to provide configuration for deploying flows
deployments:
- name:
  version:
  tags: []
  description:
  schedule: {}
  flow_name:
  entrypoint:
  parameters: {}
  work_pool:
    name:
    work_queue_name:
    job_variables: {}
- name: aci-deploy
  version:
  tags: []
  description:
  schedule:
  entrypoint: ./part_2/aci/transform_flow.py:transform_flow
  parameters: {}
  work_pool:
    name: aci-test
    work_queue_name:
    job_variables: {}
pull:
6 changes: 6 additions & 0 deletions azure/prefect-2/prefect-worker-on-aci/requirements.txt
@@ -0,0 +1,6 @@
adlfs
prefect-azure
prefect-azure[blob-storage]
azure-identity
azure-storage-blob
azure-keyvault-secrets
35 changes: 35 additions & 0 deletions azure/prefect-2/prefect-worker-on-aci/retrieve_secrets.py
@@ -0,0 +1,35 @@
import sys

from azure.identity import DefaultAzureCredential, ManagedIdentityCredential
from azure.keyvault.secrets import SecretClient


# Build a credential: either the user-assigned managed identity explicitly,
# or the default chain, which uses the managed identity of the Azure resource (ACI, VM, etc.)
def get_creds(auth_type):
    if auth_type == "managed_identity":
        # client_id is the clientId of the user-assigned identity (myaciid)
        credential = ManagedIdentityCredential(client_id="<the clientID of your identity>")
    else:
        credential = DefaultAzureCredential(exclude_shared_token_cache_credential=True)
    return credential


# Create a secret client using the credential and the URL to the Key Vault
def get_secret(credential):
    secret_client = SecretClient(vault_url="https://<your vault>.vault.azure.net", credential=credential)
    secret_name = "mysecret"

    # Retrieve the secret and return its value to the caller
    retrieved_secret = secret_client.get_secret(secret_name)
    # print(retrieved_secret.value)  # Optional, for development only: this prints the secret
    return retrieved_secret.value


def main():
    # Check if the user wants to use managed identity, or the default credential
    if len(sys.argv) == 2 and sys.argv[1] == "managed_identity":
        credential = get_creds("managed_identity")
    else:
        credential = get_creds("default")
    access_token = get_secret(credential)
    # Return the access token to the pull step
    return {"access_token": access_token}


if __name__ == "__main__":
    main()