Restore tfserving directory (#2687)
diondrapeck authored Sep 27, 2023
1 parent b19a056 commit 7f44453
Showing 12 changed files with 110 additions and 0 deletions.
@@ -0,0 +1,34 @@
# Deploy an integrated inference container
This example shows how to deploy a fully integrated BYOC inference image using TFServing: the model is built into the container itself, and both `model` and `code` are absent from the deployment YAML.

## How to deploy
To deploy this example, execute the `deploy-custom-container-half-plus-two-integrated.sh` script in the `CLI` directory.

## Testing
The endpoint can be tested using the code at the end of the deployment script:
```bash
curl -d @sample-data.json -H "Content-Type: application/json" -H "Authorization: Bearer $KEY" $SCORING_URL
```
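
The `$KEY` and `$SCORING_URL` variables are set earlier in the deployment script; a minimal sketch of how they might be retrieved with the Azure CLI (assuming `$ENDPOINT_NAME` holds the endpoint name and the endpoint uses token auth):
```bash
# Assumptions: logged in via `az login`; $ENDPOINT_NAME holds the endpoint name.
SCORING_URL=$(az ml online-endpoint show -n $ENDPOINT_NAME --query scoring_uri -o tsv)
# For key-based auth, query primaryKey instead of accessToken.
KEY=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME --query accessToken -o tsv)
```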

The inputs are a list of two-dimensional tensors:
```json
{
    "inputs": [
        [[1,2,3,4]],
        [[0,1,1,1]]
    ]
}
```
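
Since the model computes `0.5X + 2` elementwise (see the Model section below), the TFServing response for this payload should look roughly like:
```json
{
    "outputs": [
        [[2.5, 3.0, 3.5, 4.0]],
        [[2.0, 2.5, 2.5, 2.5]]
    ]
}
```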

## Model
This model is a simple [Half Plus Two](https://www.tensorflow.org/tfx/serving/docker) model that returns `0.5X + 2` for two-dimensional input tensors.

```python
import tensorflow as tf

class HPT(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec(shape=[None, None], dtype=tf.float32)])
    def __call__(self, x):
        # Half Plus Two: 0.5 * x + 2, applied elementwise
        return tf.math.add(tf.math.multiply(x, 0.5), 2)
```
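
To be served by TFServing, the module must be exported in SavedModel format; a minimal sketch, assuming TensorFlow 2.x:
```python
# TFServing expects a numeric version subdirectory under the model base path,
# hence the trailing /1.
model = HPT()
tf.saved_model.save(model, "models/hpt/1")
```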

## Image
This is a BYOC TFServing image with the model integrated into the container itself.
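
A sketch of how the image might be built and smoke-tested locally before pushing to ACR (the dockerfile name and tag here are assumptions):
```bash
# Build with the models/ directory in the build context so COPY picks it up.
docker build -f tfsintegrated.dockerfile -t tfsintegrated:1 .
# TFServing's REST API listens on port 8501 by default.
docker run -d -p 8501:8501 tfsintegrated:1
# The model status route should report the model as AVAILABLE once loaded.
curl http://localhost:8501/v1/models/hpt
```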
@@ -0,0 +1,17 @@
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: integratedbyoc
endpoint_name: {{ENDPOINT_NAME}}
environment:
  image: {{ACR_NAME}}.azurecr.io/azureml-examples/tfsintegrated:1
  inference_config:
    liveness_route:
      port: 8501
      path: /v1/models/hpt
    readiness_route:
      port: 8501
      path: /v1/models/hpt
    scoring_route:
      port: 8501
      path: /v1/models/hpt:predict
instance_type: Standard_DS3_v2
instance_count: 1
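
Once the deployment is live, the endpoint can also be exercised through the CLI rather than raw `curl`; a sketch, assuming the sample payload file shipped with the example:
```bash
az ml online-endpoint invoke -n $ENDPOINT_NAME --request-file sample-data.json
```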
@@ -0,0 +1,5 @@
FROM docker.io/tensorflow/serving:latest

ENV MODEL_NAME=hpt

COPY models /models
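
TFServing resolves the model at `/models/${MODEL_NAME}`, so the `models` directory copied above presumably follows the standard SavedModel layout (file names inferred; the binary files below are not rendered in the diff):
```
models/
└── hpt/
    └── 1/
        ├── saved_model.pb
        └── variables/
            ├── variables.data-00000-of-00001
            └── variables.index
```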
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,6 @@
{
    "inputs": [
        [[1,2,3,4]],
        [[0,1,1,1]]
    ]
}
@@ -0,0 +1,17 @@
# Deploy the Half Plus Two model using TFServing
In this example, we deploy a single model (half-plus-two) using a TFServing custom container.

This example can be run end-to-end by executing the `deploy-custom-container-tfserving-half-plus-two.sh` script in the `CLI` directory.

## Model
This example uses the `half-plus-two` model, which is downloaded by the deployment script. In the deployment YAML, it is registered as a model and mounted at runtime at the path given by the `AZUREML_MODEL_DIR` environment variable, as in standard deployments. The default mount location is `/var/azureml-app/azureml-models/<MODEL_NAME>/<MODEL_VERSION>` unless overridden by the `model_mount_path` field in the deployment YAML.

This path is passed to TFServing as an environment variable in the deployment YAML.

## Build the image
This example uses the `tensorflow/serving` image with no modifications, as defined in `tfserving.dockerfile`. Although this example demonstrates the usual workflow of building the image on an ACR instance, the deployment could bypass the ACR build step entirely and use the image's `docker.io` path directly as the image URL in the deployment YAML.
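
A sketch of the ACR build step (registry variable and tag are assumptions):
```bash
# Build the image remotely on the ACR instance from tfserving.dockerfile.
az acr build -r $ACR_NAME -t azureml-examples/tfserving:1 -f tfserving.dockerfile .
```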

## Environment
The environment is defined inline in the deployment YAML and references the ACR URL of the image. The ACR must be associated with the workspace (or accessible through a user-assigned managed identity with the `AcrPull` role) in order to deploy successfully.

The environment also contains an `inference_config` block that defines the `liveness`, `readiness`, and `scoring` routes by path and port. Because the images used in this example are based on the AzureML Inference Minimal images, these values are the same as those in a non-BYOC deployment; however, they must be included explicitly since we are now using a custom image.
@@ -0,0 +1 @@
{"instances": [1.0, 2.0, 5.0]}
@@ -0,0 +1,26 @@
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: tfserving-deployment
endpoint_name: tfserving-endpoint
model:
  name: tfserving-mounted
  version: {{MODEL_VERSION}}
  path: ./half_plus_two
environment_variables:
  MODEL_BASE_PATH: /var/azureml-app/azureml-models/tfserving-mounted/{{MODEL_VERSION}}
  MODEL_NAME: half_plus_two
environment:
  #name: tfserving
  #version: 1
  image: docker.io/tensorflow/serving:latest
  inference_config:
    liveness_route:
      port: 8501
      path: /v1/models/half_plus_two
    readiness_route:
      port: 8501
      path: /v1/models/half_plus_two
    scoring_route:
      port: 8501
      path: /v1/models/half_plus_two:predict
instance_type: Standard_DS3_v2
instance_count: 1
@@ -0,0 +1,3 @@
$schema: https://azuremlsdk2.blob.core.windows.net/latest/managedOnlineEndpoint.schema.json
name: tfserving-endpoint
auth_mode: aml_token
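
A sketch of the create sequence with the v2 CLI (file names assumed to match the YAML shown above):
```bash
az ml online-endpoint create -f tfserving-endpoint.yml
az ml online-deployment create -f tfserving-deployment.yml --all-traffic
```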
@@ -0,0 +1 @@
FROM tensorflow/serving
