Restore tfserving directory (#2687)
diondrapeck authored Sep 27, 2023
1 parent b19a056 commit 7f44453
Showing 12 changed files with 110 additions and 0 deletions.
@@ -0,0 +1,34 @@
# Deploy an integrated inference container
This example shows how to deploy a fully integrated BYOC inference image using TFServing: the model is built into the container itself, and both `model` and `code` are absent from the deployment YAML.

## How to deploy
To deploy this example, execute the `deploy-custom-container-half-plus-two-integrated.sh` script in the `CLI` directory.

## Testing
The endpoint can be tested using the code at the end of the deployment script:
```bash
curl -d @sample-data.json -H "Content-Type: application/json" -H "Authorization: Bearer $KEY" $SCORING_URL
```
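
The `$KEY` and `$SCORING_URL` variables are set earlier in the deployment script; a minimal sketch of how they might be retrieved with the Azure CLI (assuming `$ENDPOINT_NAME` holds the endpoint name and the endpoint uses token auth):
```bash
# Assumptions: logged in via `az login`; $ENDPOINT_NAME holds the endpoint name.
SCORING_URL=$(az ml online-endpoint show -n $ENDPOINT_NAME --query scoring_uri -o tsv)
# For key-based auth, query primaryKey instead of accessToken.
KEY=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME --query accessToken -o tsv)
```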

The inputs are a list of two-dimensional tensors:
```json
{
    "inputs": [
        [[1,2,3,4]],
        [[0,1,1,1]]
    ]
}
```
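
Since the model computes `0.5X + 2` elementwise (see the Model section below), the TFServing response for this payload should look roughly like:
```json
{
    "outputs": [
        [[2.5, 3.0, 3.5, 4.0]],
        [[2.0, 2.5, 2.5, 2.5]]
    ]
}
```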

## Model
This model is a simple [Half Plus Two](https://www.tensorflow.org/tfx/serving/docker) model that returns `0.5X + 2` for two-dimensional input tensors.

```python
import tensorflow as tf

class HPT(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec(shape=[None, None], dtype=tf.float32)])
    def __call__(self, x):
        # Half Plus Two: 0.5 * x + 2, applied elementwise
        return tf.math.add(tf.math.multiply(x, 0.5), 2)
```
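
To be served by TFServing, the module must be exported in SavedModel format; a minimal sketch, assuming TensorFlow 2.x:
```python
# TFServing expects a numeric version subdirectory under the model base path,
# hence the trailing /1.
model = HPT()
tf.saved_model.save(model, "models/hpt/1")
```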

## Image
This is a BYOC TFServing image with the model integrated into the container itself.
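
A sketch of how the image might be built and smoke-tested locally before pushing to ACR (the dockerfile name and tag here are assumptions):
```bash
# Build with the models/ directory in the build context so COPY picks it up.
docker build -f tfsintegrated.dockerfile -t tfsintegrated:1 .
# TFServing's REST API listens on port 8501 by default.
docker run -d -p 8501:8501 tfsintegrated:1
# The model status route should report the model as AVAILABLE once loaded.
curl http://localhost:8501/v1/models/hpt
```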
@@ -0,0 +1,17 @@
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: integratedbyoc
endpoint_name: {{ENDPOINT_NAME}}
environment:
  image: {{ACR_NAME}}.azurecr.io/azureml-examples/tfsintegrated:1
  inference_config:
    liveness_route:
      port: 8501
      path: /v1/models/hpt
    readiness_route:
      port: 8501
      path: /v1/models/hpt
    scoring_route:
      port: 8501
      path: /v1/models/hpt:predict
instance_type: Standard_DS3_v2
instance_count: 1
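
Once the deployment is live, the endpoint can also be exercised through the CLI rather than raw `curl`; a sketch, assuming the sample payload file shipped with the example:
```bash
az ml online-endpoint invoke -n $ENDPOINT_NAME --request-file sample-data.json
```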
@@ -0,0 +1,5 @@
FROM docker.io/tensorflow/serving:latest

ENV MODEL_NAME=hpt

COPY models /models
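
TFServing resolves the model at `/models/${MODEL_NAME}`, so the `models` directory copied above presumably follows the standard SavedModel layout (file names inferred; the binary files below are not rendered in the diff):
```
models/
└── hpt/
    └── 1/
        ├── saved_model.pb
        └── variables/
            ├── variables.data-00000-of-00001
            └── variables.index
```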
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,6 @@
{
    "inputs": [
        [[1,2,3,4]],
        [[0,1,1,1]]
    ]
}
@@ -0,0 +1,17 @@
# Deploy the Half Plus Two model using TFServing
In this example, we deploy a single model (half-plus-two) using a TFServing custom container.

This example can be run end-to-end by executing the `deploy-custom-container-tfserving-half-plus-two.sh` script in the `CLI` directory.

## Model
This example uses the `half-plus-two` model, which is downloaded by the deployment script. In the deployment YAML, it is registered as a model and mounted at runtime at the path given by the `AZUREML_MODEL_DIR` environment variable, as in standard deployments. The default mount location is `/var/azureml-app/azureml-models/<MODEL_NAME>/<MODEL_VERSION>` unless overridden by the `model_mount_path` field in the deployment YAML.

This path is passed to TFServing as an environment variable in the deployment YAML.

## Build the image
This example uses the `tensorflow/serving` image with no modifications, as defined in `tfserving.dockerfile`. Although this example demonstrates the usual workflow of building the image on an ACR instance, the deployment could bypass the ACR build step entirely and use the image's `docker.io` path directly as the image URL in the deployment YAML.
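
A sketch of the ACR build step (registry variable and tag are assumptions):
```bash
# Build the image remotely on the ACR instance from tfserving.dockerfile.
az acr build -r $ACR_NAME -t azureml-examples/tfserving:1 -f tfserving.dockerfile .
```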

## Environment
The environment is defined inline in the deployment YAML and references the ACR URL of the image. The ACR must be associated with the workspace (or accessible through a user-assigned managed identity with the `AcrPull` role) in order to deploy successfully.

The environment also contains an `inference_config` block that defines the `liveness`, `readiness`, and `scoring` routes by path and port. Because the images used in this example are based on the AzureML Inference Minimal images, these values are the same as those in a non-BYOC deployment; however, they must be included explicitly since we are now using a custom image.
@@ -0,0 +1 @@
{"instances": [1.0, 2.0, 5.0]}
@@ -0,0 +1,26 @@
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: tfserving-deployment
endpoint_name: tfserving-endpoint
model:
  name: tfserving-mounted
  version: {{MODEL_VERSION}}
  path: ./half_plus_two
environment_variables:
  MODEL_BASE_PATH: /var/azureml-app/azureml-models/tfserving-mounted/{{MODEL_VERSION}}
  MODEL_NAME: half_plus_two
environment:
  #name: tfserving
  #version: 1
  image: docker.io/tensorflow/serving:latest
  inference_config:
    liveness_route:
      port: 8501
      path: /v1/models/half_plus_two
    readiness_route:
      port: 8501
      path: /v1/models/half_plus_two
    scoring_route:
      port: 8501
      path: /v1/models/half_plus_two:predict
instance_type: Standard_DS3_v2
instance_count: 1
@@ -0,0 +1,3 @@
$schema: https://azuremlsdk2.blob.core.windows.net/latest/managedOnlineEndpoint.schema.json
name: tfserving-endpoint
auth_mode: aml_token
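
A sketch of the create sequence with the v2 CLI (file names assumed to match the YAML shown above):
```bash
az ml online-endpoint create -f tfserving-endpoint.yml
az ml online-deployment create -f tfserving-deployment.yml --all-traffic
```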
@@ -0,0 +1 @@
FROM tensorflow/serving
