diff --git a/docs/admin/kubernetes_deployment.md b/docs/admin/kubernetes_deployment.md
index de2631ad2..a4f8c14c1 100644
--- a/docs/admin/kubernetes_deployment.md
+++ b/docs/admin/kubernetes_deployment.md
@@ -1,6 +1,9 @@
 # Kubernetes Deployment Installation Guide
 KServe supports `RawDeployment` mode to enable `InferenceService` deployment with Kubernetes resources [`Deployment`](https://kubernetes.io/docs/concepts/workloads/controllers/deployment), [`Service`](https://kubernetes.io/docs/concepts/services-networking/service), [`Ingress`](https://kubernetes.io/docs/concepts/services-networking/ingress) and [`Horizontal Pod Autoscaler`](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale). Comparing to serverless deployment it unlocks Knative limitations such as mounting multiple volumes, on the other hand `Scale down and from Zero` is not supported in `RawDeployment` mode.
+** Starting with the KServe vx.xx release, `InferenceGraph` also supports `RawDeployment` mode.
+See release notes
+
 Kubernetes 1.22 is the minimally required version and please check the following recommended Istio versions for the corresponding Kubernetes version.
diff --git a/docs/blog/articles/2024-XX-XX-KServe-X.XX-release.md b/docs/blog/articles/2024-XX-XX-KServe-X.XX-release.md
new file mode 100644
index 000000000..825b81a72
--- /dev/null
+++ b/docs/blog/articles/2024-XX-XX-KServe-X.XX-release.md
@@ -0,0 +1,88 @@
+# Announcing: KServe vx.xx
+
+We are excited to announce the release of KServe x.xx. In this release we made enhancements to the KServe control plane, notably bringing `RawDeployment` mode to `InferenceGraph`.
Previously, `RawDeployment` existed only for `InferenceService`.
+
+Here is a summary of the key changes:
+
+## KServe Core Inference Enhancements
+
+- Inference Graph enhancements: support for `RawDeployment` mode, with autoscaling configuration directly in the `InferenceGraphSpec`.
+
+`InferenceGraph` `RawDeployment` makes the deployment lightweight by using native Kubernetes resources. See the comparison below:
+
+![Inference graph Knative based deployment](../../images/2024-xx-xx-Kserve-x-xx-release/ig_knative_deployment.png)
+
+![Inference graph raw deployment](../../images/2024-xx-xx-Kserve-x-xx-release/ig_raw_deployment.png)
+
+Autoscaling configuration fields were introduced to support scaling needs in
+`RawDeployment` mode. These fields are optional and, when set, take effect only if the `serving.kserve.io/autoscalerClass` annotation is not set to `external`.
+  See the following example with the autoscaling fields `minReplicas`, `maxReplicas`, `scaleTarget` and `scaleMetric`:
+
+  ```yaml
+  apiVersion: serving.kserve.io/v1alpha1
+  kind: InferenceGraph
+  metadata:
+    name: graph_with_switch_node
+    annotations:
+      serving.kserve.io/deploymentMode: "RawDeployment"
+  spec:
+    nodes:
+      root:
+        routerType: Sequence
+        steps:
+          - name: "rootStep1"
+            nodeName: node1
+            dependency: Hard
+          - name: "rootStep2"
+            serviceName: {{ success_200_isvc_id }}
+      node1:
+        routerType: Switch
+        steps:
+          - name: "node1Step1"
+            serviceName: {{ error_404_isvc_id }}
+            condition: "[@this].#(decision_picker==ERROR)"
+            dependency: Hard
+    minReplicas: 5
+    maxReplicas: 10
+    scaleTarget: 50
+    scaleMetric: "cpu"
+  ```
+  For more details please refer to the [issue](https://github.com/kserve/kserve/issues/2454).
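As a reading aid for the autoscaling example above, here is a minimal Python sketch that assembles the same kind of manifest as a plain dictionary. It assumes the autoscaling fields sit directly under `spec` and use the lower camel case names listed in the `docs/reference/api.md` hunk of this diff (`minReplicas`, `maxReplicas`, `scaleTarget`, `scaleMetric`). The helper `build_inference_graph`, the graph name `example-graph` and the service name `example-isvc` are hypothetical and are not part of KServe or its SDK.

```python
import json


def build_inference_graph(name, nodes, min_replicas=None, max_replicas=None,
                          scale_target=None, scale_metric=None):
    """Build an InferenceGraph manifest as a plain dict (illustrative only)."""
    spec = {"nodes": nodes}
    # The autoscaling fields are optional, so emit only the ones that were set.
    optional = {
        "minReplicas": min_replicas,
        "maxReplicas": max_replicas,
        "scaleTarget": scale_target,
        "scaleMetric": scale_metric,
    }
    spec.update({key: value for key, value in optional.items() if value is not None})
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "InferenceGraph",
        "metadata": {
            "name": name,
            # Annotation taken from the example in the release notes.
            "annotations": {"serving.kserve.io/deploymentMode": "RawDeployment"},
        },
        "spec": spec,
    }


# Hypothetical single-node graph; "example-isvc" is a placeholder service name.
nodes = {
    "root": {
        "routerType": "Sequence",
        "steps": [{"name": "rootStep1", "serviceName": "example-isvc"}],
    }
}

graph = build_inference_graph(
    "example-graph", nodes,
    min_replicas=5, max_replicas=10, scale_target=50, scale_metric="cpu",
)
print(json.dumps(graph, indent=2))
```

The printed JSON can be passed to `kubectl apply -f -`, since `kubectl` accepts JSON manifests as well as YAML. Leaving all four keyword arguments unset produces a manifest with no autoscaling fields.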
+
+-
+
+### Enhanced Python SDK Dependency Management
+
+-
+-
+
+### KServe Python Runtimes Improvements
+-
+
+### LLM Runtimes
+
+#### TorchServe LLM Runtime
+
+#### vLLM Runtime
+
+## ModelMesh Updates
+
+### Storing Models on Kubernetes Persistent Volumes (PVC)
+
+### Horizontal Pod Autoscaling (HPA)
+
+### Model Metrics, Metrics Dashboard, Payload Event Logging
+
+## What's Changed? :warning:
+
+## Join the community
+
+- Visit our [Website](https://kserve.github.io/website/) or [GitHub](https://github.com/kserve)
+- Join the Slack ([#kserve](https://kubeflow.slack.com/?redir=%2Farchives%2FCH6E58LNP))
+- Attend our community meeting by subscribing to the [KServe calendar](https://wiki.lfaidata.foundation/display/kserve/calendars).
+- View our [community GitHub repository](https://github.com/kserve/community) to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!
+
+
+Thanks to all the contributors who have made commits to the 0.11 release!
+
+The KServe Working Group
diff --git a/docs/images/2024-xx-xx-Kserve-x-xx-release/ig_knative_deployment.png b/docs/images/2024-xx-xx-Kserve-x-xx-release/ig_knative_deployment.png
new file mode 100644
index 000000000..4ff121bf0
Binary files /dev/null and b/docs/images/2024-xx-xx-Kserve-x-xx-release/ig_knative_deployment.png differ
diff --git a/docs/images/2024-xx-xx-Kserve-x-xx-release/ig_raw_deployment.png b/docs/images/2024-xx-xx-Kserve-x-xx-release/ig_raw_deployment.png
new file mode 100644
index 000000000..21fd35e6f
Binary files /dev/null and b/docs/images/2024-xx-xx-Kserve-x-xx-release/ig_raw_deployment.png differ
diff --git a/docs/reference/api.md b/docs/reference/api.md
index 0fe45d299..08eb7e8aa 100644
--- a/docs/reference/api.md
+++ b/docs/reference/api.md
@@ -524,6 +524,63 @@ Kubernetes core/v1.Affinity
 (Optional)
+minReplicas
+int
+(Optional)
+Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.
+
+maxReplicas
+int
+(Optional)
+Maximum number of replicas for autoscaling.
+
+scaleTarget
+int
+(Optional)
+ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for.
+concurrency and rps targets are supported by Knative Pod Autoscaler
+(https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
+
+scaleMetric
+ScaleMetric
+(Optional)
+ScaleMetric defines the scaling metric type watched by autoscaler
+possible values are concurrency, rps, cpu, memory. concurrency, rps are supported via
+Knative Pod Autoscaler(https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
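The four autoscaling fields documented above can be sanity-checked before a manifest is applied. The sketch below is a hypothetical client-side check, not KServe's own validation logic. It assumes that `minReplicas` should not exceed `maxReplicas`, and that `concurrency` and `rps` are worth flagging in `RawDeployment` mode because the field descriptions tie them to the Knative Pod Autoscaler. The function name `check_autoscaling` and the message texts are illustrative.

```python
# Metric names taken from the scaleMetric description in the API reference.
KNATIVE_ONLY_METRICS = {"concurrency", "rps"}
ALL_METRICS = KNATIVE_ONLY_METRICS | {"cpu", "memory"}


def check_autoscaling(spec, deployment_mode="RawDeployment"):
    """Return a list of problems found in the autoscaling fields of a spec dict."""
    problems = []
    min_replicas = spec.get("minReplicas")
    max_replicas = spec.get("maxReplicas")
    metric = spec.get("scaleMetric")

    if min_replicas is not None and min_replicas < 0:
        problems.append("minReplicas must not be negative")
    # Assumption: a minimum above the maximum is never intended.
    if (min_replicas is not None and max_replicas is not None
            and min_replicas > max_replicas):
        problems.append("minReplicas is greater than maxReplicas")
    if metric is not None and metric not in ALL_METRICS:
        problems.append(f"unknown scaleMetric: {metric!r}")
    elif metric in KNATIVE_ONLY_METRICS and deployment_mode == "RawDeployment":
        # Assumption: these metrics rely on the Knative Pod Autoscaler.
        problems.append(f"scaleMetric {metric!r} is documented as Knative-only")
    return problems


example_spec = {"minReplicas": 5, "maxReplicas": 10, "scaleTarget": 50, "scaleMetric": "cpu"}
print(check_autoscaling(example_spec))
```

An empty list means none of the assumed rules were violated. The check says nothing about whether the cluster's autoscaler accepts the values.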

InferenceGraphStatus