initial changes for IG raw deployment mode
Signed-off-by: Mopuri, Bharath <[email protected]>
Mopuri, Bharath committed Feb 3, 2024
1 parent fc1ba3e commit 9b5b762
Showing 5 changed files with 148 additions and 0 deletions.
3 changes: 3 additions & 0 deletions docs/admin/kubernetes_deployment.md
@@ -1,6 +1,9 @@
# Kubernetes Deployment Installation Guide
KServe supports `RawDeployment` mode to enable `InferenceService` deployment with Kubernetes resources [`Deployment`](https://kubernetes.io/docs/concepts/workloads/controllers/deployment), [`Service`](https://kubernetes.io/docs/concepts/services-networking/service), [`Ingress`](https://kubernetes.io/docs/concepts/services-networking/ingress) and [`Horizontal Pod Autoscaler`](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale). Compared to serverless deployment, it removes Knative limitations such as the restriction on mounting multiple volumes; on the other hand, `Scale down to and from Zero` is not supported in `RawDeployment` mode.

**Note:** Starting with the KServe vx.xx release, `InferenceGraph` also supports `RawDeployment` mode. See the release notes for details.

Kubernetes 1.22 is the minimum required version. Please check the following recommended Istio versions for the corresponding
Kubernetes version.
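
The deployment mode can also be set per resource with the `serving.kserve.io/deploymentMode` annotation. A minimal sketch (the service name, model format, and storage URI below are illustrative placeholders, not values from this change):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  annotations:
    # Deploy with plain Kubernetes resources instead of Knative
    serving.kserve.io/deploymentMode: "RawDeployment"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```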

88 changes: 88 additions & 0 deletions docs/blog/articles/2024-XX-XX-KServe-X.XX-release.md
@@ -0,0 +1,88 @@
# Announcing: KServe vx.xx

We are excited to announce the release of KServe x.xx. In this release we made enhancements to the KServe control plane, most notably bringing `RawDeployment` mode to `InferenceGraph` as well. Previously, `RawDeployment` was available only for `InferenceService`.

Here is a summary of the key changes:

## KServe Core Inference Enhancements

- Inference Graph enhancements supporting `RawDeployment` mode, along with autoscaling configuration directly within the `InferenceGraphSpec`.

`RawDeployment` makes an `InferenceGraph` deployment lightweight by using native Kubernetes resources. See the comparison below:

![Inference graph Knative based deployment](../../images/2024-xx-xx-Kserve-x-xx-release/ig_knative_deployment.png)

![Inference graph raw deployment](../../images/2024-xx-xx-Kserve-x-xx-release/ig_raw_deployment.png)

Autoscaling configuration fields were introduced to support scaling needs in
`RawDeployment` mode. These fields are optional and, when specified, take effect only when the `serving.kserve.io/autoscalerClass` annotation does not point to `external`.
See the following example with the autoscaling fields `minReplicas`, `maxReplicas`, `scaleTarget` and `scaleMetric`:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: graph-with-switch-node
  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: "rootStep1"
          nodeName: node1
          dependency: Hard
        - name: "rootStep2"
          serviceName: {{ success_200_isvc_id }}
    node1:
      routerType: Switch
      steps:
        - name: "node1Step1"
          serviceName: {{ error_404_isvc_id }}
          condition: "[@this].#(decision_picker==ERROR)"
          dependency: Hard
  minReplicas: 5
  maxReplicas: 10
  scaleTarget: 50
  scaleMetric: "cpu"
```
For more details please refer to the [issue](https://github.com/kserve/kserve/issues/2454).
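
In `RawDeployment` mode, these autoscaling fields drive a native Kubernetes `Horizontal Pod Autoscaler` rather than the Knative autoscaler. As an illustrative sketch (assuming KServe translates the fields this way; the resource names are hypothetical), the graph above would be backed by an HPA roughly like:

```yaml
# Hypothetical HPA reflecting minReplicas: 5, maxReplicas: 10,
# scaleMetric: cpu, scaleTarget: 50 from the InferenceGraph example.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: graph-with-switch-node
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: graph-with-switch-node
  minReplicas: 5
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```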
### Enhanced Python SDK Dependency Management
### KServe Python Runtimes Improvements
### LLM Runtimes
#### TorchServe LLM Runtime
#### vLLM Runtime
## ModelMesh Updates
### Storing Models on Kubernetes Persistent Volumes (PVC)
### Horizontal Pod Autoscaling (HPA)
### Model Metrics, Metrics Dashboard, Payload Event Logging
## What's Changed? :warning:
## Join the community
- Visit our [Website](https://kserve.github.io/website/) or [GitHub](https://github.com/kserve)
- Join the Slack ([#kserve](https://kubeflow.slack.com/?redir=%2Farchives%2FCH6E58LNP))
- Attend our community meeting by subscribing to the [KServe calendar](https://wiki.lfaidata.foundation/display/kserve/calendars).
- View our [community github repository](https://github.com/kserve/community) to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!
Thanks to all the contributors who have made commits to this release!
The KServe Working Group
57 changes: 57 additions & 0 deletions docs/reference/api.md
@@ -524,6 +524,63 @@ Kubernetes core/v1.Affinity
<em>(Optional)</em>
</td>
</tr>

<tr>
<td>
<code>minReplicas</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.</p>
</td>
</tr>
<tr>
<td>
<code>maxReplicas</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>Maximum number of replicas for autoscaling.</p>
</td>
</tr>
<tr>
<td>
<code>scaleTarget</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for.
concurrency and rps targets are supported by Knative Pod Autoscaler
(<a href="https://knative.dev/docs/serving/autoscaling/autoscaling-targets/">https://knative.dev/docs/serving/autoscaling/autoscaling-targets/</a>).</p>
</td>
</tr>
<tr>
<td>
<code>scaleMetric</code><br/>
<em>
<a href="#serving.kserve.io/v1beta1.ScaleMetric">
ScaleMetric
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>ScaleMetric defines the scaling metric type watched by autoscaler
possible values are concurrency, rps, cpu, memory. concurrency, rps are supported via
Knative Pod Autoscaler(<a href="https://knative.dev/docs/serving/autoscaling/autoscaling-metrics">https://knative.dev/docs/serving/autoscaling/autoscaling-metrics</a>).</p>
</td>
</tr>


</tbody>
</table>
<h3 id="serving.kserve.io/v1alpha1.InferenceGraphStatus">InferenceGraphStatus
