Skip to content

Commit

Permalink
v0.14 release blog (#431)
Browse files Browse the repository at this point in the history
* v0.14 release blog

Contribute blog article for v0.14 release

Signed-off-by: Edgar Hernández <[email protected]>

* Fix introductory paragraph with the right summary

Signed-off-by: Edgar Hernández <[email protected]>

---------

Signed-off-by: Edgar Hernández <[email protected]>
  • Loading branch information
israel-hdez authored Dec 22, 2024
1 parent 3f9bdb9 commit 2ff73bd
Show file tree
Hide file tree
Showing 3 changed files with 151 additions and 4 deletions.
149 changes: 149 additions & 0 deletions docs/blog/articles/2024-12-13-KServe-0.14-release.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,149 @@
# Announcing: KServe v0.14

We are excited to announce KServe v0.14. In this release we are introducing a new Python client designed for KServe, and a new model cache feature; we are promoting OCI storage for models as a stable feature; and we added support for deploying models directly from Hugging Face.

Below are a summary of the key changes.

## Introducing Inference client for Python

The KServe Python SDK now includes both [REST](https://github.com/kserve/kserve/blob/v0.14.0/python/kserve/kserve/inference_client.py#L388) and [GRPC](https://github.com/kserve/kserve/blob/v0.14.0/python/kserve/kserve/inference_client.py#L61) inference clients. The new Inference clients of the SDK were delivered as **alpha** features.

Inline with the features documented in issue [#3270](https://github.com/kserve/kserve/issues/3270), both clients have the following characteristics:

* The clients are asynchronous
* Support for HTTP/2 (via [httpx](https://www.python-httpx.org/) library)
* Support Open Inference Protocol v1 and v2

As usual, the version 0.14.0 of the KServe Python SDK is [published to PyPI](https://pypi.org/project/kserve/0.14.0/) and available to install via `pip install`.

<!--
Related tickets:
* Initial implementation [#3270](https://github.com/kserve/kserve/issues/3270)
* FP16 support [#3643](https://github.com/kserve/kserve/issues/3643)
-->

## Support for OCI storage for models (modelcars) becomes stable

In KServe version 0.12, support for using OCI containers for model storage was introduced as an experimental feature. This allows users to store models in containers in OCI format, and allows the usage of OCI-compatible registries for publishing the models.

This feature was implemented by configuring the OCI model container as a sidecar in the InferenceService pod, which was the motivation for naming the feature as modelcars. The model files are made available to the model server by configuring [process namespace sharing](https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/) in the pod.

There was one small but important detail that was unsolved and motivated the experimental status: since the modelcar is part of the main containers of the pod, there was no certainty that the modelcar would start quickly. The model server would be unstable if it starts first than the modelcar, and since there was no prefetching of the model image, this was thought as a likely condition.

The unstable situation has been mitigated by configuring the OCI model as an init container in addition to also configuring it as a sidecar. The configuration as an init container ensures that the model is fetched before the main containers are started. The prefetching allows the modelcar to start quickly.
The stabilization is available since KServe version 0.14, where modelcars are now a stable feature.

### Future plan

Modelcars is one implementation option for supporting OCI images for model storage. There are other alternatives commented in [issue #4083](https://github.com/kserve/kserve/issues/4083).

Using volume mounts based on OCI artifacts is the optimal implementation, but this is only [recently possible since Kubernetes 1.31](https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-volume-source/) as a native alpha feature. KServe can now evolve to use this new Kubernetes feature.

## Introducing model cache

With models increasing in size, specially true for LLM models, pulling from storage each time a pod is created can result in unmanageable start-up times. Although OCI storage also has the benefit of model caching, the capabilities are not flexible since the management is delegated to the cluster.

The Model Cache was proposed as another alternative to enhance KServe usability with big models, released in KServe v0.14 as an **alpha** feature. It relies on a PV for storing models and it provides control about which models to store in the cache. The feature was designed to mainly to use node Filesystem as storage. Read the [design document for the details](https://docs.google.com/document/d/1nao8Ws3tonO2zNAzdmXTYa0hECZNoP2SV_z9Zg0PzLA/edit).

The model cache is currently disabled by default. To enable, you need to modify the `localmodel.enabled` field on the `inferenceservice-config` ConfigMap.

You start by creating a node group as follows:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelNodeGroup
metadata:
name: nodegroup1
spec:
persistentVolumeSpec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
capacity:
storage: 2Gi
hostPath:
path: /models
type: ""
persistentVolumeReclaimPolicy: Delete
storageClassName: standard
persistentVolumeClaimSpec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
storageClassName: standard
volumeMode: Filesystem
volumeName: kserve

```

Then, you can specify to store an cache a model with the following resource:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterLocalModel
metadata:
name: iris
spec:
modelSize: 1Gi
nodeGroup: nodegroup1
sourceModelUri: gs://kfserving-examples/models/sklearn/1.0/model
```
<!--
Related tickets:
* Cluster local model controller: [#3860](https://github.com/kserve/kserve/pull/3860)
* Cluster Local Model CR [#3839](https://github.com/kserve/kserve/pull/3839)
* Add NodeDownloadPending status to ClusterLocalModel [#3955](https://github.com/kserve/kserve/pull/3955)
-->
## Support for Hugging Face hub in storage initializer
The KServe storage initializer has been enhanced to support downloading models directly from Hugging Face. For this, the new schema `hf://` is now supported in the `storageUri` field of InferenceServices. The following YAML partial shows this:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
name: huggingface-llama3
spec:
predictor:
model:
storageUri: hf://meta-llama/meta-llama-3-8b-instruct
```

Both public and private Hugging Face repositories are supported. The credentials can be provided by the usual mechanism of binding Secrets to ServiceAccounts, or by binding the credentials Secret as environment variables in the InferenceService.

Read the [documentation](../../../modelserving/storage/huggingface/hf/) for more details.

<!--
Related tickets:
* Implement Huggingface model download in storage initializer [#3584](https://github.com/kserve/kserve/pull/3584)
-->

## Other Changes

This release also includes several enhancements and changes:

### What's New?
* New flag for automount serviceaccount token by [#3979](https://github.com/kserve/kserve/pull/3979)
* TLS support for inference loggers [#3837](https://github.com/kserve/kserve/issues/3837)
* Allow PVC storage to be mounted in ReadWrite mode via an annotation [#3687](https://github.com/kserve/kserve/issues/3687)

### What's Changed?
* Added `hostIPC` field to `ServingRuntime` CRD, for supporting more than one GPU in Serverless mode [#3791](https://github.com/kserve/kserve/issues/3791)
* Support for Python 3.12 is added, while support Python 3.8 is removed [#3645](https://github.com/kserve/kserve/pull/3645)

For complete details on the new features and updates, visit our [official release notes](https://github.com/kserve/kserve/releases/tag/v0.14.0).

## Join the community

- Visit our [Website](https://kserve.github.io/website/) or [GitHub](https://github.com/kserve)
- Join the Slack ([#kserve](https://github.com/kserve/community?tab=readme-ov-file#questions-and-issues))
- Attend our community meeting by subscribing to the [KServe calendar](https://zoom-lfx.platform.linuxfoundation.org/meetings/kserve?view=month).
- View our [community github repository](https://github.com/kserve/community) to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!

Thanks for all the contributors who have made the commits to 0.14 release!

The KServe Project
5 changes: 1 addition & 4 deletions docs/modelserving/storage/oci.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,13 +13,10 @@ Modelcars represents a step forward in efficient model serving, particularly ben

## Enabling Modelcars

Modelcars is an experimental feature in KServe and is not enabled by default.
Modelcars feature in KServe is not enabled by default.
To take advantage of this new model serving method, it needs to be activated in the KServe configuration.
Follow the steps below to enable Modelcars in your environment.

!!! note
Modelcars are currently in an experimental phase. Enable this feature in a test environment first to ensure it meets your requirements before using it in a production setting.

Modelcars can be enabled by modifying the `storageInitializer` configuration in the `inferenceservice-config` ConfigMap.
This can be done manually using `kubectl edit` or by executing the script provided below, with the current namespace set to the namespace where the `kserve-controller-manager` is installed (depends on the way how KServer is installed.)

Expand Down
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,7 @@ nav:
- Debugging guide: developer/debug.md
- Blog:
- Releases:
- KServe 0.14 Release: blog/articles/2024-12-13-KServe-0.14-release.md
- KServe 0.13 Release: blog/articles/2024-05-15-KServe-0.13-release.md
- KServe 0.11 Release: blog/articles/2023-10-08-KServe-0.11-release.md
- KServe 0.10 Release: blog/articles/2023-02-05-KServe-0.10-release.md
Expand Down

0 comments on commit 2ff73bd

Please sign in to comment.