* v0.14 release blog

  Contribute blog article for v0.14 release

  Signed-off-by: Edgar Hernández <[email protected]>

* Fix introductory paragraph with the right summary

  Signed-off-by: Edgar Hernández <[email protected]>

---------

Signed-off-by: Edgar Hernández <[email protected]>
1 parent 3f9bdb9, commit 2ff73bd
Showing 3 changed files with 151 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,149 @@ | ||
# Announcing: KServe v0.14 | ||
|
||
We are excited to announce KServe v0.14. In this release we are introducing a new Python client designed for KServe, and a new model cache feature; we are promoting OCI storage for models as a stable feature; and we added support for deploying models directly from Hugging Face. | ||
|
||
Below is a summary of the key changes. | ||
|
||
## Introducing Inference client for Python | ||
|
||
The KServe Python SDK now includes both [REST](https://github.com/kserve/kserve/blob/v0.14.0/python/kserve/kserve/inference_client.py#L388) and [gRPC](https://github.com/kserve/kserve/blob/v0.14.0/python/kserve/kserve/inference_client.py#L61) inference clients. The new inference clients are delivered as **alpha** features. | ||
|
||
In line with the features documented in issue [#3270](https://github.com/kserve/kserve/issues/3270), both clients have the following characteristics: | ||
|
||
* They are asynchronous | ||
* They support HTTP/2 (via the [httpx](https://www.python-httpx.org/) library) | ||
* They support Open Inference Protocol v1 and v2 | ||
|
||
As usual, version 0.14.0 of the KServe Python SDK is [published to PyPI](https://pypi.org/project/kserve/0.14.0/) and can be installed via `pip install`. | ||
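
The two protocol versions differ mainly in the URL path and the shape of the JSON payload. As a rough illustration that does not use the KServe client itself, the sketch below builds the request for each protocol with only the Python standard library. The helper names, the model name `sklearn-iris`, and the tensor name `input-0` are made up for this example.

```python
import json


def build_v1_request(model_name, instances):
    """Return (path, body) for a v1 protocol predict call."""
    path = f"/v1/models/{model_name}:predict"
    body = {"instances": instances}
    return path, json.dumps(body)


def build_v2_request(model_name, input_name, datatype, rows):
    """Return (path, body) for a v2 (Open Inference Protocol) infer call."""
    # Tensor data is sent flattened in row-major order; the shape travels separately.
    shape = [len(rows), len(rows[0])]
    flat = [value for row in rows for value in row]
    body = {
        "inputs": [
            {"name": input_name, "shape": shape, "datatype": datatype, "data": flat}
        ]
    }
    path = f"/v2/models/{model_name}/infer"
    return path, json.dumps(body)


if __name__ == "__main__":
    # Hypothetical model and tensor names, for illustration only.
    rows = [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]
    print(build_v1_request("sklearn-iris", rows))
    print(build_v2_request("sklearn-iris", "input-0", "FP32", rows))
```

The SDK clients take care of this framing, so the sketch is only meant to show what changes between the two protocol versions.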
|
||
<!-- | ||
Related tickets: | ||
* Initial implementation [#3270](https://github.com/kserve/kserve/issues/3270) | ||
* FP16 support [#3643](https://github.com/kserve/kserve/issues/3643) | ||
--> | ||
|
||
## Support for OCI storage for models (modelcars) becomes stable | ||
|
||
In KServe version 0.12, support for using OCI containers for model storage was introduced as an experimental feature. This lets users store models in containers in OCI format and publish them to OCI-compatible registries. | ||
|
||
This feature was implemented by configuring the OCI model container as a sidecar in the InferenceService pod, which is why the feature was named modelcars. The model files are made available to the model server by configuring [process namespace sharing](https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/) in the pod. | ||
|
||
There was one small but important detail that remained unsolved and motivated the experimental status: since the modelcar is one of the main containers of the pod, there was no guarantee that the modelcar would start quickly. The model server would be unstable if it started before the modelcar, and since the model image was not prefetched, this was considered a likely condition. | ||
|
||
The instability has been mitigated by configuring the OCI model as an init container in addition to configuring it as a sidecar. The init container ensures that the model image is fetched before the main containers start, and this prefetching allows the modelcar to start quickly. | ||
This fix is available as of KServe version 0.14, in which modelcars are a stable feature. | ||
|
||
### Future plan | ||
|
||
Modelcars are one implementation option for supporting OCI images for model storage. Other alternatives are discussed in [issue #4083](https://github.com/kserve/kserve/issues/4083). | ||
|
||
Using volume mounts based on OCI artifacts would be the optimal implementation, but this has [only been possible since Kubernetes 1.31](https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-volume-source/), where it is a native alpha feature. KServe can now evolve to use this new Kubernetes feature. | ||
|
||
## Introducing model cache | ||
|
||
As models grow in size, especially LLMs, pulling them from storage each time a pod is created can result in unmanageable start-up times. Although OCI storage also provides model caching, its capabilities are not flexible, because cache management is delegated to the cluster. | ||
|
||
The Model Cache was proposed as another way to improve KServe usability with large models, and is released in KServe v0.14 as an **alpha** feature. It relies on a PV for storing models and provides control over which models are stored in the cache. The feature was designed mainly to use the node filesystem as storage. Read the [design document for the details](https://docs.google.com/document/d/1nao8Ws3tonO2zNAzdmXTYa0hECZNoP2SV_z9Zg0PzLA/edit). | ||
|
||
The model cache is currently disabled by default. To enable it, modify the `localmodel.enabled` field in the `inferenceservice-config` ConfigMap. | ||
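
Entries in `inferenceservice-config` are stored as JSON strings, so flipping the flag means parsing one entry, changing it, and writing it back. The sketch below shows that step on a plain dictionary standing in for the ConfigMap `data`. The entry name `localModel` and the helper name are assumptions made for this example, so check the ConfigMap in your own cluster for the actual key.

```python
import json


def enable_local_model(configmap_data, key="localModel"):
    """Return a copy of the ConfigMap data with the model cache switched on.

    `key` is an assumed entry name; verify it against your inferenceservice-config.
    """
    updated = dict(configmap_data)
    # Each ConfigMap entry is a JSON document serialized as a string.
    settings = json.loads(updated.get(key, "{}"))
    settings["enabled"] = True
    updated[key] = json.dumps(settings, indent=2)
    return updated


if __name__ == "__main__":
    # Hypothetical starting content, for illustration only.
    data = {"localModel": '{"enabled": false}'}
    print(enable_local_model(data)["localModel"])
```

The function returns a new dictionary and leaves other settings in the same entry untouched, which keeps the change limited to the `enabled` flag.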
|
||
You start by creating a node group as follows: | ||
|
||
```yaml | ||
apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelNodeGroup
metadata:
  name: nodegroup1
spec:
  persistentVolumeSpec:
    accessModes:
      - ReadWriteOnce
    volumeMode: Filesystem
    capacity:
      storage: 2Gi
    hostPath:
      path: /models
      type: ""
    persistentVolumeReclaimPolicy: Delete
    storageClassName: standard
  persistentVolumeClaimSpec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 2Gi
    storageClassName: standard
    volumeMode: Filesystem
    volumeName: kserve
``` | ||
|
||
Then, you can specify a model to store and cache with the following resource: | ||
|
||
```yaml | ||
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterLocalModel
metadata:
  name: iris
spec:
  modelSize: 1Gi
  nodeGroup: nodegroup1
  sourceModelUri: gs://kfserving-examples/models/sklearn/1.0/model
``` | ||
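
In the two resources above, the model declares a `modelSize` of 1Gi and the node group offers 2Gi of storage. As a user-side sanity check (not a description of what the KServe controller does), the sketch below converts such quantities to bytes and checks that a set of models fits in a node group. The function names are made up for this example, and only integer values with binary suffixes are handled.

```python
# Binary suffixes used by Kubernetes quantities such as "2Gi".
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity):
    """Convert a quantity such as '2Gi' to bytes (integers with binary suffixes only)."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def fits_in_node_group(model_sizes, capacity):
    """True if the combined model sizes do not exceed the node group capacity."""
    total = sum(parse_quantity(size) for size in model_sizes)
    return total <= parse_quantity(capacity)


if __name__ == "__main__":
    # Values taken from the resources above: one 1Gi model, 2Gi of storage.
    print(fits_in_node_group(["1Gi"], "2Gi"))
```

With these numbers, a second 1Gi model would still fit in `nodegroup1`, but anything beyond that would exceed the declared capacity.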
<!-- | ||
Related tickets: | ||
* Cluster local model controller: [#3860](https://github.com/kserve/kserve/pull/3860) | ||
* Cluster Local Model CR [#3839](https://github.com/kserve/kserve/pull/3839) | ||
* Add NodeDownloadPending status to ClusterLocalModel [#3955](https://github.com/kserve/kserve/pull/3955) | ||
--> | ||
## Support for Hugging Face hub in storage initializer | ||
The KServe storage initializer has been enhanced to support downloading models directly from Hugging Face. For this, the new `hf://` scheme is now supported in the `storageUri` field of InferenceServices, as the following partial YAML shows: | ||
|
||
```yaml | ||
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llama3
spec:
  predictor:
    model:
      storageUri: hf://meta-llama/meta-llama-3-8b-instruct
``` | ||
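
To make the URI format concrete, the sketch below splits an `hf://` storage URI into the Hugging Face repository ID and an optional revision. It is an illustration, not the storage initializer's code: the function name is made up, and the `:<revision>` suffix is an assumption to verify against the linked documentation.

```python
def parse_hf_uri(storage_uri):
    """Split 'hf://<org>/<model>[:<revision>]' into (repo_id, revision)."""
    prefix = "hf://"
    if not storage_uri.startswith(prefix):
        raise ValueError(f"not a Hugging Face storage URI: {storage_uri!r}")
    # The optional ':<revision>' suffix is an assumption for this sketch.
    repo, _, revision = storage_uri[len(prefix):].partition(":")
    if repo.count("/") != 1 or not all(repo.split("/")):
        raise ValueError(f"expected hf://<org>/<model>, got {storage_uri!r}")
    return repo, revision or None


if __name__ == "__main__":
    print(parse_hf_uri("hf://meta-llama/meta-llama-3-8b-instruct"))
```

For the InferenceService above, this yields the repository `meta-llama/meta-llama-3-8b-instruct` with no explicit revision.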
|
||
Both public and private Hugging Face repositories are supported. The credentials can be provided by the usual mechanism of binding Secrets to ServiceAccounts, or by binding the credentials Secret as environment variables in the InferenceService. | ||
|
||
Read the [documentation](../../../modelserving/storage/huggingface/hf/) for more details. | ||
|
||
<!-- | ||
Related tickets: | ||
* Implement Huggingface model download in storage initializer [#3584](https://github.com/kserve/kserve/pull/3584) | ||
--> | ||
|
||
## Other Changes | ||
|
||
This release also includes several enhancements and changes: | ||
|
||
### What's New? | ||
* New flag for automounting the service account token [#3979](https://github.com/kserve/kserve/pull/3979) | ||
* TLS support for inference loggers [#3837](https://github.com/kserve/kserve/issues/3837) | ||
* Allow PVC storage to be mounted in ReadWrite mode via an annotation [#3687](https://github.com/kserve/kserve/issues/3687) | ||
|
||
### What's Changed? | ||
* Added `hostIPC` field to `ServingRuntime` CRD, for supporting more than one GPU in Serverless mode [#3791](https://github.com/kserve/kserve/issues/3791) | ||
* Support for Python 3.12 is added, while support for Python 3.8 is removed [#3645](https://github.com/kserve/kserve/pull/3645) | ||
|
||
For complete details on the new features and updates, visit our [official release notes](https://github.com/kserve/kserve/releases/tag/v0.14.0). | ||
|
||
## Join the community | ||
|
||
- Visit our [Website](https://kserve.github.io/website/) or [GitHub](https://github.com/kserve) | ||
- Join the Slack ([#kserve](https://github.com/kserve/community?tab=readme-ov-file#questions-and-issues)) | ||
- Attend our community meeting by subscribing to the [KServe calendar](https://zoom-lfx.platform.linuxfoundation.org/meetings/kserve?view=month). | ||
- View our [community GitHub repository](https://github.com/kserve/community) to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption! | ||
|
||
Thanks to all the contributors who made commits to the 0.14 release! | ||
|
||
The KServe Project |