
KServe 0.8 Release

Authors

Dan Sun, Paul Van Eck, Vedant Padwal, Andrews Arokiam on behalf of the KServe Working Group.

Announcing: KServe v0.8

February 18, 2022

Today, we are pleased to announce the v0.8.0 release of KServe! While the last release focused on the transition from KFServing to KServe, this release focuses on unifying the InferenceService API for deploying models on both KServe and ModelMesh.

Note: For current users of KFServing/KServe, please take a few minutes to answer this short survey and provide your feedback!

Now, let's take a look at some of the changes and additions to KServe.

⚠ What’s changed?

  • ONNX Runtime Server has been removed from the supported serving runtime list. KServe now uses the Triton Inference Server by default to serve ONNX models.
  • KServe’s PyTorchServer has been removed from the supported serving runtime list. KServe now uses TorchServe by default to serve PyTorch models.
  • A few main KServe SDK class names have been changed (see the usage sketch after this list):
    • KFModel is renamed to Model
    • KFServer is renamed to ModelServer
    • KFModelRepository is renamed to ModelRepository
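
For reference, here is a minimal sketch of a custom model using the renamed classes. The class name, model name, and predict logic are illustrative, not taken from the release notes:

from kserve import Model, ModelServer

class CustomModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        # Load model artifacts here, then mark the model as ready.
        self.ready = True

    def predict(self, request: dict) -> dict:
        # Echo the inputs back; a real model would run inference here.
        return {"predictions": request["instances"]}

if __name__ == "__main__":
    # What was KFServer().start(...) is now ModelServer().start(...).
    ModelServer().start([CustomModel("custom-model")])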

🌈 What's new?

Some notable updates are:

  • ClusterServingRuntime and ServingRuntime CRDs are introduced. Learn more below.
  • A new Model Spec was introduced to the InferenceService Predictor Spec as a new way to specify models. Learn more below.
  • Knative 1.0 is now supported and certified for the KServe Serverless installation.
  • gRPC is now supported for transformer to predictor network communication.
  • The TorchServe serving runtime has been updated to 0.5.2, which now supports the KServe V2 REST protocol (see the example request after this list).
  • ModelMesh now has multi-namespace support, and users can now deploy GCS or HTTP(S) hosted models.
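
For reference, a V2 REST inference request against a TorchServe-backed service looks roughly like the following. The host, model name, and tensor values are illustrative:

curl -X POST http://localhost:8080/v2/models/mnist/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "input-0", "shape": [1, 2], "datatype": "FP32", "data": [0.1, 0.2]}]}'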

To see all release updates, check out the KServe release notes and ModelMesh Serving release notes!

ServingRuntimes and ClusterServingRuntimes

This release introduces two new CRDs, ServingRuntime and ClusterServingRuntime; the only difference between them is that one is namespace-scoped and the other is cluster-scoped. A ServingRuntime defines a template for Pods that can serve one or more particular model formats. Each ServingRuntime specifies key information such as the runtime's container image and a list of the model formats it supports.

In previous versions of KServe, supported predictor formats and container images were defined in a config map in the control plane namespace. The new ServingRuntime CRDs allow for improved flexibility and extensibility, letting you define or customize runtimes as you see fit without modifying any controller code or any resources in the controller namespace.

Several out-of-the-box ClusterServingRuntimes are provided with KServe so that users can continue to use KServe as they did before, without having to define the runtimes themselves.

Example SKLearn ClusterServingRuntime:

apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: kserve-sklearnserver
spec:
  supportedModelFormats:
    # Model formats (and versions) this runtime can serve.
    - name: sklearn
      version: "1"
      # Opt in to automatic runtime selection for this format.
      autoSelect: true
  containers:
    - name: kserve-container
      image: kserve/sklearnserver:latest
      args:
        # {{.Name}} is filled in by the KServe controller at deploy time.
        - --model_name={{.Name}}
        - --model_dir=/mnt/models
        - --http_port=8080
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          cpu: "1"
          memory: 2Gi

Updated InferenceService Predictor Spec

A new Model spec was also introduced as part of the Predictor spec for InferenceServices. One problem KServe faced was that the InferenceService CRD was becoming unwieldy, with each model serving runtime represented as its own object in the Predictor spec. This produced a lot of field duplication in the schema, bloating the overall size of the CRD. If a user wanted KServe to support a new model serving framework, the CRD had to be modified, along with the controller code.

Now, with the Model spec, a user can specify a model format and, optionally, a corresponding version. The KServe control plane will automatically select and use the ClusterServingRuntime or ServingRuntime that supports the given format. Each ServingRuntime maintains a list of supported model formats and versions; if a format has autoSelect set to true, that ServingRuntime becomes a candidate for automatic selection for models of that format.

New schema:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-isvc
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://bucket/sklearn/mnist.joblib

Previous schema:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-isvc
spec:
  predictor:
    sklearn:
      storageUri: s3://bucket/sklearn/mnist.joblib

The previous way of defining predictors is still supported; however, the new approach is the preferred one going forward. Eventually, the previous schema, with the framework names as keys in the Predictor spec, will be removed.

ModelMesh Updates

ModelMesh is in the process of being integrated as KServe’s multi-model serving backend. With the inclusion of the aforementioned ServingRuntime CRDs and the Predictor Model spec, the two projects are now much more closely aligned, with continual improvements underway.

ModelMesh now supports multi-namespace reconciliation. Previously, the ModelMesh controller would only reconcile against resources deployed in the same namespace as the controller. Now, by default, ModelMesh will be able to handle InferenceService deployments in any "modelmesh-enabled" namespace. Learn more here.
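
As a quick sketch, opting a namespace in looks like the following, assuming a namespace named modelmesh-models (see the ModelMesh Serving documentation for the full details):

kubectl label namespace modelmesh-models modelmesh-enabled=true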

Also, while ModelMesh previously only supported S3-based storage, we are happy to share that ModelMesh now works with models hosted using GCS and HTTP(S).
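
For example, an InferenceService predictor can now point at a GCS-hosted model. The sketch below is illustrative: the bucket path is made up, and the deploymentMode annotation is the mechanism ModelMesh Serving uses to route an InferenceService to ModelMesh:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-gcs-isvc
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://my-bucket/sklearn/model.joblib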

Join the community

Thank you for trying out KServe!
