Model Serving Frameworks Overview
KServe provides a simple Kubernetes CRD to enable deploying single or multiple trained models onto various model serving runtimes. This page provides an overview of the supported frameworks and their capabilities.
Introduction
KServe supports multiple model serving runtimes including:
- TensorFlow Serving - Google's serving system for TensorFlow models.
- Triton Inference Server - NVIDIA's inference server supporting multiple frameworks.
- Hugging Face Server - Specialized for transformer models, with Open Inference and OpenAI protocol support via vLLM.
- LightGBM ModelServer - Specialized for LightGBM models.
- XGBoost ModelServer - Specialized for XGBoost models.
- PMML ModelServer - Specialized for PMML models.
- SKLearn ModelServer - Specialized for SKLearn models.
- PaddlePaddle ModelServer - Specialized for PaddlePaddle models.
These runtimes provide out-of-the-box model serving capabilities. For more complex use cases, you can build custom model servers using KServe's API primitives or tools like BentoML.
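As a minimal sketch of what this looks like in practice (the service name is illustrative, and the storage URI follows the convention of the public KServe examples bucket), a scikit-learn model can be deployed with a few lines of YAML:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"

KServe matches the declared modelFormat against the installed serving runtimes and selects a compatible one automatically.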
Key Features
When you deploy models with InferenceService, you automatically get these serverless features:
Scalability
- Scale to and from Zero - Automatic scaling based on traffic (see the sketch after this list)
- Request-based Autoscaling - Support for both CPU and GPU scaling
- Optimized Containers - Performance-optimized runtime containers
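These scaling knobs surface directly on the predictor spec. A sketch, assuming serverless (Knative) deployment mode; minReplicas, maxReplicas, scaleMetric, and scaleTarget are v1beta1 fields, and the values here are illustrative:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    minReplicas: 0            # permit scale-to-zero when there is no traffic
    maxReplicas: 5
    scaleMetric: concurrency  # scale on in-flight requests rather than CPU
    scaleTarget: 2            # target concurrent requests per replica
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"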
Management
- Revision Management - Track and manage different model versions
- Traffic Management - Advanced routing and canary deployments (example after this list)
- Batching - Automatic request batching for improved throughput
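Canary rollouts and request batching are likewise per-component fields; a hedged sketch (canaryTrafficPercent and the batcher block are v1beta1 fields, while the percentage and batch limits are illustrative):

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    canaryTrafficPercent: 10  # route 10% of traffic to the latest revision
    batcher:
      maxBatchSize: 32        # merge up to 32 requests into one batch
      maxLatency: 500         # flush a partial batch after 500 ms
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"

The remaining traffic stays pinned to the previous revision until the canary percentage is raised.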
Observability
- Request/Response Logging - Comprehensive logging capabilities (see the sketch below)
- Distributed Tracing - End-to-end request tracing
- Out-of-the-box Metrics - Built-in monitoring and metrics
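Request/response logging is enabled with a logger block on the component; a sketch (the sink URL is a placeholder for any HTTP endpoint that accepts the emitted CloudEvents):

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    logger:
      mode: all                             # log requests and responses
      url: http://message-dumper.default/   # placeholder event sink
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"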
Security
- Authentication/Authorization - Secure access controls (sketch after this list)
- Ingress/Egress Control - Network traffic management
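Access control is typically layered on through the service mesh rather than the InferenceService itself. A sketch, assuming Istio is installed and that the predictor pods carry the serving.kserve.io/inferenceservice label (the label, names, and namespace here are assumptions for illustration):

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: kserve-demo          # illustrative namespace
spec:
  selector:
    matchLabels:
      serving.kserve.io/inferenceservice: sklearn-iris  # assumed pod label
  action: ALLOW
  rules:
    - from:
        - source:
            requestPrincipals: ["*"]   # require an authenticated JWT principal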
Supported Frameworks
The following tables show model serving runtimes supported by KServe, split into predictive and generative inference capabilities:
- HTTP/gRPC columns indicate the prediction protocol version (v1 or v2)
- Asterisk (*) indicates custom prediction protocols in addition to KServe's standard protocols
- Default Runtime Version shows the source and version of the serving runtime
Generative Inference
Framework | Exported Model Format | HTTP | gRPC | Default Runtime Version | Supported Framework (Major) Version(s) | Examples
---|---|---|---|---|---|---
HuggingFace ModelServer | Saved Model, Huggingface Hub Model_Id | OpenAI | -- | v0.15 (KServe) | 4 (Transformers) | GitHub Examples
HuggingFace vLLM ModelServer | Saved Model, Huggingface Hub Model_Id | OpenAI | -- | v0.15 (KServe) | 0 (vLLM) | GitHub Examples
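A sketch of a generative deployment on the Hugging Face runtime (the service name and Hub model ID are placeholders; --model_name and --model_id are arguments accepted by the KServe Hugging Face server):

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "huggingface-llm"
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
        - --model_name=demo-llm            # name exposed on the OpenAI endpoints
        - --model_id=<huggingface-hub-id>  # placeholder Hugging Face Hub ID
      resources:
        limits:
          nvidia.com/gpu: "1"

Once the service is ready, clients talk to it over the OpenAI protocol (completions and chat completions routes), which is what the OpenAI entry in the HTTP column refers to.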
Protocol Notes
- *tensorflow: TensorFlow Serving implements its own prediction protocol in addition to KServe's standard protocols. See the TensorFlow Serving Prediction API documentation.
Version Information
The framework versions and runtime configurations can be found in several locations:
- Runtime versions: Check the runtime kustomization YAML
- Supported formats: See the supportedModelFormats field in the individual runtime YAML files
- KServe native runtimes: Find specific versions in the pyproject.toml files under the kserve/python subdirectories
For example, the LightGBM server version can be found in its pyproject.toml file, which specifies lightgbm ~= 3.3.2.
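For orientation, here is an abridged ClusterServingRuntime showing where supportedModelFormats sits (the version, autoSelect value, and image tag are illustrative rather than copied from a released runtime YAML):

apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: kserve-lgbserver
spec:
  supportedModelFormats:
    - name: lightgbm
      version: "3"      # major version matched against the InferenceService modelFormat
      autoSelect: true  # allows KServe to pick this runtime automatically
  containers:
    - name: kserve-container
      image: kserve/lgbserver:<tag>   # placeholder image tag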
Runtime Version Configuration
For production services, we highly recommend explicitly setting the runtimeVersion field in your InferenceService specification to ensure consistent deployments and avoid unexpected version changes.
You can override the default model serving runtime version using the runtimeVersion field:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchscript-cifar"
spec:
  predictor:
    model:
      modelFormat:
        name: "pytorch"
      storageUri: "gs://kfserving-examples/models/torchscript"
      runtimeVersion: 23.08-py3
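Here 23.08-py3 follows the NVIDIA container tag scheme for the Triton runtime serving this TorchScript model; pinning it means redeployments keep pulling the same runtime image rather than whatever the current default resolves to.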
Next Steps
- Explore the KServe GitHub repository for more examples
- Learn about custom model serving
- Check out the sample implementations for hands-on tutorials
- Read the KServe developer guide