vLLM Runtime
Official vLLM support is available through the Hugging Face Serving Runtime.
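
As a minimal sketch, an InferenceService using the Hugging Face runtime (which selects the vLLM backend by default for supported generative models) might look like the manifest below. The service name, `--model_id`, and resource values are illustrative placeholders, not required values:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llm        # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface      # Hugging Face Serving Runtime
      args:
        # placeholder model; substitute any vLLM-supported Hugging Face model
        - --model_id=meta-llama/meta-llama-3-8b-instruct
      resources:
        limits:
          nvidia.com/gpu: "1"  # example sizing; adjust to the model
        requests:
          nvidia.com/gpu: "1"
```

If a model is not supported by vLLM, the runtime falls back to the standard Hugging Face backend; passing `--backend=huggingface` selects that backend explicitly.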