# vLLM Runtime
The official vLLM support is available through the Hugging Face Serving Runtime.
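As a minimal sketch of what that looks like in practice, the manifest below declares an InferenceService that uses the Hugging Face serving runtime, which runs vLLM as its backend for supported generative models. The service name, model ID, and resource values here are illustrative assumptions, not fixed requirements:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llama3          # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface           # selects the Hugging Face serving runtime
      args:
        # name the model is exposed under on the inference endpoint
        - --model_name=llama3
        # example Hugging Face Hub model to download and serve
        - --model_id=meta-llama/meta-llama-3-8b-instruct
      resources:
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
        requests:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
```

If vLLM cannot serve a given model, the runtime can fall back to the standard Hugging Face generation path (for example by passing a `--backend=huggingface` argument); see the Hugging Face LLM runtime pages for the supported tasks and options.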