# KServe 2024-2025 Roadmap

## Objective: “Support GenAI inference”
- LLM Serving Runtimes (see the runtime sketch below)
    - Support speculative decoding with the vLLM runtime [https://github.com/kserve/kserve/issues/3800].
    - Support LoRA adapters [https://github.com/kserve/kserve/issues/3750].
    - Support LLM serving runtimes for TensorRT-LLM and TGI, and provide benchmarking comparisons [https://github.com/kserve/kserve/issues/3868].
    - Support multi-host, multi-GPU inference runtimes [https://github.com/kserve/kserve/issues/2145].
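As a rough illustration of where these items converge, here is a minimal sketch of an `InferenceService` using the vLLM-backed Hugging Face runtime. The vLLM flags shown (`--enable-lora`, `--lora-modules`, `--speculative-model`, `--num-speculative-tokens`) exist in recent vLLM releases, but whether and how the KServe runtime passes them through is exactly what the issues above track, so treat every name and path here as an assumption:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-vllm                        # hypothetical service name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      runtime: kserve-huggingfaceserver   # the vLLM-backed runtime shipped with KServe
      storageUri: hf://meta-llama/Llama-3.1-8B-Instruct     # hypothetical model
      args:
        - --enable-lora                   # vLLM LoRA support (assumed pass-through)
        - --lora-modules=my-adapter=/mnt/adapters/my-adapter  # hypothetical adapter path
        - --speculative-model=/mnt/models/draft               # hypothetical draft model
        - --num-speculative-tokens=5
      resources:
        limits:
          nvidia.com/gpu: "1"
```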
- LLM Autoscaling (see the autoscaling sketch below)
    - Support model caching with automatic PV/PVC provisioning [https://github.com/kserve/kserve/issues/3869].
    - Support autoscaling settings for serving runtimes.
    - Support autoscaling based on custom metrics [https://github.com/kserve/kserve/issues/3561].
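For context, the v1beta1 spec already exposes `scaleMetric`/`scaleTarget` for HPA-based scaling in `RawDeployment` mode; scaling on LLM-specific custom metrics (queue depth, tokens per second, and the like) is what the issue above tracks. A minimal sketch using only the existing fields, with service and model names as placeholders:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-autoscale                     # hypothetical service name
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 4
    scaleMetric: cpu                      # today: resource metrics; custom LLM metrics are the roadmap item
    scaleTarget: 75                       # target utilization percentage for the HPA
    model:
      modelFormat:
        name: huggingface
      storageUri: hf://meta-llama/Llama-3.1-8B-Instruct     # hypothetical model
```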
- LLM RAG/Agent Pipeline Orchestration (see the InferenceGraph sketch below)
    - Support declarative RAG/Agent workflows using the KServe InferenceGraph [https://github.com/kserve/kserve/issues/3829].
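A declarative RAG flow would presumably chain a retriever and an LLM with the existing `Sequence` router; the graph below is a sketch under that assumption, with both backing `InferenceService` names hypothetical:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: rag-pipeline                      # hypothetical graph name
spec:
  nodes:
    root:
      routerType: Sequence                # run the steps one after another
      steps:
        - name: retrieve
          serviceName: retriever          # hypothetical InferenceService doing vector search
        - name: generate
          serviceName: llm                # hypothetical LLM InferenceService
          data: $response                 # feed the retriever's output to the LLM step
```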
- Open Inference Protocol extension to GenAI Task APIs
    - Community-maintained Open Inference Protocol repo for the OpenAI schema [https://docs.google.com/document/d/1odTMdIFdm01CbRQ6CpLzUIGVppHSoUvJV_zwcX6GuaU].
    - Support vertical GenAI Task APIs such as Embedding, Text-to-Image, Text-to-Code, and Doc-to-Text [https://github.com/kserve/kserve/issues/3572].
- LLM Gateway (see the hypothetical gateway sketch below)
    - Support multiple LLM providers.
    - Support token-based rate limiting.
    - Support an LLM router with traffic shaping, fallback, and load balancing.
    - LLM Gateway observability for metrics and cost reporting.
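None of the gateway surface exists yet, so the following is purely illustrative: a made-up resource showing what provider routing plus token-based limits could look like. Every field, the kind, and the API group itself are invented for this sketch:

```yaml
apiVersion: gateway.kserve.io/v1alpha1    # hypothetical API group, does not exist today
kind: LLMRoute                            # hypothetical kind
metadata:
  name: chat-route
spec:
  providers:                              # weighted routing across LLM backends
    - name: in-cluster-vllm
      weight: 80
    - name: openai-fallback               # fallback provider when the primary is unhealthy
      weight: 20
  rateLimit:
    tokensPerMinute: 100000               # token-based, not request-based, limiting
```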
## Objective: “Graduate core inference capability to stable/GA”
- Promote the `InferenceService` and `ClusterServingRuntime`/`ServingRuntime` CRDs to v1
- Improve the `InferenceService` CRD for the REST/gRPC protocol interface
- Improve the model storage interface
- Deprecate the `TrainedModel` CRD and add multi-model support (co-hosting, draft models, LoRA adapters) to `InferenceService`
- Improve the YAML UX for predictor and transformer container collocation (see the collocation sketch below)
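Collocation today means listing both containers inside the predictor spec, which is the YAML the roadmap item wants to streamline. A sketch of the current shape, with hypothetical images:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: collocated                        # hypothetical service name
spec:
  predictor:
    containers:
      - name: kserve-container            # the model server
        image: example/model-server:latest       # hypothetical image
      - name: transformer-container       # pre/post-processing sidecar in the same pod
        image: example/transformer:latest        # hypothetical image
```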
- Close the feature gap between `RawDeployment` and `Serverless` modes (see the deployment-mode sketch below)
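The mode is chosen per service with an annotation: `Serverless` (Knative) brings scale-to-zero and revision-based canary rollout, while `RawDeployment` uses plain Deployments plus an HPA, and closing the gap means the same spec behaves equivalently in either mode. For example:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-raw                       # hypothetical service name
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment   # default is Serverless (Knative)
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```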
- Open Inference Protocol
    - Support batching for the v2 inference protocol
    - Transformer and Explainer v2 inference protocol interoperability
    - Improve codecs for the v2 inference protocol
Reference: Control plane issues, Data plane issues, Serving Runtime issues.
## Objective: “Graduate KServe Python SDK to 1.0”
- Create standardized model packaging API
- Improve KServe model server observability with metrics and distributed tracing
- Support batch inference
Reference: Python SDK issues, Storage issues
## Objective: “Graduate InferenceGraph”
- Improve the `InferenceGraph` spec for replica and concurrency control (see the sketch after this list)
- Support distributed tracing
- Support gRPC for `InferenceGraph`
- Standalone `Transformer` support for `InferenceGraph`
- Support traffic mirroring node
- Improve `RawDeployment` mode for `InferenceGraph`
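Since the spec work is still open, the scaling fields below are assumptions rather than a committed API; recent releases expose replica bounds for graphs, and concurrency-based control is the part this objective targets:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: scaled-graph                      # hypothetical graph name
spec:
  minReplicas: 1                          # replica bounds (assumed available)
  maxReplicas: 5
  scaleMetric: concurrency                # hypothetical: concurrency control is the open item
  scaleTarget: 10                         # hypothetical target in-flight requests per replica
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: predict
          serviceName: model-a            # hypothetical InferenceService
```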
Reference: InferenceGraph issues
## Objective: “Secure InferenceService”
- Document the KServe ServiceMesh setup with mTLS (see the sketch after this list)
- Support programmatic authentication token
- Implement per service level auth
- Add support for SPIFFE/SPIRE identity integration with `InferenceService`
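Documenting the mesh setup presumably builds on standard Istio primitives rather than new KServe APIs; a minimal sketch that enforces strict mTLS for every workload in a model-serving namespace (namespace name hypothetical):

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: kserve-models                # hypothetical namespace hosting InferenceServices
spec:
  mtls:
    mode: STRICT                          # reject plaintext traffic to all workloads here
```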
Reference: Auth related issues
## Objective: “KServe 1.0 documentation”
- Add ModelMesh docs and explain the use cases for classic KServe and ModelMesh
- Unify the data plane v1 and v2 page formats
- Improve the v2 data plane docs to explain why it exists and what changed from v1
- Clean up the examples in the kserve repo and unify them with the website’s by creating a single source of truth for documentation
- Update any out-of-date documentation and make sure the website as a whole is consistent and cohesive