Announcing KServe v0.18 - Multi-Node Inference, OpenAI Responses API, and llm-d v0.6
Published on April 29, 2026
We are excited to announce the release of KServe v0.18. This release brings multi-node inference without Ray, LeaderWorkerSet (LWS)-based autoscaling for multi-node workloads, OpenAI Responses API routing, a namespace-scoped ModelCache, an upgrade to vLLM v0.19.0, llm-d v0.6 integration, security hardening, and improved GKE Gateway compatibility.
