Kubernetes Deployment Installation Guide - LLMIsvc
LLMInferenceService is KServe's dedicated solution for Generative AI inference workloads, providing advanced features like:
- Intelligent Routing: KV cache-aware scheduling, prefill-decode separation
- Multi-Node Orchestration: Data parallelism, expert parallelism via LeaderWorkerSet
- Gateway API Native: Built on Kubernetes Gateway API with Inference Extension
- Autoscaling: Integration with KEDA for custom metrics-based scaling
LLMInferenceService is designed specifically for Generative AI workloads (LLMs). For Predictive AI workloads, use InferenceService.
Installation Requirements
Minimum Requirements
- Kubernetes: Version 1.30+
- Cert Manager: Version 1.16.0+
- Gateway API: Version 1.2.1
- Gateway API Inference Extension (GIE): Version 0.3.0
- Gateway Provider: Envoy Gateway v1.5.0+
- LeaderWorkerSet: Version 0.6.2+ (for multi-node deployments)
For detailed dependency information and step-by-step installation, see LLMInferenceService Dependencies.
Prerequisites
- kubectl configured to access your cluster
- Cluster admin permissions
- helm v3+ installed
The fastest way to get started with LLMInferenceService is using the quick install script. Please refer to the Quickstart Guide.
Installation
KServe provides installation scripts for infrastructure-related dependencies and CLI tools, with component versions managed in a single central location. This section shows how to use these scripts to install the components required for LLMInferenceService. Each component can also be installed in whatever way suits your environment; for example, you can reuse a GatewayClass you already have and choose the Gateway API provider that fits your environment.
1. Clone KServe Repository
git clone https://github.com/kserve/kserve.git
cd kserve/hack/setup
2. Install Infrastructure Components
Install each component in the following order. Each script supports --install (default), --uninstall, and --reinstall options.
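For example, the same script can be used to reinstall or remove a component; the flags shown here are the ones listed above, illustrated with the Cert Manager script from the next step:
```bash
# Default action is --install
infra/manage.cert-manager.sh

# Uninstall and install again in one step
infra/manage.cert-manager.sh --reinstall

# Remove the component
infra/manage.cert-manager.sh --uninstall
```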
External Load Balancer (Local Clusters Only)
For local development on Kind or Minikube:
infra/external-lb/manage.external-lb.sh
Skip this step if you're using a cloud provider (AWS, GCP, Azure) that provides native LoadBalancer support.
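To confirm that LoadBalancer Services can obtain an address afterwards, a plain kubectl check (nothing specific to the script) is enough:
```bash
# Services of type LoadBalancer should eventually show an EXTERNAL-IP
# instead of staying in <pending>.
kubectl get svc --all-namespaces | grep LoadBalancer
```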
Cert Manager
infra/manage.cert-manager.sh
Cert Manager is required for webhook certificates and for the LeaderWorkerSet operator, and is essential for a production-grade installation.
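You can verify that cert-manager is up before moving on; the cert-manager namespace below is the upstream default and assumes the script does not override it:
```bash
# All cert-manager pods should reach Running/Ready
kubectl get pods -n cert-manager
# Wait for the deployments to become available
kubectl wait --for=condition=Available deployment --all -n cert-manager --timeout=180s
```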
Gateway API & Inference Extension CRDs
Installs both Gateway API CRDs and the Inference Extension (GIE):
infra/gateway-api/manage.gateway-api-crd.sh
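To confirm the CRDs were registered, you can filter by API group; the group names below follow upstream Gateway API and GIE conventions and are an assumption about what the script installs:
```bash
# Core Gateway API resources (gateway.networking.k8s.io)
kubectl get crd | grep gateway.networking.k8s.io
# Inference Extension resources (e.g. InferencePool) under inference.networking.*
kubectl get crd | grep inference.networking
```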
Envoy Gateway
The Gateway API provider used for routing:
infra/manage.envoy-gateway.sh
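Once the script finishes, the controller should be running; the deployment and namespace names below are the upstream Envoy Gateway defaults and may differ if the script customizes them:
```bash
kubectl get pods -n envoy-gateway-system
kubectl wait --for=condition=Available deployment/envoy-gateway -n envoy-gateway-system --timeout=180s
```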
Envoy AI Gateway
The Gateway API Inference Extension (GIE) provider used for routing:
infra/manage.envoy-ai-gateway.sh
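A similar check applies to the AI Gateway extension; the namespace below is the upstream default and is an assumption for this sketch:
```bash
# The AI Gateway controller pods should reach Running/Ready
kubectl get pods -n envoy-ai-gateway-system
```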
LeaderWorkerSet Operator
Required for multi-node deployments (Data/Expert Parallelism):
infra/manage.lws-operator.sh
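To check the operator, you can look for its pods and CRD; lws-system and the CRD name below are the upstream LeaderWorkerSet defaults and are assumptions here:
```bash
kubectl get pods -n lws-system
kubectl get crd leaderworkersets.leaderworkerset.x-k8s.io
```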
GatewayClass
infra/gateway-api/managed.gateway-api-gwclass.sh
Gateway Instance
infra/gateway-api/managed.gateway-api-gw.sh
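These two scripts create the GatewayClass and the Gateway instance that LLMInferenceService routes are expected to attach to. Both can be inspected with standard Gateway API queries:
```bash
# The Gateway should be accepted by its GatewayClass and become Programmed
kubectl get gatewayclass
kubectl get gateway --all-namespaces
```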
3. Install KServe Components
Choose your installation method based on your needs:
- LLMIsvc Only (Helm)
- LLMIsvc Only (Kustomize)
- Full KServe (Helm)
- Full KServe (Kustomize)
Install only LLMInferenceService CRDs and controller using helm:
LLMISVC=true infra/manage.kserve-helm.sh
This installs only the LLMInferenceService components. InferenceService is not included.
Install only LLMInferenceService CRDs and controller using kustomize:
LLMISVC=true infra/manage.kserve-kustomize.sh
This provides more granular control over the installation compared to Helm.
Install both InferenceService and LLMInferenceService:
infra/manage.kserve-helm.sh
This installs the complete KServe stack including both InferenceService (for Predictive AI) and LLMInferenceService (for Generative AI).
Install both InferenceService and LLMInferenceService using kustomize:
infra/manage.kserve-kustomize.sh
This installs the complete KServe stack with kustomize for greater customization flexibility.
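Whichever method you choose, you can verify the controller and CRDs afterwards; the kserve namespace and the CRD name pattern below are KServe defaults and are assumptions for this sketch:
```bash
# The KServe controller manager should be running
kubectl get pods -n kserve
# The LLMInferenceService CRDs should be registered
kubectl get crd | grep llminferenceservice
```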
Next Steps
Now that LLMInferenceService is installed, you can:
- Deploy Your First LLM: Follow the Quick Start Guide (a minimal sketch is shown after this list)
- Understand the Architecture: Read LLMInferenceService Overview
- Explore Configuration Options: Check LLMInferenceService Configuration
- Learn Advanced Features:
- Multi-Node Deployments - Data/Expert parallelism
- Prefill-Decode Separation - Performance optimization
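As a preview of that first deployment, a minimal manifest might look roughly like the sketch below. The API version, field names, model URI scheme, and resource name are assumptions based on the v1alpha1 API; treat the Quick Start Guide as the authoritative example.
```bash
# Illustrative only -- field names and values are assumptions, not the
# authoritative example from the Quick Start Guide.
kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: my-first-llm
spec:
  model:
    uri: hf://facebook/opt-125m     # model source (assumed URI scheme)
    name: facebook/opt-125m
  replicas: 1
  router:
    route: {}        # managed HTTPRoute defaults (assumed)
    gateway: {}      # managed Gateway defaults (assumed)
    scheduler: {}    # enable the inference scheduler (assumed)
EOF

# Watch the resource until it reports Ready
kubectl get llminferenceservice my-first-llm -w
```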