Kubernetes Deployment Installation Guide - LLMIsvc
LLMInferenceService is KServe's dedicated solution for Generative AI inference workloads, providing advanced features like:
- Intelligent Routing: KV cache-aware scheduling, prefill-decode separation
- Multi-Node Orchestration: Data parallelism, expert parallelism via LeaderWorkerSet
- Gateway API Native: Built on Kubernetes Gateway API with Inference Extension
- Autoscaling: Integration with KEDA for custom metrics-based scaling
LLMInferenceService is designed specifically for Generative AI workloads (LLMs). For Predictive AI workloads, use InferenceService.
Installation Requirements
Minimum Requirements
- Kubernetes: Version 1.30+
- Cert Manager: Version 1.16.0+
- Gateway API: Version 1.2.1
- Gateway API Inference Extension (GIE): Version 0.3.0
- Gateway Provider: Envoy Gateway v1.5.0+
- LeaderWorkerSet: Version 0.6.2+ (for multi-node deployments)
For detailed dependency information and step-by-step installation, see LLMInferenceService Dependencies.
Prerequisites
- kubectl configured to access your cluster
- Cluster admin permissions
- helm v3+ installed
The fastest way to get started with LLMInferenceService is using the quick install script. Please refer to the Quickstart Guide.
Installation
KServe provides installation scripts for infrastructure-related dependencies and CLI tools, with component versions managed in a single central location. This section shows how to use these scripts to install the components required for LLMInferenceService. Each component can also be installed in whatever way suits your environment; for example, you can reuse a GatewayClass you already have and choose the Gateway API provider that fits your environment.
1. Clone KServe Repository
git clone https://github.com/kserve/kserve.git
cd kserve/hack/setup
2. Install Infrastructure Components
Install each component in the following order. Each script supports --install (default), --uninstall, and --reinstall options.
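For example, the same script can be used to reinstall or remove a component; the flags shown here are the ones listed above, illustrated with the Cert Manager script from the next step:
```bash
# Default action is --install
infra/manage.cert-manager.sh

# Uninstall and install again in one step
infra/manage.cert-manager.sh --reinstall

# Remove the component
infra/manage.cert-manager.sh --uninstall
```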
External Load Balancer (Local Clusters Only)
For local development on Kind or Minikube:
infra/external-lb/manage.external-lb.sh
Skip this step if you're using a cloud provider (AWS, GCP, Azure) that provides native LoadBalancer support.
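To confirm that LoadBalancer Services can obtain an address afterwards, a plain kubectl check (nothing specific to the script) is enough:
```bash
# Services of type LoadBalancer should eventually show an EXTERNAL-IP
# instead of staying in <pending>.
kubectl get svc --all-namespaces | grep LoadBalancer
```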
Cert Manager
infra/manage.cert-manager.sh
Cert Manager is required for webhook certificates and for the LeaderWorkerSet operator, and is essential for a production-grade installation.
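You can verify that cert-manager is up before moving on; the cert-manager namespace below is the upstream default and assumes the script does not override it:
```bash
# All cert-manager pods should reach Running/Ready
kubectl get pods -n cert-manager
# Wait for the deployments to become available
kubectl wait --for=condition=Available deployment --all -n cert-manager --timeout=180s
```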
Gateway API & Inference Extension CRDs
Installs both Gateway API CRDs and the Inference Extension (GIE):
infra/gateway-api/manage.gateway-api-crd.sh
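To confirm the CRDs were registered, you can filter by API group; the group names below follow upstream Gateway API and GIE conventions and are an assumption about what the script installs:
```bash
# Core Gateway API resources (gateway.networking.k8s.io)
kubectl get crd | grep gateway.networking.k8s.io
# Inference Extension resources (e.g. InferencePool) under inference.networking.*
kubectl get crd | grep inference.networking
```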
Envoy Gateway
The Gateway API provider used for routing:
infra/manage.envoy-gateway.sh
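Once the script finishes, the controller should be running; the deployment and namespace names below are the upstream Envoy Gateway defaults and may differ if the script customizes them:
```bash
kubectl get pods -n envoy-gateway-system
kubectl wait --for=condition=Available deployment/envoy-gateway -n envoy-gateway-system --timeout=180s
```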
Envoy AI Gateway
The Gateway API Inference Extension (GIE) provider used for routing:
infra/manage.envoy-ai-gateway.sh
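A similar check applies to the AI Gateway extension; the namespace below is the upstream default and is an assumption for this sketch:
```bash
# The AI Gateway controller pods should reach Running/Ready
kubectl get pods -n envoy-ai-gateway-system
```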
LeaderWorkerSet Operator
Required for multi-node deployments (Data/Expert Parallelism):
infra/manage.lws-operator.sh
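To check the operator, you can look for its pods and CRD; lws-system and the CRD name below are the upstream LeaderWorkerSet defaults and are assumptions here:
```bash
kubectl get pods -n lws-system
kubectl get crd leaderworkersets.leaderworkerset.x-k8s.io
```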
GatewayClass
infra/gateway-api/managed.gateway-api-gwclass.sh
Gateway Instance
infra/gateway-api/managed.gateway-api-gw.sh
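These two scripts create the GatewayClass and the Gateway instance that LLMInferenceService routes are expected to attach to. Both can be inspected with standard Gateway API queries:
```bash
# The Gateway should be accepted by its GatewayClass and become Programmed
kubectl get gatewayclass
kubectl get gateway --all-namespaces
```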
3. Install KServe Components
Choose your installation method based on your needs:
- LLMIsvc Only (Helm)
- LLMIsvc Only (Kustomize)
- Full KServe (Helm)
- Full KServe (Kustomize)
Install only LLMInferenceService CRDs and controller using helm:
LLMISVC=true infra/manage.kserve-helm.sh
This installs only the LLMInferenceService components. InferenceService is not included.
Install only LLMInferenceService CRDs and controller using kustomize:
LLMISVC=true infra/manage.kserve-kustomize.sh
This provides more granular control over the installation compared to Helm.
Install both InferenceService and LLMInferenceService:
infra/manage.kserve-helm.sh
This installs the complete KServe stack including both InferenceService (for Predictive AI) and LLMInferenceService (for Generative AI).
Install both InferenceService and LLMInferenceService using kustomize:
infra/manage.kserve-kustomize.sh
This installs the complete KServe stack with kustomize for greater customization flexibility.
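Whichever method you choose, you can verify the controller and CRDs afterwards; the kserve namespace and the CRD name pattern below are KServe defaults and are assumptions for this sketch:
```bash
# The KServe controller manager should be running
kubectl get pods -n kserve
# The LLMInferenceService CRDs should be registered
kubectl get crd | grep llminferenceservice
```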
Next Steps
Now that LLMInferenceService is installed, you can:
- Deploy Your First LLM: Follow the Quick Start Guide (a minimal sketch is shown after this list)
- Understand the Architecture: Read LLMInferenceService Overview
- Explore Configuration Options: Check LLMInferenceService Configuration
- Learn Advanced Features:
- Multi-Node Deployments - Data/Expert parallelism
- Prefill-Decode Separation - Performance optimization
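As a preview of that first deployment, a minimal manifest might look roughly like the sketch below. The API version, field names, model URI scheme, and resource name are assumptions based on the v1alpha1 API; treat the Quick Start Guide as the authoritative example.
```bash
# Illustrative only -- field names and values are assumptions, not the
# authoritative example from the Quick Start Guide.
kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: my-first-llm
spec:
  model:
    uri: hf://facebook/opt-125m     # model source (assumed URI scheme)
    name: facebook/opt-125m
  replicas: 1
  router:
    route: {}        # managed HTTPRoute defaults (assumed)
    gateway: {}      # managed Gateway defaults (assumed)
    scheduler: {}    # enable the inference scheduler (assumed)
EOF

# Watch the resource until it reports Ready
kubectl get llminferenceservice my-first-llm -w
```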