
Kubernetes Deployment Installation Guide - LLMIsvc

LLMInferenceService is KServe's dedicated solution for Generative AI inference workloads, providing advanced features like:

  • Intelligent Routing: KV cache-aware scheduling, prefill-decode separation
  • Multi-Node Orchestration: Data parallelism, expert parallelism via LeaderWorkerSet
  • Gateway API Native: Built on Kubernetes Gateway API with Inference Extension
  • Autoscaling: Integration with KEDA for custom metrics-based scaling
note

LLMInferenceService is designed specifically for Generative AI workloads (LLMs). For Predictive AI workloads, use InferenceService.

Installation Requirements

Minimum Requirements

  • Kubernetes: Version 1.30+
  • Cert Manager: Version 1.16.0+
  • Gateway API: Version 1.2.1
  • Gateway API Inference Extension (GIE): Version 0.3.0
  • Gateway Provider: Envoy Gateway v1.5.0+
  • LeaderWorkerSet: Version 0.6.2+ (for multi-node deployments)
tip

For detailed dependency information and step-by-step installation, see LLMInferenceService Dependencies.

Prerequisites

  • kubectl configured to access your cluster
  • Cluster admin permissions
  • helm v3+ installed
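
A quick way to confirm these prerequisites is to check the client tools and your cluster access (standard kubectl/helm commands; the last command only approximates a cluster-admin check):

# Check client tool versions
kubectl version --client
helm version

# Verify cluster access and (approximately) cluster-admin permissions
kubectl cluster-info
kubectl auth can-i '*' '*' --all-namespaces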

The fastest way to get started with LLMInferenceService is to use the quick install script; please refer to the Quickstart Guide.

Installation

KServe provides installation scripts for infrastructure-related dependencies and CLI tools, with versions managed in a single central location. This guide demonstrates how to use these scripts to install the components required for LLMInferenceService. Each component can be installed in whatever way fits your environment; for example, you can keep the GatewayClass you are already using and choose the Gateway API provider that suits your setup.

1. Clone KServe Repository

git clone https://github.com/kserve/kserve.git
cd kserve/hack/setup

2. Install Infrastructure Components

Install each component in the following order. Each script supports --install (default), --uninstall, and --reinstall options.
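
For example, to reinstall a component instead of installing it (the Cert Manager script is used here purely as an illustration):

# --install is the default; pass --uninstall or --reinstall to change the action
infra/manage.cert-manager.sh --reinstall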

External Load Balancer (Local Clusters Only)

For local development on Kind or Minikube:

infra/external-lb/manage.external-lb.sh
note

Skip this step if you're using a cloud provider (AWS, GCP, Azure) that provides native LoadBalancer support.

Cert Manager

infra/manage.cert-manager.sh
note

Cert Manager is required for webhook certificates and the LeaderWorkerSet operator. It is essential for a production-grade installation.
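
To verify the installation, check that the Cert Manager pods are running (this assumes the default cert-manager namespace):

kubectl get pods -n cert-manager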

Gateway API & Inference Extension CRDs

Installs both Gateway API CRDs and the Inference Extension (GIE):

infra/gateway-api/manage.gateway-api-crd.sh
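
You can confirm that the CRDs were registered with a quick filter (the exact CRD names depend on the Gateway API and GIE versions installed):

# Expect Gateway API CRDs (gateways, httproutes, ...) and GIE CRDs (inferencepools, ...)
kubectl get crds | grep -E 'gateway|inference'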

Envoy Gateway

The Gateway API provider for routing:

infra/manage.envoy-gateway.sh
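
To check that the controller came up, list its pods (this assumes Envoy Gateway's default envoy-gateway-system namespace):

kubectl get pods -n envoy-gateway-system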

Envoy AI Gateway

The Gateway API Inference Extension (GIE) provider for routing:

infra/manage.envoy-ai-gateway.sh
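
You can verify the Envoy AI Gateway controller the same way; since its namespace depends on the install script, listing across all namespaces is the safest check:

kubectl get pods -A | grep -i envoy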

LeaderWorkerSet Operator

Required for multi-node deployments (Data/Expert Parallelism):

infra/manage.lws-operator.sh
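
To confirm the operator is running and its CRD is available (this assumes the default lws-system namespace; the CRD name shown is the upstream LeaderWorkerSet CRD):

kubectl get pods -n lws-system
kubectl get crd leaderworkersets.leaderworkerset.x-k8s.io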

GatewayClass

infra/gateway-api/managed.gateway-api-gwclass.sh
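
Afterwards, you can list the available classes and confirm the new GatewayClass is Accepted:

kubectl get gatewayclass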

Gateway Instance

infra/gateway-api/managed.gateway-api-gw.sh
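
To confirm the Gateway was created and programmed by the provider (its name and namespace depend on the script), list Gateways across all namespaces:

kubectl get gateway -A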

3. Install KServe Components

Choose your installation method based on your needs:

Install only the LLMInferenceService CRDs and controller using Helm:

LLMISVC=true infra/manage.kserve-helm.sh
success

This installs only the LLMInferenceService components. InferenceService is not included.
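
To verify the controller and CRD, you can run the commands below (this assumes the chart installs into the kserve namespace; the CRD name is the expected plural form of LLMInferenceService):

kubectl get pods -n kserve
kubectl get crd llminferenceservices.serving.kserve.io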

Next Steps

Now that LLMInferenceService is installed, you can:

  1. Deploy Your First LLM: Follow the Quick Start Guide
  2. Understand the Architecture: Read LLMInferenceService Overview
  3. Explore Configuration Options: Check LLMInferenceService Configuration
  4. Learn Advanced Features: