Quickstart Guide

Welcome to the KServe Quickstart Guide! This guide will help you set up a KServe Quickstart environment for testing and experimentation. KServe provides two deployment paths based on your use case:

  • Generative AI (LLMInferenceService): For Large Language Models and generative AI workloads
  • Predictive AI (InferenceService): For traditional ML models and predictive inference workloads

This guide will walk you through the prerequisites, installation steps, and how to verify your KServe environment is up and running. By the end of this guide, you will have a fully functional KServe environment ready for experimentation.

Prerequisites

Before getting started with a KServe Quickstart deployment, make sure you have the following prerequisites installed:

Tools

Make sure you have the following tools installed:

  • kubectl - the Kubernetes command-line tool
  • helm - for installing KServe and other Kubernetes operators
  • curl - for running the quickstart script and testing API endpoints (installed by default on most systems)

Verify Installations

Run the following commands to verify that you have the required tools installed:

To verify kubectl installation, run:

kubectl version --client

To verify helm installation, run:

helm version

To verify curl installation, run:

curl --version
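The three commands above only print version information for tools that exist. As a convenience, a small POSIX-sh helper can flag anything missing from PATH in one pass (the `check_tools` function name is our own, not part of KServe):

```shell
# check_tools: print a status line for each named tool and
# return nonzero if any of them is missing from PATH.
check_tools() {
  rc=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool" >&2
      rc=1
    fi
  done
  return $rc
}

check_tools kubectl helm curl || echo "install the missing tools before continuing"
```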

Kubernetes Cluster

Version Requirements

Kubernetes version 1.30 or higher is required.

You will need a running Kubernetes cluster with a properly configured kubeconfig to run KServe. You can use any Kubernetes cluster, but for local development and testing we recommend kind (Kubernetes in Docker) or minikube.
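If you already have a cluster, one way to confirm it meets the 1.30 version floor is to compare the reported version string against it. A minimal sketch in POSIX sh (the `k8s_version_ok` helper is our own name; it assumes a `vMAJOR.MINOR.PATCH` git version string such as kubectl reports):

```shell
# k8s_version_ok: succeed if the given Kubernetes version string
# (e.g. "v1.31.2") is 1.30 or higher.
k8s_version_ok() {
  v=${1#v}           # strip a leading "v"
  major=${v%%.*}     # text before the first dot
  rest=${v#*.}
  minor=${rest%%.*}  # text between the first and second dots
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 30 ]; }
}

k8s_version_ok "v1.31.2" && echo "v1.31.2 is supported"
k8s_version_ok "v1.29.5" || echo "v1.29.5 is too old"
```

With a live cluster you could feed it the server version, for example `k8s_version_ok "$(kubectl version --output=json | jq -r .serverVersion.gitVersion)"` (assuming `jq` is available).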

Using Kind (Kubernetes in Docker):

If you want to run a local Kubernetes cluster, you can use Kind. It allows you to create a Kubernetes cluster using Docker container nodes.

First, ensure you have Docker installed on your machine. Install Kind by following the Kind Quick Start Guide if you haven't done so already.

Then, you can create a local Kubernetes cluster with the following command:

kind create cluster

Using Minikube:

If you prefer to use Minikube, you can follow the Minikube Quickstart Guide to set up a local Kubernetes cluster.

First, ensure you have Minikube installed on your machine. Then, you can start a local Kubernetes cluster with the following command:

minikube start

Install KServe Quickstart Environment

Once you have the prerequisites installed and a Kubernetes cluster running, you can proceed with the KServe Quickstart installation.

warning

KServe Quickstart Environments are for experimentation use only. For production installation, see our Administrator's Guide.

The fastest way to get started with KServe is using the quick install script.

Choose your installation option based on your needs:

  • KServe (Standard) + LLMInferenceService: Install both KServe (Standard) and LLMInferenceService for complete functionality
  • LLMInferenceService Only: Install only LLMInferenceService components without KServe (Standard)
  • Dependencies Only: Install infrastructure dependencies first, then customize your installation

Install all dependencies, KServe (Standard), and LLMInferenceService:

curl -s "https://raw.githubusercontent.com/kserve/kserve/master/hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh" | bash

What gets installed:

Infrastructure Components for KServe Standard:

  1. ✅ KEDA (for Standard KServe autoscaling)
  2. ✅ KEDA OpenTelemetry Addon (for Standard KServe autoscaling)

Infrastructure Components for LLMInferenceService:

  1. ✅ External Load Balancer (MetalLB for local clusters)
  2. ✅ Cert Manager
  3. ✅ Gateway API CRDs
  4. ✅ Gateway API Inference Extension CRDs
  5. ✅ Envoy Gateway
  6. ✅ Envoy AI Gateway
  7. ✅ LeaderWorkerSet (multi-node deployments)
  8. ✅ GatewayClass
  9. ✅ Gateway

KServe Components:

  1. ✅ KServe CRDs and Controller (Standard)
  2. ✅ LLMInferenceService CRDs and Controller

Component Versions

Component versions are managed in a single versions file in the KServe repository; check that file for the latest versions used by the installation script.

Installation time: ~5-10 minutes

Local Development

The quick install script automatically configures MetalLB if detected (for kind, minikube), providing LoadBalancer support for local testing.


Verify Installation

After installation, verify all components are working:

# Check all pods are running
kubectl get pods -n cert-manager
kubectl get pods -n envoy-gateway-system
kubectl get pods -n envoy-ai-gateway-system
kubectl get pods -n lws-system
kubectl get pods -n kserve

# Check LLMInferenceService CRD
kubectl get crd llminferenceservices.serving.kserve.io

# Check Gateway status
kubectl get gateway kserve-ingress-gateway -n kserve

# Check Gateway has external IP (may take a few minutes)
kubectl get gateway kserve-ingress-gateway -n kserve -o jsonpath='{.status.addresses[0].value}'
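Because the gateway address can take a few minutes to appear, polling in a loop is more convenient than rerunning the command by hand. One sketch is a generic `retry` helper (our own name, not part of KServe or kubectl):

```shell
# retry: run a command up to ATTEMPTS times, sleeping DELAY seconds
# between tries; succeed as soon as the command does.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example (waits up to ~5 minutes for the gateway address):
#   retry 30 10 sh -c '[ -n "$(kubectl get gateway kserve-ingress-gateway \
#     -n kserve -o jsonpath="{.status.addresses[0].value}")" ]'
```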

Expected output:

  • ✅ All pods in Running state
  • ✅ Gateway shows READY: True
  • ✅ Gateway has EXTERNAL-IP or ADDRESS assigned
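If you prefer a scripted check over eyeballing pod lists, a small parser over `kubectl get pods --no-headers` output is one option (the `pods_all_running` name is our own; it treats Running and Completed as healthy):

```shell
# pods_all_running: read "kubectl get pods --no-headers" output on stdin
# and succeed only if every pod's STATUS is Running or Completed.
pods_all_running() {
  rc=0
  while read -r name ready status rest; do
    case "$status" in
      Running|Completed|"") ;;  # empty status covers blank lines
      *) echo "not ready: $name ($status)" >&2; rc=1 ;;
    esac
  done
  return $rc
}

# Example input; with a live cluster you would pipe in:
#   kubectl get pods -n kserve --no-headers | pods_all_running
printf '%s\n' \
  'llmisvc-controller-manager-7f5b6c4d8f-abcde 1/1 Running 0 2m' |
  pods_all_running && echo "all pods ready"
```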

You should see the LLMInferenceService controller up and running:

NAME                                          READY   STATUS    RESTARTS   AGE
llmisvc-controller-manager-7f5b6c4d8f-abcde   1/1     Running   0          2m

Gateway should have an address:

NAME                                                      CLASS   ADDRESS         PROGRAMMED   AGE
gateway.gateway.networking.k8s.io/kserve-ingress-gateway  envoy   <external-ip>   True         2m

Next Steps

Now that you have a KServe Quickstart environment set up, you can start deploying and testing machine learning models. Here are some recommended next steps: