Quickstart Guide
Welcome to the KServe Quickstart Guide! This guide will help you set up a KServe Quickstart environment for testing and experimentation. KServe provides two deployment paths based on your use case:
- Generative AI (LLMInferenceService): For Large Language Models and generative AI workloads
- Predictive AI (InferenceService): For traditional ML models and predictive inference workloads
This guide will walk you through the prerequisites, installation steps, and how to verify your KServe environment is up and running. By the end of this guide, you will have a fully functional KServe environment ready for experimentation.
Prerequisites
Before you can get started with a KServe Quickstart deployment, you will need to ensure you have the following prerequisites installed:
Tools
Make sure you have the following tools installed:
- kubectl - The Kubernetes command-line tool
- helm - For installing KServe and other Kubernetes operators
- curl - For running the quickstart script and testing API endpoints (installed by default on most systems)
Run the following commands to verify that you have the required tools installed:
To verify kubectl installation, run:
kubectl version --client
To verify helm installation, run:
helm version
To verify curl installation, run:
curl --version
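If you prefer a single check, a short shell loop like the following (a minimal sketch) confirms all three tools are on your PATH:

# Check that kubectl, helm, and curl are all installed
for tool in kubectl helm curl; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "OK: $tool found at $(command -v "$tool")"
  else
    echo "MISSING: $tool is not on PATH"
  fi
done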
Kubernetes Cluster
Kubernetes version 1.30 or higher is required.
You will need a running Kubernetes cluster with a properly configured kubeconfig to run KServe. Any Kubernetes cluster will work; for local development and testing, we recommend kind (Kubernetes in Docker) or minikube.
Using Kind (Kubernetes in Docker):
If you want to run a local Kubernetes cluster, you can use Kind. It allows you to create a Kubernetes cluster using Docker container nodes.
First, ensure you have Docker installed on your machine. Install Kind by following the Kind Quick Start Guide if you haven't done so already.
Then, you can create a local Kubernetes cluster with the following command:
kind create cluster
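To make sure the cluster meets the 1.30+ requirement, you can pin the node image when creating the cluster. The cluster name and image tag below are illustrative; check the kind release notes for the kindest/node tags that match your kind version:

# Create a cluster pinned to a specific Kubernetes version (example tag)
kind create cluster --name kserve-quickstart --image kindest/node:v1.30.0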
Using Minikube:
If you prefer to use Minikube, you can follow the Minikube Quickstart Guide to set up a local Kubernetes cluster.
First, ensure you have Minikube installed on your machine. Then, you can start a local Kubernetes cluster with the following command:
minikube start
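minikube can likewise pin the Kubernetes version; the version shown here is illustrative:

# Start minikube with an explicit Kubernetes version (example value)
minikube start --kubernetes-version=v1.30.0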
If you have access to an existing Kubernetes cluster, you can use that as well. Ensure that your kubeconfig is properly configured to connect to the cluster. You can verify your current context with:
kubectl config current-context
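You can also confirm that kubectl can actually reach the cluster:

# Prints the control plane endpoint if the kubeconfig is working
kubectl cluster-info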
Verify your cluster meets the version requirements by running:
kubectl version --output=json
The server version in the output should show version 1.30 or higher:
{
  "serverVersion": {
    "major": "1",
    "minor": "30",
    ...
  }
}
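If you have jq installed, you can extract just the server version instead of scanning the full JSON:

# Prints the server version, e.g. v1.30.x
kubectl version --output=json | jq -r '.serverVersion.gitVersion'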
Install KServe Quickstart Environment
Once you have the prerequisites installed and a Kubernetes cluster running, you can proceed with the KServe Quickstart installation.
KServe Quickstart Environments are for experimentation use only. For production installation, see our Administrator's Guide.
Quick Install (Recommended)
The fastest way to get started with KServe is using the quick install script.
Generative AI (LLMInferenceService)
Choose your installation option based on your needs:
- KServe (Standard) + LLMInferenceService: Install both KServe (Standard) and LLMInferenceService for complete functionality
- LLMInferenceService Only: Install only LLMInferenceService components without KServe (Standard)
- Dependencies Only: Install infrastructure dependencies first, then customize your installation
Install all dependencies, KServe (Standard), and LLMInferenceService:
curl -s "https://raw.githubusercontent.com/kserve/kserve/master/hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh" | bash
What gets installed:
Infrastructure Components for KServe Standard:
- ✅ KEDA (for Standard KServe autoscaling)
- ✅ KEDA OpenTelemetry Addon (for Standard KServe autoscaling)
Infrastructure Components for LLMInferenceService:
- ✅ External Load Balancer (MetalLB for local clusters)
- ✅ Cert Manager
- ✅ Gateway API CRDs
- ✅ Gateway API Inference Extension CRDs
- ✅ Envoy Gateway
- ✅ Envoy AI Gateway
- ✅ LeaderWorkerSet (multi-node deployments)
- ✅ GatewayClass
- ✅ Gateway
KServe Components:
- ✅ KServe CRDs and Controller (Standard)
- ✅ LLMInferenceService CRDs and Controller
Component versions are managed centrally in kserve-deps.env. Check that file for the latest versions used by the installation script.
Installation time: ~5-10 minutes
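While the script runs, you can watch components come up from a second terminal; for example, to block until the KServe pods are Ready (the timeout value is just a suggestion):

# Wait for all pods in the kserve namespace to become Ready
kubectl wait --for=condition=Ready pods --all -n kserve --timeout=600s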
Install all dependencies and LLMInferenceService (without KServe Standard):
curl -s "https://raw.githubusercontent.com/kserve/kserve/master/hack/setup/quick-install/llmisvc-full-install-with-manifests.sh" | bash
What gets installed:
Infrastructure Components:
- ✅ Cert Manager
- ✅ External Load Balancer (MetalLB for local clusters)
LLMInferenceService Components:
- ✅ Gateway API CRDs
- ✅ Gateway API Inference Extension
- ✅ Envoy Gateway
- ✅ Envoy AI Gateway
- ✅ LeaderWorkerSet (multi-node deployments)
- ✅ GatewayClass
- ✅ Gateway
- ✅ LLMInferenceService CRDs and Controller
This installs only LLMInferenceService components. KServe (Standard) is not included.
Installation time: ~5-10 minutes
Install only the infrastructure dependencies for LLMInferenceService, without any KServe components:
curl -s "https://raw.githubusercontent.com/kserve/kserve/master/hack/setup/quick-install/llmisvc-dependency-install.sh" | bash
This is useful when you want to:
- Install LLMInferenceService controller manually later
- Use a specific version of LLMInferenceService
- Customize LLMInferenceService installation with specific Helm values
After installing the dependencies, you can install the LLMInferenceService controller separately:
# Install LLMInferenceService CRDs
helm install kserve-llmisvc-crd oci://ghcr.io/kserve/charts/kserve-llmisvc-crd \
  --version <version> \
  --namespace kserve \
  --create-namespace

# Install LLMInferenceService Controller
helm install kserve-llmisvc oci://ghcr.io/kserve/charts/kserve-llmisvc-resources \
  --version <version> \
  --namespace kserve
Replace <version> with the desired version. Check available versions at KServe Releases or in kserve-deps.env.
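Once both charts are installed, you can confirm the releases deployed successfully:

# Both releases should report STATUS "deployed"
helm list --namespace kserve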
The quick install script automatically configures MetalLB when it detects a local cluster (kind, minikube), providing LoadBalancer support for local testing.
Predictive AI (InferenceService)
Choose a deployment mode:
Standard Deployment:
curl -s "https://raw.githubusercontent.com/kserve/kserve/master/hack/quick_install.sh" | bash -s -- -r
Knative:
curl -s "https://raw.githubusercontent.com/kserve/kserve/master/hack/quick_install.sh" | bash
Verify Installation
After installation, verify all components are working:
Generative AI (LLMInferenceService)
# Check all pods are running
kubectl get pods -n cert-manager
kubectl get pods -n envoy-gateway-system
kubectl get pods -n envoy-ai-gateway-system
kubectl get pods -n lws-system
kubectl get pods -n kserve
# Check LLMInferenceService CRD
kubectl get crd llminferenceservices.serving.kserve.io
# Check Gateway status
kubectl get gateway kserve-ingress-gateway -n kserve
# Check Gateway has external IP (may take a few minutes)
kubectl get gateway kserve-ingress-gateway -n kserve -o jsonpath='{.status.addresses[0].value}'
Expected output:
- ✅ All pods in `Running` state
- ✅ Gateway shows `PROGRAMMED: True`
- ✅ Gateway has an `EXTERNAL-IP` or `ADDRESS` assigned
You should see the LLMInferenceService controller up and running:
NAME                                          READY   STATUS    RESTARTS   AGE
llmisvc-controller-manager-7f5b6c4d8f-abcde   1/1     Running   0          2m
Gateway should have an address:
NAME                                                        CLASS   ADDRESS         PROGRAMMED   AGE
gateway.gateway.networking.k8s.io/kserve-ingress-gateway    envoy   <external-ip>   True         2m
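For later testing, it can be convenient to capture the Gateway address in a shell variable (a small convenience sketch; re-run it if the address has not been assigned yet):

# Store the Gateway's external address for use in test requests
GATEWAY_IP=$(kubectl get gateway kserve-ingress-gateway -n kserve \
  -o jsonpath='{.status.addresses[0].value}')
echo "Gateway address: ${GATEWAY_IP}"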
Predictive AI (InferenceService)
kubectl get pods -n kserve
You should see the KServe controllers up and running:
NAME                                                    READY   STATUS    RESTARTS   AGE
kserve-controller-manager-7f5b6c4d8f-abcde              1/1     Running   0          2m
kserve-localmodel-controller-manager-5b8b6574c7-jz42m   1/1     Running   0          2m
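If the pods are still starting, you can block until they are Ready. The label selector below is an assumption based on common KServe manifests; adjust it if your installation labels differ:

# Wait for the KServe controller pods (assumed label selector)
kubectl wait --for=condition=Ready pod \
  -l control-plane=kserve-controller-manager \
  -n kserve --timeout=300s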
Next Steps
Now that you have a KServe Quickstart environment set up, you can start deploying and testing machine learning models. Here are some recommended next steps:
Generative AI (LLMInferenceService)
- 📖 First LLMInferenceService - Deploy your first LLM using LLMInferenceService
- 📖 LLMInferenceService Overview - Learn about LLMInferenceService architecture and features
- 📖 LLMInferenceService Configuration - Explore configuration options for your LLM deployments
- 📖 First GenAI InferenceService - Deploy your first GenAI model using InferenceService

Predictive AI (InferenceService)
- 📖 First Predictive InferenceService - Deploy your first predictive model using InferenceService