Kubernetes Deployment Installation
KServe supports `RawDeployment` mode to enable `InferenceService` deployment for both Predictive Inference and Generative Inference workloads with minimal dependencies on Kubernetes resources: `Deployment`, `Service`, `Ingress`/`Gateway API` and `Horizontal Pod Autoscaler`. Compared to `Serverless` mode, which depends on Knative for request-driven autoscaling, in `RawDeployment` mode KEDA can optionally be installed to enable autoscaling based on custom metrics. Note that Scale from Zero is currently not supported in `RawDeployment` mode for HTTP requests.
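The installation steps below set the cluster-wide default, but for reference, an individual service can also select the mode through the `serving.kserve.io/deploymentMode` annotation. A minimal sketch, using the standard KServe sklearn example model:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  annotations:
    # Overrides the cluster-wide default deployment mode for this service
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model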
Installation Requirements
KServe has the following minimum requirements:
- Kubernetes: Version 1.30+
- Cert Manager: Version 1.15.0+
- Network Controller: Choice of Gateway API (recommended) or Ingress controllers
Gateway API is the recommended option for KServe, while the Ingress API is still supported. Follow the Gateway API migration guide to migrate from Kubernetes Ingress to Gateway API.
Deployment Considerations
For Generative Inference
Raw Kubernetes deployment is the recommended approach for generative inference workloads because it provides the following (a configuration sketch follows the list):
- Full control over resource allocation for GPU-accelerated models
- Better handling of long-running inference requests
- More predictable scaling behavior for resource-intensive workloads
- Support for streaming responses with appropriate networking configuration
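For illustration, here is a minimal sketch of a generative service with explicit GPU allocation. It assumes the Hugging Face serving runtime is available; the service name, model ID, and resource sizes are placeholders to adapt to your environment:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llm
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
        # Model ID is an illustrative assumption; replace with your model
        - --model_name=llm
        - --model_id=facebook/opt-125m
      resources:
        requests:
          nvidia.com/gpu: "1"
        limits:
          nvidia.com/gpu: "1"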
For Predictive Inference
Raw Kubernetes deployment is suitable for predictive inference workloads when the following apply (a scaling sketch follows the list):
- You need direct control over Kubernetes resources
- Your models require specific resource configurations
- You want to use standard Kubernetes scaling mechanisms
- You're integrating with existing Kubernetes monitoring solutions
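As a sketch of standard scaling controls, the predictor spec exposes `minReplicas`, `maxReplicas`, `scaleMetric`, and `scaleTarget`, which KServe translates into a HorizontalPodAutoscaler in raw mode; the replica bounds and CPU target below are illustrative assumptions:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    # HPA settings: scale between 1 and 5 replicas targeting 60% CPU
    minReplicas: 1
    maxReplicas: 5
    scaleMetric: cpu
    scaleTarget: 60
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model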
Prerequisites
- Kubernetes cluster (v1.30+)
- kubectl configured to access your cluster
- Cluster admin permissions
Installation
1. Install Cert Manager
The minimum required Cert Manager version is 1.15.0; refer to the Cert Manager installation guide.
Cert Manager is required to provision webhook certificates for a production-grade installation. Alternatively, you can run a script to generate self-signed certificates.
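For example, Cert Manager's static manifest can be applied directly; the version below matches the minimum requirement, and newer releases should also work:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.0/cert-manager.yaml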
2. Install Network Controller
Choose either the Gateway API (recommended) or Kubernetes Ingress option below.

Gateway API
The Kubernetes Gateway API is a newer, more flexible and standardized way to manage traffic ingress and egress in Kubernetes clusters. KServe implements Gateway API version 1.2.1.
The Gateway API resources are not part of a standard Kubernetes cluster, so they need to be installed manually:
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.1/standard-install.yaml
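The Gateway API CRDs alone do not route traffic; a controller implementing the API must also be running. If you do not already have one, Envoy Gateway (used in the example below) can be installed with Helm. The chart version here is an assumption; pick a release compatible with Gateway API 1.2:

helm install eg oci://docker.io/envoyproxy/gateway-helm --version v1.2.1 \
  -n envoy-gateway-system --create-namespace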
Then, create a `GatewayClass` resource using your preferred network controller. For this example, we will use Envoy Gateway:
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
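Once the controller is running, it should accept the class; the `ACCEPTED` column of the following command should read `True`:

kubectl get gatewayclass envoy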
Create a `Gateway` resource to expose the `InferenceService`:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: kserve-ingress-gateway
  namespace: kserve
spec:
  gatewayClassName: envoy
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: my-secret
            namespace: kserve
      allowedRoutes:
        namespaces:
          from: All
  infrastructure:
    labels:
      serving.kserve.io/gateway: kserve-ingress-gateway
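Note that the HTTPS listener references a TLS secret named `my-secret` in the `kserve` namespace, which must exist for that listener to become ready. As a sketch, assuming you already have a certificate and key on disk (the paths are placeholders):

kubectl create secret tls my-secret -n kserve \
  --cert=path/to/tls.crt --key=path/to/tls.key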
KServe can automatically create a default `Gateway` named `kserve-ingress-gateway` during installation if the Helm value `kserve.controller.gateway.ingressGateway.createGateway` is set to `true`. If you choose to use this default gateway, you can skip creating your own gateway.
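As a sketch, that flag can be combined with the Helm install flags shown in step 3 below:

helm install kserve oci://ghcr.io/kserve/charts/kserve --version v0.15.0 \
  --set kserve.controller.gateway.ingressGateway.createGateway=true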
Kubernetes Ingress

In this guide, we install Istio as the ingress controller. The minimum required Istio version is 1.22; refer to the Istio install guide.
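A minimal sketch of installing Istio with `istioctl`; the `default` profile is an assumption, see the Istio install guide for other profiles:

istioctl install --set profile=default -y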
Once Istio is installed, create an `IngressClass` resource for Istio:
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: istio
spec:
  controller: istio.io/ingress-controller
Istio ingress is recommended, but you can install another Ingress controller and create an `IngressClass` resource for your chosen option.
3. Install KServe
The default KServe deployment mode is `Serverless`, which depends on Knative. The following steps change the default deployment mode to `RawDeployment` before installing KServe.
Choose the variant matching your network controller and tooling: Gateway API with Helm, Gateway API with YAML, Ingress with Helm, or Ingress with YAML.

Gateway API with Helm
- Install KServe CRDs
helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd --version v0.15.0
- Install KServe Resources
Set `kserve.controller.deploymentMode` to `RawDeployment` and configure the Gateway API:
helm install kserve oci://ghcr.io/kserve/charts/kserve --version v0.15.0 \
--set kserve.controller.deploymentMode=RawDeployment \
--set kserve.controller.gateway.ingressGateway.enableGatewayApi=true \
--set kserve.controller.gateway.ingressGateway.kserveGateway=kserve/kserve-ingress-gateway
Gateway API with YAML

- Install KServe. The `--server-side` option is required because the `InferenceService` CRD is large:
kubectl apply --server-side -f https://github.com/kserve/kserve/releases/download/v0.15.0/kserve.yaml
- Install KServe default serving runtimes:
kubectl apply --server-side -f https://github.com/kserve/kserve/releases/download/v0.15.0/kserve-cluster-resources.yaml
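You can verify that the runtimes were registered as cluster-scoped resources; the list should include entries such as `kserve-sklearnserver`:

kubectl get clusterservingruntimes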
- Change default deployment mode and ingress option
First, modify the `defaultDeploymentMode` to `RawDeployment` in the `inferenceservice-config` ConfigMap:
kubectl patch configmap/inferenceservice-config -n kserve --type=strategic -p '{"data": {"deploy": "{\"defaultDeploymentMode\": \"RawDeployment\"}"}}'
Then enable Gateway API and configure the Gateway:
kubectl patch configmap/inferenceservice-config -n kserve --type=strategic -p '{"data": {"ingress": "{\"enableGatewayApi\": true, \"kserveIngressGateway\": \"kserve/kserve-ingress-gateway\"}"}}'
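To confirm both patches took effect, print the relevant ConfigMap keys; the output should show `RawDeployment` as the default mode along with the Gateway settings:

kubectl get configmap inferenceservice-config -n kserve \
  -o jsonpath='{.data.deploy}{"\n"}{.data.ingress}{"\n"}'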
Ingress with Helm

- Install KServe CRDs
helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd --version v0.15.0
- Install KServe Resources
Set `kserve.controller.deploymentMode` to `RawDeployment` and configure the Ingress class:
helm install kserve oci://ghcr.io/kserve/charts/kserve --version v0.15.0 \
--set kserve.controller.deploymentMode=RawDeployment \
--set kserve.controller.gateway.ingressGateway.className=istio
Ingress with YAML

- Install KServe. The `--server-side` option is required because the `InferenceService` CRD is large:
kubectl apply --server-side -f https://github.com/kserve/kserve/releases/download/v0.15.0/kserve.yaml
- Install KServe default serving runtimes:
kubectl apply --server-side -f https://github.com/kserve/kserve/releases/download/v0.15.0/kserve-cluster-resources.yaml
- Change default deployment mode and ingress option
First, modify the `defaultDeploymentMode` to `RawDeployment` in the `inferenceservice-config` ConfigMap:
kubectl patch configmap/inferenceservice-config -n kserve --type=strategic -p '{"data": {"deploy": "{\"defaultDeploymentMode\": \"RawDeployment\"}"}}'
Then configure the Ingress class:
kubectl patch configmap/inferenceservice-config -n kserve --type=strategic -p '{"data": {"ingress": "{\"ingressClassName\": \"istio\"}"}}'
Features
Raw Deployment Mode
In raw deployment mode, KServe creates the following resources (a command for inspecting them follows the list):
- Kubernetes Deployments instead of Knative Services
- Standard Kubernetes Services for networking
- Ingress resources (or Gateway API HTTPRoute resources, depending on the configured networking layer) for external access
- HorizontalPodAutoscaler for scaling
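As a sketch, you can list the resources created for a given service; this assumes the example service name `sklearn-iris` and the `serving.kserve.io/inferenceservice` label that KServe attaches to the objects it manages:

kubectl get deployment,service,hpa -l serving.kserve.io/inferenceservice=sklearn-iris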
Benefits
- Simplicity: No dependency on Knative or Istio
- Control: Direct control over Kubernetes resources
- Compatibility: Works with standard Kubernetes tooling
- Predictability: No serverless overhead
Verification
Check that all components are running:
kubectl get pods -n kserve
kubectl get crd | grep serving.kserve.io
Once you deploy an InferenceService, `kubectl get inferenceservices` should report it as ready, for example:

NAME           URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
sklearn-iris   http://sklearn-iris.default.svc.cluster.local   True           100                              sklearn-iris-predictor-default-00001   5m
Next Steps
- Deploy your first GenAI InferenceService.
- Deploy your first Predictive InferenceService.
- Configure auto-scaling for your GenAI models.