Kubernetes Deployment Installation

KServe supports RawDeployment mode to enable InferenceService deployment for both Predictive Inference and Generative Inference workloads with minimal dependencies, using standard Kubernetes resources: Deployment, Service, Ingress/Gateway API, and Horizontal Pod Autoscaler.

Compared to Serverless mode, which depends on Knative for request-driven autoscaling, RawDeployment mode relies on the Horizontal Pod Autoscaler by default; KEDA can optionally be installed to enable autoscaling on custom metrics. Note that Scale from Zero is currently not supported in RawDeployment mode for HTTP requests.
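
The deployment mode can also be selected per InferenceService through the serving.kserve.io/deploymentMode annotation, which is useful when a single workload needs to differ from the cluster-wide default configured later in this guide. A minimal sketch, where the model name and storageUri are placeholders:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model                                # placeholder name
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment  # per-service override of the default mode
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://your-bucket/path/to/model     # placeholder model location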

Installation Requirements

KServe has the following minimum requirements:

  • Kubernetes: Version 1.30+
  • Cert Manager: Version 1.15.0+
  • Network Controller: Choice of Gateway API (recommended) or Ingress controllers
note

Gateway API is the recommended option for KServe while Ingress API is still supported. Follow the Gateway API migration guide to migrate from Kubernetes Ingress to Gateway API.

Deployment Considerations

For Generative Inference

Raw Kubernetes deployment is the recommended approach for generative inference workloads because it provides:

  • Full control over resource allocation for GPU-accelerated models
  • Better handling of long-running inference requests
  • More predictable scaling behavior for resource-intensive workloads
  • Support for streaming responses with appropriate networking configuration
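
The first and third points above show up directly in the predictor spec: a generative InferenceService usually pins explicit GPU, CPU, and memory requests so scheduling and scaling stay predictable. The sketch below assumes the Hugging Face serving runtime; the model id and resource figures are placeholders to adapt:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llm                                # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
        - --model_name=llm                             # placeholder served-model name
        - --model_id=meta-llama/Llama-3.1-8B-Instruct  # placeholder Hugging Face model id
      resources:
        requests:
          cpu: "4"
          memory: 24Gi
          nvidia.com/gpu: "1"                          # one GPU for the model server
        limits:
          cpu: "4"
          memory: 24Gi
          nvidia.com/gpu: "1"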

For Predictive Inference

Raw Kubernetes deployment is suitable for predictive inference workloads when:

  • You need direct control over Kubernetes resources
  • Your models require specific resource configurations
  • You want to use standard Kubernetes scaling mechanisms
  • You're integrating with existing Kubernetes monitoring solutions
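
When standard Kubernetes scaling mechanisms are the goal, the HPA-backed fields on the predictor are usually enough; the sketch below scales on CPU utilization, and the name and storageUri are placeholders:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-example                            # placeholder name
spec:
  predictor:
    minReplicas: 1                                 # lower bound for the HorizontalPodAutoscaler
    maxReplicas: 3                                 # upper bound
    scaleMetric: cpu                               # scale on CPU utilization
    scaleTarget: 80                                # target 80% utilization
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://your-bucket/path/to/model   # placeholder model location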

Prerequisites

  • Kubernetes cluster (v1.30+)
  • kubectl configured to access your cluster
  • Cluster admin permissions
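
A quick sanity check of these prerequisites before starting; the wildcard query simply asks whether your current user has cluster-admin level permissions:

kubectl version                              # server version should report v1.30 or newer
kubectl auth can-i '*' '*' --all-namespaces  # "yes" indicates cluster-admin permissions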

Installation

1. Install Cert Manager

The minimum required Cert Manager version is 1.15.0; refer to the Cert Manager installation guide for instructions.

note

Cert Manager is required to provision webhook certs for production-grade installation. Alternatively, you can run a self-signed certs generation script.
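
For reference, a typical static-manifest installation of Cert Manager looks like the following; confirm the current recommended version against the Cert Manager installation guide (v1.15.0 here simply matches the minimum stated above):

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.0/cert-manager.yaml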

2. Install Network Controller

The Kubernetes Gateway API is a newer, more flexible, and standardized way to manage ingress and egress traffic in Kubernetes clusters. KServe implements Gateway API version 1.2.1.

The Gateway API CRDs are not bundled with Kubernetes, so they must be installed manually:

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.1/standard-install.yaml
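
Before continuing, you can confirm that the Gateway API CRDs are registered in the cluster:

kubectl get crd gatewayclasses.gateway.networking.k8s.io gateways.gateway.networking.k8s.io httproutes.gateway.networking.k8s.io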

Then, create a GatewayClass resource using your preferred network controller. For this example, we will use Envoy Gateway:

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
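
The GatewayClass above assumes the Envoy Gateway controller is already running in the cluster. If it is not, it can be installed from its Helm chart; the version shown is illustrative, so check the Envoy Gateway documentation for the release that matches your Gateway API version:

helm install eg oci://docker.io/envoyproxy/gateway-helm --version v1.2.1 \
  -n envoy-gateway-system --create-namespace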

Create a Gateway resource to expose the InferenceService:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: kserve-ingress-gateway
  namespace: kserve
spec:
  gatewayClassName: envoy
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: my-secret
            namespace: kserve
      allowedRoutes:
        namespaces:
          from: All
  infrastructure:
    labels:
      serving.kserve.io/gateway: kserve-ingress-gateway
note

KServe can automatically create a default Gateway named kserve-ingress-gateway during installation if the Helm value kserve.controller.gateway.ingressGateway.createGateway is set to true. If you choose to use this default gateway, you can skip creating your own gateway.
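
If you opt into that default gateway, the value can simply be added to the KServe Helm installation in step 3 below, for example:

--set kserve.controller.gateway.ingressGateway.createGateway=true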

3. Install KServe

note

The default KServe deployment mode is Serverless which depends on Knative. The following step changes the default deployment mode to RawDeployment before installing KServe.

  1. Install KServe CRDs

helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd --version v0.15.0

  2. Install KServe Resources

Set the kserve.controller.deploymentMode to RawDeployment and configure the Gateway API:

helm install kserve oci://ghcr.io/kserve/charts/kserve --version v0.15.0 \
  --set kserve.controller.deploymentMode=RawDeployment \
  --set kserve.controller.gateway.ingressGateway.enableGatewayApi=true \
  --set kserve.controller.gateway.ingressGateway.kserveGateway=kserve/kserve-ingress-gateway
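
Equivalently, the same settings can be kept in a values file and passed with -f, which is easier to keep under version control; the file name here is arbitrary:

# kserve-values.yaml (file name is arbitrary)
kserve:
  controller:
    deploymentMode: RawDeployment
    gateway:
      ingressGateway:
        enableGatewayApi: true
        kserveGateway: kserve/kserve-ingress-gateway

helm install kserve oci://ghcr.io/kserve/charts/kserve --version v0.15.0 -f kserve-values.yaml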

Features

Raw Deployment Mode

In raw deployment mode, KServe creates:

  • Kubernetes Deployments instead of Knative Services
  • Standard Kubernetes Services for networking
  • Ingress or Gateway API HTTPRoute resources for external access, depending on the configured networking layer
  • HorizontalPodAutoscaler for scaling
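
For a deployed InferenceService, these objects can be inspected directly with kubectl; the namespace below is a placeholder, and in RawDeployment mode the Deployment and Service are typically named after the predictor component (for example <isvc-name>-predictor):

kubectl get deployments,services,hpa -n <your-namespace>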

Benefits

  • Simplicity: No dependency on Knative or Istio
  • Control: Direct control over Kubernetes resources
  • Compatibility: Works with standard Kubernetes tooling
  • Predictability: No serverless overhead

Verification

Check that all components are running:

kubectl get pods -n kserve
kubectl get crd | grep serving.kserve.io
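
To exercise the installation end to end, you can deploy the small sklearn-iris sample InferenceService from the KServe quickstart (the storageUri points at the publicly hosted example model) and then query its status:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: default
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model

kubectl get inferenceservices sklearn-iris -n default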
Expected Output
NAME           URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
sklearn-iris   http://sklearn-iris.default.svc.cluster.local   True           100                              sklearn-iris-predictor-default-00001   5m

Next Steps