KServe Resources

KServe extends Kubernetes with custom resources that enable declarative model serving. This section covers all the custom resources provided by KServe and how they work together to create a complete model serving platform.

Overview

KServe introduces several Custom Resource Definitions (CRDs) that allow you to declaratively define and manage model serving workloads using standard Kubernetes patterns. These resources are managed by the KServe control plane and enable everything from simple single-model serving to complex inference graphs.

Core Resources

InferenceService

The primary resource for deploying and managing model serving workloads:

InferenceService: The main abstraction for deploying models with automatic scaling, versioning, and traffic management
InferenceService Spec: Detailed specification reference for InferenceService configurations
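A minimal sketch of an InferenceService follows, written as a Python dictionary that can be serialized to JSON for `kubectl apply -f`. It assumes the `serving.kserve.io/v1beta1` API and the `predictor.model` layout from KServe's scikit-learn examples. The helper name `inference_service` is hypothetical, and the bucket path is only an example.

```python
import json

def inference_service(name: str, model_format: str, storage_uri: str,
                      namespace: str = "default") -> dict:
    """Build a minimal serving.kserve.io/v1beta1 InferenceService manifest."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    # The model format lets KServe choose a matching ServingRuntime.
                    "modelFormat": {"name": model_format},
                    # Where the model artifacts are fetched from before serving.
                    "storageUri": storage_uri,
                }
            }
        },
    }

isvc = inference_service(
    "sklearn-iris", "sklearn",
    "gs://kfserving-examples/models/sklearn/1.0/model",  # example path
)
# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(isvc, indent=2))
```

Only the model format and storage location are declared. Scaling, versioning, and traffic settings are left at their defaults here.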

InferenceGraph

Orchestrates complex multi-model inference workflows:

Concepts: Overview of Inference Graph for building complex ML pipelines
InferenceGraph: CRD that defines and manages inference pipelines and model chains
Graph Routing: Request routing and data flow within inference graphs
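The sketch below builds a two-step model chain. It assumes the `serving.kserve.io/v1alpha1` InferenceGraph layout: a `root` node with `routerType: Sequence`, whose steps each name an InferenceService, and a `data: $response` field that passes the previous step's output forward. The helper `sequence_graph` and the service names `model1` and `model2` are hypothetical.

```python
import json

def sequence_graph(name: str, services: list) -> dict:
    """Build an InferenceGraph whose root node calls the services in order."""
    steps = []
    for i, svc in enumerate(services):
        step = {"name": f"step{i + 1}", "serviceName": svc}
        if i > 0:
            # "$response" sends the previous step's output instead of the
            # original request body.
            step["data"] = "$response"
        steps.append(step)
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "InferenceGraph",
        "metadata": {"name": name},
        "spec": {"nodes": {"root": {"routerType": "Sequence", "steps": steps}}},
    }

graph = sequence_graph("model-chainer", ["model1", "model2"])
print(json.dumps(graph, indent=2))
```

Sequence is one of several router types. Switch, Ensemble, and Splitter follow the same node-and-steps shape but route requests differently.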

Runtime Resources

ClusterServingRuntime & ServingRuntime

Define the runtime environments for serving models:

ClusterServingRuntime: CRD for cluster-wide runtime definitions available to all namespaces
ServingRuntime: CRD for namespace-scoped runtime definitions for specific workloads
Custom Runtimes: Creating and configuring custom model serving runtimes
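The only structural difference between the two runtime kinds is scope. The sketch below returns a ClusterServingRuntime when no namespace is given and a ServingRuntime otherwise. It assumes the `serving.kserve.io/v1alpha1` API with `supportedModelFormats` and `containers` fields. The container arguments mirror KServe's built-in runtime examples, where `{{.Name}}` is a placeholder for the model name. The image, runtime name, and format name are hypothetical, and the actual flags depend on your model server.

```python
import json
from typing import Optional

def serving_runtime(name: str, image: str, model_format: str,
                    namespace: Optional[str] = None) -> dict:
    """Build a ServingRuntime, or a ClusterServingRuntime when namespace is None."""
    # Cluster-scoped runtimes carry no namespace and are usable from every namespace.
    kind = "ClusterServingRuntime" if namespace is None else "ServingRuntime"
    metadata = {"name": name}
    if namespace is not None:
        metadata["namespace"] = namespace
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": kind,
        "metadata": metadata,
        "spec": {
            # autoSelect lets KServe pick this runtime for an InferenceService
            # that names only the model format.
            "supportedModelFormats": [{"name": model_format, "autoSelect": True}],
            "containers": [{
                "name": "kserve-container",
                "image": image,
                # {{.Name}} is a template placeholder, not a Python format field.
                "args": ["--model_name={{.Name}}", "--model_dir=/mnt/models",
                         "--http_port=8080"],
            }],
        },
    }

# Hypothetical image and format names.
team_rt = serving_runtime("my-runtime", "example.com/my-model-server:latest",
                          "my-format", namespace="team-a")
shared_rt = serving_runtime("my-runtime", "example.com/my-model-server:latest",
                            "my-format")
print(json.dumps(shared_rt, indent=2))
```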

Storage Resources

ClusterStorageContainer & StorageContainer

Manage model storage and access patterns:

ClusterStorageContainer: CRD for cluster-wide storage configuration for model repositories
Storage Backends: Supported storage systems and configuration options
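A ClusterStorageContainer declares which storage URIs it can handle through a list of `supportedUriFormats` entries. Each entry is either a literal `prefix` or a `regex`. The sketch below illustrates that matching idea under simplifying assumptions: containers are checked in insertion order, and regexes are anchored at the start of the URI. The format list is abbreviated, and the `custom-fs` container with its `myfs://` scheme is hypothetical. KServe's controller may apply different precedence rules.

```python
import re
from typing import Optional

# Abbreviated supportedUriFormats, shaped like a ClusterStorageContainer spec:
# each entry matches a storageUri by a literal prefix or by a regex.
DEFAULT_FORMATS = [
    {"prefix": "gs://"},
    {"prefix": "s3://"},
    {"prefix": "hdfs://"},
    {"regex": r"https://(.+?)\.blob\.core\.windows\.net/(.+)"},
]

def supports_uri(formats: list, storage_uri: str) -> bool:
    """Return True if any prefix or regex entry matches the storage URI."""
    for fmt in formats:
        if "prefix" in fmt and storage_uri.startswith(fmt["prefix"]):
            return True
        # re.match anchors at the start of the URI (a simplifying assumption).
        if "regex" in fmt and re.match(fmt["regex"], storage_uri):
            return True
    return False

def pick_container(containers: dict, storage_uri: str) -> Optional[str]:
    """Return the first container (in insertion order) that supports the URI."""
    for name, formats in containers.items():
        if supports_uri(formats, storage_uri):
            return name
    return None

containers = {
    "default": DEFAULT_FORMATS,
    # Hypothetical container for a private "myfs://" scheme.
    "custom-fs": [{"prefix": "myfs://"}],
}
print(pick_container(containers, "s3://bucket/model"))   # default
print(pick_container(containers, "myfs://models/iris"))  # custom-fs
print(pick_container(containers, "ftp://host/model"))    # None
```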

Local Model Cache Resources

LocalModel & LocalModelNode

Enable local model caching and management:

Concepts: Overview of local model caching in KServe
LocalModelCache: CRD that defines local model caching requirements and policies
LocalModelNode: CRD that handles node-level model caching management
LocalModelNodeGroup: CRD that groups local model nodes for managing and orchestrating cached models
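A LocalModelCache refers to node groups by name. The sketch below builds one and runs a simple pre-flight check that every referenced group exists. It assumes the `serving.kserve.io/v1alpha1` LocalModelCache fields `sourceModelUri`, `modelSize`, and `nodeGroups`. The check function `missing_node_groups`, the cache name, and the group names are hypothetical illustrations, not part of KServe.

```python
import json

def local_model_cache(name: str, source_uri: str, model_size: str,
                      node_groups: list) -> dict:
    """Build a LocalModelCache that pre-fetches a model onto node groups."""
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "LocalModelCache",
        "metadata": {"name": name},
        "spec": {
            "sourceModelUri": source_uri,  # where the model is downloaded from
            "modelSize": model_size,       # size as a Kubernetes quantity string
            "nodeGroups": node_groups,     # names of LocalModelNodeGroup resources
        },
    }

def missing_node_groups(cache: dict, existing_groups: set) -> list:
    """Hypothetical check: node groups the cache references that do not exist."""
    return [g for g in cache["spec"]["nodeGroups"] if g not in existing_groups]

cache = local_model_cache("llama3-8b-instruct",
                          "hf://meta-llama/meta-llama-3-8b-instruct",
                          "10Gi", ["workers"])
print(json.dumps(cache, indent=2))
print(missing_node_groups(cache, {"workers", "gpu-pool"}))  # []
```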

Configuration Resources

ConfigMaps and Secrets

Standard Kubernetes resources used for KServe configuration:

InferenceService ConfigMap: Global configuration for InferenceService behavior
Logger ConfigMap: Logging and observability configuration
Ingress Configuration: Gateway and routing configuration
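In the global ConfigMap (typically `inferenceservice-config` in the `kserve` namespace), each top-level key such as `ingress` or `logger` holds a JSON document stored as a string. The sketch below decodes those sections from a hypothetical excerpt of the ConfigMap's `data` field. The field names inside each section follow the general shape of KServe's defaults but are assumptions, so check them against your installed version. The helper `load_section` is hypothetical.

```python
import json

# Hypothetical excerpt of the ConfigMap "data" field: every value is a JSON string.
CONFIGMAP_DATA = {
    "ingress": json.dumps({
        "ingressGateway": "knative-serving/knative-ingress-gateway",
        "ingressDomain": "example.com",
        "urlScheme": "http",
    }),
    "logger": json.dumps({
        "image": "kserve/agent:latest",
        "defaultUrl": "http://default-broker",
    }),
}

def load_section(data: dict, key: str) -> dict:
    """Decode one JSON-encoded section, returning {} when the key is absent."""
    raw = data.get(key)
    return json.loads(raw) if raw else {}

ingress = load_section(CONFIGMAP_DATA, "ingress")
logger = load_section(CONFIGMAP_DATA, "logger")
print(ingress["ingressDomain"])                 # example.com
print(load_section(CONFIGMAP_DATA, "batcher"))  # {}
```

Because the sections are strings, a malformed edit to one key surfaces only when that section is parsed. Validating the JSON before applying the ConfigMap avoids this.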

Resource Relationships

How KServe resources relate to one another:

┌─────────────────┐    ┌────────────────────┐    ┌──────────────────┐
│ InferenceService│───▶│ ServingRuntime     │───▶│ StorageContainer │
│                 │    │                    │    │                  │
└─────────────────┘    └────────────────────┘    └──────────────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐    ┌────────────────────┐
│ InferenceGraph  │    │ LocalModel         │
│                 │    │                    │
└─────────────────┘    └────────────────────┘
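The InferenceService-to-ServingRuntime arrow reflects runtime selection. When an InferenceService names only a model format, KServe looks for a runtime that lists that format with `autoSelect` enabled. The sketch below shows only that core match. It ignores other factors the real controller considers, such as format priority, protocol version, and an explicitly named runtime. The runtime table is an illustrative sample, and `my-runtime` is hypothetical.

```python
from typing import Optional

# Illustrative runtimes: name -> supportedModelFormats entries.
RUNTIMES = {
    "kserve-sklearnserver": [{"name": "sklearn", "autoSelect": True}],
    "kserve-xgbserver": [{"name": "xgboost", "autoSelect": True}],
    # Supports sklearn but only when requested explicitly.
    "my-runtime": [{"name": "sklearn", "autoSelect": False}],
}

def select_runtime(runtimes: dict, model_format: str) -> Optional[str]:
    """Return the first runtime that auto-selects the given model format."""
    for name, formats in runtimes.items():
        for fmt in formats:
            if fmt["name"] == model_format and fmt.get("autoSelect", False):
                return name
    return None

print(select_runtime(RUNTIMES, "sklearn"))  # kserve-sklearnserver
print(select_runtime(RUNTIMES, "onnx"))     # None
```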

API Reference

For complete API specifications, see:

KServe API Reference

Next Steps

Start with InferenceService to understand the core serving resource
Explore ServingRuntime to understand runtime configuration
Learn about InferenceGraph for advanced inference workflows
