# KServe Resources
KServe extends Kubernetes with custom resources that enable declarative model serving. This section covers all the custom resources provided by KServe and how they work together to create a complete model serving platform.
## Overview
KServe introduces several Custom Resource Definitions (CRDs) that allow you to declaratively define and manage model serving workloads using standard Kubernetes patterns. These resources are managed by the KServe control plane and enable everything from simple single-model serving to complex inference graphs.
## Core Resources

### InferenceService
The primary resource for deploying and managing model serving workloads:
- InferenceService: The main abstraction for deploying models with automatic scaling, versioning, and traffic management
- InferenceService Spec: Detailed specification reference for InferenceService configurations
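As a minimal sketch, an InferenceService serving a scikit-learn model can be declared in a few lines (the name and `storageUri` below are illustrative, taken from KServe's public examples):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # Illustrative model location; point this at your own model repository
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```

KServe matches the declared model format against available ServingRuntimes and provisions the serving pods, endpoints, and autoscaling automatically.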
### InferenceGraph
Orchestrates complex multi-model inference workflows:
- Concepts: Overview of Inference Graph for building complex ML pipelines
- InferenceGraph: CRD that defines and manages inference pipelines and model chains
- Graph Routing: Request routing and data flow within inference graphs
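For a flavor of the API, a two-step sequence graph could chain two existing InferenceServices; the service names here are hypothetical:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: model-chain
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        # Hypothetical InferenceService names; each step's output feeds the next
        - serviceName: preprocess
          data: $request
        - serviceName: classifier
          data: $response
```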
## Runtime Resources

### ClusterServingRuntime & ServingRuntime
Define the runtime environments for serving models:
- ClusterServingRuntime: CRD for cluster-wide runtime definitions available to all namespaces
- ServingRuntime: CRD for namespace-scoped runtime definitions for specific workloads
- Custom Runtimes: Creating and configuring custom model serving runtimes
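As an illustration, a namespace-scoped ServingRuntime for scikit-learn models might be sketched as follows (the runtime name and image tag are illustrative):

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: my-sklearn-runtime
spec:
  supportedModelFormats:
    # Formats this runtime can serve; autoSelect lets KServe pick it automatically
    - name: sklearn
      version: "1"
      autoSelect: true
  containers:
    - name: kserve-container
      image: kserve/sklearnserver:latest
      args:
        - --model_name={{.Name}}
        - --model_dir=/mnt/models
```

A ClusterServingRuntime uses the same spec but is visible to every namespace, which is why KServe ships its built-in runtimes as ClusterServingRuntimes.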
## Storage Resources

### ClusterStorageContainer & StorageContainer
Manage model storage and access patterns:
- ClusterStorageContainer: CRD for cluster-wide storage configuration for model repositories
- Storage Backends: Supported storage systems and configuration options
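For example, a ClusterStorageContainer can register a custom storage initializer for a URI prefix; the image and `custom://` prefix below are hypothetical:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterStorageContainer
metadata:
  name: custom-storage
spec:
  container:
    name: storage-initializer
    # Hypothetical image that knows how to download models from custom:// URIs
    image: example.com/custom-storage-initializer:latest
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: "1"
        memory: 1Gi
  supportedUriFormats:
    - prefix: custom://
```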
## Local Model Cache Resources

### LocalModelCache, LocalModelNode & LocalModelNodeGroup

Enable local model caching and management:

- Concepts: Overview of local model caching in KServe
- LocalModelCache: CRD that defines local model caching requirements and policies
- LocalModelNode: CRD that handles node-level model caching management
- LocalModelNodeGroup: CRD that groups local model nodes for management and orchestration of cached models
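To give a flavor of the API, a LocalModelCache could pre-download a model onto a node group roughly like this (the URI, size, and node group name are illustrative):

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelCache
metadata:
  name: iris-model
spec:
  # Illustrative source; the model is downloaded once per node and reused across pods
  sourceModelUri: gs://kfserving-examples/models/sklearn/1.0/model
  modelSize: 10Mi
  nodeGroups:
    - workers
```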
## Configuration Resources

### ConfigMaps and Secrets
Standard Kubernetes resources used for KServe configuration:
- InferenceService ConfigMap: Global configuration for InferenceService behavior
- Logger ConfigMap: Logging and observability configuration
- Ingress Configuration: Gateway and routing configuration
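KServe reads most of this configuration from the `inferenceservice-config` ConfigMap in the `kserve` namespace. A trimmed sketch of the `logger` and `ingress` keys might look like this (the specific values are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kserve
data:
  # Each key holds a JSON document configuring one aspect of KServe
  logger: |
    {
      "image": "kserve/agent:latest",
      "defaultUrl": "http://default-broker"
    }
  ingress: |
    {
      "ingressGateway": "knative-serving/knative-ingress-gateway",
      "ingressDomain": "example.com"
    }
```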
## Resource Relationships
Understanding how KServe resources interact with each other:
```
┌──────────────────┐     ┌────────────────────┐     ┌──────────────────┐
│ InferenceService │────▶│   ServingRuntime   │────▶│ StorageContainer │
└──────────────────┘     └────────────────────┘     └──────────────────┘
         │                         │
         ▼                         ▼
┌──────────────────┐     ┌────────────────────┐
│  InferenceGraph  │     │     LocalModel     │
└──────────────────┘     └────────────────────┘
```
## API Reference
For complete API specifications, see the KServe API reference documentation.
## Next Steps
- Start with InferenceService to understand the core serving resource
- Explore ServingRuntime to understand runtime configuration
- Learn about InferenceGraph for advanced inference workflows