KServe Resources

KServe extends Kubernetes with custom resources that enable declarative model serving. This section covers all the custom resources provided by KServe and how they work together to create a complete model serving platform.

Overview​

KServe introduces several Custom Resource Definitions (CRDs) that allow you to declaratively define and manage model serving workloads using standard Kubernetes patterns. These resources are managed by the KServe control plane and enable everything from simple single-model serving to complex inference graphs.

Core Resources​

InferenceService​

The primary resource for deploying and managing model serving workloads:

  • InferenceService: The main abstraction for deploying models with automatic scaling, versioning, and traffic management
  • InferenceService Spec: Detailed specification reference for InferenceService configurations
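To make the shape of this resource concrete, here is a minimal sketch that builds an InferenceService manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The helper name `inference_service`, the service name `sklearn-iris`, and the bucket path are illustrative assumptions, not values taken from this page.

```python
import json


def inference_service(name: str, model_format: str, storage_uri: str) -> dict:
    """Build a minimal InferenceService manifest (hypothetical helper)."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    # The model format is used to pick a compatible runtime.
                    "modelFormat": {"name": model_format},
                    # Placeholder location; replace with a real model URI.
                    "storageUri": storage_uri,
                }
            }
        },
    }


manifest = inference_service(
    "sklearn-iris", "sklearn", "gs://example-bucket/models/iris"
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -` in a cluster where KServe is installed; scaling and traffic settings are left at their defaults in this sketch.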

InferenceGraph​

Orchestrates complex multi-model inference workflows:

  • Concepts: Overview of Inference Graph for building complex ML pipelines
  • InferenceGraph: CRD that defines and manages inference pipelines and model chains
  • Graph Routing: Request routing and data flow within inference graphs
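A model chain of this kind could be sketched as follows. The example assembles an InferenceGraph manifest with a single `Sequence` root node in which each step after the first receives the previous step's response. The helper `sequence_graph` and the service names are hypothetical, and the field layout reflects the `v1alpha1` API as commonly documented, so check it against the InferenceGraph reference before relying on it.

```python
import json


def sequence_graph(name: str, service_names: list[str]) -> dict:
    """Chain existing InferenceServices into one Sequence node (hypothetical helper)."""
    steps = []
    for index, service in enumerate(service_names):
        step = {"serviceName": service, "name": service}
        if index > 0:
            # Later steps consume the output of the step before them.
            step["data"] = "$response"
        steps.append(step)
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "InferenceGraph",
        "metadata": {"name": name},
        "spec": {"nodes": {"root": {"routerType": "Sequence", "steps": steps}}},
    }


graph = sequence_graph("example-chain", ["preprocessor", "classifier"])
print(json.dumps(graph, indent=2))
```

Each `serviceName` is assumed to refer to an InferenceService that already exists in the same namespace.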

Runtime Resources​

ClusterServingRuntime & ServingRuntime​

Define the runtime environments used to serve models.
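As a rough illustration of what a runtime definition contains, the sketch below builds a ServingRuntime or ClusterServingRuntime manifest that declares which model formats it supports and which container serves them. The helper `serving_runtime`, the runtime name, and the image `example.com/my-model-server:latest` are placeholders invented for this example.

```python
import json


def serving_runtime(
    name: str, model_format: str, image: str, cluster_scoped: bool = False
) -> dict:
    """Build a minimal runtime manifest (hypothetical helper)."""
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        # The cluster-scoped kind is visible to every namespace.
        "kind": "ClusterServingRuntime" if cluster_scoped else "ServingRuntime",
        "metadata": {"name": name},
        "spec": {
            "supportedModelFormats": [{"name": model_format, "autoSelect": True}],
            "containers": [{"name": "kserve-container", "image": image}],
        },
    }


runtime = serving_runtime(
    "example-runtime", "sklearn", "example.com/my-model-server:latest"
)
print(json.dumps(runtime, indent=2))
```

Passing `cluster_scoped=True` changes only the `kind`; the spec has the same shape for both resources in this sketch.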

Storage Resources​

ClusterStorageContainer & StorageContainer​

Manage model storage and access patterns.
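One way to picture a storage container is as a mapping from storage URI prefixes to the image that downloads models from those locations. The sketch below builds a ClusterStorageContainer manifest and includes a small prefix check. The helpers `cluster_storage_container` and `handles`, the `custom://` scheme, and the image name are assumptions made for illustration, and the check is not the controller's actual selection logic.

```python
import json


def cluster_storage_container(name: str, image: str, prefixes: list[str]) -> dict:
    """Build a minimal ClusterStorageContainer manifest (hypothetical helper)."""
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "ClusterStorageContainer",
        "metadata": {"name": name},
        "spec": {
            "container": {"name": "storage-initializer", "image": image},
            "supportedUriFormats": [{"prefix": prefix} for prefix in prefixes],
        },
    }


def handles(container: dict, storage_uri: str) -> bool:
    """Illustrative check: does any declared prefix match the URI?"""
    formats = container["spec"]["supportedUriFormats"]
    return any(storage_uri.startswith(fmt["prefix"]) for fmt in formats)


container = cluster_storage_container(
    "custom-store", "example.com/custom-initializer:latest", ["custom://"]
)
print(json.dumps(container, indent=2))
print(handles(container, "custom://models/demo"))
print(handles(container, "s3://bucket/model"))
```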

Local Model Cache Resources​

LocalModel & LocalModelNode​

Enable local model caching and management:

  • Concepts: Overview of local model caching in KServe
  • LocalModelCache: CRD that defines local model caching requirements and policies
  • LocalModelNode: CRD that handles node-level model cache management
  • LocalModelNodeGroup: CRD that groups local model nodes for management and orchestration of cached models

Configuration Resources​

ConfigMaps and Secrets​

Standard Kubernetes resources used for KServe configuration.

Resource Relationships​

Understanding how KServe resources interact with each other:

┌─────────────────┐    ┌────────────────────┐    ┌──────────────────┐
│ InferenceService│───▶│   ServingRuntime   │───▶│ StorageContainer │
│                 │    │                    │    │                  │
└─────────────────┘    └────────────────────┘    └──────────────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐    ┌────────────────────┐
│  InferenceGraph │    │     LocalModel     │
│                 │    │                    │
└─────────────────┘    └────────────────────┘

API Reference​

For complete API specifications, see:

Next Steps​