Version: 0.16

KServe Concepts

Welcome to the KServe Concepts section! This section provides a comprehensive overview of the key concepts, components, and architecture that make up the KServe model serving platform.

Architecture

KServe follows a clean separation between control plane and data plane components:

Architecture Overview: Understand the high-level architecture of KServe, including its control and data planes
Control Plane: Manages the lifecycle of InferenceServices and InferenceGraphs, handles resource creation, and coordinates with Kubernetes
Data Plane: Handles actual inference requests, including generation, prediction, transformation, and explanation workflows
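To make the data plane concrete, the sketch below builds the kind of HTTP request a client sends to a deployed model. It is a minimal illustration and assumes the V1 inference protocol endpoint `/v1/models/<name>:predict` with a JSON body of the form `{"instances": [...]}`. The host, model name, and feature values are hypothetical placeholders. The function only constructs the request and does not send it.

```python
import json
import urllib.request


def build_v1_predict_request(host: str, model_name: str, instances: list) -> urllib.request.Request:
    """Build (but do not send) a V1-protocol predict request.

    Assumes the V1 inference protocol path /v1/models/<name>:predict;
    host and model_name are hypothetical placeholders supplied by the caller.
    """
    url = f"http://{host}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_v1_predict_request(
        "sklearn-iris.default.example.com",  # hypothetical host
        "sklearn-iris",                      # hypothetical model name
        [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
    )
    print(req.get_method(), req.full_url)
    # Actually sending it requires a reachable, ready InferenceService:
    # urllib.request.urlopen(req)
```

In a real cluster, the host and routing depend on how ingress is configured.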

Resources

KServe extends Kubernetes with custom resources for declarative model serving:

InferenceService: The primary resource for deploying and managing model serving workloads
InferenceGraph: Orchestrates complex multi-model inference workflows
ServingRuntime: Defines runtime environments for serving models
StorageContainer: Manages model storage and access patterns
LocalModelCache: Enables local model caching and management
Configuration Resources: Standard Kubernetes resources, such as ConfigMaps, used to configure KServe
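Declarative serving means you describe the desired model deployment and let the control plane reconcile it. As a minimal sketch, the function below produces an InferenceService manifest as a plain Python dict. It assumes the `serving.kserve.io/v1beta1` API with a predictor that declares `modelFormat` and `storageUri`. The name and storage URI are hypothetical placeholders.

```python
import json


def inference_service_manifest(name: str, model_format: str, storage_uri: str) -> dict:
    """Return a minimal InferenceService manifest as a plain dict.

    Field layout assumes the serving.kserve.io/v1beta1 API; all values
    are placeholders chosen by the caller.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    # The model format is what a ServingRuntime is matched against.
                    "modelFormat": {"name": model_format},
                    # Location the model artifacts are fetched from.
                    "storageUri": storage_uri,
                }
            }
        },
    }


if __name__ == "__main__":
    manifest = inference_service_manifest(
        "sklearn-iris",                               # hypothetical name
        "sklearn",
        "gs://example-bucket/models/sklearn/model",   # hypothetical URI
    )
    # kubectl accepts JSON as well as YAML, e.g. pipe this output to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

Emitting JSON keeps the sketch dependency-free. The same structure is usually written as YAML.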

Next Steps

Ready to dive deeper? Start with the Architecture section to understand how KServe works under the hood, or jump to Resources to learn about the specific Kubernetes resources that power KServe.
