KServe Concepts
Welcome to the KServe Concepts section! This section gives an overview of the key concepts, components, and architecture of the KServe model serving platform.
Architecture
KServe follows a clean separation between control plane and data plane components:
- Architecture Overview: Understand the high-level architecture of KServe, including its control and data planes
- Control Plane: Manages the lifecycle of inference services and inference graphs, handles resource creation, and coordinates with Kubernetes
- Data Plane: Handles actual inference requests, including generation, prediction, transformation, and explanation workflows
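To make the data plane's role concrete, here is a minimal sketch of how a prediction request could be assembled. It assumes the service is reachable over the V1 inference protocol (`POST /v1/models/<name>:predict` with an `instances` payload); the host name, model name, and helper function are hypothetical and used only for illustration.

```python
import json


def build_v1_predict_request(base_url, model_name, instances):
    """Return the (url, body) pair for a V1-protocol predict call.

    base_url and model_name are placeholders chosen for this example.
    """
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})
    return url, body


# Hypothetical endpoint and model name; replace with your own deployment's values.
url, body = build_v1_predict_request(
    "http://sklearn-iris.default.example.com/",
    "sklearn-iris",
    [[6.8, 2.8, 4.8, 1.4]],
)
print(url)
print(body)
```

The sketch only builds the request; sending it with an HTTP client of your choice is left out so the example stays free of network dependencies.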
Resources
KServe extends Kubernetes with custom resources for declarative model serving:
- InferenceService: The primary resource for deploying and managing model serving workloads
- InferenceGraph: Orchestrates complex multi-model inference workflows
- ServingRuntime: Defines runtime environments for serving models
- StorageContainer: Manages model storage and access patterns
- LocalModelCache: Enables local model caching and management
- Configuration Resources: Standard Kubernetes resources used for KServe configuration
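As an illustration of what "declarative model serving" looks like, the sketch below builds a minimal InferenceService manifest as a Python dictionary and prints it as JSON. It assumes the `serving.kserve.io/v1beta1` API version; the service name, namespace, model format, storage URI, and the helper function itself are hypothetical values chosen for the example.

```python
import json


def build_inference_service(name, model_format, storage_uri, namespace="default"):
    """Return a minimal InferenceService manifest as a plain dict.

    Only the predictor's model format and storage location are set;
    a real deployment may need further fields.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": model_format},
                    "storageUri": storage_uri,
                }
            }
        },
    }


# Hypothetical name and bucket path, for illustration only.
manifest = build_inference_service(
    "sklearn-iris", "sklearn", "gs://example-bucket/models/iris"
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, after which the control plane is responsible for creating the resources needed to serve the model.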
Next Steps
Ready to dive deeper? Start with the Architecture section to understand how KServe works under the hood, or jump to Resources to learn about the specific Kubernetes resources that power KServe.