- KServe is a standard Model Inference Platform on Kubernetes, built for highly scalable use cases.
- Provides performant, standardized inference protocol across ML frameworks.
- Support modern serverless inference workload with Autoscaling including Scale to Zero on GPU.
- Provides high scalability, density packing and intelligent routing using ModelMesh
- Simple and Pluggable production serving for production ML serving including prediction, pre/post processing, monitoring and explainability.
- Advanced deployments with canary rollout, experiments, ensembles and transformers.
ModelMesh is designed for high-scale, high-density and frequently-changing model use cases. ModelMesh intelligently loads and unloads AI models to and from memory to strike an intelligent trade-off between responsiveness to users and computational footprint.