ModelMesh Installation

ModelMesh provides high-scale, high-density model serving for deployments with large numbers of models that change frequently. It is designed for efficient resource utilization and intelligent model loading, which makes it particularly well suited to predictive inference workloads.

Overview

Note: ModelMesh is optimized for predictive inference workloads with high model density requirements.

ModelMesh is designed for predictive inference use cases where:

  • You have many models (hundreds to thousands)
  • Models are frequently updated or changed
  • Resource efficiency is critical
  • You need intelligent model placement and caching
  • Model inference times are relatively short
  • Models can share computational resources efficiently

Prerequisites

  • Kubernetes cluster (v1.30+)
  • kubectl configured to access your cluster
  • Cluster admin permissions
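
The prerequisites above can be checked from a terminal before installing. The following is a minimal sketch rather than part of the official install flow: `version_at_least` and `check_prerequisites` are hypothetical helper names, and reading the server version with `jq` is an assumption about the tools available on your machine.

```shell
# Returns success if a Kubernetes version string (e.g. "v1.31.2") is at least
# the given major.minor. Hypothetical helper, not part of kubectl.
version_at_least() {
  local version="${1#v}" want_major="$2" want_minor="$3"
  local major="${version%%.*}"
  local rest="${version#*.}"
  local minor="${rest%%.*}"
  minor="${minor%%[!0-9]*}"   # drop suffixes such as the "+" in "30+"
  if [ "$major" -gt "$want_major" ]; then
    return 0
  fi
  [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}

# Checks cluster access, server version, and admin rights in one pass.
check_prerequisites() {
  local server_version
  # Assumes jq is installed; serverVersion.gitVersion is the API server version.
  server_version="$(kubectl version -o json | jq -r '.serverVersion.gitVersion')" || return 1
  version_at_least "$server_version" 1 30 || {
    echo "Kubernetes $server_version is older than v1.30" >&2
    return 1
  }
  # "yes" means the current user may perform any verb on any resource.
  [ "$(kubectl auth can-i '*' '*' --all-namespaces)" = "yes" ]
}
```

Running `check_prerequisites && echo "ready to install"` after sourcing the snippet exercises all three checks against the cluster that `kubectl` currently points at.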

Installation

Option 1: Quick Install with KServe

Install KServe with ModelMesh support:

curl -s "https://raw.githubusercontent.com/kserve/modelmesh-serving/release-0.12.0/scripts/install.sh" | bash

Option 2: Manual Installation

1. Install etcd (for model metadata storage)

kubectl apply -f https://raw.githubusercontent.com/kserve/modelmesh-serving/release-0.12.0/config/dependencies/etcd.yaml

2. Install ModelMesh Serving

kubectl apply -f https://raw.githubusercontent.com/kserve/modelmesh-serving/release-0.12.0/config/default/modelmesh-serving.yaml

3. Install KServe Controller

kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.15.0/kserve.yaml
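
Once the three manifests are applied, it can help to wait until the workloads report as available before moving on to configuration. The sketch below is one way to do that. `wait_for_modelmesh` is a hypothetical helper, and the default namespace `modelmesh-serving` is an assumption borrowed from the storage Secret in this guide; pass a different namespace if your components were installed elsewhere.

```shell
# Waits for every Deployment in the namespace to become Available, then lists
# the ServingRuntime resources (the KServe objects that describe model servers).
# The namespace default is an assumption; adjust it to match your install.
wait_for_modelmesh() {
  local namespace="${1:-modelmesh-serving}" timeout="${2:-300s}"
  kubectl wait --for=condition=Available deployment --all \
    --namespace "$namespace" --timeout="$timeout" || return 1
  kubectl get servingruntimes --namespace "$namespace"
}
```

For example, `wait_for_modelmesh modelmesh-serving 120s` blocks for at most two minutes and returns a non-zero status if any Deployment is still unavailable.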

Configuration

Enable ModelMesh Mode

Configure KServe to use ModelMesh:

kubectl patch configmap inferenceservice-config -n kserve-system -p '{
  "data": {
    "deploy": "{\"defaultDeploymentMode\": \"ModelMesh\"}"
  }
}'
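
Because the `deploy` value is a JSON string nested inside the ConfigMap, an escaping mistake in the patch is easy to miss. The sketch below reads the value back and checks it. Both function names are hypothetical helpers, and the namespace default simply mirrors the patch command above.

```shell
# Prints the deploy settings currently stored in the KServe ConfigMap.
current_deploy_config() {
  local namespace="${1:-kserve-system}"
  kubectl get configmap inferenceservice-config \
    --namespace "$namespace" -o jsonpath='{.data.deploy}'
}

# Succeeds only if the stored default deployment mode is ModelMesh.
# Whitespace is stripped first so formatting differences do not matter.
modelmesh_is_default() {
  case "$(current_deploy_config "$@" | tr -d ' \n')" in
    *'"defaultDeploymentMode":"ModelMesh"'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Running `modelmesh_is_default && echo "ModelMesh is the default mode"` after the patch confirms that the setting was stored as intended.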

Storage Configuration

Configure storage for model repositories:

apiVersion: v1
kind: Secret
metadata:
  name: model-storage-config
  namespace: modelmesh-serving
stringData:
  localMinIO: |
    {
      "type": "s3",
      "access_key_id": "minioadmin",
      "secret_access_key": "minioadmin",
      "endpoint_url": "http://minio.minio.svc.cluster.local:9000",
      "default_bucket": "modelmesh-example-models",
      "region": "us-south"
    }
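
Kubernetes accepts plain text in a Secret only under `stringData`; any value placed under `data` must be base64-encoded. The sketch below shows how a storage entry could be encoded for `data`, and how a stored entry could be decoded for inspection. Both function names are hypothetical helpers, and the Secret name, key, and namespace defaults are taken from the manifest above.

```shell
# Base64-encodes stdin on a single line, as required for values under `data`.
encode_secret_value() {
  base64 | tr -d '\n'
}

# Prints the decoded JSON for one storage entry of the Secret.
show_storage_entry() {
  local key="${1:-localMinIO}" namespace="${2:-modelmesh-serving}"
  kubectl get secret model-storage-config --namespace "$namespace" \
    -o "jsonpath={.data.$key}" | base64 -d
}
```

For example, `encode_secret_value < localMinIO.json` (a hypothetical file holding the JSON body) produces a string that can be pasted under `data`, while `show_storage_entry localMinIO` prints what the cluster currently holds.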

Features

Intelligent Model Management

  • Model Caching: Frequently accessed models stay in memory
  • LRU Eviction: Least recently used models are evicted when memory is full
  • Predictive Loading: Models can be pre-loaded based on usage patterns
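
The caching and eviction behavior described above can be illustrated with a toy simulation. This is only a sketch of the general LRU idea, not ModelMesh's actual implementation: `lru_touch` is a hypothetical function, capacity is counted in models rather than memory, and the cache is a plain space-separated list.

```shell
# Prints the cache after a request for MODEL.
# CACHE is a space-separated list ordered from least to most recently used.
lru_touch() {
  local cache="$1" capacity="$2" model="$3"
  local kept="" count=0 entry
  for entry in $cache; do
    # Drop MODEL from its old position; it is re-added as most recent below.
    if [ "$entry" != "$model" ]; then
      kept="${kept:+$kept }$entry"
      count=$((count + 1))
    fi
  done
  # Evict from the front (least recently used) until MODEL fits.
  while [ "$count" -ge "$capacity" ] && [ "$count" -gt 0 ]; do
    case "$kept" in
      *" "*) kept="${kept#* }" ;;
      *) kept="" ;;
    esac
    count=$((count - 1))
  done
  echo "${kept:+$kept }$model"
}
```

With a capacity of two, requesting models `a`, `b`, `a`, and then `c` leaves the cache as `a c`: the second request for `a` makes `b` the least recently used entry, so `b` is the one evicted when `c` arrives.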

High Density Serving

  • Resource Sharing: Multiple models share the same runtime pods
  • Dynamic Loading: Models are loaded and unloaded as needed
  • Efficient Packing: Optimal placement of models across available resources

Performance Optimization

  • Fast Model Loading: Optimized model loading and caching
  • Connection Pooling: Efficient request routing to model instances
  • Minimal Overhead: Low-latency model switching