ModelMesh Installation

ModelMesh provides high-scale, high-density model serving for deployments that host large numbers of models and change them frequently, which makes it well suited to predictive inference workloads.

It uses a distributed architecture designed for:

High-scale model serving
Multi-model management
Intelligent model loading
Efficient resource utilization
Frequent model updates

Use Cases

ModelMesh is designed for predictive inference use cases where:

You have many models (hundreds to thousands)
Models are frequently updated or changed
Resource efficiency is critical
You need intelligent model placement and caching
Model inference times are relatively short
Models can share computational resources efficiently

Prerequisites

Kubernetes cluster (v1.30+)
kubectl configured to access your cluster
Cluster admin permissions
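
The version prerequisite can be checked before installing. The following is a minimal sketch, assuming the cluster version is read from the `serverVersion.gitVersion` field of `kubectl version -o json`; the helper names (`parse_git_version`, `meets_minimum`, `server_git_version`) are illustrative and not part of KServe or ModelMesh.

```python
import json
import re
import subprocess

MIN_VERSION = (1, 30)  # minimum Kubernetes version listed in the prerequisites


def parse_git_version(git_version):
    """Extract (major, minor) from a string such as 'v1.31.2' or 'v1.30.4-gke.100'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(git_version, minimum=MIN_VERSION):
    """Return True when the reported version is at least `minimum`."""
    return parse_git_version(git_version) >= minimum


def server_git_version():
    """Ask kubectl for the API server version (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "version", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)["serverVersion"]["gitVersion"]
```

With cluster access configured, `meets_minimum(server_git_version())` reports whether the cluster satisfies the v1.30+ requirement. The parsing helpers can also be used on their own against a version string copied from elsewhere.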

Installation

Option 1: Quick Install with KServe

Install KServe with ModelMesh support:

curl -s "https://raw.githubusercontent.com/kserve/modelmesh-serving/release-0.12.0/scripts/install.sh" | bash

Option 2: Manual Installation

1. Install etcd (for model metadata storage)

kubectl apply -f https://raw.githubusercontent.com/kserve/modelmesh-serving/release-0.12.0/config/dependencies/etcd.yaml

2. Install ModelMesh Serving

kubectl apply -f https://raw.githubusercontent.com/kserve/modelmesh-serving/release-0.12.0/config/default/modelmesh-serving.yaml

3. Install KServe Controller

kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.15.0/kserve.yaml

Configuration

Enable ModelMesh Mode

Configure KServe to use ModelMesh:

kubectl patch configmap inferenceservice-config -n kserve -p '{
  "data": {
    "deploy": "{\"defaultDeploymentMode\": \"ModelMesh\"}"
  }
}'
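
The patch body above nests one JSON document inside another: the `deploy` value is a JSON string, which is why its quotes are escaped. A small sketch of how that payload could be generated without hand-escaping is shown below; `build_mode_patch` is a hypothetical helper, not a KServe API.

```python
import json


def build_mode_patch(mode="ModelMesh"):
    """Build the ConfigMap patch body with the nested `deploy` JSON string."""
    # The inner document is serialized first, so it becomes a plain string value.
    inner = json.dumps({"defaultDeploymentMode": mode})
    # Serializing the outer document escapes the quotes of the inner one.
    return json.dumps({"data": {"deploy": inner}})


patch = build_mode_patch()
print(patch)
```

The printed string can be passed as the `-p` argument of the `kubectl patch` command. Note that a patch of this shape replaces the whole `deploy` value, so any other settings stored under that key would need to be included in the inner document as well.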

Storage Configuration

Configure storage for model repositories:

apiVersion: v1
kind: Secret
metadata:
  name: model-storage-config
  namespace: modelmesh-serving
stringData:
  localMinIO: |
    {
      "type": "s3",
      "access_key_id": "minioadmin",
      "secret_access_key": "minioadmin",
      "endpoint_url": "http://minio.minio.svc.cluster.local:9000",
      "default_bucket": "modelmesh-example-models",
      "region": "us-south"
    }
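
Kubernetes expects values under a Secret's `data` field to be base64-encoded, while `stringData` accepts plain text. The sketch below shows one way to produce both forms of a storage entry; the function names are illustrative, and the credentials are the example values from the manifest above, not recommended settings.

```python
import base64
import json


def storage_entry(access_key_id, secret_access_key, endpoint_url,
                  default_bucket, region):
    """Serialize one S3-style storage entry as the JSON text stored in the Secret."""
    return json.dumps(
        {
            "type": "s3",
            "access_key_id": access_key_id,
            "secret_access_key": secret_access_key,
            "endpoint_url": endpoint_url,
            "default_bucket": default_bucket,
            "region": region,
        },
        indent=2,
    )


def encode_for_data(value):
    """Base64-encode a plain-text value for use under a Secret's `data` field."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


entry = storage_entry(
    access_key_id="minioadmin",
    secret_access_key="minioadmin",
    endpoint_url="http://minio.minio.svc.cluster.local:9000",
    default_bucket="modelmesh-example-models",
    region="us-south",
)
encoded = encode_for_data(entry)
```

Here `entry` is the plain-text form suitable for `stringData`, and `encoded` is the same content in the form required under `data`.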

Features

Intelligent Model Management

Model Caching: Frequently accessed models stay in memory
LRU Eviction: Least recently used models are evicted when memory is full
Predictive Loading: Models can be pre-loaded based on usage patterns
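
To make the caching and eviction behavior concrete, here is a toy illustration of an LRU policy with a memory budget. It is a sketch of the general technique only and does not reflect ModelMesh's actual implementation; the class name, the megabyte accounting, and the model names are all assumptions.

```python
from collections import OrderedDict


class LRUModelCache:
    """Toy LRU cache: tracks loaded models and evicts the least recently used."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.used_mb = 0
        self._models = OrderedDict()  # model name -> size in MB, oldest first

    def access(self, name, size_mb):
        """Record a request for `name`, loading it if needed.

        Returns the names of any models evicted to make room.
        """
        if name in self._models:
            # Cache hit: mark the model as most recently used.
            self._models.move_to_end(name)
            return []
        if size_mb > self.capacity_mb:
            raise ValueError(f"model {name!r} does not fit in the cache")
        evicted = []
        # Evict from the least recently used end until the new model fits.
        while self.used_mb + size_mb > self.capacity_mb:
            victim, victim_size = self._models.popitem(last=False)
            self.used_mb -= victim_size
            evicted.append(victim)
        self._models[name] = size_mb
        self.used_mb += size_mb
        return evicted

    def loaded(self):
        """Return loaded model names, least recently used first."""
        return list(self._models)


cache = LRUModelCache(capacity_mb=300)
for model in ("a", "b", "c"):
    cache.access(model, 100)
cache.access("a", 100)            # "a" becomes the most recently used
evicted = cache.access("d", 100)  # the cache is full, so one model must go
```

In this run the request for "d" evicts "b", because "a" was touched more recently and "c" was loaded after "b".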

High Density Serving

Resource Sharing: Multiple models share the same runtime pods
Dynamic Loading: Models are loaded and unloaded as needed
Efficient Packing: Optimal placement of models across available resources
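
The idea of several models sharing a fixed set of runtime pods can be illustrated with a simple packing heuristic. The sketch below uses a greedy best-fit strategy chosen purely for illustration; it is not ModelMesh's placement logic, and `place_models`, the sizes, and the pod capacity are hypothetical.

```python
def place_models(model_sizes_mb, pod_capacity_mb, pod_count):
    """Greedy best-fit placement of models onto identical pods.

    Returns (placement, free) where placement maps each model name to a pod
    index (or None if it fits nowhere) and free lists remaining MB per pod.
    """
    free = [pod_capacity_mb] * pod_count
    placement = {}
    # Place the largest models first so small ones can fill the gaps.
    ordered = sorted(model_sizes_mb.items(), key=lambda item: item[1], reverse=True)
    for name, size in ordered:
        candidates = [i for i in range(pod_count) if free[i] >= size]
        if not candidates:
            placement[name] = None
            continue
        # Best fit: choose the pod that is left with the least spare memory.
        target = min(candidates, key=lambda i: free[i])
        free[target] -= size
        placement[name] = target
    return placement, free


placement, free = place_models(
    {"a": 600, "b": 500, "c": 400, "d": 300},
    pod_capacity_mb=1000,
    pod_count=2,
)
```

With these example numbers, four models totalling 1800 MB fit onto two 1000 MB pods, with "a" and "c" sharing one pod and "b" and "d" sharing the other.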

Performance Optimization

Fast Model Loading: Optimized model loading and caching
Connection Pooling: Efficient request routing to model instances
Minimal Overhead: Low-latency model switching
