Deploying TensorFlow Models with KServe
This guide walks you through deploying a TensorFlow model with KServe's InferenceService. You'll learn how to serve the model over both HTTP/REST and gRPC endpoints and how to roll out model updates with a canary strategy.
Prerequisites
Before you begin, make sure you have:
- A Kubernetes cluster with KServe installed.
- The kubectl CLI configured to communicate with your cluster.
- Basic knowledge of Kubernetes concepts and TensorFlow SavedModels.
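To confirm KServe is actually installed before you start, a quick sanity check (assuming KServe was installed into its default kserve namespace) is to look for the InferenceService CRD and the controller pods:
kubectl get crd inferenceservices.serving.kserve.io
kubectl get pods -n kserve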
Creating the InferenceService with V1 REST Endpoints
Create an InferenceService resource that specifies the tensorflow model format and a storageUri pointing to your saved TensorFlow model:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "flower-sample"
spec:
  predictor:
    model:
      modelFormat:
        name: tensorflow
      storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
      resources:
        requests:
          cpu: "100m"
          memory: "256Mi"
        limits:
          cpu: "1"
          memory: "1Gi"
Apply the YAML configuration to create the InferenceService:
kubectl apply -f tensorflow.yaml
inferenceservice.serving.kserve.io/flower-sample created
Wait for the InferenceService to reach the Ready state:
kubectl get isvc flower-sample
You should see output similar to:
NAME            URL                                        READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                     AGE
flower-sample   http://flower-sample.default.example.com   True           100                              flower-sample-predictor-default-n9zs6   7m15s
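If you prefer to block until the service becomes ready rather than re-running kubectl get, kubectl wait can watch the Ready condition (a convenience sketch; adjust the timeout to your environment):
kubectl wait --for=condition=Ready inferenceservice/flower-sample --timeout=300s
If the service never becomes ready, kubectl describe isvc flower-sample usually surfaces the underlying events.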
Running a Prediction
To test your deployed model:
- First, determine the ingress IP and port for your cluster.
- Set the INGRESS_HOST and INGRESS_PORT environment variables accordingly (one common setup is sketched below).
- Send a prediction request with sample input using the curl command that follows.
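How you determine these values depends on your ingress. As one common sketch, with an Istio ingress gateway exposed through a LoadBalancer service in the istio-system namespace:
# Assumes the Istio ingress gateway service; adjust the namespace and service name to your setup.
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')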
MODEL_NAME=flower-sample
INPUT_PATH=@./input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
* Connected to localhost (::1) port 8080 (#0)
> POST /v1/models/flower-sample:predict HTTP/1.1
> Host: flower-sample.default.example.com
> User-Agent: curl/7.73.0
> Accept: */*
> Content-Length: 16201
> Content-Type: application/json
>
* upload completely sent off: 16201 out of 16201 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-length: 222
< content-type: application/json
< date: Sun, 31 Jan 2021 01:01:50 GMT
< x-envoy-upstream-service-time: 280
< server: istio-envoy
<
{
  "predictions": [
    {
      "scores": [0.999114931, 9.20987877e-05, 0.000136786213, 0.000337257545, 0.000300532585, 1.84813616e-05],
      "prediction": 0,
      "key": " 1"
    }
  ]
}
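The sample input.json carries a base64-encoded flower image under image_bytes.b64 plus a key field. If you want to build such a payload from your own image, a minimal sketch (flower.jpg is a hypothetical file name):
# GNU coreutils base64; on macOS use `base64 -i flower.jpg` instead of -w0
IMAGE_B64=$(base64 -w0 flower.jpg)
cat <<EOF > input.json
{"instances": [{"image_bytes": {"b64": "${IMAGE_B64}"}, "key": "1"}]}
EOF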
Creating the InferenceService with gRPC Endpoints
KServe also supports gRPC for inference requests. To create an InferenceService that exposes a gRPC endpoint, use the configuration that matches your deployment mode.
For Serverless mode:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "flower-grpc"
spec:
  predictor:
    model:
      modelFormat:
        name: tensorflow
      storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
      resources:
        requests:
          cpu: "100m"
          memory: "256Mi"
        limits:
          cpu: "1"
          memory: "1Gi"
      ports:
        - containerPort: 9000
          name: h2c # Knative expects the gRPC port name to be 'h2c'
          protocol: TCP

For Raw Deployment mode:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "flower-grpc"
spec:
  predictor:
    model:
      modelFormat:
        name: tensorflow
      storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
      resources:
        requests:
          cpu: "100m"
          memory: "256Mi"
        limits:
          cpu: "1"
          memory: "1Gi"
      ports:
        - containerPort: 9000
          name: grpc-port # KServe requires the port name to be in the format <protocol>[-<suffix>]
          protocol: TCP
Apply the YAML configuration to create the gRPC InferenceService:
kubectl apply -f grpc.yaml
inferenceservice.serving.kserve.io/flower-grpc created
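As with the REST example, confirm that the gRPC InferenceService reports Ready before sending requests:
kubectl get isvc flower-grpc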
Running a gRPC Prediction
To run predictions using the gRPC endpoint, you'll need to:
- Set up a Python virtual environment with TensorFlow Serving API:
# The prediction script is written in TensorFlow 1.x
pip install "tensorflow-serving-api>=1.14.0,<2.0.0"
- Create a Python script named grpc_client.py to handle gRPC requests. Below is an example client for making gRPC requests to your deployed TensorFlow model:
import argparse
import json
import base64

import grpc
from tensorflow.contrib.util import make_tensor_proto
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc


def predict(host, port, hostname, model, signature_name, input_path):
    # If hostname not set, we assume the host is a valid knative dns.
    if hostname:
        host_option = (('grpc.ssl_target_name_override', hostname,),)
    else:
        host_option = None
    channel = grpc.insecure_channel(target='{host}:{port}'.format(host=host, port=port), options=host_option)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    with open(input_path) as json_file:
        data = json.load(json_file)
        image = data['instances'][0]['image_bytes']['b64']
        key = data['instances'][0]['key']

    # Call classification model to make prediction
    request = predict_pb2.PredictRequest()
    request.model_spec.name = model
    request.model_spec.signature_name = signature_name
    image = base64.b64decode(image)
    request.inputs['image_bytes'].CopyFrom(
        make_tensor_proto(image, shape=[1]))
    request.inputs['key'].CopyFrom(make_tensor_proto(key, shape=[1]))
    result = stub.Predict(request, 10.0)
    print(result)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--host', help='Ingress Host Name', default='localhost', type=str)
    parser.add_argument('--port', help='Ingress Port', default=80, type=int)
    parser.add_argument('--model', help='TensorFlow Model Name', type=str)
    parser.add_argument('--signature_name', help='Signature name of saved TensorFlow model',
                        default='serving_default', type=str)
    parser.add_argument('--hostname', help='Service Host Name', default='', type=str)
    parser.add_argument('--input_path', help='Prediction data input path', default='./input.json', type=str)

    args = parser.parse_args()
    predict(args.host, args.port, args.hostname, args.model, args.signature_name, args.input_path)
- Run the gRPC prediction script with the sample input:
MODEL_NAME=flower-grpc
INPUT_PATH=./input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)
python grpc_client.py --host $INGRESS_HOST --port $INGRESS_PORT --model $MODEL_NAME --hostname $SERVICE_HOSTNAME --input_path $INPUT_PATH
outputs {
  key: "key"
  value {
    dtype: DT_STRING
    tensor_shape {
      dim {
        size: 1
      }
    }
    string_val: " 1"
  }
}
outputs {
  key: "prediction"
  value {
    dtype: DT_INT64
    tensor_shape {
      dim {
        size: 1
      }
    }
    int64_val: 0
  }
}
outputs {
  key: "scores"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 6
      }
    }
    float_val: 0.9991149306297302
    float_val: 9.209887502947822e-05
    float_val: 0.00013678647519554943
    float_val: 0.0003372581850271672
    float_val: 0.0003005331673193723
    float_val: 1.848137799242977e-05
  }
}
model_spec {
  name: "flower-grpc"
  version {
    value: 1
  }
  signature_name: "serving_default"
}