# Basic Inference Logger
This document explains how to set up basic inference logging in KServe. Inference logging allows you to capture and monitor prediction requests and responses, which is useful for debugging, auditing, and monitoring your machine learning models in production.
## Prerequisites
Before setting up inference logging, make sure you have:
- A Kubernetes cluster with KServe installed.
- kubectl CLI tool installed and configured.
- Basic knowledge of Kubernetes and KServe concepts.
## Create Message Dumper
First, you need to set up a message dumper service that will receive and log the inference events.
**Raw Deployment**

Create a standard Kubernetes Deployment and Service as the message dumper:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: message-dumper
spec:
  replicas: 1
  selector:
    matchLabels:
      app: message-dumper
  template:
    metadata:
      labels:
        app: message-dumper
    spec:
      containers:
        - name: message-dumper
          image: gcr.io/knative-releases/knative.dev/eventing-contrib/cmd/event_display
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: message-dumper
spec:
  selector:
    app: message-dumper
  ports:
    - port: 80
      targetPort: 8080
```
**Serverless Deployment (Knative)**

Create a Knative Service as the message dumper:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: message-dumper
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-releases/knative.dev/eventing-contrib/cmd/event_display
```
Apply the configuration:
```bash
kubectl apply -f message-dumper.yaml
```
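Before wiring up the logger, you can confirm the message dumper is running. A quick check, depending on which variant you deployed:

```bash
# Raw Deployment: the pod should report Running
kubectl get pods -l app=message-dumper

# Knative Service: the service should report Ready
kubectl get ksvc message-dumper
```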
## Configure the InferenceService
Next, create an InferenceService with the logger configured:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
name: "sklearn-iris"
spec:
predictor:
model:
modelFormat:
name: sklearn
storageUri: "gs://kserve-examples/models/sklearn/1.0/model"
logger:
mode: all
url: http://message-dumper.default.svc.cluster.local
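The `mode` field controls which payloads are forwarded: `all` logs both requests and responses, while `request` or `response` restricts logging to that direction. For example, to capture requests only:

```yaml
logger:
  mode: request  # one of: all, request, response
  url: http://message-dumper.default.svc.cluster.local
```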
Apply the InferenceService configuration:
```bash
kubectl apply -f sklearn-iris.yaml
```
Verify the InferenceService is ready:
```bash
kubectl get inferenceservices sklearn-iris
```
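If you prefer to block until the service reports Ready, `kubectl wait` also works with the InferenceService resource:

```bash
# Wait up to 5 minutes for the InferenceService to become Ready
kubectl wait --for=condition=Ready inferenceservice/sklearn-iris --timeout=300s
```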
## Send Test Request
Create an `input.json` file with the following content:
```json
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
```
First, determine the ingress IP and ports, and set the `INGRESS_HOST` and `INGRESS_PORT` environment variables.
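A common approach, assuming KServe is fronted by the default Istio ingress gateway in the `istio-system` namespace (adjust the service name and namespace for your ingress):

```bash
# External IP of the ingress gateway (requires a LoadBalancer service type)
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# Port named "http2" on the gateway service
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```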
Then send a request:
```bash
MODEL_NAME=sklearn-iris
INPUT_PATH=@./input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d ${INPUT_PATH}
```

Expected output:

```json
{
  "predictions": [1, 1]
}
```
## Check CloudEvents
Check the logs of the message dumper to see the CloudEvents associated with your previous curl request. The command differs depending on your deployment type.
**Raw Kubernetes Deployment**

```bash
kubectl logs $(kubectl get pod -l app=message-dumper -o jsonpath='{.items[0].metadata.name}')
```

**Serverless Deployment (Knative)**

```bash
kubectl logs $(kubectl get pod -l serving.knative.dev/service=message-dumper -o jsonpath='{.items[0].metadata.name}') user-container
```
You should see CloudEvents showing both the request and response payloads.
```
☁️  cloudevents.Event
Validation: valid
Context Attributes,
  specversion: 1.0
  type: org.kubeflow.serving.inference.request
  source: http://localhost:9081/
  id: e58f1edd-43bc-42f6-a10a-d1adb5cab831
  time: 2023-01-30T10:45:00.000Z
  datacontenttype: application/json
Extensions,
  endpoint:
  inferenceservicename: sklearn-iris
  namespace: default
  traceparent: 00-90a2951c292515b95056d5f778a7824b-d3c911c939c7bef6-00
Data,
  {
    "instances": [
      [
        6.8,
        2.8,
        4.8,
        1.4
      ],
      [
        6.0,
        3.4,
        4.5,
        1.6
      ]
    ]
  }
☁️  cloudevents.Event
Validation: valid
Context Attributes,
  specversion: 1.0
  type: org.kubeflow.serving.inference.response
  source: http://localhost:9081/
  id: e58f1edd-43bc-42f6-a10a-d1adb5cab831
  time: 2023-01-30T10:45:00.000Z
  datacontenttype: application/json
Extensions,
  endpoint:
  inferenceservicename: sklearn-iris
  namespace: default
  traceparent: 00-90a2951c292515b95056d5f778a7824b-d3c911c939c7bef6-00
Data,
  {
    "predictions": [
      1,
      1
    ]
  }
```
## Cleanup
When you're done experimenting with the inference logger, clean up the resources depending on your deployment method:
**Raw Kubernetes Deployment**

```bash
kubectl delete isvc sklearn-iris
kubectl delete deploy message-dumper
kubectl delete svc message-dumper
```

**Serverless Deployment (Knative)**

```bash
kubectl delete isvc sklearn-iris
kubectl delete ksvc message-dumper
```