
Prometheus Metrics

Exposing a Prometheus metrics port

Every supported serving runtime can export Prometheus metrics on a specified port in the inference service's pod. The port for each model server is defined in the kserve/config/runtimes YAML files. For example, TorchServe defines its Prometheus port as 8082 in kserve-torchserve.yaml.

metadata:
  name: kserve-torchserve
spec:
  annotations:
    prometheus.kserve.io/port: '8082'
    prometheus.kserve.io/path: "/metrics"

If needed, this value can be overridden in the InferenceService YAML.

To enable Prometheus metrics, add the annotation serving.kserve.io/enable-prometheus-scraping with the value "true" to the InferenceService YAML.

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-irisv2"
  annotations:
    serving.kserve.io/enable-prometheus-scraping: "true"
spec:
  predictor:
    sklearn:
      protocolVersion: v2
      storageUri: "gs://seldon-models/sklearn/iris"
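
Building on the example above, the port override mentioned earlier might look like the following sketch. It assumes the annotation keys shown for the runtime (prometheus.kserve.io/port and prometheus.kserve.io/path) are also honored on the InferenceService metadata. The service name and the port value 9090 are hypothetical, and the model server must actually serve metrics on whatever port is chosen.

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris-custom-port"   # hypothetical name
  annotations:
    serving.kserve.io/enable-prometheus-scraping: "true"
    # Assumed override: same keys as in the runtime YAML.
    # 9090 is a placeholder, not a documented default.
    prometheus.kserve.io/port: "9090"
    prometheus.kserve.io/path: "/metrics"
spec:
  predictor:
    sklearn:
      protocolVersion: v2
      storageUri: "gs://seldon-models/sklearn/iris"
```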

The default value for serving.kserve.io/enable-prometheus-scraping can be set in the inferenceservice-config configmap. See the KServe documentation for more information.
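
As a rough illustration of a cluster-wide default, the fragment below shows what the relevant entry of the inferenceservice-config configmap might look like. The namespace, the metricsAggregator key, and both field names are assumptions that are not confirmed by this page; verify them against the configmap installed in your cluster. Only the relevant key is shown, so edit the existing configmap in place rather than applying this fragment as a replacement.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kserve            # assumed namespace
data:
  # Assumed key and field names; check your installed configmap.
  metricsAggregator: |-
    {
      "enableMetricAggregation": "false",
      "enablePrometheusScraping": "true"
    }
```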

The model servers do not currently export a unified set of metrics. Each model server may implement its own set of metrics to export.

Note

This annotation defines the Prometheus port and path, but it does not by itself cause Prometheus to scrape the pod. Users must configure Prometheus to scrape the inference service's pod according to their own Prometheus setup.
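
One possible way to do this is a Prometheus scrape job that discovers pods and reads the annotations described above. The sketch below assumes the serving.kserve.io/enable-prometheus-scraping, prometheus.kserve.io/port, and prometheus.kserve.io/path annotations are present on the inference service's pod, and that Prometheus runs in the cluster with permission to list pods. The job name is hypothetical.

```yaml
scrape_configs:
  - job_name: kserve-inference-pods        # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                          # discover every pod as a target
    relabel_configs:
      # Keep only pods that opted in to scraping.
      - source_labels: [__meta_kubernetes_pod_annotation_serving_kserve_io_enable_prometheus_scraping]
        action: keep
        regex: 'true'
      # Use the annotated metrics path when one is set.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_kserve_io_path]
        action: replace
        regex: '(.+)'
        target_label: __metrics_path__
      # Rewrite the target address to use the annotated port.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_kserve_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
```

Prometheus replaces characters such as "." and "/" in annotation names with underscores when building the __meta_kubernetes_pod_annotation_* labels, which is why the source labels look different from the annotation keys.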

Metrics for lgbserver, paddleserver, pmmlserver, sklearnserver, xgbserver, custom transformer/predictor

Prometheus latency histograms are emitted for each step (pre-processing, post-processing, explain, predict). Additionally, the latency of each step is logged per request. See also the modelserver Prometheus label definitions and metric implementation.

| Metric Name | Description | Type |
|---|---|---|
| request_preprocess_seconds | pre-processing request latency | Histogram |
| request_explain_seconds | explain request latency | Histogram |
| request_predict_seconds | prediction request latency | Histogram |
| request_postprocess_seconds | post-processing request latency | Histogram |
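
Because these are Prometheus histograms, each metric is exposed as _bucket, _sum, and _count series. As a sketch of how they might be used, the Prometheus rule file below records a 95th-percentile and a mean prediction latency over a 5-minute window. The group name, the recorded series names, and the window are hypothetical choices, and the expressions aggregate across all models because no label names are assumed.

```yaml
groups:
  - name: kserve-latency-sketch            # hypothetical group name
    rules:
      # Estimated 95th-percentile prediction latency, in seconds.
      - record: kserve:request_predict_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(request_predict_seconds_bucket[5m]))
          )
      # Mean prediction latency: total observed seconds / number of requests.
      - record: kserve:request_predict_seconds:mean_5m
        expr: |
          sum(rate(request_predict_seconds_sum[5m]))
          /
          sum(rate(request_predict_seconds_count[5m]))
```

The same pattern applies to the other three histograms by swapping the metric name.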

Other serving runtime metrics

Some model servers define their own metrics.

Exporting metrics

Exporting metrics in serverless mode requires the queue-proxy extension image.

For more information on how to export metrics, see the Queue Proxy Extension documentation.

Knative/Queue-Proxy metrics

Queue-proxy emits metrics by default on port 9091. If metric aggregation is set up with the queue-proxy extension, the default port for the aggregated metrics is 9088. See the Knative documentation (and the additional metrics defined in the code) for more information about the metrics queue-proxy exposes.
