
Deploying PMML Models with KServe

PMML (Predictive Model Markup Language) is an XML-based format for describing data mining and statistical models, including inputs to the models, transformations used to prepare data, and the parameters that define the models themselves. This guide demonstrates how to deploy PMML models using KServe's InferenceService.

Prerequisites

Before you begin, make sure you have:

  • A Kubernetes cluster with KServe installed
  • kubectl CLI configured to communicate with your cluster
  • Basic knowledge of Kubernetes concepts and PMML models
  • Access to cloud storage (like Google Cloud Storage) or a persistent volume to store your model artifacts
  • For local testing: Python environment with PMML libraries and OpenJDK-11 installed

Testing the Model Locally

Once you have your model serialized as model.pmml, you can use KServe PMML Server to test it locally before deployment.
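
If you don't yet have a serialized model, the following sketch shows one common way to produce a model.pmml file from a scikit-learn estimator using the sklearn2pmml package. It is illustrative only and assumes scikit-learn, sklearn2pmml, and a Java runtime are installed; it is not the exact model used in the examples below.

# Illustrative only: trains a small decision tree on the Iris dataset and
# exports it as model.pmml. Requires scikit-learn, sklearn2pmml, and a JDK.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

iris = load_iris(as_frame=True)

# PMMLPipeline wraps a regular scikit-learn estimator so it can be converted.
pipeline = PMMLPipeline([("classifier", DecisionTreeClassifier())])
pipeline.fit(iris.data, iris.target)

# Converts the fitted pipeline to PMML; the converter shells out to Java.
sklearn2pmml(pipeline, "model.pmml", with_repr=True)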

tip

This local testing step is optional. You can skip to the deployment section below if you prefer.

Using KServe PMMLServer Locally

Prerequisites

To use KServe PMML server locally, install the required dependencies:

  1. Install OpenJDK-11 (a Debian/Ubuntu install example follows this list)

  2. Clone the KServe repository:

    git clone https://github.com/kserve/kserve
  3. Install the pmmlserver runtime using uv (ensure you have uv installed):

    cd python/pmmlserver
    uv sync
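
For step 1, the exact OpenJDK install command depends on your platform. A sketch for a Debian/Ubuntu machine (adjust the package manager and package name to your distribution):

# Installs OpenJDK 11 from the distribution packages (Debian/Ubuntu).
sudo apt-get update
sudo apt-get install -y openjdk-11-jdk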

Serving the Model Locally

The pmmlserver package takes two arguments:

  • --model_dir: The directory path where the model is stored
  • --model_name: The name of the model to be deployed (optional, default is model)

Start your server with:

python3 pmmlserver --model_dir /path/to/model_dir --model_name pmml-iris
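
With the server running, you can send a test request directly to it. A minimal sketch, assuming the server exposes the V1 protocol on the default HTTP port 8080:

curl -H "Content-Type: application/json" \
  http://localhost:8080/v1/models/pmml-iris:predict \
  -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}'
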
Performance Considerations

The pmmlserver is based on Py4J and doesn't support multi-process mode, so you can't set spec.predictor.containerConcurrency. If you want to scale the PMMLServer to improve prediction performance, set the InferenceService's resources.limits.cpu to 1 and increase the number of replicas instead.
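
For example, a sketch of an InferenceService that pins each replica to one CPU and relies on replica count for scaling (the replica counts and memory values are illustrative):

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "pmml-demo"
spec:
  predictor:
    minReplicas: 2
    maxReplicas: 5
    model:
      modelFormat:
        name: pmml
      storageUri: "gs://kfserving-examples/models/pmml"
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          cpu: "1"
          memory: 2Gi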

Deploying PMML Model with V1 Protocol

Creating the InferenceService

To deploy a PMML model, create an InferenceService manifest:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
name: "pmml-demo"
spec:
predictor:
model:
modelFormat:
name: pmml
storageUri: "gs://kfserving-examples/models/pmml"

Apply the YAML manifest:

kubectl apply -f pmml.yaml
Expected Output
inferenceservice.serving.kserve.io/pmml-demo created
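
Before sending traffic, you can wait for the InferenceService to report a Ready condition, for example:

kubectl wait --for=condition=Ready inferenceservice pmml-demo --timeout=300s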

Running a Prediction

First, determine the ingress IP and ports, then set the INGRESS_HOST and INGRESS_PORT environment variables.
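
For example, if your cluster exposes KServe through an Istio ingress gateway in the istio-system namespace (adjust the namespace and service name to match your environment):

INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')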

Create a file named iris-input.json with the following sample input:

{
  "instances": [
    [5.1, 3.5, 1.4, 0.2]
  ]
}

Send the inference request:

MODEL_NAME=pmml-demo
INPUT_PATH=@./iris-input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice pmml-demo -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
Expected Output
* TCP_NODELAY set
* Connected to localhost (::1) port 8081 (#0)
> POST /v1/models/pmml-demo:predict HTTP/1.1
> Host: pmml-demo.default.example.com
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 45
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 45 out of 45 bytes
< HTTP/1.1 200 OK
< content-length: 39
< content-type: application/json; charset=UTF-8
< date: Sun, 18 Oct 2020 15:50:02 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 12
<
* Connection #0 to host localhost left intact
{"predictions": [{'Species': 'setosa', 'Probability_setosa': 1.0, 'Probability_versicolor': 0.0, 'Probability_virginica': 0.0, 'Node_Id': '2'}]}

Deploying the Model with REST Endpoint Using Open Inference Protocol

To deploy your PMML model with the Open Inference Protocol (V2), create an InferenceService resource specifying protocolVersion: v2:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
name: "pmml-iris"
spec:
predictor:
model:
modelFormat:
name: pmml
protocolVersion: v2
runtime: kserve-pmmlserver
storageUri: "gs://kfserving-examples/models/pmml"

Apply the YAML manifest:

kubectl apply -f pmml-v2.yaml

Testing the Deployed Model

First, determine the ingress IP and ports, then set the INGRESS_HOST and INGRESS_PORT environment variables.
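
Optionally, you can verify that the model has loaded through the Open Inference Protocol readiness endpoint before sending requests, for example:

SERVICE_HOSTNAME=$(kubectl get inferenceservice pmml-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -H "Host: ${SERVICE_HOSTNAME}" \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/pmml-iris/ready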

Create a file named iris-input-v2.json with the following sample input:

{
  "inputs": [
    {
      "name": "input-0",
      "shape": [2, 4],
      "datatype": "FP32",
      "data": [
        [6.8, 2.8, 4.8, 1.4],
        [6.0, 3.4, 4.5, 1.6]
      ]
    }
  ]
}

Send the inference request:

SERVICE_HOSTNAME=$(kubectl get inferenceservice pmml-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v \
-H "Host: ${SERVICE_HOSTNAME}" \
-H "Content-Type: application/json" \
-d @./iris-input-v2.json \
http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/pmml-iris/infer
Expected Output
{
  "model_name": "pmml-iris",
  "model_version": null,
  "id": "a187a478-c614-46ce-a7de-2f07871f43f3",
  "parameters": null,
  "outputs": [
    {
      "name": "Species",
      "shape": [2],
      "datatype": "BYTES",
      "parameters": null,
      "data": ["versicolor", "versicolor"]
    },
    {
      "name": "Probability_setosa",
      "shape": [2],
      "datatype": "FP64",
      "parameters": null,
      "data": [0, 0]
    },
    {
      "name": "Probability_versicolor",
      "shape": [2],
      "datatype": "FP64",
      "parameters": null,
      "data": [0.9074074074074074, 0.9074074074074074]
    },
    {
      "name": "Probability_virginica",
      "shape": [2],
      "datatype": "FP64",
      "parameters": null,
      "data": [0.09259259259259259, 0.09259259259259259]
    },
    {
      "name": "Node_Id",
      "shape": [2],
      "datatype": "BYTES",
      "parameters": null,
      "data": ["6", "6"]
    }
  ]
}
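
The same endpoint can also be called from application code. A minimal Python sketch using the requests library; the placeholder ingress address and hostname are assumptions that you should replace with the values determined above:

import requests

# Placeholders: substitute the values of INGRESS_HOST, INGRESS_PORT, and
# SERVICE_HOSTNAME from the shell steps above.
ingress = "http://<INGRESS_HOST>:<INGRESS_PORT>"
service_hostname = "pmml-iris.default.example.com"

payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [2, 4],
            "datatype": "FP32",
            "data": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
        }
    ]
}

# The Host header routes the request to the right InferenceService.
response = requests.post(
    f"{ingress}/v2/models/pmml-iris/infer",
    json=payload,
    headers={"Host": service_hostname},
)
response.raise_for_status()
print(response.json()["outputs"])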

Deploying the Model with gRPC Endpoint

For applications requiring gRPC communication, you can expose a gRPC endpoint by modifying the InferenceService definition.

tip

KServe currently supports exposing either HTTP or gRPC port, not both simultaneously. By default, the HTTP port is exposed.

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
name: "pmml-iris-grpc"
spec:
predictor:
model:
modelFormat:
name: pmml
protocolVersion: v2
runtime: kserve-pmmlserver
storageUri: "gs://kfserving-examples/models/pmml"
ports:
- name: h2c # knative expects grpc port name to be 'h2c'
protocol: TCP
containerPort: 8081

Apply the YAML to create the gRPC InferenceService:

kubectl apply -f pmml-grpc.yaml

Testing the gRPC Endpoint with grpcurl

First, determine the ingress IP and ports, then set the INGRESS_HOST and INGRESS_PORT environment variables.

After the gRPC InferenceService becomes ready, use grpcurl to send gRPC requests:

# Download the proto file
curl -O https://raw.githubusercontent.com/kserve/open-inference-protocol/main/specification/protocol/open_inference_grpc.proto

INPUT_PATH=iris-input-grpc.json
PROTO_FILE=open_inference_grpc.proto
SERVICE_HOSTNAME=$(kubectl get inferenceservice pmml-iris-grpc -o jsonpath='{.status.url}' | cut -d "/" -f 3)

First, check if the server is ready:

grpcurl \
-plaintext \
-proto ${PROTO_FILE} \
-authority ${SERVICE_HOSTNAME} \
${INGRESS_HOST}:${INGRESS_PORT} \
inference.GRPCInferenceService.ServerReady
Expected Output
{
"ready": true
}
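
You can also check that the specific model has loaded using the ModelReady RPC, for example:

grpcurl \
  -plaintext \
  -proto ${PROTO_FILE} \
  -authority ${SERVICE_HOSTNAME} \
  -d '{"name": "pmml-iris-grpc"}' \
  ${INGRESS_HOST}:${INGRESS_PORT} \
  inference.GRPCInferenceService.ModelReady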

To test the model with inference requests, create an input file iris-input-grpc.json:

{
  "model_name": "pmml-iris-grpc",
  "inputs": [
    {
      "name": "input-0",
      "shape": [2, 4],
      "datatype": "FP32",
      "contents": {
        "fp32_contents": [6.8, 2.8, 4.8, 1.4, 6.0, 3.4, 4.5, 1.6]
      }
    }
  ]
}

Send the gRPC inference request:

grpcurl \
-vv \
-plaintext \
-proto ${PROTO_FILE} \
-authority ${SERVICE_HOSTNAME} \
-d @ \
${INGRESS_HOST}:${INGRESS_PORT} \
inference.GRPCInferenceService.ModelInfer \
<<< $(cat "$INPUT_PATH")
Expected Output
Response contents:
{
  "model_name": "pmml-iris",
  "model_version": null,
  "id": "a187a478-c614-46ce-a7de-2f07871f43f3",
  "parameters": null,
  "outputs": [
    {
      "name": "Species",
      "shape": [2],
      "datatype": "BYTES",
      "parameters": null,
      "data": ["versicolor", "versicolor"]
    },
    {
      "name": "Probability_setosa",
      "shape": [2],
      "datatype": "FP64",
      "parameters": null,
      "data": [0, 0]
    },
    {
      "name": "Probability_versicolor",
      "shape": [2],
      "datatype": "FP64",
      "parameters": null,
      "data": [0.9074074074074074, 0.9074074074074074]
    },
    {
      "name": "Probability_virginica",
      "shape": [2],
      "datatype": "FP64",
      "parameters": null,
      "data": [0.09259259259259259, 0.09259259259259259]
    },
    {
      "name": "Node_Id",
      "shape": [2],
      "datatype": "BYTES",
      "parameters": null,
      "data": ["6", "6"]
    }
  ]
}