Deploying PMML Models with KServe
PMML (Predictive Model Markup Language) is an XML-based format for describing data mining and statistical models, including inputs to the models, transformations used to prepare data, and the parameters that define the models themselves. This guide demonstrates how to deploy PMML models using KServe's InferenceService.
Prerequisites
Before you begin, make sure you have:
- A Kubernetes cluster with KServe installed
- kubectl CLI configured to communicate with your cluster
- Basic knowledge of Kubernetes concepts and PMML models
- Access to cloud storage (like Google Cloud Storage) or a persistent volume to store your model artifacts
- For local testing: Python environment with PMML libraries and OpenJDK-11 installed
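If you want to confirm up front that your cluster can serve PMML models, you can check that a PMML serving runtime is registered. The commands below are a minimal sketch, assuming a standard KServe installation that ships the kserve-pmmlserver ClusterServingRuntime:
# List the cluster-scoped serving runtimes; kserve-pmmlserver should appear
kubectl get clusterservingruntimes
# Show which model formats the PMML runtime supports (name assumes the default installation)
kubectl get clusterservingruntime kserve-pmmlserver -o jsonpath='{.spec.supportedModelFormats}'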
Testing the Model Locally
Once you have your model serialized as model.pmml, you can use KServe PMML Server to test it locally before deployment.
This local testing step is optional. You can skip to the deployment section below if you prefer.
Using KServe PMMLServer Locally
Prerequisites
To use KServe PMML server locally, install the required dependencies:
- Install OpenJDK-11
- Clone the KServe repository:
  git clone https://github.com/kserve/kserve
- Install the pmmlserver runtime using uv (ensure you have uv installed):
  cd python/pmmlserver
  uv sync
Serving the Model Locally
The pmmlserver package takes two arguments:
- --model_dir: The directory path where the model is stored
- --model_name: The name of the model to be deployed (optional, default is model)
Start your server with:
python3 pmmlserver --model_dir /path/to/model_dir --model_name pmml-iris
The pmmlserver is based on Py4J and doesn't support multi-process mode, so you can't set spec.predictor.containerConcurrency. If you want to scale the PMMLServer to improve prediction performance, you should set the InferenceService's resources.limits.cpu to 1 and scale the replica size.
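Once the local server is running, you can send it a quick test request. The example below is a sketch that assumes the server listens on the default HTTP port 8080 and was started with --model_name pmml-iris as shown above:
# Send a V1 prediction request to the locally running pmmlserver
curl -H "Content-Type: application/json" \
  -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}' \
  http://localhost:8080/v1/models/pmml-iris:predict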
Deploying PMML Model with V1 Protocol
Creating the InferenceService
To deploy a PMML model, create an InferenceService manifest:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "pmml-demo"
spec:
  predictor:
    model:
      modelFormat:
        name: pmml
      storageUri: "gs://kfserving-examples/models/pmml"
Apply the YAML manifest:
kubectl apply -f pmml.yaml
Expected output:
inferenceservice.serving.kserve.io/pmml-demo created
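Before sending traffic, wait until the InferenceService reports a Ready condition and a URL, for example:
# Check the InferenceService status; READY should become True
kubectl get inferenceservice pmml-demo
# Optionally block until the Ready condition is met
kubectl wait --for=condition=Ready inferenceservice/pmml-demo --timeout=180s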
Running a Prediction
First, determine the ingress IP and ports, then set the INGRESS_HOST and INGRESS_PORT environment variables.
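How you obtain these values depends on your ingress setup. The commands below are a sketch for the common case of an Istio ingress gateway exposed as a LoadBalancer service in the istio-system namespace:
# External IP of the Istio ingress gateway
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# HTTP port of the gateway
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')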
Create a file named iris-input.json with the following sample input:
{
  "instances": [
    [5.1, 3.5, 1.4, 0.2]
  ]
}
Send the inference request:
MODEL_NAME=pmml-demo
INPUT_PATH=@./iris-input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice pmml-demo -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
* TCP_NODELAY set
* Connected to localhost (::1) port 8081 (#0)
> POST /v1/models/pmml-demo:predict HTTP/1.1
> Host: pmml-demo.default.example.com
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 45
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 45 out of 45 bytes
< HTTP/1.1 200 OK
< content-length: 39
< content-type: application/json; charset=UTF-8
< date: Sun, 18 Oct 2020 15:50:02 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 12
<
* Connection #0 to host localhost left intact
{"predictions": [{'Species': 'setosa', 'Probability_setosa': 1.0, 'Probability_versicolor': 0.0, 'Probability_virginica': 0.0, 'Node_Id': '2'}]}
Deploying the Model with REST Endpoint Using Open Inference Protocol
To deploy your PMML model with the Open Inference Protocol (V2), create an InferenceService resource that specifies protocolVersion: v2:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "pmml-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: pmml
      protocolVersion: v2
      runtime: kserve-pmmlserver
      storageUri: "gs://kfserving-examples/models/pmml"
Apply the YAML manifest:
kubectl apply -f pmml-v2.yaml
Testing the Deployed Model
First, determine the ingress IP and ports, then set the INGRESS_HOST and INGRESS_PORT environment variables.
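Optionally, confirm that the model has loaded before sending data. This sketch uses the Open Inference Protocol's per-model readiness endpoint:
SERVICE_HOSTNAME=$(kubectl get inferenceservice pmml-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
# Returns the model's readiness state
curl -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/pmml-iris/ready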
Create a file named iris-input-v2.json with the following sample input:
{
  "inputs": [
    {
      "name": "input-0",
      "shape": [2, 4],
      "datatype": "FP32",
      "data": [
        [6.8, 2.8, 4.8, 1.4],
        [6.0, 3.4, 4.5, 1.6]
      ]
    }
  ]
}
Send the inference request:
SERVICE_HOSTNAME=$(kubectl get inferenceservice pmml-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v \
-H "Host: ${SERVICE_HOSTNAME}" \
-H "Content-Type: application/json" \
-d @./iris-input-v2.json \
http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/pmml-iris/infer
{
  "model_name": "pmml-iris",
  "model_version": null,
  "id": "a187a478-c614-46ce-a7de-2f07871f43f3",
  "parameters": null,
  "outputs": [
    {
      "name": "Species",
      "shape": [2],
      "datatype": "BYTES",
      "parameters": null,
      "data": ["versicolor", "versicolor"]
    },
    {
      "name": "Probability_setosa",
      "shape": [2],
      "datatype": "FP64",
      "parameters": null,
      "data": [0, 0]
    },
    {
      "name": "Probability_versicolor",
      "shape": [2],
      "datatype": "FP64",
      "parameters": null,
      "data": [0.9074074074074074, 0.9074074074074074]
    },
    {
      "name": "Probability_virginica",
      "shape": [2],
      "datatype": "FP64",
      "parameters": null,
      "data": [0.09259259259259259, 0.09259259259259259]
    },
    {
      "name": "Node_Id",
      "shape": [2],
      "datatype": "BYTES",
      "parameters": null,
      "data": ["6", "6"]
    }
  ]
}
Deploying the Model with gRPC Endpoint
For applications requiring gRPC communication, you can expose a gRPC endpoint by modifying the InferenceService definition.
KServe currently supports exposing either the HTTP or the gRPC port, not both simultaneously. By default, the HTTP port is exposed.
For Serverless mode, use:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "pmml-iris-grpc"
spec:
  predictor:
    model:
      modelFormat:
        name: pmml
      protocolVersion: v2
      runtime: kserve-pmmlserver
      storageUri: "gs://kfserving-examples/models/pmml"
      ports:
        - name: h2c # Knative expects the gRPC port name to be 'h2c'
          protocol: TCP
          containerPort: 8081
For Raw Kubernetes Deployment mode, use:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "pmml-iris-grpc"
spec:
  predictor:
    model:
      modelFormat:
        name: pmml
      protocolVersion: v2
      runtime: kserve-pmmlserver
      storageUri: "gs://kfserving-examples/models/pmml"
      ports:
        - name: grpc-port # Istio requires the port name to be in the format <protocol>[-<suffix>]
          protocol: TCP
          containerPort: 8081
Apply the YAML to create the gRPC InferenceService:
kubectl apply -f pmml-grpc.yaml
Testing the gRPC Endpoint with grpcurl
First, determine the ingress IP and ports, then set the INGRESS_HOST and INGRESS_PORT environment variables.
After the gRPC InferenceService becomes ready, use grpcurl to send gRPC requests:
# Download the proto file
curl -O https://raw.githubusercontent.com/kserve/open-inference-protocol/main/specification/protocol/open_inference_grpc.proto
INPUT_PATH=iris-input-grpc.json
PROTO_FILE=open_inference_grpc.proto
SERVICE_HOSTNAME=$(kubectl get inferenceservice pmml-iris-grpc -o jsonpath='{.status.url}' | cut -d "/" -f 3)
First, check if the server is ready:
grpcurl \
-plaintext \
-proto ${PROTO_FILE} \
-authority ${SERVICE_HOSTNAME} \
${INGRESS_HOST}:${INGRESS_PORT} \
inference.GRPCInferenceService.ServerReady
{
  "ready": true
}
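You can also verify that the specific model is ready to serve by calling the protocol's ModelReady RPC; a sketch, reusing the proto file and variables from above:
grpcurl \
  -plaintext \
  -proto ${PROTO_FILE} \
  -authority ${SERVICE_HOSTNAME} \
  -d '{"name": "pmml-iris-grpc"}' \
  ${INGRESS_HOST}:${INGRESS_PORT} \
  inference.GRPCInferenceService.ModelReady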
To test the model with inference requests, create an input file named iris-input-grpc.json:
{
  "model_name": "pmml-iris-grpc",
  "inputs": [
    {
      "name": "input-0",
      "shape": [2, 4],
      "datatype": "FP32",
      "contents": {
        "fp32_contents": [6.8, 2.8, 4.8, 1.4, 6.0, 3.4, 4.5, 1.6]
      }
    }
  ]
}
Send the gRPC inference request:
grpcurl \
-vv \
-plaintext \
-proto ${PROTO_FILE} \
-authority ${SERVICE_HOSTNAME} \
-d @ \
${INGRESS_HOST}:${INGRESS_PORT} \
inference.GRPCInferenceService.ModelInfer \
<<< $(cat "$INPUT_PATH")
Response contents:
{
  "model_name": "pmml-iris",
  "model_version": null,
  "id": "a187a478-c614-46ce-a7de-2f07871f43f3",
  "parameters": null,
  "outputs": [
    {
      "name": "Species",
      "shape": [2],
      "datatype": "BYTES",
      "parameters": null,
      "data": ["versicolor", "versicolor"]
    },
    {
      "name": "Probability_setosa",
      "shape": [2],
      "datatype": "FP64",
      "parameters": null,
      "data": [0, 0]
    },
    {
      "name": "Probability_versicolor",
      "shape": [2],
      "datatype": "FP64",
      "parameters": null,
      "data": [0.9074074074074074, 0.9074074074074074]
    },
    {
      "name": "Probability_virginica",
      "shape": [2],
      "datatype": "FP64",
      "parameters": null,
      "data": [0.09259259259259259, 0.09259259259259259]
    },
    {
      "name": "Node_Id",
      "shape": [2],
      "datatype": "BYTES",
      "parameters": null,
      "data": ["6", "6"]
    }
  ]
}
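Cleaning Up
When you are done experimenting, delete the InferenceServices created in this guide:
kubectl delete inferenceservice pmml-demo pmml-iris pmml-iris-grpc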