Deploying XGBoost Models with KServe
This guide demonstrates how to deploy XGBoost models using KServe's InferenceService. You'll learn how to serve models through both HTTP/REST and gRPC endpoints using the Open Inference Protocol.
Prerequisites
Before you begin, make sure you have:
- A Kubernetes cluster with KServe installed.
- kubectl CLI configured to communicate with your cluster.
- Basic knowledge of Kubernetes concepts and XGBoost models.
Training a Sample Model
First, train a sample XGBoost model that will be saved as model.bst:
import os

import xgboost as xgb
from sklearn.datasets import load_iris

model_dir = "."
BST_FILE = "model.bst"

# Load the Iris dataset (150 samples, 4 features, 3 classes).
iris = load_iris()
y = iris['target']
X = iris['data']
dtrain = xgb.DMatrix(X, label=y)

# Training parameters; Iris has 3 classes, so num_class is 3.
param = {
    'max_depth': 6,
    'eta': 0.1,
    'nthread': 4,
    'num_class': 3,
    'objective': 'multi:softmax'
}
xgb_model = xgb.train(params=param, dtrain=dtrain)

# Save the booster in the binary .bst format expected by the XGBoost server.
model_file = os.path.join(model_dir, BST_FILE)
xgb_model.save_model(model_file)
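Before serving it, you can optionally check that the saved booster loads and predicts as expected. A minimal sketch, run from the directory containing model.bst:

import numpy as np
import xgboost as xgb

# Reload the saved booster and run a quick sanity-check prediction
# on two Iris samples (both belong to class 1, versicolor).
booster = xgb.Booster()
booster.load_model("model.bst")

sample = xgb.DMatrix(np.array([[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]))
print(booster.predict(sample))  # expected output: [1. 1.]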
Testing the Model Locally
Once you've serialized your model as model.bst, you can use KServe XGBoost Server to spin up a local server for testing.
This local testing step is optional. You can skip to the deployment section below if you prefer.
Using KServe XGBoostServer Locally
Prerequisites
To use KServe XGBoost server locally, install the xgbserver runtime package:
- Clone the KServe repository and navigate into the directory:
  git clone https://github.com/kserve/kserve
- Install the xgbserver runtime using Uv (ensure you have Uv installed):
  cd python/xgbserver
  uv sync
Serving the Model Locally
The xgbserver package takes three arguments:
- --model_dir: The directory path where the model is stored
- --model_name: The name of the model to be deployed (optional, default is model)
- --nthread: Number of threads to use by XGBoost (optional, default is 1)
Start your server with:
python3 xgbserver --model_dir /path/to/model_dir --model_name xgboost-iris
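Once the local server is running, you can send it an Open Inference Protocol request. A minimal sketch using the requests library, assuming the server listens on the default HTTP port 8080:

import requests

# V2 (Open Inference Protocol) inference request for two Iris samples.
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [2, 4],
            "datatype": "FP32",
            "data": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
        }
    ]
}

# The model name in the URL must match the --model_name passed to xgbserver.
resp = requests.post(
    "http://localhost:8080/v2/models/xgboost-iris/infer",
    json=payload,
)
print(resp.json())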
Deploying the Model with REST Endpoint
To deploy your trained model on Kubernetes with KServe, create an InferenceService resource specifying protocolVersion: v2 to use the Open Inference Protocol:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "xgboost-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: xgboost
      protocolVersion: v2
      runtime: kserve-xgbserver
      storageUri: "gs://kfserving-examples/models/xgboost/iris"
If the runtime field is not provided for V2 protocol, the mlserver runtime is used by default.
Note that, by default, the v1beta1 API exposes your model through an API compatible with the existing V1 dataplane; the protocolVersion: v2 field above switches it to the Open Inference Protocol.
This deployment assumes:
- Your model weights (model.bst) have been uploaded to a storage location accessible from your cluster
- The model file has one of the extensions .bst, .json, or .ubj, so the XGBoost server can recognize it (see the sketch after this list for saving in an alternative format)
- Your storage URI points to the directory containing the model file
- The kserve-xgbserver runtime is properly configured in your KServe installation
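If you prefer one of the other supported formats, the same booster can be re-saved with a different extension. A minimal sketch, assuming model.bst from the training step is in the current directory (the output file names here are illustrative):

import xgboost as xgb

booster = xgb.Booster()
booster.load_model("model.bst")

# xgboost picks the serialization format from the file extension:
# .json (JSON) and .ubj (Universal Binary JSON) are also accepted by the server.
booster.save_model("model.json")
booster.save_model("model.ubj")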
Apply the YAML manifest:
kubectl apply -f xgboost.yaml
Testing the Deployed Model
You can test your deployed model by sending a sample request that follows the Open Inference Protocol.
Here's an example input payload (iris-input.json):
{
  "inputs": [
    {
      "name": "input-0",
      "shape": [2, 4],
      "datatype": "FP32",
      "data": [
        [6.8, 2.8, 4.8, 1.4],
        [6.0, 3.4, 4.5, 1.6]
      ]
    }
  ]
}
First, determine the ingress IP and ports, then set the INGRESS_HOST and INGRESS_PORT environment variables.
Send the inference request; the expected response is shown below the command:
SERVICE_HOSTNAME=$(kubectl get inferenceservice xgboost-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d @./iris-input.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/xgboost-iris/infer
{
  "id": "4e546709-0887-490a-abd6-00cbc4c26cf4",
  "model_name": "xgboost-iris",
  "model_version": "v1.0.0",
  "outputs": [
    {
      "data": [1.0, 1.0],
      "datatype": "FP32",
      "name": "predict",
      "parameters": null,
      "shape": [2]
    }
  ]
}
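If you prefer Python over curl, the same request can be sent with the requests library. A minimal sketch, assuming INGRESS_HOST, INGRESS_PORT, and SERVICE_HOSTNAME have been exported as environment variables:

import json
import os

import requests

# Read the ingress and hostname values exported in the shell above.
ingress_host = os.environ["INGRESS_HOST"]
ingress_port = os.environ["INGRESS_PORT"]
service_hostname = os.environ["SERVICE_HOSTNAME"]

with open("iris-input.json") as f:
    payload = json.load(f)

# The Host header routes the request to the xgboost-iris InferenceService.
resp = requests.post(
    f"http://{ingress_host}:{ingress_port}/v2/models/xgboost-iris/infer",
    headers={"Host": service_hostname},
    json=payload,
)
print(resp.json())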
Deploying the Model with gRPC Endpoint
For applications requiring gRPC communication, you can expose a gRPC endpoint by modifying the InferenceService definition.
KServe currently supports exposing either the HTTP or the gRPC port, not both simultaneously. By default, the HTTP port is exposed.
Choose the manifest that matches your deployment mode.
Knative:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "xgboost-iris-grpc"
spec:
  predictor:
    model:
      modelFormat:
        name: xgboost
      protocolVersion: v2
      runtime: kserve-xgbserver
      storageUri: "gs://kfserving-examples/models/xgboost/iris"
      ports:
        - name: h2c     # knative expects grpc port name to be 'h2c'
          protocol: TCP
          containerPort: 8081
Standard Deployment:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "xgboost-iris-grpc"
spec:
  predictor:
    model:
      modelFormat:
        name: xgboost
      protocolVersion: v2
      runtime: kserve-xgbserver
      storageUri: "gs://kfserving-examples/models/xgboost/iris"
      ports:
        - name: grpc-port  # Istio requires the port name to be in the format <protocol>[-<suffix>]
          protocol: TCP
          containerPort: 8081
Apply the YAML to create the gRPC InferenceService:
kubectl apply -f xgboost-grpc.yaml
Testing the gRPC Endpoint with grpcurl
First, determine the ingress IP and ports, then set the INGRESS_HOST and INGRESS_PORT environment variables.
After the gRPC InferenceService becomes ready, use grpcurl to send gRPC requests:
# Download the proto file
curl -O https://raw.githubusercontent.com/kserve/open-inference-protocol/main/specification/protocol/open_inference_grpc.proto
INPUT_PATH=iris-input-grpc.json
PROTO_FILE=open_inference_grpc.proto
SERVICE_HOSTNAME=$(kubectl get inferenceservice xgboost-iris-grpc -o jsonpath='{.status.url}' | cut -d "/" -f 3)
First, check if the server is ready:
grpcurl \
  -plaintext \
  -proto ${PROTO_FILE} \
  -authority ${SERVICE_HOSTNAME} \
  ${INGRESS_HOST}:${INGRESS_PORT} \
  inference.GRPCInferenceService.ServerReady
{
  "ready": true
}
To test the model with inference requests, create an input file iris-input-grpc.json. Note that, unlike the REST payload, the tensor contents are provided as a flat list and interpreted according to the shape field:
{
  "model_name": "xgboost-iris-grpc",
  "inputs": [
    {
      "name": "input-0",
      "shape": [2, 4],
      "datatype": "FP32",
      "contents": {
        "fp32_contents": [6.8, 2.8, 4.8, 1.4, 6.0, 3.4, 4.5, 1.6]
      }
    }
  ]
}
Send the gRPC inference request:
grpcurl \
  -vv \
  -plaintext \
  -proto ${PROTO_FILE} \
  -authority ${SERVICE_HOSTNAME} \
  -d @ \
  ${INGRESS_HOST}:${INGRESS_PORT} \
  inference.GRPCInferenceService.ModelInfer \
  <<< $(cat "$INPUT_PATH")
Resolved method descriptor:
// The ModelInfer API performs inference using the specified model. Errors are
// indicated by the google.rpc.Status returned for the request. The OK code
// indicates success and other codes indicate failure.
rpc ModelInfer ( .inference.ModelInferRequest ) returns ( .inference.ModelInferResponse );
Request metadata to send:
(empty)
Response headers received:
content-type: application/grpc
date: Mon, 09 Oct 2023 11:07:26 GMT
grpc-accept-encoding: identity, deflate, gzip
server: istio-envoy
x-envoy-upstream-service-time: 16
Estimated response size: 83 bytes
Response contents:
{
  "modelName": "xgboost-iris-grpc",
  "id": "41738561-7219-4e4a-984d-5fe19bed6298",
  "outputs": [
    {
      "name": "output-0",
      "datatype": "INT32",
      "shape": [
        "2"
      ],
      "contents": {
        "intContents": [
          1,
          1
        ]
      }
    }
  ]
}
Response trailers received:
(empty)
Sent 1 request and received 1 response
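As an alternative to grpcurl, the same call can be made from Python with grpcio. This is a minimal sketch, assuming you have generated stubs from open_inference_grpc.proto (for example with python -m grpc_tools.protoc, which would produce open_inference_grpc_pb2 and open_inference_grpc_pb2_grpc) and that INGRESS_HOST, INGRESS_PORT, and SERVICE_HOSTNAME are exported as environment variables:

import os

import grpc
import open_inference_grpc_pb2 as pb2
import open_inference_grpc_pb2_grpc as pb2_grpc

ingress = f"{os.environ['INGRESS_HOST']}:{os.environ['INGRESS_PORT']}"
service_hostname = os.environ["SERVICE_HOSTNAME"]

# Route through the ingress while presenting the InferenceService hostname
# as the :authority pseudo-header (the gRPC equivalent of the Host header).
channel = grpc.insecure_channel(
    ingress, options=[("grpc.default_authority", service_hostname)]
)
stub = pb2_grpc.GRPCInferenceServiceStub(channel)

# Two Iris samples, flattened row-major to match shape [2, 4].
request = pb2.ModelInferRequest(
    model_name="xgboost-iris-grpc",
    inputs=[
        pb2.ModelInferRequest.InferInputTensor(
            name="input-0",
            datatype="FP32",
            shape=[2, 4],
            contents=pb2.InferTensorContents(
                fp32_contents=[6.8, 2.8, 4.8, 1.4, 6.0, 3.4, 4.5, 1.6]
            ),
        )
    ],
)
response = stub.ModelInfer(request)
print(response)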