Canary Rollout¶
Create InferenceService with default model¶
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    pytorch:
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier"
Apply the InferenceService
kubectl apply -f torchserve.yaml
Expected Output
inferenceservice.serving.kserve.io/torchserve created
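Before rolling out a canary, it can help to confirm that the default InferenceService has become ready; a quick check (the READY column should show True, and the URL depends on your cluster's ingress domain):

kubectl get inferenceservice torchserve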
Create InferenceService with canary model¶
Change the storageUri to point at the new model version and set canaryTrafficPercent, which controls the share of traffic routed to the new revision (20% here, with the remaining 80% staying on the previous revision), then apply the InferenceService
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    canaryTrafficPercent: 20
    pytorch:
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v2"
Apply the InferenceService
kubectl apply -f canary.yaml
List the revisions of the predictor; the canary rollout creates a second revision alongside the original one

kubectl get revisions -l serving.kserve.io/inferenceservice=torchserve
NAME                                 CONFIG NAME                    K8S SERVICE NAME                     GENERATION   READY   REASON
torchserve-predictor-default-9lttm   torchserve-predictor-default   torchserve-predictor-default-9lttm   1            True
torchserve-predictor-default-kxp96   torchserve-predictor-default   torchserve-predictor-default-kxp96   2            True
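The InferenceService also reports the split directly; depending on your KServe version, kubectl get isvc prints PREV and LATEST columns with the percentage of traffic assigned to the previous and latest revisions

kubectl get isvc torchserve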
Run a prediction¶
The first step is to determine the ingress IP and ports, and set INGRESS_HOST and INGRESS_PORT.
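One common way to do this, assuming KServe is fronted by the Istio ingress gateway exposed through a LoadBalancer service (adjust the namespace, service name, and port name for your own ingress setup):

INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')

With the ingress variables set, specify the model name, resolve the service hostname from the InferenceService status, and send a prediction request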
MODEL_NAME=mnist
SERVICE_HOSTNAME=$(kubectl get inferenceservice torchserve -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./mnist.json
Expected Output
* Trying 52.89.19.61...
* Connected to a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com (52.89.19.61) port 80 (#0)
> POST /v1/models/mnist:predict HTTP/1.1
> Host: torchserve.kserve-test.example.com
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 167
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< cache-control: no-cache; no-store, must-revalidate, private
< content-length: 1
< date: Tue, 27 Oct 2020 08:26:19 GMT
< expires: Thu, 01 Jan 1970 00:00:00 UTC
< pragma: no-cache
< x-request-id: b10cfc9f-cd0f-4cda-9c6c-194c2cdaa517
< x-envoy-upstream-service-time: 6
< server: istio-envoy
<
* Connection #0 to host a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com left intact
{"predictions": ["2"]}
Check the traffic split between the two revisions¶
While traffic is split, each revision runs its own predictor pods

kubectl get pods -l serving.kserve.io/inferenceservice=torchserve

NAME                                                             READY   STATUS    RESTARTS   AGE
torchserve-predictor-default-9lttm-deployment-7dd5cff4cb-tmmlc   2/2     Running   0          21m
torchserve-predictor-default-kxp96-deployment-5d949864df-bmzfk   2/2     Running   0          20m
Then check the traffic split reported by the underlying Knative Service
kubectl get ksvc torchserve-predictor-default -oyaml
status:
  address:
    url: http://torchserve-predictor-default.default.svc.cluster.local
  traffic:
  - latestRevision: true
    percent: 20
    revisionName: torchserve-predictor-default-kxp96
    tag: latest
    url: http://latest-torchserve-predictor-default.default.example.com
  - latestRevision: false
    percent: 80
    revisionName: torchserve-predictor-default-9lttm
    tag: prev
    url: http://prev-torchserve-predictor-default.default.example.com
  url: http://torchserve-predictor-default.default.example.com
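Once the canary has been validated, the new revision can be promoted so that it receives all of the traffic. A minimal sketch, assuming the same InferenceService as above: raise canaryTrafficPercent to 100 (removing the field from the spec and re-applying has the same effect of fully rolling out the latest revision)

# Route all traffic to the latest (canary) revision.
kubectl patch inferenceservice torchserve --type merge -p '{"spec": {"predictor": {"canaryTrafficPercent": 100}}}'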