Deploy a PyTorch Model with TorchServe InferenceService¶
In this example, we deploy a trained PyTorch MNIST model to predict handwritten digits by running an InferenceService
with the TorchServe runtime, which is the default installed serving runtime for PyTorch models. Model interpretability is also an important aspect, as it helps to understand which of the input features were important for a particular classification.
Captum is a model interpretability library. In this example, the TorchServe explain endpoint is implemented with Captum's state-of-the-art algorithms, including integrated gradients, to provide users with an easy way to understand which features contribute to the model output. You can refer to the Captum Tutorial for more examples.
Create Model Storage with a Model Archive File and Config¶
The KServe/TorchServe integration expects the following model store layout:
├── config
│   ├── config.properties
├── model-store
│   ├── densenet_161.mar
│   ├── mnist.mar
TorchServe provides a utility to package all the model artifacts into a single TorchServe Model Archive File (MAR). After the model artifacts are packaged into a MAR file, you then upload it to the model-store directory under the model storage path.
You can store your model and dependent files on remote storage or a local persistent volume. The MNIST model and dependent files can be obtained from here.
Note
For remote storage, you can choose to start the example using the prebuilt MNIST MAR file stored in the KServe example GCS bucket gs://kfserving-examples/models/torchserve/image_classifier, or generate the MAR file with torch-model-archiver and create the model store on remote storage according to the above layout.
torch-model-archiver --model-name mnist --version 1.0 \
--model-file model-archiver/model-store/mnist/mnist.py \
--serialized-file model-archiver/model-store/mnist/mnist_cnn.pt \
--handler model-archiver/model-store/mnist/mnist_handler.py
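Once the MAR file is generated, you can assemble the layout above and copy it to your storage. A minimal sketch, assuming a GCS bucket you control named my-kserve-models (a hypothetical name) and the config.properties file described below:
# Assemble the expected model store layout locally.
mkdir -p model-storage/model-store model-storage/config
cp mnist.mar model-storage/model-store/
cp config.properties model-storage/config/
# Upload the layout to the bucket referenced by the InferenceService storageUri.
gsutil cp -r model-storage gs://my-kserve-models/torchserve/image_classifier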
For PVC users, please refer to model archive file generation for auto-generation of MAR files from the model and dependent files.
TorchServe uses a config.properties file to store its configuration. See here for more details on the properties supported by the configuration file. The following is a sample file for KServe:
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=true
install_py_dep_per_model=true
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":10,"responseTimeout":120}}}}
The KServe/TorchServe integration supports the KServe v1 and v2 REST protocols. In config.properties, we need to turn on the flag enable_envvars_config to enable setting the KServe envelope using an environment variable.
Warning
The previous service_envelope property has been deprecated; instead, set the flag enable_envvars_config=true in the config.properties file to enable setting the service envelope at runtime.
The requests are converted from the KServe inference request format to the TorchServe request format and sent to the configured inference_address over a local socket.
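For reference, with enable_envvars_config=true TorchServe lets any config.properties entry be set through a TS_-prefixed environment variable, which is how the serving runtime injects the envelope. A minimal sketch of setting it explicitly on the predictor container (the placement under env is illustrative):
env:
  - name: TS_SERVICE_ENVELOPE
    value: kserve   # use "kservev2" for the v2 protocol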
Deploy PyTorch Model with V1 REST Protocol¶
Create the TorchServe InferenceService¶
KServe by default selects the TorchServe runtime when you specify the model format pytorch in the new model spec.
Old Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    pytorch:
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
New Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
For deploying the model on CPU, apply the torchserve.yaml above to create the InferenceService.
kubectl apply -f torchserve.yaml
For deploying the model on GPU, add the GPU resource to the predictor spec:
Old Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    pytorch:
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
      resources:
        limits:
          memory: 4Gi
          nvidia.com/gpu: "1"
New Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
      resources:
        limits:
          memory: 4Gi
          nvidia.com/gpu: "1"
Apply the gpu.yaml to create the GPU InferenceService.
kubectl apply -f gpu.yaml
Expected Output
$ inferenceservice.serving.kserve.io/torchserve created
Model Inference¶
The first step is to determine the ingress IP and ports, and set INGRESS_HOST and INGRESS_PORT.
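A minimal sketch for a cluster whose Istio ingress gateway is exposed through a LoadBalancer service (adjust the namespace and service name to your ingress setup):
# Look up the external IP and HTTP port of the Istio ingress gateway.
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')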
MODEL_NAME=mnist
SERVICE_HOSTNAME=$(kubectl get inferenceservice torchserve -o jsonpath='{.status.url}' | cut -d "/" -f 3)
You can use the image converter to convert the images to a base64 byte array; for other models, please refer to the input request documentation.
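As a minimal sketch of what the converter produces (assuming a local 28x28 digit image named 0.png, a hypothetical filename), the v1 request wraps the base64-encoded image bytes in an instances array:
python3 - <<'EOF'
import base64, json

# Encode the digit image and wrap it in the KServe v1 request envelope
# expected by the TorchServe MNIST handler.
with open("0.png", "rb") as f:
    img = base64.b64encode(f.read()).decode("utf-8")

with open("mnist.json", "w") as f:
    json.dump({"instances": [{"data": img}]}, f)
EOF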
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./mnist.json
Expected Output
* Trying 52.89.19.61...
* Connected to a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com (52.89.19.61) port 80 (#0)
> POST /v1/models/mnist:predict HTTP/1.1
> Host: torchserve.kserve-test.example.com
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 167
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< cache-control: no-cache; no-store, must-revalidate, private
< content-length: 1
< date: Tue, 27 Oct 2020 08:26:19 GMT
< expires: Thu, 01 Jan 1970 00:00:00 UTC
< pragma: no-cache
< x-request-id: b10cfc9f-cd0f-4cda-9c6c-194c2cdaa517
< x-envoy-upstream-service-time: 6
< server: istio-envoy
<
* Connection #0 to host a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com left intact
{"predictions": ["2"]}
Model Explanation¶
To get the model explanation:
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/mnist:explain -d @./mnist.json
Expected Output
{"explanations": [[[[0.0005394675730469475, -0.0022280013123036043, -0.003416480100841055, -0.0051329881112415965, -0.009973864160829985, -0.004112560908882716, -0.009223458030656112, -0.0006676354577291628, -0.005249806664413386, -0.0009790519227372953, -0.0026914653993121195, -0.0069470097151383995, -0.00693530415962956, -0.005973878697847718, -0.00425042437288857, 0.0032867281838150977, -0.004297780258633562, -0.005643196661192014, -0.00653025019738562, -0.0047062916121001185, -0.0018656628277792628, -0.0016757477204072532, -0.0010410417081844845, -0.0019093520822156726, -0.004451403461006374, -0.0008552767257773671, -0.0027638888169885267, -0.0], [0.006971297052106784, 0.007316855222185687, 0.012144494329150574, 0.011477799383288441, 0.006846725347670252, 0.01149386176451476, 0.0045351987881190655, 0.007038361889638708, 0.0035855377023272157, 0.003031419502053957, -0.0008611575226775316, -0.0011085224745969223, -0.0050840743637658534, 0.009855491784340777, 0.007220680811043034, 0.011374285598070253, 0.007147725481709019, 0.0037114580912849457, 0.00030763245479291384, 0.0018305492665953394, 0.010106224395114147, 0.012932881164284687, 0.008862892007714321, 0.0070960526615982435, -0.0015931137903787505, 0.0036495747329455906, 0.0002593849391051298, -0.0], [0.006467265785857396, -0.00041793201228071674, 0.004900316089756856, 0.002308395474823997, 0.007859295399592283, 0.003916404948969494, 0.005630750246437249, 0.0043712538044184375, 0.006128530599133763, -0.009446321309831246, -0.014173645867037036, -0.0062988650915794565, -0.011473838941118539, -0.009049151947644047, -0.0007625645864610934, -0.013721416630061238, -0.0005580156670410108, 0.0033404383756480784, -0.006693278798487951, -0.003705084551144756, 0.005100375089529131, 5.5276874714401074e-05, 0.007221745280359063, -0.00573598303916232, -0.006836169033785967, 0.0025401608627538936, 9.303533912921196e-05, -0.0], [0.005914399808621816, 0.00452643561023696, 0.003968242261515448, 0.010422786058967673, 0.007728358107899074, 0.01147115923288383, 0.005683869479056691, 0.011150670502307374, 0.008742555292485278, 0.0032882897575743754, 0.014841138421861584, 0.011741228362482451, 0.0004296862879259221, -0.0035118140680654854, -0.006152254410078331, -0.004925121936901983, -2.3611205202801947e-06, 0.029347073037039074, 0.02901626308947743, 0.023379353021343398, 0.004027157620197582, -0.01677662249919171, -0.013497255736128979, 0.006957482854214602, 0.0018321766800746145, 0.008277034396684563, 0.002733405455464871, -0.0], [0.0049579739156640065, -0.002168016158233997, 0.0020644317321723642, 0.0020912464240293825, 0.004719691119907336, 0.007879231202446626, 0.010594445898145937, 0.006533067778982801, 0.002290214592708113, -0.0036651114968251986, 0.010753227423379443, 0.006402706020466243, -0.047075193909339695, -0.08108259303568185, -0.07646875196692542, -0.1681834845371156, -0.1610307396135756, -0.12010309927453829, -0.016148831320070896, -0.009541525999486027, 0.04575604594761406, 0.031470966329886635, 0.02452149438024385, 0.016594078577569567, 0.012213591301610382, -0.002230875840404426, 0.0036704051254298374, -0.0], [0.006410107592414739, 0.005578283890924384, 0.001977103461731095, 0.008935476507124939, 0.0011305055729953436, 0.0004946313900665659, -0.0040266029554395935, -0.004270765544167256, -0.010832150944943138, -0.01653511868336456, -0.011121302103373972, -0.42038514526905024, -0.22874576003118394, -0.16752936178907055, -0.17021699697722079, -0.09998584936787697, -0.09041117495322142, -0.10230248444795721, -0.15260897522094888, 
0.07770835838531896, -0.0813761125123066, 0.027556910053932963, 0.036305965104261866, 0.03407793793894619, 0.01212761779302579, 0.006695133380685627, 0.005331392748588556, -0.0], [0.008342680065996267, -0.00029249776150416367, 0.002782130291086583, 0.0027793744856745373, 0.0020525102690845407, 0.003679269934110004, 0.009373846012918791, -0.0031751745946300403, -0.009042846256743316, 0.0074141593032070775, -0.02796812516561052, -0.593171583786029, -0.4830164472795136, -0.353860128479443, -0.256482708704862, 0.11515586314578445, 0.12700563162828346, 0.0022342450630152204, -0.24673707669992118, -0.012878340813781437, 0.16866821780196756, 0.009739033161051434, -0.000827843726513152, -0.0002137320694585577, -0.004179480126338929, 0.008454049232317358, -0.002767934266266998, -0.0], [0.007070382982749552, 0.005342127805750565, -0.000983984198542354, 0.007910101170274493, 0.001266267696096404, 0.0038575136843053844, 0.006941130321773131, -0.015195182020687892, -0.016954974010578504, -0.031186444096787943, -0.031754626467747966, 0.038918845112017694, 0.06248943950328597, 0.07703301092601872, 0.0438493628024275, -0.0482404449771698, -0.08718650815999045, -0.0014764704694506415, -0.07426336448916614, -0.10378029666564882, 0.008572087846793842, -0.00017173413848283343, 0.010058893270893113, 0.0028410498666004377, 0.002008290211806285, 0.011905375389931099, 0.006071375802943992, -0.0], [0.0076080165949142685, -0.0017127333725310495, 0.00153128150106188, 0.0033391793764531563, 0.005373442509691564, 0.007207746020295443, 0.007422946703693544, -0.00699779191449194, 0.002395328253696969, -0.011682618874195954, -0.012737004464649057, -0.05379966383523857, -0.07174960461749053, -0.03027341304050314, 0.0019411862216381327, -0.0205575129473766, -0.04617091711614171, -0.017655308106959804, -0.009297162816368814, -0.03358572117988279, -0.1626068444778013, -0.015874364762085157, -0.0013736074085577258, -0.014763439328689378, 0.00631805792697278, 0.0021769414283267273, 0.0023061635006792498, -0.0], [0.005569931813561535, 0.004363218328087518, 0.00025609463218383973, 0.009577483244680675, 0.007257755916229399, 0.00976284778532342, -0.006388840235419147, -0.009017880790555707, -0.015308709334434867, -0.016743935775597355, -0.04372596546189275, -0.03523469356755156, -0.017257810114846107, 0.011960489902313411, 0.01529079831828911, -0.020076559119468443, -0.042792547669901516, -0.0029492027218867116, -0.011109560582516062, -0.12985858077848939, -0.2262858575494602, -0.003391725540087574, -0.03063368684328981, -0.01353486587575121, 0.0011140822443932317, 0.006583451102528798, 0.005667533945285076, -0.0], [0.004056272267155598, -0.0006394041203204911, 0.004664893926197093, 0.010593032387298614, 0.014750931538689989, 0.015428721146282149, 0.012167820222401367, 0.017604752451202518, 0.01038886849969188, 0.020544326931163263, -0.0004206566917812794, -0.0037463581359232674, -0.0024656693040735075, 0.0026061897697624353, -0.05186055271869177, -0.09158655048397382, 0.022976389912563913, -0.19851635458461808, -0.11801281807622972, -0.29127727790584423, -0.017138655663803876, -0.04395515676468641, -0.019241432506341576, 0.0011342298743447392, 0.0030625771422964584, -0.0002867924892991192, -0.0017908808807543712, -0.0], [0.0030114260660488892, 0.0020246448273580006, -0.003293361220376816, 0.0036965043883218584, 0.00013185761728146236, -0.004355610866966878, -0.006432601921104354, -0.004148701459814858, 0.005974553907915845, -0.0001399233607281906, 0.010392944122965082, 0.015693249298693028, 0.0459528427528407, 
-0.013921539948093455, -0.06615556518538708, 0.02921438991320325, -0.16345220625101778, -0.002130491295590408, -0.11449749664916867, -0.030980255589300607, -0.04804122537359171, -0.05144994776295644, 0.005122827412776085, 0.006464862173908011, 0.008624278272940246, 0.0037316228508156427, 0.0036947794337026706, -0.0], [0.0038173843228389405, -0.0017091931226819494, -0.0030871869816778068, 0.002115642501535999, -0.006926441921580917, -0.003023077828426468, -0.014451359520861637, -0.0020793048380231397, -0.010948003939342523, -0.0014460716966395166, -0.01656990336897737, 0.003052317148320358, -0.0026729564809943513, -0.06360067057346147, 0.07780985635080599, -0.1436689936630281, -0.040817177623437874, -0.04373367754296477, -0.18337299150349698, 0.025295182977407064, -0.03874921104331938, -0.002353901742617205, 0.011772560401335033, 0.012480994515707569, 0.006498422579824301, 0.00632320984076023, 0.003407169765754805, -0.0], [0.00944355257990139, 0.009242583578688485, 0.005069860444386138, 0.012666191449103024, 0.00941789912565746, 0.004720427012836104, 0.007597687789204113, 0.008679266528089945, 0.00889322771021875, -0.0008577904940828809, 0.0022973860384607604, 0.025328230809207493, -0.09908781123080951, -0.07836626399832172, -0.1546141264726177, -0.2582207272050766, -0.2297524599578219, -0.29561835103416967, 0.12048787956671528, -0.06279365699861471, -0.03832012404275233, 0.022910264999199934, 0.005803508497672737, -0.003858461926053348, 0.0039451232171312765, 0.003858476747495933, 0.0013034515558609956, -0.0], [0.009725756015628606, -0.0004001101998876524, 0.006490722835571152, 0.00800808023631959, 0.0065880711806331265, -0.0010264326176194034, -0.0018914305972878344, -0.008822522194658438, -0.016650520788128117, -0.03254382594389507, -0.014795713101569494, -0.05826499837818885, -0.05165369567511702, -0.13384277337594377, -0.22572641373340493, -0.21584739544668635, -0.2366836351939208, 0.14937824076489659, -0.08127414932170171, -0.06720440139736879, -0.0038552732903526744, 0.0107597891707803, -5.67453590118174e-05, 0.0020161340511396244, -0.000783322694907436, -0.0006397207517995289, -0.005291639205010064, -0.0], [0.008627543242777584, 0.007700097300051849, 0.0020430960246806138, 0.012949015733198586, 0.008428709579953574, 0.001358177022953576, 0.00421863939925833, 0.002657580000868709, -0.007339431957237175, 0.02008439775442315, -0.0033717631758033114, -0.05176633249899187, -0.013790328758662772, -0.39102366157050594, -0.167341447585844, -0.04813367828213947, 0.1367781582239039, -0.04672809260566293, -0.03237784669978756, 0.03218068777925178, 0.02415063765016493, -0.017849899351200002, -0.002975675228088795, -0.004819438014786686, 0.005106898651831245, 0.0024278620704227456, 6.784303333368138e-05, -0.0], [0.009644258527009343, -0.001331907219439711, -0.0014639718434477777, 0.008481926798958248, 0.010278031715467508, 0.003625808326891529, -0.01121188617599796, -0.0010634587872994379, -0.0002603820881968461, -0.017985648016990465, -0.06446652745470374, 0.07726063173046191, -0.24739929795334742, -0.2701855018480216, -0.08888614776216278, 0.1373325760136816, -0.02316068912438066, -0.042164834956711514, 0.0009266091344106458, 0.03141872420427644, 0.011587728430225652, 0.0004755143243520787, 0.005860642609620605, 0.008979633931394438, 0.005061734169974005, 0.003932710387086098, 0.0015489986106803626, -0.0], [0.010998736164377534, 0.009378969800902604, 0.00030577045264713074, 0.0159329353530375, 0.014849508018911006, -0.0026513365659554225, 0.002923303082126996, 0.01917908707828847, 
-0.02338288107991566, -0.05706674679291175, 0.009526265752669624, -0.19945255386401284, -0.10725519695909647, -0.3222906835083537, -0.03857038318412844, -0.013279804965996065, -0.046626023244262085, -0.029299060237210447, -0.043269580558906555, -0.03768510002290657, -0.02255977771908117, -0.02632588166863199, -0.014417349488098566, -0.003077271951572957, -0.0004973277708010661, 0.0003475839139671271, -0.0014522783025903258, -0.0], [0.012215315671616316, -0.001693194176229889, 0.011365785434529038, 0.0036964574178487792, -0.010126738168635003, -0.025554378647710443, 0.006538003839811914, -0.03181759044467965, -0.016424751042854728, 0.06177539736110035, -0.43801735323216856, -0.29991040815937386, -0.2516019795363623, 0.037789523540809, -0.010948746374759491, -0.0633901687126727, -0.005976006160777705, 0.006035133605976937, -0.04961632526071937, -0.04142116972831476, -0.07558952727782252, -0.04165176179187153, -0.02021603856619006, -0.0027365663096057032, -0.011145473712733575, 0.0003566937349350848, -0.00546472985268321, -0.0], [0.008009386447317503, 0.006831207743885825, 0.0051306149795546365, 0.016239014770865052, 0.020925441734273218, 0.028344800173195076, -0.004805080609285047, -0.01880521614501033, -0.1272329010865855, -0.39835936819190537, -0.09113694760349819, -0.04061591094832608, -0.12677021961235907, 0.015567707226741051, -0.005615051546243333, -0.06454044862001587, 0.0195457674752272, -0.04219686517155871, -0.08060569979524296, 0.027234494361702787, -0.009152881336047056, -0.030865118003992217, -0.005770311060090559, 0.002905833371986098, 5.606663556872091e-05, 0.003209538083839772, -0.0018588810743365345, -0.0], [0.007587008852984699, -0.0021213639853557625, 0.0007709558092903736, 0.013883256128746423, 0.017328713012428214, 0.03645357525636198, -0.04043993335238427, 0.05730125171252314, -0.2563293727512057, -0.11438826083879326, 0.02662382809034687, 0.03525271352483709, 0.04745678120172762, 0.0336360484090392, -0.002916635707204059, -0.17950855098650784, -0.44161773297052964, -0.4512180227831197, -0.4940283106297913, -0.1970108671285798, 0.04344323143078066, -0.012005120444897523, 0.00987576109166055, -0.0018336757466252476, 0.0004913959502151706, -0.0005409724034216215, -0.005039223900868212, -0.0], [0.00637876531169957, 0.005189469227685454, 0.0007676355246000376, 0.018378100865097655, 0.015739815031394887, -0.035524983116512455, 0.03781006978038308, 0.28859052096740495, 0.0726464110153121, -0.026768468497420147, 0.06278766200288134, 0.17897045813699355, -0.13780371920803108, -0.14176458123649577, -0.1733103177731656, -0.3106508869296763, 0.04788355140275794, 0.04235327890285105, -0.031266625292514394, -0.016263819217960652, -0.031388328800811355, -0.01791363975905968, -0.012025067979443894, 0.008335083985905805, -0.0014386677797296231, 0.0055376544652972854, 0.002241522815466253, -0.0], [0.007455256326741617, -0.0009475207572210404, 0.0020288385162615286, 0.015399640135796092, 0.021133843188103074, -0.019846405097622234, -0.003162485751163173, -0.14199005055318842, -0.044200898667146035, -0.013395459413208084, 0.11019680479230103, -0.014057216041764874, -0.12553853334447865, -0.05992513534766256, 0.06467942189539834, 0.08866056095907732, -0.1451321508061849, -0.07382491447758655, -0.046961739981080476, 0.0008943713493160624, 0.03231044103656507, 0.00036034241706501196, -0.011387669277619417, -0.00014602449257226195, -0.0021863729003374116, 0.0018817840156005856, 0.0037909804578166286, -0.0], [0.006511855618626698, 0.006236866054439829, -0.001440571166157676, 
0.012795776609942026, 0.011530545030403624, 0.03495489377257363, 0.04792403136095304, 0.049378583599065225, 0.03296101702085617, -0.0005351385876652296, 0.017744115897640366, 0.0011656622496764954, 0.0232845869823761, -0.0561191397060232, -0.02854070511118366, -0.028614174047247348, -0.007763531086362863, 0.01823079560098924, 0.021961392405283622, -0.009666681805706179, 0.009547046884328725, -0.008729943263791338, 0.006408909680578429, 0.009794327096359952, -0.0025825219195515304, 0.007063559189211571, 0.007867244119267047, -0.0], [0.007936663546039311, -0.00010710180170593153, 0.002716512705673228, 0.0038633557307721487, -0.0014877316616940372, -0.0004788143065635909, 0.012508842248031202, 0.0045381104608414645, -0.010650910516128294, -0.013785341529644855, -0.034287643221318206, -0.022152707546335495, -0.047056481347685974, -0.032166744564720455, -0.021551611335278546, -0.002174962503376043, 0.024344287130424306, 0.015579272560525105, 0.010958169741952194, -0.010607232913436921, -0.005548369726118836, -0.0014630046444242706, 0.013144180105016433, 0.0031349366359021916, 0.0010984887428255974, 0.005426941473328394, 0.006566511860044785, -0.0], [0.0005529184874606495, 0.00026139355020588705, -0.002887623443531047, 0.0013988462990850632, 0.00203365139495493, -0.007276926701775218, -0.004010419939595932, 0.017521952161185662, 0.0006996977433557911, 0.02083134683611201, 0.013690533534289498, -0.005466724359976675, -0.008857712321334327, 0.017408578822635818, 0.0076439343049154425, 0.0017861314923539985, 0.007465865707523924, 0.008034420825988495, 0.003976298558337994, 0.00411970637898539, -0.004572592545819698, 0.0029563907011979935, -0.0006382227820088148, 0.0015153753877889707, -0.0052626601797995595, 0.0025664706985019416, 0.005161751034260073, -0.0], [0.0009424280561998445, -0.0012942360298110595, 0.0011900868416523343, 0.000984424113178899, 0.0020988269382781564, -0.005870080062890889, -0.004950484744457169, 0.003117643454332697, -0.002509563565777083, 0.005831604884101081, 0.009531085216183116, 0.010030206821909806, 0.005858190171099734, 4.9344529936340524e-05, -0.004027895832421331, 0.0025436439920587606, 0.00531153867563076, 0.00495942692369508, 0.009215148318606382, 0.00010011928317543458, 0.0060051362999805355, -0.0008195376963202741, 0.0041728603512658224, -0.0017597169567888774, -0.0010577007775543158, 0.00046033327178068433, -0.0007674196306044449, -0.0], [-0.0, -0.0, 0.0013386963856532302, 0.00035183178922260837, 0.0030610334903526204, 8.951834979315781e-05, 0.0023676793550483524, -0.0002900551076915047, -0.00207019445286608, -7.61697478482574e-05, 0.0012150086715244216, 0.009831239281792168, 0.003479667642621962, 0.0070584324334114525, 0.004161851261339585, 0.0026146296354490665, -9.194746959222099e-05, 0.0013583866966571571, 0.0016821551239318913, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0]]]]}
Deploy PyTorch Model with V2 REST Protocol¶
Create the InferenceService¶
KServe by default selects the TorchServe runtime when you specify the model format pytorch in the new model spec, and enables the KServe v1 inference protocol. To enable the v2 inference protocol, specify the protocolVersion field with the value v2.
Old Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve-mnist-v2"
spec:
  predictor:
    pytorch:
      protocolVersion: v2
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v2
New Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve-mnist-v2"
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      protocolVersion: v2
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v2
For deploying the model on CPU, apply the mnist_v2.yaml to create the InferenceService.
kubectl apply -f mnist_v2.yaml
Expected Output
$ inferenceservice.serving.kserve.io/torchserve-mnist-v2 created
Model Inference¶
The first step is to determine the ingress IP and ports as described above, and set INGRESS_HOST and INGRESS_PORT.
MODEL_NAME=mnist
SERVICE_HOSTNAME=$(kubectl get inferenceservice torchserve-mnist-v2 -o jsonpath='{.status.url}' | cut -d "/" -f 3)
You can send both byte array and tensor inputs with the v2 protocol. For byte array input, use the image converter to convert the image to a byte array; here we use the mnist_v2_bytes.json file to run an example inference.
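For orientation, a v2 byte-array request follows the Open Inference Protocol envelope. The sketch below is illustrative; the input name and exact field values are assumptions, not copied from mnist_v2_bytes.json:
{
  "inputs": [
    {
      "name": "input-0",
      "shape": [-1],
      "datatype": "BYTES",
      "data": ["<base64-encoded image bytes>"]
    }
  ]
}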
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/${MODEL_NAME}/infer -d @./mnist_v2_bytes.json
Expected Output
{"id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298", "model_name": "mnist", "model_version": "1.0", "outputs": [{"name": "predict", "shape": [], "datatype": "INT64", "data": [1]}]}
For tensor input, use the tensor image converter to convert the image to a tensor input; here we use the mnist_v2.json file to run an example inference.
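A tensor request carries the raw pixel values instead of base64 bytes. A minimal sketch of building one (requires Pillow; the file names, input name, and 0-1 scaling are assumptions that must match your handler's preprocessing):
python3 - <<'EOF'
import json
from PIL import Image

# Flatten a 28x28 grayscale image into 784 pixel values.
pixels = list(Image.open("0.png").convert("L").getdata())

request = {
    "inputs": [{
        "name": "input-0",                    # illustrative input name
        "shape": [1, 28, 28],
        "datatype": "FP32",
        "data": [p / 255.0 for p in pixels],  # assumed 0-1 scaling
    }]
}

with open("mnist_v2.json", "w") as f:
    json.dump(request, f)
EOF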
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/${MODEL_NAME}/infer -d @./mnist_v2.json
Expected Output
{"id": "2266ec1e-f600-40af-97b5-7429b8195a80", "model_name": "mnist", "model_version": "1.0", "outputs": [{"name": "predict", "shape": [], "datatype": "INT64", "data": [1]}]}
Model Explanation¶
To get the model explanation with the v2 explain endpoint:
MODEL_NAME=mnist
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mnist/explain -d @./mnist_v2.json
Expected Output
{"id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298", "model_name": "mnist", "model_version": "1.0", "outputs": [{"name": "explain", "shape": [1, 28, 28], "datatype": "FP64", "data": [-0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, 0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, 0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0040547528781588035, -0.00022612877200043775, -0.0001273413606783097, 0.005648369508785856, 0.008904784451506994, 0.0026385365879584796, 0.0026802458602499875, -0.002657801604900743, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00024465772895309256, 0.0008218449738666515, 0.015285917610467934, 0.007512832227517626, 0.007094984753782517, 0.003405668751094489, -0.0020919252360163056, -0.00078002938659872, 0.02299587777864007, 0.01900432942654754, -0.001252955497754338, -0.0014666116894338772, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.005298396384926053, -0.0007901605067151054, 0.0039060659788228954, 0.02317408211645009, 0.017237917554858186, 0.010867034286601965, 0.003001563092717309, 0.00622421762838887, 0.006120712336480808, 0.016736329175541464, 0.005674718838256385, 0.004344134814439431, -0.001232842177319105, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, 0.0006867353660007012, 0.00977289933298656, -0.003875493166540815, 0.0017986937404117591, 0.0013075440157543057, -0.0024510980461748236, -0.0008806773426546923, -0.0, -0.0, -0.00014277890422995419, -0.009322313284511257, 0.020608317953885236, 0.004351394739722548, -0.0007875565409186222, -0.0009075897751127677, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00022247237111456804, -0.0007829031603535926, 0.002666369539125161, 0.000973336852105775, 0.0, -0.0, 0.0, 0.0, 0.0, 0.0, -0.0, 0.000432321003928822, 0.023657172129172684, 0.010694844898905204, -0.002375952975746018, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0020747972047037, -0.002320101258915877, -0.0012899205783904548, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.007629679655402933, 0.01044862724376463, 0.00025032878924736025, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.00037708370104137974, -0.005156369275302328, 0.0012477582442296628, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -4.442516083381132e-05, 0.01024804634283815, 0.0009971135240970147, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, 0.0, 0.0, -0.0, 0.0004501048968956462, -0.0019630535686311007, -0.0006664793297549408, 0.0020157403539278907, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0022144569383238466, 0.008361583574785395, 0.00314019428604999, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0028943544591141838, -0.0031301383432286406, 0.002113252872926688, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0010321050605717045, 0.008905753926369048, 0.002846438277738756, -0.0, -0.0, 0.0, 0.0, 0.0, 
-0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.005305288883499087, -0.00192711009725932, 0.0012090042768467344, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0011945156500241256, 0.005654442715832439, 0.0020132075345016807, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0014689356969061985, 0.0010743412638183228, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0017047980586912376, 0.00290660517425009, -0.0007805869640505143, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, 5.541725422148614e-05, 0.0014516114512869852, 0.0002827701966546988, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0014401407633627265, 0.0023812497776698745, 0.002146825301700187, -0.0, -0.0, 0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0011500529125940918, 0.0002865015572973405, 0.0029798151042282686, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0017750295500283872, 0.0008339859126060243, -0.00377073933577687, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0006093176894575109, -0.00046905787892409935, 0.0034053218511795034, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0007450011768391558, 0.001298767372877851, -0.008499247640112315, -6.145166131400234e-05, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, 0.0, 0.0011809726042792137, -0.001838476328106708, 0.00541110661116898, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.002139234224224006, 0.0003259163407641124, -0.005276118873855287, -0.001950984007438105, -9.545670742026532e-07, 0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0007772404228681039, -0.0001517956264720738, 0.0064814848131711815, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 8.098064985902114e-05, -0.00249042660692983, -0.0020718619200672302, -5.341117902942147e-05, -0.00045564724429915073, 0.0, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0022750983476959733, 0.0017164060958460778, 0.0003221344707738082, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0015560282678744543, 9.107238495871273e-05, 0.0008772841497928399, 0.0006502978626355868, -0.004128780767525651, 0.0006030386900152659, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0, 0.001395995791096219, 0.0026791526689584344, 0.0023995008266391488, -0.0004496096312746451, 0.003101832450753724, 0.007494536066960778, 0.0028641187148287965, -0.0030525907182629075, 0.003420222396518567, 0.0014924018363498125, -0.0009357388301326025, 0.0007856228933169799, -0.0018433973914981437, 1.6031856831240914e-05, 0.0, 0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0006999018502034005, 0.004382250870697946, -0.0035419313267119365, -0.0028896748092595375, -0.00048734542493666705, -0.0060873452419295, 0.000388224990424471, 0.002533641537585585, -0.004352836563597573, -0.0006079418766875505, -0.0038101334053377753, -0.000828441340357984, 0.0, -0.0, 0.0, 0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0010901530866342661, -0.013135008038845744, 0.0004734518707654666, 0.002050423283568135, -0.006609451922460863, 0.0023647861820124366, 0.0046789204256194, -0.0018122527412311837, 0.002137538353955849, 0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}]}
* Connection #0 to host localhost left intact
Autoscaling¶
One of the main serverless inference features is automatically scaling the replicas of an InferenceService to match the incoming workload.
KServe by default enables the Knative Pod Autoscaler, which watches traffic flow and scales up and down based on the configured metrics.
Knative Autoscaler¶
KServe supports the implementation of Knative Pod Autoscaler (KPA) and Kubernetes’ Horizontal Pod Autoscaler (HPA). The features and limitations of each of these Autoscalers are listed below.
Note
If you want to use the Kubernetes Horizontal Pod Autoscaler (HPA), you must install the HPA extension.
Knative Pod Autoscaler (KPA)
- Part of the Knative Serving core and enabled by default once Knative Serving is installed.
- Supports scale to zero functionality.
- Does not support CPU-based autoscaling.
Horizontal Pod Autoscaler (HPA)
- Not part of the Knative Serving core, and must be enabled after Knative Serving installation.
- Does not support scale to zero functionality.
- Supports CPU-based autoscaling.
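To opt a predictor into HPA with CPU-based scaling, Knative exposes autoscaling class and metric annotations. A minimal sketch (the 80% CPU target is an illustrative value, not from the source):
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
  annotations:
    autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
    autoscaling.knative.dev/metric: "cpu"
    autoscaling.knative.dev/target: "80"
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1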
Create InferenceService with Concurrency Target¶
Hard/Soft Autoscaling Limit¶
You can configure the InferenceService with the annotation autoscaling.knative.dev/target for a soft limit. The soft limit is a targeted limit rather than a strictly enforced bound; particularly if there is a sudden burst of requests, this value can be exceeded.
Old Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
  annotations:
    autoscaling.knative.dev/target: "10"
spec:
  predictor:
    pytorch:
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v1"
New Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
  annotations:
    autoscaling.knative.dev/target: "10"
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v1"
You can also configure the InferenceService with the containerConcurrency field for a hard limit. The hard limit is an enforced upper bound: if concurrency reaches the hard limit, surplus requests are buffered and must wait until enough capacity is free to execute them.
Old Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    containerConcurrency: 10
    pytorch:
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v1"
New Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    containerConcurrency: 10
    model:
      modelFormat:
        name: pytorch
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v1"
After specifying the soft or hard limit of the scaling target, you can now deploy the InferenceService with autoscaling.yaml.
kubectl apply -f autoscaling.yaml
Expected Output
$ inferenceservice.serving.kserve.io/torchserve created
Run Inference with Concurrent Requests¶
The first step is to install the hey load generator and then send the concurrent requests to the InferenceService
.
go get -u github.com/rakyll/hey
MODEL_NAME=mnist
SERVICE_HOSTNAME=$(kubectl get inferenceservice torchserve -o jsonpath='{.status.url}' | cut -d "/" -f 3)
hey -m POST -z 30s -D ./mnist.json -host ${SERVICE_HOSTNAME} http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict
Check Pod Autoscaling¶
hey by default generates 50 requests concurrently, so you can see the InferenceService scale to 5 pods, as the container concurrency target is set to 10 (50 concurrent requests / 10 per pod = 5 pods).
Expected Output
NAME                                                             READY   STATUS        RESTARTS   AGE
torchserve-predictor-default-cj2d8-deployment-69444c9c74-67qwb   2/2     Terminating   0          103s
torchserve-predictor-default-cj2d8-deployment-69444c9c74-nnxk8   2/2     Terminating   0          95s
torchserve-predictor-default-cj2d8-deployment-69444c9c74-rq8jq   2/2     Running       0          50m
torchserve-predictor-default-cj2d8-deployment-69444c9c74-tsrwr   2/2     Running       0          113s
torchserve-predictor-default-cj2d8-deployment-69444c9c74-vvpjl   2/2     Running       0          109s
torchserve-predictor-default-cj2d8-deployment-69444c9c74-xvn7t   2/2     Terminating   0          103s
Canary Rollout¶
Canary rollout is a deployment strategy in which you release a new version of a model to a small percentage of the production traffic.
Create InferenceService with Canary Model¶
After the above experiments, let's now see how you can roll out a new model without moving the full traffic to it by default.
Old Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
  annotations:
    serving.kserve.io/enable-tag-routing: "true"
spec:
  predictor:
    canaryTrafficPercent: 20
    pytorch:
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v2"
New Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
  annotations:
    serving.kserve.io/enable-tag-routing: "true"
spec:
  predictor:
    canaryTrafficPercent: 20
    model:
      modelFormat:
        name: pytorch
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v2"
In this example we change the storageUri to the v2 version, set the canaryTrafficPercent field, and then apply the canary.yaml.
kubectl apply -f canary.yaml
Expected Output
$ kubectl get revisions -l serving.kserve.io/inferenceservice=torchserve
NAME                                 CONFIG NAME                    K8S SERVICE NAME   GENERATION   READY   REASON   ACTUAL REPLICAS   DESIRED REPLICAS
torchserve-predictor-default-00001   torchserve-predictor-default                      1            True             1                 1
torchserve-predictor-default-00002   torchserve-predictor-default                      2            True             1                 1

$ kubectl get pods -l serving.kserve.io/inferenceservice=torchserve
NAME                                                             READY   STATUS    RESTARTS   AGE
torchserve-predictor-default-00001-deployment-7d99979c99-p49gk   2/2     Running   0          28m
torchserve-predictor-default-00002-deployment-c6fcc65dd-rjknq    2/2     Running   0          3m37s
Check Traffic Status¶
After the canary model is rolled out, the traffic should be split between the canary model revision and the "stable" revision which was rolled out with 100% percent traffic, now check the traffic split from the InferenceService
traffic status:
kubectl get isvc torchserve -ojsonpath='{.status.components}'
Expected Output
{
  "predictor": {
    "address": {
      "url": "http://torchserve-predictor-default.default.svc.cluster.local"
    },
    "latestCreatedRevision": "torchserve-predictor-default-00002",
    "latestReadyRevision": "torchserve-predictor-default-00002",
    "latestRolledoutRevision": "torchserve-predictor-default-00001",
    "traffic": [
      {
        "latestRevision": true,
        "percent": 20,
        "revisionName": "torchserve-predictor-default-00002",
        "tag": "latest",
        "url": "http://latest-torchserve-predictor-default.default.example.com"
      },
      {
        "latestRevision": false,
        "percent": 80,
        "revisionName": "torchserve-predictor-default-00001",
        "tag": "prev",
        "url": "http://prev-torchserve-predictor-default.default.example.com"
      }
    ],
    "url": "http://torchserve-predictor-default.default.example.com"
  }
}
Traffic Rollout¶
Run the following curl requests a few times against the InferenceService; you can see that the requests are sent to the two revisions with a 20/80 split.
MODEL_NAME=mnist
SERVICE_HOSTNAME=$(kubectl get inferenceservice torchserve -o jsonpath='{.status.url}' | cut -d "/" -f 3)
for i in {1..10}; do curl -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./mnist.json; done
Expected Output
{"predictions": [2]}Handling connection for 8080
{"predictions": [2]}Handling connection for 8080
{"predictions": [2]}Handling connection for 8080
<html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>Handling connection for 8080
<html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>Handling connection for 8080
{"predictions": [2]}Handling connection for 8080
{"predictions": [2]}Handling connection for 8080
{"predictions": [2]}Handling connection for 8080
{"predictions": [2]}Handling connection for 8080
You may notice that requests hitting the canary revision fail: the new revision expects the v2 inference input mnist_v2.json, which is a breaking change. In addition, the traffic is randomly split between the two revisions according to the specified traffic percentage. In this case you should roll out the canary model with canaryTrafficPercent set to 0 and use the latest tagged URL to test the canary model before moving the full traffic to the new model.
kubectl patch isvc torchserve --type='json' -p '[{"op": "replace", "path": "/spec/predictor/canaryTrafficPercent", "value": 0}]'
curl -v -H "Host: latest-torchserve-predictor-default.default.example.com" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./mnist.json
Expected Output
{"id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298", "model_name": "mnist", "model_version": "1.0", "outputs": [{"name": "predict", "shape": [1], "datatype": "INT64", "data": [1]}]}
After the new model is tested and verified, you can bump canaryTrafficPercent to 100 to fully roll out the traffic to the new revision. Now latestRolledoutRevision becomes torchserve-predictor-default-00002 and previousRolledoutRevision becomes torchserve-predictor-default-00001.
kubectl patch isvc torchserve --type='json' -p '[{"op": "replace", "path": "/spec/predictor/canaryTrafficPercent", "value": 100}]'
Check the traffic status:
kubectl get isvc torchserve -ojsonpath='{.status.components}'
Expected Output
{
  "predictor": {
    "address": {
      "url": "http://torchserve-predictor-default.default.svc.cluster.local"
    },
    "latestCreatedRevision": "torchserve-predictor-default-00002",
    "latestReadyRevision": "torchserve-predictor-default-00002",
    "latestRolledoutRevision": "torchserve-predictor-default-00002",
    "previousRolledoutRevision": "torchserve-predictor-default-00001",
    "traffic": [
      {
        "latestRevision": true,
        "percent": 100,
        "revisionName": "torchserve-predictor-default-00002",
        "tag": "latest",
        "url": "http://latest-torchserve-predictor-default.default.example.com"
      }
    ],
    "url": "http://torchserve-predictor-default.default.example.com"
  }
}
Rollback the Model¶
In case the new model version does not work after the traffic is moved to the new revision, you can still patch canaryTrafficPercent back to 0 and move the traffic back to the previously rolled-out model, torchserve-predictor-default-00001.
kubectl patch isvc torchserve --type='json' -p '[{"op": "replace", "path": "/spec/predictor/canaryTrafficPercent", "value": 0}]'
Check the traffic status:
kubectl get isvc torchserve -ojsonpath='{.status.components}'
Expected Output
{
  "predictor": {
    "address": {
      "url": "http://torchserve-predictor-default.default.svc.cluster.local"
    },
    "latestCreatedRevision": "torchserve-predictor-default-00002",
    "latestReadyRevision": "torchserve-predictor-default-00002",
    "latestRolledoutRevision": "torchserve-predictor-default-00001",
    "previousRolledoutRevision": "torchserve-predictor-default-00001",
    "traffic": [
      {
        "latestRevision": true,
        "percent": 0,
        "revisionName": "torchserve-predictor-default-00002",
        "tag": "latest",
        "url": "http://latest-torchserve-predictor-default.default.example.com"
      },
      {
        "latestRevision": false,
        "percent": 100,
        "revisionName": "torchserve-predictor-default-00001",
        "tag": "prev",
        "url": "http://prev-torchserve-predictor-default.default.example.com"
      }
    ],
    "url": "http://torchserve-predictor-default.default.example.com"
  }
}