Deploy a PyTorch Model with TorchServe InferenceService¶
In this example, we deploy a trained PyTorch MNIST model to predict handwritten digits by running an InferenceService with the TorchServe runtime, which is the default installed serving runtime for PyTorch models. Model interpretability is also important: it helps you understand which input features mattered for a particular classification. Captum is a model interpretability library. In this example, the TorchServe explain endpoint is implemented with Captum's state-of-the-art algorithms, including integrated gradients, to give users an easy way to understand which features contribute to the model output. You can refer to the Captum Tutorial for more examples.
Create Model Storage with a Model Archive File and Config¶
The KServe/TorchServe integration expects the following model store layout:
├── config
│   ├── config.properties
├── model-store
│   ├── densenet_161.mar
│   ├── mnist.mar
TorchServe provides a utility to package all the model artifacts into a single TorchServe Model Archive File (MAR). After the model artifacts are packaged into a MAR file, upload it to the model-store directory under the model storage path.
You can store your model and dependent files on remote storage or on a local persistent volume. The MNIST model and dependent files can be obtained from here.
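Before uploading, it can help to check a local directory against the layout above. The following is a minimal sketch, not part of KServe or TorchServe: the helper names `expected_files` and `missing_files` are hypothetical, and the placeholder files only stand in for real artifacts.

```python
import tempfile
from pathlib import Path


def expected_files(mar_names):
    """Files the layout requires, relative to the storage root."""
    return ["config/config.properties"] + [f"model-store/{name}" for name in mar_names]


def missing_files(root, mar_names):
    """Return the expected files that are absent under root."""
    root = Path(root)
    return [rel for rel in expected_files(mar_names) if not (root / rel).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Placeholders only: a real mnist.mar is produced by torch-model-archiver.
    (root / "config").mkdir()
    (root / "model-store").mkdir()
    (root / "config" / "config.properties").touch()
    (root / "model-store" / "mnist.mar").touch()

    complete = missing_files(root, ["mnist.mar"])
    incomplete = missing_files(root, ["mnist.mar", "densenet_161.mar"])

print(complete)    # []
print(incomplete)  # ['model-store/densenet_161.mar']
```

An empty list means every file the layout names is present; anything else lists what still has to be copied before the storage path is usable.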
Note
For remote storage, you can start the example using the prebuilt MNIST MAR file stored in the KServe example GCS bucket gs://kfserving-examples/models/torchserve/image_classifier, or generate the MAR file with torch-model-archiver and create the model store on remote storage according to the layout above.
torch-model-archiver --model-name mnist --version 1.0 \
--model-file model-archiver/model-store/mnist/mnist.py \
--serialized-file model-archiver/model-store/mnist/mnist_cnn.pt \
--handler model-archiver/model-store/mnist/mnist_handler.py
PVC users, please refer to model archive file generation for automatic generation of MAR files from the model and dependent files.
TorchServe uses a config.properties file to store configuration. Please see here for more details on the properties supported by the configuration file. The following is a sample file for KServe:
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=true
install_py_dep_per_model=true
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":10,"responseTimeout":120}}}}
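The model_snapshot value is compact JSON on a single line, which is easy to mistype by hand. As a sketch, the line can be generated from a dictionary. The function name `build_model_snapshot` is hypothetical, and the defaults simply mirror the sample file above.

```python
import json


def build_model_snapshot(model_name, version="1.0", mar_name=None,
                         min_workers=1, max_workers=5, batch_size=1,
                         max_batch_delay=10, response_timeout=120):
    """Render a single-model model_snapshot line for config.properties."""
    snapshot = {
        "name": "startup.cfg",
        "modelCount": 1,
        "models": {
            model_name: {
                version: {
                    "defaultVersion": True,
                    # Assumption: the archive is named after the model.
                    "marName": mar_name or f"{model_name}.mar",
                    "minWorkers": min_workers,
                    "maxWorkers": max_workers,
                    "batchSize": batch_size,
                    "maxBatchDelay": max_batch_delay,
                    "responseTimeout": response_timeout,
                }
            }
        },
    }
    # Compact separators keep the value on one line without spaces.
    return "model_snapshot=" + json.dumps(snapshot, separators=(",", ":"))


snapshot_line = build_model_snapshot("mnist")
print(snapshot_line)
```

With the defaults, the printed line is identical to the model_snapshot entry in the sample file.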
The KServe/TorchServe integration supports the KServe v1 and v2 REST protocols. In config.properties, turn on the enable_envvars_config flag so that the KServe envelope can be set through an environment variable.
Warning
The previous service_envelope property has been deprecated. In the config.properties file, use the flag enable_envvars_config=true to enable setting the service envelope at runtime. Requests are converted from the KServe inference request format to the TorchServe request format and sent to the configured inference_address via a local socket.
Deploy PyTorch Model with V1 REST Protocol¶
Create the TorchServe InferenceService¶
KServe by default selects the TorchServe runtime when you specify the model format pytorch in the new model spec.
Old Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    pytorch:
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
New Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
To deploy the model on CPU, apply the torchserve.yaml above to create the InferenceService.
kubectl apply -f torchserve.yaml
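The CPU and GPU specs on this page differ only in the resources block, so the manifest can also be generated. This is a sketch under stated assumptions: the helper `inference_service` is hypothetical, and it emits JSON rather than YAML because kubectl apply -f accepts either format and JSON needs no third-party package.

```python
import json


def inference_service(name, storage_uri, protocol_version=None, gpu=False):
    """Build an InferenceService manifest using the model/modelFormat spec."""
    model = {"modelFormat": {"name": "pytorch"}, "storageUri": storage_uri}
    if protocol_version:
        model["protocolVersion"] = protocol_version
    if gpu:
        # Same limits as the GPU spec shown on this page.
        model["resources"] = {"limits": {"memory": "4Gi", "nvidia.com/gpu": "1"}}
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {"predictor": {"model": model}},
    }


cpu_manifest = inference_service(
    "torchserve",
    "gs://kfserving-examples/models/torchserve/image_classifier/v1",
)
gpu_manifest = inference_service(
    "torchserve",
    "gs://kfserving-examples/models/torchserve/image_classifier/v1",
    gpu=True,
)
print(json.dumps(cpu_manifest, indent=2))
```

Writing the printed JSON to a file such as torchserve.json (a hypothetical file name) and running kubectl apply -f on it should be equivalent to applying the YAML.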
Old Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    pytorch:
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
      resources:
        limits:
          memory: 4Gi
          nvidia.com/gpu: "1"
New Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
      resources:
        limits:
          memory: 4Gi
          nvidia.com/gpu: "1"
To deploy the model on GPU, apply the gpu.yaml above to create the GPU InferenceService.
kubectl apply -f gpu.yaml
Expected Output
inferenceservice.serving.kserve.io/torchserve created
Model Inference¶
The first step is to determine the ingress IP and ports and set INGRESS_HOST and INGRESS_PORT.
MODEL_NAME=mnist
SERVICE_HOSTNAME=$(kubectl get inferenceservice torchserve -o jsonpath='{.status.url}' | cut -d "/" -f 3)
You can use the image converter to convert images to a base64 byte array. For other models, please refer to input request.
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./mnist.json
Expected Output
*   Trying 52.89.19.61...
* Connected to a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com (52.89.19.61) port 80 (#0)
> POST /v1/models/mnist:predict HTTP/1.1
> Host: torchserve.kserve-test.example.com
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 167
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< cache-control: no-cache; no-store, must-revalidate, private
< content-length: 1
< date: Tue, 27 Oct 2020 08:26:19 GMT
< expires: Thu, 01 Jan 1970 00:00:00 UTC
< pragma: no-cache
< x-request-id: b10cfc9f-cd0f-4cda-9c6c-194c2cdaa517
< x-envoy-upstream-service-time: 6
< server: istio-envoy
<
* Connection #0 to host a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com left intact
{"predictions": ["2"]}
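The curl call above can be mirrored in Python. This sketch only constructs the request and does not send it. The `data` key inside each instance is an assumption about what the MNIST handler reads, the ingress address is a placeholder, and all helper names are hypothetical.

```python
import base64
import json
from urllib.parse import urlparse
from urllib.request import Request


def service_hostname(status_url):
    """Equivalent of `cut -d "/" -f 3` applied to the InferenceService status URL."""
    return urlparse(status_url).netloc


def build_v1_body(image_bytes):
    """Wrap base64-encoded image bytes in the v1 "instances" envelope."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    # Assumption: the handler expects the image under the "data" key.
    return {"instances": [{"data": encoded}]}


def build_predict_request(ingress_host, ingress_port, hostname, model_name, body):
    """Create (but do not send) the POST request for the v1 predict endpoint."""
    url = f"http://{ingress_host}:{ingress_port}/v1/models/{model_name}:predict"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # The Host header routes the call through the ingress gateway.
        headers={"Host": hostname, "Content-Type": "application/json"},
    )


image_bytes = b"placeholder image bytes"  # read a real PNG file here instead
hostname = service_hostname("http://torchserve.kserve-test.example.com")
request = build_predict_request("203.0.113.10", 80, hostname, "mnist",
                                build_v1_body(image_bytes))
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. Because a body is attached, the method is POST, matching what curl does with -d.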
Model Explanation¶
To get the model explanation:
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:explain -d @./mnist.json
Expected Output
{"explanations": [[[[0.0005394675730469475, -0.0022280013123036043, -0.003416480100841055, -0.0051329881112415965, -0.009973864160829985, -0.004112560908882716, -0.009223458030656112, -0.0006676354577291628, -0.005249806664413386, -0.0009790519227372953, -0.0026914653993121195, -0.0069470097151383995, -0.00693530415962956, -0.005973878697847718, -0.00425042437288857, 0.0032867281838150977, -0.004297780258633562, -0.005643196661192014, -0.00653025019738562, -0.0047062916121001185, -0.0018656628277792628, -0.0016757477204072532, -0.0010410417081844845, -0.0019093520822156726, -0.004451403461006374, -0.0008552767257773671, -0.0027638888169885267, -0.0], [0.006971297052106784, 0.007316855222185687, 0.012144494329150574, 0.011477799383288441, 0.006846725347670252, 0.01149386176451476, 0.0045351987881190655, 0.007038361889638708, 0.0035855377023272157, 0.003031419502053957, -0.0008611575226775316, -0.0011085224745969223, -0.0050840743637658534, 0.009855491784340777, 0.007220680811043034, 0.011374285598070253, 0.007147725481709019, 0.0037114580912849457, 0.00030763245479291384, 0.0018305492665953394, 0.010106224395114147, 0.012932881164284687, 0.008862892007714321, 0.0070960526615982435, -0.0015931137903787505, 0.0036495747329455906, 0.0002593849391051298, -0.0], [0.006467265785857396, -0.00041793201228071674, 0.004900316089756856, 0.002308395474823997, 0.007859295399592283, 0.003916404948969494, 0.005630750246437249, 0.0043712538044184375, 0.006128530599133763, -0.009446321309831246, -0.014173645867037036, -0.0062988650915794565, -0.011473838941118539, -0.009049151947644047, -0.0007625645864610934, -0.013721416630061238, -0.0005580156670410108, 0.0033404383756480784, -0.006693278798487951, -0.003705084551144756, 0.005100375089529131, 5.5276874714401074e-05, 0.007221745280359063, -0.00573598303916232, -0.006836169033785967, 0.0025401608627538936, 9.303533912921196e-05, -0.0], [0.005914399808621816, 0.00452643561023696, 0.003968242261515448, 0.010422786058967673, 
0.007728358107899074, 0.01147115923288383, 0.005683869479056691, 0.011150670502307374, 0.008742555292485278, 0.0032882897575743754, 0.014841138421861584, 0.011741228362482451, 0.0004296862879259221, -0.0035118140680654854, -0.006152254410078331, -0.004925121936901983, -2.3611205202801947e-06, 0.029347073037039074, 0.02901626308947743, 0.023379353021343398, 0.004027157620197582, -0.01677662249919171, -0.013497255736128979, 0.006957482854214602, 0.0018321766800746145, 0.008277034396684563, 0.002733405455464871, -0.0], [0.0049579739156640065, -0.002168016158233997, 0.0020644317321723642, 0.0020912464240293825, 0.004719691119907336, 0.007879231202446626, 0.010594445898145937, 0.006533067778982801, 0.002290214592708113, -0.0036651114968251986, 0.010753227423379443, 0.006402706020466243, -0.047075193909339695, -0.08108259303568185, -0.07646875196692542, -0.1681834845371156, -0.1610307396135756, -0.12010309927453829, -0.016148831320070896, -0.009541525999486027, 0.04575604594761406, 0.031470966329886635, 0.02452149438024385, 0.016594078577569567, 0.012213591301610382, -0.002230875840404426, 0.0036704051254298374, -0.0], [0.006410107592414739, 0.005578283890924384, 0.001977103461731095, 0.008935476507124939, 0.0011305055729953436, 0.0004946313900665659, -0.0040266029554395935, -0.004270765544167256, -0.010832150944943138, -0.01653511868336456, -0.011121302103373972, -0.42038514526905024, -0.22874576003118394, -0.16752936178907055, -0.17021699697722079, -0.09998584936787697, -0.09041117495322142, -0.10230248444795721, -0.15260897522094888, 0.07770835838531896, -0.0813761125123066, 0.027556910053932963, 0.036305965104261866, 0.03407793793894619, 0.01212761779302579, 0.006695133380685627, 0.005331392748588556, -0.0], [0.008342680065996267, -0.00029249776150416367, 0.002782130291086583, 0.0027793744856745373, 0.0020525102690845407, 0.003679269934110004, 0.009373846012918791, -0.0031751745946300403, -0.009042846256743316, 0.0074141593032070775, -0.02796812516561052, 
-0.593171583786029, -0.4830164472795136, -0.353860128479443, -0.256482708704862, 0.11515586314578445, 0.12700563162828346, 0.0022342450630152204, -0.24673707669992118, -0.012878340813781437, 0.16866821780196756, 0.009739033161051434, -0.000827843726513152, -0.0002137320694585577, -0.004179480126338929, 0.008454049232317358, -0.002767934266266998, -0.0], [0.007070382982749552, 0.005342127805750565, -0.000983984198542354, 0.007910101170274493, 0.001266267696096404, 0.0038575136843053844, 0.006941130321773131, -0.015195182020687892, -0.016954974010578504, -0.031186444096787943, -0.031754626467747966, 0.038918845112017694, 0.06248943950328597, 0.07703301092601872, 0.0438493628024275, -0.0482404449771698, -0.08718650815999045, -0.0014764704694506415, -0.07426336448916614, -0.10378029666564882, 0.008572087846793842, -0.00017173413848283343, 0.010058893270893113, 0.0028410498666004377, 0.002008290211806285, 0.011905375389931099, 0.006071375802943992, -0.0], [0.0076080165949142685, -0.0017127333725310495, 0.00153128150106188, 0.0033391793764531563, 0.005373442509691564, 0.007207746020295443, 0.007422946703693544, -0.00699779191449194, 0.002395328253696969, -0.011682618874195954, -0.012737004464649057, -0.05379966383523857, -0.07174960461749053, -0.03027341304050314, 0.0019411862216381327, -0.0205575129473766, -0.04617091711614171, -0.017655308106959804, -0.009297162816368814, -0.03358572117988279, -0.1626068444778013, -0.015874364762085157, -0.0013736074085577258, -0.014763439328689378, 0.00631805792697278, 0.0021769414283267273, 0.0023061635006792498, -0.0], [0.005569931813561535, 0.004363218328087518, 0.00025609463218383973, 0.009577483244680675, 0.007257755916229399, 0.00976284778532342, -0.006388840235419147, -0.009017880790555707, -0.015308709334434867, -0.016743935775597355, -0.04372596546189275, -0.03523469356755156, -0.017257810114846107, 0.011960489902313411, 0.01529079831828911, -0.020076559119468443, -0.042792547669901516, -0.0029492027218867116, 
-0.011109560582516062, -0.12985858077848939, -0.2262858575494602, -0.003391725540087574, -0.03063368684328981, -0.01353486587575121, 0.0011140822443932317, 0.006583451102528798, 0.005667533945285076, -0.0], [0.004056272267155598, -0.0006394041203204911, 0.004664893926197093, 0.010593032387298614, 0.014750931538689989, 0.015428721146282149, 0.012167820222401367, 0.017604752451202518, 0.01038886849969188, 0.020544326931163263, -0.0004206566917812794, -0.0037463581359232674, -0.0024656693040735075, 0.0026061897697624353, -0.05186055271869177, -0.09158655048397382, 0.022976389912563913, -0.19851635458461808, -0.11801281807622972, -0.29127727790584423, -0.017138655663803876, -0.04395515676468641, -0.019241432506341576, 0.0011342298743447392, 0.0030625771422964584, -0.0002867924892991192, -0.0017908808807543712, -0.0], [0.0030114260660488892, 0.0020246448273580006, -0.003293361220376816, 0.0036965043883218584, 0.00013185761728146236, -0.004355610866966878, -0.006432601921104354, -0.004148701459814858, 0.005974553907915845, -0.0001399233607281906, 0.010392944122965082, 0.015693249298693028, 0.0459528427528407, -0.013921539948093455, -0.06615556518538708, 0.02921438991320325, -0.16345220625101778, -0.002130491295590408, -0.11449749664916867, -0.030980255589300607, -0.04804122537359171, -0.05144994776295644, 0.005122827412776085, 0.006464862173908011, 0.008624278272940246, 0.0037316228508156427, 0.0036947794337026706, -0.0], [0.0038173843228389405, -0.0017091931226819494, -0.0030871869816778068, 0.002115642501535999, -0.006926441921580917, -0.003023077828426468, -0.014451359520861637, -0.0020793048380231397, -0.010948003939342523, -0.0014460716966395166, -0.01656990336897737, 0.003052317148320358, -0.0026729564809943513, -0.06360067057346147, 0.07780985635080599, -0.1436689936630281, -0.040817177623437874, -0.04373367754296477, -0.18337299150349698, 0.025295182977407064, -0.03874921104331938, -0.002353901742617205, 0.011772560401335033, 0.012480994515707569, 
0.006498422579824301, 0.00632320984076023, 0.003407169765754805, -0.0], [0.00944355257990139, 0.009242583578688485, 0.005069860444386138, 0.012666191449103024, 0.00941789912565746, 0.004720427012836104, 0.007597687789204113, 0.008679266528089945, 0.00889322771021875, -0.0008577904940828809, 0.0022973860384607604, 0.025328230809207493, -0.09908781123080951, -0.07836626399832172, -0.1546141264726177, -0.2582207272050766, -0.2297524599578219, -0.29561835103416967, 0.12048787956671528, -0.06279365699861471, -0.03832012404275233, 0.022910264999199934, 0.005803508497672737, -0.003858461926053348, 0.0039451232171312765, 0.003858476747495933, 0.0013034515558609956, -0.0], [0.009725756015628606, -0.0004001101998876524, 0.006490722835571152, 0.00800808023631959, 0.0065880711806331265, -0.0010264326176194034, -0.0018914305972878344, -0.008822522194658438, -0.016650520788128117, -0.03254382594389507, -0.014795713101569494, -0.05826499837818885, -0.05165369567511702, -0.13384277337594377, -0.22572641373340493, -0.21584739544668635, -0.2366836351939208, 0.14937824076489659, -0.08127414932170171, -0.06720440139736879, -0.0038552732903526744, 0.0107597891707803, -5.67453590118174e-05, 0.0020161340511396244, -0.000783322694907436, -0.0006397207517995289, -0.005291639205010064, -0.0], [0.008627543242777584, 0.007700097300051849, 0.0020430960246806138, 0.012949015733198586, 0.008428709579953574, 0.001358177022953576, 0.00421863939925833, 0.002657580000868709, -0.007339431957237175, 0.02008439775442315, -0.0033717631758033114, -0.05176633249899187, -0.013790328758662772, -0.39102366157050594, -0.167341447585844, -0.04813367828213947, 0.1367781582239039, -0.04672809260566293, -0.03237784669978756, 0.03218068777925178, 0.02415063765016493, -0.017849899351200002, -0.002975675228088795, -0.004819438014786686, 0.005106898651831245, 0.0024278620704227456, 6.784303333368138e-05, -0.0], [0.009644258527009343, -0.001331907219439711, -0.0014639718434477777, 0.008481926798958248, 
0.010278031715467508, 0.003625808326891529, -0.01121188617599796, -0.0010634587872994379, -0.0002603820881968461, -0.017985648016990465, -0.06446652745470374, 0.07726063173046191, -0.24739929795334742, -0.2701855018480216, -0.08888614776216278, 0.1373325760136816, -0.02316068912438066, -0.042164834956711514, 0.0009266091344106458, 0.03141872420427644, 0.011587728430225652, 0.0004755143243520787, 0.005860642609620605, 0.008979633931394438, 0.005061734169974005, 0.003932710387086098, 0.0015489986106803626, -0.0], [0.010998736164377534, 0.009378969800902604, 0.00030577045264713074, 0.0159329353530375, 0.014849508018911006, -0.0026513365659554225, 0.002923303082126996, 0.01917908707828847, -0.02338288107991566, -0.05706674679291175, 0.009526265752669624, -0.19945255386401284, -0.10725519695909647, -0.3222906835083537, -0.03857038318412844, -0.013279804965996065, -0.046626023244262085, -0.029299060237210447, -0.043269580558906555, -0.03768510002290657, -0.02255977771908117, -0.02632588166863199, -0.014417349488098566, -0.003077271951572957, -0.0004973277708010661, 0.0003475839139671271, -0.0014522783025903258, -0.0], [0.012215315671616316, -0.001693194176229889, 0.011365785434529038, 0.0036964574178487792, -0.010126738168635003, -0.025554378647710443, 0.006538003839811914, -0.03181759044467965, -0.016424751042854728, 0.06177539736110035, -0.43801735323216856, -0.29991040815937386, -0.2516019795363623, 0.037789523540809, -0.010948746374759491, -0.0633901687126727, -0.005976006160777705, 0.006035133605976937, -0.04961632526071937, -0.04142116972831476, -0.07558952727782252, -0.04165176179187153, -0.02021603856619006, -0.0027365663096057032, -0.011145473712733575, 0.0003566937349350848, -0.00546472985268321, -0.0], [0.008009386447317503, 0.006831207743885825, 0.0051306149795546365, 0.016239014770865052, 0.020925441734273218, 0.028344800173195076, -0.004805080609285047, -0.01880521614501033, -0.1272329010865855, -0.39835936819190537, -0.09113694760349819, 
-0.04061591094832608, -0.12677021961235907, 0.015567707226741051, -0.005615051546243333, -0.06454044862001587, 0.0195457674752272, -0.04219686517155871, -0.08060569979524296, 0.027234494361702787, -0.009152881336047056, -0.030865118003992217, -0.005770311060090559, 0.002905833371986098, 5.606663556872091e-05, 0.003209538083839772, -0.0018588810743365345, -0.0], [0.007587008852984699, -0.0021213639853557625, 0.0007709558092903736, 0.013883256128746423, 0.017328713012428214, 0.03645357525636198, -0.04043993335238427, 0.05730125171252314, -0.2563293727512057, -0.11438826083879326, 0.02662382809034687, 0.03525271352483709, 0.04745678120172762, 0.0336360484090392, -0.002916635707204059, -0.17950855098650784, -0.44161773297052964, -0.4512180227831197, -0.4940283106297913, -0.1970108671285798, 0.04344323143078066, -0.012005120444897523, 0.00987576109166055, -0.0018336757466252476, 0.0004913959502151706, -0.0005409724034216215, -0.005039223900868212, -0.0], [0.00637876531169957, 0.005189469227685454, 0.0007676355246000376, 0.018378100865097655, 0.015739815031394887, -0.035524983116512455, 0.03781006978038308, 0.28859052096740495, 0.0726464110153121, -0.026768468497420147, 0.06278766200288134, 0.17897045813699355, -0.13780371920803108, -0.14176458123649577, -0.1733103177731656, -0.3106508869296763, 0.04788355140275794, 0.04235327890285105, -0.031266625292514394, -0.016263819217960652, -0.031388328800811355, -0.01791363975905968, -0.012025067979443894, 0.008335083985905805, -0.0014386677797296231, 0.0055376544652972854, 0.002241522815466253, -0.0], [0.007455256326741617, -0.0009475207572210404, 0.0020288385162615286, 0.015399640135796092, 0.021133843188103074, -0.019846405097622234, -0.003162485751163173, -0.14199005055318842, -0.044200898667146035, -0.013395459413208084, 0.11019680479230103, -0.014057216041764874, -0.12553853334447865, -0.05992513534766256, 0.06467942189539834, 0.08866056095907732, -0.1451321508061849, -0.07382491447758655, -0.046961739981080476, 
0.0008943713493160624, 0.03231044103656507, 0.00036034241706501196, -0.011387669277619417, -0.00014602449257226195, -0.0021863729003374116, 0.0018817840156005856, 0.0037909804578166286, -0.0], [0.006511855618626698, 0.006236866054439829, -0.001440571166157676, 0.012795776609942026, 0.011530545030403624, 0.03495489377257363, 0.04792403136095304, 0.049378583599065225, 0.03296101702085617, -0.0005351385876652296, 0.017744115897640366, 0.0011656622496764954, 0.0232845869823761, -0.0561191397060232, -0.02854070511118366, -0.028614174047247348, -0.007763531086362863, 0.01823079560098924, 0.021961392405283622, -0.009666681805706179, 0.009547046884328725, -0.008729943263791338, 0.006408909680578429, 0.009794327096359952, -0.0025825219195515304, 0.007063559189211571, 0.007867244119267047, -0.0], [0.007936663546039311, -0.00010710180170593153, 0.002716512705673228, 0.0038633557307721487, -0.0014877316616940372, -0.0004788143065635909, 0.012508842248031202, 0.0045381104608414645, -0.010650910516128294, -0.013785341529644855, -0.034287643221318206, -0.022152707546335495, -0.047056481347685974, -0.032166744564720455, -0.021551611335278546, -0.002174962503376043, 0.024344287130424306, 0.015579272560525105, 0.010958169741952194, -0.010607232913436921, -0.005548369726118836, -0.0014630046444242706, 0.013144180105016433, 0.0031349366359021916, 0.0010984887428255974, 0.005426941473328394, 0.006566511860044785, -0.0], [0.0005529184874606495, 0.00026139355020588705, -0.002887623443531047, 0.0013988462990850632, 0.00203365139495493, -0.007276926701775218, -0.004010419939595932, 0.017521952161185662, 0.0006996977433557911, 0.02083134683611201, 0.013690533534289498, -0.005466724359976675, -0.008857712321334327, 0.017408578822635818, 0.0076439343049154425, 0.0017861314923539985, 0.007465865707523924, 0.008034420825988495, 0.003976298558337994, 0.00411970637898539, -0.004572592545819698, 0.0029563907011979935, -0.0006382227820088148, 0.0015153753877889707, -0.0052626601797995595, 
0.0025664706985019416, 0.005161751034260073, -0.0], [0.0009424280561998445, -0.0012942360298110595, 0.0011900868416523343, 0.000984424113178899, 0.0020988269382781564, -0.005870080062890889, -0.004950484744457169, 0.003117643454332697, -0.002509563565777083, 0.005831604884101081, 0.009531085216183116, 0.010030206821909806, 0.005858190171099734, 4.9344529936340524e-05, -0.004027895832421331, 0.0025436439920587606, 0.00531153867563076, 0.00495942692369508, 0.009215148318606382, 0.00010011928317543458, 0.0060051362999805355, -0.0008195376963202741, 0.0041728603512658224, -0.0017597169567888774, -0.0010577007775543158, 0.00046033327178068433, -0.0007674196306044449, -0.0], [-0.0, -0.0, 0.0013386963856532302, 0.00035183178922260837, 0.0030610334903526204, 8.951834979315781e-05, 0.0023676793550483524, -0.0002900551076915047, -0.00207019445286608, -7.61697478482574e-05, 0.0012150086715244216, 0.009831239281792168, 0.003479667642621962, 0.0070584324334114525, 0.004161851261339585, 0.0026146296354490665, -9.194746959222099e-05, 0.0013583866966571571, 0.0016821551239318913, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0]]]]}
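The response nests the attributions four levels deep, which reads as instance, channel, row, column for a 28x28 MNIST image. A small sketch for locating the most influential pixels follows. The function name `top_pixels` is hypothetical, and ranking by absolute value is one reasonable choice, not something the endpoint prescribes.

```python
def top_pixels(explanations, k=3):
    """Return (row, col) of the k pixels with the largest absolute attribution."""
    grid = explanations[0][0]  # first instance, first channel
    scored = [
        (abs(value), row_index, col_index)
        for row_index, row in enumerate(grid)
        for col_index, value in enumerate(row)
    ]
    scored.sort(reverse=True)
    return [(row_index, col_index) for _, row_index, col_index in scored[:k]]


# Tiny 3x3 stand-in for the 28x28 grid returned by the explain endpoint.
sample = {"explanations": [[[[0.0, -0.5, 0.1],
                             [0.2, 0.0, -0.9],
                             [0.0, 0.3, 0.0]]]]}
strongest = top_pixels(sample["explanations"], k=2)
print(strongest)  # [(1, 2), (0, 1)]
```

Negative values count as strongly as positive ones here, since both indicate pixels that moved the prediction.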
Deploy PyTorch Model with V2 REST Protocol¶
Create the InferenceService¶
KServe by default selects the TorchServe runtime when you specify the model format pytorch in the new model spec, and enables the KServe v1 inference protocol. To enable the v2 inference protocol, set the protocolVersion field to v2.
Old Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve-mnist-v2"
spec:
  predictor:
    pytorch:
      protocolVersion: v2
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v2
New Schema
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve-mnist-v2"
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      protocolVersion: v2
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v2
To deploy the model on CPU, apply the mnist_v2.yaml above to create the InferenceService.
kubectl apply -f mnist_v2.yaml
Expected Output
inferenceservice.serving.kserve.io/torchserve-mnist-v2 created
Model Inference¶
The first step is to determine the ingress IP and ports and set INGRESS_HOST and INGRESS_PORT.
MODEL_NAME=mnist
SERVICE_HOSTNAME=$(kubectl get inferenceservice torchserve-mnist-v2 -o jsonpath='{.status.url}' | cut -d "/" -f 3)
You can send both byte array and tensor inputs with the v2 protocol. For byte array input, use the image converter to convert the image to a byte array. Here we use the mnist_v2_bytes.json file to run an example inference.
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/${MODEL_NAME}/infer -d @./mnist_v2_bytes.json
Expected Output
{"id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298", "model_name": "mnist", "model_version": "1.0", "outputs": [{"name": "predict", "shape": [], "datatype": "INT64", "data": [1]}]}
For tensor input, use the tensor image converter to convert the image to a tensor. Here we use the mnist_v2.json file to run an example inference.
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/${MODEL_NAME}/infer -d @./mnist_v2.json
Expected Output
{"id": "2266ec1e-f600-40af-97b5-7429b8195a80", "model_name": "mnist", "model_version": "1.0", "outputs": [{"name": "predict", "shape": [], "datatype": "INT64", "data": [1]}]}
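Both request files follow the v2 inference protocol, where each entry in `inputs` carries a name, shape, datatype, and data. The sketch below shows roughly what the two bodies look like and how to read the response. The input name, the shape values, and the use of base64 for the BYTES payload are assumptions, so compare against the actual mnist_v2_bytes.json and mnist_v2.json files before relying on them.

```python
import base64
import json


def build_v2_bytes_body(image_bytes, name="input-0"):
    """Byte array input: one base64 string in a BYTES tensor."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    # Assumption: a single-element shape; the example file may differ.
    return {"inputs": [{"name": name, "shape": [1], "datatype": "BYTES",
                        "data": [encoded]}]}


def build_v2_tensor_body(pixels, name="input-0"):
    """Tensor input: pixel rows flattened in row-major order as FP32."""
    rows, cols = len(pixels), len(pixels[0])
    flat = [float(value) for row in pixels for value in row]
    # Assumption: a leading batch dimension of 1.
    return {"inputs": [{"name": name, "shape": [1, rows, cols],
                        "datatype": "FP32", "data": flat}]}


def read_v2_outputs(response_text):
    """Map each output name in a v2 response to its data list."""
    return {out["name"]: out["data"] for out in json.loads(response_text)["outputs"]}


tensor_body = build_v2_tensor_body([[0, 1], [2, 3]])
bytes_body = build_v2_bytes_body(b"placeholder image bytes")
outputs = read_v2_outputs(
    '{"id": "2266ec1e-f600-40af-97b5-7429b8195a80", "model_name": "mnist", '
    '"model_version": "1.0", "outputs": [{"name": "predict", "shape": [], '
    '"datatype": "INT64", "data": [1]}]}'
)
print(outputs)  # {'predict': [1]}
```

Reading outputs by name keeps the client code the same for the predict and explain endpoints, which differ only in the output name and shape.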
Model Explanation¶
To get the model explanation with the v2 explain endpoint:
MODEL_NAME=mnist
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/${MODEL_NAME}/explain -d @./mnist_v2.json
Expected Output
{"id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298", "model_name": "mnist", "model_version": "1.0", "outputs": [{"name": "explain", "shape": [1, 28, 28], "datatype": "FP64", "data": [-0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, 0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, 0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0040547528781588035, -0.00022612877200043775, -0.0001273413606783097, 0.005648369508785856, 0.008904784451506994, 0.0026385365879584796, 0.0026802458602499875, -0.002657801604900743, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00024465772895309256, 0.0008218449738666515, 0.015285917610467934, 0.007512832227517626, 0.007094984753782517, 0.003405668751094489, -0.0020919252360163056, -0.00078002938659872, 0.02299587777864007, 0.01900432942654754, -0.001252955497754338, -0.0014666116894338772, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.005298396384926053, -0.0007901605067151054, 0.0039060659788228954, 0.02317408211645009, 0.017237917554858186, 0.010867034286601965, 0.003001563092717309, 0.00622421762838887, 0.006120712336480808, 0.016736329175541464, 0.005674718838256385, 0.004344134814439431, -0.001232842177319105, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, 0.0006867353660007012, 0.00977289933298656, -0.003875493166540815, 0.0017986937404117591, 0.0013075440157543057, 
-0.0024510980461748236, -0.0008806773426546923, -0.0, -0.0, -0.00014277890422995419, -0.009322313284511257, 0.020608317953885236, 0.004351394739722548, -0.0007875565409186222, -0.0009075897751127677, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00022247237111456804, -0.0007829031603535926, 0.002666369539125161, 0.000973336852105775, 0.0, -0.0, 0.0, 0.0, 0.0, 0.0, -0.0, 0.000432321003928822, 0.023657172129172684, 0.010694844898905204, -0.002375952975746018, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0020747972047037, -0.002320101258915877, -0.0012899205783904548, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.007629679655402933, 0.01044862724376463, 0.00025032878924736025, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.00037708370104137974, -0.005156369275302328, 0.0012477582442296628, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -4.442516083381132e-05, 0.01024804634283815, 0.0009971135240970147, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, 0.0, 0.0, -0.0, 0.0004501048968956462, -0.0019630535686311007, -0.0006664793297549408, 0.0020157403539278907, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0022144569383238466, 0.008361583574785395, 0.00314019428604999, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0028943544591141838, -0.0031301383432286406, 0.002113252872926688, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0010321050605717045, 0.008905753926369048, 0.002846438277738756, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.005305288883499087, -0.00192711009725932, 0.0012090042768467344, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0011945156500241256, 0.005654442715832439, 0.0020132075345016807, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0014689356969061985, 0.0010743412638183228, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0017047980586912376, 
0.00290660517425009, -0.0007805869640505143, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, 5.541725422148614e-05, 
0.0014516114512869852, 0.0002827701966546988, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0014401407633627265, 0.0023812497776698745, 0.002146825301700187, -0.0, -0.0, 0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0011500529125940918, 0.0002865015572973405, 0.0029798151042282686, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0017750295500283872, 0.0008339859126060243, -0.00377073933577687, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0006093176894575109, -0.00046905787892409935, 0.0034053218511795034, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0007450011768391558, 0.001298767372877851, -0.008499247640112315, -6.145166131400234e-05, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, 0.0, 0.0011809726042792137, -0.001838476328106708, 0.00541110661116898, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.002139234224224006, 0.0003259163407641124, -0.005276118873855287, -0.001950984007438105, -9.545670742026532e-07, 0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0007772404228681039, -0.0001517956264720738, 0.0064814848131711815, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 8.098064985902114e-05, -0.00249042660692983, -0.0020718619200672302, -5.341117902942147e-05, -0.00045564724429915073, 0.0, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0022750983476959733, 0.0017164060958460778, 0.0003221344707738082, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0015560282678744543, 9.107238495871273e-05, 0.0008772841497928399, 0.0006502978626355868, -0.004128780767525651, 0.0006030386900152659, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0, 0.001395995791096219, 0.0026791526689584344, 0.0023995008266391488, -0.0004496096312746451, 0.003101832450753724, 0.007494536066960778, 0.0028641187148287965, -0.0030525907182629075, 0.003420222396518567, 0.0014924018363498125, -0.0009357388301326025, 0.0007856228933169799, 
-0.0018433973914981437, 1.6031856831240914e-05, 0.0, 0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, -0.0006999018502034005, 0.004382250870697946, -0.0035419313267119365, -0.0028896748092595375, -0.00048734542493666705, -0.0060873452419295, 0.000388224990424471, 0.002533641537585585, -0.004352836563597573, -0.0006079418766875505, -0.0038101334053377753, -0.000828441340357984, 0.0, -0.0, 0.0, 0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0010901530866342661, -0.013135008038845744, 0.0004734518707654666, 0.002050423283568135, -0.006609451922460863, 0.0023647861820124366, 0.0046789204256194, -0.0018122527412311837, 0.002137538353955849, 0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}]}
Autoscaling¶
A core feature of serverless inference is automatically scaling the replicas of an
InferenceService to match the incoming workload.
By default, KServe enables the Knative Pod Autoscaler, which watches traffic flow and
scales the replicas up and down based on the configured metrics.
            
Knative Autoscaler¶
KServe supports both the Knative Pod Autoscaler (KPA) and the Kubernetes Horizontal Pod Autoscaler (HPA). The features and limitations of each autoscaler are listed below.
Note
To use the Kubernetes Horizontal Pod Autoscaler (HPA), you must install the HPA extension.
Knative Pod Autoscaler (KPA)
- Part of the Knative Serving core and enabled by default once Knative Serving is installed.
- Supports scale-to-zero functionality.
- Does not support CPU-based autoscaling.

Horizontal Pod Autoscaler (HPA)
- Not part of the Knative Serving core, and must be enabled after Knative Serving installation.
- Does not support scale-to-zero functionality.
- Supports CPU-based autoscaling.

Create InferenceService with Concurrency Target¶
Hard/Soft Autoscaling Limit¶
You can configure a soft limit on an InferenceService with the annotation
autoscaling.knative.dev/target. A soft limit is a target rather than a strictly
enforced bound: it can be exceeded, particularly during a sudden burst of requests.
With the pytorch predictor field:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
  annotations:
    autoscaling.knative.dev/target: "10"
spec:
  predictor:
    pytorch:
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v1"

With the model predictor field and modelFormat:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
  annotations:
    autoscaling.knative.dev/target: "10"
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v1"
You can also configure a hard limit on an InferenceService with the containerConcurrency field.
A hard limit is an enforced upper bound.
If concurrency reaches the hard limit, surplus requests are buffered and must wait until enough
capacity is free to execute them.
With the pytorch predictor field:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    containerConcurrency: 10
    pytorch:
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v1"

With the model predictor field and modelFormat:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
spec:
  predictor:
    containerConcurrency: 10
    model:
      modelFormat:
        name: pytorch
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v1"
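The buffering behaviour of a hard limit can be pictured with a small simulation. This is only an illustrative sketch, not how Knative implements containerConcurrency: a semaphore stands in for the 10 request slots, and the burst size of 25 and the 10 ms "inference" delay are assumptions chosen for the example.

```python
import asyncio

HARD_LIMIT = 10       # mirrors containerConcurrency: 10 in the manifest above
TOTAL_REQUESTS = 25   # hypothetical burst size, chosen for illustration


async def handle(sem, state):
    # A request waits here (is "buffered") until one of the slots is free.
    async with sem:
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.01)  # stand-in for model inference time
        state["in_flight"] -= 1
        state["done"] += 1


async def run_burst(limit, total):
    # Create the semaphore inside the running event loop.
    sem = asyncio.Semaphore(limit)
    state = {"in_flight": 0, "peak": 0, "done": 0}
    await asyncio.gather(*(handle(sem, state) for _ in range(total)))
    return state


result = asyncio.run(run_burst(HARD_LIMIT, TOTAL_REQUESTS))
print(result)
```

All 25 requests complete, but the number being processed at once never rises above the limit; the surplus simply waits its turn.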
After specifying the soft or hard limit for the scaling target, deploy the
InferenceService with autoscaling.yaml.

kubectl apply -f autoscaling.yaml
Expected Output
inferenceservice.serving.kserve.io/torchserve created
Run Inference with Concurrent Requests¶
First install the hey load generator, then send concurrent
requests to the InferenceService.

go install github.com/rakyll/hey@latest

MODEL_NAME=mnist
SERVICE_HOSTNAME=$(kubectl get inferenceservice torchserve -o jsonpath='{.status.url}' | cut -d "/" -f 3)
hey -m POST -z 30s -D ./mnist.json -host ${SERVICE_HOSTNAME} http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict
Check Pod Autoscaling¶
By default, hey generates 50 concurrent requests, so the InferenceService scales to 5 pods
when the concurrency target is set to 10. You can list the pods with
kubectl get pods -l serving.kserve.io/inferenceservice=torchserve.

Expected Output
NAME                                                             READY   STATUS        RESTARTS   AGE
torchserve-predictor-default-cj2d8-deployment-69444c9c74-67qwb   2/2     Terminating   0          103s
torchserve-predictor-default-cj2d8-deployment-69444c9c74-nnxk8   2/2     Terminating   0          95s
torchserve-predictor-default-cj2d8-deployment-69444c9c74-rq8jq   2/2     Running       0          50m
torchserve-predictor-default-cj2d8-deployment-69444c9c74-tsrwr   2/2     Running       0          113s
torchserve-predictor-default-cj2d8-deployment-69444c9c74-vvpjl   2/2     Running       0          109s
torchserve-predictor-default-cj2d8-deployment-69444c9c74-xvn7t   2/2     Terminating   0          103s
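The 5-pod figure follows from dividing the concurrent requests by the per-pod target. The helper below is a rule-of-thumb sketch only (the function name is hypothetical): the real autoscaler averages traffic over time windows and applies utilization settings, so the pod count you observe may differ.

```python
import math


def estimated_pods(concurrent_requests, target_per_pod):
    """Rule-of-thumb replica count: in-flight requests divided by the per-pod target."""
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    return math.ceil(concurrent_requests / target_per_pod)


# hey sends 50 concurrent requests by default; the manifest sets a target of 10.
print(estimated_pods(50, 10))  # 5
```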
Canary Rollout¶
Canary rollout is a deployment strategy in which you release a new version of a model to a small percentage of the production traffic.
Create InferenceService with Canary Model¶
After the experiments above, let's see how to roll out a new model without moving all traffic to it by default.
With the pytorch predictor field:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
  annotations:
    serving.kserve.io/enable-tag-routing: "true"
spec:
  predictor:
    canaryTrafficPercent: 20
    pytorch:
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v2"

With the model predictor field and modelFormat:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve"
  annotations:
    serving.kserve.io/enable-tag-routing: "true"
spec:
  predictor:
    canaryTrafficPercent: 20
    model:
      modelFormat:
        name: pytorch
      storageUri: "gs://kfserving-examples/models/torchserve/image_classifier/v2"
In this example, we change the storageUri to the v2 version, set the
canaryTrafficPercent field, and then apply canary.yaml.

kubectl apply -f canary.yaml
            Expected Output
kubectl get revisions -l serving.kserve.io/inferenceservice=torchserve
NAME                                 CONFIG NAME                    K8S SERVICE NAME   GENERATION   READY   REASON   ACTUAL REPLICAS   DESIRED REPLICAS
torchserve-predictor-default-00001   torchserve-predictor-default                      1            True             1                 1
torchserve-predictor-default-00002   torchserve-predictor-default                      2            True             1                 1
kubectl get pods -l serving.kserve.io/inferenceservice=torchserve
NAME                                                             READY   STATUS    RESTARTS   AGE
torchserve-predictor-default-00001-deployment-7d99979c99-p49gk   2/2     Running   0          28m
torchserve-predictor-default-00002-deployment-c6fcc65dd-rjknq    2/2     Running   0          3m37s
Check Traffic Status¶
After the canary model is rolled out, traffic should be split between the canary revision and
the "stable" revision, which was previously rolled out with 100% of the traffic. Check the traffic split in the
InferenceService traffic status:

kubectl get isvc torchserve -ojsonpath='{.status.components}'
            Expected Output
{
  "predictor": {
    "address": {
      "url": "http://torchserve-predictor-default.default.svc.cluster.local"
    },
    "latestCreatedRevision": "torchserve-predictor-default-00002",
    "latestReadyRevision": "torchserve-predictor-default-00002",
    "latestRolledoutRevision": "torchserve-predictor-default-00001",
    "traffic": [
      {
        "latestRevision": true,
        "percent": 20,
        "revisionName": "torchserve-predictor-default-00002",
        "tag": "latest",
        "url": "http://latest-torchserve-predictor-default.default.example.com"
      },
      {
        "latestRevision": false,
        "percent": 80,
        "revisionName": "torchserve-predictor-default-00001",
        "tag": "prev",
        "url": "http://prev-torchserve-predictor-default.default.example.com"
      }
    ],
    "url": "http://torchserve-predictor-default.default.example.com"
  }
}
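If you want to check the split programmatically rather than by eye, the status can be parsed as ordinary JSON. The sketch below embeds a trimmed copy of the output above instead of calling the cluster; the traffic_split helper is a hypothetical name used only for this illustration.

```python
import json

# Trimmed copy of the status shown above; in practice this would be the
# output of the kubectl get isvc command.
status_json = """
{
  "predictor": {
    "latestRolledoutRevision": "torchserve-predictor-default-00001",
    "traffic": [
      {"latestRevision": true, "percent": 20,
       "revisionName": "torchserve-predictor-default-00002", "tag": "latest"},
      {"latestRevision": false, "percent": 80,
       "revisionName": "torchserve-predictor-default-00001", "tag": "prev"}
    ]
  }
}
"""


def traffic_split(components):
    """Map each traffic tag to (revision name, percent) for the predictor."""
    targets = components["predictor"]["traffic"]
    return {t["tag"]: (t["revisionName"], t["percent"]) for t in targets}


split = traffic_split(json.loads(status_json))
total = sum(percent for _, percent in split.values())
print(split)
print("total percent:", total)
```

The percentages across all traffic targets should add up to 100, which makes a convenient sanity check in a rollout script.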
Traffic Rollout¶
Run the following curl requests against the InferenceService a few times. The
requests are routed to the two revisions with a 20/80 split.
MODEL_NAME=mnist
SERVICE_HOSTNAME=$(kubectl get inferenceservice torchserve -o jsonpath='{.status.url}' | cut -d "/" -f 3)
for i in {1..10}; do curl -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./mnist.json; done
            Expected Output
{"predictions": [2]}Handling connection for 8080
{"predictions": [2]}Handling connection for 8080
{"predictions": [2]}Handling connection for 8080
<html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>Handling connection for 8080
<html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>Handling connection for 8080
{"predictions": [2]}Handling connection for 8080
{"predictions": [2]}Handling connection for 8080
{"predictions": [2]}Handling connection for 8080
{"predictions": [2]}Handling connection for 8080
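Because each request is routed at random, the number of requests reaching the canary varies from run to run. As a rough worked example, assuming every request is routed independently with a 20% chance of reaching the canary (a simplifying assumption, not a statement about the router's internals), the binomial distribution gives the expected behaviour for a loop of 10 requests:

```python
import math

CANARY_SHARE = 0.20  # canaryTrafficPercent: 20
REQUESTS = 10        # iterations of the curl loop above


def prob_canary_hits(k, n=REQUESTS, p=CANARY_SHARE):
    """Binomial probability that exactly k of n requests reach the canary."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


expected_hits = REQUESTS * CANARY_SHARE
p_at_least_one = 1 - prob_canary_hits(0)
print("expected canary hits:", expected_hits)
print("P(at least one canary hit):", round(p_at_least_one, 4))
```

Under this assumption about 2 of the 10 requests reach the canary on average, and roughly 9 runs out of 10 see at least one canary hit.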
When a request hits the canary revision it fails, because the new revision requires the v2 inference input mnist_v2.json, which is a breaking change. In addition, traffic is split randomly between the two revisions according to the specified traffic percentage. In this case, roll out the canary model with canaryTrafficPercent set to 0 and use the latest tagged URL to test the canary model before moving all traffic to the new model.
kubectl patch isvc torchserve --type='json' -p '[{"op": "replace", "path": "/spec/predictor/canaryTrafficPercent", "value": 0}]'
curl -v -H "Host: latest-torchserve-predictor-default.default.example.com" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/${MODEL_NAME}/infer -d @./mnist_v2.json
            Expected Output
{"id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298", "model_name": "mnist", "model_version": "1.0", "outputs": [{"name": "predict", "shape": [1], "datatype": "INT64", "data": [1]}]}
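The v2 response wraps the prediction in an outputs list instead of the v1 predictions field. A minimal sketch of reading it, using the response body above as a literal string (the first_output helper is a hypothetical name for this example):

```python
import json

# Response body copied from the expected output above.
response_body = (
    '{"id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298", "model_name": "mnist", '
    '"model_version": "1.0", "outputs": [{"name": "predict", "shape": [1], '
    '"datatype": "INT64", "data": [1]}]}'
)


def first_output(body):
    """Return (output name, data list) for the first tensor in a v2 response."""
    outputs = json.loads(body)["outputs"]
    if not outputs:
        raise ValueError("response contains no outputs")
    return outputs[0]["name"], outputs[0]["data"]


name, predicted = first_output(response_body)
print(name, predicted)
```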
After the new model is tested and verified, bump canaryTrafficPercent to
100 to fully roll out traffic to the new revision. The
latestRolledoutRevision then becomes torchserve-predictor-default-00002 and the
previousRolledoutRevision becomes torchserve-predictor-default-00001.

kubectl patch isvc torchserve --type='json' -p '[{"op": "replace", "path": "/spec/predictor/canaryTrafficPercent", "value": 100}]'
            Check the traffic status:
kubectl get isvc torchserve -ojsonpath='{.status.components}'
            Expected Output
{
  "predictor": {
    "address": {
      "url": "http://torchserve-predictor-default.default.svc.cluster.local"
    },
    "latestCreatedRevision": "torchserve-predictor-default-00002",
    "latestReadyRevision": "torchserve-predictor-default-00002",
    "latestRolledoutRevision": "torchserve-predictor-default-00002",
    "previousRolledoutRevision": "torchserve-predictor-default-00001",
    "traffic": [
      {
        "latestRevision": true,
        "percent": 100,
        "revisionName": "torchserve-predictor-default-00002",
        "tag": "latest",
        "url": "http://latest-torchserve-predictor-default.default.example.com"
      }
    ],
    "url": "http://torchserve-predictor-default.default.example.com"
  }
}
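The patch commands in this guide all send the same JSON Patch document with a different value. If you script the rollout, generating that document avoids quoting mistakes. The helpers below are a hypothetical sketch: they only build and print the command and do not contact a cluster, and the 0 to 100 range check is an assumption based on the field being a percentage.

```python
import json
import shlex


def canary_patch(percent):
    """Build the JSON Patch document used by the kubectl patch commands."""
    if not 0 <= percent <= 100:
        raise ValueError("canaryTrafficPercent must be between 0 and 100")
    return json.dumps([{
        "op": "replace",
        "path": "/spec/predictor/canaryTrafficPercent",
        "value": percent,
    }])


def patch_command(name, percent):
    """Return the kubectl argument list for patching the named InferenceService."""
    return ["kubectl", "patch", "isvc", name, "--type=json", "-p", canary_patch(percent)]


# Print a shell-quoted command line that can be reviewed before running it.
print(shlex.join(patch_command("torchserve", 100)))
```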
Rollback the Model¶
If the new model version does not work after traffic has moved to the new revision, you can patch canaryTrafficPercent to 0 to move traffic back to the previously rolled-out revision, torchserve-predictor-default-00001.
kubectl patch isvc torchserve --type='json' -p '[{"op": "replace", "path": "/spec/predictor/canaryTrafficPercent", "value": 0}]'
            Check the traffic status:
kubectl get isvc torchserve -ojsonpath='{.status.components}'
            Expected Output
{
  "predictor": {
    "address": {
      "url": "http://torchserve-predictor-default.default.svc.cluster.local"
    },
    "latestCreatedRevision": "torchserve-predictor-default-00002",
    "latestReadyRevision": "torchserve-predictor-default-00002",
    "latestRolledoutRevision": "torchserve-predictor-default-00001",
    "previousRolledoutRevision": "torchserve-predictor-default-00001",
    "traffic": [
      {
        "latestRevision": true,
        "percent": 0,
        "revisionName": "torchserve-predictor-default-00002",
        "tag": "latest",
        "url": "http://latest-torchserve-predictor-default.default.example.com"
      },
      {
        "latestRevision": false,
        "percent": 100,
        "revisionName": "torchserve-predictor-default-00001",
        "tag": "prev",
        "url": "http://prev-torchserve-predictor-default.default.example.com"
      }
    ],
    "url": "http://torchserve-predictor-default.default.example.com"
  }
}
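The three status outputs in this guide (20, 100, and 0 percent) follow a simple pattern that can be written down as a small model. This is a sketch inferred from those outputs only, not KServe's implementation, and expected_traffic is a hypothetical name.

```python
def expected_traffic(canary_percent):
    """Traffic entries as (tag, percent), following the outputs shown in this guide."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    if canary_percent == 100:
        # After full rollout only the latest revision is listed.
        return [("latest", 100)]
    # Otherwise the previous revision keeps the remainder of the traffic.
    return [("latest", canary_percent), ("prev", 100 - canary_percent)]


for percent in (20, 100, 0):
    print(percent, expected_traffic(percent))
```

Comparing the live status against this expectation is one way to confirm that a rollback has taken effect before sending production traffic again.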