Control Plane API

Packages:

serving.kserve.io/v1alpha1
serving.kserve.io/v1beta1

serving.kserve.io/v1alpha1

Package v1alpha1 contains API Schema definitions for the serving v1alpha1 API group


BuiltInAdapter

Field	Description
`serverType` ServerType	ServerType must be one of the supported built-in types such as “triton” or “mlserver”, and the runtime’s container must have the same name
`runtimeManagementPort` int	Port on which the runtime server listens for model management requests
`memBufferBytes` int	Fixed memory overhead to subtract from runtime container’s memory allocation to determine model capacity
`modelLoadingTimeoutMillis` int	Timeout for model loading operations in milliseconds
`env` []Kubernetes core/v1.EnvVar	Environment variables used to control other aspects of the built-in adapter’s behaviour (uncommon)
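The relationship between `memBufferBytes` and model capacity comes down to simple arithmetic. This is a minimal sketch, not KServe's adapter code. It assumes the runtime container's "memory allocation" is a byte count (for example its memory limit), and that `can_load` accounts for models additively by their ModelSpec `memory` value. Both helper names are hypothetical.

```python
def model_capacity_bytes(container_memory_bytes: int, mem_buffer_bytes: int) -> int:
    """Memory left for models once the adapter's fixed overhead is removed (hypothetical helper)."""
    if container_memory_bytes < 0 or mem_buffer_bytes < 0:
        raise ValueError("memory values must be non-negative")
    # Subtract the fixed overhead; clamp at zero so an oversized buffer
    # never produces a negative capacity.
    return max(container_memory_bytes - mem_buffer_bytes, 0)


def can_load(model_memory_bytes: int, capacity_bytes: int, used_bytes: int = 0) -> bool:
    """Simple additive accounting: does one more model fit in the remaining capacity?"""
    return used_bytes + model_memory_bytes <= capacity_bytes


# A 4 GiB runtime container with a 256 MiB buffer leaves 3.75 GiB for models.
capacity = model_capacity_bytes(4 * 1024**3, 256 * 1024**2)
print(capacity)                                                  # 4026531840
print(can_load(1 * 1024**3, capacity, used_bytes=3 * 1024**3))  # False
```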

ClusterLocalModel

Field	Description
`metadata` Kubernetes meta/v1.ObjectMeta	Refer to the Kubernetes API documentation for the fields of the metadata field.
`spec` ClusterLocalModelSpec

`sourceModelUri` string	Original StorageUri
`modelSize` k8s.io/apimachinery/pkg/api/resource.Quantity	Size of the model, used to make sure it does not exceed the disk space reserved for local models. The limit is defined on the LocalModelNodeGroup.
`nodeGroup` string	Group of nodes to cache the model on.

`status` ClusterLocalModelStatus

ClusterLocalModelSpec

(Appears on:ClusterLocalModel)

Field	Description
`sourceModelUri` string	Original StorageUri
`modelSize` k8s.io/apimachinery/pkg/api/resource.Quantity	Size of the model, used to make sure it does not exceed the disk space reserved for local models. The limit is defined on the LocalModelNodeGroup.
`nodeGroup` string	Group of nodes to cache the model on.
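The `modelSize` check against the node group's limit can be sketched as follows. The sketch makes two assumptions. First, the relevant limit is the per-node `storageLimit` of the LocalModelNodeGroup named in `nodeGroup`. Second, a deliberately simplified parser is enough for Kubernetes quantity strings; it covers only plain numbers with the common binary (Ki, Mi, Gi, ...) and decimal (k, M, G, ...) suffixes. `parse_quantity` and `fits_node_group` are hypothetical helpers, not part of KServe or any Kubernetes client library.

```python
import re
from decimal import Decimal

# Multipliers for common Kubernetes quantity suffixes (binary and decimal SI).
_SUFFIXES = {
    "": 1,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}
_QUANTITY = re.compile(r"^(\d+(?:\.\d+)?)([A-Za-z]*)$")


def parse_quantity(quantity: str) -> int:
    """Convert a simple quantity such as "10Gi" or "500M" to bytes.

    Simplified: exponent forms ("1e3") and the milli suffix ("m") are rejected.
    """
    match = _QUANTITY.match(quantity.strip())
    if match is None or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    # Decimal keeps fractional values such as "1.5Gi" exact.
    return int(Decimal(match.group(1)) * _SUFFIXES[match.group(2)])


def fits_node_group(model_size: str, storage_limit: str) -> bool:
    """Hypothetical check: modelSize must not exceed the node group's storageLimit."""
    return parse_quantity(model_size) <= parse_quantity(storage_limit)


print(fits_node_group("500Mi", "1Gi"))  # True
print(fits_node_group("2Gi", "1Gi"))    # False
```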

ClusterLocalModelStatus

(Appears on:ClusterLocalModel)

Field	Description
`nodeStatus` map[string]kserve.io/serving/pkg/apis/serving/v1alpha1.NodeStatus	Status of the model on a node, like NodeDownloaded or NodeNotReady
`copies` ModelCopies	(Optional) How many nodes have the model available locally
`inferenceServices` []NamespacedName	Inference services using this local model

ClusterServingRuntime

ClusterServingRuntime is the Schema for the servingruntimes API

Field	Description
`metadata` Kubernetes meta/v1.ObjectMeta	Refer to the Kubernetes API documentation for the fields of the metadata field.
`spec` ServingRuntimeSpec

`supportedModelFormats` []SupportedModelFormat	Model formats and version supported by this runtime
`multiModel` bool	(Optional) Whether this ServingRuntime is intended for multi-model usage or not.
`disabled` bool	(Optional) Set to true to disable use of this runtime
`protocolVersions` []github.com/kserve/kserve/pkg/constants.InferenceServiceProtocol	(Optional) Supported protocol versions (v1, v2, grpc-v1 or grpc-v2)
`workerSpec` WorkerSpec	(Optional) Set WorkerSpec to enable multi-node/multi-GPU serving
`ServingRuntimePodSpec` ServingRuntimePodSpec	(Members of `ServingRuntimePodSpec` are embedded into this type.)
`grpcEndpoint` string	(Optional) gRPC endpoint for internal model management (implementing the mmesh.ModelRuntime gRPC service). The runtime is assumed to be single-model if omitted.
`grpcDataEndpoint` string	(Optional) gRPC endpoint for inference
`httpDataEndpoint` string	(Optional) HTTP endpoint for inference
`replicas` uint16	(Optional) Number of replicas in the Deployment generated by this ServingRuntime. If specified, this overrides the podsPerRuntime configuration value.
`storageHelper` StorageHelper	(Optional) Configuration for this runtime’s use of the storage helper (model puller). It is enabled unless explicitly disabled.
`builtInAdapter` BuiltInAdapter	(Optional) Details of the built-in runtime adapter

`status` ServingRuntimeStatus

ClusterStorageContainer

Field	Description
`metadata` Kubernetes meta/v1.ObjectMeta	Refer to the Kubernetes API documentation for the fields of the metadata field.
`spec` StorageContainerSpec

`container` Kubernetes core/v1.Container	Container spec for the storage initializer init container
`supportedUriFormats` []SupportedUriFormat	List of URI formats that this container supports
`workloadType` WorkloadType

`disabled` bool	(Optional)

InferenceGraph

InferenceGraph is the Schema for the InferenceGraph API for multiple models

Field	Description
`metadata` Kubernetes meta/v1.ObjectMeta	Refer to the Kubernetes API documentation for the fields of the metadata field.
`spec` InferenceGraphSpec

`nodes` map[string]kserve.io/serving/pkg/apis/serving/v1alpha1.InferenceRouter	Map of InferenceGraph router nodes. Each node defines a router, and different nodes can use different routing types.
`resources` Kubernetes core/v1.ResourceRequirements	(Optional)
`affinity` Kubernetes core/v1.Affinity	(Optional)
`timeout` int64	(Optional) TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.
`minReplicas` int	(Optional) Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.
`maxReplicas` int	(Optional) Maximum number of replicas for autoscaling.
`scaleTarget` int	(Optional) ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for. Concurrency and rps targets are supported by the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
`scaleMetric` ScaleMetric	(Optional) ScaleMetric defines the scaling metric type watched by the autoscaler. Possible values are concurrency, rps, cpu and memory; concurrency and rps are supported via the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).

`status` InferenceGraphStatus

InferenceGraphSpec

(Appears on:InferenceGraph)

InferenceGraphSpec defines the InferenceGraph spec

Field	Description
`nodes` map[string]kserve.io/serving/pkg/apis/serving/v1alpha1.InferenceRouter	Map of InferenceGraph router nodes. Each node defines a router, and different nodes can use different routing types.
`resources` Kubernetes core/v1.ResourceRequirements	(Optional)
`affinity` Kubernetes core/v1.Affinity	(Optional)
`timeout` int64	(Optional) TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.
`minReplicas` int	(Optional) Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.
`maxReplicas` int	(Optional) Maximum number of replicas for autoscaling.
`scaleTarget` int	(Optional) ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for. Concurrency and rps targets are supported by the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
`scaleMetric` ScaleMetric	(Optional) ScaleMetric defines the scaling metric type watched by the autoscaler. Possible values are concurrency, rps, cpu and memory; concurrency and rps are supported via the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).

InferenceGraphStatus

(Appears on:InferenceGraph)

InferenceGraphStatus defines the InferenceGraph conditions and status

Field	Description
`Status` knative.dev/pkg/apis/duck/v1.Status	(Members of `Status` are embedded into this type.) Conditions for InferenceGraph
`url` knative.dev/pkg/apis.URL	(Optional) URL of the InferenceGraph

InferenceGraphValidator

InferenceGraphValidator is responsible for validating InferenceGraph resources when they are created or updated.

NOTE: The +kubebuilder:object:generate=false and +k8s:deepcopy-gen=false markers prevent controller-gen from generating DeepCopy methods, as this type is used only for temporary operations and does not need to be deeply copied.

InferenceRouter

(Appears on:InferenceGraphSpec)

InferenceRouter defines the router for each InferenceGraph node with one or multiple steps

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: canary-route
spec:
  nodes:
    root:
      routerType: Splitter
      steps:
      - serviceName: mymodel1
        weight: 20
      - serviceName: mymodel2
        weight: 80
```

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: abtest
spec:
  nodes:
    root:
      routerType: Switch
      steps:
      - serviceName: mymodel1
        condition: "{ .input.userId == 1 }"
      - serviceName: mymodel2
        condition: "{ .input.userId == 2 }"
```

Scoring a case using a model ensemble consists of scoring it using each model separately, then combining the results into a single scoring result using one of the pre-defined combination methods.

Tree Ensemble constitutes a case where simple algorithms for combining results of either classification or regression trees are well known. Multiple classification trees, for example, are commonly combined using a “majority-vote” method. Multiple regression trees are often combined using various averaging techniques, e.g. by tagging models with segment identifiers and weights to be used when combining them in these ways.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: ensemble
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
      - serviceName: feast
      - nodeName: ensembleModel
        data: $response
    ensembleModel:
      routerType: Ensemble
      steps:
      - serviceName: sklearn-model
      - serviceName: xgboost-model
```

Scoring a case using a sequence, or chain, of models allows the output of one model to be passed as input to the subsequent models.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: model-chainer
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
      - serviceName: mymodel-s1
      - serviceName: mymodel-s2
        data: $response
      - serviceName: mymodel-s3
        data: $response
```

In the flow described below, the pre_processing node base64-encodes the image and passes it to two model nodes in the flow. The encoded data is available to both of these nodes for classification. The second node, dog-breed-classification, takes the original input from the pre_processing node along with the response from the cat-dog-classification node to further classify the dog breed if required.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: dog-breed-classification
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
      - serviceName: cat-dog-classifier
      - nodeName: breed-classifier
        data: $request
    breed-classifier:
      routerType: Switch
      steps:
      - serviceName: dog-breed-classifier
        condition: '{ .predictions.class == "dog" }'
      - serviceName: cat-breed-classifier
        condition: '{ .predictions.class == "cat" }'
```

Field	Description
`routerType` InferenceRouterType	RouterType. Sequence chains multiple inference steps, using the input/output of the previous step; Splitter randomly routes to the target service according to the weight; Ensemble routes the request to multiple models and then merges the responses; Switch routes the request to one of the steps based on a condition.
`steps` []InferenceStep	(Optional) Steps defines destinations for the current router node
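The four router types can be illustrated with a toy, in-memory interpreter. This is a sketch of the semantics described above, not KServe's router implementation. It assumes the following:

- services are plain Python callables looked up by `serviceName`;
- a Switch step's `condition` is a Python predicate standing in for the condition expression string;
- Ensemble responses are merged into a dict keyed by step `name`, or by index when the step is unnamed;
- raising an error when no Switch condition matches is a choice made only for this sketch.

```python
import random
from typing import Any, Callable, Dict, List, Optional

Service = Callable[[Any], Any]


def run_node(graph: Dict[str, dict], node_name: str, request: Any,
             services: Dict[str, Service],
             rng: Optional[random.Random] = None) -> Any:
    """Toy interpreter for one InferenceGraph node (not KServe's router)."""
    rng = rng or random.Random()
    node = graph[node_name]
    router = node.get("routerType", "Sequence")  # Sequence is the default type
    steps: List[dict] = node.get("steps", [])

    def call(step: dict, payload: Any) -> Any:
        # A step targets either another graph node or a named service.
        if "nodeName" in step:
            return run_node(graph, step["nodeName"], payload, services, rng)
        return services[step["serviceName"]](payload)

    if router == "Sequence":
        response = None
        for step in steps:
            # "$response" forwards the previous step's output; otherwise the
            # step receives the original request ("$request").
            use_previous = step.get("data") == "$response" and response is not None
            response = call(step, response if use_previous else request)
        return response
    if router == "Splitter":
        # Choose exactly one step, with probability proportional to its weight.
        chosen = rng.choices(steps, weights=[s["weight"] for s in steps], k=1)[0]
        return call(chosen, request)
    if router == "Ensemble":
        # Send the same request to every step and merge the responses.
        return {s.get("name", str(i)): call(s, request) for i, s in enumerate(steps)}
    if router == "Switch":
        # Route to the first step whose condition holds for the request.
        for step in steps:
            if step["condition"](request):
                return call(step, request)
        raise LookupError("no Switch condition matched")  # assumed behaviour
    raise ValueError(f"unknown routerType: {router}")


services = {"double": lambda x: x * 2, "inc": lambda x: x + 1}
graph = {
    "root": {"routerType": "Sequence", "steps": [
        {"serviceName": "double"},
        {"serviceName": "inc", "data": "$response"},
    ]},
}
print(run_node(graph, "root", 5, services))  # 11
```

A `nodeName` step simply recurses into another node, which is how the ensemble example above nests an Ensemble node inside a Sequence.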

InferenceRouterType (`string` alias)

(Appears on:InferenceRouter)

InferenceRouterType constant for inference routing types

Value	Description
"Ensemble"	Ensemble router routes the request to multiple models and then merges the responses
"Sequence"	Default router type; routes to only one destination at a time
"Splitter"	Splitter router randomly routes the request to the named service according to the weight
"Switch"	Switch router routes the request to a model based on a condition

InferenceStep

(Appears on:InferenceRouter)

InferenceStep defines the inference target of the current step with condition, weights and data.

Field	Description
`name` string	(Optional) Unique name for the step within this node
`InferenceTarget` InferenceTarget	(Members of `InferenceTarget` are embedded into this type.) Node or service used to process this step
`data` string	(Optional) Request data sent to the next route, taken from the input/output of the previous step, e.g. $request or $response.predictions
`weight` int64	(Optional) Weight for the traffic split, used only by the Splitter router. When weights are specified, the weights of all routing targets must sum to 100.
`condition` string	(Optional) Condition used for routing
`dependency` InferenceStepDependencyType	(Optional) Whether the step is a hard or a soft dependency in the InferenceGraph

InferenceStepDependencyType (`string` alias)

(Appears on:InferenceStep)

InferenceStepDependencyType constant for inference step dependency

Value	Description
"Hard"	Hard
"Soft"	Soft

InferenceTarget

(Appears on:InferenceStep)

Exactly one InferenceTarget field must be specified

Field	Description
`nodeName` string	(Optional) The node name for routing as next step
`serviceName` string	Named reference to an InferenceService
`serviceUrl` string	(Optional) InferenceService URL, mutually exclusive with ServiceName
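The InferenceStep and InferenceTarget rules above can be checked with a small sketch: each step needs exactly one target, and Splitter weights must sum to 100. `validate_steps` is a hypothetical helper that operates on dict-shaped steps; it is not KServe's webhook. Treating a missing `weight` as 0 is an assumption.

```python
from typing import List

TARGET_FIELDS = ("nodeName", "serviceName", "serviceUrl")


def validate_steps(router_type: str, steps: List[dict]) -> List[str]:
    """Hypothetical validator for one router node; returns problems (empty if valid)."""
    errors: List[str] = []
    for i, step in enumerate(steps):
        # InferenceTarget: exactly one of nodeName / serviceName / serviceUrl.
        targets = [f for f in TARGET_FIELDS if step.get(f)]
        if len(targets) != 1:
            errors.append(f"step {i}: expected exactly one target, got {targets}")
    if router_type == "Splitter":
        # Missing weights are counted as 0 here (an assumption).
        total = sum(step.get("weight", 0) for step in steps)
        if total != 100:
            errors.append(f"Splitter weights must sum to 100, got {total}")
    return errors


print(validate_steps("Splitter", [
    {"serviceName": "mymodel1", "weight": 20},
    {"serviceName": "mymodel2", "weight": 80},
]))  # []
```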

LocalModelNodeGroup

Field	Description
`metadata` Kubernetes meta/v1.ObjectMeta	Refer to the Kubernetes API documentation for the fields of the metadata field.
`spec` LocalModelNodeGroupSpec

`storageLimit` k8s.io/apimachinery/pkg/api/resource.Quantity	Max storage size per node in this node group
`persistentVolumeSpec` Kubernetes core/v1.PersistentVolumeSpec	Used to create PersistentVolumes for downloading models and in inference service namespaces
`persistentVolumeClaimSpec` Kubernetes core/v1.PersistentVolumeClaimSpec	Used to create PersistentVolumeClaims for downloading models and in inference service namespaces

`status` LocalModelNodeGroupStatus

LocalModelNodeGroupSpec

(Appears on:LocalModelNodeGroup)

LocalModelNodeGroupSpec defines a group of nodes to download the model to.

Field	Description
`storageLimit` k8s.io/apimachinery/pkg/api/resource.Quantity	Max storage size per node in this node group
`persistentVolumeSpec` Kubernetes core/v1.PersistentVolumeSpec	Used to create PersistentVolumes for downloading models and in inference service namespaces
`persistentVolumeClaimSpec` Kubernetes core/v1.PersistentVolumeClaimSpec	Used to create PersistentVolumeClaims for downloading models and in inference service namespaces

LocalModelNodeGroupStatus

(Appears on:LocalModelNodeGroup)

Field	Description
`used` k8s.io/apimachinery/pkg/api/resource.Quantity	Used storage space on any node for this node group
`available` k8s.io/apimachinery/pkg/api/resource.Quantity	Available storage space on any node for this node group

ModelCopies

(Appears on:ClusterLocalModelStatus)

Field	Description
`available` int
`total` int	Total number of nodes the model is expected to be downloaded to, including nodes that are not ready
`failed` int	Number of nodes on which the download failed
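ModelCopies counts can be viewed as a roll-up of the per-node status map in ClusterLocalModelStatus, which is assumed here to be keyed by node name. The mapping is an assumption made for illustration and is not taken from the KServe controller: NodeDownloaded counts as available, NodeDownloadError counts as failed, and every node in the map counts towards total. `summarize_copies` is hypothetical.

```python
from collections import Counter
from typing import Dict


def summarize_copies(node_status: Dict[str, str]) -> Dict[str, int]:
    """Roll per-node NodeStatus values up into ModelCopies-style counts (assumed mapping)."""
    counts = Counter(node_status.values())
    return {
        "available": counts["NodeDownloaded"],   # model present locally
        "total": len(node_status),               # includes nodes that are not ready
        "failed": counts["NodeDownloadError"],   # download failed on this node
    }


print(summarize_copies({
    "node-a": "NodeDownloaded",
    "node-b": "NodeDownloading",
    "node-c": "NodeDownloadError",
    "node-d": "NodeNotReady",
}))  # {'available': 1, 'total': 4, 'failed': 1}
```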

ModelSpec

(Appears on:TrainedModelSpec)

ModelSpec describes a TrainedModel

Field	Description
`storageUri` string	Storage URI for the model repository
`framework` string	Machine learning framework name. The values could be: “tensorflow”, “pytorch”, “sklearn”, “onnx”, “xgboost”, “myawesomeinternalframework”, etc.
`memory` k8s.io/apimachinery/pkg/api/resource.Quantity	Maximum memory this model will consume, this field is used to decide if a model server has enough memory to load this model.

NamespacedName

(Appears on:ClusterLocalModelStatus)

Field	Description
`namespace` string
`name` string

NodeStatus (`string` alias)

(Appears on:ClusterLocalModelStatus)

NodeStatus enum

Value	Description
"NodeDeleted"
"NodeDeleting"
"NodeDeletionError"
"NodeDownloadError"
"NodeDownloadPending"
"NodeDownloaded"
"NodeDownloading"
"NodeNotReady"

ScaleMetric (`string` alias)

(Appears on:InferenceGraphSpec)

ScaleMetric enum

ServerType (`string` alias)

(Appears on:BuiltInAdapter)

ServerType constant for specifying the runtime name

Value	Description
"mlserver"	Model server is MLServer
"ovms"	Model server is OpenVino Model Server
"triton"	Model server is Triton

ServingRuntime

ServingRuntime is the Schema for the servingruntimes API

Field	Description
`metadata` Kubernetes meta/v1.ObjectMeta	Refer to the Kubernetes API documentation for the fields of the metadata field.
`spec` ServingRuntimeSpec

`supportedModelFormats` []SupportedModelFormat	Model formats and version supported by this runtime
`multiModel` bool	(Optional) Whether this ServingRuntime is intended for multi-model usage or not.
`disabled` bool	(Optional) Set to true to disable use of this runtime
`protocolVersions` []github.com/kserve/kserve/pkg/constants.InferenceServiceProtocol	(Optional) Supported protocol versions (v1, v2, grpc-v1 or grpc-v2)
`workerSpec` WorkerSpec	(Optional) Set WorkerSpec to enable multi-node/multi-GPU serving
`ServingRuntimePodSpec` ServingRuntimePodSpec	(Members of `ServingRuntimePodSpec` are embedded into this type.)
`grpcEndpoint` string	(Optional) gRPC endpoint for internal model management (implementing the mmesh.ModelRuntime gRPC service). The runtime is assumed to be single-model if omitted.
`grpcDataEndpoint` string	(Optional) gRPC endpoint for inference
`httpDataEndpoint` string	(Optional) HTTP endpoint for inference
`replicas` uint16	(Optional) Number of replicas in the Deployment generated by this ServingRuntime. If specified, this overrides the podsPerRuntime configuration value.
`storageHelper` StorageHelper	(Optional) Configuration for this runtime’s use of the storage helper (model puller). It is enabled unless explicitly disabled.
`builtInAdapter` BuiltInAdapter	(Optional) Details of the built-in runtime adapter

`status` ServingRuntimeStatus

ServingRuntimePodSpec

(Appears on:ServingRuntimeSpec, WorkerSpec)

Field	Description
`containers` []Kubernetes core/v1.Container	List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
`volumes` []Kubernetes core/v1.Volume	(Optional) List of volumes that can be mounted by containers belonging to the pod. More info: https://kubernetes.io/docs/concepts/storage/volumes
`nodeSelector` map[string]string	(Optional) NodeSelector is a selector which must be true for the pod to fit on a node. Selector which must match a node’s labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
`affinity` Kubernetes core/v1.Affinity	(Optional) If specified, the pod’s scheduling constraints
`tolerations` []Kubernetes core/v1.Toleration	(Optional) If specified, the pod’s tolerations.
`labels` map[string]string	(Optional) Labels that will be added to the pod. More info: http://kubernetes.io/docs/user-guide/labels
`annotations` map[string]string	(Optional) Annotations that will be added to the pod. More info: http://kubernetes.io/docs/user-guide/annotations
`imagePullSecrets` []Kubernetes core/v1.LocalObjectReference	(Optional) ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of docker, only DockerConfig type secrets are honored. More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
`hostIPC` bool	(Optional) Use the host’s IPC namespace. Defaults to false.

ServingRuntimeSpec

(Appears on:ClusterServingRuntime, ServingRuntime, SupportedRuntime)

ServingRuntimeSpec defines the desired state of ServingRuntime. This spec is currently provisional and is subject to change as details regarding single-model serving and multi-model serving are hammered out.

Field	Description
`supportedModelFormats` []SupportedModelFormat	Model formats and version supported by this runtime
`multiModel` bool	(Optional) Whether this ServingRuntime is intended for multi-model usage or not.
`disabled` bool	(Optional) Set to true to disable use of this runtime
`protocolVersions` []github.com/kserve/kserve/pkg/constants.InferenceServiceProtocol	(Optional) Supported protocol versions (v1, v2, grpc-v1 or grpc-v2)
`workerSpec` WorkerSpec	(Optional) Set WorkerSpec to enable multi-node/multi-GPU serving
`ServingRuntimePodSpec` ServingRuntimePodSpec	(Members of `ServingRuntimePodSpec` are embedded into this type.)
`grpcEndpoint` string	(Optional) gRPC endpoint for internal model management (implementing the mmesh.ModelRuntime gRPC service). The runtime is assumed to be single-model if omitted.
`grpcDataEndpoint` string	(Optional) gRPC endpoint for inference
`httpDataEndpoint` string	(Optional) HTTP endpoint for inference
`replicas` uint16	(Optional) Number of replicas in the Deployment generated by this ServingRuntime. If specified, this overrides the podsPerRuntime configuration value.
`storageHelper` StorageHelper	(Optional) Configuration for this runtime’s use of the storage helper (model puller). It is enabled unless explicitly disabled.
`builtInAdapter` BuiltInAdapter	(Optional) Details of the built-in runtime adapter
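The BuiltInAdapter description says the `serverType` must be a supported built-in type and that the runtime's container must have the same name. A quick consistency check can express both rules. `check_builtin_adapter` is a hypothetical helper over a dict-shaped ServingRuntime spec, in which the embedded ServingRuntimePodSpec fields (such as `containers`) sit directly under the spec. It is not KServe's own validation, and the container image in the usage line is a placeholder.

```python
from typing import List

# ServerType values listed in this reference.
SERVER_TYPES = {"mlserver", "ovms", "triton"}


def check_builtin_adapter(runtime_spec: dict) -> List[str]:
    """Return problems with builtInAdapter in a dict-shaped ServingRuntime spec (hypothetical)."""
    adapter = runtime_spec.get("builtInAdapter")
    if adapter is None:
        return []  # nothing to check without a built-in adapter
    problems: List[str] = []
    server_type = adapter.get("serverType")
    if server_type not in SERVER_TYPES:
        problems.append(f"unsupported serverType: {server_type!r}")
    # Embedded ServingRuntimePodSpec fields such as "containers" sit directly on the spec.
    names = [c.get("name") for c in runtime_spec.get("containers", [])]
    if server_type not in names:
        problems.append(f"no container named {server_type!r}; found {names}")
    return problems


spec = {
    "builtInAdapter": {"serverType": "triton"},
    "containers": [{"name": "triton", "image": "example/triton:latest"}],
}
print(check_builtin_adapter(spec))  # []
```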

ServingRuntimeStatus

(Appears on:ClusterServingRuntime, ServingRuntime)

ServingRuntimeStatus defines the observed state of ServingRuntime

StorageContainerSpec

(Appears on:ClusterStorageContainer)

StorageContainerSpec defines the container spec for the storage initializer init container, and the protocols it supports.

Field	Description
`container` Kubernetes core/v1.Container	Container spec for the storage initializer init container
`supportedUriFormats` []SupportedUriFormat	List of URI formats that this container supports
`workloadType` WorkloadType

StorageHelper

(Appears on:ServingRuntimeSpec)

Field	Description
`disabled` bool	(Optional)

SupportedModelFormat

(Appears on:ServingRuntimeSpec)

Field	Description
`name` string	Name of the model format.
`version` string	(Optional) Version of the model format. Used in validating that a predictor is supported by a runtime. Can be “major”, “major.minor” or “major.minor.patch”.
`autoSelect` bool	(Optional) Set to true to allow the ServingRuntime to be used for automatic model placement if this model format is specified with no explicit runtime.
`priority` int32	(Optional) Priority of this serving runtime for auto selection. This is used to select the serving runtime if more than one serving runtime supports the same model format. The value should be greater than zero. The higher the value, the higher the priority. Priority is not considered if AutoSelect is either false or not specified. Priority can be overridden by specifying the runtime in the InferenceService.
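Automatic runtime selection by `autoSelect` and `priority` can be sketched as follows. The sketch rests on these assumptions:

- runtimes are simplified dicts carrying a `name`, `disabled` and `supportedModelFormats`;
- a format without a `priority` ranks below any positive priority;
- ties go to the first runtime listed;
- a format `version` such as "1" or "1.2" matches any requested version with the same leading components.

`select_runtime` and `version_matches` are hypothetical, not KServe functions.

```python
from typing import List, Optional


def version_matches(supported: Optional[str], requested: Optional[str]) -> bool:
    """Assumed rule: "1.2" matches "1.2" and "1.2.3", but not "1.3"."""
    if supported is None or requested is None:
        return True
    parts = supported.split(".")
    return requested.split(".")[: len(parts)] == parts


def select_runtime(runtimes: List[dict], model_format: str,
                   model_version: Optional[str] = None,
                   explicit_runtime: Optional[str] = None) -> Optional[str]:
    """Pick a runtime name for a model format, or None if nothing qualifies."""
    if explicit_runtime is not None:
        return explicit_runtime  # naming a runtime overrides priority-based selection
    best_name, best_priority = None, -1
    for runtime in runtimes:
        if runtime.get("disabled", False):
            continue
        for fmt in runtime["supportedModelFormats"]:
            # Only formats with autoSelect=true take part; priority is ignored otherwise.
            if fmt["name"] != model_format or not fmt.get("autoSelect", False):
                continue
            if not version_matches(fmt.get("version"), model_version):
                continue
            priority = fmt.get("priority", 0)
            if priority > best_priority:  # strictly greater: ties keep the earlier runtime
                best_name, best_priority = runtime["name"], priority
    return best_name


runtimes = [
    {"name": "rt-a", "supportedModelFormats": [
        {"name": "sklearn", "version": "1", "autoSelect": True, "priority": 1}]},
    {"name": "rt-b", "supportedModelFormats": [
        {"name": "sklearn", "version": "1", "autoSelect": True, "priority": 2}]},
    {"name": "rt-c", "supportedModelFormats": [
        {"name": "sklearn", "autoSelect": False, "priority": 9}]},
]
print(select_runtime(runtimes, "sklearn", "1.3"))  # rt-b
```

Note that `rt-c` is never auto-selected despite its high priority, because its `autoSelect` is false.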

SupportedRuntime

SupportedRuntime is the schema for a supported runtime returned as the result of automatic selection.

Field	Description
`Name` string
`Spec` ServingRuntimeSpec

SupportedUriFormat

(Appears on:StorageContainerSpec)

SupportedUriFormat can be either a prefix or a regex. Only one of them should be set; this is not yet validated.

Field	Description
`prefix` string
`regex` string
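A storage URI can be matched against these formats roughly as follows. This sketch makes three assumptions:

- a `prefix` is a literal string prefix;
- a `regex` may match anywhere in the URI (Python's `re.search`, similar to Go's `regexp.MatchString`);
- the first enabled ClusterStorageContainer with a matching format is used.

The container names and the helpers `uri_supported` and `pick_storage_container` are hypothetical.

```python
import re
from typing import List, Optional


def uri_supported(uri: str, formats: List[dict]) -> bool:
    """True if any SupportedUriFormat entry (prefix or regex) matches the URI."""
    for fmt in formats:
        prefix, regex = fmt.get("prefix"), fmt.get("regex")
        if prefix and uri.startswith(prefix):
            return True
        # re.search matches anywhere in the string; anchor with ^...$ for a full match.
        if regex and re.search(regex, uri):
            return True
    return False


def pick_storage_container(uri: str, containers: List[dict]) -> Optional[str]:
    """Name of the first enabled ClusterStorageContainer whose formats match (hypothetical)."""
    for container in containers:
        if container.get("disabled", False):
            continue
        if uri_supported(uri, container["spec"]["supportedUriFormats"]):
            return container["name"]
    return None


containers = [
    {"name": "hf-disabled", "disabled": True,
     "spec": {"supportedUriFormats": [{"prefix": "hf://"}]}},
    {"name": "hf-downloader",
     "spec": {"supportedUriFormats": [{"prefix": "hf://"}, {"regex": r"^custom://[a-z]+/.+$"}]}},
]
print(pick_storage_container("hf://org/model", containers))  # hf-downloader
```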

TrainedModel

TrainedModel is the Schema for the TrainedModel API

Field	Description
`metadata` Kubernetes meta/v1.ObjectMeta	Refer to the Kubernetes API documentation for the fields of the metadata field.
`spec` TrainedModelSpec

`inferenceService` string	Parent InferenceService to deploy to
`model` ModelSpec	Predictor model spec

`status` TrainedModelStatus

TrainedModelSpec

(Appears on:TrainedModel)

TrainedModelSpec defines the TrainedModel spec

Field	Description
`inferenceService` string	Parent InferenceService to deploy to
`model` ModelSpec	Predictor model spec

TrainedModelStatus

(Appears on:TrainedModel)

TrainedModelStatus defines the observed state of TrainedModel

Field	Description
`Status` knative.dev/pkg/apis/duck/v1.Status	(Members of `Status` are embedded into this type.) Conditions for the trained model
`url` knative.dev/pkg/apis.URL	URL holds the URL that will distribute traffic over the provided traffic targets. For v1: http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}/v1/models/{trainedmodel}:predict. For v2: http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}/v2/models/{trainedmodel}/infer
`address` knative.dev/pkg/apis/duck/v1.Addressable	Addressable endpoint for the deployed trained model: http://{inferenceservice.metadata.name}/v1/models/{trainedmodel.metadata.name}
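The v1 and v2 URL templates for a trained model can be filled in mechanically. In this sketch the route name, namespace, cluster-level suffix and model name are caller-supplied placeholders, with the trained model's name assumed to be the path segment after `/models/`. `trained_model_urls` is a hypothetical helper and does not reproduce how KServe populates the status.

```python
def trained_model_urls(route_name: str, namespace: str, cluster_suffix: str,
                       model_name: str, https: bool = False) -> dict:
    """Fill the v1/v2 TrainedModel URL templates (hypothetical helper)."""
    scheme = "https" if https else "http"
    base = f"{scheme}://{route_name}.{namespace}.{cluster_suffix}"
    return {
        "v1": f"{base}/v1/models/{model_name}:predict",  # v1 protocol predict endpoint
        "v2": f"{base}/v2/models/{model_name}/infer",    # v2 protocol infer endpoint
    }


print(trained_model_urls("my-isvc", "default", "example.com", "model-a")["v2"])
# http://my-isvc.default.example.com/v2/models/model-a/infer
```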

TrainedModelValidator

TrainedModelValidator is responsible for validating TrainedModel resources when they are created or updated.

NOTE: The +kubebuilder:object:generate=false and +k8s:deepcopy-gen=false markers prevent controller-gen from generating DeepCopy methods, as this type is used only for temporary operations and does not need to be deeply copied.

WorkerSpec

(Appears on:ServingRuntimeSpec)

WorkerSpec is the schema for the multi-node/multi-GPU feature

Field	Description
`ServingRuntimePodSpec` ServingRuntimePodSpec	(Members of `ServingRuntimePodSpec` are embedded into this type.)
`size` int	(Optional) Configure the number of replicas in the worker set, each worker set represents the unit of scaling

WorkloadType (`string` alias)

(Appears on:StorageContainerSpec)

Value	Description
"initContainer"
"localModelDownloadJob"

Generated with gen-crd-api-reference-docs on git commit 7e436424.

serving.kserve.io/v1beta1

Package v1beta1 contains API Schema definitions for the serving v1beta1 API group


ARTExplainerSpec

(Appears on:ExplainerSpec)

ARTExplainerSpec defines the arguments for configuring an ART Explanation Server

Field	Description
`type` ARTExplainerType	The type of ART explainer
`ExplainerExtensionSpec` ExplainerExtensionSpec	(Members of `ExplainerExtensionSpec` are embedded into this type.) Contains fields shared across all explainers

ARTExplainerType (`string` alias)

(Appears on:ARTExplainerSpec)

Value	Description
"SquareAttack"

Batcher

(Appears on:ComponentExtensionSpec)

Batcher specifies optional payload batching available for all components

Field	Description
`maxBatchSize` int	(Optional) Specifies the max number of requests to trigger a batch
`maxLatency` int	(Optional) Specifies the max latency to trigger a batch
`timeout` int	(Optional) Specifies the timeout of a batch
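The interaction of `maxBatchSize` and `maxLatency` can be illustrated with a small offline simulation over sorted request arrival times. This is a toy model, not KServe's batcher, and `form_batches` is a hypothetical helper. It assumes `maxLatency` is in milliseconds (the unit is not stated here) and does not model `timeout`.

```python
from typing import List


def form_batches(arrivals_ms: List[float], max_batch_size: int,
                 max_latency_ms: float) -> List[List[int]]:
    """Group request indices into batches (toy model, not KServe's batcher).

    A batch is flushed once it holds max_batch_size requests, or when a request
    arrives more than max_latency_ms after the batch's first request.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batches: List[List[int]] = []
    current: List[int] = []
    started_at = 0.0
    for index, t in enumerate(arrivals_ms):
        if current and t - started_at > max_latency_ms:
            batches.append(current)  # latency budget exceeded: flush
            current = []
        if not current:
            started_at = t  # this request opens a new batch
        current.append(index)
        if len(current) == max_batch_size:
            batches.append(current)  # batch is full: flush
            current = []
    if current:
        batches.append(current)  # flush the remainder
    return batches


print(form_batches([0, 1, 2, 3, 4], max_batch_size=2, max_latency_ms=100))
# [[0, 1], [2, 3], [4]]
```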

Component

Component interface is implemented by all specs that contain component implementations, e.g. PredictorSpec, ExplainerSpec, TransformerSpec.

ComponentExtensionSpec

(Appears on:ExplainerSpec, PredictorSpec, TransformerSpec)

ComponentExtensionSpec defines the deployment configuration for a given InferenceService component

Field	Description
`minReplicas` int	(Optional) Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.
`maxReplicas` int	(Optional) Maximum number of replicas for autoscaling.
`scaleTarget` int	(Optional) ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for. Concurrency and rps targets are supported by the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
`scaleMetric` ScaleMetric	(Optional) ScaleMetric defines the scaling metric type watched by the autoscaler. Possible values are concurrency, rps, cpu and memory; concurrency and rps are supported via the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
`containerConcurrency` int64	(Optional) ContainerConcurrency specifies how many requests can be processed concurrently; this sets the hard limit of the container concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).
`timeout` int64	(Optional) TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.
`canaryTrafficPercent` int64	(Optional) CanaryTrafficPercent defines the traffic split percentage between the candidate revision and the last ready revision
`logger` LoggerSpec	(Optional) Activate request/response logging and logger configurations
`batcher` Batcher	(Optional) Activate request batching and batching configurations
`labels` map[string]string	(Optional) Labels that will be added to the component pod. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
`annotations` map[string]string	(Optional) Annotations that will be added to the component pod. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
`deploymentStrategy` Kubernetes apps/v1.DeploymentStrategy	(Optional) The deployment strategy to use to replace existing pods with new ones. Only applicable for raw deployment mode.
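The scaling and canary fields can be related with some simple arithmetic. `desired_replicas` is a simplified model, not the Knative or Kubernetes autoscaler algorithm. It sizes the deployment so each replica handles roughly `scaleTarget` units of the observed metric, then clamps the result to `minReplicas`/`maxReplicas`; treating a non-positive `maxReplicas` as unbounded is an assumption. `traffic_split` shows how `canaryTrafficPercent` divides traffic between the candidate revision and the last ready revision. Both helpers are hypothetical.

```python
import math


def desired_replicas(observed: float, scale_target: int,
                     min_replicas: int = 1, max_replicas: int = 0) -> int:
    """Simplified autoscaling arithmetic (not the Knative/HPA algorithm)."""
    if scale_target <= 0:
        raise ValueError("scaleTarget must be positive")
    # Enough replicas that each handles about scale_target units of the metric.
    wanted = max(math.ceil(observed / scale_target), min_replicas)
    # A non-positive max_replicas is treated as "no upper bound" in this sketch.
    return min(wanted, max_replicas) if max_replicas > 0 else wanted


def traffic_split(canary_traffic_percent: int) -> dict:
    """Percentage of traffic for the candidate vs. the last ready revision."""
    if not 0 <= canary_traffic_percent <= 100:
        raise ValueError("canaryTrafficPercent must be between 0 and 100")
    return {"candidate": canary_traffic_percent,
            "lastReady": 100 - canary_traffic_percent}


print(desired_replicas(25, scale_target=10))                  # 3
print(desired_replicas(0, scale_target=10, min_replicas=0))   # 0 (scale to zero)
print(traffic_split(10))                                       # {'candidate': 10, 'lastReady': 90}
```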

ComponentImplementation

ComponentImplementation interface is implemented by predictor, transformer, and explainer implementations

ComponentStatusSpec

(Appears on:InferenceServiceStatus)

ComponentStatusSpec describes the state of the component

Field	Description
`latestReadyRevision` string	(Optional) Latest revision name that is in ready state
`latestCreatedRevision` string	(Optional) Latest revision name that is created
`previousRolledoutRevision` string	(Optional) Previous revision name that is rolled out with 100 percent traffic
`latestRolledoutRevision` string	(Optional) Latest revision name that is rolled out with 100 percent traffic
`traffic` []knative.dev/serving/pkg/apis/serving/v1.TrafficTarget	(Optional) Traffic holds the configured traffic distribution for latest ready revision and previous rolled out revision.
`url` knative.dev/pkg/apis.URL	(Optional) URL holds the primary URL that will distribute traffic over the provided traffic targets. This will be one of the REST or gRPC endpoints that are available. It generally has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}
`restUrl` knative.dev/pkg/apis.URL	(Optional) REST endpoint of the component if available.
`grpcUrl` knative.dev/pkg/apis.URL	(Optional) gRPC endpoint of the component if available.
`address` knative.dev/pkg/apis/duck/v1.Addressable	(Optional) Addressable endpoint for the InferenceService

ComponentType (`string` alias)

ComponentType contains the different types of components of the service

Value	Description
"explainer"
"predictor"
"transformer"

CustomExplainer

CustomExplainer defines arguments for configuring a custom explainer.

Field	Description
`PodSpec` Kubernetes core/v1.PodSpec	(Members of `PodSpec` are embedded into this type.)

CustomPredictor

CustomPredictor defines arguments for configuring a custom server.

Field	Description
`PodSpec` Kubernetes core/v1.PodSpec	(Members of `PodSpec` are embedded into this type.)

CustomTransformer

CustomTransformer defines arguments for configuring a custom transformer.

Field	Description
`PodSpec` Kubernetes core/v1.PodSpec	(Members of `PodSpec` are embedded into this type.)

DeployConfig

Field	Description
`defaultDeploymentMode` string

ExplainerConfig

(Appears on:ExplainersConfig)

Field	Description
`image` string	Explainer Docker image name
`defaultImageVersion` string	Default explainer Docker image version

ExplainerExtensionSpec

(Appears on:ARTExplainerSpec)

ExplainerExtensionSpec defines configuration shared across all explainer frameworks

Field	Description
`storageUri` string	The location of a trained explanation model
`runtimeVersion` string	Defaults to the latest explainer version
`config` map[string]string	Inline custom parameter settings for explainer
`Container` Kubernetes core/v1.Container	(Members of `Container` are embedded into this type.) (Optional) Container enables overrides for the explainer. Each framework will have different defaults that are populated in the underlying container spec.
`storage` StorageSpec	(Optional) Storage Spec for model location

ExplainerSpec

(Appears on:InferenceServiceSpec)

ExplainerSpec defines the container spec for a model explanation server. The following fields follow a “1-of” semantic: users must specify exactly one spec.

Field	Description
`art` ARTExplainerSpec	Spec for ART explainer
`PodSpec` PodSpec	(Members of `PodSpec` are embedded into this type.) This spec is dual purpose. 1) Users may provide a full PodSpec for their custom explainer; the field PodSpec.Containers is mutually exclusive with other explainers. 2) Users may provide an explainer and specify PodSpec overrides in the PodSpec; they must not provide PodSpec.Containers in this case.
`ComponentExtensionSpec` ComponentExtensionSpec	(Members of `ComponentExtensionSpec` are embedded into this type.) Component extension defines the deployment configurations for the explainer
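The “1-of” rule for explainers can be expressed as a check on a dict-shaped ExplainerSpec. This is a hypothetical sketch, not KServe's validation. It assumes `art` is the only built-in explainer key (the only one listed in this reference) and that embedded PodSpec fields such as `containers` appear at the same level.

```python
from typing import List

BUILTIN_EXPLAINERS = ("art",)  # built-in explainer keys listed in this reference


def validate_explainer(spec: dict) -> List[str]:
    """Return problems with the explainer "1-of" rule (empty list if valid)."""
    builtins = [key for key in BUILTIN_EXPLAINERS if key in spec]
    has_containers = bool(spec.get("containers"))
    if builtins and has_containers:
        # A built-in explainer with PodSpec overrides must not set containers.
        return ["containers must not be set together with a built-in explainer"]
    if not builtins and not has_containers:
        # Without a built-in explainer, a custom explainer needs containers.
        return ["specify a built-in explainer or containers for a custom explainer"]
    return []


print(validate_explainer({"art": {"type": "SquareAttack"}}))  # []
```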

ExplainersConfig

(Appears on:InferenceServicesConfig)

Field	Description
`art` ExplainerConfig

FailureInfo

(Appears on:ModelStatus)

Field	Description
`location` string	(Optional) Name of component to which the failure relates (usually Pod name)
`reason` FailureReason	(Optional) High level class of failure
`message` string	(Optional) Detailed error message
`modelRevisionName` string	(Optional) Internal Revision/ID of model, tied to specific Spec contents
`time` Kubernetes meta/v1.Time	(Optional) Time failure occurred or was discovered
`exitCode` int32	(Optional) Exit status from the last termination of the container

FailureReason (`string` alias)

(Appears on:FailureInfo)

FailureReason enum

Value	Description
"InvalidPredictorSpec"	The current Predictor Spec is invalid or unsupported
"ModelLoadFailed"	The model failed to load within a ServingRuntime container
"NoSupportingRuntime"	There is no ServingRuntime that supports the specified model type
"RuntimeDisabled"	The ServingRuntime is disabled
"RuntimeNotRecognized"	There is no ServingRuntime defined with the specified runtime name
"RuntimeUnhealthy"	Corresponding ServingRuntime containers failed to start or are unhealthy
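A client that inspects `lastFailureInfo.reason` can mirror the FailureReason values in a small enum. The sketch below is a hypothetical Python helper, not part of KServe. In particular, the grouping of reasons into "runtime-related" is an assumption made for illustration.

```python
from enum import Enum


class FailureReason(str, Enum):
    """Mirror of the documented FailureReason string values (hypothetical helper)."""

    INVALID_PREDICTOR_SPEC = "InvalidPredictorSpec"
    MODEL_LOAD_FAILED = "ModelLoadFailed"
    NO_SUPPORTING_RUNTIME = "NoSupportingRuntime"
    RUNTIME_DISABLED = "RuntimeDisabled"
    RUNTIME_NOT_RECOGNIZED = "RuntimeNotRecognized"
    RUNTIME_UNHEALTHY = "RuntimeUnhealthy"


# Assumed grouping, for illustration only: reasons that point at the
# ServingRuntime setup rather than at the model or the predictor spec.
RUNTIME_RELATED = {
    FailureReason.NO_SUPPORTING_RUNTIME,
    FailureReason.RUNTIME_DISABLED,
    FailureReason.RUNTIME_NOT_RECOGNIZED,
    FailureReason.RUNTIME_UNHEALTHY,
}


def parse_reason(raw: str) -> FailureReason:
    """Convert a raw FailureInfo.reason string; raises ValueError for unknown values."""
    return FailureReason(raw)


def is_runtime_related(raw: str) -> bool:
    """True if the reason falls into the assumed runtime-related group."""
    return parse_reason(raw) in RUNTIME_RELATED
```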

HuggingFaceRuntimeSpec

(Appears on:PredictorSpec)

HuggingFaceRuntimeSpec defines arguments for configuring HuggingFace model serving.

Field	Description
`PredictorExtensionSpec` PredictorExtensionSpec	(Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors

InferenceService

InferenceService is the Schema for the InferenceServices API

Field	Description
`metadata` Kubernetes meta/v1.ObjectMeta	Refer to the Kubernetes API documentation for the fields of the metadata field.
`spec` InferenceServiceSpec	Contains the following fields:
  `predictor` PredictorSpec	Predictor defines the model serving spec
  `explainer` ExplainerSpec	(Optional) Explainer defines the model explanation service spec. The explainer service calls the predictor, or the transformer if one is specified.
  `transformer` TransformerSpec	(Optional) Transformer defines the pre/post processing before and after the predictor call. The transformer service calls the predictor service.
`status` InferenceServiceStatus

InferenceServiceDefaulter

InferenceServiceDefaulter is responsible for setting default values on the InferenceService when created or updated.

NOTE: The +kubebuilder:object:generate=false and +k8s:deepcopy-gen=false markers prevent controller-gen from generating DeepCopy methods, as this struct is used only for temporary operations and does not need to be deeply copied.

InferenceServiceSpec

(Appears on:InferenceService)

InferenceServiceSpec is the top level type for this resource

Field	Description
`predictor` PredictorSpec	Predictor defines the model serving spec
`explainer` ExplainerSpec	(Optional) Explainer defines the model explanation service spec. The explainer service calls the predictor, or the transformer if one is specified.
`transformer` TransformerSpec	(Optional) Transformer defines the pre/post processing before and after the predictor call. The transformer service calls the predictor service.
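Only `predictor` is required in InferenceServiceSpec. As a minimal sketch, the Python function below builds such a manifest as a plain dict using the generic `model` field (ModelSpec, whose embedded PredictorExtensionSpec supplies `storageUri`). The function name, resource name, namespace, and bucket URI are hypothetical placeholders.

```python
import json


def build_inference_service(name: str, namespace: str, model_format: str, storage_uri: str) -> dict:
    """Return a minimal InferenceService manifest as a plain dict.

    Only spec.predictor is set, because explainer and transformer are optional.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                # ModelSpec: modelFormat plus the embedded PredictorExtensionSpec fields.
                "model": {
                    "modelFormat": {"name": model_format},
                    "storageUri": storage_uri,
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical values; replace them with your own.
    manifest = build_inference_service(
        "sklearn-demo", "default", "sklearn", "s3://example-bucket/models/sklearn"
    )
    print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output can be saved to a file and applied directly.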

InferenceServiceStatus

(Appears on:InferenceService)

InferenceServiceStatus defines the observed state of InferenceService

Field	Description
`Status` knative.dev/pkg/apis/duck/v1.Status	(Members of `Status` are embedded into this type.) Conditions for the InferenceService: PredictorReady (predictor readiness condition); TransformerReady (transformer readiness condition); ExplainerReady (explainer readiness condition); RoutesReady, serverless mode only (aggregated routing condition, i.e. endpoint readiness condition); LatestDeploymentReady, serverless mode only (aggregated configuration condition, i.e. latest deployment readiness condition); Ready (aggregated condition).
`address` knative.dev/pkg/apis/duck/v1.Addressable	(Optional) Addressable endpoint for the InferenceService
`url` knative.dev/pkg/apis.URL	(Optional) URL holds the url that will distribute traffic over the provided traffic targets. It generally has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}
`components` map[kserve.io/serving/pkg/apis/serving/v1beta1.ComponentType]kserve.io/serving/pkg/apis/serving/v1beta1.ComponentStatusSpec	Statuses for the components of the InferenceService
`modelStatus` ModelStatus	Model related statuses
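The `Status` member is the embedded Knative duck-typed status, so its `conditions` list sits directly under `status`. The following is a hypothetical readiness check. It assumes the usual Knative condition shape: a `type`, plus a `status` of "True", "False" or "Unknown".

```python
from typing import Optional


def condition_status(status: dict, cond_type: str) -> Optional[str]:
    """Return the status string of the named condition, or None if it is absent."""
    for cond in status.get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def is_ready(status: dict) -> bool:
    """Treat the InferenceService as ready when the aggregated Ready condition is "True"."""
    return condition_status(status, "Ready") == "True"
```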

InferenceServiceValidator

InferenceServiceValidator is responsible for validating the InferenceService resource when it is created, updated, or deleted.

NOTE: The +kubebuilder:object:generate=false and +k8s:deepcopy-gen=false markers prevent controller-gen from generating DeepCopy methods, as this struct is used only for temporary operations and does not need to be deeply copied.

InferenceServicesConfig

Field	Description
`explainers` ExplainersConfig	Explainer configurations

IngressConfig

Field	Description
`ingressGateway` string
`knativeLocalGatewayService` string
`localGateway` string
`localGatewayService` string
`ingressDomain` string
`ingressClassName` string
`additionalIngressDomains` []string
`domainTemplate` string
`urlScheme` string
`disableIstioVirtualHost` bool
`pathTemplate` string
`disableIngressCreation` bool

LightGBMSpec

(Appears on:PredictorSpec)

LightGBMSpec defines arguments for configuring LightGBM model serving.

Field	Description
`PredictorExtensionSpec` PredictorExtensionSpec	(Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors

LocalModelConfig

Field	Description
`enabled` bool
`jobNamespace` string
`defaultJobImage` string
`fsGroup` int64

LoggerSpec

(Appears on:ComponentExtensionSpec)

LoggerSpec specifies optional payload logging available for all components

Field	Description
`url` string	(Optional) URL to send logging events
`mode` LoggerType	(Optional) Specifies the scope of the loggers. Valid values are: - “all” (default): log both request and response; - “request”: log only request; - “response”: log only response
`metadataHeaders` []string	(Optional) Matched metadata HTTP headers for propagating to inference logger cloud events.

LoggerType (`string` alias)

(Appears on:LoggerSpec)

LoggerType controls the scope of log publishing

Value	Description
"all"	LogAll Logger mode to log both request and response
"request"	LogRequest Logger mode to log only request
"response"	LogResponse Logger mode to log only response
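The three LoggerType values determine which half of an exchange is published. The following is a minimal sketch of that decision. It assumes an unset `mode` falls back to the documented default "all"; the `should_log` helper is hypothetical.

```python
from enum import Enum
from typing import Optional


class LoggerType(str, Enum):
    ALL = "all"
    REQUEST = "request"
    RESPONSE = "response"


def should_log(mode: Optional[str], is_request: bool) -> bool:
    """Decide whether a payload is logged under the given LoggerSpec.mode."""
    logger_type = LoggerType(mode) if mode else LoggerType.ALL  # default is "all"
    if logger_type is LoggerType.ALL:
        return True
    if logger_type is LoggerType.REQUEST:
        return is_request
    return not is_request  # "response": log only responses
```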

ModelCopies

(Appears on:ModelStatus)

Field	Description
`failedCopies` int	How many copies of this predictor’s models failed to load recently
`totalCopies` int	(Optional) Total number of copies of this predictor’s models that are currently loaded

ModelFormat

(Appears on:ModelSpec)

Field	Description
`name` string	Name of the model format.
`version` string	(Optional) Version of the model format. Used in validating that a predictor is supported by a runtime. Can be “major”, “major.minor” or “major.minor.patch”.
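The allowed shapes of `ModelFormat.version` can be checked before a manifest is submitted. This sketch assumes each component is a non-negative integer, which the reference does not state explicitly; the helper names are hypothetical.

```python
import re
from typing import Tuple

# "major", "major.minor" or "major.minor.patch"; numeric components are an assumption.
_VERSION_PATTERN = re.compile(r"\d+(?:\.\d+){0,2}")


def is_valid_format_version(version: str) -> bool:
    """True if the string has one to three dot-separated integer components."""
    return _VERSION_PATTERN.fullmatch(version) is not None


def parse_format_version(version: str) -> Tuple[int, ...]:
    """Split a validated version into integers, e.g. "2.1" becomes (2, 1)."""
    if not is_valid_format_version(version):
        raise ValueError(f"unsupported model format version: {version!r}")
    return tuple(int(part) for part in version.split("."))
```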

ModelRevisionStates

(Appears on:ModelStatus)

Field	Description
`activeModelState` ModelState	High-level state string: Pending, Standby, Loading, Loaded, FailedToLoad
`targetModelState` ModelState

ModelSpec

(Appears on:PredictorSpec)

Field	Description
`modelFormat` ModelFormat	ModelFormat being served.
`runtime` string	(Optional) Specific ClusterServingRuntime/ServingRuntime name to use for deployment.
`PredictorExtensionSpec` PredictorExtensionSpec	(Members of `PredictorExtensionSpec` are embedded into this type.)

ModelState (`string` alias)

(Appears on:ModelRevisionStates)

ModelState enum

Value	Description
"FailedToLoad"	All copies of the model failed to load
"Loaded"	At least one copy of the model is loaded
"Loading"	Model is loading
"Pending"	Model is not yet registered
"Standby"	Model is available but not loaded (will load when used)
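Reading `states.activeModelState` from a ModelStatus lets a client tell whether the model can already serve traffic. The sketch below mirrors the ModelState values. Treating Pending and Loading as "in flight" is an assumption, and all helper names are hypothetical.

```python
from enum import Enum
from typing import Optional


class ModelState(str, Enum):
    FAILED_TO_LOAD = "FailedToLoad"
    LOADED = "Loaded"
    LOADING = "Loading"
    PENDING = "Pending"
    STANDBY = "Standby"


def active_state(model_status: dict) -> Optional[ModelState]:
    """Read states.activeModelState from a ModelStatus dict, or None when absent."""
    raw = (model_status.get("states") or {}).get("activeModelState")
    return ModelState(raw) if raw else None


def has_loaded_copy(model_status: dict) -> bool:
    """True when at least one copy of the model is loaded."""
    return active_state(model_status) is ModelState.LOADED


def is_in_flight(model_status: dict) -> bool:
    """Hypothetical grouping: Pending and Loading count as still in progress."""
    return active_state(model_status) in (ModelState.PENDING, ModelState.LOADING)
```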

ModelStatus

(Appears on:InferenceServiceStatus)

Field	Description
`transitionStatus` TransitionStatus	Whether the available predictor endpoints reflect the current Spec or are in transition
`states` ModelRevisionStates	(Optional) State information of the predictor’s model.
`lastFailureInfo` FailureInfo	(Optional) Details of the last failure, when loading the target model failed or was blocked.
`copies` ModelCopies	(Optional) Model copy information of the predictor’s model.
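When `lastFailureInfo` is present, its fields can be folded into a one-line diagnostic. The hypothetical formatter below works on the ModelStatus as a plain dict, for example the `status.modelStatus` object of an InferenceService fetched as JSON.

```python
def describe_last_failure(model_status: dict) -> str:
    """Summarize FailureInfo (reason, location, exitCode, message) in one line."""
    info = model_status.get("lastFailureInfo")
    if not info:
        return "no failure recorded"
    text = info.get("reason", "UnknownReason")
    if info.get("location"):
        text += f" at {info['location']}"  # usually the Pod name
    if info.get("exitCode") is not None:
        text += f" (exit code {info['exitCode']})"
    if info.get("message"):
        text += f": {info['message']}"
    return text
```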

ONNXRuntimeSpec

(Appears on:PredictorSpec)

ONNXRuntimeSpec defines arguments for configuring ONNX model serving.

Field	Description
`PredictorExtensionSpec` PredictorExtensionSpec	(Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors

PMMLSpec

(Appears on:PredictorSpec)

PMMLSpec defines arguments for configuring PMML model serving.

Field	Description
`PredictorExtensionSpec` PredictorExtensionSpec	(Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors

PaddleServerSpec

(Appears on:PredictorSpec)

Field	Description
`PredictorExtensionSpec` PredictorExtensionSpec	(Members of `PredictorExtensionSpec` are embedded into this type.)

PodSpec

(Appears on:ExplainerSpec, PredictorSpec, TransformerSpec, WorkerSpec)

PodSpec is a description of a pod.

Field	Description
`volumes` []Kubernetes core/v1.Volume	(Optional) List of volumes that can be mounted by containers belonging to the pod. More info: https://kubernetes.io/docs/concepts/storage/volumes
`initContainers` []Kubernetes core/v1.Container	List of initialization containers belonging to the pod. Init containers are executed in order prior to containers being started. If any init container fails, the pod is considered to have failed and is handled according to its restartPolicy. The name for an init container or normal container must be unique among all containers. Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes. The resourceRequirements of an init container are taken into account during scheduling by finding the highest request/limit for each resource type, and then using the max of that value or the sum of the normal containers. Limits are applied to init containers in a similar fashion. Init containers cannot currently be added or removed. Cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
`containers` []Kubernetes core/v1.Container	List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
`ephemeralContainers` []Kubernetes core/v1.EphemeralContainer	(Optional) List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing pod to perform user-initiated actions such as debugging. This list cannot be specified when creating a pod, and it cannot be modified by updating the pod spec. In order to add an ephemeral container to an existing pod, use the pod’s ephemeralcontainers subresource. This field is beta-level and available on clusters that haven’t disabled the EphemeralContainers feature gate.
`restartPolicy` Kubernetes core/v1.RestartPolicy	(Optional) Restart policy for all containers within the pod. One of Always, OnFailure, Never. Defaults to Always. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
`terminationGracePeriodSeconds` int64	(Optional) Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request. Value must be a non-negative integer. The value zero indicates stop immediately via the kill signal (no opportunity to shut down). If this value is nil, the default grace period will be used instead. The grace period is the duration in seconds between the time the processes running in the pod are sent a termination signal and the time when the processes are forcibly halted with a kill signal. Set this value longer than the expected cleanup time for your process. Defaults to 30 seconds.
`activeDeadlineSeconds` int64	(Optional) Optional duration in seconds the pod may be active on the node relative to StartTime before the system will actively try to mark it failed and kill associated containers. Value must be a positive integer.
`dnsPolicy` Kubernetes core/v1.DNSPolicy	(Optional) Set DNS policy for the pod. Defaults to “ClusterFirst”. Valid values are ‘ClusterFirstWithHostNet’, ‘ClusterFirst’, ‘Default’ or ‘None’. DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy. To have DNS options set along with hostNetwork, you have to specify DNS policy explicitly to ‘ClusterFirstWithHostNet’.
`nodeSelector` map[string]string	(Optional) NodeSelector is a selector which must be true for the pod to fit on a node. Selector which must match a node’s labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
`serviceAccountName` string	(Optional) ServiceAccountName is the name of the ServiceAccount to use to run this pod. More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
`serviceAccount` string	(Optional) DeprecatedServiceAccount is a deprecated alias for ServiceAccountName. Deprecated: Use serviceAccountName instead.
`automountServiceAccountToken` bool	(Optional) AutomountServiceAccountToken indicates whether a service account token should be automatically mounted.
`nodeName` string	(Optional) NodeName is a request to schedule this pod onto a specific node. If it is non-empty, the scheduler simply schedules this pod onto that node, assuming that it fits resource requirements.
`hostNetwork` bool	(Optional) Host networking requested for this pod. Use the host’s network namespace. If this option is set, the ports that will be used must be specified. Defaults to false.
`hostPID` bool	(Optional) Use the host’s pid namespace. Optional: Defaults to false.
`hostIPC` bool	(Optional) Use the host’s ipc namespace. Optional: Defaults to false.
`shareProcessNamespace` bool	(Optional) Share a single process namespace between all of the containers in a pod. When this is set, containers will be able to view and signal processes from other containers in the same pod, and the first process in each container will not be assigned PID 1. HostPID and ShareProcessNamespace cannot both be set. Optional: Defaults to false.
`securityContext` Kubernetes core/v1.PodSecurityContext	(Optional) SecurityContext holds pod-level security attributes and common container settings. Optional: Defaults to empty. See type description for default values of each field.
`imagePullSecrets` []Kubernetes core/v1.LocalObjectReference	(Optional) ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of docker, only DockerConfig type secrets are honored. More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
`hostname` string	(Optional) Specifies the hostname of the Pod. If not specified, the pod’s hostname will be set to a system-defined value.
`subdomain` string	(Optional) If specified, the fully qualified Pod hostname will be “...svc.”. If not specified, the pod will not have a domainname at all.
`affinity` Kubernetes core/v1.Affinity	(Optional) If specified, the pod’s scheduling constraints
`schedulerName` string	(Optional) If specified, the pod will be dispatched by specified scheduler. If not specified, the pod will be dispatched by default scheduler.
`tolerations` []Kubernetes core/v1.Toleration	(Optional) If specified, the pod’s tolerations.
`hostAliases` []Kubernetes core/v1.HostAlias	(Optional) HostAliases is an optional list of hosts and IPs that will be injected into the pod’s hosts file if specified. This is only valid for non-hostNetwork pods.
`priorityClassName` string	(Optional) If specified, indicates the pod’s priority. “system-node-critical” and “system-cluster-critical” are two special keywords which indicate the highest priorities with the former being the highest priority. Any other name must be defined by creating a PriorityClass object with that name. If not specified, the pod priority will be default or zero if there is no default.
`priority` int32	(Optional) The priority value. Various system components use this field to find the priority of the pod. When Priority Admission Controller is enabled, it prevents users from setting this field. The admission controller populates this field from PriorityClassName. The higher the value, the higher the priority.
`dnsConfig` Kubernetes core/v1.PodDNSConfig	(Optional) Specifies the DNS parameters of a pod. Parameters specified here will be merged to the generated DNS configuration based on DNSPolicy.
`readinessGates` []Kubernetes core/v1.PodReadinessGate	(Optional) If specified, all readiness gates will be evaluated for pod readiness. A pod is ready when all its containers are ready AND all conditions specified in the readiness gates have status equal to “True”. More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates
`runtimeClassName` string	(Optional) RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run. If unset or empty, the “legacy” RuntimeClass will be used, which is an implicit class with an empty definition that uses the default runtime handler. More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class This is a beta feature as of Kubernetes v1.14.
`enableServiceLinks` bool	(Optional) EnableServiceLinks indicates whether information about services should be injected into pod’s environment variables, matching the syntax of Docker links. Optional: Defaults to true.
`preemptionPolicy` Kubernetes core/v1.PreemptionPolicy	(Optional) PreemptionPolicy is the Policy for preempting pods with lower priority. One of Never, PreemptLowerPriority. Defaults to PreemptLowerPriority if unset. This field is beta-level, gated by the NonPreemptingPriority feature-gate.
`overhead` Kubernetes core/v1.ResourceList	(Optional) Overhead represents the resource overhead associated with running a pod for a given RuntimeClass. This field will be autopopulated at admission time by the RuntimeClass admission controller. If the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests. The RuntimeClass admission controller will reject Pod create requests which have the overhead already set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero. More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md This field is beta-level as of Kubernetes v1.18, and is only honored by servers that enable the PodOverhead feature.
`topologySpreadConstraints` []Kubernetes core/v1.TopologySpreadConstraint	(Optional) TopologySpreadConstraints describes how a group of pods ought to spread across topology domains. Scheduler will schedule pods in a way which abides by the constraints. All topologySpreadConstraints are ANDed.
`setHostnameAsFQDN` bool	(Optional) If true, the pod’s hostname will be configured as the pod’s FQDN, rather than the leaf name (the default). In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname). In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN. If a pod does not have FQDN, this has no effect. Defaults to false.
`os` Kubernetes core/v1.PodOS	(Optional) Specifies the OS of the containers in the pod. Some pod and container fields are restricted if this is set. If the OS field is set to linux, the following fields must be unset: - securityContext.windowsOptions. If the OS field is set to windows, the following fields must be unset: - spec.hostPID - spec.hostIPC - spec.securityContext.seLinuxOptions - spec.securityContext.seccompProfile - spec.securityContext.fsGroup - spec.securityContext.fsGroupChangePolicy - spec.securityContext.sysctls - spec.shareProcessNamespace - spec.securityContext.runAsUser - spec.securityContext.runAsGroup - spec.securityContext.supplementalGroups - spec.containers[*].securityContext.seLinuxOptions - spec.containers[*].securityContext.seccompProfile - spec.containers[*].securityContext.capabilities - spec.containers[*].securityContext.readOnlyRootFilesystem - spec.containers[*].securityContext.privileged - spec.containers[*].securityContext.allowPrivilegeEscalation - spec.containers[*].securityContext.procMount - spec.containers[*].securityContext.runAsUser - spec.containers[*].securityContext.runAsGroup. This is an alpha field and requires the IdentifyPodOS feature.
`hostUsers` bool	(Optional) Use the host’s user namespace. Optional: Defaults to true. If set to true or not present, the pod will be run in the host user namespace, useful for when the pod needs a feature only available to the host user namespace, such as loading a kernel module with CAP_SYS_MODULE. When set to false, a new userns is created for the pod. Setting false is useful for mitigating container breakout vulnerabilities, even allowing users to run their containers as root without actually having root privileges on the host. This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.
`schedulingGates` []Kubernetes core/v1.PodSchedulingGate	(Optional) SchedulingGates is an opaque list of values that if specified will block scheduling the pod. If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the scheduler will not attempt to schedule the pod. SchedulingGates can only be set at pod creation time, and be removed only afterwards. This is a beta feature enabled by the PodSchedulingReadiness feature gate.
`resourceClaims` []Kubernetes core/v1.PodResourceClaim	(Optional) ResourceClaims defines which ResourceClaims must be allocated and reserved before the Pod is allowed to start. The resources will be made available to those containers which consume them by name. This is an alpha field and requires enabling the DynamicResourceAllocation feature gate. This field is immutable.
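The `initContainers` description states how init container requests affect scheduling. For each resource type, the effective value is the larger of two quantities: the highest init container request, and the sum of the regular containers' requests. The same rule applies to limits. The following is a minimal sketch of that arithmetic. It assumes quantities have already been converted to plain integers (millicores, bytes) and ignores `overhead`; the function name is hypothetical.

```python
from typing import Dict, List

# Resource name -> integer amount, e.g. {"cpu": 500} in millicores (assumed units).
Resources = Dict[str, int]


def effective_requests(init_containers: List[Resources], containers: List[Resources]) -> Resources:
    """Per resource: max(highest init container value, sum over regular containers)."""
    names = set()
    for res in init_containers + containers:
        names.update(res)
    result: Resources = {}
    for name in names:
        highest_init = max((res.get(name, 0) for res in init_containers), default=0)
        total_app = sum(res.get(name, 0) for res in containers)
        result[name] = max(highest_init, total_app)
    return result
```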

PredictorExtensionSpec

(Appears on:HuggingFaceRuntimeSpec, LightGBMSpec, ModelSpec, ONNXRuntimeSpec, PMMLSpec, PaddleServerSpec, SKLearnSpec, TFServingSpec, TorchServeSpec, TritonSpec, XGBoostSpec)

PredictorExtensionSpec defines configuration shared across all predictor frameworks

Field	Description
`storageUri` string	(Optional) This field points to the location of the trained model which is mounted onto the pod.
`runtimeVersion` string	(Optional) Runtime version of the predictor docker image
`protocolVersion` github.com/kserve/kserve/pkg/constants.InferenceServiceProtocol	(Optional) Protocol version to be used by the predictor (i.e. v1, v2, grpc-v1 or grpc-v2)
`Container` Kubernetes core/v1.Container	(Members of `Container` are embedded into this type.) (Optional) Container enables overrides for the predictor. Each framework will have different defaults that are populated in the underlying container spec.
`storage` StorageSpec	(Optional) Storage Spec for model location

PredictorImplementation

PredictorImplementation defines common functions for all predictors, e.g. TensorFlow and Triton.

PredictorSpec

(Appears on:InferenceServiceSpec)

PredictorSpec defines the configuration for a predictor. The following fields follow a “1-of” semantic: users must specify exactly one spec.

Field	Description
`sklearn` SKLearnSpec	Spec for SKLearn model server
`xgboost` XGBoostSpec	Spec for XGBoost model server
`tensorflow` TFServingSpec	Spec for TFServing (https://github.com/tensorflow/serving)
`pytorch` TorchServeSpec	Spec for TorchServe (https://pytorch.org/serve)
`triton` TritonSpec	Spec for Triton Inference Server (https://github.com/triton-inference-server/server)
`onnx` ONNXRuntimeSpec	Spec for ONNX runtime (https://github.com/microsoft/onnxruntime)
`huggingface` HuggingFaceRuntimeSpec	Spec for HuggingFace runtime (https://github.com/huggingface)
`pmml` PMMLSpec	Spec for PMML (http://dmg.org/pmml/v4-1/GeneralStructure.html)
`lightgbm` LightGBMSpec	Spec for LightGBM model server
`paddle` PaddleServerSpec	Spec for Paddle model server (https://github.com/PaddlePaddle/Serving)
`model` ModelSpec	Model spec for any arbitrary framework.
`workerSpec` WorkerSpec	WorkerSpec for enabling multi-node/multi-gpu
`PodSpec` PodSpec	(Members of `PodSpec` are embedded into this type.) This spec is dual purpose. 1) Provide a full PodSpec for a custom predictor. The field PodSpec.Containers is mutually exclusive with other predictors (e.g. TFServing). 2) Provide a predictor (e.g. TFServing) and specify PodSpec overrides; you must not provide PodSpec.Containers in this case.
`ComponentExtensionSpec` ComponentExtensionSpec	(Members of `ComponentExtensionSpec` are embedded into this type.) Component extension defines the deployment configurations for a predictor
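The "1-of" rule for PredictorSpec can be checked client-side. This hypothetical validator works on the predictor as a plain dict. It treats a non-empty `containers` list (from the embedded PodSpec) as the custom-predictor case. `workerSpec` and the other PodSpec or ComponentExtensionSpec fields are ignored.

```python
from typing import List

# Framework-specific PredictorSpec fields, as listed in the table above.
PREDICTOR_IMPLEMENTATIONS = (
    "sklearn", "xgboost", "tensorflow", "pytorch", "triton", "onnx",
    "huggingface", "pmml", "lightgbm", "paddle", "model",
)


def validate_predictor(predictor: dict) -> List[str]:
    """Return a list of problems; an empty list means the 1-of rule holds."""
    chosen = [key for key in PREDICTOR_IMPLEMENTATIONS if key in predictor]
    has_containers = bool(predictor.get("containers"))
    errors: List[str] = []
    if chosen and has_containers:
        errors.append("containers must not be set together with " + ", ".join(chosen))
    if len(chosen) > 1:
        errors.append("exactly one predictor implementation may be set, got " + ", ".join(chosen))
    if not chosen and not has_containers:
        errors.append("no predictor implementation or custom containers specified")
    return errors
```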

SKLearnSpec

(Appears on:PredictorSpec)

SKLearnSpec defines arguments for configuring SKLearn model serving.

Field	Description
`PredictorExtensionSpec` PredictorExtensionSpec	(Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors

ScaleMetric (`string` alias)

(Appears on:ComponentExtensionSpec)

ScaleMetric enum

Value	Description
"cpu"
"concurrency"
"memory"
"rps"

SecurityConfig

Field	Description
`autoMountServiceAccountToken` bool

StorageSpec

(Appears on:ExplainerExtensionSpec, PredictorExtensionSpec)

Field	Description
`path` string	(Optional) The path to the model object in the storage. It cannot co-exist with the storageUri.
`schemaPath` string	(Optional) The path to the model schema file in the storage.
`parameters` map[string]string	(Optional) Parameters to override the default storage credentials and config.
`key` string	(Optional) The Storage Key in the secret for this model.
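StorageSpec's `path` cannot co-exist with `storageUri`. Below is a small hypothetical check over a dict shaped like PredictorExtensionSpec (or ExplainerExtensionSpec), where `storage` holds the StorageSpec.

```python
from typing import Optional


def check_model_location(ext: dict) -> Optional[str]:
    """Return an error message if storage.path and storageUri are both set, else None."""
    storage = ext.get("storage") or {}
    if ext.get("storageUri") and storage.get("path"):
        return "storage.path cannot be combined with storageUri"
    return None
```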

TFServingSpec

(Appears on:PredictorSpec)

TFServingSpec defines arguments for configuring TensorFlow model serving.

Field	Description
`PredictorExtensionSpec` PredictorExtensionSpec	(Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors

TorchServeSpec

(Appears on:PredictorSpec)

TorchServeSpec defines arguments for configuring PyTorch model serving.

Field	Description
`PredictorExtensionSpec` PredictorExtensionSpec	(Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors

TransformerSpec

(Appears on:InferenceServiceSpec)

TransformerSpec defines the transformer service for pre/post processing

Field	Description
`PodSpec` PodSpec	(Members of `PodSpec` are embedded into this type.) This spec is dual purpose. 1) Provide a full PodSpec for a custom transformer. The field PodSpec.Containers is mutually exclusive with other transformers. 2) Provide a transformer and specify PodSpec overrides; you must not provide PodSpec.Containers in this case.
`ComponentExtensionSpec` ComponentExtensionSpec	(Members of `ComponentExtensionSpec` are embedded into this type.) Component extension defines the deployment configurations for a transformer
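Option 1) above, a fully custom transformer, amounts to setting the embedded PodSpec's `containers`. Below is a minimal sketch that adds one to an existing manifest dict. The container name and image are hypothetical placeholders; check the KServe documentation for any naming requirements.

```python
import copy


def with_custom_transformer(isvc: dict, image: str) -> dict:
    """Return a copy of an InferenceService manifest with a custom transformer container."""
    result = copy.deepcopy(isvc)  # leave the caller's manifest untouched
    result["spec"]["transformer"] = {
        "containers": [
            {"name": "transformer", "image": image}  # hypothetical container name
        ]
    }
    return result
```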

TransitionStatus (`string` alias)

(Appears on:ModelStatus)

TransitionStatus enum

Value	Description
"BlockedByFailedLoad"	Target model failed to load
"InProgress"	Waiting for target model to reach state of active model
"InvalidSpec"	Target predictor spec failed validation
"UpToDate"	Predictor is up-to-date (reflects current spec)

TritonSpec

(Appears on:PredictorSpec)

TritonSpec defines arguments for configuring Triton model serving.

Field	Description
`PredictorExtensionSpec` PredictorExtensionSpec	(Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors

WorkerSpec

(Appears on:PredictorSpec)

Field	Description
`PodSpec` PodSpec	(Members of `PodSpec` are embedded into this type.)
`size` int	(Optional) Configure the number of replicas in the worker set, each worker set represents the unit of scaling

XGBoostSpec

(Appears on:PredictorSpec)

XGBoostSpec defines arguments for configuring XGBoost model serving.

Field	Description
`PredictorExtensionSpec` PredictorExtensionSpec	(Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors

Generated with gen-crd-api-reference-docs on git commit 7e436424.
