Control Plane API

serving.kserve.io/v1alpha1

Package v1alpha1 contains the API schema definitions for the serving.kserve.io v1alpha1 API group.

Resource Kinds

Kind Definitions

ClusterServingRuntime

ClusterServingRuntime is the schema for the clusterservingruntimes API, the cluster-scoped counterpart of ServingRuntime.

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to ClusterServingRuntime.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
spec (required, ServingRuntimeSpec)
status (required, ServingRuntimeStatus)

ClusterServingRuntimeList

ClusterServingRuntimeList contains a list of ClusterServingRuntime.

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to ClusterServingRuntimeList.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required, ClusterServingRuntime array)

ClusterStorageContainer

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to ClusterStorageContainer.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
disabled (optional, boolean)

ClusterStorageContainerList

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to ClusterStorageContainerList.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required, ClusterStorageContainer array)

InferenceGraph

InferenceGraph is the schema for the InferenceGraph API, which routes requests across multiple models.

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to InferenceGraph.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
spec (required, InferenceGraphSpec)
status (required, InferenceGraphStatus)

InferenceGraphList

InferenceGraphList contains a list of InferenceGraph.

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to InferenceGraphList.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required, InferenceGraph array)

LLMInferenceService

LLMInferenceService is the schema for the llminferenceservices API and represents a single LLM deployment. It orchestrates the creation of underlying Kubernetes resources such as Deployments and Services, and configures networking for exposing the model.

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to LLMInferenceService.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.

LLMInferenceServiceConfig

LLMInferenceServiceConfig is the schema for the llminferenceserviceconfigs API. It acts as a template that provides base configurations which multiple LLMInferenceService instances can inherit.

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to LLMInferenceServiceConfig.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.

LLMInferenceServiceConfigList

LLMInferenceServiceConfigList is the list type for LLMInferenceServiceConfig.

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to LLMInferenceServiceConfigList.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required, LLMInferenceServiceConfig array)

LLMInferenceServiceList

LLMInferenceServiceList is the list type for LLMInferenceService.

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to LLMInferenceServiceList.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required, LLMInferenceService array)

LocalModelCache

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to LocalModelCache.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
status (required, LocalModelCacheStatus)

LocalModelCacheList

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to LocalModelCacheList.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required, LocalModelCache array)

LocalModelNode

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to LocalModelNode.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
spec (required, LocalModelNodeSpec)
status (required, LocalModelNodeStatus)

LocalModelNodeGroup

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to LocalModelNodeGroup.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.

LocalModelNodeGroupList

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to LocalModelNodeGroupList.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required, LocalModelNodeGroup array)

LocalModelNodeList

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to LocalModelNodeList.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required, LocalModelNode array)

ServingRuntime

ServingRuntime is the schema for the servingruntimes API.

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to ServingRuntime.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
spec (required, ServingRuntimeSpec)
status (required, ServingRuntimeStatus)

ServingRuntimeList

ServingRuntimeList contains a list of ServingRuntime.

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to ServingRuntimeList.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required, ServingRuntime array)

TrainedModel

TrainedModel is the schema for the TrainedModel API.

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to TrainedModel.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
spec (required, TrainedModelSpec)
status (required, TrainedModelStatus)

TrainedModelList

TrainedModelList contains a list of TrainedModel.

Fields
apiVersion (required, string)
Set to serving.kserve.io/v1alpha1.
kind (required, string)
Set to TrainedModelList.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required, TrainedModel array)

Supporting Types

Type Definitions

BuiltInAdapter

Fields
serverType (required, ServerType)
ServerType must be one of the supported built-in types, such as "triton" or "mlserver", and the runtime's container must have the same name.
runtimeManagementPort (required, integer)
Port on which the runtime server listens for model management requests.
memBufferBytes (required, integer)
Fixed memory overhead to subtract from the runtime container's memory allocation when determining model capacity.
modelLoadingTimeoutMillis (required, integer)
Timeout for model loading operations, in milliseconds.
env (required, EnvVar array)
Environment variables used to control other aspects of the built-in adapter's behaviour (uncommon).
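
As a minimal sketch, assuming a multi-model runtime, the manifest below places a builtInAdapter block inside a ServingRuntime spec. The runtime name, image, port, and numeric values are hypothetical placeholders; what it illustrates is the rule above that the container name must equal serverType.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-triton-mm              # hypothetical runtime name
spec:
  multiModel: true
  supportedModelFormats:
    - name: onnx
      version: "1"
  builtInAdapter:
    serverType: triton                 # must match the container name below
    runtimeManagementPort: 8001        # hypothetical management port
    memBufferBytes: 134217728          # 128 MiB fixed overhead (placeholder)
    modelLoadingTimeoutMillis: 90000   # 90 seconds (placeholder)
  containers:
    - name: triton                     # same value as builtInAdapter.serverType
      image: example.registry/tritonserver:latest   # hypothetical image
```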

GatewayRoutesSpec

GatewayRoutesSpec defines the configuration for a Gateway API route.

Fields
http (optional, HTTPRouteSpec)
HTTP route configuration.

GatewaySpec

GatewaySpec defines the configuration for a Gateway API Gateway.

Fields
refs (optional)
Refs provides references to existing, user-managed Gateway objects ("Bring Your Own" gateway). The controller uses the specified Gateway instead of creating one.

HTTPRouteSpec

HTTPRouteSpec defines the configuration for a Gateway API HTTPRoute. The spec and refs fields are mutually exclusive and determine whether the route is managed by the controller or by the user.

Fields
refs (optional)
Refs provides references to existing, user-managed HTTPRoute objects ("Bring Your Own" route). The controller validates that these routes exist but does not modify them.
spec (optional)
Spec provides a custom specification for an HTTPRoute. If provided, the controller creates and manages an HTTPRoute with this specification.

InfereceGraphRouterTimeouts

Fields
serverRead (optional, integer)
ServerRead specifies the number of seconds to wait before timing out a request read by the server.
serverWrite (optional, integer)
ServerWrite specifies the maximum duration, in seconds, before timing out writes of the response.
serverIdle (optional, integer)
ServerIdle specifies the maximum amount of time, in seconds, to wait for the next request when keep-alives are enabled.
serviceClient (optional, integer)
ServiceClient specifies a time limit, in seconds, for requests made to the graph components by the HTTP client.

InferenceGraphSpec

InferenceGraphSpec defines the InferenceGraph spec.

Fields
nodes (required, object (keys: string, values: InferenceRouter))
Map of InferenceGraph router nodes. Each node defines a router, and different nodes can use different routing types.
resources (optional)
affinity (optional)
timeout (optional, integer)
Number of seconds to wait before timing out a request to the component.
routerTimeouts (optional, InfereceGraphRouterTimeouts)
minReplicas (optional, integer)
Minimum number of replicas. Defaults to 1, but can be set to 0 to enable scale-to-zero.
maxReplicas (optional, integer)
Maximum number of replicas for autoscaling.
scaleTarget (optional, integer)
ScaleTarget specifies the integer target value of the metric type the autoscaler watches for. The concurrency and rps targets are supported by the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
scaleMetric (optional, ScaleMetric)
ScaleMetric defines the scaling metric type watched by the autoscaler. Possible values are concurrency, rps, cpu, and memory; concurrency and rps are supported via the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
tolerations (optional)
Tolerations for the InferenceGraph.
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
nodeSelector (optional, object (keys: string, values: string))
NodeSelector specifies the node selector for the InferenceGraph.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
nodeName (optional, string)
NodeName specifies the node name for the InferenceGraph.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
serviceAccountName (optional, string)
ServiceAccountName specifies the service account name for the InferenceGraph.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
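
The manifest below sketches how the scaling and timeout fields above combine on a simple Splitter graph. All numeric values are arbitrary placeholders, and scaleMetric: concurrency assumes the Knative Pod Autoscaler is handling scaling, as the field descriptions note.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: canary-route
spec:
  minReplicas: 0              # 0 enables scale-to-zero
  maxReplicas: 3
  scaleMetric: concurrency    # concurrency and rps are handled by the Knative Pod Autoscaler
  scaleTarget: 10             # placeholder target for the concurrency metric
  timeout: 60                 # seconds to wait for a request to a component
  routerTimeouts:             # all values in seconds (placeholders)
    serverRead: 60
    serverWrite: 60
    serverIdle: 180
    serviceClient: 60
  nodes:
    root:
      routerType: Splitter
      steps:
        - serviceName: mymodel1
          weight: 20
        - serviceName: mymodel2
          weight: 80
```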

InferenceGraphStatus

InferenceGraphStatus defines the InferenceGraph conditions and status.

Fields
observedGeneration (optional, integer)
ObservedGeneration is the 'Generation' of the Service that was last processed by the controller.
conditions (optional)
Conditions are the latest available observations of the resource's current state.
annotations (required, object (keys: string, values: string))
Annotations holds additional status fields for the resource, used to save additional state and to convey more information to the user. This is roughly akin to annotations on any Kubernetes resource, with the reconciler conveying richer information outwards.
url (optional)
URL of the InferenceGraph.
deploymentMode (required, string)
Deployment mode of the InferenceGraph.

InferencePoolSpec

InferencePoolSpec defines the configuration for an InferencePool. The spec and ref fields are mutually exclusive.

Fields
spec (optional)
Spec defines an inline InferencePool specification.
ref (optional)
Ref is a reference to an existing InferencePool.

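InferenceRouter

InferenceRouter defines the router for each InferenceGraph node, with one or multiple steps.

A Splitter node that sends 20% of the traffic to mymodel1 and 80% to mymodel2:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: canary-route
spec:
  nodes:
    root:
      routerType: Splitter
      steps:
        - serviceName: mymodel1
          weight: 20
        - serviceName: mymodel2
          weight: 80
```

A Switch node that routes each request according to the userId field of its input:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: abtest
spec:
  nodes:
    root:
      routerType: Switch
      steps:
        - serviceName: mymodel1
          condition: "{ .input.userId == 1 }"
        - serviceName: mymodel2
          condition: "{ .input.userId == 2 }"
```
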
Scoring a case using a model ensemble consists of scoring it using each model separately, then combining the results into a single scoring result using one of the pre-defined combination methods.

Tree ensembles are a case where simple algorithms for combining the results of either classification or regression trees are well known. Multiple classification trees, for example, are commonly combined using a "majority-vote" method. Multiple regression trees are often combined using various averaging techniques, for example by tagging models with segment identifiers and weights that are then used to combine them.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: ensemble
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - serviceName: feast
        - nodeName: ensembleModel
          data: $response
    ensembleModel:
      routerType: Ensemble
      steps:
        - serviceName: sklearn-model
        - serviceName: xgboost-model
```

Scoring a case using a sequence, or chain, of models passes the output of one model as the input to the next model.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: model-chainer
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - serviceName: mymodel-s1
        - serviceName: mymodel-s2
          data: $response
        - serviceName: mymodel-s3
          data: $response
```

In the flow described below, the pre_processing node base64-encodes the image and passes it to two model nodes in the flow. The encoded data is available to both of these nodes for classification. The second node, dog-breed-classification, takes the original input from the pre_processing node along with the response from the cat-dog-classification node to further classify the dog breed if required.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: dog-breed-classification
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - serviceName: cat-dog-classifier
        - nodeName: breed-classifier
          data: $request
    breed-classifier:
      routerType: Switch
      steps:
        - serviceName: dog-breed-classifier
          condition: '{ .predictions.class == "dog" }'
        - serviceName: cat-breed-classifier
          condition: '{ .predictions.class == "cat" }'
```

Fields
routerType (required, InferenceRouterType)
The router type is one of:
- Sequence: chains multiple inference steps, passing the input/output of the previous step
- Splitter: randomly routes to the target service according to the weight
- Ensemble: routes the request to multiple models and then merges the responses
- Switch: routes the request to one of the steps based on a condition
steps (optional, InferenceStep array)
Steps defines the destinations for the current router node.

InferenceRouterType

Underlying type: string

InferenceRouterType is a constant for inference routing types.

Possible Values
Sequence
Default type; routes to only one destination at a time.
Splitter
Randomly routes requests to the named service according to the weight.
Ensemble
Routes requests to multiple models and then merges the responses.
Switch
Routes the request to a model based on a condition.

InferenceStep

InferenceStep defines the inference target of the current step, with its condition, weight, and data.

Fields
name (optional, string)
Unique name for the step within this node.
nodeName (optional, string)
The node name to route to as the next step.
serviceName (required, string)
Named reference to an InferenceService.
serviceUrl (optional, string)
InferenceService URL; mutually exclusive with serviceName.
data (optional, string)
Request data sent to the next route, taken from the input or output of the previous step, for example $request or $response.predictions.
weight (optional, integer)
Weight used to split traffic; only used by the Splitter router. When weights are specified, the weights of all routing targets must sum to 100.
condition (optional, string)
Condition used for routing.
dependency (optional, InferenceStepDependencyType)
Decides whether the step is a hard or a soft dependency in the inference graph.
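
The sketch below annotates the model-chainer Sequence with the optional name and dependency fields. The step names are hypothetical, and which steps are marked Hard or Soft is illustrative only; it is not a recommendation for any particular pipeline.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: model-chainer
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: preprocess              # hypothetical step name, unique within root
          serviceName: mymodel-s1
          dependency: Hard
        - name: enrich                  # hypothetical step name
          serviceName: mymodel-s2
          data: $response               # forward the previous step's full response
          dependency: Soft
        - name: score                   # hypothetical step name
          serviceName: mymodel-s3
          data: $response.predictions   # forward only the predictions field
          dependency: Hard
```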

InferenceStepDependencyType

Underlying type: string

InferenceStepDependencyType is a constant for inference step dependency.

Possible Values
Soft
Hard

InferenceTarget

Exactly one InferenceTarget field must be specified.

Fields
nodeName (optional, string)
The node name to route to as the next step.
serviceName (required, string)
Named reference to an InferenceService.
serviceUrl (optional, string)
InferenceService URL; mutually exclusive with serviceName.

IngressSpec

IngressSpec defines the configuration for a Kubernetes Ingress.

Fields
refs (optional)
Refs provides a reference to an existing, user-managed Ingress object ("Bring Your Own" ingress). The controller will not create an Ingress but will use the referenced one to populate status URLs.

LLMInferenceServiceSpec

LLMInferenceServiceSpec defines the desired state of LLMInferenceService.

Fields
model (required, LLMModelSpec)
Model specification, including its URI, optional LoRA adapters, and storage details.
replicas (optional, integer)
Number of replicas for the deployment.
parallelism (optional, ParallelismSpec)
Parallelism configuration for the runtime, such as tensor and pipeline parallelism. These values are used to configure the underlying inference runtime (e.g., vLLM).
template (optional)
Template for the main pod spec. In a multi-node deployment, this configures the "head" or "master" pod. In a disaggregated deployment, this configures the "decode" pod if it is the top-level template, or the "prefill" pod if it is within the prefill block.
worker (optional)
Worker configuration for multi-node deployments. The presence of this field triggers the creation of a multi-node (distributed) setup. This spec defines the configuration for the worker pods, while the main template field defines the head pod. The controller is responsible for enabling discovery between head and worker pods.
router (optional, RouterSpec)
Router configuration for how the service is exposed. This section dictates the creation and management of networking resources such as Ingress or Gateway API objects (HTTPRoute, Gateway).
prefill (optional)
Prefill configuration for disaggregated serving. When this section is included, the controller creates a separate deployment for prompt processing (prefill) in addition to the main "decode" deployment, inspired by the llm-d architecture. This allows independent scaling and hardware allocation for the prefill and decode steps.
baseRefs (optional)
BaseRefs allows inheriting and overriding configurations from one or more LLMInferenceServiceConfig instances. The controller merges these base configurations, with the current LLMInferenceService spec taking the highest precedence. When multiple baseRefs are provided, each entry overrides the ones listed before it.
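
A minimal sketch of an LLMInferenceService that combines the fields above. It assumes an LLMInferenceServiceConfig named shared-llm-defaults already exists (a hypothetical name) and that baseRefs entries are object references with a name field; the replica count and tensor size are placeholders. Because the service's own spec has the highest precedence, parallelism.tensor here would override any value inherited from the base config.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: llama-4-scout                       # hypothetical name
spec:
  baseRefs:
    - name: shared-llm-defaults             # hypothetical LLMInferenceServiceConfig
  model:
    uri: hf://meta-llama/Llama-4-Scout-17B-16E-Instruct
    name: meta-llama/Llama-4-Scout-17B-16E-Instruct   # value clients send as "model"
    storage: {}                             # no path or credential overrides
  replicas: 2                               # placeholder
  parallelism:
    tensor: 4                               # placeholder; overrides inherited values
  router:
    route: {}                               # controller-managed HTTPRoute
    gateway: {}                             # default Gateway
```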

LLMInferenceServiceStatus

LLMInferenceServiceStatus defines the observed state of LLMInferenceService.

Fields
url (optional)
URL of the publicly exposed service.
observedGeneration (optional, integer)
ObservedGeneration is the 'Generation' of the Service that was last processed by the controller.
conditions (optional)
Conditions are the latest available observations of the resource's current state.
annotations (required, object (keys: string, values: string))
Annotations holds additional status fields for the resource, used to save additional state and to convey more information to the user. This is roughly akin to annotations on any Kubernetes resource, with the reconciler conveying richer information outwards.
address (optional)
Address is a single Addressable address. If addresses is present, address will be ignored by clients.
addresses (optional)
Addresses is a list of addresses for different protocols (HTTP and HTTPS). If addresses is present, address must be ignored by clients.

LLMModelSpec

LLMModelSpec defines the model source and its characteristics.

Fields
uri (required)
URI of the model, specifying its location, e.g., hf://meta-llama/Llama-4-Scout-17B-16E-Instruct. The storage-initializer init container uses this URI to download the model.
name (optional, string)
Name of the model as it will be set in the "model" parameter of an incoming request. If omitted, it defaults to metadata.name. For LoRA adapters, this field is required.
criticality (optional)
Criticality defines how important it is to serve the model compared to other models. This is used by the Inference Gateway scheduler.
lora (optional, LoRASpec)
LoRA (Low-Rank Adaptation) adapter configuration. Allows one or more LoRA adapters to be applied to the base model.
storage (required, LLMStorageSpec)
Storage specification for the model, such as path and credentials. This is used by the storage-initializer to download the model from the specified URI.

LLMStorageSpec

LLMStorageSpec is a copy of v1beta1.StorageSpec. It is duplicated here to avoid import cycles between the v1alpha1 and v1beta1 API packages.

Fields
path (optional, string)
The path to the model object in the storage. It cannot co-exist with the storageURI.
parameters (optional, map[string]string)
Parameters to override the default storage credentials and config.
key (optional, string)
The storage key in the secret for this model.

LoRASpec

LoRASpec defines the configuration for LoRA adapters.

Fields
adapters (optional, ModelSpec array)
Adapters is the static specification for one or more LoRA adapters. Each adapter is defined by its own ModelSpec.

LocalModelCacheSpec

Fields
sourceModelUri (required, string)
Original storage URI of the model.
modelSize (required)
Model size, used to make sure the model does not exceed the disk space reserved for local models. The limit is defined on the node group.
nodeGroups (required, string array)
Groups of nodes to cache the model on. Only one node group is currently supported.
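
A minimal sketch of a LocalModelCache, assuming a LocalModelNodeGroup named gpu-nodes exists. The cache name, model URI, and size are hypothetical; the single-entry nodeGroups list reflects the one-group limitation noted above.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelCache
metadata:
  name: example-model-cache                        # hypothetical name
spec:
  sourceModelUri: hf://example-org/example-model   # hypothetical model URI
  modelSize: 10Gi                                  # must fit within the node group's storageLimit
  nodeGroups:
    - gpu-nodes                                    # hypothetical LocalModelNodeGroup
```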

LocalModelCacheStatus

Fields
nodeStatus (required, object (keys: string, values: NodeStatus))
Status of the model on each node, such as NodeDownloaded or NodeNotReady.
copies (optional, ModelCopies)
How many nodes have the model available locally.
inferenceServices (required)
InferenceServices using this local model.

LocalModelInfo

Fields
sourceModelUri (required, string)
Original storage URI of the model.
modelName (required, string)
Model name. Used as the name of the subdirectory that stores this model on the local file system.

LocalModelNodeGroupSpec

LocalModelNodeGroupSpec defines a group of nodes to download the model to.

Fields
storageLimit (required)
Maximum storage size per node in this node group.
persistentVolumeSpec (required)
Used to create PersistentVolumes for downloading models and in InferenceService namespaces.
persistentVolumeClaimSpec (required)
Used to create PersistentVolumeClaims for downloading models and in InferenceService namespaces.
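
The manifest below is a sketch of a node group backed by local disks. The group name, StorageClass, host path, node label, and sizes are all hypothetical; the nodeAffinity block is included because Kubernetes requires it on local PersistentVolumes.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelNodeGroup
metadata:
  name: gpu-nodes                          # hypothetical node group name
spec:
  storageLimit: 1700G                      # max model storage per node (placeholder)
  persistentVolumeSpec:
    accessModes:
      - ReadWriteOnce
    volumeMode: Filesystem
    capacity:
      storage: 1700G
    storageClassName: local-storage        # hypothetical StorageClass
    local:
      path: /models                        # hypothetical host path for cached models
    nodeAffinity:                          # required for local PersistentVolumes
      required:
        nodeSelectorTerms:
          - matchExpressions:
              - key: example.com/model-cache   # hypothetical node label
                operator: Exists
  persistentVolumeClaimSpec:
    accessModes:
      - ReadWriteOnce
    volumeMode: Filesystem
    storageClassName: local-storage
    resources:
      requests:
        storage: 1700G
```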

LocalModelNodeGroupStatus

Fields
used (required)
Used storage space on any node in this node group.
available (required)
Available storage space on any node in this node group.

LocalModelNodeSpec

Fields
localModels (required, LocalModelInfo array)
List of model source URIs and their names.

LocalModelNodeStatus

Fields
modelStatus (required, object (keys: string, values: ModelStatus))
Status of each local model.

ModelCopies

Fields
available (required, integer)
total (required, integer)
Total number of nodes on which the model is expected to be downloaded, including nodes that are not ready.
failed (required, integer)
Number of nodes on which the download failed.

ModelSpec

ModelSpec describes a TrainedModel.

Fields
storageUri (required, string)
Storage URI for the model repository.
framework (required, string)
Machine learning framework of the model. The values could be "tensorflow", "pytorch", "sklearn", "onnx", "xgboost", "myawesomeinternalframework", etc.
memory (required)
Maximum memory this model will consume. This field is used to decide whether a model server has enough memory to load this model.

ModelStatus

Underlying type: string

ModelStatus enum

Possible Values
ModelDownloadPending
ModelDownloading
ModelDownloaded
ModelDownloadError

NamespacedName

Fields
namespace (required, string)
name (required, string)

NodeStatus

Underlying type: string

NodeStatus enum

Possible Values
NodeNotReady
NodeDownloadPending
NodeDownloading
NodeDownloaded
NodeDownloadError

ParallelismSpec

ParallelismSpec defines the parallelism parameters for distributed inference.

Fields
tensor (optional, integer)
Tensor parallelism size.
pipeline (optional, integer)
Pipeline parallelism size.

RouterSpec

RouterSpec defines the routing configuration for exposing the service. It supports Kubernetes Ingress and the Gateway API; ingress is mutually exclusive with the Gateway API fields (route and gateway).

Fields
route (optional, GatewayRoutesSpec)
Route configuration for the Gateway API. If an empty object {} is provided, the controller creates and manages a new HTTPRoute.
gateway (optional, GatewaySpec)
Gateway configuration for the Gateway API; mutually exclusive with ingress. If an empty object {} is provided, the controller uses a default Gateway. This must be used in conjunction with the route field for managed Gateway API resources.
ingress (optional, IngressSpec)
Ingress configuration; mutually exclusive with route and gateway. If an empty object {} is provided, the controller creates and manages a default Ingress resource.
scheduler (optional, SchedulerSpec)
Scheduler configuration for the Inference Gateway extension. If this field is non-empty, an InferenceModel resource is created to integrate with the gateway's scheduler.
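
The two fragments below sketch the alternative spec.router shapes described above; they are excerpts of an LLMInferenceService spec, not complete manifests. The first uses controller-managed Gateway API resources, the second a controller-managed Ingress.

```yaml
# Variant 1: controller-managed Gateway API resources
router:
  route: {}     # controller creates and manages an HTTPRoute
  gateway: {}   # controller uses the default Gateway
---
# Variant 2: Kubernetes Ingress, mutually exclusive with route and gateway
router:
  ingress: {}   # controller creates and manages a default Ingress
```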

ScaleMetric

Underlying type: string

ScaleMetric enum

SchedulerSpec

SchedulerSpec defines the Inference Gateway extension configuration.

The SchedulerSpec configures the connection from the Gateway to the model deployment through the LLM-optimized request scheduler, also known as the Endpoint Picker (EPP). The EPP determines the exact pod that should handle the request and responds to Envoy with the target pod; Envoy then forwards the request to the chosen pod.

The scheduler is only effective when there are multiple inference pod replicas.

Step 1: Gateway (Envoy) <-- ExtProc --> EPP (selects the optimal replica to handle the request)
Step 2: Gateway (Envoy) <-- forward request --> Inference Pod X

Fields
pool (optional, InferencePoolSpec)
Pool configuration for the InferencePool, which is part of the Inference Gateway extension.
template (optional)
Template for the Inference Gateway extension pod spec. This configures the Endpoint Picker (EPP) Deployment.

ServerType

Underlying type: string

ServerType is a constant for specifying the runtime name.

Possible Values
triton
Model server is Triton.
mlserver
Model server is MLServer.
ovms
Model server is OpenVINO Model Server.

ServingRuntimePodSpec

Fields
containers (required, Container array)
List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a pod. Cannot be updated.
volumes (optional, Volume array)
List of volumes that can be mounted by containers belonging to the pod.
More info: https://kubernetes.io/docs/concepts/storage/volumes
nodeSelector (optional, object (keys: string, values: string))
NodeSelector is a selector which must be true for the pod to fit on a node. It must match a node's labels for the pod to be scheduled on that node.
More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
affinity (optional)
If specified, the pod's scheduling constraints.
tolerations (optional)
If specified, the pod's tolerations.
labels (optional, object (keys: string, values: string))
Labels that will be added to the pod.
More info: http://kubernetes.io/docs/user-guide/labels
annotations (optional, object (keys: string, values: string))
Annotations that will be added to the pod.
More info: http://kubernetes.io/docs/user-guide/annotations
imagePullSecrets (optional)
ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of Docker, only DockerConfig type secrets are honored.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostIPC (optional, boolean)
Use the host's IPC namespace. Optional: defaults to false.

ServingRuntimeSpec

ServingRuntimeSpec defines the desired state of ServingRuntime. This spec is currently provisional and is subject to change as details regarding single-model serving and multi-model serving are worked out.

Fields
supportedModelFormats (required, SupportedModelFormat array)
Model formats and versions supported by this runtime.
multiModel (optional, boolean)
Whether this ServingRuntime is intended for multi-model usage.
disabled (optional, boolean)
Set to true to disable use of this runtime.
protocolVersions (optional, InferenceServiceProtocol array)
Supported protocol versions (v1, v2, grpc-v1, or grpc-v2).
workerSpec (optional)
Set workerSpec to enable multi-node/multi-GPU serving.
containers (required, Container array)
List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a pod. Cannot be updated.
volumes (optional, Volume array)
List of volumes that can be mounted by containers belonging to the pod.
More info: https://kubernetes.io/docs/concepts/storage/volumes
nodeSelector (optional, object (keys: string, values: string))
NodeSelector is a selector which must be true for the pod to fit on a node. It must match a node's labels for the pod to be scheduled on that node.
More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
affinity (optional)
If specified, the pod's scheduling constraints.
tolerations (optional)
If specified, the pod's tolerations.
labels (optional, object (keys: string, values: string))
Labels that will be added to the pod.
More info: http://kubernetes.io/docs/user-guide/labels
annotations (optional, object (keys: string, values: string))
Annotations that will be added to the pod.
More info: http://kubernetes.io/docs/user-guide/annotations
imagePullSecrets (optional)
ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of Docker, only DockerConfig type secrets are honored.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostIPC (optional, boolean)
Use the host's IPC namespace. Optional: defaults to false.
grpcEndpoint (optional, string)
gRPC endpoint for internal model management (implementing the mmesh.ModelRuntime gRPC service). The runtime is assumed to be single-model if this is omitted.
grpcDataEndpoint (optional, string)
gRPC endpoint for inferencing.
httpDataEndpoint (optional, string)
HTTP endpoint for inferencing.
replicas (optional, integer)
Configures the number of replicas in the Deployment generated by this ServingRuntime. If specified, this overrides the podsPerRuntime configuration value.
storageHelper (optional, StorageHelper)
Configuration for this runtime's use of the storage helper (model puller). It is enabled unless explicitly disabled.
builtInAdapter (optional, BuiltInAdapter)
Provides details about the built-in runtime adapter.
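
A minimal sketch of a single-model ClusterServingRuntime built from the fields above. The runtime name, image, and resource figures are hypothetical placeholders, and the container name kserve-container is a common convention assumed here rather than something this reference requires. Being cluster-scoped, the resource has no namespace.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: example-sklearn-runtime       # hypothetical name
spec:
  supportedModelFormats:
    - name: sklearn
      version: "1"
      autoSelect: true                # eligible for automatic runtime selection
  protocolVersions:
    - v1
    - v2
  multiModel: false
  containers:
    - name: kserve-container          # conventional name, assumed
      image: example.registry/sklearn-server:latest   # hypothetical image
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          cpu: "1"
          memory: 2Gi
```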

ServingRuntimeStatus

ServingRuntimeStatus defines the observed state of ServingRuntime.

StorageContainerSpec

StorageContainerSpec defines the container spec for the storage initializer init container, and the protocols it supports.

Fields
container (required)
Container spec for the storage initializer init container.
supportedUriFormats (required, SupportedUriFormat array)
List of URI formats that this container supports.
workloadType (required)
initContainer
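
The manifest below sketches a ClusterStorageContainer whose spec holds the StorageContainerSpec fields above. The resource name, image, and custom:// URI scheme are hypothetical; the idea is that model URIs matching the prefix would be downloaded by this container instead of the default storage initializer.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterStorageContainer
metadata:
  name: example-storage                 # hypothetical name
spec:
  container:
    name: storage-initializer
    image: example.registry/custom-storage-initializer:latest   # hypothetical image
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: "1"
        memory: 1Gi
  supportedUriFormats:
    - prefix: custom://                 # hypothetical URI scheme
  workloadType: initContainer
```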

StorageHelper

Fields
disabled (optional, boolean)

SupportedModelFormat

Appears in:

Fields
namerequired
string
Name of the model format.
versionoptional
string
Version of the model format.
Used in validating that a predictor is supported by a runtime.
Can be "major", "major.minor" or "major.minor.patch".
autoSelectoptional
boolean
Set to true to allow the ServingRuntime to be used for automatic model placement if
this model format is specified with no explicit runtime.
priorityoptional
integer
Priority of this serving runtime for auto selection.
This is used to select the serving runtime if more than one serving runtime supports the same model format.
The value should be greater than zero. The higher the value, the higher the priority.
Priority is not considered if AutoSelect is either false or not specified.
Priority can be overridden by specifying the runtime in the InferenceService.
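The autoSelect and priority rules above can be sketched as a selection function. This is a hypothetical Python model, not the KServe controller: it only uses format names, and it does not model version matching. It treats an unset priority as 0 and breaks ties by list order. Both of those are assumptions made for the sketch. The runtime names are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SupportedModelFormat:
    name: str
    auto_select: bool = False       # autoSelect
    priority: Optional[int] = None  # priority; should be > 0, higher wins


@dataclass
class Runtime:
    name: str
    formats: List[SupportedModelFormat] = field(default_factory=list)


def select_runtime(runtimes: List[Runtime], format_name: str,
                   explicit_runtime: Optional[str] = None) -> Optional[Runtime]:
    # A runtime named explicitly in the InferenceService overrides priority.
    if explicit_runtime is not None:
        return next((rt for rt in runtimes if rt.name == explicit_runtime), None)
    candidates = []
    for rt in runtimes:
        for fmt in rt.formats:
            # Only formats with autoSelect=true take part in automatic selection;
            # an unset priority is treated as 0 here (an assumption).
            if fmt.name == format_name and fmt.auto_select:
                candidates.append((fmt.priority or 0, rt))
    if not candidates:
        return None
    # max() returns the first of several equal maxima, so ties keep list order.
    return max(candidates, key=lambda c: c[0])[1]


runtimes = [
    Runtime("runtime-a", [SupportedModelFormat("sklearn", auto_select=True, priority=1)]),
    Runtime("runtime-b", [SupportedModelFormat("sklearn", auto_select=True, priority=2)]),
    Runtime("runtime-c", [SupportedModelFormat("sklearn")]),  # autoSelect not set
]
print(select_runtime(runtimes, "sklearn").name)                                # runtime-b
print(select_runtime(runtimes, "sklearn", explicit_runtime="runtime-c").name)  # runtime-c
```

runtime-c is never chosen automatically because its format does not set autoSelect. It is still used when the InferenceService names it explicitly.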

SupportedUriFormat

Appears in:

SupportedUriFormat can be either a prefix or a regex. Only one of them should be set; this is not yet enforced by validation.

Fields
prefixrequired
string
regexrequired
string
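The following Python sketch shows how a storage URI could be checked against a list of supportedUriFormats entries. The class and function are hypothetical. Prefix matching uses a plain starts-with test. Regex matching uses an unanchored search, similar to Go's `regexp.MatchString`. Whether the KServe controller matches the same way is an assumption. The `hf://` prefix and `example.com` pattern are placeholders.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SupportedUriFormat:
    prefix: Optional[str] = None
    regex: Optional[str] = None

    def is_valid(self) -> bool:
        # Exactly one of prefix or regex should be set.
        return (self.prefix is None) != (self.regex is None)


def supports_uri(formats: List[SupportedUriFormat], storage_uri: str) -> bool:
    for fmt in formats:
        if fmt.prefix is not None and storage_uri.startswith(fmt.prefix):
            return True
        # re.search is unanchored; add ^ or $ to the pattern to anchor it.
        if fmt.regex is not None and re.search(fmt.regex, storage_uri):
            return True
    return False


formats = [SupportedUriFormat(prefix="hf://"),
           SupportedUriFormat(regex=r"^https://[^/]+\.example\.com/")]
print(supports_uri(formats, "hf://org/model"))                   # True
print(supports_uri(formats, "https://models.example.com/m.pt"))  # True
print(supports_uri(formats, "gs://bucket/model"))                # False
```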

TrainedModelSpec

Appears in:

TrainedModelSpec defines the TrainedModel spec

Fields
inferenceServicerequired
string
Parent InferenceService to deploy to.
modelrequired
Predictor model spec

TrainedModelStatus

Appears in:

TrainedModelStatus defines the observed state of TrainedModel

Fields
observedGenerationoptional
integer
ObservedGeneration is the 'Generation' of the Service that
was last processed by the controller.
conditionsoptional
Conditions are the latest available observations of the resource's current state.
annotationsrequired
object (keys:string, values:string)
Annotations holds additional status fields for the resource, used to save extra
state and to convey more information to the user. This is roughly akin to
annotations on any Kubernetes resource, with the reconciler conveying
richer information outwards.
urlrequired
URL holds the URL that will distribute traffic over the provided traffic targets.
For v1: http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}/v1/models/:predict
For v2: http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}/v2/models//infer
addressrequired
Addressable endpoint for the deployed trained model
http:///v1/models/.metadata.name

UntypedObjectReference

Appears in:

UntypedObjectReference is a reference to an object without a specific Group/Version/Kind. It's used for referencing networking resources like Gateways and Ingresses where the exact type might be inferred or is not strictly required by this controller.

Fields
namerequired
Name of the referenced object.
namespacerequired
Namespace of the referenced object.

WorkerSpec

Appears in:

WorkerSpec is the schema for the multi-node/multi-GPU feature.

Fields
containersrequired
Container array
List of containers belonging to the pod.
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
volumesoptional
Volume array
List of volumes that can be mounted by containers belonging to the pod.
More info: https://kubernetes.io/docs/concepts/storage/volumes
nodeSelectoroptional
object (keys:string, values:string)
NodeSelector is a selector which must be true for the pod to fit on a node.
Selector which must match a node's labels for the pod to be scheduled on that node.
More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
affinityoptional
If specified, the pod's scheduling constraints
tolerationsoptional
If specified, the pod's tolerations.
labelsoptional
object (keys:string, values:string)
Labels that will be added to the pod.
More info: http://kubernetes.io/docs/user-guide/labels
annotationsoptional
object (keys:string, values:string)
Annotations that will be added to the pod.
More info: http://kubernetes.io/docs/user-guide/annotations
imagePullSecretsoptional
ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec.
If specified, these secrets will be passed to individual puller implementations for them to use. For example,
in the case of docker, only DockerConfig type secrets are honored.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostIPCoptional
boolean
Use the host's IPC namespace.
Optional: Defaults to false.
pipelineParallelSizeoptional
integer
PipelineParallelSize defines the number of parallel workers.
It specifies the number of model partitions across multiple devices, allowing large models to be split and processed concurrently across these partitions.
It also represents the number of replicas in the worker set, where each worker set serves as a scaling unit.
tensorParallelSizeoptional
integer
TensorParallelSize specifies the number of GPUs to be used per node.
It indicates the degree of parallelism for tensor computations across the available GPUs.
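Here is a rough GPU budget for the two parallelism fields. It is a hypothetical Python sketch that assumes each of the pipelineParallelSize partitions runs on tensorParallelSize GPUs. That gives the tensor-parallel × pipeline-parallel world size used by runtimes such as vLLM. Check the actual placement against your KServe version before relying on these numbers.

```python
def worker_set_gpus(pipeline_parallel_size: int, tensor_parallel_size: int) -> int:
    """GPUs needed by one worker set (the scaling unit).

    Assumes pipeline_parallel_size partitions, each spread over
    tensor_parallel_size GPUs, i.e. a TP x PP world size as in vLLM.
    """
    if pipeline_parallel_size < 1 or tensor_parallel_size < 1:
        raise ValueError("parallel sizes must be positive integers")
    return pipeline_parallel_size * tensor_parallel_size


def total_gpus(worker_sets: int, pipeline_parallel_size: int, tensor_parallel_size: int) -> int:
    # Scaling adds whole worker sets, so GPU demand grows in multiples of one set.
    return worker_sets * worker_set_gpus(pipeline_parallel_size, tensor_parallel_size)


print(worker_set_gpus(2, 4))  # 8: two partitions on four GPUs each
print(total_gpus(3, 2, 4))    # 24
```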

WorkloadSpec

Appears in:

WorkloadSpec defines the configuration for a deployment workload, such as replicas and pod specifications.

Fields
replicasoptional
integer
Number of replicas for the deployment.
parallelismoptional
Parallelism configurations for the runtime, such as tensor and pipeline parallelism.
These values are used to configure the underlying inference runtime (e.g., vLLM).
templateoptional
Template for the main pod spec.
In a multi-node deployment, this configures the "head" or "master" pod.
In a disaggregated deployment, this configures the "decode" pod if it's the top-level template,
or the "prefill" pod if it's within the Prefill block.
workeroptional
Worker configuration for multi-node deployments.
The presence of this field triggers the creation of a multi-node (distributed) setup.
This spec defines the configuration for the worker pods, while the main 'Template' field defines the head pod.
The controller is responsible for enabling discovery between head and worker pods.

WorkloadType

Underlying type: string

Appears in:

Possible Values
initContainer
localModelDownloadJob

serving.kserve.io/v1beta1

Package v1beta1 contains API Schema definitions for the serving v1beta1 API group

Resource Kinds

Available Kinds

Kind Definitions

InferenceService

Appears in:

InferenceService is the Schema for the InferenceServices API

Fields
apiVersionrequired
String
We are on version serving.kserve.io/v1beta1 of the API.
kindrequired
String
This is an InferenceService resource
metadatarequired
Refer to Kubernetes API documentation for fields of "metadata".

InferenceServiceList

InferenceServiceList contains a list of InferenceService

Fields
apiVersionrequired
String
We are on version serving.kserve.io/v1beta1 of the API.
kindrequired
String
This is an InferenceServiceList resource
metadatarequired
Refer to Kubernetes API documentation for fields of "metadata".
itemsrequired

Supporting Types

Available Types

Type Definitions

ARTExplainerSpec

Appears in:

ARTExplainerSpec defines the arguments for configuring an ART explanation server.

Fields
typerequired
The type of ART explainer
storageUrirequired
string
The location of a trained explanation model
runtimeVersionrequired
string
Defaults to the latest explainer version.
configrequired
object (keys:string, values:string)
Inline custom parameter settings for explainer
storageoptional
Storage Spec for model location

ARTExplainerType

Underlying type: string

Appears in:

Possible Values
SquareAttack

AuthenticationRef

Appears in:

Fields
namerequired
string
name is the name of the authentication secret

AutoScalingSpec

Appears in:

Fields
metricsrequired
metrics is a list of metric specs to be used for autoscaling.

Batcher

Appears in:

Batcher specifies optional payload batching available for all components

Fields
maxBatchSizeoptional
integer
Specifies the max number of requests to trigger a batch
maxLatencyoptional
integer
Specifies the max latency to trigger a batch
timeoutoptional
integer
Specifies the timeout of a batch
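This offline Python simulation shows how maxBatchSize and maxLatency interact. It is a hypothetical sketch and not the KServe batcher. A batch closes when it reaches maxBatchSize requests, or when a new request arrives more than maxLatency after the batch's first request. Timestamps are treated as milliseconds for illustration only; confirm the units your KServe version uses. A real batcher flushes on a timer rather than waiting for the next arrival, and `timeout` is not modeled.

```python
from typing import List


def form_batches(arrivals_ms: List[float], max_batch_size: int,
                 max_latency_ms: float) -> List[List[int]]:
    """Group request indices (in arrival order) into batches.

    A batch is closed as soon as it holds max_batch_size requests, or when a
    request arrives more than max_latency_ms after the batch's first request.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    first_arrival = 0.0
    for i, t in enumerate(arrivals_ms):
        # Latency trigger: the open batch has waited too long, so flush it.
        if current and t - first_arrival > max_latency_ms:
            batches.append(current)
            current = []
        if not current:
            first_arrival = t
        current.append(i)
        # Size trigger: the batch is full.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # a real batcher would flush this on its timer
    return batches


print(form_batches([0, 1, 2, 3, 50, 51], max_batch_size=3, max_latency_ms=10))
# [[0, 1, 2], [3], [4, 5]]
```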

ComponentExtensionSpec

Appears in:

ComponentExtensionSpec defines the deployment configuration for a given InferenceService component

Fields
minReplicasoptional
integer
Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.
maxReplicasoptional
integer
Maximum number of replicas for autoscaling.
scaleTargetoptional
integer
ScaleTarget specifies the integer target value of the metric type the autoscaler watches for.
Concurrency and rps targets are supported by the Knative Pod Autoscaler
(https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
scaleMetricoptional
ScaleMetric defines the scaling metric type watched by the autoscaler.
Possible values are concurrency, rps, cpu, and memory. Concurrency and rps are supported via the
Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
scaleMetricTypeoptional
Type of metric to use. Options are Utilization or AverageValue.
autoScalingoptional
AutoScaling spec, backed by HPA or KEDA.
containerConcurrencyoptional
integer
ContainerConcurrency specifies how many requests can be processed concurrently. This sets the hard limit of the container
concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).
timeoutoptional
integer
TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.
canaryTrafficPercentoptional
integer
CanaryTrafficPercent defines the traffic split percentage between the candidate revision and the last ready revision
loggeroptional
Activate request/response logging and logger configurations
batcheroptional
Activate request batching and batching configurations
labelsoptional
object (keys:string, values:string)
Labels that will be added to the component pod.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
annotationsoptional
object (keys:string, values:string)
Annotations that will be added to the component pod.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
deploymentStrategyoptional
The deployment strategy to use to replace existing pods with new ones. Only applicable for raw deployment mode.
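canaryTrafficPercent splits traffic between the candidate revision and the last ready revision. The Python sketch below is hypothetical. It assumes that with no canary percentage, or with no earlier ready revision, the candidate receives all traffic. The revision names are placeholders.

```python
from typing import Dict, Optional


def traffic_split(candidate: str, last_ready: Optional[str],
                  canary_traffic_percent: Optional[int]) -> Dict[str, int]:
    """Percent of traffic per revision for one component."""
    # Assumption: no canary percentage or no previous revision means a full rollout.
    if canary_traffic_percent is None or last_ready is None or last_ready == candidate:
        return {candidate: 100}
    if not 0 <= canary_traffic_percent <= 100:
        raise ValueError("canaryTrafficPercent must be between 0 and 100")
    # The candidate gets the canary share; the last ready revision keeps the rest.
    return {candidate: canary_traffic_percent, last_ready: 100 - canary_traffic_percent}


print(traffic_split("rev-2", "rev-1", 10))    # {'rev-2': 10, 'rev-1': 90}
print(traffic_split("rev-2", "rev-1", None))  # {'rev-2': 100}
```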

ComponentStatusSpec

Appears in:

ComponentStatusSpec describes the state of the component

Fields
latestReadyRevisionoptional
string
Latest revision name that is in ready state
latestCreatedRevisionoptional
string
Latest revision name that is created
previousRolledoutRevisionoptional
string
Previous revision name that is rolled out with 100 percent traffic
latestRolledoutRevisionoptional
string
Latest revision name that is rolled out with 100 percent traffic
trafficoptional
TrafficTarget array
Traffic holds the configured traffic distribution for latest ready revision and previous rolled out revision.
urloptional
URL holds the primary URL that will distribute traffic over the provided traffic targets.
This will be one of the REST or gRPC endpoints that are available.
It generally has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}
restUrloptional
REST endpoint of the component if available.
grpcUrloptional
gRPC endpoint of the component if available.
addressoptional
Addressable endpoint for the InferenceService

ComponentType

Underlying type: string

Appears in:

ComponentType contains the different types of components of the service

Possible Values
predictor
explainer
transformer

ExplainerConfig

Appears in:

Fields
imagerequired
string
explainer docker image name
defaultImageVersionrequired
string
default explainer docker image version

ExplainerExtensionSpec

Appears in:

ExplainerExtensionSpec defines configuration shared across all explainer frameworks

Fields
storageUrirequired
string
The location of a trained explanation model
runtimeVersionrequired
string
Defaults to the latest explainer version.
configrequired
object (keys:string, values:string)
Inline custom parameter settings for explainer
storageoptional
Storage Spec for model location

ExplainerSpec

Appears in:

ExplainerSpec defines the container spec for a model explanation server. The following fields follow a "1-of" semantic: users must specify exactly one spec.
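Below is a minimal Python sketch of a "1-of" check. The function and the dict layout are hypothetical and do not reflect KServe's validation webhook. The list of implementation fields is passed in, and the usage line uses `art` because it is the explainer implementation listed in this reference.

```python
from typing import Any, Dict, Sequence


def single_implementation(spec: Dict[str, Any], implementations: Sequence[str]) -> str:
    """Return the one implementation field set in spec, or raise ValueError."""
    present = [key for key in implementations if spec.get(key) is not None]
    if len(present) != 1:
        raise ValueError(f"exactly one of {list(implementations)} must be set, found {present}")
    return present[0]


explainer = {"art": {"type": "SquareAttack"}, "minReplicas": 1}
print(single_implementation(explainer, ["art"]))  # art
```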

Fields
artrequired
Spec for ART explainer
volumesoptional
Volume array
List of volumes that can be mounted by containers belonging to the pod.
More info: https://kubernetes.io/docs/concepts/storage/volumes
initContainersrequired
Container array
List of initialization containers belonging to the pod.
Init containers are executed in order prior to containers being started. If any
init container fails, the pod is considered to have failed and is handled according
to its restartPolicy. The name for an init container or normal container must be
unique among all containers.
Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes.
The resourceRequirements of an init container are taken into account during scheduling
by finding the highest request/limit for each resource type, and then using the max of
that value or the sum of the normal containers. Limits are applied to init containers
in a similar fashion.
Init containers cannot currently be added or removed.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
containersrequired
Container array
List of containers belonging to the pod.
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
ephemeralContainersoptional
List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing
pod to perform user-initiated actions such as debugging. This list cannot be specified when
creating a pod, and it cannot be modified by updating the pod spec. In order to add an
ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
restartPolicyoptional
Restart policy for all containers within the pod.
One of Always, OnFailure, Never. In some contexts, only a subset of those values may be permitted.
Defaults to Always.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
terminationGracePeriodSecondsoptional
integer
Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request.
Value must be non-negative integer. The value zero indicates stop immediately via
the kill signal (no opportunity to shut down).
If this value is nil, the default grace period will be used instead.
The grace period is the duration in seconds after the processes running in the pod are sent
a termination signal and the time when the processes are forcibly halted with a kill signal.
Set this value longer than the expected cleanup time for your process.
Defaults to 30 seconds.
activeDeadlineSecondsoptional
integer
Optional duration in seconds the pod may be active on the node relative to
StartTime before the system will actively try to mark it failed and kill associated containers.
Value must be a positive integer.
dnsPolicyoptional
Set DNS policy for the pod.
Defaults to "ClusterFirst".
Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
To have DNS options set along with hostNetwork, you have to specify DNS policy
explicitly to 'ClusterFirstWithHostNet'.
nodeSelectoroptional
object (keys:string, values:string)
NodeSelector is a selector which must be true for the pod to fit on a node.
Selector which must match a node's labels for the pod to be scheduled on that node.
More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
serviceAccountNameoptional
string
ServiceAccountName is the name of the ServiceAccount to use to run this pod.
More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccountoptional
string
DeprecatedServiceAccount is a deprecated alias for ServiceAccountName.
Deprecated: Use serviceAccountName instead.
automountServiceAccountTokenoptional
boolean
AutomountServiceAccountToken indicates whether a service account token should be automatically mounted.
nodeNameoptional
string
NodeName indicates in which node this pod is scheduled.
If empty, this pod is a candidate for scheduling by the scheduler defined in schedulerName.
Once this field is set, the kubelet for this node becomes responsible for the lifecycle of this pod.
This field should not be used to express a desire for the pod to be scheduled on a specific node.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename
hostNetworkoptional
boolean
Host networking requested for this pod. Use the host's network namespace.
If this option is set, the ports that will be used must be specified.
Defaults to false.
hostPIDoptional
boolean
Use the host's PID namespace.
Optional: Defaults to false.
hostIPCoptional
boolean
Use the host's IPC namespace.
Optional: Defaults to false.
shareProcessNamespaceoptional
boolean
Share a single process namespace between all of the containers in a pod.
When this is set containers will be able to view and signal processes from other containers
in the same pod, and the first process in each container will not be assigned PID 1.
HostPID and ShareProcessNamespace cannot both be set.
Optional: Defaults to false.
securityContextoptional
SecurityContext holds pod-level security attributes and common container settings.
Optional: Defaults to empty. See type description for default values of each field.
imagePullSecretsoptional
ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec.
If specified, these secrets will be passed to individual puller implementations for them to use.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostnameoptional
string
Specifies the hostname of the Pod.
If not specified, the pod's hostname will be set to a system-defined value.
subdomainoptional
string
If specified, the fully qualified Pod hostname will be "...svc.".
If not specified, the pod will not have a domainname at all.
affinityoptional
If specified, the pod's scheduling constraints
schedulerNameoptional
string
If specified, the pod will be dispatched by specified scheduler.
If not specified, the pod will be dispatched by default scheduler.
tolerationsoptional
If specified, the pod's tolerations.
hostAliasesoptional
HostAlias array
HostAliases is an optional list of hosts and IPs that will be injected into the pod's hosts
file if specified.
priorityClassNameoptional
string
If specified, indicates the pod's priority. "system-node-critical" and
"system-cluster-critical" are two special keywords which indicate the
highest priorities with the former being the highest priority. Any other
name must be defined by creating a PriorityClass object with that name.
If not specified, the pod priority will be default or zero if there is no
default.
priorityoptional
integer
The priority value. Various system components use this field to find the
priority of the pod. When Priority Admission Controller is enabled, it
prevents users from setting this field. The admission controller populates
this field from PriorityClassName.
The higher the value, the higher the priority.
dnsConfigoptional
Specifies the DNS parameters of a pod.
Parameters specified here will be merged to the generated DNS
configuration based on DNSPolicy.
readinessGatesoptional
If specified, all readiness gates will be evaluated for pod readiness.
A pod is ready when all its containers are ready AND
all conditions specified in the readiness gates have status equal to "True".
More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates
runtimeClassNameoptional
string
RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used
to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run.
If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an
empty definition that uses the default runtime handler.
More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class
enableServiceLinksoptional
boolean
EnableServiceLinks indicates whether information about services should be injected into pod's
environment variables, matching the syntax of Docker links.
Optional: Defaults to true.
preemptionPolicyoptional
PreemptionPolicy is the Policy for preempting pods with lower priority.
One of Never, PreemptLowerPriority.
Defaults to PreemptLowerPriority if unset.
overheadoptional
Overhead represents the resource overhead associated with running a pod for a given RuntimeClass.
This field will be autopopulated at admission time by the RuntimeClass admission controller. If
the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests.
The RuntimeClass admission controller will reject Pod create requests which have the overhead already
set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value
defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero.
More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md
topologySpreadConstraintsoptional
TopologySpreadConstraints describes how a group of pods ought to spread across topology
domains. Scheduler will schedule pods in a way which abides by the constraints.
All topologySpreadConstraints are ANDed.
setHostnameAsFQDNoptional
boolean
If true the pod's hostname will be configured as the pod's FQDN, rather than the leaf name (the default).
In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname).
In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN.
If a pod does not have FQDN, this has no effect.
Defaults to false.
osoptional
Specifies the OS of the containers in the pod.
Some pod and container fields are restricted if this is set.

If the OS field is set to linux, the following fields must be unset:
- securityContext.windowsOptions

If the OS field is set to windows, the following fields must be unset:
- spec.hostPID
- spec.hostIPC
- spec.hostUsers
- spec.securityContext.appArmorProfile
- spec.securityContext.seLinuxOptions
- spec.securityContext.seccompProfile
- spec.securityContext.fsGroup
- spec.securityContext.fsGroupChangePolicy
- spec.securityContext.sysctls
- spec.shareProcessNamespace
- spec.securityContext.runAsUser
- spec.securityContext.runAsGroup
- spec.securityContext.supplementalGroups
- spec.securityContext.supplementalGroupsPolicy
- spec.containers[*].securityContext.appArmorProfile
- spec.containers[*].securityContext.seLinuxOptions
- spec.containers[*].securityContext.seccompProfile
- spec.containers[*].securityContext.capabilities
- spec.containers[*].securityContext.readOnlyRootFilesystem
- spec.containers[*].securityContext.privileged
- spec.containers[*].securityContext.allowPrivilegeEscalation
- spec.containers[*].securityContext.procMount
- spec.containers[*].securityContext.runAsUser
- spec.containers[*].securityContext.runAsGroup
hostUsersoptional
boolean
Use the host's user namespace.
Optional: Defaults to true.
If set to true or not present, the pod will be run in the host user namespace, useful
for when the pod needs a feature only available to the host user namespace, such as
loading a kernel module with CAP_SYS_MODULE.
When set to false, a new userns is created for the pod. Setting false is useful for
mitigating container breakout vulnerabilities, even allowing users to run their
containers as root without actually having root privileges on the host.
This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.
schedulingGatesoptional
SchedulingGates is an opaque list of values that if specified will block scheduling the pod.
If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the
scheduler will not attempt to schedule the pod.

SchedulingGates can only be set at pod creation time and can only be removed afterwards.
resourceClaimsoptional
ResourceClaims defines which ResourceClaims must be allocated
and reserved before the Pod is allowed to start. The resources
will be made available to those containers which consume them
by name.

This is an alpha field and requires enabling the
DynamicResourceAllocation feature gate.

This field is immutable.
resourcesoptional
Resources is the total amount of CPU and Memory resources required by all
containers in the pod. It supports specifying Requests and Limits for
"cpu" and "memory" resource names only. ResourceClaims are not supported.

This field enables fine-grained control over resource allocation for the
entire pod, allowing resource sharing among containers in a pod.

This is an alpha field and requires enabling the PodLevelResources feature
gate.
minReplicasoptional
integer
Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.
maxReplicasoptional
integer
Maximum number of replicas for autoscaling.
scaleTargetoptional
integer
ScaleTarget specifies the integer target value of the metric type the autoscaler watches for.
Concurrency and rps targets are supported by the Knative Pod Autoscaler
(https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
scaleMetricoptional
ScaleMetric defines the scaling metric type watched by the autoscaler.
Possible values are concurrency, rps, cpu, and memory. Concurrency and rps are supported via the
Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
scaleMetricTypeoptional
Type of metric to use. Options are Utilization or AverageValue.
autoScalingoptional
AutoScaling spec, backed by HPA or KEDA.
containerConcurrencyoptional
integer
ContainerConcurrency specifies how many requests can be processed concurrently. This sets the hard limit of the container
concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).
timeoutoptional
integer
TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.
canaryTrafficPercentoptional
integer
CanaryTrafficPercent defines the traffic split percentage between the candidate revision and the last ready revision
loggeroptional
Activate request/response logging and logger configurations
batcheroptional
Activate request batching and batching configurations
labelsoptional
object (keys:string, values:string)
Labels that will be added to the component pod.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
annotationsoptional
object (keys:string, values:string)
Annotations that will be added to the component pod.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
deploymentStrategyoptional
The deployment strategy to use to replace existing pods with new ones. Only applicable for raw deployment mode.

ExplainersConfig

Appears in:

Fields
artrequired

ExtMetricAuthentication

Appears in:

Fields
authenticationRefrequired
authenticationRef is a reference to the authentication information.
For more information, see https://keda.sh/docs/2.17/scalers/prometheus/#authentication-parameters
authModesoptional
string
authModes defines the authentication modes for the metrics backend.
Possible values are bearer, basic, and tls.
For more information, see https://keda.sh/docs/2.17/scalers/prometheus/#authentication-parameters

ExternalMetricSource

Appears in:

Fields
metricrequired
metric identifies the target metric by name and selector
authenticationRefoptional
authenticationRef is a reference to the authentication information.
For more information, see https://keda.sh/docs/2.17/scalers/prometheus/#authentication-parameters
targetrequired
target specifies the target value for the given metric

ExternalMetrics

Appears in:

Fields
backendoptional
MetricsBackend defines the metrics backend queried by the autoscaler.
Possible values are prometheus and graphite.
serverAddressoptional
string
Address of MetricsBackend server.
queryoptional
string
Query to run to get metrics from MetricsBackend
namespaceoptional
string
Namespace for a namespaced query.

FailureInfo

Appears in:

Fields
locationoptional
string
Name of component to which the failure relates (usually Pod name)
reasonoptional
High level class of failure
messageoptional
string
Detailed error message
modelRevisionNameoptional
string
Internal Revision/ID of model, tied to specific Spec contents
timeoptional
Time failure occurred or was discovered
exitCodeoptional
integer
Exit status from the last termination of the container

FailureReason

Underlying type: string

Appears in:

FailureReason enum

Possible Values
ModelLoadFailed
The model failed to load within a ServingRuntime container
RuntimeUnhealthy
Corresponding ServingRuntime containers failed to start or are unhealthy
RuntimeDisabled
The ServingRuntime is disabled
NoSupportingRuntime
There is no ServingRuntime that supports the specified model type
RuntimeNotRecognized
There is no ServingRuntime defined with the specified runtime name
InvalidPredictorSpec
The current Predictor Spec is invalid or unsupported

HuggingFaceRuntimeSpec

Appears in:

HuggingFaceRuntimeSpec defines arguments for configuring HuggingFace model serving.

Fields
storageUrioptional
string
This field points to the location of the trained model which is mounted onto the pod.
runtimeVersionoptional
string
Runtime version of the predictor docker image
protocolVersionoptional
Protocol version used by the predictor (i.e. v1, v2, grpc-v1, or grpc-v2)
storageoptional
Storage Spec for model location

InferenceServiceSpec

Appears in:

InferenceServiceSpec is the top level type for this resource

Fields
predictorrequired
Predictor defines the model serving spec
explaineroptional
Explainer defines the model explanation service spec.
The explainer service calls the predictor, or the transformer if one is specified.
transformeroptional
Transformer defines the pre/post-processing before and after the predictor call.
The transformer service calls the predictor service.
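The call relationships above can be written as a small recursive Python sketch. The function is hypothetical and models only the documented downstream calls. The explainer calls the transformer when one is specified and the predictor otherwise, and the transformer calls the predictor. Ingress routing to the first component is not modeled.

```python
from typing import List


def call_chain(component: str, has_transformer: bool) -> List[str]:
    """Components a request passes through, starting at `component`."""
    if component == "explainer":
        # The explainer calls the transformer if one is specified, otherwise the predictor.
        next_hop = "transformer" if has_transformer else "predictor"
        return ["explainer"] + call_chain(next_hop, has_transformer)
    if component == "transformer":
        # The transformer pre-processes, calls the predictor, then post-processes.
        return ["transformer", "predictor"]
    if component == "predictor":
        return ["predictor"]
    raise ValueError(f"unknown component type: {component}")


print(call_chain("explainer", has_transformer=True))
# ['explainer', 'transformer', 'predictor']
```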

InferenceServiceStatus

Appears in:

InferenceServiceStatus defines the observed state of InferenceService

Fields
observedGenerationoptional
integer
ObservedGeneration is the 'Generation' of the Service that
was last processed by the controller.
conditionsoptional
Conditions are the latest available observations of the resource's current state.
annotationsrequired
object (keys:string, values:string)
Annotations holds additional status fields for the resource, used to save extra
state and to convey more information to the user. This is roughly akin to
annotations on any Kubernetes resource, with the reconciler conveying
richer information outwards.
addressoptional
Addressable endpoint for the InferenceService
urloptional
URL holds the URL that will distribute traffic over the provided traffic targets.
It generally has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}
componentsrequired
object (keys:ComponentType, values:ComponentStatusSpec)
Statuses for the components of the InferenceService
modelStatusrequired
Model related statuses
deploymentModerequired
string
InferenceService DeploymentMode
servingRuntimeNamerequired
string
ServingRuntimeName is the name of the ServingRuntime that the InferenceService is using
clusterServingRuntimeNamerequired
string
ClusterServingRuntimeName is the name of the ClusterServingRuntime that the InferenceService is using

LightGBMSpec

Appears in:

LightGBMSpec defines arguments for configuring LightGBM model serving.

Fields
storageUrioptional
string
This field points to the location of the trained model which is mounted onto the pod.
runtimeVersionoptional
string
Runtime version of the predictor docker image
protocolVersionoptional
Protocol version used by the predictor (one of v1, v2, grpc-v1, or grpc-v2)
storageoptional
Storage Spec for model location

LoggerSpec

Appears in:

LoggerSpec specifies optional payload logging available for all components

Fields
urloptional
string
URL to send logging events
modeoptional
Specifies the scope of the loggers.
Valid values are:
- "all" (default): log both request and response;
- "request": log only request;
- "response": log only response
metadataHeadersoptional
string array
Metadata HTTP headers to match and propagate into inference logger cloud events.
metadataAnnotationsoptional
string array
InferenceService annotations to match and propagate into inference logger cloud events.
storageoptional
Specifies the storage location for the inference logger cloud events.
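The `mode` value determines which payloads get published. The sketch below builds a LoggerSpec-shaped dictionary and validates `mode` against the three documented values. The sink URL, the `x-request-id` header, and the helper names are hypothetical examples only.

```python
from typing import Any, Dict, List, Optional

# Logger modes, taken from the LoggerType values in this reference.
LOGGER_MODES = ("all", "request", "response")


def logger_spec(
    url: str, mode: str = "all", metadata_headers: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Build a LoggerSpec-shaped dict, rejecting unknown modes (hypothetical helper)."""
    if mode not in LOGGER_MODES:
        raise ValueError(f"unsupported logger mode {mode!r}, expected one of {LOGGER_MODES}")
    spec: Dict[str, Any] = {"url": url, "mode": mode}
    if metadata_headers:
        spec["metadataHeaders"] = list(metadata_headers)
    return spec


def logs_request(mode: str) -> bool:
    # "all" and "request" both publish the request payload.
    return mode in ("all", "request")


def logs_response(mode: str) -> bool:
    # "all" and "response" both publish the response payload.
    return mode in ("all", "response")
```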

LoggerStorageSpec

Appears in:

Fields
pathoptional
string
The path to the object in the storage. Note that this path is relative to the storage URI.
parametersoptional
map[string]string
Parameters to override the default storage credentials and config.
keyoptional
string
The Storage Key in the secret for this object.
serviceAccountNamerequired
string

LoggerType

Underlying type: string

Appears in:

LoggerType controls the scope of log publishing

Possible Values
all
LogAll Logger mode to log both request and response
request
LogRequest Logger mode to log only request
response
LogResponse Logger mode to log only response

MetricSourceType

Underlying type: string

Appears in:

MetricSourceType indicates the type of metric.

Possible Values
Resource
ResourceMetricSourceType is a resource metric known to Kubernetes, as
specified in requests and limits, describing each pod in the current
scale target (e.g. CPU or memory). Such metrics are built in to
Kubernetes, and have special scaling options on top of those available
to normal per-pod metrics (the "pods" source).
External
ExternalMetricSourceType is a global metric that is not associated
with any Kubernetes object. It allows autoscaling based on information
coming from components running outside the cluster
(for example, the length of a queue in a cloud messaging service, or
QPS from a load balancer running outside the cluster).
PodMetric
PodMetricSourceType indicates a metric describing each pod in the current
scale target (for example, transactions-processed-per-second). The values
will be averaged together before being compared to the target value.

MetricTarget

Appears in:

MetricTarget defines the target value, average value, or average utilization of a specific metric

Fields
typeoptional
type represents whether the metric type is Utilization, Value, or AverageValue
valueoptional
value is the target value of the metric (as a quantity).
averageValueoptional
averageValue is the target value of the average of the
metric across all relevant pods (as a quantity)
averageUtilizationoptional
integer
averageUtilization is the target value of the average of the
resource metric across all relevant pods, represented as a percentage of
the requested value of the resource for the pods.
Currently only valid for Resource metric source type
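The pairing between `type` and the value fields, along with the restriction on `averageUtilization`, can be checked mechanically. This is a hypothetical client-side check, not KServe's own validation. The mapping from `type` to field name follows the MetricTargetType descriptions in this reference, and quantities are represented as plain strings or numbers for simplicity.

```python
from typing import Any, Dict

# Each MetricTargetType value pairs with one MetricTarget field.
TARGET_FIELD = {
    "Utilization": "averageUtilization",
    "Value": "value",
    "AverageValue": "averageValue",
}


def check_metric_target(source_type: str, target: Dict[str, Any]) -> None:
    """Raise ValueError if a MetricTarget is inconsistent (hypothetical checker)."""
    target_type = target.get("type")
    if target_type not in TARGET_FIELD:
        raise ValueError(f"unknown target type {target_type!r}")
    field = TARGET_FIELD[target_type]
    if field not in target:
        raise ValueError(f"target type {target_type} needs the {field!r} field")
    # averageUtilization is a percentage of the pods' resource requests,
    # so it is only meaningful for Resource metrics.
    if "averageUtilization" in target and source_type != "Resource":
        raise ValueError("averageUtilization is only valid for Resource metrics")
```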

MetricTargetType

Underlying type: string

Appears in:

MetricTargetType specifies the type of metric being targeted, and should be either "Value", "AverageValue", or "Utilization"

Possible Values
Utilization
UtilizationMetricType declares a MetricTarget is an AverageUtilization value
Value
ValueMetricType declares a MetricTarget is a raw value
AverageValue
AverageValueMetricType declares a MetricTarget is an AverageValue value

MetricsBackend

Underlying type: string

Appears in:

MetricsBackend enum

Possible Values
prometheus
graphite

MetricsSpec

Appears in:

MetricsSpec specifies how to scale based on a single metric (only type and one other matching field should be set at once).

Fields
typerequired
type is the type of metric source. It should be one of "Resource", "External", or "PodMetric",
each mapping to the matching field in this object (resource, external, or podmetric).
resourceoptional
resource refers to a resource metric (such as those specified in
requests and limits) known to Kubernetes describing each pod in the
current scale target (e.g. CPU or memory). Such metrics are built in to
Kubernetes, and have special scaling options on top of those available
to normal per-pod metrics using the "pods" source.
externaloptional
external refers to a global metric that is not associated
with any Kubernetes object. It allows autoscaling based on information
coming from components running outside the cluster
(for example, the length of a queue in a cloud messaging service, or
QPS from a load balancer running outside the cluster).
podmetricoptional
podmetric refers to a metric describing each pod in the current scale target
(for example, transactions-processed-per-second). The values will be
averaged together before being compared to the target value.
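The rule that only `type` and its one matching field should be set can be expressed as a small check. The sketch assumes a `resource` source carries a resource name plus a MetricTarget; that inner shape, the CPU example, and the helper name are illustrative assumptions rather than definitions from this reference.

```python
from typing import Any, Dict

# Metric source type -> the MetricsSpec field that carries its details.
SOURCE_FIELD = {"Resource": "resource", "External": "external", "PodMetric": "podmetric"}


def check_metrics_spec(spec: Dict[str, Any]) -> str:
    """Return the populated source field, or raise if the one-of rule is broken."""
    metric_type = spec.get("type")
    if metric_type not in SOURCE_FIELD:
        raise ValueError(f"unknown metric source type {metric_type!r}")
    populated = [f for f in SOURCE_FIELD.values() if spec.get(f) is not None]
    expected = SOURCE_FIELD[metric_type]
    if populated != [expected]:
        raise ValueError(f"type {metric_type} expects only {expected!r}, found {populated}")
    return expected


cpu_metric = {
    "type": "Resource",
    # Assumption: a resource source names the resource and carries a MetricTarget.
    "resource": {"name": "cpu", "target": {"type": "Utilization", "averageUtilization": 80}},
}
```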

ModelCopies

Appears in:

Fields
failedCopiesrequired
integer
Default: 0
How many copies of this predictor's models failed to load recently
totalCopiesoptional
integer
Total number of copies of this predictor's models that are currently loaded

ModelFormat

Appears in:

Fields
namerequired
string
Name of the model format.
versionoptional
string
Version of the model format.
Used in validating that a predictor is supported by a runtime.
Can be "major", "major.minor" or "major.minor.patch".
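The three accepted granularities can be checked with a small parser. This sketch assumes each part is a non-negative integer, which this reference does not state explicitly, and it makes no attempt to reproduce how KServe matches a predictor's version against a runtime's supported formats.

```python
import re
from typing import Tuple

# "major", "major.minor" or "major.minor.patch"; numeric parts are an assumption.
_VERSION_RE = re.compile(r"\d+(?:\.\d+){0,2}")


def parse_format_version(version: str) -> Tuple[int, ...]:
    """Split a ModelFormat version into integer parts, rejecting other shapes."""
    if _VERSION_RE.fullmatch(version) is None:
        raise ValueError(f"expected major[.minor[.patch]], got {version!r}")
    return tuple(int(part) for part in version.split("."))
```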

ModelRevisionStates

Appears in:

Fields
activeModelStaterequired
Default: Pending
High level state string: Pending, Standby, Loading, Loaded, FailedToLoad
targetModelStaterequired

ModelSpec

Appears in:

Fields
modelFormatrequired
ModelFormat being served.
runtimeoptional
string
Specific ClusterServingRuntime/ServingRuntime name to use for deployment.
storageUrioptional
string
This field points to the location of the trained model which is mounted onto the pod.
runtimeVersionoptional
string
Runtime version of the predictor docker image
protocolVersionoptional
Protocol version used by the predictor (one of v1, v2, grpc-v1, or grpc-v2)
storageoptional
Storage Spec for model location

ModelState

Underlying type: string

Appears in:

ModelState enum

Possible Values
Pending
Model is not yet registered
Standby
Model is available but not loaded (will load when used)
Loading
Model is loading
Loaded
At least one copy of the model is loaded
FailedToLoad
All copies of the model failed to load

ModelStatus

Appears in:

Fields
transitionStatusrequired
Default: UpToDate
Whether the available predictor endpoints reflect the current Spec or are in transition
statesoptional
State information of the predictor's model.
lastFailureInfooptional
Details of the last failure, when loading the target model failed or was blocked.
copiesoptional
Model copy information of the predictor's model.
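The ModelState values and ModelStatus fields in this reference combine naturally into a one-line summary, for example for a CLI or dashboard. This sketch assumes the status arrives as the plain dictionary returned by the Kubernetes API. The helper name is hypothetical. Missing values fall back to the defaults listed for activeModelState, failedCopies, and transitionStatus, and a missing totalCopies is treated as 0 by assumption.

```python
from enum import Enum
from typing import Any, Dict


class ModelState(str, Enum):
    """Values of the ModelState enum documented in this reference."""
    PENDING = "Pending"
    STANDBY = "Standby"
    LOADING = "Loading"
    LOADED = "Loaded"
    FAILED_TO_LOAD = "FailedToLoad"


def summarize_model_status(model_status: Dict[str, Any]) -> str:
    """Render a ModelStatus dict as one line (hypothetical helper)."""
    states = model_status.get("states") or {}
    # Unknown state strings raise ValueError when converted to ModelState.
    active = ModelState(states.get("activeModelState", "Pending"))
    transition = model_status.get("transitionStatus", "UpToDate")
    copies = model_status.get("copies") or {}
    failed = copies.get("failedCopies", 0)
    total = copies.get("totalCopies", 0)  # assumption: treat a missing count as 0
    return f"{active.value}: {total} loaded, {failed} failed, {transition}"
```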

ModelStorageSpec

Appears in:

Fields
pathoptional
string
The path to the object in the storage. Note that this path is relative to the storage URI.
parametersoptional
map[string]string
Parameters to override the default storage credentials and config.
keyoptional
string
The Storage Key in the secret for this object.
schemaPathoptional
string
The path to the model schema file in the storage.
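Because `path` is relative to the storage URI, a client can reconstruct the full object location by joining the two. This minimal sketch assumes the base URI (for example, a bucket URI) is already known from the storage configuration referenced by `key`. The key name, bucket, and file names are hypothetical, and dropping a leading slash from `path` is a choice made by this sketch, not documented KServe behavior.

```python
from typing import Any, Dict


def resolve_model_location(base_uri: str, storage: Dict[str, Any]) -> str:
    """Join a base storage URI with ModelStorageSpec.path (hypothetical helper)."""
    path = storage.get("path", "")
    # This sketch strips slashes at the seam so exactly one separator remains.
    return base_uri.rstrip("/") + "/" + path.lstrip("/")


storage_spec = {
    "key": "localMinIO",  # hypothetical key in the storage secret
    "path": "sklearn/model.joblib",
    "schemaPath": "sklearn/schema.json",  # hypothetical schema file
}
```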

ONNXRuntimeSpec

Appears in:

ONNXRuntimeSpec defines arguments for configuring ONNX model serving.

Fields
storageUrioptional
string
This field points to the location of the trained model which is mounted onto the pod.
runtimeVersionoptional
string
Runtime version of the predictor docker image
protocolVersionoptional
Protocol version used by the predictor (one of v1, v2, grpc-v1, or grpc-v2)
storageoptional
Storage Spec for model location

PMMLSpec

Appears in:

PMMLSpec defines arguments for configuring PMML model serving.

Fields
storageUrioptional
string
This field points to the location of the trained model which is mounted onto the pod.
runtimeVersionoptional
string
Runtime version of the predictor docker image
protocolVersionoptional
Protocol version used by the predictor (one of v1, v2, grpc-v1, or grpc-v2)
storageoptional
Storage Spec for model location

PaddleServerSpec

Appears in:

Fields
storageUrioptional
string
This field points to the location of the trained model which is mounted onto the pod.
runtimeVersionoptional
string
Runtime version of the predictor docker image
protocolVersionoptional
Protocol version used by the predictor (one of v1, v2, grpc-v1, or grpc-v2)
storageoptional
Storage Spec for model location

PodMetricSource

Appears in:

PodMetricSource indicates how to scale on a metric describing each pod in the current scale target (for example, transactions-processed-per-second). The values will be averaged together before being compared to the target value.

Fields
metricrequired
metric identifies the target metric by name and selector
targetrequired
target specifies the target value for the given metric
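The sketch below builds a MetricsSpec entry that scales on a per-pod metric and models the "average, then compare" step described above. It assumes `metric` takes the PodMetrics fields documented in this reference. The metric name is hypothetical. The comparison is only a toy model: real autoscalers typically add tolerances and stabilization windows that are ignored here.

```python
from typing import Any, Dict, List


def pod_metric_source(metric_name: str, average_value: str) -> Dict[str, Any]:
    """Build a MetricsSpec entry that scales on a per-pod metric (hypothetical)."""
    return {
        "type": "PodMetric",
        "podmetric": {
            # Assumption: metric uses the PodMetrics fields documented in this reference.
            "metric": {"backend": "opentelemetry", "metricNames": [metric_name]},
            "target": {"type": "AverageValue", "averageValue": average_value},
        },
    }


def average_exceeds_target(per_pod_values: List[float], target: float) -> bool:
    """Average the per-pod readings, then compare the average with the target."""
    if not per_pod_values:
        return False
    return sum(per_pod_values) / len(per_pod_values) > target
```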

PodMetrics

Appears in:

Fields
backendoptional
Backend defines the scaling metric type watched by the autoscaler.
Possible value: opentelemetry.
serverAddressoptional
string
ServerAddress specifies the address of the PodsMetricsBackend server.
metricNamesoptional
string array
MetricNames is the list of metric names in the backend.
queryoptional
string
Query specifies the query to run to get metrics from the PodsMetricsBackend.
operationOverTimeoptional
string
OperationOverTime specifies the operation to aggregate the metrics over time.
Possible values are last_one, avg, max, min, rate, count. Default is 'last_one'.
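The sketch below assembles a PodMetrics dictionary and validates `operationOverTime` against the listed values, defaulting to `last_one`. The metric name and query string are hypothetical. The query syntax depends on the backend and is not specified by this reference.

```python
from typing import Any, Dict, List, Optional

# Values listed for operationOverTime; "last_one" is the documented default.
OPERATIONS = ("last_one", "avg", "max", "min", "rate", "count")


def pod_metrics(
    metric_names: List[str],
    query: str,
    operation: str = "last_one",
    server_address: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a PodMetrics dict for the opentelemetry backend (hypothetical helper)."""
    if operation not in OPERATIONS:
        raise ValueError(f"operationOverTime must be one of {OPERATIONS}, got {operation!r}")
    spec: Dict[str, Any] = {
        "backend": "opentelemetry",  # the only PodsMetricsBackend value listed
        "metricNames": list(metric_names),
        "query": query,
        "operationOverTime": operation,
    }
    if server_address is not None:
        spec["serverAddress"] = server_address
    return spec
```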

PodSpec

Appears in:

PodSpec is a description of a pod.

Fields
volumesoptional
Volume array
List of volumes that can be mounted by containers belonging to the pod.
More info: https://kubernetes.io/docs/concepts/storage/volumes
initContainersrequired
Container array
List of initialization containers belonging to the pod.
Init containers are executed in order prior to containers being started. If any
init container fails, the pod is considered to have failed and is handled according
to its restartPolicy. The name for an init container or normal container must be
unique among all containers.
Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes.
The resourceRequirements of an init container are taken into account during scheduling
by finding the highest request/limit for each resource type, and then using the max of
that value or the sum of the normal containers. Limits are applied to init containers
in a similar fashion.
Init containers cannot currently be added or removed.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
containersrequired
Container array
List of containers belonging to the pod.
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
ephemeralContainersoptional
List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing
pod to perform user-initiated actions such as debugging. This list cannot be specified when
creating a pod, and it cannot be modified by updating the pod spec. In order to add an
ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
restartPolicyoptional
Restart policy for all containers within the pod.
One of Always, OnFailure, Never. In some contexts, only a subset of those values may be permitted.
Defaults to Always.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
terminationGracePeriodSecondsoptional
integer
Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request.
Value must be a non-negative integer. The value zero indicates stop immediately via
the kill signal (no opportunity to shut down).
If this value is nil, the default grace period will be used instead.
The grace period is the duration in seconds after the processes running in the pod are sent
a termination signal and the time when the processes are forcibly halted with a kill signal.
Set this value longer than the expected cleanup time for your process.
Defaults to 30 seconds.
activeDeadlineSecondsoptional
integer
Optional duration in seconds the pod may be active on the node relative to
StartTime before the system will actively try to mark it failed and kill associated containers.
Value must be a positive integer.
dnsPolicyoptional
Set DNS policy for the pod.
Defaults to "ClusterFirst".
Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
To have DNS options set along with hostNetwork, you must explicitly set the DNS policy
to 'ClusterFirstWithHostNet'.
nodeSelectoroptional
object (keys:string, values:string)
NodeSelector is a selector which must match a node's labels for the pod to be scheduled on that node.
More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
serviceAccountNameoptional
string
ServiceAccountName is the name of the ServiceAccount to use to run this pod.
More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccountoptional
string
DeprecatedServiceAccount is a deprecated alias for ServiceAccountName.
Deprecated: Use serviceAccountName instead.
automountServiceAccountTokenoptional
boolean
AutomountServiceAccountToken indicates whether a service account token should be automatically mounted.
nodeNameoptional
string
NodeName indicates in which node this pod is scheduled.
If empty, this pod is a candidate for scheduling by the scheduler defined in schedulerName.
Once this field is set, the kubelet for this node becomes responsible for the lifecycle of this pod.
This field should not be used to express a desire for the pod to be scheduled on a specific node.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename
hostNetworkoptional
boolean
Host networking requested for this pod. Use the host's network namespace.
If this option is set, the ports that will be used must be specified.
Defaults to false.
hostPIDoptional
boolean
Use the host's pid namespace.
Optional: Defaults to false.
hostIPCoptional
boolean
Use the host's ipc namespace.
Optional: Defaults to false.
shareProcessNamespaceoptional
boolean
Share a single process namespace between all of the containers in a pod.
When this is set containers will be able to view and signal processes from other containers
in the same pod, and the first process in each container will not be assigned PID 1.
HostPID and ShareProcessNamespace cannot both be set.
Optional: Defaults to false.
securityContextoptional
SecurityContext holds pod-level security attributes and common container settings.
Optional: Defaults to empty. See type description for default values of each field.
imagePullSecretsoptional
ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec.
If specified, these secrets will be passed to individual puller implementations for them to use.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostnameoptional
string
Specifies the hostname of the Pod.
If not specified, the pod's hostname will be set to a system-defined value.
subdomainoptional
string
If specified, the fully qualified Pod hostname will be "<hostname>.<subdomain>.<pod namespace>.svc.<cluster domain>".
If not specified, the pod will not have a domain name at all.
affinityoptional
If specified, the pod's scheduling constraints
schedulerNameoptional
string
If specified, the pod will be dispatched by specified scheduler.
If not specified, the pod will be dispatched by default scheduler.
tolerationsoptional
If specified, the pod's tolerations.
hostAliasesoptional
HostAlias array
HostAliases is an optional list of hosts and IPs that will be injected into the pod's hosts
file if specified.
priorityClassNameoptional
string
If specified, indicates the pod's priority. "system-node-critical" and
"system-cluster-critical" are two special keywords which indicate the
highest priorities with the former being the highest priority. Any other
name must be defined by creating a PriorityClass object with that name.
If not specified, the pod priority will be default or zero if there is no
default.
priorityoptional
integer
The priority value. Various system components use this field to find the
priority of the pod. When Priority Admission Controller is enabled, it
prevents users from setting this field. The admission controller populates
this field from PriorityClassName.
The higher the value, the higher the priority.
dnsConfigoptional
Specifies the DNS parameters of a pod.
Parameters specified here will be merged into the generated DNS
configuration based on DNSPolicy.
readinessGatesoptional
If specified, all readiness gates will be evaluated for pod readiness.
A pod is ready when all its containers are ready AND
all conditions specified in the readiness gates have status equal to "True"
More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates
runtimeClassNameoptional
string
RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used
to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run.
If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an
empty definition that uses the default runtime handler.
More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class
enableServiceLinksoptional
boolean
EnableServiceLinks indicates whether information about services should be injected into pod's
environment variables, matching the syntax of Docker links.
Optional: Defaults to true.
preemptionPolicyoptional
PreemptionPolicy is the Policy for preempting pods with lower priority.
One of Never, PreemptLowerPriority.
Defaults to PreemptLowerPriority if unset.
overheadoptional
Overhead represents the resource overhead associated with running a pod for a given RuntimeClass.
This field will be autopopulated at admission time by the RuntimeClass admission controller. If
the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests.
The RuntimeClass admission controller will reject Pod create requests which have the overhead already
set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value
defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero.
More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md
topologySpreadConstraintsoptional
TopologySpreadConstraints describes how a group of pods ought to spread across topology
domains. Scheduler will schedule pods in a way which abides by the constraints.
All topologySpreadConstraints are ANDed.
setHostnameAsFQDNoptional
boolean
If true the pod's hostname will be configured as the pod's FQDN, rather than the leaf name (the default).
In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname).
In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN.
If a pod does not have FQDN, this has no effect.
Defaults to false.
osoptional
Specifies the OS of the containers in the pod.
Some pod and container fields are restricted if this is set.

If the OS field is set to linux, the following fields must be unset:
- securityContext.windowsOptions

If the OS field is set to windows, the following fields must be unset:
- spec.hostPID
- spec.hostIPC
- spec.hostUsers
- spec.securityContext.appArmorProfile
- spec.securityContext.seLinuxOptions
- spec.securityContext.seccompProfile
- spec.securityContext.fsGroup
- spec.securityContext.fsGroupChangePolicy
- spec.securityContext.sysctls
- spec.shareProcessNamespace
- spec.securityContext.runAsUser
- spec.securityContext.runAsGroup
- spec.securityContext.supplementalGroups
- spec.securityContext.supplementalGroupsPolicy
- spec.containers[*].securityContext.appArmorProfile
- spec.containers[*].securityContext.seLinuxOptions
- spec.containers[*].securityContext.seccompProfile
- spec.containers[*].securityContext.capabilities
- spec.containers[*].securityContext.readOnlyRootFilesystem
- spec.containers[*].securityContext.privileged
- spec.containers[*].securityContext.allowPrivilegeEscalation
- spec.containers[*].securityContext.procMount
- spec.containers[*].securityContext.runAsUser
- spec.containers[*].securityContext.runAsGroup
hostUsersoptional
boolean
Use the host's user namespace.
Optional: Default to true.
If set to true or not present, the pod will be run in the host user namespace, useful
for when the pod needs a feature only available to the host user namespace, such as
loading a kernel module with CAP_SYS_MODULE.
When set to false, a new userns is created for the pod. Setting false is useful for
mitigating container breakout vulnerabilities while still allowing users to run their
containers as root without actually having root privileges on the host.
This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.
schedulingGatesoptional
SchedulingGates is an opaque list of values that if specified will block scheduling the pod.
If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the
scheduler will not attempt to schedule the pod.

SchedulingGates can only be set at pod creation time, and be removed only afterwards.
resourceClaimsoptional
ResourceClaims defines which ResourceClaims must be allocated
and reserved before the Pod is allowed to start. The resources
will be made available to those containers which consume them
by name.

This is an alpha field and requires enabling the
DynamicResourceAllocation feature gate.

This field is immutable.
resourcesoptional
Resources is the total amount of CPU and Memory resources required by all
containers in the pod. It supports specifying Requests and Limits for
"cpu" and "memory" resource names only. ResourceClaims are not supported.

This field enables fine-grained control over resource allocation for the
entire pod, allowing resource sharing among containers in a pod.

This is an alpha field and requires enabling the PodLevelResources feature
gate.
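Several of the PodSpec constraints above are simple enough to check before a spec is submitted. This is a partial, client-side sketch covering only a handful of them. The API server remains the authority, and the helper name is hypothetical. As a simplification, the Windows check treats a host-namespace field as "set" only when it is true, and it covers only a subset of the forbidden fields.

```python
from typing import Any, Dict, List

# A subset of the fields that must be unset when os.name is "windows".
WINDOWS_FORBIDDEN = ("hostPID", "hostIPC", "shareProcessNamespace")


def pod_spec_problems(spec: Dict[str, Any]) -> List[str]:
    """Collect violations of a few PodSpec rules described above (partial checker)."""
    problems: List[str] = []
    containers = spec.get("containers") or []
    init_containers = spec.get("initContainers") or []
    if not containers:
        problems.append("at least one container is required")
    names = [c.get("name") for c in init_containers + containers]
    if len(names) != len(set(names)):
        problems.append("container names must be unique across init and normal containers")
    grace = spec.get("terminationGracePeriodSeconds")
    if grace is not None and grace < 0:
        problems.append("terminationGracePeriodSeconds must be non-negative")
    deadline = spec.get("activeDeadlineSeconds")
    if deadline is not None and deadline <= 0:
        problems.append("activeDeadlineSeconds must be positive")
    if spec.get("hostPID") and spec.get("shareProcessNamespace"):
        problems.append("hostPID and shareProcessNamespace cannot both be set")
    if (spec.get("os") or {}).get("name") == "windows":
        # Simplification: a field counts as set here only when it is true.
        for field in WINDOWS_FORBIDDEN:
            if spec.get(field):
                problems.append(f"{field} must be unset for windows pods")
    return problems
```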

PodsMetricsBackend

Underlying type: string

Appears in:

PodsMetricsBackend enum

Possible Values
opentelemetry

PredictorExtensionSpec

Appears in:

PredictorExtensionSpec defines configuration shared across all predictor frameworks

Fields
storageUrioptional
string
This field points to the location of the trained model which is mounted onto the pod.
runtimeVersionoptional
string
Runtime version of the predictor docker image
protocolVersionoptional
Protocol version used by the predictor (one of v1, v2, grpc-v1, or grpc-v2)
storageoptional
Storage Spec for model location
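Since these fields are shared by every framework-specific spec, one helper can produce them for any framework key. The sketch below treats the four protocol values listed here as the complete set, which is an assumption. The storage URI and helper name are hypothetical.

```python
from typing import Any, Dict, Optional

# Protocol versions listed in this reference (assumed to be the complete set).
PROTOCOLS = ("v1", "v2", "grpc-v1", "grpc-v2")


def predictor_extension(
    storage_uri: str,
    protocol_version: Optional[str] = None,
    runtime_version: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the fields shared by every framework-specific predictor (hypothetical)."""
    spec: Dict[str, Any] = {"storageUri": storage_uri}
    if protocol_version is not None:
        if protocol_version not in PROTOCOLS:
            raise ValueError(f"protocolVersion must be one of {PROTOCOLS}")
        spec["protocolVersion"] = protocol_version
    if runtime_version is not None:
        spec["runtimeVersion"] = runtime_version
    return spec


# The same shared fields can sit under any framework key, e.g. lightgbm.
predictor = {"lightgbm": predictor_extension("gs://example-bucket/lightgbm/iris", "v2")}
```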

PredictorSpec

Appears in:

PredictorSpec defines the configuration for a predictor, The following fields follow a "1-of" semantic. Users must specify exactly one spec.
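The "exactly one spec" rule can be checked by counting which framework keys are present. In this sketch, including `model` in the one-of group is an assumption. Predictors defined purely through custom `containers` are not handled, and non-framework keys such as `serviceAccountName` are ignored. The helper name is hypothetical.

```python
from typing import Any, Dict

# Framework fields of PredictorSpec; including "model" in the one-of is an assumption.
IMPLEMENTATIONS = (
    "sklearn", "xgboost", "tensorflow", "pytorch", "triton", "onnx",
    "huggingface", "pmml", "lightgbm", "paddle", "model",
)


def selected_implementation(predictor: Dict[str, Any]) -> str:
    """Return the single framework key set on a predictor, else raise ValueError."""
    present = [key for key in IMPLEMENTATIONS if predictor.get(key) is not None]
    if len(present) != 1:
        raise ValueError(f"expected exactly one predictor spec, found {present}")
    return present[0]
```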

Fields
sklearnrequired
Spec for SKLearn model server
xgboostrequired
Spec for XGBoost model server
tensorflowrequired
Spec for TFServing (https://github.com/tensorflow/serving)
pytorchrequired
Spec for TorchServe (https://pytorch.org/serve)
tritonrequired
Spec for Triton Inference Server (https://github.com/triton-inference-server/server)
onnxrequired
Spec for ONNX runtime (https://github.com/microsoft/onnxruntime)
huggingfacerequired
Spec for HuggingFace runtime (https://github.com/huggingface)
pmmlrequired
Spec for PMML (http://dmg.org/pmml/v4-1/GeneralStructure.html)
lightgbmrequired
Spec for LightGBM model server
paddlerequired
Spec for Paddle model server (https://github.com/PaddlePaddle/Serving)
modelrequired
Model spec for any arbitrary framework.
workerSpecrequired
WorkerSpec for enabling multi-node/multi-gpu
volumesoptional
Volume array
List of volumes that can be mounted by containers belonging to the pod.
More info: https://kubernetes.io/docs/concepts/storage/volumes
initContainersrequired
Container array
List of initialization containers belonging to the pod.
Init containers are executed in order prior to containers being started. If any
init container fails, the pod is considered to have failed and is handled according
to its restartPolicy. The name for an init container or normal container must be
unique among all containers.
Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes.
The resourceRequirements of an init container are taken into account during scheduling
by finding the highest request/limit for each resource type, and then using the max of
that value or the sum of the normal containers. Limits are applied to init containers
in a similar fashion.
Init containers cannot currently be added or removed.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
containersrequired
Container array
List of containers belonging to the pod.
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
ephemeralContainersoptional
List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing
pod to perform user-initiated actions such as debugging. This list cannot be specified when
creating a pod, and it cannot be modified by updating the pod spec. In order to add an
ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
restartPolicyoptional
Restart policy for all containers within the pod.
One of Always, OnFailure, Never. In some contexts, only a subset of those values may be permitted.
Defaults to Always.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
terminationGracePeriodSecondsoptional
integer
Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request.
Value must be a non-negative integer. The value zero indicates stop immediately via
the kill signal (no opportunity to shut down).
If this value is nil, the default grace period will be used instead.
The grace period is the duration in seconds after the processes running in the pod are sent
a termination signal and the time when the processes are forcibly halted with a kill signal.
Set this value longer than the expected cleanup time for your process.
Defaults to 30 seconds.
activeDeadlineSecondsoptional
integer
Optional duration in seconds the pod may be active on the node relative to
StartTime before the system will actively try to mark it failed and kill associated containers.
Value must be a positive integer.
dnsPolicyoptional
Set DNS policy for the pod.
Defaults to "ClusterFirst".
Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
To have DNS options set along with hostNetwork, you must explicitly set the DNS policy
to 'ClusterFirstWithHostNet'.
nodeSelectoroptional
object (keys:string, values:string)
NodeSelector is a selector which must match a node's labels for the pod to be scheduled on that node.
More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
serviceAccountNameoptional
string
ServiceAccountName is the name of the ServiceAccount to use to run this pod.
More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccountoptional
string
DeprecatedServiceAccount is a deprecated alias for ServiceAccountName.
Deprecated: Use serviceAccountName instead.
automountServiceAccountTokenoptional
boolean
AutomountServiceAccountToken indicates whether a service account token should be automatically mounted.
nodeNameoptional
string
NodeName indicates in which node this pod is scheduled.
If empty, this pod is a candidate for scheduling by the scheduler defined in schedulerName.
Once this field is set, the kubelet for this node becomes responsible for the lifecycle of this pod.
This field should not be used to express a desire for the pod to be scheduled on a specific node.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename
hostNetworkoptional
boolean
Host networking requested for this pod. Use the host's network namespace.
If this option is set, the ports that will be used must be specified.
Defaults to false.
hostPIDoptional
boolean
Use the host's pid namespace.
Optional: Defaults to false.
hostIPCoptional
boolean
Use the host's ipc namespace.
Optional: Defaults to false.
shareProcessNamespaceoptional
boolean
Share a single process namespace between all of the containers in a pod.
When this is set containers will be able to view and signal processes from other containers
in the same pod, and the first process in each container will not be assigned PID 1.
HostPID and ShareProcessNamespace cannot both be set.
Optional: Defaults to false.
securityContextoptional
SecurityContext holds pod-level security attributes and common container settings.
Optional: Defaults to empty. See type description for default values of each field.
imagePullSecretsoptional
ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec.
If specified, these secrets will be passed to individual puller implementations for them to use.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostnameoptional
string
Specifies the hostname of the Pod.
If not specified, the pod's hostname will be set to a system-defined value.
subdomainoptional
string
If specified, the fully qualified Pod hostname will be "<hostname>.<subdomain>.<pod namespace>.svc.<cluster domain>".
If not specified, the pod will not have a domain name at all.
affinityoptional
If specified, the pod's scheduling constraints
schedulerNameoptional
string
If specified, the pod will be dispatched by specified scheduler.
If not specified, the pod will be dispatched by default scheduler.
tolerationsoptional
If specified, the pod's tolerations.
hostAliasesoptional
HostAlias array
HostAliases is an optional list of hosts and IPs that will be injected into the pod's hosts
file if specified.
priorityClassNameoptional
string
If specified, indicates the pod's priority. "system-node-critical" and
"system-cluster-critical" are two special keywords which indicate the
highest priorities with the former being the highest priority. Any other
name must be defined by creating a PriorityClass object with that name.
If not specified, the pod priority will be default or zero if there is no
default.
priorityoptional
integer
The priority value. Various system components use this field to find the
priority of the pod. When Priority Admission Controller is enabled, it
prevents users from setting this field. The admission controller populates
this field from PriorityClassName.
The higher the value, the higher the priority.
dnsConfigoptional
Specifies the DNS parameters of a pod.
Parameters specified here will be merged into the generated DNS
configuration based on DNSPolicy.
readinessGatesoptional
If specified, all readiness gates will be evaluated for pod readiness.
A pod is ready when all its containers are ready AND
all conditions specified in the readiness gates have status equal to "True"
More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates
runtimeClassNameoptional
string
RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used
to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run.
If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an
empty definition that uses the default runtime handler.
More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class
enableServiceLinksoptional
boolean
EnableServiceLinks indicates whether information about services should be injected into the pod's
environment variables, matching the syntax of Docker links.
Optional: Defaults to true.
preemptionPolicyoptional
PreemptionPolicy is the Policy for preempting pods with lower priority.
One of Never, PreemptLowerPriority.
Defaults to PreemptLowerPriority if unset.
overheadoptional
Overhead represents the resource overhead associated with running a pod for a given RuntimeClass.
This field will be autopopulated at admission time by the RuntimeClass admission controller. If
the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests.
The RuntimeClass admission controller will reject Pod create requests which have the overhead already
set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value
defined in the corresponding RuntimeClass; otherwise it will remain unset and be treated as zero.
More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md
topologySpreadConstraintsoptional
TopologySpreadConstraints describes how a group of pods ought to spread across topology
domains. The scheduler will schedule pods in a way that abides by the constraints.
All topologySpreadConstraints are ANDed.
setHostnameAsFQDNoptional
boolean
If true the pod's hostname will be configured as the pod's FQDN, rather than the leaf name (the default).
In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname).
In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN.
If a pod does not have FQDN, this has no effect.
Defaults to false.
osoptional
Specifies the OS of the containers in the pod.
Some pod and container fields are restricted if this is set.

If the OS field is set to linux, the following fields must be unset:
- securityContext.windowsOptions

If the OS field is set to windows, the following fields must be unset:
- spec.hostPID
- spec.hostIPC
- spec.hostUsers
- spec.securityContext.appArmorProfile
- spec.securityContext.seLinuxOptions
- spec.securityContext.seccompProfile
- spec.securityContext.fsGroup
- spec.securityContext.fsGroupChangePolicy
- spec.securityContext.sysctls
- spec.shareProcessNamespace
- spec.securityContext.runAsUser
- spec.securityContext.runAsGroup
- spec.securityContext.supplementalGroups
- spec.securityContext.supplementalGroupsPolicy
- spec.containers[*].securityContext.appArmorProfile
- spec.containers[*].securityContext.seLinuxOptions
- spec.containers[*].securityContext.seccompProfile
- spec.containers[*].securityContext.capabilities
- spec.containers[*].securityContext.readOnlyRootFilesystem
- spec.containers[*].securityContext.privileged
- spec.containers[*].securityContext.allowPrivilegeEscalation
- spec.containers[*].securityContext.procMount
- spec.containers[*].securityContext.runAsUser
- spec.containers[*].securityContext.runAsGroup
hostUsersoptional
boolean
Use the host's user namespace.
Optional: Defaults to true.
If set to true or not present, the pod will be run in the host user namespace, useful
for when the pod needs a feature only available to the host user namespace, such as
loading a kernel module with CAP_SYS_MODULE.
When set to false, a new userns is created for the pod. Setting false is useful for
mitigating container breakout vulnerabilities, even allowing users to run their
containers as root without actually having root privileges on the host.
This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.
schedulingGatesoptional
SchedulingGates is an opaque list of values that if specified will block scheduling the pod.
If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the
scheduler will not attempt to schedule the pod.

SchedulingGates can only be set at pod creation time, and can only be removed afterwards.
resourceClaimsoptional
ResourceClaims defines which ResourceClaims must be allocated
and reserved before the Pod is allowed to start. The resources
will be made available to those containers which consume them
by name.

This is an alpha field and requires enabling the
DynamicResourceAllocation feature gate.

This field is immutable.
resourcesoptional
Resources is the total amount of CPU and Memory resources required by all
containers in the pod. It supports specifying Requests and Limits for
"cpu" and "memory" resource names only. ResourceClaims are not supported.

This field enables fine-grained control over resource allocation for the
entire pod, allowing resource sharing among containers in a pod.

This is an alpha field and requires enabling the PodLevelResources feature
gate.
minReplicasoptional
integer
Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.
maxReplicasoptional
integer
Maximum number of replicas for autoscaling.
scaleTargetoptional
integer
ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for.
The concurrency and rps targets are supported by the Knative Pod Autoscaler
(https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
scaleMetricoptional
ScaleMetric defines the scaling metric type watched by the autoscaler.
Possible values are concurrency, rps, cpu, and memory. Concurrency and rps are supported via the
Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
scaleMetricTypeoptional
Type of metric to use. Options are Utilization or AverageValue.
autoScalingoptional
AutoScaling is the autoscaling spec, backed by HPA or KEDA.
containerConcurrencyoptional
integer
ContainerConcurrency specifies how many requests can be processed concurrently; this sets the hard limit of the container
concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).
timeoutoptional
integer
TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.
canaryTrafficPercentoptional
integer
CanaryTrafficPercent defines the traffic split percentage between the candidate revision and the last ready revision.
loggeroptional
Activates request/response logging and its logger configuration.
batcheroptional
Activates request batching and its batching configuration.
labelsoptional
object (keys:string, values:string)
Labels that will be added to the component pod.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
annotationsoptional
object (keys:string, values:string)
Annotations that will be added to the component pod.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
deploymentStrategyoptional
The deployment strategy to use to replace existing pods with new ones. Only applicable for raw deployment mode.
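
As a rough illustration of canaryTrafficPercent, the sketch below computes the resulting split, assuming the percentage is the share routed to the candidate revision and the remainder goes to the last ready revision. The helper name `split_traffic` is hypothetical, and the behavior when the field is unset (all traffic to the candidate) is an assumption rather than something this reference states.

```python
from typing import Dict, Optional


def split_traffic(canary_traffic_percent: Optional[int]) -> Dict[str, int]:
    """Hypothetical helper: map canaryTrafficPercent to a per-revision split."""
    if canary_traffic_percent is None:
        # Assumption: without a canary percentage, the candidate takes all traffic.
        return {"candidate": 100, "last_ready": 0}
    if not 0 <= canary_traffic_percent <= 100:
        raise ValueError("canaryTrafficPercent must be between 0 and 100")
    # The candidate gets the canary share; the last ready revision keeps the rest.
    return {
        "candidate": canary_traffic_percent,
        "last_ready": 100 - canary_traffic_percent,
    }


print(split_traffic(10))  # {'candidate': 10, 'last_ready': 90}
```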

ResourceConfig

Fields
cpuLimitrequired
string
memoryLimitrequired
string
cpuRequestrequired
string
memoryRequestrequired
string
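
ResourceConfig carries CPU and memory requests and limits as strings. Kubernetes rejects a container whose request exceeds its limit, so a client-side sanity check might resemble the sketch below. The `parse_quantity` helper is hypothetical and understands only a small subset of Kubernetes quantity suffixes (m, k, M, G, Ki, Mi, Gi); the snake_case attributes stand in for cpuLimit, memoryLimit, cpuRequest, and memoryRequest.

```python
from dataclasses import dataclass
from decimal import Decimal

# Simplified subset of Kubernetes quantity suffixes (the real grammar supports more forms).
_SUFFIXES = {
    "Ki": Decimal(1024),
    "Mi": Decimal(1024) ** 2,
    "Gi": Decimal(1024) ** 3,
    "m": Decimal("0.001"),
    "k": Decimal(1000),
    "M": Decimal(1000) ** 2,
    "G": Decimal(1000) ** 3,
}


def parse_quantity(value: str) -> Decimal:
    """Hypothetical helper: convert a quantity such as "500m" or "1Gi" to a number."""
    for suffix, factor in _SUFFIXES.items():
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * factor
    # No suffix: a plain number such as "1" or "0.5".
    return Decimal(value)


@dataclass
class ResourceConfig:
    # snake_case mirrors cpuLimit, memoryLimit, cpuRequest, memoryRequest.
    cpu_limit: str
    memory_limit: str
    cpu_request: str
    memory_request: str

    def validate(self) -> None:
        """Raise ValueError if a request exceeds the matching limit."""
        if parse_quantity(self.cpu_request) > parse_quantity(self.cpu_limit):
            raise ValueError("cpuRequest must not exceed cpuLimit")
        if parse_quantity(self.memory_request) > parse_quantity(self.memory_limit):
            raise ValueError("memoryRequest must not exceed memoryLimit")


ResourceConfig(cpu_limit="1", memory_limit="2Gi", cpu_request="500m", memory_request="1Gi").validate()
```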

ResourceMetric

Underlying type: string

ResourceMetric enum

Possible Values
cpu
memory

ResourceMetricSource

Fields
namerequired
name is the name of the resource in question.
targetrequired
target specifies the target value for the given metric.

SKLearnSpec

SKLearnSpec defines arguments for configuring SKLearn model serving.

Fields
storageUrioptional
string
This field points to the location of the trained model which is mounted onto the pod.
runtimeVersionoptional
string
Runtime version of the predictor Docker image.
protocolVersionoptional
Protocol version used by the predictor (one of v1, v2, grpc-v1, or grpc-v2).
storageoptional
Storage spec for the model location.
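
The storageUri, runtimeVersion, and protocolVersion fields map directly onto a SKLearnSpec object. The sketch below builds that object as a plain dictionary and rejects protocol versions other than v1, v2, grpc-v1, and grpc-v2; `sklearn_spec` is a hypothetical helper, and the bucket URI in the usage line is a placeholder.

```python
from typing import Dict, Optional

# The protocolVersion values named in the SKLearnSpec field description.
ALLOWED_PROTOCOL_VERSIONS = {"v1", "v2", "grpc-v1", "grpc-v2"}


def sklearn_spec(
    storage_uri: str,
    protocol_version: Optional[str] = None,
    runtime_version: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: build a SKLearnSpec-shaped dict, omitting unset fields."""
    if protocol_version is not None and protocol_version not in ALLOWED_PROTOCOL_VERSIONS:
        raise ValueError(f"unsupported protocolVersion: {protocol_version!r}")
    spec = {"storageUri": storage_uri}
    if runtime_version is not None:
        spec["runtimeVersion"] = runtime_version
    if protocol_version is not None:
        spec["protocolVersion"] = protocol_version
    return spec


print(sklearn_spec("gs://example-bucket/models/iris", protocol_version="v2"))
```

The same shape would apply to TFServingSpec, TorchServeSpec, and TritonSpec, which list the same four fields.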

ScaleMetric

Underlying type: string

ScaleMetric enum

Possible Values
cpu
memory
concurrency
rps
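
ResourceMetric is a subset of ScaleMetric, and the scaleMetric field description notes that concurrency and rps are handled through the Knative Pod Autoscaler. A minimal sketch of both enums and those two relationships follows; the helper names are hypothetical, and the mapping encodes only what this reference states, not KServe's own controller logic.

```python
from enum import Enum


class ScaleMetric(str, Enum):
    CPU = "cpu"
    MEMORY = "memory"
    CONCURRENCY = "concurrency"
    RPS = "rps"


class ResourceMetric(str, Enum):
    CPU = "cpu"
    MEMORY = "memory"


# Metrics the scaleMetric description says are supported via the Knative Pod Autoscaler.
KPA_METRICS = {ScaleMetric.CONCURRENCY, ScaleMetric.RPS}


def is_resource_metric(metric: ScaleMetric) -> bool:
    """Hypothetical helper: True when the scale metric is also a ResourceMetric value."""
    return metric.value in {m.value for m in ResourceMetric}


def supported_by_kpa(metric: ScaleMetric) -> bool:
    """Hypothetical helper: True for metrics handled through the Knative Pod Autoscaler."""
    return metric in KPA_METRICS


print([m.value for m in ScaleMetric if supported_by_kpa(m)])  # ['concurrency', 'rps']
```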

StorageSpec

StorageSpec defines a spec for an object in an object store.

Fields
pathoptional
string
The path to the object in the storage. Note that this path is relative to the storage URI.
parametersoptional
map[string]string
Parameters to override the default storage credentials and config.
keyoptional
string
The Storage Key in the secret for this object.
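
Because path is relative to the storage URI, a client resolves the full object location by joining the two. The sketch below assumes the base storage URI is already known; `resolve_storage_path` is a hypothetical helper, not a KServe function, and the bucket names are placeholders.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def resolve_storage_path(storage_uri: str, path: str) -> str:
    """Hypothetical helper: join a StorageSpec path onto a base storage URI."""
    parts = urlsplit(storage_uri)
    # Strip a leading "/" so the path is always treated as relative to the base prefix.
    joined = posixpath.join(parts.path or "/", path.lstrip("/"))
    return urlunsplit((parts.scheme, parts.netloc, joined, "", ""))


print(resolve_storage_path("s3://example-bucket/models", "sklearn/iris"))
# s3://example-bucket/models/sklearn/iris
```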

TFServingSpec

TFServingSpec defines arguments for configuring TensorFlow model serving.

Fields
storageUrioptional
string
This field points to the location of the trained model which is mounted onto the pod.
runtimeVersionoptional
string
Runtime version of the predictor Docker image.
protocolVersionoptional
Protocol version used by the predictor (one of v1, v2, grpc-v1, or grpc-v2).
storageoptional
Storage spec for the model location.

TorchServeSpec

TorchServeSpec defines arguments for configuring PyTorch model serving.

Fields
storageUrioptional
string
This field points to the location of the trained model which is mounted onto the pod.
runtimeVersionoptional
string
Runtime version of the predictor Docker image.
protocolVersionoptional
Protocol version used by the predictor (one of v1, v2, grpc-v1, or grpc-v2).
storageoptional
Storage spec for the model location.

TransformerSpec

TransformerSpec defines the transformer service for pre- and post-processing.

Fields
volumesoptional
Volume array
List of volumes that can be mounted by containers belonging to the pod.
More info: https://kubernetes.io/docs/concepts/storage/volumes
initContainersrequired
Container array
List of initialization containers belonging to the pod.
Init containers are executed in order prior to containers being started. If any
init container fails, the pod is considered to have failed and is handled according
to its restartPolicy. The name for an init container or normal container must be
unique among all containers.
Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes.
The resourceRequirements of an init container are taken into account during scheduling
by finding the highest request/limit for each resource type, and then using the greater of
that value and the sum of the normal containers. Limits are applied to init containers
in a similar fashion.
Init containers cannot currently be added or removed.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
containersrequired
Container array
List of containers belonging to the pod.
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
ephemeralContainersoptional
List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing
pod to perform user-initiated actions such as debugging. This list cannot be specified when
creating a pod, and it cannot be modified by updating the pod spec. In order to add an
ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
restartPolicyoptional
Restart policy for all containers within the pod.
One of Always, OnFailure, Never. In some contexts, only a subset of those values may be permitted.
Defaults to Always.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
terminationGracePeriodSecondsoptional
integer
Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request.
Value must be a non-negative integer. The value zero indicates stop immediately via
the kill signal (no opportunity to shut down).
If this value is nil, the default grace period will be used instead.
The grace period is the duration in seconds between the time the processes running in the pod are sent
a termination signal and the time when the processes are forcibly halted with a kill signal.
Set this value longer than the expected cleanup time for your process.
Defaults to 30 seconds.
activeDeadlineSecondsoptional
integer
Optional duration in seconds the pod may be active on the node relative to
StartTime before the system will actively try to mark it failed and kill associated containers.
Value must be a positive integer.
dnsPolicyoptional
Set DNS policy for the pod.
Defaults to "ClusterFirst".
Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
To set DNS options along with hostNetwork, you must explicitly set the DNS policy
to 'ClusterFirstWithHostNet'.
nodeSelectoroptional
object (keys:string, values:string)
NodeSelector is a selector which must be true for the pod to fit on a node: it must match
a node's labels for the pod to be scheduled on that node.
More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
serviceAccountNameoptional
string
ServiceAccountName is the name of the ServiceAccount to use to run this pod.
More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccountoptional
string
DeprecatedServiceAccount is a deprecated alias for ServiceAccountName.
Deprecated: Use serviceAccountName instead.
automountServiceAccountTokenoptional
boolean
AutomountServiceAccountToken indicates whether a service account token should be automatically mounted.
nodeNameoptional
string
NodeName indicates on which node this pod is scheduled.
If empty, this pod is a candidate for scheduling by the scheduler defined in schedulerName.
Once this field is set, the kubelet for this node becomes responsible for the lifecycle of this pod.
This field should not be used to express a desire for the pod to be scheduled on a specific node.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename
hostNetworkoptional
boolean
Host networking requested for this pod. Use the host's network namespace.
If this option is set, the ports that will be used must be specified.
Defaults to false.
hostPIDoptional
boolean
Use the host's PID namespace.
Optional: Defaults to false.
hostIPCoptional
boolean
Use the host's IPC namespace.
Optional: Defaults to false.
shareProcessNamespaceoptional
boolean
Share a single process namespace between all of the containers in a pod.
When this is set, containers will be able to view and signal processes from other containers
in the same pod, and the first process in each container will not be assigned PID 1.
HostPID and ShareProcessNamespace cannot both be set.
Optional: Defaults to false.
securityContextoptional
SecurityContext holds pod-level security attributes and common container settings.
Optional: Defaults to empty. See type description for default values of each field.
imagePullSecretsoptional
ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec.
If specified, these secrets will be passed to individual puller implementations for them to use.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostnameoptional
string
Specifies the hostname of the Pod.
If not specified, the pod's hostname will be set to a system-defined value.
subdomainoptional
string
If specified, the fully qualified Pod hostname will be "...svc.".
If not specified, the pod will not have a domain name at all.
affinityoptional
If specified, the pod's scheduling constraints.
schedulerNameoptional
string
If specified, the pod will be dispatched by the specified scheduler.
If not specified, the pod will be dispatched by the default scheduler.
tolerationsoptional
If specified, the pod's tolerations.
hostAliasesoptional
HostAlias array
HostAliases is an optional list of hosts and IPs that will be injected into the pod's hosts
file if specified.
priorityClassNameoptional
string
If specified, indicates the pod's priority. "system-node-critical" and
"system-cluster-critical" are two special keywords which indicate the
highest priorities with the former being the highest priority. Any other
name must be defined by creating a PriorityClass object with that name.
If not specified, the pod priority will be default or zero if there is no
default.
priorityoptional
integer
The priority value. Various system components use this field to find the
priority of the pod. When Priority Admission Controller is enabled, it
prevents users from setting this field. The admission controller populates
this field from PriorityClassName.
The higher the value, the higher the priority.
dnsConfigoptional
Specifies the DNS parameters of a pod.
Parameters specified here will be merged into the generated DNS
configuration based on DNSPolicy.
readinessGatesoptional
If specified, all readiness gates will be evaluated for pod readiness.
A pod is ready when all its containers are ready AND
all conditions specified in the readiness gates have status equal to "True".
More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates
runtimeClassNameoptional
string
RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used
to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run.
If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an
empty definition that uses the default runtime handler.
More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class
enableServiceLinksoptional
boolean
EnableServiceLinks indicates whether information about services should be injected into the pod's
environment variables, matching the syntax of Docker links.
Optional: Defaults to true.
preemptionPolicyoptional
PreemptionPolicy is the Policy for preempting pods with lower priority.
One of Never, PreemptLowerPriority.
Defaults to PreemptLowerPriority if unset.
overheadoptional
Overhead represents the resource overhead associated with running a pod for a given RuntimeClass.
This field will be autopopulated at admission time by the RuntimeClass admission controller. If
the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests.
The RuntimeClass admission controller will reject Pod create requests which have the overhead already
set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value
defined in the corresponding RuntimeClass; otherwise it will remain unset and be treated as zero.
More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md
topologySpreadConstraintsoptional
TopologySpreadConstraints describes how a group of pods ought to spread across topology
domains. The scheduler will schedule pods in a way that abides by the constraints.
All topologySpreadConstraints are ANDed.
setHostnameAsFQDNoptional
boolean
If true the pod's hostname will be configured as the pod's FQDN, rather than the leaf name (the default).
In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname).
In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN.
If a pod does not have FQDN, this has no effect.
Defaults to false.
osoptional
Specifies the OS of the containers in the pod.
Some pod and container fields are restricted if this is set.

If the OS field is set to linux, the following fields must be unset:
- securityContext.windowsOptions

If the OS field is set to windows, the following fields must be unset:
- spec.hostPID
- spec.hostIPC
- spec.hostUsers
- spec.securityContext.appArmorProfile
- spec.securityContext.seLinuxOptions
- spec.securityContext.seccompProfile
- spec.securityContext.fsGroup
- spec.securityContext.fsGroupChangePolicy
- spec.securityContext.sysctls
- spec.shareProcessNamespace
- spec.securityContext.runAsUser
- spec.securityContext.runAsGroup
- spec.securityContext.supplementalGroups
- spec.securityContext.supplementalGroupsPolicy
- spec.containers[*].securityContext.appArmorProfile
- spec.containers[*].securityContext.seLinuxOptions
- spec.containers[*].securityContext.seccompProfile
- spec.containers[*].securityContext.capabilities
- spec.containers[*].securityContext.readOnlyRootFilesystem
- spec.containers[*].securityContext.privileged
- spec.containers[*].securityContext.allowPrivilegeEscalation
- spec.containers[*].securityContext.procMount
- spec.containers[*].securityContext.runAsUser
- spec.containers[*].securityContext.runAsGroup
hostUsersoptional
boolean
Use the host's user namespace.
Optional: Defaults to true.
If set to true or not present, the pod will be run in the host user namespace, useful
for when the pod needs a feature only available to the host user namespace, such as
loading a kernel module with CAP_SYS_MODULE.
When set to false, a new userns is created for the pod. Setting false is useful for
mitigating container breakout vulnerabilities, even allowing users to run their
containers as root without actually having root privileges on the host.
This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.
schedulingGatesoptional
SchedulingGates is an opaque list of values that if specified will block scheduling the pod.
If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the
scheduler will not attempt to schedule the pod.

SchedulingGates can only be set at pod creation time, and can only be removed afterwards.
resourceClaimsoptional
ResourceClaims defines which ResourceClaims must be allocated
and reserved before the Pod is allowed to start. The resources
will be made available to those containers which consume them
by name.

This is an alpha field and requires enabling the
DynamicResourceAllocation feature gate.

This field is immutable.
resourcesoptional
Resources is the total amount of CPU and Memory resources required by all
containers in the pod. It supports specifying Requests and Limits for
"cpu" and "memory" resource names only. ResourceClaims are not supported.

This field enables fine-grained control over resource allocation for the
entire pod, allowing resource sharing among containers in a pod.

This is an alpha field and requires enabling the PodLevelResources feature
gate.
minReplicasoptional
integer
Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.
maxReplicasoptional
integer
Maximum number of replicas for autoscaling.
scaleTargetoptional
integer
ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for.
The concurrency and rps targets are supported by the Knative Pod Autoscaler
(https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
scaleMetricoptional
ScaleMetric defines the scaling metric type watched by the autoscaler.
Possible values are concurrency, rps, cpu, and memory. Concurrency and rps are supported via the
Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
scaleMetricTypeoptional
Type of metric to use. Options are Utilization or AverageValue.
autoScalingoptional
AutoScaling is the autoscaling spec, backed by HPA or KEDA.
containerConcurrencyoptional
integer
ContainerConcurrency specifies how many requests can be processed concurrently; this sets the hard limit of the container
concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).
timeoutoptional
integer
TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.
canaryTrafficPercentoptional
integer
CanaryTrafficPercent defines the traffic split percentage between the candidate revision and the last ready revision.
loggeroptional
Activates request/response logging and its logger configuration.
batcheroptional
Activates request batching and its batching configuration.
labelsoptional
object (keys:string, values:string)
Labels that will be added to the component pod.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
annotationsoptional
object (keys:string, values:string)
Annotations that will be added to the component pod.
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
deploymentStrategyoptional
The deployment strategy to use to replace existing pods with new ones. Only applicable for raw deployment mode.
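
The initContainers description says the scheduler takes, for each resource, the highest request among init containers and then uses the greater of that value and the sum over the regular containers. A minimal sketch of that rule follows, assuming requests are plain numbers in one unit per resource (for example CPU cores and GiB of memory). It deliberately ignores pod overhead and restartable (sidecar) init containers, which Kubernetes accounts for differently; `effective_requests` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List

# Resource name -> amount, with one consistent unit per resource (e.g. CPU cores, GiB).
Requests = Dict[str, float]


def effective_requests(init_containers: List[Requests], containers: List[Requests]) -> Requests:
    """Hypothetical helper: per resource, max(highest init request, sum of regular requests)."""
    highest_init: Dict[str, float] = defaultdict(float)
    for requests in init_containers:
        for name, amount in requests.items():
            highest_init[name] = max(highest_init[name], amount)

    total_regular: Dict[str, float] = defaultdict(float)
    for requests in containers:
        for name, amount in requests.items():
            total_regular[name] += amount

    names = set(highest_init) | set(total_regular)
    return {name: max(highest_init[name], total_regular[name]) for name in sorted(names)}


init = [{"cpu": 2.0}, {"cpu": 0.5, "memory": 1.0}]
regular = [{"cpu": 0.5, "memory": 0.5}, {"cpu": 0.5, "memory": 0.25}]
print(effective_requests(init, regular))  # {'cpu': 2.0, 'memory': 1.0}
```

Here the first init container dominates CPU (2.0 versus a regular total of 1.0), while memory is driven by the second init container (1.0 versus 0.75).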

TransitionStatus

Underlying type: string

TransitionStatus enum

Possible Values
UpToDate
Predictor is up-to-date (reflects current spec)
InProgress
Waiting for target model to reach state of active model
BlockedByFailedLoad
Target model failed to load
InvalidSpec
Target predictor spec failed validation
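
A client polling a predictor's status might model TransitionStatus as a string enum and separate the two failure states from the normal ones. The sketch below does that; `needs_attention` is a hypothetical helper, and treating BlockedByFailedLoad and InvalidSpec as the states that need operator action is an interpretation of the descriptions above.

```python
from enum import Enum


class TransitionStatus(str, Enum):
    UP_TO_DATE = "UpToDate"                          # predictor reflects the current spec
    IN_PROGRESS = "InProgress"                       # waiting for the target model to become active
    BLOCKED_BY_FAILED_LOAD = "BlockedByFailedLoad"   # target model failed to load
    INVALID_SPEC = "InvalidSpec"                     # target predictor spec failed validation


FAILURE_STATES = {TransitionStatus.BLOCKED_BY_FAILED_LOAD, TransitionStatus.INVALID_SPEC}


def needs_attention(status: str) -> bool:
    """Hypothetical helper: True when the reported status is one of the failure states."""
    # TransitionStatus(status) raises ValueError for strings outside the enum.
    return TransitionStatus(status) in FAILURE_STATES


print(needs_attention("InvalidSpec"))  # True
print(needs_attention("InProgress"))   # False
```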

TritonSpec

TritonSpec defines arguments for configuring Triton model serving.

Fields
storageUrioptional
string
This field points to the location of the trained model which is mounted onto the pod.
runtimeVersionoptional
string
Runtime version of the predictor Docker image.
protocolVersionoptional
Protocol version used by the predictor (one of v1, v2, grpc-v1, or grpc-v2).
storageoptional
Storage spec for the model location.

WorkerSpec

Fields
volumesoptional
Volume array
List of volumes that can be mounted by containers belonging to the pod.
More info: https://kubernetes.io/docs/concepts/storage/volumes
initContainersrequired
Container array
List of initialization containers belonging to the pod.
Init containers are executed in order prior to containers being started. If any
init container fails, the pod is considered to have failed and is handled according
to its restartPolicy. The name for an init container or normal container must be
unique among all containers.
Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes.
The resourceRequirements of an init container are taken into account during scheduling
by finding the highest request/limit for each resource type, and then using the greater of
that value and the sum of the normal containers. Limits are applied to init containers
in a similar fashion.
Init containers cannot currently be added or removed.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
containersrequired
Container array
List of containers belonging to the pod.
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
ephemeralContainersoptional
List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing
pod to perform user-initiated actions such as debugging. This list cannot be specified when
creating a pod, and it cannot be modified by updating the pod spec. In order to add an
ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
restartPolicyoptional
Restart policy for all containers within the pod.
One of Always, OnFailure, Never. In some contexts, only a subset of those values may be permitted.
Defaults to Always.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
terminationGracePeriodSecondsoptional
integer
Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request.
Value must be a non-negative integer. The value zero indicates stop immediately via
the kill signal (no opportunity to shut down).
If this value is nil, the default grace period will be used instead.
The grace period is the duration in seconds between the time the processes running in the pod are sent
a termination signal and the time when the processes are forcibly halted with a kill signal.
Set this value longer than the expected cleanup time for your process.
Defaults to 30 seconds.
activeDeadlineSecondsoptional
integer
Optional duration in seconds the pod may be active on the node relative to
StartTime before the system will actively try to mark it failed and kill associated containers.
Value must be a positive integer.
dnsPolicyoptional
Set DNS policy for the pod.
Defaults to "ClusterFirst".
Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
To set DNS options along with hostNetwork, you must explicitly set the DNS policy
to 'ClusterFirstWithHostNet'.
nodeSelectoroptional
object (keys:string, values:string)
NodeSelector is a selector which must be true for the pod to fit on a node: it must match
a node's labels for the pod to be scheduled on that node.
More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
serviceAccountNameoptional
string
ServiceAccountName is the name of the ServiceAccount to use to run this pod.
More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccountoptional
string
DeprecatedServiceAccount is a deprecated alias for ServiceAccountName.
Deprecated: Use serviceAccountName instead.
automountServiceAccountTokenoptional
boolean
AutomountServiceAccountToken indicates whether a service account token should be automatically mounted.
nodeNameoptional
string
NodeName indicates on which node this pod is scheduled.
If empty, this pod is a candidate for scheduling by the scheduler defined in schedulerName.
Once this field is set, the kubelet for this node becomes responsible for the lifecycle of this pod.
This field should not be used to express a desire for the pod to be scheduled on a specific node.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename
hostNetworkoptional
boolean
Host networking requested for this pod. Use the host's network namespace.
If this option is set, the ports that will be used must be specified.
Defaults to false.
hostPIDoptional
boolean
Use the host's PID namespace.
Optional: Defaults to false.
hostIPCoptional
boolean
Use the host's IPC namespace.
Optional: Defaults to false.
shareProcessNamespaceoptional
boolean
Share a single process namespace between all of the containers in a pod.
When this is set, containers will be able to view and signal processes from other containers
in the same pod, and the first process in each container will not be assigned PID 1.
HostPID and ShareProcessNamespace cannot both be set.
Optional: Defaults to false.
securityContextoptional
SecurityContext holds pod-level security attributes and common container settings.
Optional: Defaults to empty. See type description for default values of each field.
imagePullSecretsoptional
ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec.
If specified, these secrets will be passed to individual puller implementations for them to use.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostnameoptional
string
Specifies the hostname of the Pod.
If not specified, the pod's hostname will be set to a system-defined value.
subdomainoptional
string
If specified, the fully qualified Pod hostname will be "...svc.".
If not specified, the pod will not have a domain name at all.
affinityoptional
If specified, the pod's scheduling constraints.
schedulerNameoptional
string
If specified, the pod will be dispatched by the specified scheduler.
If not specified, the pod will be dispatched by the default scheduler.
tolerationsoptional
If specified, the pod's tolerations.
hostAliasesoptional
HostAlias array
HostAliases is an optional list of hosts and IPs that will be injected into the pod's hosts
file if specified.
priorityClassNameoptional
string
If specified, indicates the pod's priority. "system-node-critical" and
"system-cluster-critical" are two special keywords which indicate the
highest priorities with the former being the highest priority. Any other
name must be defined by creating a PriorityClass object with that name.
If not specified, the pod priority will be default or zero if there is no
default.
priorityoptional
integer
The priority value. Various system components use this field to find the
priority of the pod. When the Priority admission controller is enabled, it
prevents users from setting this field. The admission controller populates
this field from PriorityClassName.
The higher the value, the higher the priority.
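A minimal Python sketch of how `priorityClassName` resolves to the integer `priority`, assuming user-defined PriorityClass objects are modelled as a name-to-value dict and the cluster's default class (if any) is passed in as a number. The values for the two built-in classes are the upstream Kubernetes ones; `resolve_priority` and its parameters are hypothetical names, not part of any Kubernetes client library.

```python
from typing import Dict, Optional

# Built-in classes in upstream Kubernetes; system-node-critical is the highest.
BUILTIN_CLASSES: Dict[str, int] = {
    "system-node-critical": 2000001000,
    "system-cluster-critical": 2000000000,
}


def resolve_priority(priority_class_name: Optional[str],
                     defined_classes: Dict[str, int],
                     global_default: Optional[int] = None) -> int:
    """Approximate how admission fills in spec.priority (hypothetical helper)."""
    classes = {**defined_classes, **BUILTIN_CLASSES}
    if priority_class_name:
        # Any non-built-in name must correspond to an existing PriorityClass.
        if priority_class_name not in classes:
            raise ValueError(f"no PriorityClass named {priority_class_name!r}")
        return classes[priority_class_name]
    # No class named: use the cluster default class if one exists, else zero.
    return global_default if global_default is not None else 0
```

Because "higher value, higher priority", comparing the returned integers is enough to order pods.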
dnsConfigoptional
Specifies the DNS parameters of a pod.
Parameters specified here will be merged into the generated DNS
configuration based on DNSPolicy.
readinessGatesoptional
If specified, all readiness gates will be evaluated for pod readiness.
A pod is ready when all its containers are ready AND
all conditions specified in the readiness gates have status equal to "True".
More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates
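A minimal Python sketch of the readiness rule above. It simplifies each readiness gate to its condition type string and the pod's conditions to a type-to-status dict; the function name and the condition type `example.com/lb-ready` are hypothetical.

```python
from typing import Dict, List


def pod_is_ready(container_ready: List[bool],
                 readiness_gates: List[str],
                 conditions: Dict[str, str]) -> bool:
    """Ready only if every container is ready AND every gate's condition is "True"."""
    containers_ok = all(container_ready)
    # A gate whose condition has not been reported yet counts as not ready.
    gates_ok = all(conditions.get(gate) == "True" for gate in readiness_gates)
    return containers_ok and gates_ok
```

With a gate such as `example.com/lb-ready`, the pod stays unready until some external controller sets that condition to "True", even when all containers are ready.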
runtimeClassNameoptional
string
RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used
to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run.
If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an
empty definition that uses the default runtime handler.
More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class
enableServiceLinksoptional
boolean
EnableServiceLinks indicates whether information about services should be injected into the pod's
environment variables, matching the syntax of Docker links.
Optional: Defaults to true.
preemptionPolicyoptional
PreemptionPolicy is the policy for preempting pods with lower priority.
One of Never, PreemptLowerPriority.
Defaults to PreemptLowerPriority if unset.
overheadoptional
Overhead represents the resource overhead associated with running a pod for a given RuntimeClass.
This field will be autopopulated at admission time by the RuntimeClass admission controller. If
the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests.
The RuntimeClass admission controller will reject Pod create requests which have the overhead already
set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value
defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero.
More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md
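A minimal Python sketch of how overhead adds to a pod's resource footprint: per resource, the container requests are summed and the overhead (zero when unset) is added on top. It assumes quantities are plain integers in one unit per resource (for example millicores for "cpu" and MiB for "memory") and ignores init containers; `effective_pod_request` is a hypothetical name.

```python
from typing import Dict, List, Optional


def effective_pod_request(container_requests: List[Dict[str, int]],
                          overhead: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Sum container requests per resource, then add the pod overhead."""
    total: Dict[str, int] = {}
    for req in container_requests:
        for name, qty in req.items():
            total[name] = total.get(name, 0) + qty
    # Unset overhead is treated as zero, matching the field description.
    for name, qty in (overhead or {}).items():
        total[name] = total.get(name, 0) + qty
    return total
```

For two containers requesting 500m and 250m CPU under a RuntimeClass with 250m CPU overhead, the sketch yields 1000m CPU.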
topologySpreadConstraintsoptional
TopologySpreadConstraints describes how a group of pods ought to spread across topology
domains. The scheduler places pods in a way that abides by the constraints.
All topologySpreadConstraints are ANDed.
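A simplified Python sketch of the ANDed check for one candidate node. It assumes each constraint is reduced to a `(topologyKey, maxSkew)` pair, treats every constraint as if `whenUnsatisfiable` were `DoNotSchedule`, and ignores node eligibility filtering and `minDomains`. Skew is computed as the matching pods in the target domain after placement minus the minimum count across domains. The function name and the sample counts are hypothetical.

```python
from typing import Dict, List, Tuple


def placement_allowed(constraints: List[Tuple[str, int]],
                      counts_by_key: Dict[str, Dict[str, int]],
                      node_labels: Dict[str, str]) -> bool:
    """True only if one more matching pod on this node satisfies every constraint.

    constraints:   (topologyKey, maxSkew) pairs; all must hold (AND)
    counts_by_key: topologyKey -> {domain value: matching pods already there}
    node_labels:   labels of the candidate node
    """
    for key, max_skew in constraints:
        domain = node_labels.get(key)
        if domain is None:
            return False  # simplification: node lacks this topology label
        counts = dict(counts_by_key.get(key, {}))
        counts.setdefault(domain, 0)
        # skew = pods in the target domain after placement - global minimum
        skew = counts[domain] + 1 - min(counts.values())
        if skew > max_skew:
            return False
    return True
```

With two zones holding 2 and 1 matching pods and `maxSkew` 1, only the second zone is allowed; adding a per-hostname constraint can further reject nodes inside that zone.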
setHostnameAsFQDNoptional
boolean
If true, the pod's hostname will be configured as the pod's FQDN, rather than the leaf name (the default).
In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname).
In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to the FQDN.
If a pod does not have an FQDN, this has no effect.
Defaults to false.
osoptional
Specifies the OS of the containers in the pod.
Some pod and container fields are restricted if this is set.

If the OS field is set to linux, the following fields must be unset:
- securityContext.windowsOptions

If the OS field is set to windows, the following fields must be unset:
- spec.hostPID
- spec.hostIPC
- spec.hostUsers
- spec.securityContext.appArmorProfile
- spec.securityContext.seLinuxOptions
- spec.securityContext.seccompProfile
- spec.securityContext.fsGroup
- spec.securityContext.fsGroupChangePolicy
- spec.securityContext.sysctls
- spec.shareProcessNamespace
- spec.securityContext.runAsUser
- spec.securityContext.runAsGroup
- spec.securityContext.supplementalGroups
- spec.securityContext.supplementalGroupsPolicy
- spec.containers[*].securityContext.appArmorProfile
- spec.containers[*].securityContext.seLinuxOptions
- spec.containers[*].securityContext.seccompProfile
- spec.containers[*].securityContext.capabilities
- spec.containers[*].securityContext.readOnlyRootFilesystem
- spec.containers[*].securityContext.privileged
- spec.containers[*].securityContext.allowPrivilegeEscalation
- spec.containers[*].securityContext.procMount
- spec.containers[*].securityContext.runAsUser
- spec.containers[*].securityContext.runAsGroup
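A minimal Python sketch that checks the Windows list above against a pod spec held as a nested dict. It assumes the OS is given as `os.name`, treats a key that is absent or null as unset, and does not model upstream defaulting nuances or the Linux rule. All names in the sketch are hypothetical.

```python
from typing import Any, Dict, List

# Paths relative to spec, in the order of the Windows list above.
POD_FIELDS = [
    "hostPID", "hostIPC", "hostUsers",
    "securityContext.appArmorProfile", "securityContext.seLinuxOptions",
    "securityContext.seccompProfile", "securityContext.fsGroup",
    "securityContext.fsGroupChangePolicy", "securityContext.sysctls",
    "shareProcessNamespace",
    "securityContext.runAsUser", "securityContext.runAsGroup",
    "securityContext.supplementalGroups",
    "securityContext.supplementalGroupsPolicy",
]
# Paths relative to each entry of spec.containers.
CONTAINER_FIELDS = ["securityContext." + f for f in (
    "appArmorProfile", "seLinuxOptions", "seccompProfile", "capabilities",
    "readOnlyRootFilesystem", "privileged", "allowPrivilegeEscalation",
    "procMount", "runAsUser", "runAsGroup",
)]


def _get(obj: Any, dotted: str) -> Any:
    """Follow a dotted path through nested dicts; None if any part is absent."""
    for part in dotted.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def windows_violations(pod_spec: Dict[str, Any]) -> List[str]:
    """List the fields that are set although os.name is "windows"."""
    if _get(pod_spec, "os.name") != "windows":
        return []
    errors = [f"spec.{p}" for p in POD_FIELDS if _get(pod_spec, p) is not None]
    for i, container in enumerate(pod_spec.get("containers", [])):
        errors += [f"spec.containers[{i}].{p}" for p in CONTAINER_FIELDS
                   if _get(container, p) is not None]
    return errors
```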
hostUsersoptional
boolean
Use the host's user namespace.
Optional: Defaults to true.
If set to true or not present, the pod will be run in the host user namespace, useful
for when the pod needs a feature only available to the host user namespace, such as
loading a kernel module with CAP_SYS_MODULE.
When set to false, a new user namespace is created for the pod. Setting false is useful for
mitigating container breakout vulnerabilities, and it also allows users to run their
containers as root without actually having root privileges on the host.
This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.
schedulingGatesoptional
SchedulingGates is an opaque list of values that, if specified, will block scheduling of the pod.
If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the
scheduler will not attempt to schedule the pod.

SchedulingGates can only be set at pod creation time, and can only be removed afterwards.
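A minimal Python sketch of the two rules above: a pod is gated while its list is non-empty, and an update may only remove gates. It models each gate by its name alone; the function names and the gate `example.com/quota-check` are hypothetical.

```python
from typing import List


def is_scheduling_gated(gates: List[str]) -> bool:
    """The scheduler skips the pod while any gate remains."""
    return len(gates) > 0


def validate_gate_update(old: List[str], new: List[str]) -> List[str]:
    """After creation, every gate in `new` must already have been in `old`."""
    added = [g for g in new if g not in old]
    return [f"cannot add scheduling gate {g!r} after creation" for g in added]
```

A controller would typically remove its own gate (for example `example.com/quota-check`) once its precondition holds; when the list becomes empty, the pod becomes schedulable.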
resourceClaimsoptional
ResourceClaims defines which ResourceClaims must be allocated
and reserved before the Pod is allowed to start. The resources
will be made available to those containers which consume them
by name.

This is an alpha field and requires enabling the
DynamicResourceAllocation feature gate.

This field is immutable.
resourcesoptional
Resources is the total amount of CPU and Memory resources required by all
containers in the pod. It supports specifying Requests and Limits for
"cpu" and "memory" resource names only. ResourceClaims are not supported.

This field enables fine-grained control over resource allocation for the
entire pod, allowing resource sharing among containers in a pod.

This is an alpha field and requires enabling the PodLevelResources feature
gate.
pipelineParallelSizeoptional
integer
PipelineParallelSize defines the number of parallel workers.
It also represents the number of replicas in the worker set, where each worker set serves as a scaling unit.
tensorParallelSizeoptional
integer
TensorParallelSize specifies the number of GPUs to be used per node.
It indicates the degree of parallelism for tensor computations across the available GPUs.
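A minimal Python sketch of GPU sizing implied by the two fields above. It assumes that each of the `pipelineParallelSize` workers runs on its own node and uses `tensorParallelSize` GPUs there, so one worker set needs their product, and that scaling happens in whole worker sets. The `ParallelismPlan` class is hypothetical and is not part of KServe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelismPlan:
    """Hypothetical helper; field meanings follow the descriptions above."""
    pipeline_parallel_size: int  # workers (nodes) in one worker set
    tensor_parallel_size: int    # GPUs used on each node

    def gpus_per_worker_set(self) -> int:
        if self.pipeline_parallel_size < 1 or self.tensor_parallel_size < 1:
            raise ValueError("parallel sizes must be >= 1")
        return self.pipeline_parallel_size * self.tensor_parallel_size

    def gpus_for(self, worker_sets: int) -> int:
        # Each worker set is one scaling unit, so demand grows in whole sets.
        return worker_sets * self.gpus_per_worker_set()
```

Under these assumptions, `ParallelismPlan(2, 4)` needs 8 GPUs per worker set and 24 GPUs for three worker sets.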

XGBoostSpec

Appears in:

XGBoostSpec defines arguments for configuring XGBoost model serving.

Fields
storageUrioptional
string
This field points to the location of the trained model which is mounted onto the pod.
runtimeVersionoptional
string
Runtime version of the predictor Docker image.
protocolVersionoptional
Protocol version used by the predictor (one of v1, v2, grpc-v1, or grpc-v2).
storageoptional
Storage spec for the model location.
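A minimal Python sketch of a client-side check on an XGBoostSpec fragment held as a dict, using the field names listed above and the four protocol versions named in the `protocolVersion` description. The function name and the `gs://bucket/model` URI are hypothetical; KServe's own validation may check more than this.

```python
from typing import Any, Dict, List

# The protocol versions named in the protocolVersion description.
ALLOWED_PROTOCOLS = {"v1", "v2", "grpc-v1", "grpc-v2"}


def check_xgboost_spec(spec: Dict[str, Any]) -> List[str]:
    """Return errors for an XGBoostSpec-like dict (hypothetical helper)."""
    errors: List[str] = []
    proto = spec.get("protocolVersion")
    # protocolVersion is optional; only validate it when present.
    if proto is not None and proto not in ALLOWED_PROTOCOLS:
        errors.append(f"unsupported protocolVersion {proto!r}")
    return errors
```

For example, `{"storageUri": "gs://bucket/model", "protocolVersion": "v2"}` passes, while `"protocolVersion": "v3"` is rejected.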