Control Plane API
serving.kserve.io/v1alpha1
Package v1alpha1 contains API Schema definitions for the serving v1alpha1 API group
Resource Kinds
Available Kinds
- ClusterServingRuntime
- ClusterServingRuntimeList
- ClusterStorageContainer
- ClusterStorageContainerList
- InferenceGraph
- InferenceGraphList
- LLMInferenceService
- LLMInferenceServiceConfig
- LLMInferenceServiceConfigList
- LLMInferenceServiceList
- LocalModelCache
- LocalModelCacheList
- LocalModelNode
- LocalModelNodeGroup
- LocalModelNodeGroupList
- LocalModelNodeList
- ServingRuntime
- ServingRuntimeList
- TrainedModel
- TrainedModelList
Kind Definitions
ClusterServingRuntime
Appears in:
ClusterServingRuntime is the Schema for the servingruntimes API
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): ClusterServingRuntime
ClusterServingRuntimeList
ClusterServingRuntimeList contains a list of ClusterServingRuntime
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): ClusterServingRuntimeList
ClusterStorageContainer
Appears in:
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): ClusterStorageContainer
- disabled (optional)
ClusterStorageContainerList
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): ClusterStorageContainerList
InferenceGraph
Appears in:
InferenceGraph is the Schema for the InferenceGraph API for multiple models
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): InferenceGraph
InferenceGraphList
InferenceGraphList contains a list of InferenceGraph
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): InferenceGraphList
LLMInferenceService
Appears in:
LLMInferenceService is the Schema for the llminferenceservices API, representing a single LLM deployment. It orchestrates the creation of underlying Kubernetes resources like Deployments and Services, and configures networking for exposing the model.
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): LLMInferenceService
LLMInferenceServiceConfig
Appears in:
LLMInferenceServiceConfig is the Schema for the llminferenceserviceconfigs API. It acts as a template to provide base configurations that can be inherited by multiple LLMInferenceService instances.
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): LLMInferenceServiceConfig
LLMInferenceServiceConfigList
LLMInferenceServiceConfigList is the list type for LLMInferenceServiceConfig.
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): LLMInferenceServiceConfigList
LLMInferenceServiceList
LLMInferenceServiceList is the list type for LLMInferenceService.
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): LLMInferenceServiceList
LocalModelCache
Appears in:
LocalModelCache
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): LocalModelCache
LocalModelCacheList
LocalModelCacheList
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): LocalModelCacheList
LocalModelNode
Appears in:
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): LocalModelNode
LocalModelNodeGroup
Appears in:
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): LocalModelNodeGroup
LocalModelNodeGroupList
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): LocalModelNodeGroupList
LocalModelNodeList
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): LocalModelNodeList
ServingRuntime
Appears in:
ServingRuntime is the Schema for the servingruntimes API
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): ServingRuntime
ServingRuntimeList
ServingRuntimeList contains a list of ServingRuntime
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): ServingRuntimeList
TrainedModel
Appears in:
TrainedModel is the Schema for the TrainedModel API
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): TrainedModel
TrainedModelList
TrainedModelList contains a list of TrainedModel
Fields
- apiVersion (required): serving.kserve.io/v1alpha1
- kind (required): TrainedModelList
Supporting Types
Available Types
- BuiltInAdapter
- GatewayRoutesSpec
- GatewaySpec
- HTTPRouteSpec
- InfereceGraphRouterTimeouts
- InferenceGraphSpec
- InferenceGraphStatus
- InferenceGraphValidator
- InferencePoolSpec
- InferenceRouter
- InferenceRouterType
- InferenceStep
- InferenceStepDependencyType
- InferenceTarget
- IngressSpec
- LLMInferenceServiceSpec
- LLMInferenceServiceStatus
- LLMModelSpec
- LLMStorageSpec
- LoRASpec
- LocalModelCacheSpec
- LocalModelCacheStatus
- LocalModelInfo
- LocalModelNodeGroupSpec
- LocalModelNodeGroupStatus
- LocalModelNodeSpec
- LocalModelNodeStatus
- ModelCopies
- ModelSpec
- ModelStatus
- NamespacedName
- NodeStatus
- ParallelismSpec
- RouterSpec
- ScaleMetric
- SchedulerSpec
- ServerType
- ServingRuntimePodSpec
- ServingRuntimeSpec
- ServingRuntimeStatus
- StorageContainerSpec
- StorageHelper
- SupportedModelFormat
- SupportedRuntime
- SupportedUriFormat
- TrainedModelSpec
- TrainedModelStatus
- TrainedModelValidator
- UntypedObjectReference
- WorkerSpec
- WorkloadSpec
- WorkloadType
Type Definitions
BuiltInAdapter
Appears in:
Fields
and the runtime's container must have the same name
- runtimeManagementPort (required)
- memBufferBytes (required)
- modelLoadingTimeoutMillis (required)
GatewayRoutesSpec
Appears in:
GatewayRoutesSpec defines the configuration for a Gateway API route.
Fields
GatewaySpec
Appears in:
GatewaySpec defines the configuration for a Gateway API Gateway.
Fields
The controller will use the specified Gateway instead of creating one.
HTTPRouteSpec
Appears in:
HTTPRouteSpec defines configurations for a Gateway API HTTPRoute. 'Spec' and 'Refs' are mutually exclusive and determine whether the route is managed by the controller or user-managed.
Fields
The controller will validate the existence of these routes but will not modify them.
If provided, the controller will create and manage an HTTPRoute with this specification.
InfereceGraphRouterTimeouts
Appears in:
Fields
- serverRead (optional)
- serverWrite (optional)
- serverIdle (optional)
- serviceClient (optional)
InferenceGraphSpec
Appears in:
InferenceGraphSpec defines the InferenceGraph spec
Fields
Each node defines the router which can be different routing types
- timeout (optional)
- minReplicas (optional)
- maxReplicas (optional)
- scaleTarget (optional): concurrency and rps targets are supported by Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
Possible values are concurrency, rps, cpu, memory. concurrency and rps are supported via Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
- nodeSelector (optional): https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
- nodeName (optional): https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
- serviceAccountName (optional): https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
InferenceGraphStatus
Appears in:
InferenceGraphStatus defines the InferenceGraph conditions and status
Fields
- observedGeneration (optional): the generation that was last processed by the controller.
- annotations (required): additional status fields used to save additional State as well as convey more information to the user. This is roughly akin to Annotations on any k8s resource, just the reconciler conveying richer information outwards.
- deploymentMode (required)
InferencePoolSpec
Appears in:
InferencePoolSpec defines the configuration for an InferencePool. 'Spec' and 'Ref' are mutually exclusive.
Fields
InferenceRouter
Appears in:
InferenceRouter defines the router for each InferenceGraph node with one or multiple steps
kind: InferenceGraph
metadata:
  name: canary-route
spec:
  nodes:
    root:
      routerType: Splitter
      routes:
      - service: mymodel1
        weight: 20
      - service: mymodel2
        weight: 80
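The Splitter behaviour in the canary-route example can be pictured as weighted random selection. The following Python sketch only illustrates that idea and is not KServe's router implementation; the `validate_weights` and `pick_route` helpers and the dictionary layout are hypothetical, and the sum-to-100 check follows the description of the `weight` field.

```python
import random


def validate_weights(routes):
    """Raise ValueError unless the route weights sum to 100."""
    total = sum(route["weight"] for route in routes)
    if total != 100:
        raise ValueError(f"route weights sum to {total}, expected 100")


def pick_route(routes, rng=random):
    """Pick one route at random, in proportion to its weight."""
    validate_weights(routes)
    weights = [route["weight"] for route in routes]
    # random.choices returns a list of k items; take the single pick.
    return rng.choices(routes, weights=weights, k=1)[0]


# Routes mirroring the canary-route example (hypothetical in-memory form).
canary_routes = [
    {"service": "mymodel1", "weight": 20},
    {"service": "mymodel2", "weight": 80},
]
```

Over many requests, roughly 20% of the picks would go to mymodel1 and 80% to mymodel2.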
kind: InferenceGraph
metadata:
  name: abtest
spec:
  nodes:
    mymodel:
      routerType: Switch
      routes:
      - service: mymodel1
        condition: "{ .input.userId == 1 }"
      - service: mymodel2
        condition: "{ .input.userId == 2 }"
Scoring a case using a model ensemble consists of scoring it using each model separately, then combining the results into a single scoring result using one of the pre-defined combination methods.
Tree Ensemble constitutes a case where simple algorithms for combining results of either classification or regression trees are well known. Multiple classification trees, for example, are commonly combined using a "majority-vote" method. Multiple regression trees are often combined using various averaging techniques, e.g. tagging models with segment identifiers and weights to be used for their combination in these ways.
kind: InferenceGraph
metadata:
  name: ensemble
spec:
  nodes:
    root:
      routerType: Sequence
      routes:
      - service: feast
      - nodeName: ensembleModel
        data: $response
    ensembleModel:
      routerType: Ensemble
      routes:
      - service: sklearn-model
      - service: xgboost-model
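The two combination methods mentioned for ensembles, majority vote for classification and averaging for regression, can be sketched in a few lines of Python. This is a generic illustration of those methods, not a description of how the Ensemble router merges responses; the function names and inputs are hypothetical.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Return the most common label; on a tie, the label seen first wins."""
    return Counter(labels).most_common(1)[0][0]


def average(scores):
    """Return the arithmetic mean of the per-model regression outputs."""
    return mean(scores)
```

For example, `majority_vote(["dog", "cat", "dog"])` picks `"dog"`, and `average([1.0, 2.0, 3.0])` gives `2.0`.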
Scoring a case using a sequence, or chain of models allows the output of one model to be passed in as input to the subsequent models.
kind: InferenceGraph
metadata:
  name: model-chainer
spec:
  nodes:
    root:
      routerType: Sequence
      routes:
      - service: mymodel-s1
      - service: mymodel-s2
        data: $response
      - service: mymodel-s3
        data: $response
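Conceptually, a Sequence node with `data: $response` on each step behaves like function composition: each step receives the previous step's response. A minimal Python sketch of that idea follows, assuming each step is a plain callable; `run_sequence` is a hypothetical helper, not part of KServe.

```python
def run_sequence(steps, request):
    """Call each step in order, feeding each response to the next step."""
    payload = request
    for step in steps:
        payload = step(payload)
    return payload
```

With two toy steps, `run_sequence([lambda x: x + 1, lambda x: x * 2], 3)` first adds one and then doubles the result.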
In the flow described below, the pre_processing node base64-encodes the image and passes it to two model nodes in the flow. The encoded data is available to both of these nodes for classification. The second node, dog-breed-classification, takes the original input from the pre_processing node along with the response from the cat-dog-classification node to do further classification of the dog breed if required.
kind: InferenceGraph
metadata:
  name: dog-breed-classification
spec:
  nodes:
    root:
      routerType: Sequence
      routes:
      - service: cat-dog-classifier
      - nodeName: breed-classifier
        data: $request
    breed-classifier:
      routerType: Switch
      routes:
      - service: dog-breed-classifier
        condition: '{ .predictions.class == "dog" }'
      - service: cat-breed-classifier
        condition: '{ .predictions.class == "cat" }'
Fields
- Sequence: chains multiple inference steps, with input/output from the previous step
- Splitter: randomly routes to the target service according to the weight
- Ensemble: routes the request to multiple models and then merges the responses
- Switch: routes the request to one of the steps based on a condition
InferenceRouterType
Underlying type: string
Appears in:
InferenceRouterType constant for inference routing types
Possible Values
Sequence
Splitter
Ensemble
Switch
InferenceStep
Appears in:
InferenceStep defines the inference target of the current step with condition, weights and data.
Fields
- name (optional)
- nodeName (optional)
- serviceName (required)
- serviceUrl (optional)
- data (optional): $request, $response.predictions
- weight (optional): when weight is specified, the weights of all routing targets should sum to 100
- condition (optional)
InferenceStepDependencyType
Underlying type: string
Appears in:
InferenceStepDependencyType constant for inference step dependency
Possible Values
Soft
Hard
InferenceTarget
Appears in:
Exactly one InferenceTarget field must be specified
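The "exactly one" rule can be checked on the client side before submitting a manifest. The sketch below is a hypothetical pre-flight check in Python, assuming the target is available as a dictionary keyed by the field names listed under Fields; it is not KServe's own validation code.

```python
def validate_inference_target(target):
    """Return the name of the single target field that is set.

    Raises ValueError unless exactly one of nodeName, serviceName,
    or serviceUrl has a non-empty value.
    """
    keys = ("nodeName", "serviceName", "serviceUrl")
    present = [key for key in keys if target.get(key)]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {keys}, got {present}")
    return present[0]
```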
Fields
- nodeName (optional)
- serviceName (required)
- serviceUrl (optional)
IngressSpec
Appears in:
IngressSpec defines the configuration for a Kubernetes Ingress.
Fields
The controller will not create an Ingress but will use the referenced one to populate status URLs.
LLMInferenceServiceSpec
Appears in:
LLMInferenceServiceSpec defines the desired state of LLMInferenceService.
Fields
- replicas (optional)
These values are used to configure the underlying inference runtime (e.g., vLLM).
In a multi-node deployment, this configures the "head" or "master" pod.
In a disaggregated deployment, this configures the "decode" pod if it's the top-level template,
or the "prefill" pod if it's within the Prefill block.
The presence of this field triggers the creation of a multi-node (distributed) setup.
This spec defines the configuration for the worker pods, while the main 'Template' field defines the head pod.
The controller is responsible for enabling discovery between head and worker pods.
of networking resources like Ingress or Gateway API objects (HTTPRoute, Gateway).
When this section is included, the controller creates a separate deployment for prompt processing (prefill)
in addition to the main 'decode' deployment, inspired by the llm-d architecture.
This allows for independent scaling and hardware allocation for prefill and decode steps.
The controller merges these base configurations, with the current LLMInferenceService spec taking the highest precedence.
When multiple baseRefs are provided, the last one in the list overrides previous ones.
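The precedence rule for base configurations can be illustrated with a small merge routine. This Python sketch assumes specs are plain nested dictionaries and that nested mappings are merged key by key; the controller's real merge semantics (for lists, for example) are not described here, and `deep_merge` and `resolve_spec` are hypothetical helpers.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which values from override win over base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that sibling keys in the base are preserved.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_spec(base_configs, service_spec):
    """Merge base configs in list order, then apply the service's own spec."""
    result = {}
    for config in base_configs:
        # Later entries override earlier ones.
        result = deep_merge(result, config)
    # The service's own spec has the highest precedence.
    return deep_merge(result, service_spec)
```

Given two bases where the second sets `replicas: 2`, and a service spec that only sets a model name, the result keeps the second base's replica count and the service's model name.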
LLMInferenceServiceStatus
Appears in:
LLMInferenceServiceStatus defines the observed state of LLMInferenceService.
Fields
- observedGeneration (optional): the generation that was last processed by the controller.
- annotations (required): additional status fields used to save additional State as well as convey more information to the user. This is roughly akin to Annotations on any k8s resource, just the reconciler conveying richer information outwards.
If Addresses is present, Address will be ignored by clients.
If Addresses is present, Address must be ignored by clients.
LLMModelSpec
Appears in:
LLMModelSpec defines the model source and its characteristics.
Fields
The storage-initializer init container uses this URI to download the model.
- name (optional): If omitted, it will default to metadata.name. For LoRA adapters, this field is required.
This is used by the Inference Gateway scheduler.
Allows for specifying one or more LoRA adapters to be applied to the base model.
This is used by the storage-initializer to correctly download the model from the specified URI.
LLMStorageSpec
Appears in:
LLMStorageSpec is a copy of the v1beta1.StorageSpec. It is duplicated here to avoid import cycles between the v1alpha1 and v1beta1 API packages.
Fields
- path (optional): cannot co-exist with the storageURI.
- parameters (optional)
- key (optional)
LoRASpec
Appears in:
LoRASpec defines the configuration for LoRA adapters.
Fields
Each adapter is defined by its own ModelSpec.
LocalModelCacheSpec
Appears in:
LocalModelCacheSpec
Fields
- sourceModelUri (required)
- nodeGroups (required): Todo: support more than one node group
LocalModelCacheStatus
Appears in:
Fields
LocalModelInfo
Appears in:
Fields
- sourceModelUri (required)
- modelName (required)
LocalModelNodeGroupSpec
Appears in:
LocalModelNodeGroupSpec defines a group of nodes to download the model to.
Fields
LocalModelNodeGroupStatus
Appears in:
Fields
LocalModelNodeSpec
Appears in:
Fields
LocalModelNodeStatus
Appears in:
Fields
ModelCopies
Appears in:
Fields
- available (required)
- total (required)
- failed (required)
ModelSpec
Appears in:
ModelSpec describes a TrainedModel
Fields
- storageUri (required)
- framework (required): The values could be: "tensorflow", "pytorch", "sklearn", "onnx", "xgboost", "myawesomeinternalframework" etc.
ModelStatus
Underlying type: string
Appears in:
ModelStatus enum
Possible Values
ModelDownloadPending
ModelDownloading
ModelDownloaded
ModelDownloadError
NamespacedName
Appears in:
Fields
- namespace (required)
- name (required)
NodeStatus
Underlying type: string
Appears in:
NodeStatus enum
Possible Values
NodeNotReady
NodeDownloadPending
NodeDownloading
NodeDownloaded
NodeDownloadError
ParallelismSpec
Appears in:
ParallelismSpec defines the parallelism parameters for distributed inference.
Fields
- tensor (optional)
- pipeline (optional)
RouterSpec
Appears in:
RouterSpec defines the routing configuration for exposing the service. It supports Kubernetes Ingress and the Gateway API. The fields are mutually exclusive.
Fields
If an empty object \{\} is provided, the controller creates and manages a new HTTPRoute.
If an empty object \{\} is provided, the controller uses a default Gateway. This must be used in conjunction with the 'Route' field for managed Gateway API resources.
If an empty object \{\} is provided, the controller creates and manages a default Ingress resource.
If this field is non-empty, an InferenceModel resource will be created to integrate with the gateway's scheduler.
ScaleMetric
Underlying type: string
Appears in:
ScaleMetric enum
SchedulerSpec
Appears in:
SchedulerSpec defines the Inference Gateway extension configuration.
The SchedulerSpec configures the connection from the Gateway to the model deployment using the LLM-optimized request Scheduler, also known as the Endpoint Picker (EPP). The EPP determines the exact pod that should handle the request and responds to Envoy with the target pod; Envoy then forwards the request to the chosen pod.
The Scheduler is only effective when having multiple inference pod replicas.
Step 1: Gateway (Envoy) <-- ExtProc --> EPP (select the optimal replica to handle the request)
Step 2: Gateway (Envoy) <-- forward request --> Inference Pod X
Fields
This configures the Endpoint Picker (EPP) Deployment.
ServerType
Underlying type: string
Appears in:
ServerType constant for specifying the runtime name
Possible Values
triton
mlserver
ovms
ServingRuntimePodSpec
Appears in:
Fields
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/storage/volumes
- nodeSelector (optional): Selector which must match a node's labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
- labels (optional): More info: http://kubernetes.io/docs/user-guide/labels
- annotations (optional): More info: http://kubernetes.io/docs/user-guide/annotations
If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of docker, only DockerConfig type secrets are honored.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
- hostIPC (optional): Default to false.
ServingRuntimeSpec
Appears in:
ServingRuntimeSpec defines the desired state of ServingRuntime. This spec is currently provisional and is subject to change as details regarding single-model serving and multi-model serving are worked out.
Fields
- multiModel (optional)
- disabled (optional)
- protocolVersions (optional)
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/storage/volumes
- nodeSelector (optional): Selector which must match a node's labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
- labels (optional): More info: http://kubernetes.io/docs/user-guide/labels
- annotations (optional): More info: http://kubernetes.io/docs/user-guide/annotations
If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of docker, only DockerConfig type secrets are honored.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
- hostIPC (optional): Default to false.
- grpcEndpoint (optional): Assumed to be single-model runtime if omitted
- grpcDataEndpoint (optional)
- httpDataEndpoint (optional)
- replicas (optional): If specified, this overrides the podsPerRuntime configuration value
It is enabled unless explicitly disabled
ServingRuntimeStatus
Appears in:
ServingRuntimeStatus defines the observed state of ServingRuntime
StorageContainerSpec
Appears in:
StorageContainerSpec defines the container spec for the storage initializer init container, and the protocols it supports.
Fields
StorageHelper
Appears in:
Fields
- disabled (optional)
SupportedModelFormat
Appears in:
Fields
- name (required)
- version (optional): Used in validating that a predictor is supported by a runtime. Can be "major", "major.minor" or "major.minor.patch".
- autoSelect (optional): set to true to allow the ServingRuntime to be used for automatic model placement if this model format is specified with no explicit runtime.
- priority (optional): This is used to select the serving runtime if more than one serving runtime supports the same model format. The value should be greater than zero. The higher the value, the higher the priority. Priority is not considered if AutoSelect is either false or not specified. Priority can be overridden by specifying the runtime in the InferenceService.
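The interaction between autoSelect and priority can be sketched as a small selection function. This Python example is a simplified illustration under stated assumptions: runtimes are plain dictionaries, a missing priority is treated as 0, and ties are broken by runtime name. None of these details are specified above, and `select_runtime` is a hypothetical helper rather than KServe's selection logic.

```python
def select_runtime(runtimes, model_format):
    """Pick the auto-selectable runtime with the highest priority.

    Returns the runtime name, or None if no runtime with autoSelect
    enabled supports the requested model format.
    """
    candidates = []
    for runtime in runtimes:
        for fmt in runtime["supportedModelFormats"]:
            # Formats without autoSelect are skipped, so their priority
            # is never considered.
            if fmt["name"] == model_format and fmt.get("autoSelect", False):
                candidates.append((fmt.get("priority", 0), runtime["name"]))
    if not candidates:
        return None
    # Highest priority wins; the name only breaks ties (an assumption).
    return max(candidates)[1]
```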
SupportedUriFormat
Appears in:
SupportedUriFormat can be either prefix or regex. Todo: Add validation that only one of them is set.
Fields
- prefix (required)
- regex (required)
TrainedModelSpec
Appears in:
TrainedModelSpec defines the TrainedModel spec
Fields
- inferenceService (required)
TrainedModelStatus
Appears in:
TrainedModelStatus defines the observed state of TrainedModel
Fields
- observedGeneration (optional): the generation that was last processed by the controller.
- annotations (required): additional status fields used to save additional State as well as convey more information to the user. This is roughly akin to Annotations on any k8s resource, just the reconciler conveying richer information outwards.
For v1: http[s]://\{route-name\}.\{route-namespace\}.\{cluster-level-suffix\}/v1/models/
For v2: http[s]://\{route-name\}.\{route-namespace\}.\{cluster-level-suffix\}/v2/models/
http://
UntypedObjectReference
Appears in:
UntypedObjectReference is a reference to an object without a specific Group/Version/Kind. It's used for referencing networking resources like Gateways and Ingresses where the exact type might be inferred or is not strictly required by this controller.
Fields
WorkerSpec
Appears in:
WorkerSpec is the schema for multi-node/multi-GPU feature
Fields
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/storage/volumes
- nodeSelector (optional): Selector which must match a node's labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
- labels (optional): More info: http://kubernetes.io/docs/user-guide/labels
- annotations (optional): More info: http://kubernetes.io/docs/user-guide/annotations
If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of docker, only DockerConfig type secrets are honored.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
- hostIPC (optional): Default to false.
- pipelineParallelSize (optional): It specifies the number of model partitions across multiple devices, allowing large models to be split and processed concurrently across these partitions. It also represents the number of replicas in the worker set, where each worker set serves as a scaling unit.
- tensorParallelSize (optional): It indicates the degree of parallelism for tensor computations across the available GPUs.
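As a rough sizing aid, the two parallelism values are often multiplied to estimate how many GPUs one model instance needs. The Python sketch below assumes the common convention used by runtimes such as vLLM (world size = pipeline parallel size × tensor parallel size) and assumes a default of 1 for an unset value; neither assumption is stated in this reference, and `total_gpus` is a hypothetical helper.

```python
def total_gpus(pipeline_parallel_size=1, tensor_parallel_size=1):
    """Estimate the GPUs needed for one model instance (assumed convention)."""
    if pipeline_parallel_size < 1 or tensor_parallel_size < 1:
        raise ValueError("parallel sizes must be positive integers")
    return pipeline_parallel_size * tensor_parallel_size
```

For instance, two pipeline partitions with four-way tensor parallelism would need eight GPUs under this convention.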
WorkloadSpec
Appears in:
WorkloadSpec defines the configuration for a deployment workload, such as replicas and pod specifications.
Fields
- replicas (optional)
These values are used to configure the underlying inference runtime (e.g., vLLM).
In a multi-node deployment, this configures the "head" or "master" pod.
In a disaggregated deployment, this configures the "decode" pod if it's the top-level template,
or the "prefill" pod if it's within the Prefill block.
The presence of this field triggers the creation of a multi-node (distributed) setup.
This spec defines the configuration for the worker pods, while the main 'Template' field defines the head pod.
The controller is responsible for enabling discovery between head and worker pods.
WorkloadType
Underlying type: string
Appears in:
Possible Values
initContainer
localModelDownloadJob
serving.kserve.io/v1beta1
Package v1beta1 contains API Schema definitions for the serving v1beta1 API group
Resource Kinds
Available Kinds
Kind Definitions
InferenceService
Appears in:
InferenceService is the Schema for the InferenceServices API
Fields
- apiVersion (required): serving.kserve.io/v1beta1
- kind (required): InferenceService
InferenceServiceList
InferenceServiceList contains a list of InferenceService
Fields
- apiVersion (required): serving.kserve.io/v1beta1
- kind (required): InferenceServiceList
Supporting Types
Available Types
- ARTExplainerSpec
- ARTExplainerType
- AuthenticationRef
- AutoScalingSpec
- Batcher
- Component
- ComponentExtensionSpec
- ComponentImplementation
- ComponentStatusSpec
- ComponentType
- CustomExplainer
- CustomPredictor
- CustomTransformer
- DeployConfig
- ExplainerConfig
- ExplainerExtensionSpec
- ExplainerSpec
- ExplainersConfig
- ExtMetricAuthentication
- ExternalMetricSource
- ExternalMetrics
- FailureInfo
- FailureReason
- HuggingFaceRuntimeSpec
- InferenceServiceDefaulter
- InferenceServiceSpec
- InferenceServiceStatus
- InferenceServiceValidator
- InferenceServicesConfig
- IngressConfig
- LightGBMSpec
- LocalModelConfig
- LoggerSpec
- LoggerStorageSpec
- LoggerType
- MetricSourceType
- MetricTarget
- MetricTargetType
- MetricsBackend
- MetricsSpec
- ModelCopies
- ModelFormat
- ModelRevisionStates
- ModelSpec
- ModelState
- ModelStatus
- ModelStorageSpec
- MultiNodeConfig
- ONNXRuntimeSpec
- OtelCollectorConfig
- PMMLSpec
- PaddleServerSpec
- PodMetricSource
- PodMetrics
- PodSpec
- PodsMetricsBackend
- PredictorExtensionSpec
- PredictorImplementation
- PredictorSpec
- ResourceConfig
- ResourceMetric
- ResourceMetricSource
- SKLearnSpec
- ScaleMetric
- SecurityConfig
- ServiceConfig
- StorageSpec
- TFServingSpec
- TorchServeSpec
- TransformerSpec
- TransitionStatus
- TritonSpec
- WorkerSpec
- XGBoostSpec
Type Definitions
ARTExplainerSpec
Appears in:
ARTExplainerSpec defines the arguments for configuring an ART Explanation Server
Fields
- storageUri (required)
- runtimeVersion (required)
- config (required)
ARTExplainerType
Underlying type: string
Appears in:
Possible Values
SquareAttack
AuthenticationRef
Appears in:
Fields
- name (required)
AutoScalingSpec
Appears in:
Fields
Batcher
Appears in:
Batcher specifies optional payload batching available for all components
Fields
- maxBatchSize (optional)
- maxLatency (optional)
- timeout (optional)
ComponentExtensionSpec
Appears in:
ComponentExtensionSpec defines the deployment configuration for a given InferenceService component
Fields
- minReplicas (optional)
- maxReplicas (optional)
- scaleTarget (optional): concurrency and rps targets are supported by Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
Possible values are concurrency, rps, cpu, memory. concurrency and rps are supported via Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
- containerConcurrency (optional): sets the hard limit of the container concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).
- timeout (optional)
- canaryTrafficPercent (optional)
- labels (optional): More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
- annotations (optional): More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
ComponentStatusSpec
Appears in:
ComponentStatusSpec describes the state of the component
Fields
- latestReadyRevision (optional)
- latestCreatedRevision (optional)
- previousRolledoutRevision (optional)
- latestRolledoutRevision (optional)
- traffic (optional)
This will be one of the REST or gRPC endpoints that are available.
It generally has the form http[s]://\{route-name\}.\{route-namespace\}.\{cluster-level-suffix\}
ComponentType
Underlying type: string
Appears in:
ComponentType contains the different types of components of the service
Possible Values
predictor
explainer
transformer
ExplainerConfig
Appears in:
Fields
- image (required)
- defaultImageVersion (required)
ExplainerExtensionSpec
Appears in:
ExplainerExtensionSpec defines configuration shared across all explainer frameworks
Fields
- storageUri (required)
- runtimeVersion (required)
- config (required)
ExplainerSpec
Appears in:
ExplainerSpec defines the container spec for a model explanation server. The following fields follow a "1-of" semantic. Users must specify exactly one spec.
Fields
More info: https://kubernetes.io/docs/concepts/storage/volumes
Init containers are executed in order prior to containers being started. If any
init container fails, the pod is considered to have failed and is handled according
to its restartPolicy. The name for an init container or normal container must be
unique among all containers.
Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes.
The resourceRequirements of an init container are taken into account during scheduling
by finding the highest request/limit for each resource type, and then using the max of
that value or the sum of the normal containers. Limits are applied to init containers
in a similar fashion.
Init containers cannot currently be added or removed.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
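The scheduling rule for init container resources can be written as a one-line formula per resource type: take the highest init container request, then the larger of that value and the sum of the normal containers' requests. The Python sketch below works through that arithmetic for a single resource type expressed as plain integers (for example CPU millicores); it ignores other factors such as pod overhead, and `effective_request` is a hypothetical helper.

```python
def effective_request(init_requests, container_requests):
    """Effective pod request for one resource type.

    init_requests: requests of the init containers, which run one at a time.
    container_requests: requests of the normal containers, which run together.
    """
    highest_init = max(init_requests, default=0)
    return max(highest_init, sum(container_requests))
```

With init requests of 500 and 200 and normal container requests of 100 and 150, the init container dominates and the effective request is 500. With a single init request of 100, the sum of the normal containers (250) wins instead.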
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
pod to perform user-initiated actions such as debugging. This list cannot be specified when
creating a pod, and it cannot be modified by updating the pod spec. In order to add an
ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
One of Always, OnFailure, Never. In some contexts, only a subset of those values may be permitted.
Default to Always.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
- terminationGracePeriodSeconds (optional): Value must be a non-negative integer. The value zero indicates stop immediately via the kill signal (no opportunity to shut down). If this value is nil, the default grace period will be used instead. The grace period is the duration in seconds after the processes running in the pod are sent a termination signal and the time when the processes are forcibly halted with a kill signal. Set this value longer than the expected cleanup time for your process. Defaults to 30 seconds.
- activeDeadlineSeconds (optional): duration in seconds the pod may be active on the node relative to StartTime before the system will actively try to mark it failed and kill associated containers. Value must be a positive integer.
Defaults to "ClusterFirst".
Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
To have DNS options set along with hostNetwork, you have to specify DNS policy
explicitly to 'ClusterFirstWithHostNet'.
- nodeSelector (optional): Selector which must match a node's labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
- serviceAccountName (optional): More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
- serviceAccount (optional): Deprecated: Use serviceAccountName instead.
- automountServiceAccountToken (optional)
- nodeName (optional): If empty, this pod is a candidate for scheduling by the scheduler defined in schedulerName. Once this field is set, the kubelet for this node becomes responsible for the lifecycle of this pod. This field should not be used to express a desire for the pod to be scheduled on a specific node. https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename
- hostNetwork (optional): If this option is set, the ports that will be used must be specified. Default to false.
- hostPID (optional): Default to false.
- hostIPC (optional): Default to false.
- shareProcessNamespace (optional): When this is set, containers will be able to view and signal processes from other containers in the same pod, and the first process in each container will not be assigned PID 1. HostPID and ShareProcessNamespace cannot both be set. Default to false.
Optional: Defaults to empty. See type description for default values of each field.
If specified, these secrets will be passed to individual puller implementations for them to use.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostname
optionalIf not specified, the pod's hostname will be set to a system-defined value.
subdomain
optionalIf not specified, the pod will not have a domainname at all.
schedulerName
optionalIf not specified, the pod will be dispatched by default scheduler.
file if specified.
priorityClassName
optional
"system-node-critical" and "system-cluster-critical" are two special keywords which indicate the
highest priorities with the former being the highest priority. Any other
name must be defined by creating a PriorityClass object with that name.
If not specified, the pod priority will be default or zero if there is no
default.
priority
optionalpriority of the pod. When Priority Admission Controller is enabled, it
prevents users from setting this field. The admission controller populates
this field from PriorityClassName.
The higher the value, the higher the priority.
Parameters specified here will be merged to the generated DNS
configuration based on DNSPolicy.
A pod is ready when all its containers are ready AND
all conditions specified in the readiness gates have status equal to "True"
More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates
runtimeClassName
optionalto run this pod. If no RuntimeClass resource matches the named class, the pod will not be run.
If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an
empty definition that uses the default runtime handler.
More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class
enableServiceLinks
optionalenvironment variables, matching the syntax of Docker links.
Optional: Defaults to true.
One of Never, PreemptLowerPriority.
Defaults to PreemptLowerPriority if unset.
This field will be autopopulated at admission time by the RuntimeClass admission controller. If
the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests.
The RuntimeClass admission controller will reject Pod create requests which have the overhead already
set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value
defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero.
More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md
domains. Scheduler will schedule pods in a way which abides by the constraints.
All topologySpreadConstraints are ANDed.
setHostnameAsFQDN
optionalIn Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname).
In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters to FQDN.
If a pod does not have FQDN, this has no effect.
Default to false.
Some pod and container fields are restricted if this is set.
If the OS field is set to linux, the following fields must be unset:
- securityContext.windowsOptions
If the OS field is set to windows, the following fields must be unset:
- spec.hostPID
- spec.hostIPC
- spec.hostUsers
- spec.securityContext.appArmorProfile
- spec.securityContext.seLinuxOptions
- spec.securityContext.seccompProfile
- spec.securityContext.fsGroup
- spec.securityContext.fsGroupChangePolicy
- spec.securityContext.sysctls
- spec.shareProcessNamespace
- spec.securityContext.runAsUser
- spec.securityContext.runAsGroup
- spec.securityContext.supplementalGroups
- spec.securityContext.supplementalGroupsPolicy
- spec.containers[*].securityContext.appArmorProfile
- spec.containers[*].securityContext.seLinuxOptions
- spec.containers[*].securityContext.seccompProfile
- spec.containers[*].securityContext.capabilities
- spec.containers[*].securityContext.readOnlyRootFilesystem
- spec.containers[*].securityContext.privileged
- spec.containers[*].securityContext.allowPrivilegeEscalation
- spec.containers[*].securityContext.procMount
- spec.containers[*].securityContext.runAsUser
- spec.containers[*].securityContext.runAsGroup
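The Windows restriction list above can be turned into a lookup. The sketch below assumes the pod spec is a plain dict, that the OS field is `os.name` as in the upstream Kubernetes PodSpec, and that "unset" means the key is absent; `windows_conflicts` is a hypothetical helper name.

```python
POD_FIELDS = ["hostPID", "hostIPC", "hostUsers", "shareProcessNamespace"]
POD_SECURITY_CONTEXT_FIELDS = [
    "appArmorProfile", "seLinuxOptions", "seccompProfile", "fsGroup",
    "fsGroupChangePolicy", "sysctls", "runAsUser", "runAsGroup",
    "supplementalGroups", "supplementalGroupsPolicy",
]
CONTAINER_SECURITY_CONTEXT_FIELDS = [
    "appArmorProfile", "seLinuxOptions", "seccompProfile", "capabilities",
    "readOnlyRootFilesystem", "privileged", "allowPrivilegeEscalation",
    "procMount", "runAsUser", "runAsGroup",
]


def windows_conflicts(pod_spec: dict) -> list:
    """List the fields from the table above that are set although os.name is windows."""
    if pod_spec.get("os", {}).get("name") != "windows":
        return []
    found = [f"spec.{k}" for k in POD_FIELDS if k in pod_spec]
    pod_sc = pod_spec.get("securityContext", {})
    found += [f"spec.securityContext.{k}" for k in POD_SECURITY_CONTEXT_FIELDS if k in pod_sc]
    for i, container in enumerate(pod_spec.get("containers", [])):
        sc = container.get("securityContext", {})
        found += [
            f"spec.containers[{i}].securityContext.{k}"
            for k in CONTAINER_SECURITY_CONTEXT_FIELDS
            if k in sc
        ]
    return found
```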
hostUsers
optionalOptional: Default to true.
If set to true or not present, the pod will be run in the host user namespace, useful
for when the pod needs a feature only available to the host user namespace, such as
loading a kernel module with CAP_SYS_MODULE.
When set to false, a new userns is created for the pod. Setting false is useful for
mitigating container breakout vulnerabilities even allowing users to run their
containers as root without actually having root privileges on the host.
This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.
If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the
scheduler will not attempt to schedule the pod.
SchedulingGates can only be set at pod creation time, and be removed only afterwards.
and reserved before the Pod is allowed to start. The resources
will be made available to those containers which consume them
by name.
This is an alpha field and requires enabling the
DynamicResourceAllocation feature gate.
This field is immutable.
containers in the pod. It supports specifying Requests and Limits for
"cpu" and "memory" resource names only. ResourceClaims are not supported.
This field enables fine-grained control over resource allocation for the
entire pod, allowing resource sharing among containers in a pod.
This is an alpha field and requires enabling the PodLevelResources feature
gate.
minReplicas
optional
maxReplicas
optional
scaleTarget
optional
concurrency and rps targets are supported by Knative Pod Autoscaler
(https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
Possible values are concurrency, rps, cpu, and memory. concurrency and rps are supported via
Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
containerConcurrency
optional
concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).
timeout
optional
canaryTrafficPercent
optional
labels
optional
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
annotations
optional
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
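As an illustration of the scaling fields above, the following sketch builds the autoscaling part of a component spec as a plain dict and rejects metrics outside the listed values. The field name `scaleMetric` is taken from the ScaleMetric type in this reference, and both helper names are hypothetical; the numbers are made up.

```python
import json

# Metrics listed for the scale metric; the first two are served by Knative Pod Autoscaler.
KNATIVE_METRICS = {"concurrency", "rps"}
OTHER_METRICS = {"cpu", "memory"}


def autoscaling_fragment(min_replicas: int, max_replicas: int, metric: str, target: int) -> dict:
    """Build the autoscaling fields of a component spec (hypothetical helper)."""
    if metric not in KNATIVE_METRICS | OTHER_METRICS:
        raise ValueError(f"unsupported scaleMetric: {metric!r}")
    return {
        "minReplicas": min_replicas,
        "maxReplicas": max_replicas,
        "scaleMetric": metric,
        "scaleTarget": target,
    }


def uses_knative_autoscaler(metric: str) -> bool:
    """True for the metrics the text attributes to Knative Pod Autoscaler."""
    return metric in KNATIVE_METRICS


fragment = autoscaling_fragment(1, 5, "rps", 20)
print(json.dumps(fragment, indent=2))
```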
ExplainersConfig
Appears in:
Fields
ExtMetricAuthentication
Appears in:
Fields
for more information see: https://keda.sh/docs/2.17/scalers/prometheus/#authentication-parameters
authModes
optional
Possible values are bearer, basic, and tls.
for more information see: https://keda.sh/docs/2.17/scalers/prometheus/#authentication-parameters
ExternalMetricSource
Appears in:
Fields
for more information see: https://keda.sh/docs/2.17/scalers/prometheus/#authentication-parameters
ExternalMetrics
Appears in:
Fields
Possible values are prometheus and graphite.
serverAddress
optional
query
optional
namespace
optional
FailureInfo
Appears in:
Fields
location
optional
message
optional
modelRevisionName
optional
exitCode
optional
FailureReason
Underlying type: string
Appears in:
FailureReason enum
Possible Values
ModelLoadFailed
RuntimeUnhealthy
RuntimeDisabled
NoSupportingRuntime
RuntimeNotRecognized
InvalidPredictorSpec
HuggingFaceRuntimeSpec
Appears in:
HuggingFaceRuntimeSpec defines arguments for configuring HuggingFace model serving.
Fields
storageUri
optional
runtimeVersion
optional
InferenceServiceSpec
Appears in:
InferenceServiceSpec is the top level type for this resource
Fields
explainer service calls to predictor or transformer if it is specified.
transformer service calls to predictor service.
InferenceServiceStatus
Appears in:
InferenceServiceStatus defines the observed state of InferenceService
Fields
observedGeneration
optional
was last processed by the controller.
annotations
required
additional State as well as convey more information to the user. This is
roughly akin to Annotations on any k8s resource, just the reconciler conveying
richer information outwards.
It generally has the form http[s]://\{route-name\}.\{route-namespace\}.\{cluster-level-suffix\}
deploymentMode
required
servingRuntimeName
required
clusterServingRuntimeName
required
LightGBMSpec
Appears in:
LightGBMSpec defines arguments for configuring LightGBM model serving.
Fields
storageUri
optional
runtimeVersion
optional
LoggerSpec
Appears in:
LoggerSpec specifies optional payload logging available for all components
Fields
url
optional
Valid values are:
- "all" (default): log both request and response;
- "request": log only request;
- "response": log only response
metadataHeaders
optional
metadataAnnotations
optional
LoggerStorageSpec
Appears in:
Fields
path
optional
parameters
optional
key
optional
serviceAccountName
required
LoggerType
Underlying type: string
Appears in:
LoggerType controls the scope of log publishing
Possible Values
all
request
response
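A minimal sketch of how the three LoggerType values map to what gets logged, following the mode descriptions under LoggerSpec ("all" logs both request and response and is the default). The function name is hypothetical and not part of KServe.

```python
def payloads_to_log(mode: str = "all") -> set:
    """Return which payloads a given logger mode covers (hypothetical helper)."""
    mapping = {
        "all": {"request", "response"},  # default: log both
        "request": {"request"},          # log only the request
        "response": {"response"},        # log only the response
    }
    if mode not in mapping:
        raise ValueError(f"unknown logger mode: {mode!r}")
    return mapping[mode]
```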
MetricSourceType
Underlying type: string
Appears in:
MetricSourceType indicates the type of metric.
Possible Values
Resource
specified in requests and limits, describing each pod in the current
scale target (e.g. CPU or memory). Such metrics are built in to
Kubernetes, and have special scaling options on top of those available
to normal per-pod metrics (the "pods" source).
External
with any Kubernetes object. It allows autoscaling based on information
coming from components running outside of cluster
(for example length of queue in cloud messaging service, or
QPS from loadbalancer running outside of cluster).
PodMetric
scale target (for example, transactions-processed-per-second). The values
will be averaged together before being compared to the target value.
MetricTarget
Appears in:
MetricTarget defines the target value, average value, or average utilization of a specific metric
Fields
metric across all relevant pods (as a quantity)
averageUtilization
optionalresource metric across all relevant pods, represented as a percentage of
the requested value of the resource for the pods.
Currently only valid for Resource metric source type
MetricTargetType
Underlying type: string
Appears in:
MetricTargetType specifies the type of metric being targeted, and should be either "Value", "AverageValue", or "Utilization"
Possible Values
Utilization
Value
AverageValue
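Each MetricTargetType pairs with one value field of MetricTarget. The sketch below assumes the field names `value` and `averageValue` from the upstream Kubernetes autoscaling types (only `averageUtilization` is named in this reference) and uses a hypothetical helper to check that the matching field is present.

```python
# Which MetricTarget field carries the number for each target type.
# "value" and "averageValue" are assumed from the upstream autoscaling API.
TARGET_FIELD = {
    "Utilization": "averageUtilization",
    "Value": "value",
    "AverageValue": "averageValue",
}


def target_quantity(target: dict):
    """Return the quantity that belongs to the declared target type (hypothetical helper)."""
    target_type = target.get("type")
    if target_type not in TARGET_FIELD:
        raise ValueError(f"type must be one of {sorted(TARGET_FIELD)}, got {target_type!r}")
    field = TARGET_FIELD[target_type]
    if field not in target:
        raise ValueError(f"type {target_type} expects the field {field!r} to be set")
    return target[field]
```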
MetricsBackend
Underlying type: string
Appears in:
MetricsBackend enum
Possible Values
prometheus
graphite
MetricsSpec
Appears in:
MetricsSpec specifies how to scale based on a single metric (only type and one other matching field should be set at once).
Fields
"Resource" or "External" each mapping to a matching field in the object.
requests and limits) known to Kubernetes describing each pod in the
current scale target (e.g. CPU or memory). Such metrics are built in to
Kubernetes, and have special scaling options on top of those available
to normal per-pod metrics using the "pods" source.
with any Kubernetes object. It allows autoscaling based on information
coming from components running outside of cluster
(for example length of queue in cloud messaging service, or
QPS from load balancer running outside of cluster).
(for example, transactions-processed-per-second). The values will be
averaged together before being compared to the target value.
ModelCopies
Appears in:
Fields
failedCopies
required
totalCopies
optional
ModelFormat
Appears in:
Fields
name
required
version
optional
Used in validating that a predictor is supported by a runtime.
Can be "major", "major.minor" or "major.minor.patch".
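The three accepted shapes of the version string can be checked with a regular expression. This sketch assumes each component is a non-negative decimal number, which the text does not state explicitly; `version_granularity` is a hypothetical helper.

```python
import re

# One to three dot-separated numeric components (numeric components are an assumption).
VERSION_RE = re.compile(r"\d+(\.\d+){0,2}")


def version_granularity(version: str) -> str:
    """Classify a model format version as major, major.minor or major.minor.patch."""
    if not VERSION_RE.fullmatch(version):
        raise ValueError(f"not a major[.minor[.patch]] version: {version!r}")
    return ("major", "major.minor", "major.minor.patch")[version.count(".")]
```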
ModelRevisionStates
Appears in:
Fields
ModelSpec
Appears in:
Fields
runtime
optional
storageUri
optional
runtimeVersion
optional
ModelState
Underlying type: string
Appears in:
ModelState enum
Possible Values
Pending
Standby
Loading
Loaded
FailedToLoad
ModelStatus
Appears in:
Fields
ModelStorageSpec
Appears in:
- ARTExplainerSpec
- ExplainerExtensionSpec
- HuggingFaceRuntimeSpec
- LightGBMSpec
- ModelSpec
- ONNXRuntimeSpec
- PMMLSpec
- PaddleServerSpec
- PredictorExtensionSpec
- SKLearnSpec
- TFServingSpec
- TorchServeSpec
- TritonSpec
- XGBoostSpec
Fields
path
optional
parameters
optional
key
optional
schemaPath
optional
ONNXRuntimeSpec
Appears in:
ONNXRuntimeSpec defines arguments for configuring ONNX model serving.
Fields
storageUri
optional
runtimeVersion
optional
PMMLSpec
Appears in:
PMMLSpec defines arguments for configuring PMML model serving.
Fields
storageUri
optional
runtimeVersion
optional
PaddleServerSpec
Appears in:
Fields
storageUri
optional
runtimeVersion
optional
PodMetricSource
Appears in:
PodMetricSource indicates how to scale on a metric describing each pod in the current scale target (for example, transactions-processed-per-second). The values will be averaged together before being compared to the target value.
Fields
PodMetrics
Appears in:
Fields
Possible value: opentelemetry.
serverAddress
optional
metricNames
optional
query
optional
operationOverTime
optional
Possible values are last_one, avg, max, min, rate, count. Default is 'last_one'.
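To make the operationOverTime values concrete, here is a sketch that reduces a list of metric samples to one number. The semantics are inferred from the value names alone and are an assumption, not the behaviour of any particular metrics backend; `rate` is left out because it needs the length of the time window, which is not modelled here.

```python
from statistics import mean


def reduce_samples(samples: list, operation: str = "last_one") -> float:
    """Collapse samples (oldest first) into one value; semantics assumed from the names."""
    if not samples:
        raise ValueError("no samples to aggregate")
    if operation == "rate":
        raise NotImplementedError("rate needs the window length, which this sketch does not model")
    operations = {
        "last_one": lambda s: s[-1],  # most recent sample (documented default)
        "avg": mean,
        "max": max,
        "min": min,
        "count": len,
    }
    if operation not in operations:
        raise ValueError(f"unknown operationOverTime: {operation!r}")
    return operations[operation](samples)
```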
PodSpec
Appears in:
PodSpec is a description of a pod.
Fields
More info: https://kubernetes.io/docs/concepts/storage/volumes
Init containers are executed in order prior to containers being started. If any
init container fails, the pod is considered to have failed and is handled according
to its restartPolicy. The name for an init container or normal container must be
unique among all containers.
Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes.
The resourceRequirements of an init container are taken into account during scheduling
by finding the highest request/limit for each resource type, and then using the max of
that value or the sum of the normal containers. Limits are applied to init containers
in a similar fashion.
Init containers cannot currently be added or removed.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
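The scheduling rule for init containers described above (take the highest init container request per resource, then the larger of that and the sum over normal containers) can be worked through in a few lines. The sketch uses plain integers, for example millicores and MiB, instead of Kubernetes quantity strings, and the function name is hypothetical.

```python
def effective_requests(init_containers: list, containers: list) -> dict:
    """Per resource: max(highest init container request, sum of normal container requests)."""
    resources = {name for c in init_containers + containers for name in c}
    result = {}
    for name in resources:
        highest_init = max((c.get(name, 0) for c in init_containers), default=0)
        total_normal = sum(c.get(name, 0) for c in containers)
        result[name] = max(highest_init, total_normal)
    return result


# Two init containers run one after another; two normal containers run together.
init = [{"cpu": 500, "memory": 128}, {"cpu": 100, "memory": 1024}]
normal = [{"cpu": 200, "memory": 256}, {"cpu": 200, "memory": 256}]
print(effective_requests(init, normal))
```

Here the init containers dominate both resources: cpu is max(500, 400) and memory is max(1024, 512).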
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
pod to perform user-initiated actions such as debugging. This list cannot be specified when
creating a pod, and it cannot be modified by updating the pod spec. In order to add an
ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
One of Always, OnFailure, Never. In some contexts, only a subset of those values may be permitted.
Default to Always.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
terminationGracePeriodSeconds
optionalValue must be non-negative integer. The value zero indicates stop immediately via
the kill signal (no opportunity to shut down).
If this value is nil, the default grace period will be used instead.
The grace period is the duration in seconds after the processes running in the pod are sent
a termination signal and the time when the processes are forcibly halted with a kill signal.
Set this value longer than the expected cleanup time for your process.
Defaults to 30 seconds.
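A small sketch of the grace period rules above: an unset value falls back to the 30 second default, zero means an immediate kill, and negative values are rejected. The helper name is hypothetical.

```python
DEFAULT_GRACE_PERIOD_SECONDS = 30  # documented default


def resolve_grace_period(value=None) -> int:
    """Return the grace period in seconds that applies for a given field value."""
    if value is None:
        return DEFAULT_GRACE_PERIOD_SECONDS
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError("terminationGracePeriodSeconds must be a non-negative integer")
    return value  # 0 means stop immediately via the kill signal
```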
activeDeadlineSeconds
optionalStartTime before the system will actively try to mark it failed and kill associated containers.
Value must be a positive integer.
Defaults to "ClusterFirst".
Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
To have DNS options set along with hostNetwork, you have to specify DNS policy
explicitly to 'ClusterFirstWithHostNet'.
nodeSelector
optionalSelector which must match a node's labels for the pod to be scheduled on that node.
More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
serviceAccountName
optionalMore info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccount
optionalDeprecated: Use serviceAccountName instead.
automountServiceAccountToken
optional
nodeName
optional
If empty, this pod is a candidate for scheduling by the scheduler defined in schedulerName.
Once this field is set, the kubelet for this node becomes responsible for the lifecycle of this pod.
This field should not be used to express a desire for the pod to be scheduled on a specific node.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename
hostNetwork
optionalIf this option is set, the ports that will be used must be specified.
Default to false.
hostPID
optionalOptional: Default to false.
hostIPC
optionalOptional: Default to false.
shareProcessNamespace
optionalWhen this is set containers will be able to view and signal processes from other containers
in the same pod, and the first process in each container will not be assigned PID 1.
HostPID and ShareProcessNamespace cannot both be set.
Optional: Default to false.
Optional: Defaults to empty. See type description for default values of each field.
If specified, these secrets will be passed to individual puller implementations for them to use.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostname
optionalIf not specified, the pod's hostname will be set to a system-defined value.
subdomain
optionalIf not specified, the pod will not have a domainname at all.
schedulerName
optionalIf not specified, the pod will be dispatched by default scheduler.
file if specified.
priorityClassName
optional
"system-node-critical" and "system-cluster-critical" are two special keywords which indicate the
highest priorities with the former being the highest priority. Any other
name must be defined by creating a PriorityClass object with that name.
If not specified, the pod priority will be default or zero if there is no
default.
priority
optionalpriority of the pod. When Priority Admission Controller is enabled, it
prevents users from setting this field. The admission controller populates
this field from PriorityClassName.
The higher the value, the higher the priority.
Parameters specified here will be merged to the generated DNS
configuration based on DNSPolicy.
A pod is ready when all its containers are ready AND
all conditions specified in the readiness gates have status equal to "True"
More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates
runtimeClassName
optionalto run this pod. If no RuntimeClass resource matches the named class, the pod will not be run.
If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an
empty definition that uses the default runtime handler.
More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class
enableServiceLinks
optionalenvironment variables, matching the syntax of Docker links.
Optional: Defaults to true.
One of Never, PreemptLowerPriority.
Defaults to PreemptLowerPriority if unset.
This field will be autopopulated at admission time by the RuntimeClass admission controller. If
the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests.
The RuntimeClass admission controller will reject Pod create requests which have the overhead already
set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value
defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero.
More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md
domains. Scheduler will schedule pods in a way which abides by the constraints.
All topologySpreadConstraints are ANDed.
setHostnameAsFQDN
optionalIn Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname).
In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters to FQDN.
If a pod does not have FQDN, this has no effect.
Default to false.
Some pod and container fields are restricted if this is set.
If the OS field is set to linux, the following fields must be unset:
- securityContext.windowsOptions
If the OS field is set to windows, the following fields must be unset:
- spec.hostPID
- spec.hostIPC
- spec.hostUsers
- spec.securityContext.appArmorProfile
- spec.securityContext.seLinuxOptions
- spec.securityContext.seccompProfile
- spec.securityContext.fsGroup
- spec.securityContext.fsGroupChangePolicy
- spec.securityContext.sysctls
- spec.shareProcessNamespace
- spec.securityContext.runAsUser
- spec.securityContext.runAsGroup
- spec.securityContext.supplementalGroups
- spec.securityContext.supplementalGroupsPolicy
- spec.containers[*].securityContext.appArmorProfile
- spec.containers[*].securityContext.seLinuxOptions
- spec.containers[*].securityContext.seccompProfile
- spec.containers[*].securityContext.capabilities
- spec.containers[*].securityContext.readOnlyRootFilesystem
- spec.containers[*].securityContext.privileged
- spec.containers[*].securityContext.allowPrivilegeEscalation
- spec.containers[*].securityContext.procMount
- spec.containers[*].securityContext.runAsUser
- spec.containers[*].securityContext.runAsGroup
hostUsers
optionalOptional: Default to true.
If set to true or not present, the pod will be run in the host user namespace, useful
for when the pod needs a feature only available to the host user namespace, such as
loading a kernel module with CAP_SYS_MODULE.
When set to false, a new userns is created for the pod. Setting false is useful for
mitigating container breakout vulnerabilities even allowing users to run their
containers as root without actually having root privileges on the host.
This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.
If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the
scheduler will not attempt to schedule the pod.
SchedulingGates can only be set at pod creation time, and be removed only afterwards.
and reserved before the Pod is allowed to start. The resources
will be made available to those containers which consume them
by name.
This is an alpha field and requires enabling the
DynamicResourceAllocation feature gate.
This field is immutable.
containers in the pod. It supports specifying Requests and Limits for
"cpu" and "memory" resource names only. ResourceClaims are not supported.
This field enables fine-grained control over resource allocation for the
entire pod, allowing resource sharing among containers in a pod.
This is an alpha field and requires enabling the PodLevelResources feature
gate.
PodsMetricsBackend
Underlying type: string
Appears in:
PodsMetricsBackend enum
Possible Values
opentelemetry
PredictorExtensionSpec
Appears in:
- HuggingFaceRuntimeSpec
- LightGBMSpec
- ModelSpec
- ONNXRuntimeSpec
- PMMLSpec
- PaddleServerSpec
- SKLearnSpec
- TFServingSpec
- TorchServeSpec
- TritonSpec
- XGBoostSpec
PredictorExtensionSpec defines configuration shared across all predictor frameworks
Fields
storageUri
optional
runtimeVersion
optional
PredictorSpec
Appears in:
PredictorSpec defines the configuration for a predictor. The following fields follow a "1-of" semantic: users must specify exactly one spec.
Fields
More info: https://kubernetes.io/docs/concepts/storage/volumes
Init containers are executed in order prior to containers being started. If any
init container fails, the pod is considered to have failed and is handled according
to its restartPolicy. The name for an init container or normal container must be
unique among all containers.
Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes.
The resourceRequirements of an init container are taken into account during scheduling
by finding the highest request/limit for each resource type, and then using the max of
that value or the sum of the normal containers. Limits are applied to init containers
in a similar fashion.
Init containers cannot currently be added or removed.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
pod to perform user-initiated actions such as debugging. This list cannot be specified when
creating a pod, and it cannot be modified by updating the pod spec. In order to add an
ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
One of Always, OnFailure, Never. In some contexts, only a subset of those values may be permitted.
Default to Always.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
terminationGracePeriodSeconds
optionalValue must be non-negative integer. The value zero indicates stop immediately via
the kill signal (no opportunity to shut down).
If this value is nil, the default grace period will be used instead.
The grace period is the duration in seconds after the processes running in the pod are sent
a termination signal and the time when the processes are forcibly halted with a kill signal.
Set this value longer than the expected cleanup time for your process.
Defaults to 30 seconds.
activeDeadlineSeconds
optionalStartTime before the system will actively try to mark it failed and kill associated containers.
Value must be a positive integer.
Defaults to "ClusterFirst".
Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
To have DNS options set along with hostNetwork, you have to specify DNS policy
explicitly to 'ClusterFirstWithHostNet'.
nodeSelector
optionalSelector which must match a node's labels for the pod to be scheduled on that node.
More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
serviceAccountName
optionalMore info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccount
optionalDeprecated: Use serviceAccountName instead.
automountServiceAccountToken
optional
nodeName
optional
If empty, this pod is a candidate for scheduling by the scheduler defined in schedulerName.
Once this field is set, the kubelet for this node becomes responsible for the lifecycle of this pod.
This field should not be used to express a desire for the pod to be scheduled on a specific node.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename
hostNetwork
optionalIf this option is set, the ports that will be used must be specified.
Default to false.
hostPID
optionalOptional: Default to false.
hostIPC
optionalOptional: Default to false.
shareProcessNamespace
optionalWhen this is set containers will be able to view and signal processes from other containers
in the same pod, and the first process in each container will not be assigned PID 1.
HostPID and ShareProcessNamespace cannot both be set.
Optional: Default to false.
Optional: Defaults to empty. See type description for default values of each field.
If specified, these secrets will be passed to individual puller implementations for them to use.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostname
optionalIf not specified, the pod's hostname will be set to a system-defined value.
subdomain
optionalIf not specified, the pod will not have a domainname at all.
schedulerName
optionalIf not specified, the pod will be dispatched by default scheduler.
file if specified.
priorityClassName
optional
"system-node-critical" and "system-cluster-critical" are two special keywords which indicate the
highest priorities with the former being the highest priority. Any other
name must be defined by creating a PriorityClass object with that name.
If not specified, the pod priority will be default or zero if there is no
default.
priority
optionalpriority of the pod. When Priority Admission Controller is enabled, it
prevents users from setting this field. The admission controller populates
this field from PriorityClassName.
The higher the value, the higher the priority.
Parameters specified here will be merged to the generated DNS
configuration based on DNSPolicy.
A pod is ready when all its containers are ready AND
all conditions specified in the readiness gates have status equal to "True"
More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates
runtimeClassName
optionalto run this pod. If no RuntimeClass resource matches the named class, the pod will not be run.
If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an
empty definition that uses the default runtime handler.
More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class
enableServiceLinks
optionalenvironment variables, matching the syntax of Docker links.
Optional: Defaults to true.
One of Never, PreemptLowerPriority.
Defaults to PreemptLowerPriority if unset.
This field will be autopopulated at admission time by the RuntimeClass admission controller. If
the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests.
The RuntimeClass admission controller will reject Pod create requests which have the overhead already
set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value
defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero.
More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md
domains. Scheduler will schedule pods in a way which abides by the constraints.
All topologySpreadConstraints are ANDed.
setHostnameAsFQDN
optionalIn Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname).
In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters to FQDN.
If a pod does not have FQDN, this has no effect.
Default to false.
Some pod and container fields are restricted if this is set.
If the OS field is set to linux, the following fields must be unset:
- securityContext.windowsOptions
If the OS field is set to windows, the following fields must be unset:
- spec.hostPID
- spec.hostIPC
- spec.hostUsers
- spec.securityContext.appArmorProfile
- spec.securityContext.seLinuxOptions
- spec.securityContext.seccompProfile
- spec.securityContext.fsGroup
- spec.securityContext.fsGroupChangePolicy
- spec.securityContext.sysctls
- spec.shareProcessNamespace
- spec.securityContext.runAsUser
- spec.securityContext.runAsGroup
- spec.securityContext.supplementalGroups
- spec.securityContext.supplementalGroupsPolicy
- spec.containers[*].securityContext.appArmorProfile
- spec.containers[*].securityContext.seLinuxOptions
- spec.containers[*].securityContext.seccompProfile
- spec.containers[*].securityContext.capabilities
- spec.containers[*].securityContext.readOnlyRootFilesystem
- spec.containers[*].securityContext.privileged
- spec.containers[*].securityContext.allowPrivilegeEscalation
- spec.containers[*].securityContext.procMount
- spec.containers[*].securityContext.runAsUser
- spec.containers[*].securityContext.runAsGroup
hostUsers
optionalOptional: Default to true.
If set to true or not present, the pod will be run in the host user namespace, useful
for when the pod needs a feature only available to the host user namespace, such as
loading a kernel module with CAP_SYS_MODULE.
When set to false, a new userns is created for the pod. Setting false is useful for
mitigating container breakout vulnerabilities even allowing users to run their
containers as root without actually having root privileges on the host.
This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.
If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the
scheduler will not attempt to schedule the pod.
SchedulingGates can only be set at pod creation time, and be removed only afterwards.
and reserved before the Pod is allowed to start. The resources
will be made available to those containers which consume them
by name.
This is an alpha field and requires enabling the
DynamicResourceAllocation feature gate.
This field is immutable.
containers in the pod. It supports specifying Requests and Limits for
"cpu" and "memory" resource names only. ResourceClaims are not supported.
This field enables fine-grained control over resource allocation for the
entire pod, allowing resource sharing among containers in a pod.
This is an alpha field and requires enabling the PodLevelResources feature
gate.
minReplicas
optional
maxReplicas
optional
scaleTarget
optional
concurrency and rps targets are supported by Knative Pod Autoscaler
(https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
Possible values are concurrency, rps, cpu, memory. concurrency and rps are supported via
Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
containerConcurrency
optional
concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).
timeout
optional
canaryTrafficPercent
optional
labels
optional
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
annotations
optional
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
ResourceConfig
Appears in:
Fields
cpuLimit
required
memoryLimit
required
cpuRequest
required
memoryRequest
required
ResourceMetric
Underlying type: string
Appears in:
ResourceMetric enum
Possible Values
cpu
memory
ResourceMetricSource
Appears in:
Fields
SKLearnSpec
Appears in:
SKLearnSpec defines arguments for configuring SKLearn model serving.
Fields
storageUri
optional
runtimeVersion
optional
ScaleMetric
Underlying type: string
Appears in:
ScaleMetric enum
Possible Values
cpu
memory
concurrency
rps
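As an illustration of how client code might handle these values, the sketch below mirrors the enum in Python and separates the two metrics that the field description attributes to the Knative Pod Autoscaler. The class and the helper `uses_knative_autoscaler` are hypothetical names, not part of any KServe client library.

```python
from enum import Enum


class ScaleMetric(str, Enum):
    """Illustrative mirror of the ScaleMetric values listed above."""
    CPU = "cpu"
    MEMORY = "memory"
    CONCURRENCY = "concurrency"
    RPS = "rps"


# Per the field description, these two are supported via the Knative Pod Autoscaler.
KNATIVE_METRICS = {ScaleMetric.CONCURRENCY, ScaleMetric.RPS}


def uses_knative_autoscaler(metric):
    """Return True if the metric is one the Knative Pod Autoscaler supports.

    Raises ValueError for a string that is not a ScaleMetric value.
    """
    return ScaleMetric(metric) in KNATIVE_METRICS


print(uses_knative_autoscaler("rps"))
print(uses_knative_autoscaler("cpu"))
```

Converting through the enum first means an unknown metric name fails loudly instead of silently being treated as unsupported.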
StorageSpec
Appears in:
StorageSpec defines a spec for an object in an object store
Fields
path
optional
parameters
optional
key
optional
TFServingSpec
Appears in:
TFServingSpec defines arguments for configuring Tensorflow model serving.
Fields
storageUri
optional
runtimeVersion
optional
TorchServeSpec
Appears in:
TorchServeSpec defines arguments for configuring PyTorch model serving.
Fields
storageUri
optional
runtimeVersion
optional
TransformerSpec
Appears in:
TransformerSpec defines transformer service for pre/post processing
Fields
More info: https://kubernetes.io/docs/concepts/storage/volumes
Init containers are executed in order prior to containers being started. If any
init container fails, the pod is considered to have failed and is handled according
to its restartPolicy. The name for an init container or normal container must be
unique among all containers.
Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes.
The resourceRequirements of an init container are taken into account during scheduling
by finding the highest request/limit for each resource type, and then using the max of
that value or the sum of the normal containers. Limits are applied to init containers
in a similar fashion.
Init containers cannot currently be added or removed.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
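The scheduling rule described above amounts to taking, per resource type, the larger of the highest init container request and the sum of the normal container requests. A small worked sketch follows, assuming requests are already parsed into plain numbers (for example CPU in millicores); `effective_request` is a hypothetical name, and the sketch covers only the rule as stated here.

```python
def effective_request(init_requests, container_requests):
    """Compute the per-resource request used for scheduling.

    Both arguments are lists of dicts mapping a resource name to a numeric
    amount. Units are the caller's choice (e.g. millicores, MiB).
    """
    resources = set()
    for req in init_requests + container_requests:
        resources.update(req)

    result = {}
    for name in sorted(resources):
        # Init containers run one at a time, so only the largest one matters.
        highest_init = max((r.get(name, 0) for r in init_requests), default=0)
        # Normal containers run together, so their requests add up.
        total_normal = sum(r.get(name, 0) for r in container_requests)
        result[name] = max(highest_init, total_normal)
    return result


init = [{"cpu": 500}, {"cpu": 2000, "memory": 256}]
normal = [{"cpu": 300, "memory": 512}, {"cpu": 700, "memory": 512}]
print(effective_request(init, normal))  # {'cpu': 2000, 'memory': 1024}
```

Here the CPU figure is driven by the second init container (2000 exceeds 300 + 700), while the memory figure is driven by the normal containers (512 + 512 exceeds 256).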
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
pod to perform user-initiated actions such as debugging. This list cannot be specified when
creating a pod, and it cannot be modified by updating the pod spec. In order to add an
ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
One of Always, OnFailure, Never. In some contexts, only a subset of those values may be permitted.
Default to Always.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
terminationGracePeriodSeconds
optional
Value must be a non-negative integer. The value zero indicates stop immediately via
the kill signal (no opportunity to shut down).
If this value is nil, the default grace period will be used instead.
The grace period is the duration in seconds after the processes running in the pod are sent
a termination signal and the time when the processes are forcibly halted with a kill signal.
Set this value longer than the expected cleanup time for your process.
Defaults to 30 seconds.
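The rules for this field (non-negative integer, zero meaning immediate stop, nil falling back to the default) can be sketched as a small helper. This is an illustration only; `effective_grace_period` is a hypothetical name and the real defaulting happens on the server side.

```python
DEFAULT_GRACE_PERIOD_SECONDS = 30  # default stated in the field description


def effective_grace_period(value=None):
    """Return the grace period in seconds, applying the documented default.

    None stands in for an unset (nil) field. Zero is valid and means the
    processes are stopped immediately with no opportunity to shut down.
    """
    if value is None:
        return DEFAULT_GRACE_PERIOD_SECONDS
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError("terminationGracePeriodSeconds must be a non-negative integer")
    return value


print(effective_grace_period())     # 30
print(effective_grace_period(0))    # 0
print(effective_grace_period(120))  # 120
```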
activeDeadlineSeconds
optional
StartTime before the system will actively try to mark it failed and kill associated containers.
Value must be a positive integer.
Defaults to "ClusterFirst".
Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
To have DNS options set along with hostNetwork, you have to specify DNS policy
explicitly to 'ClusterFirstWithHostNet'.
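A pre-flight check for the DNS policy rules above might look like the following sketch. It assumes the pod spec is a plain dictionary using the Kubernetes field names `dnsPolicy` and `hostNetwork`, and it reads the last sentence above as a recommendation to flag rather than a hard error; `check_dns_policy` is a hypothetical helper.

```python
VALID_DNS_POLICIES = {"ClusterFirstWithHostNet", "ClusterFirst", "Default", "None"}


def check_dns_policy(spec):
    """Return a list of notes about dnsPolicy in a pod-spec-like dict."""
    notes = []
    policy = spec.get("dnsPolicy", "ClusterFirst")  # documented default
    if policy not in VALID_DNS_POLICIES:
        notes.append(f"invalid dnsPolicy: {policy!r}")
    # With hostNetwork, the policy has to be set explicitly for DNS options to apply.
    if spec.get("hostNetwork") and spec.get("dnsPolicy") != "ClusterFirstWithHostNet":
        notes.append(
            "hostNetwork is set: specify dnsPolicy 'ClusterFirstWithHostNet' "
            "explicitly to have DNS options set along with hostNetwork"
        )
    return notes


print(check_dns_policy({"hostNetwork": True}))
print(check_dns_policy({"hostNetwork": True, "dnsPolicy": "ClusterFirstWithHostNet"}))
```

The first call returns one note because the policy is left at its default; the second returns an empty list.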
nodeSelector
optional
Selector which must match a node's labels for the pod to be scheduled on that node.
More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
serviceAccountName
optional
More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccount
optional
Deprecated: Use serviceAccountName instead.
automountServiceAccountToken
optional
nodeName
optional
If empty, this pod is a candidate for scheduling by the scheduler defined in schedulerName.
Once this field is set, the kubelet for this node becomes responsible for the lifecycle of this pod.
This field should not be used to express a desire for the pod to be scheduled on a specific node.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename
hostNetwork
optional
If this option is set, the ports that will be used must be specified.
Default to false.
hostPID
optional
Optional: Default to false.
hostIPC
optional
Optional: Default to false.
shareProcessNamespace
optional
When this is set containers will be able to view and signal processes from other containers
in the same pod, and the first process in each container will not be assigned PID 1.
HostPID and ShareProcessNamespace cannot both be set.
Optional: Default to false.
Optional: Defaults to empty. See type description for default values of each field.
If specified, these secrets will be passed to individual puller implementations for them to use.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostname
optional
If not specified, the pod's hostname will be set to a system-defined value.
subdomain
optional
If not specified, the pod will not have a domainname at all.
schedulerName
optional
If not specified, the pod will be dispatched by default scheduler.
file if specified.
priorityClassName
optional
"system-node-critical" and "system-cluster-critical" are two special keywords which indicate the
highest priorities with the former being the highest priority. Any other
name must be defined by creating a PriorityClass object with that name.
If not specified, the pod priority will be default or zero if there is no
default.
priority
optional
priority of the pod. When Priority Admission Controller is enabled, it
prevents users from setting this field. The admission controller populates
this field from PriorityClassName.
The higher the value, the higher the priority.
Parameters specified here will be merged to the generated DNS
configuration based on DNSPolicy.
A pod is ready when all its containers are ready AND
all conditions specified in the readiness gates have status equal to "True"
More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates
runtimeClassName
optional
to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run.
If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an
empty definition that uses the default runtime handler.
More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class
enableServiceLinks
optional
environment variables, matching the syntax of Docker links.
Optional: Defaults to true.
One of Never, PreemptLowerPriority.
Defaults to PreemptLowerPriority if unset.
This field will be autopopulated at admission time by the RuntimeClass admission controller. If
the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests.
The RuntimeClass admission controller will reject Pod create requests which have the overhead already
set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value
defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero.
More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md
domains. Scheduler will schedule pods in a way which abides by the constraints.
All topologySpreadConstraints are ANDed.
setHostnameAsFQDN
optional
In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname).
In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN.
If a pod does not have FQDN, this has no effect.
Default to false.
Some pod and container fields are restricted if this is set.
If the OS field is set to linux, the following fields must be unset:
- securityContext.windowsOptions
If the OS field is set to windows, the following fields must be unset:
- spec.hostPID
- spec.hostIPC
- spec.hostUsers
- spec.securityContext.appArmorProfile
- spec.securityContext.seLinuxOptions
- spec.securityContext.seccompProfile
- spec.securityContext.fsGroup
- spec.securityContext.fsGroupChangePolicy
- spec.securityContext.sysctls
- spec.shareProcessNamespace
- spec.securityContext.runAsUser
- spec.securityContext.runAsGroup
- spec.securityContext.supplementalGroups
- spec.securityContext.supplementalGroupsPolicy
- spec.containers[*].securityContext.appArmorProfile
- spec.containers[*].securityContext.seLinuxOptions
- spec.containers[*].securityContext.seccompProfile
- spec.containers[*].securityContext.capabilities
- spec.containers[*].securityContext.readOnlyRootFilesystem
- spec.containers[*].securityContext.privileged
- spec.containers[*].securityContext.allowPrivilegeEscalation
- spec.containers[*].securityContext.procMount
- spec.containers[*].securityContext.runAsUser
- spec.containers[*].securityContext.runAsGroup
hostUsers
optional
Optional: Defaults to true.
If set to true or not present, the pod will be run in the host user namespace, useful
for when the pod needs a feature only available to the host user namespace, such as
loading a kernel module with CAP_SYS_MODULE.
When set to false, a new userns is created for the pod. Setting false is useful for
mitigating container breakout vulnerabilities, even allowing users to run their
containers as root without actually having root privileges on the host.
This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.
If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the
scheduler will not attempt to schedule the pod.
SchedulingGates can only be set at pod creation time, and can only be removed afterwards.
and reserved before the Pod is allowed to start. The resources
will be made available to those containers which consume them
by name.
This is an alpha field and requires enabling the
DynamicResourceAllocation feature gate.
This field is immutable.
containers in the pod. It supports specifying Requests and Limits for
"cpu" and "memory" resource names only. ResourceClaims are not supported.
This field enables fine-grained control over resource allocation for the
entire pod, allowing resource sharing among containers in a pod.
This is an alpha field and requires enabling the PodLevelResources feature
gate.
minReplicas
optional
maxReplicas
optional
scaleTarget
optional
concurrency and rps targets are supported by Knative Pod Autoscaler
(https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
Possible values are concurrency, rps, cpu, memory. concurrency and rps are supported via
Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
containerConcurrency
optional
concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).
timeout
optional
canaryTrafficPercent
optional
labels
optional
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
annotations
optional
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
TransitionStatus
Underlying type: string
Appears in:
TransitionStatus enum
Possible Values
UpToDate
InProgress
BlockedByFailedLoad
InvalidSpec
TritonSpec
Appears in:
TritonSpec defines arguments for configuring Triton model serving.
Fields
storageUri
optional
runtimeVersion
optional
WorkerSpec
Appears in:
Fields
More info: https://kubernetes.io/docs/concepts/storage/volumes
Init containers are executed in order prior to containers being started. If any
init container fails, the pod is considered to have failed and is handled according
to its restartPolicy. The name for an init container or normal container must be
unique among all containers.
Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes.
The resourceRequirements of an init container are taken into account during scheduling
by finding the highest request/limit for each resource type, and then using the max of
that value or the sum of the normal containers. Limits are applied to init containers
in a similar fashion.
Init containers cannot currently be added or removed.
Cannot be updated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
Containers cannot currently be added or removed.
There must be at least one container in a Pod.
Cannot be updated.
pod to perform user-initiated actions such as debugging. This list cannot be specified when
creating a pod, and it cannot be modified by updating the pod spec. In order to add an
ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
One of Always, OnFailure, Never. In some contexts, only a subset of those values may be permitted.
Default to Always.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
terminationGracePeriodSeconds
optional
Value must be a non-negative integer. The value zero indicates stop immediately via
the kill signal (no opportunity to shut down).
If this value is nil, the default grace period will be used instead.
The grace period is the duration in seconds after the processes running in the pod are sent
a termination signal and the time when the processes are forcibly halted with a kill signal.
Set this value longer than the expected cleanup time for your process.
Defaults to 30 seconds.
activeDeadlineSeconds
optional
StartTime before the system will actively try to mark it failed and kill associated containers.
Value must be a positive integer.
Defaults to "ClusterFirst".
Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
To have DNS options set along with hostNetwork, you have to specify DNS policy
explicitly to 'ClusterFirstWithHostNet'.
nodeSelector
optional
Selector which must match a node's labels for the pod to be scheduled on that node.
More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
serviceAccountName
optional
More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccount
optional
Deprecated: Use serviceAccountName instead.
automountServiceAccountToken
optional
nodeName
optional
If empty, this pod is a candidate for scheduling by the scheduler defined in schedulerName.
Once this field is set, the kubelet for this node becomes responsible for the lifecycle of this pod.
This field should not be used to express a desire for the pod to be scheduled on a specific node.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename
hostNetwork
optional
If this option is set, the ports that will be used must be specified.
Default to false.
hostPID
optional
Optional: Default to false.
hostIPC
optional
Optional: Default to false.
shareProcessNamespace
optional
When this is set containers will be able to view and signal processes from other containers
in the same pod, and the first process in each container will not be assigned PID 1.
HostPID and ShareProcessNamespace cannot both be set.
Optional: Default to false.
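The mutual exclusion between HostPID and ShareProcessNamespace stated above is easy to check on the client side before submitting a spec. A minimal sketch, assuming the pod spec is a plain dictionary using the field names `hostPID` and `shareProcessNamespace`; `check_process_namespace` is a hypothetical helper, not an API of KServe or Kubernetes.

```python
def check_process_namespace(spec):
    """Raise ValueError if hostPID and shareProcessNamespace are both enabled.

    Both fields default to false, so a missing key is treated as False.
    """
    if spec.get("hostPID", False) and spec.get("shareProcessNamespace", False):
        raise ValueError("hostPID and shareProcessNamespace cannot both be set")
    return True


print(check_process_namespace({"shareProcessNamespace": True}))  # True

try:
    check_process_namespace({"hostPID": True, "shareProcessNamespace": True})
except ValueError as err:
    print(f"rejected: {err}")
```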
Optional: Defaults to empty. See type description for default values of each field.
If specified, these secrets will be passed to individual puller implementations for them to use.
More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
hostname
optional
If not specified, the pod's hostname will be set to a system-defined value.
subdomain
optional
If not specified, the pod will not have a domainname at all.
schedulerName
optional
If not specified, the pod will be dispatched by default scheduler.
file if specified.
priorityClassName
optional
"system-node-critical" and "system-cluster-critical" are two special keywords which indicate the
highest priorities with the former being the highest priority. Any other
name must be defined by creating a PriorityClass object with that name.
If not specified, the pod priority will be default or zero if there is no
default.
priority
optional
priority of the pod. When Priority Admission Controller is enabled, it
prevents users from setting this field. The admission controller populates
this field from PriorityClassName.
The higher the value, the higher the priority.
Parameters specified here will be merged to the generated DNS
configuration based on DNSPolicy.
A pod is ready when all its containers are ready AND
all conditions specified in the readiness gates have status equal to "True"
More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates
runtimeClassName
optional
to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run.
If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an
empty definition that uses the default runtime handler.
More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class
enableServiceLinks
optional
environment variables, matching the syntax of Docker links.
Optional: Defaults to true.
One of Never, PreemptLowerPriority.
Defaults to PreemptLowerPriority if unset.
This field will be autopopulated at admission time by the RuntimeClass admission controller. If
the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests.
The RuntimeClass admission controller will reject Pod create requests which have the overhead already
set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value
defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero.
More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md
domains. Scheduler will schedule pods in a way which abides by the constraints.
All topologySpreadConstraints are ANDed.
setHostnameAsFQDN
optional
In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname).
In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN.
If a pod does not have FQDN, this has no effect.
Default to false.
Some pod and container fields are restricted if this is set.
If the OS field is set to linux, the following fields must be unset:
- securityContext.windowsOptions
If the OS field is set to windows, the following fields must be unset:
- spec.hostPID
- spec.hostIPC
- spec.hostUsers
- spec.securityContext.appArmorProfile
- spec.securityContext.seLinuxOptions
- spec.securityContext.seccompProfile
- spec.securityContext.fsGroup
- spec.securityContext.fsGroupChangePolicy
- spec.securityContext.sysctls
- spec.shareProcessNamespace
- spec.securityContext.runAsUser
- spec.securityContext.runAsGroup
- spec.securityContext.supplementalGroups
- spec.securityContext.supplementalGroupsPolicy
- spec.containers[*].securityContext.appArmorProfile
- spec.containers[*].securityContext.seLinuxOptions
- spec.containers[*].securityContext.seccompProfile
- spec.containers[*].securityContext.capabilities
- spec.containers[*].securityContext.readOnlyRootFilesystem
- spec.containers[*].securityContext.privileged
- spec.containers[*].securityContext.allowPrivilegeEscalation
- spec.containers[*].securityContext.procMount
- spec.containers[*].securityContext.runAsUser
- spec.containers[*].securityContext.runAsGroup
hostUsers
optional
Optional: Defaults to true.
If set to true or not present, the pod will be run in the host user namespace, useful
for when the pod needs a feature only available to the host user namespace, such as
loading a kernel module with CAP_SYS_MODULE.
When set to false, a new userns is created for the pod. Setting false is useful for
mitigating container breakout vulnerabilities, even allowing users to run their
containers as root without actually having root privileges on the host.
This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.
If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the
scheduler will not attempt to schedule the pod.
SchedulingGates can only be set at pod creation time, and can only be removed afterwards.
and reserved before the Pod is allowed to start. The resources
will be made available to those containers which consume them
by name.
This is an alpha field and requires enabling the
DynamicResourceAllocation feature gate.
This field is immutable.
containers in the pod. It supports specifying Requests and Limits for
"cpu" and "memory" resource names only. ResourceClaims are not supported.
This field enables fine-grained control over resource allocation for the
entire pod, allowing resource sharing among containers in a pod.
This is an alpha field and requires enabling the PodLevelResources feature
gate.
pipelineParallelSize
optional
It also represents the number of replicas in the worker set, where each worker set serves as a scaling unit.
tensorParallelSize
optional
It indicates the degree of parallelism for tensor computations across the available GPUs.
XGBoostSpec
Appears in:
XGBoostSpec defines arguments for configuring XGBoost model serving.
Fields
storageUri
optional
runtimeVersion
optional