Control Plane API

Packages:

serving.kserve.io/v1alpha1

Package v1alpha1 contains API Schema definitions for the serving v1alpha1 API group

Resource Types:

    BuiltInAdapter

    (Appears on:ServingRuntimeSpec)

    Field Description
    serverType
    ServerType

    ServerType must be one of the supported built-in types such as “triton” or “mlserver”, and the runtime’s container must have the same name

    runtimeManagementPort
    int

    Port on which the runtime server listens for model management requests

    memBufferBytes
    int

    Fixed memory overhead to subtract from runtime container’s memory allocation to determine model capacity

    modelLoadingTimeoutMillis
    int

    Timeout for model loading operations in milliseconds

    env
    []Kubernetes core/v1.EnvVar

    Environment variables used to control other aspects of the built-in adapter’s behaviour (uncommon)
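
As a minimal sketch of how these fields fit together, the manifest below declares a multi-model ServingRuntime that uses a built-in adapter. The runtime name, image, port number, and byte/millisecond values are illustrative assumptions, not defaults taken from this reference. The container is named triton only because serverType requires the container name to match.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-triton-runtime              # hypothetical name
spec:
  supportedModelFormats:
    - name: onnx
      version: "1"
      autoSelect: true
  multiModel: true
  containers:
    - name: triton                          # must equal builtInAdapter.serverType
      image: example.com/triton-server:latest   # hypothetical image
  builtInAdapter:
    serverType: triton
    runtimeManagementPort: 8001             # assumed management port
    memBufferBytes: 134217728               # 128 MiB subtracted from container memory
    modelLoadingTimeoutMillis: 90000        # 90 seconds
```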

    ClusterLocalModel

    Field Description
    metadata
    Kubernetes meta/v1.ObjectMeta
    Refer to the Kubernetes API documentation for the fields of the metadata field.
    spec
    ClusterLocalModelSpec


    sourceModelUri
    string

    Original StorageUri

    modelSize
    k8s.io/apimachinery/pkg/api/resource.Quantity

    Model size to make sure it does not exceed the disk space reserved for local models. The limit is defined on the NodeGroup.

    nodeGroup
    string

    group of nodes to cache the model on.

    status
    ClusterLocalModelStatus

    ClusterLocalModelSpec

    (Appears on:ClusterLocalModel)

    Field Description
    sourceModelUri
    string

    Original StorageUri

    modelSize
    k8s.io/apimachinery/pkg/api/resource.Quantity

    Model size to make sure it does not exceed the disk space reserved for local models. The limit is defined on the NodeGroup.

    nodeGroup
    string

    group of nodes to cache the model on.
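
The three spec fields above could be combined as in the following sketch. The resource name, storage URI, size, and node group name are hypothetical; nodeGroup is assumed here to name an existing LocalModelNodeGroup whose storage limit is at least modelSize.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterLocalModel
metadata:
  name: example-llm                         # hypothetical name
spec:
  sourceModelUri: gs://example-bucket/models/example-llm   # hypothetical URI
  modelSize: 10Gi                           # resource.Quantity; checked against the node group limit
  nodeGroup: example-gpu-nodes              # assumed to name a LocalModelNodeGroup
```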

    ClusterLocalModelStatus

    (Appears on:ClusterLocalModel)

    Field Description
    nodeStatus
    map[string]kserve.io/serving/pkg/apis/serving/v1alpha1.NodeStatus

    Status of the model on a node, like NodeDownloaded or NodeNotReady

    copies
    ModelCopies
    (Optional)

    How many nodes have the model available locally

    inferenceServices
    []NamespacedName

    Inference services using this local model
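
For orientation, a status block of this shape might look like the sketch below. The status is reported by the system rather than written by users; the node names, counts, and InferenceService reference are made up for illustration.

```yaml
status:
  nodeStatus:                # one NodeStatus value per node name
    node-a: NodeDownloaded
    node-b: NodeDownloading
    node-c: NodeNotReady
  copies:
    available: 1             # nodes that already hold the model
    total: 3                 # expected nodes, including nodes that are not ready
    failed: 0
  inferenceServices:
    - namespace: default
      name: example-isvc     # hypothetical InferenceService
```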

    ClusterServingRuntime

    ClusterServingRuntime is the Schema for the servingruntimes API

    Field Description
    metadata
    Kubernetes meta/v1.ObjectMeta
    Refer to the Kubernetes API documentation for the fields of the metadata field.
    spec
    ServingRuntimeSpec


    supportedModelFormats
    []SupportedModelFormat

    Model formats and versions supported by this runtime

    multiModel
    bool
    (Optional)

    Whether this ServingRuntime is intended for multi-model usage or not.

    disabled
    bool
    (Optional)

    Set to true to disable use of this runtime

    protocolVersions
    []github.com/kserve/kserve/pkg/constants.InferenceServiceProtocol
    (Optional)

    Supported protocol versions (e.g. v1, v2, grpc-v1, or grpc-v2)

    workerSpec
    WorkerSpec
    (Optional)

    Set WorkerSpec to enable multi-node/multi-gpu

    ServingRuntimePodSpec
    ServingRuntimePodSpec

    (Members of ServingRuntimePodSpec are embedded into this type.)

    grpcEndpoint
    string
    (Optional)

    gRPC endpoint for internal model management (implementing the mmesh.ModelRuntime gRPC service). The runtime is assumed to be single-model if this is omitted.

    grpcDataEndpoint
    string
    (Optional)

    gRPC endpoint for inferencing

    httpDataEndpoint
    string
    (Optional)

    HTTP endpoint for inferencing

    replicas
    uint16
    (Optional)

    Configure the number of replicas in the Deployment generated by this ServingRuntime. If specified, this overrides the podsPerRuntime configuration value.

    storageHelper
    StorageHelper
    (Optional)

    Configuration for this runtime’s use of the storage helper (model puller). It is enabled unless explicitly disabled.

    builtInAdapter
    BuiltInAdapter
    (Optional)

    Provide the details about built-in runtime adapter

    status
    ServingRuntimeStatus

    ClusterStorageContainer

    Field Description
    metadata
    Kubernetes meta/v1.ObjectMeta
    Refer to the Kubernetes API documentation for the fields of the metadata field.
    spec
    StorageContainerSpec


    container
    Kubernetes core/v1.Container

    Container spec for the storage initializer init container

    supportedUriFormats
    []SupportedUriFormat

    List of URI formats that this container supports

    workloadType
    WorkloadType
    disabled
    bool
    (Optional)

    InferenceGraph

    InferenceGraph is the Schema for the InferenceGraph API for multiple models

    Field Description
    metadata
    Kubernetes meta/v1.ObjectMeta
    Refer to the Kubernetes API documentation for the fields of the metadata field.
    spec
    InferenceGraphSpec


    nodes
    map[string]kserve.io/serving/pkg/apis/serving/v1alpha1.InferenceRouter

    Map of InferenceGraph router nodes. Each node defines a router, which can be one of several routing types.

    resources
    Kubernetes core/v1.ResourceRequirements
    (Optional)
    affinity
    Kubernetes core/v1.Affinity
    (Optional)
    timeout
    int64
    (Optional)

    TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.

    minReplicas
    int
    (Optional)

    Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.

    maxReplicas
    int
    (Optional)

    Maximum number of replicas for autoscaling.

    scaleTarget
    int
    (Optional)

    ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for. concurrency and rps targets are supported by Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).

    scaleMetric
    ScaleMetric
    (Optional)

    ScaleMetric defines the scaling metric type watched by the autoscaler. Possible values are concurrency, rps, cpu, and memory; concurrency and rps are supported via the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).

    status
    InferenceGraphStatus

    InferenceGraphSpec

    (Appears on:InferenceGraph)

    InferenceGraphSpec defines the InferenceGraph spec

    Field Description
    nodes
    map[string]kserve.io/serving/pkg/apis/serving/v1alpha1.InferenceRouter

    Map of InferenceGraph router nodes. Each node defines a router, which can be one of several routing types.

    resources
    Kubernetes core/v1.ResourceRequirements
    (Optional)
    affinity
    Kubernetes core/v1.Affinity
    (Optional)
    timeout
    int64
    (Optional)

    TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.

    minReplicas
    int
    (Optional)

    Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.

    maxReplicas
    int
    (Optional)

    Maximum number of replicas for autoscaling.

    scaleTarget
    int
    (Optional)

    ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for. concurrency and rps targets are supported by Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).

    scaleMetric
    ScaleMetric
    (Optional)

    ScaleMetric defines the scaling metric type watched by the autoscaler. Possible values are concurrency, rps, cpu, and memory; concurrency and rps are supported via the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
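
The deployment-level fields of InferenceGraphSpec sit next to nodes, not inside a node. The sketch below shows one possible combination; the graph name, service name, and all numeric values are assumptions chosen for illustration.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: example-graph                  # hypothetical name
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - serviceName: example-model   # hypothetical InferenceService
  timeout: 30                          # seconds before a request times out
  minReplicas: 0                       # 0 enables scale-to-zero
  maxReplicas: 5
  scaleMetric: concurrency             # one of concurrency, rps, cpu, memory
  scaleTarget: 10                      # target value for the chosen metric
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
```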

    InferenceGraphStatus

    (Appears on:InferenceGraph)

    InferenceGraphStatus defines the InferenceGraph conditions and status

    Field Description
    Status
    knative.dev/pkg/apis/duck/v1.Status

    (Members of Status are embedded into this type.)

    Conditions for InferenceGraph

    url
    knative.dev/pkg/apis.URL
    (Optional)

    Url for the InferenceGraph

    InferenceGraphValidator

    InferenceGraphValidator is responsible for validating InferenceGraph resources when they are created or updated.

    NOTE: The +kubebuilder:object:generate=false and +k8s:deepcopy-gen=false markers prevent controller-gen from generating DeepCopy methods, because this type is used only for temporary operations and does not need to be deeply copied.

    InferenceRouter

    (Appears on:InferenceGraphSpec)

    InferenceRouter defines the router for each InferenceGraph node with one or multiple steps

    kind: InferenceGraph
    metadata:
      name: canary-route
    spec:
      nodes:
        root:
          routerType: Splitter
          steps:
            - serviceName: mymodel1
              weight: 20
            - serviceName: mymodel2
              weight: 80
    
    kind: InferenceGraph
    metadata:
      name: abtest
    spec:
      nodes:
        mymodel:
          routerType: Switch
          steps:
            - serviceName: mymodel1
              condition: "{ .input.userId == 1 }"
            - serviceName: mymodel2
              condition: "{ .input.userId == 2 }"
    

    Scoring a case using a model ensemble consists of scoring it using each model separately, then combining the results into a single scoring result using one of the pre-defined combination methods.

    Tree ensembles are a case where simple algorithms for combining the results of classification or regression trees are well known. Multiple classification trees, for example, are commonly combined using a “majority-vote” method, and multiple regression trees are often combined using various averaging techniques, e.g. by tagging models with segment identifiers and weights to be used for their combination.

    kind: InferenceGraph
    metadata:
      name: ensemble
    spec:
      nodes:
        root:
          routerType: Sequence
          steps:
            - serviceName: feast
            - nodeName: ensembleModel
              data: $response
        ensembleModel:
          routerType: Ensemble
          steps:
            - serviceName: sklearn-model
            - serviceName: xgboost-model
    

    Scoring a case using a sequence, or chain of models allows the output of one model to be passed in as input to the subsequent models.

    kind: InferenceGraph
    metadata:
      name: model-chainer
    spec:
      nodes:
        root:
          routerType: Sequence
          steps:
            - serviceName: mymodel-s1
            - serviceName: mymodel-s2
              data: $response
            - serviceName: mymodel-s3
              data: $response
    

    In the flow described below, the pre_processing node base64-encodes the image and passes it to two model nodes in the flow. The encoded data is available to both of these nodes for classification. The second node, dog-breed-classification, takes the original input from the pre_processing node along with the response from the cat-dog-classification node to do further classification of the dog breed if required.

    kind: InferenceGraph
    metadata:
      name: dog-breed-classification
    spec:
      nodes:
        root:
          routerType: Sequence
          steps:
            - serviceName: cat-dog-classifier
            - nodeName: breed-classifier
              data: $request
        breed-classifier:
          routerType: Switch
          steps:
            - serviceName: dog-breed-classifier
              condition: '{ .predictions.class == "dog" }'
            - serviceName: cat-breed-classifier
              condition: '{ .predictions.class == "cat" }'
    
    Field Description
    routerType
    InferenceRouterType

    RouterType

    • Sequence: chains multiple inference steps, passing the input/output of the previous step to the next

    • Splitter: randomly routes to the target service according to the weight

    • Ensemble: routes the request to multiple models and then merges the responses

    • Switch: routes the request to one of the steps based on a condition

    steps
    []InferenceStep
    (Optional)

    Steps defines destinations for the current router node

    InferenceRouterType (string alias)

    (Appears on:InferenceRouter)

    InferenceRouterType constant for inference routing types

    Value Description

    "Ensemble"

    Ensemble router routes the requests to multiple models and then merges the responses

    "Sequence"

    Sequence is the default router type; each step routes to only one destination

    "Splitter"

    Splitter router randomly routes the requests to the named service according to the weight

    "Switch"

    Switch routes the request to the model based on a condition

    InferenceStep

    (Appears on:InferenceRouter)

    InferenceStep defines the inference target of the current step with condition, weights and data.

    Field Description
    name
    string
    (Optional)

    Unique name for the step within this node

    InferenceTarget
    InferenceTarget

    (Members of InferenceTarget are embedded into this type.)

    Node or service used to process this step

    data
    string
    (Optional)

    Request data sent to the next step, taken from the input or output of the previous step, e.g. $request or $response.predictions

    weight
    int64
    (Optional)

    Weight for the traffic split; only used by the Splitter router. When weights are specified, the weights of all routing targets should sum to 100.

    condition
    string
    (Optional)

    routing based on the condition

    dependency
    InferenceStepDependencyType
    (Optional)

    Decides whether a step is a hard or a soft dependency in the InferenceGraph
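
As a sketch of how the step-level fields combine, the fragment below names each step, forwards the previous response with data, and marks each step with a dependency type. The step and service names are hypothetical, and the fragment is meant to sit under spec of an InferenceGraph.

```yaml
nodes:
  root:
    routerType: Sequence
    steps:
      - name: features                         # unique within this node
        serviceName: example-feature-service   # hypothetical InferenceService
        dependency: Soft
      - name: classify
        serviceName: example-classifier        # hypothetical InferenceService
        data: $response                        # pass the previous step's response
        dependency: Hard
```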

    InferenceStepDependencyType (string alias)

    (Appears on:InferenceStep)

    InferenceStepDependencyType constant for inference step dependency

    Value Description

    "Hard"

    Hard

    "Soft"

    Soft

    InferenceTarget

    (Appears on:InferenceStep)

    Exactly one InferenceTarget field must be specified

    Field Description
    nodeName
    string
    (Optional)

    The node name for routing as next step

    serviceName
    string

    named reference for InferenceService

    serviceUrl
    string
    (Optional)

    InferenceService URL, mutually exclusive with ServiceName
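
Because exactly one target field is allowed per step, each step below sets a different one of the three. This is only an illustration of the three forms; the node name, service name, and URL are assumptions.

```yaml
steps:
  - nodeName: ensembleModel          # route to another node in the same graph
  - serviceName: example-model       # route to an InferenceService by name
  - serviceUrl: "http://example-model.default.svc.cluster.local/v1/models/example-model:predict"   # hypothetical URL
```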

    LocalModelNodeGroup

    Field Description
    metadata
    Kubernetes meta/v1.ObjectMeta
    Refer to the Kubernetes API documentation for the fields of the metadata field.
    spec
    LocalModelNodeGroupSpec


    storageLimit
    k8s.io/apimachinery/pkg/api/resource.Quantity

    Max storage size per node in this node group

    persistentVolumeSpec
    Kubernetes core/v1.PersistentVolumeSpec

    Used to create PersistentVolumes for downloading models and in inference service namespaces

    persistentVolumeClaimSpec
    Kubernetes core/v1.PersistentVolumeClaimSpec

    Used to create PersistentVolumeClaims for download and in inference service namespaces

    status
    LocalModelNodeGroupStatus

    LocalModelNodeGroupSpec

    (Appears on:LocalModelNodeGroup)

    LocalModelNodeGroupSpec defines a group of nodes to download the model to.

    Field Description
    storageLimit
    k8s.io/apimachinery/pkg/api/resource.Quantity

    Max storage size per node in this node group

    persistentVolumeSpec
    Kubernetes core/v1.PersistentVolumeSpec

    Used to create PersistentVolumes for downloading models and in inference service namespaces

    persistentVolumeClaimSpec
    Kubernetes core/v1.PersistentVolumeClaimSpec

    Used to create PersistentVolumeClaims for download and in inference service namespaces
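
A node group might be declared as in the sketch below, which pairs a local PersistentVolume template with a matching claim template. The group name, storage class, node label key, path, and sizes are all hypothetical; only the field names come from this reference and the Kubernetes core/v1 types.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelNodeGroup
metadata:
  name: example-gpu-nodes                    # hypothetical name
spec:
  storageLimit: 100Gi                        # max storage per node in this group
  persistentVolumeSpec:
    accessModes:
      - ReadWriteOnce
    capacity:
      storage: 100Gi
    volumeMode: Filesystem
    storageClassName: local-storage          # hypothetical storage class
    local:
      path: /models                          # hypothetical path on the node
    nodeAffinity:
      required:
        nodeSelectorTerms:
          - matchExpressions:
              - key: example.com/node-group  # hypothetical node label
                operator: In
                values:
                  - example-gpu-nodes
  persistentVolumeClaimSpec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Gi
    storageClassName: local-storage
```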

    LocalModelNodeGroupStatus

    (Appears on:LocalModelNodeGroup)

    Field Description
    used
    k8s.io/apimachinery/pkg/api/resource.Quantity

    Used storage space on any node for this node group

    available
    k8s.io/apimachinery/pkg/api/resource.Quantity

    Available storage space on any node for this node group

    ModelCopies

    (Appears on:ClusterLocalModelStatus)

    Field Description
    available
    int
    total
    int

    Total number of nodes that the model is expected to be downloaded to, including nodes that are not ready

    failed
    int

    Number of nodes on which the download failed

    ModelSpec

    (Appears on:TrainedModelSpec)

    ModelSpec describes a TrainedModel

    Field Description
    storageUri
    string

    Storage URI for the model repository

    framework
    string

    Machine learning framework name. The values could be: “tensorflow”, “pytorch”, “sklearn”, “onnx”, “xgboost”, “myawesomeinternalframework”, etc.

    memory
    k8s.io/apimachinery/pkg/api/resource.Quantity

    Maximum memory this model will consume. This field is used to decide whether a model server has enough memory to load the model.

    NamespacedName

    (Appears on:ClusterLocalModelStatus)

    Field Description
    namespace
    string
    name
    string

    NodeStatus (string alias)

    (Appears on:ClusterLocalModelStatus)

    NodeStatus enum

    Value Description

    "NodeDeleted"

    "NodeDeleting"

    "NodeDeletionError"

    "NodeDownloadError"

    "NodeDownloadPending"

    "NodeDownloaded"

    "NodeDownloading"

    "NodeNotReady"

    ScaleMetric (string alias)

    (Appears on:InferenceGraphSpec)

    ScaleMetric enum

    ServerType (string alias)

    (Appears on:BuiltInAdapter)

    ServerType constant for specifying the runtime name

    Value Description

    "mlserver"

    Model server is MLServer

    "ovms"

    Model server is OpenVino Model Server

    "triton"

    Model server is Triton

    ServingRuntime

    ServingRuntime is the Schema for the servingruntimes API

    Field Description
    metadata
    Kubernetes meta/v1.ObjectMeta
    Refer to the Kubernetes API documentation for the fields of the metadata field.
    spec
    ServingRuntimeSpec


    supportedModelFormats
    []SupportedModelFormat

    Model formats and versions supported by this runtime

    multiModel
    bool
    (Optional)

    Whether this ServingRuntime is intended for multi-model usage or not.

    disabled
    bool
    (Optional)

    Set to true to disable use of this runtime

    protocolVersions
    []github.com/kserve/kserve/pkg/constants.InferenceServiceProtocol
    (Optional)

    Supported protocol versions (e.g. v1, v2, grpc-v1, or grpc-v2)

    workerSpec
    WorkerSpec
    (Optional)

    Set WorkerSpec to enable multi-node/multi-gpu

    ServingRuntimePodSpec
    ServingRuntimePodSpec

    (Members of ServingRuntimePodSpec are embedded into this type.)

    grpcEndpoint
    string
    (Optional)

    gRPC endpoint for internal model management (implementing the mmesh.ModelRuntime gRPC service). The runtime is assumed to be single-model if this is omitted.

    grpcDataEndpoint
    string
    (Optional)

    gRPC endpoint for inferencing

    httpDataEndpoint
    string
    (Optional)

    HTTP endpoint for inferencing

    replicas
    uint16
    (Optional)

    Configure the number of replicas in the Deployment generated by this ServingRuntime. If specified, this overrides the podsPerRuntime configuration value.

    storageHelper
    StorageHelper
    (Optional)

    Configuration for this runtime’s use of the storage helper (model puller). It is enabled unless explicitly disabled.

    builtInAdapter
    BuiltInAdapter
    (Optional)

    Provide the details about built-in runtime adapter

    status
    ServingRuntimeStatus

    ServingRuntimePodSpec

    (Appears on:ServingRuntimeSpec, WorkerSpec)

    Field Description
    containers
    []Kubernetes core/v1.Container

    List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.

    volumes
    []Kubernetes core/v1.Volume
    (Optional)

    List of volumes that can be mounted by containers belonging to the pod. More info: https://kubernetes.io/docs/concepts/storage/volumes

    nodeSelector
    map[string]string
    (Optional)

    NodeSelector is a selector which must be true for the pod to fit on a node. Selector which must match a node’s labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/

    affinity
    Kubernetes core/v1.Affinity
    (Optional)

    If specified, the pod’s scheduling constraints

    tolerations
    []Kubernetes core/v1.Toleration
    (Optional)

    If specified, the pod’s tolerations.

    labels
    map[string]string
    (Optional)

    Labels that will be added to the pod. More info: http://kubernetes.io/docs/user-guide/labels

    annotations
    map[string]string
    (Optional)

    Annotations that will be added to the pod. More info: http://kubernetes.io/docs/user-guide/annotations

    imagePullSecrets
    []Kubernetes core/v1.LocalObjectReference
    (Optional)

    ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of docker, only DockerConfig type secrets are honored. More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod

    hostIPC
    bool
    (Optional)

    Use the host’s ipc namespace. Optional: Default to false.
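
Since these members are embedded into ServingRuntimeSpec, they appear directly under spec of a ServingRuntime. The fragment below is a sketch of the scheduling-related fields; every name, label key, and image in it is hypothetical.

```yaml
spec:
  containers:
    - name: example-server
      image: example.com/model-server:latest   # hypothetical image
  nodeSelector:
    example.com/accelerator: gpu               # hypothetical node label
  tolerations:
    - key: example.com/gpu                     # hypothetical taint key
      operator: Exists
      effect: NoSchedule
  labels:
    team: example
  annotations:
    example.com/owner: ml-platform
  imagePullSecrets:
    - name: example-registry-credentials       # secret in the same namespace
  hostIPC: false                               # the default
```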

    ServingRuntimeSpec

    (Appears on:ClusterServingRuntime, ServingRuntime, SupportedRuntime)

    ServingRuntimeSpec defines the desired state of ServingRuntime. This spec is currently provisional and is subject to change as details regarding single-model serving and multi-model serving are hammered out.

    Field Description
    supportedModelFormats
    []SupportedModelFormat

    Model formats and versions supported by this runtime

    multiModel
    bool
    (Optional)

    Whether this ServingRuntime is intended for multi-model usage or not.

    disabled
    bool
    (Optional)

    Set to true to disable use of this runtime

    protocolVersions
    []github.com/kserve/kserve/pkg/constants.InferenceServiceProtocol
    (Optional)

    Supported protocol versions (e.g. v1, v2, grpc-v1, or grpc-v2)

    workerSpec
    WorkerSpec
    (Optional)

    Set WorkerSpec to enable multi-node/multi-gpu

    ServingRuntimePodSpec
    ServingRuntimePodSpec

    (Members of ServingRuntimePodSpec are embedded into this type.)

    grpcEndpoint
    string
    (Optional)

    gRPC endpoint for internal model management (implementing the mmesh.ModelRuntime gRPC service). The runtime is assumed to be single-model if this is omitted.

    grpcDataEndpoint
    string
    (Optional)

    gRPC endpoint for inferencing

    httpDataEndpoint
    string
    (Optional)

    HTTP endpoint for inferencing

    replicas
    uint16
    (Optional)

    Configure the number of replicas in the Deployment generated by this ServingRuntime. If specified, this overrides the podsPerRuntime configuration value.

    storageHelper
    StorageHelper
    (Optional)

    Configuration for this runtime’s use of the storage helper (model puller). It is enabled unless explicitly disabled.

    builtInAdapter
    BuiltInAdapter
    (Optional)

    Provide the details about built-in runtime adapter

    ServingRuntimeStatus

    (Appears on:ClusterServingRuntime, ServingRuntime)

    ServingRuntimeStatus defines the observed state of ServingRuntime

    StorageContainerSpec

    (Appears on:ClusterStorageContainer)

    StorageContainerSpec defines the container spec for the storage initializer init container, and the protocols it supports.

    Field Description
    container
    Kubernetes core/v1.Container

    Container spec for the storage initializer init container

    supportedUriFormats
    []SupportedUriFormat

    List of URI formats that this container supports

    workloadType
    WorkloadType
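
A storage container definition could look like the following sketch, which registers one URI prefix and one regular expression (each list entry sets only one of prefix or regex). The resource name, image, and URI patterns are hypothetical.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterStorageContainer
metadata:
  name: example-storage                     # hypothetical name
spec:
  container:
    name: storage-initializer
    image: example.com/storage-initializer:latest   # hypothetical image
  supportedUriFormats:
    - prefix: "example://"                  # hypothetical scheme
    - regex: '^https://example\.com/models/.+'
  workloadType: initContainer               # or localModelDownloadJob
```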

    StorageHelper

    (Appears on:ServingRuntimeSpec)

    Field Description
    disabled
    bool
    (Optional)

    SupportedModelFormat

    (Appears on:ServingRuntimeSpec)

    Field Description
    name
    string

    Name of the model format.

    version
    string
    (Optional)

    Version of the model format. Used in validating that a predictor is supported by a runtime. Can be “major”, “major.minor” or “major.minor.patch”.

    autoSelect
    bool
    (Optional)

    Set to true to allow the ServingRuntime to be used for automatic model placement if this model format is specified with no explicit runtime.

    priority
    int32
    (Optional)

    Priority of this serving runtime for auto selection. This is used to select the serving runtime if more than one serving runtime supports the same model format. The value should be greater than zero. The higher the value, the higher the priority. Priority is not considered if AutoSelect is either false or not specified. Priority can be overridden by specifying the runtime in the InferenceService.
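
The interaction between autoSelect and priority can be sketched as follows. The format names and versions are examples only; priority is set on the first entry because it is ignored when autoSelect is false or unset.

```yaml
supportedModelFormats:
  - name: sklearn
    version: "1"          # "major", "major.minor" or "major.minor.patch"
    autoSelect: true      # eligible for automatic runtime selection
    priority: 2           # must be greater than zero; higher wins
  - name: xgboost
    version: "1.5"
    autoSelect: false     # used only when the runtime is named explicitly
```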

    SupportedRuntime

    SupportedRuntime is the schema for supported runtime result of automatic selection

    Field Description
    Name
    string
    Spec
    ServingRuntimeSpec

    SupportedUriFormat

    (Appears on:StorageContainerSpec)

    SupportedUriFormat can be either prefix or regex. Todo: Add validation that only one of them is set.

    Field Description
    prefix
    string
    regex
    string

    TrainedModel

    TrainedModel is the Schema for the TrainedModel API

    Field Description
    metadata
    Kubernetes meta/v1.ObjectMeta
    Refer to the Kubernetes API documentation for the fields of the metadata field.
    spec
    TrainedModelSpec


    inferenceService
    string

    parent inference service to deploy to

    model
    ModelSpec

    Predictor model spec

    status
    TrainedModelStatus

    TrainedModelSpec

    (Appears on:TrainedModel)

    TrainedModelSpec defines the TrainedModel spec

    Field Description
    inferenceService
    string

    parent inference service to deploy to

    model
    ModelSpec

    Predictor model spec
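
Putting TrainedModelSpec and ModelSpec together, a TrainedModel might be declared as in this sketch. The model name, parent InferenceService, storage URI, and memory figure are hypothetical.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: TrainedModel
metadata:
  name: example-model                 # hypothetical name
spec:
  inferenceService: example-isvc      # parent InferenceService (hypothetical)
  model:
    storageUri: gs://example-bucket/models/example-model   # hypothetical URI
    framework: sklearn
    memory: 256Mi                     # used to check that the server can load the model
```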

    TrainedModelStatus

    (Appears on:TrainedModel)

    TrainedModelStatus defines the observed state of TrainedModel

    Field Description
    Status
    knative.dev/pkg/apis/duck/v1.Status

    (Members of Status are embedded into this type.)

    Conditions for trained model

    url
    knative.dev/pkg/apis.URL

    URL holds the URL that will distribute traffic over the provided traffic targets. It has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}/v1/models/:predict for v1 and http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}/v2/models//infer for v2.

    address
    knative.dev/pkg/apis/duck/v1.Addressable

    Addressable endpoint for the deployed trained model http:///v1/models/.metadata.name

    TrainedModelValidator

    TrainedModelValidator is responsible for validating TrainedModel resources when they are created or updated.

    NOTE: The +kubebuilder:object:generate=false and +k8s:deepcopy-gen=false markers prevent controller-gen from generating DeepCopy methods, because this type is used only for temporary operations and does not need to be deeply copied.

    WorkerSpec

    (Appears on:ServingRuntimeSpec)

    WorkerSpec is the schema for multi-node/multi-GPU feature

    Field Description
    ServingRuntimePodSpec
    ServingRuntimePodSpec

    (Members of ServingRuntimePodSpec are embedded into this type.)

    size
    int
    (Optional)

    Configure the number of replicas in the worker set. Each worker set represents the unit of scaling.
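
Because WorkerSpec embeds ServingRuntimePodSpec, pod fields such as containers appear directly under workerSpec next to size. The fragment below is a sketch of that layout under spec of a ServingRuntime; the container names, image, and size are assumptions.

```yaml
spec:
  containers:
    - name: example-head                         # hypothetical head container
      image: example.com/model-server:latest     # hypothetical image
  workerSpec:
    size: 2                                      # replicas in the worker set
    containers:
      - name: example-worker                     # hypothetical worker container
        image: example.com/model-server:latest
```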

    WorkloadType (string alias)

    (Appears on:StorageContainerSpec)

    Value Description

    "initContainer"

    "localModelDownloadJob"


    Generated with gen-crd-api-reference-docs on git commit 7e436424.

    serving.kserve.io/v1beta1

    Package v1beta1 contains API Schema definitions for the serving v1beta1 API group

    Resource Types:

      ARTExplainerSpec

      (Appears on:ExplainerSpec)

      ARTExplainerSpec defines the arguments for configuring an ART Explanation Server

      Field Description
      type
      ARTExplainerType

      The type of ART explainer

      ExplainerExtensionSpec
      ExplainerExtensionSpec

      (Members of ExplainerExtensionSpec are embedded into this type.)

      Contains fields shared across all explainers

      ARTExplainerType (string alias)

      (Appears on:ARTExplainerSpec)

      Value Description

      "SquareAttack"

      Batcher

      (Appears on:ComponentExtensionSpec)

      Batcher specifies optional payload batching available for all components

      Field Description
      maxBatchSize
      int
      (Optional)

      Specifies the max number of requests to trigger a batch

      maxLatency
      int
      (Optional)

      Specifies the max latency to trigger a batch

      timeout
      int
      (Optional)

      Specifies the timeout of a batch
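
As a sketch, a batcher block on a component (for example a predictor) could look like this. The numbers are illustrative only, and the units of maxLatency and timeout are not stated in this reference, so none are assumed here.

```yaml
# Fragment of a component spec such as a predictor
batcher:
  maxBatchSize: 32     # number of requests that triggers a batch
  maxLatency: 500      # latency that triggers a batch
  timeout: 60          # timeout of a batch
```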

      Component

      Component interface is implemented by all specs that contain component implementations, e.g. PredictorSpec, ExplainerSpec, TransformerSpec.

      ComponentExtensionSpec

      (Appears on:ExplainerSpec, PredictorSpec, TransformerSpec)

      ComponentExtensionSpec defines the deployment configuration for a given InferenceService component

      Field Description
      minReplicas
      int
      (Optional)

      Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.

      maxReplicas
      int
      (Optional)

      Maximum number of replicas for autoscaling.

      scaleTarget
      int
      (Optional)

      ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for. concurrency and rps targets are supported by Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).

      scaleMetric
      ScaleMetric
      (Optional)

      ScaleMetric defines the scaling metric type watched by the autoscaler. Possible values are concurrency, rps, cpu, and memory; concurrency and rps are supported via the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).

      containerConcurrency
      int64
      (Optional)

      ContainerConcurrency specifies how many requests can be processed concurrently. This sets the hard limit of the container concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).

      timeout
      int64
      (Optional)

      TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.

      canaryTrafficPercent
      int64
      (Optional)

      CanaryTrafficPercent defines the traffic split percentage between the candidate revision and the last ready revision

      logger
      LoggerSpec
      (Optional)

      Activate request/response logging and logger configurations

      batcher
      Batcher
      (Optional)

      Activate request batching and batching configurations

      labels
      map[string]string
      (Optional)

      Labels that will be added to the component pod. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/

      annotations
      map[string]string
      (Optional)

      Annotations that will be added to the component pod. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/

      deploymentStrategy
      Kubernetes apps/v1.DeploymentStrategy
      (Optional)

      The deployment strategy to use to replace existing pods with new ones. Only applicable for raw deployment mode.
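
The fields of ComponentExtensionSpec are set directly on a component such as a predictor, transformer, or explainer. The fragment below sketches one plausible combination; all values and label/annotation keys are assumptions.

```yaml
# Fragment of a component spec such as a predictor
minReplicas: 1
maxReplicas: 4
scaleMetric: concurrency       # one of concurrency, rps, cpu, memory
scaleTarget: 5                 # target value for the chosen metric
containerConcurrency: 10       # hard limit on concurrent requests per container
timeout: 60                    # seconds before a request times out
canaryTrafficPercent: 10       # share of traffic sent to the candidate revision
labels:
  team: example
annotations:
  example.com/owner: ml-platform
```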

      ComponentImplementation

      ComponentImplementation interface is implemented by predictor, transformer, and explainer implementations

      ComponentStatusSpec

      (Appears on:InferenceServiceStatus)

      ComponentStatusSpec describes the state of the component

      Field Description
      latestReadyRevision
      string
      (Optional)

      Latest revision name that is in ready state

      latestCreatedRevision
      string
      (Optional)

      Latest revision name that is created

      previousRolledoutRevision
      string
      (Optional)

      Previous revision name that is rolled out with 100 percent traffic

      latestRolledoutRevision
      string
      (Optional)

      Latest revision name that is rolled out with 100 percent traffic

      traffic
      []knative.dev/serving/pkg/apis/serving/v1.TrafficTarget
      (Optional)

      Traffic holds the configured traffic distribution for latest ready revision and previous rolled out revision.

      url
      knative.dev/pkg/apis.URL
      (Optional)

      URL holds the primary url that will distribute traffic over the provided traffic targets. This will be one of the REST or gRPC endpoints that are available. It generally has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}
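
      The general URL form above can be sketched as a simple string template. The helper component_url and the sample names are hypothetical. The actual URL is produced by the controller and may differ, for example when a custom domain template is configured.

```python
def component_url(route_name: str, route_namespace: str,
                  cluster_suffix: str, scheme: str = "http") -> str:
    """Assemble a URL of the form scheme://name.namespace.suffix."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    return f"{scheme}://{route_name}.{route_namespace}.{cluster_suffix}"
```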

      restUrl
      knative.dev/pkg/apis.URL
      (Optional)

      REST endpoint of the component if available.

      grpcUrl
      knative.dev/pkg/apis.URL
      (Optional)

      gRPC endpoint of the component if available.

      address
      knative.dev/pkg/apis/duck/v1.Addressable
      (Optional)

      Addressable endpoint for the InferenceService

      ComponentType (string alias)

      ComponentType contains the different types of components of the service

      Value Description

      "explainer"

      "predictor"

      "transformer"

      CustomExplainer

      CustomExplainer defines arguments for configuring a custom explainer.

      Field Description
      PodSpec
      Kubernetes core/v1.PodSpec

      (Members of PodSpec are embedded into this type.)

      CustomPredictor

      CustomPredictor defines arguments for configuring a custom server.

      Field Description
      PodSpec
      Kubernetes core/v1.PodSpec

      (Members of PodSpec are embedded into this type.)

      CustomTransformer

      CustomTransformer defines arguments for configuring a custom transformer.

      Field Description
      PodSpec
      Kubernetes core/v1.PodSpec

      (Members of PodSpec are embedded into this type.)

      DeployConfig

      Field Description
      defaultDeploymentMode
      string

      ExplainerConfig

      (Appears on:ExplainersConfig)

      Field Description
      image
      string

      explainer docker image name

      defaultImageVersion
      string

      default explainer docker image version

      ExplainerExtensionSpec

      (Appears on:ARTExplainerSpec)

      ExplainerExtensionSpec defines configuration shared across all explainer frameworks

      Field Description
      storageUri
      string

      The location of a trained explanation model

      runtimeVersion
      string

      Defaults to latest Explainer Version

      config
      map[string]string

      Inline custom parameter settings for explainer

      Container
      Kubernetes core/v1.Container

      (Members of Container are embedded into this type.)

      (Optional)

      Container enables overrides for the explainer. Each framework will have different defaults that are populated in the underlying container spec.

      storage
      StorageSpec
      (Optional)

      Storage Spec for model location

      ExplainerSpec

      (Appears on:InferenceServiceSpec)

      ExplainerSpec defines the container spec for a model explanation server. The following fields follow a “1-of” semantic: users must specify exactly one spec.

      Field Description
      art
      ARTExplainerSpec

      Spec for ART explainer

      PodSpec
      PodSpec

      (Members of PodSpec are embedded into this type.)

      This spec is dual purpose. 1) Users may choose to provide a full PodSpec for their custom explainer. The field PodSpec.Containers is mutually exclusive with other explainers. 2) Users may choose to provide an explainer and specify PodSpec overrides in the PodSpec. They must not provide PodSpec.Containers in this case.

      ComponentExtensionSpec
      ComponentExtensionSpec

      (Members of ComponentExtensionSpec are embedded into this type.)

      Component extension defines the deployment configurations for explainer

      ExplainersConfig

      (Appears on:InferenceServicesConfig)

      Field Description
      art
      ExplainerConfig

      FailureInfo

      (Appears on:ModelStatus)

      Field Description
      location
      string
      (Optional)

      Name of component to which the failure relates (usually Pod name)

      reason
      FailureReason
      (Optional)

      High level class of failure

      message
      string
      (Optional)

      Detailed error message

      modelRevisionName
      string
      (Optional)

      Internal Revision/ID of model, tied to specific Spec contents

      time
      Kubernetes meta/v1.Time
      (Optional)

      Time failure occurred or was discovered

      exitCode
      int32
      (Optional)

      Exit status from the last termination of the container

      FailureReason (string alias)

      (Appears on:FailureInfo)

      FailureReason enum

      Value Description

      "InvalidPredictorSpec"

      The current Predictor Spec is invalid or unsupported

      "ModelLoadFailed"

      The model failed to load within a ServingRuntime container

      "NoSupportingRuntime"

      There is no ServingRuntime that supports the specified model type

      "RuntimeDisabled"

      The ServingRuntime is disabled

      "RuntimeNotRecognized"

      There is no ServingRuntime defined with the specified runtime name

      "RuntimeUnhealthy"

      Corresponding ServingRuntime containers failed to start or are unhealthy

      HuggingFaceRuntimeSpec

      (Appears on:PredictorSpec)

      HuggingFaceRuntimeSpec defines arguments for configuring HuggingFace model serving.

      Field Description
      PredictorExtensionSpec
      PredictorExtensionSpec

      (Members of PredictorExtensionSpec are embedded into this type.)

      Contains fields shared across all predictors

      InferenceService

      InferenceService is the Schema for the InferenceServices API

      Field Description
      metadata
      Kubernetes meta/v1.ObjectMeta
      Refer to the Kubernetes API documentation for the fields of the metadata field.
      spec
      InferenceServiceSpec


      predictor
      PredictorSpec

      Predictor defines the model serving spec

      explainer
      ExplainerSpec
      (Optional)

      Explainer defines the model explanation service spec. The explainer service calls the predictor, or the transformer if one is specified.

      transformer
      TransformerSpec
      (Optional)

      Transformer defines the pre/post processing before and after the predictor call. The transformer service calls the predictor service.

      status
      InferenceServiceStatus

      InferenceServiceDefaulter

      InferenceServiceDefaulter is responsible for setting default values on the InferenceService when created or updated.

      NOTE: The +kubebuilder:object:generate=false and +k8s:deepcopy-gen=false markers prevent controller-gen from generating DeepCopy methods, as this struct is used only for temporary operations and does not need to be deeply copied.

      InferenceServiceSpec

      (Appears on:InferenceService)

      InferenceServiceSpec is the top level type for this resource

      Field Description
      predictor
      PredictorSpec

      Predictor defines the model serving spec

      explainer
      ExplainerSpec
      (Optional)

      Explainer defines the model explanation service spec. The explainer service calls the predictor, or the transformer if one is specified.

      transformer
      TransformerSpec
      (Optional)

      Transformer defines the pre/post processing before and after the predictor call. The transformer service calls the predictor service.
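
      For orientation, here is a minimal sketch of an InferenceService manifest built as a Python dictionary. Only the required predictor is set, and the optional explainer and transformer are omitted. The resource name and the storage URI are placeholders, and the sklearn predictor is just one of the available choices.

```python
import json

# Placeholder values: "example-model" and the gs:// URI are illustrative only.
inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "example-model"},
    "spec": {
        # predictor is the only required component.
        "predictor": {
            "sklearn": {"storageUri": "gs://example-bucket/models/example"},
        },
        # "explainer" and "transformer" are optional and left out here.
    },
}

# Serialize to JSON, which Kubernetes clients accept alongside YAML.
manifest_json = json.dumps(inference_service, indent=2)
```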

      InferenceServiceStatus

      (Appears on:InferenceService)

      InferenceServiceStatus defines the observed state of InferenceService

      Field Description
      Status
      knative.dev/pkg/apis/duck/v1.Status

      (Members of Status are embedded into this type.)

      Conditions for the InferenceService
      - PredictorReady: predictor readiness condition;
      - TransformerReady: transformer readiness condition;
      - ExplainerReady: explainer readiness condition;
      - RoutesReady (serverless mode only): aggregated routing condition, i.e. endpoint readiness condition;
      - LatestDeploymentReady (serverless mode only): aggregated configuration condition, i.e. latest deployment readiness condition;
      - Ready: aggregated condition;
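
      A client can read these conditions to decide whether a service is usable. The sketch below assumes the status is available as a dictionary whose conditions list holds entries with type and status keys, where a status of "True" means the condition is met. The helper names is_ready and unmet_conditions are hypothetical.

```python
def is_ready(status: dict) -> bool:
    """Return True when the aggregated Ready condition is "True"."""
    for condition in status.get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    # No Ready condition has been reported yet.
    return False


def unmet_conditions(status: dict) -> list:
    """List the types of all conditions whose status is not "True"."""
    return [
        condition.get("type")
        for condition in status.get("conditions", [])
        if condition.get("status") != "True"
    ]
```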

      address
      knative.dev/pkg/apis/duck/v1.Addressable
      (Optional)

      Addressable endpoint for the InferenceService

      url
      knative.dev/pkg/apis.URL
      (Optional)

      URL holds the url that will distribute traffic over the provided traffic targets. It generally has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}

      components
      map[kserve.io/serving/pkg/apis/serving/v1beta1.ComponentType]kserve.io/serving/pkg/apis/serving/v1beta1.ComponentStatusSpec

      Statuses for the components of the InferenceService

      modelStatus
      ModelStatus

      Model related statuses

      InferenceServiceValidator

      InferenceServiceValidator is responsible for validating the InferenceService resource when it is created, updated, or deleted.

      NOTE: The +kubebuilder:object:generate=false and +k8s:deepcopy-gen=false markers prevent controller-gen from generating DeepCopy methods, as this struct is used only for temporary operations and does not need to be deeply copied.

      InferenceServicesConfig

      Field Description
      explainers
      ExplainersConfig

      Explainer configurations

      IngressConfig

      Field Description
      ingressGateway
      string
      knativeLocalGatewayService
      string
      localGateway
      string
      localGatewayService
      string
      ingressDomain
      string
      ingressClassName
      string
      additionalIngressDomains
      []string
      domainTemplate
      string
      urlScheme
      string
      disableIstioVirtualHost
      bool
      pathTemplate
      string
      disableIngressCreation
      bool

      LightGBMSpec

      (Appears on:PredictorSpec)

      LightGBMSpec defines arguments for configuring LightGBM model serving.

      Field Description
      PredictorExtensionSpec
      PredictorExtensionSpec

      (Members of PredictorExtensionSpec are embedded into this type.)

      Contains fields shared across all predictors

      LocalModelConfig

      Field Description
      enabled
      bool
      jobNamespace
      string
      defaultJobImage
      string
      fsGroup
      int64

      LoggerSpec

      (Appears on:ComponentExtensionSpec)

      LoggerSpec specifies optional payload logging available for all components

      Field Description
      url
      string
      (Optional)

      URL to send logging events

      mode
      LoggerType
      (Optional)

      Specifies the scope of the loggers.
      Valid values are:
      - “all” (default): log both request and response;
      - “request”: log only request;
      - “response”: log only response
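
      The three modes listed above can be read as a lookup from mode to the set of payloads that get logged. This is only an illustration of the documented semantics, not the logger's actual code. The function name payloads_to_log is hypothetical.

```python
_PAYLOADS_BY_MODE = {
    "all": {"request", "response"},
    "request": {"request"},
    "response": {"response"},
}


def payloads_to_log(mode=None) -> set:
    """Return which payloads are logged for a given logger mode."""
    # "all" is the documented default when mode is not set.
    effective_mode = mode or "all"
    if effective_mode not in _PAYLOADS_BY_MODE:
        raise ValueError(f"unsupported logger mode: {effective_mode!r}")
    return set(_PAYLOADS_BY_MODE[effective_mode])
```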

      metadataHeaders
      []string
      (Optional)

      Matched metadata HTTP headers for propagating to inference logger cloud events.

      LoggerType (string alias)

      (Appears on:LoggerSpec)

      LoggerType controls the scope of log publishing

      Value Description

      "all"

      LogAll Logger mode to log both request and response

      "request"

      LogRequest Logger mode to log only request

      "response"

      LogResponse Logger mode to log only response

      ModelCopies

      (Appears on:ModelStatus)

      Field Description
      failedCopies
      int

      How many copies of this predictor’s models failed to load recently

      totalCopies
      int
      (Optional)

      Total number of copies of this predictor’s models that are currently loaded

      ModelFormat

      (Appears on:ModelSpec)

      Field Description
      name
      string

      Name of the model format.

      version
      string
      (Optional)

      Version of the model format. Used in validating that a predictor is supported by a runtime. Can be “major”, “major.minor” or “major.minor.patch”.
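
      The accepted version shapes can be checked with a short pattern. The sketch assumes each component is a non-negative decimal number, which the description above does not state explicitly. The helper is_valid_format_version is hypothetical.

```python
import re

# One to three dot-separated numeric components:
# "major", "major.minor" or "major.minor.patch".
_VERSION_PATTERN = re.compile(r"\d+(\.\d+){0,2}")


def is_valid_format_version(version: str) -> bool:
    """Check that version has one, two, or three numeric components."""
    return _VERSION_PATTERN.fullmatch(version) is not None
```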

      ModelRevisionStates

      (Appears on:ModelStatus)

      Field Description
      activeModelState
      ModelState

      High level state string: Pending, Standby, Loading, Loaded, FailedToLoad

      targetModelState
      ModelState

      ModelSpec

      (Appears on:PredictorSpec)

      Field Description
      modelFormat
      ModelFormat

      ModelFormat being served.

      runtime
      string
      (Optional)

      Specific ClusterServingRuntime/ServingRuntime name to use for deployment.

      PredictorExtensionSpec
      PredictorExtensionSpec

      (Members of PredictorExtensionSpec are embedded into this type.)

      ModelState (string alias)

      (Appears on:ModelRevisionStates)

      ModelState enum

      Value Description

      "FailedToLoad"

      All copies of the model failed to load

      "Loaded"

      At least one copy of the model is loaded

      "Loading"

      Model is loading

      "Pending"

      Model is not yet registered

      "Standby"

      Model is available but not loaded (will load when used)

      ModelStatus

      (Appears on:InferenceServiceStatus)

      Field Description
      transitionStatus
      TransitionStatus

      Whether the available predictor endpoints reflect the current Spec or are in transition

      states
      ModelRevisionStates
      (Optional)

      State information of the predictor’s model.

      lastFailureInfo
      FailureInfo
      (Optional)

      Details of the last failure, when loading of the target model has failed or is blocked.

      copies
      ModelCopies
      (Optional)

      Model copy information of the predictor’s model.
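
      To show how the ModelStatus fields fit together, the sketch below condenses a modelStatus dictionary into a single line. It assumes the status has been read into a plain dictionary using the field names documented above. The function summarize_model_status is hypothetical.

```python
def summarize_model_status(model_status: dict) -> str:
    """Build a one-line summary from a modelStatus dictionary."""
    states = model_status.get("states") or {}
    summary = (
        f"transition={model_status.get('transitionStatus', 'unknown')} "
        f"active={states.get('activeModelState', 'unknown')} "
        f"target={states.get('targetModelState', 'unknown')}"
    )
    # lastFailureInfo is optional and only present after a failure.
    failure = model_status.get("lastFailureInfo")
    if failure:
        summary += (
            f" lastFailure={failure.get('reason', 'unknown')}"
            f" ({failure.get('message', '')})"
        )
    return summary
```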

      ONNXRuntimeSpec

      (Appears on:PredictorSpec)

      ONNXRuntimeSpec defines arguments for configuring ONNX model serving.

      Field Description
      PredictorExtensionSpec
      PredictorExtensionSpec

      (Members of PredictorExtensionSpec are embedded into this type.)

      Contains fields shared across all predictors

      PMMLSpec

      (Appears on:PredictorSpec)

      PMMLSpec defines arguments for configuring PMML model serving.

      Field Description
      PredictorExtensionSpec
      PredictorExtensionSpec

      (Members of PredictorExtensionSpec are embedded into this type.)

      Contains fields shared across all predictors

      PaddleServerSpec

      (Appears on:PredictorSpec)

      Field Description
      PredictorExtensionSpec
      PredictorExtensionSpec

      (Members of PredictorExtensionSpec are embedded into this type.)

      PodSpec

      (Appears on:ExplainerSpec, PredictorSpec, TransformerSpec, WorkerSpec)

      PodSpec is a description of a pod.

      Field Description
      volumes
      []Kubernetes core/v1.Volume
      (Optional)

      List of volumes that can be mounted by containers belonging to the pod. More info: https://kubernetes.io/docs/concepts/storage/volumes

      initContainers
      []Kubernetes core/v1.Container

      List of initialization containers belonging to the pod. Init containers are executed in order prior to containers being started. If any init container fails, the pod is considered to have failed and is handled according to its restartPolicy. The name for an init container or normal container must be unique among all containers. Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes. The resourceRequirements of an init container are taken into account during scheduling by finding the highest request/limit for each resource type, and then using the max of that value or the sum of the normal containers. Limits are applied to init containers in a similar fashion. Init containers cannot currently be added or removed. Cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
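
      The scheduling rule for init container resources can be illustrated with a small calculation. For each resource, take the highest init container request and compare it with the sum of the normal containers' requests, then keep the larger value. The sketch uses plain integers, for example CPU in millicores and memory in MiB, instead of Kubernetes quantities. The helper effective_requests is hypothetical.

```python
def effective_requests(init_requests: list, container_requests: list) -> dict:
    """Compute the per-resource request considered during scheduling."""
    resource_names = set()
    for requests in init_requests + container_requests:
        resource_names.update(requests)

    effective = {}
    for name in resource_names:
        # Init containers run one at a time, so only the largest counts.
        highest_init = max((r.get(name, 0) for r in init_requests), default=0)
        # Normal containers run together, so their requests add up.
        containers_total = sum(r.get(name, 0) for r in container_requests)
        effective[name] = max(highest_init, containers_total)
    return effective
```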

      containers
      []Kubernetes core/v1.Container

      List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.

      ephemeralContainers
      []Kubernetes core/v1.EphemeralContainer
      (Optional)

      List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing pod to perform user-initiated actions such as debugging. This list cannot be specified when creating a pod, and it cannot be modified by updating the pod spec. In order to add an ephemeral container to an existing pod, use the pod’s ephemeralcontainers subresource. This field is beta-level and available on clusters that haven’t disabled the EphemeralContainers feature gate.

      restartPolicy
      Kubernetes core/v1.RestartPolicy
      (Optional)

      Restart policy for all containers within the pod. One of Always, OnFailure, Never. Default to Always. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy

      terminationGracePeriodSeconds
      int64
      (Optional)

      Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request. Value must be non-negative integer. The value zero indicates stop immediately via the kill signal (no opportunity to shut down). If this value is nil, the default grace period will be used instead. The grace period is the duration in seconds after the processes running in the pod are sent a termination signal and the time when the processes are forcibly halted with a kill signal. Set this value longer than the expected cleanup time for your process. Defaults to 30 seconds.

      activeDeadlineSeconds
      int64
      (Optional)

      Optional duration in seconds the pod may be active on the node relative to StartTime before the system will actively try to mark it failed and kill associated containers. Value must be a positive integer.

      dnsPolicy
      Kubernetes core/v1.DNSPolicy
      (Optional)

      Set DNS policy for the pod. Defaults to “ClusterFirst”. Valid values are ‘ClusterFirstWithHostNet’, ‘ClusterFirst’, ‘Default’ or ‘None’. DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy. To have DNS options set along with hostNetwork, you have to specify DNS policy explicitly to ‘ClusterFirstWithHostNet’.

      nodeSelector
      map[string]string
      (Optional)

      NodeSelector is a selector which must be true for the pod to fit on a node. Selector which must match a node’s labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/

      serviceAccountName
      string
      (Optional)

      ServiceAccountName is the name of the ServiceAccount to use to run this pod. More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/

      serviceAccount
      string
      (Optional)

      DeprecatedServiceAccount is a deprecated alias for ServiceAccountName. Deprecated: Use serviceAccountName instead.

      automountServiceAccountToken
      bool
      (Optional)

      AutomountServiceAccountToken indicates whether a service account token should be automatically mounted.

      nodeName
      string
      (Optional)

      NodeName is a request to schedule this pod onto a specific node. If it is non-empty, the scheduler simply schedules this pod onto that node, assuming that it fits resource requirements.

      hostNetwork
      bool
      (Optional)

      Host networking requested for this pod. Use the host’s network namespace. If this option is set, the ports that will be used must be specified. Default to false.

      hostPID
      bool
      (Optional)

      Use the host’s pid namespace. Optional: Default to false.

      hostIPC
      bool
      (Optional)

      Use the host’s ipc namespace. Optional: Default to false.

      shareProcessNamespace
      bool
      (Optional)

      Share a single process namespace between all of the containers in a pod. When this is set containers will be able to view and signal processes from other containers in the same pod, and the first process in each container will not be assigned PID 1. HostPID and ShareProcessNamespace cannot both be set. Optional: Default to false.

      securityContext
      Kubernetes core/v1.PodSecurityContext
      (Optional)

      SecurityContext holds pod-level security attributes and common container settings. Optional: Defaults to empty. See type description for default values of each field.

      imagePullSecrets
      []Kubernetes core/v1.LocalObjectReference
      (Optional)

      ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of docker, only DockerConfig type secrets are honored. More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod

      hostname
      string
      (Optional)

      Specifies the hostname of the Pod. If not specified, the pod’s hostname will be set to a system-defined value.

      subdomain
      string
      (Optional)

      If specified, the fully qualified Pod hostname will be “...svc.”. If not specified, the pod will not have a domainname at all.

      affinity
      Kubernetes core/v1.Affinity
      (Optional)

      If specified, the pod’s scheduling constraints

      schedulerName
      string
      (Optional)

      If specified, the pod will be dispatched by specified scheduler. If not specified, the pod will be dispatched by default scheduler.

      tolerations
      []Kubernetes core/v1.Toleration
      (Optional)

      If specified, the pod’s tolerations.

      hostAliases
      []Kubernetes core/v1.HostAlias
      (Optional)

      HostAliases is an optional list of hosts and IPs that will be injected into the pod’s hosts file if specified. This is only valid for non-hostNetwork pods.

      priorityClassName
      string
      (Optional)

      If specified, indicates the pod’s priority. “system-node-critical” and “system-cluster-critical” are two special keywords which indicate the highest priorities with the former being the highest priority. Any other name must be defined by creating a PriorityClass object with that name. If not specified, the pod priority will be default or zero if there is no default.

      priority
      int32
      (Optional)

      The priority value. Various system components use this field to find the priority of the pod. When Priority Admission Controller is enabled, it prevents users from setting this field. The admission controller populates this field from PriorityClassName. The higher the value, the higher the priority.

      dnsConfig
      Kubernetes core/v1.PodDNSConfig
      (Optional)

      Specifies the DNS parameters of a pod. Parameters specified here will be merged to the generated DNS configuration based on DNSPolicy.

      readinessGates
      []Kubernetes core/v1.PodReadinessGate
      (Optional)

      If specified, all readiness gates will be evaluated for pod readiness. A pod is ready when all its containers are ready AND all conditions specified in the readiness gates have status equal to “True”. More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates

      runtimeClassName
      string
      (Optional)

      RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run. If unset or empty, the “legacy” RuntimeClass will be used, which is an implicit class with an empty definition that uses the default runtime handler. More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class This is a beta feature as of Kubernetes v1.14.

      enableServiceLinks
      bool
      (Optional)

      EnableServiceLinks indicates whether information about services should be injected into pod’s environment variables, matching the syntax of Docker links. Optional: Defaults to true.

      preemptionPolicy
      Kubernetes core/v1.PreemptionPolicy
      (Optional)

      PreemptionPolicy is the Policy for preempting pods with lower priority. One of Never, PreemptLowerPriority. Defaults to PreemptLowerPriority if unset. This field is beta-level, gated by the NonPreemptingPriority feature-gate.

      overhead
      Kubernetes core/v1.ResourceList
      (Optional)

      Overhead represents the resource overhead associated with running a pod for a given RuntimeClass. This field will be autopopulated at admission time by the RuntimeClass admission controller. If the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests. The RuntimeClass admission controller will reject Pod create requests which have the overhead already set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero. More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md This field is beta-level as of Kubernetes v1.18, and is only honored by servers that enable the PodOverhead feature.

      topologySpreadConstraints
      []Kubernetes core/v1.TopologySpreadConstraint
      (Optional)

      TopologySpreadConstraints describes how a group of pods ought to spread across topology domains. Scheduler will schedule pods in a way which abides by the constraints. All topologySpreadConstraints are ANDed.

      setHostnameAsFQDN
      bool
      (Optional)

      If true the pod’s hostname will be configured as the pod’s FQDN, rather than the leaf name (the default). In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname). In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN. If a pod does not have FQDN, this has no effect. Default to false.

      os
      Kubernetes core/v1.PodOS
      (Optional)

      Specifies the OS of the containers in the pod. Some pod and container fields are restricted if this is set.

      If the OS field is set to linux, the following fields must be unset:
      - securityContext.windowsOptions

      If the OS field is set to windows, the following fields must be unset:
      - spec.hostPID
      - spec.hostIPC
      - spec.securityContext.seLinuxOptions
      - spec.securityContext.seccompProfile
      - spec.securityContext.fsGroup
      - spec.securityContext.fsGroupChangePolicy
      - spec.securityContext.sysctls
      - spec.shareProcessNamespace
      - spec.securityContext.runAsUser
      - spec.securityContext.runAsGroup
      - spec.securityContext.supplementalGroups
      - spec.containers[*].securityContext.seLinuxOptions
      - spec.containers[*].securityContext.seccompProfile
      - spec.containers[*].securityContext.capabilities
      - spec.containers[*].securityContext.readOnlyRootFilesystem
      - spec.containers[*].securityContext.privileged
      - spec.containers[*].securityContext.allowPrivilegeEscalation
      - spec.containers[*].securityContext.procMount
      - spec.containers[*].securityContext.runAsUser
      - spec.containers[*].securityContext.runAsGroup

      This is an alpha field and requires the IdentifyPodOS feature.

      hostUsers
      bool
      (Optional)

      Use the host’s user namespace. Optional: Default to true. If set to true or not present, the pod will be run in the host user namespace, useful for when the pod needs a feature only available to the host user namespace, such as loading a kernel module with CAP_SYS_MODULE. When set to false, a new userns is created for the pod. Setting false is useful for mitigating container breakout vulnerabilities even allowing users to run their containers as root without actually having root privileges on the host. This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature.

      schedulingGates
      []Kubernetes core/v1.PodSchedulingGate
      (Optional)

      SchedulingGates is an opaque list of values that if specified will block scheduling the pod. If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the scheduler will not attempt to schedule the pod.

      SchedulingGates can only be set at pod creation time, and be removed only afterwards.

      This is a beta feature enabled by the PodSchedulingReadiness feature gate.

      resourceClaims
      []Kubernetes core/v1.PodResourceClaim
      (Optional)

      ResourceClaims defines which ResourceClaims must be allocated and reserved before the Pod is allowed to start. The resources will be made available to those containers which consume them by name.

      This is an alpha field and requires enabling the DynamicResourceAllocation feature gate.

      This field is immutable.

      PredictorExtensionSpec

      (Appears on:HuggingFaceRuntimeSpec, LightGBMSpec, ModelSpec, ONNXRuntimeSpec, PMMLSpec, PaddleServerSpec, SKLearnSpec, TFServingSpec, TorchServeSpec, TritonSpec, XGBoostSpec)

      PredictorExtensionSpec defines configuration shared across all predictor frameworks

      Field Description
      storageUri
      string
      (Optional)

      This field points to the location of the trained model which is mounted onto the pod.

      runtimeVersion
      string
      (Optional)

      Runtime version of the predictor docker image

      protocolVersion
      github.com/kserve/kserve/pkg/constants.InferenceServiceProtocol
      (Optional)

      Protocol version used by the predictor (v1, v2, grpc-v1, or grpc-v2)

      Container
      Kubernetes core/v1.Container

      (Members of Container are embedded into this type.)

      (Optional)

      Container enables overrides for the predictor. Each framework will have different defaults that are populated in the underlying container spec.

      storage
      StorageSpec
      (Optional)

      Storage Spec for model location

      PredictorImplementation

      PredictorImplementation defines common functions for all predictors, e.g. TensorFlow, Triton, etc.

      PredictorSpec

      (Appears on:InferenceServiceSpec)

      PredictorSpec defines the configuration for a predictor. The following fields follow a “1-of” semantic: users must specify exactly one spec.

      Field Description
      sklearn
      SKLearnSpec

      Spec for SKLearn model server

      xgboost
      XGBoostSpec

      Spec for XGBoost model server

      tensorflow
      TFServingSpec

      Spec for TFServing (https://github.com/tensorflow/serving)

      pytorch
      TorchServeSpec

      Spec for TorchServe (https://pytorch.org/serve)

      triton
      TritonSpec

      Spec for Triton Inference Server (https://github.com/triton-inference-server/server)

      onnx
      ONNXRuntimeSpec

      Spec for ONNX runtime (https://github.com/microsoft/onnxruntime)

      huggingface
      HuggingFaceRuntimeSpec

      Spec for HuggingFace runtime (https://github.com/huggingface)

      pmml
      PMMLSpec

      Spec for PMML (http://dmg.org/pmml/v4-1/GeneralStructure.html)

      lightgbm
      LightGBMSpec

      Spec for LightGBM model server

      paddle
      PaddleServerSpec

      Spec for Paddle model server (https://github.com/PaddlePaddle/Serving)

      model
      ModelSpec

      Model spec for any arbitrary framework.

      workerSpec
      WorkerSpec

      WorkerSpec for enabling multi-node/multi-gpu

      PodSpec
      PodSpec

      (Members of PodSpec are embedded into this type.)

      This spec is dual purpose.
      1) Provide a full PodSpec for a custom predictor. The field PodSpec.Containers is mutually exclusive with other predictors (e.g. TFServing).
      2) Provide a predictor (e.g. TFServing) and specify PodSpec overrides. You must not provide PodSpec.Containers in this case.

      ComponentExtensionSpec
      ComponentExtensionSpec

      (Members of ComponentExtensionSpec are embedded into this type.)

      Component extension defines the deployment configurations for a predictor
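
      The “1-of” rule for PredictorSpec can be sketched as a check that exactly one predictor choice is present, counting a custom container list as one of the choices. This is an illustration of the documented rule under that assumption, not KServe's validation webhook. The function names are hypothetical.

```python
# Predictor fields listed in the PredictorSpec table above.
FRAMEWORK_FIELDS = (
    "sklearn", "xgboost", "tensorflow", "pytorch", "triton", "onnx",
    "huggingface", "pmml", "lightgbm", "paddle", "model",
)


def selected_predictors(predictor: dict) -> list:
    """Return the predictor choices that are set on the spec."""
    chosen = [name for name in FRAMEWORK_FIELDS
              if predictor.get(name) is not None]
    # A non-empty containers list means a custom predictor.
    if predictor.get("containers"):
        chosen.append("containers")
    return chosen


def validate_one_of(predictor: dict) -> str:
    """Return the single selected predictor, or raise ValueError."""
    chosen = selected_predictors(predictor)
    if len(chosen) != 1:
        raise ValueError(
            f"exactly one predictor must be specified, got {chosen}")
    return chosen[0]
```

      PodSpec overrides such as nodeSelector or tolerations do not count as a choice in this sketch, which matches case 2) of the dual-purpose PodSpec.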

      SKLearnSpec

      (Appears on:PredictorSpec)

      SKLearnSpec defines arguments for configuring SKLearn model serving.

      Field Description
      PredictorExtensionSpec
      PredictorExtensionSpec

      (Members of PredictorExtensionSpec are embedded into this type.)

      Contains fields shared across all predictors

      ScaleMetric (string alias)

      (Appears on:ComponentExtensionSpec)

      ScaleMetric enum

      Value Description

      "cpu"

      "concurrency"

      "memory"

      "rps"

      SecurityConfig

      Field Description
      autoMountServiceAccountToken
      bool

      StorageSpec

      (Appears on:ExplainerExtensionSpec, PredictorExtensionSpec)

      Field Description
      path
      string
      (Optional)

      The path to the model object in the storage. It cannot co-exist with the storageURI.

      schemaPath
      string
      (Optional)

      The path to the model schema file in the storage.

      parameters
      map[string]string
      (Optional)

      Parameters to override the default storage credentials and config.

      key
      string
      (Optional)

      The Storage Key in the secret for this model.
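
The constraint that `path` cannot co-exist with the storage URI can be expressed as a small pre-flight check. This is a sketch under stated assumptions: the StorageSpec is assumed to sit under a `storage` key next to a `storageUri` key in the parsed extension spec, and `check_storage` is a hypothetical helper, not a KServe function.

```python
def check_storage(extension: dict) -> None:
    """Raise ValueError if both storage.path and storageUri are set.

    Assumption: `extension` is a parsed predictor or explainer extension
    spec with optional `storage` (a StorageSpec dict) and `storageUri` keys.
    """
    storage = extension.get("storage") or {}
    if storage.get("path") and extension.get("storageUri"):
        raise ValueError("storage.path cannot be combined with storageUri")


# Only one of the two locations is given, so this passes silently.
check_storage({"storage": {"path": "models/example", "key": "example-key"}})
```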

      TFServingSpec

      (Appears on:PredictorSpec)

      TFServingSpec defines arguments for configuring TensorFlow model serving.

      Field Description
      PredictorExtensionSpec
      PredictorExtensionSpec

      (Members of PredictorExtensionSpec are embedded into this type.)

      Contains fields shared across all predictors

      TorchServeSpec

      (Appears on:PredictorSpec)

      TorchServeSpec defines arguments for configuring PyTorch model serving.

      Field Description
      PredictorExtensionSpec
      PredictorExtensionSpec

      (Members of PredictorExtensionSpec are embedded into this type.)

      Contains fields shared across all predictors

      TransformerSpec

      (Appears on:InferenceServiceSpec)

      TransformerSpec defines transformer service for pre/post processing

      Field Description
      PodSpec
      PodSpec

      (Members of PodSpec are embedded into this type.)

      This spec is dual purpose:
      1) Provide a full PodSpec for a custom transformer. The field PodSpec.Containers is mutually exclusive with other transformers.
      2) Provide a transformer and specify PodSpec overrides. You must not provide PodSpec.Containers in this case.

      ComponentExtensionSpec
      ComponentExtensionSpec

      (Members of ComponentExtensionSpec are embedded into this type.)

      Component extension defines the deployment configurations for a transformer

      TransitionStatus (string alias)

      (Appears on:ModelStatus)

      TransitionStatus enum

      Value Description

      "BlockedByFailedLoad"

      Target model failed to load

      "InProgress"

      Waiting for target model to reach state of active model

      "InvalidSpec"

      Target predictor spec failed validation

      "UpToDate"

      Predictor is up-to-date (reflects current spec)
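
Client code that reads a model's transition status may find it convenient to mirror the values above in an enum. The sketch below copies the four documented string values; the Python member names and the `needs_attention` helper are hypothetical, and grouping the two failure values together is an interpretation of their descriptions.

```python
from enum import Enum


class TransitionStatus(str, Enum):
    """String values copied from the TransitionStatus table above."""
    BLOCKED_BY_FAILED_LOAD = "BlockedByFailedLoad"  # target model failed to load
    IN_PROGRESS = "InProgress"                      # waiting for target model
    INVALID_SPEC = "InvalidSpec"                    # target spec failed validation
    UP_TO_DATE = "UpToDate"                         # reflects current spec


def needs_attention(status: str) -> bool:
    """Hypothetical helper: True for the two failure values."""
    return TransitionStatus(status) in (
        TransitionStatus.BLOCKED_BY_FAILED_LOAD,
        TransitionStatus.INVALID_SPEC,
    )


print(needs_attention("InProgress"))
```

Passing a string outside the four documented values raises `ValueError` from the enum lookup, which surfaces unexpected statuses early.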

      TritonSpec

      (Appears on:PredictorSpec)

      TritonSpec defines arguments for configuring Triton model serving.

      Field Description
      PredictorExtensionSpec
      PredictorExtensionSpec

      (Members of PredictorExtensionSpec are embedded into this type.)

      Contains fields shared across all predictors

      WorkerSpec

      (Appears on:PredictorSpec)

      Field Description
      PodSpec
      PodSpec

      (Members of PodSpec are embedded into this type.)

      size
      int
      (Optional)

      Number of replicas in the worker set. Each worker set represents the unit of scaling.

      XGBoostSpec

      (Appears on:PredictorSpec)

      XGBoostSpec defines arguments for configuring XGBoost model serving.

      Field Description
      PredictorExtensionSpec
      PredictorExtensionSpec

      (Members of PredictorExtensionSpec are embedded into this type.)

      Contains fields shared across all predictors


      Generated with gen-crd-api-reference-docs on git commit 7e436424.
