Control Plane API

Packages:

serving.kserve.io/v1beta1

Package v1beta1 contains API Schema definitions for the serving v1beta1 API group

Resource Types:

    AIXExplainerSpec

    (Appears on:ExplainerSpec)

    AIXExplainerSpec defines the arguments for configuring an AIX Explanation Server

    Field Description
    type
    AIXExplainerType

    The type of AIX explainer

    ExplainerExtensionSpec
    ExplainerExtensionSpec

    (Members of ExplainerExtensionSpec are embedded into this type.)

    Contains fields shared across all explainers

    AIXExplainerType (string alias)

    (Appears on:AIXExplainerSpec)

    Value Description

    "LimeImages"

    ARTExplainerSpec

    (Appears on:ExplainerSpec)

    ARTExplainerSpec defines the arguments for configuring an ART Explanation Server

    Field Description
    type
    ARTExplainerType

    The type of ART explainer

    ExplainerExtensionSpec
    ExplainerExtensionSpec

    (Members of ExplainerExtensionSpec are embedded into this type.)

    Contains fields shared across all explainers

    ARTExplainerType (string alias)

    (Appears on:ARTExplainerSpec)

    Value Description

    "SquareAttack"

    AlibiExplainerSpec

    (Appears on:ExplainerSpec)

    AlibiExplainerSpec defines the arguments for configuring an Alibi Explanation Server

    Field Description
    type
    AlibiExplainerType

    The type of Alibi explainer
    Valid values are:
    - “AnchorTabular”;
    - “AnchorImages”;
    - “AnchorText”;
    - “Counterfactuals”;
    - “Contrastive”;

    ExplainerExtensionSpec
    ExplainerExtensionSpec

    (Members of ExplainerExtensionSpec are embedded into this type.)

    Contains fields shared across all explainers

    AlibiExplainerType (string alias)

    (Appears on:AlibiExplainerSpec)

    AlibiExplainerType is the explanation method

    Value Description

    "AnchorImages"

    "AnchorTabular"

    "AnchorText"

    "Contrastive"

    "Counterfactuals"

    Batcher

    (Appears on:ComponentExtensionSpec)

    Batcher specifies optional payload batching available for all components

    Field Description
    maxBatchSize
    int
    (Optional)

    Specifies the max number of requests to trigger a batch

    maxLatency
    int
    (Optional)

    Specifies the max latency to trigger a batch

    timeout
    int
    (Optional)

    Specifies the timeout of a batch
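As an illustration, the three Batcher fields could sit under a predictor as sketched below. The manifest is written as a Python dictionary; the storage URI and all numeric values are assumptions chosen for the example, and the units of maxLatency and timeout are not stated in this reference. The `check_batcher` helper is hypothetical and only checks field names and integer types.

```python
# Hypothetical batcher settings; the values are illustrative, not defaults.
batcher = {
    "maxBatchSize": 32,  # max number of requests that triggers a batch
    "maxLatency": 500,   # max latency that triggers a batch (unit not stated here)
    "timeout": 60,       # timeout of a batch (unit not stated here)
}

# batcher is a ComponentExtensionSpec field, so it sits directly on the component.
predictor = {
    "sklearn": {"storageUri": "gs://example-bucket/models/iris"},  # hypothetical URI
    "batcher": batcher,
}


def check_batcher(spec: dict) -> list:
    """Return a list of problems: unknown field names or non-integer values."""
    allowed = {"maxBatchSize", "maxLatency", "timeout"}
    problems = [f"unknown field: {key}" for key in spec if key not in allowed]
    problems += [
        f"{key} must be an int"
        for key, value in spec.items()
        if key in allowed and not isinstance(value, int)
    ]
    return problems


print(check_batcher(predictor["batcher"]))
```

All three fields are optional, so an empty dictionary also passes this check.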

    Component

    Component interface is implemented by all specs that contain component implementations, e.g. PredictorSpec, ExplainerSpec, TransformerSpec.

    ComponentExtensionSpec

    (Appears on:ExplainerSpec, PredictorSpec, TransformerSpec)

    ComponentExtensionSpec defines the deployment configuration for a given InferenceService component

    Field Description
    minReplicas
    int
    (Optional)

    Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.

    maxReplicas
    int
    (Optional)

    Maximum number of replicas for autoscaling.

    scaleTarget
    int
    (Optional)

    ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for. The concurrency and rps targets are supported by the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).

    scaleMetric
    ScaleMetric
    (Optional)

    ScaleMetric defines the scaling metric type watched by the autoscaler. Possible values are concurrency, rps, cpu, and memory; concurrency and rps are supported via the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).

    containerConcurrency
    int64
    (Optional)

    ContainerConcurrency specifies how many requests can be processed concurrently. This sets the hard limit of the container concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).

    timeout
    int64
    (Optional)

    TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.

    canaryTrafficPercent
    int64
    (Optional)

    CanaryTrafficPercent defines the traffic split percentage between the candidate revision and the last ready revision

    logger
    LoggerSpec
    (Optional)

    Activate request/response logging and logger configurations

    batcher
    Batcher
    (Optional)

    Activate request batching and batching configurations
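The autoscaling-related fields above can be sanity-checked before a manifest is submitted. The sketch below is a hypothetical client-side check, not the validation KServe itself performs; the rules about maxReplicas and the 0 to 100 range for canaryTrafficPercent are assumptions of this example.

```python
# ScaleMetric values listed in this reference.
SCALE_METRICS = {"concurrency", "rps", "cpu", "memory"}


def check_component_extension(spec: dict) -> list:
    """Return human-readable problems found in a ComponentExtensionSpec-like dict."""
    problems = []
    min_replicas = spec.get("minReplicas", 1)  # documented default is 1
    max_replicas = spec.get("maxReplicas")
    if min_replicas < 0:
        problems.append("minReplicas must not be negative")
    # Assumption of this sketch: an explicit maxReplicas below minReplicas is a mistake.
    if max_replicas is not None and max_replicas < min_replicas:
        problems.append("maxReplicas is below minReplicas")
    metric = spec.get("scaleMetric")
    if metric is not None and metric not in SCALE_METRICS:
        problems.append(f"unsupported scaleMetric: {metric}")
    # Assumption of this sketch: a percentage lies between 0 and 100.
    canary = spec.get("canaryTrafficPercent")
    if canary is not None and not 0 <= canary <= 100:
        problems.append("canaryTrafficPercent must be between 0 and 100")
    return problems


# Scale-to-zero component with an rps target of 10 (illustrative values).
autoscaled = {"minReplicas": 0, "maxReplicas": 5, "scaleMetric": "rps", "scaleTarget": 10}
print(check_component_extension(autoscaled))
```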

    ComponentImplementation

    ComponentImplementation interface is implemented by predictor, transformer, and explainer implementations

    ComponentStatusSpec

    (Appears on:InferenceServiceStatus)

    ComponentStatusSpec describes the state of the component

    Field Description
    latestReadyRevision
    string
    (Optional)

    Latest revision name that is in ready state

    latestCreatedRevision
    string
    (Optional)

    Latest revision name that is created

    previousRolledoutRevision
    string
    (Optional)

    Previous revision name that is rolled out with 100 percent traffic

    latestRolledoutRevision
    string
    (Optional)

    Latest revision name that is rolled out with 100 percent traffic

    traffic
    []knative.dev/serving/pkg/apis/serving/v1.TrafficTarget
    (Optional)

    Traffic holds the configured traffic distribution for latest ready revision and previous rolled out revision.

    url
    knative.dev/pkg/apis.URL
    (Optional)

    URL holds the primary url that will distribute traffic over the provided traffic targets. This will be one of the REST or gRPC endpoints that are available. It generally has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}

    restUrl
    knative.dev/pkg/apis.URL
    (Optional)

    REST endpoint of the component if available.

    grpcUrl
    knative.dev/pkg/apis.URL
    (Optional)

    gRPC endpoint of the component if available.

    address
    knative.dev/pkg/apis/duck/v1.Addressable
    (Optional)

    Addressable endpoint for the InferenceService

    ComponentType (string alias)

    ComponentType contains the different types of components of the service

    Value Description

    "explainer"

    "predictor"

    "transformer"

    CustomExplainer

    CustomExplainer defines arguments for configuring a custom explainer.

    Field Description
    PodSpec
    Kubernetes core/v1.PodSpec

    (Members of PodSpec are embedded into this type.)

    CustomPredictor

    CustomPredictor defines arguments for configuring a custom server.

    Field Description
    PodSpec
    Kubernetes core/v1.PodSpec

    (Members of PodSpec are embedded into this type.)

    CustomTransformer

    CustomTransformer defines arguments for configuring a custom transformer.

    Field Description
    PodSpec
    Kubernetes core/v1.PodSpec

    (Members of PodSpec are embedded into this type.)

    DeployConfig

    Field Description
    defaultDeploymentMode
    string

    ExplainerConfig

    (Appears on:ExplainersConfig)

    Field Description
    image
    string

    explainer docker image name

    defaultImageVersion
    string

    default explainer docker image version

    ExplainerExtensionSpec

    (Appears on:AIXExplainerSpec, ARTExplainerSpec, AlibiExplainerSpec)

    ExplainerExtensionSpec defines configuration shared across all explainer frameworks

    Field Description
    storageUri
    string

    The location of a trained explanation model

    runtimeVersion
    string

    Defaults to latest Explainer Version

    config
    map[string]string

    Inline custom parameter settings for explainer

    Container
    Kubernetes core/v1.Container

    (Members of Container are embedded into this type.)

    (Optional)

    Container enables overrides for the explainer. Each framework will have different defaults that are populated in the underlying container spec.

    storage
    StorageSpec
    (Optional)

    Storage Spec for model location

    ExplainerSpec

    (Appears on:InferenceServiceSpec)

    ExplainerSpec defines the container spec for a model explanation server. The following fields follow a “1-of” semantic. Users must specify exactly one spec.

    Field Description
    alibi
    AlibiExplainerSpec

    Spec for alibi explainer

    aix
    AIXExplainerSpec

    Spec for AIX explainer

    art
    ARTExplainerSpec

    Spec for ART explainer

    PodSpec
    PodSpec

    (Members of PodSpec are embedded into this type.)

    This spec is dual purpose. 1) Users may choose to provide a full PodSpec for their custom explainer. The field PodSpec.Containers is mutually exclusive with other explainers (e.g. Alibi). 2) Users may choose to provide an explainer (e.g. Alibi) and specify PodSpec overrides in the PodSpec. They must not provide PodSpec.Containers in this case.

    ComponentExtensionSpec
    ComponentExtensionSpec

    (Members of ComponentExtensionSpec are embedded into this type.)

    Component extension defines the deployment configurations for explainer
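The “1-of” rule for ExplainerSpec can be expressed as a small check. This is a minimal sketch of how a client might test a spec before submitting it; `explainer_kind` is a hypothetical helper, and it assumes the embedded PodSpec members (such as `containers`) appear at the top level of the explainer dictionary.

```python
# Explainer framework fields listed for ExplainerSpec in this reference.
EXPLAINER_FIELDS = ("alibi", "aix", "art")


def explainer_kind(spec: dict) -> str:
    """Return the single explainer in use, or "custom" for a container-only spec.

    Raises ValueError when the "1-of" rule is violated.
    """
    chosen = [name for name in EXPLAINER_FIELDS if spec.get(name) is not None]
    has_containers = bool(spec.get("containers"))
    if len(chosen) > 1:
        raise ValueError("more than one explainer set: " + ", ".join(chosen))
    if chosen and has_containers:
        raise ValueError("containers cannot be combined with " + chosen[0])
    if chosen:
        return chosen[0]
    if has_containers:
        return "custom"
    raise ValueError("no explainer specified")


print(explainer_kind({"alibi": {"type": "AnchorTabular"}}))
```

A spec that sets an explainer together with PodSpec overrides other than `containers` (for example `serviceAccountName`) passes this check, which matches the second use described above.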

    ExplainersConfig

    (Appears on:InferenceServicesConfig)

    Field Description
    alibi
    ExplainerConfig
    aix
    ExplainerConfig
    art
    ExplainerConfig

    FailureInfo

    (Appears on:ModelStatus)

    Field Description
    location
    string
    (Optional)

    Name of component to which the failure relates (usually Pod name)

    reason
    FailureReason
    (Optional)

    High level class of failure

    message
    string
    (Optional)

    Detailed error message

    modelRevisionName
    string
    (Optional)

    Internal Revision/ID of model, tied to specific Spec contents

    time
    Kubernetes meta/v1.Time
    (Optional)

    Time failure occurred or was discovered

    FailureReason (string alias)

    (Appears on:FailureInfo)

    FailureReason enum

    Value Description

    "InvalidPredictorSpec"

    The current Predictor Spec is invalid or unsupported

    "ModelLoadFailed"

    The model failed to load within a ServingRuntime container

    "NoSupportingRuntime"

    There is no ServingRuntime that supports the specified model type

    "RuntimeDisabled"

    The ServingRuntime is disabled

    "RuntimeNotRecognized"

    There is no ServingRuntime defined with the specified runtime name

    "RuntimeUnhealthy"

    Corresponding ServingRuntime containers failed to start or are unhealthy

    InferenceService

    InferenceService is the Schema for the InferenceServices API

    Field Description
    metadata
    Kubernetes meta/v1.ObjectMeta
    Refer to the Kubernetes API documentation for the fields of the metadata field.
    spec
    InferenceServiceSpec


    predictor
    PredictorSpec

    Predictor defines the model serving spec

    explainer
    ExplainerSpec
    (Optional)

    Explainer defines the model explanation service spec. The explainer service calls the predictor, or the transformer if one is specified.

    transformer
    TransformerSpec
    (Optional)

    Transformer defines the pre/post-processing performed before and after the predictor call. The transformer service calls the predictor service.

    status
    InferenceServiceStatus
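To show how these top-level fields fit together, the sketch below assembles a minimal InferenceService manifest as a Python dictionary and prints it as JSON. The `apiVersion` comes from the package name at the top of this page; the helper name, resource name, namespace, and storage URI are assumptions made for the example.

```python
import json


def make_inference_service(name: str, namespace: str, storage_uri: str) -> dict:
    """Build a minimal manifest with only the required predictor component."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # explainer and transformer are optional and omitted here.
            "predictor": {
                "sklearn": {"storageUri": storage_uri},
            },
        },
    }


# Hypothetical name, namespace, and model location.
isvc = make_inference_service("sklearn-iris", "default", "gs://example-bucket/models/iris")
print(json.dumps(isvc, indent=2))
```

The `status` field is left out because it describes observed state rather than user-supplied configuration.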

    InferenceServiceSpec

    (Appears on:InferenceService)

    InferenceServiceSpec is the top level type for this resource

    Field Description
    predictor
    PredictorSpec

    Predictor defines the model serving spec

    explainer
    ExplainerSpec
    (Optional)

    Explainer defines the model explanation service spec. The explainer service calls the predictor, or the transformer if one is specified.

    transformer
    TransformerSpec
    (Optional)

    Transformer defines the pre/post-processing performed before and after the predictor call. The transformer service calls the predictor service.

    InferenceServiceStatus

    (Appears on:InferenceService)

    InferenceServiceStatus defines the observed state of InferenceService

    Field Description
    Status
    knative.dev/pkg/apis/duck/v1.Status

    (Members of Status are embedded into this type.)

    Conditions for the InferenceService
    - PredictorReady: predictor readiness condition;
    - TransformerReady: transformer readiness condition;
    - ExplainerReady: explainer readiness condition;
    - RoutesReady: aggregated routing condition;
    - Ready: aggregated condition;

    address
    knative.dev/pkg/apis/duck/v1.Addressable
    (Optional)

    Addressable endpoint for the InferenceService

    url
    knative.dev/pkg/apis.URL
    (Optional)

    URL holds the url that will distribute traffic over the provided traffic targets. It generally has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}

    components
    map[kserve.io/v1beta1/pkg/apis/serving/v1beta1.ComponentType]kserve.io/v1beta1/pkg/apis/serving/v1beta1.ComponentStatusSpec

    Statuses for the components of the InferenceService

    modelStatus
    ModelStatus

    Model related statuses
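A client that has fetched an InferenceService can inspect the conditions listed above to decide whether the service is usable. The sketch below assumes the status has already been loaded into a dictionary whose `conditions` entries carry `type` and `status` keys; the sample status is made up for illustration and is not output from a real cluster.

```python
def condition_status(status: dict, cond_type: str) -> str:
    """Return the status string of the named condition, or "Unknown" if absent."""
    for cond in status.get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status", "Unknown")
    return "Unknown"


def is_ready(status: dict) -> bool:
    """True when the aggregated Ready condition is "True"."""
    return condition_status(status, "Ready") == "True"


def not_ready_conditions(status: dict) -> list:
    """Types of all reported conditions whose status is not "True"."""
    return [c.get("type") for c in status.get("conditions", [])
            if c.get("status") != "True"]


# Made-up status for illustration only.
sample_status = {
    "conditions": [
        {"type": "PredictorReady", "status": "True"},
        {"type": "RoutesReady", "status": "False"},
        {"type": "Ready", "status": "False"},
    ],
    "url": "http://sklearn-iris.default.example.com",
}

print(is_ready(sample_status), not_ready_conditions(sample_status))
```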

    InferenceServicesConfig

    Field Description
    transformers
    TransformersConfig

    Transformer configurations

    predictors
    PredictorsConfig

    Predictor configurations

    explainers
    ExplainersConfig

    Explainer configurations

    IngressConfig

    Field Description
    ingressGateway
    string
    ingressService
    string
    localGateway
    string
    localGatewayService
    string
    ingressDomain
    string
    ingressClassName
    string
    domainTemplate
    string
    urlScheme
    string

    LightGBMSpec

    (Appears on:PredictorSpec)

    LightGBMSpec defines arguments for configuring LightGBM model serving.

    Field Description
    PredictorExtensionSpec
    PredictorExtensionSpec

    (Members of PredictorExtensionSpec are embedded into this type.)

    Contains fields shared across all predictors

    LoggerSpec

    (Appears on:ComponentExtensionSpec)

    LoggerSpec specifies optional payload logging available for all components

    Field Description
    url
    string
    (Optional)

    URL to send logging events

    mode
    LoggerType
    (Optional)

    Specifies the scope of the loggers.
    Valid values are:
    - “all” (default): log both request and response;
    - “request”: log only request;
    - “response”: log only response
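The effect of `mode` can be sketched as a small decision function. This is an illustrative model of the three documented values, not the logger's actual implementation; the helper names and the sink URL are hypothetical.

```python
# LoggerType values listed in this reference.
LOGGER_MODES = ("all", "request", "response")


def effective_mode(logger: dict) -> str:
    """Return the logger mode, falling back to the documented default "all"."""
    mode = logger.get("mode", "all")
    if mode not in LOGGER_MODES:
        raise ValueError(f"invalid logger mode: {mode}")
    return mode


def should_log(logger: dict, direction: str) -> bool:
    """direction is "request" or "response"."""
    mode = effective_mode(logger)
    return mode == "all" or mode == direction


# Hypothetical logger block with an assumed sink URL.
logger = {"url": "http://message-dumper.default/", "mode": "request"}
print(should_log(logger, "request"), should_log(logger, "response"))
```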

    LoggerType (string alias)

    (Appears on:LoggerSpec)

    LoggerType controls the scope of log publishing

    Value Description

    "all"

    Logger mode to log both request and response

    "request"

    Logger mode to log only request

    "response"

    Logger mode to log only response

    ModelCopies

    (Appears on:ModelStatus)

    Field Description
    failedCopies
    int

    How many copies of this predictor’s models failed to load recently

    totalCopies
    int
    (Optional)

    Total number of copies of this predictor’s models that are currently loaded

    ModelFormat

    (Appears on:ModelSpec)

    Field Description
    name
    string

    Name of the model format.

    version
    string
    (Optional)

    Version of the model format. Used in validating that a predictor is supported by a runtime. Can be “major”, “major.minor” or “major.minor.patch”.
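The three accepted shapes of `version` can be recognized with a short pattern check. The sketch below only classifies the shape of the string; it assumes each component is a non-negative integer, which this reference does not state, and it does not reproduce how runtimes are actually matched against a predictor.

```python
import re

# Assumption: each version component is a run of digits.
_VERSION_SHAPE = re.compile(r"\d+(\.\d+){0,2}")


def version_granularity(version: str) -> str:
    """Classify a modelFormat version as major, major.minor, or major.minor.patch."""
    if not _VERSION_SHAPE.fullmatch(version):
        raise ValueError(f"unsupported version shape: {version!r}")
    # The number of dots selects the granularity.
    return ("major", "major.minor", "major.minor.patch")[version.count(".")]


# Hypothetical modelFormat entry.
model_format = {"name": "sklearn", "version": "1.0"}
print(version_granularity(model_format["version"]))
```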

    ModelRevisionStates

    (Appears on:ModelStatus)

    Field Description
    activeModelState
    ModelState

    High level state string: Pending, Standby, Loading, Loaded, FailedToLoad

    targetModelState
    ModelState

    ModelSpec

    (Appears on:PredictorSpec)

    Field Description
    modelFormat
    ModelFormat

    ModelFormat being served.

    runtime
    string
    (Optional)

    Specific ClusterServingRuntime/ServingRuntime name to use for deployment.

    PredictorExtensionSpec
    PredictorExtensionSpec

    (Members of PredictorExtensionSpec are embedded into this type.)

    ModelState (string alias)

    (Appears on:ModelRevisionStates)

    ModelState enum

    Value Description

    "FailedToLoad"

    All copies of the model failed to load

    "Loaded"

    At least one copy of the model is loaded

    "Loading"

    Model is loading

    "Pending"

    Model is not yet registered

    "Standby"

    Model is available but not loaded (will load when used)

    ModelStatus

    (Appears on:InferenceServiceStatus)

    Field Description
    transitionStatus
    TransitionStatus

    Whether the available predictor endpoints reflect the current Spec or are in transition

    states
    ModelRevisionStates
    (Optional)

    State information of the predictor’s model.

    lastFailureInfo
    FailureInfo
    (Optional)

    Details of the last failure, when loading of the target model has failed or is blocked.

    copies
    ModelCopies
    (Optional)

    Model copy information of the predictor’s model.
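The ModelStatus fields can be combined into a one-line summary for logs or dashboards. The function below is a hypothetical helper that assumes the status is available as a dictionary; reporting a failed target state ahead of a loaded active state is a design choice of this sketch, not documented behavior.

```python
def describe_model_status(model_status: dict) -> str:
    """Summarize states and lastFailureInfo in a single line."""
    states = model_status.get("states") or {}
    active = states.get("activeModelState", "")
    target = states.get("targetModelState", "")
    # Report failures first so that a failed rollout is not hidden
    # behind a previously loaded model.
    if "FailedToLoad" in (active, target):
        failure = model_status.get("lastFailureInfo") or {}
        return "failed: {} ({})".format(
            failure.get("reason", "unknown"),
            failure.get("message", "no message"),
        )
    if active == "Loaded":
        return "loaded"
    return "not loaded yet: " + (active or "unknown")


# Made-up status for illustration only.
failed = {
    "states": {"activeModelState": "", "targetModelState": "FailedToLoad"},
    "lastFailureInfo": {"reason": "ModelLoadFailed", "message": "file not found"},
}
print(describe_model_status(failed))
```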

    ONNXRuntimeSpec

    (Appears on:PredictorSpec)

    ONNXRuntimeSpec defines arguments for configuring ONNX model serving.

    Field Description
    PredictorExtensionSpec
    PredictorExtensionSpec

    (Members of PredictorExtensionSpec are embedded into this type.)

    Contains fields shared across all predictors

    PMMLSpec

    (Appears on:PredictorSpec)

    PMMLSpec defines arguments for configuring PMML model serving.

    Field Description
    PredictorExtensionSpec
    PredictorExtensionSpec

    (Members of PredictorExtensionSpec are embedded into this type.)

    Contains fields shared across all predictors

    PaddleServerSpec

    (Appears on:PredictorSpec)

    Field Description
    PredictorExtensionSpec
    PredictorExtensionSpec

    (Members of PredictorExtensionSpec are embedded into this type.)

    PodSpec

    (Appears on:ExplainerSpec, PredictorSpec, TransformerSpec)

    PodSpec is a description of a pod.

    Field Description
    volumes
    []Kubernetes core/v1.Volume
    (Optional)

    List of volumes that can be mounted by containers belonging to the pod. More info: https://kubernetes.io/docs/concepts/storage/volumes

    initContainers
    []Kubernetes core/v1.Container

    List of initialization containers belonging to the pod. Init containers are executed in order prior to containers being started. If any init container fails, the pod is considered to have failed and is handled according to its restartPolicy. The name for an init container or normal container must be unique among all containers. Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes. The resourceRequirements of an init container are taken into account during scheduling by finding the highest request/limit for each resource type, and then using the max of that value or the sum of the normal containers. Limits are applied to init containers in a similar fashion. Init containers cannot currently be added or removed. Cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/

    containers
    []Kubernetes core/v1.Container

    List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.

    ephemeralContainers
    []Kubernetes core/v1.EphemeralContainer
    (Optional)

    List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing pod to perform user-initiated actions such as debugging. This list cannot be specified when creating a pod, and it cannot be modified by updating the pod spec. In order to add an ephemeral container to an existing pod, use the pod’s ephemeralcontainers subresource. This field is alpha-level and is only honored by servers that enable the EphemeralContainers feature.

    restartPolicy
    Kubernetes core/v1.RestartPolicy
    (Optional)

    Restart policy for all containers within the pod. One of Always, OnFailure, Never. Default to Always. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy

    terminationGracePeriodSeconds
    int64
    (Optional)

    Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request. Value must be non-negative integer. The value zero indicates delete immediately. If this value is nil, the default grace period will be used instead. The grace period is the duration in seconds after the processes running in the pod are sent a termination signal and the time when the processes are forcibly halted with a kill signal. Set this value longer than the expected cleanup time for your process. Defaults to 30 seconds.

    activeDeadlineSeconds
    int64
    (Optional)

    Optional duration in seconds the pod may be active on the node relative to StartTime before the system will actively try to mark it failed and kill associated containers. Value must be a positive integer.

    dnsPolicy
    Kubernetes core/v1.DNSPolicy
    (Optional)

    Set DNS policy for the pod. Defaults to “ClusterFirst”. Valid values are ‘ClusterFirstWithHostNet’, ‘ClusterFirst’, ‘Default’ or ‘None’. DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy. To have DNS options set along with hostNetwork, you have to specify DNS policy explicitly to ‘ClusterFirstWithHostNet’.

    nodeSelector
    map[string]string
    (Optional)

    NodeSelector is a selector which must be true for the pod to fit on a node. Selector which must match a node’s labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/

    serviceAccountName
    string
    (Optional)

    ServiceAccountName is the name of the ServiceAccount to use to run this pod. More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/

    serviceAccount
    string
    (Optional)

    DeprecatedServiceAccount is a deprecated alias for ServiceAccountName. Deprecated: Use serviceAccountName instead.

    automountServiceAccountToken
    bool
    (Optional)

    AutomountServiceAccountToken indicates whether a service account token should be automatically mounted.

    nodeName
    string
    (Optional)

    NodeName is a request to schedule this pod onto a specific node. If it is non-empty, the scheduler simply schedules this pod onto that node, assuming that it fits resource requirements.

    hostNetwork
    bool
    (Optional)

    Host networking requested for this pod. Use the host’s network namespace. If this option is set, the ports that will be used must be specified. Default to false.

    hostPID
    bool
    (Optional)

    Use the host’s pid namespace. Optional: Default to false.

    hostIPC
    bool
    (Optional)

    Use the host’s ipc namespace. Optional: Default to false.

    shareProcessNamespace
    bool
    (Optional)

    Share a single process namespace between all of the containers in a pod. When this is set containers will be able to view and signal processes from other containers in the same pod, and the first process in each container will not be assigned PID 1. HostPID and ShareProcessNamespace cannot both be set. Optional: Default to false.

    securityContext
    Kubernetes core/v1.PodSecurityContext
    (Optional)

    SecurityContext holds pod-level security attributes and common container settings. Optional: Defaults to empty. See type description for default values of each field.

    imagePullSecrets
    []Kubernetes core/v1.LocalObjectReference
    (Optional)

    ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of docker, only DockerConfig type secrets are honored. More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod

    hostname
    string
    (Optional)

    Specifies the hostname of the Pod. If not specified, the pod’s hostname will be set to a system-defined value.

    subdomain
    string
    (Optional)

    If specified, the fully qualified Pod hostname will be “<hostname>.<subdomain>.<pod namespace>.svc.<cluster domain>”. If not specified, the pod will not have a domainname at all.

    affinity
    Kubernetes core/v1.Affinity
    (Optional)

    If specified, the pod’s scheduling constraints

    schedulerName
    string
    (Optional)

    If specified, the pod will be dispatched by specified scheduler. If not specified, the pod will be dispatched by default scheduler.

    tolerations
    []Kubernetes core/v1.Toleration
    (Optional)

    If specified, the pod’s tolerations.

    hostAliases
    []Kubernetes core/v1.HostAlias
    (Optional)

    HostAliases is an optional list of hosts and IPs that will be injected into the pod’s hosts file if specified. This is only valid for non-hostNetwork pods.

    priorityClassName
    string
    (Optional)

    If specified, indicates the pod’s priority. “system-node-critical” and “system-cluster-critical” are two special keywords which indicate the highest priorities with the former being the highest priority. Any other name must be defined by creating a PriorityClass object with that name. If not specified, the pod priority will be default or zero if there is no default.

    priority
    int32
    (Optional)

    The priority value. Various system components use this field to find the priority of the pod. When Priority Admission Controller is enabled, it prevents users from setting this field. The admission controller populates this field from PriorityClassName. The higher the value, the higher the priority.

    dnsConfig
    Kubernetes core/v1.PodDNSConfig
    (Optional)

    Specifies the DNS parameters of a pod. Parameters specified here will be merged to the generated DNS configuration based on DNSPolicy.

    readinessGates
    []Kubernetes core/v1.PodReadinessGate
    (Optional)

    If specified, all readiness gates will be evaluated for pod readiness. A pod is ready when all its containers are ready AND all conditions specified in the readiness gates have status equal to “True”. More info: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/580-pod-readiness-gates

    runtimeClassName
    string
    (Optional)

    RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run. If unset or empty, the “legacy” RuntimeClass will be used, which is an implicit class with an empty definition that uses the default runtime handler. This is a beta feature as of Kubernetes v1.14. More info: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/585-runtime-class

    enableServiceLinks
    bool
    (Optional)

    EnableServiceLinks indicates whether information about services should be injected into pod’s environment variables, matching the syntax of Docker links. Optional: Defaults to true.

    preemptionPolicy
    Kubernetes core/v1.PreemptionPolicy
    (Optional)

    PreemptionPolicy is the Policy for preempting pods with lower priority. One of Never, PreemptLowerPriority. Defaults to PreemptLowerPriority if unset. This field is beta-level, gated by the NonPreemptingPriority feature-gate.

    overhead
    Kubernetes core/v1.ResourceList
    (Optional)

    Overhead represents the resource overhead associated with running a pod for a given RuntimeClass. This field will be autopopulated at admission time by the RuntimeClass admission controller. If the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests. The RuntimeClass admission controller will reject Pod create requests which have the overhead already set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero. This field is alpha-level as of Kubernetes v1.16, and is only honored by servers that enable the PodOverhead feature. More info: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/688-pod-overhead

    topologySpreadConstraints
    []Kubernetes core/v1.TopologySpreadConstraint
    (Optional)

    TopologySpreadConstraints describes how a group of pods ought to spread across topology domains. Scheduler will schedule pods in a way which abides by the constraints. All topologySpreadConstraints are ANDed.

    setHostnameAsFQDN
    bool
    (Optional)

    If true the pod’s hostname will be configured as the pod’s FQDN, rather than the leaf name (the default). In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname). In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN. If a pod does not have FQDN, this has no effect. Default to false.

    PredictorConfig

    (Appears on:PredictorProtocols, PredictorsConfig)

    Field Description
    image
    string

    predictor docker image name

    defaultImageVersion
    string

    default predictor docker image version on cpu

    defaultGpuImageVersion
    string

    default predictor docker image version on gpu

    defaultTimeout
    int64

    Default timeout of predictor for serving a request, in seconds

    multiModelServer
    bool

    Flag to determine if multi-model serving is supported

    supportedFrameworks
    []string

    frameworks the model agent is able to run

    PredictorExtensionSpec

    (Appears on:LightGBMSpec, ModelSpec, ONNXRuntimeSpec, PMMLSpec, PaddleServerSpec, SKLearnSpec, TFServingSpec, TorchServeSpec, TritonSpec, XGBoostSpec)

    PredictorExtensionSpec defines configuration shared across all predictor frameworks

    Field Description
    storageUri
    string
    (Optional)

    This field points to the location of the trained model which is mounted onto the pod.

    runtimeVersion
    string
    (Optional)

    Runtime version of the predictor docker image

    protocolVersion
    github.com/kserve/kserve/pkg/constants.InferenceServiceProtocol
    (Optional)

    Protocol version used by the predictor (one of v1, v2, grpc-v1, or grpc-v2)

    Container
    Kubernetes core/v1.Container

    (Members of Container are embedded into this type.)

    (Optional)

    Container enables overrides for the predictor. Each framework will have different defaults that are populated in the underlying container spec.

    storage
    StorageSpec
    (Optional)

    Storage Spec for model location

    PredictorImplementation

    PredictorImplementation defines common functions for all predictors, e.g. TensorFlow, Triton, etc.

    PredictorProtocols

    (Appears on:PredictorsConfig)

    Field Description
    v1
    PredictorConfig
    v2
    PredictorConfig

    PredictorSpec

    (Appears on:InferenceServiceSpec)

    PredictorSpec defines the configuration for a predictor. The following fields follow a “1-of” semantic. Users must specify exactly one spec.

    Field Description
    sklearn
    SKLearnSpec

    Spec for SKLearn model server

    xgboost
    XGBoostSpec

    Spec for XGBoost model server

    tensorflow
    TFServingSpec

    Spec for TFServing (https://github.com/tensorflow/serving)

    pytorch
    TorchServeSpec

    Spec for TorchServe (https://pytorch.org/serve)

    triton
    TritonSpec

    Spec for Triton Inference Server (https://github.com/triton-inference-server/server)

    onnx
    ONNXRuntimeSpec

    Spec for ONNX runtime (https://github.com/microsoft/onnxruntime)

    pmml
    PMMLSpec

    Spec for PMML (http://dmg.org/pmml/v4-1/GeneralStructure.html)

    lightgbm
    LightGBMSpec

    Spec for LightGBM model server

    paddle
    PaddleServerSpec

    Spec for Paddle model server (https://github.com/PaddlePaddle/Serving)

    model
    ModelSpec

    Model spec for any arbitrary framework.

    PodSpec
    PodSpec

    (Members of PodSpec are embedded into this type.)

    This spec is dual purpose.
    1) Provide a full PodSpec for a custom predictor. The field PodSpec.Containers is mutually exclusive with other predictors (e.g. TFServing).
    2) Provide a predictor (e.g. TFServing) and specify PodSpec overrides. You must not provide PodSpec.Containers in this case.

    ComponentExtensionSpec
    ComponentExtensionSpec

    (Members of ComponentExtensionSpec are embedded into this type.)

    Component extension defines the deployment configurations for a predictor
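
    The “1-of” rule and the PodSpec.Containers restriction can be sketched as a small check over a predictor block held as a plain Python dict. This is a minimal illustration, not the validation the project itself performs. The helper name `validate_predictor` and the lowercase `containers` key are assumptions. Treating a containers-only block as the custom-predictor case is an interpretation of the dual-purpose note above.

```python
# Framework field names as listed in the PredictorSpec table above.
FRAMEWORK_FIELDS = (
    "sklearn", "xgboost", "tensorflow", "pytorch", "triton",
    "onnx", "pmml", "lightgbm", "paddle", "model",
)


def validate_predictor(predictor: dict) -> list:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: `predictor` is the predictor block of a manifest
    loaded into a dict, and `containers` is the assumed key for
    PodSpec.Containers.
    """
    problems = []
    chosen = [name for name in FRAMEWORK_FIELDS if predictor.get(name) is not None]
    has_containers = bool(predictor.get("containers"))

    if len(chosen) > 1:
        problems.append("more than one predictor set: " + ", ".join(chosen))
    if chosen and has_containers:
        problems.append("containers must not be set alongside a framework predictor")
    if not chosen and not has_containers:
        problems.append("no predictor set: pick one framework or supply containers")
    return problems
```

    For example, `validate_predictor({"sklearn": {}, "xgboost": {}})` reports one problem, while a block with only `sklearn` or only `containers` reports none.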

    PredictorsConfig

    (Appears on:InferenceServicesConfig)

    Field Description
    tensorflow
    PredictorConfig

    triton
    PredictorConfig

    xgboost
    PredictorProtocols

    sklearn
    PredictorProtocols

    pytorch
    PredictorConfig

    onnx
    PredictorConfig

    pmml
    PredictorConfig

    lightgbm
    PredictorConfig

    paddle
    PredictorConfig
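
    The table above mixes two shapes: `xgboost` and `sklearn` hold a PredictorProtocols object with `v1` and `v2` entries, while the other frameworks hold a PredictorConfig directly. The following sketch shows one way a lookup could handle both shapes. The function name is hypothetical, and the `image` values in the usage note are placeholders rather than real image names.

```python
# Frameworks whose entry is a PredictorProtocols object (keys "v1" and "v2"),
# per the PredictorsConfig table; the rest map straight to a PredictorConfig.
PER_PROTOCOL_FRAMEWORKS = frozenset({"xgboost", "sklearn"})


def resolve_predictor_config(predictors: dict, framework: str, protocol: str) -> dict:
    """Return the PredictorConfig dict for `framework`.

    `protocol` ("v1" or "v2") is only consulted for frameworks listed in
    PER_PROTOCOL_FRAMEWORKS; for the others it is ignored.
    """
    entry = predictors[framework]
    if framework in PER_PROTOCOL_FRAMEWORKS:
        return entry[protocol]
    return entry
```

    With `predictors = {"tensorflow": {"image": "example/tf"}, "sklearn": {"v1": {"image": "example/sk-v1"}, "v2": {"image": "example/sk-v2"}}}`, looking up `sklearn` with `v2` returns the `example/sk-v2` entry, and looking up `tensorflow` returns its single entry regardless of protocol.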

    SKLearnSpec

    (Appears on:PredictorSpec)

    SKLearnSpec defines arguments for configuring SKLearn model serving.

    Field Description
    PredictorExtensionSpec
    PredictorExtensionSpec

    (Members of PredictorExtensionSpec are embedded into this type.)

    Contains fields shared across all predictors

    ScaleMetric (string alias)

    (Appears on:ComponentExtensionSpec)

    ScaleMetric enum

    Value Description

    "cpu"

    "concurrency"

    "memory"

    "rps"

    StorageSpec

    (Appears on:ExplainerExtensionSpec, PredictorExtensionSpec)

    Field Description
    path
    string
    (Optional)

    The path to the model object in storage. It cannot be set together with storageURI.

    schemaPath
    string
    (Optional)

    The path to the model schema file in the storage.

    parameters
    map[string]string
    (Optional)

    Parameters to override the default storage credentials and config.

    key
    string
    (Optional)

    The Storage Key in the secret for this model.
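
    The restriction on `path` can be illustrated with a short check over a framework block held as a Python dict. This is a sketch under assumptions: the key names `storage` and `storageUri` are assumed spellings of the sibling fields, and `check_storage` is a hypothetical helper, not a function provided by the API.

```python
def check_storage(framework_spec: dict) -> None:
    """Raise ValueError when storage.path and storageUri are both set.

    `framework_spec` stands in for a framework block such as `sklearn`.
    The keys `storage` and `storageUri` are assumptions for this sketch.
    """
    storage = framework_spec.get("storage") or {}
    if framework_spec.get("storageUri") and storage.get("path"):
        raise ValueError("storage.path cannot be set together with storageUri")
```

    A block that sets only `storage` (for instance `path`, `key`, and `parameters`) or only `storageUri` passes this check.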

    TFServingSpec

    (Appears on:PredictorSpec)

    TFServingSpec defines arguments for configuring TensorFlow model serving.

    Field Description
    PredictorExtensionSpec
    PredictorExtensionSpec

    (Members of PredictorExtensionSpec are embedded into this type.)

    Contains fields shared across all predictors

    TorchServeSpec

    (Appears on:PredictorSpec)

    TorchServeSpec defines arguments for configuring PyTorch model serving.

    Field Description
    PredictorExtensionSpec
    PredictorExtensionSpec

    (Members of PredictorExtensionSpec are embedded into this type.)

    Contains fields shared across all predictors

    TransformerConfig

    (Appears on:TransformersConfig)

    Field Description
    image
    string

    Transformer Docker image name

    defaultImageVersion
    string

    Default transformer Docker image version

    TransformerSpec

    (Appears on:InferenceServiceSpec)

    TransformerSpec defines transformer service for pre/post processing

    Field Description
    PodSpec
    PodSpec

    (Members of PodSpec are embedded into this type.)

    This spec is dual purpose:
    1) Provide a full PodSpec for a custom transformer. The field PodSpec.Containers is mutually exclusive with other transformers.
    2) Provide a transformer and specify PodSpec overrides. You must not provide PodSpec.Containers in this case.

    ComponentExtensionSpec
    ComponentExtensionSpec

    (Members of ComponentExtensionSpec are embedded into this type.)

    Component extension defines the deployment configurations for a transformer
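
    To make the first use of the embedded PodSpec concrete, the sketch below assembles an InferenceService manifest as a Python dict, pairing a framework predictor with a custom transformer container. The function name, the container name `transformer`, the `storageUri` key, and every image or URI passed in are assumptions for illustration only.

```python
import json


def build_inference_service(name: str, model_uri: str, transformer_image: str) -> dict:
    """Return an InferenceService manifest as a plain dict.

    Hypothetical helper: key spellings follow common manifest conventions and
    are assumptions, as are the container name and all argument values.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            # Framework predictor: no containers are supplied in this block.
            "predictor": {"sklearn": {"storageUri": model_uri}},
            # Custom transformer: a PodSpec with its own container list.
            "transformer": {
                "containers": [{"name": "transformer", "image": transformer_image}],
            },
        },
    }


if __name__ == "__main__":
    manifest = build_inference_service(
        "demo", "gs://example-bucket/model", "example.com/my-transformer:latest"
    )
    print(json.dumps(manifest, indent=2))
```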

    TransformersConfig

    (Appears on:InferenceServicesConfig)

    Field Description
    feast
    TransformerConfig

    TransitionStatus (string alias)

    (Appears on:ModelStatus)

    TransitionStatus enum

    Value Description

    "BlockedByFailedLoad"

    Target model failed to load

    "InProgress"

    Waiting for target model to reach state of active model

    "InvalidSpec"

    Target predictor spec failed validation

    "UpToDate"

    Predictor is up-to-date (reflects current spec)
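
    A client reading these values might group them before deciding what to do next. The sketch below maps each status to the description given above and sorts it into one of three buckets. The bucket names `failed`, `pending`, and `ready` are an interpretation of those descriptions, and `classify_transition` is a hypothetical helper.

```python
# Status values and descriptions copied from the TransitionStatus table above.
TRANSITION_DESCRIPTIONS = {
    "BlockedByFailedLoad": "Target model failed to load",
    "InProgress": "Waiting for target model to reach state of active model",
    "InvalidSpec": "Target predictor spec failed validation",
    "UpToDate": "Predictor is up-to-date (reflects current spec)",
}

# Statuses whose descriptions report a failure (an interpretation, not API data).
FAILED_STATES = frozenset({"BlockedByFailedLoad", "InvalidSpec"})


def classify_transition(status: str) -> str:
    """Return "failed", "pending", or "ready" for a known TransitionStatus."""
    if status not in TRANSITION_DESCRIPTIONS:
        raise ValueError(f"unknown transition status: {status!r}")
    if status in FAILED_STATES:
        return "failed"
    return "ready" if status == "UpToDate" else "pending"
```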

    TritonSpec

    (Appears on:PredictorSpec)

    TritonSpec defines arguments for configuring Triton model serving.

    Field Description
    PredictorExtensionSpec
    PredictorExtensionSpec

    (Members of PredictorExtensionSpec are embedded into this type.)

    Contains fields shared across all predictors

    XGBoostSpec

    (Appears on:PredictorSpec)

    XGBoostSpec defines arguments for configuring XGBoost model serving.

    Field Description
    PredictorExtensionSpec
    PredictorExtensionSpec

    (Members of PredictorExtensionSpec are embedded into this type.)

    Contains fields shared across all predictors


    Generated with gen-crd-api-reference-docs on git commit 133ecebb.
