Control Plane API
Packages:
serving.kserve.io/v1alpha1
Package v1alpha1 contains API Schema definitions for the serving v1alpha1 API group
Resource Types:
BuiltInAdapter
(Appears on:ServingRuntimeSpec)
Field | Description
---|---
`serverType` _ServerType_ | ServerType must be one of the supported built-in types such as "triton" or "mlserver", and the runtime's container must have the same name
`runtimeManagementPort` _int_ | Port on which the runtime server listens for model-management requests
`memBufferBytes` _int_ | Fixed memory overhead to subtract from the runtime container's memory allocation to determine model capacity
`modelLoadingTimeoutMillis` _int_ | Timeout for model loading operations in milliseconds
`env` _[]Kubernetes core/v1.EnvVar_ | Environment variables used to control other aspects of the built-in adapter's behaviour (uncommon)
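Putting these fields together, a `builtInAdapter` stanza inside a multi-model ServingRuntime spec might look like the sketch below; the image name, ports, and sizes are illustrative, not defaults. Note that the container name matches `serverType`, as required above.

```yaml
# Fragment of a ServingRuntimeSpec; all values are illustrative.
spec:
  multiModel: true
  grpcDataEndpoint: "port:8001"
  containers:
    - name: triton                              # must match builtInAdapter.serverType
      image: example.com/tritonserver:latest    # illustrative image
  builtInAdapter:
    serverType: triton
    runtimeManagementPort: 8001
    memBufferBytes: 134217728                   # 128 MiB reserved overhead
    modelLoadingTimeoutMillis: 90000
```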
ClusterServingRuntime
ClusterServingRuntime is the Schema for the servingruntimes API
Field | Description
---|---
`metadata` _Kubernetes meta/v1.ObjectMeta_ | Refer to the Kubernetes API documentation for the fields of the `metadata` field.
`spec` _ServingRuntimeSpec_ |
`status` _ServingRuntimeStatus_ |
ClusterStorageContainer
Field | Description
---|---
`metadata` _Kubernetes meta/v1.ObjectMeta_ | Refer to the Kubernetes API documentation for the fields of the `metadata` field.
`spec` _StorageContainerSpec_ |
`disabled` _bool_ | (Optional)
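A minimal ClusterStorageContainer manifest sketching these fields; the image name and URI prefix are illustrative:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterStorageContainer
metadata:
  name: example-storage
spec:
  container:
    name: storage-initializer
    image: example.com/custom-storage-initializer:latest  # illustrative image
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
  supportedUriFormats:
    - prefix: custom://      # URIs matching this prefix use this container
```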
InferenceGraph
InferenceGraph is the Schema for the InferenceGraph API for multiple models
Field | Description
---|---
`metadata` _Kubernetes meta/v1.ObjectMeta_ | Refer to the Kubernetes API documentation for the fields of the `metadata` field.
`spec` _InferenceGraphSpec_ |
`status` _InferenceGraphStatus_ |
InferenceGraphSpec
(Appears on:InferenceGraph)
InferenceGraphSpec defines the InferenceGraph spec
Field | Description
---|---
`nodes` _map[string]kserve.io/serving/pkg/apis/serving/v1alpha1.InferenceRouter_ | Map of InferenceGraph router nodes. Each node defines a router, which can be one of several routing types.
`resources` _Kubernetes core/v1.ResourceRequirements_ | (Optional)
`affinity` _Kubernetes core/v1.Affinity_ | (Optional)
`timeout` _int64_ | (Optional) TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.
`minReplicas` _int_ | (Optional) Minimum number of replicas; defaults to 1 but can be set to 0 to enable scale-to-zero.
`maxReplicas` _int_ | (Optional) Maximum number of replicas for autoscaling.
`scaleTarget` _int_ | (Optional) ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for. concurrency and rps targets are supported by the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
`scaleMetric` _ScaleMetric_ | (Optional) ScaleMetric defines the scaling metric type watched by the autoscaler; possible values are concurrency, rps, cpu, and memory. concurrency and rps are supported via the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
InferenceGraphStatus
(Appears on:InferenceGraph)
InferenceGraphStatus defines the InferenceGraph conditions and status
Field | Description
---|---
`Status` _knative.dev/pkg/apis/duck/v1.Status_ | (Members of `Status` are embedded into this type.) Conditions for the InferenceGraph
`url` _knative.dev/pkg/apis.URL_ | (Optional) Url for the InferenceGraph
InferenceRouter
(Appears on:InferenceGraphSpec)
InferenceRouter defines the router for each InferenceGraph node with one or multiple steps
A Splitter example, which splits traffic between two services by weight:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: canary-route
spec:
  nodes:
    root:
      routerType: Splitter
      routes:
        - serviceName: mymodel1
          weight: 20
        - serviceName: mymodel2
          weight: 80
```
A Switch example, which routes by condition:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: abtest
spec:
  nodes:
    mymodel:
      routerType: Switch
      routes:
        - serviceName: mymodel1
          condition: "{ .input.userId == 1 }"
        - serviceName: mymodel2
          condition: "{ .input.userId == 2 }"
```
Scoring a case using a model ensemble consists of scoring it with each model separately, then combining the results into a single scoring result using one of the pre-defined combination methods.
Tree ensembles are a case where simple algorithms for combining the results of either classification or regression trees are well known. Multiple classification trees, for example, are commonly combined using a "majority-vote" method; multiple regression trees are often combined using various averaging techniques, e.g. tagging models with segment identifiers and weights to be used for their combination in these ways.
An Ensemble example, where a Sequence node feeds two models whose responses are merged:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: ensemble
spec:
  nodes:
    root:
      routerType: Sequence
      routes:
        - serviceName: feast
        - nodeName: ensembleModel
          data: $response
    ensembleModel:
      routerType: Ensemble
      routes:
        - serviceName: sklearn-model
        - serviceName: xgboost-model
```
Scoring a case using a sequence, or chain, of models allows the output of one model to be passed in as input to the subsequent models.
```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: model-chainer
spec:
  nodes:
    root:
      routerType: Sequence
      routes:
        - serviceName: mymodel-s1
        - serviceName: mymodel-s2
          data: $response
        - serviceName: mymodel-s3
          data: $response
```
In the flow described below, the root node classifies the image and passes its output onward. The second node, breed-classifier, takes the original input from the root node along with the response from the cat-dog-classifier service to do further classification of the breed if required.
```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: dog-breed-classification
spec:
  nodes:
    root:
      routerType: Sequence
      routes:
        - serviceName: cat-dog-classifier
        - nodeName: breed-classifier
          data: $request
    breed-classifier:
      routerType: Switch
      routes:
        - serviceName: dog-breed-classifier
          condition: '{ .predictions.class == "dog" }'
        - serviceName: cat-breed-classifier
          condition: '{ .predictions.class == "cat" }'
```
Field | Description
---|---
`routerType` _InferenceRouterType_ | RouterType defines the type of routing for this node (Sequence, Splitter, Ensemble, or Switch)
`steps` _[]InferenceStep_ | (Optional) Steps defines destinations for the current router node
InferenceRouterType
(string alias)
(Appears on:InferenceRouter)
InferenceRouterType constant for inference routing types
Value | Description
---|---
"Ensemble" | Ensemble router routes the requests to multiple models and then merges the responses
"Sequence" | Sequence is the default type; it routes to one destination after another
"Splitter" | Splitter router randomly routes requests to the named services according to their weights
"Switch" | Switch routes the request to a model based on a condition
InferenceStep
(Appears on:InferenceRouter)
InferenceStep defines the inference target of the current step with condition, weights and data.
Field | Description
---|---
`name` _string_ | (Optional) Unique name for the step within this node
`InferenceTarget` _InferenceTarget_ | (Members of `InferenceTarget` are embedded into this type.) Node or service used to process this step
`data` _string_ | (Optional) Request data sent to the next route, with input/output from the previous step, e.g. `$request` or `$response.predictions`
`weight` _int64_ | (Optional) The weight for the traffic split; only used for the Splitter router. When weights are specified, all routing targets must sum to 100.
`condition` _string_ | (Optional) Routing based on the condition
`dependency` _InferenceStepDependencyType_ | (Optional) Decides whether a step is a hard or a soft dependency in the InferenceGraph
InferenceStepDependencyType
(string alias)
(Appears on:InferenceStep)
InferenceStepDependencyType constant for inference step dependency
Value | Description
---|---
"Hard" | Hard
"Soft" | Soft
InferenceTarget
(Appears on:InferenceStep)
Exactly one InferenceTarget field must be specified
Field | Description
---|---
`nodeName` _string_ | (Optional) The node name for routing as the next step
`serviceName` _string_ | Named reference for an InferenceService
`serviceUrl` _string_ | (Optional) InferenceService URL; mutually exclusive with ServiceName
ModelSpec
(Appears on:TrainedModelSpec)
ModelSpec describes a TrainedModel
Field | Description
---|---
`storageUri` _string_ | Storage URI for the model repository
`framework` _string_ | Machine learning framework of the model, e.g. "sklearn", "xgboost", "tensorflow"
`memory` _k8s.io/apimachinery/pkg/api/resource.Quantity_ | Maximum memory this model will consume; this field is used to decide whether a model server has enough memory to load this model.
ScaleMetric
(string alias)
(Appears on:InferenceGraphSpec)
ScaleMetric enum
ServerType
(string alias)
(Appears on:BuiltInAdapter)
ServerType constant for specifying the runtime name
Value | Description
---|---
"mlserver" | Model server is MLServer
"ovms" | Model server is OpenVINO Model Server
"triton" | Model server is Triton
ServingRuntime
ServingRuntime is the Schema for the servingruntimes API
Field | Description
---|---
`metadata` _Kubernetes meta/v1.ObjectMeta_ | Refer to the Kubernetes API documentation for the fields of the `metadata` field.
`spec` _ServingRuntimeSpec_ |
`status` _ServingRuntimeStatus_ |
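For orientation, a minimal single-model ServingRuntime might look like the sketch below; the runtime name, container image, and arguments are illustrative, not shipped defaults:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-sklearn-runtime
spec:
  supportedModelFormats:
    - name: sklearn
      version: "1"
      autoSelect: true
  protocolVersions:
    - v2
  containers:
    - name: kserve-container
      image: example.com/mlserver:latest   # illustrative image
      args:
        - --model_dir=/mnt/models
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
```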
ServingRuntimePodSpec
(Appears on:ServingRuntimeSpec)
Field | Description
---|---
`containers` _[]Kubernetes core/v1.Container_ | List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
`volumes` _[]Kubernetes core/v1.Volume_ | (Optional) List of volumes that can be mounted by containers belonging to the pod. More info: https://kubernetes.io/docs/concepts/storage/volumes
`nodeSelector` _map[string]string_ | (Optional) NodeSelector is a selector which must be true for the pod to fit on a node, i.e. it must match the node's labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
`affinity` _Kubernetes core/v1.Affinity_ | (Optional) If specified, the pod's scheduling constraints
`tolerations` _[]Kubernetes core/v1.Toleration_ | (Optional) If specified, the pod's tolerations.
`labels` _map[string]string_ | (Optional) Labels that will be added to the pod. More info: http://kubernetes.io/docs/user-guide/labels
`annotations` _map[string]string_ | (Optional) Annotations that will be added to the pod. More info: http://kubernetes.io/docs/user-guide/annotations
`imagePullSecrets` _[]Kubernetes core/v1.LocalObjectReference_ | (Optional) ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of Docker, only DockerConfig type secrets are honored. More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
ServingRuntimeSpec
(Appears on:ClusterServingRuntime, ServingRuntime, SupportedRuntime)
ServingRuntimeSpec defines the desired state of ServingRuntime. This spec is currently provisional and is subject to change as details regarding single-model serving and multi-model serving are hammered out.
Field | Description
---|---
`supportedModelFormats` _[]SupportedModelFormat_ | Model formats and versions supported by this runtime
`multiModel` _bool_ | (Optional) Whether this ServingRuntime is intended for multi-model usage or not.
`disabled` _bool_ | (Optional) Set to true to disable use of this runtime
`protocolVersions` _[]github.com/kserve/kserve/pkg/constants.InferenceServiceProtocol_ | (Optional) Supported protocol versions (i.e. v1, v2, grpc-v1 or grpc-v2)
`ServingRuntimePodSpec` _ServingRuntimePodSpec_ | (Members of `ServingRuntimePodSpec` are embedded into this type.)
`grpcEndpoint` _string_ | (Optional) gRPC endpoint for internal model management (implementing the mmesh.ModelRuntime gRPC service). Assumed to be a single-model runtime if omitted.
`grpcDataEndpoint` _string_ | (Optional) gRPC endpoint for inferencing
`httpDataEndpoint` _string_ | (Optional) HTTP endpoint for inferencing
`replicas` _uint16_ | (Optional) Configure the number of replicas in the Deployment generated by this ServingRuntime. If specified, this overrides the podsPerRuntime configuration value.
`storageHelper` _StorageHelper_ | (Optional) Configuration for this runtime's use of the storage helper (model puller). It is enabled unless explicitly disabled.
`builtInAdapter` _BuiltInAdapter_ | (Optional) Provide the details about the built-in runtime adapter
ServingRuntimeStatus
(Appears on:ClusterServingRuntime, ServingRuntime)
ServingRuntimeStatus defines the observed state of ServingRuntime
StorageContainerSpec
(Appears on:ClusterStorageContainer)
StorageContainerSpec defines the container spec for the storage initializer init container, and the protocols it supports.
Field | Description
---|---
`container` _Kubernetes core/v1.Container_ | Container spec for the storage initializer init container
`supportedUriFormats` _[]SupportedUriFormat_ | List of URI formats that this container supports
StorageHelper
(Appears on:ServingRuntimeSpec)
Field | Description
---|---
`disabled` _bool_ | (Optional)
SupportedModelFormat
(Appears on:ServingRuntimeSpec)
Field | Description
---|---
`name` _string_ | Name of the model format.
`version` _string_ | (Optional) Version of the model format. Used in validating that a predictor is supported by a runtime. Can be "major", "major.minor" or "major.minor.patch".
`autoSelect` _bool_ | (Optional) Set to true to allow the ServingRuntime to be used for automatic model placement if this model format is specified with no explicit runtime.
`priority` _int32_ | (Optional) Priority of this serving runtime for auto-selection. This is used to select the serving runtime when more than one runtime supports the same model format. The value must be greater than zero; the higher the value, the higher the priority. Priority is not considered if AutoSelect is either false or not specified. Priority can be overridden by specifying the runtime in the InferenceService.
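A sketch of how these fields fit together inside a runtime spec; the format names, versions, and priority values are illustrative. A runtime advertising a format with a higher priority is preferred over another auto-selectable runtime that supports the same format with a lower priority.

```yaml
# Fragment of a ServingRuntimeSpec; versions and priorities are illustrative.
supportedModelFormats:
  - name: sklearn
    version: "1"
    autoSelect: true     # eligible for automatic placement
    priority: 2          # preferred over runtimes with lower priority
  - name: xgboost
    version: "1"
    autoSelect: true
    priority: 1
```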
SupportedRuntime
SupportedRuntime is the schema for supported runtime result of automatic selection
Field | Description
---|---
`Name` _string_ |
`Spec` _ServingRuntimeSpec_ |
SupportedUriFormat
(Appears on:StorageContainerSpec)
SupportedUriFormat can be either prefix or regex. Todo: Add validation that only one of them is set.
Field | Description
---|---
`prefix` _string_ |
`regex` _string_ |
TrainedModel
TrainedModel is the Schema for the TrainedModel API
Field | Description
---|---
`metadata` _Kubernetes meta/v1.ObjectMeta_ | Refer to the Kubernetes API documentation for the fields of the `metadata` field.
`spec` _TrainedModelSpec_ |
`status` _TrainedModelStatus_ |
TrainedModelSpec
(Appears on:TrainedModel)
TrainedModelSpec defines the TrainedModel spec
Field | Description
---|---
`inferenceService` _string_ | Parent InferenceService to deploy to
`model` _ModelSpec_ | Predictor model spec
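A minimal TrainedModel manifest using this spec; the parent service name and storage URI are illustrative:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: TrainedModel
metadata:
  name: example-trainedmodel
spec:
  inferenceService: example-parent-isvc    # illustrative parent InferenceService
  model:
    storageUri: gs://example-bucket/model  # illustrative
    framework: sklearn
    memory: 256Mi                          # used for model-server capacity checks
```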
TrainedModelStatus
(Appears on:TrainedModel)
TrainedModelStatus defines the observed state of TrainedModel
Field | Description
---|---
`Status` _knative.dev/pkg/apis/duck/v1.Status_ | (Members of `Status` are embedded into this type.) Conditions for the trained model
`url` _knative.dev/pkg/apis.URL_ | URL holds the url that will distribute traffic over the provided traffic targets. For v1: http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}/v1/models/<model-name>
`address` _knative.dev/pkg/apis/duck/v1.Addressable_ | Addressable endpoint for the deployed trained model
Generated with gen-crd-api-reference-docs on git commit 426fe21d.
serving.kserve.io/v1beta1
Package v1beta1 contains API Schema definitions for the serving v1beta1 API group
Resource Types:
ARTExplainerSpec
(Appears on:ExplainerSpec)
ARTExplainerSpec defines the arguments for configuring an ART Explanation Server
Field | Description
---|---
`type` _ARTExplainerType_ | The type of ART explainer
`ExplainerExtensionSpec` _ExplainerExtensionSpec_ | (Members of `ExplainerExtensionSpec` are embedded into this type.) Contains fields shared across all explainers
ARTExplainerType
(string alias)
(Appears on:ARTExplainerSpec)
Value | Description
---|---
"SquareAttack" |
AlibiExplainerSpec
(Appears on:ExplainerSpec)
AlibiExplainerSpec defines the arguments for configuring an Alibi Explanation Server
Field | Description
---|---
`type` _AlibiExplainerType_ | The type of Alibi explainer
`ExplainerExtensionSpec` _ExplainerExtensionSpec_ | (Members of `ExplainerExtensionSpec` are embedded into this type.) Contains fields shared across all explainers
AlibiExplainerType
(string alias)
(Appears on:AlibiExplainerSpec)
AlibiExplainerType is the explanation method
Value | Description
---|---
"AnchorImages" |
"AnchorTabular" |
"AnchorText" |
"Contrastive" |
"Counterfactuals" |
Batcher
(Appears on:ComponentExtensionSpec)
Batcher specifies optional payload batching available for all components
Field | Description
---|---
`maxBatchSize` _int_ | (Optional) Specifies the max number of requests to trigger a batch
`maxLatency` _int_ | (Optional) Specifies the max latency to trigger a batch
`timeout` _int_ | (Optional) Specifies the timeout of a batch
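A sketch of enabling batching on a predictor; the model format and storage URI are illustrative:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-batcher
spec:
  predictor:
    batcher:
      maxBatchSize: 32    # trigger a batch at 32 queued requests
      maxLatency: 500     # or after 500 ms of waiting
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/model   # illustrative
```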
Component
Component interface is implemented by all specs that contain component implementations, e.g. PredictorSpec, ExplainerSpec, TransformerSpec.
ComponentExtensionSpec
(Appears on:ExplainerSpec, PredictorSpec, TransformerSpec)
ComponentExtensionSpec defines the deployment configuration for a given InferenceService component
Field | Description
---|---
`minReplicas` _int_ | (Optional) Minimum number of replicas; defaults to 1 but can be set to 0 to enable scale-to-zero.
`maxReplicas` _int_ | (Optional) Maximum number of replicas for autoscaling.
`scaleTarget` _int_ | (Optional) ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for. concurrency and rps targets are supported by the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).
`scaleMetric` _ScaleMetric_ | (Optional) ScaleMetric defines the scaling metric type watched by the autoscaler; possible values are concurrency, rps, cpu, and memory. concurrency and rps are supported via the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).
`containerConcurrency` _int64_ | (Optional) ContainerConcurrency specifies how many requests can be processed concurrently; this sets the hard limit of the container concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).
`timeout` _int64_ | (Optional) TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.
`canaryTrafficPercent` _int64_ | (Optional) CanaryTrafficPercent defines the traffic split percentage between the candidate revision and the last ready revision
`logger` _LoggerSpec_ | (Optional) Activate request/response logging and logger configurations
`batcher` _Batcher_ | (Optional) Activate request batching and batching configurations
`labels` _map[string]string_ | (Optional) Labels that will be added to the component pod. More info: http://kubernetes.io/docs/user-guide/labels
`annotations` _map[string]string_ | (Optional) Annotations that will be added to the component pod. More info: http://kubernetes.io/docs/user-guide/annotations
ComponentImplementation
ComponentImplementation interface is implemented by predictor, transformer, and explainer implementations
ComponentStatusSpec
(Appears on:InferenceServiceStatus)
ComponentStatusSpec describes the state of the component
Field | Description
---|---
`latestReadyRevision` _string_ | (Optional) Latest revision name that is in ready state
`latestCreatedRevision` _string_ | (Optional) Latest revision name that is created
`previousRolledoutRevision` _string_ | (Optional) Previous revision name that is rolled out with 100 percent traffic
`latestRolledoutRevision` _string_ | (Optional) Latest revision name that is rolled out with 100 percent traffic
`traffic` _[]knative.dev/serving/pkg/apis/serving/v1.TrafficTarget_ | (Optional) Traffic holds the configured traffic distribution for the latest ready revision and the previously rolled-out revision.
`url` _knative.dev/pkg/apis.URL_ | (Optional) URL holds the primary url that will distribute traffic over the provided traffic targets. This will be one of the REST or gRPC endpoints that are available. It generally has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}
`restUrl` _knative.dev/pkg/apis.URL_ | (Optional) REST endpoint of the component if available.
`grpcUrl` _knative.dev/pkg/apis.URL_ | (Optional) gRPC endpoint of the component if available.
`address` _knative.dev/pkg/apis/duck/v1.Addressable_ | (Optional) Addressable endpoint for the InferenceService
ComponentType
(string alias)
ComponentType contains the different types of components of the service
Value | Description
---|---
"explainer" |
"predictor" |
"transformer" |
CustomExplainer
CustomExplainer defines arguments for configuring a custom explainer.
Field | Description
---|---
`PodSpec` _Kubernetes core/v1.PodSpec_ | (Members of `PodSpec` are embedded into this type.)
CustomPredictor
CustomPredictor defines arguments for configuring a custom server.
Field | Description
---|---
`PodSpec` _Kubernetes core/v1.PodSpec_ | (Members of `PodSpec` are embedded into this type.)
CustomTransformer
CustomTransformer defines arguments for configuring a custom transformer.
Field | Description
---|---
`PodSpec` _Kubernetes core/v1.PodSpec_ | (Members of `PodSpec` are embedded into this type.)
DeployConfig
Field | Description
---|---
`defaultDeploymentMode` _string_ |
ExplainerConfig
(Appears on:ExplainersConfig)
Field | Description
---|---
`image` _string_ | Explainer Docker image name
`defaultImageVersion` _string_ | Default explainer Docker image version
ExplainerExtensionSpec
(Appears on:ARTExplainerSpec, AlibiExplainerSpec)
ExplainerExtensionSpec defines configuration shared across all explainer frameworks
Field | Description
---|---
`storageUri` _string_ | The location of a trained explanation model
`runtimeVersion` _string_ | Defaults to the latest explainer version
`config` _map[string]string_ | Inline custom parameter settings for the explainer
`Container` _Kubernetes core/v1.Container_ | (Members of `Container` are embedded into this type.) Container enables overrides for the predictor. Each framework will have different defaults that are populated in the underlying container spec.
`storage` _StorageSpec_ | (Optional) Storage spec for the model location
ExplainerSpec
(Appears on:InferenceServiceSpec)
ExplainerSpec defines the container spec for a model explanation server. The following fields follow a "1-of" semantic: users must specify exactly one spec.
Field | Description
---|---
`alibi` _AlibiExplainerSpec_ | Spec for the Alibi explainer
`art` _ARTExplainerSpec_ | Spec for the ART explainer
`PodSpec` _PodSpec_ | (Members of `PodSpec` are embedded into this type.) This spec is dual-purpose. 1) Users may choose to provide a full PodSpec for their custom explainer; the field PodSpec.Containers is mutually exclusive with other explainers (i.e. Alibi). 2) Users may choose to provide an explainer (i.e. Alibi) and specify PodSpec overrides in the PodSpec; they must not provide PodSpec.Containers in this case.
`ComponentExtensionSpec` _ComponentExtensionSpec_ | (Members of `ComponentExtensionSpec` are embedded into this type.) Component extension defines the deployment configurations for the explainer
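A sketch of the "1-of" semantic: an InferenceService that sets exactly one explainer spec (here Alibi) alongside its predictor. The model format and storage URIs are illustrative:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-explainer
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/model        # illustrative
  explainer:
    alibi:                      # exactly one explainer spec may be set
      type: AnchorTabular
      storageUri: gs://example-bucket/explainer    # illustrative
```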
ExplainersConfig
(Appears on:InferenceServicesConfig)
Field | Description
---|---
`alibi` _ExplainerConfig_ |
`art` _ExplainerConfig_ |
FailureInfo
(Appears on:ModelStatus)
Field | Description
---|---
`location` _string_ | (Optional) Name of the component to which the failure relates (usually Pod name)
`reason` _FailureReason_ | (Optional) High-level class of failure
`message` _string_ | (Optional) Detailed error message
`modelRevisionName` _string_ | (Optional) Internal Revision/ID of the model, tied to specific Spec contents
`time` _Kubernetes meta/v1.Time_ | (Optional) Time the failure occurred or was discovered
`exitCode` _int32_ | (Optional) Exit status from the last termination of the container
FailureReason
(string alias)
(Appears on:FailureInfo)
FailureReason enum
Value | Description
---|---
"InvalidPredictorSpec" | The current Predictor Spec is invalid or unsupported
"ModelLoadFailed" | The model failed to load within a ServingRuntime container
"NoSupportingRuntime" | There is no ServingRuntime that supports the specified model type
"RuntimeDisabled" | The ServingRuntime is disabled
"RuntimeNotRecognized" | There is no ServingRuntime defined with the specified runtime name
"RuntimeUnhealthy" | Corresponding ServingRuntime containers failed to start or are unhealthy
HuggingFaceRuntimeSpec
(Appears on:PredictorSpec)
HuggingFaceRuntimeSpec defines arguments for configuring HuggingFace model serving.
Field | Description
---|---
`PredictorExtensionSpec` _PredictorExtensionSpec_ | (Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors
InferenceService
InferenceService is the Schema for the InferenceServices API
Field | Description
---|---
`metadata` _Kubernetes meta/v1.ObjectMeta_ | Refer to the Kubernetes API documentation for the fields of the `metadata` field.
`spec` _InferenceServiceSpec_ |
`status` _InferenceServiceStatus_ |
InferenceServiceSpec
(Appears on:InferenceService)
InferenceServiceSpec is the top level type for this resource
Field | Description
---|---
`predictor` _PredictorSpec_ | Predictor defines the model serving spec
`explainer` _ExplainerSpec_ | (Optional) Explainer defines the model explanation service spec; the explainer service calls the predictor or transformer if one is specified.
`transformer` _TransformerSpec_ | (Optional) Transformer defines the pre/post processing before and after the predictor call; the transformer service calls the predictor service.
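A sketch of an InferenceService combining a predictor with a custom transformer; the transformer image and storage URI are illustrative:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-isvc
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/model        # illustrative
  transformer:
    containers:
      - name: kserve-container
        image: example.com/my-transformer:latest   # illustrative image
```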
InferenceServiceStatus
(Appears on:InferenceService)
InferenceServiceStatus defines the observed state of InferenceService
Field | Description
---|---
`Status` _knative.dev/pkg/apis/duck/v1.Status_ | (Members of `Status` are embedded into this type.) Conditions for the InferenceService
`address` _knative.dev/pkg/apis/duck/v1.Addressable_ | (Optional) Addressable endpoint for the InferenceService
`url` _knative.dev/pkg/apis.URL_ | (Optional) URL holds the url that will distribute traffic over the provided traffic targets. It generally has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}
`components` _map[kserve.io/serving/pkg/apis/serving/v1beta1.ComponentType]kserve.io/serving/pkg/apis/serving/v1beta1.ComponentStatusSpec_ | Statuses for the components of the InferenceService
`modelStatus` _ModelStatus_ | Model-related statuses
InferenceServicesConfig
Field | Description
---|---
`explainers` _ExplainersConfig_ | Explainer configurations
IngressConfig
Field | Description
---|---
`ingressGateway` _string_ |
`ingressService` _string_ |
`localGateway` _string_ |
`localGatewayService` _string_ |
`ingressDomain` _string_ |
`ingressClassName` _string_ |
`domainTemplate` _string_ |
`urlScheme` _string_ |
`disableIstioVirtualHost` _bool_ |
`pathTemplate` _string_ |
`disableIngressCreation` _bool_ |
LightGBMSpec
(Appears on:PredictorSpec)
LightGBMSpec defines arguments for configuring LightGBM model serving.
Field | Description
---|---
`PredictorExtensionSpec` _PredictorExtensionSpec_ | (Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors
LoggerSpec
(Appears on:ComponentExtensionSpec)
LoggerSpec specifies optional payload logging available for all components
Field | Description
---|---
`url` _string_ | (Optional) URL to send logging events
`mode` _LoggerType_ | (Optional) Specifies the scope of the loggers.
LoggerType
(string alias)
(Appears on:LoggerSpec)
LoggerType controls the scope of log publishing
Value | Description
---|---
"all" | Logger mode to log both request and response
"request" | Logger mode to log only the request
"response" | Logger mode to log only the response
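A sketch of turning on payload logging for a predictor; the sink URL, model format, and storage URI are illustrative:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-logger
spec:
  predictor:
    logger:
      mode: all      # log both request and response
      url: http://message-dumper.default.svc.cluster.local   # illustrative sink
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/model   # illustrative
```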
ModelCopies
(Appears on:ModelStatus)
Field | Description
---|---
`failedCopies` _int_ | How many copies of this predictor's models failed to load recently
`totalCopies` _int_ | (Optional) Total number of copies of this predictor's models that are currently loaded
ModelFormat
(Appears on:ModelSpec)
Field | Description
---|---
`name` _string_ | Name of the model format.
`version` _string_ | (Optional) Version of the model format. Used in validating that a predictor is supported by a runtime. Can be "major", "major.minor" or "major.minor.patch".
ModelRevisionStates
(Appears on:ModelStatus)
Field | Description
---|---
`activeModelState` _ModelState_ | High-level state string: Pending, Standby, Loading, Loaded, FailedToLoad
`targetModelState` _ModelState_ |
ModelSpec
(Appears on:PredictorSpec)
Field | Description
---|---
`modelFormat` _ModelFormat_ | ModelFormat being served.
`runtime` _string_ | (Optional) Specific ClusterServingRuntime/ServingRuntime name to use for deployment.
`PredictorExtensionSpec` _PredictorExtensionSpec_ | (Members of `PredictorExtensionSpec` are embedded into this type.)
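A sketch of pinning a predictor to a specific runtime via this spec; the runtime name, format version, and storage URI are illustrative:

```yaml
# Fragment of an InferenceService spec; values are illustrative.
predictor:
  model:
    modelFormat:
      name: lightgbm
      version: "3"
    runtime: example-lgb-runtime   # illustrative ClusterServingRuntime/ServingRuntime name
    storageUri: gs://example-bucket/model
```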
ModelState
(string alias)
(Appears on:ModelRevisionStates)
ModelState enum
Value | Description
---|---
"FailedToLoad" | All copies of the model failed to load
"Loaded" | At least one copy of the model is loaded
"Loading" | Model is loading
"Pending" | Model is not yet registered
"Standby" | Model is available but not loaded (will load when used)
ModelStatus
(Appears on:InferenceServiceStatus)
Field | Description
---|---
`transitionStatus` _TransitionStatus_ | Whether the available predictor endpoints reflect the current Spec or are in transition
`states` _ModelRevisionStates_ | (Optional) State information of the predictor's model.
`lastFailureInfo` _FailureInfo_ | (Optional) Details of the last failure, when the load of the target model failed or was blocked.
`copies` _ModelCopies_ | (Optional) Model copy information of the predictor's model.
ONNXRuntimeSpec
(Appears on:PredictorSpec)
ONNXRuntimeSpec defines arguments for configuring ONNX model serving.
Field | Description
---|---
`PredictorExtensionSpec` _PredictorExtensionSpec_ | (Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors
PMMLSpec
(Appears on:PredictorSpec)
PMMLSpec defines arguments for configuring PMML model serving.
Field | Description
---|---
`PredictorExtensionSpec` _PredictorExtensionSpec_ | (Members of `PredictorExtensionSpec` are embedded into this type.) Contains fields shared across all predictors
PaddleServerSpec
(Appears on:PredictorSpec)
Field | Description
---|---
`PredictorExtensionSpec` _PredictorExtensionSpec_ | (Members of `PredictorExtensionSpec` are embedded into this type.)
PodSpec
(Appears on:ExplainerSpec, PredictorSpec, TransformerSpec)
PodSpec is a description of a pod.
Field | Description
---|---
`volumes` _[]Kubernetes core/v1.Volume_ | (Optional) List of volumes that can be mounted by containers belonging to the pod. More info: https://kubernetes.io/docs/concepts/storage/volumes
`initContainers` _[]Kubernetes core/v1.Container_ | List of initialization containers belonging to the pod. Init containers are executed in order prior to containers being started. If any init container fails, the pod is considered to have failed and is handled according to its restartPolicy. The name for an init container or normal container must be unique among all containers. Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes. The resourceRequirements of an init container are taken into account during scheduling by finding the highest request/limit for each resource type, and then using the max of that value or the sum of the normal containers. Limits are applied to init containers in a similar fashion. Init containers cannot currently be added or removed. Cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
`containers` _[]Kubernetes core/v1.Container_ | List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
`ephemeralContainers` _[]Kubernetes core/v1.EphemeralContainer_ | (Optional) List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing pod to perform user-initiated actions such as debugging. This list cannot be specified when creating a pod, and it cannot be modified by updating the pod spec. In order to add an ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource. This field is beta-level and available on clusters that haven't disabled the EphemeralContainers feature gate.
`restartPolicy` _Kubernetes core/v1.RestartPolicy_ | (Optional) Restart policy for all containers within the pod. One of Always, OnFailure, Never. Defaults to Always. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
`terminationGracePeriodSeconds` _int64_ | (Optional) Optional duration in seconds the pod needs to terminate gracefully. May be decreased in a delete request. Value must be a non-negative integer. The value zero indicates stop immediately via the kill signal (no opportunity to shut down). If this value is nil, the default grace period will be used instead. The grace period is the duration in seconds after the processes running in the pod are sent a termination signal and the time when the processes are forcibly halted with a kill signal. Set this value longer than the expected cleanup time for your process. Defaults to 30 seconds.
`activeDeadlineSeconds` _int64_ | (Optional) Optional duration in seconds the pod may be active on the node relative to StartTime before the system will actively try to mark it failed and kill associated containers. Value must be a positive integer.
`dnsPolicy` _Kubernetes core/v1.DNSPolicy_ | (Optional) Set DNS policy for the pod. Defaults to "ClusterFirst". Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'. DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy. To have DNS options set along with hostNetwork, you have to specify DNS policy explicitly as 'ClusterFirstWithHostNet'.
`nodeSelector` _map[string]string_ | (Optional) NodeSelector is a selector which must be true for the pod to fit on a node, i.e. it must match the node's labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
`serviceAccountName` _string_ | (Optional) ServiceAccountName is the name of the ServiceAccount to use to run this pod. More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccount string |
(Optional)
DeprecatedServiceAccount is a deprecated alias for ServiceAccountName. Deprecated: Use serviceAccountName instead. |
automountServiceAccountToken bool |
(Optional)
AutomountServiceAccountToken indicates whether a service account token should be automatically mounted. |
nodeName string |
(Optional)
NodeName is a request to schedule this pod onto a specific node. If it is non-empty, the scheduler simply schedules this pod onto that node, assuming that it fits resource requirements. |
hostNetwork bool |
(Optional)
Host networking requested for this pod. Use the host’s network namespace. If this option is set, the ports that will be used must be specified. Default to false. |
hostPID bool |
(Optional)
Use the host’s pid namespace. Optional: Default to false. |
hostIPC bool |
(Optional)
Use the host’s ipc namespace. Optional: Default to false. |
shareProcessNamespace bool |
(Optional)
Share a single process namespace between all of the containers in a pod. When this is set containers will be able to view and signal processes from other containers in the same pod, and the first process in each container will not be assigned PID 1. HostPID and ShareProcessNamespace cannot both be set. Optional: Default to false. |
securityContext Kubernetes core/v1.PodSecurityContext |
(Optional)
SecurityContext holds pod-level security attributes and common container settings. Optional: Defaults to empty. See type description for default values of each field. |
imagePullSecrets []Kubernetes core/v1.LocalObjectReference |
(Optional)
ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of docker, only DockerConfig type secrets are honored. More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod |
hostname string |
(Optional)
Specifies the hostname of the Pod If not specified, the pod’s hostname will be set to a system-defined value. |
subdomain string |
(Optional)
If specified, the fully qualified Pod hostname will be "&lt;hostname&gt;.&lt;subdomain&gt;.&lt;pod namespace&gt;.svc.&lt;cluster domain&gt;". If not specified, the pod will not have a domainname at all. |
affinity Kubernetes core/v1.Affinity |
(Optional)
If specified, the pod’s scheduling constraints |
schedulerName string |
(Optional)
If specified, the pod will be dispatched by specified scheduler. If not specified, the pod will be dispatched by default scheduler. |
tolerations []Kubernetes core/v1.Toleration |
(Optional)
If specified, the pod’s tolerations. |
hostAliases []Kubernetes core/v1.HostAlias |
(Optional)
HostAliases is an optional list of hosts and IPs that will be injected into the pod’s hosts file if specified. This is only valid for non-hostNetwork pods. |
priorityClassName string |
(Optional)
If specified, indicates the pod’s priority. “system-node-critical” and “system-cluster-critical” are two special keywords which indicate the highest priorities with the former being the highest priority. Any other name must be defined by creating a PriorityClass object with that name. If not specified, the pod priority will be default or zero if there is no default. |
priority int32 |
(Optional)
The priority value. Various system components use this field to find the priority of the pod. When Priority Admission Controller is enabled, it prevents users from setting this field. The admission controller populates this field from PriorityClassName. The higher the value, the higher the priority. |
dnsConfig Kubernetes core/v1.PodDNSConfig |
(Optional)
Specifies the DNS parameters of a pod. Parameters specified here will be merged to the generated DNS configuration based on DNSPolicy. |
readinessGates []Kubernetes core/v1.PodReadinessGate |
(Optional)
If specified, all readiness gates will be evaluated for pod readiness. A pod is ready when all its containers are ready AND all conditions specified in the readiness gates have status equal to “True” More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates |
runtimeClassName string |
(Optional)
RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run. If unset or empty, the “legacy” RuntimeClass will be used, which is an implicit class with an empty definition that uses the default runtime handler. More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class This is a beta feature as of Kubernetes v1.14. |
enableServiceLinks bool |
(Optional)
EnableServiceLinks indicates whether information about services should be injected into pod’s environment variables, matching the syntax of Docker links. Optional: Defaults to true. |
preemptionPolicy Kubernetes core/v1.PreemptionPolicy |
(Optional)
PreemptionPolicy is the Policy for preempting pods with lower priority. One of Never, PreemptLowerPriority. Defaults to PreemptLowerPriority if unset. This field is beta-level, gated by the NonPreemptingPriority feature-gate. |
overhead Kubernetes core/v1.ResourceList |
(Optional)
Overhead represents the resource overhead associated with running a pod for a given RuntimeClass. This field will be autopopulated at admission time by the RuntimeClass admission controller. If the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests. The RuntimeClass admission controller will reject Pod create requests which have the overhead already set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero. More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead/README.md This field is beta-level as of Kubernetes v1.18, and is only honored by servers that enable the PodOverhead feature. |
topologySpreadConstraints []Kubernetes core/v1.TopologySpreadConstraint |
(Optional)
TopologySpreadConstraints describes how a group of pods ought to spread across topology domains. Scheduler will schedule pods in a way which abides by the constraints. All topologySpreadConstraints are ANDed. |
setHostnameAsFQDN bool |
(Optional)
If true the pod’s hostname will be configured as the pod’s FQDN, rather than the leaf name (the default). In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname). In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN. If a pod does not have FQDN, this has no effect. Default to false. |
os Kubernetes core/v1.PodOS |
(Optional)
Specifies the OS of the containers in the pod. Some pod and container fields are restricted if this is set. If the OS field is set to linux, the following fields must be unset: - securityContext.windowsOptions If the OS field is set to windows, the following fields must be unset: - spec.hostPID - spec.hostIPC - spec.securityContext.seLinuxOptions - spec.securityContext.seccompProfile - spec.securityContext.fsGroup - spec.securityContext.fsGroupChangePolicy - spec.securityContext.sysctls - spec.shareProcessNamespace - spec.securityContext.runAsUser - spec.securityContext.runAsGroup - spec.securityContext.supplementalGroups - spec.containers[*].securityContext.seLinuxOptions - spec.containers[*].securityContext.seccompProfile - spec.containers[*].securityContext.capabilities - spec.containers[*].securityContext.readOnlyRootFilesystem - spec.containers[*].securityContext.privileged - spec.containers[*].securityContext.allowPrivilegeEscalation - spec.containers[*].securityContext.procMount - spec.containers[*].securityContext.runAsUser - spec.containers[*].securityContext.runAsGroup This is an alpha field and requires the IdentifyPodOS feature |
hostUsers bool |
(Optional)
Use the host’s user namespace. Optional: Default to true. If set to true or not present, the pod will be run in the host user namespace, useful for when the pod needs a feature only available to the host user namespace, such as loading a kernel module with CAP_SYS_MODULE. When set to false, a new userns is created for the pod. Setting false is useful for mitigating container breakout vulnerabilities even allowing users to run their containers as root without actually having root privileges on the host. This field is alpha-level and is only honored by servers that enable the UserNamespacesSupport feature. |
schedulingGates []Kubernetes core/v1.PodSchedulingGate |
(Optional)
SchedulingGates is an opaque list of values that if specified will block scheduling the pod. If schedulingGates is not empty, the pod will stay in the SchedulingGated state and the scheduler will not attempt to schedule the pod. SchedulingGates can only be set at pod creation time, and be removed only afterwards. This is a beta feature enabled by the PodSchedulingReadiness feature gate. |
resourceClaims []Kubernetes core/v1.PodResourceClaim |
(Optional)
ResourceClaims defines which ResourceClaims must be allocated and reserved before the Pod is allowed to start. The resources will be made available to those containers which consume them by name. This is an alpha field and requires enabling the DynamicResourceAllocation feature gate. This field is immutable. |
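Because PodSpec is embedded inline into ExplainerSpec, PredictorSpec, and TransformerSpec, its fields appear directly under those components rather than under a nested `podSpec` key. As an illustrative sketch (resource names and node labels are hypothetical), a predictor can be pinned to GPU nodes via nodeSelector and tolerations:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-gpu-predictor        # hypothetical name
spec:
  predictor:
    # PodSpec fields are inlined at this level
    nodeSelector:
      gpu.example.com/enabled: "true"   # hypothetical node label
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
    serviceAccountName: example-sa      # hypothetical ServiceAccount
    sklearn:
      storageUri: gs://example-bucket/model   # hypothetical location
```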
PredictorExtensionSpec
(Appears on:HuggingFaceRuntimeSpec, LightGBMSpec, ModelSpec, ONNXRuntimeSpec, PMMLSpec, PaddleServerSpec, SKLearnSpec, TFServingSpec, TorchServeSpec, TritonSpec, XGBoostSpec)
PredictorExtensionSpec defines configuration shared across all predictor frameworks
Field | Description |
---|---|
storageUri string |
(Optional)
This field points to the location of the trained model which is mounted onto the pod. |
runtimeVersion string |
(Optional)
Runtime version of the predictor docker image |
protocolVersion github.com/kserve/kserve/pkg/constants.InferenceServiceProtocol |
(Optional)
Protocol version to use by the predictor (i.e. v1 or v2 or grpc-v1 or grpc-v2) |
Container Kubernetes core/v1.Container |
(Members of Container are embedded into this type.) Container enables overrides for the predictor. Each framework will have different defaults that are populated in the underlying container spec. |
storage StorageSpec |
(Optional)
Storage Spec for model location |
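Because these fields are embedded into each framework-specific spec, they are written directly under the framework key. A minimal sketch (bucket path and version tag are hypothetical):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn
spec:
  predictor:
    sklearn:
      # PredictorExtensionSpec fields, inlined under the framework key
      storageUri: gs://example-bucket/sklearn/model   # hypothetical location
      runtimeVersion: "1.3.0"                         # hypothetical image tag
      protocolVersion: v2
```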
PredictorImplementation
PredictorImplementation defines common functions for all predictors, e.g. TensorFlow, Triton, etc.
PredictorSpec
(Appears on:InferenceServiceSpec)
PredictorSpec defines the configuration for a predictor. The following fields follow a "1-of" semantic: users must specify exactly one spec.
Field | Description |
---|---|
sklearn SKLearnSpec |
Spec for SKLearn model server |
xgboost XGBoostSpec |
Spec for XGBoost model server |
tensorflow TFServingSpec |
Spec for TFServing (https://github.com/tensorflow/serving) |
pytorch TorchServeSpec |
Spec for TorchServe (https://pytorch.org/serve) |
triton TritonSpec |
Spec for Triton Inference Server (https://github.com/triton-inference-server/server) |
onnx ONNXRuntimeSpec |
Spec for ONNX runtime (https://github.com/microsoft/onnxruntime) |
huggingface HuggingFaceRuntimeSpec |
Spec for HuggingFace runtime (https://github.com/huggingface) |
pmml PMMLSpec |
Spec for PMML (http://dmg.org/pmml/v4-1/GeneralStructure.html) |
lightgbm LightGBMSpec |
Spec for LightGBM model server |
paddle PaddleServerSpec |
Spec for Paddle model server (https://github.com/PaddlePaddle/Serving) |
model ModelSpec |
Model spec for any arbitrary framework. |
PodSpec PodSpec |
(Members of PodSpec are embedded into this type.) This spec is dual purpose. 1) Provide a full PodSpec for a custom predictor. The field PodSpec.Containers is mutually exclusive with other predictors (i.e. TFServing). 2) Provide a predictor (i.e. TFServing) and specify PodSpec overrides; you must not provide PodSpec.Containers in this case. |
ComponentExtensionSpec ComponentExtensionSpec |
(Members of ComponentExtensionSpec are embedded into this type.) Component extension defines the deployment configurations for a predictor |
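The "1-of" semantic means a PredictorSpec selects exactly one framework key. A hedged sketch (model location is hypothetical):

```yaml
# Valid: exactly one framework spec is set.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-one-of
spec:
  predictor:
    xgboost:
      storageUri: gs://example-bucket/xgboost/model   # hypothetical location
# Invalid (rejected by validation): setting a second framework key,
# e.g. a sibling `sklearn:` block alongside `xgboost:`.
```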
SKLearnSpec
(Appears on:PredictorSpec)
SKLearnSpec defines arguments for configuring SKLearn model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
ScaleMetric
(string alias)
(Appears on:ComponentExtensionSpec)
ScaleMetric enum
Value | Description |
---|---|
"cpu" |
|
"concurrency" |
|
"memory" |
|
"rps" |
StorageSpec
(Appears on:ExplainerExtensionSpec, PredictorExtensionSpec)
Field | Description |
---|---|
path string |
(Optional)
The path to the model object in the storage. It cannot co-exist with the storageURI. |
schemaPath string |
(Optional)
The path to the model schema file in the storage. |
parameters map[string]string |
(Optional)
Parameters to override the default storage credentials and config. |
key string |
(Optional)
The Storage Key in the secret for this model. |
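A StorageSpec sketch using `path` plus a storage `key`, in place of `storageUri` (the secret key and bucket names are hypothetical):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-storage-spec
spec:
  predictor:
    sklearn:
      # `storage.path` cannot co-exist with `storageUri`
      storage:
        key: localMinIO                   # hypothetical key in the storage secret
        path: models/sklearn/model.joblib # hypothetical object path
        parameters:
          bucket: example-bucket          # hypothetical credential/config override
```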
TFServingSpec
(Appears on:PredictorSpec)
TFServingSpec defines arguments for configuring TensorFlow model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
TorchServeSpec
(Appears on:PredictorSpec)
TorchServeSpec defines arguments for configuring PyTorch model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
TransformerSpec
(Appears on:InferenceServiceSpec)
TransformerSpec defines transformer service for pre/post processing
Field | Description |
---|---|
PodSpec PodSpec |
(Members of PodSpec are embedded into this type.) This spec is dual purpose. |
ComponentExtensionSpec ComponentExtensionSpec |
(Members of ComponentExtensionSpec are embedded into this type.) Component extension defines the deployment configurations for a transformer |
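Since TransformerSpec embeds PodSpec, a custom transformer is typically expressed as a container alongside the predictor. A hedged sketch (image name is hypothetical; the `kserve-container` container name is an assumption based on KServe convention):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-transformer
spec:
  transformer:
    # PodSpec.containers, inlined into TransformerSpec
    containers:
      - name: kserve-container
        image: example.io/image-preprocessor:latest   # hypothetical image
  predictor:
    sklearn:
      storageUri: gs://example-bucket/model           # hypothetical location
```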
TransitionStatus
(string alias)
(Appears on:ModelStatus)
TransitionStatus enum
Value | Description |
---|---|
"BlockedByFailedLoad" |
Target model failed to load |
"InProgress" |
Waiting for target model to reach state of active model |
"InvalidSpec" |
Target predictor spec failed validation |
"UpToDate" |
Predictor is up-to-date (reflects current spec) |
TritonSpec
(Appears on:PredictorSpec)
TritonSpec defines arguments for configuring Triton model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
XGBoostSpec
(Appears on:PredictorSpec)
XGBoostSpec defines arguments for configuring XGBoost model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
Generated with gen-crd-api-reference-docs on git commit 426fe21d.