Control Plane API
Packages:
serving.kserve.io/v1beta1
Package v1beta1 contains API Schema definitions for the serving v1beta1 API group
Resource Types:
AIXExplainerSpec
(Appears on:ExplainerSpec)
AIXExplainerSpec defines the arguments for configuring an AIX Explanation Server
Field | Description |
---|---|
type AIXExplainerType |
The type of AIX explainer |
ExplainerExtensionSpec ExplainerExtensionSpec |
(Members of ExplainerExtensionSpec are embedded into this type.) Contains fields shared across all explainers |
AIXExplainerType
(string alias)
(Appears on:AIXExplainerSpec)
Value | Description |
---|---|
"LimeImages" |
ARTExplainerSpec
(Appears on:ExplainerSpec)
ARTExplainerSpec defines the arguments for configuring an ART Explanation Server
Field | Description |
---|---|
type ARTExplainerType |
The type of ART explainer |
ExplainerExtensionSpec ExplainerExtensionSpec |
(Members of ExplainerExtensionSpec are embedded into this type.) Contains fields shared across all explainers |
ARTExplainerType
(string alias)
(Appears on:ARTExplainerSpec)
Value | Description |
---|---|
"SquareAttack" |
AlibiExplainerSpec
(Appears on:ExplainerSpec)
AlibiExplainerSpec defines the arguments for configuring an Alibi Explanation Server
Field | Description |
---|---|
type AlibiExplainerType |
The type of Alibi explainer |
ExplainerExtensionSpec ExplainerExtensionSpec |
(Members of ExplainerExtensionSpec are embedded into this type.) Contains fields shared across all explainers |
AlibiExplainerType
(string alias)
(Appears on:AlibiExplainerSpec)
AlibiExplainerType is the explanation method
Value | Description |
---|---|
"AnchorImages" |
|
"AnchorTabular" |
|
"AnchorText" |
|
"Contrastive" |
|
"Counterfactuals" |
Batcher
(Appears on:ComponentExtensionSpec)
Batcher specifies optional payload batching available for all components
Field | Description |
---|---|
maxBatchSize int |
(Optional)
Specifies the max number of requests to trigger a batch |
maxLatency int |
(Optional)
Specifies the max latency to trigger a batch |
timeout int |
(Optional)
Specifies the timeout of a batch |
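A minimal sketch of how the three Batcher fields could be assembled, written as a plain Python dict. The helper name `make_batcher` and the numeric values are hypothetical; this reference does not state the units of maxLatency or timeout, so the numbers are placeholders only.

```python
import json

def make_batcher(max_batch_size=None, max_latency=None, timeout=None):
    """Build a Batcher block, omitting any optional field left unset."""
    fields = {
        "maxBatchSize": max_batch_size,
        "maxLatency": max_latency,
        "timeout": timeout,
    }
    return {key: value for key, value in fields.items() if value is not None}

# Placeholder values; all three fields are optional.
batcher = make_batcher(max_batch_size=32, max_latency=500)
print(json.dumps(batcher, indent=2))
```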
Component
Component interface is implemented by all specs that contain component implementations, e.g. PredictorSpec, ExplainerSpec, TransformerSpec.
ComponentExtensionSpec
(Appears on:ExplainerSpec, PredictorSpec, TransformerSpec)
ComponentExtensionSpec defines the deployment configuration for a given InferenceService component
Field | Description |
---|---|
minReplicas int |
(Optional)
Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero. |
maxReplicas int |
(Optional)
Maximum number of replicas for autoscaling. |
scaleTarget int |
(Optional)
ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for. concurrency and rps targets are supported by Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/). |
scaleMetric ScaleMetric |
(Optional)
ScaleMetric defines the scaling metric type watched by the autoscaler. Possible values are concurrency, rps, cpu, and memory. concurrency and rps are supported via Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics). |
containerConcurrency int64 |
(Optional)
ContainerConcurrency specifies how many requests can be processed concurrently. This sets the hard limit of the container concurrency (https://knative.dev/docs/serving/autoscaling/concurrency). |
timeout int64 |
(Optional)
TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component. |
canaryTrafficPercent int64 |
(Optional)
CanaryTrafficPercent defines the traffic split percentage between the candidate revision and the last ready revision |
logger LoggerSpec |
(Optional)
Activate request/response logging and logger configurations |
batcher Batcher |
(Optional)
Activate request batching and batching configurations |
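The deployment fields above might be combined as in the following sketch. It only checks scaleMetric against the four values listed in this reference and rejects a negative minReplicas (an assumption, not a documented rule); the helper name `component_extension` and the sample values are hypothetical.

```python
import json

# Values listed for ScaleMetric in this reference.
SCALE_METRICS = {"concurrency", "rps", "cpu", "memory"}

def component_extension(min_replicas=1, max_replicas=None,
                        scale_metric=None, scale_target=None,
                        canary_traffic_percent=None):
    """Return a dict of ComponentExtensionSpec fields, skipping unset ones."""
    if scale_metric is not None and scale_metric not in SCALE_METRICS:
        raise ValueError(f"unsupported scaleMetric: {scale_metric!r}")
    if min_replicas < 0:
        # Assumption: a negative replica count is never meaningful.
        raise ValueError("minReplicas must be >= 0")
    fields = {
        "minReplicas": min_replicas,  # 0 enables scale-to-zero
        "maxReplicas": max_replicas,
        "scaleMetric": scale_metric,
        "scaleTarget": scale_target,
        "canaryTrafficPercent": canary_traffic_percent,
    }
    return {key: value for key, value in fields.items() if value is not None}

spec = component_extension(min_replicas=0, max_replicas=3,
                           scale_metric="rps", scale_target=10)
print(json.dumps(spec, indent=2))
```

These keys sit alongside the framework or PodSpec fields of a predictor, transformer, or explainer, since ComponentExtensionSpec appears on all three.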
ComponentImplementation
ComponentImplementation interface is implemented by predictor, transformer, and explainer implementations
ComponentStatusSpec
(Appears on:InferenceServiceStatus)
ComponentStatusSpec describes the state of the component
Field | Description |
---|---|
latestReadyRevision string |
(Optional)
Latest revision name that is in ready state |
latestCreatedRevision string |
(Optional)
Latest revision name that is created |
previousRolledoutRevision string |
(Optional)
Previous revision name that is rolled out with 100 percent traffic |
latestRolledoutRevision string |
(Optional)
Latest revision name that is rolled out with 100 percent traffic |
traffic []knative.dev/serving/pkg/apis/serving/v1.TrafficTarget |
(Optional)
Traffic holds the configured traffic distribution for latest ready revision and previous rolled out revision. |
url knative.dev/pkg/apis.URL |
(Optional)
URL holds the primary url that will distribute traffic over the provided traffic targets. This will be one of the REST or gRPC endpoints that are available. It generally has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix} |
restUrl knative.dev/pkg/apis.URL |
(Optional)
REST endpoint of the component if available. |
grpcUrl knative.dev/pkg/apis.URL |
(Optional)
gRPC endpoint of the component if available. |
address knative.dev/pkg/apis/duck/v1.Addressable |
(Optional)
Addressable endpoint for the InferenceService |
ComponentType
(string alias)
ComponentType contains the different types of components of the service
Value | Description |
---|---|
"explainer" |
|
"predictor" |
|
"transformer" |
CustomExplainer
CustomExplainer defines arguments for configuring a custom explainer.
Field | Description |
---|---|
PodSpec Kubernetes core/v1.PodSpec |
(Members of PodSpec are embedded into this type.) |
CustomPredictor
CustomPredictor defines arguments for configuring a custom server.
Field | Description |
---|---|
PodSpec Kubernetes core/v1.PodSpec |
(Members of PodSpec are embedded into this type.) |
CustomTransformer
CustomTransformer defines arguments for configuring a custom transformer.
Field | Description |
---|---|
PodSpec Kubernetes core/v1.PodSpec |
(Members of PodSpec are embedded into this type.) |
DeployConfig
Field | Description |
---|---|
defaultDeploymentMode string |
ExplainerConfig
(Appears on:ExplainersConfig)
Field | Description |
---|---|
image string |
explainer docker image name |
defaultImageVersion string |
default explainer docker image version |
ExplainerExtensionSpec
(Appears on:AIXExplainerSpec, ARTExplainerSpec, AlibiExplainerSpec)
ExplainerExtensionSpec defines configuration shared across all explainer frameworks
Field | Description |
---|---|
storageUri string |
The location of a trained explanation model |
runtimeVersion string |
Defaults to latest Explainer Version |
config map[string]string |
Inline custom parameter settings for explainer |
Container Kubernetes core/v1.Container |
(Members of Container are embedded into this type.) Container enables overrides for the predictor. Each framework will have different defaults that are populated in the underlying container spec. |
storage StorageSpec |
(Optional)
Storage Spec for model location |
ExplainerSpec
(Appears on:InferenceServiceSpec)
ExplainerSpec defines the container spec for a model explanation server. The following fields follow a “1-of” semantic. Users must specify exactly one spec.
Field | Description |
---|---|
alibi AlibiExplainerSpec |
Spec for alibi explainer |
aix AIXExplainerSpec |
Spec for AIX explainer |
art ARTExplainerSpec |
Spec for ART explainer |
PodSpec PodSpec |
(Members of PodSpec are embedded into this type.) This spec is dual purpose. 1) Users may choose to provide a full PodSpec for their custom explainer. The field PodSpec.Containers is mutually exclusive with other explainers (e.g. Alibi). 2) Users may choose to provide an Explainer (e.g. Alibi) and specify PodSpec overrides in the PodSpec. They must not provide PodSpec.Containers in this case. |
ComponentExtensionSpec ComponentExtensionSpec |
(Members of ComponentExtensionSpec are embedded into this type.) Component extension defines the deployment configurations for explainer |
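The “1-of” rule described for ExplainerSpec can be sketched as a small check over a dict. This is an illustration only, not the validation that the controller performs; the function name `explainer_kind` is hypothetical, and treating a non-empty `containers` list as the custom explainer choice is an assumption based on the PodSpec description above.

```python
# Framework fields listed in the ExplainerSpec table.
EXPLAINER_FIELDS = ("alibi", "aix", "art")

def explainer_kind(explainer):
    """Return which single implementation an explainer dict selects."""
    chosen = [name for name in EXPLAINER_FIELDS if name in explainer]
    # Assumption: PodSpec.containers counts as the custom explainer option.
    if explainer.get("containers"):
        chosen.append("custom")
    if len(chosen) != 1:
        raise ValueError(
            f"expected exactly one explainer, found {chosen or 'none'}"
        )
    return chosen[0]

print(explainer_kind({"alibi": {"type": "AnchorTabular"}}))
```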
ExplainersConfig
(Appears on:InferenceServicesConfig)
Field | Description |
---|---|
alibi ExplainerConfig |
|
aix ExplainerConfig |
|
art ExplainerConfig |
FailureInfo
(Appears on:ModelStatus)
Field | Description |
---|---|
location string |
(Optional)
Name of component to which the failure relates (usually Pod name) |
reason FailureReason |
(Optional)
High level class of failure |
message string |
(Optional)
Detailed error message |
modelRevisionName string |
(Optional)
Internal Revision/ID of model, tied to specific Spec contents |
time Kubernetes meta/v1.Time |
(Optional)
Time failure occurred or was discovered |
FailureReason
(string alias)
(Appears on:FailureInfo)
FailureReason enum
Value | Description |
---|---|
"InvalidPredictorSpec" |
The current Predictor Spec is invalid or unsupported |
"ModelLoadFailed" |
The model failed to load within a ServingRuntime container |
"NoSupportingRuntime" |
There is no ServingRuntime that supports the specified model type |
"RuntimeDisabled" |
The ServingRuntime is disabled |
"RuntimeNotRecognized" |
There is no ServingRuntime defined with the specified runtime name |
"RuntimeUnhealthy" |
Corresponding ServingRuntime containers failed to start or are unhealthy |
InferenceService
InferenceService is the Schema for the InferenceServices API
Field | Description |
---|---|
metadata Kubernetes meta/v1.ObjectMeta |
Refer to the Kubernetes API documentation for the fields of the metadata field. |
spec InferenceServiceSpec |
|
status InferenceServiceStatus |
InferenceServiceSpec
(Appears on:InferenceService)
InferenceServiceSpec is the top level type for this resource
Field | Description |
---|---|
predictor PredictorSpec |
Predictor defines the model serving spec |
explainer ExplainerSpec |
(Optional)
Explainer defines the model explanation service spec. The explainer service calls the predictor, or the transformer if one is specified. |
transformer TransformerSpec |
(Optional)
Transformer defines the pre/post processing performed before and after the predictor call. The transformer service calls the predictor service. |
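As a rough illustration of how the three components fit into one resource, the sketch below builds an InferenceService as a Python dict and prints it as JSON. The apiVersion comes from the package name at the top of this page; the helper name `inference_service`, the resource name, and the storage URI are hypothetical placeholders.

```python
import json

def inference_service(name, predictor, transformer=None, explainer=None):
    """Assemble an InferenceService dict; only the predictor is required."""
    spec = {"predictor": predictor}
    if transformer is not None:
        spec["transformer"] = transformer
    if explainer is not None:
        spec["explainer"] = explainer
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": spec,
    }

# Hypothetical name and model location, for illustration only.
isvc = inference_service(
    "sklearn-iris",
    predictor={"sklearn": {"storageUri": "gs://example-bucket/models/iris"}},
)
print(json.dumps(isvc, indent=2))
```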
InferenceServiceStatus
(Appears on:InferenceService)
InferenceServiceStatus defines the observed state of InferenceService
Field | Description |
---|---|
Status knative.dev/pkg/apis/duck/v1.Status |
(Members of Status are embedded into this type.) Conditions for the InferenceService |
address knative.dev/pkg/apis/duck/v1.Addressable |
(Optional)
Addressable endpoint for the InferenceService |
url knative.dev/pkg/apis.URL |
(Optional)
URL holds the url that will distribute traffic over the provided traffic targets. It generally has the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix} |
components map[kserve.io/v1beta1/pkg/apis/serving/v1beta1.ComponentType]kserve.io/v1beta1/pkg/apis/serving/v1beta1.ComponentStatusSpec |
Statuses for the components of the InferenceService |
modelStatus ModelStatus |
Model related statuses |
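The url field is described as generally having the form http[s]://{route-name}.{route-namespace}.{cluster-level-suffix}. A sketch of splitting such a URL back into those three parts follows; the function name `split_service_url` and the sample URL are hypothetical, and URLs that deviate from the general form (for example with a custom domain template) would not split this way.

```python
from urllib.parse import urlsplit

def split_service_url(url):
    """Split a URL of the general form into route name, namespace, and suffix."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".", 2)  # name, namespace, then the remaining suffix
    if len(labels) != 3:
        raise ValueError(f"host does not match the expected form: {host!r}")
    name, namespace, suffix = labels
    return {
        "scheme": parts.scheme,
        "routeName": name,
        "routeNamespace": namespace,
        "clusterSuffix": suffix,
    }

print(split_service_url("http://sklearn-iris.default.example.com"))
```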
InferenceServicesConfig
Field | Description |
---|---|
transformers TransformersConfig |
Transformer configurations |
predictors PredictorsConfig |
Predictor configurations |
explainers ExplainersConfig |
Explainer configurations |
IngressConfig
Field | Description |
---|---|
ingressGateway string |
|
ingressService string |
|
localGateway string |
|
localGatewayService string |
|
ingressDomain string |
|
ingressClassName string |
|
domainTemplate string |
|
urlScheme string |
LightGBMSpec
(Appears on:PredictorSpec)
LightGBMSpec defines arguments for configuring LightGBM model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
LoggerSpec
(Appears on:ComponentExtensionSpec)
LoggerSpec specifies optional payload logging available for all components
Field | Description |
---|---|
url string |
(Optional)
URL to send logging events |
mode LoggerType |
(Optional)
Specifies the scope of the loggers. |
LoggerType
(string alias)
(Appears on:LoggerSpec)
LoggerType controls the scope of log publishing
Value | Description |
---|---|
"all" |
Logger mode to log both request and response |
"request" |
Logger mode to log only request |
"response" |
Logger mode to log only response |
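The three LoggerType values map onto which payloads get published. The sketch below expresses that mapping directly; the function name `logged_payloads` is hypothetical and the code only restates the table above.

```python
# LoggerType values and the payloads each one covers, per the table above.
LOGGER_MODES = {
    "all": ("request", "response"),
    "request": ("request",),
    "response": ("response",),
}

def logged_payloads(mode):
    """Return the payload kinds published for a given logger mode."""
    try:
        return LOGGER_MODES[mode]
    except KeyError:
        raise ValueError(f"unknown logger mode: {mode!r}") from None

print(logged_payloads("all"))
```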
ModelCopies
(Appears on:ModelStatus)
Field | Description |
---|---|
failedCopies int |
How many copies of this predictor’s models failed to load recently |
totalCopies int |
(Optional)
Total number of copies of this predictor’s models that are currently loaded |
ModelFormat
(Appears on:ModelSpec)
Field | Description |
---|---|
name string |
Name of the model format. |
version string |
(Optional)
Version of the model format. Used in validating that a predictor is supported by a runtime. Can be “major”, “major.minor” or “major.minor.patch”. |
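The three accepted shapes of the version field can be checked with a short pattern. This sketch assumes each component is a plain decimal number, which the reference does not state explicitly; the function name `is_valid_format_version` is hypothetical.

```python
import re

# One to three dot-separated numeric components:
# "major", "major.minor", or "major.minor.patch".
_VERSION_RE = re.compile(r"\d+(\.\d+){0,2}")

def is_valid_format_version(version):
    """Return True if version has one of the three documented shapes."""
    return _VERSION_RE.fullmatch(version) is not None

for candidate in ("1", "1.2", "1.2.3", "1.2.3.4", "v1"):
    print(candidate, is_valid_format_version(candidate))
```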
ModelRevisionStates
(Appears on:ModelStatus)
Field | Description |
---|---|
activeModelState ModelState |
High level state string: Pending, Standby, Loading, Loaded, FailedToLoad |
targetModelState ModelState |
ModelSpec
(Appears on:PredictorSpec)
Field | Description |
---|---|
modelFormat ModelFormat |
ModelFormat being served. |
runtime string |
(Optional)
Specific ClusterServingRuntime/ServingRuntime name to use for deployment. |
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) |
ModelState
(string alias)
(Appears on:ModelRevisionStates)
ModelState enum
Value | Description |
---|---|
"FailedToLoad" |
All copies of the model failed to load |
"Loaded" |
At least one copy of the model is loaded |
"Loading" |
Model is loading |
"Pending" |
Model is not yet registered |
"Standby" |
Model is available but not loaded (will load when used) |
ModelStatus
(Appears on:InferenceServiceStatus)
Field | Description |
---|---|
transitionStatus TransitionStatus |
Whether the available predictor endpoints reflect the current Spec or are in transition |
states ModelRevisionStates |
(Optional)
State information of the predictor’s model. |
lastFailureInfo FailureInfo |
(Optional)
Details of the last failure, set when loading the target model failed or was blocked. |
copies ModelCopies |
(Optional)
Model copy information of the predictor’s model. |
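One way a client might read a ModelStatus is sketched below. The rule that "ready" means transitionStatus is UpToDate and activeModelState is Loaded is an assumption made for illustration, not a definition from this reference; the function name `summarize_model_status` is hypothetical.

```python
def summarize_model_status(model_status):
    """Reduce a ModelStatus dict to a short human-readable summary."""
    transition = model_status.get("transitionStatus")
    active = model_status.get("states", {}).get("activeModelState")
    # Assumption: up-to-date endpoints plus a loaded model means ready.
    if transition == "UpToDate" and active == "Loaded":
        return "ready"
    failure = model_status.get("lastFailureInfo")
    if transition in ("BlockedByFailedLoad", "InvalidSpec") and failure:
        return f"failed: {failure.get('reason')}: {failure.get('message')}"
    return "in progress"

status = {
    "transitionStatus": "BlockedByFailedLoad",
    "states": {"activeModelState": "FailedToLoad"},
    "lastFailureInfo": {"reason": "ModelLoadFailed", "message": "example message"},
}
print(summarize_model_status(status))
```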
ONNXRuntimeSpec
(Appears on:PredictorSpec)
ONNXRuntimeSpec defines arguments for configuring ONNX model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
PMMLSpec
(Appears on:PredictorSpec)
PMMLSpec defines arguments for configuring PMML model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
PaddleServerSpec
(Appears on:PredictorSpec)
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) |
PodSpec
(Appears on:ExplainerSpec, PredictorSpec, TransformerSpec)
PodSpec is a description of a pod.
Field | Description |
---|---|
volumes []Kubernetes core/v1.Volume |
(Optional)
List of volumes that can be mounted by containers belonging to the pod. More info: https://kubernetes.io/docs/concepts/storage/volumes |
initContainers []Kubernetes core/v1.Container |
List of initialization containers belonging to the pod. Init containers are executed in order prior to containers being started. If any init container fails, the pod is considered to have failed and is handled according to its restartPolicy. The name for an init container or normal container must be unique among all containers. Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes. The resourceRequirements of an init container are taken into account during scheduling by finding the highest request/limit for each resource type, and then using the max of that value or the sum of the normal containers. Limits are applied to init containers in a similar fashion. Init containers cannot currently be added or removed. Cannot be updated. More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/ |
containers []Kubernetes core/v1.Container |
List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated. |
ephemeralContainers []Kubernetes core/v1.EphemeralContainer |
(Optional)
List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing pod to perform user-initiated actions such as debugging. This list cannot be specified when creating a pod, and it cannot be modified by updating the pod spec. In order to add an ephemeral container to an existing pod, use the pod’s ephemeralcontainers subresource. This field is alpha-level and is only honored by servers that enable the EphemeralContainers feature. |
restartPolicy Kubernetes core/v1.RestartPolicy |
(Optional)
Restart policy for all containers within the pod. One of Always, OnFailure, Never. Default to Always. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy |
terminationGracePeriodSeconds int64 |
(Optional)
Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request. Value must be non-negative integer. The value zero indicates delete immediately. If this value is nil, the default grace period will be used instead. The grace period is the duration in seconds after the processes running in the pod are sent a termination signal and the time when the processes are forcibly halted with a kill signal. Set this value longer than the expected cleanup time for your process. Defaults to 30 seconds. |
activeDeadlineSeconds int64 |
(Optional)
Optional duration in seconds the pod may be active on the node relative to StartTime before the system will actively try to mark it failed and kill associated containers. Value must be a positive integer. |
dnsPolicy Kubernetes core/v1.DNSPolicy |
(Optional)
Set DNS policy for the pod. Defaults to “ClusterFirst”. Valid values are ‘ClusterFirstWithHostNet’, ‘ClusterFirst’, ‘Default’ or ‘None’. DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy. To have DNS options set along with hostNetwork, you have to specify DNS policy explicitly to ‘ClusterFirstWithHostNet’. |
nodeSelector map[string]string |
(Optional)
NodeSelector is a selector which must be true for the pod to fit on a node. Selector which must match a node’s labels for the pod to be scheduled on that node. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/ |
serviceAccountName string |
(Optional)
ServiceAccountName is the name of the ServiceAccount to use to run this pod. More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/ |
serviceAccount string |
(Optional)
DeprecatedServiceAccount is a deprecated alias for ServiceAccountName. Deprecated: Use serviceAccountName instead. |
automountServiceAccountToken bool |
(Optional)
AutomountServiceAccountToken indicates whether a service account token should be automatically mounted. |
nodeName string |
(Optional)
NodeName is a request to schedule this pod onto a specific node. If it is non-empty, the scheduler simply schedules this pod onto that node, assuming that it fits resource requirements. |
hostNetwork bool |
(Optional)
Host networking requested for this pod. Use the host’s network namespace. If this option is set, the ports that will be used must be specified. Default to false. |
hostPID bool |
(Optional)
Use the host’s pid namespace. Optional: Default to false. |
hostIPC bool |
(Optional)
Use the host’s ipc namespace. Optional: Default to false. |
shareProcessNamespace bool |
(Optional)
Share a single process namespace between all of the containers in a pod. When this is set containers will be able to view and signal processes from other containers in the same pod, and the first process in each container will not be assigned PID 1. HostPID and ShareProcessNamespace cannot both be set. Optional: Default to false. |
securityContext Kubernetes core/v1.PodSecurityContext |
(Optional)
SecurityContext holds pod-level security attributes and common container settings. Optional: Defaults to empty. See type description for default values of each field. |
imagePullSecrets []Kubernetes core/v1.LocalObjectReference |
(Optional)
ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of docker, only DockerConfig type secrets are honored. More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod |
hostname string |
(Optional)
Specifies the hostname of the Pod. If not specified, the pod’s hostname will be set to a system-defined value. |
subdomain string |
(Optional)
If specified, the fully qualified Pod hostname will be “ |
affinity Kubernetes core/v1.Affinity |
(Optional)
If specified, the pod’s scheduling constraints |
schedulerName string |
(Optional)
If specified, the pod will be dispatched by specified scheduler. If not specified, the pod will be dispatched by default scheduler. |
tolerations []Kubernetes core/v1.Toleration |
(Optional)
If specified, the pod’s tolerations. |
hostAliases []Kubernetes core/v1.HostAlias |
(Optional)
HostAliases is an optional list of hosts and IPs that will be injected into the pod’s hosts file if specified. This is only valid for non-hostNetwork pods. |
priorityClassName string |
(Optional)
If specified, indicates the pod’s priority. “system-node-critical” and “system-cluster-critical” are two special keywords which indicate the highest priorities with the former being the highest priority. Any other name must be defined by creating a PriorityClass object with that name. If not specified, the pod priority will be default or zero if there is no default. |
priority int32 |
(Optional)
The priority value. Various system components use this field to find the priority of the pod. When Priority Admission Controller is enabled, it prevents users from setting this field. The admission controller populates this field from PriorityClassName. The higher the value, the higher the priority. |
dnsConfig Kubernetes core/v1.PodDNSConfig |
(Optional)
Specifies the DNS parameters of a pod. Parameters specified here will be merged to the generated DNS configuration based on DNSPolicy. |
readinessGates []Kubernetes core/v1.PodReadinessGate |
(Optional)
If specified, all readiness gates will be evaluated for pod readiness. A pod is ready when all its containers are ready AND all conditions specified in the readiness gates have status equal to “True”. More info: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/580-pod-readiness-gates |
runtimeClassName string |
(Optional)
RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run. If unset or empty, the “legacy” RuntimeClass will be used, which is an implicit class with an empty definition that uses the default runtime handler. More info: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/585-runtime-class This is a beta feature as of Kubernetes v1.14. |
enableServiceLinks bool |
(Optional)
EnableServiceLinks indicates whether information about services should be injected into pod’s environment variables, matching the syntax of Docker links. Optional: Defaults to true. |
preemptionPolicy Kubernetes core/v1.PreemptionPolicy |
(Optional)
PreemptionPolicy is the Policy for preempting pods with lower priority. One of Never, PreemptLowerPriority. Defaults to PreemptLowerPriority if unset. This field is beta-level, gated by the NonPreemptingPriority feature-gate. |
overhead Kubernetes core/v1.ResourceList |
(Optional)
Overhead represents the resource overhead associated with running a pod for a given RuntimeClass. This field will be autopopulated at admission time by the RuntimeClass admission controller. If the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests. The RuntimeClass admission controller will reject Pod create requests which have the overhead already set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero. More info: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/688-pod-overhead This field is alpha-level as of Kubernetes v1.16, and is only honored by servers that enable the PodOverhead feature. |
topologySpreadConstraints []Kubernetes core/v1.TopologySpreadConstraint |
(Optional)
TopologySpreadConstraints describes how a group of pods ought to spread across topology domains. Scheduler will schedule pods in a way which abides by the constraints. All topologySpreadConstraints are ANDed. |
setHostnameAsFQDN bool |
(Optional)
If true the pod’s hostname will be configured as the pod’s FQDN, rather than the leaf name (the default). In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname). In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN. If a pod does not have FQDN, this has no effect. Default to false. |
PredictorConfig
(Appears on:PredictorProtocols, PredictorsConfig)
Field | Description |
---|---|
image string |
predictor docker image name |
defaultImageVersion string |
default predictor docker image version on cpu |
defaultGpuImageVersion string |
default predictor docker image version on gpu |
defaultTimeout,string int64 |
Default timeout of predictor for serving a request, in seconds |
multiModelServer,boolean bool |
Flag to determine if multi-model serving is supported |
supportedFrameworks []string |
frameworks the model agent is able to run |
PredictorExtensionSpec
(Appears on:LightGBMSpec, ModelSpec, ONNXRuntimeSpec, PMMLSpec, PaddleServerSpec, SKLearnSpec, TFServingSpec, TorchServeSpec, TritonSpec, XGBoostSpec)
PredictorExtensionSpec defines configuration shared across all predictor frameworks
Field | Description |
---|---|
storageUri string |
(Optional)
This field points to the location of the trained model which is mounted onto the pod. |
runtimeVersion string |
(Optional)
Runtime version of the predictor docker image |
protocolVersion github.com/kserve/kserve/pkg/constants.InferenceServiceProtocol |
(Optional)
Protocol version to use by the predictor (i.e. v1 or v2 or grpc-v1 or grpc-v2) |
Container Kubernetes core/v1.Container |
(Members of Container are embedded into this type.) Container enables overrides for the predictor. Each framework will have different defaults that are populated in the underlying container spec. |
storage StorageSpec |
(Optional)
Storage Spec for model location |
PredictorImplementation
PredictorImplementation defines common functions for all predictors, e.g. TensorFlow, Triton, etc.
PredictorProtocols
(Appears on:PredictorsConfig)
Field | Description |
---|---|
v1 PredictorConfig |
|
v2 PredictorConfig |
PredictorSpec
(Appears on:InferenceServiceSpec)
PredictorSpec defines the configuration for a predictor. The following fields follow a “1-of” semantic. Users must specify exactly one spec.
Field | Description |
---|---|
sklearn SKLearnSpec |
Spec for SKLearn model server |
xgboost XGBoostSpec |
Spec for XGBoost model server |
tensorflow TFServingSpec |
Spec for TFServing (https://github.com/tensorflow/serving) |
pytorch TorchServeSpec |
Spec for TorchServe (https://pytorch.org/serve) |
triton TritonSpec |
Spec for Triton Inference Server (https://github.com/triton-inference-server/server) |
onnx ONNXRuntimeSpec |
Spec for ONNX runtime (https://github.com/microsoft/onnxruntime) |
pmml PMMLSpec |
Spec for PMML (http://dmg.org/pmml/v4-1/GeneralStructure.html) |
lightgbm LightGBMSpec |
Spec for LightGBM model server |
paddle PaddleServerSpec |
Spec for Paddle model server (https://github.com/PaddlePaddle/Serving) |
model ModelSpec |
Model spec for any arbitrary framework. |
PodSpec PodSpec |
(Members of PodSpec are embedded into this type.) This spec is dual purpose. |
ComponentExtensionSpec ComponentExtensionSpec |
(Members of ComponentExtensionSpec are embedded into this type.) Component extension defines the deployment configurations for a predictor |
PredictorsConfig
(Appears on:InferenceServicesConfig)
Field | Description |
---|---|
tensorflow PredictorConfig |
|
triton PredictorConfig |
|
xgboost PredictorProtocols |
|
sklearn PredictorProtocols |
|
pytorch PredictorConfig |
|
onnx PredictorConfig |
|
pmml PredictorConfig |
|
lightgbm PredictorConfig |
|
paddle PredictorConfig |
SKLearnSpec
(Appears on:PredictorSpec)
SKLearnSpec defines arguments for configuring SKLearn model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
ScaleMetric
(string alias)
(Appears on:ComponentExtensionSpec)
ScaleMetric enum
Value | Description |
---|---|
"cpu" |
|
"concurrency" |
|
"memory" |
|
"rps" |
StorageSpec
(Appears on:ExplainerExtensionSpec, PredictorExtensionSpec)
Field | Description |
---|---|
path string |
(Optional)
The path to the model object in the storage. It cannot co-exist with the storageURI. |
schemaPath string |
(Optional)
The path to the model schema file in the storage. |
parameters map[string]string |
(Optional)
Parameters to override the default storage credentials and config. |
key string |
(Optional)
The Storage Key in the secret for this model. |
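The note that path cannot co-exist with storageUri can be expressed as a small check over a predictor or explainer extension dict. This is a sketch of the stated constraint only; the function name `check_storage`, the sample key, and the sample path are hypothetical.

```python
def check_storage(extension):
    """Raise if both storageUri and storage.path are set on one spec."""
    storage = extension.get("storage") or {}
    if extension.get("storageUri") and storage.get("path"):
        raise ValueError("storage.path cannot be combined with storageUri")
    return True

# Placeholder key and path, for illustration only.
ok = check_storage({"storage": {"key": "example-key", "path": "models/iris"}})
print(ok)
```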
TFServingSpec
(Appears on:PredictorSpec)
TFServingSpec defines arguments for configuring TensorFlow model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
TorchServeSpec
(Appears on:PredictorSpec)
TorchServeSpec defines arguments for configuring PyTorch model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
TransformerConfig
(Appears on:TransformersConfig)
Field | Description |
---|---|
image string |
transformer docker image name |
defaultImageVersion string |
default transformer docker image version |
TransformerSpec
(Appears on:InferenceServiceSpec)
TransformerSpec defines transformer service for pre/post processing
Field | Description |
---|---|
PodSpec PodSpec |
(Members of PodSpec are embedded into this type.) This spec is dual purpose. |
ComponentExtensionSpec ComponentExtensionSpec |
(Members of ComponentExtensionSpec are embedded into this type.) Component extension defines the deployment configurations for a transformer |
TransformersConfig
(Appears on:InferenceServicesConfig)
Field | Description |
---|---|
feast TransformerConfig |
TransitionStatus
(string alias)
(Appears on:ModelStatus)
TransitionStatus enum
Value | Description |
---|---|
"BlockedByFailedLoad" |
Target model failed to load |
"InProgress" |
Waiting for target model to reach state of active model |
"InvalidSpec" |
Target predictor spec failed validation |
"UpToDate" |
Predictor is up-to-date (reflects current spec) |
TritonSpec
(Appears on:PredictorSpec)
TritonSpec defines arguments for configuring Triton model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
XGBoostSpec
(Appears on:PredictorSpec)
XGBoostSpec defines arguments for configuring XGBoost model serving.
Field | Description |
---|---|
PredictorExtensionSpec PredictorExtensionSpec |
(Members of PredictorExtensionSpec are embedded into this type.) Contains fields shared across all predictors |
Generated with gen-crd-api-reference-docs on git commit 133ecebb.