

ComponentExtensionSpec defines the deployment configuration for a given InferenceService component.


| Name | Type | Description | Notes |
| --- | --- | --- | --- |
| batcher | V1beta1Batcher | | [optional] |
| canary_traffic_percent | int | CanaryTrafficPercent defines the traffic split percentage between the candidate revision and the last ready revision. | [optional] |
| container_concurrency | int | ContainerConcurrency specifies how many requests can be processed concurrently; this sets the hard limit of the container concurrency. | [optional] |
| logger | V1beta1LoggerSpec | | [optional] |
| max_replicas | int | Maximum number of replicas for autoscaling. | [optional] |
| min_replicas | int | Minimum number of replicas. Defaults to 1 but can be set to 0 to enable scale-to-zero. | [optional] |
| scale_metric | str | ScaleMetric defines the scaling metric type watched by the autoscaler. Possible values are concurrency, rps, cpu, memory. concurrency and rps are supported via the Knative Pod Autoscaler. | [optional] |
| scale_target | int | ScaleTarget specifies the integer target value of the metric type the autoscaler watches for. concurrency and rps targets are supported by the Knative Pod Autoscaler. | [optional] |
| timeout | int | TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component. | [optional] |
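
As a rough illustration of how the fields above fit together, the following standard-library sketch checks a few values and converts the snake_case attribute names into camelCase keys. It does not use the KServe SDK: `build_extension_fields` and `snake_to_camel` are hypothetical helpers, the camelCase key names are assumed to match the serialized spec, and the range checks on `canary_traffic_percent` and `min_replicas` are assumptions rather than documented rules. Only the list of `scale_metric` values comes from the table.

```python
from typing import Any, Dict

# Metric names taken from the scale_metric description in the table.
ALLOWED_SCALE_METRICS = {"concurrency", "rps", "cpu", "memory"}


def snake_to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. min_replicas -> minReplicas."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def build_extension_fields(**fields: Any) -> Dict[str, Any]:
    """Validate a few fields and return the set ones keyed by camelCase name.

    Hypothetical helper for illustration only; not part of the KServe SDK.
    """
    metric = fields.get("scale_metric")
    if metric is not None and metric not in ALLOWED_SCALE_METRICS:
        raise ValueError(f"unsupported scale_metric: {metric!r}")

    # Assumption: a traffic percentage is only meaningful within 0..100.
    percent = fields.get("canary_traffic_percent")
    if percent is not None and not 0 <= percent <= 100:
        raise ValueError("canary_traffic_percent must be between 0 and 100")

    # Assumption: replica counts cannot be negative (0 enables scale-to-zero).
    min_replicas = fields.get("min_replicas")
    if min_replicas is not None and min_replicas < 0:
        raise ValueError("min_replicas must be 0 or greater")

    # Every field is optional, so unset (None) values are simply omitted.
    return {
        snake_to_camel(key): value
        for key, value in fields.items()
        if value is not None
    }


if __name__ == "__main__":
    # Scale between 0 and 3 replicas, targeting 10 requests per second.
    print(build_extension_fields(
        min_replicas=0,
        max_replicas=3,
        scale_metric="rps",
        scale_target=10,
        timeout=60,
    ))
```

Because every field is marked optional, the sketch drops anything left as `None` instead of emitting a default for it.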
