V1beta1ComponentExtensionSpec

ComponentExtensionSpec defines the deployment configuration for a given InferenceService component.

Properties

Name	Type	Description	Notes
batcher	V1beta1Batcher		[optional]
canary_traffic_percent	int	CanaryTrafficPercent defines the traffic split percentage between the candidate revision and the last ready revision	[optional]
container_concurrency	int	ContainerConcurrency specifies how many requests can be processed concurrently. This sets the hard limit of the container concurrency (https://knative.dev/docs/serving/autoscaling/concurrency).	[optional]
logger	V1beta1LoggerSpec		[optional]
max_replicas	int	Maximum number of replicas for autoscaling.	[optional]
min_replicas	int	Minimum number of replicas. Defaults to 1 but can be set to 0 to enable scale-to-zero.	[optional]
scale_metric	str	ScaleMetric defines the scaling metric type watched by the autoscaler. Possible values are concurrency, rps, cpu, and memory; concurrency and rps are supported via the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics).	[optional]
scale_target	int	ScaleTarget specifies the integer target value of the metric type the autoscaler watches for. concurrency and rps targets are supported by the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/).	[optional]
timeout	int	TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component.	[optional]
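
The properties above can be illustrated with a minimal plain-Python sketch. It does not use the SDK class itself: `build_extension_fields`, `to_camel`, and `ALLOWED_SCALE_METRICS` are hypothetical helpers written for this example. The sketch checks `scale_metric` against the four values listed in the table and converts the snake_case property names to the camelCase keys used in an InferenceService manifest. The 0 to 100 range check on `canary_traffic_percent` is an assumption based on it being a percentage, not something the table states.

```python
# Illustrative only: plain-Python helpers, not part of the KServe SDK.
ALLOWED_SCALE_METRICS = {"concurrency", "rps", "cpu", "memory"}


def to_camel(name: str) -> str:
    """Convert a snake_case property name to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def build_extension_fields(**props) -> dict:
    """Check a few constraints and return the fields with camelCase keys.

    Properties set to None are omitted, since every field is optional.
    """
    metric = props.get("scale_metric")
    if metric is not None and metric not in ALLOWED_SCALE_METRICS:
        raise ValueError(f"unsupported scale_metric: {metric!r}")
    # Assumption: a traffic percentage must lie in [0, 100].
    percent = props.get("canary_traffic_percent")
    if percent is not None and not 0 <= percent <= 100:
        raise ValueError("canary_traffic_percent must be between 0 and 100")
    return {to_camel(key): value for key, value in props.items() if value is not None}


fields = build_extension_fields(
    min_replicas=0,      # 0 enables scale-to-zero
    max_replicas=3,
    scale_metric="rps",
    scale_target=10,
    timeout=60,          # seconds
)
print(fields)
# {'minReplicas': 0, 'maxReplicas': 3, 'scaleMetric': 'rps', 'scaleTarget': 10, 'timeout': 60}
```

With the SDK installed, the same snake_case names would typically be passed as keyword arguments to a component spec that includes these fields; check the installed version for the exact constructor.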