V1beta1ComponentExtensionSpec
ComponentExtensionSpec defines the deployment configuration for a given InferenceService component.
Properties
Name | Type | Description | Notes |
---|---|---|---|
batcher | V1beta1Batcher | | [optional] |
canary_traffic_percent | int | CanaryTrafficPercent defines the traffic split percentage between the candidate revision and the last ready revision. | [optional] |
container_concurrency | int | ContainerConcurrency specifies how many requests can be processed concurrently. This sets the hard limit of the container concurrency (https://knative.dev/docs/serving/autoscaling/concurrency). | [optional] |
logger | V1beta1LoggerSpec | | [optional] |
max_replicas | int | Maximum number of replicas for autoscaling. | [optional] |
min_replicas | int | Minimum number of replicas. Defaults to 1, but can be set to 0 to enable scale-to-zero. | [optional] |
scale_metric | str | ScaleMetric defines the scaling metric type watched by the autoscaler. Possible values are concurrency, rps, cpu, and memory; concurrency and rps are supported via the Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-metrics). | [optional] |
scale_target | int | ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for. concurrency and rps targets are supported by Knative Pod Autoscaler (https://knative.dev/docs/serving/autoscaling/autoscaling-targets/). | [optional] |
timeout | int | TimeoutSeconds specifies the number of seconds to wait before timing out a request to the component. | [optional] |
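
To make the scalar fields above concrete, here is a minimal sketch that collects them in a plain dict and runs a few sanity checks. It uses only the standard library. The helper `check_component_extension` is hypothetical and not part of the KServe SDK, and the 0 to 100 range for `canary_traffic_percent` is an assumption based on it being described as a percentage. The allowed `scale_metric` values come from the table.

```python
# Allowed values, taken from the scale_metric description in the table.
ALLOWED_SCALE_METRICS = {"concurrency", "rps", "cpu", "memory"}


def check_component_extension(fields: dict) -> list:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper for illustration only, not part of the KServe SDK.
    Every field is optional, so missing keys are simply skipped.
    """
    problems = []

    percent = fields.get("canary_traffic_percent")
    # Assumption: a traffic split percentage lies between 0 and 100 inclusive.
    if percent is not None and not 0 <= percent <= 100:
        problems.append("canary_traffic_percent must be between 0 and 100")

    metric = fields.get("scale_metric")
    if metric is not None and metric not in ALLOWED_SCALE_METRICS:
        problems.append("scale_metric must be one of concurrency, cpu, memory, rps")

    # Replica counts cannot be negative; 0 for min_replicas enables scale-to-zero.
    for name in ("min_replicas", "max_replicas"):
        value = fields.get(name)
        if value is not None and value < 0:
            problems.append(name + " must not be negative")

    return problems


# Example values are illustrative, not recommended settings.
example_fields = {
    "min_replicas": 0,             # scale-to-zero
    "max_replicas": 3,
    "scale_metric": "rps",
    "scale_target": 10,            # target value for the chosen metric
    "canary_traffic_percent": 20,
    "timeout": 60,                 # seconds
}

print(check_component_extension(example_fields))
```

The dict keys match the property names in the table, so the same values could presumably be passed as keyword arguments to a model that carries these fields. That is not verified here.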