Version: Next

LLMInferenceService Status Reference

This page documents the full status contract for LLMInferenceService (v1alpha2) - conditions, reason codes, and observed status fields. It is designed as a reference for consumers of this API - dashboards, CLIs, GitOps pipelines, or any tooling that reads LLMInferenceService status programmatically.

For background on the resource itself, see the LLMInferenceService overview. For spec configuration, see the configuration guide.


Condition Hierarchy

LLMInferenceService uses status conditions to represent readiness. The top-level Ready condition aggregates WorkloadsReady and RouterReady via the Knative LivingConditionSet - it is True only when both are True.

PresetsCombined is not part of the Ready rollup. It is a separate gate: when config resolution fails, the reconciler short-circuits before reaching workload or router reconciliation, so WorkloadsReady and RouterReady stay at their previous values (or Unknown on a new service). Consumers should check PresetsCombined independently.

Optional conditions (marked below) are only present when the corresponding feature is enabled - missing conditions do not block readiness.

Ready (= WorkloadsReady ∧ RouterReady)
├── WorkloadsReady (aggregate)
│   ├── MainWorkloadReady (single-node only)
│   ├── WorkerWorkloadReady (multi-node only)
│   ├── PrefillWorkloadReady (prefill-decode only)
│   ├── PrefillWorkerWorkloadReady (multi-node prefill-decode only)
│   ├── ScalingReady (autoscaling only)
│   └── PrefillScalingReady (prefill-decode autoscaling only)
└── RouterReady (aggregate)
    ├── GatewaysReady (when gateway refs configured)
    ├── HTTPRoutesReady (when HTTP route configured)
    ├── InferencePoolReady (managed scheduler only)
    └── SchedulerWorkloadReady (managed scheduler only)

PresetsCombined (independent gate, not part of Ready rollup)
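
Because PresetsCombined sits outside the Ready rollup, a consumer that reads only Ready can report a stale result. The following is a minimal sketch of a check that reads both conditions from a status already parsed into a dict (for example from `kubectl get llmisvc <name> -o json`). The helper names `condition_status` and `is_ready_and_current` are hypothetical, not part of any KServe client.

```python
from typing import Optional


def condition_status(status: dict, cond_type: str) -> Optional[str]:
    """Return the condition's status string, or None when the condition is absent."""
    for cond in status.get("conditions") or []:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def is_ready_and_current(status: dict) -> bool:
    """True only when Ready is True and the PresetsCombined gate has passed."""
    return (
        condition_status(status, "Ready") == "True"
        and condition_status(status, "PresetsCombined") == "True"
    )


# Ready can still read True from an earlier reconcile while config resolution fails.
stale = {
    "conditions": [
        {"type": "Ready", "status": "True"},
        {"type": "PresetsCombined", "status": "False", "reason": "ConfigNotFound"},
    ]
}
print(is_ready_and_current(stale))  # False
```

Whether a stale-but-Ready service should be reported as degraded or still healthy is a policy choice for the consumer; this sketch takes the stricter reading.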

Identifying the Deployment Topology

The set of conditions and status fields that appear tells you what kind of deployment you're looking at. A consumer can determine the topology without inspecting the spec - just read the status.

| Topology | Distinguishing signals in status |
| --- | --- |
| Single-node vLLM | MainWorkloadReady is present. status.workloads.primary.kind is Deployment. No WorkerWorkloadReady. |
| Multi-node (LeaderWorkerSet) | WorkerWorkloadReady is present, MainWorkloadReady is absent. status.workloads.primary.kind is LeaderWorkerSet. |
| Prefill-decode disaggregated serving | PrefillWorkloadReady is present. status.workloads.prefill is populated. |
| Multi-node prefill-decode | Both WorkerWorkloadReady and PrefillWorkerWorkloadReady are present. |
| With llm-d scheduler | SchedulerWorkloadReady and InferencePoolReady are present. status.router.scheduler and status.workloads.scheduler are populated. Only for managed schedulers (not external pool refs). |
| With autoscaling | ScalingReady (and/or PrefillScalingReady for prefill-decode) is present. |

These signals compose - a multi-node deployment with scheduler will show WorkerWorkloadReady, SchedulerWorkloadReady, and InferencePoolReady all at once.
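
The signals above can be turned into a small set of independent flags. This is a sketch only: `detect_topology` and the flag names are hypothetical, and the function takes the `status` object as a plain dict. It relies on condition presence and on `status.workloads.prefill`, as described in the table.

```python
def detect_topology(status: dict) -> dict:
    """Derive topology flags from which conditions and fields are present."""
    present = {c["type"] for c in status.get("conditions") or []}
    workloads = status.get("workloads") or {}
    return {
        "single_node": "MainWorkloadReady" in present,
        "multi_node": "WorkerWorkloadReady" in present,
        "prefill_decode": "PrefillWorkloadReady" in present
        or bool(workloads.get("prefill")),
        "multi_node_prefill": {"WorkerWorkloadReady", "PrefillWorkerWorkloadReady"}
        <= present,
        "managed_scheduler": {"SchedulerWorkloadReady", "InferencePoolReady"}
        <= present,
        "autoscaling": bool({"ScalingReady", "PrefillScalingReady"} & present),
    }


# Single-node service with a managed scheduler.
example = {
    "conditions": [
        {"type": "Ready", "status": "True"},
        {"type": "MainWorkloadReady", "status": "True"},
        {"type": "InferencePoolReady", "status": "True"},
        {"type": "SchedulerWorkloadReady", "status": "True"},
    ]
}
flags = detect_topology(example)
print(flags["single_node"], flags["managed_scheduler"], flags["multi_node"])
# True True False
```

Returning flags rather than a single label reflects that the signals compose: one service can be multi-node, scheduler-backed, and autoscaled at the same time.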


Conditions

All conditions use positive polarity - True means healthy.

Top-Level Conditions

| Condition | Set By | True | False | Presence |
| --- | --- | --- | --- | --- |
| Ready | Aggregated (Knative condition set) | Both WorkloadsReady and RouterReady are True; the service is accepting traffic | At least one of WorkloadsReady or RouterReady is not True | Always |
| PresetsCombined | Config reconciler | All referenced LLMInferenceServiceConfig resources found and merged | Config lookup or merge failed (see Reason Codes). Blocks reconciliation but does not directly affect Ready | Always |
| WorkloadsReady | Aggregated by DetermineWorkloadReadiness | All workload sub-conditions that are present are True | At least one workload sub-condition is False | Always |
| RouterReady | Aggregated by DetermineRouterReadiness | All router sub-conditions that are present are True | At least one router sub-condition is False | Always |

Workload Sub-Conditions

These roll up into WorkloadsReady. Optional conditions are cleared (removed) when the feature is not configured, so they never block readiness.

In single-node mode, MainWorkloadReady tracks the primary Deployment. In multi-node mode, MainWorkloadReady is cleared and the primary workload (a LeaderWorkerSet) is tracked by WorkerWorkloadReady instead.

| Condition | Set By | True | False | Presence |
| --- | --- | --- | --- | --- |
| MainWorkloadReady | Workload reconciler | Primary model-serving Deployment has desired replicas and passing readiness probes | Deployment not at desired state | Single-node only (cleared in multi-node) |
| WorkerWorkloadReady | Workload reconciler | LeaderWorkerSet workload is available (all groups ready) | LeaderWorkerSet not available | Only with multi-node (LeaderWorkerSet) |
| PrefillWorkloadReady | Workload reconciler | Prefill-phase workload is ready | Prefill workload not ready | Only with prefill-decode disaggregated serving |
| PrefillWorkerWorkloadReady | Workload reconciler | Multi-node LeaderWorkerSet for the prefill workload is available | Prefill LeaderWorkerSet not available | Only with multi-node prefill-decode |
| ScalingReady | Scaling reconciler | Autoscaler (HPA, KEDA ScaledObject, or VariantAutoscaling) for the primary workload is configured and operational | Autoscaler not ready (may surface propagated reasons from HPA/KEDA) | Only when autoscaling is configured |
| PrefillScalingReady | Scaling reconciler | Autoscaler for the prefill workload is configured and operational | Autoscaler not ready | Only with prefill-decode autoscaling |

Router Sub-Conditions

These roll up into RouterReady. When no gateway or HTTP route configuration is present, the corresponding conditions are cleared rather than set to True.

| Condition | Set By | True | False | Presence |
| --- | --- | --- | --- | --- |
| GatewaysReady | Router reconciler | All referenced Gateway resources exist and report ready status | Gateway not found, not ready, or ref invalid | Only when gateway refs are configured |
| HTTPRoutesReady | Router reconciler | All HTTPRoute resources created and accepted by their parent Gateways | HTTPRoute not created, not accepted, or ref invalid | Only when HTTP route is configured |
| InferencePoolReady | Router reconciler | InferencePool resource created and ready | Pool not found, not ready, or waiting for Gateway | Only when managed scheduler is enabled |
| SchedulerWorkloadReady | Scheduler reconciler | Endpoint Picker (EPP) scheduler Deployment has desired replicas | Scheduler pods not ready | Only when managed scheduler is enabled |
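
The rollup rule stated for both aggregates (only sub-conditions that are present count, and absent ones never block) can be illustrated with a short sketch. This is not the controller's DetermineWorkloadReadiness or DetermineRouterReadiness implementation; `rollup` is a hypothetical helper. Returning Unknown when a present sub-condition is neither True nor False, and True when no sub-condition is present, are assumptions of this sketch.

```python
WORKLOAD_SUBCONDITIONS = {
    "MainWorkloadReady",
    "WorkerWorkloadReady",
    "PrefillWorkloadReady",
    "PrefillWorkerWorkloadReady",
    "ScalingReady",
    "PrefillScalingReady",
}
ROUTER_SUBCONDITIONS = {
    "GatewaysReady",
    "HTTPRoutesReady",
    "InferencePoolReady",
    "SchedulerWorkloadReady",
}


def rollup(conditions: list, members: set) -> str:
    """Aggregate the sub-conditions that are present; absent ones are ignored."""
    statuses = [c["status"] for c in conditions if c["type"] in members]
    if "False" in statuses:
        return "False"
    if all(s == "True" for s in statuses):
        return "True"  # assumption: also covers the case of no present members
    return "Unknown"


conditions = [
    {"type": "MainWorkloadReady", "status": "True"},
    {"type": "ScalingReady", "status": "False"},
    {"type": "GatewaysReady", "status": "True"},
]
print(rollup(conditions, WORKLOAD_SUBCONDITIONS))  # False
print(rollup(conditions, ROUTER_SUBCONDITIONS))    # True
```

A consumer normally reads WorkloadsReady and RouterReady directly; recomputing the rollup is mainly useful for cross-checking or for explaining to a user which sub-conditions contributed.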

Reason Codes

When a condition is False, the reason field indicates what went wrong. The message field provides additional detail.

Workload and scaling conditions can also surface reasons propagated from underlying resources (Deployment, LeaderWorkerSet, HPA, KEDA ScaledObject). For example, MainWorkloadReady may show DeploymentUnavailable or ProgressDeadlineExceeded from the Deployment status, and ScalingReady may show HPAProgressing, ScaledObjectProgressing, FailedGetExternalMetric, or TriggerError from the autoscaler. The tables below list controller-defined reasons; propagated reasons use the originating resource's own reason strings.

Config Reasons (PresetsCombined)

| Reason | Description | Action |
| --- | --- | --- |
| ConfigNotFound | A referenced LLMInferenceServiceConfig does not exist in any searched namespace | Verify the config name and namespace. The controller watches for config creation and will recover automatically |
| CombineBaseError | Config merge failed due to a conflict or validation error | Check the condition message for details. Review baseRefs in the spec for conflicting fields |

Workload Reasons

| Reason | Condition(s) | Description |
| --- | --- | --- |
| Stopped | Any workload condition | Service is force-stopped via annotation serving.kserve.io/stop: "true" |
| ReconcileCertsError | MainWorkloadReady | TLS certificate reconciliation failed |
| ReconcileWorkloadPermissionsError | MainWorkloadReady | ServiceAccount or RBAC reconciliation failed |
| ReconcileSingleNodeWorkloadError | MainWorkloadReady | Deployment creation or update failed |
| ReconcileMultiNodeWorkloadError | WorkerWorkloadReady | LeaderWorkerSet reconciliation failed (LWS may also surface its own reasons) |
| ReconcileWorkloadServiceError | MainWorkloadReady | Workload Service creation or update failed |
| ScalingCRDNotFound | MainWorkloadReady | The autoscaling CRD (e.g. HPA, KEDA ScaledObject) is not installed on the cluster |
| ReconcileScalingError | MainWorkloadReady | Autoscaler resource creation or update failed |

Router Reasons

| Reason | Condition(s) | Description |
| --- | --- | --- |
| Stopped | Any router condition | Service is force-stopped via annotation serving.kserve.io/stop: "true" |
| RefsInvalid | GatewaysReady, HTTPRoutesReady | A gateway or route reference in the spec is malformed or references an unsupported kind |
| GatewaysNotReady | GatewaysReady | One or more referenced Gateways are not reporting ready status |
| GatewayPreconditionNotMet | HTTPRoutesReady | Gateway preconditions not met before HTTPRoute reconciliation |
| HTTPRouteReconcileError | HTTPRoutesReady | HTTPRoute creation or update failed |
| HTTPRouteFetchError | HTTPRoutesReady | Failed to fetch referenced HTTPRoute resources |
| HTTPRoutesNotReady | HTTPRoutesReady | One or more HTTPRoutes are not accepted by their parent Gateway |
| PlatformNetworkingReconcileError | HTTPRoutesReady | Platform-specific networking reconciliation failed |
| InferencePoolNotReady | InferencePoolReady | InferencePool exists but is not ready |
| InferencePoolFetchError | InferencePoolReady | Failed to fetch the InferencePool resource |
| WaitingForGateway | InferencePoolReady | InferencePool is waiting for its parent Gateway to become ready |
| SchedulerReconcileError | SchedulerWorkloadReady | EPP scheduler Deployment creation or update failed |
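
Since reasons can be either controller-defined or propagated from an underlying resource, tooling may want to tell the two apart, for instance to decide whether to point the user at this page or at the Deployment, LeaderWorkerSet, or autoscaler documentation. The sketch below hard-codes the controller-defined reasons listed in the tables above and treats anything else as propagated. `classify_reason` is a hypothetical helper, and the set would need updating if the controller adds reasons.

```python
CONTROLLER_REASONS = {
    # Config reasons
    "ConfigNotFound", "CombineBaseError",
    # Workload reasons
    "Stopped", "ReconcileCertsError", "ReconcileWorkloadPermissionsError",
    "ReconcileSingleNodeWorkloadError", "ReconcileMultiNodeWorkloadError",
    "ReconcileWorkloadServiceError", "ScalingCRDNotFound", "ReconcileScalingError",
    # Router reasons
    "RefsInvalid", "GatewaysNotReady", "GatewayPreconditionNotMet",
    "HTTPRouteReconcileError", "HTTPRouteFetchError", "HTTPRoutesNotReady",
    "PlatformNetworkingReconcileError", "InferencePoolNotReady",
    "InferencePoolFetchError", "WaitingForGateway", "SchedulerReconcileError",
}


def classify_reason(reason: str) -> str:
    """Label a reason as 'controller', 'propagated', or 'none' when empty."""
    if not reason:
        return "none"
    return "controller" if reason in CONTROLLER_REASONS else "propagated"


print(classify_reason("ConfigNotFound"))         # controller
print(classify_reason("DeploymentUnavailable"))  # propagated
```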

Observed Status Fields

Beyond conditions, the status includes structured references to the resources the controller created during reconciliation. These fields are populated after a successful reconcile and cleared when the service is force-stopped (annotation serving.kserve.io/stop: "true"). Consumers should treat missing observed fields the same as "not yet reconciled" - don't assume an error if the fields are absent on a newly created or stopped service.

status.workloads

Typed references to the Kubernetes resources backing the LLMInferenceService. Use these to navigate directly to the backing workload without guessing names.

| Field | Type | Description | Presence |
| --- | --- | --- | --- |
| primary | TypedLocalObjectReference | Deployment (single-node) or LeaderWorkerSet (multi-node) running the main model workload | Always (when reconciled) |
| prefill | TypedLocalObjectReference | Deployment or LeaderWorkerSet for the prefill phase | Only with prefill-decode disaggregation |
| service | TypedLocalObjectReference | ClusterIP Service for workload traffic | Always (when reconciled) |
| scheduler | TypedLocalObjectReference | EPP scheduler Deployment | Only with managed scheduler (not external pool refs) |
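
A consumer can resolve these references into a `kind/name` target while treating a missing field as "not yet reconciled" rather than as an error. This is a minimal sketch over a status dict; `workload_target` is a hypothetical helper name.

```python
from typing import Optional


def workload_target(status: dict, role: str = "primary") -> Optional[str]:
    """Return 'Kind/name' for a status.workloads entry, or None if it is absent."""
    ref = (status.get("workloads") or {}).get(role)
    if not ref:
        return None  # not yet reconciled, stopped, or feature not enabled
    return f"{ref['kind']}/{ref['name']}"


status = {
    "workloads": {
        "primary": {"apiGroup": "apps", "kind": "Deployment", "name": "my-llm-kserve"},
        "service": {"kind": "Service", "name": "my-llm-kserve-workload-svc"},
    }
}
print(workload_target(status))             # Deployment/my-llm-kserve
print(workload_target(status, "prefill"))  # None
```

The `Kind/name` form matches what `kubectl describe` accepts as a resource argument, so the result can be passed straight to a follow-up command.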

status.router

The observed routing topology - the networking resources the controller found during reconciliation.

| Field | Type | Description | Presence |
| --- | --- | --- | --- |
| gateways[] | ObservedGateway | Gateways with matched listener names and bound HTTPRoutes | When routing is configured |
| scheduler.inferencePool | ObjectReference | InferencePool resource reference | Only with managed scheduler |
| scheduler.service | ObjectReference | EPP Service reference | Only with managed scheduler |

Each gateway entry includes:

  • Gateway name, namespace, group, and kind
  • listeners[] - names of the listeners that matched this service
  • httpRoutes[] - HTTPRoute references bound through this gateway

status.addresses

Each address carries an optional origin field (ObjectReference) identifying which Gateway produced it. This enables multi-gateway disambiguation - consumers can group endpoints by their source gateway.

addresses:
  - url: "https://my-model.example.com/ns/my-model"
    origin:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: inference-gateway
      namespace: istio-system
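
Grouping endpoints by their source gateway might look like the sketch below. It assumes the addresses list has already been parsed into Python dicts and keys each group by the gateway's namespace and name. Addresses without an origin (the field is optional) are collected under `None`. `group_by_gateway` is a hypothetical helper, and the second gateway in the example is invented for illustration.

```python
from collections import defaultdict


def group_by_gateway(addresses: list) -> dict:
    """Map (namespace, name) of the origin Gateway to the URLs it produced."""
    groups = defaultdict(list)
    for addr in addresses:
        origin = addr.get("origin")
        key = (origin.get("namespace"), origin.get("name")) if origin else None
        groups[key].append(addr["url"])
    return dict(groups)


addresses = [
    {
        "url": "https://my-model.example.com/ns/my-model",
        "origin": {
            "group": "gateway.networking.k8s.io",
            "kind": "Gateway",
            "name": "inference-gateway",
            "namespace": "istio-system",
        },
    },
    # Hypothetical second gateway, shown only to illustrate grouping.
    {
        "url": "https://internal.example.com/ns/my-model",
        "origin": {"kind": "Gateway", "name": "internal-gateway", "namespace": "istio-system"},
    },
    {"url": "http://no-origin.example.com"},
]
grouped = group_by_gateway(addresses)
print(grouped[("istio-system", "inference-gateway")])
# ['https://my-model.example.com/ns/my-model']
```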

status.appliedConfigs

An ordered list of LLMInferenceServiceConfig references that contributed to the merged configuration. Each entry carries a source field distinguishing auto-injected well-known configs (Preset) from user-specified configs (UserRef).

appliedConfigs:
  - name: multi-node-defaults
    namespace: kserve-system
    source: Preset
  - name: team-overrides
    namespace: ml-team
    source: UserRef
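
A dashboard that wants to show auto-injected and user-specified configs separately, while keeping the merge order within each group, could split the list as in this sketch. `configs_by_source` is a hypothetical helper operating on the parsed list.

```python
def configs_by_source(applied_configs: list) -> dict:
    """Split appliedConfigs into per-source lists, preserving list order."""
    result = {"Preset": [], "UserRef": []}
    for entry in applied_configs:
        # setdefault keeps the sketch tolerant of a source value not listed here
        result.setdefault(entry.get("source"), []).append(
            f"{entry['namespace']}/{entry['name']}"
        )
    return result


applied = [
    {"name": "multi-node-defaults", "namespace": "kserve-system", "source": "Preset"},
    {"name": "team-overrides", "namespace": "ml-team", "source": "UserRef"},
]
print(configs_by_source(applied))
# {'Preset': ['kserve-system/multi-node-defaults'], 'UserRef': ['ml-team/team-overrides']}
```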

Sample Status

Healthy service (single-node with scheduler)

status:
  url: "https://my-llm.example.com"
  conditions:
    - type: Ready
      status: "True"
      lastTransitionTime: "2025-06-01T10:30:00Z"
      observedGeneration: 3
    - type: PresetsCombined
      status: "True"
      lastTransitionTime: "2025-06-01T10:28:00Z"
      observedGeneration: 3
    - type: WorkloadsReady
      status: "True"
      lastTransitionTime: "2025-06-01T10:30:00Z"
      observedGeneration: 3
    - type: MainWorkloadReady
      status: "True"
      lastTransitionTime: "2025-06-01T10:30:00Z"
      observedGeneration: 3
    - type: RouterReady
      status: "True"
      lastTransitionTime: "2025-06-01T10:29:00Z"
      observedGeneration: 3
    - type: GatewaysReady
      status: "True"
      lastTransitionTime: "2025-06-01T10:28:30Z"
      observedGeneration: 3
    - type: HTTPRoutesReady
      status: "True"
      lastTransitionTime: "2025-06-01T10:28:45Z"
      observedGeneration: 3
    - type: InferencePoolReady
      status: "True"
      lastTransitionTime: "2025-06-01T10:29:00Z"
      observedGeneration: 3
    - type: SchedulerWorkloadReady
      status: "True"
      lastTransitionTime: "2025-06-01T10:29:00Z"
      observedGeneration: 3
  appliedConfigs:
    - name: multi-node-defaults
      namespace: kserve-system
      source: Preset
    - name: team-overrides
      namespace: ml-team
      source: UserRef
  router:
    gateways:
      - name: inference-gateway
        namespace: istio-system
        listeners: [https]
        httpRoutes:
          - name: my-llm-kserve-route
            namespace: ml-team
    scheduler:
      inferencePool:
        name: my-llm-inference-pool
        namespace: ml-team
      service:
        name: my-llm-epp-service
        namespace: ml-team
  workloads:
    primary:
      apiGroup: apps
      kind: Deployment
      name: my-llm-kserve
    service:
      kind: Service
      name: my-llm-kserve-workload-svc
    scheduler:
      apiGroup: apps
      kind: Deployment
      name: my-llm-kserve-router-scheduler
  addresses:
    - url: "https://my-llm.example.com"
      origin:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: inference-gateway
        namespace: istio-system

Failing service (missing config)

In this example, a referenced LLMInferenceServiceConfig was deleted. The controller surfaces ConfigNotFound on PresetsCombined and short-circuits before workload or router reconciliation. Note that Ready stays at its previous value (or Unknown on a new service) because PresetsCombined is not part of the Ready rollup - consumers must check PresetsCombined independently.

status:
  conditions:
    - type: Ready
      status: "Unknown"
      lastTransitionTime: "2025-06-02T14:05:00Z"
      observedGeneration: 4
    - type: PresetsCombined
      status: "False"
      reason: "ConfigNotFound"
      message: "LLMInferenceServiceConfig 'team-overrides' not found in namespaces [ml-team, kserve-system]"
      lastTransitionTime: "2025-06-02T14:05:00Z"
      observedGeneration: 4
    - type: WorkloadsReady
      status: "Unknown"
      lastTransitionTime: "2025-06-02T14:05:00Z"
      observedGeneration: 4
    - type: RouterReady
      status: "Unknown"
      lastTransitionTime: "2025-06-02T14:05:00Z"
      observedGeneration: 4

Troubleshooting

Start by checking conditions. If PresetsCombined is False, that's a config issue - the reconciler won't proceed to workloads or routing. If PresetsCombined is True but Ready is not, drill into WorkloadsReady or RouterReady.

Step 1: Check conditions

kubectl get llmisvc <name> -o jsonpath='{range .status.conditions[*]}{.type}={.status} {.reason} {.message}{"\n"}{end}'

Step 2: Follow the False branch

PresetsCombined=False:

  • ConfigNotFound - check that the referenced config exists, using the config name and one of the namespaces listed in the condition message: kubectl get llminferenceserviceconfig <config-name> -n <config-namespace>
  • CombineBaseError - review the message for conflicting fields, check spec.baseRefs

WorkloadsReady=False - check which workload sub-condition is False, then use status.workloads to find the resource:

# Describe the primary workload, reading kind and name directly from status
kubectl describe $(kubectl get llmisvc <name> -o jsonpath='{.status.workloads.primary.kind}/{.status.workloads.primary.name}')

# List pods for the workload
kubectl get pods -l app.kubernetes.io/name=<name>,app.kubernetes.io/part-of=llminferenceservice --sort-by=.metadata.creationTimestamp

RouterReady=False - check which router sub-condition is False, then use status.router to find the resource:

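# List the observed gateways (printed as name/namespace)
kubectl get llmisvc <name> -o jsonpath='{range .status.router.gateways[*]}{.name}/{.namespace}{"\n"}{end}'
# Check gateway status, using a gateway name and namespace from the output above
kubectl get gateway <gateway-name> -n <gateway-namespace>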

# Check HTTPRoute status
kubectl get httproute -l app.kubernetes.io/name=<name>,app.kubernetes.io/component=llminferenceservice-router
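
The decision flow in Steps 1 and 2 can also be expressed programmatically, for example in a CLI or dashboard. The sketch below takes the `status` object as a dict and returns a short hint string. `triage` and the hint wording are hypothetical. Only sub-conditions with status False are listed as failing.

```python
TOP_LEVEL = {"Ready", "PresetsCombined", "WorkloadsReady", "RouterReady"}


def triage(status: dict) -> str:
    """Follow the troubleshooting order: config gate first, then the Ready branches."""
    conds = {c["type"]: c for c in status.get("conditions") or []}

    presets = conds.get("PresetsCombined", {})
    if presets.get("status") == "False":
        return f"config issue: {presets.get('reason', 'unknown reason')}"

    if conds.get("Ready", {}).get("status") == "True":
        return "all conditions True: check the data plane, model server, and scheduler"

    # Which aggregate branches are not True, and which sub-conditions are False.
    branches = [
        t for t in ("WorkloadsReady", "RouterReady")
        if conds.get(t, {}).get("status") != "True"
    ]
    failing = [
        t for t, c in conds.items()
        if t not in TOP_LEVEL and c.get("status") == "False"
    ]
    return f"not ready: branches={branches}, failing sub-conditions={failing}"


missing_config = {
    "conditions": [
        {"type": "Ready", "status": "Unknown"},
        {"type": "PresetsCombined", "status": "False", "reason": "ConfigNotFound"},
        {"type": "WorkloadsReady", "status": "Unknown"},
        {"type": "RouterReady", "status": "Unknown"},
    ]
}
print(triage(missing_config))  # config issue: ConfigNotFound
```

Checking PresetsCombined before Ready mirrors the reconciler's own ordering: when config resolution fails, the other conditions may be stale, so they are not worth inspecting first.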

Step 3: Everything is True but not working

If all conditions are True but inference requests fail, the problem is usually at a layer the controller doesn't observe - the model server itself, the networking data plane, or the scheduler.

Check connectivity and URLs:

# Check the service URL
kubectl get llmisvc <name> -o jsonpath='{.status.url}'

# Check which gateway produced each address
kubectl get llmisvc <name> -o jsonpath='{range .status.addresses[*]}{.url} (via {.origin.name}){"\n"}{end}'

# Test connectivity
curl -v <url>/v1/models

Bypass the networking layer to isolate whether the issue is in the model server or the routing:

# Port-forward directly to the workload service, reading the name from status
kubectl port-forward svc/$(kubectl get llmisvc <name> -o jsonpath='{.status.workloads.service.name}') 8000:8000
curl localhost:8000/v1/models

Check model server logs - the pod may be ready but the model could be failing at inference time (OOM, CUDA errors, corrupted weights). The model server container is named main in all well-known configs:

# Read the workload name from status, then get logs
# For single-node (Deployment):
kubectl logs deploy/$(kubectl get llmisvc <name> -o jsonpath='{.status.workloads.primary.name}') -c main --tail=100

# For multi-node (LeaderWorkerSet) - list pods by label instead:
kubectl logs -l app.kubernetes.io/name=<name>,app.kubernetes.io/part-of=llminferenceservice -c main --tail=100

Check EPP/scheduler logs if the scheduler is enabled - the Endpoint Picker may be rejecting or misrouting requests:

kubectl logs deploy/$(kubectl get llmisvc <name> -o jsonpath='{.status.workloads.scheduler.name}') -c main --tail=100

Check namespace events for issues that conditions don't capture:

kubectl get events -n <namespace> --sort-by=.lastTimestamp --field-selector involvedObject.name=<name>