V1 Protocol
KServe's V1 protocol offers a standardized prediction workflow across all model frameworks. This protocol version is still supported, but it is recommended that users migrate to the V2 protocol for better performance and standardization among serving runtimes. However, if a use case requires a more flexible schema than protocol V2 provides, the V1 protocol is still an option.
Overview
The V1 protocol is based on the TensorFlow Serving REST API and provides a consistent interface for making inference requests across different model frameworks. It supports both prediction and explanation endpoints and includes basic health checking features.
API Endpoints
API | Verb | Path | Request Payload | Response Payload |
---|---|---|---|---|
List Models | GET | /v1/models | | {"models": [<model_name>]} |
Model Ready | GET | /v1/models/<model_name> | | {"name": <model_name>, "ready": bool} |
Predict | POST | /v1/models/<model_name>:predict | {"instances": []}* or {"inputs": []}* | {"predictions": []} |
Explain | POST | /v1/models/<model_name>:explain | {"instances": []}* or {"inputs": []}* | {"predictions": [], "explanations": []} |
*Note: Request payload is optional
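For a quick check of the two GET endpoints, any HTTP client will do. The sketch below uses Python's `requests` library against the same placeholder host (`localhost:8000`) and model name (`mymodel`) used in the examples later on this page:

```python
import requests

BASE_URL = "http://localhost:8000"  # placeholder serving endpoint
MODEL_NAME = "mymodel"              # placeholder model name

# List Models: which models does the server expose?
resp = requests.get(f"{BASE_URL}/v1/models")
print(resp.json())  # e.g. {'models': ['mymodel']}

# Model Ready: is a specific model ready to serve requests?
resp = requests.get(f"{BASE_URL}/v1/models/{MODEL_NAME}")
print(resp.json())  # e.g. {'name': 'mymodel', 'ready': True}
```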
API Definitions
API | Definition |
---|---|
Predict | The "predict" API performs inference on a model. The response is the prediction result. All InferenceServices speak the Tensorflow V1 HTTP API. |
Explain | The "explain" API is an optional component that provides model explanations in addition to predictions. The standardized explainer interface is identical to the Tensorflow V1 HTTP API with the addition of an ":explain" verb. |
Model Ready | The "model ready" health API indicates if a specific model is ready for inferencing. If the model(s) is downloaded and ready to serve requests, the model ready endpoint returns the list of accessible <model_name>(s). |
List Models | The "models" API exposes a list of models in the model registry. |
Payload Format
The V1 protocol uses a flexible JSON format for both requests and responses. The response payload is not strictly enforced: a custom server can define and return its own response payload, although we encourage using the KServe-defined response payload for consistency.
Request Format
KServe V1 protocol accepts both `instances` and `inputs` as the root key in the request payload. Both formats are equivalent and supported.
With `instances`:
{
"instances": [
{
"feature_0": [value_1, value_2, ...],
"feature_1": [value_1, value_2, ...],
...
},
...
]
}
With `inputs`:
{
"inputs": [
{
"feature_0": [value_1, value_2, ...],
"feature_1": [value_1, value_2, ...],
...
},
...
]
}
Or for simpler payloads with `instances`:
{
"instances": [[value_1, value_2, ...], [value_1, value_2, ...], ...]
}
Or for simpler payloads with `inputs`:
{
"inputs": [[value_1, value_2, ...], [value_1, value_2, ...], ...]
}
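To make the two shapes concrete, the sketch below builds an equivalent payload in each form. The feature names and values are purely illustrative, and whether a given model expects the named-feature form or the flat row form depends on the model and serving runtime:

```python
import json

# Named-feature form: each instance maps (illustrative) feature names to values
object_form = {
    "instances": [
        {"feature_0": [1.0, 2.0], "feature_1": [3.0, 4.0]},
        {"feature_0": [5.0, 6.0], "feature_1": [7.0, 8.0]},
    ]
}

# Flat row form: each instance is a plain list of values
row_form = {
    "instances": [
        [1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
    ]
}

# Both serialize to valid V1 request bodies; "inputs" may be used
# in place of "instances" as the root key.
print(json.dumps(object_form))
print(json.dumps(row_form))
```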
Response Format
{
"predictions": [
[value_1, value_2, ...],
[value_1, value_2, ...],
...
]
}
Examples
Predict API
Request with `instances`:
curl -X POST http://localhost:8000/v1/models/mymodel:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0, 5.0], [5.0, 6.0, 5.0]]}'
Request with `inputs`:
curl -X POST http://localhost:8000/v1/models/mymodel:predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": [[1.0, 2.0, 5.0], [5.0, 6.0, 5.0]]}'
Response:
{
"predictions": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
}
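The same predict call can also be made from a Python client. This is a minimal sketch using the `requests` library with the placeholder host, model name, and input values from the curl example above; the actual predictions depend on the deployed model:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/models/mymodel:predict",
    json={"instances": [[1.0, 2.0, 5.0], [5.0, 6.0, 5.0]]},
)
resp.raise_for_status()

# The V1 response carries one prediction per input instance
predictions = resp.json()["predictions"]
print(predictions)
```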
Explain API
Request with `instances`:
curl -X POST http://localhost:8000/v1/models/mymodel:explain \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0, 5.0], [5.0, 6.0, 5.0]]}'
Request with `inputs`:
curl -X POST http://localhost:8000/v1/models/mymodel:explain \
  -H "Content-Type: application/json" \
  -d '{"inputs": [[1.0, 2.0, 5.0], [5.0, 6.0, 5.0]]}'
Response:
{
"predictions": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
"explanations": [
{"importance_scores": [0.1, 0.2, 0.7]},
{"importance_scores": [0.3, 0.4, 0.3]}
]
}
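An explain call works the same way from a client. Keep in mind that the structure of each entry under `explanations` depends on the explainer configured for the InferenceService; the `importance_scores` field above is only illustrative. A minimal sketch:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/models/mymodel:explain",
    json={"instances": [[1.0, 2.0, 5.0], [5.0, 6.0, 5.0]]},
)
resp.raise_for_status()

body = resp.json()
# Predictions and explanations come back side by side,
# one entry of each per input instance.
for prediction, explanation in zip(body["predictions"], body["explanations"]):
    print(prediction, explanation)
```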
Benefits and Limitations
Benefits
- Simple API interface that's easy to understand and implement
- Based on established TensorFlow Serving API
- Support for both prediction and explanation endpoints
- Flexible JSON schema for various payload formats
Limitations
- Limited standardization of the response format
- Less efficient for large data payloads compared to V2
- Only basic health checking; no server liveness or standardized model metadata endpoints as defined by the V2 protocol
- No native binary data support; binary content has to be embedded in the JSON payload (for example, base64 encoded)
- Limited metadata capabilities
When to Use V1 Protocol
- When you need the explain functionality that isn't available in V2
- For compatibility with existing systems built against TensorFlow Serving API
- When a more flexible schema is required for your specific use case
- For simpler deployments where advanced features of V2 aren't needed
Next Steps
- Explore the V2 Protocol for improved performance and standardization