
gRPC API Reference

ServerLive

The ServerLive API indicates if the inference server is able to receive and respond to metadata and inference requests.

rpc inference.GRPCInferenceService/ServerLive(ServerLiveRequest) returns ServerLiveResponse

ServerReady

The ServerReady API indicates if the server is ready for inferencing.

rpc inference.GRPCInferenceService/ServerReady(ServerReadyRequest) returns ServerReadyResponse

ModelReady

The ModelReady API indicates if a specific model is ready for inferencing.

rpc inference.GRPCInferenceService/ModelReady(ModelReadyRequest) returns ModelReadyResponse

ServerMetadata

The ServerMetadata API provides information about the server. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure.

rpc inference.GRPCInferenceService/ServerMetadata(ServerMetadataRequest) returns ServerMetadataResponse

ModelMetadata

The per-model metadata API provides information about a model. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure.

rpc inference.GRPCInferenceService/ModelMetadata(ModelMetadataRequest) returns ModelMetadataResponse

ModelInfer

The ModelInfer API performs inference using the specified model. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure.

rpc inference.GRPCInferenceService/ModelInfer(ModelInferRequest) returns ModelInferResponse


Messages

InferParameter

An inference parameter value. The Parameters message describes a "name"/"value" pair, where the "name" is the name of the parameter and the "value" is a boolean, integer, or string corresponding to the parameter.

| Field | Type | Description |
| --- | --- | --- |
| oneof parameter_choice.bool_param | bool | A boolean parameter value. |
| oneof parameter_choice.int64_param | int64 | An int64 parameter value. |
| oneof parameter_choice.string_param | string | A string parameter value. |
| oneof parameter_choice.double_param | double | A double parameter value. |
| oneof parameter_choice.uint64_param | uint64 | A uint64 parameter value. |
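When a client builds an InferParameter from a native value, it must set the oneof member that matches the value's type. A minimal sketch of that mapping in Python — the `(field_name, value)` pair returned here is illustrative, not the generated protobuf class, and `infer_parameter` is a hypothetical helper name:

```python
def infer_parameter(value):
    """Map a native Python value to the matching InferParameter oneof field.

    Returns a (field_name, value) pair; a real client would set the same
    field on the generated InferParameter protobuf message instead.
    """
    # bool must be tested before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return ("bool_param", value)
    if isinstance(value, int):
        # uint64_param cannot be distinguished from int64_param by the
        # Python type alone, so this sketch always picks int64_param.
        return ("int64_param", value)
    if isinstance(value, float):
        return ("double_param", value)
    if isinstance(value, str):
        return ("string_param", value)
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")
```

For example, `infer_parameter(True)` returns `("bool_param", True)` rather than `("int64_param", 1)`, which is why the bool check comes first.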

InferTensorContents

The data contained in a tensor, represented by the repeated type that matches the tensor's data type. Protobuf oneof is not used because oneofs cannot contain repeated fields.

| Field | Type | Description |
| --- | --- | --- |
| bool_contents | repeated bool | Representation for the BOOL data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| int_contents | repeated int32 | Representation for the INT8, INT16, and INT32 data types. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| int64_contents | repeated int64 | Representation for the INT64 data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| uint_contents | repeated uint32 | Representation for the UINT8, UINT16, and UINT32 data types. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| uint64_contents | repeated uint64 | Representation for the UINT64 data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| fp32_contents | repeated float | Representation for the FP32 data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| fp64_contents | repeated double | Representation for the FP64 data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
| bytes_contents | repeated bytes | Representation for the BYTES data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements. |
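The flattening and size requirements that every `*_contents` field shares can be sketched in a few lines of Python; `flatten_row_major` and `expected_elements` are hypothetical helpers, not part of the protocol:

```python
from functools import reduce
from operator import mul

def flatten_row_major(nested):
    """Flatten a nested list into the 1-D, row-major element order
    required by the *_contents fields (last dimension varies fastest)."""
    if not isinstance(nested, list):
        return [nested]
    out = []
    for item in nested:
        out.extend(flatten_row_major(item))
    return out

def expected_elements(shape):
    """Number of elements implied by a tensor shape."""
    return reduce(mul, shape, 1)

# A 2x3 FP32 tensor: fp32_contents must hold exactly 6 elements,
# first row first.
tensor = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
contents = flatten_row_major(tensor)
assert contents == [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
assert len(contents) == expected_elements([2, 3])
```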

ModelInferRequest

| Field | Type | Description |
| --- | --- | --- |
| model_name | string | The name of the model to use for inferencing. |
| model_version | string | The version of the model to use for inference. If not given, the server will choose a version based on the model and internal policy. |
| id | string | Optional identifier for the request. If specified, it will be returned in the response. |
| parameters | map ModelInferRequest.ParametersEntry | Optional inference parameters. |
| inputs | repeated ModelInferRequest.InferInputTensor | The input tensors for the inference. |
| outputs | repeated ModelInferRequest.InferRequestedOutputTensor | The requested output tensors for the inference. Optional; if not specified, all outputs produced by the model will be returned. |
| raw_input_contents | repeated bytes | The data contained in an input tensor can be represented in "raw" bytes form or in the repeated type that matches the tensor's data type. To use the raw representation, 'raw_input_contents' must be initialized with data for each tensor in the same order as 'inputs'. For each tensor, the size of this content must match what is expected by the tensor's shape and data type. The raw data must be the flattened, one-dimensional, row-major order of the tensor elements without any stride or padding between the elements. Note that the FP16 and BF16 data types must be represented as raw content as there is no specific data type for a 16-bit float type. If this field is specified, then InferInputTensor::contents must not be specified for any input tensor. |
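Packing raw_input_contents with Python's standard `struct` module can be sketched as follows. The little-endian byte order used here is an assumption (verify against your server's expectations), and `pack_fp32`/`pack_fp16` are hypothetical helper names:

```python
import struct

def pack_fp32(values):
    """Pack FP32 elements as one raw_input_contents entry: flat,
    row-major, no stride or padding between elements.
    Little-endian byte order is assumed here."""
    return struct.pack(f"<{len(values)}f", *values)

def pack_fp16(values):
    """FP16 has no typed *_contents field, so it must travel as raw
    bytes; struct's 'e' format is IEEE 754 half precision."""
    return struct.pack(f"<{len(values)}e", *values)

# A 2x2 FP32 tensor flattened to 4 elements: 4 elements x 4 bytes.
raw = pack_fp32([1.0, 2.0, 3.0, 4.0])
assert len(raw) == 16
# The same element count in FP16 is half the size: 4 elements x 2 bytes.
assert len(pack_fp16([1.0, 2.0, 3.0, 4.0])) == 8
```

The entry would then be appended to `raw_input_contents` at the same index its tensor occupies in `inputs`, with `InferInputTensor.contents` left unset.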

ModelInferRequest.InferInputTensor

An input tensor for an inference request.

| Field | Type | Description |
| --- | --- | --- |
| name | string | The tensor name. |
| datatype | string | The tensor data type. |
| shape | repeated int64 | The tensor shape. |
| parameters | map ModelInferRequest.InferInputTensor.ParametersEntry | Optional inference input tensor parameters. |
| contents | InferTensorContents | The tensor contents using a data-type format. This field must not be specified if "raw" tensor contents are being used for the inference request. |

ModelInferRequest.InferInputTensor.ParametersEntry

| Field | Type | Description |
| --- | --- | --- |
| key | string | The parameter name. |
| value | InferParameter | The parameter value. |

ModelInferRequest.InferRequestedOutputTensor

An output tensor requested for an inference request.

| Field | Type | Description |
| --- | --- | --- |
| name | string | The tensor name. |
| parameters | map ModelInferRequest.InferRequestedOutputTensor.ParametersEntry | Optional requested output tensor parameters. |

ModelInferRequest.InferRequestedOutputTensor.ParametersEntry

| Field | Type | Description |
| --- | --- | --- |
| key | string | The parameter name. |
| value | InferParameter | The parameter value. |

ModelInferRequest.ParametersEntry

| Field | Type | Description |
| --- | --- | --- |
| key | string | The parameter name. |
| value | InferParameter | The parameter value. |

ModelInferResponse

| Field | Type | Description |
| --- | --- | --- |
| model_name | string | The name of the model used for inference. |
| model_version | string | The version of the model used for inference. |
| id | string | The id of the inference request, if one was specified. |
| parameters | map ModelInferResponse.ParametersEntry | Optional inference response parameters. |
| outputs | repeated ModelInferResponse.InferOutputTensor | The output tensors holding inference results. |
| raw_output_contents | repeated bytes | The data contained in an output tensor can be represented in "raw" bytes form or in the repeated type that matches the tensor's data type. To use the raw representation, 'raw_output_contents' must be initialized with data for each tensor in the same order as 'outputs'. For each tensor, the size of this content must match what is expected by the tensor's shape and data type. The raw data must be the flattened, one-dimensional, row-major order of the tensor elements without any stride or padding between the elements. Note that the FP16 and BF16 data types must be represented as raw content as there is no specific data type for a 16-bit float type. If this field is specified, then InferOutputTensor::contents must not be specified for any output tensor. |
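On the client side, decoding a raw_output_contents entry is the mirror of packing the input: interpret the bytes per the tensor's datatype and validate the length against its shape. A minimal sketch with the standard `struct` module, again assuming little-endian byte order; `unpack_fp32` is a hypothetical helper:

```python
import struct

def unpack_fp32(raw, shape):
    """Decode one raw_output_contents entry as FP32 elements, checking
    that the byte count matches the tensor's shape and data type."""
    count = 1
    for dim in shape:
        count *= dim
    if len(raw) != count * 4:  # FP32 is 4 bytes per element
        raise ValueError(f"expected {count * 4} bytes, got {len(raw)}")
    # Elements come back in the same flat, row-major order they were sent.
    return list(struct.unpack(f"<{count}f", raw))

# Simulate a 2x3 FP32 output tensor arriving as raw bytes.
raw = struct.pack("<6f", *range(6))
assert unpack_fp32(raw, [2, 3]) == [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```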

ModelInferResponse.InferOutputTensor

An output tensor returned for an inference request.

| Field | Type | Description |
| --- | --- | --- |
| name | string | The tensor name. |
| datatype | string | The tensor data type. |
| shape | repeated int64 | The tensor shape. |
| parameters | map ModelInferResponse.InferOutputTensor.ParametersEntry | Optional output tensor parameters. |
| contents | InferTensorContents | The tensor contents using a data-type format. This field must not be specified if "raw" tensor contents are being used for the inference response. |

ModelInferResponse.InferOutputTensor.ParametersEntry

| Field | Type | Description |
| --- | --- | --- |
| key | string | The parameter name. |
| value | InferParameter | The parameter value. |

ModelInferResponse.ParametersEntry

| Field | Type | Description |
| --- | --- | --- |
| key | string | The parameter name. |
| value | InferParameter | The parameter value. |

ModelMetadataRequest

| Field | Type | Description |
| --- | --- | --- |
| name | string | The name of the model. |
| version | string | The version of the model to get metadata for. If not given, the server will choose a version based on the model and internal policy. |

ModelMetadataResponse

| Field | Type | Description |
| --- | --- | --- |
| name | string | The model name. |
| versions | repeated string | The versions of the model available on the server. |
| platform | string | The model's platform. See Platforms. |
| inputs | repeated ModelMetadataResponse.TensorMetadata | The model's inputs. |
| outputs | repeated ModelMetadataResponse.TensorMetadata | The model's outputs. |
| properties | map ModelMetadataResponse.PropertiesEntry | Optional model properties. |

ModelMetadataResponse.PropertiesEntry

| Field | Type | Description |
| --- | --- | --- |
| key | string | The property name. |
| value | string | The property value. |

ModelMetadataResponse.TensorMetadata

Metadata for a tensor.

| Field | Type | Description |
| --- | --- | --- |
| name | string | The tensor name. |
| datatype | string | The tensor data type. |
| shape | repeated int64 | The tensor shape. A variable-size dimension is represented by a -1 value. |
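A client can use the -1 convention to validate a concrete tensor shape against the metadata before sending a request. A small sketch; `shape_matches` is a hypothetical helper, not part of the protocol:

```python
def shape_matches(metadata_shape, actual_shape):
    """Check a concrete tensor shape against a metadata shape in which
    -1 marks a variable-size dimension."""
    if len(metadata_shape) != len(actual_shape):
        return False
    # Each fixed dimension must match exactly; -1 accepts any size.
    return all(m == -1 or m == a
               for m, a in zip(metadata_shape, actual_shape))

# A variable batch dimension: [-1, 3, 224, 224] accepts any batch size,
# but the remaining dimensions must match exactly.
assert shape_matches([-1, 3, 224, 224], [8, 3, 224, 224])
assert not shape_matches([-1, 3, 224, 224], [8, 3, 224, 225])
```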

ModelReadyRequest

| Field | Type | Description |
| --- | --- | --- |
| name | string | The name of the model to check for readiness. |
| version | string | The version of the model to check for readiness. If not given, the server will choose a version based on the model and internal policy. |

ModelReadyResponse

| Field | Type | Description |
| --- | --- | --- |
| ready | bool | True if the model is ready, false if not ready. |

ServerLiveRequest

ServerLiveResponse

| Field | Type | Description |
| --- | --- | --- |
| live | bool | True if the inference server is live, false if not live. |

ServerMetadataRequest

ServerMetadataResponse

| Field | Type | Description |
| --- | --- | --- |
| name | string | The server name. |
| version | string | The server version. |
| extensions | repeated string | The extensions supported by the server. |

ServerReadyRequest

ServerReadyResponse

| Field | Type | Description |
| --- | --- | --- |
| ready | bool | True if the inference server is ready, false if not ready. |