Design your ML Models for Vision
The Machine Learning (ML) model service allows you to deploy machine learning models to your smart machine.
Vision services, like an "mlmodel" detector or classifier, enable your machines to identify and classify objects in images with the deployed models’ predictions.
The two services work closely together, with the vision service relying on the deployed ML model to make inferences.
If you are designing your own ML Model service and want the two services to coordinate in classification or detection, make your ML models’ input and output tensor shapes match the ones the mlmodel vision service expects.
To be able to use a deployed ML model, the mlmodel vision service checks for descriptions of these characteristics in the metadata of the model, as defined in the Python SDK.
For an example of this, see Example Metadata.
Input tensor: input_info in metadata
For both classification and detection models, the vision service sends a single input tensor to the ML Model with the following structure:
- One input tensor called "image" with type uint8 or float32 and shape (1, height, width, 3), with the last channel 3 being the RGB bytes of the pixel.
- If image height and width are unknown or variable, then height and/or width = -1. During inference runtime the image will have a known height and width.
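The input convention above can be checked before a model is deployed. The following is a minimal sketch in plain Python, not part of the vision service or the SDK: the function names check_image_input and resolve_shape are hypothetical, and the metadata fields are passed in as ordinary values rather than read from a real metadata object.

```python
ALLOWED_TYPES = {"uint8", "float32"}


def check_image_input(name, data_type, shape):
    """Return a list of problems with a declared input tensor.

    An empty list means the declaration fits the convention described above.
    This is a hypothetical helper, not an SDK function.
    """
    problems = []
    if name != "image":
        problems.append(f'name is "{name}", expected "image"')
    if data_type not in ALLOWED_TYPES:
        problems.append(f"data_type {data_type} is not uint8 or float32")
    if len(shape) != 4:
        problems.append(f"shape has {len(shape)} dimensions, expected 4")
    else:
        batch, height, width, channels = shape
        if batch != 1:
            problems.append(f"batch dimension is {batch}, expected 1")
        if channels != 3:
            problems.append(f"channel dimension is {channels}, expected 3")
        for label, size in (("height", height), ("width", width)):
            # -1 marks an unknown or variable dimension
            if size != -1 and size <= 0:
                problems.append(f"{label} is {size}, expected a positive size or -1")
    return problems


def resolve_shape(declared, height, width):
    """Replace -1 placeholders with the image size known at inference time."""
    batch, h, w, channels = declared
    return (batch, height if h == -1 else h, width if w == -1 else w, channels)


fixed = check_image_input("image", "uint8", (1, 320, 320, 3))
variable = check_image_input("image", "float32", (1, -1, -1, 3))
runtime_shape = resolve_shape((1, -1, -1, 3), height=480, width=640)
```

A declaration with fixed dimensions and one with -1 placeholders both pass the check; the placeholders are only filled in once an actual image is available.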
Output tensors: output_info in metadata
Data can be returned by the ML model in many ways, due to the variety of machine learning models for computer vision. The vision service will try to take into account many different forms of models as specified by the metadata of the model. If the model does not provide metadata, the vision service will make the following assumptions:
For classifications:
- The model returns 1 tensor, called "probability", with shape (1, n_classifications).
- The data is floating point numbers representing probability, between 0 and 1.
- If the data is not between 0 and 1, the vision service computes a softmax over the data, resulting in floating point numbers between 0 and 1 representing probability.
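The softmax fallback described above can be illustrated as follows. This is a sketch of the described behavior in plain Python, not the vision service's actual implementation; the function name to_probabilities is hypothetical, and the input is the list of n_classifications values with the batch dimension already removed.

```python
import math


def to_probabilities(values):
    """Return values unchanged if all lie in [0, 1]; otherwise apply a softmax."""
    if all(0.0 <= v <= 1.0 for v in values):
        return list(values)
    peak = max(values)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]          # raw scores outside the 0 to 1 range
probs = to_probabilities(logits)  # softmax is applied
already_ok = to_probabilities([0.2, 0.7, 0.1])  # passed through unchanged
```

After the softmax, the values sum to 1 and keep the same ordering as the raw scores, so the highest-scoring class stays the same.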
For detections:
- The model returns 3 tensors:
  - "Location": the bounding boxes
    - Shape: (1, n_detections, 4)
    - Bounding boxes each have shape (xmin, ymin, xmax, ymax)
    - Bounding boxes are the proportion of where the box corner is in the image, using a number between 0 and 1.
  - "Category": the labels on the boxes
    - Shape: (1, n_detections)
    - Integers representing the index of the label.
  - "Score": the confidence scores of the label
    - Shape: (1, n_detections)
    - Floating point numbers representing probability, between 0 and 1.
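To make the three detection tensors concrete, the sketch below converts proportional bounding boxes into pixel coordinates for an image of known size. It uses nested Python lists in place of real tensors, and both the function name to_pixel_boxes and the min_score threshold are assumptions added for illustration; the text above does not describe any score filtering.

```python
def to_pixel_boxes(locations, categories, scores, image_width, image_height, min_score=0.5):
    """Combine the detection outputs into one record per box.

    locations: n_detections boxes of (xmin, ymin, xmax, ymax) proportions
    categories, scores: n_detections values each
    All three are expected with the leading batch dimension already removed.
    min_score is a hypothetical filter, not something the service is documented to apply.
    """
    detections = []
    for box, category, score in zip(locations, categories, scores):
        if score < min_score:
            continue
        xmin, ymin, xmax, ymax = box
        detections.append({
            "label_index": int(category),
            "score": score,
            "box": (
                round(xmin * image_width),
                round(ymin * image_height),
                round(xmax * image_width),
                round(ymax * image_height),
            ),
        })
    return detections


# Example outputs with shapes (1, 2, 4), (1, 2), and (1, 2)
location = [[[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 0.1, 0.1]]]
category = [[3.0, 7.0]]
score = [[0.9, 0.2]]

detections = to_pixel_boxes(location[0], category[0], score[0],
                            image_width=320, image_height=240)
```

Here only the first box passes the example threshold, and its corners scale to pixel coordinates by multiplying x values by the image width and y values by the image height.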
For labels:
Many computer vision models have an associated ’labelfile.txt’ that lists the class labels associated with the model. To get those labels, the vision service currently looks at the first element of the output_info list in the ML model’s metadata and checks for a key called "labels" in its "extra" struct. The value of that key should be the full path to the label file on the machine. See Example Metadata for an example of this.
label_path = ml_model_metadata.output_info[0].extra["labels"]
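The label lookup can be sketched end to end as follows. This example makes several assumptions that the text above does not confirm: the metadata is modeled as a plain Python dictionary rather than an SDK object, the label file holds one label per line, and the helper names load_labels and label_for are hypothetical. A temporary file stands in for the real label file.

```python
import os
import tempfile


def load_labels(label_path):
    """Read labels from a text file, assuming one label per line."""
    with open(label_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def label_for(index, labels):
    """Map a category index to its label, falling back to the index as text."""
    if 0 <= index < len(labels):
        return labels[index]
    return str(index)


# Write a small stand-in label file for the demonstration.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "labels.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("person\nbicycle\ncar\n")

# Plain-dict stand-in for the model metadata (not the SDK's metadata type).
ml_model_metadata = {
    "output_info": [
        {"name": "location", "extra": {"labels": demo_path}},
        {"name": "category", "extra": {}},
    ]
}

# The path is taken from the first output_info entry, as described above.
label_path = ml_model_metadata["output_info"][0]["extra"]["labels"]
labels = load_labels(label_path)
first = label_for(2, labels)
unknown = label_for(10, labels)
```

Falling back to the index as text for out-of-range categories is a design choice of this sketch; it keeps a detection usable even when the label file is shorter than the model's class list.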
Example metadata
For example, a TensorFlow Lite detector model that works with the vision service is structured with the following metadata:
name: "EfficientDet Lite0 V1"
type: "tflite_detector"
description: "Identify which of a known set of objects might be present and provide information about their positions within the given image or a video stream."
input_info {
  name: "image"
  description: "Input image to be detected. The expected image is 320 x 320, with three channels (red, blue, and green) per pixel. Each value in the tensor is a single byte between 0 and 255."
  data_type: "uint8"
  shape: 1
  shape: 320
  shape: 320
  shape: 3
  extra {
  }
}
output_info {
  name: "location"
  description: "The locations of the detected boxes."
  data_type: "float32"
  extra {
    fields {
      key: "labels"
      value {
        string_value: "/Users/<username>/.viam/packages/.data/ml_model/effdet0-1685040512967/effdetlabels.txt"
      }
    }
  }
}
output_info {
  name: "category"
  description: "The categories of the detected boxes."
  data_type: "float32"
  associated_files {
    name: "labelmap.txt"
    description: "Label of objects that this model can recognize."
    label_type: LABEL_TYPE_TENSOR_VALUE
  }
  extra {
  }
}
output_info {
  name: "score"
  description: "The scores of the detected boxes."
  data_type: "float32"
  extra {
  }
}
output_info {
  name: "number of detections"
  description: "The number of the detected boxes."
  data_type: "float32"
  extra {
  }
}