Train and deploy computer vision models

Many machines have cameras through which they can monitor their environment. With machine learning, you can train models on patterns within that visual data. You can collect data from the camera stream and label any patterns within the images.

If a camera is pointed at a food display, for example, you can label images of the display as full or empty, or label individual items such as pizza_slices.

Using a model trained on such images, machines can make inferences about their environments. Your machines can then automatically trigger alerts or perform other actions. If a food display is empty, the machine could, for example, alert a supervisor to restock the display.

Common use cases for this are quality assurance and health and safety applications.

Diagram of the camera component to data management service to ML model service to vision service pipeline.

Prerequisites

A running machine connected to the Viam app. Click to see instructions.
Add a new machine in the Viam app. On the machine’s page, follow the setup instructions to install viam-server on the computer you’re using for your project. Wait until your machine has successfully connected to the Viam app.
A configured camera. Click to see instructions.

First, connect the camera to your machine’s computer, if it’s not already connected (as with a built-in laptop webcam).

Then, navigate to the CONFIGURE tab of your machine’s page in the Viam app. Click the + icon next to your machine part in the left-hand menu, select Component, and select a camera model. The webcam model supports most USB cameras and built-in laptop webcams. You can find additional camera models in the camera configuration documentation.

Complete the camera configuration and use the TEST panel in the configuration card to test that the camera is working.

No computer or webcam?

No problem. You don’t need to buy or own any hardware to complete this guide.

Use Try Viam to borrow a rover free of cost online. The rover already has viam-server installed and is configured with some components, including a webcam.

Once you have borrowed a rover, go to its CONTROL tab where you can view camera streams and also drive the rover. You should have a front-facing camera and an overhead view of your rover. Now you know what the rover can perceive.

To change what the front-facing camera is pointed at, find the cam camera panel on the CONTROL tab and click Toggle picture-in-picture so you can continue to view the camera stream. Then, find the viam_base panel and drive the rover around.

Now that you have seen that the cameras on your Try Viam rover work, begin by Creating a dataset and labeling data. You can drive the rover around as you capture data to get a variety of images from different angles.

Create a dataset and label data

You will start by collecting images from your machine as it monitors its environment and adding these images to a dataset. By creating a dataset from your images, you can then train a machine learning model. To ensure the model you create performs well, you need to train it on a variety of images that cover the range of things your machine should be able to recognize.

To capture image data from a machine, you will use the data management service.

Just testing and want a dataset to get started with? Click here.

We have two datasets you can use for testing, one with shapes and the other with a wooden figure:

The shapes dataset and the 'viam-figure' dataset of 25 images, most containing the wooden Viam figure, shown in the datasets subtab of the DATA tab in the Viam app.
  1. Download the shapes dataset or download the wooden figure dataset.

  2. Unzip the download.

  3. Open a terminal and go to the dataset folder.

  4. Create a Python script named upload_data.py in the dataset’s folder with the following contents:

    # Assumption: The dataset was exported using the `viam dataset export` command.
    # This script is being run from the `destination` directory.
    
    import asyncio
    import os
    import json
    import argparse
    
    from viam.rpc.dial import DialOptions, Credentials
    from viam.app.viam_client import ViamClient
    from viam.proto.app.data import BinaryID
    
    async def connect(args) -> ViamClient:
        dial_options = DialOptions(
            credentials=Credentials(
                type="api-key",
                payload=args.api_key,
            ),
            auth_entity=args.api_key_id
        )
        return await ViamClient.create_from_dial_options(dial_options)
    
    
    async def main():
        parser = argparse.ArgumentParser(
            description='Upload images, metadata, and tags to a new dataset')
        parser.add_argument('-org-id', dest='org_id', action='store',
                            required=True, help='Org Id')
        parser.add_argument('-api-key', dest='api_key', action='store',
                            required=True, help='API KEY with org admin access')
        parser.add_argument('-api-key-id', dest='api_key_id', action='store',
                            required=True, help='API KEY ID with org admin access')
        parser.add_argument('-machine-part-id', dest='machine_part_id',
                            action='store', required=True,
                            help='Machine part id for image metadata')
        parser.add_argument('-location-id', dest='location_id', action='store',
                            required=True, help='Location id for image metadata')
        parser.add_argument('-dataset-name', dest='dataset_name', action='store',
                            required=True,
                            help='Name of the dataset to create and upload to')
        args = parser.parse_args()
    
    
        # Make a ViamClient
        viam_client = await connect(args)
        # Instantiate a DataClient to run data client API methods on
        data_client = viam_client.data_client
    
        # Create dataset
        try:
            dataset_id = await data_client.create_dataset(
                name=args.dataset_name,
                organization_id=args.org_id
            )
            print("Created dataset: " + dataset_id)
        except Exception:
            print("Error. Check that the dataset name does not already exist.")
            print("See: https://app.viam.com/data/datasets")
            return 1
    
        file_ids = []
    
        for file_name in os.listdir("metadata/"):
            with open("metadata/" + file_name) as f:
                data = json.load(f)
                tags = None
                if "tags" in data["captureMetadata"].keys():
                    tags = data["captureMetadata"]["tags"]
    
                annotations = None
                if "annotations" in data.keys():
                    annotations = data["annotations"]
    
                image_file = data["fileName"]
    
                print("Uploading: " + image_file)
    
                id = await data_client.file_upload_from_path(
                    part_id=args.machine_part_id,
                    tags=tags,
                    filepath=os.path.join("data/", image_file)
                )
                print("FileID: " + id)
    
                binary_id = BinaryID(
                    file_id=id,
                    organization_id=args.org_id,
                    location_id=args.location_id
                )
    
                if annotations:
                    bboxes = annotations["bboxes"]
                    for box in bboxes:
                        await data_client.add_bounding_box_to_image_by_id(
                            binary_id=binary_id,
                            label=box["label"],
                            x_min_normalized=box["xMinNormalized"],
                            y_min_normalized=box["yMinNormalized"],
                            x_max_normalized=box["xMaxNormalized"],
                            y_max_normalized=box["yMaxNormalized"]
                        )
    
                file_ids.append(binary_id)
    
        await data_client.add_binary_data_to_dataset_by_ids(
            binary_ids=file_ids,
            dataset_id=dataset_id
        )
        print("Added files to dataset.")
        print("https://app.viam.com/data/datasets?id=" + dataset_id)
    
        viam_client.close()
    
    if __name__ == '__main__':
        asyncio.run(main())
    
  5. Run the script to upload the images and their metadata into a dataset in the Viam app, providing the following input:

    python upload_data.py -org-id <ORG-ID> -api-key <API-KEY> \
       -api-key-id <API-KEY-ID> -machine-part-id <MACHINE-PART-ID> \
       -location-id <LOCATION-ID> -dataset-name <NAME>
    
  6. Continue to Train a machine learning model.

1. Enable the data management service

In the configuration pane for your configured camera component, find the Data capture section. Click Add method.

When the Create a data management service prompt appears, click it to add the service to your machine. You can leave the default data manager settings.
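If you switch to the raw JSON editor, the data management service with its default settings should resemble the following sketch (the data_manager-1 name is an assumption; yours may differ):

{
  "name": "data_manager-1",
  "type": "data_manager",
  "namespace": "rdk",
  "attributes": {
    "additional_sync_paths": [],
    "capture_dir": "",
    "capture_disabled": false,
    "sync_disabled": false,
    "sync_interval_mins": 0.1,
    "tags": []
  }
}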

2. Capture data

With the data management service configured on your machine, configure how the camera component captures data:

In the Data capture panel of your camera’s configuration, select ReadImage from the method selector.

Set your desired capture frequency. For example, set it to 0.05 to capture an image every 20 seconds.

Set the MIME type to your desired image format, for example image/jpeg.
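In the raw JSON, the camera’s data capture configuration should resemble the following sketch (assuming a webcam component named camera-1; your video_path will likely differ):

{
  "name": "camera-1",
  "model": "webcam",
  "type": "camera",
  "namespace": "rdk",
  "attributes": {
    "video_path": "video0"
  },
  "service_configs": [
    {
      "type": "data_manager",
      "attributes": {
        "capture_methods": [
          {
            "method": "ReadImage",
            "capture_frequency_hz": 0.05,
            "additional_params": {
              "mime_type": "image/jpeg"
            }
          }
        ]
      }
    }
  ]
}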

3. Save to start capturing

Save the config.

With cloud sync enabled, your machine automatically uploads captured data to the Viam app after a short delay.

4. View data in the Viam app

Click on the menu of the camera component and click on View captured data. This takes you to the data tab.

View captured data option in the component menu

If you do not see images from your camera, try waiting a minute and refreshing the page to allow time for the images to be captured and then synced to the app at the interval you configured.

If no data appears after the sync interval, check the LOGS tab for errors.

5. Capture a variety of data

Your camera now saves images at the configured time interval. When training machine learning models, it is important to supply a variety of images. The dataset you create should represent the possible range of visual input. This may include capturing images from different angles, with different configurations of objects, and in different lighting conditions. The more varied the provided dataset, the more accurate the resulting model becomes.

Capture at least 10 images of anything you want your machine to recognize.

6. Label your images

Once you have enough images, you can disable data capture to avoid incurring fees for capturing large amounts of training data.

Then use the interface on the DATA tab to label your images.

Most use cases fall into one of two categories:

  • Detecting certain objects and their location within an image. For example, you may wish to know where and how many pizzas there are in an image. In this case, add a label for each object you would like to detect.
For instructions to add labels, click here.

To add a label, click on an image and select the Bounding box mode in the menu that opens. Choose an existing label or create a new label. Click on the image where you would like to add the bounding box and drag to where the bounding box should end.

To expand the image, click on the expand side menu arrow in the corner of the image:

Repeat this with all images.

You can add one or more bounding boxes for objects in each image.

  • Classifying an image as a whole. In other words, determining a descriptive state about an image. For example, you may wish to know whether an image of a food display is full, empty, or average or whether the quality of manufacturing output is good or bad. In this case, add tags to describe your images.
For instructions to add tags, click here.

To tag an image, click on an image and select the Image tags mode in the menu that opens. Add one or more tags to your image.

If you want to expand the image, click on the expand side menu arrow in the corner of the image.

Repeat this with all images.
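If you would rather tag images programmatically, the data client API supports this as well. The following is a minimal sketch assuming a connected data_client and a list of BinaryID objects for your images, as in the upload script above; the full tag is a placeholder for your own label:

from viam.proto.app.data import BinaryID


async def tag_images(data_client, binary_ids: list[BinaryID]):
    # Apply a classification tag to the given images.
    await data_client.add_tags_to_binary_data_by_ids(
        tags=["full"],  # placeholder tag; use your own labels
        binary_ids=binary_ids
    )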

7. Organize data into a dataset

To train a model, your images must be in a dataset.

Use the interface on the DATA tab to add your labeled images to a dataset.

Also add any unlabeled images to your dataset. Unlabeled images must not comprise more than 20% of your dataset: if you have 25 images in your dataset, at least 20 of those must be labeled.

Want to add images to a dataset programmatically? Click here.

You can also add all images with a certain label to a dataset using the viam dataset data add command or the Data Client API:

viam dataset create --org-id=<org-id> --name=<name>
viam dataset data add filter --dataset-id=<dataset-id> --tags=red_star,blue_square

You can run this script to add all images from your machine to a dataset:

import asyncio

from viam.rpc.dial import DialOptions, Credentials
from viam.app.viam_client import ViamClient
from viam.utils import create_filter
from viam.proto.app.data import BinaryID

# Replace "<ORG-ID>" (including brackets) with your organization ID
ORG_ID = "<ORG-ID>"


async def connect() -> ViamClient:
    dial_options = DialOptions(
      credentials=Credentials(
        type="api-key",
        # Replace "<API-KEY>" (including brackets) with your machine's API key
        payload='<API-KEY>',
      ),
      # Replace "<API-KEY-ID>" (including brackets) with your machine's
      # API key ID
      auth_entity='<API-KEY-ID>'
    )
    return await ViamClient.create_from_dial_options(dial_options)


async def main():
    # Make a ViamClient
    viam_client = await connect()
    # Instantiate a DataClient to run data client API methods on
    data_client = viam_client.data_client

    # Replace "<PART-ID>" (including brackets) with your machine's part id
    my_filter = create_filter(part_id="<PART-ID>")

    print("Getting data for part...")
    binary_metadata, _, _ = await data_client.binary_data_by_filter(
        my_filter,
        include_binary_data=False
    )
    my_binary_ids = []

    for obj in binary_metadata:
        my_binary_ids.append(
            BinaryID(
                file_id=obj.metadata.id,
                organization_id=obj.metadata.capture_metadata.organization_id,
                location_id=obj.metadata.capture_metadata.location_id
                )
            )
    print("Creating dataset...")
    # Create dataset
    try:
        dataset_id = await data_client.create_dataset(
            name="MyDataset",
            organization_id=ORG_ID
        )
        print("Created dataset: " + dataset_id)
    except Exception:
        print("Error. Check that the dataset name does not already exist.")
        print("See: https://app.viam.com/data/datasets")
        return 1

    print("Adding data to dataset...")
    await data_client.add_binary_data_to_dataset_by_ids(
        binary_ids=my_binary_ids,
        dataset_id=dataset_id
    )
    print("Added files to dataset.")
    print("See dataset: https://app.viam.com/data/datasets?id=" + dataset_id)

    viam_client.close()

if __name__ == '__main__':
    asyncio.run(main())
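Before running the script, replace <API-KEY>, <API-KEY-ID>, <PART-ID>, and <ORG-ID> (including brackets) with your own values.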

Train a machine learning (ML) model

Now that you have a dataset with your labeled images, you are ready to train a machine learning model.

1. Train an ML model

In the Viam app, navigate to your list of DATASETS and select the one you want to train on.

Click Train model and follow the prompts.

You can train your model using Built-in training or using a training script from the Viam Registry.

Click Next steps.


2. Fill in the details for your ML model

Enter a name for your new model.

For built-in trainings, select a Task Type:

  • Single Label Classification: The resulting model predicts one of the selected labels or UNKNOWN per image. Select this if you only have one label on each image. Ensure that the dataset you are training on also contains unlabeled images.
  • Multi Label Classification: The resulting model predicts one or more of the selected labels per image.
  • Object Detection: The resulting model predicts either no detected objects or any number of object labels alongside their locations per image.

Select the labels you want to train your model on from the Labels section. Unselected labels will be ignored, and will not be part of the resulting model.

Click Train model.

The data tab showing the train a model pane

3. Wait for your model to train

The model now starts training and you can follow its progress on the TRAINING tab.

Once the model has finished training, it becomes visible on the MODELS tab.

You will receive an email when your model finishes training.

4. Debug your training job

From the TRAINING tab, click on your training job’s ID to see its logs.

You can also view your training jobs’ logs with the viam train logs command.

Test your ML model

Once your model has finished training, you can test it.

Ideally, you want your ML model to work with a high level of confidence. As you test it, if you notice incorrect predictions or low confidence scores, you will need to adjust your dataset and retrain your model.

If you trained a classification model, you can test it with the following instructions. If you trained a detection model, skip to deploy an ML model.

  1. Navigate to the DATA tab and click on the Images subtab.
  2. Click on an image to open the side menu, and select the Actions tab.
  3. In the Run model section, select your model and specify a confidence threshold.
  4. Click Run model.

If the results exceed the confidence threshold, the Run model section shows the predicted label and its confidence score.

Deploy an ML model

You have your trained model. Now you can deploy it to your machines and make live inferences.

To use an ML model on your machine, you need to deploy it with an ML model service, which runs the model on the machine.

On its own, the ML model service only runs the model. To use it to make inferences on a camera stream, you need to use it alongside a vision service.


1. Deploy your ML model

Navigate to the CONFIGURE tab of one of your machines in the Viam app. Add an ML model service that supports the ML model you just trained, and add the model as the Model. For example, use the ML model / TFLite CPU service for TFLite ML models. If you used the built-in training, this is the ML model service you need to use. If you used a custom training script, you may need a different ML model service.
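For reference, the raw JSON for a TFLite CPU ML model service may resemble the following sketch; the mlmodel-1 name, the mymodel package name, and the file paths are assumptions that depend on how your model was deployed:

{
  "name": "mlmodel-1",
  "type": "mlmodel",
  "model": "tflite_cpu",
  "attributes": {
    "model_path": "${packages.ml_model.mymodel}/mymodel.tflite",
    "label_path": "${packages.ml_model.mymodel}/labels.txt",
    "num_threads": 1
  }
}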


2. Configure an mlmodel vision service

The ML model service deploys and runs the model.

The vision service works with the ML model service: it applies the ML model to the stream of images from your camera.

Add the vision / ML model service to your machine. Then, from the Select model dropdown, select the name of the ML model service you configured in the last step (for example, mlmodel-1).

Save your changes.


3. Use your vision service

You can test your vision service by clicking on the Test area of its configuration panel or from the CONTROL tab.

The camera stream will show when the vision service identifies something. Try pointing the camera at a scene similar to your training data.

Detection of a blue star; detection of a Viam figure with a confidence score of 0.97.
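You can also query the vision service from code using the Python SDK. The following is a minimal sketch; the machine address, API key, and the vision-1 and camera-1 names are assumptions to replace with your own values:

import asyncio

from viam.robot.client import RobotClient
from viam.services.vision import VisionClient


async def main():
    # Connect to your machine; replace the address and credentials with your own.
    opts = RobotClient.Options.with_api_key(
        api_key="<API-KEY>",
        api_key_id="<API-KEY-ID>"
    )
    machine = await RobotClient.at_address("<MACHINE-ADDRESS>", opts)

    # Get the vision service configured above.
    vision = VisionClient.from_robot(machine, "vision-1")

    # Run inference on the latest image from the camera.
    detections = await vision.get_detections_from_camera("camera-1")
    for d in detections:
        print(d.class_name, d.confidence)

    await machine.close()

if __name__ == "__main__":
    asyncio.run(main())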
Want to limit the number of shown classifications or detections? Click here.

If you are seeing a lot of classifications or detections, you can set a minimum confidence threshold.

On the configuration page of the vision service, click {} (Switch to advanced) in the top right corner. Add the following to the JSON configuration to set the default_minimum_confidence of the detector:

"default_minimum_confidence": 0.82

The full configuration for the attributes of the vision service should resemble:

{
  "mlmodel_name": "mlmodel-1",
  "default_minimum_confidence": 0.82
}

This optional attribute reduces your output by filtering out classifications or detections below the threshold of 82% confidence. You can adjust this attribute as necessary.

Click the Save button in the top right corner of the page to save your configuration, then close and reopen the Test panel of the vision service. You will now only see classifications or detections with a confidence value higher than the default_minimum_confidence attribute.

For more detailed information, including optional attribute configuration, see the mlmodel docs.

Next steps

Now your machine can make inferences about its environment. The next step is to act based on these inferences:

  • Perform actions: You can use the vision service API to get information about your machine’s inferences and program behavior based on that, as in the sketch below.
  • Webhooks: You can use triggers to send webhooks when certain inferences are made. For an example of this, see the Helmet Monitoring tutorial.
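As a sketch of the first approach, building on the test snippet above, you could poll a classification model and act when a given label crosses a threshold. The empty label, the camera-1 camera name, and the notify_supervisor helper are all hypothetical placeholders for your own logic:

import asyncio


async def watch_display(vision):
    # Poll the classifier once a minute and alert when the display looks empty.
    while True:
        classifications = await vision.get_classifications_from_camera(
            "camera-1", 1)  # top classification for the assumed camera name
        for c in classifications:
            if c.class_name == "empty" and c.confidence > 0.8:
                notify_supervisor()  # hypothetical alerting helper
        await asyncio.sleep(60)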

See the following tutorials for examples of using machine learning models to make your machine do things based on its inferences about its environment:

Have questions, or want to meet other people working on robots? Join our Community Discord.

If you notice any issues with the documentation, feel free to file an issue or edit this file.