Deep Learning-based PPE Classification

Every year, numerous accidents occur in the construction industry, often because workers are not wearing Personal Protective Equipment (PPE). Consequently, rules and regulations require workers to wear safety equipment while on the construction site. Sites usually have surveillance cameras, and managers are expected to enforce the rules; however, enforcement can vary from site to site. External parties, such as banks, also have stakes in these operations. In this blog post, we’ll show you how to use AI technology to reduce managerial overhead, cut costs, and ensure consistent safety.

Solution Overview

The video above shows the PPE recognition system that we will be creating. A deep-learning detection overlay has been drawn over a video stream of a demo construction site. When a person is detected without safety equipment, an alert is raised. These alerts can be accessed through a convenient API and used to close doors, turn off equipment, or raise alarms whenever a PPE infraction occurs.

Below is a diagram of the software architecture used to produce the above demo. The core technologies in this solution are BrainFrame, OpenVisionCapsules, and the OpenVINO safety gear detection model. BrainFrame uses the OpenVINO-based capsules to perform inference on the video stream.

BrainFrame is our smart-vision platform that is built to be easily scaled, configured, and deployed on-premises or to the cloud. You can check out our website for details.

OpenVisionCapsules is an open-source deep-learning model container format released under the OpenCV umbrella. BrainFrame uses it to define standardized input, output, and configuration for AI algorithms. It makes it easy to wrap and package deep-learning models so that they can be deployed, configured, and integrated into other software quickly. For more details, you can read the docs or view the source code here.

OpenVINO is a comprehensive machine-learning toolkit, developed by Intel, used to quickly develop AI applications.

As shown in the above diagram, BrainFrame will:

  1. Fetch live video from the camera(s)
  2. Run inference on the video frames with a person detector and a safety equipment classifier, using OpenVINO as the backend
  3. Assign safety equipment attributes to the people detections
  4. Create alerts whenever a person is detected as not wearing safety equipment
  5. Send the inference results and any triggered alerts to the BrainFrame client for display

We’re going to use an open-source deep-learning model that is distributed in the Caffe format. To use this model in BrainFrame, we’ll convert it into the OpenVINO model format and then encapsulate it in an OpenVisionCapsules capsule. Then, we’ll be able to load the capsule into BrainFrame by dragging and dropping it into the capsules storage directory (you can follow the instructions here). BrainFrame will handle the rest for you: video streaming, loading the model, running inference, recording and processing results, and displaying the video stream and live results.

Converting the Model to OpenVINO Format

Now let’s get to the implementation details. First, we’re going to use a Safety Equipment detection model available through Intel’s IoT developer program. We’ll grab it from their GitHub repository. As this is a Caffe model, and OpenVisionCapsules does not yet natively support Caffe, we’ll first convert this model into the OpenVINO IR format. This can be done easily with the OpenVINO toolkit.

Run the following command from the root of the model’s repository, using OpenVINO’s Docker image. If you have OpenVINO installed locally, you can use this command as a template.

docker run -it \
    --volume $(pwd):/mount \
    --workdir /mount openvino/ubuntu18_dev:2020.3 \
    python3 /opt/intel/openvino/deployment_tools/model_optimizer/ \
        --input_model /mount/worker_safety_mobilenet.caffemodel \
        --output_dir /mount/output \
        --data_type FP32

Encapsulating the Model in an OpenVisionCapsules Capsule

After that, we need to wrap the model in an OpenVisionCapsules-format capsule. As this blog is mainly focused on the Safety Equipment application, we’ve gone ahead and done that for you: an open-source capsule is on our Capsule Zoo on GitHub. If you want to learn more about creating the capsule yourself, we have a series of tutorials on our website that will show you how.

Once you load the capsule into BrainFrame, the server will begin analyzing the video stream with it. The client will then start receiving and displaying results.

Detecting Lack of Equipment

You’ll see that the deep-learning model produces a bounding box for each piece of safety equipment detected in the video. However, there’s a problem: nothing is reported when a person isn’t wearing any safety equipment. We can solve this by fusing data from multiple capsules. If BrainFrame can also detect people, it can determine when a person has no corresponding equipment detections.

If we can combine the person detections and safety equipment detections, we can use the safety equipment detector to classify whether or not a person is wearing their equipment. The classifier will not produce stand-alone safety equipment bounding boxes; instead, the equipment detections will become attributes of the person detections. So how do we do this fusion?

First, we’ll need a person detector. We’re going to use an open-source capsule from Aotu’s Capsule Zoo. This capsule also uses an OpenVINO model internally, which you can find in OpenVINO’s model zoo.

Pairing Equipment with Individuals

We’re going to use Intersection over Union (IoU), a common metric in the deep-learning field, to determine how well two detections overlap. The IoU of two bounding boxes is the area of their intersection divided by the area of their union, so it ranges from 0 (no overlap) to 1 (identical boxes).
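To make the metric concrete, here is a minimal sketch of an IoU function. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates; that layout and the example coordinates are our own choices for illustration, not the format used inside the capsules.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2, in pixel coordinates.
# This layout is an assumption made for the example.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


person = (0, 0, 10, 20)
hat = (2, 0, 8, 4)            # sits entirely inside the person box
far_away = (50, 50, 60, 60)

print(iou(person, hat))       # 24 / 200 = 0.12
print(iou(person, far_away))  # 0.0
```

Note that a hat fully inside a person box still has a small IoU, because the person box is much larger. What matters for matching is that the value is higher for the right person than for anyone else.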

Two bounding boxes with a higher IoU are more likely to correspond to each other. If there are multiple person detections in a video frame, their safety equipment detections might overlap. If this happens, we need to match each equipment detection to its corresponding person detection. Done naively, that means going through all possible combinations to find the best match.

This has the potential to be a very expensive procedure, but we have a trick to make it more efficient: we’ll convert it to a linear assignment problem. A linear assignment problem is concerned with pairing the items of one set with the items of another so that the total assignment “cost” is minimized.

In our case, we can define the cost as cost = 1 – IoU. Two bounding boxes with a low IoU probably do not correspond to each other, so they’ll have a higher assignment cost. Now let’s take a look at the screenshot below. There are three people, two safety hats, and two safety vests in the frame. With the equation above, we can create the IoU cost matrix.
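As a rough illustration of how such a matrix could be built, the sketch below computes 1 – IoU for every hat/person pair. The box coordinates are made up, and putting equipment on the rows and people on the columns is our assumption for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cost_matrix(equipment: Sequence[Box], people: Sequence[Box]) -> List[List[float]]:
    """cost[i][j] = 1 - IoU(equipment[i], people[j])."""
    return [[1.0 - iou(e, p) for p in people] for e in equipment]


# Made-up coordinates: three people standing side by side, two hats.
people = [(0, 0, 10, 20), (12, 0, 22, 20), (24, 0, 34, 20)]
hats = [(2, 0, 8, 4), (14, 0, 20, 4)]

for row in cost_matrix(hats, people):
    print([round(c, 2) for c in row])
# [0.88, 1.0, 1.0]
# [1.0, 0.88, 1.0]
```

Each row is one hat and each column is one person. A cost of 1.0 means the two boxes do not overlap at all. The same function would be called a second time with the vest detections.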

After we get the IoU cost matrix, we’ll use an algorithm to “solve” it and align the detections. We won’t go in-depth about it in this article, but if you are interested, check out this Wikipedia page for more details. OpenVisionCapsules provides a linear assignment solver in its utils library. After using the solver, we will get the indices of the best matches (i.e. those with the lowest cost). In this case, they are positions (0, 0) and (1, 1), which means that “hat 1” matches “person 1”, and “hat 2” matches “person 2”.
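To show what “solving” the matrix means, here is a small stand-in solver. It is not the OpenVisionCapsules solver: it simply tries every permutation, which only works for a handful of detections, whereas a real solver (for example SciPy’s `scipy.optimize.linear_sum_assignment`) uses a much faster algorithm. The cost values and the `MAX_COST` threshold are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def solve_assignment(cost: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return (row, column) pairs that minimize the total cost.

    Brute force over permutations, for illustration only.
    Assumes the matrix has at least as many columns as rows.
    """
    if not cost:
        return []
    n_rows, n_cols = len(cost), len(cost[0])
    best_total, best_pairs = float("inf"), []
    # Each permutation picks a distinct column for every row.
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if total < best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_pairs


# Rows are hats, columns are people; values are 1 - IoU (made-up numbers).
cost = [
    [0.88, 0.97, 1.00],  # hat 1
    [0.95, 0.88, 1.00],  # hat 2
]

# Hypothetical threshold: drop pairs whose boxes barely overlap or not at all.
MAX_COST = 0.99
matches = [(r, c) for r, c in solve_assignment(cost) if cost[r][c] <= MAX_COST]
print(matches)  # [(0, 0), (1, 1)]
```

The third person (column 2) ends up with no hat, which is exactly the situation we want to flag.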

With this approach, we have converted the safety equipment detector into a classifier. Below is a screenshot from the BrainFrame client, where you’ll see that the person detections now have attributes that indicate whether or not they are wearing safety equipment.
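The conversion from matches to attributes could look roughly like the sketch below. It assumes the matches are (equipment index, person index) pairs, and the attribute names such as `with_safety_hat` are placeholders we made up for the example, not necessarily the values the Capsule Zoo classifier emits.

```python
from typing import Dict, List, Tuple


def label_people(
    num_people: int,
    matches: Dict[str, List[Tuple[int, int]]],
) -> List[Dict[str, str]]:
    """Build one attribute dict per person from (equipment_idx, person_idx) matches."""
    # Start by assuming nobody is wearing anything.
    attributes = [
        {kind: f"without_{kind}" for kind in matches} for _ in range(num_people)
    ]
    # Flip the attribute for every person that was matched to a piece of equipment.
    for kind, pairs in matches.items():
        for _equipment_idx, person_idx in pairs:
            attributes[person_idx][kind] = f"with_{kind}"
    return attributes


# Three people; hats matched to persons 0 and 1, vests to persons 0 and 2.
matches = {
    "safety_hat": [(0, 0), (1, 1)],
    "safety_vest": [(0, 0), (1, 2)],
}
for i, attrs in enumerate(label_people(3, matches)):
    print(i, attrs)
# 0 {'safety_hat': 'with_safety_hat', 'safety_vest': 'with_safety_vest'}
# 1 {'safety_hat': 'with_safety_hat', 'safety_vest': 'without_safety_vest'}
# 2 {'safety_hat': 'without_safety_hat', 'safety_vest': 'with_safety_vest'}
```

Defaulting every person to “without” and then flipping matched people to “with” is what lets the system report missing equipment, since an unmatched person keeps the default.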


After you have the person detector and PPE classifier loaded, you can set up some alarms to receive alerts whenever a person is detected as not wearing safety equipment. You can follow instructions on how to add alarms here. This would allow you to do things such as alerting a manager or turning off equipment whenever someone is not wearing their safety equipment.


We have now finished our PPE classification solution. You should have detections of people and safety equipment, as well as attributes indicating whether or not those people are wearing their equipment. If you encounter any problems, the classifier and detector are open-source and available on our Capsule Zoo GitHub repository.

If you’d like to learn more about BrainFrame, sign up on our website! You can use it for free and start building AI applications today!

© 2021 Aotu. All rights reserved.