Person Detection Model | Computer Vision Model Zoo | Survail

Person Detection

What it does. Where it can be used: height, distance, angle of view. Chained Classifications.

What it does:

Looks for the vertical shape of a human (including children) in video feeds, similar to the shape you would expect to see on a bathroom sign. It was trained on people facing the camera, facing away from the camera, facing sideways, and in a 3/4 view. It is not trained to detect a person when a camera is looking directly downward, such as a camera mounted directly above a person. It should not be expected to work reliably on babies, especially when swaddled.

Allows you to create alerts, filter event feeds and 24/7 views based on the presence of a person. Allows you to run chained classification and recognition models.

Best Practices:

Interaction with Vehicle / Wheeled Object Detections

Whenever a car is moving, a person is almost certainly in it. In most situations, however, the camera cannot see that person, and receiving both a person alert and a vehicle alert for every vehicle detection would not be very useful. A vehicle or wheeled object detection therefore typically does not also create a person detection alert. It can still happen, most commonly with a side view of a delivery vehicle, because these vehicles lack a driver / passenger door.

Camera Placement

To detect a human in a video frame, we recommend choosing a camera that shows that person at a size of at least 40x40 pixels. Our team can help you identify which cameras can do this at which distances.
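As a rough way to reason about the 40x40 pixel recommendation, the sketch below estimates how many pixels a person spans at a given distance using a simple pinhole camera model. The function names, the example camera (1920x1080, 90 degree horizontal and 59 degree vertical field of view), and the assumed person size (0.5 m wide, 1.7 m tall) are illustrative assumptions, not Survail specifications. Wide-angle lens distortion is ignored, so treat the results as approximate.

```python
import math

def pixels_on_target(object_size_m, distance_m, fov_deg, image_px):
    """Approximate pixels an object spans along one image axis (pinhole model).

    Ignores lens distortion, so results for wide-angle lenses are only rough.
    """
    # Size of the scene covered by the full image at this distance.
    scene_size_m = 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)
    return object_size_m / scene_size_m * image_px

def meets_minimum(width_px, height_px, min_px=40):
    """True if both dimensions reach the recommended minimum (40 px by default)."""
    return width_px >= min_px and height_px >= min_px

# Hypothetical camera and person size, chosen only for illustration.
for distance_m in (10.0, 20.0):
    w = pixels_on_target(0.5, distance_m, 90.0, 1920)   # person width
    h = pixels_on_target(1.7, distance_m, 59.0, 1080)   # person height
    print(f"{distance_m:>4} m: {w:.0f} x {h:.0f} px, ok={meets_minimum(w, h)}")
```

With these assumed numbers, the person is about 48 pixels wide at 10 m and about 24 pixels wide at 20 m, so width becomes the limiting dimension as distance grows.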

A computer vision model will only work as expected when used in the situation that it was developed around. Our person detector works well at detecting people when viewing them from a horizontal perspective. This model has not been trained on cameras facing directly downward, and will not work in an application such as a camera looking directly down at a doorway. Models only work on their trained use case.

Camera Mounting Height

Camera height plays an important role in video analytics. When using people detection features, the camera needs to be able to view the shape of the person. If the camera is too high, it will only see a partial human body or even just the top of someone's head. Without multiple reference points such as arms, legs, torso, and head, detections will be less accurate.

Image: a person too close to a camera that is mounted at too steep an angle.

With a wide-angle, fixed-lens camera, detection accuracy is significantly reduced when the camera is mounted 8 to 9 feet high, and detection is nearly impossible when it is mounted higher than 15 feet.

Vertical Angle

If the camera is mounted high and angled too far downward, it will be looking down at people and it will not be effective. If it cannot see the stick figure outline of a person, it will not understand what it is looking at. If you are looking downwards more than outwards, people detection will not report accurately.
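One way to make "looking downwards more than outwards" concrete is to compute the angle below horizontal from the camera to the person. The sketch below reads that phrase as an angle steeper than 45 degrees. That threshold is our interpretation, not a documented Survail limit, and the function names and example measurements are hypothetical.

```python
import math

def depression_angle_deg(camera_height_m, target_height_m, horizontal_distance_m):
    """Angle below horizontal from the camera to a point on the person."""
    drop_m = camera_height_m - target_height_m
    return math.degrees(math.atan2(drop_m, horizontal_distance_m))

def looks_more_down_than_out(angle_deg, limit_deg=45.0):
    """Assumed reading of the text: steeper than 45 degrees is 'more down than out'."""
    return angle_deg > limit_deg

# Hypothetical setups, aiming at a person's mid-body about 1 m above the ground.
steep = depression_angle_deg(4.0, 1.0, 2.0)     # camera 4 m high, person 2 m away
shallow = depression_angle_deg(3.0, 1.0, 10.0)  # camera 3 m high, person 10 m away
print(f"steep: {steep:.1f} deg, too steep: {looks_more_down_than_out(steep)}")
print(f"shallow: {shallow:.1f} deg, too steep: {looks_more_down_than_out(shallow)}")
```

Under this reading, a person standing closer to the camera (horizontally) than the camera is above them will be viewed more from above than from the side.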

A camera with a too-steep angle of view, especially one that also sees only part of the person, can result in the AI classifying the person as a vehicle, as seen in this image. The solution is to give the camera more context by mounting it where it can see the entirety of the person's body shape.

Distance of Person from Camera

With all video analytics, subject distance plays an important role. If the subject is too far away, the person appears too small to see and identify accurately, which makes it difficult to differentiate them from other items. With Survail you can detect an object as small as 20x20 pixels, but accuracy falls off if the object is not at least 40x40 pixels, so 40x40 is the default minimum object evaluation size. Accuracy increases as you increase the minimum object size. You can set the minimum object size that Survail uses when evaluating whether an object is a person either globally (for all cameras) or with specific per-camera overrides.
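The global default with per-camera overrides can be pictured as a simple lookup. The sketch below is only an illustration of that idea: the class, field names, and camera IDs are hypothetical and do not reflect Survail's actual configuration format, and it assumes the minimum applies to both width and height.

```python
from dataclasses import dataclass, field

@dataclass
class MinObjectSizeConfig:
    """Hypothetical settings: one global minimum plus per-camera overrides."""
    global_min_px: int = 40                           # default from the text above
    per_camera_min_px: dict = field(default_factory=dict)

    def min_for(self, camera_id):
        # A per-camera override wins; otherwise fall back to the global value.
        return self.per_camera_min_px.get(camera_id, self.global_min_px)

def should_evaluate(config, camera_id, box_w_px, box_h_px):
    """Assumes both width and height must reach the minimum size."""
    min_px = config.min_for(camera_id)
    return box_w_px >= min_px and box_h_px >= min_px

# Hypothetical camera IDs: a long-range camera accepts smaller objects,
# a lobby camera demands larger ones for higher accuracy.
config = MinObjectSizeConfig(per_camera_min_px={"parking-lot": 20, "lobby": 80})
print(should_evaluate(config, "parking-lot", 25, 60))  # uses the 20 px override
print(should_evaluate(config, "loading-dock", 25, 60)) # uses the 40 px global default
```

Lowering the minimum on a single long-range camera keeps the stricter default everywhere else, at the cost of lower accuracy on that camera.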

Objects must be Mostly in View

Detections are significantly less accurate with less than 80% of an object or person in the camera's view and almost impossible with less than 40% of the object or person in view. When a car or person is mostly hidden behind a wall or only partially visible in the camera view, there won’t always be enough of the object’s outline visible to be able to know what the object is.
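To illustrate the 80% and 40% figures, the sketch below computes how much of a bounding box falls inside the frame and maps that fraction to the accuracy tiers described above. It covers only objects cut off by the frame edge, not objects hidden behind walls, and it assumes you already have an estimate of the object's full extent. The function names and tier labels are hypothetical.

```python
def visible_fraction(box, frame_w, frame_h):
    """Fraction of a box (x1, y1, x2, y2), in pixels, that lies inside the frame.

    The box may extend past the frame edges when part of the object is cut off.
    """
    x1, y1, x2, y2 = box
    full_area = (x2 - x1) * (y2 - y1)
    if full_area <= 0:
        return 0.0
    visible_w = max(0, min(x2, frame_w) - max(x1, 0))
    visible_h = max(0, min(y2, frame_h) - max(y1, 0))
    return visible_w * visible_h / full_area

def expected_accuracy(fraction):
    """Tier labels are illustrative; the 0.8 and 0.4 cutoffs come from the text."""
    if fraction >= 0.8:
        return "normal"
    if fraction >= 0.4:
        return "significantly reduced"
    return "almost impossible"

# A person half outside the left edge of a 1920x1080 frame.
fraction = visible_fraction((-50, 0, 50, 200), 1920, 1080)
print(fraction, expected_accuracy(fraction))
```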

Limitation: Objects within Objects

When objects overlap, it’s difficult to discern where one object ends and the other begins. For example, a person walking in front of a moving car will be more difficult to detect than a person in an empty field.

Limitation: 100 Objects Evaluated per Frame

Computer vision models for real-time video need to move fast. If you need to analyze 15 frames per second, then your analysis can’t take longer than 1/15 of a second per frame. This is why the most popular computer vision frameworks limit the number of objects that can be evaluated, usually at around 100 objects per frame.
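The arithmetic behind this limit is simple enough to sketch. The per-object figure below assumes the frame budget is split evenly across objects, which is a simplification for illustration only, since real detectors typically process objects in batches. The function names are hypothetical.

```python
def frame_budget_ms(fps):
    """Maximum time available to analyze one frame, in milliseconds."""
    return 1000.0 / fps

def per_object_budget_ms(fps, max_objects=100):
    """Even split of the frame budget across objects (a simplifying assumption)."""
    return frame_budget_ms(fps) / max_objects

fps = 15
print(f"{frame_budget_ms(fps):.1f} ms per frame")        # about 66.7 ms
print(f"{per_object_budget_ms(fps):.2f} ms per object")  # about 0.67 ms at 100 objects
```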

Chained Classification and Recognition Modules In Development

People detections are used to select candidates for further detections, such as facial recognition, re-identification, object-in-hand detection, handbag detection, face mask compliance, etc.

Ignoring the requirements listed above will result in many of these chained detections having bad data or not being able to run at all. For example, face detection / recognition will be impossible if the camera is angled so steeply downward that it records video of the tops of heads rather than of faces.

This model was made by NVIDIA.