Detecting and recognizing moving objects is one of the most challenging tasks in video analysis. The key to motion detection lies in determining how an object’s features change over a period of time.
This task can be carried out with artificial intelligence (AI). Several neural network architectures focus on detecting objects and movements in a video. However, balancing accuracy against performance is a difficult task for a development team.
In this article, we discuss several use cases for action detection, key challenges in detecting moving objects, and how to overcome them with AI-powered algorithms.
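To make the idea of "determining changes over a period of time" concrete, here is a minimal sketch of frame differencing, a classic non-AI baseline that is often used before any neural network is involved. It is an illustration only: the function names, the threshold value, and the synthetic 8x8 frames are assumptions made for this example. In a real project the frames would come from a video library as arrays; plain lists are used here to keep the sketch dependency-free.

```python
def changed_pixels(prev_frame, next_frame, threshold=25):
    """Return (row, col) positions whose brightness changed by more than threshold."""
    changed = []
    for r, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame)):
        for c, (p, n) in enumerate(zip(prev_row, next_row)):
            if abs(n - p) > threshold:
                changed.append((r, c))
    return changed


def make_frame(height, width, top, left, size=2, value=200):
    """Build a black grayscale frame with one bright square (synthetic test data)."""
    frame = [[0] * width for _ in range(height)]
    for r in range(top, top + size):
        for c in range(left, left + size):
            frame[r][c] = value
    return frame


# The bright square moves one pixel to the right between the two frames.
frame_a = make_frame(8, 8, top=3, left=2)
frame_b = make_frame(8, 8, top=3, left=3)

moved = changed_pixels(frame_a, frame_b)
motion_ratio = len(moved) / (8 * 8)
print(moved)
print(f"share of changed pixels: {motion_ratio:.4f}")
```

Only the column the square left and the column it entered are reported as changed; the overlapping column keeps the same brightness. This also hints at why pure pixel comparison is not enough on its own: it says that something changed, not which object moved or what action took place.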
Why do we need to detect motions in a video?
Action detection in a video can improve operations in industries that require analyzing people’s movements. AI does not replace human perception: people can still detect actions more accurately and connect them with the context of an event. But artificial intelligence approaches such as machine learning and deep learning are more productive when it comes to detecting the same pattern across many similar videos.
Motion detection can be applied to several fields:
- Public safety: Neural networks can recognize crimes or movements with malicious intent, such as traffic violations, fights, or drawing a gun. When applied to CCTV footage, motion detection software can detect a violation in real time and immediately alert the police. It’s also useful for analyzing recorded footage in order to detect criminal behavior.
- Healthcare: Human action monitoring is used in care facilities for elderly people. Doctors use deep learning-based software to detect the actions that led to a trauma and make a more accurate diagnosis.
- Automotive: The autopilot in a driverless car relies heavily on video processing in order to drive safely. AI-powered algorithms for object and action detection help such a car “see” traffic signs, road markings, pedestrians, and vehicles.
- Sports: In the past, statistics during live sports events were gathered manually by several analysts. It’s quite hard to follow every movement of an athlete, so these statistics weren’t always accurate. Today, when you watch a football match and see information on dribbling and pinpoint passes, it was probably gathered by a motion detection algorithm based on neural networks.
- Marketing: Many worldwide brands use emotion recognition algorithms to study how a test group reacts to the brand’s new product, ads, important news, etc. Emotion recognition relies on the same kind of AI-powered algorithms that are used for motion detection. By analyzing a person’s face and body language, they detect how this person feels during the test.
Challenges of motion detection
The key challenge of motion detection is achieving high accuracy. Accuracy can be affected by many factors: low video quality, changes in an object’s shape or appearance, unpredictable object motion, the size of an object relative to the background, etc.
There are several issues that make accurate action detection tricky:
- Continuous detection. This problem arises when you need to detect an interrupted or incomplete action. An algorithm may consider a motion finished when, for example, the object is occluded. It then fails to recognize the rest of the action as something it needs to detect, which leads to an incorrect analysis of the video.
- Movement segmentation. If you need to detect a movement that can be a part of another movement (for example, dribbling during a football game), you need to teach your algorithm to separate one action from another. Otherwise, you’ll get a lot of false-positive alerts or missed actions.
- Movements against changing backgrounds. Let’s suppose you need to spot a certain movement in a crowd. Noticing a certain person in such an environment is hard enough even for the human eye. Detecting a certain person performing a certain motion is even harder, as an algorithm needs to analyze a lot of actions happening within one frame. Processing such a video requires powerful hardware and a lot of time.
- Human action detection. Human activity poses a lot of challenges: it’s abrupt, chaotic, and can be partial or incomplete. It’s best detected with spatio-temporal descriptors, but such algorithms require more elaborate training.
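The continuous detection problem can be illustrated with a small post-processing sketch. It assumes a hypothetical detector that outputs one flag per frame (1 = action seen, 0 = not seen) and bridges short gaps, so that a briefly occluded action is reported as one segment instead of two. The function name `merge_segments` and the `max_gap` parameter are made up for this example; this is one simple heuristic, not the only way to handle interruptions.

```python
def merge_segments(flags, max_gap=2):
    """Group per-frame detections into inclusive (start, end) frame segments.

    Gaps of at most max_gap missed frames are bridged, so a short
    occlusion does not split one action into two.
    """
    segments = []
    start = last_hit = None
    for i, hit in enumerate(flags):
        if not hit:
            continue
        if start is None:
            start = i  # first detection opens a segment
        elif i - last_hit - 1 > max_gap:
            # The gap is too long: close the old segment and open a new one.
            segments.append((start, last_hit))
            start = i
        last_hit = i
    if start is not None:
        segments.append((start, last_hit))
    return segments


# Hypothetical detector output with a 2-frame gap and a 4-frame gap.
flags = [0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0]

print(merge_segments(flags, max_gap=2))  # the 2-frame gap is bridged
print(merge_segments(flags, max_gap=0))  # every gap splits the action
```

Choosing `max_gap` is a trade-off: a larger value tolerates longer occlusions but risks merging two separate actions, which is the movement segmentation problem seen from the other side.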
How to overcome these challenges with AI
The key to solving these challenges lies in several factors:
- Relevant datasets and sufficient training. There’s a vast number of training datasets for the most common tasks. You can download a relevant dataset, but it’s best to create your own. This way, you’ll train your ML model or neural network on data that matches your task more closely.
- Using high-resolution videos for motion detection. It’s easier for a human to recognize and interpret motions in a high-resolution video than in a poor-quality one. A neural network works the same way. Using high-quality video both for training and in the real project improves the accuracy of detection. Note that processing such a video requires powerful hardware and slows down your algorithm.
- Selecting the most suitable neural network for your task. In the section below, we talk about four key types of neural networks. Before choosing one, conduct thorough research to make sure the network suits your project.
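The cost of higher resolution can be estimated with simple arithmetic. The sketch below counts how many pixels per second an algorithm has to process at common resolutions, relative to 720p. The 30 frames per second rate is an assumption, and pixel count is only a rough proxy: the real cost depends on the model and the hardware.

```python
# Common video resolutions as (width, height) in pixels.
RESOLUTIONS = {
    "480p": (854, 480),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "2160p": (3840, 2160),
}


def pixels_per_second(width, height, fps=30):
    """Raw number of pixels that must be processed each second."""
    return width * height * fps


baseline = pixels_per_second(*RESOLUTIONS["720p"])
for name, (width, height) in RESOLUTIONS.items():
    pps = pixels_per_second(width, height)
    print(f"{name}: {pps:,} px/s, {pps / baseline:.2f}x the 720p load")
```

Moving from 720p to 1080p multiplies the raw pixel load by 2.25, and moving to 2160p multiplies it by 9, which is why teams often test several resolutions before settling on one.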
Neural networks for motion detection
An artificial neural network (ANN) is a computing system that processes information in a way loosely modeled on the human brain. Inspired by biological neural networks, an ANN consists of neurons (nodes) that can be trained to perform a certain type of task. There are types of neural networks designed for computer vision, image and speech recognition, machine translation, robot control, etc.
The following types of neural networks can be used for motion detection:
- Convolutional neural network (CNN) is inspired by the way the human visual cortex works. CNNs are mostly used for image and video processing. This type of neural network offers accurate object and motion detection, even with segmented or incomplete actions. On the other hand, using a CNN requires a lot of computational resources.
- Region-based convolutional neural network (R-CNN) is a type of CNN aimed at object detection. Its architecture allows it to process a video and find movement patterns in it, including for small items or changing backgrounds. However, R-CNN lacks accuracy when it comes to detecting continuous motions.
- Recurrent neural network (RNN) is designed to analyze sequences of data, e.g. image or word sequences. This is possible because of the RNN architecture: its nodes have recurrent connections that feed the network’s state from one step of a sequence into the next. This way, the network takes into account not only the current input but also what it has seen earlier in the sequence. It allows an RNN to analyze continuous or abrupt motion more efficiently compared to other neural networks. However, an RNN requires more computing resources.
- Spiking neural network (SNN) is a newer approach to motion detection. Instead of relying on object detection and comparing the position of an object in each video frame, this type of neural network encodes information as discrete spikes, so changes in the scene show up as spiking activity. An SNN can be more efficient than other neural networks. But this type of network is relatively new, so there aren’t many datasets and strategies for training it.
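To give a feel for what a "spike" is, here is a minimal sketch of a single leaky integrate-and-fire neuron, a common textbook model of a spiking neuron. It is not a full SNN and not a trained detector. The input is a hypothetical per-frame "amount of change" signal, and the `threshold` and `leak` values are arbitrary assumptions chosen for the example.

```python
def lif_spikes(inputs, threshold=1.0, leak=0.8):
    """Simulate one leaky integrate-and-fire neuron.

    inputs: one input value per frame (here: how much the scene changed).
    Returns a list with one 0/1 spike flag per frame.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        # Part of the stored potential leaks away, then the new input is added.
        potential = potential * leak + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes


# Hypothetical change signal: quiet scene, sustained motion, quiet again.
change = [0.0, 0.1, 0.0, 0.6, 0.6, 0.6, 0.1, 0.0]
print(lif_spikes(change))
```

With these values, small isolated changes leak away without any output, while sustained motion accumulates until the neuron fires. The neuron stays silent for most frames, which is the intuition behind the efficiency argument for spiking networks.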
Detecting motion in a video is a complex task that requires experience and skills in AI development. To make your algorithm more accurate, make sure you choose a suitable AI-powered algorithm and prepare the most relevant and complete dataset. This challenging process may require a lot of time and effort, but the results are worth it.