Real-time vision-based violent actions detection through CCTV cameras with pose estimation
Abstract
In large structures under video surveillance, or in crowded places, a CCTV operator cannot monitor hundreds of people across many video streams at once. This paper presents a proof of concept for a real-time, vision-based system that detects violent actions in CCTV footage using pose estimation. The proposed system combines several computer vision techniques, namely pose estimation, object tracking, and a deep learning algorithm operating on time-series features, to identify violent actions in real time. The features are computed over a fixed-size rolling window and consist of the position of each person's limbs together with their velocity. The proposed pipeline achieves an accuracy of 92% with an overall latency of around 0.07 seconds per frame on an RTX 3060 Mobile GPU, which makes it a practical tool for enhancing public safety and security. The system can be deployed in a wide range of settings, including public places, transportation hubs, and other critical infrastructure, to provide real-time alerts and support a rapid response to violent incidents.
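To make the feature description concrete, the following is a minimal sketch of a fixed-size rolling window that stores one tracked person's keypoint positions and derives velocities between consecutive frames. It is an illustration under stated assumptions, not the paper's implementation: the class name `RollingPoseWindow`, the window size, the frame rate, the `(x, y)` keypoint layout, and the finite-difference velocity are all hypothetical choices that the abstract does not specify.

```python
from collections import deque


class RollingPoseWindow:
    """Fixed-size rolling window over one tracked person's keypoints.

    Hypothetical helper: window_size, fps and the (x, y) keypoint layout
    are assumptions made for illustration only.
    """

    def __init__(self, window_size=5, fps=14.0):
        if window_size < 2:
            raise ValueError("window_size must be at least 2")
        # deque(maxlen=...) silently drops the oldest frame when full.
        self.frames = deque(maxlen=window_size)
        self.dt = 1.0 / fps  # seconds between consecutive frames

    def push(self, keypoints):
        """Add one frame: a list of (x, y) tuples, one per joint."""
        self.frames.append([(float(x), float(y)) for x, y in keypoints])

    def is_full(self):
        return len(self.frames) == self.frames.maxlen

    def features(self):
        """Return one row per frame: flattened positions, then velocities.

        Velocity is a backward finite difference (assumed, not from the
        paper); the first frame in the window gets zero velocity.
        """
        rows = []
        prev = None
        for frame in self.frames:
            if prev is None:
                velocities = [(0.0, 0.0)] * len(frame)
            else:
                velocities = [
                    ((x - px) / self.dt, (y - py) / self.dt)
                    for (x, y), (px, py) in zip(frame, prev)
                ]
            row = [v for point in frame for v in point]
            row += [v for vel in velocities for v in vel]
            rows.append(row)
            prev = frame
        return rows


if __name__ == "__main__":
    window = RollingPoseWindow(window_size=3, fps=2.0)
    for kp in ([(0, 0), (1, 1)], [(1, 0), (1, 2)], [(2, 0), (1, 3)]):
        window.push(kp)
    if window.is_full():
        for row in window.features():
            print(row)
```

In a full pipeline, one such window would be kept per track ID produced by the object tracker, and the rows returned by `features()` would form the time-series input to the classifier once the window is full.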