SIMPLE ONLINE AND REALTIME TRACKING WITH A DEEP ASSOCIATION METRIC
This paper extends Simple Online and Realtime Tracking (SORT) with appearance information. A deep association metric, learned offline on a large-scale person re-identification dataset, lets the tracker follow objects through longer periods of occlusion. The reported result is a 45% reduction in identity switches while keeping overall performance competitive at high frame rates.
Introduction
The introduction contrasts batch and online approaches to multiple object tracking. Methods such as Multiple Hypothesis Tracking (MHT) and the Joint Probabilistic Data Association Filter (JPDAF) perform data association frame by frame. Simple Online and Realtime Tracking (SORT) is a simpler alternative that combines Kalman filtering with the Hungarian method, using bounding box overlap as the association metric.
- Tracking-by-detection has become the leading paradigm. Many methods in it apply global optimization to whole video batches, with flow network formulations and probabilistic graphical models being popular choices. Because they process batches, these methods do not apply to online scenarios where a target identity is needed at every time step.
- MHT and JPDAF have been traditionally used for frame-by-frame data association, each with its own approach to hypothesis generation and computational complexity.
- SORT stands out for its simplicity and high frame rate. Paired with a state-of-the-art people detector, it ranks higher on average than MHT run on standard detections, which underlines how strongly detector quality affects tracking results.
- SORT achieves good precision and accuracy but produces a relatively high number of identity switches, because its association metric is only accurate when state estimation uncertainty is low. This makes it weak at tracking through occlusions.
- To address occlusion challenges, the authors introduce a more robust association metric integrating motion and appearance information through a pre-trained convolutional neural network (CNN).
- Integrating the CNN makes the tracker more robust to missed detections and occlusions, while the system stays easy to implement and applicable to online scenarios.
- The authors have released their code and the pre-trained CNN model to support research experimentation and practical applications.
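The overlap-based association that SORT relies on can be sketched as follows. This is a minimal illustration, not the authors' code: the box format `(x1, y1, x2, y2)`, the function names, and the `min_iou` threshold of 0.3 are all assumptions made for the example. For brevity it finds the optimal assignment by exhaustive search over permutations rather than by the Hungarian method; both return an assignment with maximal total overlap, but exhaustive search is only practical for a handful of boxes.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Return (track_index, detection_index) pairs with maximal total IoU.

    Brute-force stand-in for the Hungarian method (assumption of this
    sketch). Pairs whose overlap is below min_iou are discarded.
    """
    n_t, n_d = len(tracks), len(detections)
    if n_t <= n_d:
        candidates = (list(zip(range(n_t), p))
                      for p in permutations(range(n_d), n_t))
    else:
        candidates = (list(zip(p, range(n_d)))
                      for p in permutations(range(n_t), n_d))

    best_pairs, best_score = [], -1.0
    for pairs in candidates:
        score = sum(iou(tracks[t], detections[d]) for t, d in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score

    # Gate the result: weakly overlapping pairs are treated as unmatched.
    return [(t, d) for t, d in best_pairs
            if iou(tracks[t], detections[d]) >= min_iou]


# Hypothetical predicted track boxes and new detections.
tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
print(associate(tracks, detections))
```

A metric like this depends entirely on where the predicted box lands, which is why it degrades when the motion estimate becomes uncertain, for example while a target is occluded.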
SORT WITH DEEP ASSOCIATION METRIC
This section describes how a deep association metric is integrated into the Simple Online and Realtime Tracking (SORT) framework. The tracking methodology is as follows:
- The system employs a single hypothesis tracking approach utilizing recursive Kalman filtering and frame-by-frame data association.
- Tracking performance depends on this association step, which must assign each new measurement to the correct existing track.
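The recursive predict and update cycle of a Kalman filter can be sketched for a single coordinate. This is a simplified illustration under stated assumptions: the state is reduced to one position and its velocity, the motion model is constant velocity, and the class name and the noise values `q` and `r` are placeholders chosen for the example, not parameters from the paper.

```python
class ConstantVelocityKalman1D:
    """Kalman filter over the state [position, velocity] for one coordinate."""

    def __init__(self, position, pos_var=1.0, vel_var=10.0):
        self.x = [float(position), 0.0]             # state mean
        self.P = [[pos_var, 0.0], [0.0, vel_var]]   # state covariance

    def predict(self, dt=1.0, q=0.01):
        """Propagate the state one step: x = F x, P = F P F^T + Q."""
        p, v = self.x
        (p00, p01), (p10, p11) = self.P
        self.x = [p + dt * v, v]
        self.P = [
            [p00 + dt * (p10 + p01) + dt * dt * p11 + q, p01 + dt * p11],
            [p10 + dt * p11, p11 + q],
        ]

    def update(self, z, r=1.0):
        """Correct the state with a position measurement z (H = [1, 0])."""
        (p00, p01), (p10, p11) = self.P
        innovation = z - self.x[0]
        s = p00 + r                      # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        self.P = [
            [(1.0 - k0) * p00, (1.0 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


# A target moving one unit per frame, observed without noise.
kf = ConstantVelocityKalman1D(position=0.0)
for frame in range(1, 21):
    kf.predict()
    kf.update(float(frame))
print(kf.x)
```

When a frame has no matching measurement, only `predict` runs, so the position variance grows with each missed frame. That growing uncertainty is what makes purely motion-based association unreliable during occlusions.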
By incorporating a deep association metric into SORT, the researchers are able to:
- Improve the tracking accuracy, especially during extended occlusions.
- Move much of the computational cost into an offline pre-training stage, so that the online tracking stage stays fast.
- Utilize nearest neighbor queries in visual appearance space to establish measurement-to-track associations during online application.
This integration not only reduces identity switches by 45% but also maintains competitive performance levels at high frame rates, as evidenced by experimental evaluations.
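A nearest neighbor query in appearance space can be sketched as follows. The sketch assumes that each track keeps a small gallery of appearance descriptors and that similarity is measured by cosine distance; the function names, the toy two-dimensional descriptors, and the `max_distance` threshold of 0.2 are hypothetical choices for illustration. In practice the descriptors would come from the pre-trained CNN.

```python
import math


def normalize(v):
    """Scale a descriptor to unit length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means identical direction."""
    a, b = normalize(a), normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def nearest_track(descriptor, galleries, max_distance=0.2):
    """Find the track whose gallery holds the closest stored descriptor.

    galleries maps a track id to a list of descriptors seen for that track.
    Returns (track_id, distance), with track_id None if no stored
    descriptor is within max_distance.
    """
    best_id, best_dist = None, float("inf")
    for track_id, gallery in galleries.items():
        for stored in gallery:
            dist = cosine_distance(descriptor, stored)
            if dist < best_dist:
                best_id, best_dist = track_id, dist
    if best_dist > max_distance:
        return None, best_dist
    return best_id, best_dist


# Toy galleries with two-dimensional descriptors.
galleries = {
    1: [[1.0, 0.0], [0.9, 0.1]],
    2: [[0.0, 1.0]],
}
print(nearest_track([0.8, 0.2], galleries))
```

Because the query compares against descriptors stored before an occlusion, a target can be re-identified when it reappears even if its predicted position has drifted.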