I started by reading a summary of the official Deep SORT paper.

Paper Summary

The summary turned out to be really complicated, so I checked an online article that covers the key components of the architecture instead.

Key Components
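For reference, the Deep SORT paper's association step combines a motion term and an appearance term into one cost. The block below follows the paper's notation: $\mathbf{y}_i, \mathbf{S}_i$ are the projected Kalman mean and covariance of track $i$, $\mathbf{d}_j$ is detection $j$'s box, $\mathbf{r}_j$ its unit-norm appearance descriptor, and $\mathcal{R}_i$ the bounded gallery of descriptors stored for track $i$. The weight $\lambda$ and the gate thresholds $t^{(1)}, t^{(2)}$ are tuning choices.

```latex
\begin{aligned}
d^{(1)}(i,j) &= (\mathbf{d}_j - \mathbf{y}_i)^\top \mathbf{S}_i^{-1} (\mathbf{d}_j - \mathbf{y}_i) \\
d^{(2)}(i,j) &= \min\{\, 1 - \mathbf{r}_j^\top \mathbf{r}_k^{(i)} \mid \mathbf{r}_k^{(i)} \in \mathcal{R}_i \,\} \\
c_{i,j} &= \lambda\, d^{(1)}(i,j) + (1 - \lambda)\, d^{(2)}(i,j) \\
b_{i,j} &= \mathbb{1}\big[d^{(1)}(i,j) \le t^{(1)}\big] \cdot \mathbb{1}\big[d^{(2)}(i,j) \le t^{(2)}\big]
\end{aligned}
```

A track/detection pair is only considered admissible when the gate $b_{i,j} = 1$; among admissible pairs, lower $c_{i,j}$ means a better match.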

I then moved on to the basic code DeepSeek generated for using Deep SORT and found that the first step is initializing the tracker, so I went on to understand the tracker's parameters.

Adjustable Parameters
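Two parameters that Deep SORT implementations commonly expose are `n_init` and `max_age` (names taken from the original Deep SORT code; the DeepSeek-generated code may name them differently). A minimal sketch, modeled on the original implementation's tentative/confirmed/deleted track states, of how they drive a track's life cycle:

```python
from enum import Enum


class TrackState(Enum):
    TENTATIVE = 1
    CONFIRMED = 2
    DELETED = 3


class Track:
    def __init__(self, track_id, n_init=3, max_age=30):
        self.track_id = track_id
        self.n_init = n_init      # matched frames needed before the track is trusted
        self.max_age = max_age    # missed frames tolerated before a confirmed track is dropped
        self.hits = 1             # the detection that created the track counts as a hit
        self.time_since_update = 0
        self.state = TrackState.TENTATIVE

    def predict(self):
        # Called once per frame, before matching (Kalman prediction omitted here).
        self.time_since_update += 1

    def update(self):
        # Called when the track was matched to a detection in this frame.
        self.hits += 1
        self.time_since_update = 0
        if self.state is TrackState.TENTATIVE and self.hits >= self.n_init:
            self.state = TrackState.CONFIRMED

    def mark_missed(self):
        # Called when the track found no detection in this frame.
        if self.state is TrackState.TENTATIVE:
            self.state = TrackState.DELETED  # unconfirmed tracks get no second chance
        elif self.time_since_update > self.max_age:
            self.state = TrackState.DELETED
```

With `n_init=3`, a person must be matched in three frames before the track is confirmed, which filters out one-off false detections; a larger `max_age` keeps identities alive through longer occlusions at the cost of holding stale tracks longer.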

One parameter controlled the size of the appearance descriptors stored in VRAM, which confused me because I thought Deep SORT could run on the CPU, so I had to double-check whether that was true.

CPU vs. GPU
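In the original Deep SORT code, the setting that caps how many appearance descriptors are kept per track is `nn_budget` (`None` means no cap). A minimal sketch of such a bounded gallery, plus a back-of-the-envelope memory estimate assuming 128-dimensional float32 descriptors (the size produced by the original Deep SORT Re-ID network). `DescriptorGallery` and `gallery_bytes` are hypothetical names:

```python
from collections import defaultdict, deque


class DescriptorGallery:
    """Keeps at most `budget` descriptors per track; the oldest are dropped first."""

    def __init__(self, budget=100):
        self.budget = budget
        # deque(maxlen=None) is unbounded, mirroring a budget of None.
        self._samples = defaultdict(lambda: deque(maxlen=budget))

    def add(self, track_id, descriptor):
        self._samples[track_id].append(descriptor)

    def samples(self, track_id):
        return list(self._samples.get(track_id, ()))


def gallery_bytes(num_tracks, budget, dim=128, bytes_per_value=4):
    """Worst-case gallery memory: tracks x descriptors per track x dims x sizeof(float32)."""
    return num_tracks * budget * dim * bytes_per_value
```

For example, `gallery_bytes(50, 100)` is 50 × 100 × 128 × 4 = 2,560,000 bytes, roughly 2.4 MiB, so the gallery itself is small; the budget matters mostly because every stored descriptor is compared against each new detection.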

It turned out that running on the CPU was feasible, but the Re-ID model was extremely slow there (no real-time performance) because it was optimized for GPU use.

Re-ID Model
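Why a slow Re-ID model hurts so much on the CPU: the detector runs once per frame, but the Re-ID model runs once per detected person, so its cost grows with the number of people in view. A rough cost model, assuming no batching and ignoring the small Kalman/matching overhead. The timings are made-up placeholders, not benchmarks:

```python
def frame_latency_ms(detector_ms, reid_ms_per_crop, num_detections):
    """Per-frame time if every person crop goes through the Re-ID model once."""
    return detector_ms + reid_ms_per_crop * num_detections


def achievable_fps(latency_ms):
    return 1000.0 / latency_ms


# Hypothetical timings, for illustration only.
for people in (1, 5, 10):
    ms = frame_latency_ms(detector_ms=40.0, reid_ms_per_crop=15.0, num_detections=people)
    print(f"{people:2d} people -> {ms:.0f} ms/frame -> {achievable_fps(ms):.1f} FPS")
```

With these placeholder numbers, the same pipeline drops from about 18 FPS with one person to about 5 FPS with ten, which is why the per-crop Re-ID cost is the first thing to attack on a CPU.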

That was when I started looking into replacements for the Re-ID model that would give me real-time performance.

Re-ID Model Replacements
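Whatever replacement model is chosen, the tracker only needs it to turn each person crop into a fixed-length vector; matching then uses the smallest cosine distance between a new descriptor and the descriptors stored for a track, as in the Deep SORT paper. A sketch of that contract, where `ReIDEmbedder` and the helper names are hypothetical and any model satisfying `embed` could be swapped in:

```python
import math
from typing import Protocol, Sequence


class ReIDEmbedder(Protocol):
    """Anything that maps a batch of person crops to fixed-length feature vectors."""

    def embed(self, crops: Sequence[object]) -> list[list[float]]: ...


def l2_normalize(v: Sequence[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def nearest_cosine_distance(query: Sequence[float], gallery: Sequence[Sequence[float]]) -> float:
    """Smallest cosine distance (1 - dot of unit vectors) to any gallery sample.

    Raises ValueError if the gallery is empty.
    """
    q = l2_normalize(query)
    return min(1.0 - sum(a * b for a, b in zip(q, l2_normalize(g))) for g in gallery)
```

Because only the vector interface matters, a lighter model can replace the original network without touching the tracking logic, as long as its descriptors are compared consistently (same model, same dimension) within a run.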

DeepSeek then suggested some optimizations that would work with any Re-ID model, so I checked those out.

Optimizing DeepSORT for Mobile Deployment
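The suggestions themselves aren't listed here, so as an assumption, here are two model-agnostic ideas of this kind: run Re-ID only on confident detections, and send crops to the model in batches rather than one at a time. A minimal sketch with hypothetical names and a hypothetical detection format (a `dict` with a `conf` key):

```python
def select_for_reid(detections, min_conf=0.5, max_crops=None):
    """Keep confident detections only, highest confidence first, optionally capped."""
    kept = sorted(
        (d for d in detections if d["conf"] >= min_conf),
        key=lambda d: d["conf"],
        reverse=True,
    )
    return kept if max_crops is None else kept[:max_crops]


def batches(items, batch_size):
    """Yield consecutive chunks so the Re-ID model runs once per chunk, not once per crop."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Capping `max_crops` bounds the worst-case Re-ID cost per frame; detections that are skipped can still be tracked by motion alone, though with weaker identity information.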

One of the suggested optimizations was downsampling the input frames, which I thought would break the model because it expects inputs of fixed dimensions, so I checked whether I was right.

Downsizing Re-ID Inputs
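One fact that bears on this question: Deep SORT-style Re-ID models resize every person crop to a fixed input size (the original Deep SORT network takes 128×64 crops), so the model's input shape does not depend on the frame resolution. What downsampling changes is the detail inside each crop and the coordinate system of the boxes. A pure-Python sketch of both steps, assuming boxes as `(x1, y1, x2, y2)` and images as nested lists of pixels; real code would use an image library's resize instead:

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an image stored as a list of rows."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def to_original_coords(box, scale):
    """Map a box found on a frame downscaled by `scale` back to full resolution."""
    return tuple(v / scale for v in box)


# A 16x8 crop cut from a downsampled frame (each pixel holds its (row, col) for clarity).
small_crop = [[(r, c) for c in range(8)] for r in range(16)]
reid_input = resize_nearest(small_crop, 128, 64)  # same 128x64 shape a full-res crop would get
```

The shape always matches, so nothing breaks outright; whether the coarser crops still give distinctive enough descriptors is something to measure on real footage.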

The optimizations also included using YOLO Nano for object detection to reach ~10 FPS, which seemed very slow to me, since real-time performance usually means ~30 FPS. So I checked what FPS I should aim for when deploying a tracking model on a mobile device. A heads-up: the response I got was neither satisfactory nor to the point.

Increase FPS
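Whatever the target, it helps to measure the FPS the whole pipeline (detection, Re-ID and tracking together) actually reaches on the device. A minimal sliding-window meter; `FPSMeter` is a hypothetical helper, and it defaults to the standard library's monotonic clock `time.perf_counter`:

```python
import time
from collections import deque


class FPSMeter:
    """Sliding-window frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        self._stamps = deque(maxlen=window)

    def tick(self, t=None):
        """Call once per processed frame; pass `t` (seconds) to use another clock."""
        self._stamps.append(time.perf_counter() if t is None else t)

    @property
    def fps(self):
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Calling `tick()` at the end of each loop iteration and printing `fps` every few frames gives a steady number; a short window reacts quickly to crowded scenes, while a longer one smooths out jitter.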

The second step was defining a default dictionary for storing basic information about each tracked person.

Default Dict for Tracked Persons
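A `collections.defaultdict` creates an entry the first time a missing key is accessed, which suits tracking: a new track ID gets an empty record without an explicit existence check. A minimal sketch; the record fields (`first_seen`, `last_seen`, `boxes`) are hypothetical, since the stored information isn't specified here:

```python
from collections import defaultdict


def new_person_record():
    # Hypothetical fields; store whatever the application needs per track.
    return {"first_seen": None, "last_seen": None, "boxes": []}


tracked_persons = defaultdict(new_person_record)


def record_track(track_id, frame_idx, box):
    """Update a person's record; unseen track IDs get a fresh record automatically."""
    person = tracked_persons[track_id]
    if person["first_seen"] is None:
        person["first_seen"] = frame_idx
    person["last_seen"] = frame_idx
    person["boxes"].append(box)
```

Using a factory function rather than a shared default object matters here: each new ID gets its own `boxes` list. Note that `track_id in tracked_persons` does not create an entry, while `tracked_persons[track_id]` does.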