I started by reading a summary of the official paper
It turned out to be really complicated, so I checked an online article that covers the key components of the architecture instead
I then went through the basic code DeepSeek generated for using Deep SORT and found that the first step is initializing the tracker, so I went on to understand the tracker's parameters
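DeepSeek's exact snippet isn't reproduced here; a minimal sketch of that first step, assuming the deep-sort-realtime package, might look like the following (the parameter names are that package's, not necessarily the ones in DeepSeek's code):

```python
# Minimal tracker initialization sketch, assuming the deep-sort-realtime package
# (pip install deep-sort-realtime); other Deep SORT implementations differ.
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(
    max_age=30,               # frames to keep a lost track alive before deleting it
    n_init=3,                 # consecutive detections needed to confirm a track
    max_cosine_distance=0.2,  # appearance-matching threshold on Re-ID descriptors
    nn_budget=100,            # cap on stored appearance descriptors per track
    embedder="mobilenet",     # built-in Re-ID model that computes the descriptors
    embedder_gpu=True,        # run the Re-ID model on the GPU (False for CPU)
)

# Per frame: detections are ([left, top, w, h], confidence, class) tuples from a detector.
# tracks = tracker.update_tracks(detections, frame=frame)
```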
One parameter controlled the budget of descriptors stored in VRAM, which confused me because I thought Deep SORT could run on the CPU, so I had to double-check whether that was true
I found that yes, running on the CPU is feasible, but the Re-ID model becomes very slow (no real-time performance) because it is optimized for GPU usage
That was when I started looking into replacements for the Re-ID model that would give me real-time performance
DeepSeek then suggested some optimizations that would work with any Re-ID model, so I checked those out
Optimizing DeepSORT for Mobile Deployment
One of the suggested optimizations was downsampling the input frames, which I thought would break the model because it expects inputs of certain dimensions, so I checked whether I was right (see the sketch below)
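For reference, the usual way this optimization is applied is to run detection on a smaller copy of the frame and then scale the boxes back to the original coordinates, so the tracker still works in full-resolution space. A hedged sketch of that pattern (the `detector` callable and `SCALE` value are placeholders, not from DeepSeek's code):

```python
import cv2

SCALE = 0.5  # hypothetical downsampling factor

def detect_on_downsampled(frame, detector):
    """Run detection on a smaller copy of the frame, then map boxes back to full size."""
    small = cv2.resize(frame, None, fx=SCALE, fy=SCALE, interpolation=cv2.INTER_LINEAR)
    detections = detector(small)  # assumed to return ([left, top, w, h], conf, cls) tuples
    rescaled = []
    for (left, top, w, h), conf, cls in detections:
        # Scale box coordinates back so the tracker sees full-resolution geometry.
        rescaled.append(([left / SCALE, top / SCALE, w / SCALE, h / SCALE], conf, cls))
    return rescaled
```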
The optimizations also included using YOLO Nano for object detection to get ~10 FPS, which I thought was very slow given that real-time performance usually means ~30 FPS, so I checked what FPS I should be aiming for when deploying a tracking model on mobile. A heads up: the response I got was not satisfactory or even to the point
The second step was defining a default dictionary for storing the basic information about tracked persons.
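The exact fields in DeepSeek's dictionary aren't quoted here, so the sketch below is a hypothetical version of what such a defaultdict might look like:

```python
from collections import defaultdict

def new_person_record():
    """Default record for a newly seen person; the fields are illustrative, not DeepSeek's."""
    return {
        "first_seen_frame": None,  # frame index when the track first appeared
        "last_seen_frame": None,   # frame index of the most recent update
        "bbox_history": [],        # (left, top, right, bottom) boxes over time
        "hits": 0,                 # number of frames the person was matched in
    }

tracked_persons = defaultdict(new_person_record)

# Inside the tracking loop, something like:
# info = tracked_persons[track.track_id]
# info["last_seen_frame"] = frame_idx
# info["bbox_history"].append(track.to_ltrb())
# info["hits"] += 1
```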