Key Points:
Dense Mapping vs. Key Points:
Real-Time Performance:
They provide a 50K-sample dataset, which is useful, and the repository has over 25K stars. However, I am not sure we need this level of detailed mapping.
RTMW-l achieves 70.2 mAP on the COCO-WholeBody benchmark, making it the first open-source model to exceed 70 mAP on this benchmark.
COCO-WholeBody Benchmark:
Outstanding Performance.
In this paper, we introduce Hyperpose, a novel flexible and high-performance pose estimation library. Hyperpose provides expressive Python APIs that enable developers to easily customise pose estimation algorithms for their applications. It further provides a model inference engine highly optimised for real-time pose estimation. This engine can dynamically dispatch carefully designed pose estimation tasks to CPUs and GPUs, thus automatically achieving high utilisation of hardware resources irrespective of deployment environments.
Good enough for a proof of concept.
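To make the "dispatch tasks to CPUs and GPUs for high utilisation" idea concrete, here is a minimal sketch of pipeline parallelism. It is not HyperPose's engine (which we have not inspected). Each stage runs in its own thread and passes work through bounded queues, so preprocessing of frame n+1 can overlap with inference on frame n. The stage functions `preprocess`, `infer` and `postprocess` are hypothetical stand-ins for real image decoding, network inference and keypoint parsing.

```python
import queue
import threading

# Sentinel object that signals the end of the stream to each stage.
_STOP = object()


def stage(fn, inbox, outbox):
    """Run one pipeline stage: pull an item, process it, push the result."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            break
        outbox.put(fn(item))


# Hypothetical stand-ins: a real engine would resize images on the CPU,
# run the network on the GPU, then parse heatmaps back on the CPU.
def preprocess(frame_id):
    return {"id": frame_id, "tensor": [frame_id] * 4}


def infer(batch):
    return {"id": batch["id"], "score": sum(batch["tensor"])}


def postprocess(result):
    return (result["id"], result["score"])


def run_pipeline(frame_ids, maxsize=4):
    # Bounded queues between stages keep a fast stage from running far ahead;
    # the final queue is unbounded so the feeding loop below cannot deadlock.
    q_in, q_pre, q_inf = (queue.Queue(maxsize) for _ in range(3))
    q_out = queue.Queue()
    workers = [
        threading.Thread(target=stage, args=(preprocess, q_in, q_pre)),
        threading.Thread(target=stage, args=(infer, q_pre, q_inf)),
        threading.Thread(target=stage, args=(postprocess, q_inf, q_out)),
    ]
    for w in workers:
        w.start()
    for fid in frame_ids:
        q_in.put(fid)
    q_in.put(_STOP)

    results = []
    while True:
        item = q_out.get()
        if item is _STOP:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results
```

For example, `run_pipeline(range(3))` yields one `(frame_id, score)` tuple per frame, in input order, because each stage is a single FIFO worker. In CPython, the threads only truly overlap when stages release the GIL (I/O, or GPU and C-extension calls), which is the situation a real CPU/GPU pipeline is in.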
Irrelevant.
Example: In a video pose estimation task, the spatial-level optimization might involve designing a convolutional neural network (CNN) that effectively extracts features from each frame. The temporal-level optimization might involve designing a recurrent neural network (RNN) or using temporal convolutional layers to combine features from multiple frames, ensuring that the pose estimation is accurate and consistent over time.
By optimizing both spatial and temporal aspects, the proposed ViPNAS method aims to achieve a better trade-off between accuracy and efficiency, enabling fast and accurate online video pose estimation.
We could learn something about the trade-off between accuracy and efficiency.
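The temporal-level idea above (combining per-frame features with a temporal convolution) can be sketched as follows. This is a hand-written illustration, not the ViPNAS architecture: the kernel weights are hypothetical and would normally be learned. The convolution is causal (it only looks at the current and past frames), which is what an online video setting allows, and frames before the start are padded with the first frame so the output has one vector per input frame.

```python
def temporal_conv1d(frame_features, kernel):
    """Causal 1D convolution over time, applied independently per channel.

    frame_features: list of T per-frame feature vectors (each of length C),
        e.g. the output of a per-frame CNN backbone.
    kernel: list of K weights; kernel[0] weights the current frame and
        kernel[k] weights the frame k steps in the past.
    Returns a list of T fused feature vectors.
    """
    fused = []
    for t in range(len(frame_features)):
        channels = len(frame_features[t])
        out = [0.0] * channels
        for k, weight in enumerate(kernel):
            # Edge padding: before the first frame, reuse frame 0.
            src = frame_features[max(t - k, 0)]
            for c in range(channels):
                out[c] += weight * src[c]
        fused.append(out)
    return fused
```

With `kernel=[0.5, 0.5]`, each output is the average of the current and previous frame's features, a simple smoothing that trades a little responsiveness for temporal consistency. A searched or learned kernel can strike that balance differently per channel.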
Produces a 3D mesh rather than only detecting 2D keypoints, like a paper mentioned earlier, but using a different technique.
Example:
Imagine you have a complex teacher model that has been trained for human pose estimation. This model is accurate but computationally expensive to run. You want to create a smaller, more efficient student model that can be deployed on devices with limited resources (e.g., mobile phones).
The concept of Knowledge Distillation might be useful in this or other tasks.
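As a minimal sketch of how knowledge distillation could look for a heatmap-based pose model: the student is trained against a blend of the ground-truth heatmaps (hard target) and the frozen teacher's predicted heatmaps (soft target). Using mean squared error on heatmaps and a single weight `alpha` are assumptions for illustration. The classic classification formulation instead compares temperature-softened softmax outputs. Heatmaps are flattened to plain lists here to keep the example self-contained.

```python
def mse(a, b):
    """Mean squared error between two equally sized flat lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student, teacher, target, alpha=0.5):
    """Blend a hard loss (vs. ground truth) with a soft loss (vs. teacher).

    student, teacher, target: flattened keypoint heatmaps of equal length.
    alpha: weight on the ground-truth term; (1 - alpha) weights the teacher
        term. Only the student is updated; the teacher stays fixed.
    """
    hard = mse(student, target)
    soft = mse(student, teacher)
    return alpha * hard + (1 - alpha) * soft
```

For instance, `distillation_loss([0.0, 1.0], [0.0, 0.5], [0.0, 1.0])` is `0.0625`: the student matches the ground truth exactly but is still pulled toward the teacher's softer peak. Setting `alpha=1.0` recovers ordinary supervised training.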
These appear to be synthetic datasets generated in randomized settings; we could explore their technique if we need to experiment with our own data.
Synthetic data again.
Worth considering, especially if speed becomes an issue.
This paper raises two concerns about using CNNs to estimate 3D keypoints from 2D depth maps and offers solutions to both.
If we decide to use CNNs, we should check whether the concerns raised here apply to our setup.