The prepare_train_labels.py script processes COCO-style keypoint annotations to prepare them for training a pose estimation model. It converts the raw annotations into a format suitable for the training pipeline, including calculating centers for cropping, filtering invalid annotations, and organizing data for efficient access during training.


How It Works

1. Command-Line Arguments

parser = argparse.ArgumentParser()
parser.add_argument('--labels', type=str, required=True, help='path to json with keypoints train labels')
parser.add_argument('--output-name', type=str, default='prepared_train_annotation.pkl',
                    help='name of output file with prepared keypoints annotation')
parser.add_argument('--net-input-size', type=int, default=368, help='network input size')
args = parser.parse_args()
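For illustration, the parser can be exercised directly with an argument list. The labels path below is a placeholder, and the other two options fall back to their declared defaults:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--labels', type=str, required=True)
parser.add_argument('--output-name', type=str, default='prepared_train_annotation.pkl')
parser.add_argument('--net-input-size', type=int, default=368)

# 'keypoints.json' is a hypothetical path; only --labels is mandatory.
args = parser.parse_args(['--labels', 'keypoints.json'])
print(args.output_name, args.net_input_size)
```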

2. Load the Input JSON

with open(args.labels, 'r') as f:
    data = json.load(f)
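The file is expected to follow the COCO keypoints layout. Below is a minimal sketch of the annotation fields the script reads; the values are toy data, and a real file also carries `images`, `categories`, and more:

```python
import json

# Toy COCO-style payload with only the fields used in the steps below.
toy = json.loads('''
{
  "annotations": [
    {"image_id": 1, "num_keypoints": 5, "iscrowd": 0,
     "keypoints": [120, 80, 2, 130, 85, 2, 0, 0, 0, 140, 90, 1, 150, 95, 2],
     "segmentation": [[110, 70, 160, 70, 160, 120, 110, 120]]}
  ]
}
''')
```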

3. Organize Annotations by Image

annotations_per_image_mapping = {}
for annotation in data['annotations']:
    # Keep only usable person instances: at least one labeled keypoint
    # and not a crowd region.
    if annotation['num_keypoints'] != 0 and not annotation['iscrowd']:
        if annotation['image_id'] not in annotations_per_image_mapping:
            # Slot 0 holds person annotations; slot 1 is reserved for
            # crowd segmentations, which are filled in step 4.
            annotations_per_image_mapping[annotation['image_id']] = [[], []]
        annotations_per_image_mapping[annotation['image_id']][0].append(annotation)
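A toy run of the filter above, using hypothetical annotations for a single image (id 7), shows which records survive:

```python
# Hypothetical annotations: one valid person, one with no keypoints, one crowd region.
annotations = [
    {'image_id': 7, 'num_keypoints': 12, 'iscrowd': 0},  # kept
    {'image_id': 7, 'num_keypoints': 0,  'iscrowd': 0},  # dropped: no labeled keypoints
    {'image_id': 7, 'num_keypoints': 0,  'iscrowd': 1},  # dropped here; handled in step 4
]

annotations_per_image_mapping = {}
for annotation in annotations:
    if annotation['num_keypoints'] != 0 and not annotation['iscrowd']:
        if annotation['image_id'] not in annotations_per_image_mapping:
            annotations_per_image_mapping[annotation['image_id']] = [[], []]
        annotations_per_image_mapping[annotation['image_id']][0].append(annotation)

# Only the first annotation passes the filter; slot 1 stays empty for now.
```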

4. Handle Crowd Segmentations

crowd_segmentations_per_image_mapping = {}
for annotation in data['annotations']:
    if annotation['iscrowd']:
        if annotation['image_id'] not in crowd_segmentations_per_image_mapping:
            crowd_segmentations_per_image_mapping[annotation['image_id']] = []
        crowd_segmentations_per_image_mapping[annotation['image_id']].append(annotation['segmentation'])

for image_id, crowd_segmentations in crowd_segmentations_per_image_mapping.items():
    # Attach crowd masks only to images that also have valid person annotations.
    if image_id in annotations_per_image_mapping:
        annotations_per_image_mapping[image_id][1] = crowd_segmentations
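The two loops above can be traced end to end on toy data. Here image 7 already has a person annotation from step 3, while image 9 has only a crowd region, so its mask is discarded (the image ids and polygons are hypothetical):

```python
# Image 7 has one valid person annotation; image 9 has none.
annotations_per_image_mapping = {
    7: [[{'image_id': 7, 'num_keypoints': 12, 'iscrowd': 0}], []],
}

crowd_annotations = [
    {'image_id': 7, 'iscrowd': 1, 'segmentation': [[10, 10, 50, 10, 50, 50]]},
    {'image_id': 9, 'iscrowd': 1, 'segmentation': [[0, 0, 5, 0, 5, 5]]},
]

crowd_segmentations_per_image_mapping = {}
for annotation in crowd_annotations:
    if annotation['iscrowd']:
        if annotation['image_id'] not in crowd_segmentations_per_image_mapping:
            crowd_segmentations_per_image_mapping[annotation['image_id']] = []
        crowd_segmentations_per_image_mapping[annotation['image_id']].append(annotation['segmentation'])

for image_id, crowd_segmentations in crowd_segmentations_per_image_mapping.items():
    if image_id in annotations_per_image_mapping:
        annotations_per_image_mapping[image_id][1] = crowd_segmentations

# Image 7 gains its crowd mask; image 9's mask is dropped since it has no person annotations.
```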