The prepare_train_labels.py script processes COCO-style keypoint annotations to prepare them for training a pose estimation model. It converts the raw annotations into a format suitable for the training pipeline, including calculating centers for cropping, filtering invalid annotations, and organizing data for efficient access during training.


How It Works

1. Command-Line Arguments

parser = argparse.ArgumentParser()
parser.add_argument('--labels', type=str, required=True, help='path to json with keypoints train labels')
parser.add_argument('--output-name', type=str, default='prepared_train_annotation.pkl',
                    help='name of output file with prepared keypoints annotation')
parser.add_argument('--net-input-size', type=int, default=368, help='network input size')
args = parser.parse_args()
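For illustration, the parser can be exercised directly with an argument list. The labels path below is a placeholder, and the other two options fall back to their declared defaults:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--labels', type=str, required=True)
parser.add_argument('--output-name', type=str, default='prepared_train_annotation.pkl')
parser.add_argument('--net-input-size', type=int, default=368)

# 'keypoints.json' is a hypothetical path; only --labels is mandatory.
args = parser.parse_args(['--labels', 'keypoints.json'])
print(args.output_name, args.net_input_size)
```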

2. Load the Input JSON

with open(args.labels, 'r') as f:
    data = json.load(f)
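The file is expected to follow the COCO keypoints layout. Below is a minimal sketch of the annotation fields the script reads; the values are toy data, and a real file also carries `images`, `categories`, and more:

```python
import json

# Toy COCO-style payload with only the fields used in the steps below.
toy = json.loads('''
{
  "annotations": [
    {"image_id": 1, "num_keypoints": 5, "iscrowd": 0,
     "keypoints": [120, 80, 2, 130, 85, 2, 0, 0, 0, 140, 90, 1, 150, 95, 2],
     "segmentation": [[110, 70, 160, 70, 160, 120, 110, 120]]}
  ]
}
''')
```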

3. Organize Annotations by Image

annotations_per_image_mapping = {}
for annotation in data['annotations']:
    # Keep only usable person instances: at least one labeled keypoint
    # and not a crowd region.
    if annotation['num_keypoints'] != 0 and not annotation['iscrowd']:
        if annotation['image_id'] not in annotations_per_image_mapping:
            # Slot 0 holds person annotations; slot 1 is reserved for
            # crowd segmentations, which are filled in step 4.
            annotations_per_image_mapping[annotation['image_id']] = [[], []]
        annotations_per_image_mapping[annotation['image_id']][0].append(annotation)
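A toy run of the filter above, using hypothetical annotations for a single image (id 7), shows which records survive:

```python
# Hypothetical annotations: one valid person, one with no keypoints, one crowd region.
annotations = [
    {'image_id': 7, 'num_keypoints': 12, 'iscrowd': 0},  # kept
    {'image_id': 7, 'num_keypoints': 0,  'iscrowd': 0},  # dropped: no labeled keypoints
    {'image_id': 7, 'num_keypoints': 0,  'iscrowd': 1},  # dropped here; handled in step 4
]

annotations_per_image_mapping = {}
for annotation in annotations:
    if annotation['num_keypoints'] != 0 and not annotation['iscrowd']:
        if annotation['image_id'] not in annotations_per_image_mapping:
            annotations_per_image_mapping[annotation['image_id']] = [[], []]
        annotations_per_image_mapping[annotation['image_id']][0].append(annotation)

# Only the first annotation passes the filter; slot 1 stays empty for now.
```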

4. Handle Crowd Segmentations

crowd_segmentations_per_image_mapping = {}
for annotation in data['annotations']:
    if annotation['iscrowd']:
        if annotation['image_id'] not in crowd_segmentations_per_image_mapping:
            crowd_segmentations_per_image_mapping[annotation['image_id']] = []
        crowd_segmentations_per_image_mapping[annotation['image_id']].append(annotation['segmentation'])

for image_id, crowd_segmentations in crowd_segmentations_per_image_mapping.items():
    # Attach crowd masks only to images that also have valid person annotations.
    if image_id in annotations_per_image_mapping:
        annotations_per_image_mapping[image_id][1] = crowd_segmentations
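The two loops above can be traced end to end on toy data. Here image 7 already has a person annotation from step 3, while image 9 has only a crowd region, so its mask is discarded (the image ids and polygons are hypothetical):

```python
# Image 7 has one valid person annotation; image 9 has none.
annotations_per_image_mapping = {
    7: [[{'image_id': 7, 'num_keypoints': 12, 'iscrowd': 0}], []],
}

crowd_annotations = [
    {'image_id': 7, 'iscrowd': 1, 'segmentation': [[10, 10, 50, 10, 50, 50]]},
    {'image_id': 9, 'iscrowd': 1, 'segmentation': [[0, 0, 5, 0, 5, 5]]},
]

crowd_segmentations_per_image_mapping = {}
for annotation in crowd_annotations:
    if annotation['iscrowd']:
        if annotation['image_id'] not in crowd_segmentations_per_image_mapping:
            crowd_segmentations_per_image_mapping[annotation['image_id']] = []
        crowd_segmentations_per_image_mapping[annotation['image_id']].append(annotation['segmentation'])

for image_id, crowd_segmentations in crowd_segmentations_per_image_mapping.items():
    if image_id in annotations_per_image_mapping:
        annotations_per_image_mapping[image_id][1] = crowd_segmentations

# Image 7 gains its crowd mask; image 9's mask is dropped since it has no person annotations.
```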