The make_val_subset.py script is a utility for creating a subset of validation data from a larger dataset. It takes a JSON file containing validation labels (e.g., COCO-style annotations) and extracts a specified number of images and their corresponding annotations to create a smaller dataset. This is useful for testing or debugging on a smaller, more manageable dataset.
```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--labels', type=str, required=True, help='path to json with keypoints val labels')
parser.add_argument('--output-name', type=str, default='val_subset.json',
                    help='name of output file with subset of val labels')
parser.add_argument('--num-images', type=int, default=250, help='number of images in subset')
args = parser.parse_args()
```
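The flags above can be exercised without touching the command line by handing `parse_args` an explicit argv list. A small illustrative sketch (the filename `val_labels.json` is made up):

```python
import argparse

# Rebuild the same parser to show how the flags behave (illustrative only).
parser = argparse.ArgumentParser()
parser.add_argument('--labels', type=str, required=True)
parser.add_argument('--output-name', type=str, default='val_subset.json')
parser.add_argument('--num-images', type=int, default=250)

# Parsing an explicit argv list instead of sys.argv is handy for testing.
args = parser.parse_args(['--labels', 'val_labels.json', '--num-images', '100'])
print(args.labels)       # val_labels.json
print(args.output_name)  # val_subset.json (the default applies)
print(args.num_images)   # 100 (parsed as int, not str)
```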
The script takes three command-line arguments:

- `--labels`: path to the input JSON file containing validation labels (required).
- `--output-name`: name of the output JSON file for the subset (default: `val_subset.json`).
- `--num-images`: number of images to include in the subset (default: 250).

```python
import json

with open(args.labels, 'r') as f:
    data = json.load(f)
```
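The loaded `data` is expected to be a COCO-style dict holding at least `images` and `annotations` lists. A toy example of the shape this script relies on (all field values are made up):

```python
import json

# Minimal COCO-style structure with only the fields this script touches.
toy = {
    'images': [
        {'id': 42, 'file_name': '000000000042.jpg'},
        {'id': 73, 'file_name': '000000000073.jpg'},
    ],
    'annotations': [
        {'id': 1, 'image_id': 42, 'keypoints': []},
        {'id': 2, 'image_id': 42, 'keypoints': []},
        {'id': 3, 'image_id': 73, 'keypoints': []},
    ],
}

# Round-trip through JSON, as the script does with its input file.
data = json.loads(json.dumps(toy))
print(len(data['images']))                 # 2
print(data['annotations'][0]['image_id'])  # 42
```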
This opens the file given by `--labels` and loads its contents into the `data` dictionary.

```python
import random

random.seed(0)
total_val_images = 5000
idxs = list(range(total_val_images))
random.shuffle(idxs)

images_by_id = {}
for idx in idxs[:args.num_images]:
    images_by_id[data['images'][idx]['id']] = data['images'][idx]
```
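Because the seed is fixed, the same indices are selected on every run. A quick check of that property, mirroring the script's seed-shuffle-take pattern (the helper name `pick_indices` is mine, not the script's):

```python
import random

def pick_indices(total, k, seed=0):
    # Mirror the script's selection: seed, shuffle the full index range, take the first k.
    random.seed(seed)
    idxs = list(range(total))
    random.shuffle(idxs)
    return idxs[:k]

first = pick_indices(5000, 5)
second = pick_indices(5000, 5)
print(first == second)  # True: the fixed seed makes the selection deterministic
```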
This block selects a random subset of images:

- A fixed random seed (`0`) is set to ensure reproducibility.
- It assumes a total of 5000 validation images (`total_val_images`), the size of the COCO val2017 split.
- It shuffles the list of indices (`idxs`) and keeps the first `num_images` of them.
- It builds a dictionary, `images_by_id`, storing the selected images indexed by their unique `id`.

```python
annotations_by_image_id = {}
for annotation in data['annotations']:
    if annotation['image_id'] in images_by_id:
        if annotation['image_id'] not in annotations_by_image_id:
            annotations_by_image_id[annotation['image_id']] = []
        annotations_by_image_id[annotation['image_id']].append(annotation)
```
This groups the annotations by image, keeping only those whose `image_id` matches one of the selected images.
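The code shown above stops before the subset is written out. A minimal sketch of the remaining step, assuming the script simply replaces the `images` and `annotations` lists and dumps the result to `--output-name` (the real script's field handling may differ; the toy values are made up):

```python
import json

# Toy stand-ins for the dictionaries built above.
images_by_id = {42: {'id': 42}, 73: {'id': 73}}
annotations_by_image_id = {
    42: [{'id': 1, 'image_id': 42}, {'id': 2, 'image_id': 42}],
    73: [{'id': 3, 'image_id': 73}],
}
data = {'images': [], 'annotations': []}

# Replace the full lists with the selected subset.
data['images'] = list(images_by_id.values())
data['annotations'] = [ann for anns in annotations_by_image_id.values() for ann in anns]

# Write the subset; 'val_subset.json' stands in for args.output_name.
with open('val_subset.json', 'w') as f:
    json.dump(data, f)

print(len(data['images']), len(data['annotations']))  # 2 3
```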