Quantitative Evaluations: 2D Pose Estimation
This guide outlines the process to evaluate Sapiens-Pose checkpoints on two datasets.\
- COCO-WholeBody: 133 keypoints (17 kps body, 6 kps feet, 68 kps face, 42 kps hands).
- COCO: 17 keypoints
π 1. Data Preparation
- Set
$DATA_ROOTas your training data root directory. - Download the
val2017images and 17 kps annotations from COCO. - Download the 133 kps annotations from COCO-WholeBody.
- Unzip the images and annotations as subfolders to
$DATA_ROOT. - Additionally, download the bounding-box detection on the
val2017set from COCO_val2017_detections_AP_H_70_person.json and place it under$DATA_ROOT/person_detection_results.
The data directory structure is as follows:
$DATA_ROOT/
β βββ val2017
β β βββ 000000000139.jpg
β β βββ 000000000285.jpg
β β βββ 000000000632.jpg
β βββ annotations
β β βββ person_keypoints_train2017.json
β β βββ person_keypoints_val2017.json
β β βββ coco_wholebody_train_v1.0.json
β β βββ coco_wholebody_val_v1.0.json
β βββ person_detection_results
β β βββ COCO_val2017_detections_AP_H_70_person.json
βοΈ 2. Configuration Update
Let $DATASET be either coco-wholebody for 133 kps or coco for 17 kps.
Edit $SAPIENS_ROOT/pose/configs/sapiens_pose/$DATASET/sapiens_1b-210e_$DATASET-1024x768.py:
- Update
val_dataloader.dataset.data_rootto your$DATA_ROOT. eg.data/coco. - Update
val_evaluator.ann_fileto also point to validation annotation file under$DATA_ROOT. - Update
bbox_fileto point to the bounding box detection file under$DATA_ROOT.
ποΈ 3. Evaluation
The following guide is for Sapiens-1B. You can find other backbones to evaluate under pose_configs_133 and pose_configs_17.
The testing scripts are under: $SAPIENS_ROOT/pose/scripts/test/$DATASET/sapiens_1b
Make sure you have activated the sapiens python conda environment.
A. π Single-node Testing
Use $SAPIENS_ROOT/pose/scripts/test/$DATASET/sapiens_1b/node.sh.
Key variables:
CHECKPOINT: Absolute path to your checkpointDEVICES: GPU IDs (e.g., "0,1,2,3,4,5,6,7")TEST_BATCH_SIZE_PER_GPU: Default 32OUTPUT_DIR: Checkpoint and log directorymode=multi-gpu: Launch multi-gpu testing with multiple workers for dataloading.mode=debug: (Optional) To debug. Launched single gpu dry run, with single worker for dataloading. Supports interactive debugging with pdb/ipdb.
Launch:
cd $SAPIENS_ROOT/pose/scripts/test/$DATASET/sapiens_1b
./node.sh
B. π Multi-node Testing (Slurm)
Use $SAPIENS_ROOT/pose/scripts/test/$DATASET/sapiens_1b/slurm.sh
Additional variables:
CONDA_ENV: Path to conda environmentNUM_NODES: Number of nodes (default 4, 8 GPUs per node)
Launch:
cd $SAPIENS_ROOT/pose/scripts/test/$DATASET/sapiens_1b
./slurm.sh
π 4. Results
Sapiens achieve state-of-the-art results for keypoint estimation on both datasets. Below we compare them with existing methods.
COCO-WholeBody - 133 Keypoints
| Model | Input Size | Body AP | Body AR | Feet AP | Feet AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | Config | Ckpt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepPose | 384 Γ 288 | 44.4 | 56.8 | 36.8 | 53.7 | 49.3 | 66.3 | 23.5 | 41.0 | 33.5 | 48.4 | - | - |
| SimpleBaseline | 384 Γ 288 | 66.6 | 74.7 | 63.5 | 76.3 | 73.2 | 81.2 | 53.7 | 64.7 | 57.3 | 67.1 | - | - |
| HRNet | 384 Γ 288 | 70.1 | 77.3 | 58.6 | 69.2 | 72.7 | 78.3 | 51.6 | 60.4 | 58.6 | 67.4 | - | - |
| ZoomNAS | 384 Γ 288 | 74.0 | 80.7 | 61.7 | 71.8 | 88.9 | 93.0 | 62.5 | 74.0 | 65.4 | 74.4 | - | - |
| VitPose+-L | 256 Γ 192 | 75.3 | - | 77.1 | - | 63.0 | - | 54.2 | - | 60.6 | - | - | - |
| VitPose+-H | 256 Γ 192 | 75.9 | - | 77.9 | - | 63.6 | - | 54.7 | - | 61.2 | - | - | - |
| RTMPose-x | 384 Γ 288 | 71.4 | 78.4 | 69.2 | 81.0 | 88.8 | 92.2 | 59.0 | 68.5 | 65.3 | 73.3 | - | - |
| DWPose-m | 256 Γ 192 | 68.5 | 76.1 | 63.6 | 77.2 | 82.8 | 88.1 | 52.7 | 63.4 | 60.6 | 69.5 | - | - |
| DWPose-l | 384 Γ 288 | 72.2 | 78.9 | 70.4 | 81.7 | 88.7 | 92.1 | 62.1 | 71.0 | 66.5 | 74.3 | - | - |
| Sapiens-0.3B (Ours) | 1024 Γ 768 | 66.4 | 73.4 | 67.3 | 78.4 | 87.1 | 91.2 | 58.1 | 67.1 | 62.0 | 69.4 | config | ckpt |
| Sapiens-0.6B (Ours) | 1024 Γ 768 | 74.3 | 80.2 | 79.4 | 87.0 | 89.5 | 92.9 | 65.4 | 74.0 | 69.5 (+3.0) | 76.3 (+2.0) | config | ckpt |
| Sapiens-1B (Ours) | 1024 Γ 768 | 77.4 | 82.9 | 83.0 | 89.8 | 90.7 | 93.6 | 69.2 | 77.1 | 72.7 (+6.2) | 79.2 (+4.9) | config | ckpt |
| Sapiens-2B (Ours) | 1024 Γ 768 | 79.2 | 84.6 | 84.1 | 90.9 | 91.2 | 93.8 | 70.4 | 78.1 | 74.4 (+7.9) | 81.0 (+6.7) | config | ckpt |
COCO - 17 Keypoints
| Model | Input Size | AP | AP-50 | AP-75 | AP-M | AP-L | AR | AR-50 | AR-75 | AR-M | AR-L | Config | Ckpt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SimpleBaseline | 256 Γ 192 | 73.5 | - | - | 69.9 | 80.2 | 79.0 | - | - | - | - | - | - |
| HRNet | 384 Γ 288 | 76.3 | - | - | 72.3 | 83.4 | 81.2 | - | - | - | - | - | - |
| UDP | 384 Γ 288 | 77.2 | - | - | 73.2 | 84.4 | 82.0 | - | - | - | - | - | - |
| FastPose | 256 Γ 192 | 73.3 | - | - | - | - | - | - | - | - | - | - | - |
| HRFormer | 256 Γ 192 | 77.2 | - | - | 73.2 | 84.2 | 82.0 | - | - | - | - | - | - |
| VitPose-S | 256 Γ 192 | 73.8 | - | - | 70.5 | 80.4 | 79.2 | - | - | - | - | - | - |
| VitPose-B | 256 Γ 192 | 75.8 | - | - | 72.1 | 82.2 | 81.1 | - | - | - | - | - | - |
| VitPose-L | 256 Γ 192 | 78.3 | - | - | 74.5 | 85.4 | 83.5 | - | - | - | - | - | - |
| VitPose-H | 256 Γ 192 | 79.1 | - | - | 75.3 | 86.0 | 84.1 | - | - | - | - | - | - |
| VitPose++-S | 256 Γ 192 | 75.8 | - | - | 72.3 | 82.6 | 81.0 | - | - | - | - | - | - |
| VitPose++-B | 256 Γ 192 | 77.0 | - | - | 73.4 | 84.0 | 82.6 | - | - | - | - | - | - |
| VitPose++-L | 256 Γ 192 | 78.6 | - | - | 75.2 | 85.6 | 84.1 | - | - | - | - | - | - |
| VitPose++-H | 256 Γ 192 | 79.4 | - | - | 75.8 | 86.5 | 84.8 | - | - | - | - | - | - |
| Sapiens-0.3B (Ours) | 1024 Γ 768 | 79.6 (+0.2) | 93.0 | 85.7 | 76.0 | 85.6 | 83.6 | 95.6 | 89.0 | 79.9 | 89.1 | config | ckpt |
| Sapiens-0.6B (Ours) | 1024 Γ 768 | 81.2 (+1.8) | 93.8 | 87.3 | 77.6 | 87.2 | 84.9 | 96.0 | 90.4 | 81.3 | 90.3 | config | ckpt |
| Sapiens-1B (Ours) | 1024 Γ 768 | 82.1 (+2.7) | 94.2 | 88.2 | 78.4 | 88.3 | 85.9 | 96.6 | 91.3 | 82.1 | 91.4 | config | ckpt |
| Sapiens-2B (Ours) | 1024 Γ 768 | 82.2 (+2.8) | 94.1 | 88.1 | 78.5 | 88.4 | 86.0 | 96.6 | 91.2 | 82.2 | 91.5 | config | ckpt |