# Lightweight Human Pose Estimation
Real-time multi-person 2D pose estimation using a MobileNet backbone with Part Affinity Fields (PAF). Includes a TSA security screening demo that detects persons in a defined zone and checks whether their hands are raised.
## Overview
The model detects 18 keypoints per person:
| Index | Keypoint | Index | Keypoint |
|---|---|---|---|
| 0 | nose | 9 | r_knee |
| 1 | neck | 10 | r_ank |
| 2 | r_sho | 11 | l_hip |
| 3 | r_elb | 12 | l_knee |
| 4 | r_wri | 13 | l_ank |
| 5 | l_sho | 14 | r_eye |
| 6 | l_elb | 15 | l_eye |
| 7 | l_wri | 16 | r_ear |
| 8 | r_hip | 17 | l_ear |
Keypoints are grouped into person instances using PAF vectors (19 connection pairs), allowing robust multi-person detection.
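The grouping idea can be sketched as follows. This is a hedged illustration, not the repo's actual code (`SKELETON_PAIRS` and `paf_score` are illustrative names): each connection has a 2D PAF vector field, and a candidate limb between two detected keypoints is scored by how well the field aligns with the segment joining them.

```python
import numpy as np

# Hypothetical subset of the 19 connections; indices follow the keypoint
# table above: (neck, r_sho), (r_sho, r_elb), (r_elb, r_wri).
SKELETON_PAIRS = [(1, 2), (2, 3), (3, 4)]

def paf_score(paf_x, paf_y, p_a, p_b, num_samples=10):
    """Score a candidate limb by averaging the projection of the PAF onto
    the unit vector from p_a to p_b, sampled along the segment
    (higher = more likely a real limb)."""
    p_a, p_b = np.asarray(p_a, dtype=float), np.asarray(p_b, dtype=float)
    vec = p_b - p_a
    norm = np.linalg.norm(vec)
    if norm < 1e-6:
        return 0.0
    unit = vec / norm
    scores = []
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = (p_a + t * vec).astype(int)
        scores.append(paf_x[y, x] * unit[0] + paf_y[y, x] * unit[1])
    return float(np.mean(scores))

# Toy PAF pointing right everywhere: a left-to-right limb scores ~1,
# the reversed pairing scores ~-1, so greedy matching picks the right one.
h, w = 32, 32
paf_x, paf_y = np.ones((h, w)), np.zeros((h, w))
print(round(paf_score(paf_x, paf_y, (2, 10), (20, 10)), 2))
```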
## Project Structure

```
pose/
├── demo.py                      # Inference demo with TSA screening logic
├── train.py                     # Training script
├── val.py                       # COCO validation script
├── requirements.txt
├── models/
│   ├── with_mobilenet.py        # PoseEstimationWithMobileNet architecture
│   └── checkpoint_iter_370000.pth  # Pretrained checkpoint
├── modules/
│   ├── keypoints.py             # Keypoint extraction and grouping (PAF)
│   ├── pose.py                  # Pose class with 18 keypoints, tracking
│   ├── conv.py                  # Conv building blocks
│   ├── loss.py                  # L2 loss
│   ├── load_state.py            # Checkpoint loading utilities
│   ├── get_parameters.py        # Parameter group helpers
│   └── one_euro_filter.py       # Temporal smoothing filter
├── datasets/
│   ├── coco.py                  # COCO train/val dataset loaders
│   └── transformations.py       # Data augmentation pipeline
├── scripts/
│   ├── prepare_train_labels.py  # Convert COCO JSON to internal format
│   ├── make_val_subset.py       # Create validation subset
│   └── convert_to_onnx.py       # Export model to ONNX
└── TRAIN-ON-CUSTOM-DATASET.md   # Guide for custom dataset training
```
## Installation

```
pip install -r requirements.txt
```

Requirements: `torch>=0.4.1`, `torchvision>=0.2.1`, `opencv-python>=3.4.0.14`, `numpy>=1.14.0`, `pycocotools==2.0.0`, `shapely` (for demo zone detection).
## Running the Demo
The demo runs inference on images or video and overlays detected poses. It includes TSA screening logic: it checks whether a person's hips are inside a configurable polygon zone and whether their wrists are raised above their shoulders.
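The screening check can be sketched with `shapely` like this. The zone polygon, coordinates, and helper names here are illustrative assumptions; the real polygon lives in `demo.py`:

```python
from shapely.geometry import Point, Polygon

# Hypothetical screening zone in pixel coordinates (the actual polygon is
# defined in demo.py and should match your camera setup).
ZONE = Polygon([(100, 100), (500, 100), (500, 400), (100, 400)])

# Keypoint indices from the table above: r_sho=2, r_wri=4, l_sho=5,
# l_wri=7, r_hip=8, l_hip=11. Image y grows downward, so "raised above
# the shoulder" means a smaller y coordinate.
def in_zone(kpts):
    return any(ZONE.contains(Point(*kpts[i])) for i in (8, 11))

def hands_raised(kpts):
    return kpts[4][1] < kpts[2][1] and kpts[7][1] < kpts[5][1]

# Toy pose: hips inside the zone, both wrists above the shoulders.
kpts = [(-1, -1)] * 18
kpts[2], kpts[5] = (200, 150), (260, 150)   # shoulders
kpts[4], kpts[7] = (190, 120), (270, 120)   # wrists (raised)
kpts[8], kpts[11] = (210, 300), (250, 300)  # hips (inside ZONE)
print(in_zone(kpts), hands_raised(kpts))
```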
On a video file:

```
python demo.py --checkpoint-path models/checkpoint_iter_370000.pth --video path/to/video.mp4
```

On images:

```
python demo.py --checkpoint-path models/checkpoint_iter_370000.pth --images path/to/image.jpg
```

CPU-only:

```
python demo.py --checkpoint-path models/checkpoint_iter_370000.pth --video path/to/video.mp4 --cpu
```
| Argument | Default | Description |
|---|---|---|
| `--checkpoint-path` | required | Path to `.pth` checkpoint |
| `--height-size` | 256 | Network input height |
| `--video` | — | Path to video file or webcam id |
| `--images` | — | One or more image paths |
| `--cpu` | false | Run on CPU |
| `--track` | 1 | Enable pose ID tracking across frames |
| `--smooth` | 1 | Apply One Euro filter smoothing to keypoints |
Output is saved to `test.mp4`. The screening zone polygon is defined in `demo.py` and can be adjusted for your camera setup.
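The `--smooth` option applies the One Euro filter from `modules/one_euro_filter.py`. A minimal sketch of the idea, with conventional parameter names and defaults not copied from the repo: a low-pass filter whose cutoff adapts to speed, so slow keypoints are smoothed heavily (less jitter) while fast ones track closely (less lag).

```python
import math

class OneEuro:
    """Minimal One Euro filter (Casiez et al.) for one scalar signal."""

    def __init__(self, freq=30.0, min_cutoff=1.0, beta=0.007, d_cutoff=1.0):
        self.freq = freq              # expected sampling rate (Hz)
        self.min_cutoff = min_cutoff  # base smoothing cutoff
        self.beta = beta              # speed coefficient: higher = less lag
        self.d_cutoff = d_cutoff      # cutoff for the derivative estimate
        self.x_prev = None
        self.dx_prev = 0.0

    @staticmethod
    def _alpha(cutoff, freq):
        # Exponential-smoothing factor for a given cutoff frequency.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau * freq)

    def __call__(self, x):
        if self.x_prev is None:       # first sample passes through
            self.x_prev = x
            return x
        dx = (x - self.x_prev) * self.freq
        a_d = self._alpha(self.d_cutoff, self.freq)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Cutoff grows with speed: smooth when slow, responsive when fast.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, self.freq)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat
```

In the demo this would be applied per keypoint coordinate, one filter instance per coordinate per tracked pose.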
## Validation

Evaluate against COCO keypoints annotations:

```
python val.py \
    --labels path/to/val_labels.json \
    --images-folder path/to/val2017 \
    --checkpoint-path models/checkpoint_iter_370000.pth \
    --output-name detections.json
```
## Training

1. Prepare labels (converts COCO annotation JSON to the internal format):

   ```
   python scripts/prepare_train_labels.py --labels path/to/annotations.json
   ```

2. Create a validation subset (optional):

   ```
   python scripts/make_val_subset.py --labels path/to/val_labels.json
   ```

3. Start training:

   ```
   python train.py \
       --prepared-train-labels prepared_train_labels.pkl \
       --train-images-folder path/to/train2017 \
       --val-labels path/to/val_labels.json \
       --val-images-folder path/to/val2017 \
       --checkpoint-path models/checkpoint_iter_370000.pth \
       --experiment-name my_experiment
   ```
Key training arguments:

| Argument | Default | Description |
|---|---|---|
| `--num-refinement-stages` | 1 | Number of PAF refinement stages |
| `--base-lr` | 4e-5 | Initial learning rate |
| `--batch-size` | 80 | Batch size |
| `--from-mobilenet` | false | Initialize from MobileNet weights only |
| `--weights-only` | false | Load weights but reset optimizer/scheduler |
| `--checkpoint-after` | 5000 | Save checkpoint every N iterations |
| `--val-after` | 5000 | Run validation every N iterations |
Checkpoints are saved to `<experiment-name>_checkpoints/`.
## Export to ONNX

```
python scripts/convert_to_onnx.py \
    --checkpoint-path models/checkpoint_iter_370000.pth \
    --output-name human-pose-estimation.onnx
```
The exported model expects input shape `[1, 3, 256, 456]` and produces four outputs: heatmaps and PAFs for each refinement stage.
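As a sketch, a frame can be packed into that fixed input shape like this. The mean/scale normalization values are assumptions based on common settings for this model family, not read from the repo:

```python
import numpy as np

def preprocess(bgr_frame, target_h=256, target_w=456,
               mean=128.0, scale=1.0 / 256.0):
    """Normalize a frame and pad/crop its width to the fixed ONNX input
    shape, returning an NCHW float32 blob. The frame height is assumed
    to already be resized to target_h."""
    h, w, _ = bgr_frame.shape
    img = (bgr_frame.astype(np.float32) - mean) * scale
    padded = np.zeros((target_h, target_w, 3), dtype=np.float32)
    padded[:, :min(w, target_w)] = img[:target_h, :target_w]
    # HWC -> CHW, then add the batch dimension.
    return np.transpose(padded, (2, 0, 1))[np.newaxis]

# Toy frame narrower than the target width; it gets zero-padded on the right.
frame = np.random.randint(0, 256, (256, 340, 3), dtype=np.uint8)
blob = preprocess(frame)
print(blob.shape)
```

The resulting blob can then be fed to an `onnxruntime` `InferenceSession` via `session.run(None, {input_name: blob})`.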
## Custom Dataset Training

See TRAIN-ON-CUSTOM-DATASET.md for a full walkthrough covering:

- Dataset annotation format (COCO JSON)
- How to define keypoint pairs (`BODY_PARTS_KPT_IDS`) and PAF channel indices (`BODY_PARTS_PAF_IDS`)
- Code modifications required for a different number of keypoints or skeleton topology
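To illustrate the shape of those constants, here is a hypothetical 5-keypoint skeleton; these are not the repo's real 18-keypoint values, which live in `modules/keypoints.py` and `modules/pose.py`:

```python
# Hypothetical skeleton: 0=head, 1=neck, 2=torso, 3=r_arm, 4=l_arm.
# Each entry is a pair of keypoint indices forming one connection.
BODY_PARTS_KPT_IDS = [(0, 1), (1, 2), (1, 3), (1, 4)]

# Each connection consumes two PAF channels: the x and y components of
# its vector field, laid out consecutively in the network output.
BODY_PARTS_PAF_IDS = [(0, 1), (2, 3), (4, 5), (6, 7)]

# Sanity check: exactly one PAF channel pair per skeleton connection.
assert len(BODY_PARTS_KPT_IDS) == len(BODY_PARTS_PAF_IDS)
print(len(BODY_PARTS_KPT_IDS))
```

When changing the number of keypoints, both lists (and the network's output channel counts) must be updated together.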
## Model Tree

Sharath33/POSE is based on camenduru/openpose.