
Lightweight Human Pose Estimation

Real-time multi-person 2D pose estimation using a MobileNet backbone with Part Affinity Fields (PAF). Includes a TSA security screening demo that detects persons in a defined zone and checks whether their hands are raised.

Overview

The model detects 18 keypoints per person:

Index  Keypoint   Index  Keypoint
0      nose       9      r_knee
1      neck       10     r_ank
2      r_sho      11     l_hip
3      r_elb      12     l_knee
4      r_wri      13     l_ank
5      l_sho      14     r_eye
6      l_elb      15     l_eye
7      l_wri      16     r_ear
8      r_hip      17     l_ear
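
In code, these indices correspond to an index-to-name list. The sketch below mirrors the keypoint naming used by the `Pose` class in modules/pose.py (verify the exact list against your checkout):

```python
# Keypoint index constants matching the table above (index -> name).
KPT_NAMES = ['nose', 'neck', 'r_sho', 'r_elb', 'r_wri', 'l_sho', 'l_elb',
             'l_wri', 'r_hip', 'r_knee', 'r_ank', 'l_hip', 'l_knee', 'l_ank',
             'r_eye', 'l_eye', 'r_ear', 'l_ear']
```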

Keypoints are grouped into person instances using PAF vectors (19 connection pairs), allowing robust multi-person detection.
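
As a rough illustration of how a PAF scores one candidate connection (a simplified sketch, not the exact logic in modules/keypoints.py): the field is sampled at points along the segment between two candidate keypoints and dotted with the segment's unit vector, so well-aligned limbs score near 1.

```python
import numpy as np

def paf_score(paf_x, paf_y, kpt_a, kpt_b, num_samples=10):
    """Average alignment between the PAF and the unit vector a -> b.

    paf_x, paf_y: 2D arrays holding the x/y components of one PAF channel.
    kpt_a, kpt_b: (x, y) candidate keypoint coordinates.
    """
    a = np.asarray(kpt_a, dtype=float)
    b = np.asarray(kpt_b, dtype=float)
    vec = b - a
    norm = np.linalg.norm(vec)
    if norm < 1e-6:
        return 0.0
    unit = vec / norm
    scores = []
    for t in np.linspace(0.0, 1.0, num_samples):
        # Sample the field at an integer pixel along the segment.
        x, y = (a + t * vec).round().astype(int)
        scores.append(paf_x[y, x] * unit[0] + paf_y[y, x] * unit[1])
    return float(np.mean(scores))
```

A greedy assignment over these scores then links keypoint candidates into person instances.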

Project Structure

pose/
├── demo.py                        # Inference demo with TSA screening logic
├── train.py                       # Training script
├── val.py                         # COCO validation script
├── requirements.txt
├── models/
│   ├── with_mobilenet.py          # PoseEstimationWithMobileNet architecture
│   └── checkpoint_iter_370000.pth # Pretrained checkpoint
├── modules/
│   ├── keypoints.py               # Keypoint extraction and grouping (PAF)
│   ├── pose.py                    # Pose class with 18 keypoints, tracking
│   ├── conv.py                    # Conv building blocks
│   ├── loss.py                    # L2 loss
│   ├── load_state.py              # Checkpoint loading utilities
│   ├── get_parameters.py          # Parameter group helpers
│   └── one_euro_filter.py         # Temporal smoothing filter
├── datasets/
│   ├── coco.py                    # COCO train/val dataset loaders
│   └── transformations.py         # Data augmentation pipeline
├── scripts/
│   ├── prepare_train_labels.py    # Convert COCO JSON to internal format
│   ├── make_val_subset.py         # Create validation subset
│   └── convert_to_onnx.py         # Export model to ONNX
└── TRAIN-ON-CUSTOM-DATASET.md     # Guide for custom dataset training

Installation

pip install -r requirements.txt

Requirements: torch>=0.4.1, torchvision>=0.2.1, opencv-python>=3.4.0.14, numpy>=1.14.0, pycocotools==2.0.0, shapely (for demo zone detection).

Running the Demo

The demo runs inference on images or video and overlays detected poses. It includes TSA screening logic: it checks whether a person's hips are inside a configurable polygon zone and whether their wrists are raised above their shoulders.
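
The two checks can be sketched as below. This is illustrative, not the demo's exact code: the polygon coordinates and the both-hands requirement are assumptions, keypoint indices follow the table in the Overview, and undetected keypoints are assumed to be (-1, -1).

```python
from shapely.geometry import Point, Polygon

# Hypothetical screening zone in pixel coordinates; adjust per camera setup.
SCREENING_ZONE = Polygon([(100, 200), (500, 200), (500, 470), (100, 470)])

def in_zone(keypoints, zone=SCREENING_ZONE):
    """Person is in the zone if either detected hip lies inside the polygon."""
    for idx in (8, 11):  # r_hip, l_hip
        x, y = keypoints[idx]
        if x >= 0 and zone.contains(Point(x, y)):
            return True
    return False

def hands_raised(keypoints):
    """Both wrists above their shoulders (smaller y = higher in the image)."""
    r_ok = keypoints[4][1] >= 0 and keypoints[4][1] < keypoints[2][1]
    l_ok = keypoints[7][1] >= 0 and keypoints[7][1] < keypoints[5][1]
    return r_ok and l_ok
```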

On a video file:

python demo.py --checkpoint-path models/checkpoint_iter_370000.pth --video path/to/video.mp4

On images:

python demo.py --checkpoint-path models/checkpoint_iter_370000.pth --images path/to/image.jpg

CPU-only:

python demo.py --checkpoint-path models/checkpoint_iter_370000.pth --video path/to/video.mp4 --cpu

Argument           Default   Description
--checkpoint-path  required  Path to .pth checkpoint
--height-size      256       Network input height
--video            none      Path to video file or webcam id
--images           none      One or more image paths
--cpu              false     Run on CPU
--track            1         Enable pose ID tracking across frames
--smooth           1         Apply One Euro filter smoothing to keypoints

Output is saved to test.mp4. The screening zone polygon is defined in demo.py and can be adjusted for your camera setup.

Validation

Evaluate against COCO keypoints annotations:

python val.py \
  --labels path/to/val_labels.json \
  --images-folder path/to/val2017 \
  --checkpoint-path models/checkpoint_iter_370000.pth \
  --output-name detections.json
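
The AP/AR figures this evaluation produces are based on Object Keypoint Similarity (OKS), which can be sketched as follows. The per-keypoint kappa constants here are illustrative; the official values are defined in pycocotools.

```python
import numpy as np

def oks(gt, dt, area, kappas):
    """Object Keypoint Similarity between ground-truth and detected keypoints.

    gt, dt: (N, 2) arrays of (x, y) keypoints; area: object segment area;
    kappas: (N,) per-keypoint tolerance constants.
    """
    d2 = np.sum((np.asarray(gt, dtype=float) - np.asarray(dt, dtype=float)) ** 2,
                axis=1)
    # Gaussian falloff of each keypoint's distance, scaled by object size.
    return float(np.mean(np.exp(-d2 / (2.0 * area * kappas ** 2))))
```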

Training

1. Prepare labels (converts COCO annotation JSON to the internal format):

python scripts/prepare_train_labels.py --labels path/to/annotations.json

2. Create a validation subset (optional):

python scripts/make_val_subset.py --labels path/to/val_labels.json

3. Start training:

python train.py \
  --prepared-train-labels prepared_train_labels.pkl \
  --train-images-folder path/to/train2017 \
  --val-labels path/to/val_labels.json \
  --val-images-folder path/to/val2017 \
  --checkpoint-path models/checkpoint_iter_370000.pth \
  --experiment-name my_experiment

Key training arguments:

Argument                 Default  Description
--num-refinement-stages  1        Number of PAF refinement stages
--base-lr                4e-5     Initial learning rate
--batch-size             80       Batch size
--from-mobilenet         false    Initialize from MobileNet weights only
--weights-only           false    Load weights but reset optimizer/scheduler
--checkpoint-after       5000     Save checkpoint every N iterations
--val-after              5000     Run validation every N iterations

Checkpoints are saved to <experiment-name>_checkpoints/.

Export to ONNX

python scripts/convert_to_onnx.py \
  --checkpoint-path models/checkpoint_iter_370000.pth \
  --output-name human-pose-estimation.onnx

The exported model expects input shape [1, 3, 256, 456] and produces four outputs: heatmaps and PAFs for each refinement stage.
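
Shaping a frame into that input can be sketched as below. The mean/scale constants (128, 1/256) mirror this model family's usual preprocessing but are assumptions here; check demo.py for the exact values.

```python
import numpy as np

def preprocess(frame_hwc):
    """Convert one HWC BGR frame into the exported model's [1, 3, H, W] input."""
    img = (frame_hwc.astype(np.float32) - 128.0) / 256.0
    img = np.transpose(img, (2, 0, 1))  # HWC -> CHW
    return img[np.newaxis]              # add batch axis -> [1, C, H, W]
```

The resulting array can be fed to an ONNX runtime session; each of the four outputs is a stage's heatmap or PAF tensor.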

Custom Dataset Training

See TRAIN-ON-CUSTOM-DATASET.md for a full walkthrough covering:

  • Dataset annotation format (COCO JSON)
  • How to define keypoint pairs (BODY_PARTS_KPT_IDS) and PAF channel indices (BODY_PARTS_PAF_IDS)
  • Code modifications required for a different number of keypoints or skeleton topology