---
license: openrail++
language:
- en
base_model:
- camenduru/openpose
pipeline_tag: keypoint-detection
tags:
- skeleton
---
# Lightweight Human Pose Estimation
Real-time multi-person 2D pose estimation using a MobileNet backbone with Part Affinity Fields (PAF). Includes a TSA security screening demo that detects persons in a defined zone and checks whether their hands are raised.
## Overview
The model detects **18 keypoints** per person:
| Index | Keypoint | Index | Keypoint |
|-------|----------|-------|----------|
| 0 | nose | 9 | r_knee |
| 1 | neck | 10 | r_ank |
| 2 | r_sho | 11 | l_hip |
| 3 | r_elb | 12 | l_knee |
| 4 | r_wri | 13 | l_ank |
| 5 | l_sho | 14 | r_eye |
| 6 | l_elb | 15 | l_eye |
| 7 | l_wri | 16 | r_ear |
| 8 | r_hip | 17 | l_ear |
Keypoints are grouped into person instances using PAF vectors (19 connection pairs), allowing robust multi-person detection.
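The index-to-name mapping above can be written as a plain lookup list. This is an illustrative sketch following the table (the repo keeps an equivalent list in `modules/pose.py`; the exact variable name there may differ):

```python
# Keypoint channel index -> name, in the order given by the table above.
KPT_NAMES = [
    'nose', 'neck',
    'r_sho', 'r_elb', 'r_wri',
    'l_sho', 'l_elb', 'l_wri',
    'r_hip', 'r_knee', 'r_ank',
    'l_hip', 'l_knee', 'l_ank',
    'r_eye', 'l_eye', 'r_ear', 'l_ear',
]

def kpt_index(name: str) -> int:
    """Return the heatmap channel index for a named keypoint."""
    return KPT_NAMES.index(name)
```

Code that inspects specific joints (e.g. the wrist checks in the demo) can use `kpt_index('r_wri')` instead of a bare `4`.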
## Project Structure
```
pose/
├── demo.py                         # Inference demo with TSA screening logic
├── train.py                        # Training script
├── val.py                          # COCO validation script
├── requirements.txt
├── models/
│   ├── with_mobilenet.py           # PoseEstimationWithMobileNet architecture
│   └── checkpoint_iter_370000.pth  # Pretrained checkpoint
├── modules/
│   ├── keypoints.py                # Keypoint extraction and grouping (PAF)
│   ├── pose.py                     # Pose class with 18 keypoints, tracking
│   ├── conv.py                     # Conv building blocks
│   ├── loss.py                     # L2 loss
│   ├── load_state.py               # Checkpoint loading utilities
│   ├── get_parameters.py           # Parameter group helpers
│   └── one_euro_filter.py          # Temporal smoothing filter
├── datasets/
│   ├── coco.py                     # COCO train/val dataset loaders
│   └── transformations.py          # Data augmentation pipeline
├── scripts/
│   ├── prepare_train_labels.py     # Convert COCO JSON to internal format
│   ├── make_val_subset.py          # Create validation subset
│   └── convert_to_onnx.py          # Export model to ONNX
└── TRAIN-ON-CUSTOM-DATASET.md # Guide for custom dataset training
```
## Installation
```bash
pip install -r requirements.txt
```
Requirements: `torch>=0.4.1`, `torchvision>=0.2.1`, `opencv-python>=3.4.0.14`, `numpy>=1.14.0`, `pycocotools==2.0.0`, `shapely` (for demo zone detection).
## Running the Demo
The demo runs inference on images or video and overlays detected poses. It includes TSA screening logic: it checks whether a person's hips are inside a configurable polygon zone and whether their wrists are raised above their shoulders.
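The two screening checks can be sketched in a few lines. This is an illustrative, dependency-free version (the demo itself uses `shapely` for the zone test); keypoints are assumed to be `(x, y)` pixel coordinates with `-1` marking undetected joints, and y grows downward in image coordinates, so "above" means a smaller y:

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting point-in-polygon test (the demo uses shapely for this)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def hands_raised(kpts):
    """True if both wrists are above their shoulders (smaller y = higher)."""
    r_sho, r_wri = kpts[2], kpts[4]
    l_sho, l_wri = kpts[5], kpts[7]
    return r_wri[1] < r_sho[1] and l_wri[1] < l_sho[1]

def in_zone(kpts, zone):
    """True if a detected hip keypoint (indices 8, 11) lies inside the zone polygon."""
    return any(point_in_polygon(kpts[i][0], kpts[i][1], zone)
               for i in (8, 11) if kpts[i][0] >= 0)
```

The indices 2/4, 5/7, and 8/11 follow the keypoint table in the Overview section.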
**On a video file:**
```bash
python demo.py --checkpoint-path models/checkpoint_iter_370000.pth --video path/to/video.mp4
```
**On images:**
```bash
python demo.py --checkpoint-path models/checkpoint_iter_370000.pth --images path/to/image.jpg
```
**CPU-only:**
```bash
python demo.py --checkpoint-path models/checkpoint_iter_370000.pth --video path/to/video.mp4 --cpu
```
| Argument | Default | Description |
|----------|---------|-------------|
| `--checkpoint-path` | required | Path to `.pth` checkpoint |
| `--height-size` | 256 | Network input height |
| `--video` | β€” | Path to video file or webcam id |
| `--images` | β€” | One or more image paths |
| `--cpu` | false | Run on CPU |
| `--track` | 1 | Enable pose ID tracking across frames |
| `--smooth` | 1 | Apply One Euro filter smoothing to keypoints |
Output is saved to `test.mp4`. The screening zone polygon is defined in `demo.py` and can be adjusted for your camera setup.
## Validation
Evaluate against COCO keypoints annotations:
```bash
python val.py \
--labels path/to/val_labels.json \
--images-folder path/to/val2017 \
--checkpoint-path models/checkpoint_iter_370000.pth \
--output-name detections.json
```
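Since evaluation runs through `pycocotools`, the `detections.json` file must follow the standard COCO keypoint results format: one entry per detected person, with 17 COCO keypoints flattened into `(x, y, score)` triplets. A minimal sketch of one entry, with illustrative values:

```python
import json

# One COCO keypoint result: 17 keypoints, each an (x, y, score) triplet,
# flattened into 51 numbers. category_id 1 is "person" in COCO.
detection = {
    "image_id": 397133,             # illustrative COCO image id
    "category_id": 1,
    "keypoints": [0.0] * (17 * 3),  # x1, y1, s1, x2, y2, s2, ...
    "score": 0.9,
}

# The output file is a JSON array of such entries.
serialized = json.dumps([detection])
```

Note that COCO uses 17 keypoints (it has no neck), so the model's 18-point skeleton is mapped down to the COCO set during evaluation.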
## Training
**1. Prepare labels** (converts COCO annotation JSON to the internal format):
```bash
python scripts/prepare_train_labels.py --labels path/to/annotations.json
```
**2. Create a validation subset** (optional):
```bash
python scripts/make_val_subset.py --labels path/to/val_labels.json
```
**3. Start training:**
```bash
python train.py \
--prepared-train-labels prepared_train_labels.pkl \
--train-images-folder path/to/train2017 \
--val-labels path/to/val_labels.json \
--val-images-folder path/to/val2017 \
--checkpoint-path models/checkpoint_iter_370000.pth \
--experiment-name my_experiment
```
Key training arguments:
| Argument | Default | Description |
|----------|---------|-------------|
| `--num-refinement-stages` | 1 | Number of PAF refinement stages |
| `--base-lr` | 4e-5 | Initial learning rate |
| `--batch-size` | 80 | Batch size |
| `--from-mobilenet` | false | Initialize from MobileNet weights only |
| `--weights-only` | false | Load weights but reset optimizer/scheduler |
| `--checkpoint-after` | 5000 | Save checkpoint every N iterations |
| `--val-after` | 5000 | Run validation every N iterations |
Checkpoints are saved to `<experiment-name>_checkpoints/`.
## Export to ONNX
```bash
python scripts/convert_to_onnx.py \
--checkpoint-path models/checkpoint_iter_370000.pth \
--output-name human-pose-estimation.onnx
```
The exported model expects input shape `[1, 3, 256, 456]` and produces four outputs: heatmaps and PAFs for each refinement stage.
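Preparing an input tensor for the exported model amounts to normalizing and transposing a resized BGR frame. A minimal NumPy sketch, assuming the frame is already resized to 256×456; the normalization constants (`mean=128`, `scale=1/256`) are common for this model family but should be checked against the preprocessing in `demo.py`:

```python
import numpy as np

def to_onnx_input(frame):
    """HxWx3 uint8 image (already 256x456) -> [1, 3, 256, 456] float32 tensor."""
    x = (frame.astype(np.float32) - 128.0) / 256.0  # normalize to roughly [-0.5, 0.5]
    x = x.transpose(2, 0, 1)                        # HWC -> CHW
    return x[np.newaxis, ...]                       # add batch dim -> NCHW

frame = np.zeros((256, 456, 3), dtype=np.uint8)
inp = to_onnx_input(frame)
```

The resulting array can be fed directly to an ONNX Runtime session as the model's single input.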
## Custom Dataset Training
See [TRAIN-ON-CUSTOM-DATASET.md](TRAIN-ON-CUSTOM-DATASET.md) for a full walkthrough covering:
- Dataset annotation format (COCO JSON)
- How to define keypoint pairs (`BODY_PARTS_KPT_IDS`) and PAF channel indices (`BODY_PARTS_PAF_IDS`)
- Code modifications required for a different number of keypoints or skeleton topology