---
license: apache-2.0
tags:
- pose-estimation
- 3d-pose
- computer-vision
- pytorch
- rtmpose
datasets:
- cocktail14
metrics:
- mpjpe
library_name: pytorch
---

# RTMPose3D

Real-time multi-person 3D whole-body pose estimation with 133 keypoints per person.

## Model Description

RTMPose3D is a real-time 3D pose estimation model that detects and tracks 133 keypoints per person:
- **17** body keypoints (COCO format)
- **6** foot keypoints
- **68** facial landmarks
- **42** hand keypoints (21 per hand)

The model outputs both 2D pixel coordinates and 3D spatial coordinates for each keypoint.

## Model Variants

This repository contains the following checkpoints:

| Model | Parameters | Speed | Accuracy (MPJPE) | Checkpoint File |
|-------|------------|-------|------------------|-----------------|
| RTMDet-M (Detector) | ~50M | Fast | - | `rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth` |
| RTMW3D-L (Large) | ~65M | Real-time | 0.045 | `rtmw3d-l_8xb64_cocktail14-384x288-794dbc78_20240626.pth` |
| RTMW3D-X (Extra Large) | ~98M | Slower | 0.057 | `rtmw3d-x_8xb64_cocktail14-384x288-b0a0eab7_20240626.pth` |

## Installation

```bash
pip install git+https://github.com/b-arac/rtmpose3d.git
```

Or clone and install locally:

```bash
git clone https://github.com/b-arac/rtmpose3d.git
cd rtmpose3d
pip install -r requirements.txt
pip install -e .
```

## Quick Start

### Using the Hugging Face Transformers-style API

```python
import cv2
from rtmpose3d import RTMPose3D

# Initialize model (auto-downloads checkpoints from this repo)
model = RTMPose3D.from_pretrained('rbarac/rtmpose3d', device='cuda:0')

# Load the image; cv2.imread returns None if the file cannot be read
image = cv2.imread('person.jpg')
if image is None:
    raise FileNotFoundError('person.jpg could not be read')

# Run inference
results = model(image, return_tensors='np')

# Access results
keypoints_3d = results['keypoints_3d']  # [N, 133, 3] - 3D coords in meters
keypoints_2d = results['keypoints_2d']  # [N, 133, 2] - pixel coords
scores = results['scores']              # [N, 133] - confidence [0, 1]
```

### Using the Simple Inference API

```python
import cv2
from rtmpose3d import RTMPose3DInference

# Initialize with model size
model = RTMPose3DInference(model_size='l', device='cuda:0')  # or 'x' for extra large

# Load the image and run inference
image = cv2.imread('person.jpg')
results = model(image)
print(results['keypoints_3d'].shape)  # [N, 133, 3]
```

### Single Person Detection

Detect only the most prominent person in the image:

```python
# Works with both APIs
results = model(image, single_person=True)  # Returns only N=1
```
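
How `single_person=True` chooses the "most prominent" person is not specified here. If you run multi-person inference and want a comparable filter of your own, one plausible heuristic is to keep the detection with the largest bounding-box area. The sketch below assumes that heuristic and the `[x1, y1, x2, y2]` box layout described under Output Format; `largest_person_index` is a hypothetical helper, not part of the `rtmpose3d` package.

```python
def largest_person_index(bboxes):
    """Return the index of the box with the largest area, or None if empty.

    `bboxes` is a sequence of [x1, y1, x2, y2] boxes (lists or array rows).
    This is an assumed heuristic, not the library's documented behavior.
    """
    best_index, best_area = None, -1.0
    for index, (x1, y1, x2, y2) in enumerate(bboxes):
        # Clamp to zero so malformed boxes never get a negative area.
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        if area > best_area:
            best_index, best_area = index, area
    return best_index


# Illustrative boxes, not real model output.
example_bboxes = [[10, 20, 110, 220], [300, 40, 340, 120]]
print(largest_person_index(example_bboxes))
```

The returned index can then be used to select the matching row from each array in the results dictionary.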

## Output Format

```python
{
    'keypoints_3d': np.ndarray,  # [N, 133, 3] - (X, Y, Z) in meters
    'keypoints_2d': np.ndarray,  # [N, 133, 2] - (x, y) pixel coordinates
    'scores': np.ndarray,        # [N, 133] - confidence scores [0, 1]
    'bboxes': np.ndarray         # [N, 4] - bounding boxes [x1, y1, x2, y2]
}
```

Where `N` is the number of detected persons.
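
Low-confidence keypoints are often worth discarding before further processing. A minimal sketch of such a filter for one person follows. The 0.3 threshold and the helper name `confident_keypoints` are illustrative assumptions, and plain lists stand in for one person's rows of the NumPy arrays.

```python
def confident_keypoints(keypoints, scores, threshold=0.3):
    """Map keypoint index -> coordinates for keypoints scoring >= threshold.

    `keypoints` holds one person's keypoints (e.g. results['keypoints_3d'][i])
    and `scores` the matching confidences (results['scores'][i]).
    The default threshold is an arbitrary illustrative value.
    """
    if len(keypoints) != len(scores):
        raise ValueError("keypoints and scores must have the same length")
    return {
        index: tuple(point)
        for index, (point, score) in enumerate(zip(keypoints, scores))
        if score >= threshold
    }


# Illustrative values for three keypoints, not real model output.
example_points = [(0.1, 2.0, 1.6), (0.2, 2.1, 1.5), (0.0, 2.0, 0.1)]
example_scores = [0.92, 0.15, 0.58]
print(confident_keypoints(example_points, example_scores))
```

Keeping the original indices as dictionary keys means the surviving keypoints can still be matched to their body part.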

### Coordinate Systems

**2D Keypoints** - Pixel coordinates:
- X: horizontal position [0, image_width]
- Y: vertical position [0, image_height]

**3D Keypoints** - Camera-relative coordinates in meters (Z-up convention):
- X: horizontal (negative=left, positive=right)
- Y: depth (negative=closer, positive=farther)
- Z: vertical (negative=down, positive=up)
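
Some vision libraries (OpenCV, for example) use a camera frame with x to the right, y down, and z pointing forward. Assuming the Z-up axes listed above, a point `(X, Y, Z)` maps to that frame as `(X, -Z, Y)`. A minimal sketch, where `z_up_to_camera` and `camera_to_z_up` are hypothetical helper names:

```python
def z_up_to_camera(point):
    """Convert (X right, Y depth, Z up) to (x right, y down, z forward)."""
    x, y, z = point
    return (x, -z, y)


def camera_to_z_up(point):
    """Inverse mapping: (x right, y down, z forward) to (X right, Y depth, Z up)."""
    x, y, z = point
    return (x, z, -y)


# A point 0.5 m to the right, 2 m away, and 1.5 m up (illustrative values).
print(z_up_to_camera((0.5, 2.0, 1.5)))
```

Both frames are right-handed, so the conversion is a pure axis swap with one sign flip and no mirroring.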

## Keypoint Indices

| Index Range | Body Part | Count | Description |
|-------------|-----------|-------|-------------|
| 0-16 | Body | 17 | Nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles |
| 17-22 | Feet | 6 | Foot keypoints |
| 23-90 | Face | 68 | Facial landmarks |
| 91-111 | Left Hand | 21 | Left hand keypoints |
| 112-132 | Right Hand | 21 | Right hand keypoints |
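
The index ranges in the table translate directly into Python slices. The sketch below splits one person's 133 keypoints into named groups; `KEYPOINT_GROUPS` and `split_keypoints` are illustrative names rather than part of the package, and the same slices work on a `[133, 3]` NumPy array.

```python
# Index ranges from the table, written as half-open Python slices.
KEYPOINT_GROUPS = {
    "body": slice(0, 17),
    "feet": slice(17, 23),
    "face": slice(23, 91),
    "left_hand": slice(91, 112),
    "right_hand": slice(112, 133),
}


def split_keypoints(person_keypoints):
    """Split one person's 133 keypoints into named groups."""
    if len(person_keypoints) != 133:
        raise ValueError(f"expected 133 keypoints, got {len(person_keypoints)}")
    return {name: person_keypoints[group] for name, group in KEYPOINT_GROUPS.items()}


# Placeholder keypoints (all zeros), used only to show the group sizes.
example_person = [(0.0, 0.0, 0.0)] * 133
print({name: len(points) for name, points in split_keypoints(example_person).items()})
```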

## Training Data

The models were trained on the **Cocktail14** dataset, which combines 14 public pose datasets, including:
- Human3.6M
- COCO-WholeBody
- UBody
- 11 more datasets

## Performance

Evaluated on standard 3D pose benchmarks (MPJPE is an error metric, so lower is better):

- **RTMW3D-L**: 0.045 MPJPE, real-time inference (~30 FPS on RTX 3090)
- **RTMW3D-X**: 0.057 MPJPE, slower inference
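
MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joint positions. A minimal sketch of the metric for a single pose follows. It omits any alignment or root-centering step that a benchmark protocol may apply, and `mpjpe` is an illustrative helper, not the evaluation code behind the numbers reported here.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between matching 3D joints of one pose."""
    if len(predicted) == 0 or len(predicted) != len(ground_truth):
        raise ValueError("poses must be non-empty and have the same joint count")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Two joints with illustrative values: per-joint errors of 0.03 and 0.05.
predicted = [(0.0, 0.0, 0.03), (1.0, 2.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 2.05, 0.0)]
print(round(mpjpe(predicted, ground_truth), 6))
```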

## Requirements

- Python >= 3.8
- PyTorch >= 2.0.0
- CUDA-capable GPU (4GB+ VRAM recommended)
- mmcv >= 2.0.0
- MMPose >= 1.0.0
- MMDetection >= 3.0.0
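
A quick way to check an environment against these minimums is to query installed package metadata. The sketch below uses only the standard library. The distribution names (`torch`, `mmcv`, `mmpose`, `mmdet`) are assumptions about how the packages are installed, and the simplified version parser treats a pre-release such as `2.0.0rc4` as `2.0.0`.

```python
import re
from importlib import metadata

# Minimum versions from the list above, keyed by assumed distribution name.
MINIMUM_VERSIONS = {
    "torch": (2, 0, 0),
    "mmcv": (2, 0, 0),
    "mmpose": (1, 0, 0),
    "mmdet": (3, 0, 0),
}


def parse_version(text):
    """Parse up to three leading numeric parts, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '2.0' so tuple comparison behaves as expected.
    return tuple(parts + [0] * (3 - len(parts)))


def check_requirements():
    """Return {package: (installed version or None, meets minimum)}."""
    report = {}
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, parse_version(installed) >= minimum)
    return report


for name, (installed, ok) in check_requirements().items():
    status = "ok" if ok else "needs attention"
    print(f"{name}: {installed or 'not installed'} ({status})")
```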

## Citation

```bibtex
@misc{rtmpose3d2025,
  title={RTMPose3D: Real-Time Multi-Person 3D Pose Estimation},
  author={Arac, Bahadir},
  year={2025},
  publisher={GitHub},
  url={https://github.com/b-arac/rtmpose3d}
}
```

## License

Apache 2.0

## Acknowledgments

Built on [MMPose](https://github.com/open-mmlab/mmpose) by OpenMMLab. The models were trained by the MMPose team on the Cocktail14 dataset.

## Links

- **GitHub Repository**: [b-arac/rtmpose3d](https://github.com/b-arac/rtmpose3d)
- **Documentation**: See README in the repository
- **MMPose**: [open-mmlab/mmpose](https://github.com/open-mmlab/mmpose)