---
license: apache-2.0
tags:
- pose-estimation
- 3d-pose
- computer-vision
- pytorch
- rtmpose
datasets:
- cocktail14
metrics:
- mpjpe
library_name: pytorch
---
# RTMPose3D
Real-time multi-person 3D whole-body pose estimation with 133 keypoints per person.
## Model Description
RTMPose3D is a real-time 3D pose estimation model that detects people and estimates 133 keypoints per person:
- **17** body keypoints (COCO format)
- **6** foot keypoints
- **68** facial landmarks
- **42** hand keypoints (21 per hand)
The model outputs both 2D pixel coordinates and 3D spatial coordinates for each keypoint.
## Model Variants
This repository contains checkpoints for:
| Model | Parameters | Speed | Accuracy (MPJPE) | Checkpoint File |
|-------|------------|-------|------------------|-----------------|
| RTMDet-M (Detector) | ~50M | Fast | - | `rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth` |
| RTMW3D-L (Large) | ~65M | Real-time | 0.045 | `rtmw3d-l_cock14-0d4ad840_20240422.pth` |
| RTMW3D-X (Extra Large) | ~98M | Slower | 0.057 | `rtmw3d-x_8xb64_cocktail14-384x288-b0a0eab7_20240626.pth` |
## Installation
```bash
pip install git+https://github.com/b-arac/rtmpose3d.git
```
Or clone and install locally:
```bash
git clone https://github.com/b-arac/rtmpose3d.git
cd rtmpose3d
pip install -r requirements.txt
pip install -e .
```
## Quick Start
### Using the HuggingFace Transformers-style API
```python
import cv2
from rtmpose3d import RTMPose3D
# Initialize model (auto-downloads checkpoints from this repo)
model = RTMPose3D.from_pretrained('rbarac/rtmpose3d', device='cuda:0')
# Run inference
image = cv2.imread('person.jpg')
results = model(image, return_tensors='np')
# Access results
keypoints_3d = results['keypoints_3d'] # [N, 133, 3] - 3D coords in meters
keypoints_2d = results['keypoints_2d'] # [N, 133, 2] - pixel coords
scores = results['scores'] # [N, 133] - confidence [0, 1]
```
### Using the Simple Inference API
```python
import cv2
from rtmpose3d import RTMPose3DInference
# Initialize with model size
model = RTMPose3DInference(model_size='l', device='cuda:0')  # or 'x' for extra large
# Run inference
image = cv2.imread('person.jpg')
results = model(image)
print(results['keypoints_3d'].shape)  # [N, 133, 3]
```
### Single Person Detection
Detect only the most prominent person in the image:
```python
# Works with both APIs
results = model(image, single_person=True) # Returns only N=1
```
## Output Format
```python
{
    'keypoints_3d': np.ndarray,  # [N, 133, 3] - (X, Y, Z) in meters
    'keypoints_2d': np.ndarray,  # [N, 133, 2] - (x, y) pixel coordinates
    'scores': np.ndarray,        # [N, 133] - confidence scores [0, 1]
    'bboxes': np.ndarray         # [N, 4] - bounding boxes [x1, y1, x2, y2]
}
```
Where `N` is the number of detected persons.
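Per-person iteration follows directly from these shapes. The sketch below uses synthetic arrays in place of real model output (shapes and the 0.5 threshold are illustrative, not part of the API):

```python
import numpy as np

# Synthetic output for N=2 detected persons, matching the documented shapes
N = 2
results = {
    'keypoints_3d': np.zeros((N, 133, 3)),
    'keypoints_2d': np.zeros((N, 133, 2)),
    'scores': np.tile(np.linspace(0, 1, 133), (N, 1)),
    'bboxes': np.zeros((N, 4)),
}

for i in range(len(results['bboxes'])):
    kpts3d = results['keypoints_3d'][i]   # (133, 3) keypoints for person i
    conf = results['scores'][i]           # (133,) per-keypoint confidence
    visible = kpts3d[conf > 0.5]          # keep confidently detected keypoints
    print(f"person {i}: {len(visible)} keypoints above threshold")
```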
### Coordinate Systems
**2D Keypoints** - Pixel coordinates:
- X: horizontal position [0, image_width]
- Y: vertical position [0, image_height]
**3D Keypoints** - Camera-relative coordinates in meters (Z-up convention):
- X: horizontal (negative=left, positive=right)
- Y: depth (negative=closer, positive=farther)
- Z: vertical (negative=down, positive=up)
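Many 3D viewers and plotting libraries expect a Y-up frame instead, so a small axis swap is often needed before visualization. A minimal sketch, assuming the Z-up convention described above (the helper name is ours, not part of the library):

```python
import numpy as np

def z_up_to_y_up(kpts3d: np.ndarray) -> np.ndarray:
    """Convert (X=right, Y=depth, Z=up) points to a Y-up frame
    (X=right, Y=up, Z=depth), as used by many 3D viewers."""
    x, depth, up = kpts3d[..., 0], kpts3d[..., 1], kpts3d[..., 2]
    return np.stack([x, up, depth], axis=-1)

pt = np.array([[0.1, 2.0, 1.5]])  # 0.1 m right, 2 m away, 1.5 m up
print(z_up_to_y_up(pt))           # [[0.1 1.5 2. ]]
```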
## Keypoint Indices
| Index Range | Body Part | Count | Description |
|-------------|-----------|-------|-------------|
| 0-16 | Body | 17 | Nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles |
| 17-22 | Feet | 6 | Foot keypoints |
| 23-90 | Face | 68 | Facial landmarks |
| 91-111 | Left Hand | 21 | Left hand keypoints |
| 112-132 | Right Hand | 21 | Right hand keypoints |
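The index ranges above make it easy to slice out a single body part from the 133-keypoint array. A small sketch (the `PARTS` mapping is our own helper, not part of the library):

```python
import numpy as np

# Index ranges from the table above
PARTS = {
    'body': slice(0, 17),
    'feet': slice(17, 23),
    'face': slice(23, 91),
    'left_hand': slice(91, 112),
    'right_hand': slice(112, 133),
}

keypoints_3d = np.zeros((1, 133, 3))  # stand-in for model output
left_hand = keypoints_3d[:, PARTS['left_hand']]  # (N, 21, 3)
face = keypoints_3d[:, PARTS['face']]            # (N, 68, 3)
print(left_hand.shape, face.shape)               # (1, 21, 3) (1, 68, 3)
```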
## Training Data
The models were trained on the **Cocktail14** dataset, which combines 14 public 3D pose datasets:
- Human3.6M
- COCO-WholeBody
- UBody
- And 11 more datasets
## Performance
Evaluated on standard 3D pose benchmarks (lower MPJPE is better):
- **RTMW3D-L**: 0.045 MPJPE, real-time inference (~30 FPS on RTX 3090)
- **RTMW3D-X**: 0.057 MPJPE, slower but higher accuracy
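Throughput depends heavily on GPU, image size, and person count, so it is worth measuring on your own hardware. A generic timing sketch that works with any inference callable, such as the `model` object from Quick Start (the helper and the dummy callable are ours, for illustration):

```python
import time
import numpy as np

def measure_fps(infer, image, warmup=3, runs=20):
    """Rough FPS estimate for an inference callable on a fixed image."""
    for _ in range(warmup):       # warm up caches / CUDA kernels
        infer(image)
    t0 = time.perf_counter()
    for _ in range(runs):
        infer(image)
    return runs / (time.perf_counter() - t0)

# Usage with the real model from Quick Start (not run here):
#   fps = measure_fps(model, cv2.imread('person.jpg'))
dummy = lambda img: {'keypoints_3d': np.zeros((1, 133, 3))}
print(f"{measure_fps(dummy, np.zeros((256, 256, 3))):.0f} FPS")
```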
## Requirements
- Python >= 3.8
- PyTorch >= 2.0.0
- CUDA-capable GPU (4GB+ VRAM recommended)
- mmcv >= 2.0.0
- MMPose >= 1.0.0
- MMDetection >= 3.0.0
## Citation
```bibtex
@misc{rtmpose3d2025,
  title={RTMPose3D: Real-Time Multi-Person 3D Pose Estimation},
  author={Arac, Bahadir},
  year={2025},
  publisher={GitHub},
  url={https://github.com/b-arac/rtmpose3d}
}
```
## License
Apache 2.0
## Acknowledgments
Built on [MMPose](https://github.com/open-mmlab/mmpose) by OpenMMLab. Models trained by the MMPose team on the Cocktail14 dataset.
## Links
- **GitHub Repository**: [b-arac/rtmpose3d](https://github.com/b-arac/rtmpose3d)
- **Documentation**: See README in the repository
- **MMPose**: [open-mmlab/mmpose](https://github.com/open-mmlab/mmpose)