---
license: apache-2.0
tags:
- pose-estimation
- 3d-pose
- computer-vision
- pytorch
- rtmpose
datasets:
- cocktail14
metrics:
- mpjpe
library_name: pytorch
---
# RTMPose3D

Real-time multi-person 3D whole-body pose estimation with 133 keypoints per person.

## Model Description

RTMPose3D is a real-time 3D pose estimation model that detects and tracks 133 keypoints per person:
- **17** body keypoints (COCO format)
- **6** foot keypoints
- **68** facial landmarks
- **42** hand keypoints (21 per hand)

The model outputs both 2D pixel coordinates and 3D spatial coordinates for each keypoint.
| 30 |
+
## Model Variants
|
| 31 |
+
|
| 32 |
+
This repository contains checkpoints for:
|
| 33 |
+
|
| 34 |
+
| Model | Parameters | Speed | Accuracy (MPJPE) | Checkpoint File |
|
| 35 |
+
|-------|------------|-------|------------------|-----------------|
|
| 36 |
+
| RTMDet-M (Detector) | ~50M | Fast | - | `rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth` |
|
| 37 |
+
| RTMW3D-L (Large) | ~65M | Real-time | 0.678 | `rtmw3d-l_8xb64_cocktail14-384x288-794dbc78_20240626.pth` |
|
| 38 |
+
| RTMW3D-X (Extra Large) | ~98M | Slower | 0.680 | `rtmw3d-x_8xb64_cocktail14-384x288-b0a0eab7_20240626.pth` |
|
| 39 |
+
|
| 40 |
+
The model outputs both 2D pixel coordinates and 3D spatial coordinates for each keypoint.
|
| 41 |
+
|
| 42 |
+
## Model Variants
|
| 43 |
+
|
| 44 |
+
This repository contains checkpoints for:
|
| 45 |
+
|
| 46 |
+
| Model | Parameters | Speed | Accuracy (MPJPE) | Checkpoint File |
|
| 47 |
+
|-------|------------|-------|------------------|-----------------|
|
| 48 |
+
| RTMDet-M (Detector) | ~50M | Fast | - | `rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth` |
|
| 49 |
+
| RTMW3D-L (Large) | ~65M | Real-time | 0.045 | `rtmw3d-l_cock14-0d4ad840_20240422.pth` |
|
| 50 |
+
| RTMW3D-X (Extra Large) | ~98M | Slower | 0.057 | `rtmw3d-x_8xb64_cocktail14-384x288-b0a0eab7_20240626.pth` |
|
| 51 |
+
|
## Installation

```bash
pip install git+https://github.com/mutedeparture/rtmpose3d.git
```

Or clone and install locally:

```bash
git clone https://github.com/mutedeparture/rtmpose3d.git
cd rtmpose3d
pip install -r requirements.txt
pip install -e .
```
## Quick Start

### Using the HuggingFace Transformers-style API

```python
import cv2
from rtmpose3d import RTMPose3D

# Initialize the model (auto-downloads checkpoints from this repo)
model = RTMPose3D.from_pretrained('rbarac/rtmpose3d', device='cuda:0')

# Run inference
image = cv2.imread('person.jpg')
results = model(image, return_tensors='np')

# Access results
keypoints_3d = results['keypoints_3d']  # [N, 133, 3] - 3D coordinates in meters
keypoints_2d = results['keypoints_2d']  # [N, 133, 2] - pixel coordinates
scores = results['scores']              # [N, 133] - confidence scores in [0, 1]
```
### Using the Simple Inference API

```python
from rtmpose3d import RTMPose3DInference

# Initialize with a model size
model = RTMPose3DInference(model_size='l', device='cuda:0')  # or 'x' for extra large

# Run inference
results = model(image)
print(results['keypoints_3d'].shape)  # [N, 133, 3]
```
### Single Person Detection

Detect only the most prominent person in the image:

```python
# Works with both APIs
results = model(image, single_person=True)  # Returns only N=1
```
## Output Format

```python
{
    'keypoints_3d': np.ndarray,  # [N, 133, 3] - (X, Y, Z) in meters
    'keypoints_2d': np.ndarray,  # [N, 133, 2] - (x, y) pixel coordinates
    'scores': np.ndarray,        # [N, 133] - confidence scores [0, 1]
    'bboxes': np.ndarray         # [N, 4] - bounding boxes [x1, y1, x2, y2]
}
```

Where `N` is the number of detected persons.
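As a quick sketch of working with this dict (mocked here with random arrays of the documented shapes rather than real model output), per-person keypoints can be pulled out and filtered by confidence:

```python
import numpy as np

# Mock a results dict with the documented shapes (N=2 detected persons);
# a real one comes from model(image, return_tensors='np')
N = 2
results = {
    'keypoints_3d': np.random.randn(N, 133, 3).astype(np.float32),
    'keypoints_2d': (np.random.rand(N, 133, 2) * 640).astype(np.float32),
    'scores': np.random.rand(N, 133).astype(np.float32),
    'bboxes': (np.random.rand(N, 4) * 640).astype(np.float32),
}

for person_idx in range(results['keypoints_3d'].shape[0]):
    kpts_2d = results['keypoints_2d'][person_idx]  # [133, 2]
    scores = results['scores'][person_idx]         # [133]
    # Keep only confidently detected keypoints (0.5 is an arbitrary threshold)
    confident = kpts_2d[scores > 0.5]
    print(f'person {person_idx}: {confident.shape[0]} confident keypoints')
```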
### Coordinate Systems

**2D Keypoints** - Pixel coordinates:
- X: horizontal position, [0, image_width]
- Y: vertical position, [0, image_height]

**3D Keypoints** - Camera-relative coordinates in meters (Z-up convention):
- X: horizontal (negative = left, positive = right)
- Y: depth (negative = closer, positive = farther)
- Z: vertical (negative = down, positive = up)
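One practical consequence of this convention: because Y carries depth, multiple detected persons can be ordered front-to-back directly from the 3D output. A minimal sketch with synthetic coordinates (not real model output), using keypoint 0 (the nose) as the reference joint:

```python
import numpy as np

# Synthetic 3D keypoints for 3 persons, [N, 133, 3], (X, Y, Z) with Y = depth
keypoints_3d = np.zeros((3, 133, 3), dtype=np.float32)
keypoints_3d[:, :, 1] = np.array([2.5, 0.8, 1.6])[:, None]  # per-person depth in meters

# Sort persons nearest-first by the depth (Y) of keypoint 0
order = np.argsort(keypoints_3d[:, 0, 1])
print(order)  # → [1 2 0], nearest person first
```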
## Keypoint Indices

| Index Range | Body Part | Count | Description |
|-------------|-----------|-------|-------------|
| 0-16 | Body | 17 | Nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles |
| 17-22 | Feet | 6 | Foot keypoints |
| 23-90 | Face | 68 | Facial landmarks |
| 91-111 | Left Hand | 21 | Left hand keypoints |
| 112-132 | Right Hand | 21 | Right hand keypoints |
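These index ranges can be captured as slices to split one person's `[133, 3]` keypoint array into its parts. The slice names below are illustrative, not part of the package API:

```python
import numpy as np

# Slices matching the keypoint index table above
BODY, FEET = slice(0, 17), slice(17, 23)
FACE = slice(23, 91)
LEFT_HAND, RIGHT_HAND = slice(91, 112), slice(112, 133)

kpts = np.zeros((133, 3))  # one person's 3D keypoints
parts = {name: kpts[s] for name, s in [
    ('body', BODY), ('feet', FEET), ('face', FACE),
    ('left_hand', LEFT_HAND), ('right_hand', RIGHT_HAND)]}

for name, arr in parts.items():
    print(name, arr.shape)
# Counts per part: 17 + 6 + 68 + 21 + 21 = 133
```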
## Training Data

The models were trained on the **Cocktail14** dataset, which combines 14 public 3D pose datasets:
- Human3.6M
- COCO-WholeBody
- UBody
- And 11 more datasets
## Performance

Evaluated on standard 3D pose benchmarks:

- **RTMW3D-L**: 0.045 MPJPE, real-time inference (~30 FPS on an RTX 3090)
- **RTMW3D-X**: 0.057 MPJPE, slower but more accurate
## Requirements

- Python >= 3.8
- PyTorch >= 2.0.0
- CUDA-capable GPU (4GB+ VRAM recommended)
- mmcv >= 2.0.0
- MMPose >= 1.0.0
- MMDetection >= 3.0.0
## Citation

```bibtex
@misc{rtmpose3d2025,
  title={RTMPose3D: Real-Time Multi-Person 3D Pose Estimation},
  author={Arac, Bahadir},
  year={2025},
  publisher={GitHub},
  url={https://github.com/mutedeparture/rtmpose3d}
}
```
## License

Apache 2.0

## Acknowledgments

Built on [MMPose](https://github.com/open-mmlab/mmpose) by OpenMMLab. Models trained by the MMPose team on the Cocktail14 dataset.

## Links

- **GitHub Repository**: [mutedeparture/rtmpose3d](https://github.com/mutedeparture/rtmpose3d)
- **Documentation**: see the README in the repository
- **MMPose**: [open-mmlab/mmpose](https://github.com/open-mmlab/mmpose)